INTELLIGENT AUTONOMOUS SYSTEMS 9
Intelligent Autonomous Systems 9 IAS-9
Edited by
Tamio Arai The University of Tokyo, Japan
Rolf Pfeifer University of Zurich, Switzerland
Tucker Balch Georgia Institute of Technology, USA
and
Hiroshi Yokoi The University of Tokyo, Japan
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2006 The authors. All rights reserved.
No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 1-58603-595-9
Library of Congress Control Number: 2006920171

Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail: [email protected]

Distributor in the UK and Ireland
Gazelle Books
Falcon House
Queen Square
Lancaster LA1 1RN
United Kingdom
fax: +44 1524 63232

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
e-mail: [email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Preface

The IAS-9 conference aims to address the main issues of concern within the IAS community, covering both the applied and the theoretical aspects of intelligent autonomous systems. Autonomy and adaptivity have been key aspects of truly intelligent artificial systems since the first IAS conference in 1989. New directions of research have recently emerged from the synergetic, interdisciplinary and transdisciplinary interaction of many fields, such as cognitive science, operations research, mathematics, robotics, mechanics, electronics, informatics, and economics. One key insight is that to realize both intelligence and autonomy, it is crucial to build real-world devices and to abstract principles of design from them. The goal of IAS-9 is to lay out new scientific ideas and design principles for artificial systems able to survive in nature and in our society. The conference proceedings stimulate novel challenges as well as exciting research directions.

A total of 146 scientific papers were submitted from 16 countries. All submitted papers were reviewed by the program committee, and 112 were accepted as full papers. Five invited guest speakers address IAS-9: Andrew Adamatzky from the University of the West of England addresses new directions of computation; Hod Lipson from Cornell University presents frontier studies in evolutionary robotics; Tomomasa Sato from The University of Tokyo presents Japan's COE project for real-world applications; Masahiro Fujita from Sony addresses communication and service robotic systems; and Shigeyuki Hosoe from the RIKEN Bio-Mimetic Control Research Center presents human motion analysis toward robotic learning.

The conference takes place at the new Kashiwa campus of the University of Tokyo, where frontier sciences are being created as “transdisciplinary” studies. A novel research center on artifacts, RACE, is also located on this campus, together with three other interdisciplinary research centers.
I hope all participants of IAS-9 will enjoy the atmosphere of the campus and the facilities of the research building, and experience the novel trend of “transdisciplinary” studies in Japan. We sincerely appreciate the support of the Inoue Foundation of Science, the Kayamori Foundation of Informational Science Advancement, the Robotics Society of Japan, and the Research into Artifacts, Center for Engineering (RACE) at the University of Tokyo. We would also like to express our gratitude to everyone on the program committee who contributed to the collection and selection of high-level papers, and to the local committee members who supported the management of IAS-9. We look forward to seeing you at the conference site of IAS-9 in Tokyo.

Tamio Arai, Rolf Pfeifer, Tucker Balch and Hiroshi Yokoi
IAS-9 Conference Organization

General Chair
Tamio Arai, The Univ. of Tokyo, Japan

Steering Committee
Rüdiger Dillmann, Univ. of Karlsruhe, Germany
Maria Gini, Univ. of Minnesota, USA
Frans Groen, Univ. of Amsterdam, the Netherlands
Thomas C. Henderson, University of Utah, USA
Yukinori Kakazu, Hokkaido Univ., Japan
Enrico Pagello, Univ. of Padua and LADSEB-CNR, Italy
Anthony Stentz, Carnegie Mellon Univ., USA

Program Committee Co-Chairs
In America: Tucker Balch, Georgia Institute of Technology, USA
In Europe/Africa: Rolf Pfeifer, The Univ. of Zurich, Switzerland
In Asia/Oceania: Hiroshi Yokoi, The Univ. of Tokyo, Japan

Publicity Chair
Kanji Ueda, Univ. of Tokyo, Japan

Organized Session Chairs
Jun Ota, Univ. of Tokyo, Japan
Max Lungarella, Univ. of Tokyo, Japan

Local Organization
Hajime Asama, Univ. of Tokyo, Japan
Yusuke Maeda, Yokohama National Univ., Japan
Masao Sugi, Univ. of Tokyo, Japan
Ryosuke Chiba, Univ. of Tokyo, Japan
Ryuichi Ueda, Univ. of Tokyo, Japan

Program Committee

Europe
Fumiya Iida, The University of Zurich, Switzerland
Miriam Fend, The University of Zurich, Switzerland
Gabriel Gómez, The University of Zurich, Switzerland
Giorgio Metta, University of Genoa, Italy
Giulio Sandini, University of Genoa, Italy
Frank Pasemann, Fraunhofer Institute for Autonomous Intelligent Systems, Germany
Tom Ziemke, University of Skövde, Sweden
Noel Sharkey, University of Sheffield, England
Owen Holland, University of Essex, England
Barbara Webb, University of Edinburgh, Scotland
Philippe Gaussier, Cergy-Pontoise University, France
Arnaud Revel, ENSEA, France
Pierre-Yves Oudeyer, Sony Computer Science Lab., France
Francesco Mondada, EPFL, Switzerland
Roland Siegwart, EPFL, Switzerland
Rüdiger Dillmann, University of Karlsruhe, Germany
Sven Behnke, University of Freiburg, Germany
Raúl Rojas, Freie Universität Berlin, Germany
Auke Ijspeert, EPFL, Switzerland
Aude Billard, EPFL, Switzerland
Andrew Adamatzky, University of the West of England, England
Huosheng Hu, University of Essex, England
Stefano Carpin, International University Bremen, Germany
Daniele Nardi, University of Roma “La Sapienza”, Italy
Alicia Casals, Technical University of Catalonia, Spain
Ray Jarvis, Monash University, Australia
Pedro Lima, Technical University of Lisbon, Portugal
Enzo Mumolo, University of Trieste, Italy
Carme Torras, Institut de Robòtica i Informàtica Industrial, Spain
Ernesto Burattini, University of Naples “Federico II”, Italy
Andrea Bonarini, Politecnico di Milano, Italy
Gerhard K. Kraetzschmar, University of Applied Sciences Bonn-Rhein-Sieg, Germany
Antonio D’Angelo, University of Udine, Italy
Angel P. del Pobil, Universitat Jaume I, Spain

America
Paul Scerri, Carnegie Mellon University
Tony Stentz, Carnegie Mellon University
Omead Amidi, Carnegie Mellon University
Doug MacKenzie, Mobile Intelligence Inc.
Maria Gini, University of Minnesota
Sven Koenig, University of Southern California
Daniel Lee, University of Pennsylvania
Devin Balkcom, Dartmouth College
Wes Huang, Rensselaer Polytechnic Institute
Joelle Pineau, McGill University
Tom Collins, Georgia Institute of Technology
Eric Johnson, Georgia Institute of Technology
Artur Arsenio, Massachusetts Institute of Technology
Jose Carmena, UC Berkeley
Darrin Bentivegna, ATR, Kyoto, Japan
Josh Bongard, Cornell University
Paul Fitzpatrick, Massachusetts Institute of Technology
Steve Collins, University of Michigan
Chandana Paul, Cornell University
Chris Atkeson, Carnegie Mellon University

Asia
Tetsunari Inamura, The University of Tokyo, Japan
Yoshihiko Nakamura, The University of Tokyo, Japan
Yuichi Kobayashi, RIKEN BMC, Japan
Hiroaki Yamaguchi, Musashi Institute of Technology, Japan
Yoshihiro Miyake, Tokyo Institute of Technology, Japan
Yasutake Takahashi, Osaka University, Japan
Shinkichi Inagaki, Nagoya University, Japan
Kousuke Inoue, Ibaraki University, Japan
Wenwei Yu, Chiba University, Japan
Yuko Ishiwaka, Hokkaido University, Japan
Toshiyuki Kondo, Tokyo Institute of Technology, Japan
Toshio Hori, AIST, Japan
Mitsuo Wada, Hokkaido University, Japan
Mihoko Otake, The University of Tokyo, Japan
Masashi Furukawa, Asahikawa National College of Technology, Japan
Koichi Osuka, Kobe University, Japan
Koichi Nishiwaki, AIST, Japan
Koh Hosoda, Osaka University, Japan
Keiji Suzuki, Future University – Hakodate, Japan
Kazuhiro Ohkura, Kobe University, Japan
Takashi Kawakami, Hokkaido Institute of Technology, Japan
Katsuyoshi Tsujita, Osaka Institute of Technology, Japan
Jun Hakura, Iwate Prefectural University, Japan
Kosei Ishimura, Hokkaido University, Japan
Hiroshi Ishiguro, Osaka University, Japan
Akio Ishiguro, Nagoya University, Japan
Contents

Preface Tamio Arai, Rolf Pfeifer, Tucker Balch and Hiroshi Yokoi
IAS-9 Conference Organization
Papers of Invited Guest Speakers
Reaction-Diffusion Intelligent Wetware Andrew Adamatzky
Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality Hod Lipson, Josh Bongard, Victor Zykov and Evan Malone
Real World Informatics Environment System Tomomasa Sato
Understanding and Realization of Constrained Motion – Human Motion Analysis and Robotic Learning Approaches Shigeyuki Hosoe, Yuichi Kobayashi and Mikhail Svinin
Part 1. Navigation and Motion Planning
The Specifiability Requirement in Mobile Robot Self-Localization Francesco Capezio, Antonio Sgorbissa and Renato Zaccaria
Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments Mattia Castelnovi, Antonio Sgorbissa and Renato Zaccaria
Autonomous Robot Vision System for Environment Recognition Woong-Jae Won, Sang-Woo Ban and Minho Lee
Multi-Resolution Field D* Dave Ferguson and Anthony Stentz
Path and Observation Planning of Vision-Based Mobile Robots with Multiple Sensing Strategies Mitsuaki Kayawake, Atsushi Yamashita and Toru Kaneko
Mobile Robot Motion Planning Considering Path Ambiguity of Moving Obstacles Hiroshi Koyasu and Jun Miura
Robotic Navigation Using Harmonic Functions and Finite Elements Santiago Garrido and Luis Moreno
The Hippocampal Place Cells and Fingerprints of Places: Spatial Representation in Animals, Animats and Robots Adriana Tapus, Francesco Battaglia and Roland Siegwart
Incremental Reconstruction of Generalized Voronoi Diagrams on Grids Nidhi Kalra, Dave Ferguson and Anthony Stentz
Learning from Nature to Build Intelligent Autonomous Robots Rainer Bischoff and Volker Graefe
Part 2. Tracking Control & Active Vision
A Laser Based Multi-Target Tracking for Mobile Robot Masafumi Hashimoto, Satoshi Ogata, Fuminori Oba and Takeshi Murayama
Simultaneous Environment Mapping and Mobile Target Tracking Abedallatif Baba and Raja Chatila
Omnidirectional Active Vision for Evolutionary Car Driving Mototaka Suzuki, Jacob van der Blij and Dario Floreano
Part 3. Localization
Map of Color Histograms for Robot Navigation Takanobu Kawabe, Tamio Arai, Yusuke Maeda and Toshio Moriya
Designing a System for Map-Based Localization in Dynamic Environments Fulvio Mastrogiovanni, Antonio Sgorbissa and Renato Zaccaria
Appearance-Based Localization of Mobile Robots Using Local Integral Invariants Hashem Tamimi, Alaa Halawani, Hans Burkhardt and Andreas Zell
Enhancing Self Covertness in a Hostile Environment from Expected Observers at Unknown Locations Mohamed Marzouqi and Ray Jarvis
Part 4. Multi-Agent Robots
An Experimental Study of Distributed Robot Coordination Stefano Carpin and Enrico Pagello
Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot Kenneth Payne, Jacob Everist, Feili Hou and Wei-Min Shen
Mutual Localization and 3D Mapping by Cooperative Mobile Robots Julian Ryde and Huosheng Hu
Part 5. Network Agent Systems
Exploration of Complex Growth Mechanics of City Traffic Jam for the Adaptive Signal Control Kouhei Hamaoka and Mitsuo Wada
Coordinated Control of Mobile Antennas for Ad-Hoc Networks in Cluttered Environments Gianluca Antonelli, Filippo Arrichiello, Stefano Chiaverini and Roberto Setola
An Adaptive Behavior of Mobile Ad Hoc Network Agents Masao Kubo, Chau Dan, Hiroshi Sato and Takashi Matubara
Part 6. Evolution and Learning
Learning the Cooperative Behaviors of Seesaw Balancing Agents – An Actor-Critic Approach – Takashi Kawakami, Masahiro Kinoshita and Yukinori Kakazu
Evolutionary Reinforcement Learning for Simulated Locomotion of a Robot with a Two-Link Arm Yohannes Kassahun and Gerald Sommer
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot Viktor Zhumatiy, Faustino Gomez, Marcus Hutter and Jürgen Schmidhuber
Transition Entropy in Partially Observable Markov Decision Processes Francisco S. Melo and Isabel Ribeiro
Movement Control of Tensegrity Robot Masaru Fujii, Shinichiro Yoshii and Yukinori Kakazu
An Adaptive Neural Controller for a Tendon Driven Robotic Hand Gabriel Gómez, Alejandro Hernandez and Peter Eggenberger Hotz
Self-Organizing Route Guidance Systems Based on Coevolution of Multi-Layered Guidance Vector Fields Yasuhiro Ohashi and Kosuke Sekiyama
Object Transportation Using Two Humanoid Robots Based on Multi-Agent Path Planning Algorithm Shotaro Kamio and Hitoshi Iba
Cognitive Map Plasticity and Imitation Strategies to Extend the Performance of a MAS P. Laroque, E. Fournier, P.H. Phong and P. Gaussier
Examination of Abilities Based on Pseudolite System for Indoor Positioning Isamu Kitano and Keiji Suzuki
A Memory-Based PID Controller for Indoor Airship Robot Takamasa Sato and Keiji Suzuki
Co-Evolutionary Design for AGV Systems Ryosuke Chiba, Jun Ota and Tamio Arai
An Update Method of Computer Simulation for Evolutionary Robotics Yoshiaki Katada and Kazuhiro Ohkura
Vision-Based Teleoperation of a Mobile Robot with Visual Assistance Naoyuki Kubota, Daisuke Koudu, Shinichi Kamijima, Kazuhiko Taniguchi and Yasutsugu Nogawa
Part 7. Adaptation
Adaptive Control Strategy for Micro/Nano Manipulation Systems Hwee Choo Liaw, Denny Oetomo, Bijan Shirinzadeh and Gursel Alici
Smart Roadster Project: Setting up Drive-by-Wire or How to Remote-Control Your Car Joachim Schröder, Udo Müller and Rüdiger Dillmann
A Reactive Approach for Object Finding in Real World Environments Abdelbaki Bouguerra
A Geometric Approach for an Intuitive Perception System of Humanoids David Israel Gonzalez-Aguirre and Eduardo Jose Bayro-Corrochano
Autonomous Learning of a Topological Model in a Road Network Gabriel Aviña Cervantès and Michel Devy
Reinforcement Learning Performance Evaluation: An Evolutionary Approach Genci Capi and Masao Yokota
Quantify Distinguishability in Robotics Aurélien Hazan, Frédéric Davesne, Vincent Vigneron and Hichem Maaref
Group Transport Along a Robot Chain in a Self-Organised Robot Colony Shervin Nouyan, Roderich Groß, Marco Dorigo, Michael Bonani and Francesco Mondada
Growing Virtual Neural Tissue: Binding Spiking Neurons Through Sensory Input Pascal Kaufmann and Gabriel Gómez
Part 8. Emergent Synthesis
Hormone-Inspired Adaptive Distributed Synchronization of Reconfigurable Robots Feili Hou and Wei-Min Shen
Spatial Prisoner’s Dilemma in a Network Environment Introducing Heterogeneous Information Distribution Hiroshi Kuraoka, Nobutada Fujii and Kanji Ueda
Behavioral Decision for Multi-Agent Systems with Dynamic Interaction Yusuke Ikemoto and Toshio Fukuda
Co-Creative Composition Using Multiagent Learning: Toward the Emergence of Musical Structure Shintaro Suzuki, Takeshi Takenaka and Kanji Ueda
Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony Roderich Groß, Marco Dorigo and Masaki Yamakita
Lot Release Control Using Genetics Based Machine Learning in a Semiconductor Manufacturing System Ryohei Takasu, Nobutada Fujii, Kanji Ueda and Motohiro Kobayashi
Design of an AGV Transportation System by Considering Management Model in an ACT Satoshi Hoshino, Jun Ota, Akiko Shinozaki and Hideki Hashimoto
Analysis of Purchase Decision Making: Network Externalities and Asymmetric Information Yohei Kaneko, Nariaki Nishino, Sobei H. Oda and Kanji Ueda
Part 9. Dynamics, Morphology, and Materials in Intelligent Behavior
Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling Simon Bovet
Active Learning of Local Structures from Attentive and Multi-Resolution Vision Maxime Cottret and Michel Devy
Modular Design of Home Service Robot System with Hierarchical Colored Petri Net Guohui Tian, Feng Duan and Tamio Arai
Auctions for Task Allocation to Robots Maitreyi Nanjanath and Maria Gini
Exploration of Natural Dynamics Through Resonance and Chaos Alex Pitti, Max Lungarella and Yasuo Kuniyoshi
One-Legged Locomotion with a Compliant Passive Joint Juergen Rummel, Fumiya Iida and Andre Seyfarth
Analysis of Dynamical Locomotion of Two-Link Locomotors Kojiro Matsushita, Hiroshi Yokoi and Tamio Arai
Part 10. Mobiligence
A Modular Robot That Self-Assembles Akio Ishiguro, Hiroaki Matsuba, Tomoki Maegawa and Masahiro Shimizu
Autonomous Robust Execution of Complex Robotic Missions Paul Robertson, Robert Effinger and Brian Williams
Emergence of Small-World in Ad-Hoc Communication Network Among Individual Agents Daisuke Kurabayashi, Tomohiro Inoue, Akira Yajima and Tetsuro Funato
Parametric Path Planning for a Car-Like Robot Using CC Steers Weerakamhaeng Yossawee, Takashi Tsubouchi, Masamitsu Kurisu and Shigeru Sarata
Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol Chomchana Trevai, Jun Ota and Tamio Arai
Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System Using Communication Tomohisa Fujiki, Kuniaki Kawabata and Hajime Asama
From Mobility to Autopoiesis: Acquiring Environmental Information to Deliver Commands to the Effectors Antonio D’Angelo and Enrico Pagello
Part 11. RoboCup
Cooperative Agent Behavior Based on Special Interaction Nets Oliver Zweigle, Reinhard Lafrenz, Thorsten Buchheim, Uwe-Philipp Käppeler, Hamid Rajaie, Frank Schreiber and Paul Levi
A Framework for Advanced Robot Programming in the RoboCup Domain – Using Plug-In System and Scripting Language Hayato Kobayashi, Akira Ishino and Ayumi Shinohara
Major Behavior Definition of Football Agents Through XML José Luis Vega, Ma. de los Ángeles Junco and Jorge Ramírez
Getting Back on Two Feet: Reliable Standing-Up Routines for a Humanoid Robot Jörg Stückler, Johannes Schwenk and Sven Behnke
Ball Tracking with Velocity Based on Monte-Carlo Localization Jun Inoue, Akira Ishino and Ayumi Shinohara
Fast Vector Quantization for State-Action Map Compression Kazutaka Takeshita, Ryuichi Ueda and Tamio Arai
Incremental Purposive Behavior Acquisition Based on Modular Learning System Tomoki Nishi, Yasutake Takahashi and Minoru Asada
Part 12. Real World Information Systems
Simple Form Recognition Using Bayesian Programming Guy Ramel, Adriana Tapus, François Aspert and Roland Siegwart
Towards Robust State Estimation with Bayesian Networks: A New Perspective on Belief Propagation Jan Nunnink and Gregor Pavlin
HRP-2W: A Humanoid Platform for Research on Support Behavior in Daily Life Environments Tetsunari Inamura, Kei Okada, Masayuki Inaba and Hirochika Inoue
Human Supporting Production Cell “Attentive Workbench” Masao Sugi, Yusuke Tamura, Makoto Nikaido, Jun Ota, Tamio Arai, Kiyoshi Takamasu, Kiyoshi Kotani, Akio Yamamoto, Hiromasa Suzuki, Yoichi Sato, Fumihiko Kimura and Seiichi Shin
A Foveal 3D Laser Scanner Integrating Texture into Range Data Marcus Walther, Peter Steinhaus and Rüdiger Dillmann
Autonomous Collaborative Environment for Project Based Learning Mihoko Otake, Ryo Fukano, Shinji Sako, Masao Sugi, Kiyoshi Kotani, Junya Hayashi, Hiroshi Noguchi, Ryuichi Yoneda, Kenjiro Taura, Nobuyuki Otsu and Tomomasa Sato
Part 13. Humanoid Robots
Pedaling Motion of a Cycle by Musculo-Skeletal Humanoid with Adapting Ability Based on an Evaluation of the Muscle Loads Tomoaki Yoshikai, Yuto Nakanishi, Ikuo Mizuuchi and Masayuki Inaba
Behavior Transition Between Biped and Quadruped Walking by Using Bifurcation Kenji Asa, Kosei Ishimura and Mitsuo Wada
Tendon Arrangement Based on Joint Torque Requirements for a Reinforceable Musculo-Skeletal Humanoid Yuto Nakanishi, Ikuo Mizuuchi, Tomoaki Yoshikai, Tetsunari Inamura and Masayuki Inaba
Vision-Based Virtual Information and Semi-Autonomous Behaviours for a Humanoid Robot Olivier Stasse, Jean Semere, Neo Ee Sian, Takashi Yoshimi and Kazuhito Yokoi
Load Distributed Whole-Body Motion Generation Method for Humanoids by Minimizing Average Joint Torque Ratio Ryusuke Adachi, Shigeru Kanzaki, Kei Okada and Masayuki Inaba
Measurement and Simulation Verification of Reflexive Responses to Perturbation During Walking Shahed Sarwar, Wenwei Yu, Masaru Kumagai, Masaki Sekine, Tamotsu Katane, Toshiyo Tamura and Osami Saitou
Toward a Human-Like Biped Robot with Compliant Legs Fumiya Iida, Yohei Minekawa, Juergen Rummel and Andre Seyfarth
Part 14. Service Robotics and Human Support
Using JavaSpace for a PEIS Ecology Beom Su Seo, Mathias Broxvall, Marco Gritti, Alessandro Saffiotti and Jung Bae Kim
A New Heating Method for the Actuation of the Shape Memory Alloy (SMA) Actuator Chee Siong Loh, Kojiro Matsushita, Hiroshi Yokoi and Tamio Arai
Informational Organization on Network Among Multiple Agents Yoshihito Shikanai, Koichi Ozaki and Sumio Yamamoto
A Flexible Task Knowledge Representation for Service Robots Steffen Knoop, Sven R. Schmidt-Rohr and Rüdiger Dillmann
Intelligent Autonomous Japanese Comic “MANGA” Designing Support System Yuko Ishiwaka and Yuka Kobayasi
Part 15. Human Behavior Analysis
Behavior Induction by Geometric Relation Between Symbols of Multi-Sensory Pattern Naoki Kojo, Tetsunari Inamura and Masayuki Inaba
Effects of Robotic Arm Orthosis Behaviors to User’s Motion Structure. Qualitative Assessment Using Arm Trajectory Profiles Yukio Horiguchi, Satoshi Tsukamoto, Hiroyuki Ono, Tetsuo Sawaragi and Masahiro Sato
A Human Behavior Discrimination Method Based on Motion Trajectory Measurement for Indoor Guiding Services Hajime Asama, Atsushi Morimoto, Kuniaki Kawabata and Yasushi Hada
Part 16. Mutual Adaptation Among Man and Machines
Effects of Shared Communicational Modality to Joint Activity of Human Operator and Robot Autonomy Yukio Horiguchi and Tetsuo Sawaragi
Learning of Object Identification by Robots Commanded by Natural Language Chandimal Jayawardena, Keigo Watanabe and Kiyotaka Izumi
An f-MRI Study of an EMG Prosthetic Hand Biofeedback System Alejandro Hernández A., Hiroshi Yokoi, Takashi Ohnishi and Tamio Arai
Wearable Inertial Sensors for Arm Motion Tracking in Home-Based Rehabilitation Huiyu Zhou, Huosheng Hu and Nigel Harris
Learning and Control Model of Arm Posture K.S. Kim, H. Kambara, D. Shin, M. Sato and Y. Koike
Competitive Learning Method for Robust EMG-to-Motion Classifier Ryu Kato, Hiroshi Yokoi and Tamio Arai
Mutual Adaptation Among Man and Machine by Using f-MRI Analysis Hiroshi Yokoi, Alejandro Hernandez Arieta, Ryu Katoh, Takashi Ohnishi, Wenwei Yu and Tamio Arai
Part 17. Women in Robotics, Human Science and Technology
Behavior Generation of Humanoid Robots Depending on Mood Kazuko Itoh, Hiroyasu Miwa, Yuko Nukariya, Massimiliano Zecca, Hideaki Takanobu, Stefano Roccella, Maria Chiara Carrozza, Paolo Dario and Atsuo Takanishi
Construction of Human-Robot Cooperating System Based on Structure/Motion Model Fumi Seto, Yasuhisa Hirata and Kazuhiro Kosuge
Motion Assist Devices for Rehabilitation Using Parallel Wire Mechanisms Keiko Homma
Analysis of Skill Acquisition Process – A Case Study of Arm Reaching Task – Kahori Kita, Ryu Kato, Hiroshi Yokoi and Tamio Arai
Generation of Size-Variable Image Template for Self-Position Estimation Considering Position Shift Kae Doki, Naohiro Isetani, Akihiro Torii and Akiteru Ueda Subjective Age Estimation Using Facial Images – the Effects of Gender, Expressions and Age Groups – Noriko Nagata, Naoyuki Miyamoto, Yumi Jinnouchi and Seiji Inokuchi
Evaluation and Suppression of Overrun of Microorganisms Using Dynamics Model for Microrobotic Application Naoko Ogawa, Hiromasa Oku, Koichi Hashimoto and Masatoshi Ishikawa
From Muscle to Brain: Modelling and Control of Functional Materials and Living Systems Mihoko Otake
Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs Maria Gini, Jan Pearce and Karen Sutherland
Author Index
Papers of Invited Guest Speakers
Reaction-Diffusion Intelligent Wetware
Andrew Adamatzky
Unconventional Computing Centre, Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Bristol, United Kingdom
[email protected]
Abstract. We give an overview of recent results on the implementation of computing, actions, and emotions in spatially extended reaction-diffusion chemical systems [1-3]. We pinpoint the essential ingredients of intelligence found in the spatio-temporal dynamics of nonlinear chemical systems, and outline future designs and prototypes of intelligent chemical ‘gooware’.
Reaction-diffusion chemical systems are well known for their unique ability to solve combinatorial problems efficiently, with natural parallelism. In spatially distributed chemical processors, the data and the results of the computation are encoded as concentration profiles of the chemical species. The computation per se is performed via the spreading and interaction of wave fronts. Reaction-diffusion computers are parallel because the chemical medium’s micro-volumes update their states simultaneously, and molecules diffuse and react in parallel [1,3]. During the last decades a wide range of experimental prototypes of reaction-diffusion computing devices have been fabricated and applied to various problems of computer science, including image processing, pattern recognition, path planning, robot navigation, computational geometry, logical gates in spatially distributed chemical media, and arithmetical and memory units. These important results, scattered across many scientific fields, convince us that reaction-diffusion systems can do a lot. Are they capable enough to be intelligent? Yes, reaction-diffusion systems are smart: they show a state of readiness to respond, are able to cope with difficult situations, are capable of determining something by mathematical and logical methods, and are endowed with a capacity to reason.

Sensing and Reacting

Reaction-diffusion computers allow for massively parallel input of data. Equivalently, reaction-diffusion robots would need no dedicated sensors: each micro-volume of the medium, each site of the matrix gel, is sensitive to changes in one or another physical characteristic of the environment. Electric field, temperature and illumination are ‘sensed’ by reaction-diffusion devices, and these are the three principal parameters in controlling and programming reaction-diffusion robots. Thus, for example, the velocity of excitation waves in
Belousov-Zhabotinsky reactors is increased by a negative and decreased by a positive electric field; waves are split in two by a high-intensity electric field; wave-fronts can be (de)stabilised; spiral waves are generated by alternating fields; and the morphological diversity of generated patterns can be increased. Increasing temperature is known to drive the space-time dynamics of the Belousov-Zhabotinsky reactor from periodic to chaotic oscillations. Light is a key to sophisticated control of excitable (light-sensitive) chemical media: by applying light of varying intensity we can tune the medium’s excitability, the waves’ velocities, and pattern formation.

Coping with Difficult Tasks

Hard computational problems of geometry, image processing and optimisation on graphs are resource-efficiently solved in reaction-diffusion media due to the intrinsic natural parallelism of the problems [1,3]. Let us look at three examples. The Voronoi diagram is a subdivision of the plane induced by a planar set of data points. Each point of the data set is represented by a drop of a reagent. The reagent diffuses and produces a colour precipitate when reacting with the substrate. When two or more diffusive fronts of the ‘data’ chemical species meet, no precipitate is produced (due to concentration-dependent inhibition). Thus, the uncoloured domains of the computing medium represent the bisectors of the Voronoi diagram. A skeleton of a planar shape is computed in a similar manner. A contour of the shape is applied to the computing substrate as a disturbance in reagent concentrations. The contour concentration profile induces diffusive waves. A reagent diffusing from the data contour reacts with the substrate and a precipitate is formed. Precipitate is not produced at the sites of the diffusive waves’ collision. The uncoloured domains correspond to the skeleton of the data shape. To compute a collision-free shortest path in a space with obstacles, we can couple two reaction-diffusion media. Obstacles are represented by local disturbances of concentration profiles in one of the media. The disturbances induce circular waves travelling in the medium and approximating a scalar distance-to-obstacle field. This field is mapped onto the second medium, which calculates a tree of ‘many-sources-one-destination’ shortest paths by spreading wave-fronts [3].

Logical reasoning

Most experimental prototypes of reaction-diffusion processors known so far exploit the interaction of wave fronts in a geometrically constrained chemical medium, i.e. the computation is based on a stationary architecture of the medium’s inhomogeneities. Constrained by stationary wires and gates, such universal reaction-diffusion chemical processors offer little computational novelty and no ability for dynamical reconfiguration, because they simply imitate the architectures of conventional silicon computing devices. To appreciate in full the massive parallelism of thin-layer chemical media, and to free the chemical processors from the limitations of fixed computing architectures, we adopt an unconventional paradigm of architecture-less, or collision-based, computing. A non-linear medium processor can be either specialised or general-purpose (universal). A specialised processor is built to solve only one particular problem, possibly
with different data sets and variations in the interpretation of results. Specialised computing devices are quite handy when we deal with image processing, problems of mathematical morphology, or computation on graphs [1]. A device is called computation-universal if it computes any logical function. Therefore, to prove a medium’s universality we must represent quanta of information, routes of information transmission and logical gates, where information quanta are processes, in states of the given system. We can highlight two types of computational universality, generally named the architecture-based, or stationary, and the architecture-less, or collision-based, types of universal computation. An architecture-based, or stationary, computation implies that a logical circuit is embedded into the system in such a manner that all elements of the circuit are represented by the system’s stationary states. The architecture is static. If there is any kind of ‘artificial’ or ‘natural’ compartmentalisation, the medium is classified as an architecture-based computing device. Personal computers, living neural networks, cells, and networks of chemical reactors are typical examples of architecture-based computers.
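The front-spreading Voronoi computation described earlier has a simple discrete analogue that can be sketched in a few lines: grow a breadth-first "wave" from every data point on a grid, and leave uncoloured exactly those cells where fronts from different sources arrive together. The grid model and all names below are illustrative; this is a sketch of the principle, not a simulation of the chemistry:

```python
from collections import deque

def voronoi_bisectors(width, height, seeds):
    """Grid analogue of the chemical Voronoi computation: a front
    (breadth-first wave) spreads from every seed; cells reached by
    fronts from two different seeds at the same time stay
    'uncoloured' and form the bisectors."""
    owner = {}        # cell -> index of the seed whose front arrived first
    dist = {}         # cell -> arrival time of that front
    bisector = set()  # cells where fronts of different seeds meet
    queue = deque()
    for i, s in enumerate(seeds):
        owner[s], dist[s] = i, 0
        queue.append(s)
    while queue:
        x, y = queue.popleft()
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= n[0] < width and 0 <= n[1] < height):
                continue
            if n not in owner:
                owner[n] = owner[(x, y)]
                dist[n] = dist[(x, y)] + 1
                queue.append(n)
            elif owner[n] != owner[(x, y)] and dist[n] >= dist[(x, y)]:
                # two fronts meet: no 'precipitate' is formed here
                bisector.add(n)
    return bisector

# two seeds on a 9x5 grid: the bisector is the vertical midline x == 4
cells = voronoi_bisectors(9, 5, [(1, 2), (7, 2)])
```

The same arrival-time fields, grown from obstacle or destination cells instead of data points, underlie the distance-to-obstacle field and shortest-path tree described in the text.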
Figure 1. Implementation of a simple logical gate in a numerical model of a sub-excitable Belousov-Zhabotinsky medium. Boolean variables X and Y are represented by wave-fragments travelling west and east, respectively. In this particular example, both variables take the value TRUE. When the wave-fragments collide, they change their trajectories; the signals following the new trajectories represent the logical conjunction X AND Y. If one of the wave-fragments were absent, the other fragment would travel undisturbed, and its undisturbed trajectory would represent the operation NOT X AND Y or X AND NOT Y, respectively.
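Wave fronts in excitable media of the kind shown in Figure 1 are often caricatured by the Greenberg-Hastings cellular automaton. The sketch below is such a caricature with a single refractory state; it illustrates excitable wave propagation in general, not the particular numerical Belousov-Zhabotinsky model behind the figure:

```python
import numpy as np

RESTING, EXCITED, REFRACTORY = 0, 1, 2

def step(grid):
    """One synchronous update of a three-state excitable lattice
    (a minimal Greenberg-Hastings model): a resting cell fires when
    any of its four neighbours is excited; an excited cell becomes
    refractory; a refractory cell returns to rest."""
    excited = (grid == EXCITED)
    # does any 4-neighbour fire? (non-periodic boundary)
    neigh = np.zeros_like(grid, dtype=bool)
    neigh[1:, :] |= excited[:-1, :]
    neigh[:-1, :] |= excited[1:, :]
    neigh[:, 1:] |= excited[:, :-1]
    neigh[:, :-1] |= excited[:, 1:]
    new = np.full_like(grid, RESTING)
    new[(grid == RESTING) & neigh] = EXCITED
    new[excited] = REFRACTORY
    return new

# a single excited cell seeds an expanding wave front
g = np.zeros((7, 7), dtype=int)
g[3, 3] = EXCITED
for _ in range(2):
    g = step(g)
# after two steps the excited ring sits at Manhattan distance 2
```

The refractory tail behind the front is what makes the waves annihilate rather than pass through each other, which is exactly the behaviour the precipitate-free bisectors and the gate of Figure 1 exploit.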
A collision-based, or dynamical, computation employs mobile compact finite patterns (mobile self-localised excitations, or simply localisations) in an active non-linear medium (Fig. 1). The localisations travel in space and perform computation when they collide with each other. The essentials of collision-based computing are as follows. Information values (e.g. truth values of logical variables) are given by the absence or presence of localisations, or by other parameters of the localisations. There are no predetermined stationary wires: the trajectory of a travelling pattern is a momentary wire, and almost any part of the medium can be used as a wire. Localisations can collide anywhere within the medium; there are no fixed positions at which specific operations occur, nor location-specific gates with fixed operations. The localisations undergo transformations, form bound states, annihilate or fuse when they interact with other mobile patterns. The information values of localisations are transformed as a result of collisions, and thus a computation is implemented [1,3].
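In the spirit of the gate in Fig. 1, the logic of colliding wave-fragments can be caricatured by two point particles on a one-dimensional track: the presence of a fragment encodes True, a lone fragment runs through undisturbed, and a collision diverts both onto a third channel. The track length and collision rule below are illustrative assumptions; real wave-fragments interact through the medium's reaction-diffusion dynamics.

```python
def collision_gate(x, y, length=8):
    """Toy collision gate.  Fragment X starts at the west end moving east,
    fragment Y at the east end moving west (presence = True).  If they meet
    they deflect onto a 'collision' channel; a lone fragment exits on its own
    channel.  The three output channels thus realise x AND NOT y,
    y AND NOT x, and x AND y respectively."""
    px = 0 if x else None
    py = length if y else None
    for _ in range(length + 1):
        # bond check before motion: fragments about to meet are deflected
        if px is not None and py is not None and py - px <= 1:
            return (False, False, True)          # collision channel fires
        if px is not None:
            px += 1
        if py is not None:
            py -= 1
    return (x, y, False)   # undisturbed channels carry any lone fragment
```

Evaluating the gate over all four input combinations reproduces the truth tables quoted in the caption of Fig. 1.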
Chemical Emotions

Sensing, computing and reasoning are essential but not sufficient components of intelligence: emotions are the prime ingredient which turns the brew of computing into acts of apprehension, cooperation, creation and synergy. In reaction-diffusion systems, an affective mixture is a theoretical construct which represents emotions spreading and reacting with one another as a massive pool of locally interacting entities. Molecules of the affective mixture correspond to basic emotions (happiness, anger, fear, confusion and sadness) which diffuse and react with each other according to quasi-chemical laws (Fig. 2).
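As a toy illustration of such quasi-chemical laws, one can integrate a single invented mass-action reaction, happiness + anger -> 2 confusion, in a well-stirred reactor. The species, stoichiometry and rate constant here are assumptions for illustration only; they are not the reaction set used in [2].

```python
def evolve_emotions(h, a, c, k=0.1, dt=0.01, steps=1000):
    """Well-stirred 'affective solution' toy: the invented reaction
    happiness + anger -> 2 confusion, integrated with explicit Euler
    under mass-action kinetics (rate = k * [happiness] * [anger])."""
    for _ in range(steps):
        rate = k * h * a
        h -= rate * dt
        a -= rate * dt
        c += 2 * rate * dt      # two 'confusion' molecules per reaction event
    return h, a, c
```

Because the stoichiometry balances, the total amount of 'emotional material' is conserved while happiness and anger are gradually consumed into confusion.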
Figure 2. Space-time development of a one-dimensional chemical reactor mimicking the evolution of an emotional pool with happiness, anger and confusion [2].
In computational experiments with affective solutions, in well-stirred and thin-layer reactors, we uncover and classify varieties of behavioural modes and regimes determined by the particulars of emotion interactions. We demonstrate that the paradigm of affective solutions offers an unconventional but promising technique for designing, analysing and interpreting the integral and spatio-temporal dynamics of emotional developments in artificial chemical minds [2].

Locomotion and manipulation

We have demonstrated that reaction-diffusion chemical systems can solve complex problems, perform logical reasoning and express emotions. Now we want the chemical reactors to stop minding their own business and act: move and manipulate. Physico-chemical artefacts are known to be capable of sensible motion. Most famous are Belousov-Zhabotinsky vesicles, self-propulsive chemo-sensitive drops, and ciliary arrays (Yoshikawa Lab, Kyoto). Their motion is directional but somewhat lacks sophisticated control mechanisms.
In computational experiments we established that two reagents are enough to produce a sustained amoeba-like pattern (a compact domain of wave-like dynamics enclosed in an undulating membrane) in a reaction-diffusion system (Fig. 3). Such proto-cell-like patterns are sensitive to stimulation of their membranes, which can provoke their movement.
Figure 3. Reaction-diffusion proto-cell emerging in a cellular automaton model of a chemical system (binary cell-state set, eight-cell neighbourhood). Each cell of the automaton takes two states, 0 (white pixel) and 1 (black pixel). A cell in state 0 (1) takes state 1 (0) if it has 4 to 7 neighbours in state 1; otherwise the cell does not change its state.
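The update rule quoted in the caption of Fig. 3 can be stated directly as code. The sketch below assumes cells outside the array are in state 0; the boundary condition is not specified in the text.

```python
def ca_step(grid):
    """One synchronous update of the binary cellular automaton of Fig. 3:
    a cell flips its state (0 -> 1 or 1 -> 0) iff 4 to 7 of its eight
    neighbours are in state 1; otherwise it is unchanged.  Cells outside
    the array are treated as state 0 (an assumption)."""
    rows, cols = len(grid), len(grid[0])
    nxt = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            live = sum(grid[nr][nc]
                       for nr in range(r - 1, r + 2)
                       for nc in range(c - 1, c + 2)
                       if (nr, nc) != (r, c)
                       and 0 <= nr < rows and 0 <= nc < cols)
            if 4 <= live <= 7:
                nxt[r][c] = 1 - grid[r][c]
    return nxt
```

Iterating `ca_step` from a random blob of 1s reproduces the kind of membrane-bounded dynamics shown in the figure.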
At the present stage of reaction-diffusion intelligence research it seems difficult to provide effective solutions for experimental prototyping of combined sensing, decision-making and actuating. However, as a proof of concept we can always consider hybrid ‘wetware + hardware’ systems. For example, to fabricate a chemical controller for a robot, we place a reactor with Belousov-Zhabotinsky solution on board a wheeled robot (Fig. 4) and allow the robot to observe the excitation wave dynamics in the reactor. When the medium is stimulated at one point, target waves are formed. The robot infers the direction towards the source of stimulation from the topology of the wave-fronts [3].
Figure 4. The robot controlled by liquid-phase Belousov-Zhabotinsky medium [3].
A set of remarkable experiments was undertaken by Hiroshi Yokoi and Ben De Lacy Costello, who built an interface between a robotic hand and a Belousov-Zhabotinsky chemical reactor (Fig. 5).
Figure 5. Closed-loop system of a robotic hand interacting with a Belousov-Zhabotinsky medium [3,4]. Photograph of Prof. Yokoi and Dr. De Lacy Costello's experimental setup.
Excitation waves propagating in the reactor were sensed by photo-diodes, which triggered finger motion. Bending fingers touched the chemical medium with their glass nails filled with colloidal silver, which triggered circular waves in the medium [3]. Starting from any initial configuration, the chemical-robotic system always reaches a coherent activity mode, in which the fingers move in regular, somewhat melodic patterns, and a few generators of target waves govern the dynamics of excitation in the reactor [4].
Figure 6. Transporting a rectangular object in a Belousov-Zhabotinsky chemical medium [5].
Massively parallel smart manipulation is yet another hot topic of reaction-diffusion intelligence. This was demonstrated in our recent experiments on manipulating simple objects in a simulated massively parallel array of actuators controlled by experimental excitation dynamics in a Belousov-Zhabotinsky chemical medium [5]. We assumed that manipulated objects should be pushed by travelling excitation wave-fronts. We therefore encoded snapshots of gel-based Belousov-Zhabotinsky media and converted them into arrays of local force vectors, in such a manner that every pixel of the medium’s snapshot got its own force vector, whose orientation was determined by the gradient of the concentrations of chemical species in the neighbourhood of the pixel. We have shown that various types of excitation dynamics (target waves, spiral waves, wave-fragments) are capable of transporting and manipulating both small objects and larger, spatially extended planar shapes (Fig. 6).

Conclusion

In this sketchy paper we provided the basic paradigms and facts which will lead us to the design, fabrication and mass-production of shapeless and sluggish but highly intelligent creatures. These artificial reaction-diffusion wet-bots will act as computers, transporters, builders and, ultimately, parts of our body. The wet-bots will fill our life with care, understanding, gentleness and love (Fig. 7).
Figure 7. The slogan on the first poster of “The Blob” (1956) reflects our vision of future reaction-diffusion intelligence.
When can things go wrong?

Things could not possibly go wrong, unless we step with both feet into the trap of conventional architectures! The field of reaction-diffusion computing started twenty years ago [6] as a sub-field of physics and chemistry dealing with image-processing operations in uniform thin-layer excitable chemical media. The basic idea was to apply input data as a two-dimensional profile of heterogeneous illumination, allow excitation waves to spread and interact with each other, and then optically record the result of the computation. The first ever reaction-diffusion computers were already massively parallel, with parallel optical inputs and outputs. Later, computer scientists came to the field and started to exploit traditional techniques: wires were implemented by channels in which wave-pulses travel, and specifically shaped junctions acted as logical valves. In this manner most of the ‘famous’ chemical computing devices were implemented, including Boolean gates, coincidence detectors, memory units and more. The central idea of reaction-diffusion computation was, if not ruined, then forced into a cul-de-sac of non-classical computation. The breakthrough happened when paradigms and solutions from the fields of dynamical, collision-based computing and conservative logic were mapped onto the realm of spatially extended chemical systems. The computers became uniform and homogeneous again.
References

[1] Adamatzky A. Computing in Nonlinear Media and Automata Collectives (IoP Publishing, 2001).
[2] Adamatzky A. Dynamics of Crowd-Minds (World Scientific, 2005).
[3] Adamatzky A., De Lacy Costello B., Asai T. Reaction-Diffusion Computers (Elsevier, 2005).
[4] Yokoi H., Adamatzky A., De Lacy Costello B., Melhuish C. Excitable chemical medium controller for a robotic hand: closed-loop experiments. Int. J. Bifurcation and Chaos (2004).
[5] Adamatzky A., De Lacy Costello B., Skachek S., Melhuish C. Manipulating objects with chemical waves: open-loop case of experimental Belousov-Zhabotinsky medium. Phys. Lett. A (2006).
[6] Kuhnert L. A new photochemical memory device in a light-sensitive active medium. Nature 319 (1986) 393.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality Hod Lipson, Josh Bongard, Victor Zykov, Evan Malone Computational Synthesis Lab Cornell University, Ithaca NY 14853, USA
Abstract. This talk will outline challenges and opportunities in translating evolutionary learning of autonomous robotics from simulation to reality. It covers evolution and adaptation of both morphology and control, hybrid co-evolution of reality and simulation, handling noise and uncertainty, and morphological adaptation in hardware.
Keywords. Evolutionary robotics, co-evolutionary learning, estimation-exploration, rapid prototyping.
Introduction

The idea that machine-learning processes inspired by biological evolution can be used to design autonomous machines has its roots in the early days of evolutionary computation and has been implemented numerous times, starting with the seminal work of Sims [9]. Nevertheless, the transition of evolutionary robotics from simulation to reality has been met with many challenges, as is evident from the relatively few examples of successful implementations of these methods in physical reality. Though many robotic experiments are carried out in simulation, a robot must ultimately function in physical reality. Consider the problem of evolving controllers for a dynamical, legged robot, shown in Figure 1 [13]. The nine-legged machine is composed of two Stewart platforms placed back to back. The platforms are powered by twelve pneumatic linear actuators, with power coming from an onboard 4500 psi paintball canister. While most robotic systems use position-controlled actuators whose exact extension can be set, pneumatic actuators of the kind used here are force-controlled. Like biological muscle, the controller can specify the force and duration of the actuation, but not the position. It is therefore a challenging control problem. The controller architecture for this machine was an open-loop pattern generator that determines when to open and close pneumatic valves. The on-off pattern was evolved; candidate controllers were evaluated by trying them out on the robot in a cage, and measuring fitness using a camera that tracks the red ball on the foot of one of the legs of the machine (see inset in Figure 1b for a view from the camera). Snapshots from one of the best evolved gaits are
H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality
shown in Figure 1c. Nolfi and Floreano [8] describe many other interesting hardware experiments evolving controllers for wheeled robots.
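A search of the kind described above, over on/off valve patterns, can be sketched as a minimal genetic algorithm. In the actual experiments fitness came from camera tracking of the robot's foot; here the fitness function is a user-supplied stand-in, and the population size, pattern length and mutation scheme are illustrative assumptions.

```python
import random

def evolve_valve_patterns(fitness, pattern_len=24, pop=20, gens=30, seed=0):
    """Minimal GA over on/off valve patterns encoded as bit strings:
    truncation selection keeps the top half, and children are made by
    one-point crossover of two survivors plus a single bit-flip mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(pattern_len)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop // 2]          # elitist truncation
        children = []
        for _ in range(pop - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, pattern_len)
            child = a[:cut] + b[cut:]              # one-point crossover
            i = rng.randrange(pattern_len)
            child[i] = 1 - child[i]                # bit-flip mutation
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Toy stand-in fitness: reward patterns with many open-valve slots.
best = evolve_valve_patterns(sum)
```

With a real robot in the loop, `fitness` would instead run the pattern on the machine and return the tracked displacement.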
Figure 1: Evolving a controller for a physical dynamic legged machine: (a) the nine-legged machine is powered by twelve pneumatic linear actuators arranged in two Stewart platforms; the controller is an open-loop pattern generator that determines when to open and close pneumatic valves. (b) Candidate controllers are evaluated by trying them out on the robot in a cage, and measuring fitness using a camera that tracks the red foot (see inset). (c) Snapshots from one of the best evolved gaits. From [13].
While evolution successfully generated viable gaits in this case, applying evolutionary processes to physical machines is difficult for two reasons. First, even if we are only evolving controllers for a machine with a fixed morphology, each evaluation of a candidate controller involves trying it out in reality. This is a slow and costly process that also wears out the target system; performing thousands of evaluations is usually impractical. Second, if we are evolving morphology as well, how would these morphological changes take place in reality? Changes to the controller can be made simply by reprogramming, but changes to the morphology require more sophisticated processes. Nature has some interesting solutions to this problem, such as growing materials, or self-assembling and self-replicating basic building blocks like cells.
1. Evolving controllers for physical morphologies

One approach to evolving controllers for fixed morphologies is to make a simulator with such fidelity that whatever works in simulation will also work equally well in reality. This is possible only for some types of locomotion, such as quasi-static kinematics that can be accurately predicted [6][4]. Figure 2a shows some of the machines that evolved for quasi-static locomotion in simulation; these machines were “copied” from simulation into reality using rapid-prototyping technology (Figure 2b), where they functioned in a way similar to their simulation. Unfortunately, however, it is unlikely that a similarly predictive dynamic simulator could exist, given that machine dynamics are inherently chaotic and sensitive to initial conditions and to many small parameter variations. But even if such simulators existed, creating accurate models would be painstakingly difficult, or may be impossible because the target environment is unknown.
Figure 2: Evolving bodies and brains: (a) Three evolved robots, in simulation (b) the three robots reproduced in physical reality using rapid prototyping. From [6].
An alternative approach to “crossing the reality gap” is to use a crude simulator that captures the salient features of the search space. Techniques have been developed for creating such simulators and using noise to cover uncertainties, so that the evolved controllers do not exploit these uncertainties [5]. Yet another approach is to use plasticity in the controller: allow the robot to learn and adapt in reality. In nature, animals are born with mostly predetermined bodies and brains, but these have some ability to learn and make final adaptations to whatever actual conditions arise. A third approach is to co-evolve simulators so that they become increasingly predictive. Just as we use evolution to design a controller, we can use evolution to design the simulator so that it captures the important properties of the target environment. Assume we have a rough simulator of the target morphology, and we use it to evolve controllers in simulation. We then take the best controller and try it – once – on the target system. If successful, we are done; but if the controller did not produce the anticipated result (as is likely to happen, since the initial simulator was crude), then we have observed some unexpected sensory data. We then evolve a new set of simulators, whose fitness is their ability to reproduce the actually observed behavior when the original controller is tested on them. Simulators that correctly reproduce the observed data are more likely to be predictive in the future. We then take the best simulator and use it to evolve a new controller, and the cycle repeats: if the controller works in reality, we are done; if it does not work as expected, we now have more data with which to evolve better simulators, and so forth. The co-evolution of controllers and simulators is not necessarily computationally efficient, but it dramatically reduces the number of trials necessary on the target system.
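The alternating cycle just described can be written down as a generic skeleton. The function names and signatures below are assumptions for illustration; in practice each of the three callbacks hides an evolutionary algorithm or a physical trial.

```python
def estimation_exploration(target_trial, evolve_controller, evolve_simulator,
                           initial_simulator, max_passes=10):
    """Skeleton of the controller/simulator co-evolution cycle.
    evolve_controller(sim) -> best controller found in simulation;
    target_trial(ctrl) -> (success, observed_data) from ONE physical trial;
    evolve_simulator(history) -> a simulator whose fitness is reproducing
    all (controller, observed_data) pairs gathered so far."""
    simulator, history = initial_simulator, []
    for _ in range(max_passes):
        controller = evolve_controller(simulator)      # exploration phase
        success, observed = target_trial(controller)   # single costly trial
        if success:
            return controller, simulator
        history.append((controller, observed))
        simulator = evolve_simulator(history)          # estimation phase
    return None, simulator
```

The point of the structure is visible even in a trivial instantiation: the expensive `target_trial` is called once per pass, while all the search effort happens inside the two `evolve_*` callbacks.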
The co-evolutionary process consists of two phases: evolving the controller (or whatever we are trying to modify on the target system), which we call the exploration phase; and creating a simulator, or model, of the system, which we call the estimation phase. To illustrate the estimation-exploration process, consider a target robot with some unknown but critical morphological parameters, such as mass distribution and sensory lag times. Fifty independent runs of the algorithm were conducted against the target robot. Figure 3a shows the 50 series of 20 best simulator modifications output after each pass through the estimation phase. Figure 3a makes clear that in all 50 runs the algorithm was better able to infer the time lags of the eight sensors than the mass increases of the nine body parts. This is not surprising, in that the sensors themselves provide feedback about the robot. In other words, the algorithm automatically, and after only a few target trials, deduces the correct time lags of the target robot's sensors, but is less successful at indirectly inferring the masses of the body parts from the sensor data. Convergence toward the correct mass distribution can also be observed. But even with an approximate description of the robot's mass distribution, the simulator is improved enough to allow smooth transfer of controllers from simulation to the target robot. Using the default, approximate simulation, there is a complete failure of transferal: the target robot simply moves randomly and achieves no appreciable forward locomotion. It is interesting to note that the evolved simulators are not perfect; they capture well only those aspects of the world that are important for accomplishing the task. The exploration-estimation approach can be used for much more than transferring controllers to robots – it could be used by the robot itself to estimate its own structure. This
would be particularly useful if the robot undergoes some damage that changes its morphology in unexpected ways, or if some aspect of its environment changes. As each controller action is taken, the actual sensory data is compared to that predicted by the simulator, and new internal simulators are evolved to be more predictive. These new simulators are then used to try out new controllers, adapted to the new and unexpected circumstances. Figure 3b shows some results of applying this process to design controllers for a robot which undergoes various types of drastic morphological damage, such as losing a leg, motor or sensor, or combinations of these. In most cases the estimation-exploration process is able to reconstruct a new simulator that captures the actual damage using only 4-5 trials on the target robot, and then use the adapted simulator to evolve compensatory controllers that recover most of the original functionality. There are numerous applications of this identification and control process in other fields.
Figure 3: Co-evolving robots and simulators. (a) Convergence toward the physical characteristics of the target robot. Each pass through the estimation phase produces a set of mass changes for each of the nine body parts of the robot (top row) and a set of time lags for each of the eight sensors (bottom row). The open circles indicate the actual differences between the target robot and the starting default simulated robot [1]. (b) Three typical damage recoveries. a: The evolutionary progress of the four sequential runs of the exploration EA on the quadrupedal robot when it undergoes a failure of one of its touch sensors. The hypotheses generated by the three runs of the estimation EA (all of which are correct) are shown. The dots indicate the fitness of the best controller from each generation of the exploration EA. The triangle shows the fitness of the first evolved controller on the target robot (the behavior of the ‘physical’ robot with this controller is shown in b); the filled circle shows the fitness of the robot after the damage occurs (the behavior is shown in c); the squares indicate the fitness of the ‘physical’ robot for each of the three subsequent hardware trials (the behavior of the ‘physical’ robot during the third trial is shown in d). e-h: The recovery of the quadrupedal robot when it experiences unanticipated damage. i-l: The recovery of the hexapedal robot when it experiences severe, compound damage. The trajectories in b-d, f-h and j-l show the change in the
robot’s center of mass over time (the trajectories are displaced upwards for clarity) [2]. (c) The simulator progressively learns the entire robot morphology from scratch. Panels (a-g) are progressive intermediate self-inference stages; panel (h) is the true target system [3].
2. Making morphological changes in hardware

An evolutionary process may require a change of morphology, or the production of a new physical morphology altogether. One approach to generating new morphology is to use reconfigurable robots [12]. Reconfigurable robots are composed of many modules that can be connected, disconnected and rearranged in various topologies to create machines with variable body plans. Self-reconfigurable robots are able to rearrange their own morphology, and thus adapt in physical reality. Figure 4a shows one example of a self-reconfiguring robot composed of eight identical cubes [14]. Each cube can swivel around its (1,1,1) axis, and connect and disconnect to other cubes using electromagnets on its faces. Though this robot contains only 8 units, it is conceivable that future machines will be composed of hundreds or thousands of smaller modules, allowing much greater control and flexibility in morphological change.
Figure 4: Transferring morphological changes into reality (a) Reconfigurable molecube robots [14], (b) Rapid prototyping, (c) Future rapid prototyping systems will allow deposition of multiple integrated materials, such as elastomers, conductive wires, batteries and actuators, offering evolution a larger design space of integrated structures, actuators and sensors, not unlike biological tissue. From [7].
An alternative approach to varying morphology is to produce the entire robot morphology automatically. For example, the robots shown in Figure 2b were produced using rapid-prototyping equipment: 3D printers that deposit material layer by layer to gradually build up a solid object of arbitrary geometry, as shown in Figure 4b. This “printer”, when coupled to an evolutionary design process, can produce complex geometries that are difficult to produce any other way, and thus allows the evolutionary search much greater design flexibility. But even when using such automated fabrication equipment, we needed to manually insert the wires, logic, batteries and actuators. What if the printer could print these components too? Future rapid-prototyping systems may allow deposition of multiple integrated materials, such as elastomers, conductive wires, batteries and actuators, offering evolution an even larger design space of integrated structures, actuators and sensors, not unlike biological tissue. Figure 4c shows some of these printed components [7].
Figure 5: Macro-scale physical models of stochastic self-assembly. (a) Stochastic self-assembly and self-reconfiguration of 10-cm scale modules on an oscillating air table: top, units with electromagnets; bottom, units with swiveling permanent magnets [10]. (b) Three-dimensional stochastic self-assembly and reconfiguration of 10-cm cubes in oil [11].
Looking at biology, one would ultimately like to emulate ‘growing structures’: structures that can actively move material from one place to another, adapting to needs in situ. As we move to smaller and smaller scales, however, deterministically moving material becomes increasingly difficult. An interesting alternative is to exploit the random ‘Brownian’ motion of particles in the environment to assist in stochastic self-assembly. Figure 5 shows some macro-scale prototypes of such stochastically reconfiguring systems, in both 2D and 3D. Implementation of such systems at the micro scale, using many thousands of units, entails many physical as well as computational challenges, involving local actuation, sensing, and control.
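A toy model of this idea: modules perform a random walk on a toroidal grid and bond irreversibly when they come to rest next to a growing assembly seeded at the centre. The grid size, toroidal boundary, sequential module arrival and irreversible bonding are illustrative assumptions; the physical prototypes of Figure 5 use agitation and magnetic latching instead.

```python
import random

def stochastic_assembly(n_modules, size=15, seed=1):
    """Diffusion-limited toy of stochastic self-assembly: each free module
    enters at a random free site, performs a 'Brownian' random walk on a
    size x size torus, and bonds irreversibly as soon as a lattice
    neighbour belongs to the assembly (seeded at the centre)."""
    rng = random.Random(seed)
    assembly = {(size // 2, size // 2)}              # seed module
    for _ in range(n_modules):
        x, y = rng.randrange(size), rng.randrange(size)
        while (x, y) in assembly:                    # enter at a free site
            x, y = rng.randrange(size), rng.randrange(size)
        while True:
            neighbours = {((x + 1) % size, y), ((x - 1) % size, y),
                          (x, (y + 1) % size), (x, (y - 1) % size)}
            if neighbours & assembly:                # touching: bond here
                assembly.add((x, y))
                break
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            x, y = (x + dx) % size, (y + dy) % size  # Brownian step
    return assembly
```

Because bonding is checked before each step, modules never overlap the assembly, and every module ends up connected to the seed: the structure 'grows' without any module being steered deterministically.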
3. Conclusions

The transition of evolutionary robotics methods from simulation to reality has met several hurdles: besides the scaling limits of evolutionary computation itself, we are confronted with the limits of simulation and modeling, the cost, time and risk of training machines in reality, and the technical challenge of adapting morphology in reality. Again we have resorted to inspiration from biology: better ways to design adaptive simulations, better ways to determine the most useful physical evaluation, and new ways to adapt physical morphology through automatic reconfiguration and material growth, all leading to new ideas for engineering and new engineering insights into biology.
4. References

[1] Bongard J. C., Lipson H. (2004) “Once More Unto the Breach: Automated Tuning of Robot Simulation using an Inverse Evolutionary Algorithm”, Proceedings of the Ninth Int. Conference on Artificial Life (ALIFE IX), pp. 57-62.
[2] Bongard J., Lipson H. (2004) “Automated Damage Diagnosis and Recovery for Remote Robotics”, IEEE International Conference on Robotics and Automation (ICRA04), pp. 3545-3550.
[3] Bongard J., Lipson H. (2004) “Integrated Design, Deployment and Inference for Robot Ecologies”, Proceedings of Robosphere 2004, November 2004, NASA Ames Research Center, CA, USA.
[4] Hornby G. S., Lipson H., Pollack J. B. (2003) “Generative Encodings for the Automated Design of Modular Physical Robots”, IEEE Transactions on Robotics and Automation, Vol. 19, No. 4, pp. 703-719.
[5] Jakobi N. (1997) “Evolutionary robotics and the radical envelope of noise hypothesis”, Adaptive Behavior, 6(1):131-174.
[6] Lipson H., Pollack J. B. (2000) “Automatic design and manufacture of artificial lifeforms”, Nature, 406:974-978.
[7] Malone E., Lipson H. (2004) “Functional Freeform Fabrication for Physical Artificial Life”, Proceedings of the Ninth Int. Conference on Artificial Life (ALIFE IX), pp. 100-105.
[8] Nolfi S., Floreano D. (2004) Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines, Bradford Books.
[9] Sims K. (1994) “Evolving 3D morphology and behaviour by competition”, Artificial Life IV, pp. 28-39.
[10] White P. J., Kopanski K., Lipson H. (2004) “Stochastic Self-Reconfigurable Cellular Robotics”, IEEE International Conference on Robotics and Automation (ICRA04), pp. 2888-2893.
[11] White P., Zykov V., Bongard J., Lipson H. (2005) “Three Dimensional Stochastic Reconfiguration of Modular Robots”, Proceedings of Robotics Science and Systems, MIT, Cambridge MA, June 8-10, 2005.
[12] Yim M., Zhang Y., Duff D. (2002) “Modular Reconfigurable Robots: Machines that shift their shape to suit the task at hand”, IEEE Spectrum Magazine cover article, Feb. 2002.
[13] Zykov V., Bongard J., Lipson H. (2004) “Evolving Dynamic Gaits on a Physical Robot”, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'04), Late Breaking Paper.
[14] Zykov V., Mytilinaios E., Adams B., Lipson H. (2005) “Self-reproducing machines”, Nature, Vol. 435, No. 7038, pp. 163-164.
Real World Informatics Environment System Tomomasa SATO1 Department of Mechano-Informatics Graduate School of Information Science and Technology The University of Tokyo
Abstract. This paper proposes a real-world informatics environment system realized by multiple human behavior support cockpits (HBSCs). A human behavior support cockpit (HBSC) is synthesized from supportive environments such as an illumination environment (physiological support), an object-access environment (physical support) and a background-music environment (psychological support). These HBSCs are implemented through the cooperation of the real-world informatics environment system components: humanoid robots, audio/visual agents and ubiquitous appliances. In the paper, the author describes images of the real-world informatics environment system and presents research results on its constituent elements, divided into the following research groups: humanoid robot (HR), VR system (VR), attentive environment system (AE), neo-cybernetic system (NC), and human informatics (HI).
Keywords. Human-Robot Symbiosis System, Human-Robot Interaction, Human Support
1. Introduction
Advanced robots in the future should be able to offer the services of a servant, a secretary, a friend and a pet, simultaneously and continuously, to humans. To this end, a great deal of research effort has been devoted to humanoid robots, communication robots, mobile robots and manipulation robots. These efforts can be classified as the individual robot approach. The environmental robot is another research approach to this end. In an environmental robot system, multiple robotic elements are distributed over a space to offer services to humans. Thanks to the progress of micro devices and network technology, environmental robot systems have been widely studied, such as Smart Room [1], Intelligent Room [2], Easy Living [3], Self [4], Aware House [5], Neural Network House [6], and Intelligent Space [7]. While the individual robot and environmental robot approaches each attract many researchers independently, these two approaches are not exclusive but complementary to

1 Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656 Japan. Tel: +81 3 5841 6441; Fax: +81 3 5802 5379; Email:
[email protected].
T. Sato / Real World Informatics Environment System
each other; i.e. an integrated system of both individual robots and environmental robots will realize better robotic performance than either robot alone. Taking this consideration into account, researchers of the 21st Century COE (Center of Excellence) program of the University of Tokyo started a new real-world information system project in 2002, supported by the Ministry of Education, Culture, Sports, Science and Technology. The real-world information system project is intended to develop a real-world informatics environment system. The system is a human-symbiotic environment system in which intelligent entities such as agents in the information world, avatars in the virtual-reality world, and humanoid robots or ubiquitous appliances with sensors and actuators in the real world are seamlessly linked with each other, interacting with humans to offer support. The social significance of constructing such a real-world informatics environment system is that it demonstrates a prototype of the forthcoming ubiquitous society in a concrete manner. The scientific significance of the project is that it reveals a model of human recognition and behavior by analyzing the relationship between the real-world informatics system and the human inside it. In this paper, the author first makes clear the basic concept of the real-world informatics environment system and proposes to realize the system by combining multiple human behavior support cockpits (HBSCs). The paper then touches upon several new functions of the HBSCs from the viewpoint of a human behavior support environment by introducing the research groups of the project. Lastly, the paper reports newly realized interactions between a human and a system in terms of the human behavior support cockpit, taking account of related real-world informatics research.
Figure 1: Image of a real-world information system environment (the sketch depicts higher-order communication with a remote place, assistance in reading or operations, dialogue with an information agent, work assistance by the AWB, life support by a humanoid, and a show room as a future living room that is natural, always available, nonbinding, and coexisting with humans)
2. Project Target
2.1 Image of the Real World Informatics Environment
Figure 1 illustrates a conceptual sketch of the real world informatics environment system proposed by the members of the 21st Century COE project in 2002. In the environment, an information agent recognizes human daily movements continuously, a virtual reality (VR) system capable of natural dialogue interacts with the human, a humanoid robot handling complicated tasks comes close to the human to assist, and a ubiquitous appliance (a future information home electronics item) offers a helping hand to the human. This research target image of the real world informatics environment system was obtained as a result of discussion in the demonstration committee of the project. The sketch illustrates daily life in the future with multiple people, i.e. daily life in the future ubiquitous society. The major characteristics of such an environment system are that it is a natural environment and a comfortable place for a human to live; that various system elements work all the time, providing services while avoiding excessive physical and psychological constraints on the human as much as possible; and that it is a human-system symbiotic environment in which the system adjusts itself to the human.
2.2 Human Behavior Support Cockpit
In this section, the author proposes to realize the real world informatics environment system with multiple human behavior support cockpits (HBSCs), each of which supports a human in performing a specific behavior by providing the necessary and/or satisfactory environment. A human behavior support cockpit (HBSC) is synthesized by coordinating elements such as humanoid robots, information agents, VR systems, and ubiquitous appliances, depending on the human behavior. Let us explain the human behavior support cockpit with the concrete example of newspaper-reading behavior. When we want to read a newspaper, we must have a newspaper at hand, and we need illumination if the room is dark.
These are the environments we need in order to read a newspaper: an illuminated environment and a newspaper access environment. By synthesizing these two necessary environments, the newspaper-reading cockpit is realized. This is a typical example of a human behavior support cockpit synthesized from multiple human support environments, here illumination and object access. In the real world informatics environment system, this cockpit is synthesized by combining not only a newspaper access environment, realized by a humanoid robot bringing the newspaper near the human, but also a newspaper-readable illumination environment, realized by a robotic lamp that is a component of the room's ubiquitous appliances, as described later.
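The synthesis of a cockpit from elementary support environments can be sketched in code. Everything below (class names, providers, method names) is an illustrative assumption, not part of the project's actual software.

```python
# Illustrative sketch: a behavior support cockpit composed from
# elementary support environments. All names here are hypothetical.

class SupportEnvironment:
    """An elementary environment (e.g. illumination, object access)."""
    def __init__(self, name, provider):
        self.name = name          # e.g. "illumination"
        self.provider = provider  # e.g. "robotic lamp"

    def activate(self):
        return f"{self.provider} provides {self.name}"

class BehaviorSupportCockpit:
    """An HBSC: the set of environments needed for one human behavior."""
    def __init__(self, behavior, environments):
        self.behavior = behavior
        self.environments = environments

    def synthesize(self):
        # Activating every elementary environment realizes the cockpit.
        return [env.activate() for env in self.environments]

# The newspaper-reading cockpit combines two environments from the text.
reading_cockpit = BehaviorSupportCockpit(
    "reading a newspaper",
    [SupportEnvironment("illumination", "robotic lamp"),
     SupportEnvironment("object access", "humanoid robot")])

for action in reading_cockpit.synthesize():
    print(action)
```

The point of the sketch is only that a cockpit is a composition: adding a third support environment (say, background music) would not change the synthesis logic.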
2.3 Main Features of the Human Behavior Support Cockpit
2.3.1 Elementary Human Behavior Support Environments Realizing the Cockpit
The human behavior support cockpit (HBSC) is synthesized by combining the following four categories of human behavior support environments, chosen in consideration of human properties: a) a physiological support environment, b) a physical support environment, c) an informational support environment, and d) a psychological support environment. An illuminated space is an example of a physiological support environment. A physical support environment puts the human in a physically supported situation; for example, a humanoid robot can realize a newspaper-accessible physical environment by bringing the newspaper close to the human. An active projector easily realizes an informational support environment by projecting the information required to perform a particular behavior. Lastly, the robot system can realize a psychologically comfortable environment by providing background music (BGM) when the time is suitable for it. By combining these human behavior support environments, the human behavior support cockpit (HBSC) is synthesized; a specific HBSC is required for each individual human behavior.
2.3.2 Factors of the Human Behavior Support Cockpit
The most important factors of a specific human behavior support cockpit (HBSC) concern a) who, b) where, c) when, d) what, e) how, and f) why. When just one person is living in a room alone, whom to support is apparent; however, with multiple persons, the system has to decide to whom the cockpit gives support. The place or space where the cockpit should be generated, and when the support should be offered, must be carefully examined; otherwise the support becomes useless or troublesome. The content of the support can be classified as explained in the previous section, i.e.
a) physiological support, b) physical support, c) informational support, and d) psychological support. In many cases, multiple superposed supports are needed in our daily life. In addition, even if the content of the support is the same, the cockpit has to select the means or media of support carefully; for example, the system has to select between the voice channel and the visual channel for informational support. The system always has to pay keen attention to the human so that it can provide adequate support constantly, which means the system must be able to explain why a specific cockpit is offered to the human.
2.4 Realization of the Human Behavior Support Cockpit
The size of a human behavior support cockpit (HBSC) can be categorized as a) a device level, b) an appliance level, c) a furniture level, d) an individual robot level, or e) a room level, or a combination of these, depending on the size of the cockpit. In addition to the construction and sophistication of the individual components of the human behavior support cockpit, realized by devices, furniture, robots, and/or the room
itself, the combination of these components also has to be investigated for better performance of the support environment.
2.5 Research Groups of the Project
Before the final real world informatics environment system is realized, the following sub-environment systems have been realized so far: a) a humanoid and robot environment system, b) a VR environment system, c) a ubiquitous environment system, and d) an audio/visual agent environment system. To realize these sub-systems, the following five research groups were formed in the project:
- Human Robotics Research Group (HR),
- VR System Research Group (VR),
- Neo-Cybernetics Research Group (NC),
- Attentive Environment Research Group (AE),
- Human Informatics Research Group (HI).
The first four groups correspond to the sub-environments a) to d) above, and the last group, Human Informatics, was formed to provide the project with a basic scientific understanding of the human. So far, discussions on the real world informatics environment system and its constituent elemental environment systems have been deepened. Details of these images are presented in the following chapters along with individual research results.
3. Human Robotics Research Group (HR)
Within the real world information system project, which is intended to construct an intelligent environment interacting with humans in a new way, the Human Robotics Research Group (HR) conducts research focusing on innovative robots that physically assist a human: a humanoid capable of interacting with a human on equal terms in physical object handling and communication, a wearable ubiquitous supporting device that accompanies a human and provides necessary information, and a room-integrated environment robot that casually watches over a human and gives a helping hand. Through collaboration with the Human Informatics Research Group, the Neo-Cybernetics Research Group, and the VR System Research Group within the project, we attempt to implement an intelligent system environment
Figure 2: Image of human robotics environment
interacting with a human by way of the human behavior support cockpit. Figure 2 illustrates the image of a human robotics environment system. This image sketch shows a place where people get together, and a robot answers a human's call correctly (communication environment), attends a human while maintaining a comfortable distance (communication environment), and politely exchanges information (communication environment) and goods (object access environment) with a human. These functions provide assistance based on knowledge of a person's daily living behaviors. As perceptual functions of a cockpit, detecting the presence of a human, responding to human motion, and observing the direction of the sight line are basic and indispensable; however, some problems remain unsolved. To realize such a cockpit, this research group pursues research and development of an environment robot in which a sensing floor and a robotic lamp device (described later in Section 8) collaborate with each other, a mobile robot, and a humanoid robot.
4. VR System Research Group (VR)
This research group promotes research and development of augmented reality technology (real space augmented with information and images) and tele-existence technology (in which an operator controls a remote robot as if actually on board) to display a "cyber world essentially equivalent to the real world" within the daily space where a human actually lives, rather than in a special room. This provides the human behavior support cockpit with a powerful display environment. Figure 3 is an image sketch of the VR system environment. The upper right and lower left of the figure are remote spaces, referred to here as Space A (upper right) and Space B (lower left). Entire-circumference cameras are installed around each space. The cameras in Space B capture a 3-D entire-circumference image of a person, and the image is presented by an entire-circumference 3-D image generator in Space A, making a viewer in Space A feel as if the person in Space B existed in Space A (communication environment). Bidirectional tele-existence, so that the person in Space B also feels as if he or she were in Space A, is to be implemented. Moreover, a system for high-level integration of the real world and the information world, using a self-propelled projector and a projector capable of freely adjusting its projection direction, is under construction.

Figure 3: Image of a VR system environment
5. Neo-Cybernetics Research Group (NC)
The purpose of this research group is to provide machines with functions and abilities such as vision, hearing, and touch, targeting a wide range of research subjects from intelligent high-performance hearing, vision, and touch sensors to high-speed robots and voice-interaction personalized agents that behave like humans. Figure 4 shows an image sketch of the neo-cybernetics system environment. The scenario currently under consideration is a system guiding visitors through a museum demonstration site. A large self-propelled display with a voice-interaction personalized agent holds a dialogue with a user through speech recognition and synthesis (communication environment). A guest is led or followed by this equipment using acoustic processing, directional hearing, and high-speed vision functions, where the selection of information media becomes an important research issue (information display environment). The stated communication and information display environments are synthesized to realize a support cockpit for museum walk-around behavior.

Figure 4: Image of neo-cybernetics environment
6. Attentive Environment Research Group (AE)
The group mainly studies an environment, which we call the Attentive Environment (AE), that "offers a helping hand to an operator" as a new intelligent machinery system for assisting operations in cell production, under the theme of "a new interaction between a human and a machine" in a production environment. As a specific development objective, a system that flexibly handles major parts during assembly was designed, and an Attentive Workbench (AWB) system that integrates a motion tray (object access environment), an information display (information display environment), a vital sign monitor (physiological support environment), and parts tags (computer-readable information environment) has been under development.

Figure 5: Image of an attentive environment system

Figure 5 shows an image of an attentive environment system. This is an environment where a worker is doing assembly
work; that is, the figure illustrates the assembly behavior support cockpit in the configuration of a room. The system recognizes the intention of the worker from his motions and tries to "offer a helping hand" by presenting information or moving parts, which corresponds to the information environment and the physical object access environment. For example, the system may display necessary information on the desktop, bring necessary tools and parts to the operator's hand as he extends his arm, or transport assembled parts to the next process, providing an object access environment to the worker. Motion recognition is done without discomforting the operator. A visual system installed at the top of the system recognizes, in particular, the motions of the operator's arm and fingers and the direction of the sight line (head). Information presentation is done by projecting an icon, figure, or document on the desktop, using an augmented reality method to provide the information display environment. Furthermore, the operator's degree of fatigue is recognized by having the operator wear a monitor device for vital signs such as the heart rate, and the work speed and degree of assistance are varied accordingly to ensure a suitable physiological environment.
7. Human Informatics Research Group (HI)
The Human Informatics Research Group conducts research for modeling human perception, behavior, and recognition functions based on measurement and analysis, forming the human-scientific basis of this project; for establishing mathematical theories of real world information processing; and for constructing basic methods for new forms of recognition/behavior, communication, and coordination/assistance functions. This environment is called a showroom, intended not only to demonstrate interactive system technology truly compatible with humans but also to support analysis and modeling studies of humans (i.e., the mathematical science of real-world information systems). A measurement interface for robot experiments, interface construction, and international multipoint remote lectures for disseminating results and obtaining educational feedback will also be tried in the future.
8. Examples of Human Behavior Support Cockpits
In this chapter, examples of human behavior support cockpits (HBSCs) are presented. These cockpits are composed of illumination environments adapted to several human behaviors, namely entering the room, sitting on the chair, sitting on the cushion, being on the bed, and getting up from the bed, i.e. an entering-the-room cockpit, a sitting-on-the-chair cockpit, a sitting-on-the-cushion cockpit, a being-on-bed cockpit, and a getting-up-from-bed cockpit, respectively, realized by the cooperation of a sensing floor and a robotic lamp to generate the illumination support environment [8].
8.1 Sensing Floor
The sensing floor is a floor with distributed high-resolution pressure sensors, used to detect the human position and objects such as a chair and a cushion. The sensing floor is composed of 16 units (Fig. 6). Each unit has 4,096 binary on/off sensors distributed over a square 500 mm on a side, giving a sensor pitch of about 7 mm. The sampling frequency is 5 Hz. This sensing floor is able to detect the user a) moving on the bed, b) sitting on the chair, c) sitting on the cushion, and d) being on the bed.
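Position detection on such a binary pressure grid can be sketched as a centroid computation over the active sensors. Only the roughly 7 mm pitch and the 5 Hz sampling rate come from the text; the array shape and function names below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of position estimation on a binary pressure floor.
# The sensing floor reports on/off readings at ~7 mm pitch; the 64x64
# frame shape and the function below are assumptions for illustration.

PITCH_MM = 7.0  # approximate sensor pitch reported for the sensing floor

def estimate_position(frame):
    """Estimate the pressure centroid (in mm) of one binary frame.

    frame: 2-D array of 0/1 sensor readings (one 5 Hz sample).
    Returns (x_mm, y_mm) or None if no sensor is pressed.
    """
    rows, cols = np.nonzero(frame)
    if rows.size == 0:
        return None
    return (float(cols.mean() * PITCH_MM), float(rows.mean() * PITCH_MM))

# A toy frame with a small contact patch centred on sensor (10, 20).
frame = np.zeros((64, 64), dtype=int)
frame[9:12, 19:22] = 1
print(estimate_position(frame))  # -> (140.0, 70.0)
```

In the real system, distinguishing a person from a chair or cushion would additionally require the shape and history of the contact pattern, not just its centroid.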
Figure 7: Entrance Illumination
Figure 8: Desk Illumination
Figure 9: Table Illumination
Figure 10: Bed Illumination
Figure 6: Sensing Floor
8.2 Robotic Lamp
The robotic lamp is a 5-DOF desk light with the following functional capabilities: a) brightness control, b) posture control, and c) back-drivability to detect human physical contact.
8.3 Human Behavior Support Illumination Environment
The figures show how the robotic lamp illuminated the entrance as the user entered the room (Fig. 7) and illuminated the desk as the user sat on the chair (Fig. 8). The robotic lamp illuminated the table as the user sat on the cushion near the table (Fig. 9) and illuminated the bed after the user moved onto the bed (Fig. 10), corresponding to the entering-the-room behavior support cockpit, the sitting-on-the-chair behavior support cockpit,
the sitting-on-the-cushion behavior support cockpit, the being-on-bed behavior support cockpit, and the getting-up-from-bed behavior support cockpit, respectively. These figures show that the robotic lamp system is able to provide an adequate illumination environment depending on the user's behavior.
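The behavior-to-illumination mapping realized by the sensing floor and the robotic lamp can be sketched as a simple lookup; the dictionary and function below are an illustrative reconstruction from the figures, not the system's actual software.

```python
# Illustrative sketch of the behavior-to-illumination mapping described
# above (Figs. 7-10); the names here are hypothetical.

ILLUMINATION_TARGET = {
    "entering the room": "entrance",     # Fig. 7
    "sitting on the chair": "desk",      # Fig. 8
    "sitting on the cushion": "table",   # Fig. 9
    "being on bed": "bed",               # Fig. 10
    "getting up from bed": "bed",
}

def illumination_for(behavior):
    """Return where the robotic lamp should point for a detected behavior."""
    return ILLUMINATION_TARGET.get(behavior)

print(illumination_for("sitting on the chair"))  # -> desk
```

Each entry of the table corresponds to one behavior support cockpit: the sensing floor supplies the detected behavior, and the lamp's posture control realizes the target illumination.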
9. Conclusion
With the realization of a new human-system interaction in mind, we have been promoting a project to realize a real world informatics environment system implemented by multiple human behavior support cockpits. This paper described the research objectives of the real world information system project of the University of Tokyo and proposed the human behavior support cockpit, realized by synthesizing behavior support environments: a physiological support environment (e.g. an illumination environment), a physical support environment (e.g. an object access environment), and a psychological support environment (in some cases realized by a communication environment). The project is composed of the Human Robotics Research Group (HR), the VR System Research Group (VR), the Neo-Cybernetics Research Group (NC), the Attentive Environment Research Group (AE), and the Human Informatics Research Group (HI). The research target of each group and examples of human behavior support cockpits were also explained. Our future tasks are the promotion and integration of research on the sophistication of the many system elements, including the humanoid, VR system, information agent, and ubiquitous appliance, and the pursuit of the relation between those information systems and the human, to realize human behavior support environments. By synthesizing these environments, the proposed human behavior support cockpit will be realized as a new function of the real world informatics environment system.
Acknowledgement
The author would like to express great thanks to Professor Shigeki Sagayama, Professor Masayuki Inaba, Professor Yasuo Kuniyoshi, Professor Susumu Tachi, Professor Hiromasa Suzuki, and Professor Kiyoshi Takamasu, the leaders of the Neo-Cybernetics, Human Robotics, Human Informatics, VR System, and Attentive Environment research groups of this project. Great thanks also to Dr. Hiroshi Morishita, Prof. Mihoko Otake, Dr. Shinji Sako, Dr. Kiyoshi Kotani, and Dr. Masao Sugi for their keen discussions as specially appointed researchers, and to Mr. Yosuke Kita for the nice conceptual sketches of the environments. This project is part of the 21st Century COE programs supported by the Ministry of Education, Culture, Sports, Science and Technology.
References
[1] Alex Pentland, "Smart Rooms," Scientific American, pp. 54-62, April 1996.
[2] Mark C. Torrance, "Advances in human computer interaction: The intelligent room," Working Notes of the CHI 95 Research Symposium, Denver, Colorado, May 6-7, 1995.
[3] Barry Brumitt et al., "EasyLiving: Technologies for Intelligent Environments," Proceedings of the International Symposium on Handheld and Ubiquitous Computing, 2000.
[4] Y. Nishida, T. Hori, T. Suehiro, and S. Hirai, "Sensorized environment for self-communication based on observation of daily human behavior," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1364-1372, 2000.
[5] I. A. Essa, "Ubiquitous sensing for smart and aware environments: technologies towards the building of an aware home," Position Paper for the DARPA/NSF/NIST Workshop on Smart Environments, July 1999.
[6] Michael C. Mozer, "The neural network house: An environment that adapts to its inhabitants," Proceedings of the AAAI, 1998.
[7] Joo-Ho Lee, Noriaki Ando, and Hideki Hashimoto, "Design Policy for Intelligent Space," IEEE Systems, Man, and Cybernetics Conference, 1999.
[8] Tomomasa Sato, Mehrab Hosseinbor, Takeru Kuroiwa, Rui Fukui, Jun Tamura, and Taketoshi Mori, "Behavior and Personal Adaptation of Robotic Lamp," Journal of Robotics and Mechatronics, Vol. 17, No. 1, pp. 69-76, 2005.
[*] http://www.u-tokyo.ac.jp/coe/list06_e.html
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Understanding and realization of constrained motion – human motion analysis and robotic learning approaches

Shigeyuki Hosoe a,b, Yuichi Kobayashi a and Mikhail Svinin a
a RIKEN Bio-mimetic Control Research Center
b Nagoya University, Nagoya, Japan
[email protected], {koba,svinin}@bmc.riken.jp
Abstract. This paper presents two approaches toward the understanding and realization of constrained motion of the upper limbs. The first approach deals with the analysis of constrained human movements within the framework of optimal control. It is shown that a combination of optimality criteria constructed from the muscle force change and the hand contact force change can explain the formation of constrained reaching movements. The second approach comes from robotics and is illustrated by the application of reinforcement learning to a robotic manipulation problem. Here, an assumption on the existence of holonomic constraints can accelerate learning by introducing function approximation techniques within model-based reinforcement learning. The idea of estimating (parameterizing) the constraints relates the two different problems under the common perspective of "learning constrained motion."
Keywords. Constrained motion, Human motion analysis, Optimal control, Reinforcement learning
1. Introduction
Historically, robotics research has grown from industrial applications. Recently, however, it has attracted more and more attention because robots can be regarded as "bio-mimetic machines," where the principles of robot control are expected to help in understanding the principles of human motion control. That is, we may come to understand the function of the human brain by constructing machines that mimic the human body and by testing those machines with biologically inspired control principles. It has been pointed out that the interaction between the human body and the environment plays an important role in the understanding of human intelligence [1]. This perspective was emphasized in contrast to symbol-based information processing research, which mainly considered the functions of the brain independently of the motion of the human body. Examples of interaction with the external environment can be found in locomotion and manipulation tasks. In locomotion tasks one deals with the control of the lower limbs, while in manipulation tasks we are interested in the control of the upper limbs. Typically,
S. Hosoe et al. / Understanding and Realization of Constrained Motion
the external environment in locomotion tasks is static. In comparison with locomotion problems, the control of upper limb motions may have different properties. For example, as can be seen in many manipulation problems, the interaction between the upper limbs and the environment can change the latter. In general, this property of "changing the world" is not seen in locomotion tasks, and it can require the agent to have qualitatively different abilities for adapting to the environment. Additionally, the interaction between the upper limbs and the environment can be physically constrained. This property of being constrained by the external environment calls for further analysis of the basic mechanisms for the generation of constrained movements, mechanisms that can be exploited by the human control system to create adaptation schemes with richer computational structures. In this paper, we present two approaches toward the understanding and realization of constrained motion of the upper limbs. First, we show an analysis of upper limb motion physically constrained by the environment, based on the idea that human motion can be explained using optimization principles. Secondly, we show the realization of constrained motions by a manipulator using reinforcement learning, introducing assumptions on the learning problem so that the learning performance is improved in comparison with conventional reinforcement learning methods that do not use such specific assumptions. Fig. 1 shows the relation between the two approaches described in this paper. Both approaches have constrained motion as their common control problem. The two arrows in the figure denote the connection between the approaches. The left arrow indicates that research on human motion analysis aims at the development of robot technology as well as the understanding of the human itself.
The right arrow indicates the expected contribution to the understanding of human motion made by developing robot control methods, while borrowing ideas from human motion for the construction of robot control strategies.

Fig. 1: Two approaches for the understanding and realization of constrained motion (the figure links the understanding of human motion, via analysis of human motion, and the development of engineering technology, via construction of robot control, through the common problem of constrained motion)
2. Analysis of human constrained motion
It is known that unconstrained reaching movements of the human arm have invariant features: a straight (or slightly curved) hand path in the task space and a bell-shaped hand velocity profile [3]. One approach to account for the invariant
features of reaching movements is via optimization theory. The cost function for the optimal control problem can be generally described as
J = ∫₀ᵀ L(q, q̇, q̈, …) dt,          (1)
where q denotes the vector of the joint angles. The optimal control and the optimal trajectory for the upper limb system can be derived by minimizing this cost function. For example, Flash and Hogan proposed a minimum hand jerk criterion [3]. Other criteria, such as a minimum joint torque change criterion [4] and a minimum muscle tension change criterion, were also tested for the prediction of reaching movements. Roughly, the criteria of optimality can be classified depending on the planning space (task or joint) and the criterion domain (motion or force) as indicated in Fig.2.
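For point-to-point movements, the minimum hand jerk criterion of Flash and Hogan [3] admits a well-known closed-form solution: a fifth-order polynomial in time whose velocity profile is bell-shaped. The sketch below illustrates this with an assumed 0.3 m reach over 1 s (these numbers are illustrative, not from the paper's experiments).

```python
import numpy as np

# The minimum hand jerk criterion yields, for a straight point-to-point
# movement, the quintic x(t) = x0 + (xf - x0)(10 tau^3 - 15 tau^4 + 6 tau^5)
# with tau = t/T. Its velocity profile is the classic bell shape.

def min_jerk(x0, xf, T, t):
    """Minimum-jerk hand position at time t for a movement of duration T."""
    tau = t / T
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

T = 1.0                                # movement duration (s), assumed
t = np.linspace(0.0, T, 201)
x = min_jerk(0.0, 0.3, T, t)           # 0.3 m reach, assumed
v = np.gradient(x, t)                  # numerical velocity

# Velocity is (near) zero at both endpoints and peaks at the midpoint,
# reproducing the bell-shaped profile mentioned in the text.
print(round(t[np.argmax(v)], 2))       # -> 0.5
```

The same closed form is the baseline against which the torque-change, muscle-tension-change, and force-change criteria discussed here are compared.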
Fig. 2 : Classification of optimization criteria (cost function)
Fig. 3 : Experimental setup for human motion measurement
The above criteria of optimality were tested mainly for reaching movements in free space. However, humans also make arm movements while interacting with the environment (opening a door, turning a steering wheel, rotating a coffee mill, and so on). To manipulate objects or use tools, the central nervous system (CNS) has to take into account the constraints imposed by the external environment when planning hand trajectories. It should also be noted that constrained movements constitute a class of force-interactive tasks with redundancy in the distribution of force actuation across the muscles. Understanding how humans deal with external constraints and solve the redundancy problem is important not only from the point of view of computational neuroscience but also for the construction of intelligent control systems for robot manipulators interacting with the external environment. Considering optimal control problems for the prediction of reaching movements, we noticed that criteria defined in the force domain of the task space have not been considered in the literature. Next, by analyzing human motion profiles in a crank rotation task (see Fig. 3), we found that the conventional criteria (the minimum joint torque change and minimum muscle force change criteria) do not predict the experimental data well in the motion domain (the tangential direction of movement), but they can capture the tendency of the internal force profiles. To take the force-interactive nature of the crank rotation task into consideration, we proposed a minimum hand force change criterion, which is essentially a dynamic version of the minimum hand jerk criterion. This criterion gives
much better prediction of the velocity profiles. However, the level of the normal forces given by this criterion is much lower than that observed in the experimental data. Consequently, we proposed a combined form of the optimality criterion, which has two components, the hand force change and the muscle force change, balanced by a weight factor. Our simulation results show that in comfortable point-to-point movements the combined criterion predicts the experimental data better. In addition to the motion and force trajectories in the task space, the combined criterion formulated at the muscle level can also capture the basic patterns of muscle activity (see Fig. 4). Even though the mathematical model we employed is relatively simple and does not take into account the dynamics at the metabolic level, the predictions are quite satisfactory. This suggests that the smoothness of both the hand and the actuating forces is of primary importance for explaining human movements constrained by the external environment.

Fig. 4: EMG signals (left) and muscle forces predicted by the proposed criterion (right)
It should be noted that the combined criterion introduced in this research takes into account the geometry of the external constraints through the force change term, which depends on the orientation of the moving contact frame. Knowledge of the normal and tangential directions of the constraint is essential in the calculation of the combined criterion. It is therefore assumed that, after enough learning and adaptation, the CNS implicitly constructs a parameterization of the constraint manifold. This may raise new and interesting questions about the nature of the biological motor control of human movements.
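For the crank task, the moving contact frame is simply the tangent and outward normal of a circle, and the hand force can be decomposed accordingly. The sketch below is illustrative only; the function names and force values are hypothetical.

```python
import numpy as np

# Illustrative sketch: the moving contact frame of a circular (crank)
# constraint, and the decomposition of a planar hand force into the
# tangential (motion) and normal (internal force) directions.

def contact_frame(theta):
    """Unit tangent and outward normal of a circle at crank angle theta."""
    tangent = np.array([-np.sin(theta), np.cos(theta)])
    normal = np.array([np.cos(theta), np.sin(theta)])
    return tangent, normal

def decompose(force, theta):
    """Split a planar hand force into (tangential, normal) components."""
    tangent, normal = contact_frame(theta)
    return float(force @ tangent), float(force @ normal)

# At theta = 0 the tangent points along +y and the normal along +x,
# so a hand force of (2, 5) N splits into 5 N tangential, 2 N normal.
f_tan, f_nor = decompose(np.array([2.0, 5.0]), 0.0)
print(f_tan, f_nor)  # -> 5.0 2.0
```

The tangential component drives the crank's motion, while the normal component is the internal (contact) force whose level the combined criterion is designed to capture.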
3. Reinforcement learning for robotic constrained motion
Reinforcement learning (RL) was originally related to the psychology of animal learning. Later it was connected to the optimal control framework, and it has recently been applied to robot control problems (e.g. [10]). RL is characterized by the fact that we do not need perfect information about the evaluation and the dynamics of the system [1]. The general objective of reinforcement learning is to maximize the discounted sum of rewards. One of the key notions in RL is Temporal-Difference (TD) learning, where the following TD error δ_t is used to update the value function V:
34
S. Hosoe et al. / Understanding and Realization of Constrained Motion
$\delta_t = r_t + \gamma V(x_{t+1}) - V(x_t)$,  (2)
Here, $x_t$ and $r_t$ denote the state variable and the reward at time $t$, respectively. One advantage of TD-learning is that it does not require explicit representations of the state transition function and the reward function, which results in less computation. Another advantage is that TD-learning does not require any specific assumption except that the state transitions form a Markov decision process [1]. This generality means wide applicability of the learning method in principle, though in practice it is difficult to apply TD-learning to control problems because of its slow convergence. On the other hand, there is a possibility to improve the efficiency of learning. This can be done by explicitly approximating the state transition function and the reward function, and by utilizing them for learning, even at a higher computational cost. In the discrete state and action domains, such explicit approximation methods are known as Dyna [7] or real-time dynamic programming [8]. Here, we extend the approximation to the continuous state domain by introducing a suitable function approximation scheme; we call such an approach model-based reinforcement learning.
In this paper, we focus on a manipulation problem where the actual motion is restricted to a sub-manifold of the manifold of the total system, as indicated in Fig. 5. In other words, we deal with a problem that includes holonomic constraints. The actual motion of the manipulator is restricted to the motion of the object as long as the contacts between the manipulator and the object are maintained (see Fig. 6). If we can obtain an appropriate expression for such a low-dimensional motion, learning will be more effective. In the control of the constrained motion, there are two objectives: 1) to reach the desired configuration, and 2) not to break the constraints (not to let the fingertip slip on or part from the object).
The second objective can be transformed into a discontinuous reward (see Fig. 7), which is given when the manipulator fails to keep the constraint. Thus, we introduce two assumptions on the control problem: the holonomic constraint and the discontinuous reward.
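As a concrete illustration of the TD update in Equation 2, the following tabular TD(0) sketch estimates a value function from sampled transitions only. This is a generic textbook illustration, not the authors' implementation: the chain environment, learning rate, and episode count are invented for the example.

```python
import random

def td0_value_estimation(states, transition, reward, episodes=2000,
                         alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0): update V from sampled transitions only,
    without an explicit model of the dynamics or the reward."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in states}
    for _ in range(episodes):
        s = rng.choice(states[:-1])          # start anywhere but the terminal state
        while s != states[-1]:
            s_next = transition(s, rng)
            r = reward(s, s_next)
            delta = r + gamma * V[s_next] - V[s]   # TD-error (cf. Eq. 2)
            V[s] += alpha * delta
            s = s_next
    return V

# Toy 1-D chain: states 0..4, always move right, reward 1 on reaching state 4,
# so the values decay geometrically with gamma away from the goal.
states = list(range(5))
V = td0_value_estimation(states,
                         transition=lambda s, rng: s + 1,
                         reward=lambda s, s2: 1.0 if s2 == 4 else 0.0)
```

Note that no transition or reward model is stored; this is exactly the property the text contrasts with the model-based approach, which trades extra computation for faster convergence.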
Fig. 5: Sub-manifold generated by constraint (original state space x1, x2, x3 with points p1, p2; constrained motion space)
Fig. 6: Constrained motion of manipulation
Based on these assumptions, we propose a learning scheme which is specified by the following three features: a) construction of a mapping from the N-dimensional state space to a smaller-dimensional parameter space; b) model-based reinforcement learning taking into account the system dynamics and using an approximation of the reward function; c) approximation of discontinuous reward functions. Fig. 8 shows a specific example of the manipulation task used in our study. Here, a two-link manipulator rotates an object to the desired angle without slipping on or parting from the object.
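Feature c), the approximation of a discontinuous reward, can be illustrated with a smooth surrogate. The paper does not specify the approximator it uses; the sigmoid boundary below is a hypothetical stand-in showing how a step-shaped penalty at a constraint boundary can be made differentiable, which is what generic function approximators need.

```python
import math

def step_reward(q, boundary=1.0, penalty=-1.0):
    """True reward: a discontinuous penalty once the constraint is broken."""
    return penalty if q > boundary else 0.0

def smooth_reward(q, boundary=1.0, penalty=-1.0, sharpness=50.0):
    """Sigmoid approximation of the step: differentiable everywhere,
    approaching the true discontinuous reward as `sharpness` grows."""
    return penalty / (1.0 + math.exp(-sharpness * (q - boundary)))
```

The boundary location (here `boundary=1.0`) plays the role of the estimated constraint boundary; far from it the surrogate matches the true reward, while near it the transition is smooth.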
Fig. 7: Discrete reward function with continuous boundary
Fig. 8: Rotational task of a one-dof object (two-link arm with joint angles q1, q2, joint torques τ1, τ2, contact force components Fcx, Fcy, contact point (xr, yr), object angle θ, radius R)
The motion of the manipulator is restricted to a one-dimensional space while the constraint is maintained, as indicated in Fig. 9, which shows the constraint curve in the joint space obtained during learning. Fig. 10 shows the reward profile of the proposed learning method.
Fig. 9 : Obtained curve of constraint in joint angle space
Fig. 10 : Reward profile of proposed learning method
First, we compared the developed model-based learning method (using the proposed mapping) with the conventional Q-learning method and found that it was
impossible to obtain a control policy achieving the goal with Q-learning. This implies that the explicit approximation of the reward plays an important role in learning. Secondly, we applied model-based learning without the proposed mapping. This time, the learning method could achieve the goal, but the average performance of the obtained policies was much worse. Thus, the learning performance could be improved by making the state representation more compact.
4. Discussion
In the analysis of constrained human motion, we showed that the combination of the hand force change criterion and the muscle force change criterion explained the experimental results well. The combined criterion essentially requires knowledge of the geometrical information of the constraint. This result implies that humans might construct a parameterization of the constraint, which can be obtained after some trial movements. This idea of parameterizing the constraint manifold was then considered within the reinforcement learning framework. In the application of model-based reinforcement learning to a manipulation task, we confirmed that the parameterization of the constraint improved the learning performance. The idea of estimating the geometrical information of the constraint was also applied to the estimation of a discontinuous reward that specifies the boundary of the region for a discrete reward. This also greatly helped to improve the learning performance.
From the perspective of robot control, in addition to the parameterization of the constrained manifold, there can be other kinds of knowledge that can accelerate reinforcement learning. This issue can be addressed by constructing reinforcement learning methods that effectively utilize additional knowledge of the manipulation problem, and by measuring human motion under different conditions to clarify whether humans really utilize such knowledge. In the human motion analysis, we dealt with human motions that had been obtained through a sufficient amount of training. It is also very important to focus on the learning of such motions. The relation between RL and brain function has been discussed in the context of reward estimation and TD-error [12, 13]. On the other hand, some correspondences between the activities of Purkinje cells in the cerebellum and the learning framework based on motion primitives have been pointed out [14].
It is natural that human motion skills are obtained not by a single RL algorithm but by a combination of several kinds of learning frameworks, and that this combination can be characterized by an effective usage of acquired (and somehow abstracted) knowledge. We will focus on this aspect of the acquisition of human motion skills in future studies.
5. Conclusion
In this paper, we presented two approaches for the understanding of human constrained motion and the realization of robotic learning. In the analysis of human constrained motion, we proposed a combined criterion of the hand force change criterion and muscle
force change criterion. The criterion predicted the experimental results well. The idea that humans might construct a parameterization of the constraint was then applied to robotic motion learning. We proposed to introduce knowledge of the constraint into the reinforcement learning framework by model-based learning with function approximation. The simulation results showed that the proposed method improved the learning performance. We expect that further development of formal learning methods can be achieved by mimicking the learning techniques witnessed in human movements.
References
[1] R.S. Sutton and A.G. Barto, Reinforcement Learning, MIT Press, 1998.
[2] R. Pfeifer and C. Scheier, Understanding Intelligence, MIT Press, 1999.
[3] P. Morasso, "Spatial control of arm movements," Experimental Brain Research, vol. 42, pp. 223-227, 1981.
[4] T. Flash and N. Hogan, "The coordination of arm movements," Journal of Neuroscience, vol. 5, pp. 1688-1703, 1985.
[5] K. Ohta, M. Svinin, Z.W. Luo, S. Hosoe, and R. Laboissiere, "Optimal trajectory formation of constrained human arm reaching movements," Biological Cybernetics, vol. 91, no. 1, pp. 23-36, July 2004.
[6] M. Svinin, Y. Masui, Z.W. Luo, and S. Hosoe, "On the dynamic version of the minimum hand jerk criterion," Journal of Robotic Systems, vol. 22, no. 11, pp. 639-651, 2005.
[7] R.S. Sutton, "Dyna, an integrated architecture for learning, planning, and reacting," Proc. of the 7th Int. Conf. on Machine Learning, pp. 216-224, 1991.
[8] A.G. Barto, S.J. Bradtke, and S.P. Singh, "Learning to act using real-time dynamic programming," Artificial Intelligence, Special Volume: Computational Research on Interaction and Agency, vol. 72, pp. 81-138, 1995.
[9] C.G. Atkeson, A.W. Moore, and S. Schaal, "Locally weighted learning for control," Artificial Intelligence Review, vol. 11, pp. 75-113, 1997.
[10] J. Morimoto and K. Doya, "Reinforcement learning of dynamic motor sequences: learning to stand up," Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 3, pp. 1721-1726, 1998.
[11] Y. Kobayashi and S. Hosoe, "Reinforcement learning for manipulation using constraint between object and robot," Proc. of IEEE Int. Conf. on Systems, Man & Cybernetics, pp. 871-876, 2005.
[12] A.G. Barto, "Adaptive critics and the basal ganglia," in Models of Information Processing in the Basal Ganglia, MIT Press, Cambridge, MA, pp. 215-232, 1995.
[13] C.D. Fiorillo, P.N. Tobler, and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons," Science, vol. 299, pp. 1898-1902, 2003.
[14] K.A. Thoroughman and R. Shadmehr, "Learning of action through adaptive combination of motor primitives," Nature, vol. 407, no. 6805, pp. 682-683, 2000.
Part 1 Navigation and Motion Planning
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
The Specifiability Requirement in Mobile Robot Self-Localization
Francesco Capezio, Antonio Sgorbissa 1, and Renato Zaccaria
DIST - University of Genova
Abstract. In recent years, Simultaneous Localization And Mapping techniques have monopolized research in mobile robot self-localization, especially in the civilian Service Robotics domain. In this paper we argue that techniques derived from the industrial scenario, in particular beacon-based triangulation systems, should be taken into consideration even for civilian applications, whenever it is important to provide accurate a priori specifications of the system behavior in a generic, yet untested environment. The paper provides an analytical expression of the sensitivity of triangulation to errors, depending on the system geometry.
Keywords. Mobile Robotics, Beacon-based triangulation, System design
1. Introduction
Self-localization has been the key to success for all those systems which are agreed upon as "big achievements" in the last decade (some renowned examples are [1] [2] [3]). This is not a surprise if we consider long-term autonomy as a discriminator (the "six months between missteps" that Moravec advocates [4]): whenever robots are required to work continuously, for a long period of time, carrying out their tasks with no performance degradation or human intervention, the ability to keep dead-reckoning errors (which are intrinsically cumulative) limited over time plays a fundamental role. Among the criteria which are usually adopted for classifying localization methods, in this work we focus on a single, specific criterion: i.e., by extending the dialectic between "artificial landmark" and "natural landmark" [5], we distinguish two categories:
1. Approaches which rely on features naturally present in the environment to update the robot's perceptions. Autonomous systems in the civilian Service domain mainly belong to this category; features in the environment are detected by means of proximity sensors (sonars and, more recently, laser rangefinders [2] [3]) or vision. Simultaneous Localization And Mapping falls in this class: it adds further constraints by assuming that, in addition to being unmodified, the environment is also unknown.
2. Approaches which introduce artificial, ad hoc designed features or devices in the environment in order to improve and simplify the perception process. AGVs in the industrial domain very often belong to this category; since both the signal-emitting source and the sensor are designed ad hoc, the only limit is one's imagination. Some examples
1 Correspondence to: Antonio Sgorbissa, DIST - University of Genova. Tel.: +39 010 353 2801; Fax: +39 010 353 2154; E-mail:
[email protected].
42
F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization
are spray-painted tracks on the floor adopted by the Bell and Howell Mailmobile, Komatsu Z-shaped metal landmarks [6], MDARS-I retroreflective landmarks [7], DBIR LaserNav [8], and the TRC Beacon Navigation System [9]. Approaches belonging to the second class allow better performance: that is why they are often preferred in the industrial domain. However, there are often very good motivations to leave the environment unaltered: commonly raised questions address the system's migrability (i.e., can it behave indifferently in different environments?), biological plausibility, cost, and accessibility (i.e., is it possible at all to modify the environment?), and would deserve further discussion. Here we focus on a different issue, which is rarely taken into consideration when evaluating a localization system: we call it specifiability, a market-inspired term which means "how accurately can a consumer assess... whether or not the good will satisfy the purpose it is purchased for" [12]. In Section II we show that the current trend in mobile robotics (especially indoor, SLAM-based Service Robotics) has difficulty meeting the specifiability requirement. In Section III we focus on the triangulation algorithm: we provide a closed-form expression of triangulation and we formally derive an expression of the algorithm's sensitivity to measurement errors. In Section IV, MATLAB simulations and experimental results with the beacon-based system that we designed are shown. Conclusions follow.
2. Specifiability
In [3] a large-scale application in the museum scenario is discussed: the localization subsystem, based on two laser scanners and a feature-based approach, undoubtedly proves to be robust and reliable enough for the assigned tasks. Suppose, however, that you are interested in buying or renting a similar system, and consequently ask for a priori, formal specifications of its localization accuracy in a new, untested environment. Suppose also that, as the only available information, you can provide interior plans of the new environment and, maybe, some photographic snapshots. Unfortunately, in the case of laser-based (or vision-based) feature extraction and matching, predicting the system's accuracy from elementary considerations about the geometry of the environment is not simple. The situation is even more dramatic if the environment dynamically changes during task execution. Notice that this lack of specifiability is not due to limitations of the sensor; the main problem is the complexity of the interactions of the sensor with the environment, which causes feature extraction and matching to depend combinatorially on many factors: the number, geometry, and distribution of available features, the relative position and orientation of the robot with respect to each of them, the similarities and differences between potentially confusable features, lighting conditions (for vision), etc. By contrast, complete, formal specifications are usually provided for AGV localization and navigation in the industrial domain [5]. Let us focus on positioning errors: whether the robot's motion is constrained along a fixed trajectory or it relies on a finite number of artificial landmarks to correct its position, the distribution p(err_r|r) of the positioning error err_r can be predicted for every robot configuration r by knowing:
1. the measurement error distribution p(err_s|r) for each landmark s_i sensed in r;
2. the sensitivity to errors, i.e., how positioning errors depend on measurement errors. If the robot's state can be expressed in closed form as r = F(s1, ..., sn), the sensitivity is given by the Jacobian J of the partial derivatives of F with respect to s1, ..., sn.
Regarding 1, notice that statistical approaches relying on natural landmarks are also required to compute p(err_s|r). However, with artificial landmarks, measurement errors can be evaluated with higher fidelity in both the theoretical and the experimental noise analysis: since one usually designs both the sensor and the landmark, the underlying model is better known and, hopefully, simpler. Regarding 2, sensitivity proves to be particularly relevant when using artificial landmarks. In this case, landmark detection no longer depends on the complexity of the environment (which can be ignored), but on the spatial arrangement of landmarks as decided with the sole aid of interior plans: i.e., for every r, it is always possible to know in advance which landmarks are visible and the corresponding p(err_s|r). Thus, once the spatial arrangement of landmarks has been decided and the sensitivity has been computed on the basis of purely geometric considerations, positioning errors p(err_r|r) are predictable and fully specifiable, for any area of the workspace in which enough landmarks are available for localization, as a function of the sole robot state r. Among localization systems commonly used in the industrial domain, beacon-based localization through triangulation seems a good compromise to meet all the requirements introduced so far while suiting two major needs of civilian scenarios: beacons can be elevated from the ground (thus avoiding occlusions due to surrounding people) and they allow the robot to exhibit complex metric navigation in free space [8] [9].
3. Triangulation expressed in closed form
Triangulation assumes that the robot knows the position of (at least) three landmarks bi = (xbi, ybi), i = 1, ..., 3 (i.e., beacons in our system) with respect to a fixed Cartesian frame F. By measuring the relative angles θ1, θ2, θ3 between the vehicle's heading (to be computed) and the three landmarks, triangulation makes it possible to retrieve the position and orientation r = (xr, yr, θr) of the vehicle with respect to F (Fig. 1a). The problem has a graphic solution which is very simple and evocative (Fig. 1b): if the robot observes two landmarks (say b1, b2) separated by an angle θ12 (i.e., the difference θ1 − θ2, with positive angles measured counterclockwise), it must lie on a circle C12r univocally determined by that angle and the distance between the landmarks. When considering another couple of landmarks (say b2, b3) and the corresponding angle θ23, another circle C23r is defined, which intersects the previous one in two points: b2 and (xr, yr), i.e., the actual robot position. Finally, by knowing (xr, yr) and any of the three angles θ1, θ2, θ3, the orientation θr can be retrieved. This graphic solution (also referred to as Circle Intersection [10]) can be implemented as a closed-form expression xr = fx(θ12, θ23), yr = fy(θ12, θ23), θr = fθ(θ12, θ23)
(1)
which will be described in detail in the following. The graphic approach helps to visualize singular configurations which are well known in the literature: 1. the robot lies on the same circle (possibly collapsing to a straight line) which connects the three beacons: C12r and C23r overlap and the solution is not univocal; 2. the robot lies on the line which connects two landmarks, say b1, b2: the subtended angle θ12 equals 0 or π and C12r consequently collapses to a line. Notice that, whilst in case 1 the problem has intrinsically infinite solutions, in case 2 a unique solution still exists: it just requires special care to deal with it. Moreover, in
Figure 1. a) The Triangulation Problem b) Graphical solution
both cases, all configurations which are close to singularities turn out to be problematic: a survey [10] of triangulation algorithms in the literature, compared on the basis of their sensitivity to measurement errors, shows that Circle Intersection provides the highest accuracy possible (ignoring iterative, approximate methods such as Newton-Raphson, which assume an a priori estimate of the robot's position). In [11] the authors claim that the Trigonometric Method is able to find a solution even for type 2 singularities whilst Circle Intersection is not. This is imprecise: of the three subtended angles θ12, θ23, θ31 (each uniquely defining a circle C12r, C23r, C31r), at most one can equal 0 or π (unless all beacons lie on the same line). The information available is thus redundant, allowing us to consider the two angles for which the singularity is not reached. Given its simplicity and efficiency, we consider only the Circle Intersection algorithm, of which we give a closed-form expression which is much simpler than in [10]. Next, to express the sensitivity to errors, we analytically compute the Jacobian J of the partial derivatives of fx, fy, and fθ with respect to θ12 and θ23. This idea was first introduced in [11]; surprisingly, the authors do not provide a closed-form expression of J, and limit themselves to simulating the effect of angular errors on an Error Sensitivity Map. Let us choose the angles θ12 and θ23, and consequently consider the circles C12r and C23r. First, we compute the center of C12r: by elementary geometric considerations (Fig. 2a), we know that the angle between b1 and b2, when measured from the center cr12, is double θ12. Thus, the distance between cr12 and the chord b1b2 can be computed as follows:

$d_{12} = \frac{\sqrt{(x_{b2} - x_{b1})^2 + (y_{b2} - y_{b1})^2}}{2\tan(\theta_{12})}$  (2)

Notice that, if we consider θ12 in the interval [0, 2π), d12 can assume negative values: this implicitly accounts for the fact that cr12 can lie on either side of b1b2.
Notice also that the circles defined by θ12 and θ12 + π are the same; however, for every angle θ12, cr12 is univocally defined. Next, we compute the midpoint m12 between b1 and b2 and add to m12 a vector perpendicular to b1b2 of length d12. After some substitutions, this yields the following expression for cr12 (and, analogously, cr23):

$x_{cr12} = 0.5\,(x_{b1} + x_{b2} + (y_{b2} - y_{b1})/\tan(\theta_{12}))$
$y_{cr12} = 0.5\,(y_{b1} + y_{b2} - (x_{b2} - x_{b1})/\tan(\theta_{12}))$  (3)
There are some singular points in Equation 3: 1. θ12 = 0, π corresponds to a circle with infinite radius (i.e., collapsing to a line), a type 2 singularity which requires considering a different couple of landmarks
2. θ12 = π/2, 3π/2 corresponds to d12 = 0, i.e., a circle whose center lies on b1b2, at the midpoint between the two landmarks.
Next, we compute the equation of the straight line cr12cr23 (Fig. 2b) and the distance db2 between cr12cr23 and b2 (the first intersection of C12r and C23r):

$d_{b2} = \frac{(y_{cr23} - y_{cr12})\,x_{b2} - (x_{cr23} - x_{cr12})\,y_{b2} - y_{cr23}x_{cr12} + y_{cr12}x_{cr23}}{\sqrt{(y_{cr23} - y_{cr12})^2 + (x_{cr23} - x_{cr12})^2}}$  (4)
Since the second intersection (i.e., the one corresponding to (xr, yr)) is symmetrical to b2 with respect to cr12cr23, we compute it by adding to b2 a vector perpendicular to cr12cr23 whose length is twice db2 (with opposite sign):

$x_r = x_{b2} - 2d_{b2}\,(y_{cr23} - y_{cr12})/\sqrt{(y_{cr23} - y_{cr12})^2 + (x_{cr23} - x_{cr12})^2}$
$y_r = y_{b2} + 2d_{b2}\,(x_{cr23} - x_{cr12})/\sqrt{(y_{cr23} - y_{cr12})^2 + (x_{cr23} - x_{cr12})^2}$  (5)

To compute the orientation θr, we use Equation 6 (handling the special case ycr23 = ycr12):

$\theta_r = \arctan\!\left(\frac{y_{b2} - y_r}{x_{b2} - x_r}\right) - \theta_2 = \arctan\!\left(-\frac{x_{cr23} - x_{cr12}}{y_{cr23} - y_{cr12}}\right) - \theta_2$  (6)

Notice that, at this point, we could simply substitute Equations 3 and 4 into 5 and 6, thus obtaining a single, closed-form expression as in Equation 1. Sensitivity to errors can be analytically written as follows:

$\Delta x_r = \Delta\theta_{12}\cdot\partial f_x/\partial\theta_{12} + \Delta\theta_{23}\cdot\partial f_x/\partial\theta_{23}$
$\Delta y_r = \Delta\theta_{12}\cdot\partial f_y/\partial\theta_{12} + \Delta\theta_{23}\cdot\partial f_y/\partial\theta_{23}$
$\Delta\theta_r = \Delta\theta_{12}\cdot\partial f_\theta/\partial\theta_{12} + \Delta\theta_{23}\cdot\partial f_\theta/\partial\theta_{23}$  (7)
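The closed-form chain of Equations 2-6 can be transcribed directly into code. The sketch below is a hypothetical implementation of the Circle Intersection solution (the type 2 singular cases tan(θ) = 0 are deliberately not guarded, and the demo beacon positions in the usage test are arbitrary):

```python
import math

def circle_center(ba, bb, theta):
    # Center of the circle through beacons ba, bb subtending angle theta (Eq. 3).
    t = math.tan(theta)  # theta = 0 or pi (type 2 singularity) not handled here
    cx = 0.5 * (ba[0] + bb[0] + (bb[1] - ba[1]) / t)
    cy = 0.5 * (ba[1] + bb[1] - (bb[0] - ba[0]) / t)
    return cx, cy

def triangulate(b1, b2, b3, th1, th2, th3):
    """Robot pose (xr, yr, thr) from three beacon bearings (Circle Intersection)."""
    c12 = circle_center(b1, b2, th1 - th2)
    c23 = circle_center(b2, b3, th2 - th3)
    dx, dy = c23[0] - c12[0], c23[1] - c12[1]
    length = math.hypot(dx, dy)
    # Signed distance of b2 from the line through the two centers (Eq. 4).
    db2 = (dy * b2[0] - dx * b2[1] - c23[1] * c12[0] + c12[1] * c23[0]) / length
    # The robot is the reflection of b2 across the line of centers (Eq. 5).
    xr = b2[0] - 2.0 * db2 * dy / length
    yr = b2[1] + 2.0 * db2 * dx / length
    # Orientation from the bearing of any one beacon (Eq. 6).
    thr = math.atan2(b2[1] - yr, b2[0] - xr) - th2
    return xr, yr, thr
```

Feeding it bearings synthesized from a known pose recovers that pose exactly (up to floating-point error), which is a convenient way to validate the sign conventions of Equations 4-6.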
which requires computing the Jacobian J of the partial derivatives of fx, fy, fθ with respect to θ12 and θ23. Noticing that the denominators in Equations 4 and 5 combine into the same quantity, let $n = \mathrm{num}(d_{b2})$ and $d = \mathrm{den}(d_{b2})^2$; then

$\frac{\partial x_r}{\partial\theta_{12}} = 2\,\frac{\partial y_{cr12}}{\partial\theta_{12}}\,\frac{n}{d} - 2\,(y_{cr23} - y_{cr12})\left(\frac{1}{d}\frac{\partial n}{\partial\theta_{12}} - \frac{n}{d^2}\frac{\partial d}{\partial\theta_{12}}\right)$

$\frac{\partial y_r}{\partial\theta_{12}} = -2\,\frac{\partial x_{cr12}}{\partial\theta_{12}}\,\frac{n}{d} + 2\,(x_{cr23} - x_{cr12})\left(\frac{1}{d}\frac{\partial n}{\partial\theta_{12}} - \frac{n}{d^2}\frac{\partial d}{\partial\theta_{12}}\right)$

$\frac{\partial \theta_r}{\partial\theta_{12}} = \frac{1}{d}\left(\frac{\partial x_{cr12}}{\partial\theta_{12}}\,(y_{cr23} - y_{cr12}) - \frac{\partial y_{cr12}}{\partial\theta_{12}}\,(x_{cr23} - x_{cr12})\right)$  (8)
Similarly, we can compute the partial derivatives with respect to θ23. Equation 8 can be used 1) to improve the system's accuracy at run time and 2) to compute the sensitivity to errors for every r and for different beacon configurations. 1. In Equation 1, we expressed r = (xr, yr, θr) as a function F(θ12, θ23); however, as we already stressed, two other choices are available, namely the couples (θ23, θ31) and (θ12, θ31). We call F1 = F(θ12, θ23), F2 = F(θ23, θ31), and F3 = F(θ12, θ31), and J1, J2, J3 the corresponding Jacobian matrices of the partial derivatives. Whenever a triplet (θ12, θ23, θ31) is measured, we compute, for each component of the state (xr, yr, θr), which Ji yields the minimum sensitivity to errors (it
Figure 2. a) Computing the circle center b) Intersecting circles
just requires computing Equations 8 and 7 three times): this allows us to choose which of the subtended angles (θ12, θ23, θ31) should be considered (and which one should be ignored). Angles which are close to 0 or π are automatically rejected; solutions which are very sensitive to errors can be rejected as well. Finally, when more than three beacons are available, Ji can be used to determine which triplet should be considered. 2. For any given beacon distribution, we can predict the positioning error distribution p(err_r|r) for any location r of the environment in which at least three beacons are visible. To achieve this, for any given r we compute the three expected angles (θ12, θ23, θ31) = h(r) (i.e., h is the measurement model), we evaluate the three expected Jacobian matrices J1, J2, J3 as a function of r as in Equation 8, and we finally derive positioning errors from measurement errors p(err_s|r) through Equation 7. If we limit ourselves to a worst-case analysis, computing the maximum positioning error as a function of the maximum measurement error is trivial; otherwise, more complex manipulations are required. This procedure can provide guidelines for beacon arrangement.
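As a sanity check on a closed-form Jacobian, the partial derivatives can also be approximated numerically: given any implementation of F(θ12, θ23), central differences yield J, and its Euclidean norm gives the worst-case error bound of the kind derived in the next section. The sketch below is generic; the quadratic test function used with it is invented for illustration and has nothing to do with the beacon geometry.

```python
def central_diff_jacobian(F, th12, th23, h=1e-6):
    """2x2 Jacobian of F(th12, th23) -> (x, y), by central differences."""
    fx_12 = (F(th12 + h, th23)[0] - F(th12 - h, th23)[0]) / (2 * h)
    fx_23 = (F(th12, th23 + h)[0] - F(th12, th23 - h)[0]) / (2 * h)
    fy_12 = (F(th12 + h, th23)[1] - F(th12 - h, th23)[1]) / (2 * h)
    fy_23 = (F(th12, th23 + h)[1] - F(th12, th23 - h)[1]) / (2 * h)
    return [[fx_12, fx_23], [fy_12, fy_23]]

def worst_case_position_error(J, err_theta_max):
    """Bound the position error by err_theta_max times the Euclidean norm of J."""
    s = sum(e * e for row in J for e in row) ** 0.5
    return err_theta_max * s
```

Evaluating this at a grid of robot positions r (with F composed with the measurement model h) reproduces, numerically, the sensitivity maps used to rank the three angle couples.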
4. Experimental Results
In the following, to simplify computations, we assume a maximum error err_θmax on the measure of θ12, θ23, and θ31, thus ignoring the full distribution p(err_s|r); moreover, we consider only the x- and y-components of r, whilst θr is neglected. Since err_θmax = Δθ12 = Δθ23 = Δθ31 in Equation 7 is constant, we can easily compute, for each Fi, i = 1, ..., 3, the maximum position error err_rmax as a function of err_θmax and r; in Equation 9, (θ12, θ23) = h(r) are considered.

$err_{r\,max} = \sqrt{\Delta x_r^2 + \Delta y_r^2} = err_{\theta\,max}\,\sqrt{\left(\frac{\partial f_x}{\partial\theta_{12}}\right)^2 + \left(\frac{\partial f_x}{\partial\theta_{23}}\right)^2 + \left(\frac{\partial f_y}{\partial\theta_{12}}\right)^2 + \left(\frac{\partial f_y}{\partial\theta_{23}}\right)^2}$  (9)

Under these assumptions, the Euclidean norm S of J is sufficient to determine the sensitivity to measurement errors for each position r in the workspace, and can be effectively adopted to decide which angle in the triplet (θ12, θ23, θ31) should be ignored. Figure 3 shows the plot of S(r) versus (xr, yr) when b1 = (0, 0), b2 = (10, 5), b3 = (10, 0) and different couples of subtended angles are considered, i.e., S1(θ12, θ23), S2(θ23, θ31), and S3(θ12, θ31); values are saturated to Smax = 100 for ease of representation. In all the figures the highest sensitivity to errors corresponds to the circle which connects the three beacons; positions which are close to this circle have an increasing sensitivity, thus
Figure 3. Sensitivity computed with (θ12 , θ23 ), (θ23 , θ31 ), and (θ12 , θ31 ); minimum sensitivity.
yielding very inaccurate results during localization. Straight lines connecting couples of beacons yield inaccurate results as well (e.g., b1b2 and b2b3 for S1); however, it is sufficient to consider different couples: Figure 3 shows the plot of min_i(Si), in which the singularity is no longer present. Figure 4a shows the plot of err_rmax (expressed in meters) when err_θmax = 1°, projected onto (x, err_rmax): it can be noticed that the highest accuracy (about 6 cm) is achieved within the connecting circle, whilst errors increase outside. Figures 4c and 4d show S(r) for different beacon configurations: by plotting the sensitivity it is possible to infer how to arrange beacons in order to achieve the desired accuracy in any given area of the environment. Finally, the approach has been tested for many years with different experimental setups: Figure 4b shows the positioning errors of a real robot when considering the couple (θ12, θ23); the resolution is obviously lower (data are collected every 0.3 m), but the trend of err_r and its magnitude approximately meet our expectations (values expressed in mm).
5. Conclusions
If we want autonomous mobile robots to "hit the market" for civilian applications, it is fundamental to provide customers with accurate a priori specifications of the behavior of a system in a previously untested environment. To deal with the specifiability requirement, we propose to borrow localization methods from industrial AGVs; in particular, beacon-based triangulation, which proves to be a good compromise for its simplicity and versatility. Analytical expressions of triangulation and its sensitivity to errors are given: simulations show how this information can be used to predict the system's accuracy through geometric considerations.
Figure 4. a) Plot of errrmax b) A real world experiment c, d) Sensitivity with different beacon configurations
References
[1] Thrun, S., Beetz, M., Bennewitz, M., Burgard, W., Cremers, A., Dellaert, F., Fox, D., Hähnel, D., Rosenberg, C., Roy, N., Schulte, J., Schulz, D., 2000. Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva. Int. J. of Robotics Research, Vol. 19, No. 11.
[2] Burgard, W., Cremers, A.B., Fox, D., Hähnel, D., Lakemeyer, G., Schulz, D., Steiner, W., and Thrun, S., 2000. Experiences with an interactive museum tour-guide robot. Artificial Intelligence, 114 (1-2).
[3] Siegwart, R., Arras, K.O., Bouabdallah, S., Burnier, D., Froidevaux, G., Greppin, X., Jensen, B., Lorotte, A., Mayor, L., Meisser, M., Philippsen, R., Piguet, R., Ramel, G., Terrien, G., Tomatis, N., 2003. Robox at Expo.02: A Large-Scale Installation of Personal Robots. Robotics and Autonomous Systems, Vol. 42, Issues 3-4.
[4] Moravec, H., 1999. Rise of the Robots. Scientific American, December 1999, pp. 124-135.
[5] Borenstein, J., Everett, H.R., Feng, L., 1996. Where am I? Sensors and Methods for Autonomous Mobile Robot Positioning. Available from ftp.eecs.umich.edu/people/johannb.
[6] Matsuda, T. and Yoshikawa, E., 1989. Z-shaped Position Correcting Landmark. Proc. of the 28th SICE Annual Conference.
[7] Laird, R.T., Everett, H.R., Gilbreath, G.A., Inderieden, R.S., and Grant, K.J., 1999. Early User Appraisal of the MDARS Interior Robot. American Nuclear Society 8th Int. Topical Meeting on Robotics and Remote Systems (ANS'99), Pittsburgh, PA, 25-29 April 1999.
[8] Maddox, J., 1994. Smart Navigation Sensors for Automatic Guided Vehicles. Sensors, April 1994.
[9] TRC - Transitions Research Corp. Beacon Navigation System, Product Literature, Shelter Rock Lane, Danbury, CT 06810, 203-798-8988.
[10] Cohen, C. and Koss, F., 1992. A Comprehensive Study of Three Object Triangulation. Proc. of the 1993 SPIE Conference on Mobile Robots, Nov. 18-20.
[11] Shoval, S. and Sinriech, D., 2001. Analysis of Landmark Configuration for Absolute Positioning of Autonomous Vehicles. Journal of Manufacturing Systems, Vol. 20, No. 1.
[12] Koppius, O., 1999. Dimensions of Intangible Goods. Proc. of the Thirty-Second Annual Hawaii Int. Conference on System Sciences, Vol. 5.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments Mattia Castelnovi, Antonio Sgorbissa, Renato Zaccaria Laboratorium DIST Università di Genova {mattia.castelnovi, antonio.sgorbissa, renato.zaccaria}@unige.it Abstract—A force-field algorithm based on inclinometer readings is presented. It leads the robot to the goal while preventing it from turning upside down because of the inclination of the terrain. The method is supported by a navigation algorithm which helps the system overcome the well-known local-minima problem of force-field-based algorithms. By appropriately selecting a “ghost-goal” position, the local-minima problem can be solved effectively: when the robot is in danger of reaching a local minimum, a “ghost goal” appears and the true goal temporarily disappears, driving the robot out of dangerous configurations. Once the robot has escaped the potentially dangerous configuration, the system makes the true goal reappear and the robot continues its journey by heading towards it. Simulation results show that the Ghost-Goal algorithm is very effective in environments with complex rugged terrain for a robot equipped only with an inclinometer.
1. Introduction Autonomous outdoor navigation in vast unknown environments is a special challenge for mobile robots. In the recent past, autonomous mobile robots have been able to accomplish a great variety of tasks in indoor environments, such as exploration [1], self-localization [2] and map building [3]. In most of these cases the problem of autonomous navigation assumed that the world is a flat surface. This assumption allowed researchers to develop navigation algorithms that consider the robot a material point, and consequently to plan a path to the goal as a curve on a plane. One category of algorithms which proved to be especially suited to unknown, dynamic environments is the so-called force-field (also known as Artificial Potential Field) method. Unfortunately, when moving autonomous robots from indoor to outdoor environments many problems arise. In outdoor environments robotic platforms must deal with hazardous situations [4] beyond collision with obstacles, one of them being the risk of turning upside down while trying to reach the goal [5, 6]. When the robot is climbing up or down a hill, moving through dunes, etc., the navigation system must take into account the maximum allowable pitch and roll values for the robot in order to avoid turning upside down. The following work proposes a force-field algorithm which relies on data acquired from an inclinometer with the purpose of keeping the robot away from “dangerous slopes”. The difference between “dangerous slopes” and obstacles is admittedly hard to formalize: in this paper the ground slope is considered as if it were continuous; we assume that obstacles would locally produce an
M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments
infinite derivative, which cannot be detected by the inclinometer and requires ad hoc sensors (proximity sensors, for example). Obstacles, however, are not considered here; we focus on a force-field algorithm which proves adequate to lead the robot to the goal whilst preventing it from turning upside down because of the inclination of the terrain. Our method differs from other existing approaches, which usually rely on vision systems [7, 8] or proximity sensors [6]: our system is provided with just an inclinometer, which locally provides information about the current pitch and roll of the robot. As a consequence of these limited sensing capabilities, the potential for getting stuck in local minima is higher. This problem is solved by a novel approach, which we call the “ghost-goal” method. The method works in the following manner: when the robot finds itself in danger because of a local minimum, a “ghost goal” appears and the true goal temporarily disappears. Once the potential for local minima is reduced, the robot continues its journey by heading towards the true goal. Simulation results show that the approach is very effective in environments with complex configurations for a robot equipped with just an inclinometer. The paper is structured as follows: section II describes how the force-field-inspired method is implemented using data from an inclinometer; section III focuses on the Ghost-Goal algorithm, which helps the system avoid local minima; section IV describes experimental results. The final section is devoted to conclusions.
2. Inclinometer Based Force Field Method The presented system implements a force-field method based on the robot position and its pitch and roll angles. In force-field methods [9], the goal configuration exerts an attractive force on the robot, while obstacles exert a repulsive force. We extend this basic principle in order to deal with constraints on the robot's maximum allowable pitch and roll. First of all we define a fixed Cartesian frame (Of, Xf, Yf) and a Cartesian frame (Or, Xr, Yr) centered on the robot, with the Xr axis oriented along the heading of the robot (Fig. 1). In the following, all quantities with subscript 'f' are defined with respect to (Of, Xf, Yf), whilst those with subscript 'r' are defined with respect to (Or, Xr, Yr). Since the robot is going to be used in outdoor missions, we assume that it is equipped with a GPS and that the system is able to localize itself and to know the position of the goal with enough precision for our purpose. In particular, we define:
- pr and rr, two unit vectors lying along the Xr and Yr axes, respectively;
- qf(t), the configuration of (Or, Xr, Yr) with respect to the fixed frame at time t;
- tr(t), a unit vector directed from qf(t) to the goal;
- fr(t), the sum of the repulsive forces exerted by terrain slopes in qf(t);
- gr(t), the gradient of the terrain in qf(t), expressed as inclination percentage;
- ir(t), a vector defined by ir(t)·gr(t) = 0, |ir(t)| = |gr(t)| (tangential to the isocline passing through qf(t), with the same magnitude as the gradient).
It is straightforward to show that, if we want to minimize the pitch of the robot at time t, we need to minimize the absolute value of the dot product pr·gr(t), which can be done by forcing the robot to move along the isocline curve (i.e., by maximizing |pr·ir(t)|). On the
other hand, if we want to minimize the roll, we have to minimize the absolute value of rr·gr(t), which can be done by forcing the robot to climb or descend the gradient (i.e., by maximizing |pr·gr(t)|). Thus, if we want to minimize both the pitch and the roll components, we have to choose the direction of motion of the robot in order to maximize the quantity |pr·ir(t)| + |pr·gr(t)|. This can be done by choosing the velocity vector (i.e. the robot heading)

v(t) = ir(t) + gr(t). (1)

However, in most cases, the maximum values allowed for the pitch and the roll components are different, since they depend on various physical characteristics (e.g. geometrical dimensions, weight distribution). Thus, at time t, given the current pitch and roll values and their maximum allowed values (PitchUB, RollUB), we define the following quantities:

magr = 1 / (RollUB² − RollAngle²) (2)

magp = 1 / (PitchUB² − PitchAngle²) (3)

Finally, we calculate the velocity vector

v(t) = (magp ir(t) + magr gr(t)) / √(magp² |ir(t)|² + magr² |gr(t)|²) (4)
as the most promising direction for the robot to move safely (we are not considering the goal yet, nor obstacles to be avoided). In other words, as the pitch of the robot gets closer to its upper bound, magp quickly increases, thus forcing the robot to move along the isocline curve. Analogously, as the roll component gets closer to its upper bound, magr quickly increases, forcing the robot to climb/descend the gradient. That is, magp and magr are used to weight the sum of ir(t) and gr(t) in eq. 4, thus imposing different constraints for the maximum pitch and roll values (respectively PitchUB and RollUB) in order to safely keep the robot away from dangerous configurations. Finally, notice that, for each configuration qf(t), there are two vectors tangential to the isocline: ir(t) and −ir(t). Since we want to maximize the absolute value |pr·ir(t)|, we can choose either ir(t) or −ir(t) in eq. 4. The choice is performed on the basis of the following rule, which selects the vector pointing in the direction closer to the goal in order to introduce a bias toward it (vector tr(t)):

if ir(t)·tr(t) > −ir(t)·tr(t) choose ir(t) else choose −ir(t) (5)

The gr(t) vector is selected in the same way:

if gr(t)·tr(t) > −gr(t)·tr(t) choose gr(t) else choose −gr(t) (6)
Finally, the attractive vector tr(t) heading towards the target is weighted and added to the velocity vector vr(t), as is typical in force-field models [1]. The resulting law of motion is:
vr(t) = w1 (magp ir(t) + magr gr(t)) + w2 tr(t) (7)
which drives the robot towards the goal while safely keeping pitch and roll values below the given thresholds (w1, w2 are weighting factors). Fig. 3 shows the different paths followed by the robot when different upper bounds are imposed (in the figures the robot icon merely represents the robot and is not to scale; in the algorithm the robot is treated as a point). Fig. 3a shows the path when both PitchUB and RollUB are set to a very low value (5° = 11.1%). Fig. 3b shows the same mission with PitchUB = 5° and RollUB = 90°, and the remaining figures show the robot path for other values of PitchUB and RollUB (denoted p and r in Fig. 3). Figs. 13 and 14 show the behavior of the system with different PitchUB and RollUB values when the goal is located on the top of a hill. In the first of these figures RollUB is much higher than PitchUB (PitchUB = 30°, RollUB = 90°); as a consequence, the robot moves towards the top of the hill along a spiral-shaped trajectory, which is necessary to keep the pitch very low with respect to the roll. In the second experiment (Fig. 14) the opposite holds: since PitchUB is much higher than RollUB (PitchUB = 90°, RollUB = 30°), the robot climbs toward the top of the hill along a straight trajectory, which keeps the roll value very low.
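For illustration, the motion law of Eqs. (2)-(7) can be sketched as follows. This is a hedged reconstruction, not the authors' code: the function name `motion_law` and the default weights are our own choices, and we exploit the fact that, since ir(t) ⊥ gr(t), the denominator of Eq. (4) equals the norm of the numerator.

```python
import numpy as np

def motion_law(i_r, g_r, t_r, pitch, roll, pitch_ub, roll_ub, w1=1.0, w2=0.5):
    """Sketch of Eqs. (2)-(7): blend the isocline and gradient directions,
    weighted by how close pitch and roll are to their upper bounds, then add
    an attractive pull toward the goal."""
    # Rules (5)-(6): flip i_r and g_r so that they point closer to the goal.
    if np.dot(i_r, t_r) < np.dot(-i_r, t_r):
        i_r = -i_r
    if np.dot(g_r, t_r) < np.dot(-g_r, t_r):
        g_r = -g_r
    # Eqs. (2)-(3): weights grow quickly as pitch/roll approach the bounds.
    mag_p = 1.0 / (pitch_ub**2 - pitch**2)
    mag_r = 1.0 / (roll_ub**2 - roll**2)
    # Eq. (4): since i_r and g_r are orthogonal, normalizing the weighted sum
    # equals dividing by sqrt(mag_p^2 |i_r|^2 + mag_r^2 |g_r|^2).
    safe = mag_p * i_r + mag_r * g_r
    safe = safe / np.linalg.norm(safe)
    # Eq. (7): add the weighted attractive vector toward the goal.
    return w1 * safe + w2 * t_r
```

Note that the sign choice of rules (5)-(6) makes the result independent of the arbitrary orientation in which the sensor reports the gradient and isocline vectors.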
3. Ghost-goal algorithm 3.1. Local minima problem in force-field methods The force-field method considers the robot as a particle under the influence of an artificial field of forces which is expected to summarize the “structure” of the free space [10]. The force-field function is usually defined as the sum of an attractive vector which pulls the robot toward the goal configuration and a repulsive vector which pushes the robot away from obstacles. One known drawback of the force-field method is the occurrence of local minima in the force function [15]. Local minima occur in configurations where the repulsive force generated by obstacles exactly cancels the attractive force generated by the goal (Figs. 6, 7, 12). The Ghost-Goal algorithm includes appropriate techniques for escaping from local minima. 3.2. Ghost-Goal applied to an inclinometer-based force-field method This section explains how the Ghost-Goal algorithm avoids local minima. The Ghost-Goal algorithm can be divided into 2 steps: 1) barrier detection, 2) Ghost-Goal position selection. 1. Barrier detection: two unit vectors are introduced. The first one, which we call tdisc(t), heads to the goal like the unit vector tr(t), but its direction is discretized with a coarse resolution of 45° (i.e., like a compass rose). The second one, which we call gdisc(t), points in the direction of the gradient like gr(t) (with the same resolution as tdisc(t)). For example, in Fig. 4, tdisc(t) is pointing West, while gdisc(t) is pointing North-West. The system periodically checks the directions of tdisc(t) and gdisc(t), and makes some
considerations about the situation that the robot is probably about to face. If the two unit vectors “almost” lie on the same straight line, the robot is possibly approaching a local minimum (Fig. 5). We decided to discretize the directions because it is a fast and easy way to compare the directions of two vectors while keeping a margin of safety:

if (|tdisc(t)·gdisc(t)| = 1) AND (|pr| > PitchBarrierFinderTH OR |rr| > RollBarrierFinderTH)
then barrier = TRUE; bdisc(t) = gdisc(t)
else barrier = FALSE (8)

Of course, this can happen only if the slope of the ground is high enough to influence the robot's motion, as stated in Equations 4 and 7. At the same time, the system monitors whether the pitch and roll values exceed their Barrier-Finder thresholds, defined as follows:
PitchBarrierFinderTH = PitchUB/ψ, RollBarrierFinderTH = RollUB/ψ (9)

where ψ is a constant which controls the sensitivity of the system to local minima. If one of the two values is over its threshold and tdisc(t) and gdisc(t) almost lie in the same sector direction (for example, in Fig. 5 gdisc(t) is pointing North-West), a flag barrier is set to TRUE and a new vector bdisc(t) is set equal to gdisc(t). This means that there is a potential local minimum on the way to the goal. This step allows the system to identify dangerous situations which are due either to the slope of the ground or to the relative position of the goal.
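The barrier-detection step of rules (8)-(9) can be sketched as below; this is a hedged reading of the rules, with illustrative function names (`discretize`, `barrier_detected`) and default parameter values of our own choosing.

```python
import numpy as np

DIRS = np.arange(8) * 45.0  # 45-degree compass-rose discretization

def discretize(angle_deg):
    """Snap a heading (degrees) to the nearest 45-degree sector, as done for
    t_disc(t) and g_disc(t)."""
    return DIRS[int(round((angle_deg % 360.0) / 45.0)) % 8]

def barrier_detected(goal_dir, grad_dir, pitch, roll,
                     pitch_ub=30.0, roll_ub=30.0, psi=2.0):
    """Rules (8)-(9): flag a potential local minimum when the discretized goal
    and gradient directions are parallel or antiparallel AND the robot is
    already tilted beyond a barrier-finder threshold."""
    t_disc, g_disc = discretize(goal_dir), discretize(grad_dir)
    aligned = (t_disc - g_disc) % 180.0 == 0.0         # |t_disc . g_disc| = 1
    pitch_th, roll_th = pitch_ub / psi, roll_ub / psi  # Eq. (9)
    tilted = abs(pitch) > pitch_th or abs(roll) > roll_th
    return bool(aligned and tilted)
```

Because all discretized directions are multiples of 45°, the parallelism test reduces to an exact modular comparison rather than a floating-point dot product.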
Figure 1. Definition of the used vectors. Figure 2. 3D model of the area of the experiments in Fig. 3.
Figure 3. Different experiments made with different PitchUB and RollUB. Figure 4. Configuration without barrier detection. Figure 5. Configuration with barrier detection.
2. Ghost-Goal position selection. Once we have detected a possible local minimum, the robot must avoid falling into it. To deal with this, the algorithm does not need to perform local search procedures; in fact, a solution can easily be found just by temporarily moving the goal position, thus providing the robot with fallacious perceptions which, in some sense, are the analogue of what we call “a mirage”. The robot “sees” the goal in a more
54
M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments
comfortable and less hazardous position and tries to reach it using the same rules used in its usual navigation (i.e., in the absence of dangerous situations). In this way it moves away from the local minimum.

If barrier = TRUE then Set the Ghost Goal(…) (10)
As soon as the dangerous situation disappears, the system is able to “understand” that it was just a mirage, a ghost, and that it is no longer useful to follow that direction. In fact, once the position of the robot has changed with respect to the goal and the surrounding hills or valleys, the local minimum just avoided is probably no longer dangerous. The robot can then resume heading toward the real goal. Obviously, we must select the ghost goal carefully in order to make the robot escape from local minima while moving closer to the real goal or, at least, while not going “too far” from it. To achieve this, the ghost goal is not explicitly represented as a position in the workspace, but as a suggested direction tsafe(t) (a unit vector with the same 45° discretization as tdisc(t), gdisc(t), bdisc(t)), which is chosen according to the following rules:

Set the Ghost Goal(…): if RobotDir ∈ [arg(bdisc(t))−HT, arg(bdisc(t))+HT] then arg(tsafe(t)) = arg(bdisc(t)) ± HT (11)
where HT identifies the maximum allowed deviation from the real goal; its sign is chosen in order to minimize the discrepancy between tsafe(t) and tdisc(t).

If barrier = FALSE then tsafe(t) = tdisc(t) (12)
I.e., as soon as the gradient changes, either in its direction or its magnitude, the Ghost-Goal disappears and the robot starts heading again towards the real goal.
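The ghost-goal selection of rules (10)-(12) can be sketched as follows. This is a simplified, hedged reading that omits the RobotDir sector test of rule (11); the function name and the default HT value are illustrative.

```python
def ghost_goal_direction(barrier, b_dir, goal_dir, ht=90.0):
    """Rules (10)-(12): when a barrier is flagged, deflect the pursued
    direction t_safe away from the barrier direction b_disc by +/- HT degrees,
    choosing the sign that stays closest to the real goal; otherwise head
    straight for the real goal. All angles in degrees."""
    if not barrier:
        return goal_dir % 360.0                      # rule (12)

    def angular_gap(a, b):                           # shortest angular distance
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    cand_plus = (b_dir + ht) % 360.0                 # arg(b_disc) + HT
    cand_minus = (b_dir - ht) % 360.0                # arg(b_disc) - HT
    # Rule (11): the sign of HT minimizes the discrepancy with the real goal.
    if angular_gap(cand_plus, goal_dir) <= angular_gap(cand_minus, goal_dir):
        return cand_plus
    return cand_minus
```

As soon as the barrier flag drops, the function simply returns the real goal direction, which matches the “mirage” behavior described above.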
4. Experimental Results The system has been developed and tested using the AE-SIM simulator [11], a tool included in ETHNOS [12] (Expert Tribe in a Hybrid Network Operating System), a programming environment for multiple autonomous robot systems. The software allows the robot to receive data from the simulated environment in the same way as it would from a real sensor. To simulate an outdoor environment, Digital Elevation Models have been used. The size of each cell is 1 m², which is very accurate compared to other inclinometer-driven navigation algorithms implemented in simulation for analogous mobile robot tasks [13]. Figure 11 shows the path of the robot in a labyrinth which has a flat floor and sloped walls, so that the walls can be sensed by the inclinometer. This experiment shows that the robot reaches the goal even though it encounters 5 local minima. In Figs. 8, 9 and 10, a “castle-like” environment is considered. Walls are inclined with an increasing slope (i.e., the second
order derivative is positive). The goal is placed on a “flat” square in the middle of the castle. A small entrance is present in the walls, to show that the system is able to find paths in very difficult situations too. The first mission (Fig. 9) is made with a low value of PitchUB and a high value of RollUB. The robot reaches the entrance after a spiral-shaped trajectory that brings it closer to the goal without exceeding the threshold. This simulation shows how the Ghost-Goal algorithm is able to reach the goal even in extreme situations. The next figure (Fig. 10) shows the same experiment made with a low value of RollUB and a high value of PitchUB. Since the walls are almost vertical at their top, the robot turns in order not to damage itself; two short turns allow it to reach the goal.
Figure 6. Same as Fig. 3b without the Ghost-Goal algorithm. Figure 7. Same as Figs. 13 and 14 without the Ghost-Goal algorithm.
Figure 8. 3D model of the castle-like environment for the experiment. Figure 9. Spiral-shaped robot trajectory. Figure 10. Straight-to-goal robot trajectory.
Figure 11. Path made inside a wall-sloped labyrinth. Figure 12. Same as Fig. 11 without the Ghost-Goal algorithm. Figure 13. Spiral-shaped trajectory. Figure 14. Straight-to-goal trajectory.
Figs. 6 and 7 show examples of navigation missions without the Ghost-Goal algorithm: Fig. 6 shows the same experiment as Fig. 3b, using the same thresholds (PitchUB = 5° and RollUB = 90°). The robot stops in front of the hill at a local minimum: it keeps spinning in place, but it is not able to move away from that position. The same happens in Fig. 7: the robot, after coasting around the hill, stops in a local minimum and is not able to escape from that position. In Fig. 12 the robot stops at the first local minimum of the labyrinth. It is important to remember that, in these situations, the robot would still have been able to reach the goal: because PitchUB is lower than the slope of the hills or of the labyrinth walls, it could have reached the goal simply by coasting around the hill (or the walls of the labyrinth) or by following a spiral-shaped trajectory.
5. Conclusions This paper presents a force-field inspired method based on inclinometer data which implements a new algorithm for escaping from local minima. The results from experiments
conducted with different kinds of slope and threshold configurations have been shown. The developed system seems to be robust enough to be used with a real robot. Before and during the system's development, many experiments were made with different slopes and robot configurations. Without the Ghost-Goal algorithm, the robot would often overstep the upper bounds on its allowed pitch and roll, a really dangerous situation in the case of a real robot. After the Ghost-Goal algorithm was introduced, the simulated robot became able to complete missions without exceeding the maximum allowed pitch or roll. The Ghost-Goal algorithm allows the system to better capitalize on its capabilities and to reach the goal in spite of local minima, even though some issues still arise, for example from C-obstacle-like situations, due in part to the discretization of the directions. Towards completely autonomous navigation with the Ghost-Goal algorithm, future work will focus on installing the system on a real robot provided with a localization system and on testing it with supplementary sensors (e.g., for obstacle avoidance).

REFERENCES
[1] W. Burgard, M. Moors, C. Stachniss, and F. Schneider, "Coordinated Multi-Robot Exploration," IEEE Transactions on Robotics, 21(3), 2005.
[2] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo Localization for Mobile Robots," Artificial Intelligence, 128(1-2), pp. 99-141, 2001.
[3] J. Folkesson, P. Jensfelt, and H. Christensen, "Vision SLAM in the Measurement Subspace," Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, Spain, April 2005.
[4] M. Castelnovi, R. Arkin, and T. Collins, "Reactive Speed Control System Based on Terrain Roughness Detection," 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, April 2005.
[5] R.C. Arkin and W. Gardner, "Reactive Inclinometer-Based Mobile Robot Navigation," Proc. 1990 IEEE International Conf. on Robotics and Automation, Cincinnati, OH, pp. 936-941, 1990.
[6] M. Giaroli, A. Sgorbissa, and R. Zaccaria, "Reactive Safe Navigation in Outdoor Environments," International Symposium on Robotics and Automation (ISRA 02), Toluca, Mexico, September 2002.
[7] D. Langer, J. Rosenblatt, and M. Hebert, "A Behavior-Based System for Off-Road Navigation," IEEE Trans. Robotics and Automation, Vol. 10, No. 6, 1994.
[8] L.H. Matthies, "Stereo Vision for Planetary Rovers: Stochastic Modeling to Near Real-Time Implementation," International Journal of Computer Vision, 8:1, 1992.
[9] R.C. Arkin, "Motor Schema-Based Mobile Robot Navigation," International Journal of Robotics Research, Vol. 8, No. 4, pp. 92-112, 1989.
[10] R.A. Brooks, "A Robust Layered Control System for a Mobile Robot," IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pp. 14-23, 1986.
[11] A. Raggi, A. Sgorbissa, and R. Zaccaria, "AE-SIM: Simulating Intelligent Robots in Intelligent Environments," Proc. IEEE CCA/ISIC/CACSD 2004, Taipei, Taiwan, September 1-4, 2004.
[12] M. Piaggio, A. Sgorbissa, and R. Zaccaria, "A Programming Environment for Real Time Control of Distributed Multiple Robotic Systems," Advanced Robotics, Vol. 14, No. 1, VSP, 2000.
[13] R.C. Arkin and W. Gardner, "Reactive Inclinometer-Based Mobile Robot Navigation," Proc. 1990 IEEE International Conf. on Robotics and Automation, Cincinnati, OH, pp. 936-941, 1990.
[14] X. Zou and J. Zhu, "Virtual Local Target Method for Avoiding Local Minimum in Potential Field Based Robot Navigation," Journal of Zhejiang University SCIENCE, Vol. 4, No. 3, pp. 264-269, 2003.
[15] R. Arkin, Behavior-Based Robotics, MIT Press, Cambridge, MA, 2000.
Autonomous Robot Vision System for Environment Recognition Woong-Jae Won, Sang-Woo Ban and Minho Lee* School of Electrical Engineering and Computer Science, Kyungpook National University 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Korea *[email protected] Abstract. In this paper, we propose a novel autonomous robot vision system that can recognize an environment by itself. The developed system has the self-motivation to search for interesting regions, using human-like selective attention and novelty scene detection models, and memorizes the relative location information of the selected regions. The selective attention model consists of a bottom-up saliency map and a low-level top-down attention model. The novelty detection model generates a scene novelty measure using the topology information and the energy signature in the selective attention model. Experimental results show that the developed system successfully identifies an environment, as well as changes of the environment, in natural scenes. Keywords. Autonomous robot vision system, selective attention model, novelty scene detection, environment recognition
Introduction Present intelligent systems have mainly been based on task-specific behaviors and/or the programmer's input about environmental information. Even though such traditional artificial intelligent robots have been successfully used for simple tasks, we need a new artificial intelligence paradigm that continuously adapts to changes in the environment and voluntarily acquires new knowledge without any intervention by a programmer. This concept would be useful for developing a truly human-like robot, and it could be applied to many engineering fields such as artificial secretaries, household robots, medical treatment and nursing robots, etc. Recent research has moved toward developing more human-like machines, for instance through the autonomous mental development (AMD) mechanism [1,2,3]. In the AMD paradigm, robots increase their own knowledge through continuous interaction with humans and the environment. In order to implement such an intelligent system with the AMD concept, we need to endow the developed robot system with environment perception and novelty detection based on human interaction [4]. In this paper, we propose an autonomous robot vision system which can not only learn its surroundings with self-motivated desire but also detect novelty in a dynamic scene. The developed system is based on a human-like selective attention model which imitates the human visual search mechanism. The selective attention model uses a bottom-up process in conjunction with an interactive process that ignores an unwanted area and/or pays attention to a desired area in the subsequent visual search, through human interaction [5,6,7]. Novelty detection has been applied to video streams by
W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition
Gaborski et al. [8], in which the subtraction between the saliency maps of two consecutive input frames was used to detect a novel scene. In section 1, we describe an object-preferred selective attention model with scalability, using a maximum entropy algorithm. In section 2, we explain a novelty detection and environment recognition model that can memorize a visual scan path topology and an energy signature for a dynamic scene. In section 3, we describe the implementation of the proposed autonomous robot vision system, followed by experimental results. In the final section, we summarize our results and discuss further research.
1. Visual selective attention model The proposed selective attention model is implemented by imitating the functions of the visual pathway from the retina to the LIP (Lateral Intra-Parietal cortex) through the LGN and the visual cortex area [9]. The reward and punishment function of the hypothalamus and associated limbic structures is considered in order to develop the low-level top-down attention model reflecting reinforcement and inhibition [6,9]. Figure 1 shows the proposed object-preferred selective attention model, which consists of bottom-up processing and a low-level top-down attention model.
Figure 1. Object preferred selective attention model
The simple biological visual pathway for bottom-up processing runs from the retina to the visual cortex through the LGN [9]. In our approach, we use a bottom-up saliency map (SM) model that reflects the functions of the retina cells, the LGN and the visual cortex. Since the retina cells can extract edge and intensity information as well as color opponency, we use these factors as the basis features of the bottom-up SM model. In order to model the function of the LGN and the ganglion cells, we implement the on-center and off-surround operation with a Gaussian pyramid of images at different scales, and then construct three feature maps: intensity, edge and color. These feature maps yield candidate salient regions through bottom-up processing. In order to reflect human-like attention, which preferentially selects objects, we consider a symmetry operator, because an arbitrary object contains symmetry
information; this is realized by the noise tolerant generalized symmetry transformation (NTGST) algorithm based on edge information [10]. After obtaining the candidate selective attention regions, we modify the scan path to pay attention to an object by considering its degree of symmetry. We can then construct a saliency map which has more preference for objects [7]. Next, we adopt an entropy maximization approach to select the proper scale of the salient areas using the object-preferring bottom-up SM model. The scale selection algorithm is based on Kadir's approach [11]. For each salient location, the proposed model chooses the scales at which the entropy is maximal or peaked; it then weights the entropy value at such scales by a measure of the self-dissimilarity in the scale space of the bottom-up SM model. The most appropriate scale for each salient area centered at location x is obtained by
scale(x) = argmax_s { H_D(s, x) × W_D(s, x) } (1)

where D is the set of all descriptor values, H_D(s, x) is the entropy defined by equation (2) and W_D(s, x) is the inter-scale measure defined by equation (3):

H_D(s, x) = − Σ_{d∈D} p(d, s, x) log₂ p(d, s, x) (2)

W_D(s, x) = (s² / (2s − 1)) Σ_{d∈D} |p(d, s, x) − p(d, s−1, x)| (3)
where p(d, s, x) is the probability mass function for scale s, position x, and descriptor value d, which takes values in D. The proposed bottom-up SM model is generated only from primitive features such as intensity, edge, color and symmetry information. In order to develop a more plausible selective attention model, we therefore need an interactive procedure with a human supervisor, together with bottom-up information processing. Human beings ignore uninteresting areas, even if they have primitive salient features, and they can memorize the characteristics of the unwanted areas; humans then do not pay attention to new areas that have characteristics similar to learned unwanted areas. In addition, human perception can focus on an interesting area, even if it does not have primitive salient features, or if it is less salient than other areas. We propose a new selective attention model that mimics this human-like selective attention mechanism and that can consider not only primitive input features, but also interactive properties in the environment, reflecting the limbic structures including the hypothalamus in our brain [9]. We use a fuzzy ART network that “learns” and “memorizes” the characteristics of uninteresting and/or interesting areas decided by a human supervisor [6]. After successful training of the fuzzy ART network, an unwanted salient area is inhibited and a desired area is reinforced according to the vigilance parameter ρ of the fuzzy ART network [6,12]. As shown in figure 1, the inputs of the fuzzy ART network are composed of the intensity, edge and color feature maps within a salient area. However, the sizes of the selected salient areas can differ, so we need to consider how to adapt the input dimension of the fuzzy ART network to the various sizes of the salient regions.
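The entropy-based scale selection of Eqs. (1)-(3) can be sketched as follows. This is a hedged illustration following Kadir's approach, with the function name and the histogram representation chosen by us; it assumes the descriptor histograms have already been computed per scale.

```python
import numpy as np

def salient_scale(hist_stack):
    """Kadir-style scale selection, Eqs. (1)-(3): hist_stack[s] is the
    descriptor histogram p(d, s, x) of the area around x at scale index s
    (each row sums to 1). Returns the scale maximizing H_D * W_D."""
    p = np.asarray(hist_stack, dtype=float)
    best_scale, best_score = None, -np.inf
    for s in range(1, len(p)):                       # scale s-1 must exist
        # Eq. (2): entropy of the descriptor distribution at scale s
        h = -np.sum(p[s] * np.log2(p[s] + 1e-12))
        # Eq. (3): inter-scale self-dissimilarity weight
        w = (s**2 / (2.0 * s - 1.0)) * np.sum(np.abs(p[s] - p[s - 1]))
        if h * w > best_score:                       # Eq. (1): argmax of H*W
            best_scale, best_score = s, h * w
    return best_scale
```

Intuitively, a scale is selected when the local descriptor distribution is both high-entropy (complex) and changing rapidly across neighboring scales.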
In this paper, we use a log-polar transformation to generate a fixed input dimension for the fuzzy ART network from salient areas of various sizes in the selective attention model. Each quantized area produced by the log-polar transformation corresponds to one input neuron of the fuzzy ART network. The proposed model can thus generate a fixed, uniform scale for every salient area with different
W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition
scales, which allows the proposed fuzzy ART model to be trained on arbitrarily scaled feature maps of the selected area in order to reinforce or inhibit a salient region.
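The log-polar mapping described above can be sketched as follows. The ring/wedge counts and the averaging of pixels within each quantized bin are illustrative assumptions; the point is only that patches of any size yield the same input dimension for the fuzzy ART network.

```python
import numpy as np

def log_polar_features(patch, n_rings=4, n_wedges=8):
    """Map a salient region of arbitrary size onto a fixed n_rings x n_wedges
    log-polar grid, so every region yields the same input dimension.
    Each quantized bin is averaged into one input value."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)           # [-pi, pi]
    r_max = max(r.max(), 1.0)
    # logarithmic radial bins, uniform angular bins
    ring = np.clip((np.log1p(r) / np.log1p(r_max) * n_rings).astype(int),
                   0, n_rings - 1)
    wedge = ((theta + np.pi) / (2 * np.pi) * n_wedges).astype(int) % n_wedges
    out = np.zeros((n_rings, n_wedges))
    cnt = np.zeros((n_rings, n_wedges))
    np.add.at(out, (ring, wedge), patch)           # accumulate pixel values
    np.add.at(cnt, (ring, wedge), 1)               # count pixels per bin
    return out / np.maximum(cnt, 1)                # fixed-size feature array
```

A 20 × 20 patch and a 57 × 33 patch both produce a 4 × 8 feature array, i.e. 32 input neurons regardless of region size.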
2. Environment recognition based on novelty detection

In order to develop a low-level novelty scene detection model, we developed a robust scene description model that tolerates noise and slight affine transformations such as translation, rotation, zoom-in, and zoom-out in a dynamic environment. Such a model can be used to detect novelty by comparing the description of the current scene with that of an experienced scene, analogous to the presumed function of the hippocampus in our brain. In the proposed model, we developed a relative topology matching method using visual scan paths obtained from a saliency map, generated by the low-level top-down attention model together with the bottom-up saliency map (SM) model. We can describe an input scene robustly by using the visual scan path based on the selective attention model. Moreover, the size of every salient area is obtained by the entropy maximization method and is used to represent the scan path topology of an input scene. In addition, the energy signatures obtained from the SM are used as an additional factor to decide whether a novelty has occurred in the input scene [13]. The obtained scan path topology and energy signature are used as the input to the developed incremental computation model, which can incrementally memorize scene information experienced at a new place, or at a different time for the same place. Moreover, the proposed model is able to recognize the placement and the direction of the field of view from the scene information memorized in the incremental computation model. Figure 2 shows the proposed incremental computation model, whose input consists of the geometric topology of the locations of the salient areas, the scale information of the salient areas, and the relative energy signature, defined as a relative distance score f_d, a relative scale score f_s, and an energy score f_e, respectively.
If the model is executed for the first time, the incremental computation model is initialized with only one output neuron. When the input of the incremental computation model is prepared, the proposed model obtains the location information p_i of the camera system and the direction information d_j of its field of view from feedback information such as encoder sensors. If the location information is new, the incremental computation model generates a new output node for that location and direction. After obtaining the location and direction information, the proposed model selects a winner node from among the nodes associated with the given location and direction. Then, the model checks the degree of similarity against a vigilance value ρ′. If the degree of similarity between the current input scene and the winner node is less than the vigilance value ρ′, the proposed model declares a novel scene and creates a new node; otherwise, the current input scene is considered previously seen and the proposed model adjusts the weights of the winner node according to the input scene through unsupervised learning. The developed model can recognize the location of its placement by using the trained incremental computation model for scene memorization and scene novelty detection, as described
above. The proposed placement recognition algorithm uses the same procedure to construct the input vector (f_d, f_s, f_e) for the incremental computation model.
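The novelty-detection loop above can be sketched roughly as follows. The similarity measure, the learning rate, and the data layout are placeholders for this example; only the control flow, i.e. winner selection among nodes with matching p_i and d_j, the vigilance test against ρ′, and creation of a new node on novelty, follows the text.

```python
import numpy as np

class IncrementalNoveltyMemory:
    """Sketch of the incremental computation model: each stored node is a
    prototype of the scene vector (f_d, f_s, f_e) tagged with location p_i
    and direction d_j.  A scene whose similarity to the best-matching node
    falls below the vigilance rho' is declared novel and stored anew."""

    def __init__(self, vigilance=0.9, lr=0.5):
        self.vigilance, self.lr = vigilance, lr
        self.nodes = []                   # list of (location, direction, prototype)

    def _similarity(self, a, b):
        # one simple choice of match measure (an illustration, not the paper's)
        return 1.0 - np.abs(a - b).mean()

    def observe(self, loc, direction, scene_vec):
        """Return True if the scene is novel (a new node was created)."""
        scene_vec = np.asarray(scene_vec, float)
        candidates = [(i, n) for i, n in enumerate(self.nodes)
                      if n[0] == loc and n[1] == direction]
        if candidates:
            i, (l, d, proto) = max(
                candidates, key=lambda c: self._similarity(scene_vec, c[1][2]))
            if self._similarity(scene_vec, proto) >= self.vigilance:
                # familiar scene: move the winner toward the input (unsupervised)
                self.nodes[i] = (l, d, proto + self.lr * (scene_vec - proto))
                return False
        self.nodes.append((loc, direction, scene_vec))   # novel: new output node
        return True
```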
Figure 2. Incremental model for recognizing an environment
Assume that training for novelty scene detection has finished successfully using the proposed incremental computation model. When the model is placed in a room without a priori information, the proposed computation model autonomously selects a winner node from among all of the output nodes according to the input vector (f_d, f_s, f_e). Then, the proposed computation model can obtain location and direction information, as well as the instance at which the scene was previously seen, from the indices p_i and d_j of the selected winner node. Finally, the proposed model can indicate its placement from the experienced scene memory, wherever it is placed.
3. Hardware implementation and experimental results
We implemented a pan-tilt unit for the autonomous robot vision system for environment recognition. Figure 3 shows the implemented system, called SMART-v1 (Self-Motivated Artificial Robot with Trainable selective attention model, version 1).
Figure 3. SMART-v1 platform
SMART-v1 has four DOF, a 1394 CCD camera, a text-to-speech (TTS) module to communicate with humans and inquire about an attentive object, and a tilt sensor to set the offset position before it starts moving. We use an ATmega128 as the motor controller and ZigBee to transmit motor commands from a PC to SMART-v1. SMART-v1 can continuously search for an interesting region with the selective attention model explained in Section 2. It can also detect novelty in dynamic situations as well as recognize the environment. The operating mode for recognizing an environment proceeds as follows:
(1) We place SMART-v1 at an arbitrary location. SMART-v1 has no a priori knowledge of the environment.
(2) The visual selective attention model then operates on the first scene. After deciding the salient regions, the system asks the human supervisor whether each selected region is of interest. Given the answer, the fuzzy ART network learns the region as an inhibition or reinforcement region according to the supervisor's decision.
(3) SMART-v1 memorizes the location and direction information of the selected regions with the environment recognition model explained above, using the motor encoder signals.
(4) If SMART-v1 detects a novelty in a scene, it asks the human what it is. The human then provides a label, which consists of the geometric topology of the locations of the salient areas, the scale information of the salient areas, and the relative energy signature, for the selected region. SMART-v1 memorizes the characteristics of the selected region by its color histogram and the topology of its orientation information.
(5) After training on the environment, we can ask SMART-v1 where a specific object is. SMART-v1 can then reply with the direction and location information based on past experience.
Figure 4 shows the experimental results of the proposed selective attention model.
Figure 4(a) shows the result of the bottom-up SM model, and Figure 4(b) shows the result after inhibition of the second attention region in Figure 4(a). After considering the symmetry information, we can ignore the 5th attentive region because it contains little symmetry information. Finally, using the reinforcement fuzzy ART network, we can change the scan path to attend to the left computer monitor rather than the red cushion as the most salient region, as shown in Figure 4(d).
Figure 4. The results of the selective attention model: (a) bottom-up attention points, (b) inhibition of the 2nd attention region in (a), (c) ignoring of the 5th attention region in (b), (d) reinforcement of the 2nd attention region in (c)
Figure 5 shows that it is hard to detect a novel scene by simply subtracting saliency maps in a dynamic environment, in which the obtained input scenes usually
have fields of view that differ from one another by an affine transformation. Therefore, Gaborski's approach [8] cannot be applied to detecting novel scenes in a dynamic environment. Figure 6 shows that the proposed model can successfully detect a novel scene in a dynamic environment. Figure 6(c) contains novelty information, while Figures 6(a) and (b) do not, apart from a slight change in the field of view. Moreover, Figures 6(d) and (e) show that the proposed model obtains similar energy signatures for the two corresponding SMs. Table 1 shows the performance of the proposed novelty scene detection model over a total of 61 scenes, consisting of 44 novelty scenes and 17 no-novelty scenes, captured in three directions in an indoor environment.
Figure 5. Scan path topologies and saliency maps at two different times for the assumed same scene, with slightly affine-transformed fields of view: (a) and (b) scan path topologies at times t and t + dt, respectively; (c) and (d) saliency maps at times t and t + dt, respectively; (e) subtraction between (c) and (d)
Figure 6. Scan path topologies and saliency maps at three different times for the assumed same scene, with slightly affine-transformed fields of view: (a), (b) and (c) scan path topologies at times t, t + dt and t + 2dt, respectively; (d), (e) and (f) saliency maps at times t, t + dt and t + 2dt, respectively.
Table 1. The performance of the proposed novelty scene detection model: 61 scenes were evaluated, of which 44 were novel scenes and the remaining 17 were non-novel scenes. (TP: true positive, TN: true negative, FP: false positive, FN: false negative)
                                Novelty perception   No-novelty perception   Correct recognition rate
Novelty scenes (44 scenes)      TP: 41               FN: 3                   95.3%
No-novelty scenes (17 scenes)   FP: 2                TN: 15                  88.2%
Total (61 scenes)               F(P+N): 5            T(P+N): 56              91.8%
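The rates in Table 1 are ordinary per-class and overall correct-recognition rates, which can be computed from the confusion counts as follows (a trivial helper for illustration, not code from the paper):

```python
def recognition_rates(tp, fn, fp, tn):
    """Per-class and overall correct-recognition rates (in percent)
    from the confusion counts used in Table 1."""
    novelty_rate = 100.0 * tp / (tp + fn)        # novelty scenes classified correctly
    no_novelty_rate = 100.0 * tn / (tn + fp)     # no-novelty scenes classified correctly
    overall = 100.0 * (tp + tn) / (tp + fn + fp + tn)
    return novelty_rate, no_novelty_rate, overall
```

With the counts for the no-novelty row, 15 of 17 gives 88.2%, and overall 56 of 61 gives 91.8%.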
4. Conclusion
We implemented a novel autonomous robot vision system that can detect novelty as well as recognize its environment. The proposed system is based on a human-like selective attention model that consists of a bottom-up SM and top-down reinforcement and inhibition models. The varying-scale problem in selective attention regions is solved by introducing the log-polar transformation. The incremental model based on novelty detection successfully recognizes both the environment and changes in the environment. As future work, we are investigating an efficient way to describe an attention region, including a task-nonspecific object representation.

Acknowledgment

This research was funded by the Brain Neuroinformatics Research Program of the Ministry of Commerce, Industry and Energy.

References

[1] C. Breazeal: Imitation as social exchange between humans and robots, in Proceedings of the Symposium on Imitation in Animals and Artifacts (AISB99), Edinburgh, Scotland, 1999, 96-104.
[2] J. Weng et al.: A theory for mentally developing robots, in Proceedings of the 2nd International Conference on Development and Learning, 2002, 131-140.
[3] B. Scassellati: Investigating models of social development using a humanoid robot, in Proceedings of the International Joint Conference on Neural Networks, Portland, USA, 2003, 2704-2709.
[4] S. Singh and M. Markou: An approach to novelty detection applied to the classification of image regions, IEEE Trans. on Knowledge and Data Engineering, 16 (2004), 396-407.
[5] L. Itti, C. Koch and E. Niebur: A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. on Pattern Analysis and Machine Intelligence, 20 (1998), no. 11, 1254-1259.
[6] S. B. Choi, S. W. Ban and M. Lee: Human-like selective attention model with reinforcement and inhibition mechanism, in Proceedings of the 11th International Conference on Neural Information Processing, Calcutta, India, 2004, 694-699.
[7] W. J. Won, S. W. Ban and M. Lee: Real-time implementation of a selective attention model for the intelligent robot with autonomous mental development, in Proceedings of the International Symposium on Industrial Electronics, Dubrovnik, Croatia, 2005, D7-D5.
[8] R. S. Gaborski, V. S. Vaingankar, V. S. Chaoji and A. M. Teredesai: VENUS: A system for novelty detection in video streams with learning, in Proceedings of AAAI 2004, San Jose, USA, 2004, 1-5.
[9] E. B. Goldstein (Ed.): Sensation and Perception, 4th ed., International Thomson Publishing, USA, 1996.
[10] K. S. Seo, C. J. Park, C. J. Cho and H. M. Choi: Context-free marker-controlled watershed transform for efficient multi-object detection and segmentation, IEICE Trans., E84-A (2001), no. 6, 1392-1400.
[11] T. Kadir and M. Brady: Scale, saliency and image description, International Journal of Computer Vision, 45 (2001), 83-105.
[12] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds and D. B. Rosen: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Trans. on Neural Networks, 3 (1992), no. 5, 698-713.
[13] S.-W. Ban and M. Lee: Novelty analysis in dynamic scene for autonomous mental development, Lecture Notes in Computer Science, 3696 (2005), 1-6.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Multi-resolution Field D*

Dave Ferguson and Anthony Stentz
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
{dif, tony}@cmu.edu

Abstract. We present a multi-resolution path planner and replanner capable of efficiently generating paths across very large environments. Our approach extends recent work on interpolation-based planning to produce direct paths through non-uniform resolution grids. The resulting algorithm produces plans with costs almost exactly the same as those generated by the most effective uniform resolution grid-based approaches, while requiring only a fraction of their computation time and memory. In this paper we describe the algorithm and report results from a number of experiments involving both simulated and real field data.
1. Introduction

A common task in mobile robot navigation is to plan a path from an initial location to a desired goal location. Typically, this is done by storing the information about the environment in a discrete traversability cost map, where the cost of traversing each area depends on any number of factors, including safety risk, traversal time, or fuel expended, and then planning over this cost map. In robotics, it is common to improve the efficiency of planning by approximating the cost map as a graph. For instance, given a cost map consisting of a 2D grid of cells of uniform size (or resolution), a graph is typically constructed by placing nodes at the centers or corners of the grid cells and inserting edges between each node and the nodes in its 8 adjacent cells. Several algorithms exist for planning paths over such graphs. Dijkstra's algorithm and A* are useful for computing initial paths to the goal from every location or a single location, respectively [2,6]. D* and D* Lite are very effective at repairing these paths when new information about the environment is received [9,5]. Combining uniform resolution grids with these incremental planning algorithms has been a very popular approach for mobile robot navigation in partially-known or dynamic environments. However, such an approach has two significant limitations. Firstly, the small, discrete set of transitions modeled by the graph can reduce the quality of the resulting paths. In the case of a uniform resolution grid, each node in the extracted graph only has edges connecting it to its 8 closest neighboring nodes. This limits the heading of any solution path to increments of π/4. This can lead to suboptimal paths and undesirable behavior [3]. Secondly, uniform resolution grids can be very memory-intensive. This is because the entire environment must be represented at the highest resolution for
Figure 1. (left, center ) Multi-resolution Field D* produces direct, low-cost paths (in blue/dark gray) through both high-resolution and low-resolution areas. (right) The GDRS XUV robot used for autonomous navigation of outdoor terrain. Multi-resolution Field D* was initially developed in order to extend the effective range of rugged outdoor vehicles such as this by one to two orders of magnitude.
which information is available. For instance, consider a robot navigating a large outdoor environment with a prior overhead map. The initial information contained in this map may be coarse. However, the robot may be equipped with onboard sensors that provide very accurate information about the area within some field of view of the robot. Using a uniform resolution grid-based approach, if any of the high-resolution information obtained from the robot’s onboard sensors is to be used for planning, then the entire environment needs to be represented at a high-resolution, including the areas for which only the low-resolution prior map information is available. Storing and planning over this representation can require vast amounts of memory. To overcome the first of these limitations, improvements to uniform resolution grid-based planners have been developed that use interpolation to produce paths not constrained to a small set of headings [7,3]. In particular, the Field D* algorithm extends the D* family of algorithms by using linear interpolation to derive the path cost of points not sampled in the grid [3]. This algorithm efficiently produces very low-cost paths with a range of continuous headings. A number of techniques have been devised to address the second limitation. One popular approach is to use quadtrees rather than uniform resolution grids [8]. Quadtrees offer a compact representation by allowing large constant-cost regions of the environment to be modeled as single cells. They thus represent the environment using grids containing cells of varying sizes, known as non-uniform resolution grids or multi-resolution grids. However, paths produced using quadtrees and traditional quadtree planning algorithms are again constrained to transitioning between the centers or corners of adjacent cells and can thus be grossly suboptimal. More recently, framed quadtrees have been used to alleviate this problem somewhat [1,10]. 
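The π/4 heading restriction of the 8-connected graph mentioned above can be seen directly by enumerating the edge directions (a small illustrative snippet, not part of the planner):

```python
import math

# The 8-connected grid graph described above: each node connects to its
# 8 surrounding nodes, so every path segment's heading is a multiple of pi/4.
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]

HEADINGS = sorted({math.atan2(dy, dx) for dx, dy in NEIGHBORS})
# exactly 8 distinct headings, spaced pi/4 apart
```

Any path assembled from these edges can only turn in π/4 steps, which is precisely the source of the suboptimality the interpolation-based approach removes.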
Framed quadtrees add cells of the highest resolution around the boundary of each quadtree region and allow transitions between these boundary cells through the quadtree region. As a result, the paths produced can be much less costly, but the computation and memory required can be large due to the overhead of the representation (and in pathological cases can become significantly more than is required by a full high-resolution representation). Also, segments of the path between adjacent
high-resolution cells suffer the same π/4 heading increment restrictions as classical uniform resolution grid approaches. In this paper, we present an approach that combines the ideas of interpolation and non-uniform resolution grid representations to address both of the limitations mentioned above. The resulting approach provides very cost-effective paths for a fraction of the memory and, often, for a fraction of the time required by uniform resolution grid-based approaches. We begin by describing how linear interpolation can be used to provide direct, low-cost paths through non-uniform resolution grids. We then present our new algorithm, Multi-resolution Field D*, along with a number of example illustrations and results comparing it to its uniform resolution predecessor.
2. Interpolation-based Path Planning

Given a grid with nodes at the corners of every grid cell, classic grid-based planners restrict the available actions from each node to be direct paths to neighboring nodes. This restriction affects the quality and effectiveness of grid-based paths, as mentioned above.

2.1. Using interpolation to produce more accurate path cost estimates

We can overcome the limitations of classical grid-based planning if we allow each node to transition to any point on any neighboring edge, rather than to just the small set of neighboring nodes. We define the neighboring edges of a node s to be all edges that can be reached from s via a straight-line path for which s is not an endpoint (see Figure 2 (left)). The rationale here is that the optimal path from s to the goal must pass through one of these neighboring edges, so if we knew the optimal paths from every point on any of these edges to the goal we could compute the optimal path for s. If E is the set of neighboring edges of s and P_E is the set of all points on these edges, then the cost of the optimal path for s is

    g*(s) = min_{p ∈ P_E} (c(s, p) + g*(p)),    (1)
where c(s, p) is the cost of the cheapest path from s to p that does not travel through any other neighboring edges of s, and g ∗ (p) is the cost of the optimal path from point p to the goal. Unfortunately, there are an infinite number of points p on any one of the neighboring edges of s, so solving for the least-cost paths from each of these points is impossible. Fortunately, a method was presented in [3] that uses linear interpolation to approximate the path cost of any point on an edge using just the path costs of the two endpoints of the edge. In that work, this approximation was used to provide an efficient closed form solution to the above path cost calculation for any given corner node s in a uniform resolution grid. Using this method, very accurate paths through uniform resolution grids can be efficiently computed.
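The interpolation step can be sketched as below. Instead of the closed-form minimization of [3], this illustration densely samples the edge parameter t ∈ [0, 1], linearly interpolating the endpoint path costs g1 and g2; the function name and the sampling scheme are assumptions for this example.

```python
import numpy as np

def edge_transition_cost(c_fn, g1, g2, samples=101):
    """Approximate min over points p on an edge of c(s, p) + g^i(p), where
    g^i(p) is linearly interpolated between the edge's endpoint path costs
    g1 and g2.  c_fn(t) gives the traversal cost c(s, p) for the point at
    edge parameter t in [0, 1].  Returns (best t, best cost)."""
    ts = np.linspace(0.0, 1.0, samples)
    g_interp = (1 - ts) * g1 + ts * g2          # linear interpolation of path cost
    costs = np.array([c_fn(t) for t in ts]) + g_interp
    i = int(np.argmin(costs))
    return float(ts[i]), float(costs[i])
```

For a unit cell with unit traversal cost, a node at the origin, and the opposite edge parameterized from (1, 0) to (1, 1), c(t) = √(1 + t²); with g1 = 2 and g2 = 1 the minimum lies at the cheaper endpoint, t = 1.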
Figure 2. (left) The neighboring edges (dashed in red/gray, along with their endpoints in gray) from a given node (white) in a grid containing cells with two different resolutions: low-resolution and high-resolution. (a, b) Some of the possible paths to a neighboring edge/node from a low-resolution node. On the left are the possible optimal path types (in blue/dark gray) through the top low-resolution edge (dashed in red/gray) and its endpoint nodes (in gray). Linear interpolation is used to compute the path cost of any point along the top edge. On the right are the possible optimal path types (in blue/dark gray) to one neighboring high-resolution corner node (in gray). (c, d) Some of the possible paths (in blue/dark gray) from a high-resolution corner node (white) to a neighboring low-resolution edge (c) and to a high-resolution node (d, left) and edge (d, right).
2.2. Combining interpolation with multiple resolutions

We can use the same basic idea to provide accurate path costs for nodes in non-uniform resolution grids. The main difference is that, while in the uniform resolution case each node s has exactly 8 neighboring edges of uniform length, in the non-uniform resolution case a node may have many more neighboring edges with widely-varying lengths. However, linear interpolation can still be used to approximate the path costs of points along these edges, exactly as in the uniform resolution case. As a concrete example of how we compute the path cost of a node in a multi-resolution grid, we now focus our attention on a grid containing cells of two different resolutions: high-resolution and low-resolution. This two-resolution case addresses the most common navigation scenario we are confronted with: a low-resolution prior map is available and the robot is equipped with high-resolution onboard sensors. Although we restrict our attention to this scenario, the approach is general and can be used with arbitrarily many different resolutions. In a grid containing two different resolutions, each node can reside on the corner of a low-resolution cell, the corner of a high-resolution cell, and/or the edge of a low-resolution cell. Examples of each of these possibilities can be seen in Figure 2(b): the white node is the corner of a low-resolution cell and the gray node is the corner of a high-resolution cell and on the edge of a low-resolution cell. Let's look at each of these possibilities in turn. Firstly, imagine we have a node that resides on the corner of a low-resolution cell. We can calculate the least-cost path from the node through this cell by
ComputePathCost(s)
 1  v_s = ∞;
 2  for each cell x upon which s resides
 3      if x is a high-resolution cell
 4          for each neighboring edge e of s that is on the boundary of x
 5              v_s = min(v_s, min_{p∈P_e}(c(s, p) + g^i(p)));
 6      else
 7          for each neighboring edge e of s that is on the boundary of x
 8          if e is a low-resolution edge
 9              v_s = min(v_s, min_{p∈P_e}(c(s, p) + g^i(p)));
10          else
11              v_s = min(v_s, min_{p∈EP_e}(c(s, p) + g(p)));
12  return v_s;
Figure 3. Computing the Path Cost of a Node s in a Grid with Two Resolutions
looking at all the points on the boundary of this cell and computing the minimum-cost path from the node through any of these points. We can approximate the cost of this path by using linear interpolation to provide the path cost of arbitrary boundary points, exactly as in uniform resolution Field D*. However, some of the boundary may be comprised of high-resolution nodes. In such a case, we can either use interpolation between adjacent high-resolution nodes and allow the path to transition to any point on an adjacent high-resolution edge, or we can restrict the path to transitioning to one of the high-resolution nodes. The former method provides more accurate approximations, but it is slightly more complicated and less efficient. Depending on the relative sizes of the high-resolution cells and the low-resolution cells, either of these approaches may be appropriate. For instance, if the high-resolution cells are much smaller than the low-resolution cells, then interpolating across the adjacent high-resolution edges when computing the path from a low-resolution node is not critical, as there will be a wide range of heading angles available just from direct paths to the adjacent high-resolution nodes. However, if the high-resolution cells are not significantly smaller than the low-resolution cells then this interpolation becomes important, as it allows much more freedom in the range of headings available to low-resolution nodes adjacent to high-resolution nodes. In Figure 2 we illustrate the latter, simpler approach, where interpolation is used to compute the path cost of points on neighboring, strictly low-resolution edges (e.g. the top edge in (a)), and paths are computed to each neighboring high-resolution node (e.g. the gray node on the right edge in (b)). For nodes that reside on the corner of a high-resolution cell, we can use interpolation to approximate the cost of the cheapest path through the high-resolution cell (see the paths to the right edge in (d)).
Finally, for nodes that reside on the edge of a low-resolution cell, we can use a similar approach as in our low-resolution corner case. Again, we look at the boundary of the low-resolution cell and use interpolation to compute the cost of points on strictly low-resolution edges (e.g. the top edge in (d)), and for each high-resolution edge we can choose between using interpolation to compute the cost of points along the edge, or restricting the path to travel through one of the endpoints of the edge. The latter approach is illustrated for computing a path through the left edge in (d).
key(s)
 1  return [min(g(s), rhs(s)) + h(s_start, s); min(g(s), rhs(s))];

UpdateNode(s)
 2  if s was not visited before, g(s) = ∞;
 3  if (s ≠ s_goal)
 4      rhs(s) = ComputePathCost(s);
 5  if (s ∈ OPEN) remove s from OPEN;
 6  if (g(s) ≠ rhs(s)) insert s into OPEN with key(s);

ComputeShortestPath()
 7  while (min_{s∈OPEN}(key(s)) <̇ key(s_start) OR rhs(s_start) ≠ g(s_start))
 8      remove node s with the minimum key from OPEN;
 9      if (g(s) > rhs(s))
10          g(s) = rhs(s);
11          for all s′ ∈ nbrs(s) UpdateNode(s′);
12      else
13          g(s) = ∞;
14          for all s′ ∈ nbrs(s) ∪ {s} UpdateNode(s′);

Main()
15  g(s_start) = rhs(s_start) = ∞; g(s_goal) = ∞;
16  rhs(s_goal) = 0; OPEN = ∅;
17  insert s_goal into OPEN with key(s_goal);
18  forever
19      ComputeShortestPath();
20      Wait for changes to grid or traversal costs;
21      for all new cells or cells with new traversal costs x
22          for each node s on an edge or corner of x
23              UpdateNode(s);
Figure 4. The Multi-resolution Field D* Algorithm (basic version)
Thus, for each node, we look at all the cells that it resides upon as either a corner or along an edge and compute the least-cost path through each of these cells using the above approximation technique. We then take the cheapest of all of these paths and use its cost as the path cost of the node. Pseudocode of this technique is presented in Figure 3. In this figure, P_e is the (infinite) set of all points on edge e, EP_e is a set containing the two endpoints of edge e, g^i(p) is an approximation of the path cost of point p (calculated using linear interpolation between the path costs of the endpoints of the edge p resides on), c(s, p) is the cost of a minimum-cost path from s to p, and g(p) is the current path cost of corner point p. We say an edge e is a 'low-resolution edge' (line 8) if both of the cells on either side of e are low-resolution. An efficient solution to the minimizations in lines 5 and 9 was presented in [3].
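The dispatch in Figure 3 can be written out as a skeleton like the following. The cell/edge data layout and the two supplied minimizers are placeholders; only the branch structure (interpolation along the boundary edges of high-resolution cells and along low-resolution edges, endpoints only for high-resolution edges bounding low-resolution cells) is taken from the pseudocode.

```python
def compute_path_cost(s, cells, interp_min, endpoint_min):
    """Skeleton of ComputePathCost(s) (Figure 3) for a two-resolution grid.
    `cells` lists the cells upon which node s resides, each with its
    neighboring edges; `interp_min(edge)` evaluates min over all points p on
    the edge of c(s, p) + g^i(p), while `endpoint_min(edge)` minimizes over
    the edge's two endpoints only.  Both minimizers are supplied by the
    planner (e.g. via the closed-form solution of [3])."""
    v = float('inf')
    for cell in cells:
        for edge in cell['edges']:
            if cell['high_res'] or edge['low_res']:
                v = min(v, interp_min(edge))    # interpolate along the edge
            else:
                v = min(v, endpoint_min(edge))  # restrict to edge endpoints
    return v
```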
Figure 5. Multi-resolution Field D* used to guide an agent through a partially-known environment. On the left is a section of the path already traversed showing the high-resolution cells. This data was taken from Fort Indiantown Gap, Pennsylvania.
3. Multi-resolution Field D*

The path cost calculation discussed above enables us to plan direct paths through non-uniform resolution grids. We can couple this with any standard path planning algorithm, such as Dijkstra's, A*, or D*. Because our motivation for this work was robotic path planning in unknown or partially-known environments, we have used it to develop an incremental replanning algorithm which we call Multi-resolution Field D*. A basic version of the algorithm is presented in Figure 4. Here, the ComputePathCost function (line 4) takes a node s and computes the path cost for s using the path costs of all of its neighboring nodes and interpolation across its neighboring edges. Apart from this, notation follows the D* Lite algorithm: g(s) is the current path cost of node s, rhs(s) is the one-step lookahead path cost for s, OPEN is a priority queue containing inconsistent nodes (i.e., nodes s for which g(s) ≠ rhs(s)) in increasing order of key values (line 1), s_start is the initial agent node, and s_goal is the goal node. h(s_start, s) is a heuristic estimate of the cost of a path from s_start to s. Because the key value of each node contains two quantities, a lexicographic ordering is used, where key(s) <̇ key(s′) iff the first element of key(s) is less than the first element of key(s′), or the first element of key(s) equals the first element of key(s′) and the second element of key(s) is less than the second element of key(s′). For more details on the D* Lite algorithm and this terminology, see [5]. As with other members of the D* family of algorithms, significant optimizations can be made to this initial algorithm. In particular, several of the optimizations presented in [4] are applicable and were used in our implementation. Because of its use of interpolation, Multi-resolution Field D* produces extremely direct, cost-effective paths through non-uniform resolution grids. Figures 1 and 5 show example paths planned using the algorithm.
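The key computation and its lexicographic comparison can be sketched directly; in Python, tuple comparison already implements the ordering described above (the function signature here is an illustrative simplification of Figure 4, line 1):

```python
def key(g, rhs, h_start):
    """Priority key of a node (Figure 4, line 1):
    [min(g, rhs) + h(s_start, s); min(g, rhs)].
    Returned as a tuple so that Python's built-in tuple comparison
    realizes the lexicographic ordering: compare first elements,
    break ties with the second."""
    m = min(g, rhs)
    return (m + h_start, m)
```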
Figure 6. Computation time and solution cost as a function of how much of the environment is represented at a high-resolution. The x-axis of each graph depicts the percentage of the map modeled using high-resolution cells, ranging from 0 (all modeled at a low-resolution) to 100 (all modeled at a high-resolution). (top) Initial planning. A path was planned from one side to the other of 100 randomly generated environments and the results were averaged. (bottom) Replanning. 5% of each environment was randomly altered and the initial paths were repaired.
4. Results Multi-resolution Field D* was originally developed in order to extend the effective range over which unmanned ground vehicles (such as the XUV vehicle in Figure 1) could operate by orders of magnitude. We have found the algorithm to be extremely effective at reducing both the memory and computational requirements of planning over large distances. To quantify its performance, we ran several experiments comparing Multi-resolution Field D* to uniform resolution Field D*. We used uniform resolution Field D* for comparison because it produces less costly paths than regular uniform grid-based planners and far better paths than regular non-uniform grid-based planners. Our first set of experiments investigated both the quality of the solutions and the computation time required to produce these solutions as a function of the memory requirements of Multi-resolution Field D*. We began with a randomly generated 100 × 100 low-resolution environment with an agent at one side and a goal at the other. We then took some percent p of the low-resolution cells (centered around the agent) and split each into a 10 × 10 block of high-resolution cells. We varied the value of p from 0 up to 100. We then planned an initial path to the goal. Next, we randomly changed 5% of the cells around the agent and replanned a path to the goal. We focussed the change around the robot to simulate new information being gathered in its vicinity. The results from these experiments are presented in Figure 6. The x-axis of each graph represents how much of the environment was represented at a high-resolution. The left graphs show how the time required for planning changes as the percent of high-resolution cells increases, while the middle and right graphs show how the path cost changes. The y-values in the middle and right graphs are path costs relative to the path cost computed when 100% of the environment is represented at a high-resolution. The right
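The cell-splitting setup of the first experiment can be sketched as follows (a hedged illustration; the function name and the squared-distance ordering are our own choices — the text only states that p percent of the low-resolution cells, centered around the agent, are each split into a 10 × 10 block of high-resolution cells):

```python
def cells_to_refine(grid_size, p, agent):
    """Select the p% of low-resolution cells nearest the agent; each selected
    cell would then be split into a 10 x 10 block of high-resolution cells."""
    cells = [(x, y) for x in range(grid_size) for y in range(grid_size)]
    cells.sort(key=lambda c: (c[0] - agent[0]) ** 2 + (c[1] - agent[1]) ** 2)
    count = round(grid_size * grid_size * p / 100.0)
    return set(cells[:count])

# e.g. refine 10% of a 100 x 100 grid around an agent at the left edge
high_res = cells_to_refine(100, 10, agent=(0, 50))
```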
                                          Field D*    Multi-res Field D*
Total planning and replanning time (s)    0.493       0.271
Initial planning time (s)                 0.336       0.005
Average replanning time (s)               0.0007      0.0012
Percent high-resolution cells             100         13
Figure 7. Results for uniform resolution Field D* versus Multi-resolution Field D* on a simulated robot traverse through real data acquired at Fort Indiantown Gap. The robot began with a low-resolution map of the area and updated this map with a high-resolution onboard sensor as it traversed the environment. The map was 350 × 320 meters in size.
graph shows the standard error associated with the relative path costs for smaller percentages of high-resolution cells. As can be seen from these results, modeling the environment as mostly low-resolution produces paths that are only trivially more expensive than those produced using a full high-resolution representation, for a small fraction of the memory and computational requirements. This first experiment shows the advantage of Multi-resolution Field D* as we reduce the percentage of high-resolution cells in our representation. However, because there is some overhead in the multi-resolution implementation, we also ran uniform resolution Field D* over a uniform, high-resolution grid to compare the runtime of this algorithm with our Multi-resolution version. The results of uniform resolution Field D* have been overlaid on our runtime graphs. Although uniform resolution Field D* is more efficient than Multi-resolution Field D* when 100% of the grid is composed of high-resolution cells, it is far less efficient than Multi-resolution Field D* when less of the grid is made up of high-resolution cells. Our second experiment simulated the most common case for outdoor mobile robot navigation, where we have an agent with some low-resolution prior map and high-resolution onboard sensors. For this experiment, we took real data collected from Fort Indiantown Gap, Pennsylvania, and simulated a robotic traverse from one side of this 350 × 320 meter environment to the other. We blurred the data to create a low-resolution prior map (at 10 × 10 meter accuracy) that the robot updated with a simulated medium-resolution sensor (at 1 × 1 meter accuracy with a 10 meter range) as it traversed the environment. The results from this experiment are shown in Figure 7. Again, Multi-resolution Field D* requires only a fraction of the memory of uniform resolution Field D*, and its runtime is very competitive.
In fact, it is only in the replanning portion of this final experiment that Multi-resolution Field D* requires more computation time than uniform resolution Field D*, and this is only because the overhead of converting part of its map representation from low-resolution to high-resolution overshadows the trivial amount of processing required for replanning. In general, our results have shown that Multi-resolution Field D* produces paths that are trivially more expensive than those returned by uniform resolution Field D*, and it provides these paths for a fraction of the memory. Further, in many cases it is also significantly more computationally efficient than uniform resolution Field D*.
5. Conclusions

We have presented Multi-resolution Field D*, an interpolation-based path planning algorithm able to plan direct, low-cost paths through non-uniform resolution grids. This algorithm addresses two of the most significant shortcomings of grid-based path planning: solution quality and memory requirements. By using a non-uniform resolution grid to represent the environment and linear interpolation to approximate the path costs of points not sampled on the grid, Multi-resolution Field D* is able to provide solutions of comparable quality to the most effective uniform resolution grid-based algorithms for a fraction of their memory and computational requirements. It is our belief that, by combining interpolation with non-uniform representations of the environment, we can 'have our cake and eat it too', with a planner that is extremely efficient in terms of both memory and computational requirements while still producing high quality paths.

6. Acknowledgements

This work was sponsored in part by the U.S. Army Research Laboratory, under contract "Robotics Collaborative Technology Alliance" (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies or endorsements of the U.S. Government. Dave Ferguson is supported in part by a National Science Foundation Graduate Research Fellowship.

References

[1] D. Chen, R. Szczerba, and J. Uhran. Planning conditional shortest paths through an unknown environment: A framed-quadtree approach. IEEE International Conference on Intelligent Robots and Systems (IROS), 1995.
[2] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
[3] D. Ferguson and A. Stentz. Field D*: An Interpolation-based Path Planner and Replanner. International Symposium on Robotics Research (ISRR), 2005.
[4] D. Ferguson and A. Stentz. The Field D* Algorithm for Improved Path Planning and Replanning in Uniform and Non-uniform Cost Environments. Technical Report CMU-RI-TR-05-19, Carnegie Mellon School of Computer Science, 2005.
[5] S. Koenig and M. Likhachev. D* Lite. National Conference on Artificial Intelligence (AAAI), 2002.
[6] N. Nilsson. Principles of Artificial Intelligence. Tioga Publishing Company, 1980.
[7] R. Philippsen and R. Siegwart. An Interpolated Dynamic Navigation Function. IEEE International Conference on Robotics and Automation (ICRA), 2005.
[8] H. Samet. Neighbor Finding Techniques for Images Represented by Quadtrees. Computer Graphics and Image Processing, 18:37–57, 1982.
[9] A. Stentz. The Focussed D* Algorithm for Real-Time Replanning. International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[10] A. Yahja, S. Singh, and A. Stentz. An efficient on-line path planner for outdoor mobile robots operating in vast environments. Robotics and Autonomous Systems, 33:129–143, 2000.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Path and Observation Planning of Vision-based Mobile Robots with Multiple Sensing Strategies

Mitsuaki Kayawake, Atsushi Yamashita and Toru Kaneko
Department of Mechanical Engineering, Shizuoka University

Abstract. In this paper, we propose a new path and viewpoint planning method for a mobile robot with multiple observation strategies. When a mobile robot works in constructed environments such as indoor ones, it is very effective and reasonable to attach landmarks to the environment for vision-based navigation. In that case, it is important for the robot to decide its motion automatically. Therefore, we propose a motion planning method that balances the efficiency of the task, the danger of colliding with obstacles, and the accuracy and ease of observation according to the situation and the performance of the robot.

Keywords. Path planning, Viewpoint planning, Mobile robot, Landmark
1. Introduction

In this paper, we propose a new path and viewpoint planning method for a mobile robot that has two active cameras, taking the performance of the robot into account. Robot navigation is a very important technology for executing various tasks. Navigation is usually performed while the robot moves and estimates its position and orientation using information from several sensors. Dead-reckoning can estimate the robot's position and orientation with internal sensors alone. However, its error accumulates in proportion to the traveling distance, and recovery of the error is impossible with internal sensors only. Therefore, external sensors are also utilized for robot navigation. In image-based navigation, the robot observes landmarks in the environment and measures the relationship between these landmarks and the robot itself. When the robot observes them, the problems are how to measure the position and orientation of landmarks accurately, and where and which landmarks the robot should observe when there are multiple landmarks. As to the former problem, there are many studies that improve the accuracy of 3-D measurement, e.g., [1]. However, there is a limit to the achievable accuracy when the robot always observes the same landmark regardless of the distance between the robot and the landmark.
Tel.: 81 53 478 1604; Fax: 81 53 478 1604; E-mail: [email protected].
M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots
Therefore, the latter problem is very important for robot navigation. This means that the robot must decide its path and choose the landmarks to observe according to its position while it moves [2]. Navigation planning methods under uncertainty have also been proposed [3]-[6]. On the other hand, the design of an optimal arrangement of artificial landmarks is very important in indoor environments [7]. Takeuchi et al. proposed a method to dispose artificial landmarks in the environment and to navigate a mobile robot there [8]. The landmarks are disposed so that the robot can observe at least one landmark at any position, and the robot plans its observations so that it can always observe landmarks.
2. Purpose

We have already proposed a landmark observation method with multiple strategies [9]. In [9], the robot used C-obstacles (configuration space obstacles) to avoid the danger of collision with obstacles, and only a single expansion width of the C-obstacles was used. However, when the given environment is complex, deciding the expansion width becomes difficult. If the expansion width of the C-obstacles is unsuitable, the robot may not be able to reach the destination, or it may have to travel an unnecessarily long distance. Therefore, the robot has to look for an appropriate expansion width. In this work, the robot searches for paths under two or more expansion widths of the C-obstacles, and discovers the path with the appropriate width among them. In this method, an initial expansion width is given to the robot, and the robot repeatedly increases the expansion width by a constant increment, searching for a path at each width. When no path that connects to the destination is found, the robot ends the search.

Using the observation technique of [9], we propose an improved path planning technique. First, we improve the shape of the C-obstacles. In [9], the vicinity of each vertex of a C-obstacle was expanded more widely than the original expansion width, so some paths that should have been traversable could not be used. In this work, the corners of the C-obstacles are expressed by polygons, so that the expansion width in the vicinity of each vertex approximates the original expansion width; this solves the problem of [9]. Second, in [9] the n-th shortest paths were searched in each visibility graph by a simple brute-force method. In our path planning, paths are planned by the Dijkstra algorithm, which can discover the shortest path without examining every combination of vertices.
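The width sweep described here can be sketched as follows (an illustrative Python sketch with our own naming; `build_graph` stands in for the visibility-graph construction of Section 4.1, and the toy environment at the bottom is hypothetical):

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path on a weighted graph {node: [(neighbor, cost), ...]}."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None  # goal unreachable

def sweep_expansion_widths(build_graph, start, goal, s0, ds):
    """Search for paths while widening the C-obstacle expansion by a constant
    increment, stopping when start and goal can no longer be connected."""
    candidates, s = [], s0
    while True:
        result = dijkstra(build_graph(s), start, goal)
        if result is None:
            break
        candidates.append((s, result))
        s += ds
    return candidates

def toy_graph(s):
    """Hypothetical environment: a short corridor closes at width 1.0
    and the whole map closes at width 2.0."""
    if s < 1.0:
        return {"start": [("a", 1.0), ("b", 4.0)],
                "a": [("goal", 1.0)], "b": [("goal", 1.0)]}
    if s < 2.0:
        return {"start": [("b", 4.0)], "b": [("goal", 1.0)]}
    return {"start": []}

candidates = sweep_expansion_widths(toy_graph, "start", "goal", 0.0, 1.0)
```

Each entry of `candidates` pairs an expansion width with the shortest path found at that width, giving the set of paths from which the observation planning later selects.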
As a result, the amount of path searching is decreased.

Optimal path and viewpoint planning is necessary for the robot to execute tasks efficiently. At the same time, the "optimal" path and viewpoints change according to the performance of the robot. Some previous works can plan optimal paths and viewpoints of the robot [10]. However, they do not explicitly consider the relationship between the optimal planning results and the performance of the robot. For a robot with good performance, a path can be planned that gives priority to travel distance over safety. On the other hand, a robot with bad performance can surely reach the goal only with a path plan that gives priority to safety. Therefore, the "optimal" path and viewpoints depend on the performance of the robot. There are multiple evaluation criteria, such as high accuracy, path length, and safety, and these also change when the performance changes. Therefore, we propose a new path and viewpoint planning method for a mobile
Figure 1. Path and observation planning (flowchart: environmental information → setting of a primary error tolerance → shortest path planning and observation planning → reachability check; if the destination cannot be reached, alternative path planning over the vertices of the C-obstacles and, if necessary, a change of the error tolerance; finally, selection of the shortest reachable path and motion planning).
robot with multiple observation strategies according to the performance of the robot. We verify the effectiveness of this method through simulations.

The accuracy of the multiple observation strategies depends on the positions of the landmarks that can be observed. Therefore, the robot must choose the optimal landmark-observation strategy by considering the number and the configuration of visible landmarks in each place. The path and viewpoints are planned by considering not only the accuracy of the optimal observation strategy but also the dead-reckoning error of the robot. Our motion planning method is designed to guarantee that the robot never collides with obstacles. The robot selects the shortest path from among the safe paths. After deciding the shortest path, the number of viewpoints is minimized. This is because it is desirable that the cameras equipped on the robot remain available for uses other than landmark observation.

The motion planning begins with the input of environmental information, including the landmark positions, the start position, the goal position, and any extra information, to the robot (Figure 1). The shortest path and the viewpoints are planned from the input environmental information. For the shortest path, the observation plan determines whether the robot can reach the goal without colliding with obstacles. When it is possible to reach the goal position, the error tolerance is changed and planning restarts from the shortest path plan. When it is not possible to reach the goal, paths other than the shortest path are planned and
Figure 2. Three observation strategies: (a) stereo observation; (b) two-landmark observation; (c) three-landmark observation.
the viewpoints along them are planned. The shortest path that can reach the goal among the searched paths is taken as the optimal path.
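The planning flow of Figure 1 can be sketched at a high level as follows (a hedged outline; every callable is a hypothetical stand-in for the steps named in the text, not the paper's implementation):

```python
def plan_motion(env, tolerances, plan_shortest, plan_alternatives, plan_observations):
    """Outline of the Figure 1 flow: for each error tolerance, plan the shortest
    path and alternatives, keep those whose observation plan reaches the goal,
    and return the shortest reachable path with its viewpoints."""
    reachable = []
    for s in tolerances:  # candidate error tolerances (C-obstacle expansions)
        for path in [plan_shortest(env, s)] + plan_alternatives(env, s):
            if path is None:
                continue
            viewpoints = plan_observations(env, path, s)
            if viewpoints is not None:  # None: the goal cannot be reached safely
                reachable.append((path_length(path), path, viewpoints))
    if not reachable:
        return None
    return min(reachable)[1:]

def path_length(path):
    """Total Euclidean length of a polyline path [(x, y), ...]."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(path, path[1:]))

# toy stand-ins: one direct path (reachable) and one detour (not reachable)
plan_shortest = lambda env, s: [(0, 0), (3, 4)] if s < 2 else None
plan_alternatives = lambda env, s: [[(0, 0), (0, 5), (3, 4)]]
plan_observations = lambda env, path, s: ["v0"] if len(path) == 2 else None
result = plan_motion(None, [0, 1, 2], plan_shortest, plan_alternatives, plan_observations)
```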
3. Three Observation Strategies

3.1. Problem Statement

We make the following assumptions for the planning of the robot navigation. The robot can move in any direction at any time and uses the information from two active cameras that can change their directions independently. The shape of the robot is expressed as a circle whose radius is R. The environment map, which indicates the positions of walls, obstacles, and landmarks, is provided to the robot beforehand. All landmarks are attached to the environment at the same height as the robot's cameras. The shape of each landmark is a circle and the radius of each landmark is constant. Each landmark can be distinguished from the others. The details of each observation error are explained in [9]. We develop three observation strategies: (a) stereo observation, (b) two-landmark observation, and (c) three-landmark observation (Figure 2).

3.2. Stereo Observation

The stereo observation can be executed when one or more landmarks are inside the robot's field of view. The robot estimates its position and orientation by triangulation. In this strategy, the 3-D positions of the left and right ends of the observed landmark are measured using disparity information. The position and orientation of the robot in the world coordinate system can then be calculated from the coordinate values of these two points.

3.3. Two-Landmark Observation

The two-landmark observation can be executed when two or more landmarks are inside the robot's field of view. The left and right ends of the nearer landmark and the center point of the more distant landmark are extracted from the two acquired images. This observation uses only angle information from the images. The position and orientation of the robot in the world coordinate system can be decided as a result.

3.4. Three-Landmark Observation

The three-landmark observation can be executed when three or more landmarks are inside the robot's field of view.
The relationship between the three landmarks and the robot is estimated from the coordinate values of the centers of the three landmarks in the images.
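As an illustration of the stereo strategy (Section 3.2), a 2-D pose can be recovered from the two measured end points of a landmark as sketched below. This is our own minimal formulation, not the paper's computation; the actual disparity-based measurement and its error analysis are detailed in [9].

```python
import math

def pose_from_landmark_ends(world_a, world_b, robot_a, robot_b):
    """Recover the robot pose (x, y, theta) given the two ends of a landmark
    expressed both in world coordinates and in robot-centered coordinates."""
    # Orientation: difference between the segment's bearing in the two frames.
    bearing_world = math.atan2(world_b[1] - world_a[1], world_b[0] - world_a[0])
    bearing_robot = math.atan2(robot_b[1] - robot_a[1], robot_b[0] - robot_a[0])
    theta = bearing_world - bearing_robot
    # Position: world point minus the rotated robot-frame point.
    c, s = math.cos(theta), math.sin(theta)
    x = world_a[0] - (c * robot_a[0] - s * robot_a[1])
    y = world_a[1] - (s * robot_a[0] + c * robot_a[1])
    return x, y, theta
```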
Figure 3. Image error: (a) position error; (b) size error.
Figure 4. Position and orientation error of the robot: (a) position error; (b) orientation error.
The accuracy of the estimated position and orientation becomes higher than with the two-landmark observation method. This is because the distance between the extracted points in the images is larger in the three-landmark observation than in the two-landmark observation. In addition, the image noise at the edges of a landmark is generally larger than that at its center, because the coordinate value of the center position in the image can be obtained accurately by calculating the center of gravity of the pixels that belong to the landmark.

3.5. Optimal Observation Strategy

The robot chooses the optimal landmark-observation strategy, i.e., the one that estimates its position and orientation most precisely. The optimal strategy can be decided when the path and the positions of the viewpoints are planned. At first, the visible landmarks in each place are selected by considering the robot's field of view. The robot cannot observe landmarks when obstacles lie between the robot and the landmarks, and it cannot observe them from the back side. Image errors such as quantization error always occur. The theoretical estimation errors of the robot's position and orientation are therefore calculated by considering the errors of a landmark's position and size (shape) in the images (Figure 3). We assume that the position error of the landmark's center point in the image is (Δu, Δv), and also consider the size error Δr of the landmark in the image. This means that the observed landmark position in the image may shift by up to (±Δu, ±Δv) from the true position, and the observed radius may shift by up to ±Δr from the true size.
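Since the image errors are bounded by (±Δu, ±Δv, ±Δr), a worst-case pose error can be obtained by evaluating an error model at every extreme sign combination, as sketched below (our own illustration; `pose_error` is a hypothetical stand-in for projecting an image perturbation into a world-frame position error, which the paper computes per observation strategy):

```python
from itertools import product

def worst_case_error(pose_error, du, dv, dr):
    """Evaluate a pose-error model at every extreme combination of the image
    errors (+/-du, +/-dv, +/-dr) and return the worst case."""
    return max(pose_error(su * du, sv * dv, sr * dr)
               for su, sv, sr in product((-1.0, 1.0), repeat=3))
```

For a model that is monotone in each perturbation, the maximum is always attained at one of these eight corners, which is why checking only the extremes suffices.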
The robot estimates how the position and orientation errors appear in the world coordinate system for every combination of visible landmarks when these errors occur in the images (image coordinates) of the two cameras. However, the position error and the orientation error cannot be compared directly, because their dimensions differ: the position error has the dimension of length, i.e., [mm], while the orientation error has the dimension of angle, i.e., [deg]. Therefore, we transform the orientation error (dimension of angle) into a position error (dimension of length). The total error accumulated when the robot moves a certain distance while the position and orientation errors are present is calculated (Figure 4). That is, the total error at the robot's next position is the sum of the position error (Epos,max, Figure 4(a)) and the incorrect moving distance caused by the orientation error (Eang,max, Figure 4(b)). Epos,max is the distance between the ideal robot position without error and the real position with error derived from the position error; Eang,max is the distance between the ideal robot position without error and the real position with error derived from the orientation error. Therefore, Epos,max and Eang,max have the same dimension of length. In this way, the estimated error of the robot position in the world coordinate system caused by the image errors is calculated. The optimal observation strategy is the observation method that has the minimum position estimation error, decided as follows:

Emax(p, m) = Epos,max(p, m) + Eang,max(p, m) → min,   (1)

where p is the present position, m is the moving distance, and Emax(p, m) is the total error when the robot moves a distance m from p. The directions of the two cameras are decided by selecting the optimal observation strategy in each place. In this way, the navigation planning of the mobile robot can be executed.
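Selecting the strategy that minimizes (1) can be sketched as follows (Python for illustration; the numeric error models below are toy placeholders for the per-strategy Epos,max and Eang,max of the paper, not measured values):

```python
def choose_strategy(strategies, p, m):
    """Return the observation strategy minimizing
    Emax(p, m) = Epos,max(p, m) + Eang,max(p, m)  (Eq. 1)."""
    return min(strategies,
               key=lambda name: strategies[name][0](p, m) + strategies[name][1](p, m))

# toy (Epos,max, Eang,max) models per strategy, both in the same length units
strategies = {
    "stereo":         (lambda p, m: 5.0,  lambda p, m: 0.10 * m),
    "two-landmark":   (lambda p, m: 20.0, lambda p, m: 0.08 * m),
    "three-landmark": (lambda p, m: 10.0, lambda p, m: 0.02 * m),
}
```

With these toy models the best strategy changes with the planned moving distance m, which is exactly why the strategy is re-selected at each viewpoint.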
4. Path and Viewpoint Planning

4.1. Path Planning

The path of the robot is planned based on a visibility graph. In our motion planning method, the original obstacles are expanded by R + S, where S is a margin for safety (Figure 5(a)). In this paper, we call S the error tolerance. When generating the C-obstacles, the shapes of their vertices are approximated with polygons for simplicity of computation; each vertex is approximated by the intersections T1, T2 of the lines Li (i = 1, 2, 3). The vertices of the C-obstacles are connected with each other and a visibility graph is constructed (Figure 5(b)). In this step, multiple visibility graphs are generated by changing S, because it is difficult to find the best S in a complex environment beforehand. A robot with bad performance can thus plan a path that safely reaches the goal by changing S (Figure 5(c)). In each visibility graph, the shortest path from the start position to the goal position is searched by the Dijkstra algorithm (Figure 5(d)). However, the robot cannot necessarily
Figure 5. Expanded C-obstacle and path planning: (a) C-obstacle; (b) visibility graph (S: small); (c) visibility graph (S: large); (d) shortest path and alternative path.
move along the shortest path, due to its performance and the configuration of obstacles and landmarks, so several candidate paths must be prepared. Therefore, to search for paths other than the shortest path with the Dijkstra algorithm, a vertex of a C-obstacle is chosen at random as a relay point A between the start position and the goal position. The path from the start position to A and the path from A to the goal position are each searched by the Dijkstra algorithm, and the two are connected. In this paper we call such paths alternative paths. Because the relay point is chosen at random, paths quite different from the shortest path can be planned.

4.2. Viewpoint Planning

Viewpoints are planned for each path obtained in the previous step. Viewpoint planning for arriving at the goal position safely is executed by considering the observation error of landmarks and the dead-reckoning error. The robot estimates its position and orientation by dead-reckoning while moving after observing landmark(s). If it moves a long distance after estimating its position with landmark(s), the error of the estimated position and orientation accumulates in proportion to the traveling distance. Therefore, the robot must estimate its position with landmark(s) frequently, before the position error accumulates. Here, it is assumed that the robot observes landmark(s) at the present position ps, whose distance from the start position along the planned path is ms. Let Dmax(ps, m) be the estimated maximum dead-reckoning error when the robot moves a distance m from ps, let Emax(ps, m) be the estimated maximum error from the observation (1), and let S (the error tolerance) be the maximum error of the robot position that avoids collision with obstacles. The maximum distance mmax that the robot can move without observing landmarks is then expressed as follows (Figure 6(a)):

Dmax(ps, mmax) + Emax(ps, mmax) ≤ R + S.   (2)
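A minimal way to evaluate (2) numerically is to scan the moving distance in fixed increments (our own discretization; the paper does not specify how mmax is found, and the error callables below are hypothetical):

```python
def max_move_distance(d_max, e_max, R, S, step=0.1, limit=1000.0):
    """Largest distance m satisfying Dmax(m) + Emax(m) <= R + S (Eq. 2),
    found by scanning m in fixed increments. d_max and e_max stand in for
    the two error bounds evaluated from the current position."""
    m = 0.0
    while m + step <= limit and d_max(m + step) + e_max(m + step) <= R + S:
        m += step
    return m
```

For example, with a 40% dead-reckoning error (Dmax(m) = 0.4 m) and an observation error growing as Emax(m) = 10 + 0.1 m, the bound 0.5 m + 10 ≤ R + S = 90 gives mmax = 160.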
Figure 6. Observation planning: (a) distance that can be moved without observation; (b) selection of viewpoints.
Here, let pg be the position whose distance from the start is ms + mps,max. The path from ps to pg is divided into n + 1 positions, and we define mi as the distance between each divided position pi and the start position of the path (p0 = ps, pn = pg). When the next viewpoint after ps is pi, the next viewpoint after pi must satisfy the following condition for the total number of observations to be small (Figure 6(b)):

mi + mpi,max → max.   (3)
The robot selects the optimal landmark-observation strategy that satisfies (3). The viewpoints can therefore be decided one by one in the following way.

1. mps,max and pg that satisfy (2) are calculated when the robot observes landmarks at ps.
2. The path from ps to pg is divided into n + 1 positions pi.
3. The pi that satisfies (3) is calculated and regarded as the next viewpoint.
4. If mi + mpi,max is smaller than the distance between the start position and the goal position of the planned path, ps is replaced with pi and the procedure returns to step 1. If it is larger, the viewpoint planning finishes.

The optimal viewpoints, the optimal observation strategy at each viewpoint, the optimal observed landmark(s), and the directions of the cameras at each viewpoint can all be planned by the above procedure.
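The four steps above can be sketched as a greedy loop over arc lengths along the planned path (a simplification in our own notation: `reach(m)` stands in for mp,max, the farthest distance the robot can move after observing at arc length m, as bounded by Eq. 2):

```python
def plan_viewpoints(goal_dist, reach, n=10):
    """Greedy viewpoint selection following steps 1-4: from the current
    viewpoint, look ahead to the farthest reachable point, then pick the
    division point that maximizes how far the next observation extends."""
    viewpoints = [0.0]          # observe at the start position first
    ms = 0.0
    while True:
        mg = ms + reach(ms)     # step 1: farthest point reachable from ms (Eq. 2)
        # steps 2-3: divide [ms, mg] and pick the point maximizing mi + reach(mi) (Eq. 3)
        candidates = [ms + (mg - ms) * k / n for k in range(n + 1)]
        best = max(candidates, key=lambda m: m + reach(m))
        if best + reach(best) >= goal_dist:  # step 4: the goal is now within reach
            viewpoints.append(best)
            return viewpoints
        if best <= ms:
            return None  # no progress: the path cannot be traversed safely
        viewpoints.append(best)
        ms = best
```

For instance, with a constant reach of 5 units and a goal 12 units away, observations are planned at arc lengths 0, 5, and 10.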
5. Results of Motion Planning

In this section, the effectiveness of the motion planning with respect to differences in robot performance is verified. In this paper, robot performance means the dead-reckoning error and the image error. The dead-reckoning error gives the rate at which error accumulates with moving distance. The image error gives the position error and the size error of a landmark in the image. Figure 7(a) shows the environment.

Table 1. Robot performance.

  Name     Dead-reckoning error   Image error
  Robot1   ±10 %                  ±1 pixel
  Robot2   ±40 %                  ±1 pixel
  Robot3   ±40 %                  ±5 pixel

Figure 7. Simulation results: (a) environment; (b) Robot1 and Robot2 paths; (c) Robot3 path.

Table 2. Comparison of paths.

  Name     Error tolerance (cm)   Distance (cm)   Total number of observations
  Robot1   60                     2359            6
  Robot2   120                    3246            19
  Robot3   150                    5140            31

The primary value of the
expansion width of the C-obstacles is set to S = 0 cm, and the constant increment is set to 30 cm. We compare Robot1, whose dead-reckoning error and image error are small, Robot2, whose dead-reckoning error is large, and Robot3, whose dead-reckoning error and image error are both large, to evaluate the relationship between the planned path and viewpoints and the performance of the robots (Table 1). As the result of planning, Robot1 and Robot2 reach the goal along the paths shown in Figure 7(b), and Robot3 reaches the goal along the path shown in Figure 7(c). The path of Robot1 is narrow and few landmarks can be observed from it. However, because the dead-reckoning performance of Robot1 is good, the planned path can be traced accurately. Moreover, because its observation accuracy is good, the error can be corrected accurately with a small number of observations. On the other hand, because the dead-reckoning performance of Robot2 is worse than that of Robot1, it cannot trace the planned path accurately, and the path of Robot1 cannot be planned for it. In addition, no path reaching the destination was found at the initial error tolerance (= 60 cm). The error tolerance is therefore changed, and Robot2 plans a path along wider roads, although its distance is longer than the path of Robot1. Moreover, because the dead-reckoning performance of Robot2 is bad, it must correct the dead-reckoning error frequently by observing landmarks. As a result, its observation frequency is higher than that of Robot1. The path of Robot3 is wide and many landmarks can be observed from it. Because the dead-reckoning performance and the observation accuracy of Robot3 are the worst, it cannot move along the originally planned path. Therefore, it plans a path that safely reaches the destination, although the moving distance becomes long. Robot3 observes many landmarks as it moves along this route.
From these results, it is shown that the optimal path, observation points, and observation strategies can be planned according to the performance of the robot. In concrete terms, a robot with high performance (small dead-reckoning and image errors) can select the short path from the start to the goal, although few landmarks are visible from it and it is a dangerous path. Conversely, a robot with low performance selects the safe path along which many landmarks can be observed and the positioning accuracy is high, although the distance between the start and the goal is long.
M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots
6. Conclusion In this paper, we propose a new path and viewpoint planning method for autonomous mobile robots with multiple observation strategies. The robot estimates its position by observing landmarks attached to the environment with two active cameras. It chooses the optimal landmark-observation strategy depending on the number and the configuration of visible landmarks in each place. The optimal path, viewpoints, and observation strategies that minimize the estimation error of the robot position and the number of observations can thus be selected. The effectiveness of our method is shown through simulations. As future work, the sensor and motion error models should be made probabilistic, for example by using Kalman filters, particle filters, or SLAM techniques.
Acknowledgment This research was partly supported by CASIO Science Promotion Foundation, Japan.
References [1] M. Okutomi and T. Kanade: “A Multiple-baseline Stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.15, No.4, pp.353–363, 1993. [2] K. Nagatani and S. Yuta: “Path and Sensing Point Planning for Mobile Robot Navigation to Minimize the Risk of Collision,” Proceedings of the 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2198–2203, 1993. [3] A. Lazanas and J.-C. Latombe: “Motion Planning with Uncertainty: A Landmark Approach,” Artificial Intelligence, Vol.76, No.1-2, pp.287–317, 1995. [4] Th. Fraichard and R. Mermond: “Integrating Uncertainty and Landmarks in Path Planning for Car-Like Robots,” Proceedings of IFAC Symposium on Intelligent Autonomous Vehicles, Vol.2, pp.397–402, 1998. [5] N. Roy and S. Thrun: “Coastal Navigation with Mobile Robots,” Advances in Neural Information Processing Systems 12, Vol.12, pp.1043–1049, 1999. [6] J.P. Gonzalez and A. Stentz: “Planning with Uncertainty in Position: An Optimal and Efficient Planner,” Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.607–614, 2005. [7] K. Tashiro, J. Ota, Y. C. Lin and T. Arai: “Design of the Optimal Arrangement of Artificial Landmarks,” Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pp.407–413, 1995. [8] K. Takeuchi, J. Ota, K. Ikeda, Y. Aiyama and T. Arai: “Mobile Robot Navigation Using Artificial Landmarks,” Intelligent Autonomous Systems 6, pp.431–438, 2000. [9] A. Yamashita, K. Fujita, T. Kaneko, and H. Asama: “Path and Viewpoint Planning of Mobile Robots with Multiple Observation Strategies,” Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.3195–3200, 2004. [10] T. Kanbara, J. Miura, Y. Shirai, A. Hayashi and S. Li: “A Method of Planning Movement and Observation for a Mobile Robot Considering Uncertainties of Movement, Visual Sensing, and a Map,” Proceedings of the 4th Japan-France / 2nd Asia-Europe Congress on Mechatronics, pp.594–599, 1998.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Mobile Robot Motion Planning Considering Path Ambiguity of Moving Obstacles Hiroshi Koyasu and Jun Miura Department of Mechanical Engineering, Osaka University Suita, Osaka 565-0871, Japan Email: {koyasu,jun}@cv.mech.eng.osaka-u.ac.jp Abstract. This paper describes a motion planning method for a mobile robot that considers the path ambiguity of moving obstacles. Each moving obstacle has a set of paths and their probabilities. The robot selects the motion which minimizes the expected time to reach its goal by recursively predicting the future states of each obstacle and then selecting the best motion for each predicted state. To calculate the motion for terminal nodes of the search tree, we use a randomized motion planner, which is an improved version of a previous method. Simulation results show the effectiveness of the proposed method. Keywords. motion planning, dynamic environment, mobile robot
1. Introduction Planning a path that avoids collision with both static and dynamic obstacles is one of the fundamental functions of mobile robots. Most past research on motion planning in dynamic environments falls into two categories. One comprises reactive methods, which do not predict the paths of moving obstacles [5]; in such methods, safety rather than optimality is the primary concern. The other comprises deliberative methods, which are based on the assumption that the paths of moving obstacles are completely known; an optimal motion can then be generated by planning in space-time [3] or by using the concept of the velocity obstacle [6,10]. However, it is difficult to predict the paths of obstacles without uncertainty. We divide the uncertainty of an obstacle's motion into the path ambiguity (which path the obstacle chooses) and the error in following a path. The latter can be handled by considering the range of uncertainty in motion planning [2]. A similar approach is, however, difficult to apply to the former, because a very inefficient motion may be planned when the possible paths of an obstacle spread over a wide area. It is therefore necessary to cope with multiple path candidates. Bennewitz et al. [1] proposed a motion planning method which considers multiple path candidates of obstacles. To avoid collision with an obstacle which has multiple path candidates, they use a heuristic cost function defined in space-time; the cost is set to be large near path candidates and proportional to the probability of each candidate. Using this cost function, motion planning is done by the A∗ search
H. Koyasu and J. Miura / Mobile Robot Motion Planning
algorithm. The method, however, does not consider the changes of the environment that will be recognized through future observations. Miura and Shirai [9] proposed to explicitly predict possible future states of the environment (i.e., which path a moving obstacle actually takes) and their probabilities for motion planning. By recursively searching for the optimal motion for each predicted state, they selected the optimal next motion minimizing the expected time to the goal. The method was, however, tested only for the case where there is a single moving obstacle with two path candidates and the possible motions of the robot are given in advance. In this paper, we extend our previous method [9] so that it can handle multiple moving obstacles with multiple path candidates and can plan robot motions according to the obstacle configuration and movements. We use a randomized motion planner, which is an improved version of a previous method. We show the effectiveness of the proposed method by simulation experiments.

2. Model of Moving Obstacles

In the environment, there are static and moving obstacles. The positions of static obstacles are given in advance. Each moving obstacle has its own set of candidate destinations and moves to one of them while avoiding collisions with static obstacles. We represent the set of possible paths by a tangent graph [8]. The initial probability of each obstacle taking one of its paths is given. The robot periodically observes the position and the velocity of each moving obstacle and updates the probabilities of its path candidates. When a new observation is obtained, the probability of the jth candidate path of the ith obstacle is updated by

P(path_ij | o_i(t)) = α P(path_ij) P(o_i(t) | path_ij),  (1)

where o_i(t) is the observed position of the obstacle at time t, and α is a normalization constant. P(path_ij) is given by P(path_ij | o_i(t − 1)), and P(o_i(t) | path_ij) is the likelihood of the path, calculated using the Gaussian positional uncertainty.
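The Bayesian update of Eq. (1) can be sketched in a few lines. The function names and the isotropic Gaussian likelihood below are illustrative assumptions of ours, not the authors' implementation:

```python
import math

def update_path_probabilities(priors, likelihoods):
    """Bayesian update of path-candidate probabilities (Eq. 1).

    priors[j]      : P(path_j) before the observation
    likelihoods[j] : P(o(t) | path_j), e.g. a Gaussian likelihood of the
                     observed obstacle position under path j
    Returns the normalized posteriors P(path_j | o(t)).
    """
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    alpha = 1.0 / sum(unnormalized)          # normalization constant
    return [alpha * u for u in unnormalized]

def gaussian_likelihood(observed, predicted, sigma):
    """Likelihood of an observed 2-D position given the position
    predicted on a path, with isotropic Gaussian uncertainty."""
    dx = observed[0] - predicted[0]
    dy = observed[1] - predicted[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
```

Repeating the update at each observation makes the probability of the path actually taken approach one, which is exactly the mechanism that later turns ambiguous states into fixed ones.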
3. Predicting the State of Moving Obstacles

We define a state of a moving obstacle as the set of its remaining path candidates. States can be classified into two cases. One is the case where only one candidate remains; we call this a fixed state. The other is the case where multiple candidates remain; we call this an ambiguous state. An ambiguous state will change into a fixed state through future observations; the probability of a fixed state thus becomes larger as time elapses.

3.1. Uncertainty of obstacle motion on a path

We assume that the error of a predicted position of an obstacle on a path is distributed within the so-called 3σ region of a 2D Gaussian. Its mean is the position predicted by assuming that the obstacle moves at the currently observed speed. The variance of the Gaussian perpendicular to the path is constant, on the supposition that an obstacle tries to follow the path as closely as possible; the variance along the path is proportional to the moving distance of the obstacle.

3.2. Calculating the probability of a state

We first consider the case where there is one obstacle with two path candidates. Fig. 1 shows the relationship between the prediction of the positional uncertainty and the path ambiguity in such a case. In the figure, path1 and path2 are drawn as black arrows, and black points are the predicted positions on the path candidates at times t1 and t2. Let Ri(t) be the
[Figure 1. Prediction of the positional uncertainty and the path ambiguity of a moving obstacle.]
3σ region of the Gaussian on path_i at time t (i = 1, 2). If the obstacle is observed in the overlapped region of R1(t) and R2(t) (the hatched region in the figure), the state is an ambiguous state. The probability of this ambiguous state is calculated by

P(path1) P(x1(t) ∈ R2(t)) + P(path2) P(x2(t) ∈ R1(t)),  (2)

where P(path_i) is the prior probability of path_i, and x_i(t) is the position at time t when the obstacle is actually on path_i. The first term of Eq. (2) is the probability that the obstacle on path1 cannot be determined to be on either of the paths. P(x1(t) ∈ R2(t)) is calculated by integrating the probability density function of x1(t) over R2(t). There are two fixed states: one is the case where the obstacle is determined to be on path1, and the other is the case of path2. The probability of the case of path1 is calculated by

P(path1) P(x1(t) ∉ R2(t)).  (3)

Next, let us consider the case where there are n path candidates. The number of states of the obstacle is 2^n − 1, that is, all combinations of paths. The probability of an ambiguous state in which m path candidates remain is calculated as follows. Let L = {l1, · · ·, lm} be the set of indices of the remaining path candidates. For the case where the obstacle is on path_li, the probability that the m paths are still possible is equal to the probability that x_li is inside the regions R_lj (j = 1, · · ·, i−1, i+1, · · ·, m) and outside the other n − m regions. The weighted sum of such probabilities over the m remaining paths gives the probability of the ambiguous state:

Σ_{i=1}^{m} P(path_li) P( x_li(t) ∈ ∩_{j≠i} R_lj(t),  x_li(t) ∉ ∪_{k∉L} R_k(t) ).  (4)
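The region-membership probabilities such as P(x1(t) ∈ R2(t)) in Eqs. (2)–(4) can be estimated numerically. A minimal Monte Carlo sketch for the isotropic case follows; the names and the isotropic simplification (the paper uses different along-path and cross-path variances) are ours:

```python
import random

def in_3sigma_region(x, mean, sigma):
    """True if point x lies within the 3σ circle of an isotropic
    2-D Gaussian with the given mean and standard deviation."""
    d2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return d2 <= (3.0 * sigma) ** 2

def prob_in_region(mean1, sigma1, mean2, sigma2, n=20000, seed=0):
    """Monte Carlo estimate of P(x1(t) ∈ R2(t)): the obstacle actually
    follows path 1, and we ask how often its position falls inside the
    3σ region predicted for path 2 (cf. Eq. 2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = (rng.gauss(mean1[0], sigma1), rng.gauss(mean1[1], sigma1))
        if in_3sigma_region(x, mean2, sigma2):
            hits += 1
    return hits / n
```

When the two predicted regions coincide the estimate approaches the 2-D 3σ mass (about 0.989), and it falls toward zero as the predicted positions on the two paths separate, which is why ambiguity resolves over time.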
Also, the probability of a fixed state in which the obstacle is on path_i is given by

P(path_i) P(x_i(t) ∉ ∪_{j≠i} R_j(t)).  (5)

When there are multiple moving obstacles, a state of the environment is represented by the set of the states of all obstacles, and its probability is the product of the probabilities of those states. A state of the environment is called fixed if the paths of all obstacles are uniquely determined; otherwise, it is ambiguous.

4. Planning Method

The robot selects the motion which minimizes the expected cost; we use the time to reach the goal as the cost. We first predict the possible future states of the environment and their probabilities. Then, by recursively searching for the optimal motion for each predicted state, we select the optimal next motion which minimizes the expected cost. This search process thus composes an AND/OR tree of robot motions and predicted states of the environment, as shown in Fig. 2. Once the next motion is determined by this search, the robot executes it and observes the environment to update the state. The robot then plans the subsequent motion based on the updated state.
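The recursive expected-cost minimization over the AND/OR tree can be sketched as follows. All interfaces (`predict`, `cost_fixed`, `cost_all_paths`) are hypothetical placeholders for the components described in the text, not the authors' code:

```python
def plan_next_motion(state, depth, depth_limit, motions,
                     predict, cost_fixed, cost_all_paths):
    """Depth-limited AND/OR search for the motion with minimum
    expected cost.

    predict(state, motion) -> list of (next_state, probability, is_fixed)
    cost_fixed(state)      -> cost-to-goal when obstacle paths are fixed
    cost_all_paths(state)  -> cost of a motion feasible for every
                              remaining path candidate (depth limit)
    Returns (best_motion, expected_cost).
    """
    best_motion, best_cost = None, float("inf")
    for m in motions:                         # OR node: minimize over motions
        expected = 0.0
        for nxt, prob, fixed in predict(state, m):   # AND node: weighted sum
            if fixed:
                c = cost_fixed(nxt)
            elif depth + 1 >= depth_limit:
                c = cost_all_paths(nxt)       # plan against all remaining paths
            else:
                _, c = plan_next_motion(nxt, depth + 1, depth_limit,
                                        motions, predict,
                                        cost_fixed, cost_all_paths)
            expected += prob * c
        if expected < best_cost:
            best_motion, best_cost = m, expected
    return best_motion, best_cost
```

Only the returned `best_motion` is executed; after the next observation the search is run again from the updated state, as described above.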
[Figure 2. AND/OR tree search.]
[Figure 3. Prune a branch of the AND/OR tree.]
[Figure 4. A path for calculating the lower bound.]
4.1. Search for the Motion Minimizing the Expected Cost

We explain the search for the motion minimizing the expected cost using Fig. 2. We first pick a next motion and predict the set of states after the robot executes that motion. If a state in the set is fixed, we calculate its cost using the motion planner described in Sec. 4.3. If a state is ambiguous, we recursively search for the optimal motion (and thus the cost) for the state. We limit the search depth to avoid combinatorial explosion. When the search reaches the depth limit, we calculate a feasible robot motion which considers all possible obstacle paths, i.e., we plan a robot motion by assuming that every remaining path candidate is actually taken by some obstacle. The expected cost of the next motion is then obtained as the sum of the costs of all possible resulting states weighted by their probabilities. We calculate the expected cost for every candidate next motion and select the one with the minimum expected cost.

4.2. Pruning of the AND/OR Tree by Using a Lower Bound

To reduce the computational cost of the search, we prune the AND/OR tree. Fig. 3 shows this pruning process when searching for the motion minimizing the expected cost for state S. Suppose that for motion mi in state S, the expected total cost Ci has been calculated; this cost is the sum of the cost from the initial state to S and that from S to the goal state. For another motion mj, if its lower bound Cj is larger than Ci, mj can be pruned. This lower bound changes as the total costs of the predicted states reached after executing mj are obtained. Let Sj^k (k = 1, · · ·, n) be the predicted states. After the costs Cj^k for Sj^k (k = 1, · · ·, l ≤ n) are calculated, the lower bound becomes
Cj^l = Cj + Σ_{k=1}^{l} P(Sj^k) (Cj^k − Cj).  (6)
If Cj^l > Ci, we prune branch mj without calculating the costs of the remaining n − l states. The lower bound Cj is calculated as follows. We consider a motion pair composed of a turning motion and a straight motion that reach the goal from the current position, neglecting any collisions with obstacles, as shown in Fig. 4. We examine the cost of such a motion pair for a given set of turning radii and set the minimum cost to Cj.

4.3. Calculating a Feasible Path

To calculate a feasible path for a fixed state (including the case where the robot has to make a plan considering all possible path candidates at the depth limit), we use an improved version of the method proposed by Hsu et al. [4]. Their method is a randomized search that considers the non-holonomic constraints of the robot motion. It picks a robot motion at random and generates a probabilistic roadmap of sampled state×time points, called milestones, connected by short admissible trajectories. The roadmap consists of a tree of milestones whose root is the initial robot pose, and the search finishes when a milestone inside the endgame region is found. In that work, completeness is an important issue, but the quality of the planned path is not fully discussed: since the search finishes upon finding only one path, the found path may be much less efficient than the optimal one. We improve their method in the following points:

• Give weights to milestones so that a wide space is explored.
• Iterate the path search until a sufficiently good path is found.

In this paper, the endgame region is defined as the set of robot poses from which a feasible path to the goal can be generated, such as the one shown in Fig. 4.

4.3.1. Weighting Milestones

In sampling milestones, if the weights of the milestones are uniform, the tree may grow under a limited set of milestones, which may lower the path quality. To deal with this problem, Hsu et al. [4] give a milestone a weight inversely proportional to the number of other milestones in its neighborhood in the configuration space. However, the distance between milestones under a specific milestone is not necessarily small when their depths differ largely. The Rapidly-exploring Random Tree (RRT) [7], which uses the sizes of the Voronoi regions for weighting, may have the same drawback. To make the tree grow relatively uniformly, we give the same weight to subtrees whose roots are at the same depth. Therefore, the weight of milestone i with depth di and bi unexpanded motions is given by (bi/B)(1/B)^di, where B is the number of robot motions (i.e., the branching factor). However, searching with these weights is probabilistically equivalent to breadth-first search, which may require much computation, so we put more weight on deeper milestones: the weight of a milestone is modified to (bi/B)(a/B)^di, where a (> 1) is a constant.

4.3.2. Evaluating the Quality of a Path

We iterate the path search several times and select the best path among the generated ones. We must then decide when to stop the iteration, which requires evaluating how good the current best path is. Therefore, we estimate the proportion of the number of potential paths whose costs are greater than
[Figure 5. Number of potential paths on a milestone tree.]
[Figure 6. Milestones and their potential paths.]
[Figure 7. Predicted path qualities (predicted vs. actual).]
that of the current best path to the total number of all possible paths. After a feasible path is found, if the lower bound of a milestone is larger than the cost of that path, no path passing through the milestone can be more efficient than it. To estimate the proportion described above, we need to estimate the number of potential paths which pass through a milestone. We assume that milestones at the same depth have the same number of potential paths; under this assumption, a deeper milestone has fewer potential paths, as Fig. 5 illustrates. In the figure, D denotes the maximum depth of the tree, which is set to that of the current best path. The total number of potential paths is B^D (B is the branching factor). Fig. 6 illustrates how to calculate the number of potential paths of a milestone. In the figure, milestone i has a child milestone j. We consider that a potential path belongs to the deepest milestone on the path; so in the figure, potential path set 1 belongs to milestone i and set 2 belongs to milestone j. A child of milestone i that has not been expanded has B^(D−(di+1)) potential paths. If milestone i has bi such children, its number of potential paths is bi B^(D−(di+1)). If a milestone cannot be added to the tree due to collision, the potential paths passing through it can never be obtained as feasible paths; so when milestone j cannot be added, the B^(D−dj) potential paths passing through it should be excluded from the total number of potential paths. Let I be the set of indices of milestones whose lower bounds are greater than the cost of the current best path, and J be the set of indices of milestones which cannot be added due to collision. The proportion of the number of potential paths whose costs are greater than that of the current best path to the total number of potential paths is thus
estimated by

Σ_{i∈I} bi B^(D−di−1) / (B^D − Σ_{j∈J} B^(D−dj)).  (7)

[Figure 8. Simulation result: snapshots at t = 0 through t = 5.]
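The milestone weighting of Sec. 4.3.1 and the proportion estimate of Eq. (7) reduce to a few lines; the function signatures below are our own illustrative choices:

```python
def milestone_weight(b_i, B, d_i, a=2.0):
    """Selection weight of a milestone with b_i unexpanded motions at
    depth d_i, out of B possible motions (branching factor).
    a > 1 biases the search toward deeper milestones."""
    return (b_i / B) * (a / B) ** d_i

def estimate_proportion(D, B, pruned, collided):
    """Eq. (7): estimated fraction of potential paths worse than the
    current best path.

    D        : depth of the current best path
    B        : branching factor
    pruned   : list of (b_i, d_i) for milestones whose lower bounds
               exceed the best path's cost (set I)
    collided : list of depths d_j of milestones rejected for collision
               (set J)
    """
    numer = sum(b * B ** (D - d - 1) for b, d in pruned)
    denom = B ** D - sum(B ** (D - d) for d in collided)
    return numer / denom
```

Iteration stops once `estimate_proportion(...)` exceeds 1 − α; since the estimate relies on lower bounds, it is conservative, as the validation experiment below confirms.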
If the estimated proportion becomes greater than 1 − α, the current best path is expected to belong to the best 100α% of the entire set of potential paths. If this condition is satisfied for a small α, we consider that a sufficiently good path has been obtained; currently, we use α = 0.1. To validate Eq. (7), we performed the following experiment. First, we generated 5000 paths from a milestone tree. Next, for each path, we calculated the estimated proportion given by Eq. (7) using that path as the current best path, and compared it with the actual proportion. Fig. 7 shows the comparison result. The estimated proportion is always less than the actual one, mainly because the estimation is based on the lower bound. When the quality of a path is high enough, the estimated proportion is close to the actual one; this suggests that the estimated proportion is a good measure of path quality.

5. Experimental Results

Fig. 8 shows the results of a simulation experiment. In the figure, white regions are free space and dark gray lines are the path candidates of the moving obstacles. Gray lines are planned
[Figure 9. Probabilities of the paths of each obstacle.]
[Figure 10. Result of motion planning without considering ambiguity.]
robot paths, among which a darker line indicates a path with a higher probability. On the paths of both the robot and the obstacles, circles are drawn at a fixed time interval. In the experiment, the number of obstacles is two, the number of initial path candidates of each obstacle is three, and the depth limit of the search is two. A motion of the robot is represented by a pair of turning radius and speed, and the number of robot motions is nineteen. The prior probabilities of the path candidates are set equal. Obstacle 1 actually takes the right path, and obstacle 2 takes the middle one. Fig. 9 shows the change of the path probabilities for each obstacle. For obstacle 1, the probability of taking the right path becomes dominant at t = 1, but the ambiguity of the paths remains until t = 4; the same holds for obstacle 2 and its middle path. At time t = 0, since all paths of the obstacles are equally probable, the robot selects a motion that moves it to a position where it can safely avoid collision with the obstacles whichever paths they take. At time t = 1, the situation is still highly ambiguous, and the robot selects a motion similar to the one selected at t = 0. At time t = 2, since the path of each obstacle is almost uniquely determined, only a few robot motions in the generated plan have high probabilities. At time t = 3, the robot now knows that neither obstacle will take its left path and generates a plan containing robot motions that cross over those left paths. At time t = 4, there is no obstacle in front of the robot; however, since the environment is not yet fixed, the robot generates two slightly different paths due to the randomness of the planner. At time t = 5, the paths of the obstacles are fixed, and the robot plans a single path towards the goal. The computation time at t = 0 is 140 [sec] using an Athlon 2200+ PC.
At t = 1, the time is 9.5 [sec]; after t = 1, it does not exceed 0.5 [sec]. The computation time decreases as time elapses because the ambiguity of the environment decreases. We also compared our method with a method which plans a robot motion to avoid all possible path candidates of the obstacles. Fig. 10 shows the plan produced at t = 0 by this method in the same situation as the one shown in Fig. 8. In that plan, the robot moves slowly until t = 2 to wait for the obstacles to go away; the expected cost of the plan is 12.90 [sec]. Our method, on the other hand, selects a faster motion in the same situation, with an expected cost of 10.99 [sec]. This result shows the effectiveness of the proposed method.
6. Conclusion This paper has described a path planning method which considers the path ambiguity of moving obstacles. The method predicts possible future states and their probabilities, and searches for a robot motion for each predicted state. This prediction and search are performed recursively until every state becomes terminal (i.e., the ambiguity of a state is resolved or the search reaches the depth limit). The method finally selects the next motion which minimizes the expected time to the goal. The proposed method can deal with fairly general motion planning problems with arbitrary static obstacle configurations and multiple moving obstacles with multiple path candidates. The path planner which we use for calculating the costs of terminal nodes is an improved version of a previous randomized method; it iteratively generates a set of paths until a sufficiently good path is generated, and the number of iterations is determined based on the quality evaluation of the current best path. The current method still needs too much computation time to be used on-line; future work includes considering the trade-off between the quality of a planned path and the computational cost in order to control the planning time appropriately. References [1] M. Bennewitz, W. Burgard, and S. Thrun. Adapting navigation strategies using motion patterns of people. In Proc. of the 2003 IEEE Int. Conf. on Robotics and Automation, pages 2000–2005, 2003. [2] A. Elnagar and A. Basu. Local Path Planning in Dynamic Environments with Uncertainty. In Proc. of IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, pages 183–190, 1994. [3] K. Fujimura. Time-Minimum Routes in Time-Dependent Networks. IEEE Trans. on Robotics and Automation, 11(3):343–351, 1995. [4] D. Hsu, R. Kindel, J.C. Latombe, and S. Rock. Randomized kinodynamic motion planning with moving obstacles. Int. J. of Robotics Research, 21(3):233–255, 2002. [5] S. Ishikawa.
A method of indoor mobile robot navigation by using fuzzy control. In Proc. 1991 IEEE/RSJ Int. Workshop on Intelligent Robots and Systems, pages 1013–1018, 1991. [6] F. Large, S. Sekhavat, Z. Shiller, and C. Laugier. Towards real-time global motion planning in a dynamic environment using the NLVO concept. In Proc. of the IEEE Int. Conf. on Intelligent Robots and Systems, 2002. [7] S. M. LaValle and J. J. Kuffner. Rapidly-exploring random trees: Progress and prospects. In B. R. Donald, K. M. Lynch, and D. Rus, editors, Algorithmic and Computational Robotics: New Directions, pages 293–308. A K Peters, Wellesley, MA, 2001. [8] Y.-H. Liu and S. Arimoto. Path planning using a tangent graph for mobile robots among polygonal and curved objects. Int. J. of Robotics Research, 11(4):376–382, 1992. [9] J. Miura and Y. Shirai. Probabilistic uncertainty modeling of obstacle motion for robot motion planning. J. of Robotics and Mechatronics, 14(4):349–356, 2002. [10] Z. Qu, J. Wang, and C. E. Plaisted. A new analytical solution to mobile robot trajectory generation in the presence of moving obstacles. IEEE Trans. on Robotics, 20(6):978–993, 2004.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Robotic Navigation using Harmonic Functions and Finite Elements Santiago Garrido and Luis Moreno, Carlos III University, Madrid, Spain Abstract. Harmonic potentials have proved to be a powerful technique for path planning in a known environment. They have two important properties: given an initial point and an objective in a connected domain, there exists a unique path between those points, namely the maximum-gradient path of the harmonic function that begins at the initial point and ends at the goal point; and the harmonic function cannot have local minima in the interior of the domain (the objective point is considered part of the border). Our approach has the following advantages over previous methods: 1) It uses the Finite Element Method to solve the PDE problem, which permits complicated shapes of the obstacles and walls. 2) It uses mixed border conditions, so that the trajectories are smooth, the potential slope is not too small, and the trajectories avoid the corners of walls and obstacles. 3) It can avoid moving obstacles in real time, because it works on-line and is fast. 4) It can be generalized to 3D or more dimensions and can be used to move robot manipulators. Keywords. Path Planning, Harmonic Functions, Finite Elements
1. Introduction Robot motion control can be viewed as an underconstrained problem: the robot exists at a certain configuration and must get to a certain goal configuration using any free path. If properly specified, such underconstrained interpolation problems can be solved using Laplace's equation. Intuitively, Laplace's equation can be thought of as governing the shape of a thin membrane: obstacle positions are pulled up, and goal positions are pulled down. If a ball bearing is dropped on the membrane, it will always fall into one of the goal positions, never hitting the obstacles. The movement of the ball corresponds to a change in state of a robot (e.g., the movement of joints during a reach), and the trajectory of the ball is always a smooth, obstacle-avoiding path. Harmonic potentials have proved to be a powerful technique for path planning in a known environment. They have two important properties. First, given an initial point and an objective in a connected domain, there exists a unique path between those points; this path is the maximum-gradient path of the harmonic function that begins at the initial point

1 Correspondence to: Santiago Garrido, Carlos III University, Leganés, Madrid, Spain. Tel.: +34 91 6245990; Fax: +34 91 6249430; E-mail: [email protected]
S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements
and ends at the goal point. Second, the harmonic function cannot have local minima in the interior of the domain (the objective point is considered part of the border). These properties mean that a path can always be found if one exists, and that it is impossible to get stuck in a local minimum. These are the most important properties that make harmonic functions extremely interesting for path planning purposes. Our approach has the following advantages over previous methods: 1) It uses the Finite Element Method to solve the PDE problem, which permits complicated shapes of the obstacles and walls. 2) It uses mixed border conditions, so that the trajectories are smooth, the potential slope is not too small, and the trajectories avoid the corners of walls and obstacles. 3) It can avoid moving obstacles in real time, because it works on-line and is fast. 4) It can be generalized to 3D or more dimensions and can be used to move robot manipulators. 1.1. Previous work Every method concerning robot motion planning has its own advantages and application domains as well as its disadvantages and constraints, so it would be difficult either to compare methods or to motivate the choice of one method over others. In contrast to many methods, robot motion planning through artificial potential fields (APF) considers simultaneously the problems of obstacle avoidance and trajectory planning (Arkin [1][2]). In addition, the dynamics of the manipulator are directly taken into account, which leads in our opinion to a more natural motion of the robot. The first use of the APF concept for obstacle avoidance was presented by Khatib [8]. He proposed a Force Involving an Artificial Repulsion from the Surface (FIRAS, in French), which should be nonnegative, continuous, and differentiable. However, the potential field introduced exhibits local minima other than the goal position of the robot.
To solve the preceding problem, Volpe and Khosla [11] developed new elliptical potential functions called “superquadric artificial potential functions”, which do not generate local minima in the physical space. They showed that superquadric potential fields can be constructed only for simple shapes such as square or triangular figures. The problem of local minima remains, because although the superquadric potential functions do not generate local minima in the workspace, local minima can still occur in the C-space of the robot. The contributions of Koditschek in [9][10][6] are worth mentioning because they introduce an analytic potential field in the C-space without local minima. However, the topology of the application range is limited to obstacles that are ball- or star-shaped; otherwise no solution can be found. The contributions of Connolly [4] and of Kim and Khosla [7] are the most successful potential-field methods for robot motion planning. They used harmonic functions to build a potential field in the C-space without local minima. Harmonic functions attain their extreme values on the boundary of the domain. Using a potential function to accomplish a certain motion implies that the trajectory of the robot is not known or calculated in advance, which means that the robot chooses autonomously the way to reach its goal. The main problem of this method is the complexity of the Panel Method used, which requires the obstacles to be very simple, like rectangles. The method of harmonic functions based on the Finite Difference Method has been used for guiding robots by Connolly and Gruppen [3]. Prestes et al. [5] used this method for exploration of unknown environments. The method of harmonic functions is not widespread because until now it has been based on the Finite Difference Method, and for this reason it is not easy to implement: it is quite slow, has to be used in static environments, and the environment has to be very simple, with obstacles and walls made of parallel straight lines.
2. Harmonic Functions
Equilibrium problems in two-dimensional, and higher-dimensional, continua give rise to elliptic partial differential equations. A prototype is the famous equation of Laplace: ∇²φ = 0. This equation holds for the steady temperature in an isotropic medium, characterizes gravitational or electrostatic potentials at points of empty space, and describes the velocity potential of an irrotational, incompressible fluid flow. The two-dimensional counterpart of this equation lies at the foundation of the theory of analytic functions of a complex variable. A harmonic function φ on a domain Ω ⊂ Rⁿ is a function that satisfies Laplace's equation

∇²φ = Σ_{i=1}^{n} ∂²φ/∂x_i² = 0,

where x_i is the i-th Cartesian coordinate and n is the dimension. In the case of robot path planning, the boundary ∂Ω of Ω is formed by all the walls, obstacles and the goal, but not the start point.
Maximum Principle: A nonconstant harmonic function φ(x) defined on Ω takes its maximum value M on the boundary; this guarantees that there are no local minima in the interior of Ω.
Figure 1. Robot trajectories obtained by solving Laplace's equation with Dirichlet, Neumann and Robin boundary conditions, respectively.
2.1. Problems using Harmonic Functions
The use of harmonic functions as path planning potentials is a powerful technique, but it is not widespread, for several reasons. First, the previous methods based on finite differences require that all obstacles and walls of the region be composed of parallel straight lines. Second, the previous methods based on finite differences are slow and difficult to implement in a dynamic environment that changes with time. Third, some implementations of these methods do not assure the existence of a trajectory. For example, when using harmonic functions as a potential field it is not possible to add an attractive potential towards the goal and a repulsive potential from the obstacles, because the domains are not the same. In the first case, the goal is one of the borders, because this is the only way for it to be the minimum of the function. In the second case the borders are the maxima and the goal cannot be one of them, because the resulting sum does not have its minimum at the goal. Therefore, the only way to obtain a potential attractive towards the goal and repulsive from the obstacles is to set the obstacle borders to the maximum value and the goal border to the minimum value, and to solve the Laplace equation with these border values. Fourth, if the obstacles are not fixed, it is necessary to recalculate the harmonic function and the path continuously, each time a movement is registered, in order to obtain an adaptive version of the method.
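The boundary-value recipe above (maximum potential on walls and obstacles, minimum at the goal) can be illustrated with a small finite-difference relaxation, the very scheme the authors argue against for complex shapes but the simplest to sketch. This is our illustrative code, not the authors' implementation; the grid size, iteration count and Jacobi scheme are assumptions.

```python
import numpy as np

def harmonic_potential(occ, goal, iters=5000):
    """Jacobi relaxation of Laplace's equation on a grid.
    occ: 2-D bool array, True = wall/obstacle (clamped to 1.0).
    goal: (row, col) index, clamped to 0.0."""
    phi = np.ones(occ.shape)          # start everything at the wall value
    fixed = occ.copy()
    fixed[goal] = True
    phi[goal] = 0.0
    for _ in range(iters):
        # discrete Laplace equation: each free cell = mean of its 4 neighbours
        # (np.roll wraps around, but the border cells are fixed walls, so the
        # wrapped values never propagate into the interior)
        avg = 0.25 * (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                      np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
        phi = np.where(fixed, phi, avg)
    return phi

occ = np.zeros((20, 20), dtype=bool)
occ[0, :] = occ[-1, :] = occ[:, 0] = occ[:, -1] = True   # outer walls
occ[5:15, 10] = True                                     # an interior wall
phi = harmonic_potential(occ, goal=(10, 17))
```

By the maximum principle, the resulting potential has no interior local minima, so gradient descent on `phi` from any free cell ends at the goal.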
3. Boundary Conditions
The kind of contour conditions imposed on the Laplace equation is of critical importance for the solution obtained and for the quality of the trajectory. The direction field associated with the partial differential problem gives the path that the robot has to track. To achieve a good quality path to the goal, the direction field should have some convenient properties: 1. the trajectories have to be smooth and as short as possible; 2. the potential slope must not be too small; 3. the trajectories have to avoid the corners of walls and obstacles. Because of this, the gradient has to be perpendicular or tangent to the borders of the workspace. The solution can be considered as the potential function, and the trajectories associated with the direction field of the potential constitute the path of the robot. The contour conditions can be given in three ways: by the values of the potential function on the contour (Dirichlet), by the gradient values on the contour (Neumann), or by a linear combination of the two (Robin or mixed):
Dirichlet: φ = c(x, y) on the contour. Usually taken as a constant function φ = K.
Neumann: ∇(φ) = c(x, y) on the contour. Usually taken as a constant function ∇(φ) = K.
Robin (or mixed): n · a · ∇(φ) + b · φ = c(x, y) on the contour. Usually taken as a constant function n · a · ∇(φ) + b · φ = K.
If the solution of the Laplace equation is the velocity potential of an irrotational, incompressible fluid flow, then Dirichlet conditions represent the different contour levels, Neumann conditions correspond to the entrance or exit of fluid flow, and Robin conditions correspond to both at the same time.
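To make the three boundary-condition types concrete, here is a hedged 1-D finite-difference sketch: φ'' = 0 on [0, 1] with a Dirichlet goal condition at x = 0 and the Robin condition n · a · ∇(φ) + b · φ = c at x = 1 (outward normal n = +1). The discretization and parameter values are our own illustration; setting a = 0 recovers a pure Dirichlet condition and b = 0 a pure Neumann condition.

```python
import numpy as np

def solve_1d(n=51, a=1.0, b=1.0, c=1.0):
    """1-D Laplace phi'' = 0: Dirichlet phi(0) = 0, Robin a*phi' + b*phi = c at x = 1."""
    h = 1.0 / (n - 1)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = 1.0                       # Dirichlet at the goal: phi = 0
    for i in range(1, n - 1):           # interior: 3-point discrete Laplacian
        A[i, i - 1], A[i, i], A[i, i + 1] = 1.0, -2.0, 1.0
    # Robin at x = 1, phi' approximated with a one-sided difference
    A[-1, -1] = a / h + b
    A[-1, -2] = -a / h
    rhs[-1] = c
    return np.linalg.solve(A, rhs)

phi = solve_1d()
```

The exact solution is the linear function φ(x) = c/(a+b) · x, so the slope is constant and never vanishes, which is exactly the "not too small potential slope" property the text asks of Robin conditions.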
Figure 2. Solutions of Laplace’s equation with the boundary conditions of Dirichlet, Neumann and Robin, respectively.
Dirichlet Conditions. Subsection 2.1 discussed some problems in the use of harmonic functions and the reasons why the method is not widespread. Another problem in the way this method is used is that in some articles (e.g. Karnik et al.) three different values are given to the initial point, to the walls and obstacles, and to the objective point (value 10 for the initial point, value 8 for the walls and obstacles, and value -10 for the objective). In this situation the trajectories can end at the walls, as can be seen in the figure. This happens because the Maximum Principle says that the solution of Laplace's equation cannot attain its maximum (or minimum) at an interior point, but it says nothing about the border. The only way of not having trajectories end at the walls is to give the walls the maximum value. The reason why some authors give the initial point a higher potential value than the walls is that in regions far from the objective point the potential is quite flat (for Dirichlet conditions). In this situation the gradient is very small and computational errors can be produced (round-off, quantization, ...). The correct way to be sure that all trajectories end at the objective point is to have only two potential values in the border conditions: the maximum value at the walls and obstacles and the minimum value at the objective point. It is important to note that the objective point, the obstacles and the walls all belong to the contour of the Laplace equation problem. It is also possible to give a high potential value to the initial point, but in this way the uniqueness of the trajectories is lost, and a selection criterion is needed in order to choose the initial direction.
This high potential value for the initial point is motivated by the logarithmic dependency on the distance to the objective point. For this reason, points distant from the objective have a small gradient, which hinders the numerical calculation of the trajectory. In order to assure that the trajectories end at the objective point and that the solution gradient is not small, it is preferable to use Robin border conditions.
Proposition. If the borders have the maximum potential value (=1) and the objective point has the minimum value (=0), then there exists a unique trajectory from the initial point to the objective point.
Figure 3. Contour of Laplace’s equation with the boundary conditions of Dirichlet, Neumann and Robin, respectively.
This proposition shows the convenience of choosing Dirichlet or Robin contour conditions, because they give fixed values to the potential on the contour. When Dirichlet contour conditions are used, the solution is quite flat far from the objective point; for this reason, computational errors in the trajectory computation are likely. On the other hand, the trajectories are very smooth and they do not pass very close to the corners of walls and obstacles. Neumann Conditions. In the Neumann case, the descent is smooth and continuous, with a slope not close to zero, and for this reason computational errors in the trajectory calculation are less likely. In the Neumann case it is also possible to assure the existence of a unique trajectory from the initial point to the objective point: if the walls and obstacles are sources, with positive flow, and the objective point is the only sink, with negative flow, then by analogy with the velocity potential of an irrotational, incompressible flow there is a unique trajectory from the initial point to the objective. However, the problem with Neumann conditions is that the trajectories pass very close to the contour corners, so close that it can become impossible to continue the calculation of the trajectory, as can be seen in the figure. Robin Conditions (or mixed). Finally, the Robin contour conditions have the advantages of both and none of the disadvantages: the solution has a smooth, continuous slope that is big enough, and, at the same time, the trajectories are not close to the corners. This allows an easy calculation of the trajectories.
4. Comparison between the Finite Element Method and the Finite Difference Method
The finite element method (FEM) is particularly useful when a robust approximation is sought to solve partial differential equations on an inhomogeneous mesh. Solid mathematical foundations and a great deal of generality allow for different implementations.
The most important difference between the Finite Element Method and the Finite Difference Method is the treatment of the geometry of the domain. The Finite Difference Method uses rectangular meshes, while the Finite Element Method usually uses triangular meshes. J. Rossell uses a hierarchical decomposition of the domain, but it can be awkward to try to apply finite differences over nonuniform meshes. Generally speaking, the finite element method is more flexible with curved geometries. Clearly the details of the finite element method are more complex than those of the finite difference technique. The end-points of both methods are essentially the same, namely a system of linear or nonlinear algebraic equations. The finite element method, however, has a number of advantages:
• it is well suited to problems involving complex geometries;
• it can readily handle problems where the physical parameters vary with position within the domain;
• it can also be used for nonlinear and/or time-varying problems;
• complex boundary conditions can be readily dealt with;
• general computer programs to perform finite element calculations can be, and have been, developed; conventional numerical techniques can be used to solve the equations resulting from a finite element analysis.
In short, anything that can be done nicely with finite differences can be done well with finite elements, but the converse does not hold.
Figure 4. Avoidance of stationary obstacles.
5. Implementation of the FEM method
The method starts by approximating the computational domain with a union of simple geometric objects, in the 2D case triangles. The triangles form a mesh and each vertex is called a node. The mesh design has to strike a balance between the ideal rounded forms of the original sketch and the limitations of its simple building blocks, triangles or quadrilaterals. If the result does not look close enough to the original objects, we can always improve it using smaller blocks. Consider now Laplace's equation on Ω: −∇²u = 0. If u_h is a piecewise linear approximation to u, it is not clear what the second derivative term means. Inside each triangle, ∇u_h is a constant (because u_h is linear there) and thus the second-order term vanishes. At the edges of the triangles, ∇u_h is in general discontinuous and a further derivative makes no sense. u_h is the best approximation of u in the class of continuous piecewise polynomials. Therefore we test the equation for u_h against all possible functions v of that class. Testing means formally multiplying the residual by any such function and then integrating, i.e.,
determine u_h such that ∫_Ω (−∇²u_h) v dx = 0 for all possible v. The functions v are usually called test functions. Partial integration (Green's formula) yields that u_h should satisfy ∫_Ω ∇u_h · ∇v dx − ∫_{∂Ω} n · (c∇u_h) v ds = 0, where ∂Ω is the boundary of Ω and n is the outward pointing normal on ∂Ω. Note that the integrals of this formulation are well-defined even if u_h and v are piecewise linear functions. Boundary conditions are included in the following way. If u_h is known at some boundary points (Dirichlet boundary conditions), we restrict the test functions to v = 0 at those points, and require u_h to attain the desired value at those points. For all the other points Robin boundary conditions are used, i.e., (∇u_h) · n + q u_h = g. The FEM formulation is: find u_h such that ∫_Ω ∇u_h · ∇v dx + ∫_{∂Ω₁} q u_h v ds = ∫_{∂Ω₁} g v ds for all v, where ∂Ω₁ is the part of the boundary with Robin conditions. The test functions v must be zero on ∂Ω − ∂Ω₁. Any continuous piecewise linear u_h is represented as a combination u_h(x) = Σ_{i=1}^{N} U_i φ_i(x), where the φ_i are special piecewise linear basis functions and the U_i are scalar coefficients. Choose φ_i like a tent, such that it has "height" 1 at node i and height 0 at all other nodes. For any fixed v, the FEM formulation yields an algebraic equation in the unknowns U_i. To determine N unknowns, we need N different instances of v. What better candidates than v = φ_j, j = 1, 2, ..., N? We find a linear system KU = F, where the matrix K and the right side F contain integrals in terms of the test functions φ_i, φ_j and the coefficients defining the problem: q and g. The solution vector U contains the expansion coefficients of u_h, which are also the values of u_h at each node x_i, since u_h(x_i) = U_i. If the exact solution u is smooth, then FEM computes u_h with an error of the same size as that of linear interpolation.
It is possible to estimate the error on each triangle using only uh and the PDE coefficients (but not the exact solution u, which in general is unknown). The FEM method provides functions that assemble K and F automatically. To summarize, the FEM approach is to approximate the PDE solution u by a piecewise linear function which is expanded in a basis of test-functions φi , and the residual is tested against all the basis functions. This procedure yields a linear system KU = F . The components of U are the values of uh at the nodes. For x inside a triangle, uh (x) is found by linear interpolation from the nodal values.
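The assembly procedure described above (tent basis functions, local integrals of ∇φ_i · ∇φ_j, the global system KU = F) can be sketched in a few dozen lines. This is a minimal illustration on a structured triangulation of the unit square with pure Dirichlet data, not the authors' solver; the Robin boundary term is omitted for brevity.

```python
import numpy as np

def stiffness(pts, tris):
    """Assemble the P1 stiffness matrix K[i, j] = integral of grad(phi_i).grad(phi_j)."""
    n = len(pts)
    K = np.zeros((n, n))
    for tri in tris:
        x = pts[tri]                                  # 3x2 vertex coordinates
        d1, d2 = x[1] - x[0], x[2] - x[0]
        area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])
        # gradient of each tent function = 90°-rotated opposite edge / (2*area)
        e = np.array([x[2] - x[1], x[0] - x[2], x[1] - x[0]])
        grads = np.column_stack([-e[:, 1], e[:, 0]]) / (2 * area)
        Ke = area * grads @ grads.T                   # local stiffness matrix
        for i in range(3):
            K[tri[i], tri] += Ke[i]
    return K

# structured triangulation of the unit square: m x m cells, two triangles each
m = 10
xs = np.linspace(0.0, 1.0, m + 1)
pts = np.array([[x, y] for y in xs for x in xs])
tris = []
for j in range(m):
    for i in range(m):
        k = j * (m + 1) + i
        tris.append([k, k + 1, k + m + 2])            # lower-right triangle
        tris.append([k, k + m + 2, k + m + 1])        # upper-left triangle

K = stiffness(pts, tris)

# Dirichlet test problem: boundary data u = x. The harmonic solution is u = x,
# which lies in the P1 space, so the FEM solve should recover it exactly.
onb = (pts[:, 0] == 0) | (pts[:, 0] == 1) | (pts[:, 1] == 0) | (pts[:, 1] == 1)
free = ~onb
u = pts[:, 0].copy()
u[free] = np.linalg.solve(K[np.ix_(free, free)], -K[np.ix_(free, onb)] @ u[onb])
```

As the text says, each row of the system comes from testing against one tent function φ_j; the components of `u` are the nodal values of u_h.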
6. Avoidance of stationary and moving obstacles
Figure 5. Avoidance of moving obstacles.
Figure 6. Local path planning using the trajectory given by the solution of Laplace's equation computed by the Finite Element Method, with sensor data.
This technique can be used at a global level to plan an obstacle-free trajectory using the environment information contained in an a priori map. The technique can also be applied at a local level to take sensor-based information into account. This is necessary because, although the robot has an initial global map of the environment and makes an initial path plan from the initial point to the goal, this map is not perfect and can have errors, such as obstacles that do not appear in the map. These obstacles can be static or moving. In order to avoid them, the robot needs to consider local map information extracted from its sensors. This local map is used in a reactive way to find a path avoiding the obstacles. When there are static obstacles, the strategy consists of the steps shown in figure 7. This kind of algorithm is feasible today because of the speed of current computers and the flexibility of solving Laplace's equation with the finite element method, which permits all kinds of shapes of walls and obstacles. An important detail to consider is that the gradient modulus depends strongly on the distance to the objective (in the absence of obstacles). For this reason, it is better to use a constant tracking speed in order to avoid big differences in the speed of the robot.
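The trajectory-extraction step, following the direction field of the potential at constant speed because the gradient modulus varies with distance to the goal, can be sketched as below. The grid, the relaxation solver and the step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def solve_laplace(occ, goal, iters=4000):
    """Jacobi relaxation: walls fixed at 1.0, goal fixed at 0.0."""
    phi = np.ones(occ.shape)
    fixed = occ.copy()
    fixed[goal] = True
    phi[goal] = 0.0
    for _ in range(iters):
        avg = 0.25 * (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                      np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
        phi = np.where(fixed, phi, avg)
    return phi

def descend(phi, start, goal, step=0.5, max_steps=500):
    """Follow -grad(phi) with a constant step length (unit-speed tracking)."""
    gy, gx = np.gradient(phi)
    path, pos = [start], np.array(start, dtype=float)
    for _ in range(max_steps):
        i, j = int(round(pos[0])), int(round(pos[1]))
        if (i, j) == goal:
            break
        g = np.array([gy[i, j], gx[i, j]])
        pos = pos - step * g / (np.linalg.norm(g) + 1e-12)  # normalised step
        path.append(tuple(pos))
    return path

occ = np.zeros((15, 15), dtype=bool)
occ[0, :] = occ[-1, :] = occ[:, 0] = occ[:, -1] = True    # room walls
phi = solve_laplace(occ, goal=(7, 12))
path = descend(phi, start=(7.0, 2.0), goal=(7, 12))
```

Normalising the gradient is what gives the constant speed the text recommends: far from the goal the raw gradient is tiny, near the goal it is large, but the step length stays fixed.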
7. Conclusions
In this paper, we introduced harmonic potential functions for building potential fields for path planning and moving obstacle avoidance. We proposed the use of harmonic functions to eliminate local minima even in very complicated environments. The effectiveness of the proposed strategy has been demonstrated: with this new finite-element-based robot navigation, autonomous motion planning in known and unknown environments with dynamic obstacles is greatly improved. The computational efficiency of the proposed control scheme makes it particularly suitable for real-time implementation. The trajectory on the global map is computed off-line, and the trajectory on a local map of dimensions 40x40, with a grid composed of 3200 triangles, takes less than one second on a 1.6 GHz laptop PC.
Figure 7. Flow chart of the algorithm:
1. Calculate the position of the obstacles that do not appear in the map.
2. Solve Laplace's equation with the border conditions given by the obstacle and wall positions.
3. Calculate the direction field given by the potential of the solution of Laplace's equation.
4. Calculate the trajectory to the objective.
5. Track the calculated trajectory for a small distance, which depends on the distance to the next mobile object.
6. If the robot has reached the objective, stop; otherwise return to step 1.
References
[1] R. C. Arkin. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems, 6, 1990.
[2] R. C. Arkin. Behavior-Based Robotics: Intelligent Robots and Autonomous Agents. The MIT Press, 1998.
[3] C. I. Connolly and R. A. Grupen. The applications of harmonic functions to robotics. Journal of Robotic Systems, 10(7):931–946, 1993.
[4] C. I. Connolly. Harmonic functions as a basis for motor control and planning. PhD thesis, University of Massachusetts, Amherst, MA, USA, 1994.
[5] E. Prestes, P. M. Engel, M. Trevisan, and M. A. P. Idiart. Exploration method using harmonic functions. Robotics and Autonomous Systems, 40(1):25–42, 2002.
[6] E. Rimon and D. E. Koditschek. Exact robot navigation using artificial potential functions. IEEE Transactions on Robotics and Automation, 8(5):501–518, 1992.
[7] J. O. Kim and P. K. Khosla. Real-time obstacle avoidance using harmonic potential functions. IEEE Transactions on Robotics and Automation, 8(3):338–349, 1992.
[8] O. Khatib. Commande dynamique dans l'espace opérationnel des robots manipulateurs en présence d'obstacles. PhD thesis, École Nationale Supérieure de l'Aéronautique et de l'Espace (ENSAE), Toulouse, France, 1980.
[9] D. E. Koditschek. The control of natural motion in mechanical systems. Journal of Dynamic Systems, Measurement, and Control, 113:547–551, 1991.
[10] D. E. Koditschek. Some applications of natural motion. Journal of Dynamic Systems, Measurement, and Control, 113:552–557, 1991.
[11] R. Volpe and P. Khosla. Manipulator control with superquadric artificial potential functions: Theory and experiments. IEEE Transactions on Systems, Man, and Cybernetics, 20(6):1423–1436, 1990.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
The Hippocampal Place Cells and Fingerprints of Places: Spatial Representation in Animals, Animats and Robots
Adriana Tapus a,1, Francesco Battaglia b and Roland Siegwart a
a Ecole Polytechnique Fédérale de Lausanne (EPFL), Autonomous Systems Lab, Switzerland
b Collège de France, LPPA, France
Abstract. In this paper we address the problem of autonomous navigation seen from the neuroscience and the robotics points of view. A new topological mapping system is presented. It combines local features (i.e. visual and distance cues) in a unique structure – the “fingerprint of a place” – that results in a consistent, compact and distinctive representation. Overall, the results suggest that a process of fingerprint matching can efficiently determine the orientation, the location within the environment, and the construction of the map, and may play a role in the emergence of spatial representations in the hippocampus.
1. Introduction
In all our daily behaviors, the space we live and move in plays a crucial role. Many neurophysiologists dedicate their work to understanding how our brain can create internal representations of physical space. Both neurobiologists and roboticists are interested in understanding animal behavior and animals' capacity to learn and to use their knowledge of spatial representation in order to navigate. The ability of many animals to localize themselves and to find their way back home is linked to their mapping system. Most navigation approaches require learning and consequently need to memorize information. Stored information can be organized as cognitive maps – a term introduced for the first time in [31]. Tolman's model advocates that animals (rats) don't learn space as a sequence of movements; instead, an animal's spatial capabilities rest on the construction of maps, which represent the spatial relationships between features in the environment. Several methods, each with its advantages and drawbacks, have been proposed to construct maps in the framework of autonomous robot navigation, from precise geometric maps based on raw data or lines to purely topological maps using symbolic descriptions. To mention only a few papers in the vast SLAM (Simultaneous Localization and Mapping) literature, Leonard and Durrant-Whyte introduced for the first time the concept of SLAM as the construction of maps while the robot moves
1 Corresponding Author: Ecole Polytechnique Fédérale de Lausanne (EPFL), Autonomous Systems Lab, 1015 Lausanne, Switzerland; E-mail: [email protected]
through the environment and the localization with respect to the partially built maps [15]. Many works on space representation are based on metric maps. The stochastic map technique to perform SLAM [3, 7, 15] and the occupancy grid approaches [28] are typical examples of this kind of space representation. More recent vision-based metric approaches use SIFT features [24]. However, metric SLAM can become computationally very expensive for large environments. Thrun [29] proposes probabilistic methods that make the metric mapping process faster and more robust. However, one of the main shortcomings of metric maps is that they are not easily extensible to higher-level, symbolic reasoning: they contain no information about the objects and places within the environment. Topological approaches to SLAM attempt to overcome the drawbacks of geometric methods by modeling space using graphs. Significant progress has been made since the seminal papers by Kuipers [13, 14]. Kortenkamp and Weymouth [12] have also used cognitive maps for topological navigation. They defined the concept of gateways, which mark the transition between two adjacent places in the environment. Their work improved on Mataric's approach [17], contributing towards the reduction of the perceptual aliasing problem (i.e. observations at multiple locations are similar). They used data from sonars combined with vision information in order to achieve a rich sensory place-characterization. The model by Franz, Schölkopf and Mallot [8] was designed to explore open environments within a maze-like structure and to build graph-like representations. The model described in [5] represents the environment with the help of a Generalized Voronoi Graph (GVG) and localizes the robot via a graph matching process. This approach has been extended to H-SLAM (i.e. Hierarchical SLAM) in [16], by combining topological and feature-based mapping techniques.
In [30], Tomatis et al. conceived a hybrid representation, similar to the previously mentioned work, consisting of a global topological map with local metric maps associated to each node for precise navigation. Topological maps are less complex and permit more efficient planning than metric maps, and they are easier to generate. Maintaining global consistency is also easier in topological maps than in metric maps. However, topological maps suffer from perceptual aliasing (i.e. observations at multiple locations are similar) and from the difficulty of automatically establishing a minimal topology (nodes). Our method uses fingerprints of places to create a cognitive model of the environment. The fingerprint approach, by combining the information from all sensors available to the robot, reduces perceptual aliasing and improves the distinctiveness of places. The main objective of this work is to enable the navigation of an autonomous mobile robot in structured environments without relying on a priori learned maps and without using artificial landmarks. A new method for incremental and automatic topological mapping and global localization [26] using fingerprints of places is described. The construction of the topological mapping system is combined with the localization technique, both relying on fingerprints of places. This fingerprint-based approach yields a consistent and distinctive representation of the environment and is extensible in that it permits spatial cognition beyond just pure navigation.
2. Navigation Framework
Navigation strategies are based on two complementary sources of information (available to the mobile agent, animal or robot): idiothetic and allothetic. The idiothetic source yields internal information about the mobile agent's movements (e.g. speed, acceleration) and the allothetic source provides external information about the environment (e.g. cues coming from vision, odor, laser range finders, sonars, etc.). Idiothetic information provides a metric estimate of the agent's motion, but suffers from error accumulation, which makes the position estimate unreliable in the long term. In contrast, the allothetic (sensory) data is stationary over time, but is susceptible to perceptual aliasing (i.e. observations at multiple locations are similar) and requires non-trivial processing in order to extract spatial information. Map-based navigation needs map-learning and localization. Map-learning is the process of constructing a map representing the environment explored by the mobile agent; localization is the process of finding the mobile agent's location (position) in the map. Localization and mapping are interdependent: to localize the robot, a map is necessary, and to update a map, the position of the mobile agent is needed. This is usually known as the Simultaneous Localization and Mapping (SLAM) problem, which is of a “chicken and egg” nature. While navigating in the environment, the mobile agent first creates and then updates the map.
3. Fingerprints of Places and Space Cognition
The seminal discovery of place cells by O'Keefe and Dostrovsky [20] in the rat hippocampus – cells whose firing pattern is dependent on the location of the animal in the environment – led to the idea that the hippocampus works as a cognitive map of space [21]. It was shown in [4] (for a review see e.g. [23]) that lesion of the hippocampus impairs the performance of rodents in a wide variety of spatial tasks, indicating a role of the hippocampus in map-based navigation. The framework for topological SLAM (Simultaneous Localization and Mapping) that we propose here (see Figure 2) organizes spatial maps in cognitive graphs, whose nodes correspond to fingerprints of places, and may be seen as a possible mechanism for the emergence of place cells. The computational model describes how a mobile agent can efficiently navigate in the environment by using an internal spatial representation (similar to some extent to hippocampal place cells). This model builds a topological (qualitative) representation of the environment from the sequence of visited places. Many vision-based systems for place fields based on metric information have been extensively discussed in the literature (e.g. [10], [11] and [1], to name only a few). In this work, places in the environment are characterized by fingerprints of places. This characterization of the environment is especially interesting when used within a topological framework, where the distinctiveness of the observed location plays an important role for reliable localization and consistent mapping. A fingerprint of a place is a circular list of features, where the ordering of the list matches the relative ordering of the features around the robot. We denote the fingerprint sequence using a list of characters, where each character is the instance of a specific feature defining the signature of a place.
In this work, we choose to extract color patches and vertical edges from visual information and corners (i.e. extremities of line-segments) from the laser scanner. The letter 'v' is used to characterize an edge, the letters 'A', 'B', 'C', ..., 'P' to
represent hue bins, and the letter 'c' to characterize a corner feature (in this work, a corner feature is defined as the extremity of a line-segment extracted with the Douglas-Peucker algorithm). An 'empty space' between features is also denoted, by the character 'n' in the sequence, providing the angular distance between the features, which is a kind of very rough metric information. Figure 1 depicts through an example how a fingerprint of a place is generated. More details about the fingerprint approach can be found in [27]. Our fingerprint-based approach uses the allothetic sensors (this choice was made because animals similarly use multimodal sensory information). The fingerprints of places integrate the information from the omnidirectional camera and the laser range finder, characterizing different places and being used to map (model) the environment. The relative angular position of the local features is also encoded in the fingerprint of a place. A fingerprint of a place is associated with each distinctive place within the environment, so the result given by the fingerprint matching algorithm is strongly correlated with the location of the mobile agent in the environment, giving a high (or the highest) probability to the correct place associated with the fingerprint. The firing of place cell units can be seen as the manifestation of fingerprint matching. The closer the animal is to the center of the place field, the higher the rate of neural firing.
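As an illustration of the encoding just described, the following sketch builds a fingerprint string from a list of (feature character, bearing) pairs, inserting 'n' characters for angular gaps between consecutive features. The gap threshold and the exact insertion rule are our illustrative assumptions, not the procedure of [27].

```python
def make_fingerprint(features, gap_deg=20.0):
    """features: list of (char, bearing_deg) pairs, bearings in [0, 360).
    Returns the circular fingerprint string, with one 'n' per gap_deg of
    empty angular space between consecutive features."""
    feats = sorted(features, key=lambda f: f[1])   # order around the robot
    out = []
    for k, (ch, ang) in enumerate(feats):
        out.append(ch)
        nxt = feats[(k + 1) % len(feats)][1]       # wrap: the list is circular
        gap = (nxt - ang) % 360.0
        out.extend('n' * int(gap // gap_deg))      # rough angular distance
    return ''.join(out)

# two vertical edges, one hue-bin patch and one corner at assumed bearings
fp = make_fingerprint([('v', 10), ('A', 30), ('c', 200), ('v', 355)])
```

Because the list is circular, the string has no privileged start: matching must consider rotations, which is why fingerprint comparison is a circular-string problem.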
Figure 1: Fingerprint generation. (a) panoramic image with the detected vertical edges and color patches, denoted by 'v' and 'A'…'P', respectively; (b) laser scan with extracted corners 'c'; (c) the first three images depict the position (0 to 360°) of the colors (I - light blue, B - orange and E - light green), vertical edges and corners, respectively. The fourth image describes the correspondence between the vertical edge features and the corner features. By regrouping all these results together and adding the empty-space features, the final fingerprint is: 'IfvnvcvfnvvncvnncvBnvBccE'.
Similarly, the nearer the robot's new observation (i.e. the newly observed fingerprint of a place) is to a registered (learned) place (i.e. a known fingerprint of a place), the higher the probability that the mobile agent is in an already explored place. One of the main issues in topological map building is to detect when a new node should be added to the map. Most existing approaches to topological mapping place nodes periodically in either space (displacement Δd) or time (Δt), or alternatively attempt to detect important changes in the environment structure. None of these methods can guarantee an optimal topology. In contrast, our approach is based directly on the differences in the perceived features. Instead of adding a new node to the map by following fixed rules (e.g. distance, topology) that limit the approach to indoor or outdoor environments, our method introduces a new node into the map whenever an important change in the environment occurs. This is made possible by the fingerprints of places: a heuristic is applied to decide whether a new location is similar to the last one that was mapped.
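The authors compare fingerprint strings with a global alignment algorithm with uncertainty [27]. As a simplified illustration of that family of methods, the following is a plain Needleman-Wunsch-style global alignment score between two fingerprint strings; the scoring values are assumptions, not the paper's.

```python
# Hedged sketch: plain Needleman-Wunsch global alignment between two
# fingerprint strings. The actual system uses a global alignment algorithm
# with uncertainty [27]; the match/mismatch/gap scores here are made up.

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Classic dynamic-programming global alignment score."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap                        # gaps along the first column
    for j in range(1, n + 1):
        D[0][j] = j * gap                        # gaps along the first row
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,   # (mis)match
                          D[i - 1][j] + gap,     # gap in b
                          D[i][j - 1] + gap)     # gap in a
    return D[m][n]

score = alignment_score("vAncv", "vAnc")   # one trailing symbol unmatched
```

Such a raw score would still need to be normalized (and extended to circular sequences) to serve as a match probability.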
Figure 2: The spatial representation framework encodes the topological relationships between places, by comparing the actual observation (fingerprint of a place) of the mobile agent with the previously mapped places.
The process of introducing a new node in the topological map proceeds in the following steps: 1) Start with an initial node (i.e. fingerprint f0). 2) Move, and at each Δt (time) or Δd (distance) take a new scan with the laser scanner and a new image with the omnidirectional camera, and generate the new fingerprint fi. 3) Calculate the probability of matching, prob_matching, between the fingerprints fi-1 and fi, and compute the dissimilarity factor: prob_matching = P(fi matches fi-1)
dissimilarity(fi, fi-1) = 1 - prob_matching. 4) If dissimilarity(fi, fi-1) exceeds the threshold T, add a new node to the map; otherwise, add fi to the current node (see Figure 3).
Figure 3: Adding a new node automatically to the topological map by moving in an unexplored environment. The image is composed of seven measurement points (i.e. fingerprints of places) represented by the black points. The blue points depict the data given by the laser range finder and they are used only for reference. The mapping system includes all the fingerprints of places in a node until a significant change in the environment occurs and the dissimilarity between the fingerprints is greater than the threshold T.
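The node-creation rule of steps 1-4 can be sketched as a simple loop. The helper names and the toy dissimilarity below are hypothetical stand-ins; the real system compares fingerprints with a global-alignment matching algorithm.

```python
# Sketch of the threshold-based node-creation rule (hypothetical helper
# names; the threshold value 0.35 is an arbitrary choice for illustration).

def build_topological_map(fingerprints, dissimilarity, threshold=0.35):
    """Group a sequence of observed fingerprints into nodes: a new node is
    opened whenever the dissimilarity to the previous fingerprint exceeds
    the threshold T; otherwise the current node absorbs the observation."""
    nodes = [[fingerprints[0]]]                 # start with initial node f0
    for prev, cur in zip(fingerprints, fingerprints[1:]):
        if dissimilarity(prev, cur) > threshold:
            nodes.append([cur])                 # significant change: new node
        else:
            nodes[-1].append(cur)               # same place: extend node
    return nodes

# Toy dissimilarity: 1 - fraction of positions that agree (illustration only).
def toy_dissim(a, b):
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / max(len(a), len(b))

nodes = build_topological_map(["vAnc", "vAnc", "cBnn"], toy_dissim)
# two nodes: ["vAnc", "vAnc"] and ["cBnn"]
```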
The loop-closing problem (i.e. identifying a previously visited place when the robot returns to it) is an important problem in SLAM. For topological maps, this means that if a place (i.e. a node) has been visited before and the robot returns to it, the robot should detect this. Contrary to other methods for solving this problem, which are usually based on perception alone, in our case loops are identified and closed with the help of the localization technique using a POMDP (Partially Observable Markov Decision Process). In order to ensure consistency of the topological map, a method similar
to the one described in [30] is used. Our method for closing loops with fingerprints of places is fully detailed in [27]. Experiments were performed in a typical indoor environment (i.e. a portion of our institute building; see Figure 4). The resulting map is composed of 20 nodes. Each node is represented by a mean fingerprint, which characterizes all the fingerprints composing the respective node. Nodes are automatically placed in locations where either a very salient landmark was present, or a door or another obstacle creates very different visual environments on its two sides (reflecting an intuitive notion of “place”). In fact, the doors of some rooms remained closed at the time of experimentation, and no node was placed in front of those rooms (see Figure 4). The representation obtained correctly reproduces the topological structure of the laboratory (see Figure 4). It is important to mention that the map is built using local features only, organized in fingerprints of places. The topological structure of the environment emerges in the map as a result of the computation, even though it was not an input to the algorithm.
Figure 4: Floor plan of the environment where the experiments were conducted (room labels in the figure include the Coffee Room, the Corridor, and Bill's Room). The robot starts at point S and ends at point E. The trajectory length is 75 m. During this experiment, the robot collected 500 data sets (i.e. images and scans) from the environment. The extracted topological map is superimposed on an architectural sketch of the environment. The second image shows the extracted topological map given by our method, superimposed on the raw scan map (shown for reference only).
The quality of the topological maps obtained with our fingerprint-based technique can easily be estimated by testing localization on them. To test localization, more than 1000 fingerprint samples, acquired while the robot was traveling a new path of 250 m in the indoor environment shown above, were used to globally localize the robot. The results obtained with the fingerprint matching algorithm (i.e. global alignment with uncertainty) yielded a successful-match rate of 81%. Moreover, the misclassified nodes were still assigned a high probability (the 2nd or 3rd highest) and can therefore be exploited by a POMDP (Partially Observable Markov Decision Process). POMDP localization improves on the results obtained with fingerprint matching alone: incorporating the motion of the robot further decreases the pose uncertainty, to a level that could never be reached by fingerprint matching alone. A success rate of 100% was obtained for the tests performed in this work. More details about our indoor localization approach using a POMDP, and the corresponding results, can be found in [27]. The loop-closing problem has also been tested: the robot succeeded in closing the loop every time. As explained earlier, because the offices are quite small, the fingerprints of places within a room are very similar, and a single node per room is thus enough. Since a node contains posterior knowledge about its environment and aggregates all the fingerprints of places between the last node and the place where an important change in the environment occurred, the loop-closing problem does not arise in these cases (i.e. when one node per office is sufficient).
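The POMDP localization described above can be illustrated with a minimal discrete Bayes filter over topological nodes: a prediction step using the motion model followed by a correction step using the fingerprint-match likelihood. All numbers below are made up for the sketch; this is not the authors' implementation.

```python
# Minimal discrete Bayes-filter sketch of POMDP-style localization over
# topological nodes (illustration only; the transition and observation
# numbers are assumptions, not values from the paper).

def bayes_update(belief, transition, likelihood):
    """belief[i] = P(node i); transition[i][j] = P(next=j | cur=i) from
    odometry; likelihood[j] = P(observation | node j) from fingerprint
    matching. Returns the normalized posterior belief."""
    n = len(belief)
    # prediction step: propagate the belief through the motion model
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # correction step: weight by the fingerprint-match likelihood
    posterior = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]

belief = [1.0, 0.0, 0.0]                      # robot starts at node 0
T = [[0.2, 0.8, 0.0],                         # mostly moves to the next node
     [0.0, 0.2, 0.8],
     [0.8, 0.0, 0.2]]
belief = bayes_update(belief, T, likelihood=[0.1, 0.7, 0.2])
# the belief now concentrates on node 1
```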
4. Hippocampal Place Cells as Fingerprints of Places?

The method presented here can efficiently create representations of places in an environment and locate the robot/animat within it. The place cells in the hippocampus accomplish the same task: the activation of a place cell (or, perhaps better, of an assembly of interconnected place cells) indicates that the hippocampus is locating the animal in a certain place. We suggest that the hippocampus may indeed extract place from its sensory input by constructing fingerprints of places similar to those described in this work. Indeed, in environments rich in landmarks or features, the hippocampal cognitive map is dominated by the sensory inputs (see e.g. [19], [9], [2]). Changing the relative position of landmarks can cause a complete change in place cell activity (“remapping”), so that a new set of place cells gets assigned to a given place, just as would be the case for our fingerprint algorithm [6]. Many theoreticians have proposed models of place cells based on visual inputs in which the visual stream is encoded in metric terms, that is, in terms of the distances between the landmarks, and between each landmark and the agent (e.g. [1], [10], [11]). Fingerprint representations are instead based on the relative angular positions of the landmarks from a given point of view, a much simpler and more robust measure, and may be able to explain much of the experimental evidence on place cells, at least in those experiments in which multiple landmarks were available to the animal. Note also that some metric information is contained in the fingerprint representation, through the “empty space” symbols in the fingerprint sequence, and this information may allow the explanation of a larger class of experimental results.
For the brain to perform the fingerprint matching, several building blocks are necessary: first, the identification of the landmarks, which may take place, for example, in the inferotemporal cortex; second, the determination of the relative position of multiple landmarks, which probably takes place in the parietal lobe ([6], [22]). The
hippocampus may gather this information and produce a unitary representation (which would correspond to a fingerprint), presumably in the form of an attractor configuration of the CA3 module (which is very rich in recurrent synaptic connections and is thought to work as an attractor network). At the moment of localization, the current input may be fed into the attractor dynamics and, if the fingerprint matches one of the previously stored ones, the corresponding attractor is recalled. In the case of a failed match, the attractor dynamics will not produce an attractor state, and this fact may be used to signal a novel situation and trigger the plasticity processes that allow the storage of a new memory. This vision of hippocampal space representations highlights the role of the hippocampus as a processor of combinatorial information, whose importance transcends the purely spatial domain. In the case of space computation, the hippocampus would process combinations of landmark identity and relative position information and produce an index which can be attached to a physical location. It is important to remark that in our scheme the place representation does not entail any notion of Euclidean space, contrary to what is hypothesized in [21] and in a number of more recent works (see the review in [23]). In our view, the computation of place from sensory input (through a fingerprint-like procedure) could be complemented in the hippocampus by idiothetic information, which plays an important role especially in conditions where only poor sensory input is available (for example, in the dark) and in disambiguating situations of perceptual aliasing (see e.g. [25]).
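The attractor-recall idea can be illustrated with a toy Hopfield-style network: stored binary patterns (standing in for fingerprint codes) act as attractors, and a corrupted cue settles onto the nearest stored pattern. This is an illustration of the general mechanism only; no claim is made about actual CA3 circuitry or the authors' model.

```python
# Toy Hopfield-style attractor network (illustration of the recall idea,
# not a model of CA3). Patterns are +/-1 vectors; weights are Hebbian.

def train(patterns):
    """Hebbian (outer-product) weights with zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W, cue, steps=10):
    """Synchronous threshold updates toward a (hopefully) fixed point."""
    s = list(cue)
    for _ in range(steps):
        s = [1 if sum(w * x for w, x in zip(row, s)) >= 0 else -1 for row in W]
    return s

stored = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
W = train(stored)
cue = [1, 1, 1, 1, -1, -1, -1, 1]     # first pattern with the last bit flipped
out = recall(W, cue)                  # settles back onto stored[0]
```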
5. Conclusions

We have presented our research framework, underlining the interest of mutual inspiration between robotics, biology and neurophysiology. Our computational model has some foundation in neurobiology, being similar in function to the hippocampus, which plays a crucial role in spatial representation. In order to validate our model experimentally, we tested it with a real autonomous mobile robot. The mobile agent continuously interacted with the environment and thereby accumulated information about its space. Thus, an incremental and dynamic navigation framework was built, allowing the mobile agent to cope with unknown situations. The proposed spatial representation is incrementally learned and based on fingerprints of places, the fingerprint place model being comparable with the place coding model in the rat hippocampus.

Acknowledgments

The authors would like to thank the BIBA IST-2001-32115 EU project, which is funding this research.

References

[1] Arleo, A. and Gerstner, W., Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity, Biological Cybernetics, 83:287-299, 2000.
[2] Battaglia, F.P., Sutherland, G.R. and McNaughton, B.L., Local sensory cues and place cell directionality: additional evidence of prospective coding in the hippocampus, J. Neurosci. 24:4541-4550, 2004.
[3] Castellanos, J.A. and Tardos, J.D., Mobile Robot Localization and Map Building: A Multisensor Fusion Approach, Kluwer, 1999.
[4] Cho, Y.H., Giese, K.P., Tanila, H.T., Silva, A.J. and Eichenbaum, H., Abnormal hippocampal spatial representations in alphaCaMKIIT286A and CREBalphaDelta- mice, Science 279:867-869, 1998.
[5] Choset, H. and Nagatani, K., Topological simultaneous localization and mapping (SLAM): toward exact localization without explicit localization, IEEE Trans. on Robotics and Automation, 17(2), April 2001.
[6] Cressant, A., Muller, R.U. and Poucet, B., Remapping of place cell firing patterns after maze rotations, Exp. Brain Res. 143:470-479, 2002.
[7] Dissanayake, Newman, Clark, Durrant-Whyte and Csorba, A solution to the simultaneous localization and map building (SLAM) problem, IEEE Trans. on Robotics and Automation, 17(3), June 2001.
[8] Franz, M.O., Schölkopf, B., Mallot, H.A. and Bülthoff, H.H., Learning view graphs for robot navigation, Autonomous Robots 5:111-125, 1998.
[9] Gothard, K.M., Skaggs, W.E., Moore, K.M. and McNaughton, B.L., Binding of hippocampal CA1 neural activity to multiple reference frames in a landmark-based navigation task, Journal of Neuroscience 16:823-835, 1996.
[10] Hartley, T., Burgess, N., Lever, C., Cacucci, F. and O'Keefe, J., Modeling place fields in terms of the cortical inputs to the hippocampus, Hippocampus 10:369-379, 2000.
[11] Kali, S. and Dayan, P., The involvement of recurrent connections in area CA3 in establishing the properties of place fields: a model, J. Neurosci. 20:7463-7477, 2000.
[12] Kortenkamp, D. and Weymouth, T., Topological mapping for mobile robots using a combination of sonar and vision sensing, In Proceedings of AAAI-94, Seattle, WA, 1994.
[13] Kuipers, B.J., Modeling spatial knowledge, Cognitive Science, 2:129-153, 1978.
[14] Kuipers, B.J., The cognitive map: could it have been any other way?, In Spatial Orientation: Theory, Research and Application, Picks, H.L. and Acredolo, L.P. (eds.), Plenum Press, New York, 1983.
[15] Leonard, J.J. and Durrant-Whyte, H.F., Directed Sonar Sensing for Mobile Robot Navigation, Kluwer Academic Publishers, Dordrecht, 1992.
[16] Lisien, B., et al., Hierarchical simultaneous localization and mapping, In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, USA, October 27-30, 2003.
[17] Mataric, M.J., Navigating with a rat brain: a neurobiologically-inspired model for robot spatial representation, In: Meyer, J.A. and Wilson, S.W. (Eds.), From Animals to Animats, MIT Press, Cambridge, MA, 1991.
[18] Needleman, S.B. and Wunsch, C.D., A general method applicable to the search for similarities in the amino acid sequences of two proteins, Journal of Molecular Biology 48:442-453, 1970.
[19] O'Keefe, J. and Burgess, N., Geometric determinants of the place fields of hippocampal neurons, Nature 381:425-428, 1996.
[20] O'Keefe, J. and Dostrovsky, J., The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat, Brain Res. 34:171-175, 1971.
[21] O'Keefe, J. and Nadel, L., The Hippocampus as a Cognitive Map, Clarendon Press, Oxford, 1978.
[22] Poucet, B., Lenck-Santini, P.P., Paz-Villagran, V. and Save, E., Place cells, neocortex and spatial navigation: a short review, J. Physiol. Paris 97:537-546, 2003.
[23] Redish, A.D., Beyond the Cognitive Map: From Place Cells to Episodic Memory, MIT Press, Cambridge, MA, 1999.
[24] Se, S., Lowe, D. and Little, J., Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, The International Journal of Robotics Research, 21(8):735-758, 2002.
[25] Skaggs, W.E. and McNaughton, B.L., Spatial firing properties of hippocampal CA1 populations in an environment containing two visually identical regions, Journal of Neuroscience 18:8455-8466, 1998.
[26] Tapus, A., Tomatis, N. and Siegwart, R., Topological global localization and mapping with fingerprints and uncertainty, In Proceedings of ISER, Singapore, June 2004.
[27] Tapus, A., Topological SLAM: Simultaneous Localization and Mapping with Fingerprints of Places, Ph.D. Thesis, Ecole Polytechnique Federale de Lausanne, October 2005.
[28] Thrun, S., Learning metric-topological maps for indoor mobile robot navigation, Artificial Intelligence 99(1):21-71, 1998.
[29] Thrun, S., Probabilistic algorithms in robotics, AI Magazine 21(4):93-109, 2000.
[30] Tomatis, N., Nourbakhsh, I. and Siegwart, R., Hybrid simultaneous localization and map building: a natural integration of topological and metric, Robotics and Autonomous Systems, 44:3-14, 2003.
[31] Tolman, E.C., Cognitive maps in rats and men, Psychological Review, 55:189-208, 1948.
Incremental Reconstruction of Generalized Voronoi Diagrams on Grids Nidhi Kalra 1 , Dave Ferguson and Anthony Stentz The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA Abstract. We present an incremental algorithm for constructing and reconstructing Generalized Voronoi Diagrams (GVDs) on grids. Our algorithm, Dynamic Brushfire, uses techniques from the path planning community to efficiently update GVDs when the underlying environment changes or when new information concerning the environment is received. Dynamic Brushfire is an order of magnitude more efficient than current approaches. In this paper we present the algorithm, compare it to current approaches on several experimental domains involving both simulated and real data, and demonstrate its usefulness for multirobot path planning. Keywords. Voronoi diagrams, incremental algorithms, robot navigation
1. Introduction

Efficient path planning is a fundamental requirement for autonomous mobile robots. From a single robot navigating in a partially-known environment to a team of robots coordinating their movements to achieve a common goal, autonomous systems must generate effective trajectories quickly. The efficiency of path planning algorithms can be greatly affected by the type of representation used to encode the environment. Common representations include uniform and non-uniform grids, probabilistic roadmaps, generalized Voronoi diagrams (GVDs), and exact cell decompositions. GVDs in particular are very useful for extracting environment topologies. A GVD is a roadmap that provides all possible path homotopies in an environment containing obstacle regions. The GVD also provides maximum clearance from these regions. Such a representation has practical application to many robotic domains such as multirobot planning, stealthy navigation, surveillance, and area coverage. Figure 1 presents the GVD of an outdoor environment. GVDs can be used as complete representations of their environments [14,3] and to augment other representations such as probabilistic roadmaps [6,5]. Given a GVD, planning from a start position to a goal position consists of three steps. First, plan from the start to its closest point on the GVD (the access point). Second, plan along the GVD until the point closest to the goal is reached (the departure point). Third, plan from the departure point to the goal. Since the GVD is a graph, any graph search algorithm can be used to plan between the access and departure points, often at a fraction of the computational expense required to plan over the complete environment. 1 Correspondence to: Nidhi Kalra; Robotics Institute; Carnegie Mellon University; 5000 Forbes Ave, Pittsburgh, PA 15232; Tel.: +1 412 268 9923; E-mail:
[email protected]
N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids
115
Figure 1. An outdoor environment traversed by one of our John Deere eGator robotic platforms. (Left) The map generated by the robot (white is traversable area, black is untraversable, gray is unknown, and the grid has 5m spacing). (Center) The GVD constructed from this map by Dynamic Brushfire with C-space obstacle expansion. (Right) The eGator and the environment in which these maps were generated.
In many domains, robots must navigate in environments for which prior information is unavailable, incomplete, or erroneous. To harness the benefits of GVDs for planning in these environments, a robot must update its map when it receives sensor information during execution, reconstruct the GVD, and generate a new plan. GVD reconstruction and replanning must occur frequently because new information is received almost continuously from sensors. However, because new information is usually localized around the robot, much of the previous GVD remains correct and only portions require repair. Unfortunately, as we discuss in Section 2, the existing algorithms for constructing GVDs have no local reconstruction mechanism and cannot take advantage of this feature; instead, they discard the existing GVD and create a new one from scratch. This frequent full reconstruction is both computationally expensive and unnecessary. In this paper, we present the Dynamic Brushfire algorithm for incremental reconstruction of GVDs on grids. We first discuss related techniques for GVD construction. Next, we present our algorithm both intuitively and through pseudocode. We then compare this algorithm to existing algorithms on several common robot navigation scenarios on both real and simulated data. We also demonstrate the usefulness of GVDs and our algorithm in particular for coordinated multirobot path planning.
2. Related Work The Voronoi region of an obstacle O is the set of points whose distance to O is less than or equal to their distance to every other obstacle in the environment. The GVD of an environment is the intersection of two or more Voronoi regions. Each of these intersection points is equidistant from its two (or more) closest obstacles. Several methods exist for
computing the GVD. First, the GVD can be constructed in continuous space. For example, Nageswara et al. [12] represent the obstacles as simple polygons and geometrically construct the GVD, and Choset et al. [4] simulate an agent “tracing out” the GVD as it moves through the environment. Second, the GVD can be constructed in discrete space, e.g. on grids. In this paper we focus on GVDs constructed on grids because of the prevalence of grid-based environment representations in mobile robot navigation. Here, the Voronoi regions and the GVD are computed over the finite set of grid cells. Researchers have used graphics hardware to generate grid-based GVDs very quickly [7]; however, this is often infeasible for mobile robot platforms with limited on-board hardware. Alternatively, fast hardware-independent algorithms exist for constructing GVDs on low-dimensional grids [1,2,13]. These algorithms require as input a binary grid and a mapping from each occupied cell in the grid to the obstacle to which it belongs (the latter information helps determine the boundaries between different Voronoi regions). In general, these algorithms scan the grid and compute for each cell its closest obstacle and its distance to that obstacle; those cells that are equidistant from two or more obstacles are included in the GVD. These algorithms run in linear time O(mn), where m and n are the dimensions of the grid. Thus, the computation required is a function of the resolution of the representation rather than of the complexity of the environment. The first of these is the well-known Brushfire algorithm [1]. Brushfire is analogous to Dijkstra's algorithm for path planning in that it processes cells in an OPEN priority queue, where decreasing priority maps to increasing distance from an obstacle. Initially, each obstacle cell in the environment is placed on the queue with a priority of 0 (the cell's distance to the nearest obstacle).
Then, until the queue is empty, the highest-priority cell is removed, its neighboring cells' distances are computed, and any cell c whose distance distc has been lowered is placed on the queue with priority distc. The distance distc of each cell is approximated from the distances of its neighbors:

distc = min_{a ∈ Adj(c)} [distance(c, a) + dista]    (1)
where Adj(c) is the set of cells adjacent to c (usually 4- or 8-connected) and distance(c, a) is the distance between c and a (usually Manhattan or Euclidean distance). Brushfire makes only one pass through the grid, but has the added expense of maintaining a priority queue, whose operations usually require O(log x) time, where x is the number of elements in the queue. The quasi-Euclidean distance transform algorithm developed by Rosenfeld et al. [13] makes two sequential passes through the grid, from top to bottom and then from bottom to top. For each cell encountered, it simply performs an 8-connected search of neighboring cells to determine the cell's distance and nearest obstacle using Eq. 1. The Euclidean distance transform algorithm by Breu et al. [2] constructs the GVD by determining which Voronoi regions intersect each row in the grid. For each cell in a particular row, it computes the exact distance to the obstacle that is closest to it. Although these algorithms are fast, they provide no mechanism for incremental reconstruction. We compare their performance with Dynamic Brushfire on several example robotics scenarios in Section 4.
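A minimal Brushfire-style sketch follows: a priority-queue distance transform using Eq. 1 with 4-connectivity and unit (Manhattan) step costs, followed by marking as GVD cells those whose neighbors belong to a different Voronoi region. This is an illustration under those assumptions, not the paper's code.

```python
# Minimal Brushfire-style sketch (illustration, not the paper's code):
# per-cell distance and nearest-obstacle id via a priority queue (Eq. 1,
# 4-connected, unit step costs), then GVD cells marked where two different
# Voronoi regions meet.
import heapq

def brushfire(rows, cols, obstacles):
    """obstacles: dict {(r, c): obstacle_id}. Returns (dist, obst, gvd)."""
    INF = float('inf')
    dist = {(r, c): INF for r in range(rows) for c in range(cols)}
    obst = {cell: None for cell in dist}
    heap = []
    for cell, oid in obstacles.items():
        dist[cell], obst[cell] = 0, oid
        heapq.heappush(heap, (0, cell))
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue                           # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (nr, nc) in dist and d + 1 < dist[(nr, nc)]:
                dist[(nr, nc)] = d + 1         # Eq. 1 with distance(c, a) = 1
                obst[(nr, nc)] = obst[(r, c)]
                heapq.heappush(heap, (d + 1, (nr, nc)))
    # a cell joins the GVD if a 4-neighbour belongs to a different region
    gvd = {cell for cell in dist
           if any((cell[0] + dr, cell[1] + dc) in obst
                  and obst[(cell[0] + dr, cell[1] + dc)] not in (None, obst[cell])
                  for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))}
    return dist, obst, gvd

dist, obst, gvd = brushfire(3, 5, {(1, 0): 'A', (1, 4): 'B'})
```

With the two obstacles on opposite sides, the GVD forms along the boundary between their Voronoi regions in the middle columns.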
Figure 2. A sequence of distance propagations when obstacles (red/grey dots) are added and removed. (Top) A new obstacle cell in the center of the environment produces a wave of overconsistent states (in bold black) which terminates at the boundary of the new Voronoi region (in gray). (Bottom) The center cell is removed and produces a wave of underconsistent states (in bold blue/light gray) that pushes out to the corners of the old Voronoi region while an overconsistent propagation (in bold black) pulls back inwards to the boundaries of the new Voronoi regions.
3. The Dynamic Brushfire Algorithm

Just as Brushfire is analogous to Dijkstra's algorithm for planning, Dynamic Brushfire is analogous to the unfocused D* family of efficient replanning algorithms [15,10]. When new information is received concerning the environment (e.g. from a robot's sensors), these algorithms only propagate the new information to portions of the map that could be affected. Thus, they avoid unnecessarily re-processing the entire state space. In the grid-based GVD context, new information consists of obstacles being asynchronously added and removed from the environment, in whole or in part. When an obstacle cell is removed, the distances increase for exactly those cells that were closer to it than to any other obstacle cell. When an obstacle cell is added, the distances decrease for exactly those cells that are now closer to it than to any other obstacle cell. Dynamic Brushfire is efficient because it determines these two sets and limits processing to only the cells within them.

3.1. Algorithm Intuition

Dynamic Brushfire requires the same input as the other approaches discussed in Section 2: a grid and a mapping from obstacle cells to obstacles. For each cell s it maintains the obstacle obst_s to which it is currently closest. It also maintains a distance dist_s to its closest obstacle from the previous GVD construction and a one-step lookahead distance distnew_s to the closest obstacle given the changes that have occurred since that construction. A cell is consistent if dist_s = distnew_s, overconsistent if dist_s > distnew_s, or underconsistent if dist_s < distnew_s. Like A*, D*, and Brushfire, Dynamic Brushfire keeps a priority queue OPEN of the inconsistent cells to propagate changes. A cell s's priority is always min(dist_s, distnew_s), and cells are popped in order of increasing priority value. When a cell is popped from the OPEN queue, its new distance is propagated to its adjacent cells, and any newly inconsistent cells are put on the OPEN queue.
Thus, the propagation emanates from the source of the change and terminates when the change does not affect any more cells.
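The cell bookkeeping just described can be sketched in a few lines. Names are illustrative stand-ins, not the paper's code.

```python
# Tiny sketch of the per-cell consistency bookkeeping (names illustrative):
# each cell carries dist (value from the previous GVD construction) and
# dist_new (one-step lookahead); its OPEN-queue priority is
# min(dist, dist_new), so both kinds of inconsistency are processed in
# order of increasing distance.

def classify(dist, dist_new):
    """Return the consistency state of a cell."""
    if dist == dist_new:
        return 'consistent'
    return 'overconsistent' if dist > dist_new else 'underconsistent'

def priority(dist, dist_new):
    """Priority used when inserting the cell into the OPEN queue."""
    return min(dist, dist_new)

INF = float('inf')
state = classify(INF, 0)      # a freshly added obstacle cell: overconsistent
p = priority(INF, 0)          # enters the queue with priority 0
```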
When a cell is set to obstacle, it reduces the new minimum distance of adjacent cells, which propagate this reduction to their adjacent cells, and so on. This creates an overconsistent sweep emanating from the original obstacle cell (Figure 2, panels 1 to 3). An overconsistent sweep terminates when cells are encountered that are equally close or closer to another obstacle and thus cannot be made overconsistent. These cells lie on the boundary between Voronoi regions and will be part of the new GVD (Figure 2, panels 4 and 5). When an obstacle cell is removed, all cells that previously used it to compute their distances are invalid and must recompute their distances. This propagation occurs in two sweeps. First, an underconsistent propagation sweeps out from the original cell and resets the affected cells (Figure 2, panels 6 and 7). This sweep terminates when cells are encountered that are closer to another obstacle and cannot be made underconsistent. Thus, at most those cells in the removed obstacle cell's Voronoi region are invalidated. Then, an overconsistent propagation sweeps inwards and uses the valid cells beyond this region to recompute new values for the invalid cells (Figure 2, panels 8 and 9).

3.2. Algorithm Pseudocode

Initialize()
1  OPEN = TIES = ∅
2  foreach cell s
3    dist_s = distnew_s = ∞
4    parent_s = tie_s = ∅
5    obst_s = ∅

SetObstacle(o, O)
1  distnew_o = 0
2  obst_o = O
3  parent_o = o
4  valid_O = TRUE
5  insert(OPEN, o, distnew_o)

RemoveObstacle(o, O)
1  distnew_o = ∞
2  obst_o = ∅
3  if O - o = ∅
4    valid_O = FALSE
5  insert(OPEN, o, dist_o)
Figure 3. Pseudocode for initializing the grid and adding and removing obstacle cells.
Figures 3, 4, and 5 present pseudocode for the Dynamic Brushfire algorithm. In addition to dist_s, distnew_s, and obst_s, each cell s tracks the cell parent_s from which its distance is computed and a cell tie_s that is adjacent to s and equidistant to a different obstacle. Additionally, an obstacle is invalid (valid_O = FALSE) when it is entirely removed from the environment. Dynamic Brushfire first initializes all distances to infinity and pointers to unknown. Then, obstacles are added and removed using SetObstacle and RemoveObstacle. When a cell o is added to obstacle O (SetObstacle(o, O)), its distance becomes zero, its parent cell is itself, and it is placed on the queue as an overconsistent cell. When a cell o is removed from obstacle O (RemoveObstacle(o, O)), its properties are reinitialized and the cell is placed on the queue as an underconsistent cell. The GVD is reconstructed by calling the function RebuildGVD, which removes cells from the OPEN queue for processing. When an overconsistent cell is removed (lines 2-4), its lookahead distance is correct, so its current distance is updated. When an underconsistent cell is removed (lines 7-10), its old distance is reset so that it can subsequently be processed as an overconsistent cell. The GVD is marked in two steps. First, when a cell s becomes consistent (line 3 in RebuildGVD), we check if it should be considered for inclusion in the GVD. In the function ConsiderForGVD, if a cell s has an adjacent cell n in a different Voronoi region, s is placed on the TIES list of possible GVD cells and n is recorded for later use. Thus, if any two adjacent cells lie in different Voronoi regions, at least one of them will be placed on the TIES list. Once the OPEN list is empty, the TIES list is examined in function ConstructGVD to determine which cells belong on the GVD using the same criterion.
N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids
RebuildGVD()
1  while s = remove(OPEN)
2    if dist_new_s < dist_s
3      dist_s = dist_new_s
4      ProcessLower(s)
5      ConsiderForGVD(s)
6    else
7      dist_s = ∞
8      if dist_new_s ≠ dist_s
9        insert(OPEN, s, dist_new_s)
10     ProcessRaise(s)
11 ConstructGVD()

ConsiderForGVD(s)
1  foreach n ∈ Adj(s)
2    if obst_s ≠ obst_n
3      tie_s = n
4      insert(TIES, s)

ConstructGVD()
1  while s = remove(TIES)
2    foreach n ∈ Adj(s)
3      if obst_n ≠ obst_s
4        n and s are on the GVD
5        tie_s = n ; tie_n = s
6      else
7        n and s are not on the GVD
8        tie_s = ∅ ; tie_n = ∅
Figure 4. Pseudocode for rebuilding the GVD. Function RebuildGVD removes cells from the OPEN queue to be processed, and functions ConsiderForGVD and ConstructGVD mark the GVD.

ProcessLower(s)
1  foreach n ∈ Adj(s)
2    if tie_n = s
3      insert(TIES, n)
4    if dist_new_n > dist_new_s
5      d = distance(n, s) + dist_new_s
6      if d < dist_new_n
7        dist_new_n = d
8        parent_n = s
9        obst_n = obst_s
10       insert(OPEN, n, dist_new_n)

ProcessRaise(s)
1  foreach n ∈ Adj(s)
2    if tie_n = s
3      insert(TIES, n)
4    if parent_n = s
5      dist_old_n = dist_new_n ; dist_new_n = ∞
6      obst_old_n = obst_n
7      foreach a ∈ Adj(n) s.t. obst_a is valid
8        d = distance(n, a) + dist_new_a
9        if d < dist_new_n
10         dist_new_n = d ; parent_n = a
11         obst_n = obst_a
12     if dist_new_n ≠ dist_old_n or obst_n ≠ obst_old_n
13       insert(OPEN, n, min(dist_new_n, dist_n))

Figure 5. Pseudocode for propagating overconsistencies (ProcessLower) and underconsistencies (ProcessRaise).
The functions ProcessLower and ProcessRaise both update cells’ distances and place inconsistent cells on the queue. ProcessLower propagates overconsistencies: a cell examines its neighbors to see if it can lower their distances. Any changed neighboring cell has its lookahead distance, parent, and closest obstacle updated and is placed on the queue. ProcessRaise propagates underconsistencies: a cell examines its neighbors to see if any used its old distance to compute their own (line 4). For each such cell, the lookahead distance is reset and then recomputed using current values (lines 7-11). Any changed cell is placed back on the OPEN list. Both ProcessLower and ProcessRaise also find neighboring cells that were marked on the GVD because their Voronoi region was different from the current cell’s; such cells are reinserted into the TIES list to be reexamined (lines 2-3). The termination and correctness of Dynamic Brushfire follow from the proof of D* Lite [11] and are shown in [8].
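The lower-propagation half of this scheme can be sketched as a plain, non-incremental brushfire pass (an illustrative sketch in Python, not the authors’ implementation; all names are ours). Obstacle cells are seeded on a priority queue with distance zero and distances are lowered outward, which corresponds to the overconsistent (ProcessLower) case; the incremental bookkeeping (lookahead distances, tie pointers, raise propagation) is omitted.

```python
import heapq
import math

def neighbours(c, width, height):
    """8-connected in-bounds neighbours of a cell."""
    x, y = c
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < width and 0 <= y + dy < height]

def brushfire(width, height, obstacles):
    """For every grid cell, compute the distance to (and the label of)
    the nearest obstacle by lowering distances outward from the seeds,
    i.e. only the overconsistent / ProcessLower case of the algorithm.
    `obstacles` maps an obstacle label to a list of its cells."""
    dist = {(x, y): math.inf for x in range(width) for y in range(height)}
    obst = {}                          # nearest-obstacle label per cell
    open_q = []                        # priority queue keyed on distance
    for label, cells in obstacles.items():
        for c in cells:
            dist[c] = 0.0
            obst[c] = label
            heapq.heappush(open_q, (0.0, c))
    while open_q:
        d, c = heapq.heappop(open_q)
        if d > dist[c]:                # stale queue entry
            continue
        for n in neighbours(c, width, height):
            nd = d + math.hypot(n[0] - c[0], n[1] - c[1])
            if nd < dist[n]:           # lower the neighbour's distance
                dist[n] = nd
                obst[n] = obst[c]
                heapq.heappush(open_q, (nd, n))
    # GVD marking uses the same criterion as ConsiderForGVD/ConstructGVD:
    # a cell is on the GVD if some neighbour is owned by a different obstacle.
    gvd = {c for c in dist for n in neighbours(c, width, height)
           if c in obst and n in obst and obst[n] != obst[c]}
    return dist, obst, gvd
```

Calling `brushfire(5, 1, {0: [(0, 0)], 1: [(4, 0)]})` yields the equidistant boundary cells between the two obstacles as the GVD of the toy strip.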
Figure 6. The maps used to generate results. From left to right, the complete correct map, the map with random errors, the map at lower resolution, and an illustration of an Unmanned Aerial Vehicle (UAV) updating the GVD to the correct map from the low resolution map.
4. Experiments and Results

In this section we present and discuss statistical comparisons between Dynamic Brushfire and competing algorithms on several robotic scenarios. We also present results from applying Dynamic Brushfire to real robot data and to a multirobot planning problem.

4.1. Comparison to Other Algorithms

We compared Dynamic Brushfire to the Brushfire, Euclidean Distance Transform, and quasi-Euclidean Distance Transform algorithms discussed in Section 2 on four scenarios. The first scenario is common to many domains and involves constructing the GVD once given a static environment. We use 200×200 cell environments that are 20% occupied by randomly placed obstacles ranging in size from 5×5 to 20×20 cells (see Figure 6 (a)). The remaining three scenarios are unique to and occur often in robotics: they require traversing the environment and repairing the GVD as new information is gathered from sensors. For this, we simulate an Unmanned Aerial Vehicle (UAV) sweeping this environment with a 10 cell omni-directional sensor and require that the UAV repair the GVD after every 10 cells of traverse. In the first of these three scenarios, the UAV has no prior map. In the second, it must repair the GVD given an erroneous prior map in which each obstacle has a 30% chance of having incorrect dimensions, of being placed incorrectly, or of being absent. An additional 10% of obstacles are randomly added (see Figure 6 (b)). Finally, the third involves repairing a GVD given a 20×20 cell low-resolution prior map (see Figure 6 (c)). Figure 6 (d) illustrates the UAV’s traverse as it incorporates new information into a prior low-resolution map. We ran each algorithm on each scenario on 100 different maps. The results from these experiments are shown in Table 1 and in Figure 7. The performance difference between Dynamic Brushfire and the other algorithms depends primarily on how much of the old computation can be preserved and how often repair must occur.
This is true of most incremental repair algorithms (e.g., D* and D* Lite). Thus, in initial construction of the GVD (where no prior computation exists), the extra processing that enables repair makes Dynamic Brushfire competitive with, but slightly slower than, the other algorithms. In the other three scenarios, Dynamic Brushfire outperforms its competitors by an order of magnitude. This improvement is most pronounced when updating from the low resolution and erroneous maps (Dynamic Brushfire is 20 and 17 times faster, respectively, than its closest competitor) because information about the environment is gained at every step and is localized around the robot. When constructing the GVD without a prior map,
Figure 7. A graph of the results from Table 1 on the three incremental update scenarios. From left to right in each group: Brushfire, EDT, Q-EDT, and Dynamic Brushfire.

Table 1. A comparison of the performances of the four approaches on four GVD construction and repair scenarios: initial construction of a complete map (Initial), incremental construction without a prior map (Blank), incremental construction with an erroneous prior map (Error), and incremental construction with a low resolution prior map (Lowres). Each scenario was tested 100 times for each algorithm; for each scenario, μ is the mean total time taken in seconds and σ its standard deviation.

Algorithm          | Initial μ / σ | Blank μ / σ    | Error μ / σ    | Lowres μ / σ
Brushfire          | 0.158 / 0.006 | 20.075 / 1.091 | 10.467 / 2.382 | 20.666 / 1.393
Q-EDT              | 0.091 / 0.003 | 12.161 / 0.061 |  5.662 / 1.256 | 11.356 / 0.065
EDT                | 0.104 / 0.003 | 11.264 / 0.078 |  7.227 / 1.669 | 15.212 / 1.276
Dynamic Brushfire  | 0.162 / 0.005 |  1.887 / 0.174 |  0.328 / 0.051 |  0.551 / 0.068
large portions of the map have yet to be observed and the GVD is sparse. Thus, changes must be propagated over larger areas than in the other cases. As a result, the improvement is significant (Dynamic Brushfire is 6 times faster) but smaller than in the other scenarios.

4.2. GVD Construction on Real Data

We have also tested Dynamic Brushfire on real environmental data obtained from traverses by a John Deere e-Gator robotic platform equipped with a SICK LMS laser in an outdoor wooded environment (Figure 1, top and bottom right). In one experiment, our robot explored a 75×40 meter area (Figure 1, left). We generated a GVD of the final data (Figure 1, center). We also repaired the GVD after each sensor scan from the laser. The roughly 2,400 cell updates over 90 repair episodes in a 375×200 cell environment took Dynamic Brushfire 2.7 seconds. For comparison, the same experiment took Brushfire 31.1 seconds. In a second experiment, we gathered sensor data from a traverse by the e-Gator platform in a similar but larger environment and then returned three weeks later for a second traverse over the same area. The first traverse provided a prior map for the second traverse. Here, repair on a prior map of size 680×480 cells with roughly 2,200 cell updates over 156 repair episodes took Dynamic Brushfire 11.4 seconds. For comparison, the same experiment took Brushfire 250.6 seconds.

4.3. Application to Multirobot Path Planning

This work was originally motivated by the need for efficient path planning for multirobot teams. Specifically, we are interested in solving the constrained exploration problem, in which a team of agents must navigate through an environment while maintaining communication constraints. An instance of this problem is presented in Figure 8.

Figure 8. The path of two robots successfully completing a constrained exploration task from the left to the right of the environment. The paths are in bold blue/gray and bold black, obstacles are in light gray, and the GVD generated on this environment is in thin light gray. The coverage areas of the four communication towers are marked by dashed circles.

Here, two robots must navigate through the environment to reach their goal locations, from the left to the right. Each robot must also remain in contact with one of the communication towers at all times, either directly or indirectly via its teammate. Because the coverage areas of some neighboring communication towers do not overlap, the robots must coordinate to hop over the gaps. In such domains, where teammates’ actions are highly interdependent, centrally planning for parts of the team can be beneficial [9]. However, algorithms for computing centralized plans typically have complexity exponential in the number of robots. While such planning might normally be intractable, it can be made feasible by reducing the search space (e.g., by planning only on the GVD). We compared planning on the GVD to planning over the entire space. In this instance, A* took only 0.36 seconds to determine a path when the search space was limited to the GVD, while it took 94.9 seconds when searching over the entire grid. We then repeated the same task but this time gave the robots an erroneous prior map of the environment. They constructed the GVD of the current map, planned a path, executed a portion of that plan while improving the map with sensor information, and then repaired the GVD and replanned. Replanning after every five steps resulted in 74 planning and GVD construction episodes; the total planning time was 12.2 seconds and the GVD repair time using Dynamic Brushfire was 4.5 seconds. For comparison, GVD repair using Brushfire took 132.3 seconds.
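The search-space reduction can be illustrated with a small sketch (the names and the 4-connected grid model are our assumptions, not the authors’ planner): an ordinary A* that only expands cells from a supplied `allowed` set, so that passing the GVD cells instead of the full grid directly cuts the number of expansions.

```python
import heapq
import itertools

def astar(allowed, start, goal):
    """A* over a 4-connected grid, restricted to the `allowed` cell set.
    Returns (path, number of expanded cells), or (None, expansions)."""
    def h(c):                                    # Manhattan-distance heuristic
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    tie = itertools.count()                      # FIFO tie-breaking in the heap
    open_q = [(h(start), next(tie), 0, start, None)]
    parents = {}                                 # expanded cell -> predecessor
    g_cost = {start: 0}
    while open_q:
        _, _, g, cell, parent = heapq.heappop(open_q)
        if cell in parents:                      # already expanded
            continue
        parents[cell] = parent
        if cell == goal:                         # reconstruct path backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1], len(parents)
        x, y = cell
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if n in allowed and g + 1 < g_cost.get(n, float("inf")):
                g_cost[n] = g + 1
                heapq.heappush(open_q,
                               (g + 1 + h(n), next(tie), g + 1, n, cell))
    return None, len(parents)
```

Passing a thin corridor of cells (standing in for the GVD) instead of the full grid yields the same optimal path length with far fewer expansions, which is the effect behind the 0.36 s versus 94.9 s comparison above.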
5. Conclusions

In this paper we have presented Dynamic Brushfire, a new algorithm for efficient, incremental reconstruction of GVDs on grids. Dynamic Brushfire operates analogously to the D* family of algorithms for path replanning: it restricts the propagation of changes in the environment to only those areas that could be affected. We have compared our algorithm to several leading approaches for constructing GVDs and found our algorithm to be significantly more efficient, particularly in typical robotics applications. We are currently using this algorithm for multirobot path planning in unknown and uncertain environments.
6. Acknowledgments

This work was sponsored by the U.S. Army Research Laboratory, under contract “Robotics Collaborative Technology Alliance” (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies or endorsements of the U.S. Government. Dave Ferguson is supported in part by an NSF Graduate Research Fellowship.

References

[1] Jerome Barraquand and Jean-Claude Latombe. Robot motion planning: A distributed representation approach. Technical Report STAN-CS-89-1257, Computer Science Department, Stanford University, May 1989.
[2] Heinz Breu, Joseph Gil, David Kirkpatrick, and Michael Werman. Linear time Euclidean distance transform algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:529–533, May 1995.
[3] Howie Choset and Joel Burdick. Sensor based planning, part I: The generalized Voronoi graph. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1995.
[4] Howie Choset, Sean Walker, Kunnayut Eiamsa-Ard, and Joel Burdick. Sensor-based exploration: Incremental construction of the hierarchical generalized Voronoi graph. The International Journal of Robotics Research, 19(2):126–148, February 2000.
[5] Mark Foskey, Maxim Garber, Ming Lin, and Dinesh Manocha. A Voronoi-based hybrid motion planner. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2001.
[6] Leonidas Guibas, Christopher Holleman, and Lydia Kavraki. A probabilistic roadmap planner for flexible objects with a workspace medial-axis-based sampling approach. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1999.
[7] Kenneth Hoff, Tim Culver, John Keyser, Ming Lin, and Dinesh Manocha. Fast computation of generalized Voronoi diagrams using graphics hardware. In SIGGRAPH ’99: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 1999.
[8] Nidhi Kalra, Dave Ferguson, and Anthony Stentz. The dynamic brushfire algorithm. Technical Report CMU-RI-TR-05-37, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, September 2005.
[9] Nidhi Kalra, Dave Ferguson, and Anthony Stentz. Hoplites: A market-based framework for complex tight coordination in multi-robot teams. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2005.
[10] Sven Koenig and Maxim Likhachev. D* Lite. In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2002.
[11] Maxim Likhachev and Sven Koenig. Lifelong Planning A* and Dynamic A* Lite: The proofs. Technical report, College of Computing, Georgia Institute of Technology, 2001.
[12] Nageswara Rao, Neal Stoltzfus, and S. Sitharama Iyengar. A “retraction” method for learned navigation in unknown terrains for a circular robot. IEEE Transactions on Robotics and Automation, 7(5), October 1991.
[13] Azriel Rosenfeld and John Pfaltz. Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13(4), 1966.
[14] Peter Forbes Rowat. Representing spatial experience and solving spatial problems in a simulated robot environment. PhD thesis, University of British Columbia, 1979.
[15] Anthony Stentz. Optimal and efficient path planning for partially-known environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1994.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Learning from Nature to Build Intelligent Autonomous Robots

Rainer BISCHOFF and Volker GRAEFE
Intelligent Robots Laboratory, Bundeswehr University Munich, Germany

Abstract. Information processing within autonomous robots should follow a biomimetic approach. In contrast to traditional approaches that make intensive use of accurate measurements, numerical models and control theory, the proposed biomimetic approach favors the concepts of perception, situation, skill and behavior – concepts that are used to describe human and animal behavior as well. Sensing should primarily be based on those senses that have proved their effectiveness in nature, such as vision, tactile sensing and hearing. Furthermore, human-robot communication should mimic dialogues between humans. It should be situation-dependent, multimodal and primarily based on spoken natural language and gestures. Applying these biomimetic concepts to the design of our robots led to adaptable, dependable and human-friendly behavior, which was proved in several short- and long-term experiments.

Keywords. Bionics, robot system architecture, situation, skill, behavior
1. Introduction

Nature has created highly complex, efficient and dependable systems in the form of organisms since the very beginning of life on earth. Design and function of organisms have been optimized under evolutionary pressure over billions of years, a small step at a time. It is, therefore, an attractive idea to apply nature’s solutions to today’s engineering problems and, specifically, to the creation of robot intelligence. A first step is to base a robot’s sensor modalities, world modeling, control and communication with humans primarily on those principles that proved their effectiveness in nature. This biomimetic approach to the realization of robot intelligence differs from the traditional one in many respects. Table 1 gives an overview.

2. Human information processing

Especially if a robot, such as a personal robot, is supposed to work closely together with humans in human-populated environments, its information processing design might benefit from understanding the human one and from employing similar concepts. An overall qualitative model of human performance was proposed by Rasmussen [1]. In his model he classified three typical levels of performance of skilled human operators: skill-, rule-, and knowledge-based performance (Figure 1). According to Rasmussen, a skill-based behavior represents „sensory-motor performance during acts or activities which [...] take place without conscious control as smooth, automated, and highly integrated patterns of behavior”. Performance is mostly based on feedforward control which depends upon a very flexible and efficient dynamic internal world model. Handwriting and sports are examples of such well coordinated rapid movements.
R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots
Table 1. Summary of biomimetic vs. traditional approaches to robot intelligence

Sensor modalities
  Traditional: often only single modality; often range sensors: laser range finder, sonar, radar, depth from stereo vision, motion, …
  Biomimetic: multi-modal, combination of exteroceptors (vision, audition, touch, olfaction, gustation), proprioceptors (angle encoders, strain gauges, bodily acceleration sensors, kinesthetic sensors, …) and interoceptors (internal temperature measurements, battery gauging, …)

Modeling
  Traditional: seemingly suitable for constructing numerical models, e.g., 3-D surface reconstruction; global coordinates; accurate geometrical maps; detailed numerical models of the robot and the external world; complete knowledge of all robot and world parameters; accurate and complete calibration of sensors, actuators, kinematics etc.
  Biomimetic: suitable for situation assessment; local coordinates (if any); attributed topological maps; qualitative models of relevant aspects of a vaguely known and ever-changing world; largely qualitative internal model of the robot’s situation; no or intrinsic calibration

Robot control
  Traditional: accurate measurements; control theory
  Biomimetic: largely qualitative perception; highly adaptive calibration-free control; situation-dependent behavior selection and execution

Communication
  Traditional: exact trajectory teaching; artificial, well-structured (usually written) command language; commands and status reports based on coordinates
  Biomimetic: situation- and context-dependent dialogues resembling inter-personal communication; spoken natural language augmented by gestures and visual and haptic events; commands and status reports based on current situation and perceivable objects
Fig. 1. Human information processing. Simplified model illustrating the three levels of performance of skilled human operators (Rasmussen [1]).
At the next level – rule-based behavior – human performance is goal-oriented, but structured through stored rules (procedures). These rules may have been obtained through instruction, derived from experience, or deduced by conscious problem-solving. Rules are typically defined as „if-then-else” clauses, and operators experience them as stereotyped acts. Rasmussen distinguishes between skill- and rule-based behavior by the level of training and attention a person requires to perform a certain task. Whereas skill-based performance does not necessitate a person’s conscious attention, rule-based performance is generally based on explicit know-how. At the highest conceptual level, human performance is goal-controlled and knowledge-based. It requires a mental model of the system or task that enables the person to develop and evaluate different plans for reaching a certain goal. Rasmussen’s model is well suited to describe a typical service task, such as serving a drink, in terms of the required knowledge, rules, skills, symbols, signs and signals. It is interesting to see that both a human and a robotic servant could apply the same model to decompose a task into a set of elementary action patterns. Supposing that the servant knows how to serve a drink, he would probably apply the following rules:

grasp bottle, grasp glass, fill glass from bottle, approach table, place glass onto table

Each rule would in turn be composed of elementary sensorimotor skills, e.g.:

grasp bottle := locate object bottle, open hand, move hand towards object, close hand around object

Each sensorimotor skill would require signals (i.e., continuous sensory input) or a sign (to start some automated sensorimotor skill). The completion of a rule or sensorimotor skill is indicated by a sign representing a specific system state.
The rule „grasp bottle” could thus be terminated by the sign „hand closed around the object”; the termination sign of the rule „approach table” could be, e.g., „docking position reached”. Knowledge-based operations become necessary when a set of rules does not yet exist for a specific task or when an unfamiliar situation occurs. For example, imagine that the servant experiences unexpected difficulties in opening a closed bottle. Handling this exception requires devising a plan, and probably a set of rules, for removing the cork (or any other stopper) before the glass can be filled. On this knowledge-based level, sensory inputs are interpreted as symbols that are related to concepts and functional properties, i.e., problem solving requires knowledge about the process („when the stopper has been removed, the filling can begin”).
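The rule/skill decomposition described above can be written down as a small sketch. This is a hypothetical structure, not the authors’ implementation: only the „grasp bottle” skill sequence and the two termination signs come from the text; the remaining rule decompositions are invented for illustration.

```python
# Each rule is an ordered list of sensorimotor skills; finishing a rule
# emits its termination sign. Only "grasp bottle" and the two signs below
# are taken from the text; the other entries are illustrative guesses.

RULES = {
    "grasp bottle": ["locate object bottle", "open hand",
                     "move hand towards object", "close hand around object"],
    "grasp glass": ["locate object glass", "open hand",
                    "move hand towards object", "close hand around object"],
    "fill glass from bottle": ["align bottle over glass", "pour"],
    "approach table": ["locate table", "move towards table"],
    "place glass onto table": ["lower glass", "open hand"],
}

TERMINATION_SIGNS = {
    "grasp bottle": "hand closed around the object",
    "approach table": "docking position reached",
}

def serve_drink(run_skill):
    """Execute the rule sequence for the 'serve a drink' task; `run_skill`
    stands in for a sensorimotor skill running until its sign appears.
    Returns the termination sign observed after each rule."""
    signs = []
    for rule in ("grasp bottle", "grasp glass", "fill glass from bottle",
                 "approach table", "place glass onto table"):
        for skill in RULES[rule]:
            run_skill(skill)
        signs.append(TERMINATION_SIGNS.get(rule, rule + " completed"))
    return signs
```

Passing a logging function as `run_skill` shows the skill stream a rule-based executive would issue for the task.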
3. Robot Information Processing

So far, basically three paradigms for organizing robot information processing have emerged: hierarchical (sequential), reactive (parallel), and hybrid. As a combination of the hierarchical and reactive paradigms, the hybrid approach is meant to combine the best of both worlds. It allows the implementation of both purely reactive behaviors (= reflexive behaviors) and more cognitively challenging (computationally more expensive) functions to a varying degree. Although quite a number of hybrid architectures have been developed (see [2] for an overview), it is still an open research question how to organize and coordinate the reactive and deliberative parts of the system. As part of our solution to this question, we introduced a situation-oriented skill-based system architecture [3] that – according to our biomimetic approach – favors perception over measurement and situation-based calibration-free control over accurate modeling and classical control
theory. This architecture is based on the concepts of situation, perception, behavior and skill, which are briefly explained in the sequel and illustrated in Figure 2. According to the classical approach, robot control is model-based and depends on a continuous stream of accurately measured data. Numerical models of the kinematics and dynamics of the robot and of the external objects that the robot should interact with, as well as quantitative sensor models, are the basis for controlling the robot’s motions. The main advantage of model-based control is that it lends itself to the application of classical control theory and, thus, may be considered a straight-forward approach. The weak point of the approach is that it breaks down when there is no accurate quantitative agreement between reality and the models, which is all too often the case. Organisms, on the other hand, have the ability to continuously cope with, and to survive in, the complex, dynamic and unpredictable real world. They are robust and adapt easily to changes of their own conditions and of the environment. They never need any calibration, and they normally do not know the values of any parameters related to the characteristics of their „sensors” or „actuators”. Obviously, they do not suffer from the shortcomings of model-based control which leads us to the assumption that they use something other than measurements and numerical models for controlling their behaviors. Perhaps their behavior control is based on a qualitative and holistic assessment of situations and the selection of actions to execute from an existing repertoire on that basis. We define a robot’s (internal) situation as the set of all decisive factors that should ideally be considered by a robot in selecting the correct action at a given moment. 
These decisive factors include:

- the goals of the robot, i.e., permanent goals, usually internal goals such as survival and the satisfaction of desires, intentions and plans, and transient goals emerging from the actual mission;
- the state of the robot (state of motion, state of actuators and sensors, presently executed behaviors, focus of attention of the perceptual system, ...);
- the state of the robot’s environment, i.e., perceivable objects in the environment and their suspected or recognized states, as well as static characteristics of the environment, even if they currently cannot be perceived by the robot’s sensors;
- the repertoire of available skills and knowledge of the robot’s abilities to change the present situation in a desired way by executing appropriate skills.

The main basis for the recognition of situations are perceptions, which integrate sensory impressions of events in the external world. Perceptions help the robot to become aware of something through its senses. Generally speaking, perception plays a similar role in the biomimetic approach as measurement does in the traditional one. In the proposed system architecture, situations provide the framework for the set of robot skills. A skill is a goal-directed, well organized ability that is inbuilt, can be acquired and improved through learning, and is performed with economy of effort (derived from psychological literature [4]). Whereas a skill describes functional underpinnings of a robot and is implemented as a specific software module, a behavior is an observable activity. Skills may thus be the basis for behaviors. It should be noted that a robot may show a certain behavior not only based on the execution of skills, but also based on purely reactive responses to external or internal stimulation, e.g., reflexes. Figure 2 (left) illustrates our definition of the term „situation” by embedding it in the perception-action cycle of a situation-oriented skill-based robot.
As defined above, the robot’s image of the situation emerges at any given moment by combining internal and external goals, knowledge and percepts.
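As an illustration only (the field names are ours, not the authors’), the decisive factors enumerated above can be collected in a simple record that the robot recombines at every moment:

```python
from dataclasses import dataclass, field

@dataclass
class Situation:
    """Minimal sketch of the decisive factors listed in the text."""
    permanent_goals: list = field(default_factory=list)  # e.g. survival, desires
    transient_goals: list = field(default_factory=list)  # from the current mission
    robot_state: dict = field(default_factory=dict)      # motion, actuators, focus
    environment: dict = field(default_factory=dict)      # perceived objects, map
    skill_repertoire: set = field(default_factory=set)   # abilities to act

    def executable(self, desired_skills):
        """Which of the skills wanted for the current goals are available."""
        return self.skill_repertoire & set(desired_skills)
```

Such a record makes explicit that action selection draws on goals, state, environment and the skill repertoire together, rather than on any single measurement.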
Fig. 2. Biomimetic robot information processing. Left: Perception-action cycle of a situation-oriented skill-based robot in interaction with its environment (and a human) based on the key concepts of perception, situation, and skill. Right: System architecture of a personal robotic assistant based on those key concepts.
4. Implementation

Due to space limitations the actual implementation of the described fundamental concepts can only be sketched; for details cf. [3]. Figure 2 (right) shows the essence of the situation-oriented skill-based robot architecture as we have implemented it. The situation module (situation assessment & skill management) acts as the deliberator of the whole system. It is interfaced via skills in a bidirectional way with all hardware components – sensors, actuators, knowledge base storage and MMI peripherals (man-machine and machine-machine interface peripherals). Skills have direct access to the hardware components and, thus, let the robot sense, think (plan) and act. They obtain certain information, e.g., sensor readings; generate specific outputs, e.g., arm movements or speech; or plan a route based on map knowledge. Skills report to the situation module via events and messages on a cyclic or interruptive basis to enable a continuous and timely situation update and error handling. Whereas some skills, e.g., sensor skills, require real-time connections to and from the situation module, non-real-time performance is sometimes allowed, e.g., for data base maintenance and route planning. The situation module fuses data and information from all system components to make situation assessment and skill management possible. As such, it provides cognitive skills, i.e., it is responsible for planning an appropriate skill sequence to reach a given goal. By activating and deactivating the inbuilt skills, the situation-dependent concatenation of skills is realized that leads to complex and elaborate robot behavior. Motor skills control simple movements of the robot’s actuators. They can be arbitrarily combined to yield a basis for more complex control commands. Encapsulating the access to groups of cooperating actuators leads to a simple interface structure and allows an easy generation of pre-programmed motion patterns.
Sensor skills encapsulate the access to one or more sensors and provide the situation module with proprio-, extero- and interoceptive data. Sensorimotor skills combine both sensor and motor skills to yield sensor-guided robot motions. Data processing skills organize and access the system’s knowledge bases. They return specific information upon request and modify existing, or add newly gained, knowledge to the data bases. Various types of knowledge bases are being used, including an attributed topological map for storing the static characteristics of the environment, an object data base, a subject data base and a list of missions to accomplish. Communication takes place via various channels (acoustic, optical, tactile) and requires a variety of communicative skills to preprocess input from humans and to generate valuable feedback for them according to the current situation and the given application scenario. They depend on various sensor skills to receive the required information, e.g., hearing skills for understanding speech, vision skills, and even tactile skills, to allow situated dialogues with references to visible and tangible objects and subjects in the world. When comparing Rasmussen’s model of human information processing (cf. Fig. 1) with our model of robot information processing (cf. Fig. 2), the three blocks „feature formation”, „recognition” and „identification” are realized by sensor skills, and „automated sensori-motor patterns” are mostly realized by motor and sensorimotor skills. The cognitive skills provided by the situation module correspond to the three blocks „decision of task”, „planning” and „association state/task”. Parts of „planning” and „stored rules for tasks” are realized by data processing skills. As communicative skills involve the entire information processing system, they cannot be assigned to an individual block of Figure 1. In our implementation all tasks and threads of the robot’s software system run asynchronously, but can nevertheless be synchronized via messages or events. Overall control is realized as a finite state automaton that is capable of responding to prioritized interrupts and messages. A finite state automaton has been preferred over other possibilities because of its simplicity. Motor skills are mostly implemented at the microcontroller level within the actuator modules; the other skills are implemented on a network of digital signal processors that is physically embedded in a PC.
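The event-driven coupling between skills and the situation module, with overall control as a finite state automaton, might be sketched as follows. This is an illustrative sketch under our own assumptions: the states, events and skill names are invented, and the real system’s prioritized interrupts and asynchronous threads are not modeled.

```python
class SituationModule:
    """Toy finite state automaton: maps (state, event) -> (next state, skill).
    Skills report events via report(); the module activates the next skill."""
    def __init__(self, transitions, initial):
        self.transitions = transitions
        self.state = initial
        self.active_skills = []        # activation history, in order

    def report(self, event):
        """Called by skills to deliver an event or message."""
        key = (self.state, event)
        if key in self.transitions:    # events with no transition are ignored
            self.state, skill = self.transitions[key]
            if skill is not None:
                self.active_skills.append(skill)

# Invented example transition table for a corridor-navigation episode.
transitions = {
    ("idle", "mission received"): ("navigating", "follow corridor"),
    ("navigating", "obstacle detected"): ("avoiding", "swerve"),
    ("avoiding", "path clear"): ("navigating", "follow corridor"),
    ("navigating", "goal reached"): ("idle", None),
}
```

Feeding a sequence of skill events through `report` concatenates skills in a situation-dependent way, which is the mechanism the text describes for producing elaborate behavior from simple inbuilt skills.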
5. Real-world experiments Over the years, four of our robots have been equipped with different variants of the described system architecture and brought into short- and long-term field trials: two autonomous vehicles (ATHENE I and II), a stationary manipulator and a humanoid robot (HERMES). All robots were endowed from the start with certain basic skills, and were able to incrementally extend their knowledge and skills through learning. Calibration-free navigation. Two different vision-guided mobile robots were used as experimental platforms for learning to navigate a network of corridors by concatenating simple vision- and odometry-based sensorimotor skills: HERMES, a humanoid personal robotic assistant, and the older ATHENE II, a transportation robot with a tricycle wheel configuration. HERMES has a unique wheel configuration with two independently driven and steered wheels mounted at the front and the back, and two caster wheels arranged at the centers of the left and right sides of a square undercarriage, allowing many different drive modes (differential, car-like, fork-lift, omnidirectional) [5]. While ATHENE II is equipped with a monochrome video camera mounted on a 1-DOF platform steerable around its vertical axis, HERMES' main sensors are two monochrome video cameras mounted on independent pan/tilt drive units, in addition to a pan/tilt unit that controls the common "head" platform. Various standard lenses (from wide angle to tele) and a Suematsu lens were used during the experiments. It should be explicitly noted that it would have been difficult and time-consuming to measure all relevant kinematic parameters of such versatile robots with any degree of accuracy and to design model-based controllers for all possible modes of operation. Our calibration-free approach makes this unnecessary. For testing our calibration-free navigation concept we equipped both robots with reliable, robust and sufficiently fast dynamic vision systems [5]. They yield the
required data, such as the position of a guideline, typically detected at the intersections of the floor with the walls, despite adverse conditions such as specular reflections and difficult illumination, in general at video field rate (i.e., 50 Hz). At the beginning of the actual driving experiments HERMES and ATHENE II were placed somewhere in a corridor. Their exact orientation and position were unknown, but a section of the guideline had to be in the field of view. Corridor navigation worked well over a camera tilt angle range from 0° (horizontal) to about 50° (looking downward). Negative tilt angles (i.e., looking upward) were not tested, but should work as long as the guideline remains visible. Based on these navigation skills, map building and delivery missions were successfully performed (Fig. 3, left). Calibration-free manipulation. Manipulation experiments were first carried out with the vision-guided stationary manipulator (Fig. 3, middle). The cameras are attached to the robot on metal rods at the first link so that they rotate around the axis of joint 1 together with the arm. They are mounted in a rather unstable way to make the impossibility of any calibration or precise adjustment obvious, and to allow easy random modifications of the camera arrangement. Objects of various shapes could be grasped, although no knowledge regarding the optical or kinematic parameters of the robot was used. Even arbitrary unknown rotations and shifts of the cameras were tolerated while the robot was operating, which demonstrated the robot's extreme adaptability. Furthermore, it was shown that such extreme adaptability does not necessarily degrade pick-and-place cycle times if appropriate learning schemes are used.
With its two 6-DOF arms, each with a two-finger gripper, and a bendable body, HERMES possesses an even more complex manipulation system which, besides grasping, allowed us to carry out human-robot interaction experiments. The biggest advantage of our skill-based approach is that it enormously reduces the complexity of designing intelligent robot behaviors, such as taking/handing over objects from/to people, placing them onto other objects and gesturing in communicative acts. Communication and interaction. HERMES has been presented successfully at trade fairs, in television studios, in a museum, and at various demonstrations in our institute environment. The robot is able to deliver simple services within initially unknown environments for users that may also be initially unknown. Services handled by the robot include, but are not limited to, transportation tasks, entertainment, and gathering and providing information about people and their living or working environment (Fig. 3, right). For example, many types of dialogues exist to cooperatively teach the robot new knowledge and to build a common reference frame for the subsequent execution of service tasks. Besides learning an attributed topological map of the environment, the robot can learn persons' names, how locations and objects are denominated by a specific person, where objects of personal and general interest are located, and how to grasp specific objects. This requires several databases, and links between them, in order to retrieve the required information [6]. One of the most promising results of our experiments is that the biomimetic approach seems to pay off. First, despite the complexity of the overall system, the nature-inspired system architecture allows newly developed skills to be integrated very easily, making the system scalable.
Second, although we experienced drifting of system parameters in the long-term experiments, due to temperature changes or simply wear and aging of parts, the robots' performance was not affected by such drifts, because our algorithms rely only on qualitatively (not quantitatively) correct information and adapt to parameter changes automatically. We found that the implemented skills worked very well not only around our laboratory, but also in other settings, despite the rather different appearance of objects of interest (e.g., corridors, docking stations). Third, according to the museum staff who ran our longest field trial, over more than 6 months (October 2001 – April 2002) in the Heinz Nixdorf MuseumsForum (HNF) in Paderborn, Germany, HERMES was one of the few robots at the show that could regularly be demonstrated in action, and among them it was considered the most intelligent and most dependable one. This statement is supported by the fact that the museum staff never called for advice once the initial setup was done.
Fig. 3. Experiments and results. Left: image taken by HERMES' vision system during corridor navigation (tracking a guideline; two forms of interruptions, a door and a junction, are indicated). Middle: setup of the stationary robot arm used for grasping (Mitsubishi Movemaster RV-M2, 5 DOF, two-finger gripper and a stereo vision system). Right: HERMES describing the way to a location of interest by means of voice and gestures.
6. Summary and conclusions A biomimetic approach to the realization of robot intelligence, studying mother nature's best ideas and then imitating her designs and processes to solve robotic problems, has been proposed. A robot information processing architecture was derived by studying Rasmussen's model of human information processing. It has been implemented on various robots and has proved to endow them with intelligent situation-oriented behavior. The ability to sense in a human-like way by means of vision, touch and hearing, the most powerful sensor modalities known, enables our robots to perceive their environments, to understand complex situations and to behave intelligently. While today's robots are mostly strong with respect to a single functionality, e.g., navigation or manipulation, our results illustrate that many functionalities can be integrated within one single robot through a nature-inspired, unifying, situation-oriented skill-based system architecture. Furthermore, testing a robot in various environmental settings, both short- and long-term, with people having different needs and different intellectual, cultural and social backgrounds, is enormously beneficial for learning the lessons that will eventually enable us to build dependable personal robotic assistants. 7. References
[1] Rasmussen, J. (1983): Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Trans. on Systems, Man and Cybernetics, Vol. 13, No. 3, pp. 257-266.
[2] Arkin, R. C. (1998): Behavior-Based Robotics. MIT Press, Cambridge, MA.
[3] Bischoff, R.; Graefe, V. (2004): HERMES – a versatile Personal Robotic Assistant. Proc. of the IEEE, Special Issue on Human Interactive Robots for Psychological Enrichment, Vol. 92, No. 11, pp. 1759-1779.
[4] Proctor, R. W.; Dutta, A. (1995): Skill Acquisition and Human Performance. In: Bourne, L. E. (ed.): Advanced Psychology Texts. SAGE Publications, Thousand Oaks.
[5] Graefe, V.; Bischoff, R. (2004): Robot Navigation without Calibration. Proc. IEEE International Workshop on Intelligent Robots and Systems (IROS '04), Sendai, pp. 627-632.
[6] Bischoff, R.; Graefe, V. (2002): Dependable Multimodal Communication and Interaction with Robotic Assistants. 11th IEEE Int. Workshop on Robot and Human Interactive Communication, Berlin, pp. 300-305.
Part 2 Tracking Control & Active Vision
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A Laser Based Multi-Target Tracking for Mobile Robot
Masafumi HASHIMOTO a,1, Satoshi OGATA b, Fuminori OBA b, and Takeshi MURAYAMA c
a Faculty of Engineering, Doshisha University, Japan
b Faculty of Engineering, Hiroshima University, Japan
c Faculty of Dentistry, Hiroshima University, Japan
Abstract. This paper presents a method for tracking multiple moving objects with an in-vehicle 2D laser range sensor (LRS) in a cluttered environment, where ambiguous/false measurements appear in the laser image due to clutter, windows, etc. Moving objects are detected from the laser image via a heuristic rule and an occupancy grid based method. The detected objects are tracked with a Kalman filter and the assignment algorithm. A rule based track management system is embedded into the tracking system in order to improve the tracking performance. Experimental results of tracking two people validate the proposed method. Keywords. Mobile robot, Laser range sensor, Multi-target tracking, Occupancy grid method, Kalman filter, Data association, Assignment algorithm
1. Introduction Tracking (i.e., estimating the position and velocity) of multiple targets (moving objects) is an important issue for achieving autonomous navigation and cooperation of mobile robots and vehicles. There has been much interest in the use of stereo vision or laser range sensors (LRS) in mobile robotics and vehicle automation [1-13]. Although a number of papers address tracking of multiple moving objects with the LRS in clean/crowded environments [2-4, 6-8, 11, 12], there have been only a few efforts [5, 13] using the LRS in cluttered environments, where ambiguous/false measurements appear in the laser image due to clutter, windows and so on. In this paper we present a method for tracking multiple moving objects with the LRS in cluttered environments. Tracking of multiple moving objects is achieved in two steps: target detection (finding moving objects in the laser image) and target tracking (tracking the detected objects). The occupancy grid method is one of the effective methods for target detection [2, 3]; another approach finds moving objects based on scan matching of successive laser scans [13]. In this paper we apply the occupancy grid based method for target detection. In the occupancy grid method the cells occupied by obstacles are marked. For each newly occupied cell in the current scan the corresponding cell in the previous 1 Corresponding Author: Department of Information Systems Design, Faculty of Engineering, Doshisha University, Tatara, Kyotanabe, Kyoto, 610-0321 Japan; E-mail:
[email protected]
M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot
scan is checked. If that cell is marked, the cell in the current scan is considered stationary, otherwise moving. Most conventional occupancy grid based target detection methods rely on the measurements in only a few successive scans of the LRS, and thus cluttered environments deteriorate the detection performance, since ambiguous/false measurements appear in the laser image. In order to enhance the target detection performance in cluttered environments, this paper presents an alternative method based on a heuristic rule and an occupancy grid method. Target tracking methods divide into two approaches: model based (e.g., Kalman filter and particle filter) [4, 6-13] and model free [2, 3, 5]. The model based approach makes it possible to track objects temporarily occluded by stationary/moving objects, and thus it is more suitable than the model free approach. Data association (one-to-one matching of tracked objects and measurements) is necessary for target tracking in multi-target environments. Many data association techniques have been developed in the aerospace field (e.g., the Joint Probabilistic Data Association Filter (JPDAF) and the Multiple Hypothesis Tracker (MHT)) [15]. In this paper we apply the assignment algorithm [14] for data association. The algorithm is a simple extension of the well-known Nearest Neighbor Standard Filter (NNSF) [15] to data association in multi-target environments, and it allows multi-target tracking in cluttered environments. Moreover, a rule based track management system is presented in order to improve the tracking performance. This paper is organized as follows: In Section 2, our experimental system is overviewed. Sections 3 and 4 present the methods for moving target detection and tracking, respectively. In Section 5, two experiments are conducted to validate our method, followed by conclusions.
2. Experimental System Figure 1 shows our mobile robot (a smart wheelchair) controlled with a joystick. It has two steered/drive wheels at the front and rear and four passive casters at the corners. Each steered/drive wheel consists of a steering unit and a drive unit. A potentiometer attached to the steering unit senses the steering angle, and an encoder attached to the drive unit measures the wheel velocity. A yaw-rate gyro attached to the robot body senses the turn velocity accurately. These five internal sensors are exploited for determining the ego-motion of the robot by dead reckoning. A forward-looking LRS (Sick LMS200) mounted on the robot takes the scan image, which is represented by a sequence of distance samples in the horizontal half plane; the distance samples are angularly ordered counterclockwise. The angular resolution of the LRS is 0.5 deg, so the number of distance samples is 361 (= 180/0.5 + 1) per scan image. The scanning frequency is about 75 Hz; however, the serial communication link limits the effective rate to about 10 Hz.
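For illustration, the scan geometry stated above (0.5° angular resolution over the front half plane, hence 361 samples) can be converted into sensor-frame Cartesian points with a few lines. This is a generic conversion sketch under the stated geometry, not the authors' code; the assumed angular convention (sample 0 at 0°, counterclockwise) follows the text.

```python
import math

ANGULAR_RES_DEG = 0.5                         # LMS200 resolution used here
N_SAMPLES = int(180 / ANGULAR_RES_DEG) + 1    # = 361 samples per scan

def scan_to_points(ranges):
    """Convert one scan (counterclockwise over the front half plane)
    into sensor-frame (x, y) points, one per distance sample."""
    assert len(ranges) == N_SAMPLES
    points = []
    for i, r in enumerate(ranges):
        theta = math.radians(i * ANGULAR_RES_DEG)  # sample i -> bearing
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```

Sample 180 then corresponds to a bearing of 90°, i.e. straight ahead of the sensor in this convention.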
Figure 1. Overview of the mobile robot (smart wheelchair)
Figure 2. Samples taken with the LRS in an indoor environment. The open circles show the sample points of the LRS.
3. Target Detection We describe a method for finding moving objects in the scan image of the LRS, using Fig. 2 as an example, where the moving objects are (D), (G) and (I). Detecting moving objects is achieved in three steps: sample clustering, exclusion of distant objects (detection of nearby objects), and exclusion of stationary objects (detection of moving objects). Distance samples coming from the same object have similar values, while those coming from different objects differ significantly. Thus the distance samples taken with the LRS are clustered by checking the gap between two adjacent samples. Based on the sample clustering, the ten objects (A) to (J) are detected as in Fig. 2. It is assumed that, for collision avoidance, only moving objects located in the vicinity of the robot are of interest. In order to exclude as many distant objects (objects located far away from the robot) as possible from the sample clusters, we use a heuristic rule. As shown in Fig. 2, the two edge samples of object (i), where i = B to I, are denoted by bs_i and be_i, respectively. The edge sample of the object to the right of object (i) is denoted by be_(i-1), and that of the object to its left by bs_(i+1). Objects that satisfy

bs_i - be_(i-1) ≥ r_bg  and  be_i - bs_(i+1) ≥ r_bg,

where r_bg is a threshold, are considered distant objects. Based on this heuristic rule the objects (A), (C), (F), (H) and (J) are excluded, and the objects (B), (D), (E), (G) and (I) remain. It should be noted that the distant object (E) is not excluded, since our exclusion algorithm is based only on the two gaps between three adjacent sample clusters, and only relatively distant objects are excluded. In our experiments, described in Section 5, the threshold r_bg is set to 0.3 m. As shown in Fig. 3, a grid map is represented in a reference coordinate frame Σ_w(O_w; X_w Y_w).
The LRS measurements related to the objects (B), (D), (E), (G) and (I) are mapped into Σ_w, and those objects are marked on the grid map. In order to judge whether each cell on the grid map is occupied by a stationary or a moving object, we count how many
Figure 3. Grid map; the gray cells show the field of view of the LRS.
successive scans each cell is marked in. A cell marked in more than ten successive scans is referred to as a stationary object cell. For the objects (B), (D), (E), (G) and (I) we compute the value α_i = b/a, where a is the number of cells occupied by object (i), with i = (B), (D), (E), (G), (I), and b is the number of stationary object cells related to object (i). An object that satisfies α_i < 0.5 is decided to be a moving object; thus the moving objects (D), (G) and (I) are extracted. Finally, the nearby objects (G) and (I) are merged. In our experiments the cell size is set to 0.3 m x 0.3 m. For mapping the laser image onto the grid map, we find the change of the robot position and orientation between laser scans with the dead reckoning system. Although dead reckoning accumulates drift error over time, we use only the change over a limited number of previous scans, so the effect of the drift error on the mapping is very small.
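The first two detection steps, gap-based clustering and the distant-object exclusion rule with r_bg = 0.3 m, can be sketched as follows. This is a simplified reconstruction, not the authors' implementation: the clustering gap threshold is an assumed value (the paper does not state it), and border clusters without two neighbours are simply skipped.

```python
def cluster_scan(ranges, gap=0.1):
    """Group angularly adjacent samples into objects: a new cluster starts
    whenever the range jump between neighbours exceeds `gap` (metres).
    `gap` is an assumed value; the paper does not state its threshold."""
    clusters, current = [], [0]
    for i in range(1, len(ranges)):
        if abs(ranges[i] - ranges[i - 1]) > gap:
            clusters.append(current)
            current = []
        current.append(i)
    clusters.append(current)
    return clusters

def exclude_distant(ranges, clusters, r_bg=0.3):
    """Heuristic rule of Section 3: object i is dropped as distant when
    both of its edge samples stand back from the neighbouring clusters by
    at least r_bg:  bs_i - be_(i-1) >= r_bg  and  be_i - bs_(i+1) >= r_bg.
    Returns the indices of the clusters kept as nearby objects."""
    kept = []
    for i in range(1, len(clusters) - 1):  # border clusters lack two neighbours
        bs_i = ranges[clusters[i][0]]      # right edge sample of object i
        be_i = ranges[clusters[i][-1]]     # left edge sample of object i
        be_prev = ranges[clusters[i - 1][-1]]
        bs_next = ranges[clusters[i + 1][0]]
        if not (bs_i - be_prev >= r_bg and be_i - bs_next >= r_bg):
            kept.append(i)
    return kept
```

A cluster whose both edges lie at least r_bg farther than its neighbours is excluded; a cluster nearer than its neighbours (like objects (D), (G), (I)) survives to the occupancy-grid stage.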
4. Target Tracking

The position of a moving object in Σ_w is denoted by (x, y). The object is assumed to move at nearly constant velocity. The kinematics of the object motion is then given by

x_k = F x_{k-1} + G Δx_{k-1}    (1)

where x = (x, ẋ, y, ẏ)^T is the state and Δx = (Δẍ, Δÿ)^T is an acceleration term. The subscripts (k-1) and (k) denote the (k-1)th and kth scans of the LRS, respectively. The system matrices F and G are constant.

The LRS measurements related to the moving object (i.e., the distance z_r and the bearing z_θ) give the position as (z_x, z_y) = (z_r cos z_θ, z_r sin z_θ). The measurement model for the moving object is then

z_k = H_k x_k - L_k p_k + Δz_k    (2)

where z = (z_x, z_y)^T, Δz is a measurement noise, and p is the position vector of the LRS in Σ_w. The system matrices are given by

H_k = [  cos ψ_LRS,k   0   sin ψ_LRS,k   0 ]        L_k = [  cos ψ_LRS,k   sin ψ_LRS,k ]
      [ -sin ψ_LRS,k   0   cos ψ_LRS,k   0 ]  and         [ -sin ψ_LRS,k   cos ψ_LRS,k ]
ψ_LRS is the orientation of the LRS in Σ_w. The position and orientation, p and ψ_LRS, are estimated with the dead reckoning system. The position and velocity of the moving object are predicted with a Kalman filter [15]. As shown in Fig. 4, a validation region with radius r_VG is set around the predicted position of the tracked object. A measurement inside the validation region, which is considered to come from the tracked object, is applied for the track update with the Kalman filter. In our experiments the radius of the validation region, r_VG, is set to 0.5 m.

Figure 4. Tracking conditions, (a) Case 1 to (e) Case 5; the closed circles and open diamonds denote the tracked object and the measurement, respectively. The broken-line circle is the validation region.

As in Fig. 4 (a)-(c), in the real world multiple measurements may exist inside a validation region, and multiple tracked objects may compete for measurements. For achieving reliable data association (one-to-one matching of tracked objects and measurements), we exploit the assignment algorithm [14], in which a combined likelihood function relating to the measurements is evaluated. We assume that the number of tracked objects whose validation regions partially overlap is N, and that the number of measurements inside their validation regions is M. A binary assignment variable a_nm is defined such that a_nm = 1 if measurement m is assigned to object n, and a_nm = 0 otherwise, where m = 1 to M and n = 1 to N. The assignment variables are subject to the following constraints:

Σ_{m=1}^{M+N} a_nm = 1   for n = 1, ..., N, N+1, ..., N+M
Σ_{n=1}^{N+M} a_nm = 1   for m = 1, ..., M, M+1, ..., M+N    (3)

where a_nm = 1 for n ≥ N+1 and m ≤ M denotes that measurements come from new moving
objects or false alarms; a_nm = 1 for n ≤ N and m ≥ M+1 denotes that a tracked object is invisible; a_nm = 1 for n ≥ N+1 and m ≥ M+1 is a dummy assignment. The set of assignment variables and the set of measurements in the kth scan of the LRS are denoted by a = {a_nm : n, m = 1, ..., N+M} and Z_k = {z_m,k}, m = 1, ..., M, respectively. The data association is achieved by finding the optimal assignment a* such that the combined likelihood function p[Z_k | a] of the measurements Z_k is maximized under the constraints in Eq. (3). Equivalently, this can be rewritten as the following standard minimization problem:

a* = arg min_a Σ_{n=1}^{N+M} Σ_{m=1}^{M+N} a_nm c_nm    (4)
The cost function c_nm is given by [14]

c_nm = -ln[λ_D V_VG L_nm,k|k-1]   for n ≤ N, m ≤ M
c_nm = -ln[1 - λ_D]               for n ≤ N, m ≥ M+1
c_nm = 0                          for n ≥ N+1          (5)

where λ_D is the detection probability of the tracked object, and V_VG is the area of the validation region. L_nm is the likelihood function of measurement m under the assumption that it comes from object n. In our experiments we set λ_D = 0.9.

In cluttered environments, ambiguous/false measurements may cause incorrect extraction of stationary and moving objects, and false alarms then occur (i.e., false moving objects are detected). Moreover, in the real world moving objects appear in and disappear from the field of view of the LRS. They also meet occlusion, crossing, merging and splitting situations. In order to cope with such conditions we build a track management system based on the following rules:

a) Track initiation: As shown in Fig. 4 (a) and (d), measurements that are not matched with any tracked object are considered to come from new moving objects or false alarms. False alarms disappear soon. For this reason the robot tentatively initiates tracking of such measurements with the Kalman filter. If the measurements are constantly visible in more than 13 successive scans of the LRS, they are decided to come from new moving objects, and the tracking is continued. If the measurements disappear within 12 scans, they are decided to be false alarms, and the tentative tracking is terminated.

b) Track termination: When a tracked object leaves the field of view of the LRS, or is occluded by other objects/obstacles in the environment, no measurements exist within its validation region, as shown in Fig. 4 (e). If the missing measurements are due to a temporary occlusion, they will appear again. The robot therefore keeps predicting the positions of the tracked objects with the Kalman filter. If the measurements appear again within 30 scans of the LRS, the robot proceeds with tracking the objects. Otherwise the tracking is terminated.
c) Merging and crossing: Merging or crossing of moving objects leads to the situation shown in Fig. 4 (b). In the crossing situation, however, new measurements will soon appear within the validation regions. If new measurements appear again within eight scans of the LRS, the objects are considered to be crossing, otherwise merging.
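The data association of Eqs. (3)-(5) can be illustrated with a small sketch that builds the augmented cost matrix and searches the one-to-one assignments exhaustively. This is a didactic reconstruction, not the authors' implementation: a brute-force permutation search replaces the efficient assignment algorithms surveyed in [14], which is only practical for the few objects and measurements that share a gate.

```python
import math
from itertools import permutations

def build_cost(likelihoods, lam_d=0.9, v_vg=math.pi * 0.5 ** 2):
    """Augmented (N+M) x (M+N) cost matrix of Eq. (5).
    likelihoods[n][m] = L_nm for N tracked objects and M measurements;
    lam_d is the detection probability, v_vg the gate area (radius 0.5 m)."""
    n_obj = len(likelihoods)
    n_meas = len(likelihoods[0]) if likelihoods else 0
    size = n_obj + n_meas
    cost = [[0.0] * size for _ in range(size)]
    for n in range(n_obj):
        for m in range(n_meas):
            cost[n][m] = -math.log(lam_d * v_vg * likelihoods[n][m])
        for m in range(n_meas, size):
            cost[n][m] = -math.log(1.0 - lam_d)  # object n unobserved
    # Rows n >= N (new object / false alarm / dummy) keep cost 0.
    return cost

def assign(cost):
    """Solve Eq. (4) under the one-to-one constraints of Eq. (3) by
    brute force over permutations of the columns."""
    size = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(size)):
        total = sum(cost[n][perm[n]] for n in range(size))
        if total < best:
            best, best_perm = total, perm
    return best_perm  # best_perm[n] = column assigned to row n
```

For two gated objects and two measurements whose likelihoods clearly favour the diagonal pairing, the minimization recovers exactly that pairing, while the extra rows and columns absorb any unmatched objects or measurements.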
5. Experimental Results
We conducted an experiment to evaluate the proposed method: two people walking around were tracked in a room environment, as shown in Fig. 5. They always remained in the field of view of the LRS, and they crossed many times. Their walking speed was between 0.3 and 1.2 m/s. The LRS is set on the robot at a height of 0.47 m from the floor, and thus it observes around the knees of the people. Figure 6 shows the experimental results; subfigure (a) plots all the samples taken with the LRS during the experiment, (b) shows the result of detecting moving objects, and (c) shows the tracking result, where black and gray bold lines show the tracks of the two people, and the black dotted line shows the track of the robot. The LRS observes many kinds of stationary objects in the room environment (e.g., walls, desks, chairs, boxes, cables, and pipes), and ambiguous measurements appear. These measurements cause many false moving objects. However, the tracking system works well, and only the two walking people are tracked, as in Fig. 6 (c). We conducted another experiment in which two people walking around were tracked in room/corridor environments, as shown in Fig. 7. The room environment is the same as in the above experiment. The left side of the corridor consists of windows, while the right side consists of walls and doors, as shown in Fig. 7; the windows disturb the laser image due to laser transmission/diffusion, and thus many spurious measurements appear at the window side of the corridor, as in Fig. 8 (a). Nevertheless, our tracking system tracks the people correctly, as in Fig. 8 (c).
6. Conclusions
In this paper we presented a laser based multi-target tracking method for cluttered environments, where ambiguous/spurious measurements are visible in the laser image. Moving objects were detected based on a heuristic rule and an occupancy grid based method. The detected objects were tracked with a Kalman filter and the assignment algorithm. In order to enhance the tracking performance, a rule based track management method was incorporated into the tracking system. Two experiments validated our method. The method proposed in this paper detects and tracks only small moving objects (e.g., walking people). Current research effort is directed toward detecting and tracking moving objects with a wide range of sizes (e.g., mobile robots and vehicles as well as walking people).
Figure 5. Experimental environment
(a) Laser image with LRS
(b) Result of detecting moving object
(c) Result of tracking moving object
Figure 6. Experimental results; in subfigure (c) the black dotted line shows the track of the mobile robot, and the black bold and thin bold lines show the tracks of the two people.
Figure 7. Experimental environment
(a) Laser image with LRS
(b) Result of detecting moving object
(c) Result of tracking moving object
Figure 8. Experimental results; in subfigure (c) the black dotted line shows the track of the mobile robot, and the black bold and thin bold lines show the tracks of the two people.
Acknowledgement This research was supported by the Academic Frontier Research Project on "New Frontier of Biomedical Engineering Research" of the Ministry of Education, Culture, Sports, Science and Technology.
References
[1] H. Koyasu, J. Miura, and Y. Shirai: Realtime Omnidirectional Stereo for Obstacle Detection and Tracking in Dynamic Environment, Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2001), CD-ROM (2001)
[2] E. Prassler, J. Scholz and A. Elfes: Tracking Multiple Moving Objects for Real-Time Robot Navigation, Autonomous Robots, Vol. 8, pp. 105-116 (2000)
[3] E. Prassler, J. Scholz and P. Fiorini: A Robotic Wheelchair for Crowded Public Environments, IEEE Robotics and Automation Magazine, pp. 38-45 (2001)
[4] B. Kluge, C. Kohler and E. Prassler: Fast and Robust Tracking of Multiple Moving Objects with a Laser Range Finder, Proc. of the 2001 IEEE Int. Conf. on Robotics & Automation (ICRA 2001), pp. 1683-1688 (2001)
[5] M. Lindstrom and J.-O. Eklundh: Detecting and Tracking Moving Objects from a Mobile Platform Using a Laser Range Scanner, Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2001), CD-ROM (2001)
[6] D. Schulz, W. Burgard, D. Fox, and A. B. Cremers: People Tracking with a Mobile Robot Using Sample-Based Joint Probabilistic Data Association Filters, The Int. J. of Robotics Research, pp. 99-115 (2003)
[7] A. Fod, A. Howard, and M. J. Mataric: A Laser-Based People Tracker, Proc. of the 2002 IEEE Int. Conf. on Robotics & Automation (ICRA 2002), pp. 3024-3029 (2002)
[8] O. Frank, J. Nieto, J. Guivant, and S. Scheding: Multiple Target Tracking Using Sequential Monte Carlo Methods and Statistical Data Association, Proc. of the 2003 IEEE Int. Conf. on Robotics & Automation (ICRA 2003), pp. 2718-2723 (2003)
[9] C. Hue, J.-P. Le Cadre, and P. Perez: Tracking Multiple Objects with Particle Filtering, IEEE Trans. on Aerospace and Electronic Systems, Vol. 38, No. 3, pp. 791-811 (2003)
[10] D. Caveney, B. Feldman, and J. K. Hedrick: Comprehensive Framework for Multisensor Multitarget Tracking in the Adaptive Cruise Control Environment, Proc. of 6th Int. Symp. on Advanced Vehicle Control (AVEC '02), pp. 697-702 (2002)
[11] K. Nakamura, H. Zhao, R. Shibasaki, K. Sakamoto, T. Ooga, and N. Suzukawa: Tracking Pedestrians by Using Multiple Laser Range Scanners, Proc. of ISPRS Congress, Vol. 35, Part B4, pp. 1260-1265 (2004)
[12] B. Jensen, et al.: Narrative Situation Assessment for Human-Robot Interaction, Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA 2003), CD-ROM (2003)
[13] C. Wang, C. Thorpe, and S. Thrun: Online Simultaneous Localization and Mapping with Detection and Tracking of Moving Objects: Theory and Results from a Ground Vehicle in Crowded Urban Areas, Proc. of Int. Conf. on Robotics and Automation (ICRA 2003), CD-ROM (2003)
[14] K. R. Pattipati, R. L. Popp and T. Kirubarajan: Survey of Assignment Techniques for Multitarget Tracking, in: Multitarget-Multisensor Tracking: Applications and Advances, Volume III (Y. Bar-Shalom and W. D. Blair, eds.), Artech House, pp. 77-159 (2000)
[15] Y. Bar-Shalom and T. E. Fortmann: Tracking and Data Association, Academic Press (1988)
Simultaneous Environment Mapping And Mobile Target Tracking
Abedallatif Baba 1 and Raja Chatila 2
LAAS-CNRS, 7 avenue du Colonel Roche, 31077 Toulouse Cedex 4, France
Abstract. This paper presents an approach for building a dynamic environment model based on an occupancy grid using a SLAM technique, while detecting and tracking mobile objects using an Auxiliary Multiple-Model particle filter. Mobile objects, such as people, are distinguished from the fixed parts and not included in the model, and their motion is tracked.
Keywords. Mobile Robot Navigation, SLAM, People tracking, Particle filter, Occupancy grid.
1. Introduction
Most approaches addressing SLAM and environment mapping assume that the environment is static. In the case of service and personal robots interacting with people, this hypothesis is clearly incorrect. The resulting map will therefore include the moving objects, or a trace of their trajectories as phantom obstacles, and this might result in incorrect models [7]. Moreover, it is highly desirable that a robot can determine the positions and trajectories of the humans in its surroundings in order to focus its attention on those which are willing to interact with it (e.g., those approaching the robot or moving in its close vicinity). For these reasons, it is necessary for the robot to detect and track mobile objects while it is mapping its environment. This paper presents an approach that relies on an occupancy grid to model the environment, and a particle filter to track moving objects, which are not included in the map. In the next section, we briefly overview related work. Then in section 3 we recall mapping using occupancy grids. Section 4 presents the target tracking approach and the Auxiliary Multiple-Model particle filter technique. Section 5 shows results obtained so far; then we conclude with future directions.
2. Related Work
There have recently been a number of results on robot mapping and localization in populated environments. Most approaches suppose a known map, and focus on distinguishing
mobile objects from the static map. For example, Burgard et al. [4] update a static map using the most recent sensory input to deal with people in the environment. Montemerlo et al. [10] present an approach to simultaneous localization and people tracking using a conditional particle filter. Arras et al. [1] use line features for localization and filter out range measurements reflected by persons. Fox et al. [5] use a probabilistic technique to filter out range measurements that do not correspond to the environment model. Our technique, in contrast, does not require a given map. On the other hand, there is a variety of techniques for tracking persons, for predicting their future poses, or for adapting the behavior of the robot according to the information obtained about the persons in its vicinity [9,2,11]. However, these techniques do not filter the measurements corresponding to persons in order to improve the model of the environment: they suppose a known map. Our approach is to build the map while tracking people circulating in the environment, so as to avoid including them in the map. Recently, Wang and Thorpe [13] addressed this issue; their approach segments the data to extract features. Hähnel et al. [8] presented a probabilistic approach to filtering people out of sensor data and to incorporating the results of the filtering into the mapping process. Their approach is sound, but it needs scan alignment between each two successive maps to detect the dynamic objects, which may be time consuming. The approach we use in this paper does not need specific features, which makes it more robust. Nor does it use scan alignment: the environment is represented by occupancy grids, which makes the method independent of any particular kind of feature, and the grid representation is less complex to deal with than the scans themselves.
Human motion is detected by comparing successive occupancy grid models, and tracking is accomplished by a bearing-only tracking approach using a particle filter.
3. Map Building and Moving Target Detection
Approaches to environment mapping usually rely on appearance representations (raw data), feature-based visual or geometrical representations (e.g., 2D segments), occupancy grids, or topological graphs. All these representations have advantages and drawbacks; it is not the purpose of this paper to discuss representation issues. We have selected occupancy grids for this work because of the simplicity of their construction. In this representation the environment is discretized in cells, and each cell is assigned a random variable describing its probable state of being empty or occupied. We will not detail the building of the occupancy grid here, the algorithm being widely known; see [12] for the theoretical foundations and the algorithm. Figure 1 is an example of the grid built by the robot in our lab. Robot localization is currently done by using an edge-based representation of obstacles. The occupancy grid representation makes it rather easy to compare successive maps (each represented by a matrix of numbers in [0,1]) to detect changes. A sudden change in occupancy state for already perceived grid cells indicates a moving object: the occupancy probability of a given cell changes drastically between two successive views for a mobile object, whereas for fixed objects the changes are smooth. A comparison of consecutively built grids followed by a thresholding of the result reveals the moving objects. As a first approach we use a grid difference and thresholding. Figure 2 shows three consecutive maps with a target moving to the left.
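The grid-difference-and-thresholding step described above can be sketched in a few lines. This is a hypothetical illustration: the function name and the 0.4 threshold are our own choices, not values taken from the paper.

```python
import numpy as np

def detect_moving_cells(prev_grid, curr_grid, threshold=0.4):
    """Flag cells whose occupancy probability (in [0, 1]) jumped
    abruptly between two successive grids: a drastic change hints
    at a moving object, a smooth one at static structure."""
    return np.abs(curr_grid - prev_grid) > threshold

# toy example: one cell flips from occupied to free between two views
prev = np.full((4, 4), 0.5)
curr = prev.copy()
prev[1, 2] = 0.9   # occupied at time t-1
curr[1, 2] = 0.1   # freed at time t: the object moved away
mask = detect_moving_cells(prev, curr)
```

Choosing the threshold trades false alarms from sensor noise against missed slow movers; a value well above the per-step update magnitude for static cells is the natural starting point.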
Figure 1. Example of occupancy grid built with a 2D scanning laser range finder. The occupancy value is coded as follows: green: unknown; white: free; blue: occupied. The area is 4×4 m²; a cell is 5×5 cm².
Figure 2. Occupancy grid with an object moving towards the left. The occupancy probabilities change drastically when the object moves.
4. Target Tracking
Tracking people in the robot environment is not a new problem. Stated otherwise, it is a classical target tracking problem, present in a large number of applications; the Kalman filter, quite popular for SLAM, was actually devised for this application. In our context, however, we need a recursive Bayesian filter that does not have the limitations of linearity or Gaussianity. As mentioned before, particle filters are a common tool for coping with this problem. The problem can be stated as follows: given several targets defined by their state (position, velocity, acceleration) measured by noisy sensors, compute at each time step the estimate of the target state. To simplify the tracking problem, we consider only the bearing information. In the next section, we recall the basic technique for bearing tracking.
4.1. Bearing-Only Tracking
The approach for bearing-only tracking as developed in [6] is as follows. In the 2D plane, we suppose a target $X_t$ defined by its position and velocity. The robot state is also defined
by its position and velocity, and the relative target state vector is their difference. This is the state to be estimated. From the target state dynamic equation $X_k = f_{k-1}(X_{k-1}, \nu_{k-1})$ and the sensor measurements $Z_k = h_k(X_k, w_k)$, where $\nu_{k-1}$ and $w_k$ are mutually independent white noises with known probability density functions, we want to compute the posterior $P(X_k \mid Z_k)$. The target is supposed to be changing directions (a maneuvering target), which is quite natural for humans in an indoor environment. Taking a straight trajectory as reference, the behavior models of the target are defined as moving straight ahead (noted 1), turning clockwise (2), and turning anti-clockwise (3); $S = \{1, 2, 3\}$ denotes the set of three models. A transition probability matrix (jump Markov model) defines the state change probability among these three behaviors:

$$\Pi = (\pi_{ab}) = \begin{pmatrix} \pi_{11} & \pi_{12} & \pi_{13} \\ \pi_{21} & \pi_{22} & \pi_{23} \\ \pi_{31} & \pi_{32} & \pi_{33} \end{pmatrix}$$
where $\pi_{ij} \geq 0$ and each row sums to one.
4.2. Auxiliary Multiple-Model Particle Filter
The Auxiliary Sequential Importance Resampling (ASIR) filter [6] generates particles from the sample at $k-1$, each particle being assigned a weight conditioned on the current measurement. Thus, ASIR performs resampling at the previous time step, based on estimates that characterize the posterior of $X_k$ given $X_{k-1}$. The advantage is that the newly generated points lie in a region of high likelihood, which means a lower sensitivity to outliers when the process noise is small, which we assume here given the dynamics of the problem. On the other hand, the Multiple-Model Particle Filter (MMPF) is a technique for non-linear filtering with switching dynamic models, which is the case in our problem. The combination of ASIR and MMPF is the Auxiliary Multiple-Model Particle Filter, which we have implemented (see [6] for the detailed algorithm).
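To make the multiple-model ingredient concrete, the sketch below shows only the model-switching step: each particle carries a behavior-model index that jumps according to the transition matrix at every cycle. The function names and the turn-rate mapping are illustrative assumptions of ours; this is not the full AMM-PF algorithm of [6].

```python
import numpy as np

rng = np.random.default_rng(0)

# transition probabilities among the three behavior models
PI = np.array([[0.32, 0.34, 0.34],
               [0.34, 0.34, 0.32],
               [0.34, 0.32, 0.34]])

def jump_models(model_idx):
    """Sample the next behavior model of every particle from the
    row of PI selected by its current model (jump Markov chain)."""
    return np.array([rng.choice(3, p=PI[m]) for m in model_idx])

def turn_rate(model, omega=0.5):
    """Hypothetical mapping of model index to a turn rate:
    0 = straight ahead, 1 = clockwise, 2 = anti-clockwise."""
    return (0.0, -omega, omega)[model]

models = np.zeros(100, dtype=int)   # all particles start "straight"
models = jump_models(models)        # one switching step
```

In the full filter, each particle's state would then be propagated with the dynamics of its sampled model before weighting against the bearing measurement.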
5. Results
We show in this section simulation results for the tracking approach discussed above. We first applied the approach in the case where the dynamic target is one person moving randomly in the vicinity of a robot acting as the observer. In this simulation, we chose 100 particles for each model of the set S of section 4, in order to obtain a good accuracy. In this case, the computation time to estimate each step of the target trajectory was 500 ms. However, a reasonable accuracy can be obtained with only 10 particles, with a computing time of 80 ms. The sampling interval is T = 0.05 s, the frequency of the laser scan being 20 Hz. The typical maneuver acceleration of the target is 0.2 m/s². The transition probability matrix is:
Figure 3. Person tracking. The robot moves along the piecewise straight line (in green); the red circles represent the real target trajectory, the blue points are the particles.
$$\Pi = \begin{pmatrix} 0.32 & 0.34 & 0.34 \\ 0.34 & 0.34 & 0.32 \\ 0.34 & 0.32 & 0.34 \end{pmatrix}$$
Here we have chosen the probability $\pi_{11}$ for the target to go straight ahead to be smaller than the others, i.e., the person prefers to change direction randomly during movement, which makes the target more difficult to track. We have also chosen the probabilities $\pi_{23}$ and $\pi_{32}$ to be smaller than the other values, because a person is unlikely to switch suddenly from turning right to turning left or the opposite. The robot-person geometry is shown in figure 3, where the robot and the person move at variable speed; their movements are simulated along the x and y axes by a linear model in which one variable changes randomly. Figure 4 is a zoomed part showing the more heavily weighted particles. Figure 5 shows another run with a more complex trajectory. As shown in the figures, the results have a good accuracy (a few cm). Tracking can be done while the robot is constructing its map with the occupancy grid, which enables it to detect the moving targets, supply the tracker with their coordinates, and discard them from the map, as mentioned in section 3. With a laser scanner, the occupancy grid algorithm distinguishes between dynamic and static cells using the occupancy probability, which is related to the hit rate in the cell. Each time the algorithm detects a dynamic cell, it starts up a tracker chain for this detected cell. To reduce the number of trackers, we must however group neighboring cells moving with the same velocity and direction, considering them to be only one target. This approach can be generalized to tracking several persons moving simultaneously in the robot's vicinity. Figure 6 shows two persons (targets) being tracked. The case where a moving target hides another moving target is not addressed in this paper; this might produce additional noise if the two targets are moving together.
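Grouping neighboring dynamic cells into a single target can be done with a simple connected-component pass over the grid. The sketch below uses plain 4-connectivity and, for brevity, omits the velocity-and-direction consistency check mentioned above; function and variable names are our own.

```python
from collections import deque

def group_dynamic_cells(dynamic):
    """Group 4-connected dynamic cells so that each cluster spawns a
    single tracker instead of one tracker per cell. `dynamic` is a 2D
    list of truthy flags; returns a list of clusters of (row, col)."""
    rows, cols = len(dynamic), len(dynamic[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if dynamic[r][c] and not seen[r][c]:
                queue, cluster = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    cluster.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and dynamic[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                clusters.append(cluster)
    return clusters

# two separated dynamic blobs should yield two tracker targets
grid = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]]
targets = group_dynamic_cells(grid)
```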
Figure 4. Person tracking. Zoomed from figure 3. The circles (red) represent the real target trajectory, the points (in blue) are the particles. The stars (in blue) are the most heavily weighted particles, which represent the most probable predicted state of the target.
Figure 5. Person tracking. A more complex trajectory.
6. Conclusion
In this paper we have presented a bearing-only tracking approach to track people in the data obtained with the robot's sensors, while the robot is also building a map of its environment using an occupancy grid, itself used to detect the moving targets. We have shown simulation results using MATLAB with good accuracy. The next stage is to experiment with this tracking technique on the robot. A further improvement is to label the moving targets so that grid cells occupied by moving persons can be identified as such, even if the
Figure 6. Tracking two persons. The robot moves again along the piecewise straight line (in green); the red circles represent the real target trajectory, the blue points are the particles.
person comes to a stop. The purpose is to trigger an interaction with this person, and also to keep the mapping process from including them in the model.
Acknowledgements The authors wish to thank the anonymous reviewers for their valuable comments.
References [1] K.O. Arras, R. Philippsen, M. de Battista, M. Schilt, and R. Siegwart. A navigation framework for multiple, mobile robots and its applications at the Expo.02 exhibition. In Proc. of the IROS-2002 Workshop on Robots in Exhibitions, 2002. [2] M. Bennewitz, W. Burgard, and S. Thrun. Using EM to learn motion behaviors of persons with mobile robots. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [3] D. Beymer and K. Konolige. Tracking people from a mobile platform. In IJCAI-2001 Workshop on Reasoning with Uncertainty in Robotics, 2001. [4] W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. Experiences with an interactive museum tour-guide robot. Artificial Intelligence, 114(1-2), 2000. [5] D. Fox, W. Burgard, and S. Thrun. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research (JAIR), 11:391-427, 1999. [6] B. Ristic, S. Arulampalam, and N. Gordon. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, 2004. [7] D. Hähnel, D. Schulz, and W. Burgard. Mapping with mobile robots in populated environments. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [8] D. Hähnel, R. Triebel, W. Burgard, and S. Thrun. Map building with mobile robots in dynamic environments. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2003.
[9] B. Kluge, C. Koehler, and E. Prassler. Fast and robust tracking of multiple moving objects with a laser range finder. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2001. [10] M. Montemerlo and S. Thrun. Conditional particle filters for simultaneous mobile robot localization and people-tracking (SLAP). In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2002. [11] C. Stachniss and W. Burgard. An integrated approach to goal-directed obstacle avoidance under dynamic constraints for dynamic environments. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [12] S. Thrun. Learning occupancy grids with forward sensor models. Autonomous Robots, 15:111-127, 2003. [13] C.-C. Wang and C.E. Thorpe. Simultaneous localization and mapping with detection and tracking of moving objects. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2002.
Omnidirectional Active Vision for Evolutionary Car Driving Mototaka Suzuki a,1, Jacob van der Blij b, and Dario Floreano a a Laboratory of Intelligent Systems, Ecole Polytechnique Fédérale de Lausanne (EPFL) b Artificial Intelligence Institute, University of Groningen (RuG) Abstract. We describe a set of simulations to evolve omnidirectional active vision, an artificial retina scanning over images taken via an omnidirectional camera, applied to a car driving task. While the retina can immediately access features in any direction, it must select behaviorally relevant features so as to drive the car on the road. Neural controllers which direct both the retinal movement and the system behavior, i.e., the speed and the steering angle of the car, are tested in three different circuits and developed through artificial evolution. We show that the evolved retina, moving over the omnidirectional image, successfully detects the task-relevant visual features so as to drive the car on the road. Behavioral analysis illustrates the effectiveness of its strategy in terms of algorithmic, computational, and memory resources. Keywords. Active Vision, Omnidirectional Camera, Artificial Evolution, Neural Networks, Mobile Robots
1. Introduction
The omnidirectional camera is a relatively new optical device that provides a 360-degree field of view, and it has been widely used in many practical applications including surveillance systems and robot navigation [1,2,3]. However, in most applications visual systems uniformly process the entire image, which can be computationally expensive when detailed information is required. In other cases the focus is determined for particular uses by the designers or users. In other words, the system is not allowed to freely interact with the environment and selectively choose visual features. In contrast, all vertebrates and several insects, even those with a very large field of view, have steerable eyes with a foveal region [4], which means that they have been forced to choose the necessary information from a vast visual field at any given time so as to survive. Such a sequential and interactive process of selecting and analyzing behaviorally relevant parts of a visual scene is called active vision [5,6,7,8]. Our previous work has demonstrated that it can also be applied to a number of real-world problems [9]. In this article we explore omnidirectional active vision applied to a car driving task: coupled with an omnidirectional camera, a square artificial retina can immediately access any visual feature located in any direction, which is impossible for a conventional pan-tilt camera because of its mechanical constraints. It is challenging for the artificial retina to select behaviorally relevant features in such a broad field of view so as to drive a car on the road. Omnidirectional active vision is not biologically plausible, but as claimed in [10] it is interesting to study visual systems from a broader point of view, including those that have never been available in biology. Also, there are several engineering applications that could benefit from omnidirectional vision; some promising applications are discussed in section 5. A 1/10 scale model car equipped with an omnidirectional camera and three different circuits are modeled in simulation. We show that the evolved retina moving over the omnidirectional image successfully detects the task-relevant visual features so as to drive the car on the road. Behavioral analysis illustrates the effectiveness of its strategy in terms of algorithmic, computational, and memory resources. In comparison to the results obtained with a pan-tilt camera mounted on the same car, we show that omnidirectional active vision performs the task very robustly in spite of more difficult initial conditions.

1 Correspondence to: Mototaka Suzuki, Ecole Polytechnique Fédérale de Lausanne, EPFL-STI-I2S-LIS, Station 11, CH-1015 Lausanne, Switzerland. Tel.: +41 21 693 7742; Fax: +41 21 693 5859; E-mail: Mototaka.Suzuki@epfl.ch; WWW homepage: http://lis.epfl.ch.
2. Experimental Setup
Figure 1 shows the real and simulated model cars as well as the views through the real and simulated omnidirectional cameras. The omnidirectional camera consists of a spherical mirror and a CCD camera. It is mounted on a 1/10 scale model car (Kyosho™) which has four motorized wheels. We simulated the car and the circuits using the Vortex libraries,1 a commercially available software package that models gravity, mass, friction, and collisions. Additionally, we used vision software for modeling the view from the omnidirectional camera, originally developed in the Swarm-bots project.2 Figure 2 shows the three circuits (ellipse, banana, and eight shaped) used in the present evolutionary experiments. An artificial retina actively moves over the omnidirectional view.3 Figure 3 illustrates the unwrapping process from the polar view to the Cartesian view and the retina overlaid on each image. In order to evaluate the performance of the omnidirectional active vision system, we also simulate a pan-tilt camera mounted on the same car and compare the results obtained under the same experimental conditions. The neural network is characterized by a feedforward architecture with evolvable thresholds and discrete-time, fully recurrent connections at the output layer (Figure 4). The input layer is an artificial retina of five by five visual neurons that receive input from a gray level image of 240 by 240 pixels. The activation of a visual neuron, scaled between 0 and 1, is given either by the average gray level of all pixels spanned by its own receptive field or by the gray level of a single pixel located within the receptive field. The choice between these two activation methods, or filtering strategies, can be dynamically changed by one output neuron at each time step.
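The retina activation just described can be sketched as follows, with a 5 by 5 retina over a 240 by 240 image. The single-pixel variant below samples the top-left pixel of each receptive field, which is our own hypothetical choice; the paper does not specify which pixel is used.

```python
import numpy as np

def retina_activation(image, n=5, use_average=True):
    """Activate an n-by-n retina from a grayscale image (0-255).
    Each neuron covers an (H/n)-by-(W/n) receptive field; its
    activation in [0, 1] is either the mean gray level of the field
    or the gray level of a single pixel (here: the field's corner)."""
    h, w = image.shape
    fh, fw = h // n, w // n
    act = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            field = image[i * fh:(i + 1) * fh, j * fw:(j + 1) * fw]
            act[i, j] = field.mean() if use_average else field[0, 0]
    return act / 255.0

img = np.zeros((240, 240))
img[:48, :48] = 255.0   # one bright receptive field in the corner
acts = retina_activation(img)
```

The averaging strategy acts as a coarse low-pass filter over each 48 by 48 field, while the single-pixel strategy preserves fine detail at one location; letting an output neuron switch between them each cycle is what the evolved controller exploits.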
1 http://www.cm-labs.com
2 http://www.swarm-bots.org/
3 A similar approach has been taken for evolving flocking behavior of three simulated robots independently in [11], inspired by our previous work [9].

Figure 1. The real 1/10 scale 4WD model car (Kyosho™) with an omnidirectional camera mounted on the roof of the car (top left) and the simulated one (top right). The car base is 19.5 cm (W), 43 cm (L), and 13.5 cm (H). View through the real omnidirectional camera (bottom left) and through the simulated camera (bottom right).

Figure 2. Three circuits where the robot car is tested: from left to right, ellipse, banana, and eight shaped circuits. Each circuit is designed so that an 8 m × 8 m room accommodates it. The width of the roads in all circuits is 50 cm. In the present experiments the sidelines are omitted.

Two proprioceptive neurons provide input information about the measured position of the retina with respect to the chassis of the car: the radial and angular coordinates for the omnidirectional camera, or the pan and tilt degrees for the pan-tilt camera. These values are in the intervals [retina size/2, radius − retina size/2] pixels (retina size = 240 pixels, radius = 448 pixels in these experiments) and [0, 360] degrees for the radial and angular coordinates, respectively. The values for the pan-tilt camera are in the intervals [−100, 100] and [−25, 25] degrees, respectively. Each value is scaled to the interval [0, 1] and encoded as a proprioceptive input. A set of memory units stores the values of the output neurons at the previous sensory motor cycle and sends them back to the output units through a set of connections, which effectively act as recurrent connections among the output units [12]. The bias unit has a constant value of −1 and its outgoing connections represent the adaptive thresholds of the output neurons [13]. Output neurons use the sigmoid activation function f(x) = 1/(1 + exp(−x)) in the range [0, 1], where x is the weighted sum of all inputs. They encode the motor commands of the active vision system and of the car for each sensory motor cycle. One neuron determines the filtering strategy used to set the activation values of visual neurons for the
Figure 3. Left: The polar image taken by the omnidirectional camera and the overlaid retina. Right: The corresponding unwrapped image and the retina in Cartesian coordinates.
next sensory motor cycle. Two neurons control the movement of the retina (or camera), encoded as speeds relative to the current position. The remaining two neurons encode the directional and rotational speeds of the wheels of the car. Activation values above 0.5 stand for left (directional) and forward (rotational) speeds, whereas activation values below 0.5 stand for right and backward speeds, respectively.
Figure 4. The architecture is composed of a grid of visual neurons with nonoverlapping receptive fields whose activation is given by the gray level of the corresponding pixels in the image; a set of proprioceptive neurons that provide information about the movement of the retina with respect to the chassis of the car; a set of output neurons that determine at each sensory motor cycle the filtering used by visual neurons, the new angular (Vang ) and radial (Vrad ) speeds of the retina (or pan and tilt speeds of the pan-tilt camera), and the directional (D) and rotational (S) speeds of the wheels of the car; a set of memory units whose outgoing connection strengths represent recurrent connections among output units; and a bias neuron whose outgoing connection weights represent the thresholds of the output neurons. Solid arrows between neurons represent fully connected layers of weights between two layers of neurons. Dashed arrows represent 1:1 copy connections (without weights) from output units to memory units, which store the values of the output neurons at the previous sensory motor cycle step.
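One sensory-motor cycle of the architecture in Figure 4 can be sketched as a single fully connected layer: 25 visual, 2 proprioceptive, 5 memory, and 1 bias inputs feeding 5 sigmoid outputs, which accounts for the 165 evolvable connections mentioned in the text. The function names and the zero-weight example are our own illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def controller_step(weights, visual, proprio, memory):
    """One sensory-motor cycle: 25 visual + 2 proprioceptive +
    5 memory + 1 bias = 33 inputs fully connected to the 5 outputs
    (filter choice, Vang, Vrad, D, S). The bias unit is fixed at -1,
    so its weights act as adaptive thresholds; the new memory is a
    1:1 copy of the outputs, realizing the recurrent connections."""
    x = np.concatenate([visual, proprio, memory, [-1.0]])  # 33 inputs
    out = sigmoid(weights @ x)                             # 5 outputs
    return out, out.copy()                                 # outputs, new memory

w = np.zeros((5, 33))   # 165 weights, here all zero for illustration
out, mem = controller_step(w, np.zeros(25), np.zeros(2), np.zeros(5))
```

With all-zero weights every output sits at sigmoid(0) = 0.5, i.e., the neutral point between left/right steering and forward/backward rotation described above.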
The neural network has 165 evolvable connections that are individually encoded on five bits in the genetic string (total length = 825). A population of 100 individuals is evolved using truncated rank-based selection with a selection rate of 0.2 (each of the best 20 individuals makes four copies) and elitism (two randomly chosen individuals of the population are replaced by the two best individuals of the previous generation). One-point crossover probability is 0.1 and bit-toggling mutation probability is 0.01 per bit.
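The evolutionary loop above can be sketched as follows. This is a hedged reconstruction: we interpret "the best 20 make four copies each" as each elite genome surviving plus four copies (refilling the population of 100), which the paper does not state explicitly, and all names are ours.

```python
import random

random.seed(1)
POP, L = 100, 825   # population size; genome length in bits (165 x 5)

def new_generation(pop, fitness):
    """Truncated rank-based selection (rate 0.2), one-point crossover
    (p = 0.1 per pair), bit-toggling mutation (p = 0.01 per bit), and
    elitism (two random slots replaced by the two previous best)."""
    ranked = [g for _, g in sorted(zip(fitness, pop),
                                   key=lambda p: -p[0])]
    # each of the best 20 genomes yields five instances (itself + 4 copies)
    offspring = [g[:] for g in ranked[:POP // 5] for _ in range(5)]
    random.shuffle(offspring)
    for i in range(0, POP, 2):                 # one-point crossover
        if random.random() < 0.1:
            cut = random.randrange(1, L)
            offspring[i][cut:], offspring[i + 1][cut:] = \
                offspring[i + 1][cut:], offspring[i][cut:]
    for g in offspring:                        # bit-toggling mutation
        for b in range(L):
            if random.random() < 0.01:
                g[b] ^= 1
    for slot, elite in zip(random.sample(range(POP), 2), ranked[:2]):
        offspring[slot] = elite[:]             # elitism
    return offspring

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
next_pop = new_generation(pop, [random.random() for _ in range(POP)])
```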
3. Evolution of Neural Controllers
The fitness function was designed to select cars for their ability to move straight forward as long as possible during the evaluation time of the individual. Each individual is decoded and tested for three trials, each trial lasting 500 sensory motor cycles. A trial can be truncated earlier if the operating system detects a drop of the height of the center of the car, which corresponds to going off-road. The fitness criterion F is a function of the measured speed St of the four wheels and the steering direction Dt of the front wheels:

$$F = \frac{1}{E \times T \times S_{max}} \sum_{e=0}^{E} \sum_{t=0}^{T} f(S_t, D_t, t) \quad (1)$$

$$f(S_t, D_t, t) = S_t \times \left(1 - \sqrt{|D_t| / D_{max}}\right) \quad (2)$$
where St and Dt are in the ranges [−8.9, 8.9] cm/sec and [−45, 45] degrees, respectively; f(St, Dt, t) = 0 if St is smaller than 0 (backward rotation); E is the number of trials (three in these experiments) and T is the maximum number of sensory motor cycles per trial (500 in these experiments). For a truncated trial, the inner sum runs only over the observed number of sensory motor cycles (for example, 34 for a robot whose trial is truncated after 34 steps because the car goes off-road). At the beginning of each trial the position and orientation of the car as well as the position of the retina within the image are randomly initialized. We performed three replications of the evolutionary run starting with different genetic populations. In all cases the fitness reached stable values in less than 20 generations (Figure 5), which correspond to successful on-road driving. The fitness values both of the best individuals and of the population average obtained with the pan-tilt camera were higher than those obtained with the omnidirectional camera in all three circuits. Notice that the fitness can never be one, because the car must steer in corners so as not to go off-road.
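The fitness computation can be sketched as follows, assuming the step term is St(1 − √(|Dt|/Dmax)) clipped to zero for backward rotation; the constants come from the text, while the function names and the exact placement of the normalization are our reconstruction.

```python
from math import sqrt

E, T = 3, 500             # trials and max sensory motor cycles per trial
S_MAX, D_MAX = 8.9, 45.0  # max wheel speed (cm/s) and steering angle (deg)

def step_term(s, d):
    """f(S_t, D_t): zero for backward rotation, otherwise the speed
    discounted by the square root of the normalized steering angle."""
    if s < 0:
        return 0.0
    return s * (1.0 - sqrt(abs(d) / D_MAX))

def fitness(trials):
    """Eq. (1): sum the step terms over all trials (truncated trials
    simply contribute fewer terms) and normalize by E * T * S_max."""
    total = sum(step_term(s, d) for speeds, angles in trials
                for s, d in zip(speeds, angles))
    return total / (E * T * S_MAX)

# driving straight at full speed for every cycle of all three trials
perfect = [([S_MAX] * T, [0.0] * T)] * E
```

A hypothetical car that never steers and never goes off-road would score exactly 1; real circuits force steering in corners, so the observed fitness stays below that bound.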
Figure 5. Evolution of neural controllers of the car with the pan-tilt camera (left) and the omnidirectional camera (right) in the eight shaped circuit. The thick line shows the fitness values of the best individuals and the thin line those of the population average. Each data point is the average of three evolutionary runs with different initializations of the population. Vertical lines show the standard error.
4. Behavioral Analysis
The behavioral strategy of the best evolved car equipped with the pan-tilt camera is as follows: at the beginning of the trial the car points the camera downward and to its right (or left, depending on the evolutionary run), and it steers so as to maintain the edge between the road and the background within the retina. Due to lack of space, we show only the behavioral analysis of the best individual with the omnidirectional camera evolved in the eight shaped circuit, because this circuit possesses all of the geometrical characteristics of the ellipse and banana shaped circuits. It also has an intersection, which disturbs the car's perception if simple edge detection is the strategy developed by the evolved individual; that strategy is sufficient for driving in the banana and ellipse shaped circuits. Indeed, since the best evolved individuals with the pan-tilt camera all adopted this strategy, they were more strongly disturbed at the intersection and went off-road in several trials. Our preliminary tests also confirmed that the best individuals evolved in the eight shaped circuit were general, in the sense that they could drive in the other two circuits as well. Figure 6 shows the strategy of the best individual evolved in the eight shaped circuit: during the starting period the evolved car moves back and forth, and then starts going forward at full speed once the retina finds the front road area. Another effective strategy, acquired by the best individuals in other evolutionary runs, is that the car moves forward very slowly until the retina finds the road area, and starts driving at full speed immediately after finding it (data not shown). Both behaviors serve to buy time for the retina to find the road area during the most critical period at the beginning of the trial.
At the intersection, although the perception of the car is disturbed by the crossing road, which corresponds to the slight deviation of the trajectory, the evolved car manages to find the straight road beyond the intersection by moving the retina upward in the radial direction, which corresponds to "looking farther", and corrects its trajectory (Figure 6, right). After passing the intersection, the retina moves downward again and keeps the straight road area in sight. The rotational speeds of the wheels and the angular position of the retina did not change significantly while passing the intersection.
5. Discussion

The slightly lower fitness values of the evolved individuals with the omnidirectional camera compared with those with the pan-tilt camera are due to two main reasons: 1) it is harder to find the front road area in the omnidirectional camera view than in the pan-tilt camera view; 2) evolved individuals can find the area, but it takes them some time, because the car does not move much during the road search. Despite this more difficult initial condition, we have shown that artificial evolution selects individuals capable of "waiting" until the retina finds the appropriate location and of driving at full speed after that. Therefore the slightly lower fitness values of the best evolved individuals with the omnidirectional camera do not mean that those individuals are inferior to those with the pan-tilt camera: the lower fitness is caused by the waiting behavior at the beginning of each test, which allows the car to robustly start driving at full speed afterward. Such a behavior has never been observed in any evolutionary run with the pan-tilt camera.

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

Figure 6. An example of the evolved car with the omnidirectional camera driving in the eight-shaped circuit. Top left: motor activations of the wheel speed (thick line) and the steering angle (thin line) during the first 150 time steps. Speed values above 0.5 stand for forward rotational speeds and values below 0.5 for backward ones. Angle values are mapped from 0.5 to 1 for rotations to the left of the car and from 0.5 to 0 for rotations to the right. The dotted vertical line denotes the moment when the retina fixated its position on the front road area. Bottom left: the corresponding trajectory of the car (shown only for the first 70 time steps). Top right: the radial position of the retina; the shaded period corresponds to passing the intersection. The angular position of the retina remained stable even while passing the intersection. Bottom right: the corresponding trajectory of the car around the intersection.

Once the road area is detected, the best evolved car with the omnidirectional camera drives as well as that with the pan-tilt camera does. The advantage is lower algorithmic, computational, and memory requirements. For comparison with the results obtained with the pan-tilt camera, we did not implement the zooming function in the present experimental setup. However, zooming enables the system to select visual features more specifically by choosing an appropriate resolution for each feature. Indeed, our previous work has demonstrated that in a non-panoramic view an active zooming function played a crucial role in performance [9], which encourages us to apply it to the current system as well. The current setup also allows us to lay multiple retinas within a single omnidirectional camera image, so that each specializes in detecting a separate feature and performing the corresponding behavior. In real circuits, there are a number of features to which car drivers must pay attention (e.g., sidelines, signals, other cars). Indeed, [14] developed a multi-window system with a conventional non-panoramic camera for detecting such features during real car navigation, but the position and shape of each window were predetermined by the designers. Active feature selection by multiple retinas moving independently over the omnidirectional view may offer significant advantages over a single active retina or a fixed multi-window system in several tasks, e.g., navigation
in dynamic, unknown environments. Further investigations are needed to validate this hypothesis. The present method may also offer a significant advantage in landmark-based navigation of mobile robots because of its fast, parallel search for multiple landmarks in any direction. Instead of processing the full 360-degree field of view, the system can actively extract only small task-related regions from the visual field, which dramatically reduces computation and memory consumption.
6. Conclusions

In this article we have explored omnidirectional active vision applied to a car driving task. The simulations have shown that the evolved artificial retina moving over the omnidirectional view successfully detects the behaviorally relevant feature so as to drive the car on the road. Although it takes time to find the appropriate feature in such a broad field of view during the starting period, the best evolved car overcomes this disadvantage by moving back and forth, or by moving slowly, until the retina finds the appropriate feature; the car then starts driving forward at full speed. The best evolved car equipped with the omnidirectional camera drives robustly on the banana-, eight-, and ellipse-shaped circuits in spite of the more difficult initial condition. The advantage of the present methodology is its lower algorithmic, computational, and memory requirements. We currently aim to implement an active zooming function for the retinas, to generalize the neural controllers to circuits with different features (e.g., backgrounds, sidelines, textures), and to transfer the evolved neural controller to the real car shown in Figure 1.
Acknowledgments

The authors thank Giovanni C. Pettinaro for the software to model the view of the omnidirectional camera, which was originally developed in the Swarm-bots project. Thanks also to Danesh Tarapore for enhancing the readability of this article, and to two anonymous reviewers for their useful comments. MS and DF have been supported by EPFL. JB has been supported by the University of Groningen.
References

[1] Y. Matsumoto, K. Ikeda, M. Inaba, and H. Inoue. Visual navigation using omnidirectional view sequence. In Proceedings of the International Conference on Intelligent Robots and Systems, pages 317–322, 1999.
[2] L. Paletta, S. Frintrop, and J. Hertzberg. Robust localization using context in omnidirectional imaging. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2072–2077, 2001.
[3] I. Stratmann. Omnidirectional imaging and optical flow. In Proceedings of the IEEE Workshop on Omnidirectional Vision, pages 104–114. IEEE Computer Society, 2002.
[4] M. F. Land and D.-E. Nilsson. Animal Eyes. Oxford University Press, Oxford, 2002.
[5] R. Bajcsy. Active perception. Proceedings of the IEEE, 76:996–1005, 1988.
[6] J. Aloimonos, I. Weiss, and A. Bandopadhay. Active vision. International Journal of Computer Vision, 1(4):333–356, 1987.
[7] J. Aloimonos. Purposive and qualitative active vision. In Proceedings of the International Conference on Pattern Recognition, volume 1, pages 346–360, 1990.
[8] D. H. Ballard. Animate vision. Artificial Intelligence, 48(1):57–86, 1991.
[9] D. Floreano, T. Kato, D. Marocco, and E. Sauser. Coevolution of active vision and feature selection. Biological Cybernetics, 90(3):218–228, 2004.
[10] C. G. Langton. Artificial life. In Artificial Life, pages 1–48. Addison-Wesley, 1989.
[11] S. Lanza. Active vision in a collective robotics domain. Master's thesis, Faculty of Engineering, Technical University of Milan, 2004.
[12] J. L. Elman. Finding structure in time. Cognitive Science, 14:179–211, 1990.
[13] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA, 1991.
[14] E. D. Dickmanns, B. Mysliwetz, and T. Christians. An integrated spatio-temporal approach to automatic visual guidance of autonomous vehicles. IEEE Transactions on Systems, Man, and Cybernetics, 20(6):1273–1284, 1990.
Part 3 Localization
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Map of Color Histograms for Robot Navigation

Takanobu KAWABE a, Tamio ARAI b, Yusuke MAEDA c,1 and Toshio MORIYA d
a Nippon Telegraph and Telephone West Corp., Japan
b The University of Tokyo
c Yokohama National University
d Hitachi, Ltd.

Abstract. In this paper, we propose a method to estimate a robot's position and orientation in a room by using a map of color histograms. The map of color histograms is a distribution map of colors measured by a mobile robot in multiple directions at various points. The map must be strongly compressed because the memory capacity of the robot is restricted. Thus, the histograms are converted by the discrete cosine transform and their high-frequency components are cut; the remaining low-frequency components are stored in the map of color histograms. Robot localization is performed by matching the histogram of an obtained image against histograms reconstructed from the map, by means of histogram intersection. We also introduce a series of estimations with the aid of odometry data, which improves accuracy.

Keywords. Color histograms, robot navigation, mobile robots
1. Introduction

The number of robots used for entertainment and as pets has increased dramatically in the past decade [1]. Self-localization is one of the most fundamental functions of a pet robot, so that it can live and perform useful tasks at home and in the office. Compared with conventional AGVs (Automated Guided Vehicles) or mobile robots, pet robots cannot move precisely. This is partly because their environment is not well structured, and partly because their locomotive mechanism is unsteady and their sensing systems pick up much noise. Thus, several methods of self-localization should be combined to give such robots robust navigation systems: some may work well in a global environment, while others may be accurate enough in small areas.

One easy and robust method of localization is to utilize artificial or natural landmarks, as studied in [2][3]. Operators, however, need to teach the features and positions of the landmarks [4]. Alternatively, robots can follow a route that is accurately taught in advance as a sequence of sensor data [5][6]; however, such a data sequence must be relatively large.

1 Correspondence to: Yusuke MAEDA, Div. of Systems Research, Fac. of Eng., Yokohama National University, 79-5 Tokiwadai, Hodogaya-ku, Yokohama 240-8501, JAPAN; Tel.: +81 45 339 3918; Fax: +81 45 339 3918; E-mail: [email protected].
Figure 1. The process of making a compressed color histogram: (a) original image; (b) normalized color histogram; (c) low-frequency components, obtained by the discrete cosine transform; (d) compressed color histogram, reconstructed by the inverse discrete cosine transform.
In this paper, we propose a simple and robust algorithm for a small robot to localize itself by matching an obtained image against a map of color histograms. Here, "localization" means that both the position and orientation of the robot are estimated. The proposed algorithm is intended for a pet robot such as Sony's AIBO [7], which has the following characteristics:
1. Vision system: the robot has a conventional low-resolution color camera installed in its forehead, whose height is close to the floor level.
2. Memory capacity: the memory available for navigation data is severely restricted.
In this section, we have discussed the requirements for pet robot navigation and their solution. Our proposed method is outlined in Section 2, and color histograms are explained in Section 3. Section 4 deals with self-localization using the similarity between two histograms. In Section 5, maps of color histograms are generated in a room environment; experiments involving self-localization are conducted, and the results are discussed. The conclusion is presented in Section 6.
2. Outline of Map of Color Histograms

Let us assume that a mobile robot moves around in a room with sets of furniture, such as chairs and desks, and items such as waste-paper baskets. Since the height of the robot's camera is lower than that of the furniture, the images are directly affected by the position of the furniture. Therefore, making a map of a room requires that multiple images of the room be taken from various positions in various directions. The characteristics required of the image map include the following:

• Robustness against changes in illumination
• Robustness against occlusions
• Small memory consumption
• Simple and quick matching
To meet these requirements, we propose a map of color histograms. The robot can be localized based on the similarity between the color histograms of an obtained image and those stored in the map. The sequence for making a compressed color histogram for the map is illustrated in Figure 1: the obtained image (a) is converted into three sets of normalized color histograms (b); each histogram is transformed into frequency components by the discrete cosine transform, and only the low-frequency components are stored in the map (c); when the map is used, the stored low-frequency components are transformed back into histograms (d).
However, color histograms of images taken at different poses can be very similar. Therefore, in order to improve the accuracy of self-localization, we also use a history of color histograms and odometry.
3. Representation of Color Histograms

3.1. Normalized Color Histograms

A color histogram is obtained for each color component from an image taken by a camera. Each histogram plots the number of pixels (vertical axis) against the amount of the corresponding color component (horizontal axis), as shown in Figure 2. The values of each color component are then divided by the total number of pixels of the image, in order to normalize the histogram with respect to the image size.

As described in Section 2, the color histogram must be robust against changes of the lighting conditions in both color temperature and brightness. This robustness is strongly affected by the color representation. Thus, we compared color spaces such as Red-Green-Blue (RGB), Hue-Saturation-Intensity (HSI, also known as HSV or HSL), and YUV. Generally speaking, the hue in HSI is little influenced by the lighting conditions. The hue, however, is not stable when the chroma of the color is low. Thus, we adopt the RGB color space for histogram generation. The average brightness of the whole image should also be normalized for robustness: we adjust the brightness of the image by means of gamma correction before we make the map of color histograms.

Let us formulate the above procedure in more detail. A normalized color histogram for each color c of RGB, h_c, is defined as follows:

  h_c = {h_c(0), h_c(1), ..., h_c(N-1)}                                   (1)
  h_c(n) = (1/N_I) \sum_{i \in I} \delta(i_c, n)                          (2)
  \delta(i_c, n) = 1 when n/N \le i_c^{1/\gamma} < (n+1)/N, and 0 otherwise,   (3)

where N is the number of discrete steps for the amount of each color component (in this research, N = 256); i is one of the pixels in an image I; N_I is the number of pixels of I; and 0 \le i_c \le 1 is the amount of color c of the pixel i. The gamma value \gamma is adjusted so that the average brightness over all pixels of the image becomes 0.5:

  \gamma = \operatorname{argmin}_\gamma \sum_{i \in I} (Y(i, \gamma) - 0.5)^2,   (4)

where

  Y(i, \gamma) = 0.299 i_{red}^{1/\gamma} + 0.587 i_{green}^{1/\gamma} + 0.114 i_{blue}^{1/\gamma}.   (5)
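As an illustration, Eqs. (1)–(5) can be sketched in a few lines of Python. This is our own sketch, not the authors' code: the function names are hypothetical, and the coarse grid search over gamma stands in for whatever optimizer was actually used for Eq. (4), which the paper does not specify.

```python
import math

def normalized_histogram(channel, gamma, N=256):
    # Eqs. (1)-(3): bin the gamma-corrected channel values (in [0, 1])
    # into N bins and divide by the pixel count N_I.
    h = [0.0] * N
    for v in channel:
        n = min(int((v ** (1.0 / gamma)) * N), N - 1)
        h[n] += 1.0 / len(channel)
    return h

def fit_gamma(pixels, grid=None):
    # Eqs. (4)-(5): pick the gamma that brings the average luminance
    # Y(i, gamma) closest to 0.5 over all (r, g, b) pixels.
    if grid is None:
        grid = [0.2 + 0.05 * k for k in range(77)]  # 0.2 .. 4.0
    def cost(g):
        return sum((0.299 * r ** (1 / g)
                    + 0.587 * gr ** (1 / g)
                    + 0.114 * b ** (1 / g) - 0.5) ** 2
                   for r, gr, b in pixels)
    return min(grid, key=cost)
```

By construction the histogram sums to one regardless of image size, which is exactly the normalization the matching step in Section 4 relies on.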
Figure 2. Original histogram. Figure 3. Normalized histogram. Figure 4. Compressed histogram. (In each plot, the horizontal axis is brightness and the vertical axis is the raw or normalized pixel count.)
3.2. Histogram Compression by Discrete Cosine Transform

We use the discrete cosine transform (DCT) to compress the color histograms. The DCT converts a signal into elementary frequency components and is widely used in compression, especially image compression such as JPEG or MPEG. The eigenspace method is another possible compression algorithm; however, it requires a very large amount of computation to obtain an orthogonal basis, whereas the DCT is easy to calculate. When a histogram is used to express the characteristics of an image, the overall shape matters more than the fine detail. Moreover, the color histograms of natural images usually have gently sloping shapes. Therefore, the high-frequency components of a color histogram can be ignored.

Let us denote the DCT of a histogram h = {h(i) | i = 0, 1, ..., N-1} by H. Then H is given by:

  H = {H(u) | u = 0, 1, ..., N-1}                                             (6)
  H(u) = \sqrt{2/N} \, k(u) \sum_{x=0}^{N-1} h(x) \cos\frac{(2x+1)u\pi}{2N}   (7)
  k(u) = 1/\sqrt{2} when u = 0, and 1 otherwise.                              (8)

The high-frequency components of the DCT may contain a considerable amount of noise from the histogram. The noise is caused not only by colors in the texture of objects but also by false intermediate colors in pixels; for example, pink is often found between black and white. Therefore, only the low-frequency components H(0), ..., H(N_A - 1) are stored in the map of color histograms (N_A \le N).

An approximation \hat{h} of the original histogram can be reconstructed by applying the inverse discrete cosine transform (IDCT) to the low-frequency components H(0), ..., H(N_A - 1). The reconstructed color histogram is given by:

  \hat{h} = {\hat{h}(u) | u = 0, 1, ..., N-1}                                          (9)
  \hat{h}(u) = \sqrt{2/N} \sum_{x=0}^{N_A - 1} k(x) H(x) \cos\frac{(2u+1)x\pi}{2N}.   (10)
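Eqs. (6)–(10) translate directly into code. The following is a minimal sketch of our own (a naive O(N·N_A) loop for clarity; a practical system would use an FFT-based DCT routine instead):

```python
import math

def dct_coeffs(h, NA):
    # Eqs. (6)-(8): the first NA coefficients of the orthonormal DCT-II.
    N = len(h)
    H = []
    for u in range(NA):
        k = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
        s = sum(h[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for x in range(N))
        H.append(math.sqrt(2.0 / N) * k * s)
    return H

def reconstruct(H, N):
    # Eqs. (9)-(10): inverse DCT using only the stored low-frequency
    # components; the discarded high frequencies are implicitly zero.
    NA = len(H)
    hhat = []
    for u in range(N):
        s = 0.0
        for x in range(NA):
            k = 1.0 / math.sqrt(2.0) if x == 0 else 1.0
            s += k * H[x] * math.cos((2 * u + 1) * x * math.pi / (2 * N))
        hhat.append(math.sqrt(2.0 / N) * s)
    return hhat
```

With N = 256 and N_A = 32, each channel is stored as 32 real numbers; at 8 bytes per coefficient this gives the 8 × 32 × 3 = 768 bytes per image quoted in Section 5.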
In our experiments, N_A is set to one-eighth of N (i.e., N_A = 32). Figures 2 to 4 show the differences among the histograms: the original color histogram (Figure 2) changes into Figure 3 by normalization with respect to the image size and the brightness. Figure 4 is a histogram reconstructed from the low-frequency components of the DCT; the rough shape of Figure 3 is maintained, although the high-frequency components of Figure 3 have been removed.

Figure 5. Histogram intersection: (1) two original images (a-1) and (b-1); (2) their filtered (compressed) histograms (a-2) and (b-2); (3) the histogram intersection of (a) and (b).
4. Self-Localization Using Map of Color Histograms

4.1. Similarity Between Two Histograms

The similarity between two histograms is calculated by the histogram intersection algorithm [8]. The intersection between two histograms h_1 and h_2 is given by:

  d_\cap(h_1, h_2) = \sum_{i=0}^{N-1} \min(h_1(i), h_2(i)),   (11)

which is the area of the overlapping portion of the two histograms. The process of the algorithm is explained in Figure 5: two original images, (a-1) and (b-1), are converted separately into RGB histograms; each histogram is transformed and squeezed by the DCT and a low-pass filter, and then transformed inversely to a compressed histogram, as shown in (a-2) and (b-2). The intersection is computed as (3). Matching for localization is performed in the space of the histogram rather than that of the frequency, because the former produces more direct and reliable results.

The similarity between two sets of RGB histograms, \hat{H}_1 = (\hat{h}_{red1}, \hat{h}_{green1}, \hat{h}_{blue1}) and \hat{H}_2 = (\hat{h}_{red2}, \hat{h}_{green2}, \hat{h}_{blue2}), is the sum of the histogram intersections over RGB:

  d_\cap(\hat{H}_1, \hat{H}_2) = d_\cap(\hat{h}_{red1}, \hat{h}_{red2}) + d_\cap(\hat{h}_{green1}, \hat{h}_{green2}) + d_\cap(\hat{h}_{blue1}, \hat{h}_{blue2}).   (12)
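Eqs. (11)–(12) are a few lines of code. In this sketch of ours, a set of RGB histograms is represented simply as a triple (h_red, h_green, h_blue):

```python
def intersection(h1, h2):
    # Eq. (11): area of overlap of two normalized histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def rgb_similarity(H1, H2):
    # Eq. (12): sum of the per-channel intersections;
    # H1 and H2 are (h_red, h_green, h_blue) triples.
    return sum(intersection(c1, c2) for c1, c2 in zip(H1, H2))
```

Since each normalized histogram sums to one, two identical histograms intersect with value 1, so a perfect RGB match scores 3.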
A robot can be localized based on the similarity between the set of color histograms of an obtained image, \hat{H}, and each set of histograms reconstructed from the map; the position and orientation of the robot, \tilde{p}, are estimated as follows:

  \tilde{m} \leftarrow \operatorname{argmax}_{m \in M} d_\cap(\hat{H}, \hat{H}(m))   (13)
  \tilde{p} \leftarrow p(\tilde{m}),                                                 (14)

where M is the map of color histograms; m is an element of the map; \hat{H}(m) is the set of reconstructed RGB color histograms at m; and p(m) is the position and orientation corresponding to m.

4.2. Localization Accuracy Improvement Using Odometry

Self-localization of a robot as described above may sometimes fail, because the color histograms of images taken at different poses can be very similar. Thus, we use a history of color histograms together with odometry in order to improve the localization accuracy. Let us assume that the robot moves stepwise, and let us denote the relative pose difference before and after the i-th movement by \Delta p_i, which can be roughly estimated from odometry. Self-localization after the n-th movement is then performed as follows:

  p_{n-k}(m) = p(m) - \sum_{i=0}^{k-1} \Delta p_{n-i}                           (15)
  m_{n-k}(m) \leftarrow \operatorname{argmin}_{m' \in M} \| p(m') - p_{n-k}(m) \|   (16)
  \tilde{m} \leftarrow \operatorname{argmax}_{m \in M} \sum_{k=0}^{n} w^k d_\cap(\hat{H}_{n-k}, \hat{H}(m_{n-k}(m)))   (17)
  \tilde{p} \leftarrow p(\tilde{m}),                                             (18)

where \hat{H}_i is the set of RGB color histograms at the i-th step, and w (0 \le w \le 1) is a discount factor that alleviates the effect of odometry error.
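The history-weighted search of Eqs. (15)–(18) can be sketched as follows. This is our own simplified rendering, not the authors' implementation: poses are reduced to (x, y) for brevity (the paper also tracks orientation θ), and all names are hypothetical.

```python
def localize_with_history(map_entries, observations, deltas, w=0.8):
    # map_entries: list of (pose, hists), pose = (x, y),
    #              hists = (h_red, h_green, h_blue).
    # observations[t]: RGB histogram set observed at step t.
    # deltas[t]: odometry estimate (dx, dy) of the move from step t to t+1.
    def similarity(H1, H2):                      # Eq. (12)
        return sum(sum(min(a, b) for a, b in zip(c1, c2))
                   for c1, c2 in zip(H1, H2))

    def nearest(p):                              # Eq. (16)
        return min(map_entries, key=lambda e:
                   (e[0][0] - p[0]) ** 2 + (e[0][1] - p[1]) ** 2)

    n = len(observations) - 1
    best_pose, best_score = None, float("-inf")
    for pose, _ in map_entries:
        score, back = 0.0, pose
        for k in range(n + 1):                   # Eq. (17): discounted history
            if k > 0:                            # Eq. (15): undo odometry steps
                dx, dy = deltas[n - k]
                back = (back[0] - dx, back[1] - dy)
            _, hists = nearest(back)
            score += (w ** k) * similarity(observations[n - k], hists)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose                             # Eqs. (14) and (18)
```

The discount factor w weights recent observations most heavily, so old odometry error contributes progressively less to the score.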
5. Experiments

Localization experiments were conducted to estimate the error of the proposed method. They were carried out in the daytime in a room 6 [m] wide by 8 [m] long. A robot equipped with a CCD color camera (Sony XC-777A) was manually moved in the room and took images. The height of the camera was 505 [mm]. The camera resolution was 768 [H] by 494 [V] pixels, but 160 [H] by 120 [V] was used for processing.

5.1. Procedure

The experimental procedure was as follows:
Figure 6. Average translational error [mm] per step, with and without normalization. Figure 7. Average rotational error [rad] per step, with and without normalization.
1. Image Acquisition for Map of Color Histograms: First, images were taken in a 2.4 [m] by 1.2 [m] area of the room. The robot pose (x, y, θ) at the sampling points was as follows:

  x = 305i − 1220 [mm]  (i = 0, ..., 8)
  y = 305j − 610 [mm]   (j = 0, ..., 4)
  θ = kπ/6 [rad]        (k = 0, ..., 11).

Thus, the total number of sampling points was 9 × 5 × 12 = 540. The map was then defined as a list of all the quadruplets m = (x, y, θ, V), where V = {(H_red(i), H_green(i), H_blue(i)) | i = 0, ..., N_A − 1}. Since one pixel takes one byte for each of R, G, and B, 3 × 160 × 120 = 57600 bytes are necessary for an image.

2. Generation of Map of Color Histograms: Each image was compressed into the low-frequency components of a normalized color histogram for each RGB color by means of the algorithm given in Section 3. As each frequency component was expressed as an 8-byte real number and only 32 components were stored per color, the size of the frequency components was 8 × 32 × 3 = 768 bytes, i.e., 1.33% of the original image size of 57600 bytes. The total size of the color histograms in the map was 768 [B] × 540 ≈ 0.4 [MB].

3. Localization: We made one hundred trials of self-localization. The initial position and orientation of the robot in each trial was chosen at random. For simplicity, we restrict the robot pose as follows:

  x = 305i/2 − 1220 [mm]  (i = 0, ..., 16)
  y = 305j/2 − 610 [mm]   (j = 0, ..., 8)
  θ = kπ/12 [rad]         (k = 0, ..., 23).

The robot was moved manually to a random adjacent pose at each step.

5.2. Experimental Results

Average translational and rotational errors of localization over 100 trials are shown in Figure 6 and Figure 7, respectively. For comparison, results without normalization (i.e.,
without gamma correction for adjustment of the brightness) are also shown. We can see that normalization leads to better localization. Each of the graphs tends to decrease as the step number increases; this is the accuracy improvement brought by the history of color histograms and odometry. The translational and rotational errors almost converge to a certain level. The residual translational error is larger than the distance interval of the sampling points, 305 [mm], while the residual rotational error is smaller than the angle interval of the sampling points, π/6 ≈ 0.52 [rad]. This is because the color histograms are more sensitive to the rotational movement of the robot than to the translational one (especially in the camera direction).
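The sampling grid and memory budget quoted in Section 5.1 can be reproduced with a few lines of plain arithmetic (no assumptions beyond the figures given in the text):

```python
# Sampling grid of Section 5.1: 9 x-positions, 5 y-positions, 12 headings.
poses = [(305 * i - 1220, 305 * j - 610, k)
         for i in range(9) for j in range(5) for k in range(12)]

image_bytes = 3 * 160 * 120      # one byte per RGB channel per pixel
entry_bytes = 8 * 32 * 3         # 32 DCT coefficients x 8 bytes x 3 channels
map_bytes = entry_bytes * len(poses)
ratio_percent = 100.0 * entry_bytes / image_bytes
# len(poses) == 540, image_bytes == 57600, entry_bytes == 768,
# map_bytes == 414720 (about 0.4 MB), ratio_percent == 1.33...
```

These match the 540 sampling points, 768 bytes per entry, 1.33% compression ratio, and roughly 0.4 MB map size reported above.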
6. Conclusion

In this paper, we proposed an algorithm that makes it possible for a robot to localize itself with a vision sensor by using a map of color histograms.

• The color histogram of an image, normalized by gamma correction, is used for robustness against changes of the lighting conditions.
• The map is compressed by means of the DCT of the color histograms, retaining only their low-frequency components. The size of the compressed map is reduced to 1.33% of that of the original images.
• In experiments in a room environment, the localization error was about 400 [mm] in translation and 0.3 [rad] in rotation with the help of odometry, when the interval of sampling points for map building was 305 [mm] in translation and 0.52 [rad] in rotation.
References

[1] H. H. Lund (2003) "Adaptive Robotics in the Entertainment Industry," Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 595–602.
[2] G. Jang, S. Kim, W. Lee, I. Kweon (2003) "Robust Self-localization of Mobile Robots using Artificial and Natural Landmarks," Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 412–417.
[3] V. Ayala, J. B. Hayet, F. Lerasle, M. Devy (2000) "Visual Localization of a Mobile Robot in Indoor Environments Using Planar Landmarks," Proc. of 2000 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 275–280.
[4] R. Ueda, T. Fukase, Y. Kobayashi, T. Arai, H. Yuasa, J. Ota (2002) "Uniform Monte Carlo Localization - Fast and Robust Self-localization Method for Mobile Robots," Proc. of 2002 IEEE Int. Conf. on Robotics and Automation, pp. 1353–1358.
[5] Y. Adachi, H. Saito, Y. Matsumoto, T. Ogasawara (2003) "Memory-based Navigation Using Data Sequence of Laser Range Finder," Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 479–484.
[6] T. Matsui, H. Asoh, S. Thompson (2000) "Mobile Robot Localization Using Circular Correlations of Panoramic Images," Proc. of 2000 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 269–274.
[7] M. Fujita, H. Kitano (1998) "Development of an Autonomous Quadruped Robot for Robot Entertainment," Autonomous Robots, vol. 5, no. 1, pp. 7–18.
[8] M. J. Swain, D. H. Ballard (1991) "Color Indexing," Int. J. on Computer Vision, vol. 7, no. 1, pp. 11–32.
Designing a System for Map-Based Localization in Dynamic Environments

Fulvio Mastrogiovanni, Antonio Sgorbissa 1, and Renato Zaccaria
DIST - University of Genova

Abstract. This paper deals with map-based self-localization in a dynamic environment. In order to detect localization features, laser rangefinders traditionally scan a plane: we hypothesize the existence of a "low frequency cross-section" of the 3D environment (informally, over people's heads), where even highly dynamic environments become more "static" and "regular". We show that, by accurately choosing the laser scanning plane, problems related to moving objects, occluded features, etc. are simplified. Experimental results, showing hours of continuous operation in a real-world, crowded scenario with an accuracy of about 2.5 cm and a speed of 0.6 m/s, demonstrate the feasibility of the approach.

Keywords. Mobile robotics, map-based self-localization, system design
1. Introduction

This paper deals with the self-localization subsystem of an autonomous mobile robot within a dynamic, indoor environment. It is commonly accepted that, if we want the robot to operate autonomously for a long period of time while carrying out complex navigation tasks (e.g., autonomous transportation, autonomous surveillance, museum tour guiding, etc. [1] [4]), self-localization is a fundamental, yet-to-be-solved issue. Since incremental dead-reckoning methods are not sufficient because of cumulative errors which increase over time, a very common approach consists in collecting observations of the surrounding environment (natural or artificial features or landmarks) and periodically updating the robot pose by comparing measurements with a map of the environment. If required, each new observation can be used to update the map itself, thus estimating both the pose of the robot and that of each surrounding feature in the environment (this is also known as Simultaneous Localization And Mapping). In this general scenario, existing localization methods are often classified along different axes (local vs. global, active vs. passive, etc. [5]). In spite of their differences, almost all existing approaches rely on the so-called Markov assumption; i.e., sensor readings are assumed to depend only on the relative configuration of the robot with respect to the surrounding, presumed static features. Obviously, this is unrealistic in real-world scenarios, where people walk all around and objects are frequently moved from place to place. However, since the assumption simplifies computations, most approaches rely on post-processing techniques to filter out data originating from unmodeled, dynamically changing features.

The present work is focused on a method relying on a SICK laser rangefinder as the only sensing device in the system, and an Extended Kalman Filter for position estimation. In particular, we face the problems related to the dynamism and the inherent complexity of real-world environments by introducing a new dimension, i.e., elevation from the ground. Since laser rangefinders traditionally scan a plane, we hypothesize the existence of a 2D Low Frequency Cross-Section (LFCS; informally, "over people's heads"), where even highly dynamic environments become more "static" and "regular": the attribute "low frequency" thus refers both to the temporal and the spatial dimension. The localization process can be successfully managed here, with the robot standing "on the tips of its toes", thus decoupling position tracking from obstacle avoidance: whilst features for localization should be extracted in the LFCS, which is static and has a simpler topology, sensors for obstacle avoidance should be placed lower, in order to deal with the complex and ever-changing topology of a dynamic and chaotic human environment.

Section II clarifies the LFCS concept, and introduces an "unevenness index" to better formalize it. In Section III the feature extraction and matching algorithms are discussed. Section IV shows experimental results. Conclusions follow.

1 Correspondence to: Antonio Sgorbissa, DIST - University of Genova. Tel.: +39 010 353 2801; Fax: +39 010 353 2154; E-mail: [email protected].
2. The Low Frequency Cross-Section

Most approaches in the literature make assumptions about the stationarity of the environment. People walking around the robot, objects moved from place to place, or even other moving robots evidently violate this assumption, making feature detection harder both because of occlusions and because the features themselves change significantly over short periods of time. Two approaches are widely known in the literature to deal with this problem. According to the first approach, the map of the world has to be continuously updated using current observations, in order to reflect the actual state of the environment. This approach is often unfeasible because of the complexity and the large number of relevant features which must be taken into account. Recently, important advances in the field of Simultaneous Localization and Mapping (SLAM) [7] [8] have led to partial solutions, in particular for those features of the environment which appear and disappear at a low frequency (e.g., furniture); however, this remains an open research issue. The second approach is based on techniques that filter out sensor readings which have no correspondence in the a priori map. For example, in [5] such a filter is used to compute the entropy value of the state estimate, thus considering only perceptions which reduce the entropy. The main drawback of this approach is its inherent conservativeness, since it is not able to incorporate new information into the map.

The main purpose of this work is to introduce a novel approach to deal with complex, dynamic indoor environments. We notice that the intrinsic dynamic nature of human-populated environments is not a characteristic of the environment "as a whole": it mainly depends on elevation from the floor.
In particular, there exists a well-defined height range, roughly located over our heads, which we call the Low Frequency Cross-Section: experience shows that, in this 2D cross-section of the 3D workspace, the status of the world varies at an extremely low frequency. Consider for example the displacement
F. Mastrogiovanni et al. / Designing a System for Map-Based Localization
Figure 1. Bottom, mid and top view.
of objects and furniture from one place to another: in a typical office-like environment, people usually move chairs, tables, wastebaskets, boxes, PCs, books, etc., from and to places which are "close" to the floor, where they are handy for humans; these objects are moved around on a daily basis. Shelves and closets, which are placed at a higher elevation from the floor or are big enough to cover an entire wall, are displaced (at most) on a yearly basis, i.e., over a time significantly longer than the life of most current robotic set-ups. We can summarize this by saying that in the LFCS the environment is more "static" than in the area below. For analogous reasons, we claim that, above a given height, the environment is more "regular" and "simple" than below. This is a consequence of the fact that humans are inherently "conservative", i.e., they tend to minimize the energetic strain when performing a particular task. It is therefore more common to put down a book on a messy stack on our desktop (if we use it every day) rather than on a well-ordered (but not easily reachable) shelf located over our heads, especially if we plan to take it again after a couple of minutes. Fig. 1 shows a picture of our lab, with laser scans taken at different heights: bottom view (10 cm from the floor, quite messy); middle view (90 cm, very messy); top view (200 cm, i.e., in the LFCS). The idea of a "freely sensing area" is not new. A successful attempt is described in [6], where the authors deal with the localization problem by pointing a TV camera upward, in order to detect ceiling lights and match them against a previously stored map.
The main difference between the LFCS and this approach is straightforward: the LFCS is a general method for laser-based feature extraction, which opportunistically chooses the most promising laser scanning plane and can be adopted in any indoor environment; the approach in [6] is oriented towards solving a very specific problem, and is thus not portable at all. To give a formal meaning to terms such as "static", "regular" and "simple", we introduce an "unevenness index"; unevenness measures and the fractal degree are commonly used in geography to measure the length and complexity of shore profiles, and can be applied to quantify the regularity of a given laser scan. The unevenness index U is simply given by the following formula:

U = L / C   (1)
L in Equation 1 is the length of the polyline which is built by connecting all the raw scan points, thus not relying on a particular segmentation algorithm. C – commonly referred to as the chord of the polyline – is the length of the segment located between the first and the last scan point. It is straightforward to depict the behavior of U :
1) U is always greater than or equal to 1. This comes from the fact that the chord between the start and the end point of a polyline is never longer than the polyline itself. U equals 1 only when the polyline reduces to a single segment, i.e., when all the points are collinear.

2) U is a measure of the discontinuity between contiguous scan points. High U values are detected when range measurements vary at high frequency between contiguous scan points. On the contrary, low U values are detected when the points are mostly collinear or their range variation is very smooth.

3) U is proportional to the depth of the environment. Long corridors are mainly characterized by high values of U, while rooms by lower values. This "not explicitly modeled" effect can be reduced, but not removed, by dividing a full scan into disjoint subsets of range measurements and considering each polyline and the corresponding chord separately. Notice that this is similar to the approach adopted in geography, where coastal profiles are divided into "mostly convex" and "mostly concave" areas before computing the fractal degree.

All the data collected confirm the intuition that, at ground level, U is higher; it significantly decreases as we move the laser to a higher elevation, i.e., into the LFCS. Here, transitions between close scan points are generally smoother, since the environment is less cluttered. Finally, the same obviously holds in a dynamic environment crowded with people moving around the robot: when scans are taken at a lower height, U substantially increases with respect to its reference value computed in the LFCS.
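The unevenness index of Equation 1 can be computed directly from an ordered scan; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def unevenness(scan_xy):
    """Unevenness index U = L / C of Equation 1.

    scan_xy: (n, 2) array of Cartesian scan points, ordered by scan angle
    (an open polyline; the first and last points must not coincide).
    L is the length of the polyline through the raw points,
    C the chord from the first to the last point, so U >= 1.
    """
    pts = np.asarray(scan_xy, dtype=float)
    segs = np.diff(pts, axis=0)                 # consecutive point differences
    L = np.linalg.norm(segs, axis=1).sum()      # polyline length
    C = np.linalg.norm(pts[-1] - pts[0])        # chord length
    return L / C
```

For three collinear points the polyline and the chord coincide, so U = 1; any deviation from the chord makes U grow, matching property 1) above.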
3. Self-Localization

Two phases are iteratively executed, namely Acquisition and EKF-Update.

3.1. Acquisition

We adopt a typical Split & Merge method [9]. This family of line extraction methods exhibits a number of advantages compared to other existing techniques [10] [11] [12] [3] [2]: they exploit local scan templates, they are faster (since they act on smaller data sets), and they produce more accurate results. The required steps are: Preprocessing, Scan Segmentation, Segment Line Extraction, and Feature Matching. Preprocessing and Scan Segmentation do not deserve further discussion: they basically consist of standard techniques to filter out invalid readings and to segment range data into "almost continuous", non-overlapping sets of points S_j to be fed to the next computation step. Line extraction recursively iterates on each set, trying to find local subsets strictly made up of collinear points. This is accomplished by iteratively calculating the fitting value φ_j of each set S_j (the equations are not shown here: see [9]). If φ_j ≤ Φ, where Φ is the fitting threshold, the points belonging to S_j are considered collinear. If not, the algorithm splits S_j into m smaller sets and computes φ_jm for each of them. This possibly originates several segment lines from the original set. When a segment line l_j is found, ρ_lj and θ_lj are measured: in particular, by defining L_j (with a capital letter) as the straight (infinite) line on which l_j lies, ρ_lj is the distance between L_j and the robot, and θ_lj is the angle between ρ_lj and the robot heading. By relying on geometrical considerations, both end points of each segment line l_j are found. Finally, a set of segment lines λ such that
Figure 2. Left: the plot of U versus time in different 5-minute tests; Right: mean and variance.
Figure 3. Left: the plot of U versus time in a 3-hour test; Right: mean and variance.
λ = {l_j , j = 1, ..., |λ|}   (2)
is returned, ordered according to the angle θ_lj associated with each l_j (from right to left). During Feature Matching, each line l_j is compared with the segment lines which compose the environmental model. The model is a set of |μ| oriented segment lines (each defined by a start and an end point), the orientation allowing the opposite surfaces of the same wall (or object) to be distinguished:

μ = {m_k , k = 1, ..., |μ|}   (3)

In particular, different heuristic conditions must be satisfied in order to obtain a correct match; the equations are not shown here for the sake of brevity. As a result, each sensed line l_j can potentially match several map lines m_k, thus producing a set of matching couples:

MC_j = {⟨l_j , m_k⟩, m_k ∈ μ}

In the next phase, some criteria are adopted to select the best matching map line for each sensed line.

3.2. EKF-Update

Two steps are involved: Best Match Selection and Kalman Filtering. As usual, to select the most likely match in the set MC_j, we refer to the current estimated robot position. In
particular, for every map line m_k included in MC_j, we compute the predicted measurement ρ̃_mk and θ̃_mk (its complete expression is given in Equation 6, where the EKF is described), which returns the estimated distance and angle given the current position estimate. Next, we assign a score ls_jk to each couple ⟨l_j, m_k⟩ ∈ MC_j according to Equation 4:

ls_jk = Θ_ρ (ρ_lj − ρ̃_mk)² + Θ_θ (θ_lj − θ̃_mk)²   (4)

In Equation 4, Θ_ρ and Θ_θ are experimentally determined in order to weight errors on the distance and on the angle differently; the j-th measurement (ρ_lj, θ_lj) fed to the EKF corresponds to the matching couple mc_jk such that:

mc_jk = arg min_k ls_jk   (5)

since ls_jk is a weighted error, the best match is the one minimizing it.
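Stepping back to the Acquisition phase, the recursive split step of Section 3.1 can be sketched as follows. This is an illustration, not the paper's exact algorithm: the fitting value φ_j of [9] is approximated here by the maximum point-to-chord distance, and all names are ours.

```python
import numpy as np

def split_merge(points, phi_threshold):
    """Recursively split an ordered point set into collinear segments.

    points: (n, 2) ordered scan points of one set S_j (an open polyline).
    phi_threshold: fitting threshold (plays the role of Phi in Section 3.1);
    the fit error is approximated by the largest perpendicular distance
    of the points to the chord between the first and last point.
    Returns a list of point arrays, one per extracted segment line.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return [pts]
    p0, p1 = pts[0], pts[-1]
    d = p1 - p0
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal to the chord
    dist = np.abs((pts - p0) @ n)                     # perpendicular distances
    k = int(np.argmax(dist))
    if dist[k] <= phi_threshold:                      # points collinear enough
        return [pts]
    # split at the worst-fitting point and recurse on both halves
    return split_merge(pts[:k + 1], phi_threshold) + split_merge(pts[k:], phi_threshold)
```

An L-shaped scan (two perpendicular walls) is split at the corner into two collinear runs, each of which then passes the threshold test.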
If more sensed lines l_j are available, this step is iterated in order to process all the matching pairs. In the last step, the position is corrected through an Extended Kalman Filter (EKF). An EKF is a linear recursive estimator for the state of a dynamic system described by a nonlinear process model f and a measurement model h. Its properties and limitations are well known in the literature and are not discussed here. It is interesting to notice, however, that in our case the measurement model h is linear, which partly avoids the problems related to linearization (the kinematic model of our differentially driven vehicle, unfortunately, is not linear). In particular, if ŝ_r,t = [x̂_r,t , ŷ_r,t , θ̂_r,t]^T is the a priori estimate of the robot's position at the t-th instant, and the 2-dimensional vector z̃_mk,t = [ρ̃_mk,t , θ̃_mk,t]^T is the predicted measurement (i.e., corresponding to the map segment m_k in the matching couple mc_jk), the measurement model is given by the following linear equations:

ρ̃_mk = (a_mk x̂_r + b_mk ŷ_r + c_mk) / √(a_mk² + b_mk²),   θ̃_mk = atan(b_mk / a_mk) − θ̂_r   (6)

That is, we express each map segment line m_k through the implicit equation a_mk x + b_mk y + c_mk = 0 of the corresponding straight line. An observability analysis has been performed, showing that the observability space for each single measurement l_j has dimension 2; i.e., the system is not completely observable from one measurement. However, since the measurement model h varies with the equation of the line m_k currently considered, subsequent measurements, corresponding to different and ideally perpendicular lines, allow all the components of the robot state to be corrected.
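A sketch of the measurement model of Equation 6 (names are ours; the full EKF prediction/update cycle is not reproduced):

```python
import math

def measurement_model(s_hat, line):
    """Predicted measurement (rho, theta) for map line a*x + b*y + c = 0 (Eq. 6).

    s_hat: (x, y, theta) a priori robot pose estimate.
    line:  (a, b, c) implicit coefficients of the map line (a != 0 assumed here).
    rho is the signed distance from the robot to the line (the sign encodes
    which side of the oriented line the robot is on); theta is the line
    angle atan(b/a) minus the robot heading.
    """
    x, y, theta = s_hat
    a, b, c = line
    rho = (a * x + b * y + c) / math.hypot(a, b)
    th = math.atan(b / a) - theta
    return rho, th
```

Note that, for a fixed line (a, b, c), both expressions are linear in (x, y, theta), which is exactly why the measurement model needs no linearization.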
4. Experimental Results

The experiments validate the intuition about the Low Frequency Cross-Section. In particular, the laser scanner was mounted at three different heights from the ground: 10 cm (bottom view), 90 cm (mid view) and roughly 2 m (top view). From these views we collected about one thousand scans in different places of our department, a house, and a garage; each acquisition lasts about five minutes, so results are averaged over time. The experiments show that the bottom and mid views (Fig. 2) exhibit a comparable behavior of the U parameter, whilst for the top view (in darker gray) U
Figure 4. Left: simulated robot (8 hours); Middle: real robot (2 hours); Right: snapshots of the experiment
is significantly smaller. Obstacle profiles detected in the LFCS are much more regular than those below; moreover, the variance over time of each scan is significant in the bottom and mid views, whilst it is almost zero in the top view (small fluctuations are due to sensor noise). Experiments with a longer duration of about 2 hours, shown in Fig. 3, confirm this evidence. Exploiting these results, we performed many localization experiments, both in simulation and with two real robots (a modified TRC Labmate base and the home-built robot Staffetta). In simulation, we model errors both in rangefinder measurements [2] and in odometry. In Fig. 4 (left), the simulated robot is asked to navigate through an environment that replicates the ground floor of our department: the figure depicts the superimposition of the ground truth and the estimated robot position after an 8-hour simulation. Fig. 4 (middle) shows a similar experiment with a real robot; the system is able to localize itself for almost 2 hours (the maximum time allowed by the batteries), even though it is almost continuously surrounded by students (Fig. 4, right). Moreover, it demonstrates its robustness when moved to a different scenario, i.e., the second floor of our department (Fig. 5). During the experiments the normal life of the department is not interrupted: people walk by, furniture is displaced, etc. In spite of this, positioning has an accuracy of about 2.5 cm while the robots move at a speed of 0.6 m/s.
5. Conclusion

The paper shows that map-based self-localization within a dynamic environment can be successfully performed by arranging the laser to scan a 2D plane in which the environment changes slowly, both temporally and spatially (what we call a Low Frequency Cross-Section). This concept is formalized through the unevenness parameter U, which scores the "roughness" of laser scans; as expected, scans taken outside the LFCS are much less stable and regular. Localization experiments, performed both in simulation and with real robots, demonstrate that the approach is robust and precise even in highly crowded environments.
Figure 5. Top: simulated robot (8 hours); Bottom: real robot (2 hours)
References
[1] I.J. Cox, BLANCHE: An Experiment in Guidance and Navigation of an Autonomous Robot Vehicle, IEEE Trans. on Robotics and Automation, Vol. 7, no. 2 (1991), 193–204.
[2] P. Jensfelt and H.I. Christensen, Pose Tracking Using Laser Scanning and Minimalistic Environmental Models, IEEE Trans. on Robotics and Automation, Vol. 17, no. 2 (2001), 138–147.
[3] S. Thrun, C. Martin, Y. Liu, D. Hähnel, R. Emery-Montemerlo, D. Chakrabarti and W. Burgard, A Real-Time Expectation-Maximization Algorithm for Acquiring Multiplanar Maps of Indoor Environments With Mobile Robots, IEEE Trans. on Robotics and Automation, Vol. 20, no. 3 (2004), 433–442.
[4] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte and D. Schulz, Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva, Int. J. of Robotics Research, Vol. 19, no. 11 (2000), 972–999.
[5] D. Fox, Markov Localization: A Probabilistic Framework for Mobile Robot Localization and Navigation, PhD Thesis, Institute of Computer Science III, University of Bonn (1998).
[6] D. Fox, S. Thrun, W. Burgard and F. Dellaert, Particle Filters for Mobile Robot Localization, in Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, A. Doucet, N. de Freitas and N. Gordon (eds.), Springer-Verlag, New York (2001).
[7] J. Nieto, J. Guivant and E. Nebot, DenseSLAM: The Unidirectional Information Flow (UIF), Proc. Fifth IFAC/EURON Symp. on Intelligent Autonomous Vehicles (IAV'04), Portugal (2004).
[8] J. Folkesson and H. Christensen, Robust SLAM, Proc. Fifth IFAC/EURON Symp. on Intelligent Autonomous Vehicles (IAV'04), Portugal (2004).
[9] D. Sack and W. Burgard, A Comparison of Methods for Line Extraction from Range Data, Proc. Fifth IFAC/EURON Symp. on Intelligent Autonomous Vehicles (IAV'04), Portugal (2004).
[10] M. Bennewitz, W. Burgard and S. Thrun, Using EM to learn motion behaviors of persons with mobile robots, Proc. of IROS'02, Lausanne, Switzerland (2002).
[11] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley and Sons, New York (1973).
[12] Y. Liu, R. Emery, D. Chakrabarti, W. Burgard and S. Thrun, Using EM to learn 3D models of indoor environments with mobile robots, Proc. of the Int. Conf. on Machine Learning (2001).
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Appearance-based Localization of Mobile Robots using Local Integral Invariants

Hashem Tamimi a, Alaa Halawani b, Hans Burkhardt b and Andreas Zell a
a Computer Science Dept., University of Tübingen, Sand 1, 72076 Tübingen, Germany
b Institute for Computer Science, University of Freiburg, 79110 Freiburg, Germany

Abstract. In appearance-based localization, the robot environment is implicitly represented as a database of features derived from a set of images collected at known positions in a training phase. For localization, the features of the image observed by the robot are compared with the features stored in the database. In this paper we propose the application of integral invariants to the robot localization problem on a local basis. First, our approach detects a set of interest points in the image using a Difference-of-Gaussians (DoG)-based interest point detector. Then, it computes a set of local features based on the integral invariants around each of the interest points. These features are invariant to similarity transformations (translation, rotation, and scale). Our approach leads to significantly higher localization rates and outperforms previous work.

Keywords. Appearance-based robot localization, integral invariants, DoG-based interest point detector.
1. Introduction Vision-based robot localization demands image features with many properties. On one hand the features should exhibit invariance to scale and rotation as well as robustness against noise and changes in illumination. On the other hand they should be extracted very quickly so as not to hinder other tasks that the robot plans to perform. Both global and local features are used to solve the robot localization problem. Global features are extracted from the image as a whole, such as histograms [18], Principal Component Analysis (PCA)-based features [3], and integral invariants [17]. On the other hand, local features are computed from areas of high relevance in the image under consideration such as Scale Invariant Feature Transform (SIFT) [6], kernel PCA-based features [16], and wavelet-based features [15]. Local features are more commonly employed because they can be computed efficiently, are resistant to partial occlusion, and are relatively insensitive to changes in viewpoint. There are two considerations when using local features [5]: First, the interest points should be localized in position and scale. Interest points are positioned at local peaks in a scale-space search, and filtered to preserve only those that are likely to remain stable over transformations. Second, a signature of the interest point is built. This signature should be distinctive and invariant over transformations caused by changes in camera pose as well as illumination changes. While point localization and signature aspects of interest point algorithms are often designed together, they can be considered independently [7].
H. Tamimi et al. / Appearance-Based Localization of Mobile Robots
In this paper we propose applying integral invariants to the robot localization problem on a local basis. First, our approach detects a set of interest points in the image using the Difference-of-Gaussians (DoG)-based interest point detector developed by Lowe [6]. Then, it computes a set of descriptive features based on the integral invariants around each of the interest points. These features are invariant to similarity transformations (translation, rotation, and scale). Our approach leads to significantly higher localization rates and outperforms the previous work described in Section 4.
2. Integral Invariants

The following is a brief description of the calculation of rotation- and translation-invariant features based on integration. The idea of constructing invariant features is to apply a nonlinear kernel function f(I) to a gray-valued image I, and to integrate the result over all possible rotations and translations (Haar integral over the Euclidean motion):

T[f](I) = (1/(PMN)) ∑_{n0=0}^{M−1} ∑_{n1=0}^{N−1} ∑_{p=0}^{P−1} f(g(n0, n1, p·2π/P) I)   (1)
where T[f](I) is the invariant feature of the image, M, N are the dimensions of the image, and g is an element of the transformation group G (which consists here of rotations and translations). Bilinear interpolation is applied when the samples do not fall onto the image grid. The above equation states that invariant features are computed by applying a nonlinear function f to the neighborhood of each pixel in the image, and then summing up all the results to get a single value representing the invariant feature. Using several different functions then builds up a feature space. To preserve more local information we remove the summation over all translations. This results in a map T that has the same dimensions as I:

(T[f] I)(n0, n1) = (1/P) ∑_{p=0}^{P−1} f(g(n0, n1, p·2π/P) I)   (2)

Applying a set of different f's results in a set of maps. A global multidimensional feature histogram is then constructed from the elements of these maps. The choice of the nonlinear kernel function f can vary. For example, invariant features can be computed by applying the monomial kernel, which has the form:
f(I) = ∏_{p=0}^{P−1} I(x_p, y_p)^{1/P}   (3)
One disadvantage of this type of kernel is its sensitivity to illumination changes. The work in [10] defines another kind of kernel that is robust to illumination changes. These kernels are called relational kernel functions and have the form:

f(I) = rel(I(x1, y1) − I(x2, y2))   (4)

with the ramp function
Figure 1. The calculation of f = rel(I(0, 3) − I(4, 0)) by applying Equation 2 involves applying the function rel to the difference between the gray-scale values of the pixels that lie on the circumference of a circle of radius 3 and those that lie on the circumference of another circle of radius 4 (taking into consideration a phase shift of π/2 between the corresponding points), and taking the average of the result [17].
rel(γ) = { 1                if γ < −ε
           (ε − γ)/(2ε)     if −ε ≤ γ ≤ ε
           0                if ε < γ        (5)
The ramp is centered at the origin, and 0 < ε < 1 is chosen by experiment. Global integral invariants have been successfully used in many applications such as content-based image retrieval [12], texture classification [9], object recognition [8], and robot localization [17]. Figure 1 illustrates how these features are calculated. Please refer to [12] for the detailed theory.
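For illustration, the ramp of Equation 5 and the relational kernel of Equation 4 can be written directly; ε = 0.098 is the value reported in Section 6.3, and gray values are assumed normalized to [0, 1]:

```python
def rel(gamma, eps):
    """Ramp function of Equation 5: 1 below -eps, 0 above eps, linear in between."""
    if gamma < -eps:
        return 1.0
    if gamma > eps:
        return 0.0
    return (eps - gamma) / (2.0 * eps)

def relational_kernel(v1, v2, eps=0.098):
    """Relational kernel of Equation 4 applied to two gray values.

    v1, v2: gray values I(x1, y1) and I(x2, y2), assumed normalized to [0, 1].
    """
    return rel(v1 - v2, eps)
```

The soft transition over [−ε, ε] is what makes the kernel tolerant of small illumination-induced changes in the gray-value difference.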
3. DoG-based Point Detector

The interest points used in our work were first proposed as part of the Scale Invariant Feature Transform (SIFT) developed by Lowe [6]. These features have been widely used in the robot localization field [11] [14]. The advantage of this detector is its stability under similarity transformations, illumination changes and the presence of noise. The interest points are found as scale-space extrema of the Difference-of-Gaussians (DoG) function D(x, y, σ), which can be computed from the difference of two nearby scales separated by a multiplicative factor k:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ)   (6)
where L(x, y, σ) defines the scale space of an image, built by convolving the image I(x, y) with the Gaussian kernel G(x, y, σ). Points of the DoG function which are local extrema both within their own scale and with respect to the scales above and below are extracted as interest points. The interest points are then filtered for more stable matches, and localized more accurately in scale and sub-pixel image position using the methods described in [2].
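A minimal NumPy sketch of the DoG stack of Equation 6; the blur is a plain separable Gaussian, and σ0, k and the number of levels are illustrative, not the exact values used in [6]:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur in pure NumPy with 'reflect' border padding."""
    r = int(3 * sigma + 0.5)                       # kernel truncation radius
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()                                   # normalize so a flat image stays flat
    pad = np.pad(img.astype(float), r, mode='reflect')
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='valid'), 0, rows)

def dog_stack(img, sigma0=1.6, k=2**0.5, levels=3):
    """D(x, y, sigma) of Equation 6: differences of adjacent blurred copies
    whose scales are separated by the multiplicative factor k."""
    L = [gaussian_blur(img, sigma0 * k**i) for i in range(levels + 1)]
    return [L[i + 1] - L[i] for i in range(levels)]
```

Interest points would then be picked as local extrema across (x, y) and across adjacent DoG levels; that non-maximum suppression step is omitted here.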
4. Using Global Integral Invariants for Robot Localization

In [17], integral invariants are used to extract global features for solving the robot localization problem, by applying Equation 2 to each pixel (n0, n1) of the image I. The calculation of the map T involves finding an invariant value around each pixel in the image, which is time-consuming. Instead, a Monte-Carlo approximation is used to estimate the overall calculation [13]. This approximation applies the nonlinear kernel functions to a set of randomly chosen locations and directions rather than to all locations and directions. Global features achieve robustness mainly because of their histogram nature. On the other hand, local features, extracted from areas of high relevance in the image under consideration, are more robust in situations where the objects in the images are scaled or seen from different viewpoints [4]. Such situations are often encountered by the robot during its navigation. In the following sections we modify the global approach by applying the integral invariants locally, around a set of interest points.
5. Extracting Local Integral Invariants

Unlike the existing approach explained in Section 2, the features that we propose are not globally extracted; they are computed only around a set of interest points. Our approach can be described by the following steps:

1. Interest point detection: The first stage applies the DoG-based detector to the image in order to identify potential interest points. The location and scale of each candidate point are determined, and the interest points are selected based on the measures of stability described in [6].

2. Invariant feature initial construction: For each interest point located at (n0, n1) we determine the set of all points which lie on the circumference of a circle of radius r1, using bilinear interpolation for sub-pixel calculation. Another set of points lying on the circumference of a circle of radius r2 is determined in the same manner. Both circles are centered at (n0, n1). To make the features invariant to scale changes, the radii are adapted linearly to the local scale of each interest point. This way the patch used for feature extraction always covers the same details of the image, independent of the scale.

3. Nonlinear kernel application: A nonlinear kernel function is applied to the values of the points of the two circles. Each point located at (x0, y0) on the circumference of the first circle is paired with a point located at (x1, y1) on the circumference of the second circle, taking into account a phase shift θ between the corresponding points. This step is repeated, together with step 2, for a set of V kernel functions f_i, i = 1, 2, ..., V. The kernels differ from each other in r1, r2 and θ. Finally we apply Equation 7 to each interest point located at (n0, n1):

F_i(n0, n1) = (1/P) ∑_{p=0}^{P−1} f_i(g(n0, n1, p·2π/P) I),   i = 1, 2, ..., V.   (7)

We end up with a V-dimensional feature vector, F, for each single interest point.
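Steps 2 and 3 can be sketched for a single kernel as follows; this is a minimal illustration assuming the relational kernel of Equation 4, with names and boundary handling ours (points near the image border are not treated):

```python
import numpy as np

def local_invariant(img, n0, n1, r1, r2, theta_shift, P, eps):
    """One component F_i of the local feature vector (Equation 7).

    img: 2D gray-value array; (n0, n1): interest point (x, y);
    r1, r2: circle radii (already adapted to the interest point's scale);
    theta_shift: phase shift between corresponding points on the two circles;
    P: number of angular samples; eps: ramp width of Equation 5.
    """
    def bilinear(x, y):
        # bilinear interpolation for sub-pixel sample positions
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        dx, dy = x - x0, y - y0
        return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
                + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

    acc = 0.0
    for p in range(P):
        phi = 2.0 * np.pi * p / P
        v1 = bilinear(n0 + r1 * np.cos(phi), n1 + r1 * np.sin(phi))
        v2 = bilinear(n0 + r2 * np.cos(phi + theta_shift), n1 + r2 * np.sin(phi + theta_shift))
        gamma = v1 - v2
        if gamma < -eps:                    # ramp function rel of Equation 5
            acc += 1.0
        elif gamma <= eps:
            acc += (eps - gamma) / (2.0 * eps)
    return acc / P
```

Repeating this for V kernels (different r1, r2 and theta_shift) yields the V-dimensional descriptor F of each interest point. On a constant image every gray-value difference is zero, so each ramp term contributes rel(0) = 0.5 and F_i = 0.5.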
Figure 2. Eight different image samples that belong to the same reference location.
6. Experimental Results

In this section we present the experimental results of our local integral invariants compared with the global integral invariants reviewed in Section 4.

6.1. Setting up the Database

To simulate robot localization we use a set of 264 gray-scale images taken at 33 different reference locations, each with a resolution of 320 × 240. At each reference location we apply the following scenario, capturing an image after each step: (1) The robot stops. (2) It translates 50 cm to the right. (3) The pan-tilt unit rotates 20 degrees to the left. (4) The robot moves 50 cm ahead. (5) The pan-tilt unit rotates 40 degrees to the right. (6) The pan-tilt unit rotates 20 degrees to the left. (7) The pan-tilt unit moves five degrees up. (8) The pan-tilt unit moves 10 degrees down. Figure 2 includes eight sample images of one reference location. The database is divided into two equal parts: 132 images are selected for training and 132 for testing. This partitioning is repeated 50 times with different combinations of training and testing images, and the average localization rate is computed for each combination. We assume that optimal localization results are obtained when each input image from one of the reference locations matches another image that belongs to the same reference location.

6.2. Global Integral Invariants

In order to compare our work with that of [17], which involves calculating global features, a set of 40 × 10³ random samples is used for the Monte-Carlo approximation, as suggested in [13] for best performance. For each sample we apply a set of three different kernels. Both monomial and relational kernel functions are investigated for best localization accuracy using the above images. For each image a single
D-dimensional histogram is built, with D = 3. Each dimension contains 8 bins, which has experimentally led to the best results. The histograms are compared using the l2-norm.

6.3. Local Integral Invariants

We use the following parameters when implementing our descriptive features: for each interest point we set V = 12, which gives a 12-dimensional feature vector generated using a set of either relational or monomial kernel functions. Best results were obtained with ε = 0.098 in Equation 5. When evaluating the localization approach we first compare each feature vector from the query image with all feature vectors extracted from the training set of images, using the l2-norm. Correspondences between the feature vectors are found based on the method described in [1], which leads to robust matching. Then we apply a voting mechanism to find the image corresponding to the query: the vote is won by the image with the maximum number of matches. Figure 3 gives an example of the correspondences found between two different images using the proposed approach.
Figure 3. Matching two different images using the proposed approach.
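The matching-and-voting scheme of Section 6.3 can be sketched as follows; a brute-force nearest-neighbour rule stands in for the uniqueness-based correspondence test of [1], and all names are ours:

```python
import numpy as np

def localize(query_features, db):
    """Nearest-neighbour matching plus voting over a training database.

    query_features: list of feature vectors from the query image.
    db: dict mapping training-image id -> list of feature vectors.
    Each query vector votes for the image owning its closest database
    vector (l2-norm); the image with most votes wins.
    """
    votes = {}
    for f in query_features:
        best_img, best_d = None, float('inf')
        for img_id, feats in db.items():
            for g in feats:
                d = float(np.linalg.norm(np.asarray(f) - np.asarray(g)))
                if d < best_d:
                    best_img, best_d = img_id, d
        votes[best_img] = votes.get(best_img, 0) + 1
    return max(votes, key=votes.get)
```

In practice the inner loops would be replaced by a k-d tree or another index; the brute-force form is kept here only to make the voting logic explicit.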
6.4. Results

Figure 4 demonstrates the localization rate of the proposed approach and the existing one. It can be seen that the local integral invariants-based approach performs better than the global integral invariants-based approach with either of the two kernel types, and gives its best results with the relational kernel function. The global integral invariants-based approach performs better when using the monomial kernel function. One way to test whether the results of the proposed approach are statistically significant is to calculate the confidence interval for each experiment, and to check that the confidence interval of the mean values of the proposed approach does not overlap the confidence interval of the mean values of the
Figure 4. The localization rate of the local integral invariants-based approach using relational kernel functions (LR) and monomial kernel functions (LM), compared with the existing global integral invariants-based approach using relational kernel functions (GR) and monomial kernel functions (GM).
existing approach. The 95% confidence interval is used here, which means that the real mean value of each experiment lies in this interval with a probability of 95%. On top of each bar in Figure 4, the following information is represented, from outside to inside: the maximum and minimum values of the data, the standard deviation, and the 95% confidence interval. The confidence interval of the (LR) bar does not overlap any confidence interval of the other bars; this means that the results of the local integral invariants-based approach with relational kernels are significantly better than the other results. The average localization times of the global approach and the local approach are 0.42 and 0.86 seconds, respectively, on a 3 GHz Pentium 4. The additional computation time of the local approach is due to the additional complexity of the feature extraction stages and the overhead of comparing the local features.
7. Conclusion

In this paper we have proposed an appearance-based robot localization approach based on local integral invariants. The local features have a compact size but are capable of matching images with high accuracy. Compared with global features, the local features show better localization results with a moderate computational overhead. Using local integral invariants with relational kernel functions leads to significantly higher localization rates than using monomial kernel functions.
Acknowledgment
The first and second authors gratefully acknowledge the financial support of the German Academic Exchange Service (DAAD) for their PhD scholarships at the Universities of Tübingen and Freiburg in Germany. The authors would also like to thank Christian Weiss for his assistance with the experimental part of this work.
References
[1] P. Biber and W. Straßer. Solving the Correspondence Problem by Finding Unique Features. In 16th International Conference on Vision Interface, 2003.
[2] M. Brown and D. Lowe. Invariant Features from Interest Point Groups. In British Machine Vision Conference (BMVC), pages 656–665, Cardiff, Wales, September 2002.
[3] H. Bischof, A. Leonardis, and D. Skocaj. A Robust PCA Algorithm for Building Representations from Panoramic Images. In Proceedings of the 7th European Conference on Computer Vision (ECCV), pages 761–775, Copenhagen, Denmark, 2002.
[4] A. Halawani and H. Burkhardt. Image Retrieval by Local Evaluation of Nonlinear Kernel Functions around Salient Points. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), volume 2, pages 955–960, Cambridge, United Kingdom, August 2004.
[5] Y. Ke and R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In CVPR (2), pages 506–513, 2004.
[6] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[7] K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, 2005.
[8] O. Ronneberger, H. Burkhardt, and E. Schultz. General-Purpose Object Recognition in 3D Volume Data Sets Using Gray-Scale Invariants – Classification of Airborne Pollen-Grains Recorded with a Confocal Laser Scanning Microscope. In Proceedings of the International Conference on Pattern Recognition (ICPR), volume 2, Quebec, Canada, 2002.
[9] M. Schael. Invariant Grey Scale Features for Texture Analysis Based on Group Averaging with Relational Kernel Functions. Technical Report 1/01, Albert-Ludwigs-Universität Freiburg, Institut für Informatik, January 2001.
[10] M. Schael. Texture Defect Detection Using Invariant Textural Features. Lecture Notes in Computer Science, 2191:17–24, 2001.
[11] S. Se, D. Lowe, and J. Little. Vision-Based Mobile Robot Localization and Mapping Using Scale-Invariant Features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 2051–2058, Seoul, Korea, May 2001.
[12] S. Siggelkow. Feature Histograms for Content-Based Image Retrieval. PhD thesis, Albert-Ludwigs-Universität Freiburg, Fakultät für Angewandte Wissenschaften, Germany, December 2002.
[13] S. Siggelkow and M. Schael. Fast Estimation of Invariant Features. In W. Förstner, J. M. Buhmann, A. Faber, and P. Faber, editors, Mustererkennung, DAGM, pages 181–188, Bonn, Germany, September 1999.
[14] H. Tamimi, H. Andreasson, A. Treptow, T. Duckett, and A. Zell. Localization of Mobile Robots with Omnidirectional Vision Using Particle Filter and Iterative SIFT. In Proceedings of the 2005 European Conference on Mobile Robots (ECMR05), Ancona, Italy, 2005.
[15] H. Tamimi and A. Zell. Vision-Based Global Localization of a Mobile Robot Using Wavelet Features. In Autonome Mobile Systeme (AMS), 18. Fachgespräch, pages 32–41, Karlsruhe, Germany, December 2003.
[16] H. Tamimi and A. Zell. Vision-Based Localization of Mobile Robots Using Kernel Approaches. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), pages 1896–1901, 2004.
[17] J. Wolf, W. Burgard, and H. Burkhardt. Robust Vision-Based Localization by Combining an Image Retrieval System with Monte Carlo Localization. IEEE Transactions on Robotics, 21(2):208–216, 2005.
[18] C. Zhou, Y. Wei, and T. Tan. Mobile Robot Self-Localization Based on Global Visual Appearance Features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 1271–1276, 2003.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
189
Enhancing Self Covertness in a Hostile Environment from Expected Observers at Unknown Locations
Mohamed MARZOUQI and Ray JARVIS
Intelligent Robotics Research Centre, Monash University, Australia
{mohamed.marzouqi, ray.jarvis}@eng.monash.edu.au
Abstract. In this paper, a new approach is introduced to the field of covert robotics. A mobile robot should minimize the risk of being exposed to observers that are expected to be operating within the same environment but at unknown locations. The robot enhances its covertness by hiding: a proper hiding point is chosen while taking into account the risk of the path to it. The approach rests on the assumption that the probability of an object being observed is proportional to its exposure to the free-spaces of the environment, where a possible observer might be standing. Given that, a visibility risk map is generated and used to find the hiding point and a covert path to it. The approach has been tested on simulated known environments, and a number of test cases are presented. Keywords. Mobile robot, covert path planning, hiding.
Introduction
This work is part of our research on covert robotics [1,2,3,4], where the aim is to study the strategies, behaviors and techniques that can be used to create mobile robots with the ability to accomplish different types of navigation tasks covertly (i.e. without being observed by hostile observers in the operating environment). Covert robotic systems can be useful in various applications, especially where safety and security are required (e.g. police and military fields). To date, research on covert navigation has shown a noticeable weakness. Our previous work in this field has concentrated on planning a covert path between two nominated points (see [1,3,4]). A robot would require one or more different covert navigation abilities to accomplish a real-life covert mission successfully. Examples of such abilities are: covert navigation to a specified destination, tracking or approaching a target in a stealthy manner, and searching or exploring a hostile environment covertly. In this paper, the ability to hide from observers whose locations within the environment are unknown is considered. Hiding is an important tool that a robot may use during its mission to enhance its covertness. For example, a robot carrying out a task in a hostile environment may need to hide if it detects a possible risk of being discovered. In previous work, a number of strategies have been used for robot hiding. One example is to plan a path to a dark spot in the environment to hide from expected observers. It has been used in a number of research problems for the purpose of monitoring an area covertly [5,6]. In such research, the robot is equipped with a light
M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment
intensity sensor to search for the darkest area where the robot can stand to be hidden, or at least to be less noticeable. The success of this strategy depends heavily on the environment's lighting conditions. Another strategy to minimize self-exposure during navigation is to minimize the size of the robot. Insect robotics [7,8] is an obvious example that can be used for such a purpose. Depending on the robot's size for covertness is realistic, since an insect robot is either difficult to notice, or difficult to recognize as a robot if it is observed. On the other hand, a robot's capability decreases along with its size. Insect robots are mainly used for spying but not for heavy-duty jobs (e.g. carrying an object) which might be required in a covert mission. Other examples are pursuit-evasion and hide-and-seek games, which are known problems in the autonomous robots field. The aim is to guide one robot or a team of robots to search for or track another robot or group of robots [9,10]. The work done in this area has focused only on developing the pursuer/seeker side, while the evader/hider is controlled manually or follows a random behavior to escape the opponent's observation. In [2], we have presented hiding as a path planning strategy. The aim is to allow a mobile robot in a known stationary environment to plan a path with low exposure to observers at initially known locations. This is achieved by choosing a hiding point that has the maximum guaranteed time before possible observation by any observer (i.e. the point farthest in distance from the observers' view). In this paper, the work is extended to the case of unknown observer locations. The strategy used for this case is based on the assumption that the probability that an observer (who is moving randomly) will detect the robot is proportional to the exposure of the robot's position to the free-spaces of the environment. Based on this assumption, a hiding point is chosen while considering the risk of the path to it.
The organization of this paper is as follows. The definition and assumptions of the problem are presented in section 1. The hiding strategy is explained in section 2. The hiding algorithm is presented in section 3. Simulation experiments have been conducted to evaluate the approach; in section 4, results for different cases are illustrated. The conclusion and our directions for future research are in section 5.
1. Problem Definition
The robot's objective is to enhance its covertness in the environment by moving (if necessary) to a location that offers higher covertness than its current location. The robot's covertness is violated by being exposed to at least one observer. The following assumptions have been made. The operating environment is a 2-D space represented as a uniform grid-based map, each of whose cells is either a free-space or occupied by an obstacle that no agent (i.e. the robot or an observer) can penetrate. Obstacles within the environment are the only means of maintaining covertness. They are assumed opaque and higher than the sight level of all agents. The environment is assumed to be known and stationary (neither obstacles nor free-spaces change). It can be indoor or outdoor, but with specified real or virtual borders to limit the search space for a path. The robot can expect observers to be within the environment but has no knowledge about their locations or behaviour (e.g. if an observer is a sentry, the robot does not know its strategy for searching for intruders). The expected observers could be moving around with no fixed positions.
2. The Hiding Strategy
A proper hiding point is a location that is less likely to be observed (or inspected) by the observers. Choosing a hiding point would normally be based on environmental knowledge or information given about the observers (e.g. a room that is least visited by the observers is a good hiding place). Our hiding strategy depends only on the environment structure, as the only knowledge available. Based on human covert behavior in hiding, a good hiding place is one with minimum exposure to the free-spaces of the environment. This is an important factor, since the probability of a position being exposed to an observer at an unknown location is largely proportional to its exposure to the free-spaces. A position that is exposed to the whole environment (e.g. the center of a star-shaped environment) is guaranteed to be exposed to any observer. Still, factors other than exposure can affect the choice of a hiding point (e.g. a hallway is riskier as a hiding place than an isolated store room). The rest of this paper concentrates on exposure alone as a crucial hiding factor, keeping other factors for future research. The first step in the approach is to build a visibility map (introduced previously in [1]). In the visibility map, the exposure value of each free-space cell is calculated. The exposure value represents the number of free-space cells visible from the corresponding cell (each cell's value is in the range from 0 to the total number of free-space cells). Fig. 1 shows an example of a calculated visibility map where the exposure values are represented in grayscale. A cell gets darker as it becomes more hidden, and vice versa. The basic way to calculate the exact exposure value at each cell is to use a ray shooting method: at any cell, the free-space cells are counted in each direction from that cell, and the exposure value is the sum of the counted cells over all directions.
This process is time consuming for high-resolution maps, as its time complexity is O(n^2), where n is the number of free-space cells. A faster method is described in detail in [1]. The time consumed in generating the visibility map in the case of known environments is acceptable, as all calculations are performed only once, offline.
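The basic (non-optimized) visibility map computation can be sketched as follows. This is an illustrative implementation, not the authors' faster method from [1]: it brute-forces line-of-sight tests between every pair of free cells using a Bresenham-style discrete ray, which matches the quoted O(n^2) complexity in the number of free-space cells.

```python
def line_of_sight(grid, a, b):
    """True if no obstacle cell (value 1) lies on the discrete line from a to b."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx, sy = (1 if x1 > x0 else -1), (1 if y1 > y0 else -1)
    err = dx - dy
    x, y = x0, y0
    while (x, y) != (x1, y1):
        if grid[y][x] == 1 and (x, y) != a:  # blocked by an intermediate obstacle
            return False
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy
    return grid[y1][x1] == 0

def visibility_map(grid):
    """Exposure value of each free cell = number of free cells visible from it."""
    free = [(x, y) for y, row in enumerate(grid) for x, v in enumerate(row) if v == 0]
    vis = {c: 0 for c in free}
    for i, a in enumerate(free):
        for b in free[i + 1:]:       # visibility is symmetric; count each pair once
            if line_of_sight(grid, a, b):
                vis[a] += 1
                vis[b] += 1
    return vis
```

On a 3x3 grid with a single central obstacle, for example, a corner cell sees the six free cells not occluded by the obstacle, while the opposite corner is hidden from it.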
Figure 1. A visibility map of an environment. Obstacles are in white. The brightness level of any location in the free-space represents its exposure to the free-spaces.
A variation of the described case occurs if the robot expects the observers to be in some places more than others (e.g. a hallway). Such knowledge can be incorporated into the visibility map by making the cells exposed to those areas have a higher calculated visibility risk value. To do so, a map is created in which each cell holds a probability value representing the robot's expectation that an observer will visit that cell. The visibility cost at any cell is then the sum of the probability values at the cells exposed to it. The rest of this paper concentrates on a uniform probability of the observers' locations. Choosing the hiding point is only part of the solution. A covert path to the hiding point is the second part, which may also affect the choice of the hiding point (e.g. a point with maximum covertness may not be considered if the path to it is risky). The next section presents our algorithm, which is based on the described strategy.
3. The Hiding Algorithm
Let t denote the hiding period that starts from the time the robot decides to hide. t is initially known (e.g. the robot can estimate that the detected risk will be cleared after a period of time). A hiding plan is evaluated by measuring the risk of both the hiding spot and the path to it. H(c) in (1) is the hiding cost that represents the total risk of choosing a cell c as a hiding point and navigating to it through a path p. m is the time the robot needs to follow the path from its current location to c. V(c) is the visibility cost at cell c (which is the exposure value calculated in the visibility map). p_i is the robot's position along p at moment i. A cell is ignored if the time to approach it is longer than the hiding time (i.e. m > t). To describe (1) in words, the total cost of any hiding plan is the sum of the visibility costs encountered at each time frame of the hiding period. This is logical, since the risk of being exposed is proportional to the time the robot spends in the environment and the risk of its position at any moment during that time. The chosen hiding point is the one with the lowest H(c). Note that the robot's current position is also considered as a hiding point option, with m = 0.

H(c) = \sum_{i=0}^{m} V(p_i) + V(c) \cdot (t - m),  for t > m.   (1)
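As an illustration, equation (1) can be evaluated as in the following sketch. This is not the authors' implementation, and the visibility values used in the example are hypothetical.

```python
def hiding_cost(path_visibility, v_c, t):
    """H(c) = sum of V(p_i) for i = 0..m, plus V(c) * (t - m), valid for t > m.
    path_visibility[i] is the exposure of the robot's cell at time step i;
    m = len(path_visibility) - 1 is the time needed to reach c."""
    m = len(path_visibility) - 1
    if m >= t:
        return None  # cell unreachable within the hiding period: ignored
    return sum(path_visibility) + v_c * (t - m)

# The robot's current position is a candidate too, with m = 0:
stay_cost = hiding_cost([40], v_c=40, t=10)              # 40 + 40*10 = 440
move_cost = hiding_cost([40, 30, 10, 5], v_c=5, t=10)    # 85 + 5*7  = 120
```

In this example, paying the path cost to reach a well-hidden cell (cost 120) beats staying at the exposed current position (cost 440), mirroring the trade-off discussed in the text.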
Equation (1) forms a balance that considers the hiding time as well as the chosen point and path. For example, if the robot needs to hide for a very short period of time, then choosing a hiding point with a low exposure value but navigating to it through a highly exposed path would be irrational. On the other hand, if the robot needs to hide for a very long time, then the hiding point with the lowest visibility cost may be chosen regardless of the risk of the path to it. The chosen path to the hiding point can be the shortest path. A better approach is to follow a covert path that minimizes the visibility cost of approaching the hiding point, and hence produces a better hiding plan. For example, a good hiding point might be ignored if the short path to it is highly exposed; on the other hand, it might be considered if there is a covert path to it with low exposure. Finding a covert path depends on the already calculated visibility map, as described next. The Dark Path (DP) algorithm is used here to find the covert path to each cell. We introduced this algorithm in previous work [1,3,4] as an optimal covert
path planner between two nominated points. The DP algorithm is an extension of the well-known Distance Transform (DT) algorithm [11] (a short collision-free path planning algorithm; a brief description is in [1]). An important advantage offered by the DT algorithm is its ability to accept cost factors other than distance. In the DP algorithm, the visibility cost of the path is considered as an additional cost. Different applications would require different visibility cost factors to satisfy the path's requirements; in our case the visibility cost at any cell is its exposure. A generated path is a balance between shortness and covertness. As in the DT algorithm, a cost map is created in which each cell holds a value representing the minimum cost to the destination. The process of finding the DP cost map is as follows. Firstly, the robot's cell location is given a value of 0 in the created cost map, which is where the cost propagation starts from; all other cells get a high value. Secondly, a cost propagation process assigns each cell a value equal to the least value amongst its eight neighbors' costs, plus the time τ to move from that neighbor and the visibility cost of the cell itself. The DP cost at each cell is given by (2), where α is used to balance between path shortness and covertness; it is set to 1 here. Once the cost map is generated, each cell holds a value that represents the minimum visibility/time cost to approach it from the robot's current location. A covert path to any cell c can be generated by tracing the steepest descent trajectory from c to the robot.

Cost(c) = \min_{1 \le i \le 8} [Cost(n_i) + \tau_i] + \alpha \cdot V(c).   (2)
The first part of equation (1) (i.e. the path visibility risk cost) can be calculated efficiently using the created DP cost map, without the need to follow the path to each point and sum its visibility costs. This is done by creating a time map in which the value at each cell is m (i.e. the time to reach that cell through the covert path). This map is built during the process of generating the DP cost map. Initially, the robot location is given an m value of 0 in the time map. Then, while generating the DP cost map using (2), the m value of c in the time map equals the m value of the neighbor from which it gets its visibility/time cost, plus τ. Once the time map is generated, it is subtracted from the DP cost map, producing a map in which each free-space cell holds the total visibility cost of the covert path to that cell (scaled by α).
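The cost propagation of equation (2), together with the time map, can be sketched as a Dijkstra-style search over the grid. This is an illustrative sketch under simplifying assumptions (not the authors' code): τ is taken as 1 for every move, including diagonals, and `vis` is the visibility map computed earlier.

```python
import heapq

def dp_cost_map(grid, vis, start, alpha=1.0, tau=1.0):
    """Return (cost, time) dictionaries: the minimum visibility/time cost to
    reach each free cell from `start` per equation (2), and the travel time m
    along that covert path (the time map)."""
    cost = {start: 0.0}
    time = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        c, (x, y) = heapq.heappop(heap)
        if c > cost[(x, y)]:
            continue  # stale queue entry
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == dy == 0:
                    continue
                nx, ny = x + dx, y + dy
                if not (0 <= ny < len(grid) and 0 <= nx < len(grid[0])):
                    continue
                if grid[ny][nx] == 1:  # obstacle
                    continue
                n = (nx, ny)
                new_cost = c + tau + alpha * vis.get(n, 0)
                if new_cost < cost.get(n, float("inf")):
                    cost[n] = new_cost
                    time[n] = time[(x, y)] + tau
                    heapq.heappush(heap, (new_cost, n))
    return cost, time
```

As in the text, subtracting the time map from the cost map leaves, at each cell, the total visibility cost of the covert path to it, scaled by α.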
4. Experimental Results
4.1. Observers are Expected after a Period of Time
This is a simple scenario, demonstrated here to concentrate on the choice of the hiding spot rather than the path to it. The robot knows that the observers are currently outside the environment but are expected to enter later. It has enough time to approach any hiding point. The path to the chosen point will be safe, hence the shortest path should be followed. The described algorithm can easily be applied here by setting α in (2) to 0. This makes the DP algorithm generate the shortest path and makes the calculated visibility cost for any path equal to 0. t is set to a value higher than the
maximum m in the time map. Fig. 2 shows an example of a simulated experiment. The robot at R (which is represented as a point for simplicity) has followed a short path (the continuous line) to the point H1, which has the minimum visibility cost in the map. The visibility map is shown on top of the environment map for evaluation purposes. If the traveled distance is a concern, in order to save motion power or to find a hiding point closer to where the robot is operating, the robot can choose the nearest cell that has a visibility cost within a specified threshold of the minimum visibility cost. In Fig. 2, the robot has H2 as another hiding spot option that is closer to it than H1. The path to H2 is the dashed line.
4.2. Observers are inside the Environment
In this case, the robot expects the observers to be within the environment, but at unknown locations. Fig. 3 shows an example. Given that the hiding time t is set to a very high value, the robot at R has chosen H2, as it has the minimum exposure value. The robot has followed a covert path to that point. When setting t to a lower value (three times lower), the robot has chosen H1, since the path to H2 is risky compared to the remaining hiding time the robot would spend at H2. The difference between the calculated hiding costs at H1 and H2 for the two time settings is shown in the first two rows of Table 1. Fig. 4 represents a combination of the cases in Figs. 2 and 3. The robot expects the observers to be in the environment after a period of time (which we will call here the safe time). The safe time in this case is not long enough to allow the robot to reach every hiding point in the environment safely through the shortest path. To accommodate this, α is set to 0 only if the time to reach a cell is less than the safe time (using the time map); otherwise it is set to 1. The shaded area around the robot in Fig. 4 is the region within which the robot can safely reach any point through the shortest path.
Although t has been set here to a low value, H2 has been selected, since the highly exposed part of its path is within the safe area. The hiding cost at H2 accordingly becomes less than that at H1, and hence H2 has been chosen as the hiding point (see Table 1).
Figure 2. The observers are expected to be in the environment after a long period of time. The robot at R has followed a short path (continuous line) to the best hiding point at H1. H2 is another hiding option that is closer than H1 but with a slightly higher visibility cost (environment size is 200x120).
Figure 3. The observers are inside the environment. When t is set to a high value, H2 is chosen as the best hiding point, ignoring its path cost. When t is set to a low value, H1 is preferred. In both cases the robot has followed a covert path to each point.
Figure 4. The shaded area is the region within which the robot has the time to reach any point safely through the shortest path. Observers are expected to be in the environment after the safe time. Even though t has been set here to a low value, the robot chooses H2, since the highly exposed part of its path is within the safe region.
Table 1. A comparison of the calculated H() at the points H1 and H2 (values ×1000). The chosen hiding point for each case is marked with an asterisk.

Fig. | Hiding Time | Safe Time | H(H1) | H(H2)
3    | 300         | 0         | 1,098* | 1,338
3    | 900         | 0         | 2,562  | 1,656*
4    | 300         | 50        | 810    | 750*
5. Conclusion
A hiding strategy has been presented for a mobile robot that needs to minimize the risk of being exposed to expected observers at unknown locations in the environment. The approach so far depends only on the exposure-to-free-spaces risk factor. Other factors should be studied that can be combined with the exposure factor to produce more effective results. Moreover, hiding in unknown environments would be another interesting extension that may build on the same presented strategy.
Acknowledgment The authors would like to thank Etisalat University College and the Emirates Telecommunication Corporation (Etisalat), United Arab Emirates, as well as the Intelligent Robotics Research Centre (IRRC), Monash University, for providing the financial support for this work.
References
[1] Mohamed Marzouqi and Ray A. Jarvis, "Covert path planning for autonomous robot navigation in known environments," Proc. Australasian Conference on Robotics and Automation (ACRA), Brisbane, December 2003.
[2] Mohamed Marzouqi and Ray A. Jarvis, "Covert robotics: Hiding in known environments," Proc. IEEE Conference on Robotics, Automation and Mechatronics (RAM), Singapore, December 2004.
[3] Mohamed Marzouqi and Ray A. Jarvis, "Covert robotics: covert path planning in unknown environments," Proc. Australasian Conference on Robotics and Automation (ACRA), Canberra, December 2004.
[4] Mohamed Marzouqi and Ray A. Jarvis, "Covert path planning in unknown environment with known or suspected sentry location," Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Canada, August 2005.
[5] B. Kratochvil, "Implementation of scout behaviors using analog sensing methods," Technical report no. 03-044, University of Minnesota, 2003.
[6] P. E. Rybski et al., "A team of robotic agents for surveillance," Proceedings of the Fourth International Conference on Autonomous Agents, ACM, pp. 9-16, 2000.
[7] J. M. McMichael and M. S. Francis, "Micro air vehicles toward a new dimension in flight," DARPA, USA, 1997.
[8] R. T. Vaughan, G. S. Sukhatme, F. J. Mesa-Martinez, and J. F. Montgomery, "Fly spy: Lightweight localization and target tracking for cooperating air and ground robots," Distributed Autonomous Robotic Systems, vol. 4, pp. 315-324, 2000.
[9] S. M. LaValle, H. H. González-Baños, C. Becker, and J.-C. Latombe, "Motion strategies for maintaining visibility of a moving target," Proc. IEEE Int'l Conf. on Robotics and Automation, pp. 731-736, 1997.
[10] K. Mann, "Evolving robot behavior to play hide and seek," The Journal of Computing in Small Colleges, vol. 18, pp. 257-258, May 2003.
[11] R. A. Jarvis, "Collision-free path trajectory using distance transforms," Proc. National Conference and Exhibition on Robotics, Melbourne, August 1984. Also in Mechanical Engineering Transactions, Journal of the Institution of Engineers, Australia, 1985.
Part 4 Multi-Agent Robots
An experimental study of distributed robot coordination
Stefano Carpin (International University Bremen, Germany) and Enrico Pagello (University of Padova, Italy)
Abstract. Coordinating the motion of multiple robots along assigned paths is a difficult (NP-hard) problem with great potential for applications. We here provide a detailed study of a randomized priority-scheduling algorithm we have developed, also comparing it with an exact approach. It turns out that for problems of reasonable size our approach does not perform significantly worse than the optimal one, while being much faster. Keywords. Multi-robot systems, intelligent mobility, randomized algorithms
1. Introduction
Planning the motion of multiple robot systems has been investigated since the early days of mobile robotics. While the problem is interesting in itself because of the inherent computational complexity it exhibits, it has to be admitted that few applications have been presented so far in the context of autonomous mobile robots. This is partly because systems with a remarkable number of robots have not been deployed yet. Moreover, when numerous robots are to operate in a shared unstructured environment, one of the holy grails of multi-robot research, their motion is commonly governed by reactive navigation modules rather than by precisely planned paths. That said, this does not imply that the problem has lost interest from the applicative point of view. The demand for systems capable of governing the motion of multiple objects in shared environments is instead ever increasing. Applications include, for example, luggage handling systems at airports, storage systems in factories, moving containers in harbors, and more. The theme of intelligent mobility is envisioned to play a major role in the foreseeable future. One of the major differences that will be seen may be a decrease in individual robots' degree of autonomy. Sticking to the example of storage systems in factories, mobile carts are not free to move wherever they want, but are rather constrained to proceed along predefined paths, usually hard-wired in hardware. Given such a set of predefined paths, coordinating the motion of vehicles along these paths will be requested more and more often. And, of course, solutions will be sought that optimize certain performance indices, such as consumed energy or the time needed to complete the task. From a computational point of view these problems have been studied for quite some time, and their inherent computational complexity was soon outlined.
It is therefore necessary to look for approximate and heuristic-based
S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination
solutions if one is required to deal with instances involving a multitude of moving objects. In this paper we offer an experimental assessment of a simple randomized approach, which we have proposed in the past, to solve the coordination problem. Section 2 discusses related literature, while the problem is formalized in section 3. The solving algorithm is illustrated in section 4, and the experimental framework and results are shown in section 5. Conclusions are finally provided in section 6.
2. Related work
The problem of multi-robot motion planning has been studied constantly in the past. The first major distinction concerns centralized versus decentralized approaches. When a centralized motion planner is used, one process has to plan the motion of the whole set of robots. The obvious drawback is the high dimensionality of the configuration space to be searched. In a decentralized approach, every robot plans its own motions and then has to undergo a stage of negotiation to solve possible collisions with other robots. Decentralized approaches are inherently incomplete, but much faster. Sánchez and Latombe [16] recently speculated that decentralized approaches are likely to show their incompleteness often when used in industrial production plants. In the case of mobile robot systems, however, the environment is likely to be less cluttered and hence these problems are less likely to occur. Efficient methods to solve the single-robot motion planning problem are available [4][9][13] and will not be discussed further here (extensive treatments of the motion planning problem are available, e.g. [6] and [10]). A common approach to solving the multi-robot motion planning problem consists in assigning priorities to robots and planning their motion according to the priorities [5]. When planning the motion of a robot with priority pj, it is necessary to take into consideration the already planned motions of robots with priority pi, where pi < pj.
Finding a good priority schema is a hard problem in itself [2]. A related problem, which is the one we will address in this paper, consists in coordinating the motion of a set of robots along given specified paths. As the paths may intersect with each other, it might be necessary to stop certain robots when approaching an intersection point in order to give way to other robots and avoid collisions. In this context one is generally interested in minimizing certain parameters, like, for example, the time needed by all robots to reach their final destinations. This rules out certain trivial coordination schemas, like the one where just one robot moves and all the others remain stationary, since the overall required time would be too high. LaValle solved the problem using a game-theoretic approach based on multi-objective optimization [11][12]. The approach allows tuning the algorithm behavior between centralized planning and completely decentralized planning. The author reports that optimal results can be found, but this requires a significant amount of time. Siméon et al. [17] solve the path coordination problem using a resolution-complete algorithm. They show how it is possible to split a given path into segments where the robot will not collide with any other robot, and segments where its path intersects other paths. The authors report results involving up to 150 robots, but where no more than 10 robots intersect each other's paths. Computation time is in the order of minutes. Akella and Hutchinson [1] solve the problem of robot coordination along predefined routes by simply varying the start time of each robot. Once a robot starts to move, it never stops until it reaches its target position. Peng and Akella recently extended these ideas to the case of robots with kinodynamic constraints [14].
S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination
Figure 1. The simplest case of robot coordination, involving two robots only. R1's path can be divided into three segments: free, occupied, and free respectively. R2's path can be divided into two segments. If R2 is given a priority higher than R1, and both robots travel at the same speed, R1 will not be able to reach its final destination, because of R2 sitting on its path.
3. Problem formulation

The problem we aim to study is the following: given n robots, assume that n paths, one for each robot, are given. We assume that each path has been subdivided into free and occupied segments. This is feasible in light of the results presented in [14] and [17]. Hence, a path p_i can be seen as a sequence p_i^1 ... p_i^s(i), where each p_i^j is either a free or an occupied segment, and s(i) is the number of segments composing path p_i. The task is to find a coordination schema, i.e. a mapping:

C : [0, T] → {1 ... s(1)} × {1 ... s(2)} × ... × {1 ... s(n)}.    (1)

In the beginning robot i is positioned at segment p_i^1, and in the end it has to reach p_i^s(i). While moving through the different segments, a certain amount of time will be spent to traverse each segment. Let t(p_i^j) be the time spent by robot i to traverse segment p_i^j (1 ≤ j ≤ s(i)). Throughout the paper we assume that robots only move forward along their paths, though they can stop at certain points to give way to other robots; they never backtrack along their routes. The goal is to find a coordination schema that minimizes the time needed by all robots to complete their motion. Formally, we aim to minimize the quantity

z = max_{1≤i≤n} Σ_{j=1}^{s(i)} t(p_i^j).    (2)
It can be shown that this problem is equivalent to the Job Shop Scheduling (JSS) problem, which is known to be NP-hard [7]. The JSS problem asks how to schedule n jobs that have to be processed on m machines in such a way that the overall required time is minimized. The constraints are that no machine can process more than one job at the same time, and that each job has to be processed by the machines in a given order. In the path coordination problem, each robot is a job, and each free or occupied segment is a machine. Therefore, under the assumption that P ≠ NP, the search for a coordination schema that minimizes the time needed to complete the motion task is doomed to take exponential time. This motivates the great number of approximate and heuristic approaches that have been tried.
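To make the reduction concrete, here is a small sketch (our own illustrative encoding, not taken from the paper) of how a coordination instance maps onto JSS terms: each robot becomes a job, and each distinct path segment becomes a machine that the job must visit in path order.

```python
def to_jss(paths):
    """Encode a path-coordination instance as a JSS instance.

    Each robot is a job; each (free or occupied) segment is a machine that
    the corresponding job must visit in path order. Shared segments map to
    shared machines, which is what creates the scheduling conflicts.
    """
    machines = {}                    # segment -> machine index
    jobs = []
    for path in paths:               # one job per robot
        ops = []
        for seg in path:             # ordered operations on machines
            m = machines.setdefault(seg, len(machines))
            ops.append(m)
        jobs.append(ops)
    return jobs, len(machines)

# two robots whose paths share the segment "x" -> two jobs sharing one machine
jobs, m = to_jss([["s1", "x", "s2"], ["s3", "x", "s4"]])
```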
4. Random rearrangements

In our former work [3] we proposed a simple distributed schema to solve the multi-robot motion planning problem. Similar ideas were later used in [8]. The algorithm is slightly modified here to describe how the various robots can operate to find a valid coordination schema, and is depicted in Algorithm 1. The algorithm assumes that a data structure SpaceTime is available, which records which parts of the space are occupied or free at a given time. The algorithm picks a random priority schema (line 1) and then schedules the robot motions according to the selected priorities. The first considered robot is scheduled to move straight along its path, and SpaceTime is updated accordingly. When scheduling successive robot motions, it is necessary to check whether the robot's next path segment is free or already occupied (line 6). If it is free, the robot moves to its next segment (line 7); otherwise a delay is inserted (line 10). In both cases SpaceTime is updated to record the robots' positions (line 11) while time evolves (lines 8 and 10).

Algorithm 1: Coordination of n robots
 1: pick a random permutation of the set {1 ... n}
 2: for i ← 1 to n do
 3:   Time ← 0
 4:   j ← 1
 5:   while j < s(i) do
 6:     if SpaceTime(Time, p_i^j) is free then
 7:       advance robot i to its segment p_i^j
 8:       Time ← Time + t(p_i^j)
 9:     else
10:       Time ← Delay
11:     update SpaceTime

We here stick to the hypothesis formulated in our former work, i.e. that each robot will apply this procedure to compute a coordination schema, and that in the end the schema leading to the best value of the variable z defined above will be used to execute the real motion. So, when n robots are involved, n independently chosen random priority schemas are tried. The coordination algorithm does not compute any path and does not alter any of the paths it starts with. Hence there are cases in which certain priority schemas do not lead to any solution.
For example, if robot i's goal position lies on the path p_j given to robot j and i has a higher priority than j, it may reach its goal position before j passes through the same point. Robot j will then be unable to complete its motion. This could be avoided if robots were allowed to move backwards along their paths, but that is not considered here. From a practical point of view, it should be mentioned that even a naive implementation of the above algorithm can solve a coordination task in a fraction of a second.¹

¹ The reader should not be misled by this statement. The algorithm just computes a random permutation and tries to schedule robots one after the other according to this schema. No paths are computed and no optimization is attempted, hence the small time required to find a solution.
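The scheduling loop of Algorithm 1 can be sketched as follows. This is a minimal illustration under simplifying assumptions not stated in the paper: segments are hashable cells traversed in unit time, a finished robot keeps occupying its final segment, and a bounded wait stands in for detecting an infeasible priority schema; all names (`schedule`, the space-time dictionaries, etc.) are ours.

```python
import random

def schedule(paths, delay=1, max_wait=100):
    """Try one random priority schema; return per-robot finish times or None.

    paths: list of paths, each a list of hashable cells (path segments).
    A cell taken at time t by a higher-priority robot blocks lower ones.
    """
    order = list(range(len(paths)))
    random.shuffle(order)                    # line 1: random priority schema
    occupied = {}                            # (time, cell) -> robot id
    parked = {}                              # cell -> finish time (robot stays there)
    finish = [None] * len(paths)
    for i in order:                          # higher-priority robots scheduled first
        t = 0
        for cell in paths[i]:
            waited = 0
            # wait while the next segment is taken at the arrival time
            while (t, cell) in occupied or (cell in parked and parked[cell] <= t):
                t += delay                   # insert a delay (line 10)
                waited += delay
                if waited > max_wait:
                    return None              # schema leads to no solution
            occupied[(t, cell)] = i          # update the space-time record (line 11)
            t += 1                           # unit traversal time (line 8)
        parked[paths[i][-1]] = t
        finish[i] = t
    return finish

def coordinate(paths, tries=None):
    """Try n random schemas (one per robot, as in the paper) and keep the best z."""
    tries = tries or len(paths)
    best = None
    for _ in range(tries):
        f = schedule(paths)
        if f is not None and (best is None or max(f) < max(best)):
            best = f
    return best
```

Note how a robot parked on its final segment blocks any lower-priority robot that still needs that segment; the bounded wait then rejects the schema, mirroring the infeasible-priority case discussed above.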
5. Experimental results

While the conceptual equivalence between the path coordination problem and the JSS problem has long been recognized, to the best of our knowledge there have been no attempts to set up an experimental framework comparing the exact approach with the heuristic solutions proposed over the years. While one can easily guess that solving the JSS problem exactly will take a long time, it is interesting to assess how much the best solution found by heuristic algorithms differs from the optimal one. With respect to the algorithm presented in section 4, it is also appealing to measure how likely it is to select a random priority schema that leads to a deadlock situation. We have therefore developed two experimental setups. The first is based on the formerly described algorithm, while the second is based on a freely available implementation of Mixed Integer Linear Programming algorithms [15] and is used to solve the associated JSS problem. The operating scenario has been simplified to a square grid, with robots starting and ending at random positions and moving along randomly generated paths. It is worth observing that these hypotheses do not oversimplify the problem. We anticipated in section 2 that there exist efficient algorithms both to compute paths and to decompose them into free and occupied segments; here we are only interested in the successive step, i.e. the coordination along these given paths. The first set of tests analyzes the possibility of not finding a valid priority schema. Figure 2 illustrates the results. Fixing three different environment sizes, we solved 20 randomly generated problems involving a varying number of robots. The figure shows the percentage of generated priority schemas that are not valid. A sort of saturation effect can be observed: as the number of robots increases, so does the fraction of invalid priority schemas.
The reader should, however, observe that this effect manifests itself only when the number of robots is large. The bigger the environment, the longer the paths travelled by each robot, and hence the greater the number of collisions and the chance that a robot will terminate its run on some other robot's path. This was done on purpose, to test the algorithm under extreme conditions. It never happened that none of the generated priority schemas was feasible: the algorithm always produced a solution. It is also worth noting that when coordinating n robots, the algorithm tries only n of the n! possible priority schemas. In the most extreme case, with 35 robots, only 35 out of the 35! possible priority schemas were tried. This is clearly a negligible fraction; still, the system is able to find a solution. In the second set of tests, we measured the variability of the generated solutions, in terms of the maximum time taken to complete the given coordination task. Results are presented in figure 3, where the relative standard deviation² is plotted versus the number of robots in the team, for different environment sizes. It can be observed that the relative standard deviation stays bounded below a certain environment-dependent threshold most of the time, and grows in the end when the number of robots is such that the number of collisions becomes large. The last point is to compare the solution found by the proposed algorithm against the exact JSS solution. Doing this comparison is, however, computationally hard. For the scenario involving a 20 × 20 grid, no differences were observed. When the number of robots grows, however, one is confronted with instances of the JSS problem with tens of machines and jobs, and hundreds of constraints. Instances of this

² The relative standard deviation is the ratio between the standard deviation and the mean. In this case it is more appropriate than the simple standard deviation, since the average path length grows with the number of robots.
Figure 2. Percentage of invalid priority schemas for different environment sizes and different sizes of the multi-robot team
size require a few hours to be solved. This pushes execution time very high, making it impossible for the moment to do a precise comparison for each and every case (the charts illustrated above are the result of more than 2000 randomly generated problems).
6. Conclusions

The simple randomized schema for finding priority schemas turns out to be competitive under rather general conditions. None of the generated test cases led to an instance unsolvable by the algorithm.

Figure 3. Relative standard deviation for different environment sizes and different sizes of the multi-robot team

This result is somewhat intriguing, considering the number of priority schemas tried versus the number of possibilities. We have also observed that the quality of the solution, intended as the time to bring all the robots to their final destinations, is often not far from the optimal one. The analyzed values of the relative standard deviation show that when the problem becomes difficult, the produced solutions span a wide range of values. The proposed approach shows that in many practical situations suboptimal approaches are still satisfactory, and that guaranteed optimal solutions very often offer an unbalanced ratio between quality increase and required time.
References

[1] S. Akella and S. Hutchinson. Coordinating the motions of multiple robots with specified trajectories. In Proceedings of the IEEE International Conference on Robotics and Automation,
pages 624–632, 2002.
[2] M. Bennewitz, W. Burgard, and S. Thrun. Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robotics and Autonomous Systems, 41(2-3):89–99, 2002.
[3] S. Carpin and E. Pagello. A distributed algorithm for multi-robot motion planning. In Proceedings of the Fourth European Conference on Advanced Mobile Robots, pages 207–214, Lund (Sweden), September 2001.
[4] S. Carpin and G. Pillonetto. Motion planning using adaptive random walks. IEEE Transactions on Robotics, 21(1):129–136, 2005.
[5] M. Erdmann and T. Lozano-Pérez. On multiple moving objects. Algorithmica, 2:477–521, 1987.
[6] H. Choset et al. Principles of Robot Motion. MIT Press, 2005.
[7] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
[8] Y. Guo and L.E. Parker. A distributed and optimal motion planning approach for multiple mobile robots. In Proceedings of the IEEE Conference on Robotics and Automation, 2002.
[9] L.E. Kavraki, P. Švestka, J.C. Latombe, and M.H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566–580, 1996.
[10] S.M. LaValle. Planning Algorithms. Available online.
[11] S.M. LaValle. Robot motion planning: A game-theoretic approach. Algorithmica, 26:430–465, 2000.
[12] S.M. LaValle and S.A. Hutchinson. Optimal motion planning for multiple robots having independent goals. IEEE Transactions on Robotics and Automation, 14(6), 1998.
[13] S.M. LaValle and J.J. Kuffner. Randomized kinodynamic planning. International Journal of Robotics Research, 20(5):378–400, 2001.
[14] J. Peng and S. Akella. Coordinating multiple robots with kinodynamic constraints along specified paths. International Journal of Robotics Research, 24(4):295–310, 2005.
[15] GNU Project. The GNU linear programming toolkit. www.gnu.org/software/glpk/glpk.html, 2005.
[16] G. Sánchez and J.C. Latombe. Using a PRM planner to compare centralized and decoupled planning for multi-robot systems. In Proceedings of the IEEE International Conference on Robotics and Automation, 2002.
[17] T. Siméon, S. Leroy, and J.P. Laumond. Path coordination for multiple mobile robots: A resolution-complete algorithm. IEEE Transactions on Robotics and Automation, 18(1):42–49, 2002.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

Kenneth PAYNE, Jacob EVERIST, Feili HOU and Wei-Min SHEN¹
University of Southern California, Information Sciences Institute, USA
Abstract. This paper proposes a novel method for localizing a stationary infrared source of unknown orientation relative to a static docking sensor. This method uses elliptical approximations of likely positions of the infrared source and computes the intersections to find the most probable locations. It takes only a few samples to localize, is easily computed with inexpensive microcontrollers, and is robust to sensor noise. We then compare our approach with two other methods. The first uses a Bayesian filter across a map of discrete locations in the robot’s operational workspace to determine the suspected source position. The second also uses a probability distribution map but uses the method described by Elfes in his paper on probabilistic sonar-based mapping and navigation [1]. We show that our approach localizes quickly with a single sensor and is no more computationally demanding than other methods.
Keywords. Alignment, Docking, Bayesian, Infrared Beacon, Elliptical Probabilistic Sensing, SeReS, CONRO.
Introduction and Previous Work

Self-reconfigurable robots are modular systems that can change and adapt their configuration to the environment based on the needs and characteristics of their task. These robots rely heavily on their inherent ability to connect and disconnect multiple times to change the configuration of modules in the network. A system that can align and correct for docking offsets can change shape reliably. One of the fundamental problems encountered during reconfiguration is correcting the error in connector alignment. Traditionally, the reconfiguration process would be hand-coded into the system by an operator or choreographed by an off-board computer [2]. However, one of the goals of self-reconfigurable robots is to achieve reconfiguration at run-time [3][4][5][6][7]. To accomplish this, a method of localization is needed to overcome noise in motors and proprioceptive sensors, as
¹ Corresponding Author: Wei-Min Shen, USC/Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA; E-mail: [email protected].
well as error caused by docking friction and slippage. Many researchers have addressed this issue for both reconfigurable and general robotic docking [8][9][10][11][12]. Occasionally, these error sources are somewhat predictable and can be approximated. For single-sensor modular systems, alignment with the infrared beacon has typically been accomplished by scanning back and forth, localizing through multiple passes [10][13]. Systems with multiple sensors usually involve substantial computational complexity [14]. Lastly, the discretized methods discussed here are based on the Bayes filter as well as Elfes' work on sonar range finding [1]. Probabilistic methods achieve their objectives through sparse sampling and the computation of probability distributions, which are robust to sensor and actuator noise. We designed our probabilistic approach for detecting error in sensor alignment and compared it to two others. For this series of experiments, we focus on the detection and localization of a single infrared light source. The novel method we have developed is a parametric approximation of the two probability-distribution-based methods. If we derive an accurate sensor model, the accuracy of the probabilistic methods is not significantly affected by sensor noise. The advantage of our approach over the others is that it localizes quickly with a single sensor, yet uses no more computation than the other methods, and can be carried out on a simple, inexpensive on-board microcontroller.
1. Experiment Characteristics

1.1. CONRO SeReS Robot Module and Testing Environment

CONRO is a self-reconfigurable robot system (Fig. 1a). Each module has four interconnecting dock faces with an infrared emitter/receiver pair (for digital communication and analog sensing) as well as a pair of pin-and-latch docking systems, discussed in [12]. For this experiment, we used a CONRO SeReS module equipped with an Atmel ATmega128L processor² and 128K of flash memory.
Figure 1. (a) The CONRO Self-Reconfigurable Robot. (b) The SeReS conversion test environment. The operator holds a mobile infrared beacon with a plumb bob above a graduated strip mounted below the robot.
All three experiments were conducted in the same room (Figure 1b) under the controlled lighting conditions of a typical office environment, with diffuse fluorescent ceiling lighting, matte white walls, and celadon carpeting. This is by no means an approximation of the real world, but instead serves as an initial testing environment
² For more information on this processor, see http://www.atmel.com, and also http://www.atmel.com/dyn/resources/prod_documents/doc2467.pdf.
for the proof-of-concept work done in this experiment. Further testing should involve variable light and reflectivity in the environment, such as direct sunlight and ambient or cloudy skies.

1.2. Sensor Models

For our sensor model, we simply use an approximation of the real light characteristics of the infrared detector on the SeReS module. SeReS reads a 10-bit brightness value that is then used to determine a suspected angular offset for both yaw θ and
pitch φ (Figure 2). We collected sensor response data in 100° vertical and horizontal sweeps to map detector sensitivity to both rotational axes. This was a simple yet appropriate approach, since SeReS modules often perform side-to-side or up-and-down motions to find their neighbors during reconfiguration tasks.

[Chart: % saturation (at 30 cm) vs. relative orientation, in degrees from the sensor centerline, for the pitch and yaw axes.]
Figure 2. Brightness vs. Angular Offset Sensor Model. Note that in (a) “% Saturation” refers to total sensor saturation. Since these experiments were conducted at 30 cm, a very small but sensitive range is detected, and the infrared detector never completely saturates. This is from a 10-bit value representing a 5V range. (b) shows how the orientation workspace of SeReS is mapped onto a 2D spherical section. Note also that orientation can be expressed in degrees (0°,0° center) or in discretized 4.2° x 4.2° cells (12,12 center cell).
2. The Elliptical Intersection Method

The most important part of a probability distribution map is the set of peaks indicating likely regions in which the source should be found. Since the sensor model can only approximate range, not specific orientation from the current position, each location has an associated elliptical region of likely source positions. The elliptical intersection method represents the highest rings of those probability regions as parametric ellipses. By determining the intersections of these ellipses (Figure 3), a concise set of points can be found where the infrared source may be located. In our light model, we measured the response curves of the infrared detector on both the θ and φ axes. This simple model ensures that all major and minor axes of the associated ellipses will be parallel. The major and minor axes of each ellipse are the hypothesized angular offsets taken from the light reading in Figure 2a. Lastly, the three samples are arranged in a triangle to cover more of the operational space.
Figure 3. (a) The elliptical intersection method begins by calculating the sizes of three sampling ellipses, taken from different points in SeReS' operational space, whose a and b terms are determined from the light measured at each sample point with respect to the light model from Figure 2a. (b) shows the resulting intersections of the ellipses determined in (a). After discarding the outliers, the remaining cluster will envelop the region most likely to contain the infrared source.
Recall from elementary geometry that the intersection of two ellipses is given by Eq. (1), which becomes Eq. (2) after simplification. In these equations, the variables a and b are the vertical and horizontal angular offsets taken from the light model in Figure 2a, and the position (h, k) is the point at which each sample is taken.
(θ − h_1)²/a_1² + (φ − k_1)²/b_1² = (θ − h_2)²/a_2² + (φ − k_2)²/b_2²    (1)

(1/a_1²)θ² − (2h_1/a_1²)θ + h_1²/a_1² + (φ − k_1)²/b_1² = (1/a_2²)θ² − (2h_2/a_2²)θ + h_2²/a_2² + (φ − k_2)²/b_2²    (2)
As the current SeReS orientation (h, k) and the suspected angular offsets (a, b) will not be known until sampling at runtime, a general solution is needed. We used Sylvester's determinant of the matrix in Eq. (3) to eliminate the θ term. The determinant itself is a very long fourth-order polynomial with thirty-four terms. This allows us to find its φ roots by simplifying it into the form of Eq. (4) and solving.
| 1/a_1²   −2h_1/a_1²   h_1²/a_1² + (φ − k_1)²/b_1²   0                              |
| 0        1/a_1²       −2h_1/a_1²                     h_1²/a_1² + (φ − k_1)²/b_1²   |
| 1/a_2²   −2h_2/a_2²   h_2²/a_2² + (φ − k_2)²/b_2²   0                              |    (3)
| 0        1/a_2²       −2h_2/a_2²                     h_2²/a_2² + (φ − k_2)²/b_2²   |

(Aφ⁴ + Bφ³ + Cφ² + Dφ + E) / F = 0    (4)
Since the intersections are represented by floating-point numbers, we use the Newton-Raphson method to approximate the roots, much as modern graphing calculators do. By solving Eq. (1) for θ and substituting the roots found, one can obtain the set of real 2D points at which the ellipses intersect. Next, by clustering the points, we approximate the location of the source: we discard outliers and take the center of mass of the remaining cluster. The sensor model guarantees that the major and minor axes of the ellipses are parallel to the pitch and yaw axes of the SeReS module respectively, and the arrangement of sample points chosen for taking readings guarantees that there will be up to nine intersections. To simplify clustering, we simply discard those points whose angular offset from the center of mass is greater than the average, and then recalculate the new center of mass. Given the guaranteed axial symmetry and triangular arrangement, this clustering method is sufficient. If the light model were more complex, we would likely have to draw a clustering method from [15]. It should be noted that if fewer than three intersections are found, the data is insufficient to produce a solution; this is addressed later.
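The two numerical steps described above, Newton-Raphson root refinement and the outlier-discarding cluster average, can be sketched as follows (a minimal illustration; the function names and tolerances are our own choices):

```python
import math

def newton_root(f, df, x0, iters=50, tol=1e-9):
    """Newton-Raphson iteration, as used to approximate the phi roots of Eq. (4)."""
    x = x0
    for _ in range(iters):
        d = df(x)
        if abs(d) < 1e-12:          # flat derivative: give up refining
            break
        step = f(x) / d
        x -= step
        if abs(step) < tol:
            return x
    return x

def cluster_center(points):
    """Estimate the source location from ellipse intersection points.

    Implements the clustering rule described above: discard points whose
    distance from the center of mass exceeds the average distance, then
    return the center of mass of the remaining cluster. Fewer than three
    intersections means the data is insufficient.
    """
    if len(points) < 3:
        return None
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    dists = [math.hypot(p[0] - cx, p[1] - cy) for p in points]
    avg = sum(dists) / len(dists)
    kept = [p for p, d in zip(points, dists) if d <= avg]
    return (sum(p[0] for p in kept) / len(kept),
            sum(p[1] for p in kept) / len(kept))
```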
3. Discretized Methods

In these localization methods, we separated the angular offset space into a set of discrete cells, each containing the probability that the light source is at that location. We used a 24 x 24 2D map of the SeReS module's yaw θ and pitch φ operational space, which is also interpreted as the orientation of the infrared detector. This map places cell (0, 0) at the lower west corner and (23, 23) at the upper east (Figure 2b). To update the belief functions, readings must be taken from various locations. Each reading yields a ring or peak of high probability around the SeReS cell location, depending on the infrared source's suspected position. If the new probability for a given cell in the map is lower than the current belief for that position, it will decrease; this eliminates competing peaks from the map. We chose the four cells (12, 12), (5, 12), (14, 6), and (14, 14), because together these locations cover SeReS' entire operational workspace.
Figure 4. In both images, readings are taken from SeReS' central orientation at cell (12, 12). In (a), the suspected brightness is far away from cell (12, 12), so an elliptical valley is created in the probability distribution. In (b), the source is very close to SeReS' current orientation, so a very small ring (which becomes a peak) is created around the current orientation of the SeReS module.
For all sensed values b > 4.3% of complete infrared detector saturation, the function would return suspected angular offsets of 0°. This occurs mainly at perfect
alignment, and supports the assumption that the suspected source position was at an angular offset of 0° from SeReS’ current position. This also creates a peak at the current position in the 24 x 24 map. The next part of the curve shows that for all brightness values 0.1% < b < 4.3%, the curve can be approximated by a parabola opening downwards. Lastly, for all sensed brightness b < 0.1%, a wide distribution was used for each cell. This essentially lowers the probability of the region surrounding SeReS and leaves the oblique regions with a nearly equal chance of containing the source. Not every point is expected to see the light source, and so this method maintains probability mass for those points not in the immediate detector range. See Figure 4 for illustration.
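The three brightness regimes just described can be sketched as a piecewise function (the thresholds are from the text; the mid-range curve inversion is an illustrative placeholder, not the authors' fitted parabola):

```python
def suspected_offset(b):
    """Map sensed brightness b (% of detector saturation) to a suspected
    angular offset in degrees, following the three regimes described above.

    NOTE: the mid-range formula below is a hypothetical stand-in for the
    downward-opening parabola fit mentioned in the text.
    """
    if b > 4.3:                 # near saturation: assume perfect alignment
        return 0.0
    if b > 0.1:                 # mid range: invert an illustrative falloff curve
        return 50.0 * (1.0 - (b / 4.3) ** 0.5)
    return None                 # below threshold: out of range, spread belief widely
```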
3.1. Implementation of the Bayesian Filter

Recall the Bayes filter, governed by Eq. (5) for sensor observations. The robot starts with a uniform prior belief distribution over the light source position. The state s is the position of the sensor in the 24 x 24 cell map. Due to the large size of the individual cells (4.2° x 4.2°), the Bayes action model tested to be P(s | s', action) = 1.0 for all commanded servo motions, and is therefore omitted for brevity. In Eq. (6) we represent the state as the coordinate pair (θ, φ). The observation o in Eq. (5) is the detected brightness B and the suspected angular offsets (a, b) that follow from the light model given in Figure 2a. The normalizer η is found in the usual way, by summing all non-normalized P and dividing each cell by that value.
Bel′(s | o) = η P(o | s) · Bel(s),  where η = 1/evidence.    (5)

Bel′(θ, φ | B, a, b) = η P(B, a, b | θ, φ) · P(θ, φ)    (6)
3.2. Implementation of Elfes' Method

In pursuit of another method for updating the prior, we also tried Elfes' approach [1]. Elfes held that the new belief of a cell after an observation is governed by Eq. (7):

P(cell) = P(cell) + P(reading) − P(cell) · P(reading)    (7)
We picked this method because sound and light dissipate in a similar fashion, and we hypothesized that what worked for Elfes on sonar range finding would also work on beacon location. Our infrared detector gives us magnitude but not direction, just as Elfes' sonar gave magnitude (and, by virtue of its positioning on the robot, direction), but did so with heavy noise [1]. As Elfes used repetition in his method to circumvent the sonar noise, we used it to overcome the noise in the infrared detector. The rest of the process is carried out in the same fashion as for our Bayesian method.
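Elfes' combination rule of Eq. (7), applied cell-by-cell over a grid, can be written in a few lines (a sketch; the grid representation is ours):

```python
def elfes_update(belief, reading):
    """Apply Eq. (7) cell-by-cell: P = P + R - P*R.

    This is the independent-evidence combination rule from Elfes' sonar
    mapping work; belief and reading are same-shaped 2D grids of
    probabilities, and repeated readings monotonically strengthen belief.
    """
    return [[p + r - p * r for p, r in zip(brow, rrow)]
            for brow, rrow in zip(belief, reading)]
```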
4. Findings
4.1. Elliptical Intersection Method

For each method, we ran ten trials with the infrared source placed in various locations. For the elliptical intersection method, the results were erratic. In simulation, the system used very regular brightness curves and always measured perfectly. In real application, however, the sample points sometimes yielded 0% intensity, leaving us with the challenge of representing an "out-of-range" condition in the program. We tried two possible sets of angular offsets for this condition. The first was the bare minimum angular offset for that received value, a = 45°, b = 36°, taken from the model (Figure 2a). The second pair was the absolute extreme angular offset measured for 0% intensity: a = 50°, b = 50°. Some of the results are shown in Table 1.

Table 1. Out-of-range elliptical intersection trial results.

Trial            Actual Position (θ, φ)        Cluster Center (θ, φ)   Out-of-Range Angular Offsets
EIM Trial I-a    (17°, 17°) = cell (16, 16)    (15°, 24°)              a = 50°, b = 50°
                                               (11°, 16°)              a = 45°, b = 36°
EIM Trial I-b    (0°, 0°) = cell (12, 12)      (2°, −3°)               both conditions
EIM Trial I-c    (−50°, −50°) = cell (0, 0)    (7°, −8°)               a = 50°, b = 50°
                                               (−6°, −8°)              a = 45°, b = 36°
EIM Trial I-d    (−30°, 25°) = cell (5, 18)    (−33°, −30°)            a = 50°, b = 50°
In simulation, this method was able to approximate the light source position to within ±2°. In actual practice, however, the error was usually greater than ±10°, with occasional trials such as I-d off by ±60° on one axis. This indicates that our light source and light model were not as uniform as we had hoped, and that further study into making a clean, reliable, uniform light source will be needed. One benefit of the probabilistic methods was that even if no brightness was measured, the algorithm would increase the probability of the extreme areas and decrease the probability mass of the surrounding area. With the elliptical method, no reliable ellipse dimensions could be determined from a single measurement. Lastly, the Newton-Raphson part of the algorithm had to iterate many thousands of times for each potential intersection, which yielded a solution on our 4 MHz processor within 10 to 30 seconds (depending on the points selected and their respective ellipse sizes). This is currently too slow for real-time docking problems.
4.2. Discretized Methods

The discretized methods worked much more smoothly. The results in Table 2 show the discretized algorithms to be accurate, usually to within 2 cells, as the probability increases. The Bayesian method is quick and accurate. It repeatedly localized to a single peak in short order, always within a few cells of the actual infrared light source (it tends to converge low-west of the actual position). Of particular interest were the results of Trial B-a, where the light source was placed slightly east and low of center. This was able to
converge on the fourth sample despite the fact that the first three measurements were out of range (Figure 5).

Table 2. Notable Bayesian and Elfes trial results.

Trial                      Actual Cell (θ, φ)   Peak Cell (θ, φ)
Bayesian Trial B-a         (16, 16)             (18, 17)
Bayesian Trial B-b         (12, 12)             (14, 14)
Bayesian Trial B-c         (12, 12)             (16, 16)
Bayesian Trial B-d         (16, 16)             (11, 15)
Elfes' Method Trial E-a    (12, 12)             (1, 10)
Elfes' Method Trial E-b    (0, 0)               (3, 1)
Figure 5. Trial B-a. (a) SeReS starts at cell (12,12) and sees no light. It then moves to cells (5,12) and (14,6) with no detected light. Finally, it approaches (14,14) near the source (b) and begins to converge with a higher peak. Already by the fourth sample (b), the three other competing peaks were reduced.
Figure 6. Trials B-b (a) and B-c (b) show the effects of noise in the sensor. Trial B-b has half of Trial B-c’s standard deviation. The noise in the sensor yielded different probabilities, yet still converged to within 4 cells. In Trial E-a, the distribution does not readily converge. (c) SeReS begins at cell (12, 12). (d) Fourth step at (14, 14). The overall peak increased and moved closer to the source with each sampling, but a large portion of the probability ring still remains. To implement Elfes’ approach more efficiently, we may need a better sensor model, or many more readings.
Bayesian Trials B-b and B-c and Elfes' Method Trial E-a were all taken with the light at the center cell (12, 12). Trial B-b registered its peak closer to the actual position, while Trial B-c used a wider standard deviation (Figures 6a and 6b). Trial E-a, shown in Figures 6c and 6d, illustrates that our adaptation of Elfes' approach is slow but promising: it relies on many readings to effect change in the probability distribution, and had not yet converged to a single useful peak by the fourth sample.
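The grid update at the heart of the Bayesian trials can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian brightness sensor model, the 25x25 grid, and the sigma values are assumptions made for the sketch.

```python
import numpy as np

def bayes_grid_update(prior, expected, measured, sigma):
    """One Bayesian update over a (theta, phi) cell grid: weight each cell's
    prior by the likelihood of the measured brightness, given the brightness
    expected if the source occupied that cell, then renormalize."""
    likelihood = np.exp(-0.5 * ((measured - expected) / sigma) ** 2)
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Hypothetical 25x25 grid with the source at cell (12, 12).
prior = np.full((25, 25), 1.0 / 625.0)
yy, xx = np.mgrid[0:25, 0:25]
expected = np.exp(-((xx - 12) ** 2 + (yy - 12) ** 2) / 50.0)

# A narrower sensor standard deviation yields a sharper posterior peak.
narrow = bayes_grid_update(prior, expected, measured=1.0, sigma=0.1)
broad = bayes_grid_update(prior, expected, measured=1.0, sigma=0.3)
```

As the comparison of Trials B-b and B-c suggests, both updates peak at the true cell, but the narrower sigma concentrates more probability mass around the peak.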
5. Conclusions and Future Work

One conclusion we can draw from these results is that the chief cause of failure or inaccuracy was that our sensor model was too simplistic, and did not account for anomalies that were seen at oblique angles from above and below. These anomalies are common to off-the-shelf infrared detectors like the ones used on SeReS, and are even within the specifications of the particular one we used. This is supported by the fact that the simulated models worked well based on simplified physical laws that govern light behavior. Once the methods were applied under real-world conditions, the shortcomings of the light model became apparent. Also, we found that by reducing the standard deviation of the sensor model probability curve, the peaks were sharper and converged faster and tighter than when we used a broader curve. This is only natural, as more probability mass was concentrated into a smaller volume. This accounts for the difference between Bayesian Trials B-b and B-c. By giving more benefit of the doubt to nearby cells, Trial B-c converged with a wider distribution around the peak, whereas Trial B-b's peak was higher and tighter. While this paper focused on localizing a single infrared source under semi-ideal conditions, it should be noted that the real world is full of light noise and sensor misalignments. It is therefore desirable to develop a more rigorous and redundant sensor model that takes into account other factors such as slop and backlash in the servomotors, multiple light sources, and, most importantly, moving lights at various ranges from the detector. One suggestion we have for further work on this topic is to use a weighted system of constant random sampling. The peaks or intersections would quickly drift across the operational space, and the clusters would track the source with an error whose size was directly related to the sampling frequency and source velocity.
Another suggestion would be to use a variable set of light models taken at varying distances. If peaks or intersections are either very far away from each other, or nonexistent, then the light model would be assuming that the source is too close, or too far away, respectively. One could use a wide variety of standard approaches to converge on selecting the proper model, such as rule-based lookup tables, simulated annealing, neural networks, and many others. A proper infrared sensor model can help modular systems to localize relative to opposing dock faces during self-reconfiguration. Using probabilistic methods, the approach phase of docking can be sloppier and still localize and align, assuming the control algorithm chooses appropriate points from which to take readings. In our experiments, points were chosen arbitrarily for apparent uniform distribution across the operational space, but a more variable approach could also be used.
Acknowledgements

This research was sponsored by AFOSR under award numbers F49620-01-1-0020 and F49620-01-1-0441, for which we are grateful. We also wish to thank the other members of the Polymorphic Robotics Laboratory for their generous intellectual support and discussions.
References

[1] A. Elfes, "Sonar-based Real-world Mapping and Navigation", IEEE Transactions on Robotics and Automation, 3(4), pp. 249-265, 1987.
[2] E. Yoshida, S. Murata, A. Kamimura, K. Tomita, H. Kurokawa, S. Kokaji, "A Self-Reconfigurable Modular Robot: Reconfiguration Planning and Experiments", International Journal of Robotics Research, 21(10), pp. 903-916, October 2002.
[3] M. Rubenstein, M. Krivokon, W.-M. Shen, "Robotic Enzyme-Based Autonomous Self-Replication", International Conference on Intelligent and Robotic Systems, Sendai, Japan, 2004.
[4] Z. Butler, K. Kotay, D. Rus, K. Tomita, "Generic Decentralized Control for a Class of Self-Reconfigurable Robots", Proceedings of International Conference on Robotics and Automation, Washington, DC, USA, 2002.
[5] K. Stoy, R. Nagpal, "Self-Repair Through Scale Independent Self-Reconfiguration", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004.
[6] K. Kotay, D. Rus, "Generic Distributed Assembly and Repair Algorithms for Self-Reconfiguring Robots", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004.
[7] E. Klavins, R. Ghrist, D. Lipsky, "Graph Grammars for Self-Assembling Robotic Systems", Proceedings of International Conference on Robotics and Automation, New Orleans, USA, 2004.
[8] M.W. Jorgensen, E.H. Ostergaard, H.H. Lund, "Modular ATRON: Modules for a Self-Reconfigurable Robot", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004.
[9] M. Nilsson, "Heavy-Duty Connectors for Self-Reconfiguring Robots", Proceedings of International Conference on Robotics and Automation, Washington DC, USA, 2002.
[10] M. Rubenstein, K. Payne, P. Will, W.-M. Shen, "Docking Among Independent and Autonomous CONRO Self-Reconfigurable Robots", Proceedings of International Conference on Robotics and Automation, New Orleans, USA, 2004.
[11] M. Silverman, D.M. Nies, B. Jung, G.S. Sukhatme, "Staying Alive: A Docking Station for Autonomous Robot Recharging", Proceedings of International Conference on Robotics and Automation, Washington DC, USA, 2002.
[12] B. Khoshnevis, P. Will, W.-M. Shen, "Highly Compliant and Self-Tightening Docking Modules for Precise and Fast Connection of Self-Reconfigurable Robots", Proceedings of International Conference on Robotics and Automation, Taiwan, 2003.
[13] K. Payne, B. Salemi, W.-M. Shen, P. Will, "Sensor-Based Distributed Control for Chain-Typed Self-Reconfiguration", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004.
[14] K. Roufas, Y. Zhang, D. Duff, M. Yim, "Six Degree of Freedom Sensing for Docking Using IR LED Emitters and Receivers", Experimental Robotics VII, Lecture Notes in Control and Information Sciences 271, D. Rus and S. Singh, Eds., Springer, 2001.
[15] A.K. Jain, M.N. Murty, P.J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, 31(4), pp. 264-323, 1999.
[16] D.M. Lane, HyperStat Online Tools Website, http://davidmlane.com/hyperstat/normal_distribution.html, Oct. 2, 1999.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Mutual Localization and 3D Mapping by Cooperative Mobile Robots

Julian Ryde and Huosheng Hu
Department of Computer Science, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, England
E-mail: [email protected], [email protected]

Abstract. This paper describes the development of a 3D laser scanner and an approach to 3D mapping and localization. The 3D scanner consists of a standard 2D laser scanner and a rotating mirror assembly. Employing multiple robots and mutual localization, local 3D maps are built. Global localization within the maps is performed by extracting a cross-section of the map just below the ceiling and then using an exhaustive search algorithm to enable the merger of multiple local 3D maps. The quality of these maps is such that the poses estimated by this method are accurate to within 0.1m and 1 degree.

Keywords. Mutual, Localization, Cooperative, Mobile, Robots, 3D, Mapping
1. Introduction

Extracting 3D information about the world surrounding a robot has proved difficult. The two main approaches, vision and laser range finding, have been dogged by problems. Vision is often computationally intensive and suffers from sensitivity to changes in illumination. Many of the difficulties stem from having to solve the correspondence problem, which can be alleviated by structured light approaches; however, the spatial density of the data does not come close to that provided by laser scanners. Non-visual localization and mapping has taken place in 2D, mainly due to limitations of the sensors in the case of laser range finders, or of processor speed and algorithms in the case of stereoscopic vision. Recently, in a drive to test the benefits of 3D sensors, researchers have mounted 2D laser scanners on nodding or rotating mechanisms to obtain 3D scans [1,2]. Alternatively, two laser scanners mounted with their scan planes orthogonal [3] are also popular. The main problems with nodding or rotating approaches are difficulties in hardware implementation and high power consumption, as the 2D scanners are heavy. Consequently a rotating mirror prototype has been built which produces 3D scans with a field of view of 100° by 180°, is light, has low power consumption and is easily deployed on conventional robotics platforms. A number of groups are undertaking research into 3D laser mapping; however, very few groups are performing cooperative 3D laser mapping. The closest are groups using cooperative vision and laser based mapping in outdoor environments [4] and vision only [5]. The benefits of full 3D mapping are abundant, and so the rapid expansion of this field is inevitable. The detection of negative and over-hanging obstacles greatly enhances avoidance behavior. Once 3D maps of environments have been built they can be customized for different robots. For instance, various 2D occupancy grids may be built for robots of different sizes or with 2D sensors at different heights. Severely cluttered non-manifold environments such as search and rescue situations may be reliably mapped. Maps based upon the ceilings of rooms will remain accurate for longer, and an unoccluded view of the ceiling is usually readily accessible to a robot even in crowded environments [6]. The disadvantages of 3D sensing technologies are slower acquisition time and the geometric increase in data that needs to be processed. In this paper, the mutual localization approach discussed in Section 2.1, coupled with the 3D laser range finder prototype, pushes this research into the new area of three-dimensional cooperative localization and mapping. Combining mutual localization with the data from multiple 3D laser scanners enables full 3D mapping of indoor environments. This would prove vital for a number of applications such as nuclear decommissioning, search and rescue scenarios, surveying as-built structures, and maps for mobile robots. The rest of this paper is organized as follows. Section 2 presents the system framework of the research carried out in this paper. Section 3 details the 3D scanner hardware. Section 4 includes the experimental results and is followed by Section 5 discussing their implications along with future research directions.
2. System Framework

2.1. Mutual Localization

The premise for mutual localization is that, rather than merely observing robots as beacons, each robot observes and is itself observed simultaneously [7]. Additionally, ensuring that robots may observe team-mates and be observed themselves means that simultaneous mutual localization events can occur. These events allow superior relative pose determination. Firstly, mutual localization is robust to spurious readings because simple checks on the validity of the mutual pose are available; for instance, the separation of the robots should be similar as measured by both observing robots. Secondly, the accuracy of the pose does not deteriorate with separation [7], a very useful property; increasing separation merely decreases the frequency of mutual localization events. Mutual observation is accomplished by mounting retro-reflective cylinders above the laser scanner as shown in Fig. 1(a). The 3D laser scanner prototype has a blind spot directly in front of it with an angular width of 20°, so higher mounted beacons will be visible from further afield. Most clutter in rooms is quite low, for example chairs and tables, so even when the line of sight between the robots is interrupted, the beacons, being some way from the ground, may still be detected by the laser scanners. Mutual localization is accomplished by first projecting the observed beacon's 3D position onto the xy-plane, which is parallel to the floor of the room. Once projected onto the horizontal plane, the 2D mutual localization algorithm may be used. This method assumes that the beacons are always at the same height above the floor, a reasonable assumption for many indoor environments with flat horizontal floors. Given the relative pose of the mapping robot with respect to the observing robot, multiple 3D laser scans may be combined to produce a local map.
Figure 1. Beacon positioning and geometry used to calculate the relative pose. AB is the stationary robot and CD the mapping robot. D and B are the beacons with C and D the origins of the laser scanners.
The beacons are above the laser scanner and there is a small horizontal displacement from the origin of the laser scanner. This displacement introduces the complications evident in Fig. 1(b), where the laser scanners are represented as semi-circles, with the forward part of the scanner corresponding to the curve of the semicircle, and the beacons as solid circles. The robots are at A and C, with C mapping whilst A remains stationary. The pose of A may be constrained to the origin and x-axis without loss of generality. The beacons, B and D, are attached to the robots at A and C respectively. The distances AB and CD are both equal to d. The separation of the beacons is labelled as s1 and s2 because s1 is the separation as observed by the robot at A and s2 is the separation as observed by the robot at C, which in practice will not be identical. The robot at A observes the beacon D at range a and angle α whilst that at C observes the beacon B at range b and angle β. The beacon separation may be calculated from the robot observations by the cosine rule for triangles. Comparison of the two values s1 and s2 (which should be approximately equal) allows erroneous localizations to be detected and rejected. Typical errors stem from distractors in the environment or error-sensitive geometric configurations. To acquire the position of the robot at C it is best to consider the geometry in the complex plane rather than the more immediately apparent geometric methods involving the sine and cosine rules. Using vectors in the complex plane gives two expressions for the position of C, found by traversing the two available paths from the origin to C. A detailed explanation is given in [7]. Equating these gives the following expression for θ:

e^{iθ} = (a e^{iα} − d) / (b − d e^{−iβ})    (1)

Eliminating e^{iθ} and simplifying gives C in terms of the observed angles, ranges and the parameter d:

C = (a b e^{i(α+β)} − d²) / (b e^{iβ} − d)    (2)
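Equations (1) and (2) can be evaluated directly with complex arithmetic. The following sketch (in Python rather than the authors' Java/JEP implementation; the function name is ours) returns the position and orientation of robot C:

```python
import cmath
import math

def mutual_pose(a, alpha, b, beta, d):
    """Relative pose of robot C from one mutual localization event.
    a, alpha: range/bearing of beacon D as seen by robot A;
    b, beta:  range/bearing of beacon B as seen by robot C;
    d:        horizontal beacon offset from each scanner origin."""
    # Equation (1): e^{i*theta}; theta is its argument.
    e_itheta = (a * cmath.exp(1j * alpha) - d) / (b - d * cmath.exp(-1j * beta))
    theta = cmath.phase(e_itheta)
    # Equation (2): position of C in the complex plane.
    C = (a * b * cmath.exp(1j * (alpha + beta)) - d ** 2) / (b * cmath.exp(1j * beta) - d)
    # The orientation of C is theta - beta + pi.
    return C.real, C.imag, theta - beta + math.pi
```

As a sanity check, with d = 0 and a = b (beacons coincident with the scanner origins, both robots measuring the same separation), equation (2) reduces to C = a e^{iα}, i.e. A observes C directly at range a and angle α.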
The real and imaginary parts of C give the Cartesian coordinates of the second robot. The orientation is θ − β + π, with θ determined from the argument of (1). The complex number calculations are done in Java using the Java Math Expression Parser (JEP), because Java does not natively handle complex numbers. The robots are able to exchange information
through wireless networking. Once the pose of robot C is acquired, its scan data may be used to update the centrally held occupancy map. The map is a standard occupancy grid [8], where each cell contains the probability that the particular space it represents contains an obstacle. The grid is initialized so that all cells contain 0.5, representing complete uncertainty. Typically the resolution is 0.1m, the size is 30 by 30m, and the origin is the initial position of the observing robot. New sensor data is incorporated via the Bayesian update process described in (3). Occupancy grids are an effective map representation as they are robust, handle uncertainty well and allow fusion of data from different sensors [9].

P(occupied) = [1 + (1 − evidence)(1 − prior) / (evidence × prior)]^{−1}    (3)
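Equation (3) can be sketched as a one-line cell update (variable names ours):

```python
def bayes_cell_update(prior, evidence):
    """Equation (3): fuse a sensor reading (evidence in (0, 1)) with a
    cell's prior occupancy probability, in odds form."""
    return 1.0 / (1.0 + (1.0 - evidence) * (1.0 - prior) / (evidence * prior))
```

An uninformative reading (evidence = 0.5) leaves the prior unchanged, and repeated informative readings accumulate confidence, which is why every cell can safely start at 0.5.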
Due to the fidelity and narrow beam width of the laser scanner, a straightforward ray-trace model was employed. When updating the occupancy grid, all cells along the path of the laser ray have their probabilities decreased, whilst the cell containing the end point of the ray has its probability increased.

2.2. Global Localization

Whilst producing local maps in a cooperative manner, one of the robots remains stationary, acting as a reference frame for building the map. In order to map areas larger than the local map, the stationary robot must move at some point. This is an opportunity for odometry errors to adulterate the global map. Although these errors may be vastly reduced by moving the reference robot under observation of the mapping robot, the error incurred at each step, albeit small, still undergoes unbounded accumulation. Without some form of reference to the global map frame, mapping of large cyclic environments would prove impossible; thus global localization is a necessity. Usually mobile robots operate on the floor, so the 2D assumption may be enforced for global localization. In most typical indoor environments, moveable objects tend to lie on the floor. Floor-level 2D maps (as most in previous 2D localization and mapping research have been) are unreliable and quickly become out of date when furniture is moved and doors are opened or closed. To avoid these problems, 2D map data is extracted as a horizontal cross-section just below the ceiling of the room. This volume of a room changes infrequently and is more easily observed, especially in cluttered environments. The horizontal cross-section chosen was the volume from 2.3 to 2.9m in height, and the resulting 2D occupancy grid is produced by simply summing the vertical columns within this height range and re-normalizing. A plethora of global localization algorithms given a prior map exist; in this research, however, an exhaustive search approach was implemented.
This has a number of advantages: the entire pose probability density function may be recorded and inspected, and the method is very robust; it suffers only from its slow execution speed. The impact of this on real-time operation may be vastly reduced if a position estimate is available, which is almost always the case; the pose search algorithm may then simply spiral outward from the estimated position. It was found that, even with an unoptimized implementation, global localization on maps of size 20m by 20m with a pose resolution of 0.1m and 1° took on the order of minutes on a 2GHz processor.

Figure 2. The 3D mapping process for two robots A and B

For the implementation of the exhaustive search algorithm, the 3D scan is converted into a 2D occupancy grid by extracting the cross-section near the ceiling. A correlation metric is then calculated iteratively for the sensor occupancy grid by stepping through various poses with respect to the map. Typically, pose resolutions of 0.1m and 1° are used. The pose with the best correlation metric is then chosen. An example position probability distribution function is displayed in Fig. 5(b).

2.3. Concurrent Processing

Usually an exhaustive search algorithm for localization is unfeasible because it does not run fast enough to provide continuous real-time pose updates. In this implementation, however, the pose updates are supplied by the mutual localization, and the global localization is run in parallel with the local map building process, as depicted in Fig. 2. In Fig. 2 the first robot, A, moves to the next best view (NBV). The NBV is the next position which is most likely to yield useful observations for updating the map; in these experiments it is calculated manually. Robot A acts as the stationary robot providing a reference frame for B whilst B maps the local area around A. When the local map is sufficiently good it is added to the global map. Whilst the local map around A's new pose is built, the pose of A within the global map is calculated. This pose, along with (3), enables the merger of the local and global maps. The process is repeated until the global map is of sufficient quality.
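The sub-ceiling extraction and exhaustive pose search can be sketched as follows. This is a simplified illustration with assumed grid conventions, not the authors' code; the correlation metric here is simply the summed map occupancy under the transformed scan endpoints.

```python
import numpy as np

def ceiling_cross_section(grid3d, z_res=0.1, z_min=2.3, z_max=2.9):
    """Collapse a 3D occupancy grid (y, x, z) into a 2D sub-ceiling map by
    summing the vertical cells between z_min and z_max and renormalizing."""
    k0, k1 = round(z_min / z_res), round(z_max / z_res)
    sub = grid3d[:, :, k0:k1].sum(axis=2)
    return sub / max(sub.max(), 1e-12)

def global_localize(grid2d, scan_pts, res=0.1, angle_step_deg=1.0):
    """Exhaustive search over (x, y, theta) at the given resolutions,
    keeping the pose whose transformed scan best correlates with the map."""
    h, w = grid2d.shape
    best_score, best_pose = -np.inf, None
    for th in np.deg2rad(np.arange(0.0, 360.0, angle_step_deg)):
        c, s = np.cos(th), np.sin(th)
        rot = scan_pts @ np.array([[c, s], [-s, c]])  # rotate scan by theta
        for i in range(h):
            for j in range(w):
                # translate the scan to the candidate pose and bin to cells
                cells = np.round((rot + [j * res, i * res]) / res).astype(int)
                ok = (cells[:, 0] >= 0) & (cells[:, 0] < w) & \
                     (cells[:, 1] >= 0) & (cells[:, 1] < h)
                score = grid2d[cells[ok, 1], cells[ok, 0]].sum()
                if score > best_score:
                    best_score, best_pose = score, (j * res, i * res, th)
    return best_pose
```

Recording the score for every pose, rather than only the best one, yields the full position probability density surface of the kind shown in Fig. 5(b).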
3. 3D Scanner Design

The 3D laser range scanner consists of a SICK LMS 200, a rotating reflecting surface prototype (Fig. 3(a)) and software to convert the range scans into 3D data. The mirror rotation period is 40 seconds. The slow rotation speed contributes to a very low power consumption of 0.01W. Rotating at the same speed as the motor is the far blocking arm in Fig. 3(a). The purpose of this arm is to allow feedback of the angular velocity of the
Figure 3. Rotating mirror mechanism and calculating 3D coordinates from range data
mirror to the robot. This blocking arm approach circumvents problems synchronizing the angular velocity with the laser data. The laser scanner performs full 180° scans, with most of the range data from the side angles discarded. The furthest right-hand scan points are blocked every half a second, and this information is used to calculate the angular velocity of the motor and consequently the mirror. A particular advantage of this approach is ease of deployment: the rotating mirror mechanism can simply be placed in front of the laser scanner and no further connections are required; even the power supply is self-contained. The blocking arms are easy to detect as falling edges in the side range data. The LMS was configured to scan a 100° arc at 1° intervals with the mm measurement setting. These settings were chosen to minimize the number of scan points in order to increase the complete scan frequency. The scan frequency is limited by the data rate and at the moment is 0.05Hz. This data rate is 34.8kb/s, the maximum for the serial interface, resulting in 13 horizontal scans per second. The SICK LMS 200 can support a 500kb/s data rate using a USB interface. At this data rate, full 3D 1° by 1° scans should be possible at 0.5Hz. The mirror is mounted in front of the scanner rather than rotating or nodding the scanner itself. This has a number of advantages, namely lower power consumption and a simpler hardware setup. The disadvantages are a reduced scan angle of around 100°, a blind spot when the mirror is edge-on, and a requirement for more complex geometric calculations (Fig. 3(b)) to correct for the separation between the mirror and the scanner. In Fig. 3(b) the effect of the mirror is to bend part of the xy-coordinate plane to a new elevation, illustrated in grey. The following equation, which references the values indicated in Fig. 3(b), gives the conversion between the r, θ and φ coordinates, measured by the laser scanner, and 3D Cartesian coordinates.
x = ((r cos θ − d) cos φ + d, r sin θ, (r cos θ − d) sin φ)    (4)
where the value d is the separation between the origin of the laser scanner and the axis of the rotating mirror. The range and bearing as measured by the laser scanner are r and θ. The angle to the horizontal of the plane (grey region in Fig. 3(b)) introduced by reflecting the laser ray from the rotating mirror in front of the scanner is indicated by φ.
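Equation (4) translates directly into code (a sketch; the function name is ours):

```python
import math

def range_to_cartesian(r, theta, phi, d):
    """Equation (4): convert a range reading (r, theta) taken through the
    rotating mirror at elevation phi into 3D Cartesian coordinates,
    correcting for the scanner-to-mirror-axis separation d."""
    radial = r * math.cos(theta) - d  # distance of the return beyond the mirror axis
    return (radial * math.cos(phi) + d,
            r * math.sin(theta),
            radial * math.sin(phi))
```

With phi = 0 the mirror leaves the ray in the horizontal plane and the expression reduces to the ordinary 2D scan point (r cos θ, r sin θ, 0).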
Figure 4. Photograph (a) and typical 3D scan (b) of mapped room
Figure 5. Global sub-ceiling map and position probability density function, both with a resolution of 0.1m. The map is a thresholded occupancy probability grid of the space 1-2.9m above the floor. The width of the room is approximately 12m. Two pillars and the windows at the back are evident.
4. Experimental Results and Analysis

Results are visualized using a number of methods and software programs. To visualize the data, a 3D scene is generated consisting of cubes or spheres representing occupancy, and the results are rendered using a ray tracer (Fig. 4(b)). This allows the production of high quality images and movies with shadowing, luminosity and perspective: visual cues that allow the brain to extract the 3D information more effectively from the 2D display media. However, despite these efforts it still remains difficult to effectively display 3D maps as images. An example of such a 3D image is displayed in Fig. 4(b), and a photograph from the corresponding view point is shown in Fig. 4(a). Each sphere in Fig. 4(b) represents a laser range return and is colored by its distance from the view point, with those closer being lighter. Fig. 5 illustrates a typical global localization scenario; the occupancy grid, Fig. 5(a), is represented by drawing a cube at every grid cell where the probability of occupancy exceeds 0.9. Fig. 5(b) shows a position probability density surface where the probability
of each position is indicated by the height of the surface. The reliability and accuracy of the pose may be ascertained by inspection of the position probability distribution. The true position is easily discerned and the dominance and shape of the peak denotes the reliability and error respectively.
5. Conclusion and Future Work

For decades mobile robots have been shackled by the 2D assumption, which has been necessary due to the absence of suitable 3D sensors. Even today 3D sensing technology is problematic and expensive. In this paper a rotating mirror mechanism has been added to a standard SICK LMS 200 laser scanner to produce 3D scans. These 3D scans, coupled with mutual localization, have produced full 3D environmental representations at very low cost. The fidelity of these representations has been validated by performing global localization within them. Further work would include mounting the laser scanner so that it faces up into the mirror to allow fuller scans, which might permit localization by range histograms. Results from increasing the number of cooperating robots would also be interesting. The 3D occupancy grids produced by mutual localization, although interesting for humans to inspect, will mainly be used by other robots. Thus the global localization accuracy is a good indicator of their fidelity and suitability for further use by other robots. Global localization in these maps delivers poses accurate to 0.1m and 1°.
References

[1] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann, "Heuristic-based laser scan matching for outdoor 6D SLAM," in Advances in Artificial Intelligence. 28th Annual German Conf. on AI, Sept. 2005.
[2] K. Lingemann, H. Surmann, A. Nüchter, and J. Hertzberg, "Indoor and outdoor localization for fast mobile robots," in Proceedings of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Sendai, Japan, Sept. 2004, pp. 2185–2190.
[3] A. Howard, D. F. Wolf, and G. S. Sukhatme, "Towards 3D mapping in large urban environments," in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Sendai, Japan, Sept. 2004, pp. 419–424.
[4] R. Madhavan, K. Fregene, and L. Parker, "Distributed cooperative outdoor multirobot localization and mapping," Autonomous Robots, vol. 17, pp. 23–39, 2004.
[5] J. Little, C. Jennings, and D. Murray, "Vision-based mapping with cooperative robots," in Sensor Fusion and Decentralized Control in Robotic Systems, vol. 3523, Oct. 1998, pp. 2–12.
[6] W. Burgard, A. B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun, "Experiences with an interactive museum tour-guide robot," Artificial Intelligence, vol. 114, no. 1–2, pp. 3–55, 1999.
[7] J. Ryde and H. Hu, "Laser based simultaneous mutual localisation for multiple mobile robots," in Proc. of Int. Conf. Mechatronics and Automation, Niagara Falls, Canada, July 2005, pp. 404–409.
[8] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46–57, 1989.
[9] P. Štěpán, M. Kulich, and L. Přeučil, "Robust data fusion with occupancy grid," IEEE Transactions on Systems, Man and Cybernetics, vol. 35, no. 1, pp. 106–115, 2005.
Part 5 Network Agent Systems
Exploration of complex growth mechanics of city traffic jam for the adaptive signal control

Kouhei HAMAOKA, Mitsuo WADA
Dept. of Information Science and Technology, Hokkaido University

Abstract. Traffic jams cause stagnation of traffic flow, which leads to waste of energy and damage to the city environment. Thus, relaxation of traffic jams is an important issue. The dynamics of city traffic jams are difficult to derive theoretically because of their complexity. In this research, we construct a city traffic flow model which can express these complex characteristics. Furthermore, the relations between city traffic jams and traffic signals' indications are analyzed by traffic simulation. As a result, we propose a new measure, the "critical inflow rate": if the inflow approaches this value, the traffic jam grows exponentially, so it is a feasible criterion for evaluating the growth of a traffic jam. Furthermore, we discuss the effectiveness of random split indication of the traffic signal in areas of finite jams, in the context of stochastic resonance. From these results, we show a new methodology which can deal with city traffic jams.

Keywords. Traffic simulation, Coupled map lattice, Signal control, Queuing theory
1. Introduction

Real traffic flow is a complex system, since many drivers' decisions affect the flow. Thus, there is a need to model traffic flow for the purpose of analyzing traffic jams. In traffic systems, flow volume increases linearly with density if the density is sufficiently low. But traffic flow has the intrinsic property that the flow volume does not increase any more once the density has grown beyond some level. Put simply, the flow volume cannot uniquely identify the density. So traffic simulation is indispensable for analyzing traffic jams generated from complex traffic flow. In this research, we construct a city traffic flow model which can express these complex characteristics. Furthermore, the relations between city traffic jams and traffic signals' indications are analyzed by traffic simulation. On a freeway, vehicles decelerate because of the narrow gap once the vehicle density has become large. The backward propagation of this velocity delay intercepts traffic flow. We call such stagnation a "natural jam" [1]. If there is a traffic signal in the traffic system, vehicles accumulate right in front of the signal because of the signal indication. The backward growth of this accumulation of vehicles intercepts traffic flow. We call such stagnation a "signal jam". Kuwahara et al. discuss jams in signalized systems by using queuing theory

1 Correspondence to: Hokkaido University, N14-W9, Kita-ku, Sapporo, 060-0814 JAPAN. Tel.: +81 011 706 7105; Fax: +81 011 706 7105; E-mail: [email protected]
Figure 1. Flow against density for the traffic system under signal indication and for a system not influenced by a signal.
Figure 2. Temporal transition of the total inflow and of the total outflow.
standing on macroscopic traffic flow models [2]. In this research, microscopic traffic flow is simulated to discuss the details of the signal jam, which emerges from interactions of individual vehicles, and we suggest a new criterion to evaluate it. An aperiodic boundary condition is set in the simulations, since city traffic flow is assumed; however, it is difficult to conserve the number of vehicles in the system under an aperiodic boundary condition, so we instead use the vehicles' inflow rate as a criterion. The signal jam grows backward from the signal as the inflow rate increases. In these surroundings, queuing theory is applicable to the signal jam. The signal jam is relaxed by inhibiting delay propagation. Hence, we evaluate the signal jam using the critical inflow rate, which defines the border of whether the jam stays at a limited scale.
2. Macro properties of traffic flow

In this section, we show that the critical inflow rate, which evaluates the growth of a jam, is defined by the vehicles' average passing time, since queuing theory is applicable to the signal jam. The signal jam's critical density ρs [vehicles/m] lies on the low-density side of the natural jam's critical density ρn [vehicles/m], as shown in Figure 1. The flow volume saturates if the vehicle density increases beyond this point, as shown in Figure 1. This phenomenon occurs because the red indications restrict the vehicles passing the signal. We employ the inflow rate as a criterion since it is independent of the system situation, and it is useful to employ a critical value of the inflow rate as a criterion for the signal jam. The two are related through queuing theory as an intermediary: if the signal corresponds to the counter and the passing time corresponds to the service time, queuing theory is applicable to the city traffic jam when the macroscopic probability variables of waiting time and queue length are averaged. If the difference between the inflow volume Qin [vehicles/s] and the outflow volume Qout [vehicles/s] becomes stationary, the average number of vehicles in the queue Lq [vehicles] and their waiting time Wq [s] become constant. The vehicle at the end of the queue takes Wq [s] to arrive at the signal if the queue consists of Lq vehicles on average, as shown in Figure 2. Some inflowing vehicles cannot pass through the signal directly; these intercepted vehicles form the signal jam.
Little's formula relates Lq to Wq through the average inflow rate λ [vehicles/s], regardless of the statistical distributions of the arrival process or the passing times:

Lq = λ Wq.    (1)

The queue corresponds to the signal jam,

Njam = Lq,    (2)

so the signal jam can be treated as a queue. We then discuss the limit of the inflow rate that keeps the jam within a finite range by introducing queuing theory. If vehicles arrive according to a Poisson process, their passing times follow a general distribution, and there is one signal, the situation is represented as M/G/1 in Kendall's notation. In this case, the average number of vehicles in the queue is given by the Pollaczek–Khinchine formula [3]:

Lq = λ² E[T²] / ( 2 (1 − λ T̄) ),    (3)

where T̄ is the average passing time and E[T²] is the second moment of the passing time. If λT̄ → 1, the queue grows infinitely and the jam spreads far backward. We therefore set the critical inflow rate

λc = 1 / T̄.    (4)

The inflow rate λ should satisfy λ < λc in order to maintain a finite queue.
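As a numerical check, the sketch below (our own code, not from the paper) evaluates the Pollaczek–Khinchine queue length of Eq. (3) and the critical inflow rate λc = 1/T̄ of Eq. (4), using the average passing times reported later in Section 5; the second moment of the passing time is an assumed value, used purely for illustration.

```python
# Pollaczek-Khinchine mean queue length for an M/G/1 queue (Eq. 3) and the
# critical inflow rate lambda_c = 1 / T-bar (Eq. 4).

def pk_queue_length(lam, t_mean, t_second_moment):
    """Average number of vehicles in the queue (Eq. 3).

    lam             : inflow rate [vehicles/s]
    t_mean          : average passing time T-bar [s]
    t_second_moment : second moment of the passing time E[T^2] [s^2]
    """
    rho = lam * t_mean              # utilization; the queue diverges as rho -> 1
    if rho >= 1.0:
        return float("inf")
    return lam ** 2 * t_second_moment / (2.0 * (1.0 - rho))

def critical_inflow_rate(t_mean):
    """Critical inflow rate lambda_c (Eq. 4)."""
    return 1.0 / t_mean

# Average passing times observed in the simulations (Section 5).
for label, t_mean in [("periodic", 5.8279), ("random split", 5.6823)]:
    lam_c = critical_inflow_rate(t_mean)
    print(f"{label}: lambda_c = {lam_c:.4f} [vehicles/s]")
    # E[T^2] is set to 2*T-bar^2 (an exponential passing time) only as an
    # illustrative assumption; the paper does not report this moment.
    for lam in (0.9 * lam_c, 0.99 * lam_c):
        lq = pk_queue_length(lam, t_mean, 2.0 * t_mean ** 2)
        print(f"  lambda = {lam:.4f} -> Lq = {lq:.1f}")
```

Running the sketch reproduces the critical rates 0.1716 and 0.1760 [vehicles/s] quoted in Section 5 and shows the queue length growing steeply as λ approaches λc.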
3. Simulation model for traffic flow

It is important to construct traffic simulation models that express actual situations. Yukawa & Kikuchi describe a traffic flow model using a Coupled Map Lattice (CML) to simulate freeway flow [4]. A CML can adjust each vehicle's velocity to its gap at every time step. By extending their model, we can model the automatic decay of velocity fluctuations and the vehicle's smooth stopping [5][6], and we can simply describe a vehicle's accelerating action with respect to the preceding vehicle.
(5)

where vi(t) is the vehicle's velocity at time t and Vi is the goal velocity of each vehicle. Two parameters concern the acceleration characteristic and the braking effect, respectively, and two others concern the scale of the fluctuation. We call this CML model the "neuro-based CML model". The vehicle updates its position at every time step using the velocity given by this map:

xi(t+1) = xi(t) + vi(t+1).    (6)
If vehicle i is at position xi and the preceding vehicle i+1 is at position xi+1, vehicle i is influenced by vehicle i+1 through the gap

Δxi = xi+1 − xi − L,    (7)

where L is the length of a vehicle. The vehicle runs without speed reduction when the gap is sufficiently large, but when the gap gets narrow the vehicle must brake; it then follows the preceding vehicle while conserving a safe distance. We define this distance using Fujii's safe gap [7]:

(8)

A vehicle running with velocity v [km/h] need not decelerate if its gap is larger than the safe gap ds [m], but it decelerates to the goal velocity in order to avert a collision if the gap becomes narrower than ds:

(9)

where β is the vehicle's braking deceleration. If the vehicle's gap becomes narrower than the minimum gap Δxmin (= 4 [m]), it stops. Thus, Equation (9) is used in the range Δxmin < Δxi < ds. When the vehicle comes right in front of the signal, its dynamics are changed by the signal's indication: if the signal indicates green, the velocity does not change, but if the signal indicates red, the vehicle decelerates and stops, by inserting the distance to the signal into Eq. (7).
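A minimal sketch of one update step of such a single-lane car-following model. The exact velocity map (5) and Fujii's safe-gap formula (8) are not reproduced in this text, so `safe_gap()`, the braking deceleration value, and the acceleration rule below are simplified stand-ins of our own; only the gap definition (7), the 4 m vehicle length and minimum gap, and the stop rule follow the paper.

```python
import math

# Illustrative single-lane car-following step in the spirit of the CML model
# of Section 3 (our own simplification, not the authors' exact map).

L_VEHICLE = 4.0      # vehicle length [m] (paper value)
GAP_MIN   = 4.0      # minimum gap at which a vehicle stops [m] (paper value)
BETA      = 3.0      # braking deceleration [m/s^2] (assumed)
ACCEL     = 1.0      # free-road acceleration [m/s^2] (assumed)

def safe_gap(v):
    """Distance below which a vehicle at speed v [m/s] must brake.

    Stand-in for Fujii's safe gap (Eq. 8): braking distance plus minimum gap.
    """
    return v * v / (2.0 * BETA) + GAP_MIN

def step(positions, velocities, v_goal, dt=1.0):
    """Advance all vehicles one time step (last list element is the leader)."""
    new_v = []
    for i, (x, v) in enumerate(zip(positions, velocities)):
        if i + 1 < len(positions):
            gap = positions[i + 1] - x - L_VEHICLE    # Eq. (7)
        else:
            gap = float("inf")                        # leader: free road
        if gap <= GAP_MIN:
            v_next = 0.0                              # stop at the minimum gap
        elif gap < safe_gap(v):
            # decelerate to a speed that can stop within the gap (cf. Eq. 9)
            v_next = math.sqrt(2.0 * BETA * (gap - GAP_MIN))
        else:
            v_next = min(v_goal, v + ACCEL * dt)      # relax toward goal speed
        new_v.append(v_next)
    positions = [x + v * dt for x, v in zip(positions, new_v)]  # Eq. (6)
    return positions, new_v
```

Iterating `step` on a column of closely spaced vehicles shows followers braking toward the gap-limited speed while the leader relaxes to its goal velocity.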
4. Traffic flow simulation in a signal-controlled system

We study how the queue and the outflow are changed by the signal's indications. We employ the neuro-based CML model introduced in Section 3. Its parameters are . In addition, all vehicle lengths are 4 [m] and all goal velocities are 50 [km/h]. We prepare a one-way traffic lane under the aperiodic boundary condition while keeping the vehicles' inflow rate constant. Vehicles move in three steps: first, the gap to the preceding vehicle is measured; second, the velocities are determined from the measured gaps and velocities; third, all vehicles move. All vehicles emerge at position 0 [m] according to a Poisson process and are influenced by the signal set at 5000 [m]. The signal has two parameters: cycle length (the period of the signal's indication, in [s]) and split (the green indication's rate). We prepare the following indications.

Periodic indication: both green and red indicating rates are fixed at 50% of the cycle length, as shown in Figure 3.

Random split indication: the green indicating rate s% and the red indicating rate (100 − s)% change randomly, as shown in Figure 4, but their time means are equal.

Random split indication can be regarded as a cyclic indication whose changing point between green and red is polluted by random noise. The changing point can move only within the range
Figure 3. Periodic indication.
Figure 4. Random split indication.
which is the product of the noise intensity (a uniform random number from 0 to 1) and the cycle length. We set the noise intensity to 0.2 to reduce flow fluctuations; in other words, both indicating times vary randomly between 0.4 and 0.6 of the cycle length. Figure 5 plots the outflows that pass through the periodic and the random indicating signals against the cycle length when the inflow rate is 0.167 [vehicles/s].

Figure 5. Output flows of each indication against cycle length.

Random split indication passes somewhat more vehicles than periodic indication up to a cycle length of about 90 [s]; beyond that, there is no large difference between the two indications. To compare the features of the jams, Figures 6–13 show space-time plots of both indications for cycle lengths of 20 [s], 50 [s], 100 [s] and 200 [s]. Dots express the existence of vehicles at certain times and places in these plots; the dense parts are jams.
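The two indication schemes can be sketched as follows (our own illustration; the function names are ours). The random split draws a green time uniformly between 0.4 and 0.6 of the cycle length, so that the time means of green and red stay equal, as required above.

```python
import random

# Green/red indication schedules for one signal (Section 4): a periodic
# indication with a fixed 50% split, and a "random split" indication whose
# switching point is shifted by uniform noise of intensity 0.2, i.e. the
# green time varies between 0.4 and 0.6 of the cycle length.

def periodic_indication(cycle, n_cycles):
    """Return (green_time, red_time) pairs with a fixed 50/50 split."""
    return [(0.5 * cycle, 0.5 * cycle) for _ in range(n_cycles)]

def random_split_indication(cycle, n_cycles, noise=0.2, rng=random.Random(0)):
    """Return (green_time, red_time) pairs; the switching point is shifted
    by uniform noise of intensity `noise`, so the time means stay equal."""
    pairs = []
    for _ in range(n_cycles):
        shift = (rng.random() - 0.5) * noise * cycle   # in [-0.1, 0.1] * cycle
        pairs.append((0.5 * cycle + shift, 0.5 * cycle - shift))
    return pairs

pairs = random_split_indication(cycle=50.0, n_cycles=10000)
mean_green = sum(g for g, _ in pairs) / len(pairs)
print(f"mean green time = {mean_green:.2f} s (periodic: 25.00 s)")
```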
Figure 6. Periodic indication : Cycle length = 20[s]. Figure 7. Random indication : Cycle length = 20[s].
When the cycle length is 20 [s], the jam appears only right in front of the signal with both indications, though the periodic indication's jam is a little larger. When the cycle length is 100 [s], jams grow larger than at 20 [s] and propagate backward; there is no large difference between the two indications. These two cases belong to the finite jams' area. When the cycle
Figure 8. Periodic indication : Cycle length = 50[s]. Figure 9. Random indication : Cycle length = 50[s].
Figure 10. Periodic indication : Cycle length = 100[s]. Figure 11. Random indication : Cycle length = 100[s].
Figure 12. Periodic indication : Cycle length = 200[s]. Figure 13. Random indication : Cycle length = 200[s].
length is 200 [s], jams spread far backward similarly under both indications; this case belongs to the saturated jams' area. But when the cycle length is 50 [s], there is a radical difference between the indications: the periodic indication's jam is obviously larger than the random split indication's. In the next section, we analyze the relation between the growth of jams and the signals' indications.
Figure 14. Histogram of transit times in front of the periodic indicating signal.
Figure 15. Histogram of transit times in front of the random indicating signal.
5. Analysis of jams

To assess both indications' capacities for passing vehicles, Figures 14 and 15 plot histograms of vehicles' passing times at the signal when the cycle length is 50 [s]. Both indications have tall peaks at about 4 [s]. For the periodic indication there is a small peak at about 25 [s], corresponding to the red indicating time; for the random split indication this peak's dispersion is larger because of the randomness. Furthermore, Figure 16 compares the number of vehicles in the queue between the two indications against the inflow rate. The average passing time is 5.8279 [s] for the periodic indication and 5.6823 [s] for the random split indication. The critical inflow rates calculated from these values using Eq. (4) are 0.1716 [vehicles/s] for the periodic indication and 0.1760 [vehicles/s] for the random split indication. That is to say, the number of vehicles increases exponentially as the inflow rate approaches 0.1716 [vehicles/s] under the periodic indication, while under the random split indication the number of vehicles does not yet increase exponentially at that rate. This means that the difference in vehicle accumulation between Figure 8 and Figure 9 derives from the critical inflow rates of the signal indications. In addition, we plot the theoretical M/G/1 values for comparison. From this result, random noise appears to raise the signal's critical inflow rate, strongly inhibiting the jam's growth. Kish and Bezrukov reported the effectiveness of noise in traffic lanes with a junction as spectral stochastic resonance [8]. The relation between the signal and stochastic resonance has not been clarified, but in this research they may be related. We therefore plot the output flow against the noise intensity in Figure 17. For noise intensities from 0.1 to 0.5, the outflows are larger than in other ranges, but there is no apparent local maximum, so we cannot conclusively attribute this phenomenon to stochastic resonance.
Still, some factors are conceivable: the random indication may break up the jam, which grows like a crystal under red indications, or the occasional long green indication may inhibit the growth of the jam by releasing vehicles in one burst, like the vibration of a sieve.
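The blow-up of the queue as the inflow rate approaches λc can be illustrated with a minimal single-server queue driven by Lindley's recursion (our own sketch; the paper uses the full traffic simulator, and the deterministic passing time assumed here is an idealization):

```python
import random

# Lindley's recursion for the waiting time in a single-server queue with
# Poisson arrivals and a fixed passing time T. The average queue length,
# obtained via Little's law (Eq. 1), stays small for lambda well below
# lambda_c = 1/T (Eq. 4) and explodes as lambda approaches lambda_c.

def mean_wait(lam, t_service, n=200_000, seed=7):
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(n):
        inter = rng.expovariate(lam)             # Poisson arrivals
        w = max(0.0, w + t_service - inter)      # Lindley recursion
        total += w
    return total / n

T = 5.8279                       # average passing time, periodic indication
lam_c = 1.0 / T                  # critical inflow rate (Eq. 4)
for frac in (0.5, 0.9, 0.99):
    wq = mean_wait(frac * lam_c, T)
    lq = frac * lam_c * wq       # Little's law (Eq. 1)
    print(f"lambda = {frac:.2f} * lambda_c -> Lq ~ {lq:.1f} vehicles")
```

The estimated queue length grows by orders of magnitude between λ = 0.5 λc and λ = 0.99 λc, mirroring the divergence of Eq. (3).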
6. Conclusion

We simulate traffic flow using the inflow rate as a criterion to evaluate the signal jam. We employ the neuro-based CML model to express the traffic flow; a vehicle's acceleration, velocity fluctuation, and following action depending on the safe gap are incorporated in this model.
Figure 16. Queues of each indication against inflow rate.
Figure 17. Output flow against noise intensity for the random split indication.
We introduce queuing theory to analyze the growth of jams caused by signals from a new perspective. The signal jam can be evaluated using a critical value of the inflow rate: in a queuing system there is a critical inflow rate that keeps the customers' queue finite, and the same critical point exists in the traffic jam, derived by the same mechanism. In this research, jams grow rapidly when the inflow rate of vehicles approaches this value. We compare the outflow of vehicles and the growth of jams under the periodic indicating signal and the random split indicating signal. Using the outflow as a criterion, the random split indication yields slightly more outflow than the periodic indication at shorter cycle lengths; using the critical inflow rate as a criterion, there is an apparent difference between them. The critical inflow rate should therefore be useful to evaluate signal jams produced by various signal indications. For the random split indication, the plot of outflow against noise intensity is analogous to stochastic resonance, but a definitive explanation remains an open problem. Extension to multi-signal systems and intersection systems is another challenge; on that basis, we will construct an adaptive traffic signal that inhibits the growth of city traffic jams.
References

[1] D. Helbing, M. Treiber: "Gas-Kinetic-Based Traffic Model Explaining Observed Hysteretic Phase Transition." Physical Review Letters, 81: 3042–3045, (1998).
[2] M. Kuwahara, T. Akamatsu: "Dynamic user optimal assignment with physical queues for a many-to-many OD pattern." Transportation Research Part B, Vol. 35B, No. 5, pp. 461–479, Jun. 2001.
[3] H. Morimura, Y. Oomae: "Application of the queues." Union of Japanese Scientists and Engineers Press, Ltd.: 1–82, (1980), in Japanese.
[4] S. Yukawa, M. Kikuchi: "Density Fluctuations in Traffic Flow." Journal of the Physical Society of Japan, 65: 916–919, (1996).
[5] K. Hamaoka, M. Wada, K. Ishimura: "Coupled Map Lattice Model based on Driving Strategy for City Traffic Simulation." Proc. WSTST'05: 1173–1182, (2005).
[6] M. Wada: "Cyclic Sequences Generated by Neuron Model." Electronics and Communications in Japan, 65-2, pp. 1–10, (1982).
[7] E. Yoneya, S. Watanabe, M. Mouri: "Traffic engineering." National Science Press, Ltd.: 117–121, (1965), in Japanese.
[8] Laszlo B. Kish, Sergey M. Bezrukov: "Flows of cars and neural spikes enhanced by colored noise." Physics Letters A, 266: 271–275, (2000).
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Coordinated Control of Mobile Antennas for Ad-Hoc Networks in Cluttered Environments

Gianluca Antonelli a, Filippo Arrichiello a,1, Stefano Chiaverini a and Roberto Setola b
a DAEIMI, Università degli Studi di Cassino, Italy
b Lab. Sistemi Complessi e Sicurezza, Università Campus Bio-Medico di Roma, Italy

Abstract. Mobile ad-hoc networks can be used to extend and adapt the area covered by a static network of antennas. This is the case of an autonomous agent moving in a cluttered environment that needs to be connected to a fixed, limited-range base station. The wireless network can be obtained by resorting to a platoon of mobile robots, each carrying one antenna. Self-configuration of the robots' platoon is achieved by a singularity-robust task-priority inverse kinematics algorithm via the definition of suitable task functions. The effectiveness of the proposed algorithm has been extensively verified via numerical simulations involving an autonomous mobile robot moving in a cluttered environment.

Keywords. Mobile Ad-hoc NETworks (MANETs), Coverage Area Adaptation, Multi-Robot Systems
1. Introduction

Communication among mobile agents is a crucial requirement in many applications to support cooperation of team members in fulfilling a given mission. Traditional solutions to the problem of mobile communication, e.g., cellular networks, require coverage of a wide area through a number of space-fixed antennas that serve partly overlapping local areas (the cells); a user then communicates with the others through a wireless connection to the local antenna, which conveys all the traffic within the cell. While this static placement of the antennas is cost/performance effective when there is always a large number of users to be connected within the cell (e.g., metropolitan areas or highway paths), in many applications it would be more convenient to place the antennas dynamically to optimize the coverage, as in survey-and-rescue missions performed inside a damaged building or in the exploration of unknown areas. In this scenario, wireless Mobile Ad-hoc NETworks (MANETs) represent a promising approach to overcome the drawbacks of fixed-configuration networks. A MANET is a collection of autonomous nodes that communicate with each other without using any fixed

1 Correspondence to: Filippo Arrichiello, Dipartimento di Automazione, Elettromagnetismo, Ingegneria dell'Informazione e Matematica Industriale, Università degli Studi di Cassino, via G. Di Biasio 43, 03043 Cassino (FR), Italy. Tel.: +39 0776 2994323; Fax: +39 0776 2993707; E-mail: [email protected]
networking infrastructure. Each node in a MANET operates as both a host and a router: any node can communicate directly with nodes within its transmission range and, to reach a node out of its range, data packets are relayed over a sequence of intermediate nodes using a store-and-forward multi-hop transmission principle. This approach is very attractive for mobile robot applications as well, and a number of researchers have started proposing ad-hoc wireless solutions for mobile robot communication (see, e.g., [8,9,12]). However, these solutions consider the position and motion of each node as uncontrollable variables. Assuming instead that at least some of the nodes may be controlled, better-performing MANETs can be designed. Indeed, a suitable dynamic reconfiguration of the robots' positions allows the coverage area of the communication network to be adapted to better support the team's mission, to avoid signal-fading areas (e.g., those induced by the presence of obstacles), and to handle possible faults of team members. The first studies on this topic used mobile robots to provide an adaptive sensor network in a dynamic environment. In [10], Delaunay tessellation and Voronoi diagrams are used to define an algorithm able to maximize the coverage area of the network. A more general formulation of the coverage problem is proposed in [7], where a distributed asynchronous algorithm is developed to dynamically reconfigure a mobile sensor network taking sensor performance into account; specifically, in this approach the control input of each vehicle is proportional to the distance between its actual position and the centroid of the associated Voronoi region. In reference [5], to guarantee communication for a mobile robot involved in a dynamic coverage problem, a static network of markers is autonomously dispersed by the robot during its motion (these markers are also used to improve localization capability).
In this paper, following the preliminary results in [1], we consider the problem of dynamically adapting the configuration of a platoon of mobile robots equipped with wireless devices (which we will call antennas) so as to realize an ad-hoc network supporting communication between an autonomous agent, i.e., an autonomously driven vehicle or a human, and a fixed base station. Antenna coordination for agent coverage is achieved in the framework of kinematic control of a platoon of mobile robots [3], by resorting to properly defined task functions. This approach has been shown to be efficient and reliable in the simultaneous handling of several control objectives, namely, agent-antennas-station connectivity and obstacle avoidance. The proposed approach has been simulated with a platoon of mobile antennas successfully performing a communication coverage mission for an autonomous robot navigating in a cluttered environment.
2. The MANET case

In this section the details of the proposed solution to the MANET problem are presented. In the considered case, a platoon of self-configuring mobile antennas aims to realize a communication bridge between a fixed base station (e.g., an Internet access point) and a mobile agent (a mobile robot or a human operator) performing its autonomous mission (see Figure 1). Thus, the antennas have to dynamically configure themselves so as to ensure the communication coverage of the mobile agent during the whole mission, avoiding the obstacles present in the environment and possibly performing secondary tasks.
Figure 1. Sketch of the coverage problem to be solved; the autonomous agent needs to be connected with the base station by the use of a platoon of mobile antennas.
Let us denote by the subscript i the generic antenna as well as the base station, i.e., i ∈ [0, n], where i = 0 denotes the base station and i ≠ 0 a generic antenna. Each antenna is supposed to cover in open space a circular area Ii centered at pi (the position of the i-th antenna) and of radius

rmax,i = rmax,i(i, pwi),    (1)

where pwi is the current level of power used for transmission. Notice that rmax,i depends on the characteristics of the i-th antenna and on the transmission power pwi; during the execution of a mission the transmission power may be varied, e.g., decreased when the battery level is low in order to preserve it. In order to guarantee adequate operative margins, we constrain each antenna to dialogue within a smaller region, identified via the radius

dmax,i = (1 − Δ) rmax,i,    0 < Δ < 1,    (2)

where Δ is a design parameter. Moreover, in order to avoid collisions, all other bodies in the environment (i.e., base station, other antennas, agent and obstacles) must be farther than dmin,i from each antenna. Notice that this definition is similar to that of the comfort zone in [11]; therefore, this term will be used in the following to denote the area shown in Figure 2. Let us define as a virtual chain a virtual connection that, starting from the base station, reaches all n antennas while keeping each couple of successive antennas in their relative comfort zones. That is, defining an n-dimensional vector k that collects the indexes of the
Figure 2. Comfort zone of an antenna and relevant distances.
antennas in their order in the virtual chain, we have pkj ∈ Ikj+1 and pkj+1 ∈ Ikj. Thus, if the agent belongs to the comfort zone of one of the antennas, the virtual chain represents the communication bridge between the agent and the base station.

2.1. Algorithm details

The coordination control technique for multi-vehicle systems presented in [1] and inherited from [2,6] is used to coordinate the motion of the platoon of mobile antennas so as to implement a mobile ad-hoc network. To solve the coverage problem with this technique, it is necessary to properly define the task functions and arrange them in a priority order. The task functions to be achieved by the antennas are obstacle avoidance and keep-in-comfort-zone. The obstacle-avoidance task function aims to keep each antenna at a safe distance do ≥ dmin from all obstacles (i.e., other antennas, the base station, or the agent). The keep-in-comfort-zone task function has to keep each couple of successive antennas in the virtual chain at a distance dc ≤ dmax. The task functions to be achieved by the antennas, in order of decreasing priority (1 being the highest, and assuming that k0 is the base station), are:

• for an antenna along the virtual chain kj with j ∈ {1, n−1}: 1. obstacle avoidance; 2. keep the next antenna kj+1 in the comfort zone; 3. keep the previous antenna kj−1 in the comfort zone.
• for the last antenna in the virtual chain kn: 1. obstacle avoidance; 2. keep the previous antenna kn−1 in the comfort zone.

Moreover, for the antenna closest to the agent (not necessarily kn), the task of keeping the agent in the comfort zone is considered with a priority lower than obstacle avoidance and higher than the others. With respect to the above list, this task is thus at priority 2, while the other tasks from priority 2 shift down by one.
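The priority ordering above might be assembled as in the following sketch (the function and task names are our own, not from the paper):

```python
# Ordered task list for antenna k_j in the virtual chain (Section 2.1):
# obstacle avoidance always first; keep-in-comfort-zone tasks toward the
# next and previous chain neighbors; the "cover the agent" task spliced
# in at priority 2 for the antenna closest to the agent, shifting the
# remaining tasks down by one priority level.

def task_list(j, n, closest_to_agent):
    """Prioritized tasks for antenna k_j, j = 1..n (k_0 is the base station)."""
    tasks = ["avoid_obstacles"]
    if j < n:                                  # last antenna has no "next"
        tasks.append(("keep_in_comfort_zone", "next", j + 1))
    tasks.append(("keep_in_comfort_zone", "previous", j - 1))
    if j == closest_to_agent:
        # agent coverage enters right after obstacle avoidance
        tasks.insert(1, ("keep_in_comfort_zone", "agent", None))
    return tasks
```

For instance, `task_list(2, 5, closest_to_agent=2)` puts the agent-coverage task at priority 2, before the next/previous neighbor tasks, while `task_list(5, 5, ...)` keeps only obstacle avoidance and the previous neighbor.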
Notice that the obstacle-avoidance task function, preserving the integrity of the system, always has the highest priority. A wider discussion of the obstacle-avoidance possibilities, as well as implementation details, will not be given here for the sake of space; in the simulation section
a rather complex environment will be handled with this approach. The limit of this obstacle-avoidance approach, common to all path-planning algorithms relying on local information only, is the possibility of getting trapped in local minima. Further details can be found in [3]. Once the task functions have been defined, the velocity contribution of each antenna due to each task is elaborated with a singularity-robust closed-loop inverse kinematics technique inherited from [2,6]. Then, the velocity contributions of the tasks are combined using a task-priority-based algorithm to compose the final reference velocity for each antenna. More details on the proposed technique can be found in [1].

2.2. Dynamic handling of the virtual chain

As mentioned before, in order to guarantee the proper connection between the agent and the base station, the antennas dynamically self-organize to realize a virtual chain. This configuration is obtained using the following algorithm:

1. from the base station k0, look for the closest antenna inside I0 and label it as the first in the virtual chain, k1; make k1 the current antenna;
2. from the current antenna in the virtual chain kj, look for the closest antenna inside Ikj among the remaining antennas and label it as the successive one in the virtual chain, kj+1; make kj+1 the current antenna;
3. repeat step 2, increasing the index of the current antenna, until there is no remaining antenna; if j + 1 = n go to step 4, else keep the virtual chain formed at tk−1, i.e., k(tk) := k(tk−1);
4. virtually connect the agent to its closest antenna (not necessarily kn).

3. Simulation Results

The proposed algorithm, aimed at guaranteeing the wireless communication bridge between the agent and the base station by means of several mobile antennas, has been extensively tested in numerical simulations. In all the simulation runs, the agent is a robot performing its mission independently of the antennas.
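The chain-formation algorithm of Section 2.2 can be sketched as follows (our own code; `form_chain`, the planar-coordinate representation, and a common coverage radius are simplifying assumptions — the 24 m value matches the simulation parameter rmax,i reported below):

```python
import math

# Greedy chain formation: starting from the base station, repeatedly pick
# the closest remaining antenna inside the current node's coverage disk I;
# if some antenna cannot be reached, the previous chain k(t_{k-1}) is kept;
# finally the agent attaches to its closest antenna.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def form_chain(base, antennas, r_max, previous_chain=None):
    """Return antenna indexes in chain order, or the previous chain if no
    complete chain through all antennas exists."""
    chain, current, remaining = [], base, set(range(len(antennas)))
    while remaining:
        reachable = [i for i in remaining
                     if dist(current, antennas[i]) <= r_max]
        if not reachable:
            return previous_chain                 # keep k(t_{k-1})
        nxt = min(reachable, key=lambda i: dist(current, antennas[i]))
        chain.append(nxt)
        remaining.remove(nxt)
        current = antennas[nxt]
    return chain

def closest_antenna(agent, antennas):
    """The agent connects to its closest antenna (not necessarily the last)."""
    return min(range(len(antennas)), key=lambda i: dist(agent, antennas[i]))
```

With three antennas on a line at 20, 40, and 60 m from the base and rmax = 24 m, the chain visits them in spatial order; shrinking rmax below the first spacing leaves the previous chain in place.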
In particular, the mobile robot has to reach predefined via-points while avoiding collisions with the obstacles present in the environment. Moreover, the robot's movements are coordinated with the MANET so that it can receive a stop command when escaping the comfort zone of its nearest antenna, to allow reconfiguration of the antennas. Notice that, as explained above, this behavior is obtained autonomously through the proposed kinematic control and is not explicitly commanded to the robot. The proposed scenarios (see Figures 3 and 4) simulate the navigation of an autonomous robot in a building while a platoon of antennas ensures the communication coverage. The robot has to reach different goals passing through predefined via-points and, obviously, avoiding the walls. The antennas have to ensure the communication between the robot and the base station while avoiding collisions with walls, robot, base station, and among themselves. The simulations start with the 10 antennas randomly distributed around the base station, following a uniform distribution centered at position [0 0]T m with an interval of 50 m on each coordinate and verifying that a virtual chain is formed, and with the robot in the initial position [10 10]T m.
Figure 3. Snapshots of a coverage mission for a platoon of mobile antennas in an indoor environment. The robot (blue dot) has to reach several goals (marked by ×), while the antennas (red dots) have to ensure the connection between the robot and the fixed station (the star).
For the reported simulations the following parameters have been used: rmax,i = 24 m, dmax,i = 16 m, dmin,i = 6 m, T = 300 ms, while the maximum velocities are 1 m/s for the robot and 1.5 m/s for the antennas. Notice that dmax,i has been kept constant ∀i. The robot moves in an environment with obstacles, and it is required to stay at a minimum distance of 10 m from them. Figure 3 reports six snapshots of a mission execution in which an autonomous robot reaches multiple via-points in different rooms; the robot reaches the final goal in about 690 s, while the antennas maintain the coverage with the base station during the whole mission and avoid the obstacles. Notice that the virtual serial chain among the antennas can dynamically change during the mission, giving more flexibility to the approach; moreover, differently from [4], this implies that there is no need to impose an explicit order on the antennas. Figure 4 reports six snapshots of a mission execution in which an autonomous robot reaches multiple via-points in a corridor with a corner; the robot reaches the final goal in
Figure 4. Snapshots of a coverage mission for a platoon of mobile antennas in a complex environment. The robot (blue dot) has to reach several goals (marked by ×), while the antennas (red dots) have to ensure the connection between the robot and the fixed station (the star).
about 300 s, while the antennas maintain the coverage with the base station during the whole mission and avoid the obstacles. During the simulation, the robot can sometimes reach the boundary of the comfort zone of the nearest antenna; in these cases the robot stops and waits for the reconfiguration of the network.
4. Conclusions

In this paper we have proposed an algorithm to generate coordinated motion of a platoon of mobile antennas for dynamic reconfiguration of the coverage area of a mobile ad-hoc network, so as to ensure a communication link between an autonomous agent, which moves independently from the antennas, and a fixed base station. The algorithm is based on the singularity-robust task-priority inverse kinematics approach, which can simultaneously handle multiple tasks. Remarkably, the configuration of the communication chain and the relative priority of the tasks can be dynamically changed during the mission execution for increased effectiveness of the coverage. In this preliminary simulation study, navigation in a complex scenario has been considered. Future work will be devoted to considering the coverage of more than one moving agent, the occurrence of faults in the antennas, and non-chained link topologies.
References
[1] G. Antonelli, F. Arrichiello, S. Chiaverini, and R. Setola. A self-configuring MANET for coverage area adaptation through kinematic control of a platoon of mobile robots. In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1307–1312, Edmonton, Canada, Aug. 2005.
[2] G. Antonelli and S. Chiaverini. Kinematic control of a platoon of autonomous vehicles. In Proceedings 2003 IEEE International Conference on Robotics and Automation, pages 1464–1469, Taipei, Taiwan, Sept. 2003.
[3] G. Antonelli and S. Chiaverini. Fault tolerant kinematic control of platoons of autonomous vehicles. In Proceedings 2004 IEEE International Conference on Robotics and Automation, pages 3313–3318, New Orleans, LA, April 2004.
[4] P. Basu and J. Redi. Movement control algorithms for realization of fault-tolerant ad hoc robot networks. IEEE Network, 18(4):36–44, 2004.
[5] M.A. Batalin and G.S. Sukhatme. Coverage, exploration and deployment by a mobile robot and communication network. Telecommunication Systems – Special Issue on Wireless Sensor Networks, 26(2):181–196, 2004.
[6] S. Chiaverini. Singularity-robust task-priority redundancy resolution for real-time kinematic control of robot manipulators. IEEE Transactions on Robotics and Automation, 13(3):398–410, 1997.
[7] J. Cortés, S. Martínez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. IEEE Transactions on Robotics and Automation, 20(2):243–255, 2004.
[8] P.E. Rybski, N.P. Papanikolopoulos, S.A. Stoeter, D.G. Krantz, K.B. Yesin, M. Gini, R. Voyles, D.F. Hougen, B. Nelson, and M.D. Erickson. Enlisting rangers and scouts for reconnaissance and surveillance. IEEE Robotics & Automation Magazine, 7(4):14–24, 2000.
[9] G.T. Sibley, M.H. Rahimi, and G.S. Sukhatme. Robomote: A tiny mobile robot platform for large-scale ad-hoc sensor networks. In Proceedings IEEE International Conference on Robotics and Automation, pages 1143–1148, Washington, DC, May 2002.
[10] J. Tan, O.M. Lozano, N. Xi, and W. Sheng. Multiple vehicle systems for sensor network area coverage. In Proceedings 5th World Congress on Intelligent Control and Automation, pages 4666–4670, Hangzhou, PRC, June 2004.
[11] J. Vazquez and C. Malcolm. Distributed multirobot exploration maintaining a mobile network. In Proceedings 2nd IEEE International Conference on Intelligent Systems, volume 3, pages 113–118, Varna, Bulgaria, June 2004.
[12] Z. Wang, M. Zhou, and N. Ansari. Ad-hoc robot wireless communication. In Proceedings IEEE Conference on Systems, Man and Cybernetics, pages 4045–4050, Washington, DC, Oct. 2003.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
An Adaptive Behavior of Mobile Ad Hoc Network Agents
Masao Kubo a,1, Chau Dan a, Hiroshi Sato a and Takashi Matubara a
a National Defence Academy, Japan
Abstract. In this paper, the authors are interested in building a backbone network with a robot collective that can supply communication services at a disaster site. First, we show the typical characteristics of static network formations. Then, applying the HOT&COLD theory, we propose 3 core heuristic rules for wireless access points. With the proposed rules, the emerged networks can achieve the same performance as the static grid formation.
Keywords. MANET, cascade failure, HOT&COLD
1. Introduction
A Mobile Ad hoc Network (MANET) is a technology which offers communication services for users anywhere and anytime. It is robust to changes of topology and user distribution. Therefore, MANET is useful where wired network facilities are insufficient, so it will become even more important in the future in case of fire, flood, or earthquake. Fig. 1 illustrates a MANET scenario in a devastated district. A group of wireless communication airships flies above the disaster area and provides backbone communication services for users (we call them hosts) on the ground. In this case, if the airships' capability is not sufficient, they change their positions according to the loads and the distribution of the communication requests from users. In this paper, we want to discuss the adaptive behaviors of the mobile agents needed to generate and maintain a high-performance network. Basically, in MANET research, it is supposed that the behavior of mobile access points (we call them agents) and network performance are independent, so agents do not need to act to improve network performance even when it is poor. A lot of sophisticated network protocols have been discussed in the ITS field (in the cellular network field, base stations are not mobile). However, wireless facilities are recently getting smaller and more powerful, so we believe 'mobile' base stations will become common and a key component of future swarm robotics. In this paper, a group of mobile wireless access points (agents), which generates a backbone network for hosts with lower-power wireless communication capability, is studied. Each agent has wireless communication capability and can open a communication channel to other agents within a particular distance. Hosts can ask the nearest agent to send a fixed-size message (request) to other hosts for them. Generally,
1 Masao KUBO, Dep. of Computer Science, National Defense Academy, Hashirimizu 1, Yokosuka, Kanagawa, JAPAN, 239-8686.
Tel.: +81 46 841 3810; Fax: +81 46 844 5911 ; E-mail:
[email protected].
M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents
Figure 1. Scenario: a group of wireless agents and users
wired and wireless networks are used without discrimination in a MANET. However, in this paper, for simplicity, we suppose that all communication uses only wireless facilities. This paper is organized as follows: in the next section, a scenario and a model for the MANET are defined. In section 3, we show by numerical simulation that an adequately sized grid of static agents (agents that do not move) can generate a good network. In section 4, the network emerging from 3 proposed rules using local information achieves almost the same performance as the medium-size grid pattern with non-mobile access points; but in some cases, the deviation of the distribution is worse than the static formation's, and it is too large to be accepted. Then, we discuss the HOT&COLD approach, which can improve the network performance in this case.
2. Scenario and Model
A group of mobile wireless access points (agents), which generates a backbone network for hosts with lower-power wireless communication capability, is given. The workspace W ⊂ R² is a rectangle. On W, a group of agents G = {1, . . . , N}, N = |G|, is allocated. The communication speed cs between two agents, or between an agent and a user (d is the distance between them), is

cs = (d + 1)^(−cd). (1)

If

cs > kcut (2)

is satisfied, the agent opens a communication channel without any other conditions [5] ([1] introduced a negotiation process to realize a small-world connection topology). cd and kcut are constants; in the next experiments cd ∈ {2.0, 2.6} and kcut = 0.002 are used. Communication requests come from the host union, not the agent union. A member of U wants to send a particular-size message to another host at a different location ∈ W. A request rq contains 2 pieces of location information: psend, the location of the sender ∈ U, and pdest, the location of the destination. At first, the nearest agent ∈ G decides whether to accept the request. If the communication speed between the sender and the nearest agent satisfies condition (2), the request is accepted and the nearest agent will try to forward rq to an adequate neighbor agent. On the other hand, if condition (2) is not satisfied, the
Figure 2. Left: successful requests for cd = 2.0. Right: the result by ns2 (FastSpeed)
request is aborted but the request location information is recorded. The location information is also recorded when a forwarded request does not reach the destination user correctly even though it was accepted. A lot of routing protocols have been proposed for MANET, for example AODV, DSR, etc. However, there is no universal protocol; network designers select the best one according to the application, the frequency of topology change, the mobility of agents, the amount of overhead, and so on [2]. In this paper, we are interested in MANET characteristics under a more ideal condition, and the Dijkstra algorithm is selected. Each request is forwarded to the neighbors along the shortest path from source to destination. The weight of an arc for the Dijkstra algorithm is

1/cs, (3)
where cs is the communication speed of a given pair of agents ∈ G. Note that the weight is not fixed. Each agent can handle ALR (Allowable Load Resistance) requests per step and can forward (ALR − 1/cs) requests at the next step after it forwards a request over a link with communication speed cs. If

ALR − 1/cs < 0, (4)
we suppose that this agent is saturated and cannot transfer any more requests. Therefore, the weight of all its arcs is set to 1. A communication link's weight is also set to 0 if the transfer via this link made this condition true. Also, all agents are homogeneous, so ALR is a constant.
2.1. Verification of the model
Fig. 2 Left shows the relation between SR (Successful Requests) and the load. The x-axis indicates the requests per step. We executed 10 runs using different initial formations. Fig. 2 Right shows the simulation result using the network simulator ns2 [10] with the same simulation settings. As can be seen, our model exhibits the same characteristics as the ns2 simulation result. The interesting point is that the distributions are very different, depending on the agent formations. At the beginning, the SR dramatically increases because the load is still small, so all requests are delivered correctly if accepted. Therefore, the SR increases
Figure 3. Left: static formations and links for cd = 2.0 (from left to right: random, circular, large grid formation, medium grid, small grid; kcut = 0.002, N = 16). Right: the same for cd = 2.6 (no links are achieved under this condition by Eq. (2))
linearly. However, once the load is larger than some threshold, the network starts to fail to deliver the accepted requests. Under this circumstance, the agents around the center of the network work harder than the other nearby agents, so some links of the central agents become saturated and requests have to travel via longer paths [4]. This makes the agents' load increase, so SR is degraded after the threshold. However, the thresholds of the 10 runs are quite different. Some of them keep forwarding a certain amount of requests even though they cannot be delivered.
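The link and routing model above (Eqs. (1)-(3)) can be sketched in a few lines. This is a hedged illustration, not the authors' code: the agent positions, the three-agent line, and the Euclidean-distance helper are assumptions; only cd = 2.0, kcut = 0.002, and the 1/cs arc weight come from the text.

```python
import heapq

# Parameters from the text: cd = 2.0, kcut = 0.002.
CD, KCUT = 2.0, 0.002

def comm_speed(p, q):
    """Eq. (1): cs = (d + 1)^(-cd), d the Euclidean distance."""
    d = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return (d + 1.0) ** (-CD)

def build_graph(agents):
    """Adjacency list {i: [(j, weight), ...]} with weight = 1/cs (Eq. 3)."""
    g = {i: [] for i in range(len(agents))}
    for i in range(len(agents)):
        for j in range(i + 1, len(agents)):
            cs = comm_speed(agents[i], agents[j])
            if cs > KCUT:                       # Eq. (2): open the channel
                g[i].append((j, 1.0 / cs))
                g[j].append((i, 1.0 / cs))
    return g

def shortest_cost(g, src, dst):
    """Dijkstra over the 1/cs weights, as in the routing model."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        c, u = heapq.heappop(pq)
        if u == dst:
            return c
        if c > dist.get(u, float("inf")):
            continue
        for v, w in g[u]:
            if c + w < dist.get(v, float("inf")):
                dist[v] = c + w
                heapq.heappush(pq, (c + w, v))
    return float("inf")

# Hypothetical three agents on a line, 3 m apart.
agents = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
g = build_graph(agents)
```

Note how the 1/cs weight makes two short hops (cost 16 + 16) cheaper than one long direct link (cost 49), which is exactly why central relay agents attract traffic.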
3. Static Agent Formations
What kind of network formation offers good performance? First, we evaluate the static formations shown in Fig. 3 Left and Right. The cd of Fig. 3 Left is 2.0 and that of Fig. 3 Right is 2.6, respectively. Circles illustrate agents and lines illustrate the wireless links that satisfy condition (2). Some formations appear in both figures, but their topologies differ because of cd. The geographical distribution of hosts is the same as in the last experiment, and the destination host is also picked from the workspace W using a uniform random function. Fig. 4 illustrates the histogram of SR over 10000 runs when the request rate is constant (= 100 requests per step). The x-axis means SR (successful requests) and the y-axis means the logarithmically transformed histogram. Fig. 4 Left and Fig. 4 Right represent the results for cd = 2.0 and 2.6, respectively. In Fig. 2, when the load is under 100 requests, the distribution of SR is almost in the "linear region". With the random formation, the distribution of SR for cd = 2.0 is heavily dispersed from 30% to 100%, because it strongly depends on randomness
Figure 4. Left: result of the proposed rule for cd = 2.0. Right: result of the proposed rule for cd = 2.6
whether the agents form a connected graph. On the other hand, the circle and the 3 grid formations give better results and the deviations are not too large, because the agents are guaranteed to form a connected graph. Also, among the 3 grid formations, the medium-size grid pattern is the best one. When cd = 2.6 (see Fig. 4 Right), agents in the random formation are not able to generate a good network, since the peak of the SR curve is at almost 10%. The circle formation also gives a bad distribution of SR. The reason is that if a slightly larger number of requests occurs around a particular agent, the links of that agent are saturated by condition (4). If all agents connect to their neighbors, the average distance to other agents is equal, while the average distance becomes completely different once agents without links to their neighbors occur (the network shape looks like a 'C'). On the other hand, even if some agents lose their links, the network can still offer a suboptimal path to the destination. Of course, not all grid formations give a good distribution of SR. For example, the agents of the smallest grid are too far from the senders, so few requests are accepted. The simulation results lead to the following conclusions: basically, there are 2 approaches to improve the SR performance. 1) Increase the accepted requests. 2) The global network should be connected and robust against load; especially, the agents around the center of the network should be given more links to their neighbors. The balance of these 2 factors is important, and the medium-size grid formation strikes it best. Therefore, in the following experiments, we use this formation as the criterion for evaluating the mobile wireless agents' behavior.
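The second conclusion, that the global network should stay connected, can be checked mechanically: build the link graph induced by Eq. (2) and test connectivity with a graph search. A minimal sketch, using the cd = 2.0 and kcut = 0.002 values from the text; the grid spacing and positions are hypothetical.

```python
from collections import deque

CD, KCUT = 2.0, 0.002

def linked(p, q):
    """Eq. (2): a wireless link exists if (d + 1)^(-cd) > kcut."""
    d = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return (d + 1.0) ** (-CD) > KCUT

def is_connected(points):
    """BFS over the link graph induced by Eq. (2)."""
    if not points:
        return True
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in range(len(points)):
            if v not in seen and linked(points[u], points[v]):
                seen.add(v)
                queue.append(v)
    return len(seen) == len(points)

# A hypothetical 4x4 grid with 10 m spacing: neighbours link, because
# (10 + 1)^(-2) ~ 0.0083 > 0.002, so the whole formation is connected.
grid = [(10.0 * i, 10.0 * j) for i in range(4) for j in range(4)]
```

With these constants the link range is roughly 21 m ((d + 1)² < 500), so two agents 30 m apart cannot link, which is the failure mode of an unlucky random formation.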
4. Mobile agents: behavior rules
In this section, we propose fundamental rules and show the typical performance distribution with mobility.
4.1. Proposed rules
All agents are supposed to move simultaneously, and no agent knows the distance and direction of other agents' next movements. Also, at first, all agents are distributed over
the workspace W by a uniform random distribution. We suppose that each agent can estimate the direction and distance of its linked agents from the intensity of the radio wave.
4.1.1. Isolated agents
An agent that has no link is called an isolated agent. As in Fig. 3, isolated agents will occur. Obviously, in this case, the agent should connect to other agents and come closer to them. Then, we propose the following rule. Let Ln be the number of links of an agent and Pfd = {pdest} the set of location information of requests accepted but not delivered yet; the rule is

if Ln ≤ thL then go to the average of Pfd, (5)
where thL is a constant. By this rule, an agent goes to the average of the failures. If the agent had no link, we expect the average of the failures to indicate almost the center of the workspace W, so isolated agents have a chance to make links with each other.
4.1.2. Behavior of agents with links (1): load sharing and generation of high-speed linkage
The last experiments tell us that one way to improve the average and deviation of SR is to decrease the central agents' load, because heavy load makes 'saturated links'. Therefore, we propose the second rule as follows. Let ΔALRi be the remaining ALR of agent i at the last step; the rule is

if ΔALRj / ΔALRi ≥ thD then go to agent j, (6)
where j is one of the agents linked to agent i and thD is a constant. By this rule, the connection speed between agents i and j is increased, and we expect the load of agent j to decrease.
4.1.3. Behavior of agents with links (2): increasing the accepted requests
The second approach is to increase the accepted requests. As noted in section 2.1, if the accepted requests are not much larger than the threshold and agents accept more requests, we expect more successful requests to be counted. Therefore, we propose the third rule as follows. Let Pfa = {psend} be the set of location information of rejected requests, Pfd the set of location information of undelivered requests, and pi the current position of agent i. The vector from the current position to the average location of all failed and rejected requests, vp, is

vp = (Σp∈Pfa∪Pfd p) / (|Pfa| + |Pfd|) − pi (7)
and let evp be vp's unit vector. Also, to increase the total accepted requests as a whole, the overlapping regions among agents should be reduced. Therefore, we introduce the
Figure 5. Left: result of the proposed rules for cd = 2.0. Right: result of the proposed rules for cd = 2.6
repulsive behavior from its nearest agent. Let j be the nearest agent of agent i. The vector from agent j to i is

vji = pi − pj. (8)

Also, let evji be the unit vector of vji. Using the above 2 unit vectors, agent i moves in the following direction:

(evp + evji)/2. (9)
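The three core rules can be sketched as follows. This is a hedged reconstruction: the threshold values, the helper names, and the 2-D tuple representation are assumptions; only the conditions themselves mirror Eqs. (5), (6) and (9) of the text.

```python
import math

# thL and thD are the constants named in the text; these values are
# illustrative assumptions (the text later uses thL = 1).
TH_L, TH_D = 1, 2.0

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0 else (0.0, 0.0)

def mean(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def want_load_share(dalr_i, dalr_j):
    """Rule (6): head for linked agent j when its spare ALR dominates."""
    return dalr_j / dalr_i >= TH_D

def next_direction(pos, n_links, p_fd, p_fa, nearest_pos):
    # Rule (5): a weakly linked agent heads for the average of the
    # undelivered-request locations P_fd.
    if n_links <= TH_L and p_fd:
        m = mean(p_fd)
        return unit((m[0] - pos[0], m[1] - pos[1]))
    # Rules (7)-(9): average the pull toward failed/rejected requests
    # (e_vp) with the repulsion from the nearest agent (e_vji).
    fails = p_fd + p_fa
    m = mean(fails)
    e_vp = unit((m[0] - pos[0], m[1] - pos[1]))
    e_vji = unit((pos[0] - nearest_pos[0], pos[1] - nearest_pos[1]))
    return ((e_vp[0] + e_vji[0]) / 2, (e_vp[1] + e_vji[1]) / 2)
```

For instance, an isolated agent at the origin with undelivered requests at (4, 0) and (6, 0) heads straight along the positive x-axis, toward their average (5, 0).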
4.1.4. Fragmentation caused by simultaneous movement
Here, as described at the beginning of this section, we assume that all agents move simultaneously. Therefore, links are sometimes broken after a movement because the distances between agents have changed (Eq. (2)). If such a breakage of linkage happens, it seriously degrades performance. Therefore, we think some agents should not move so frequently, and we combine the above core rules accordingly. Let csi be the highest speed among agent i's links; the combination is

if csi / kcut ≥ ks then execute Eqs. (6) and (9), else execute Eq. (5), (10)

where ks is a constant. This rule permits only agents that are sufficiently close to other agents to move, but not agents that have almost no links.
4.2. Results of the proposed rules (1): ks
Fig. 5 illustrates the result of the proposed rules. At first, all agents are randomly located; they then change their positions 40 times. We evaluate the finally produced network. As in the last experiments, we execute this process 10000 times repeatedly, and the histogram distributions of SR are shown in both graphs.

Figure 6. A snapshot of the produced network with cd = 2.6

Fig. 5 Left and Right show the histograms under cd = 2.0 and 2.6, respectively, with thL = 1. As we can see, the set of rules improves the performance dramatically compared with the initial state. When cd = 2.0, the curve of ks = 3.0 is almost the same as that of the medium-size grid formation, which is the best formation in section 3. The rule with ks = 1.5 improves the occurrence of very-low-performance networks and of allowable networks (around SR = 55% and 75%), but the occurrence around SR = 100% is worse than for the medium grid formation. We suppose that too many agents move simultaneously in spite of the unavoidable disparity of load, so they cannot maintain a connected graph. Under the cd = 2.6 situation (see Fig. 5 Right), the rule with ks = 1.5 generates high-performance networks more frequently than ks = 3.0, while the rule with ks = 1.5 also generates meaningless networks very often. Although these results for ks = 1.5 and 3.0 seem to be a kind of trade-off, we think they can also be explained by the above dynamics. The disparity of load is not so large because the number of accepted requests is smaller, so ks = 1.5 is more adequate for this environment than ks = 3.0, but simultaneous movement may leave the network fragmented. In summary, the combination of core rules improves the average and distribution of SR compared with the initial formation, provided ks is set adequately. However, useless networks still occur much more frequently than with the static grid formation.
4.3. Results of the proposed rules (2): thL
As seen above, the last result based on ks can achieve a high peak network performance equal to the static heuristic grid formation, but sometimes very poor networks emerge. We would like to reduce this large disturbance of the network performance, because it is not rare enough to be regarded as within the range of error. Highly Optimized Tolerance (HOT) [6] and Constrained Optimization with Limited Deviations (COLD) [7] are optimization concepts.
In the forest fire model [9], a spark burns the trees around it and many trees catch fire. Suppose a farmer plants trees wall-to-wall in his cultivated area. If no sparks happen, he gets a large harvest, while a single spark leaves him no harvest at all. HOT introduces a 'firewall', which is an intentional blank space. The firewall checks
Figure 7. Result of thL > 1
the spread of the fire, but the harvest goes down. The optimal size of the firewall and several firewall patterns based on the HOT concept are shown in [6]. The COLD optimization concept is the suppression of large-scale fire propagation. As is well known, in a common forest fire model, the size of a spreading fire follows a power law. The authors introduce a utility and deduce the optimal firewall size based on this concept, which prevents large fires at a certain rate. However, they do not mention a design methodology for the firewall or other applications so far. Therefore, we think it is a reasonable and important question for future network designers to ask how agent behavior should be designed to realize a meaningful network based on the COLD concept. The main reason for the large disturbance is fragmentation around the center, because a large ks may promote a network radiating from the center while links in the rotational direction are not considered. Fig. 6 shows an example of the emerged network in the last experiments. As can be seen, there are several agents which have only 1 or 2 links. We think a meaningless network occurs when a breakage of a long line-like formation happens around the center area of W. The HOT&COLD concept suggests that the degradation caused by large errors can be prevented by a firewall. In our model, excess links can be regarded as the firewall cells. Therefore, we set thL > 1 to increase the links in rotational directions. Fig. 7 shows the result. In this graph, the results of thL = 1, 2, 3 are depicted. As can be seen, thL = 3 strikes a balance between a high peak value and low divergence, the same as the static grid formation.
5. Conclusion
In this paper, a group of agents which generates a network backbone for users with lower-power wireless communication capability is studied. Each agent can open a communication channel to other agents within a particular distance. Users can ask agents to send messages to other users for them. In section 3, we showed by numerical simulation that an adequately sized grid formation of static agents generates a good network. In section 4, the network emerging from a combination of the proposed 3 core rules for wireless access points, based on local information, offers almost the same performance as the best grid pattern with non-mobile access points, but the deviation of the distribution is worse. Finally, we proposed a new combination using the HOT&COLD optimization theory, which prevents the emergence of poor networks; the thL > 1 model composed of mobile agents can produce almost the same network performance as the static grid formation.
References
[1] D. Kurabayashi, A. Yajima, T. Funato. Emergence of small-world network among moving individuals. 17th SICE Symposium on Decentralized Autonomous Systems, pp. 15–20, 2005 (in Japanese).
[2] A.S. Tanenbaum. Computer Networks, 4th Japanese edition, 2003.
[3] T. Camp, J. Boleng and V. Davies. A Survey of Mobility Models for Ad Hoc Network Research. Dept. of Math. and Computer Sciences, Colorado School of Mines, Golden, CO, 2002.
[4] A.E. Motter. Cascade control and defense in complex networks. arXiv:cond-mat/0401074v2, 2004.
[5] N. Zhou, H. Wu and A. Abouzeid. Reactive Routing Overhead in Networks with Unreliable Agents. Proceedings of the Ninth Annual International Conference on Mobile Computing and Networking (MobiCom '03), San Diego, CA, USA, September 2003.
[6] J.M. Carlson and J.C. Doyle. Highly optimized tolerance: a mechanism for power laws in designed systems. Phys. Rev. E, 60(2 Pt A):1412–27, 1999.
[7] M.E.J. Newman, M. Girvan and J.D. Farmer. Optimal design, robustness and risk aversion. Phys. Rev. Lett. 89, 028301, 2002.
[8] S. Ichikawa, F. Hara. Effects of Static and Dynamic Variety in the Character of Robots on Group Intelligence of Multi-robot System. DARS 2000, pp. 89–98, 2000.
[9] P. Bak. How Nature Works: The Science of Self-Organized Criticality. Copernicus Books, 1999.
[10] The Network Simulator - ns2, http://www.isi.edu/nsnam/ns/, 2005.
[11] D. Johnson and D. Maltz. Dynamic source routing in ad hoc wireless networks. In T. Imielinski and H. Korth, editors, Mobile Computing, pages 153–181, Kluwer Academic Publishers, 1996.
[12] C. de Waal and M. Gerharz. BonnMotion: a mobility scenario generation and analysis tool. Communication Systems group, Institute of Computer Science IV, University of Bonn, Germany, http://web.informatik.uni-bonn.de/IV/Mitarbeiter/dewaal/BonnMotion/, 2005.
Part 6 Evolution and Learning
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Learning the Cooperative Behaviors of Seesaw Balancing Agents - An Actor-Critic Approach Takashi KAWAKAMI a, Masahiro KINOSHITA a and Yukinori KAKAZU b a Hokkaido Institute of Technology Sapporo, JAPAN b Hokkaido Information University, JAPAN
Abstract This paper proposes a new approach to realizing a reinforcement learning scheme for an autonomous multi-agent system. In our approach, we treat cooperative agent systems in which there are multiple autonomous mobile robots, and a seesaw balancing task is given. This problem is an example of tasks that require finding appropriate locations for multiple mobile robots. Each robot agent on the seesaw keeps the system balanced. The Q-learning method is well known as a most useful algorithm; however, the feasible action values of the robot agents must then be categorized into discrete action values. Therefore, in this study, the actor-critic method is applied to treat continuous values of the agents' actions. Each robot agent has a set of normal distributions that determine the distance of the robot's movement for a corresponding state of the seesaw system. Based on the result of a movement in this system, the normal distribution is modified by the actor-critic learning method. The simulation results show the effectiveness of our approach.
Keywords Actor-critic, reinforcement learning, multiagent systems, cooperative behaviors, seesaw balancing problems
1. Introduction
Today, the industrial field around robotics research is advancing rapidly, and small personal robots have recently entered our offices and homes. Instead of human workers, highly integrated robots are needed to carry out dangerous tasks. Personal robots or pet robots should also be in our houses or welfare facilities. Those robots must have higher-level autonomy and cooperativeness, because they have to achieve tasks while interacting with other robots or humans in the real world, which is an uncertain, complex and dynamic environment for autonomous robots. However, such autonomy and cooperativeness are hard for a human designer to design. For this difficulty, reinforcement learning mechanisms have been proposed as an effective approach, and many good results have been obtained and reported. In the Q-learning method, a conventional algorithm of reinforcement learning, the feasible sensory input values from the environment and the action values must be divided into finite discrete values. Nevertheless, there are many applications in which continuous values of sensory input and action should be represented to complete given tasks appropriately. In this case, it is well known that continuous action values of autonomous robots can be handled by the
T. Kawakami et al. / Learning the Cooperative Behaviors of Seesaw Balancing Agents
actor-critic method. Applications of the actor-critic method to multiple autonomous agent systems have hardly been discussed before. In this paper, the actor-critic method is applied to treat the continuous action values of multiple agents that cooperate with each other to perform a given cooperative task. A seesaw balancing task is given as a difficult cooperative task for the multiple autonomous agent system. In this task, appropriate behaviors of multiple mobile robots must be found as a solution by which the seesaw keeps its balance continuously. In an environment where autonomous robots move independently, there is a possibility that the whole system becomes unstable through a certain agent's movement. It is practically important to set up a control system in consideration of the whole system's stability. Each robot agent on the seesaw keeps the system balanced by using its corresponding reinforcement learning system. Each robot agent has a set of normal distributions that determine the distance of the robot's movement for a corresponding state of the seesaw system. Based on the result of a movement in this system, the applied normal distribution is modified by the actor-critic learning method. The simulation results show the effectiveness of our approach.
2. Reinforcement Learning Systems Based on the Actor-Critic Method
Recently, reinforcement learning has been attracting attention as a profitable machine learning algorithm. Reinforcement learning is performed based on the environmental rewards given for executed actions. Therefore, a complete world model does not need to be constructed, and reinforcement learning systems are expected to solve difficult problems in which the external environment cannot be significantly modeled in the system, or in which the environment changes dynamically. To realize the reinforcement learning scheme, approaches such as the Temporal Difference algorithm [3], Q-learning [4] and classifier systems [5] have been proposed. Q-learning is the most successful algorithm among reinforcement learning methods; however, it is necessary to categorize continuous feasible sensory input values into finite discrete values to learn appropriate action patterns. There are many applications in which continuous values of sensory input and action should be represented to complete given tasks appropriately. In this case, it is well known that continuous action values of autonomous robots can be handled by the actor-critic method. The actor-critic method is a TD method that has a separate memory structure to explicitly represent the policy independently of the value function. The policy structure, called the actor, selects suitable actions. The estimated value function is called the critic; it criticizes the actions determined by the actor. Learning by the actor-critic method is performed on-policy. The critique takes the form of a TD error. The TD error is the scalar signal output of the critic and drives all learning in both actor and critic. Actor-critic methods are the natural extension of the idea of reinforcement comparison methods to TD learning methods. After each action selection, the critic evaluates the new state to determine whether things have become better or worse than expected. The evaluation is the TD error:

TD-error = rt + γV(st+1) − V(st) (1)
where V is the state value function estimated by the critic. This TD error is used to evaluate the selected action at taken in state st. If the TD error is positive, the tendency to select at should be strengthened for the future. On the other hand, if the TD error is negative, the tendency should be weakened. The advantages of actor-critic methods are as follows:
- They require minimal computation in order to select actions. Consider a case where there is an infinite number of possible actions, for example a continuous-valued action. Any method learning just action values must search through this infinite set in order to pick an action. If the policy is explicitly stored, this extensive computation may not be needed for each action selection.
- They can learn an explicitly stochastic policy; that is, they can learn the optimal probabilities of selecting various actions. This ability turns out to be useful in competitive and non-Markov cases.
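The TD error of Eq. (1) and the actor/critic updates it drives can be illustrated with a tiny tabular sketch. Everything numeric here (γ, the learning rates, the 2-state toy chain, the reward) is an illustrative assumption, not the paper's seesaw setup.

```python
# Minimal tabular actor-critic sketch of Eq. (1): the critic keeps V(s),
# the actor keeps per-state action preferences, and both are nudged by
# the TD error delta = r + gamma * V(s') - V(s).
GAMMA, ALPHA_V, ALPHA_P = 0.9, 0.1, 0.1   # illustrative constants

V = {0: 0.0, 1: 0.0}                      # critic: state values
pref = {0: [0.0, 0.0], 1: [0.0, 0.0]}     # actor: preference per action

def td_error(s, r, s_next):
    """Eq. (1): TD-error = r + gamma * V(s') - V(s)."""
    return r + GAMMA * V[s_next] - V[s]

def update(s, a, r, s_next):
    delta = td_error(s, r, s_next)
    V[s] += ALPHA_V * delta               # critic moves toward the target
    pref[s][a] += ALPHA_P * delta         # actor: strengthen/weaken action a
    return delta

# Reward 1 in state 0 while V is still zero -> positive TD error,
# so action 1 in state 0 is strengthened, as the text describes.
delta = update(0, 1, 1.0, 1)
```

A positive delta strengthens the taken action's preference and a negative delta weakens it, which is exactly the "better or worse than expected" behavior described above.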
3. The Seesaw Balancing Problem

In this paper, we treat the seesaw balancing problem as a cooperative multi-agent system in which cooperative behaviors of multiple autonomous robots are acquired by applying the actor-critic method. In our problem setting, there are autonomous mobile robots mr_i (i = 1, 2, …, n) on a flat seesaw that has no motor to keep it balanced. Each mobile robot independently moves along a straight line. Thus, the objective of this problem is to keep the seesaw balanced through appropriate actions of the mobile robots. An action of each robot is represented as a continuous value: the distance of a one-dimensional movement. The position of robot mr_i at time t is defined as x_{i,t}, and the moving distance is denoted Δx_{i,t}. Consequently, the next position at time t + 1 is given by the following equation:

x_{i,t+1} = x_{i,t} + Δx_{i,t}
(2)
Since each mobile robot moves along its own straight rail, there are no collisions between robots. Let the length of the seesaw be L; then the possible position of a robot lies in the range −L/2 ≤ x_{i,t} ≤ L/2.
Figure 1. Experimental seesaw system model
Although the conventional dynamics of the seesaw system is generally represented by the seesaw angle θ_t, the angular velocity ω_t, the friction of the bearing, and so on, we
exploit only the moment value of the seesaw at time t to simplify the problem setting. The moment value M_t is calculated as M_t = Σ_i m_i x_{i,t}, where m_i is the weight of robot mr_i. The system therefore has to control the seesaw so that the moment value is kept at 0. Each robot is assumed to have the same weight. Figure 1 illustrates the experimental seesaw system model.
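The simplified model above can be summarized in a few lines of code. This is a minimal sketch under the stated assumptions (three robots of equal weight, positions clipped to the seesaw ends); all names are chosen for illustration.

```python
# Minimal sketch of the simplified seesaw model of this section.
L = 10.0                    # seesaw length, so positions lie in [-L/2, L/2]
weights = [1.0, 1.0, 1.0]   # m_i: every robot is assumed to have the same weight

def moment(positions):
    """M_t = sum_i m_i * x_{i,t}; the control goal is to keep this at 0."""
    return sum(m * x for m, x in zip(weights, positions))

def move(positions, deltas):
    """x_{i,t+1} = x_{i,t} + dx_{i,t} (Eq. (2)), clipped to the seesaw ends."""
    return [max(-L / 2, min(L / 2, x + dx)) for x, dx in zip(positions, deltas)]
```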
4. Applying the Actor-Critic Method

Based on the problem setting of the seesaw balancing task described above, autonomous agents learn cooperative behaviors by the actor-critic method.

4.1 Determination of Behaviors by the Actor Module

In our approach, the behavior of each agent at a given time is determined by the actor module of the actor-critic system. For a given agent, the other agents' behaviors are uncertain factors, so the agent learns its own behavior by reinforcement learning mechanisms. The i-th agent mr_i independently maintains its own stochastic policy π_i. This stochastic policy π_i determines a movement length Δx_i according to the current moment value M_t. In detail, the stochastic policy π_i contains normal distributions N(μ_i, σ_i) formed by an average value μ_i and a standard deviation σ_i, so that the movement length Δx_i is a normal random value drawn from N(μ_i, σ_i). Each robot recognizes its current state through the moment value of the seesaw system. Appropriate movement distances of the mobile robots depend on the current moment value of the seesaw, so in this study the continuous moment value is translated into one of twelve discrete state values. The policy π_i of agent mr_i is therefore represented in detail as
π_i = {(μ_{i,1}, σ_{i,1}), (μ_{i,2}, σ_{i,2}), …, (μ_{i,12}, σ_{i,12})}
(3)
where μ_{i,j} and σ_{i,j} are respectively the average value and the standard deviation of the normal distribution N(μ_{i,j}, σ_{i,j}) used by the i-th agent when the seesaw is recognized to be in the j-th state.

4.2 Learning Stochastic Policies by the Critic Module

To learn appropriate behaviors of the mobile agents, the critic module calculates TD-error values and revises the stochastic policies through trial-and-error processes. In this approach, the TD-error is estimated from the current moment value M_t at time t and the changed moment value M_{t+1} resulting from the agents' movements at time t + 1. The TD-error is calculated by the following equation:

TD-error = |M_t| − |M_{t+1}|
(4)
No reward value is given by the environment in our approach. Using this TD-error value, the modification rules for the policies are as follows.

rule 1) If TD-error > 0, the average value μ_{i,j} of the used normal distribution is revised according to the following equation:
μ_{i,j} ← α Δx_{i,j} + (1 − α) μ_{i,j}
(5)
where α (0 ≤ α ≤ 1) is the learning rate. The standard deviation σ_{i,j} of the used normal distribution is also reduced according to the following equation:
σ_{i,j} ← β σ_{i,j}
(6)
where β (0 ≤ β ≤ 1) is another learning coefficient. In this rule, the condition TD-error > 0 means that the state of the seesaw system was improved by the agent's move.

rule 2) If TD-error < 0, the standard deviation σ_{i,j} is enlarged, because the state of the seesaw system has become worse. The modification equation in this case is
σ_{i,j} ← σ_{i,j} / β
(7)
Normal distributions as the stochastic policy are learned and refined by applying these rules iteratively.
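Taken together, equations (3)–(7) suggest the following sketch of the policy and its update. The dictionary representation and the helper names are assumptions; the coefficient values follow Table 1 in the next section.

```python
import random

ALPHA, BETA = 0.1, 0.98   # learning coefficients (Table 1 values)
N_STATES = 12             # discrete seesaw states recognized by each agent

# pi_i = {(mu_{i,j}, sigma_{i,j})} for j = 1..12, initialized as in Table 1
policy = [{"mu": 0.0, "sigma": 1.0} for _ in range(N_STATES)]

def select_action(j):
    """Draw a movement length dx from N(mu_{i,j}, sigma_{i,j}) (Eq. (3))."""
    p = policy[j]
    return random.gauss(p["mu"], p["sigma"])

def update(j, dx, td_err):
    """Apply rule 1 (TD-error > 0) or rule 2 (TD-error < 0)."""
    p = policy[j]
    if td_err > 0:
        p["mu"] = ALPHA * dx + (1 - ALPHA) * p["mu"]   # Eq. (5)
        p["sigma"] = BETA * p["sigma"]                  # Eq. (6): sharpen
    elif td_err < 0:
        p["sigma"] = p["sigma"] / BETA                  # Eq. (7): widen

def td_error(m_t, m_next):
    """Eq. (4): improvement in |moment| acts as the reinforcement signal."""
    return abs(m_t) - abs(m_next)
```

Note how the standard deviation plays the role of an exploration temperature: successful moves shrink it around the learned mean, while failures widen it again.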
5. Experiments

5.1 Computer Simulations

Computer simulations were carried out based on the approach described above to examine the usefulness of our method. The experimental parameters are set as in the following table.

Table 1. Experimental parameters
  number of mobile robots: n = 3, n = 4, n = 10
  learning coefficients: α = 0.1, β = 0.98
  initial average and standard deviation of the normal distributions: μ_{i,j} = 0.0, σ_{i,j} = 1.0
  length of the seesaw: L = 10.0
  weight of each mobile robot: m_i = 1.0
Figure 2 illustrates the change in the moment value of the seesaw system with three mobile agents (n = 3). The figure shows that the mobile agents learn to keep the seesaw balanced through the learning process.
Figure 3. Positions of three mobile agents on the seesaw through learning (n=3)
Figure 3 shows the positions of the three mobile agents on the seesaw. The result illustrates that two mobile agents take positions near the two ends of the seesaw, while the third moves around the center. This means that a more robust configuration is implicitly discovered by the actor-critic learning method. Moment values of the seesaw system with ten robots during the actor-critic learning process are presented in Figure 4. The figure shows that even when the number of mobile agents increases, the actor-critic based seesaw system can keep its balance through reinforcement learning. As a further experiment, we carried out a computer simulation on the robustness of the system against accidental robot failure. In this simulation, there are four agents on the seesaw, and the experimental parameters are set the same as in the cooperative behavior acquisition experiments above. First, the four agents learn their behaviors over 4000 episodes with the actor-critic approach, and then one randomly selected agent is assumed to break down. The remaining three agents learn again for the new situation over 5000 episodes. Figure 5 illustrates the moment values of the seesaw system during this learning process. The figure shows that the agents cannot preserve the balance of the seesaw immediately after the failure at the 4000th episode. However, the remaining three mobile agents learn different cooperative behaviors and again come to keep the seesaw balanced. The positions of the mobile agents on the seesaw in this simulation are shown in figure 6. Before the failure, two agents are near one end of the seesaw and the other two are on the opposite side; after the failure, one agent moves near the center of the seesaw to keep the balance. These experimental results demonstrate the high learning performance of our approach.
Figure 6. Position of mobile agents on a seesaw
5.2 Real Robot Experiment

We also carried out an experiment with real robots. The experimental equipment, constructed with LEGO Mindstorms™, is shown in figure 7. In this real experiment, the actions of the real robots are similarly represented by normal distributions
Figure 7. Experimental real seesaw robot system
modified for the system based on the actor module of the computer simulations. Each robot learns its corresponding behaviors by the actor-critic method. The state of the seesaw is recognized through the angle θ measured by a rotation sensor. Figure 8 shows the learning process of the real seesaw robot system. After 1000 learning steps, the experimental seesaw system has learned appropriate behaviors to keep the seesaw balanced. This experiment confirms that our actor-critic approach can solve the seesaw balancing task.
(a) after 100 steps
(b) after 1000 steps
Figure 8. Learning process of real seesaw robot system
6. Conclusions

In this paper, we solved the seesaw balancing problem, in which continuous behavior values are treated, by applying the actor-critic approach. Experimental results show the effectiveness of our approach: each robot learns appropriate behaviors and modifies its stochastic action-selection policy to preserve the balance of the seesaw system. We also carried out an experiment on the robustness of this learning system against failures; the remaining agents can acquire new cooperative behaviors through a re-learning process. Real seesaw balancing experiments with autonomous robots were also performed using our proposed actor-critic approach. These experiments illustrate the learning ability of reinforcement learning based on the actor-critic method: cooperative behaviors of multiple robots are adaptively learned.
References

[1] R.S. Sutton and A.G. Barto: Reinforcement Learning, The MIT Press, 1998, pp. 1-40.
[2] Watkins, C.J.C.H. & Dayan, P.: Technical Note: Q-Learning, Machine Learning 8, pp. 279-292, 1992.
[3] Kimura, H. & Kobayashi, S.: An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, Proc. of 15th Int. Conf. on Machine Learning, pp. 278-286, 1998.
[4] Bradtke, S.J. & Duff, M.O.: Reinforcement Learning Method for Continuous-Time Markov Decision Problems, Advances in Neural Information Processing Systems 7, pp. 393-400, 1994.
[5] Kimura, H., Yamashita, T. and Kobayashi, S.: Reinforcement Learning of Walking Behavior for a Four-Legged Robot, 40th IEEE Conference on Decision and Control, pp. 411-416, 2001.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Evolutionary Reinforcement Learning for Simulated Locomotion of a Robot with a Two-link Arm Yohannes Kassahun and Gerald Sommer Christian Albrechts University, Institute of Computer Science and Applied Mathematics, Department of Cognitive Systems, Olshausenstr. 40, D-24098, Kiel, Germany Abstract. In this paper we present a neural controller design for robots using an evolutionary reinforcement learning system that is suitable for learning through interaction. The method starts with networks of minimal structures determined by the domain expert and increases their complexity along the evolution path. It uses a nature inspired meta-level evolutionary process where new structures are explored at larger time-scale and existing structures are exploited at smaller time-scale. The method introduces an efficient and compact genetic encoding of neural networks onto a linear genome that enables one to evaluate the network without decoding it. We demonstrate the method by designing a neural controller for a robot with a two-link arm that enables it to move forward as fast as possible. We first give an introduction to the evolutionary method and then describe the experiment and results obtained. Keywords. Neural networks, Reinforcement learning, Evolutionary algorithms
1. Introduction

A meaningful combination of the principles of neural networks, reinforcement learning, and evolutionary computation is useful for designing agents that learn and adapt to their environment through interaction [4]. The combination results in the evolutionary reinforcement learning system shown in figure 1. The evolutionary algorithm contains genotypes of neural networks to be evaluated in a given environment. Each neural network is evaluated and assigned a fitness value (reward). Through the genetic operators of the evolutionary algorithm, the agents are improved. The process continues for a certain number of generations or until an agent is found that solves the given task. The method presented in this work is an evolutionary reinforcement learning system closely related to the works of Angeline et al. [1] and Stanley [7]. It is related to the works of Angeline et al. in that it uses parametric mutation based on evolution strategies or evolutionary programming with adaptive step sizes to optimize the weights of the neural networks. Complexification of structures along the evolution path, starting from a minimal structure, makes it related to the works of Stanley. But it has two important features which make it different from previous work on the evolution of neural networks. The first is the introduction of a genetic encoding
Y. Kassahun and G. Sommer / Evolutionary Reinforcement Learning
Figure 1. Evolutionary reinforcement learning system. The agents are evaluated in the environment and their fitness values are returned to the evolutionary algorithm as rewards.
of a neural network that enables one to evaluate the neural network without decoding it. The topology of the network is implicitly encoded in the order of genes in the linear genome. The second is the introduction of a nature-inspired meta-level evolutionary process where exploration of structures is executed at a larger timescale and exploitation of existing structures is done at a smaller timescale.
2. Evolutionary Acquisition of Neural Topologies

Evolutionary Acquisition of Neural Topologies (EANT) [5] is an evolutionary reinforcement learning system that is suitable for learning and adaptation to the environment through interaction. It introduces a novel genetic encoding that uses a linear genome of genes (nodes) that can take different forms. A gene can be a neuron, an input to the neural network, or a jumper connecting two neurons. The jumper genes are introduced by structural mutation along the evolution path; they encode either forward or recurrent connections. In addition to the synaptic weight, a neuron node stores a unique global identification number and the number of input connections to it. A jumper node also stores a global identification number, which indicates the neuron to which it is connected. Figure 2 shows an example of encoding a neural network using a linear genome. As can be seen in the figure, a linear genome can be interpreted as a tree-based program if one considers all the inputs to the network and all jumper connections as terminals. The linear genome has some interesting properties that make it useful for the evolution of neural networks. It encodes the topology of a neural network implicitly in the ordering of its elements.

Figure 2. An example of encoding a neural network using a linear genome. (a) The neural network to be encoded. It has one forward and one recurrent jumper connection. (b) The neural network interpreted as a tree structure, where the jumper connections are considered as terminals. (c) The linear genome encoding the neural network shown in (a). In the linear genome, N stands for a neuron, I for an input to the neural network, JF for a forward jumper connection, and JR for a recurrent jumper connection. The numbers beside N represent the global identification numbers of the neurons, and x or y represent the inputs coded by the input gene (node).

This enables one to evaluate the network represented by the linear genome without decoding it; that is, it is possible to evaluate the neural network without some type of ontological process of transforming the genotype into the phenotype. The evaluation of a linear genome is closely related to executing a linear program in postfix notation. In the genetic encoding, the operands (inputs and jumper connections) come before the operator (a neuron) if one goes from right to left along the linear genome. The process of evaluating a linear genome without decoding it is performed as follows. One starts from the rightmost node of the linear genome and moves to the left, computing the output of each node. If the current node is an input node, its current value and the weight associated with it are pushed onto the stack. If the current node is a neuron node, n values with their associated weights are popped from the stack, and the result of the computation, combined with the weight associated with the neuron node, is pushed onto the stack. If the current node is a recurrent jumper node, the last value of the neuron node whose global identification number matches that of the jumper node is obtained; this value, combined with the weight associated with the jumper node, is pushed onto the stack. If the current node is a forward jumper node, the sub-linear genome (sub-network) starting from the neuron whose global identification number matches that of the forward jumper node is copied; the response of this sub-linear genome is computed in the same way as that of the full linear genome, and the result, combined with the weight associated with the forward jumper node, is pushed onto the stack. After traversing the genome from right to left completely, the resulting values are popped from the stack. The number of resulting values equals the number of outputs of the neural network encoded by the linear genome.
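The stack-based evaluation procedure described above can be sketched as follows. Jumper connections are omitted for brevity, the tuple representation of the nodes is an assumption, and all neurons use the linear activation z = a of figure 3.

```python
# Simplified right-to-left stack evaluation of a linear genome (no jumpers).
# Node encoding (an assumption for illustration):
#   ('N', weight, n_inputs)  - a neuron node with n_inputs operands
#   ('I', weight, name)      - an input terminal reading inputs[name]

def evaluate(genome, inputs):
    """Evaluate a linear genome without decoding it into a network."""
    stack = []
    for node in reversed(genome):      # traverse the genome right to left
        kind, weight = node[0], node[1]
        if kind == 'I':                # input terminal: push value * weight
            stack.append(weight * inputs[node[2]])
        elif kind == 'N':              # neuron: pop its n operands
            n = node[2]
            a = sum(stack.pop() for _ in range(n))
            stack.append(weight * a)   # linear activation z = a, then weight
    return stack                       # one entry per network output
```

For example, the hypothetical genome `[('N', 1.0, 2), ('I', 0.5, 'x'), ('I', 0.8, 'y')]` with x = y = 1 pushes 0.8 and 0.5, then pops both into the neuron, yielding a single output 1.3.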
An example of evaluating the linear genome is shown in figure 3. The presented linear genome is complete in that it can represent any type of neural network. It is also a compact encoding of neural networks since the length of the linear genome is the same as the number of synaptic weights in the neural network. It is
Figure 3. An example of evaluating a linear genome without decoding the neural network encoded by it. The linear genome encodes the neural network shown in figure 2. The current values of the inputs to the neural network, x and y, are both set to 1. All neurons have a linear activation function of the form z = a, where a is the weighted linear combination of the inputs to a neuron. The overlapped numbers above the linear genome show the status of the stack after computing the output of a node. The numbers in brackets are the weights associated with the nodes.
closed under structural mutation and under a specially designed crossover operator. An encoding scheme is said to be closed if all genotypes produced are mapped onto a valid set of phenotype networks [3]. The crossover operator exploits the fact that structures originating from some initial structure have some parts in common. By aligning the common parts of two randomly selected structures, it is possible to generate a third structure which contains the common and disjoint parts of the two parent structures. This type of crossover was introduced by Stanley [7]. An example of the crossover operator under which the linear genome is closed is shown in figure 4. If one assigns integer values to the nodes of a linear genome such that each value is the difference between the number of outputs and the number of inputs of the node, one obtains the following rules, which are useful in the evolution of neural controllers. The first is that the sum of the integer values equals the number of outputs of the neural controller encoded by the linear genome. The second is that a sub-network (sub-linear genome) is a collection of nodes starting from a neuron node and ending at a node where the sum of the integer values assigned to the nodes between and including the start neuron node and the end node is one. This property of the linear genome makes it useful in the design of evolutionary methods for hierarchical and modular networks. Figure 5 illustrates the concept. The main search operators in EANT are structural mutation, parametric mutation, and the crossover operator. The structural mutation adds or removes a forward or a recurrent jumper connection between neurons, or adds a new sub-network to the linear genome. It does not remove sub-networks, since removing sub-networks causes a tremendous loss in the performance of the neural network. The structural mutation operates only on neuron nodes.
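The two integer-value rules can be sketched as follows, assuming a hypothetical node representation in which each genome entry is `('N', weight, n_inputs)` for a neuron or `('I', weight, name)` for an input terminal. A terminal then has value 1 (one output, no inputs), and a neuron with n inputs has value 1 − n.

```python
# Integer-value rules for a linear genome (node encoding is an assumption).

def node_values(genome):
    """Value of each node: (number of outputs) - (number of inputs)."""
    return [1 - node[2] if node[0] == 'N' else 1 for node in genome]

def num_outputs(genome):
    """Rule 1: the sum of the integer values equals the number of outputs."""
    return sum(node_values(genome))

def subnetwork(genome, start):
    """Rule 2: the sub-network starting at a neuron node ends where the
    running sum of the integer values first reaches one."""
    total = 0
    for end, v in enumerate(node_values(genome)[start:], start):
        total += v
        if total == 1:
            return genome[start:end + 1]
    raise ValueError("malformed genome")
```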
The weights of a newly acquired topology are initialized to zero so as not to disturb the performance of the network. The parametric mutation is accomplished by perturbing the weights of the networks according to uncorrelated mutation as in evolution strategies or evolutionary programming. Figure 6 shows an example of structural mutation, where a neuron node lost a connection to an input and received a self-recurrent connection. Since a linear genome is equivalent to a tree-based program, the initial structures are generated using either the grow or the full method [2]. The initial complexity of the neural structures is determined by the domain expert and is specified by the maximum depth that can be assumed by the initial structures. The depth of a neuron node in a linear genome is the minimal number of neuron nodes that must be traversed to get from the output neuron to that node, where the output neuron and the node lie within the same sub-network starting from the output neuron. The evolution of neural networks starts
Figure 4. Performing crossover between two linear genomes. The weights of the nodes of the resulting linear genome are inherited randomly from both parents.
with exploitation of structures that are already in the system. By exploitation we mean optimization of the weights of the structures. This is accomplished by an evolutionary process that occurs at the smaller timescale and uses parametric mutation and recombination as search operators. Exploration of structures is done through structural mutation and the crossover operator. The structural selection operator, which occurs at the larger timescale, selects the first half of the structures (species) to form the next generation. In order to protect the structural innovations or discoveries of the evolution, young structures that are fewer than a few generations old with respect to the larger timescale are carried along the evolution regardless of the
Figure 5. An example of the use of assigning integer values to the nodes of the linear genome. The linear genome encodes the neural network shown in figure 2. The numbers in the square brackets below the linear genome show the integer values assigned to the nodes of the linear genome. Note that the sum of the integer values is one showing that the neural network encoded by the linear genome has only one output. The shaded nodes form a sub-network.
Figure 6. An example of structural mutation. Note that the structural mutation deleted the input connection to N1 and added a self-recurrent connection to it.
results of the selection operator. New structures that are introduced through structural mutation and which are better according to the fitness evaluations survive and continue to exist. Since sub-networks that are introduced are not removed, there is a gradual increase in the number of structures and their complexity along the evolution path. This allows EANT to search for a solution starting from a neural network with minimum structural complexity specified by the domain expert. The search stops when a neural network with the necessary optimal structure that solves a given task is obtained.
3. Learning to Move Forward

The crawling robotic insect introduced by Kimura and Kobayashi [6] is used for this experiment. The robot has one arm with two joints, each controlled by a servo motor. It also has a touch sensor which detects whether the tip of the arm is touching the ground. The schematic diagram of the robot is shown in figure 7. The robot has bounded continuous and discrete state variables: the continuous state variables are the joint angles, and the discrete state variable is the state of the touch sensor. The controller observes the joint angles and the state of the touch sensor. Depending on the state it perceives, the controller is expected to change the joint angles appropriately so that the robot moves forward as fast as possible. The first joint angle θ1 is bounded between 55˚ and 94˚, and the second joint angle θ2 lies in the range [−34˚, 135˚]. For both joints, the angles are measured from the vertical as shown in figure 7. The
Figure 7. The crawling robotic insect. The robot has one arm with two joints and a touch sensor for detecting whether the tip of the arm is touching the ground or not.
angle ranges are chosen so that they are equivalent to the angle ranges chosen by Kimura and Kobayashi, who measured the first joint angle from the horizontal and the second joint angle from the first link. The touch sensor φ takes the value 0 for the non-touch state and 1 for the touch state. Let the coordinates of the first and second joints be (x0, y0) and (x1, y1), respectively, and let the coordinate of the tip of the arm be (x2, y2). The state of the robot at each time step t = 0, 1, … is given by st = (x0, y0, x2, y2, θ1, θ2, φ). Since the coordinate (x1, y1) can be calculated from a given state s, it is not listed in the definition of the state of the robot. The state transition of the system is governed by equations (1) and (2). If the tip of the arm is not touching the ground (φ(t) = 0), the state transition equation is given by

θ1(t + 1) = θ1(t) + δ1
θ2(t + 1) = θ2(t) + δ2
x0(t + 1) = x0(t), y0(t + 1) = y0(t)
x2(t + 1) = x0(t + 1) + l1 sin θ1(t + 1) + l2 sin θ2(t + 1)
y2(t + 1) = y0(t + 1) + l1 cos θ1(t + 1) − l2 cos θ2(t + 1)
(1)
and if the tip of the arm is touching the ground (φ(t) = 1), the state transition equation takes the form

θ1(t + 1) = θ1(t) + δ1
θ2(t + 1) = θ2(t) + δ2
x2(t + 1) = x2(t), y2(t + 1) = y2(t)
x0(t + 1) = x2(t + 1) − l2 sin θ2(t + 1) − l1 sin θ1(t + 1)
y0(t + 1) = l2 cos θ2(t + 1) − l1 cos θ1(t + 1)
(2)
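The two transition equations can be sketched together as a single step function. The tuple layout of the state and the function names are assumptions; the link lengths use the values l1 = 34 cm and l2 = 20 cm given in the text, and angles are in radians measured from the vertical.

```python
import math

L1, L2 = 34.0, 20.0   # link lengths in cm (values given in the text)

def step(state, d1, d2):
    """Advance one time step given controller outputs (d1, d2)."""
    x0, y0, x2, y2, th1, th2, touch = state
    th1, th2 = th1 + d1, th2 + d2
    if touch == 0:    # tip in the air: body fixed, tip moves (Eq. (1))
        x2 = x0 + L1 * math.sin(th1) + L2 * math.sin(th2)
        y2 = y0 + L1 * math.cos(th1) - L2 * math.cos(th2)
    else:             # tip on the ground: tip fixed, body moves (Eq. (2))
        x0 = x2 - L2 * math.sin(th2) - L1 * math.sin(th1)
        y0 = L2 * math.cos(th2) - L1 * math.cos(th1)
    return (x0, y0, x2, y2, th1, th2, touch)
```

Alternating between the two branches as the touch sensor toggles is what drags the body forward along the x-axis.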
The quantities δ1 and δ2 are the outputs of the neural controller to be designed, and l1 and l2 are the lengths of the first and second links. The first link is between the first joint and the second joint, while the second link is between the second joint and the tip of the arm. For the experiment, l1 = 34 cm and l2 = 20 cm are chosen. The first joint is located at the right upper corner of the rectangular body of the robotic insect, which has a height of 18 cm and a width of 32 cm. A trial contains 50 time steps, and at the beginning of a trial the robot is placed at the origin. The fitness function used to evaluate a neural controller is given by f = (1/N) Σ_{t=1}^{N} (x0(t) − x0(t − 1)), where the difference x0(t) − x0(t − 1) is
the velocity of the system at time t in the direction of the x-axis, and f is the average velocity of the robot over a trial. The quantity N is the number of time steps per trial. Tsuchiya et al. applied their policy learning by genetic algorithm using importance sampling (GA-IS) to learning to move forward. They defined a three-dimensional vector X = (x1, x2, x3) to represent the state space [8]. The dimensions of the state space are made up of the joint angles and the state of the touch sensor. The policy used in their experiment is a 7-dimensional feature vector F = [x1, x2, x3, x4 (= 1 − x1), x5 (= 1 − x2), x6 (= 1 − x3), 1.0]. A weight vector Λ = (λ1,i, λ2,i, λ3,i, λ4,i, λ5,i, λ6,i, λ7,i) is used to select the action ai(t) from a normal distribution with mean value μi = 1/(1 + exp(−Σ_{k=1}^{6} λk,i xk)) and standard deviation σi = 1/(1 + exp(−λ7,i)) + 0.1. If the selected action is out of range, it is resampled. The number of policy parameters is 14, and hence the search space for the genetic algorithm has 14 dimensions. In our experiment, the structure shown in figure 8 (a), containing two output neurons connected to three input nodes (θ1, θ2, φ), is used as the initial controller. The best controller, shown in figure 8 (b), was found after running EANT. Note that the best controller is more complex than the initial structure. Figure 8 (c) shows the waveforms of the joint angles and the touch sensor for the first 20 time steps as the robot moves forward under the control of the best controller.
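The GA-IS action selection described above can be sketched as follows. The sigmoid forms for μi and σi follow the text; the variable names and the resampling loop are assumptions.

```python
import math
import random

def select_action(x, lam, low, high):
    """Draw one GA-IS action: x holds features x_1..x_6 (the constant 1.0
    feature is handled by lam[6]); lam holds lambda_1..lambda_7 for one
    action dimension; [low, high] is the valid action range."""
    mu = 1.0 / (1.0 + math.exp(-sum(l * xi for l, xi in zip(lam[:6], x))))
    sigma = 1.0 / (1.0 + math.exp(-lam[6])) + 0.1
    while True:                       # out-of-range actions are resampled
        a = random.gauss(mu, sigma)
        if low <= a <= high:
            return a
```

With all weights at zero, the mean is 0.5 and the standard deviation is 0.6, so the sketch illustrates why the 14 λ-parameters (7 per action dimension, 2 dimensions) fully determine the policy.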
[Figure 8 (c) plot: θ1, θ2, and φ (angle in radian) versus forward steps taken by the robot, steps 0–20]
Figure 8. Learning to move forward. (a) The initial structure. (b) The best controller found by our algorithm that enables the robot to move forward. (c) The waveforms of the joint angles and the touch sensor as the robot moves forward.
The method introduced by Tsuchiya et al. (GA-IS) needed on average 10000 interactions with the environment to learn to move forward. We ran EANT 50 times and obtained on average 3520 interactions for learning the task. Compared to GA-IS, EANT has reduced the number of interactions with the environment necessary to learn to move forward. The reduction in the number of interactions is due to the direct search for an optimal policy in the space of policies, starting minimally and increasing the complexity of the neural network that represents the policy.
4. Conclusion

An evolutionary reinforcement learning system that is suitable for learning through interaction is presented. The system exploits both types of adaptation: namely evolutionary
adaptation and adaptation through learning. It starts with networks of minimal structures and complexifies them along the evolution path. The method introduces a compact genetic encoding that enables one to evaluate the neural network encoded by it without some type of ontological process of transforming the genotype into phenotype. Moreover, a meta-level evolutionary process is introduced that is suitable to explore new structures incrementally and exploit the existing ones.
Acknowledgments This work is sponsored by the German Academic Exchange Service (DAAD) under grant code R-413-A-01-46029 and the COSPAL project under the EC contract no. FP6-2003IST-2004176, which are duly acknowledged.
References

[1] P. J. Angeline, G. M. Saunders, and J. B. Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5:54–65, 1994.
[2] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming: An Introduction on the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann, San Francisco, CA, 1998.
[3] J. Jung and J. Reggia. A descriptive encoding language for evolving modular neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 519–530. Springer-Verlag, 2004.
[4] Y. Kassahun and G. Sommer. Improving learning and adaptation capability of agents. In Proceedings of 8th Conference on Intelligent Autonomous Systems (IAS-8), pages 472–481, Amsterdam, November 2004.
[5] Y. Kassahun and G. Sommer. Evolution of neural networks through incremental acquisition of neural structures. Technical Report 0508, Institute of Computer Science and Applied Mathematics, Christian-Albrechts University, Kiel, Germany, June 2005.
[6] H. Kimura and S. Kobayashi. Reinforcement learning for locomotion of a two-linked robot arm. In Proceedings of the 6th European Workshop on Learning Robots, pages 144–153, 1997.
[7] K. O. Stanley. Efficient Evolution of Neural Networks through Complexification. PhD thesis, Artificial Intelligence Laboratory, The University of Texas at Austin, Austin, USA, August 2004.
[8] C. Tsuchiya, H. Kimura, and S. Kobayashi. Policy learning by GA using importance sampling. In Proceedings of 8th Conference on Intelligent Autonomous Systems (IAS-8), pages 385–394, Amsterdam, November 2004.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot Viktor Zhumatiy a,1 , Faustino Gomez a , Marcus Hutter a and Jürgen Schmidhuber a,b a IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland b TU Munich, Boltzmannstr. 3, 85748 Garching, München, Germany Abstract. We address the problem of autonomously learning controllers for vision-capable mobile robots. We extend McCallum's (1995) Nearest-Sequence Memory algorithm to allow for general metrics over state-action trajectories. We demonstrate the feasibility of our approach by successfully running our algorithm on a real mobile robot. The algorithm is novel and unique in that it (a) explores the environment and learns directly on a mobile robot without using a hand-made computer model as an intermediate step, (b) does not require manual discretization of the sensor input space, (c) works in piecewise continuous perceptual spaces, and (d) copes with partial observability. Together this allows learning from much less experience than previous methods require. Keywords. reinforcement learning, mobile robots
1. Introduction The realization of fully autonomous robots will require algorithms that can learn from direct experience obtained from visual input. Vision systems provide a rich source of information, but the piecewise-continuous (PWC) structure of the perceptual space (e.g. video images) implied by typical mobile robot environments is not compatible with most current, on-line reinforcement learning approaches. These environments are characterized by regions of smooth continuity separated by discontinuities that represent the boundaries of physical objects or the sudden appearance or disappearance of objects in the visual field. There are two broad approaches that are used to adapt existing algorithms to real-world environments: (1) discretizing the state space with fixed [20] or adaptive [15,16] grids, and (2) using a function approximator such as a neural network [10,5], radial basis functions (RBFs) [1], CMAC [22], or instance-based memory [4,3,17,19]. Fixed discrete grids introduce artificial discontinuities, while adaptive ones scale exponentially with state space dimensionality. Neural networks implement relatively smooth global functions that are not capable of approximating discontinuities, and RBFs and CMACs, like fixed grid methods, require knowledge of the appropriate local scale. 1 Correspondence to: Viktor Zhumatiy, IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland. Tel.: +41 58 666 6660; Fax: +41 58 666 666; E-mail:
[email protected]
V. Zhumatiy et al. / Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot 273
Instance-based methods use a neighborhood of explicitly stored experiences to generalize to new experiences. These methods are more suitable for our purposes because they implement local models that in principle can approximate PWC functions, but typically fall short because, by using a fixed neighborhood radius, they assume a uniform sampling density on the state space. A fixed radius prevents the approximator from clearly identifying discontinuities because points on both sides of the discontinuity can be averaged together, thereby blurring its location. If instead we use a fixed number k of neighbors (in effect using a variable radius) the approximator has arbitrary resolution near important state space boundaries where it is most needed to accurately model the local dynamics. To use such an approach, an appropriate metric is needed to determine which stored instances provide the most relevant information for deciding what to do in a given situation [6]. Apart from the PWC structure of the perceptual space, a robot learning algorithm must also cope with the fact that instantaneous sensory readings alone rarely provide sufficient information for the robot to determine where it is (localization problem) and what action it is best to take. Some form of short-term memory is needed to integrate successive inputs and identify the underlying environment states that are otherwise only partially observable. In this paper, we present an algorithm called Piecewise Continuous Nearest Sequence Memory (PC-NSM) that extends McCallum’s instance-based algorithm for discrete, partially observable state spaces, Nearest Sequence Memory (NSM; [12]), to the more general PWC case. Like NSM, PC-NSM stores all the data it collects from the environment, but uses a continuous metric on the history that allows it to be used in real robot environments without prior discretization of the perceptual space. 
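The difference between a fixed-radius neighborhood and a fixed-k neighborhood near a discontinuity can be seen in a small numerical sketch (our own toy example, not from the paper): averaging within a fixed radius mixes samples from both sides of a step, blurring its location, while a k-nearest-neighbor average with small k stays local to the query point.

```python
# Toy comparison of fixed-radius vs. fixed-k estimation near a discontinuity.
def fixed_radius_estimate(samples, x, radius):
    """Average all sample values within `radius` of the query point x."""
    vals = [v for (s, v) in samples if abs(s - x) <= radius]
    return sum(vals) / len(vals)

def knn_estimate(samples, x, k):
    """Average the values of the k samples nearest to the query point x."""
    nearest = sorted(samples, key=lambda sv: abs(sv[0] - x))[:k]
    return sum(v for (_, v) in nearest) / k

# A step function: 0 for s < 0.5, 1 for s >= 0.5, sampled on a uniform grid.
samples = [(i / 10.0, 0.0 if i / 10.0 < 0.5 else 1.0) for i in range(11)]
query = 0.55  # just right of the discontinuity

blurred = fixed_radius_estimate(samples, query, radius=0.3)  # mixes both sides
local = knn_estimate(samples, query, k=2)                    # stays on one side
```

Here the fixed-radius estimate lands strictly between 0 and 1, smearing the step, while the k-NN estimate recovers the correct local value.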
An important priority in this work is minimizing the amount of a priori knowledge about the structure of the environment that is available to the learner. Typically, learning is conducted in simulation, and the resulting policy is then transferred to the real robot. Building an accurate model of a real environment is human-resource intensive and only really achievable when simple sensors are used (unlike full-scale vision), while overly simplified models make policy transfer difficult [14]. For this reason, we stipulate that the robot must learn directly from the real world. Furthermore, since gathering data in the real world is costly, the algorithm should be capable of efficient autonomous exploration in the robot's perceptual state space without knowing the amount of exploration required in different parts of the state space (knowledge that is normally assumed, even in the most advanced approaches to exploration in discrete [2,8] and metric [6] state spaces). The next section introduces PC-NSM, section 3 presents our experiments in robot navigation, and section 4 discusses our results and future directions for our research. 2. Piecewise-Continuous Nearest Sequence Memory (PC-NSM) In presenting our algorithm, we first briefly review the underlying learning mechanism, Q-learning, then describe Nearest Sequence Memory, which extends Q-learning to discrete POMDPs and forms the basis of our PC-NSM. 2.1. Q-learning The basic idea of Q-learning, originally formulated for finite discrete state spaces, is to incrementally estimate the value of state-action pairs, Q-values, based on the reward received from the environment and the agent's previous Q-value estimates. The update rule for Q-values is
$$Q_{t+1}(s_t, a_t) = (1 - \alpha)\, Q_t(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) \bigr]$$
where $Q_t(s_t, a_t)$ is the Q-value estimate at time $t$ of the state $s_t \in S$ and action $a_t \in A$, $\alpha$ is a learning rate, and $\gamma \in [0,1]$ a discount parameter. Q-learning requires that the number of states $s_t$ be finite and completely observable. Unfortunately, due to sensory limitations, robots do not have direct access to complete state information, but instead receive only observations $o_t \in O$, where $O$ is the set of possible observations. Typically, $O$ is much smaller than the set of states $S$, causing perceptual aliasing: the robot is unable to behave optimally because states requiring different actions look the same. In order to use Q-learning and similar methods under these more general conditions, some mechanism is required to estimate the underlying environmental state from the stream of incoming observations. The idea of using the history of all observations to recover the underlying states forms the core of the NSM algorithm, described next.

2.2. Nearest Sequence Memory NSM tries to overcome perceptual aliasing by maintaining a chronologically ordered list or history of interactions between the agent and environment. The basic idea is to disambiguate the aliased states by searching through the history to find those previous experience sequences that most closely match the recent situation. At each time step $t$ the agent stores an experience triple $(a_t, o_t, R_t)$ of its current action, observation, and reward by appending it to the history $h_{t-1} = (a_1, o_1, R_1), \ldots, (a_{t-1}, o_{t-1}, R_{t-1})$ of previous experiences, called an observation state1. In order to choose an action at time $T$, the agent finds, for each possible action $a$, the $k$ observation states in the history that are most similar to the current situation. McCallum [12] defines similarity by the length of the common history

$$n(h_t, h_{t'}) = \begin{cases} 0 & \text{if } t = 0 \vee t' = 0 \vee (a_t, o_t, R_t) \neq (a_{t'}, o_{t'}, R_{t'}) \\ 1 + n(h_{t-1}, h_{t'-1}) & \text{if } (a_t, o_t, R_t) = (a_{t'}, o_{t'}, R_{t'}) \end{cases} \quad (1)$$
which counts the number of contiguous experience triples in the two observation states that match exactly, starting at $t$ and $t'$ and going back in time. We rewrite the original $n$ into a functionally equivalent but more general form using the distance measure2 $\mu(h_t, h_{t'}) = [1 + n(h_t, h_{t'})]^{-1}$ to accommodate the metric we introduce in the next section. The $k$ observation states $\mu$-nearest to $h_T$ for each possible action $a$ at time $T$ form a neighborhood $N_a^{h_T}$ that is used to compute the Q-value for the corresponding action by:

$$Q(h_T, a) = \frac{1}{|N_a^{h_T}|} \sum_{h_t \in N_a^{h_T}} q(h_t), \quad (2)$$
where $q(h_t)$ is a local estimate of $Q(h_t, a_t)$ at the state-action pair that occurred at time $t$. After an action has been selected according to the Q-values (e.g. the action $a_T$ with the highest value), the q-values are updated: 1 We substitute the symbol $h$ for $s$ in McCallum's original notation to avoid confusion with the accepted definition of "state", as observation sequences do not correspond to process states. 2 Note that this $\mu$ is not a metric.
$$q(h_i) := (1 - \beta)\, q(h_i) + \beta \bigl( R_i + \gamma \max_a Q(h_T, a) \bigr), \quad \forall h_i \in N_{a_T}^{h_T}. \quad (3)$$

Algorithm 1 Piecewise-Continuous NSM
 1: T = 0
 2: h_T ← ∅                             // initialize history
 3: loop
 4:   T ← T + 1
 5:   h_T ← h_{T−1} + (a_T, o_T, R_T)   // get and store action, observation, and reward
 6:   for each a ∈ A do
 7:     H_a = {h_t | a_t = a, ∀t < T}   // split the history into subsets H_a containing all
                                        //   observation states where action a was taken
 8:     N_a^{h_T} = argmin^k_{h_t ∈ H_a} μ(h_t, h_T)
                                        // find the k nearest neighbors to h_T in H_a,
                                        //   using metric μ from equation 4
 9:     Q(h_T, a) = (1 / |N_a^{h_T}|) Σ_{h ∈ N_a^{h_T}} q(h)
                                        // = 0 in case N_a^{h_T} = ∅
10:   end for
11:   a_T = argmax_{a ∈ A} Q(h_T, a)    // compute action with highest value
12:   ã_T = argmax_{a ∈ A} (1 / |N_a^{h_T}|) Σ_{h ∈ N_a^{h_T}} μ(h, h_T)
                                        // compute best exploration action
13:   if randR(0, 1) < ε then
14:     perform ã_T                     // select the exploratory action
15:   else
16:     perform a_T                     // select the greedy action
17:   end if
18:   for i = 1 to n do
19:     t = randZ(1, T − 1)
20:     q(h_t) ← (1 − β) q(h_t) + β (R_t + γ max_a Q(h_{t+1}, a))   // update the q-values
21:   end for
22:   q(h_T) = Q(h_T, a_T)              // initialize q-values with a nearest-neighbor estimate
23: end loop
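The neighborhood-based value estimate of equation 2 and the greedy action choice of Algorithm 1 (lines 6–11) can be sketched compactly. The variable names and the stand-in 1-D metric below are ours, used only for illustration:

```python
# Sketch of equation 2: Q(h_T, a) as the mean of the local q-estimates of the
# k stored observation states (with action a) nearest to h_T under metric mu.
def q_value(history, q, mu, h_now, action, k):
    """history: list of (action, obs_state); q: parallel list of local estimates."""
    idx = [i for i, (a, _) in enumerate(history) if a == action]
    nearest = sorted(idx, key=lambda i: mu(history[i][1], h_now))[:k]
    if not nearest:                       # no experience with this action yet
        return 0.0
    return sum(q[i] for i in nearest) / len(nearest)

# Toy 1-D "observation states" with absolute distance standing in for mu.
mu = lambda h1, h2: abs(h1 - h2)
history = [('left', 0.0), ('right', 1.0), ('left', 0.2), ('right', 0.9)]
q = [0.0, 1.0, 0.5, 2.0]

# Greedy action at the current observation state 0.95 (line 11 of Algorithm 1):
greedy = max(('left', 'right'), key=lambda a: q_value(history, q, mu, 0.95, a, k=2))
```

The exploratory choice (line 12) is the same computation with `q[i]` replaced by `mu(history[i][1], h_now)`, preferring the action whose neighborhood is farthest from the current situation.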
NSM has been demonstrated in simulation, but has never been run on real robots. Using history to resolve perceptual aliasing still requires considerable human programming effort to produce a reasonable discretization for real-world sensors. In the following we avoid the issue of discretization by selecting an appropriate metric in the continuous observation space.

2.3. Piecewise Continuous NSM The distance measure used in NSM (equation 1) was designed for discrete state spaces. In the continuous perceptual space where our robot must learn, this measure is inadequate since most likely all the triples $(a_t, o_t, R_t)$ will differ from each other and $\mu(h_t, h_{t'})$ will always equal 1. Therefore, to accommodate continuous states, we replace equation 1 with the following discounted metric:

$$\mu(h_t, h_{t'}) = \sum_{\tau=0}^{\min(t, t')} \lambda^{\tau}\, \| o_{t-\tau} - o_{t'-\tau} \|_2, \quad (4)$$
where $\lambda \in [0,1]$. This metric takes an exponentially discounted average of the Euclidean distance between observation sequences. Note that, unlike equation 1, this metric ignores actions and rewards. The distance between action sequences is not considered because there is no elegant way to combine discrete actions with continuous observations, and because our primary concern from a robotics perspective is to provide a metric that allows the robot to localize itself based on observations. Reward values are also excluded to enable the robot to continue using the metric to select actions even after the reinforcement signal is no longer available (i.e. after some initial training period). Algorithm 1 presents PC-NSM in pseudocode. The functions randZ(a,b) and randR(c,d) produce a uniformly distributed random number in $[a,b] \subset \mathbb{Z}$ and $[c,d] \subset \mathbb{R}$, respectively, and $\varepsilon \in [0,1]$ determines the greediness of the policy. The algorithm differs most importantly from NSM in using the discounted metric (line 8), and in the way exploratory actions in the $\varepsilon$-greedy policy are chosen (line 12). The exploratory action is the action whose neighborhood has the highest average distance from the current observation state, i.e. the action about which there is the least information. This policy induces what has been called balanced wandering [7]. 2.4. Endogenous Updates If the q-values are only updated during interaction with the real environment, learning can be very slow since updates will occur at the robot's control frequency (i.e. the rate at which the agent takes actions). One way to more fully exploit the information gathered from the environment is to perform updates on the stored history between normal updates. We refer to these updates as endogenous because they originate within the learning agent, unlike normal, exogenous updates, which are triggered by "real" events outside the agent.
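The discounted metric of equation 4 translates directly into code. The sketch below (function and variable names are ours) walks backwards through two observation histories, summing exponentially discounted Euclidean distances between aligned observations:

```python
# Equation 4: exponentially discounted sum of Euclidean distances between
# two observation sequences, aligned at their most recent observations.
def discounted_metric(obs_a, obs_b, lam):
    """obs_a, obs_b: observation sequences (lists of equal-length vectors),
    most recent observation last; lam: discount factor in [0, 1]."""
    steps = min(len(obs_a), len(obs_b))
    total = 0.0
    for tau in range(steps):
        oa, ob = obs_a[-1 - tau], obs_b[-1 - tau]      # tau steps into the past
        dist = sum((x - y) ** 2 for x, y in zip(oa, ob)) ** 0.5
        total += (lam ** tau) * dist
    return total

# Two histories that agree on the latest observation but differ one step back:
h1 = [[0.0, 0.0], [1.0, 1.0]]
h2 = [[3.0, 4.0], [1.0, 1.0]]
d = discounted_metric(h1, h2, lam=0.5)   # 0 + 0.5 * 5.0
```

With lam near 0 the metric reduces to the distance between current observations; larger lam lets older observations disambiguate perceptually aliased situations.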
During learning, the agent selects random times t < T and updates the q-value of h_t according to equation 3, where the maximum Q-value of the next state h_{t+1} is computed using equation 2 (see lines 18–21 in Algorithm 1). This approach is similar to the Dyna architecture [21] in that the history acts as a kind of model, but, unlike Dyna, the model does not generate new experiences; rather, it re-updates those already in the history in a manner similar to experience replay [9]. 3. Experiments in Robot Navigation We demonstrate PC-NSM on a mobile robot task where a CSEM Robotics SmarteaseTM robot must use video input to identify and navigate to a target object while avoiding obstacles and walls. Because the camera provides only a partial view of the environment, this task requires the robot to use its history of observations to remember both where it has been, and where it last saw the target if the target moves out of view. 3.1. Experimental Setup The experiments were conducted in the 3x4 meter walled arena shown in figure 3. The robot is equipped with two ultrasound distance sensors (one facing forward, one backward), and a vision system based on the Axis 2100 network camera that is mounted on top of the robot's 28cm diameter cylindrical chassis. Learning was conducted in a series of trials where the robot, obstacle(s), and target (blue teapot) were placed at random locations in the arena. At the beginning of each trial, the robot takes a sensor reading and sends, via wireless, the camera image to a vision computer, and the sonar readings to a learning computer. The vision computer extracts
the x-y coordinates of the target in the visual field by calculating the centroid of the c_p pixels of the target color (see figure 1), and passes them on to the learning computer, along with a predicate p indicating whether the target is visible. If p is false, x = y = 0. The learning computer merges x, y, and p with the forward and backward sonar readings, f and b, to form the inputs to PC-NSM: an observation vector o = (x, y, p, f, b), where x and y are normalized to [−1,1], and f and b are normalized to [0,1]. PC-NSM then selects one of 8 actions: turn left or right by either 22.5° or 45°, or move forward or backward by either 5cm or 15cm (approximately). This action set was chosen to allow the algorithm to adapt to the scale of the environment [18]. The selected action is sent to the robot, the robot executes the action, and the cycle repeats. When the robot reaches the goal, the goal is moved to a new location and a new trial begins. The entire interval from sensory reading to action execution is 2.5 seconds, primarily due to camera and network delays. To accommodate this relatively low control frequency, the maximum velocity of the robot is limited to 10 cm/s. During the dead time between actions, the learning computer conducts as many endogenous updates as time permits. 3.2. PC-NSM parameters PC-NSM uses an ε-greedy policy (Algorithm 1, line 13), with ε set to 0.3. This means that 30% of the time the robot selects an exploratory action. The appropriate number of nearest neighbors, k, used to select actions depends upon the noisiness of the environment: the lower the noise, the smaller the k that can be chosen. For the amount of noise in our sensors, we found that learning was fastest for k = 3. A common practice in toy reinforcement learning tasks such as discrete mazes is to use minimal reinforcement, so that the agent is rewarded only when it reaches the goal.
While such a formulation is useful for testing algorithms in simulation, for real robots this sparse, delayed reward forestalls learning, as the agent can wander for long periods of time without reward until finally happening upon the goal by accident. Often there is specific domain knowledge that can be incorporated into the reward function to provide intermediate reward that facilitates learning in robotic domains where exploration is costly [11]. The reward function we use is the sum of two components, one obstacle-related, $R_{\text{obstacle}}$, and the other target-related, $R_{\text{target}}$:

$$R = \underbrace{-20 / \max(0.01, \min(f, b))}_{R_{\text{obstacle}}} + \underbrace{p \cdot (500 - 50|x| - 250y + c_p)}_{R_{\text{target}}} \quad (5)$$
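The shaped reward of equation 5 can be transcribed directly; the function and argument names below are ours:

```python
# Shaped reward of equation 5: f, b are the normalized forward/backward sonar
# readings, (x, y) the target centroid in the visual field, p whether the
# target is visible, and c_p the number of target-colored pixels.
def reward(x, y, p, f, b, c_p):
    r_obstacle = -20.0 / max(0.01, min(f, b))            # penalize proximity
    r_target = (500.0 - 50.0 * abs(x) - 250.0 * y + c_p) if p else 0.0
    return r_obstacle + r_target

# Target centered and close (many target pixels), nothing near the sonars:
r_good = reward(x=0.0, y=-0.5, p=True, f=1.0, b=1.0, c_p=100)
# Target not visible and an obstacle almost touching the front sonar:
r_bad = reward(x=0.0, y=0.0, p=False, f=0.02, b=1.0, c_p=0)
```

The max(0.01, ·) clamp bounds the obstacle penalty, and the p gate zeroes the target term whenever the target leaves the visual field.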
$R_{\text{target}}$ is largest when the robot is near the goal and looking directly towards it, smaller when the target is visible in the middle of the field of view, even smaller when the target is visible but not in the center, and reaches its minimum when the target is not visible at all. $R_{\text{obstacle}}$ is negative when the robot is too close to some obstacle, except when the obstacle is the target itself, visible by the robot. It is important to note that the coefficients in equation 5 are specific to the robot and not the environment. They represent a one-time calibration of PC-NSM to the robot hardware being used. 3.3. Results After taking between 1500 and 3000 actions, the robot learns to avoid walls, reduce speed when approaching walls, look around for the goal, and go to the goal whenever it sees it. This is much faster than neural network based learners, e.g. [5], where 4000 episodes were required (resulting in more than 100,000 actions) to solve a simpler task
Figure 1. Learned Control Policy. Each row shows a different situation in the environment along with its corresponding learned policy. In the top row the robot is positioned directly in front of the target object. The crosses in the camera image mark detected pixels of the target color, and the circle indicates the assumed direction towards the target. The policy for this situation is shown in terms of the visual coordinates, i.e. only the x-y camera view coordinates of the high-dimensional policy are shown. Each point in the policy graph indicates, with an arrow, the direction the robot should move if the circle shown in the image is at that point in the visual field (left arrow means move left, right = move right, up = forward, down = backward, and no arrow = stand still). For instance, in this case, the robot should move forward because the circle lies in a part of the policy with an up arrow. In the bottom row the robot is almost touching the target. Here the policy is shown in terms of the subspace spanned by the two ultrasound distance sensors found at the fore and aft of the robot: the f-axis is the distance from the robot to the nearest obstacle in front, the b-axis behind. When the robot has its back to an obstacle and the way forward is clear (upper left corner of the policy graph), it tends to go forward. When the way forward is obstructed but there is nothing behind the robot (lower right corner), the robot tends to turn or move backward.
in which the target was always within the perceptual field of the robot. Neither do we need a virtual model environment and manual quantization of the state space as in [14]. To our knowledge, our results are the fastest in terms of learning speed and require the least quantization effort of all methods to date, though we were unable to compare results directly on the hardware used by these competing approaches. At the beginning of learning, corners pose a serious difficulty, causing the robot to get stuck and receive negative reinforcement for being too close to a wall. When the robot accidentally turns towards the target, it will quickly lose track of it. As learning progresses, the robot is able to recover (usually within one action) when an exploratory action causes it to turn away and lose sight of the target. The discounted metric allows the robot to use its history of real-valued observation states to remember that it had just seen the target in the recent past. Figure 1 shows the learned policy for this task. Since the robot state space is perception-based (not x-y coordinates on the floor, as is the case in RL textbook examples), changing the position of the obstacles or target
Figure 2. PC-NSM learning performance. (a) The plot shows the reward the robot receives at each time-step during learning. (b) The plot shows the reward at each time-step averaged over all previous time-steps within the same trial. The dashed lines indicate the beginning of a new trial where the target is moved to a new location.
does not impede robot performance. Figure 2 shows learning in terms of immediate and average reward for a typical sequence of trials lasting a total of approximately 70 minutes. The dashed vertical lines in the two graphs indicate the beginning of a new trial. As learning progresses, the robot is able to generalize from past experience and more quickly find the goal. After the first two trials, the robot starts to accumulate reward more rapidly in the third, after which the fourth trial is completed with very little deliberation. Figure 3 illustrates two such successful trials. 4. Discussion We have developed an instance-based algorithm for mobile robot learning and successfully implemented it on an actual vision-controlled robot. The use of a metric state space allows our algorithm to work under weaker requirements and be more data-efficient than previous work in continuous reinforcement learning [3,17,19]. Using a metric instead of a discrete grid considerably relaxes the programmer's task, since it obviates the need to guess the correct scale for all regions of the state space in advance. The algorithm explores the environment and learns directly on a mobile robot without using a hand-made computer model as an intermediate step, works in piecewise continuous perceptual spaces, and copes with partial observability. The metric used in this paper worked well in our experiments, but a more powerful approach would be to allow the algorithm to select the appropriate metric for a given environment and task automatically. To choose between metrics, a criterion should be defined that determines which of a set of a priori equiprobable metrics $\{\mu_1, \ldots, \mu_n\}$ best fits the given history of experimentation. A useful criterion could be, for example, a generalization of the criteria used in McCallum's U-Tree algorithm [13] to decide whether a state should be split.
The current algorithm uses discrete actions so that there is a convenient way to group observation states. If the action space were continuous, the algorithm would lack a natural way to generalize between actions. A metric on the action space $\mu_a$ could be used within the observation-based neighborhood delimited by the current metric $\mu$. The agent could then randomly sample possible actions at the query point $h_T$ and obtain Q-values for each sampled action by computing the $\mu_a$-nearest neighbors within the $\mu$-neighborhood. Future work will explore this avenue. Acknowledgments. This work is partially supported by CSEM Robotics Alpnach.
Figure 3. Robot arena with learned trajectories. The picture shows two typical learning scenarios with the robot, obstacle, and target (blue plastic teapot) in their initial locations. The robot must learn to find the target using the limited visual field of its video camera, and move to it while avoiding obstacles. (a) The robot starts the trial facing away from the target before turning to its right and navigating around the obstacle to the goal. Along the way it encounters two difficulties (shown by the rectangular highlights), caused first by going through the narrow passage and then by a gap in the right wall, where the front sonar returns a strong signal compared to the immediately surrounding wall because it is being specularly reflected. (b) The robot in an unoccluded arena after learning in trial (a). Here the robot drives almost directly to the target after turning to see it, but again is momentarily held up by the dark wall gap.
References [1] C. Anderson. Q-learning with hidden-unit restarting. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 81–88, San Mateo, CA, 1993. Morgan Kaufmann. [2] R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213–231, 2003. [3] K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245, 2000. [4] J. Forbes and D. Andre. Practical reinforcement learning in continuous domains. Technical Report UCB/CSD-00-1109, University of California, Berkeley, 2000. [5] M. Iida, M. Sugisaka, and K. Shibata. Application of direct-vision-based reinforcement learning to a real mobile robot with a CCD camera. In Proc. of AROB (Int'l Symp. on Artificial Life and Robotics) 8th, pages 86–89, 2003. [6] S. Kakade, M. Kearns, and J. Langford. Exploration in metric state spaces. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21–24, 2003, Washington, DC, USA. AAAI Press, 2003. [7] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. In Proc. 15th International Conf. on Machine Learning, pages 260–268. Morgan Kaufmann, San Francisco, CA, 1998.
[8] M. J. Kearns and S. P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209–232, 2002. [9] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning, and teaching. Machine Learning, 8(3):293–321, 1992. [10] L.-J. Lin and T. M. Mitchell. Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, Carnegie Mellon University, School of Computer Science, May 1992. [11] M. J. Mataric. Reward functions for accelerated learning. In Machine Learning: Proceedings of the 11th Annual Conference, pages 181–189, 1994. [12] R. A. McCallum. Instance-based state identification for reinforcement learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 377–384. The MIT Press, 1995. [13] R. A. McCallum. Learning to use selective attention and short-term memory in sequential tasks. In P. Maes, M. Mataric, J.-A. Meyer, J. Pollack, and S. W. Wilson, editors, From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pages 315–324. MIT Press, Bradford Books, 1996. [14] T. Minato and M. Asada. Environmental change adaptation for mobile robot navigation. Journal of the Robotics Society of Japan, 18(5):706–712, 2000. [15] A. W. Moore. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 711–718. Morgan Kaufmann Publishers, Inc., 1994. [16] S. Pareigis. Adaptive choice of grid and time in reinforcement learning. In NIPS '97: Proceedings of the 1997 conference on Advances in neural information processing systems 10, pages 1036–1042, Cambridge, MA, USA, 1998. MIT Press. [17] J. C. Santamaria, R. S. Sutton, and A. Ram.
Experiments with reinforcement learning in problems with continuous state and action spaces. Adapt. Behav., 6(2):163–217, 1997. [18] R. Schoknecht and M. Riedmiller. Learning to control at multiple time scales. In O. Kaynak, E. Alpaydin, E. Oja, and L. Xu, editors, Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, Joint International Conference ICANN/ICONIP 2003, Istanbul, Turkey, June 26-29, 2003, Proceedings, volume 2714 of Lecture Notes in Computer Science. Springer, 2003. [19] W. D. Smart and L. P. Kaelbling. Practical reinforcement learning in continuous spaces. In Proc. 17th International Conf. on Machine Learning, pages 903–910. Morgan Kaufmann, San Francisco, CA, 2000. [20] R. Sutton and A. Barto. Reinforcement learning: An introduction. Cambridge, MA, MIT Press, 1998. [21] R. S. Sutton. First results with DYNA, an integrated architecture for learning, planning and reacting. In Proceedings of the AAAI Spring Symposium on Planning in Uncertain, Unpredictable, or Changing Environments, 1990. [22] R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038–1044. Cambridge, MA: MIT Press, 1996.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Transition Entropy in Partially Observable Markov Decision Processes 1 Francisco S. Melo a,2 Isabel Ribeiro a a Institute for Systems and Robotics, Instituto Superior Técnico, Lisboa, Portugal Abstract. This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDPs). The algorithm is based on a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance in all examples tested. Keywords. Partially observable Markov processes, entropy, reward shaping, real-time applications
1. Introduction References to autonomous systems appear in a multitude of contexts, whenever a system is able to perform some task with minimum interference from a human operator. They often arise in the context of robotic navigation, where an autonomous robot is able to navigate in some environment. Robotic navigation has been the subject of intensive research, since the ability of a robotic agent to perform certain tasks greatly depends on its ability to move autonomously in its environment. Many different approaches have been proposed to cope with this problem, ranging from classical control methodologies [17] to biologically inspired approaches [9]. Many other approaches have been proposed in the literature, some of which may be found in [5,16]. In recent years, particular interest has been devoted to the use of topological maps in navigation [19]. A topological map represents an environment as a discrete set of states (the nodes of a graph) and the transition information between the states (the edges of the graph). This type of environment is easily described using Markov processes and, in fact, Markov processes have already been used to model robotic navigation tasks using different methodologies [5,15,16]. In this paper, we model the control of a mobile robot as a partially observable Markov decision process (POMDP). In particular, we use POMDPs to model a mobile 1 This work was partially supported by Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III. 2 The author acknowledges the PhD grant SFRH/BD/3074/2000 from the Portuguese Fundação para a Ciência e a Tecnologia.
robot which has to perform some navigation task (such as reaching a goal or following a path along a set of states) and propose a new heuristic approach suitable for real-time implementation on a mobile robot. The POMDP framework makes it possible to model not only the uncertainty inherent to the robot's movement but also that of its measurements. General reviews of POMDP solution techniques can be found in [1,2,4]. Exact solutions for POMDPs have been developed and can easily be found in the literature [3, 11]. However, the most efficient exact methods developed so far (incremental pruning [3] and witness [11]) have prohibitive computational requirements and, hence, are of little use for systems with more than a few dozen states. Results on POMDP complexity [7, 13] leave hope for little improvement and, hence, approximate and heuristic methods have been proposed to solve POMDPs. Some of these methods may be found in [1, 5, 6, 10, 14]. In this paper we propose a new heuristic algorithm to be used in real-time applications using POMDPs. The algorithm is based on a reward shaping strategy which includes entropy information in the reward structure of a fully observable MDP and, as illustrated by the presented results, exhibits near-optimal performance in all examples tested. Furthermore, the simplicity of the method makes it suitable for real-time implementations: its near-optimal performance requires minimal computational load and, as such, the method may easily be implemented on a mobile robot. This, in turn, permits the use of POMDPs as rich models for high-level (topological) navigation tasks such as goal reaching or path following. The paper is organized as follows. Section 2 provides a quick overview of POMDPs. Section 3 presents a detailed description of the approach followed to solve POMDPs and introduces the proposed algorithm. This section conveys the main contributions of the paper.
Section 4 presents some experimental results of tests conducted on several different partially observable environments. Section 5 concludes the paper by presenting the most important conclusions of the work and directions for future research.

2. Partially Observable Markov Processes

Consider a mobile robot moving in an environment. Suppose that such a robot is to perform some given task in a specific location of the environment. Suppose, furthermore, that the environment may be represented by a topological map (see Figure 1).

Figure 1. Example of a topological map from [19].

In
a topological map, the environment is discretized into a set S of states corresponding to possible "topological locations" for the robot. At each time instant t, the robot has
available a finite set A of possible actions (e.g., "move North", "go to state 2", etc.). Whenever the robot chooses action a at state s, it will move to state s' with probability

P[S_{t+1} = s' | S_t = s, A_t = a] = T(s, a, s'),

where S_t and A_t are the state and the action of the robot at time instant t. Function T is the transition probability function. Every time the robot chooses some action a at state s it will be rewarded with a deterministic reward R(s, a) (the reinforcement) which depends only on the current state and action. The reward mechanism is the formal entity used to establish the navigation purpose for the robot. At each time instant, the robot will choose its action A_t so as to maximize the functional

V(s) = E[ Σ_{k=0}^∞ γ^k R_{t+k} | S_t = s ],    (1)

where R_t is the reward received at time instant t and 0 < γ < 1 is a discount factor assigning greater importance to more immediate rewards than to those which come in a distant future. Function V is the value function, since it assigns a value to each state s ∈ S. The solution for the navigation problem is determined once the robot knows, for each state s ∈ S, the optimal action to take with respect to (1). The value for each state-action pair (s, a) is given by the Q-function,

Q(s, a) = E[ Σ_{k=0}^∞ γ^k R_{t+k} | S_t = s, A_t = a ].
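For intuition, once a policy is fixed the value function in (1) satisfies a linear fixed-point system V = R + γTV and can be solved directly. The sketch below is illustrative only (the arrays, names, and numbers are our own, not from the paper):

```python
import numpy as np

def policy_value(T_pi, R_pi, gamma):
    """Solve V = R_pi + gamma * T_pi @ V for a fixed policy.
    T_pi: (S, S) transition matrix under the policy; R_pi: (S,) rewards."""
    S = len(R_pi)
    return np.linalg.solve(np.eye(S) - gamma * T_pi, R_pi)

# Two states; the policy always moves to state 1, which pays reward 1
T_pi = np.array([[0.0, 1.0], [0.0, 1.0]])
R_pi = np.array([0.0, 1.0])
print(policy_value(T_pi, R_pi, gamma=0.5))   # [1.0, 2.0]
```

Here V(1) = 1 + 0.5·V(1) = 2 and V(0) = 0.5·V(1) = 1, matching the discounted-sum definition in (1).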
The navigation of a mobile robot in an environment is sustained by its onboard sensors, whose measurements close the robot's control loop; this loop is present regardless of the particular methodology applied to control the robot. Any control methodology must be able to cope with the uncertainty inherent to sensory data and be robust to possible errors in the measurements. As argued in [16], a conventional path planner may present poor performance when facing uncertainty arising from the sensor measurements. In order to include sensor uncertainty in the model described so far, suppose that the robot has no direct access to its current state but, instead, at each time instant it reaches a state s, it makes an observation x with probability

P[X_t = x | S_t = s, A_{t-1} = a] = M(x, s, a),

where X_t is the observation at time t and we allow it to depend not only on the robot's current state, but also on its last action. The observations X_t take values in a finite set X. Function M is the observation probability function. Since a control strategy simply based on the current observation may lead to an arbitrarily poor performance [18], the concept of belief state is introduced. A belief state π is a probability distribution over the set of all states and indicates, at each time instant, the probability of being in each of the states in S. Formally, π_t(s) = P[S_t = s], for each s ∈ S. This belief state captures all the relevant aspects of the entire previous history of the robot. Given that, at some time instant t, an action a was taken and an observation x was then made, the value of the belief state may be updated using simple Bayesian rules according to

π_{t+1}(s') |_{a,x} = P[S_{t+1} = s' | A_t = a, X_{t+1} = x] = M(x, s', a) Σ_s π_t(s) T(s, a, s') / Σ_{s,s'} π_t(s) T(s, a, s') M(x, s', a).    (2)
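Concretely, the belief update in (2) is a predict step through the transition model followed by a Bayesian correction by the observation likelihood. A minimal sketch, assuming the models are stored as NumPy arrays (the array layout and toy numbers are our own, not from the paper):

```python
import numpy as np

def belief_update(pi, T, M, a, x):
    """Bayesian belief update of eq. (2).

    pi : (S,) current belief over states
    T  : (S, A, S) transitions, T[s, a, s'] = P[s' | s, a]
    M  : (X, S, A) observations, M[x, s', a] = P[x | s', a]
    """
    pred = pi @ T[:, a, :]          # predict: P[s' | belief, a]
    new_pi = M[x, :, a] * pred      # correct: weight by likelihood of x
    return new_pi / new_pi.sum()    # normalizer = P[x | belief, a]

# Tiny 2-state, 1-action, 2-observation example (hypothetical numbers)
T = np.zeros((2, 1, 2)); T[0, 0] = [0.2, 0.8]; T[1, 0] = [0.6, 0.4]
M = np.zeros((2, 2, 1)); M[:, 0, 0] = [0.9, 0.1]; M[:, 1, 0] = [0.3, 0.7]
pi = np.array([0.5, 0.5])
print(belief_update(pi, T, M, a=0, x=1))   # roughly [0.087, 0.913]
```

Observing x = 1 (much likelier in state 1) concentrates the belief on state 1, as (2) prescribes.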
Recall that, in (1), a value was assigned to each state s ∈ S. Using a similar reasoning, it is also possible to assign a value to a belief state π. Define a policy δ(π) as a mapping from the belief space, Π(S), into A. The value of a particular belief state πt , according to a policy δ is
V^δ(π_t) = E[ Σ_{i=0}^∞ γ^i R_{t+i} | π_t ]
         = Σ_s π_t(s) [ R(s, δ(π_t)) + γ Σ_{s',x} T(s, δ(π_t), s') M(x, s', δ(π_t)) V^δ(π_{t+1}) ].    (3)

A policy δ* is optimal if V^{δ*}(π) ≥ V^δ(π) for any policy δ and any belief state π. The corresponding optimal value function, denoted V*, verifies

V*(π) = max_{a∈A} Σ_s π(s) [ R(s, a) + γ Σ_{s',x} T(s, a, s') M(x, s', a) V*(π') ],

where π' denotes the belief state obtained from π by (2) after taking action a and observing x.
In spite of all the work on POMDPs, exact algorithmic solutions, i.e., methods for computing the optimal policy δ*, still suffer from severe limitations, mainly due to the complex computation involved in any iterative process using (3). In Section 3, a new heuristic method is proposed which is significantly more time-efficient than known exact POMDP solution algorithms. This feature makes it suitable for real-time implementations.

3. Policy Computation in POMDPs

Although partial observability provides a powerful way of modeling uncertainty arising from noisy, partial or aliased measurements, this increased modeling ability over fully observable MDPs introduces severe problems in the computation of the optimal policy. Solutions for fully observable MDPs can be computed using efficient algorithms such as value iteration (see [7] on the complexity of MDP solutions), and the corresponding optimal policies can be implemented straightforwardly in a robot. In the literature [2, 4, 8, 10], some heuristic methods have been proposed which make use of MDP results in order to compute approximate policies for POMDPs. Examples of such methods include grid-based methods [8], where the belief space Π(S) is discretized into a finite set of grid points. The obtained model may then be treated as an MDP and near-optimal behavior is recovered. Grid-based methods provide good results, but the discretization may still require a prohibitively large number of grid points in order to properly sample the belief space. Another sound method from the literature is the Q-MDP algorithm [10], where the optimal Q-functions for the underlying MDP are computed and the derived policy is given by

a* = arg max_{a∈A} Σ_s π(s) Q*(s, a).
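To make the Q-MDP heuristic concrete, the following sketch first solves the underlying fully observable MDP by value iteration and then selects the action maximizing the belief-weighted Q-values. The array shapes and the toy MDP are illustrative assumptions, not from the paper:

```python
import numpy as np

def value_iteration(T, R, gamma, iters=500):
    """Compute Q*(s, a) for the fully observable MDP.
    T: (S, A, S) transition probabilities, R: (S, A) rewards."""
    Q = np.zeros(R.shape)
    for _ in range(iters):
        V = Q.max(axis=1)          # V(s') = max_a' Q(s', a')
        Q = R + gamma * (T @ V)    # Bellman optimality backup
    return Q

def qmdp_action(pi, Q):
    """Q-MDP heuristic: a* = argmax_a sum_s pi(s) Q*(s, a)."""
    return int(np.argmax(pi @ Q))

# Toy 2-state, 2-action MDP (illustrative): action 1 always yields reward 1
T = np.zeros((2, 2, 2))
T[:, 0] = np.eye(2)                # action 0: stay
T[:, 1] = np.eye(2)                # action 1: stay
R = np.array([[0.0, 1.0], [0.0, 1.0]])
Q = value_iteration(T, R, gamma=0.9)
print(qmdp_action(np.array([0.5, 0.5]), Q))   # picks action 1
```

Both steps run off-line except the final arg max, which is why Q-MDP (and, later, TEQ-MDP) suits real-time use.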
The computational requirements of Q-MDP are similar to those of MDP methods. However, if the robot finds itself in a situation where the uncertainty regarding its actual state is large, Q-MDP will present poor performance, since it does not take such uncertainty into account.

3.1. Entropy

The Q-MDP algorithm will not choose an action a just to collect information, as argued in [10]. To overcome this limitation, some alternative methods [1, 4] have been proposed which use the information that an action may, potentially, bring to the agent. These methods borrow the concept of entropy from information theory, and define the normalized entropy of a belief state π as

H̄(π) = − Σ_{s∈S} π(s) ln(π(s)) / ln(|S|),    (4)
where |S| is the number of elements of S. Notice that 0 ≤ H̄(π) ≤ 1. In particular, H̄(π) = 1 for a uniform belief state and H̄(π) = 0 only if there is a state s such that π(s) = 1. Entropy thus provides a useful measure of the uncertainty/information inherent to a given belief state π. We are interested in devising a method that overcomes the most obvious limitations of Q-MDP while maintaining the same computational load. In particular, the method should cope properly with large uncertainty in the belief state: in large-entropy belief states it should seek to reduce uncertainty, while in low-entropy belief states it should closely follow the underlying MDP optimal policy. In Chapter 6 of [4], dual-mode and weighted entropy control strategies are proposed which include entropy minimization considerations when defining the policy for the robot. However, as argued in [1, 2], such policies implicitly assume that the minimum of the value function is achieved at the point of maximum entropy. Because of the convexity of the value function [4], the point of maximum entropy should, in fact, be close in some sense to the point of minimum value. However, in situations where the two points are actually different, this assumption may lead to poor performance of these methods. To overcome such limitations, a modified MDP is proposed which includes, in its reward structure, entropy information regarding each particular transition. Moreover, the entropy information is combined with reward information in order to keep the policy from blindly seeking minimum-entropy points. The optimal policy from this modified MDP will then be combined (using an entropy weighting criterion) with the optimal policy from the original MDP to yield the final policy. If the robot is at some state s ∈ S, a transition occurs by choosing an action a ∈ A and then observing some x ∈ X. As such, we define the transition entropy associated with the triplet (s, a, x) as

H̄_T(s, a, x) = − Σ_{s'} [ T(s, a, s') M(x, s', a) / P[x | s, a] ] ln( T(s, a, s') M(x, s', a) / P[x | s, a] ) / ln(|S|),    (5)

where P[x | s, a] = Σ_{s'} T(s, a, s') M(x, s', a).
The value of H̄_T(s, a, x) provides a measure of the expected entropy arising from a transition triplet (s, a, x). See [12] for a detailed analysis of (5) and a formal definition of transition entropy. We define the reward associated with the triplet (s, a, x) as

R(s, a, x) = max_{a'} Σ_{s'} T(s, a, s') M(x, s', a) R(s', a').
The reward R(s, a, x) provides a myopic measure of the expected revenue, in terms of rewards, resulting from the transition triplet (s, a, x). By modifying the rewards of the underlying MDP to include transition entropy information, we obtain a new MDP (S, A, T, R_N), where R_N is defined as

R_N(s, a) = Σ_x R(s, a, x) (1 − H̄_T(s, a, x)).

The optimal Q-functions from both the original and the modified MDPs, Q* and Q*_N, are then combined, using entropy weighting, to yield the final policy:

δ(π) = arg max_{a∈A} Σ_s π(s) [ Q*(s, a) (1 − H̄(π)) + Q*_N(s, a) H̄(π) ].
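Putting the pieces together, a minimal sketch of the entropy-weighted action selection follows: given the precomputed optimal Q-matrices of the original and modified MDPs (here plain arrays Q and QN), the two are blended by the normalized entropy of the current belief. Shapes, names, and the toy matrices are our own assumptions:

```python
import numpy as np

def teq_mdp_action(pi, Q, QN, eps=1e-12):
    """Blend Q* and Q*_N by the normalized belief entropy H̄(pi)
    and pick the action with the highest belief-weighted score.
    eps guards log(0); it is an implementation convenience."""
    h = float(-(pi * np.log(pi + eps)).sum() / np.log(len(pi)))
    scores = pi @ (Q * (1.0 - h) + QN * h)   # one score per action a
    return int(np.argmax(scores))

# Illustrative 2-state, 2-action Q-matrices
Q = np.array([[0.0, 1.0], [0.0, 1.0]])    # original MDP favors action 1
QN = np.array([[1.0, 0.0], [1.0, 0.0]])   # entropy-shaped MDP favors action 0
print(teq_mdp_action(np.array([1.0, 0.0]), Q, QN))   # certain belief -> follows Q
print(teq_mdp_action(np.array([0.5, 0.5]), Q, QN))   # uniform belief -> follows QN
```

With a certain belief (H̄ ≈ 0) the agent acts on Q*, while under maximal uncertainty (H̄ ≈ 1) it follows the entropy-shaped Q*_N, which is exactly the trade-off the weighting above encodes.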
It is evident that, at run time, the proposed algorithm needs only the MDP solutions (which can be computed off-line) to choose the POMDP action. This makes it suitable for real-time implementation, similarly to the Q-MDP algorithm, since, at each time instant, it only requires the on-line update of the belief state, the computation of the corresponding entropy and its product with the matrices Q* and Q*_N, which are all simple operations. Suppose, then, that a POMDP is used to model a mobile robot moving in an environment described as a topological map. POMDPs provide rich models since they take into account the uncertainty inherent to the movement of the robot, while allowing arbitrary noise in the sensory data (for example, no Gaussian hypothesis is required or implicit). The policy arising from the proposed algorithm, which we will refer to as TEQ-MDP (Transition Entropy Q-MDP), can be implemented in a robot which will, at each time instant t, act accordingly, choosing its action depending on its current belief state.

4. Results

In this section we present results obtained by the proposed TEQ-MDP algorithm, and compare them with those obtained with Q-MDP and with the optimal behavior in the underlying MDP. The description of the partially observable navigation problems on which the algorithms were tested can be found in the literature, namely in [4, 10, 12]. We also refer to [12] for a more detailed analysis of the performance, as well as for results of the TEQ-MDP algorithm in other test environments. The test environments, herein named Shuttle, 4 × 4 Grid, Cheese Maze, 4 × 3 Grid and 89-State Map, allow a comparison of the performance of the three algorithms (MDP, TEQ-MDP and Q-MDP) when applied to each case. The dimensions of the corresponding MDPs are listed in Table 1.a). In Table 1.b), the off-line computation time is also presented, in seconds, for γ = 0.995.

Table 1. MDP dimensions and computation time.

                 a) MDP dimensions      b) Computation time
Environment      |S|   |X|   |A|        MDP       TEQ-MDP    Q-MDP
Shuttle            8     5     3        0.484 s   0.719 s    0.484 s
4 × 4 Grid        16     2     4        0.735 s   0.922 s    0.735 s
Cheese Maze       11     7     4        0.516 s   0.890 s    0.516 s
4 × 3 Grid        11     6     4        0.562 s   1.094 s    0.562 s
89-State Map      89    17     5        7.110 s   20.390 s   7.110 s

Clearly, the Q-MDP policy uses the MDP optimal solutions and, as such, takes the same computation time. The TEQ-MDP requires, in general, a slightly larger off-line computational effort. It should, however, be noted that the times in Table 1.b) concern computations performed off-line and that, in terms of on-line effort, both algorithms present negligible computation time. In Table 2 we present the total discounted reward obtained by the different algorithms in a 100 time-step cyclic trial, with the results averaged over 1000 Monte-Carlo runs and γ = 0.995. The bold-marked line corresponds to a large-dimension example which will be commented on in detail further ahead. In the first 4 environments, TEQ-MDP behaves similarly to Q-MDP which, as seen in [4, 10], is near optimal for these four environments. The comparison between the results achieved by the TEQ-MDP and Q-MDP agents and the results obtained by the MDP agent
Table 2. Discounted reward for γ = 0.995. Average over 1000 Monte-Carlo runs of 100 time instants.

Environment      MDP        TEQ-MDP    Q-MDP
Shuttle          201.956    201.609    201.585
4 × 4 Grid        17.769     14.689     14.595
Cheese Maze       15.778     14.381     14.357
4 × 3 Grid        12.611      9.678      9.744
89-State Map      62.832     35.747      0.884
shows that, in these four problems, partial observability does not pose serious difficulties in terms of control, since all three agents attain similar total rewards. This happens because either the observations allow the agent to disambiguate many of the states of the underlying MDP, or the optimal policies are very simple and easy to follow even in the absence of state information (see [12] for details on the problems).

Table 3. Goal achievement percentage for the 89-State Map.

Method      γ = 0.95    γ = 0.995
MDP         100%        100%
TEQ-MDP     63.7%       61.4%
Q-MDP       3.9%        1.6%

The 89-State Map problem
is aimed at testing the performance of the algorithms in a larger state-space POMDP. In this example, a robot moves in an environment with significant perceptual aliasing (many states yield similar observations) and, at each time instant, has several possible actions available, including the possibility of performing no action whatsoever. A different measure of performance is used for this example, one that accounts for the number of runs in which the agent was able to reach the goal. The percentage of success (goal achievement) in this problem is summarized in Table 3, where it is clear that the superiority of the TEQ-MDP algorithm is overwhelming. Notice the clear relation between the percentages in Table 3 and the rewards in the last example of Table 2. This is due to the fact that, in this environment, a large number of states yield similar observations, leading to frequent situations where the agent is completely lost. In such situations, the Q-MDP agent is not able to choose an action that successfully progresses through the maze (see [12] for further details).

5. Conclusions and Future Work

In this paper we proposed a new POMDP algorithm, named Transition Entropy Q-MDP (TEQ-MDP). This method is suitable for real-time implementation and, thus, adequate for navigation tasks of mobile robots. The algorithm includes entropy information in the reward structure of a modified (fully observable) MDP and uses this reward-shaping strategy to overcome some limitations of other fast POMDP methods. The algorithm was tested on several examples from the literature and presented near-optimal performance in all examples tested. In [12], further tests confirm the near-optimal behavior of the TEQ-MDP algorithm. Although the last example presented corresponds to a large state-space problem (89 states), future work must include additional tests of the TEQ-MDP algorithm in even larger problems, such as the large real-world problems described in [4], to understand its
exact range of applicability and to determine whether it actually constitutes a general alternative to the computationally intractable exact solution methods.

References
[1] D. Aberdeen. A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes. Technical report, National ICT Australia, 2003.
[2] D. A. Aberdeen. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Australian National University, April 2003.
[3] A. Cassandra, M. L. Littman, and N. L. Zhang. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes. In Proc. 13th Annual Conf. on Uncertainty in Artificial Intelligence (UAI-97), pages 54–61, 1997.
[4] A. R. Cassandra. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Brown University, May 1998.
[5] A. R. Cassandra, L. Kaelbling, and J. A. Kurien. Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 1996.
[6] A. R. Cassandra, L. P. Kaelbling, and M. L. Littman. Acting Optimally in Partially Observable Stochastic Domains. In Proc. 12th Nat. Conf. Artificial Intelligence, pages 1023–1028. AAAI Press, 1994.
[7] J. Goldsmith and M. Mundhenk. Complexity Issues in Markov Decision Processes. In Proc. 13th Annual IEEE Conf. on Computational Complexity, pages 272–280, 1998.
[8] M. Hauskrecht. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes. In Proc. 14th Nat. Conf. Artificial Intelligence, pages 734–739, 1997.
[9] D. Lambrinos, R. Möller, T. Labhart, R. Pfeifer, and R. Wehner. A Mobile Robot Employing Insect Strategies for Navigation. Robotics and Autonomous Systems, 30:39–64, 2000.
[10] M. Littman, A. Cassandra, and L. Kaelbling. Learning Policies for Partially Observable Environments: Scaling Up. In Proc. 12th Int. Conf. Machine Learning, pages 362–370, 1995.
[11] M. L. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes. Technical Report CS-94-40, Dep. Comp. Sciences, Brown Univ., 1994.
[12] F. S. Melo and M. I. Ribeiro. The Use of Transition Entropy in Partially Observable Markov Decision Processes. Technical Report RT-601-05, Institute for Systems and Robotics, IST, Lisbon, Portugal, 2005.
[13] M. Mundhenk, J. Goldsmith, and E. Allender. The Complexity of Policy Evaluation for Finite-Horizon Partially-Observable Markov Decision Processes. In Proc. 22nd Int. Symp. Mathematical Found. Computer Science, pages 129–138, 1997.
[14] R. Parr and S. Russell. Approximating Optimal Policies for Partially Observable Stochastic Domains. In Proc. Int. Joint Conf. Artificial Intelligence, pages 1088–1094. Morgan Kaufmann Publishers, 1995.
[15] N. Roy, G. Gordon, and S. Thrun. Finding Approximate POMDP Solutions Through Belief Compression. J. Artif. Intel. Research, 23:1–40, 2005.
[16] N. Roy and S. Thrun. Coastal Navigation with Mobile Robots. In Proc. of 1999 Conf. on Neural Information Proc. Systems (NIPS'99), pages 1043–1049, 1999.
[17] L. Sheng, M. Guoliang, and H. Weili. Stabilization and Optimal Control of Nonholonomic Mobile Robot. In Proc. 8th Int. Conf. on Control, Automation, Robotics and Vision, pages 1427–1430, 2004.
[18] S. Singh, T. Jaakkola, and M. Jordan. Learning Without State-Estimation in Partially Observable Markovian Decision Processes. In Proc. 11th Int. Conf. Machine Learning, pages 284–292, 1994.
[19] A. Vale. Mobile Robot Navigation in Outdoor Environments: A Topological Approach. PhD thesis, Instituto Superior Técnico, February 2005.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Movement Control of Tensegrity Robot

Masaru Fujii a,1, Shinichiro Yoshii a and Yukinori Kakazu b
a Hokkaido University
b Hokkaido Information University

Abstract. Certain kinds of mollusks can perform flexible actions by changing body shape. Octopi, in particular, have little in the way of a skeletal structure, and their body frame consists mostly of muscular fibers, roughly equivalent to actuators. Here, we have applied the tensegrity structure [1,2,3] as a flexible body structure, analogous to the muscular fibers of an octopus. This tensegrity structure is used in a robot that moves by changing form, creating various shapes in the process. Since the most effective means of movement in the case of a simple body is rotation, sensor information fluctuates drastically. Here, we assess control of such a robot by rule-based general reinforcement learning and by a recurrent neural network using neural oscillators. When reinforcement learning was used, because the sensor information is prone to drastic change, the robot had only a small number of chances to learn in each state, making effective control regardless of state difficult. However, a certain level of control was possible under conditions in which the sensor information did not change as significantly. When neural oscillators were used, a considerable amount of time was required for updating link weights, but smooth movement was achieved because of the rhythm generated by the oscillators.

Keywords. Tensegrity Structure, Reinforcement Learning, Flexible structures
1. Introduction

Although most kinds of mollusks, the octopus for example, do not have a full skeletal structure, they are capable of independent movement. Such a mollusk moves by changing its bodily form. In order to study a structure capable of this type of self-transforming yet stable movement, we have employed a concept used primarily in structural engineering: 'tensegrity'. 'Tensegrity' is a word coined by R. Buckminster Fuller, a compound derived from 'tensile' and 'integrity'. Tensegrity structures are characterized by achieving internal stability by balancing continuous tension and local compressive forces. As the physical model in Figure 1 shows, in a structure composed of rigid bodies (rods) and connecting tensile members (springs), the rigid bodies are not connected to each other. Internal force can be generated by expansion and contraction of the tensile portion, allowing the object to change its own shape.

In the current paper we propose a robot that uses a tensegrity structure to achieve flexible, controlled motion. In conventional robots, internal or external frames are often created by imitating those of living creatures, and it is therefore relatively easy to imagine by analogy how the robot should move. With the robot proposed here, however, although there is a "frame" that approximates a skeletal structure, the components are not joined directly. Rather, the components form a kind of "soft structure", which makes it quite difficult for the designers to plan its motion in advance. Moreover, it is impossible to devise ahead of time a method of achieving the movement by rotation shown in Figure 1, which requires realizing rotary motion in the spherically structured robot. For this reason, in the robot introduced in the current paper, we obtain motion using a reinforcement learning [7] method, controlling the tensile units of the tensegrity structure in a binary ON/OFF fashion.

1 Correspondence to: Masaru Fujii, Kita 14, Nishi 9, Kita-ku, Sapporo, 060-0814 Japan. Tel.: +81 011 706 6445; E-mail: [email protected].
2. Tensegrity Structure

A 'tensegrity' structure, as proposed by Buckminster Fuller, consists of rigid bars (rods) and tensile members connecting them. In a tensegrity structure, the rigid bars are joined at both ends by the tensile members, resulting in a three-dimensional structure, as shown in Figure 1. Each rod is connected at both ends by three or more tensile members and lies inside the polygon formed by the tension vectors those members generate; because the structure disregards shear and bending forces, a stable balance is maintained at all joints. For this reason, the object would maintain its structure even under conditions of weightlessness. In a tensegrity structure, the rigid bodies are not directly connected to each other and therefore appear to float in air. Mechanically, only tension is applied to the cables and only compression to the rigid bodies. Since most of the forces applied to ordinary structures are compression and bending forces, strong and thick materials that can withstand those forces are necessary. However, since bending forces are not generated in tensegrity structures, if strong and lightweight materials are used, a structure very light in appearance can be built. Moreover, it is said that tensegrity structures can support masses dozens of times their own weight. In addition, flexible objects like tensegrity structures, because they include tensile members, are prone to large transformations in shape. For this reason, although features of the structure such as bendability and pliability stand out, it is difficult to predict and control changes in their shape.
Figure 1. Tensegrity Structure.

3. Tensegrity Robot

Taking advantage of the flexibility and the strong, lightweight nature of tensegrity structures, we have attempted to build a robot that can move by changing its
shape and at the same time maintain structural stability by applying changes in tension in the structure. Specifically, for the unit acting as the tensile member, we considered a tension spring of shape memory alloy (SMA) or a winding mechanism using a tension spring and thread. For the rigid part, we implemented a tension control device. Actual mounting, however, proved to be very difficult. When using SMA in the limited space that a rod provides, difficulties arise, such as mounting a high-capacity power supply for the heating unit or providing direct power distribution. Even if a spooling mechanism is used, a controllable stepping motor or servomotor must be mounted. In either case, the balance of weight must also be taken into account. The tensegrity shown in Figure 1 has six rods, and since its shape is close to spherical, movement by rotation is possible; but since such movement is achieved by shifting the center of gravity, any eccentricity in the weight balance of the rods would strongly affect controllability. Conversely, if movement by rotation were restricted, the influence of weight-dependent friction would emerge. Therefore, mounting the control devices on the rods requires the utmost care.

3.1. Physical Characteristics

C. Paul describes the tensegrity structure as follows [5]. If the tensegrity structure has n rods, the state vector q ∈ ℝ^{6n} collects the position coordinates (x_i, y_i, z_i) and orientation angles (θ_i, φ_i, ϕ_i) of each rod, as shown in equation (1). The Newton-Euler equations for the tensegrity can then be written in the generalized form (7), in terms of the set F_C of forces generated by the tensile members, the set F_G of the corresponding forces acting on each rod, and the set F_B of forces due to gravity.
Although C. Paul made no reference to the gravitational force F_B, we incorporate it here so that the tensegrity can exploit gravity in its operation, as described later.

q = [x_1, y_1, z_1, θ_1, φ_1, ϕ_1, · · · , x_n, y_n, z_n, θ_n, φ_n, ϕ_n]    (1)

F_C = [F_{C1}, F_{C2}, · · · , F_{C2m}]    (2)

F_i = k (||r_j − r_k|| − l_{ci}) − c d||r_j − r_k||/dt    (3)

F_G = [F_{G1}, F_{G2}, · · · , F_{Gn}]    (4)

F_{Gi} = F_i · (r_j − r_k)    (5)

F_B = [F_{B1}, F_{B2}, · · · , F_{Bn}]    (6)

A(q) q̈ = B(q, q̇, F_B, F_G, F_C)    (7)
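Equation (3) is an ordinary linear spring-damper law for each cable. It can be sketched numerically as follows, with parameter values taken from Table 1 of this paper; the function and argument names are our own assumptions:

```python
import numpy as np

def cable_force(rj, rk, vj, vk, k=0.70, c=10.0, l0=0.30):
    """Tension of one cable, eq. (3): spring term minus damping term.
    rj, rk: endpoint positions; vj, vk: endpoint velocities.
    k, c, l0 follow Table 1 (spring constant, damping, rest length)."""
    d = np.asarray(rj, dtype=float) - np.asarray(rk, dtype=float)
    length = np.linalg.norm(d)
    # d||rj - rk||/dt = relative velocity projected on the cable direction
    dlen_dt = d @ (np.asarray(vj, dtype=float) - np.asarray(vk, dtype=float)) / length
    return k * (length - l0) - c * dlen_dt

# Static cable stretched to 0.5 m: force = 0.7 * (0.5 - 0.3) = 0.14 N
print(cable_force([0.5, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]))
```

Summing such forces over all cables attached to a rod gives the F_{Gi} terms fed into the generalized dynamics (7), which the paper integrates numerically rather than analytically.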
Thus, since a tensegrity structure is highly flexible and its elements influence one another, it is a very complicated system, and controlling it dynamically by analytical techniques is clearly infeasible. Therefore, this research investigates the operation of the tensegrity structure through numerical simulation based on the above equations.
4. Simulation Although tensegrity structure is used for a robot in this study, no mention has been made of how the tensile members can be manipulated in order realize movement control. Therefore, we shall first discuss the controllability of a tensegrity structure by tensile member manipulation through simulation. In order to perform a simulation of rigid member movement by tensegrity structure, we here use the Open Dynamics Engine (ODE)[6], which is an open source physical simulation library. ODE realizes the simulation of rigid body movement based on physics, and facilitates simulation, including constituted implementation of dynamics calculations of a multiplex link rigid body, gravity, and friction characteristics such as from the ground under the rigid body. Through simulation validity can be confirmed without having to implement multiple and differing tensegrity structures. Furthermore, the simulation makes it is possible to easily build different combinations of tensegrity structures and to implement various sensors which would be difficult in a real-world environment. The rods of the tensegrity structure were defined as cylindrical objects, and in order to provide a structure which touches the ground at a point, both ends of the cylinders were treated as hemispherical. This was to lessen the influence of friction by the grounding angle as much as possible. Although the tensile members which connect each rod are defined as a virtual wire without mass, considering actual implementation, they were connected externally to the cylinders, and the rod end-points were expressed so that they connect at a 1:1 ratio. The robot parameters implemented are shown in Table 1. Table 1. Value of parametersfor Tensegrity Robots Parameter
Parameter                   Value
strut length (width)        0.80 m
cable spring constant k     0.70 N/m
cable damping constant c    10.0 Ns/m
cable rest length l0        0.30 m
M. Fujii et al. / Movement Control of Tensegrity Robot
Concerning the tensegrity structure, the following two methods can be considered for achieving independent movement. One is gait-like movement [5], where the rods act like legs and the robot "walks". However, with this structure, since the space to stretch the legs is restricted, such movement is not effective here. The other method is a toppling motion, achieved by shifting the center of gravity and making the body roll over repeatedly. This closely resembles the amoeboid movement that slime mold, a protist without pseudopodia, uses to move. In C. Paul's study [5], since the change in the natural length of the springs is treated as instantaneous, the robot moves in a gait-like manner through friction with the floor surface and the elastic force generated by the instantaneous elongation and contraction of the springs. The purpose of our study, however, is to achieve movement purely by changing the shape of the tensegrity structure. Therefore, to avoid movement driven by the elastic force of rapid spring extension and contraction, we did not use an instantaneous change in the natural length of the springs as C. Paul did; rather, we changed the natural length of each spring over a certain length of time.
5. Movement Control by Reinforcement Learning This section describes how movement of the robot is achieved through reinforcement learning. For this study, we developed a controller algorithm (SLA controller) based on a Stochastic Learning Automaton (SLA). The details are given below. 5.1. Behavior of the Tensegrity Robot As shown in Section 3.1, a tensegrity structure is governed by complicated interactions, and a designer cannot control it by hand coding. Therefore, the tendency of movement was first investigated using a genetic algorithm (GA), a standard solution-search technique. The ON/OFF states of the 24 tension springs were encoded in the gene, and this pattern was repeated four times, so that the tensegrity structure switches each tension spring ON or OFF four times at fixed intervals. Using the distance moved as the fitness, we searched for the combination that moves the farthest. The GA population was 50 individuals, evolved for 240 generations. The behavior of the last generation is shown in Figure 2. The results show that the structure can move by switching the springs ON and OFF. Moreover, the first ON/OFF switch alone produces no movement; that is, movement is realized only by a combination of two or more switches. We call this the rhythm of deformation, and consider that the tensegrity structure acquires, through some form, the rhythm that realizes movement. By learning this rhythm, movement control becomes possible. Next, we propose a controller based on reinforcement learning that learns pairs of ON/OFF switches as a way of acquiring this movement rhythm.
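The GA encoding just described (24 springs × 4 switching steps per gene, 50 individuals, 240 generations, fitness = distance moved) can be sketched as follows; `distance_moved` is a placeholder for the ODE simulation, which is not reproduced here, and the selection/mutation details are illustrative assumptions rather than the paper's exact operators:

```python
import random

N_SPRINGS = 24      # tension springs in the structure
N_STEPS = 4         # the ON/OFF pattern is repeated four times
POP_SIZE = 50
N_GENERATIONS = 240

def random_gene():
    # One bit per spring per time step: 1 = spring ON, 0 = OFF.
    return [random.randint(0, 1) for _ in range(N_SPRINGS * N_STEPS)]

def evolve(distance_moved, mutation_rate=0.02):
    """Simple generational GA; distance_moved(gene) -> float is the fitness
    (in the paper it would wrap the ODE simulation of the robot)."""
    pop = [random_gene() for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        scored = sorted(pop, key=distance_moved, reverse=True)
        elite = scored[:POP_SIZE // 2]          # truncation selection
        pop = list(elite)
        while len(pop) < POP_SIZE:
            a, b = random.sample(elite, 2)
            cut = random.randrange(len(a))
            child = a[:cut] + b[cut:]           # one-point crossover
            for k in range(len(child)):
                if random.random() < mutation_rate:
                    child[k] ^= 1               # bit-flip mutation
            pop.append(child)
    return max(pop, key=distance_moved)
```
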
Figure 2. Tensegrity moving: snapshots at t = 0, 90, 180, 270, 360, 450, 540, 630, 720 and 810 s.
5.2. SLA Controller An SLA controller is embedded in each tensile member and periodically controls the ON/OFF state of its connection link. Each ON/OFF pattern was categorized into four kinds: always OFF, from ON to OFF, from OFF to ON, and always ON. SLA, a form of reinforcement learning, is a method that chooses Action i probabilistically and corrects the action selection probability based on an evaluation of that action. There are two kinds of selection probability update: the case where a reward is given because the right action was chosen, and the case where a penalty is given because the wrong action was chosen. Here, the action selection probability pi is updated by the following formulas:

if d ≤ 0:
    pi = pi + α_sla (d / d_max) pi                 (8)
    p_{j≠i} = pj − (α_sla / n)(d / d_max) pi       (9)
otherwise:
    pi = pi − β_sla pi                             (10)
    p_{j≠i} = pj + (β_sla / n) pi                  (11)
Here, α_sla is the learning rate when a reward is given, and β_sla is the learning rate when a penalty is given. When a reward is given, the learning rate is scaled by the ratio of the movement distance d achieved by the action so far to the maximum movement distance d_max. This is a device for achieving better actions: it distinguishes the reward given when the best action has been chosen from other rewards.
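Update rules (8)–(11) can be sketched directly; the learning-rate values are illustrative, and the sign convention follows the reward condition exactly as printed above:

```python
def sla_update(p, i, d, d_max, alpha=0.1, beta=0.05):
    """Stochastic Learning Automaton update, Eqs. (8)-(11).
    p is the action-probability vector, i the index of the chosen action,
    d the movement distance achieved by that action, d_max the best so far.
    alpha/beta are reward/penalty learning rates (illustrative values)."""
    n = len(p)
    q = list(p)
    if d <= 0:  # reward branch, condition as printed in the paper
        q[i] = p[i] + alpha * (d / d_max) * p[i]                 # (8)
        for j in range(n):
            if j != i:
                q[j] = p[j] - (alpha / n) * (d / d_max) * p[i]   # (9)
    else:       # penalty branch
        q[i] = p[i] - beta * p[i]                                # (10)
        for j in range(n):
            if j != i:
                q[j] = p[j] + (beta / n) * p[i]                  # (11)
    return q
```
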
Figure 3. Experiment setup.
Figure 4. The move locus by the SLA controller.
Action selection probabilities are maintained for each state (sensor information), and behavior appropriate to the circumstances is achieved by switching between these per-state probabilities. The state of each SLA is obtained from the value of the optical sensor indicating the direction of advancement and from grounding sensors installed on all surfaces. All agents share this information. The advancement direction can be expressed as the direction of the sensors that have reacted. 5.3. Numerical Simulation As shown in Figure 3, a light source was placed at an arbitrary distance, and the robot's goal was to approach the light source using its sensors. Figure 4(a) shows the trajectory of the movement achieved by the SLA controller. Changes in the distance to the target are shown in Fig. 4(b).
6. Conclusion In the control of a tensegrity structure, whose operation differs completely from that of conventional robots, it is very difficult to change the form correctly according to the purpose. However, we were able to make the tensegrity structure learn form patterns for movement. Using a spherical tensegrity structure as the target, we found that the goal cannot be attained with a single deformation.
Therefore, we made the tensegrity structure learn combinations of two actions. Thus, even for a tensegrity structure that is difficult to deform into an arbitrary shape, control becomes possible by learning the rhythm of deformation that realizes the purpose.
References
[1] D.E. Ingber, "The Architecture of Life", Scientific American 278, pp. 48-57, 1998.
[2] D.E. Ingber, "Cellular tensegrity: defining new rules of biological design that govern the cytoskeleton", Journal of Cell Science, Vol. 104, 1993.
[3] R.B. Fuller, "Tensile-integrity structures", United States Patent 3,063,521, November 13, 1962.
[4] J.B. Aldrich, R.E. Skelton and K. Kreutz-Delgado, "Control Synthesis for a Class of Light and Agile Robotic Tensegrity Structures", Proceedings of the IEEE American Control Conference, Denver, Colorado, USA, June 4-6, 2003.
[5] C. Paul, J.W. Roberts, H. Lipson and F.J. Valero-Cuevas, "Gait Production in a Tensegrity Based Robot", International Conference on Advanced Robotics, 2005.
[6] Russell Smith, "Open Dynamics Engine (ODE)", http://ode.org/
[7] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, A Bradford Book, The MIT Press, 1998.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
An adaptive neural controller for a tendon driven robotic hand
Gabriel Gómez a,1, Alejandro Hernandez b and Peter Eggenberger Hotz a
a Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Switzerland
b Developmental Cognitive Machines Lab, University of Tokyo, Japan
Abstract. In this paper we present our ongoing work on the control of a tendon driven robotic hand by an adaptive learning mechanism evolved using a simulator developed over the last years. We present the “ligand-receptor” concept that can be easily used by artificial evolution to explore (a) the growing of a neural network, (b) value systems and (c) learning mechanisms systematically for a given task (i.e., grasping). The proposed neural network allows the robotic hand to explore its own movement capabilities to interact with objects of different shape, size and material and learn how to grasp them. As the evolved neural controller is highly adaptive, it will allow us in the future to systematically investigate the interplay between morphology and behavior using the same, but adaptive neural controller. Keywords. ligand-receptor concept, network connectivity, learning mechanisms, value system, robotic hand.
1. Introduction We present a common basis to investigate the growing of a neural network, value systems and learning mechanisms, the so-called "ligand-receptor" concept, to teach a robotic hand (with 13 degrees of freedom, complex dynamics provided by a tendon driven mechanism of actuation, and different types of tactile sensors) to grasp objects. In our implementation, receptors are abstractions of proteins of specific shapes able to specifically recognize their partner molecules. Ligands, on the other hand, are molecules moving around, which also have specific shapes, and are basically used as information carriers for their receptors. The shape of a receptor determines which ligand can stimulate it, much in the same fashion as notches of jigsaw pieces fit exactly into the molds of other pieces. When a receptor is stimulated by a matching ligand (signaling molecule), the following mechanisms can be elicited in a neuron: connect to a neuron expressing a partner receptor, release a ligand molecule, or express a receptor (see [1]). As a result of the interplay of these processes, the specification of the neural network (i.e., number of neuronal fields, size of each neuronal field, a set of receptors expressed by each neuronal unit, a set of signals 1 Correspondence to: Gabriel Gómez, Artificial Intelligence Laboratory, University of Zurich, Andreasstrasse 15, CH-8050, Zurich, Switzerland. Tel.: +41 44 635 6722; Fax: +41 44 635 4507; E-mail: gomez@ifi.unizh.ch
G. Gómez et al. / An Adaptive Neural Controller for a Tendon Driven Robotic Hand
Figure 1. Tendon driven robotic hand. (a) Experimental setup. (b) and (c) Grasping different objects.
that can be released by the sensor neurons) was obtained and then embedded as a neural controller for the robotic hand. In the following section we describe the tendon driven mechanism of our robotic hand and the position, type and number of sensors covering it. In Section 3 we specify the robot's task. In Section 4 we explain in more detail the "ligand-receptor" concept and its implications. We then present experimental results as well as a discussion and future work. 2. Robotic setup Our robotic platform can be seen in Fig. 1a. The tendon driven robot hand is partly built from elastic, flexible and deformable materials. For example, the tendons are elastic, the fingertips can be covered with deformable materials, and there is also deformable material between the fingers (see [2] and [3]). The hand applies an adjustable power mechanism developed in [4]. The wire's guide is flexible and moves proportionally to the load applied to the fingertip (see Fig. 2b). When the load is small, the guide moves toward the fulcrum and the fingertip moves faster with low torque. In contrast, when a large load is applied to the finger, the guide moves away from the fulcrum, resulting in a higher-torque motion. Due to the spring-like characteristics of the wire's guide, the finger's joints gain flexibility like those of the human hand. The relationship between the torque T generated at the fingertip and the force F pulling the wire is:

T = L F sin(θ1)        (1)
Figure 2. (a) Schematic design. (b) Three functions of the adaptive joint mechanism.
Here, the angle θ1 is determined by the angle θ2 between the fulcrum-action line and the fulcrum-pulling-force line, the angle β between the fulcrum-guide-roll line and the pulling-force line, and the distance x from the fulcrum to the point of leverage (the guide roll):

θ1 = tan⁻¹( x sin(θ2 − β) / (L − cos(θ2 − β)) )        (2)

If the spring is connected near the fulcrum, β is a constant. The distance x is given by the following equation:

x = F cos(θ1 + θ2 − β) / k        (3)
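Since θ1 in Eq. (2) depends on x, and x in Eq. (3) depends on θ1, the two relations must be solved together before the torque of Eq. (1) can be evaluated; a fixed-point iteration sketch (all parameter values below are illustrative assumptions, not from the paper):

```python
import math

def fingertip_torque(F, L, theta2, beta, k, iters=50):
    """Iteratively solve the coupled relations (2) and (3) for theta1 and x,
    then evaluate the fingertip torque T = L * F * sin(theta1) from (1)."""
    theta1 = 0.0
    for _ in range(iters):
        x = F * math.cos(theta1 + theta2 - beta) / k          # eq. (3)
        theta1 = math.atan2(x * math.sin(theta2 - beta),
                            L - math.cos(theta2 - beta))      # eq. (2)
    return L * F * math.sin(theta1)                           # eq. (1)
```
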
θ1, θ2 and L are defined in Fig. 2a. In a manipulator with multiple joints, the torque needed at the root joint is larger than at the distal joints. In a tendon driven mechanism, the torque from the other joints interferes with the torque at the root joint, resulting in a large torque at the base of the manipulator. To obtain a passive motion function in a simple way, in addition to the adjustable velocity and torque functions, the current robotic hand uses a spring-type wire as an elastic guide for the inner wire. If the load does not require a high torque on the finger, the wire remains straight. In the case of a heavy load that requires more torque, the outer wire bends, drawing the inner wire taut (see Fig. 2b). The outer wire is made of stainless steel and the inner wire of nylon. The robotic hand has 13 degrees of freedom driven by 13 servomotors, and has been equipped with two types of sensors: flex/bend sensors and pressure sensors. 2.1. Bending sensors The bending angle of a flex/bend sensor is proportional to its resistance over a physical range between straight and a 90 degree bend; these sensors are placed on every finger as position sensors. 2.2. Pressure sensors based on pressure sensitive conductive rubber We developed a new type of pressure sensor based on the pressure sensitive conductive rubber CS57-7RSC (CSA). The conductive rubber contains conductive carbon particles
Figure 3. Pressure sensors based on pressure sensitive conductive rubber. (a) Diagram. (b) Inner resistance change due to external pressure applied on the conductive rubber. (c) Relation between the pressure applied and the voltage output from the pressure sensor. The voltage increases as the resistance in the conductive rubber decreases with the pressure applied at 9 Volts.
(see Fig. 3a), which change the material's resistance according to the pressure applied on its surface (see Fig. 3b). The pressure sensor is based on a voltage divider that translates the change in resistance into a voltage. The voltage-pressure relationship is presented in Fig. 3c. The pressure sensors are positioned on the palm, the fingertips, the middle of the fingers and the back of the robot hand, as shown in Fig. 5. 2.3. Control We control the robot hand using a TITech SH2 controller. The controller produces up to 16 PWM (pulse width modulation) signals for the servomotors and acquires the values from the bending and pressure sensors. The motor controller receives commands through a USB port. We used two computers: one hosts the neural network and communicates with the sensorimotor control board to acquire the sensory data and produce the motor commands; the other is in charge of the visualization of the neural network. When learning takes place, the two computers exchange the synaptic changes via TCP/IP. The sensorimotor data were stored in a time series file for off-line analysis. Using this setup we were able to see the actual behavior of the robot (observer's perspective), we had access to the sensorimotor data generated by the interaction of the hand with objects in the environment through the sensorimotor control board, and additionally we could see inside the neural network and investigate how the learning mechanism, the sensory input and the interaction with the environment were shaping the neural structure (robot's perspective).
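The voltage-divider readout described for the pressure sensor above can be sketched as follows; the fixed resistance and supply voltage are illustrative assumptions (only the 9 V supply appears in the figure caption):

```python
def divider_vout(r_rubber, r_fixed=10e3, vcc=9.0):
    """Voltage measured across the fixed resistor of a divider in series with
    the conductive rubber: as pressure lowers the rubber's resistance,
    the output voltage rises, matching the trend in Fig. 3c."""
    return vcc * r_fixed / (r_fixed + r_rubber)
```
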
Figure 4. Ligand receptor concept.
3. Robot task A typical experiment proceeded as follows. At the beginning, the robot hand was initialized with all fingers fully extended. Then an object heavy enough to activate the pressure sensor located on the palm was placed on it. The hand then started to explore its own movements by randomly moving the fingers; increasing and decreasing the positions of the servo motors pulled the tendons, which made the fingers move back and forth. Eventually the fingers encountered the object and some pressure sensors were activated; by comparing the previous pressure readings with the current ones, the learning mechanism taught the hand about the success of its exploratory efforts, and finally the hand grasped the object successfully. After that the hand kept grasping the object until it was taken away, at which point the sensory input dropped to zero and the hand returned to the initial state with the fingers fully extended.
4. Ligand-receptor concept To be able to explore the connectivity, the learning, and the value systems, we use the ligand-receptor concept (see [5] and [6]). As can be seen in Fig. 4, a ligand-receptor pair such as signal0 and receptor0 can be used to explore the connectivity, while a pair such as signal2 and receptor2 can be used for learning a specific task. No interaction is possible between a pair such as signal1 and receptor3. We define the probability of an interaction between a ligand and a receptor as the affinity, which models an artificial binding between the two entities. This affinity parameter aff determines which signaling molecule interacts with which receptor. To each signal and receptor, a real-valued number and a function are assigned. The function is implemented as follows:
Figure 5. Neural structure and its connections to the robot’s sensors and motors. Neuronal areas: (a) sensorField. (b) hiddenField. (c) motorField. (d) motorActivities.
a12 = f_aff(aff1, aff2) = exp(−α (aff1 − aff2)²)        (4)
where aff1 and aff2 are the real-valued numbers representing the geometric properties of the substances, and α is the affinity parameter, which takes positive values. If α is high, the two substances have to be very similar (i.e., aff1 ≈ aff2) to obtain a high functional value f_aff; if α is low, the substances can be more different and still obtain high f_aff values. Molecules compete for a docking site (receptor), and their success in binding depends on the affinity between the molecule and the docking site and on the concentrations of the competitors. Figure 4 shows examples of pairs of ligands and receptors with low and high affinity as well as their potential use. 4.1. Exploring the connectivity of the neural network At the moment of the creation of the neural network, the system automatically assigns a set of receptors to each neuron and releases a signal to start connecting the neural network. The system then attempts connections between neurons based on the receptors expressed by each neuron. All the weights were initialized randomly from a Gaussian distribution with mean=0.0002 and sigma=0001 (see Fig. 6a). For instance, the system will create a synaptic weight linking a neuron in the hiddenField (see Fig. 5b) to a neuron in the motorField (see Fig. 5c) only if the two neurons have expressed matching receptors. The same applies to the synaptic weights connecting the sensorField (see Fig. 5a) to the hiddenField (see Fig. 5b), and the motorField (see Fig. 5c) to the motorActivities (see Fig. 5d). The components of the neural structure and its connections to the robot are depicted in Figure 5. In total the neural network had 154 neuronal units. The sizes of the neuronal areas sensorField, hiddenField, and motorField were 1x16, 8x8 and 8x8 respectively. The size of the neuronal area MotorActivities was 1x10 (see Fig. 5d).
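The Gaussian affinity of Eq. (4) and the competitive binding it supports can be sketched in Python; the `bind` helper and its proportional-selection rule are illustrative assumptions, not the paper's exact mechanism:

```python
import math

def affinity(aff1, aff2, alpha=50.0):
    """Eq. (4): Gaussian affinity between a ligand and a receptor,
    parameterized by their real-valued 'shapes' aff1 and aff2."""
    return math.exp(-alpha * (aff1 - aff2) ** 2)

def bind(ligand, receptors, concentrations, alpha=50.0):
    """Competitive binding sketch: each receptor's chance of capturing the
    ligand is taken proportional to affinity * concentration, reflecting the
    competition for docking sites described in the text."""
    scores = [affinity(ligand, r, alpha) * c
              for r, c in zip(receptors, concentrations)]
    total = sum(scores)
    return [s / total for s in scores]   # binding probabilities
```
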
Each finger was moved by a pair of antagonistic neurons in the area MotorActivities: one caused the finger to contract and the other to extend. Each of these two motor activities, controlling a particular motor, was connected to a different subset of motor neurons in the area motorField, defined by the expression of different receptors as explained above. To mimic the activation of biological muscles by motor neurons, the synaptic weights between the areas motorField and MotorActivities were fixed such that some have large effects on the movements while others have tiny effects, resulting in the exploration pattern depicted in Fig. 6a, where each finger was able to explore all possible movements in an unbiased way.

Figure 6. Self-exploratory movements. (a) Number of times that a particular servo motor reached an individual position (out of the 2,500 possible positions) during 5,000 iterations (mean = 1.8571, sigma = 1.3499; when μ = 0.2, 20% of the motor neurons could fire randomly). (b) Exploration-exploitation trade-off (when μ = 0.5, 50% of the motor neurons can fire randomly). The solid line represents the number of neurons that fired randomly at each learning cycle. The dashed line represents the number of motor neurons that learned and are activated by themselves.

4.2. Value system Value systems are neural structures that are necessary for an organism to modify its behavior based on the salience or value of an environmental cue [7]. Such "value systems" have been implemented in robotic systems to study their role in adaptive behavior [8]. In our case, the pressure sensors acted as the value system that taught the robot about the success of its exploratory movements. Whenever the activation of the pressure sensor on the palm exceeded a given threshold (e.g., an object was put on the palm), the corresponding sensor neuron released a signaling molecule; the motor neurons that had expressed a receptor with high affinity to that signaling molecule were then allowed to fire randomly, putting the hand into its "exploration behavior". If a finger is in contact with an object, depending on the force and the direction of the movement, the reading of the pressure sensor on that finger will increase or decrease. In either case a corresponding sensor neuron releases a signal that is used by the learning mechanism to reward or punish the motor neurons active at that particular moment.

4.3. Learning Mechanism The proposed neural network allows the robotic hand to explore its own movement possibilities through the random activation of the neurons in the motorField (see Fig. 5c). The outputs of the motor neurons are determined by the following formula:
Figure 7. Learning mechanism. As the robotic hand enclosed an object with its fingers, some pressure sensors were activated, a signaling molecule was then released, and eventually a synaptic weight between a pair of active neurons was updated (only if the respective neurons had expressed a receptor that matched with the signaling molecule).
Oi = 1.0  if σ( Σ_{j=1}^{n} ω_{i,j} S_j ) ≥ θ1
Oi = 1.0  if σ( Σ_{j=1}^{n} ω_{i,j} S_j ) < θ1 and μ ≤ θ2        (5)
Oi = 0.0  otherwise
Where: Oi is the output of the i-th neuron; Sj the input from the j-th neuron; ω_{i,j} the synaptic weight between the i-th and the j-th neurons; σ the sigmoid function σ(x) = 1/(1 + e^{−αx}); α the slope parameter of the sigmoid; θ1 the threshold for the neuron to fire by itself; μ a uniformly distributed random value between 0 and 1; and θ2 the threshold that determines how many motor neurons are allowed to fire randomly. If the output exceeds the threshold θ1, the neuron fires by itself. Otherwise the neuron is allowed to fire randomly: a uniform value μ between 0 and 1 is generated, and by comparing μ with the second threshold θ2, only a percentage of the population of motor neurons is allowed to fire. The solid line in Fig. 6b represents the number of neurons that fired randomly at each learning cycle when μ = 0.5 (50% of the neurons were allowed to fire randomly). The dashed line represents the number of motor neurons that learned and were activated by themselves. Figure 6b illustrates the exploration-exploitation trade-off. At the beginning of the experiment about half of the motor neurons are randomly activated, so the robot hand explores its environment. Over time, the learning mechanism modifies the synaptic weights, causing more neurons to learn: more motor neurons have an output that exceeds the threshold θ1 and fire by themselves, and fewer and fewer motor neurons fire randomly, so the more the hand learns, the less it has to explore. Around learning cycle 160, the object was taken away from the robot hand, removing the sensory input, so the robot hand restarted its exploratory activity. Synaptic modification was determined by both pre- and post-synaptic activity and resulted in either strengthening or weakening of the synaptic efficacy between two neuronal units. The active neurons controlling the robot hand were "rewarded" if the movement of the fingers exerted a higher activation of the pressure sensors and "punished" otherwise. In this way the synaptic connections between the neuronal areas hiddenField (see Fig. 5b) and motorField (see Fig. 5c) were updated. Figure 7 gives a graphical explanation of the sequence of events resulting in the change of a synaptic weight while the robot hand interacts with an object.

Figure 8. Synaptic changes over time during a typical grasping experiment.
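The stochastic firing rule of Eq. (5) can be sketched as follows; weights, inputs and threshold values are illustrative placeholders, not the evolved parameters:

```python
import math
import random

def motor_outputs(weights, inputs, alpha=1.0, theta1=0.8, theta2=0.5):
    """Eq. (5): a motor neuron fires deterministically when its sigmoid
    activation reaches theta1; otherwise it may still fire at random,
    gated by theta2 (the fraction of neurons allowed to explore)."""
    outputs = []
    for w_row in weights:
        net = sum(w * s for w, s in zip(w_row, inputs))
        act = 1.0 / (1.0 + math.exp(-alpha * net))   # sigmoid activation
        if act >= theta1:
            outputs.append(1.0)                      # fires by itself (learned)
        elif random.random() <= theta2:
            outputs.append(1.0)                      # random exploratory firing
        else:
            outputs.append(0.0)
    return outputs
```

With a low theta2 the hand exploits what it has learned; with a high theta2 it explores, which is the trade-off plotted in Fig. 6b.
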
5. Experimental results Figure 8 shows the synaptic changes in an OpenGL animation resulting from the real-time data exchange between the computer hosting and running the neural network and the computer visualizing the learning process. All synaptic weights were kept between -1.0 and 1.0. A learning cycle (i.e., the period during which the current sensory input is processed, the activities of all neuronal units are computed, the strengths of all synaptic connections are updated, and the motor outputs are generated) had a duration of approximately 0.08 seconds. Figures 1b and 1c show the final states of experiments performed with several objects of different color, shape and material.
6. Discussion and future work Predefining all possible sensors and movement capabilities of a robot will certainly reduce its adaptivity. In order to be adaptive, a neural controller must be able to reconfigure itself to cope with environmental and morphological changes (e.g., additional sensors, not only in number but also of different types; sensors damaged over time; additional or fewer degrees of freedom; etc.). As we cannot predict the above possible changes, the system should be allowed to explore its own movements and coherently adapt its behavior to the new situation, something that cannot be achieved by a purely reflex-based system. Until now, no engineering methods exist for tackling such unforeseen changes; we expect to contribute to the solution of these problems with our approach. The robustness of the evolved neural controller will be tested by making systematic changes to the hand's morphology (e.g., position and number of the pressure sensors, different types of sensors, stronger motors, covering materials that increase the friction with objects) to investigate how the neural controller reacts to unforeseen perturbations. Now that we have a system able to learn to foveate using visual information (see [5] and [6]), and a system able to learn to grasp objects based on tactile information, it will be interesting to see how the two can be integrated and how far the system can go in solving more demanding tasks (e.g., object manipulation, reaching, catching moving objects).

Acknowledgements This research was supported by the EU Project ROBOT-CUB: ROBotic Open-architecture Technology for Cognition, Understanding and Behavior (IST-004370) and by the "Embryogenic Evolution: From Simulation to Robotic Applications" project of the Swiss National Science Foundation (Project No. 200021-109864/1).

References
[1] P. Eggenberger Hotz, "Evolving morphologies and neural controllers based on the same underlying principle: Specific ligand-receptor interactions," in F. Hara and R. Pfeifer (eds.), Morphofunctional Machines: The New Species (Designing Embodied Intelligence), 2003, pp. 217-236.
[2] H. Yokoi, A. Hernandez Arieta, R. Katoh, W. Yu, I. Watanabe, and M. Maruishi, "Mutual Adaptation in a Prosthetics Application," in Embodied Artificial Intelligence, Lecture Notes in Computer Science, F. Iida, R. Pfeifer, L. Steels, and Y. Kuniyoshi, Eds. Springer, ISBN: 3-540-22484-X, 2004.
[3] R. Pfeifer and F. Iida, "Morphological computation: Connecting body, brain and environment," Japanese Scientific Monthly, vol. 58, no. 2, pp. 48-54, 2005.
[4] Y. Ishikawa, W. Yu, H. Yokoi, and Y. Kakazu, "Research on the double power mechanism of the tendon driven robot hand," The Robotics Society of Japan, pp. 933-934, 1999.
[5] P. Eggenberger Hotz, G. Gómez, and R. Pfeifer, "Evolving the morphology of a neural network for controlling a foveating retina - and its test on a real robot," in Proc. of Artificial Life VIII, 2002, pp. 243-251.
[6] G. Gómez and P. Eggenberger Hotz, "Investigations on the robustness of an evolved learning mechanism for a robot arm," in Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems (IAS-8), 2004, pp. 818-827.
[7] K.J. Friston, G. Tononi, G.N. Reeke, O. Sporns, and G.M. Edelman, "Value-dependent selection in the brain: Simulation in a synthetic neural model," Neuroscience, vol. 59, no. 2, pp. 229-243, 1994.
[8] O. Sporns and W. Alexander, "Neuromodulation and plasticity in an autonomous robot," Neural Networks (Special Issue), vol. 15, pp. 761-774, 2002.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Self-organizing Route Guidance Systems based on coevolution of Multi-Layered Guidance Vector Fields Yasuhiro Ohashi and Kosuke Sekiyama University of Fukui Abstract. This paper deals with a novel distributed route guidance system called the Self-organizing Route Guidance System (SRGS). SRGS provides efficient route guidance, which facilitates offset adjustment in the self-organizing control of the signal network by self-organizing multi-layered vector fields. In SRGS, traffic efficiency would decrease when a conflict in the traffic flow occurs due to conflicting guidance vectors at an intersection. To cope with this problem, a new self-organizing method for multi-layered vector fields is proposed so that proper guidance routes are self-organized in crowded traffic situations. Simulation demonstrates the effectiveness of the proposed system under dynamic and dense traffic conditions. Keywords. Self-organizing systems, Route Guidance Systems, Traffic signal network, ITS
1. Introduction Recently, heavy traffic congestion in urban areas has brought about several social problems, such as increasing traffic accidents, environmental pollution, and economic losses. Intelligent Transportation Systems (ITS) have been proposed as new transportation systems expected to solve these problems. To cope with traffic congestion, two major approaches have been attempted in Japan. The first is traffic flow control by Centrally Determined Route Guidance Systems (CDRGS) using the Vehicle Information and Communication System (VICS) [1,2]. VICS provides drivers with information on traveling time, traffic congestion, and route guidance through optical beacons or radio beacons located at roadsides [3]. A route guidance system is capable of managing the amount of traffic in an urban area in order to alleviate congestion. To provide information on the shortest route, a time-constrained A* algorithm has been proposed [4]. To minimize the expected travel time, dynamic programming techniques have been applied in [5]. Also, to avoid the traffic congestion caused by the concentration of traffic flows, a routing method based on estimated travel time has been proposed in [6]. The second approach deals with traffic signal optimization, in which traffic signal parameters are adjusted to improve the traffic flow [7,8]. Also, we have proposed a self-organizing control of
Figure 1. Urban traffic signal network model.
the traffic signal network [9]. This is a fully distributed approach that improves traffic flow efficiency by means of coupled nonlinear oscillators. The offsets between signals and the split ratio are adjusted based on local traffic flow information, so that desired offset patterns of the signal network are self-organized through mutual synchronization. From the perspective of traffic signal control research, traffic flows are treated as an uncontrollable environment, while from the viewpoint of traffic flow control, traffic signals are not treated explicitly. To realize more efficient traffic systems, the Self-organizing Route Guidance System (SRGS), which is integrated with the self-organizing traffic signal network, was proposed in our previous study [10]. By utilizing route guidance based on multi-layered guidance vector fields, SRGS solves a problem of the self-organizing signal network, namely that the offset is not adjusted appropriately in a particular traffic situation where the amounts of traffic flow in the east-west and north-south directions are almost equivalent. This paper provides a new self-organizing method for the multi-layered guidance vector fields in SRGS. More efficient route guidance is formed under dense and dynamic traffic conditions, and hence the conflict in the traffic flows caused by conflicting guidance vectors is resolved.
2. Self-organizing Route Guidance Systems (SRGS) 2.1. Urban Traffic Network Model We adopt the urban traffic signal network model used in [9]. The traffic signal network used in the self-organizing signal network is described by the bidirectional graph network depicted in Figure 1. Node S_i (i = 1, ..., N_S) denotes an intersection, and S_ij (j = 1, 2, ..., n_i) denotes the neighboring intersections of S_i. ε_ij is the normalized amount of traffic flow defined by

ε_ij = (1 / (ρ_im T_i)) ∫_t^{t+T_i} ρ_ij(s) ds,  (1)

where ρ_im = const. is the traffic capacity and ρ_ij is the traffic density on the road between the intersections S_i and S_ij. The details of the self-organizing traffic signal network are described in our previous work [9,10].
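As a concrete illustration, the normalized flow of Eq. (1) can be approximated from sampled densities. This is a minimal sketch; the function name and the discrete Riemann-sum integration are our own assumptions, not from the paper:

```python
import numpy as np

def normalized_flow(rho_ij_samples, rho_im, dt=1.0):
    """Discrete approximation of Eq. (1): the traffic density rho_ij(s),
    sampled every dt seconds over one signal period T_i, is integrated
    and normalized by the road capacity rho_im and the period T_i."""
    rho = np.asarray(rho_ij_samples, dtype=float)
    T_i = len(rho) * dt
    # (1 / (rho_im * T_i)) * integral of rho_ij(s) ds, as a Riemann sum
    return float(rho.sum() * dt / (rho_im * T_i))

# e.g. a constant density of 20 vehicles on a road with capacity 80
# over a 60 s period gives a normalized flow of 20/80 = 0.25
eps_ij = normalized_flow(np.full(60, 20.0), rho_im=80.0)
```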
2.2. Multi-layered Guidance Vector Fields In SRGS, the multi-layered guidance vector fields consist of several vector fields that hold route guidance information corresponding to each destination (Figure 2(a)). The multi-layered guidance vector fields are mutually adjusted to avoid conflicts of the traffic flows formed by the vector field corresponding to each destination. 2.3. Configurations of SRGS We adopt the bidirectional grid network as the urban traffic network model used in [9]. Each intersection S_i in the guidance area has a vehicle guidance unit U_i, which has a reward variable corresponding to the destination h (h = 1, 2, ..., N_h), denoted by R_i^h. SRGS consists of the vehicle guidance units located at every intersection S_i in the guidance area. The guidance unit determines the route guidance direction corresponding to the destination of the vehicle based on the guidance vector (Figure 2(b)). The reward R_i^h is diffused over the guidance units through their local interactions. The guidance vector field corresponding to each destination is formed by the reward field spreading over the network. The guidance vector of the guidance unit U_i corresponding to the destination h is denoted by

g_i^h = {g_i1^h, g_i2^h, g_i3^h, g_i4^h},  (2)

where g_ij^h ≥ 0 and Σ_{j=1}^4 g_ij^h = 1.
2.4. Conflict of Traffic Flows The offset is not adjusted appropriately when the traffic inflows in the east-west and north-south directions are almost equivalent at the intersection S_i, as depicted in Figure 3, because the dominant direction necessary for the offset adjustment cannot be determined in such a case. Such highly symmetric traffic flows are treated as a conflict of the traffic flows in SRGS. To realize efficient traffic flows, conflicts of the traffic flows have to be avoided by appropriately self-organized guidance vector fields. A conflict of the traffic flows at guidance unit U_i is considered to be caused by a conflict of the guidance vectors in different layers of the neighboring units U_ij. Figure 4(a) shows a simple example of such a conflict: the traffic flows guided by the guidance vector of layer 1 on unit U_i1 and the guidance vector of layer 3 on unit U_i3 cause a conflict at the guidance unit U_i. Therefore, in order to avoid the conflict, the guidance vectors of the neighboring units should be modified in a way that changes the guidance direction from U_i to other directions. In what follows, we describe how to avoid conflicts of the traffic flows and to self-organize effective multi-layered guidance vector fields. 2.5. Self-organization Method of Multi-layered Guidance Vector Fields 2.5.1. Reward Diffusion The reward R_i^h (Figure 2(b)) is employed to inform each unit about a rough direction to the destination. The guidance unit that is regarded as the route destination h has
Figure 2. Multi-layered guidance vector fields of SRGS: (a) multi-layered vector fields of SRGS; (b) reward and guidance vector of U_i on layer h.
Figure 3. Traffic flow patterns on the signal network: (a) symmetric two-way traffic flows; (b) asymmetric two-way traffic flows.
the constant reward, which is spread over the neighboring units by the following steps. At the first step, the guidance unit U_i determines the maximum reward R_i^{h*} among its neighbors:

R_i^{h*} = max_j R_ij^h.  (3)

Then, the reward R_i^h is updated as follows:

R_i^h(τ_i + 1) = γ R_i^{h*}(τ_i) if R_i^{h*}(τ_i) > R_i^h(τ_i), and R_i^h(τ_i) otherwise,  (4)

where τ_i is a scaled time associated with the period of the signal, and γ ∈ (0, 1) is a decay rate. A reward gradient toward the destination is formed by this diffusion process, and each guidance unit can recognize the direction to the destination based on the reward gradient. 2.5.2. Conflict Avoidance Signals A conflict of the traffic flows at the guidance unit U_i is caused by inappropriate guidance vectors on different layers of the neighboring units, as mentioned in Section 2.4. To avoid such conflicts, the guidance vectors of the neighboring unit U_ij should be modified. However, the conflict situation is detected only by the
Figure 4. Conflict avoidance problem of SRGS: (a) conflict caused by neighboring units; (b) conflict avoidance signals.
guidance unit U_i; hence, it needs to be transmitted to the neighboring unit U_ij. The guidance unit U_i therefore generates the conflict avoidance signal c_ij based on the conflict situation. The conflict avoidance signal transmitted from U_i to U_ij, depicted in Figure 4(b), is defined by

c_ij = 1 − |ε_ij − ε_i,k+1| / |ε_ij + ε_i,k+1| if |ε_ij + ε_i,k+1| > 0, and c_ij = 0 otherwise,  (5)

where k = j + 1 (mod 4). The conflict avoidance signal transmitted from U_ij to U_i is denoted by c̃_ij and is defined similarly to Eq. (5). With this conflict avoidance signal, the guidance unit detects the direction and the extent of the conflict and modifies its own guidance vectors appropriately based on c̃_ij. 2.5.3. Update Laws of Multi-Layered Guidance Vector Fields The guidance vector is updated based on the conflict avoidance signal c̃_ij and the reward gradient ΔR_ij^h, defined as

ΔR_ij^h = R_ij^h − R_i^h,  (6)

which allows the multi-layered guidance vector fields to grow toward each destination while avoiding conflicts of the traffic flows. When no conflict occurs at the neighboring unit, its guidance vectors are updated simply toward the destination; when a conflict occurs, the guidance vectors need to be updated toward the destination so as to avoid the conflict situation. Additionally, since a guidance vector that is highly correlated with the traffic outflow causes the conflict at the neighboring unit, such a guidance vector must be modified appropriately. The normalized traffic density that flows from unit U_i to the neighboring unit U_ij is denoted by ε_ji. The overall traffic density is divided into the traffic density of each layer based on the destination h, denoted by ε_ji^h, where the relation between ε_ji and ε_ji^h is given by

ε_ji = Σ_{h=1}^{N_h} ε_ji^h.  (7)
The updating coefficient θ_ij^h of the guidance vector g_ij^h is defined based on the reward gradient ΔR_ij^h, the conflict avoidance signal c̃_ij from the neighboring unit U_ij to unit U_i, and the divided traffic density ε_ji^h:

θ_ij^h(τ_i) = a + (b − c̃_ij) ε_ji^h(τ_i) / Σ_{h=1}^{N_h} ε_ji^h(τ_i) if ΔR_ij^h > 0, and θ_ij^h(τ_i) = 0 otherwise,  (8)

where a and b are constants. In Eq. (8), the first term a indicates the growth rate of the guidance vector toward the destination. The second term is related to conflict avoidance: the growth of the vector component toward direction j is inhibited when the conflict avoidance signal is large and the guidance vector leads many vehicles toward direction j. The guidance vector g_ij^h is updated with the updating coefficient θ_ij^h as follows:

g_ij^h(τ_i + 1) = [g_ij^h(τ_i) + θ_ij^h(τ_i) g_ij^h(τ_i)] / Σ_{j=1}^{n_i} [g_ij^h(τ_i) + θ_ij^h(τ_i) g_ij^h(τ_i)].  (9)
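The conflict signal of Eq. (5) and the update laws of Eqs. (8)-(9) can be sketched for a single unit and layer as follows. This is an illustrative implementation under our own naming; the per-direction densities and received signals are passed in as arrays:

```python
import numpy as np

def conflict_signal(eps_j, eps_k):
    """Eq. (5): conflict avoidance signal between the crossing flows
    eps_ij and eps_{i,k+1} (k = j + 1 mod 4)."""
    s = abs(eps_j + eps_k)
    return 1.0 - abs(eps_j - eps_k) / s if s > 0 else 0.0

def update_guidance_vector(g, dR, c_tilde, eps_h, eps_total, a=0.3, b=0.65):
    """Eqs. (8)-(9): one update of the layer-h guidance vector g of a unit.

    g        : guidance vector over the 4 directions (sums to 1)
    dR       : reward gradients dR_ij^h toward each neighbor, Eq. (6)
    c_tilde  : conflict avoidance signals received from each neighbor
    eps_h    : layer-h traffic density toward each neighbor
    eps_total: total traffic density toward each neighbor, Eq. (7)
    """
    g = np.asarray(g, dtype=float)
    share = np.divide(np.asarray(eps_h, dtype=float),
                      np.asarray(eps_total, dtype=float),
                      out=np.zeros_like(g), where=np.asarray(eps_total) > 0)
    # Eq. (8): growth rate a toward the destination, inhibited when the
    # conflict signal is large and this layer carries much of the flow
    theta = np.where(np.asarray(dR, dtype=float) > 0,
                     a + (b - np.asarray(c_tilde, dtype=float)) * share, 0.0)
    g_new = g + theta * g          # numerator of Eq. (9)
    return g_new / g_new.sum()     # renormalize so the components sum to 1
```

The defaults a = 0.3 and b = 0.65 follow Table 1.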
2.6. Route Selection The route guidance information transmitted to a vehicle is selected with probability p_ij^h corresponding to the destination h. Let u_j be the coefficient that restricts a U-turn, defined by

u = {u_1, u_2, u_3, u_4}.  (10)

The route selection probability p_ij^h is introduced with u_j as follows:

p_ij^h = u_j g_ij^h / Σ_{j=1}^{n_i} u_j g_ij^h.  (11)
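Route selection by Eq. (11) thus reduces to masking the guidance vector with the U-turn coefficients and renormalizing; a small sketch with illustrative values:

```python
import numpy as np

def route_probabilities(g_h, u):
    """Eq. (11): route selection probabilities p_ij^h, formed by masking
    the guidance vector g_i^h with the U-turn coefficients u of Eq. (10)
    and renormalizing."""
    w = np.asarray(u, dtype=float) * np.asarray(g_h, dtype=float)
    return w / w.sum()

# e.g. suppress direction 2 (the U-turn direction) and renormalize
p = route_probabilities([0.4, 0.3, 0.2, 0.1], u=[1, 1, 0, 1])
# a guided vehicle then samples its direction:
# direction = np.random.choice(4, p=p)
```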
3. Simulations 3.1. Vehicle Model We use the vehicle model proposed in [11] and a road model that has a straight lane and a right-turn lane. In the vehicle model, the desired velocity of vehicle m is determined by

v_m*(t) = (v_max / 2) [1 + tanh((x_m − v_c) / w)] if x_m > x_min, and v_m*(t) = 0 otherwise,  (12)

where x_m is the distance from vehicle m to the preceding vehicle m + 1, v_max is the speed limit, x_min is the minimum distance between vehicles, and v_c and w are constants. At a red light, if vehicle m is at the head of the vehicles on the lane, x_m is the distance from vehicle m to the intersection S_i instead of the distance to
the preceding vehicle. Each vehicle follows the desired velocity v_m* with acceleration/deceleration a_m(t), which is defined based on the difference between the desired velocity v_m*(t) and the current velocity v_m(t):

a_m(t) = a_max^+ if Δv_m(t) > a_max^+; a_m(t) = Δv_m(t) if a_max^− ≤ Δv_m(t) ≤ a_max^+; a_m(t) = a_max^− if Δv_m(t) < a_max^−,  (13)

where Δv_m(t) = v_m*(t) − v_m(t), and a_max^± are the maximum acceleration and deceleration. v_m is then updated with a_m(t) as follows:

v_m(t + Δt) = v_m(t) + a_m(t) Δt.  (14)
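The car-following model of Eqs. (12)-(14) can be sketched directly. Here v_max and the acceleration limits follow Table 1, while v_c, w, x_min, and the time step are illustrative values that the paper does not specify:

```python
import math

V_MAX, A_PLUS, A_MINUS = 13.89, 1.5, -5.0   # from Table 1
V_C, W, X_MIN, DT = 10.0, 5.0, 2.0, 0.1     # illustrative values (assumed)

def desired_velocity(x_m):
    """Eq. (12): desired velocity of vehicle m given its headway x_m."""
    if x_m <= X_MIN:
        return 0.0
    return 0.5 * V_MAX * (1.0 + math.tanh((x_m - V_C) / W))

def step_velocity(v_m, x_m):
    """Eqs. (13)-(14): clamp the velocity gap to the acceleration limits
    and integrate one time step."""
    dv = desired_velocity(x_m) - v_m        # delta v_m(t)
    a_m = min(max(dv, A_MINUS), A_PLUS)     # Eq. (13)
    return v_m + a_m * DT                   # Eq. (14)
```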
3.2. Comparative Method In order to evaluate SRGS, we introduce a simple route selection method with probability p̃_ij^h given by the normalized direction vector d_i^h from the intersection S_i to the destination h:

d_i^h = {d_i1^h, d_i2^h, d_i3^h, d_i4^h},  (15)

where d_ij^h ≥ 0 and Σ_{j=1}^4 d_ij^h = 1. The probability p̃_ij^h is defined with d_ij^h and the u_j of Eq. (10) as follows:

p̃_ij^h = u_j d_ij^h / Σ_{j=1}^{n_i} u_j d_ij^h.  (16)

Using this simple route selection, vehicles avoid roundabout routes and the traffic flow is dispersed. This route selection is referred to as the basic direction route guidance in what follows. 3.3. Simulation Settings Suppose a traffic signal network with a 10 × 10 grid. We examine the performance of SRGS in several traffic situations. Vehicles flow into the signal network from specified intersections called the inflow intersections (Figure 5(a)). Each vehicle selects its destination randomly with equal probability. To avoid unnecessary roundabout route selection, the initial guidance vectors are set to g_i^h = d_i^h. The distance between the signals is set to 200 meters, and the other parameters are listed in Table 1. In this simulation, we compare the following three methods. 1. Vehicles follow the navigation of the method proposed in this paper. 2. Vehicles follow the navigation of our previous method proposed in [10]. 3. Vehicles follow the navigation of the basic direction route guidance. To observe the performance under dense traffic, we examine the following two cases. Case 1: The inflow amount of vehicles is set to 5 vehicles/min. Case 2: The inflow amount of vehicles is set to 10 vehicles/min. A dynamic environment, where the traffic situation changes according to Figure 5, is assumed in all cases.
Table 1. Simulation parameters
  v_max     maximum velocity            50 km/h (13.89 m/s)
  a_max^+   acceleration                1.5 m/s^2
  a_max^-   deceleration                -5.0 m/s^2
  K         coupling constant           0.1
  α, β      updating of ω               0.5, 0.05
  ω̃_i^*     base natural frequency      0.10147
  ρ_im      road capacity               80 vehicles
  μ, c      turn sensitivity constants  30, 15
  γ         decay rate                  0.8
  a, b      updating of g               0.3, 0.65
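For reproduction, the parameters of Table 1 translate directly into a configuration mapping; the key names here are our own, not the paper's:

```python
# Simulation parameters from Table 1 (key names are ours)
PARAMS = {
    "v_max": 13.89,         # maximum velocity (m/s)
    "a_plus_max": 1.5,      # acceleration (m/s^2)
    "a_minus_max": -5.0,    # deceleration (m/s^2)
    "K": 0.1,               # coupling constant
    "alpha": 0.5,           # updating of omega
    "beta": 0.05,           # updating of omega
    "omega_base": 0.10147,  # base natural frequency
    "rho_im": 80,           # road capacity (vehicles)
    "mu": 30,               # turn sensitivity constant
    "c": 15,                # turn sensitivity constant
    "gamma": 0.8,           # decay rate
    "a": 0.3,               # updating of g
    "b": 0.65,              # updating of g
}
```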
Figure 5. Traffic situations used in simulations: (a) traffic situation 1 (t = 0∼20,000); (b) traffic situation 2 (t = 20,000∼40,000); (c) traffic situation 3 (t = 40,000∼60,000). In each situation, vehicles enter at the input-flow intersections and travel toward the destinations A-D.
3.4. Simulation Results 3.4.1. Case 1 The average velocity of all vehicles is used to assess the performance of our method. E denotes the average asymmetry evaluation over all units, defined by

E = (1 / (N_i n_i)) Σ_{i=1}^{N_i} Σ_{j=1}^{n_i} |ε_ij − ε_i,k+1| / |ε_ij + ε_i,k+1|.  (17)

A higher value of E implies that the asymmetry of the traffic flows is large, which facilitates the offset adjustment of the signal network. The average velocity for Case 1 is shown in Figure 6(a). For the proposed method, the average velocity of all vehicles during t = 0∼20,000 is 8.59 m/s = 31.0 km/h; for the previous method, it is 8.25 m/s = 29.7 km/h; and for the basic direction route guidance, it is 7.36 m/s = 26.5 km/h. The average asymmetry evaluation is shown in Figure 6(b). The average asymmetry of both the proposed method and the previous method is higher than that of the basic direction route guidance. Figure 7 shows the average traffic flow corresponding to each destination during t = 0∼20,000. These results show that the conflict of traffic flows was successfully avoided and the offset adjustment of the signal network was facilitated by the efficient multi-
Figure 6. Simulation results of Case 1: (a) the average velocity of all vehicles (the traffic situation is changed at t = 20,000 and 40,000); (b) the average asymmetry evaluation of all guidance units.
Figure 7. Average traffic flows toward each destination during t = 0∼20,000
layered vector fields formed by the proposed method. Regarding adaptability to dynamic traffic situations, the traffic situation was altered at t = 20,000 and t = 40,000; the proposed method was able to cope with these changes effectively, although the average velocity temporarily declined shortly after t = 20,000 and t = 40,000. 3.4.2. Case 2 A dense traffic situation is examined in Case 2. Figure 8 shows the simulation results. For the proposed method, the average velocity during t = 0∼20,000 is 7.62 m/s = 27.4 km/h, while it is 6.70 m/s = 24.1 km/h for our previous method and 6.88 m/s = 24.8 km/h for the basic direction route guidance. When our previous method was applied, the average velocity declined to the same level as that of the basic direction route guidance, whereas the new method maintained the traffic efficiency. The proposed method outperformed the other methods in all cases. 4. Conclusions In this paper, we proposed a new self-organizing method for the multi-layered guidance vector fields in SRGS. With the newly proposed approach, SRGS navigates the traffic flows toward their destinations so as to avoid conflicts of the traffic flows, and facilitates the offset adjustment of the self-organizing signal network under dense
Figure 8. Simulation results of Case 2: (a) the average velocity of all vehicles (the traffic situation is changed at t = 20,000 and 40,000); (b) the average asymmetry evaluation of all guidance units.
and dynamic traffic situations. In general, each vehicle has its own destination; hence, techniques should be introduced to integrate similar destinations into a roughly defined destination, which is part of our future work.
References
[1] S. Tsugawa, M. Aoki, A. Hosaka, and K. Seki. A survey of present IVHS activities in Japan. Control Engineering Practice, 5(11):1591-1597, Nov. 1997.
[2] H. Kojima. Radio communication technology on VICS and next emerging systems in ITS. IEEE Conference on Intelligent Transportation System, pages 26-32, Nov. 1997.
[3] M. Yamaguchi, T. Kitamura, S. Jinno, and T. Tajima. The interactive CDRG using infrared beacons. IEEE International Conference on Intelligent Transportation Systems, pages 278-283, Oct. 1999.
[4] H. Hiraishi, H. Ohwada, and F. Mizoguchi. Time-constrained heuristic search for practical route finding. 5th Pacific Rim International Conference on Artificial Intelligence, pages 389-398, Nov. 1998.
[5] L. Fu. An adaptive routing algorithm for in-vehicle route guidance systems with real-time information. Transportation Research Part B, 35(8):749-765, Sep. 2001.
[6] F.P. Deflorio. Evaluation of a reactive dynamic route guidance strategy. Transportation Research Part C, 11(5):375-388, Jul. 2003.
[7] I. Porche, M. Sampath, R. Sengupta, Y.-L. Chen, and S. Lafortune. A decentralized scheme for real-time optimization of traffic signals. IEEE International Conference on Control Applications, pages 582-589, Sept. 1996.
[8] T. Misawa, H. Kimura, S. Hirose, and N. Osato. Multiagent-based traffic signal control with reinforcement learning. IEICE Transactions, J83-D-I(5):478-486, May 2000.
[9] K. Sekiyama, J. Nakanishi, I. Takagawa, T. Higashi, and T. Fukuda. Self-organizing control of urban traffic signal network. IEEE International Conference on Systems, Man, and Cybernetics, pages 2481-2486, Oct. 2001.
[10] K. Sekiyama and Y. Ohashi. Distributed route guidance systems with self-organized multi-layered vector fields. Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII), 9(2):106-113, Mar. 2005.
[11] S. Tadaki, M. Kikuchi, Y. Sugiyama, and S. Yukawa. Coupled map traffic flow simulator based on optimal velocity functions. Traffic and Granular Flow '97, pages 373-378, Apr. 1998.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Object Transportation Using Two Humanoid Robots Based on Multi-agent Path Planning Algorithm
Shotaro Kamio and Hitoshi Iba, Graduate School of Frontier Sciences, The University of Tokyo
Abstract. The cooperation of several robots is needed for complex tasks. However, it is difficult to manually operate multiple humanoid robots in order to make them cooperate with each other. To automate the operation, a path planning method is required to navigate the robots to the appropriate positions. In this paper, the target task is a cooperative object transfer task using two humanoid robots. First, we present a multi-agent planning algorithm based on a random sampling method (RRT). This method requires neither the exact sub-goal positions nor the times at which cooperation occurs, and its performance is superior to that of the normal RRT method. Next, the result of an experiment using two humanoid robots is shown. Keywords. Path planning, humanoid robots, object transportation, cooperation, multi-agent.
1. Introduction In recent years, advances have been made in the development of humanoid robots, and individual basic functions such as walking control [1], motion planning [2,3], and recognition of the surroundings [4] are now being put into practical use. Because humanoid robots' bodies are shaped similarly to a human's, they have the advantage of being able to perform tasks in environments designed for human use. An important focus in the future is the combination of these basic functions into more advanced skills. Cooperative behavior is one example of such skills: the cooperation of robots enables them to carry out complex tasks efficiently, such as transferring objects [5]. Several studies have been conducted on cooperative operations between humans and humanoid robots [6]; very few, however, have addressed cooperative operation among humanoid robots themselves [7]. If the task is complicated, an operator is forced to send many instructions to the robots in order for the task to be achieved, and performing cooperative tasks increases the instructions required because the number of robots increases. In order to put this into practice, ensuring that the instructions provided to the robots are simple will benefit the operator. For this purpose, we have proposed a multi-agent cooperative path planning algorithm [8]. This algorithm is based on the Rapidly-exploring Random Trees (RRT) method [9]. Our algorithm automatically generates appropriate sub-goals for robots to
Figure 1. An example of the environment and the generated sub-plans: (a) an environment with three robots; (b) sub-plans generated toward one of the sub-goals for sub-task #1.
perform the object transportation task. The performance of the algorithm is superior to that of the normal RRT method. In this paper, we apply the algorithm to humanoid robots and present the result of this experiment. This paper is structured as follows: the next section describes the target task. Section 3 outlines our planning algorithm. Section 4 explains the humanoid robot and how the algorithm is applied to it. Section 5 describes the experimental result, Section 6 discusses the result, and Section 7 concludes the paper.
2. The Target Problem The problem targeted in our research is the task of having humanoid robots cooperate in carrying an object from a start position to a goal position. In our project [7], two robots cooperate and transfer the object from one to the other. The state of the environment at a certain point in the future is assumed to be known in advance. In this study, we use the humanoid robots "HOAP-1" manufactured by Fujitsu Automation Limited. The state of each robot is expressed using 3D information (x coordinate, y coordinate, and rotation angle around the z axis). First, the robot that has the object at the start position delivers it to another robot. The robot that receives the object then transfers it on to yet another robot. The last robot carries the object to the goal point. In order to transfer the object, the robots have to cooperate and move to positions close enough that the object can be passed from one to another. In this research, the following information is known in advance. • Models of the environment and robots • Goal position for the object • Procedure for cooperation
Table 1. An example of cooperation procedure with three robots.
  Sub-task 1: Transfer the object from Robot #1 to Robot #2.
  Sub-task 2: Transfer the object from Robot #2 to Robot #3.
  Sub-task 3: Place the object at the goal position by Robot #3.
An example of the environment for three robots is presented in Fig. 1(a). Each robot is in a separate room, divided by walls. The robots are not able to go beyond the walls, but they are able to transfer the object to other robots at points where the walls are low (points labeled "transfer location" in the figure). The goal position for the object is provided, but not the goal positions for any of the robots. The cooperation procedure consists of a series of sub-tasks in which the robots cooperate; an example of the procedure is given in Table 1. As the conditions for a robot to pass the object to another robot, the two robots must position themselves within a distance of 25 cm, and the offset of the angle at which they face each other must be 0.2 radians (approximately 10 degrees) or less. These are the goal conditions for each sub-task and are hereafter referred to as the "sub-goal conditions". Because the experiment targets humanoid robots, it is assumed that they can move freely within the surrounding area, and the environment is treated as holonomic.
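The sub-goal condition can be written as a simple predicate on two (x, y, θ) states. This is a sketch under our own interpretation that "facing each other" means headings roughly π apart:

```python
import math

DIST_MAX = 0.25      # robots within 25 cm
ANGLE_MAX = 0.2      # facing-angle offset within 0.2 rad (about 10 deg)

def satisfies_subgoal(s1, s2):
    """Check the sub-goal condition between two robot states (x, y, theta)."""
    x1, y1, t1 = s1
    x2, y2, t2 = s2
    close = math.hypot(x2 - x1, y2 - y1) <= DIST_MAX
    # wrapped heading difference in [0, pi]; confronting robots differ by ~pi
    diff = abs((t1 - t2 + math.pi) % (2 * math.pi) - math.pi)
    facing = abs(diff - math.pi) <= ANGLE_MAX
    return close and facing
```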
3. The Planning Algorithm for Cooperation 3.1. Path Planning Algorithm Our path planning algorithm finds a collision-free solution path for the task presented in the previous section. The algorithm is superior to the normal RRT algorithm in terms of planning speed and results [8]. The path planning for each sub-task is realized by the following recursive routine (called the plan_ith_subtask routine):
1. Determine the two robots to cooperate in the i-th sub-task based on the cooperation procedure.
2. Generate a pair of sub-goal states for the two robots (plan_ith_subgoal routine).
3. Generate a sub-plan from the current state to the sub-goal state for the robot having the object (plan_subplan routine).
4. Generate a sub-plan from the current state to the sub-goal state for the robot not having the object (plan_subplan routine).
5. Make the final times of the two sub-plans the same.
6. Finish if there are no more sub-tasks. Otherwise, repeat from step 1 for the (i+1)-th sub-task.
Here, the path plan up to the sub-goal for each robot is called the "sub-plan". An algorithm based on RRT is used to generate the sub-plans. Because RRT is a probabilistic algorithm, it sometimes fails to generate a sub-plan (even if, theoretically, a solution for the sub-plan exists). If a failure occurs in generating a sub-plan, backtracking is carried out, and the process is redone from the sub-goal generation step.

Table 2. The specifications of the robot "HOAP-1".
  Height:         about 48 cm
  Weight:         about 6 kg, including 0.7 kg of battery
  Joint mobility: 6 DOF/foot ×2, 4 DOF/arm ×2
  Sensors:        joint angle sensor, 3-axis acceleration sensor, 3-axis gyro sensor, foot load sensor

3.2. Generation of Sub-goals The plan_ith_subgoal routine generates a pair of sub-goal states for the cooperating robots. The sub-goal generation algorithm uses two RRT search trees, rooted at the current states of the two cooperating robots. Branches of the trees are grown toward states that satisfy the sub-goal condition (Sect. 2) with states of the other tree. A pair of states that satisfies the condition becomes the sub-goal. The algorithm is as follows:
1. Initialize search trees τ1 and τ2 with the current states of the robots r1 and r2, respectively.
2. Choose a state xrand by random sampling.
3. Grow a branch from τ1 toward xrand. The state of the extended branch is referred to as x1. If the branch cannot grow, go to step 7.
4. Determine a state xcoop that satisfies the sub-goal condition with x1.
5. Grow a branch from τ2 toward xcoop. If the branch cannot grow, go to step 7.
6. Find a state x2 in τ2 that satisfies the sub-goal condition with x1. If it exists, finish, returning the states x1 and x2 as the sub-goals of robots r1 and r2.
7. Swap the roles of τ1 and τ2 and repeat from step 2.
3.3. Example Result of the Path Planning Algorithm The plan_ith_subgoal routine uses two RRT search trees as described in Sect. 3.2; their connection point becomes the sub-goal. After that, the plan_subplan routine generates a sub-plan for each cooperating robot. Details of the algorithm are described in [8]. For example, applying the algorithm to the environment shown in Fig. 1(a) results in the sub-goals and sub-plans shown in Fig. 1(b).
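The two-tree sub-goal search of Sect. 3.2 can be sketched as follows. This is a heavily simplified illustration: the step size, workspace bounds, the collision-check stub, and the choice of x_coop directly in front of x1 are all our own assumptions, not the paper's implementation:

```python
import math
import random

STEP, DIST_MAX = 0.1, 0.25     # extension step (m); handover distance (m)

def collision_free(state):
    return True                # stub: replace with the environment model

def grow(tree, target):
    """Basic RRT extend: step the nearest tree node toward the target."""
    near = min(tree, key=lambda s: math.hypot(s[0] - target[0], s[1] - target[1]))
    d = math.hypot(target[0] - near[0], target[1] - near[1])
    if d == 0.0:
        return None
    new = (near[0] + STEP * (target[0] - near[0]) / d,
           near[1] + STEP * (target[1] - near[1]) / d,
           target[2])
    if not collision_free(new):
        return None
    tree.append(new)
    return new

def generate_subgoal(r1, r2, iters=20000, size=5.0):
    """Steps 1-7 of the sub-goal generation: grow one tree toward a random
    state, grow the other toward a confronting state x_coop, swap roles."""
    t1, t2 = [r1], [r2]
    for _ in range(iters):
        xrand = (random.uniform(0, size), random.uniform(0, size),
                 random.uniform(-math.pi, math.pi))
        x1 = grow(t1, xrand)
        if x1 is not None:
            # a state 20 cm in front of x1, heading back toward it
            xcoop = (x1[0] + 0.2 * math.cos(x1[2]),
                     x1[1] + 0.2 * math.sin(x1[2]), x1[2] + math.pi)
            x2 = grow(t2, xcoop)
            if x2 is not None and math.hypot(x2[0] - x1[0],
                                             x2[1] - x1[1]) <= DIST_MAX:
                return x1, x2  # sub-goal pair (facing holds by construction)
        t1, t2 = t2, t1        # step 7: swap the roles of the two trees
    return None
```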
4. Application to a Real Robot 4.1. The Real Robot and the Environment The robot used in this research is “HOAP-1”, which is manufactured by Fujitsu Automation Limited. It is a humanoid with 20 degrees of freedom. The specifications of the robot are shown in Table 2. This robot acts on commands given by a host computer (RT-Linux
Figure 2. The environment used for the experiment with two humanoid robots: (a) the environment model; (b) the real environment corresponding to the model.
Figure 3. The path plans generated by our algorithm: (a) the path generated frequently; (b) the path generated rarely.
operating system). It is equipped with a CCD camera in its head, which provides image data for recognition of the environment. It is too difficult to implement the same environment as the three-robot one presented in Fig. 1(a); in this study, the experiment is conducted in a simplified environment (Fig. 2(a)). The environment has two rooms with two robots, and their cooperation procedure consists of a single sub-task in which robot #1 transfers the object to robot #2. Figure 2(b) presents the real environment used. 4.2. The Plan and Control of the Robot Figure 3 shows the path plans generated by our algorithm for this environment. The generated paths divide into two types: paths that move towards the bottom of the environment (Fig. 3(a)) and those that move towards the top (Fig. 3(b)). Paths like the one in Fig. 3(a) are much more likely to be generated, because they are shorter and simpler than the other type. However, because the robot has only a limited set of discrete actions, it cannot execute the paths directly. We decided to use three actions for the robot: "forward-moving", "left-
Figure 4. An example of the smoothed path and the trajectory of determined actions.
turn” and “right-turn”. Therefore, planned paths have to be converted into a sequence of these actions for real robots. In the conversion, at first, the planned path is smoothed. After that, actions are determined to go along with the path. Then, the actions are transferred to the robot to be executed. The robot actions are determined using the following procedure: 1. Calculate the distance and the angle of the direction of the path along with the face direction of the robot. 2. If the distance (or the angle) is greater than its threshold, move forward (or turn). 3. Repeat from Step 1. An example of the smoothed path and the determined actions is shown in Fig. 4. As can be seen, the trajectory of the determined actions is similar to the smoothed path. A handover process is triggered when execution of the planned actions to the subgoal is finished. This handover process consists of three steps [10]: (1) approach step (i.e., two robots approach each other), (2) alignment step (i.e., two robots align their positions in order to confront each other) and (3) handover step. The movement of the robots is determined based on images from their cameras in these steps. In order to perform the handover process, at least one robot has to detect its partner robot in its camera. If both robots cannot see their partner, they fail the handover.
5. Experiment
Figure 5 shows snapshots of a run with the path presented in Fig. 3(a). At the beginning, the robots turned toward the bottom of the figures to avoid the obstacle (Fig. 5(1)-(2)). Next, they walked and turned to face each other (Fig. 5(3)-(4)). Then the robot holding the object approached the other robot, while the robot without the object stayed in place because it could not see the other robot with its own camera (Fig. 5(5)). After that, the robots recognized each other and executed the handover process cooperatively (Fig. 5(6)-(8)). The task was achieved successfully.
Figure 5. Snapshots of the result with the real robot.
6. Discussion
In the experiment, the success rate was low: the robots sometimes bumped into the obstacles or lost sight of their partner during the actions. In this setup, the robots moved using the actions determined from the planned path; feedback from the robots' video cameras is not used except within the handover process. Because the movement of a robot is affected by the friction between the floor and the robot's soles, its path varies on each run, which makes it difficult to increase the success rate. To improve the success rate, we have to apply a localization method (e.g., Monte Carlo Localization [11]) so that each robot can estimate its position in the environment. When the robot's position differs from the planned one, re-planning should be performed; the robot can then walk avoiding
obstacles by following the re-planned path. We are working on the implementation of the re-planning algorithm and the localization method.
7. Conclusion
This paper described our planning algorithm, which performs cooperative object transportation using multiple agents. The algorithm was applied to a two-humanoid-robot environment, and the result was presented. An improvement needed to increase the success rate was discussed. As future work, we intend to conduct experiments using three humanoid robots.
Acknowledgement
The authors thank Dr. Yutaka Inoue for his development of the two-robot handover system [10].
References
[1] K. Kaneko, F. Kanehiro, S. Kajita, H. Hirukawa, T. Kawasaki, M. Hirata, K. Akachi, and T. Isozumi, “Humanoid Robot HRP-2,” in Proc. of the 2004 IEEE International Conference on Robotics and Automation (ICRA 2004), 2004.
[2] J. Kuffner, K. Nishiwaki, S. Kagami, M. Inaba, and H. Inoue, “Motion planning for humanoid robots,” in Proc. 11th Int'l Symp. of Robotics Research (ISRR 2003), 2003.
[3] S. Kagami, J. Kuffner, K. Nishiwaki, K. Okada, M. Inaba, and H. Inoue, “Humanoid arm motion planning using stereo vision and RRT search,” in Proc. of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp. 2167–2172, 2003.
[4] K. Okada, M. Inaba, and H. Inoue, “Walking navigation system of humanoid robot using stereo vision based floor recognition and path planning with multi-layered body image,” in Proc. of the 2003 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, 2003.
[5] J. Ota, N. Miyata, T. Arai, E. Yoshida, D. Kurabayashi, and J. Sasaki, “Transferring and regrasping a large object by cooperation of multiple mobile robots,” in Proc. of IROS 95, 1995.
[6] K. Yokoyama, H. Handa, Y. Hukase, K. Kaneko, F. Kanehiro, Y. Kawai, F. Tomita, and H. Hirukawa, “Cooperative works by a human and a humanoid robot,” in Proc. of ICRA 2003, pp. 2985–2991, 2003.
[7] Y. Inoue, T. Tohge, and H.
Iba, “Cooperative transportation by humanoid robots – learning to correct positioning –,” in Proc. of the Hybrid Intelligent Systems conference (HIS 2003), pp. 1124–1133, 2003.
[8] S. Kamio and H. Iba, “Random sampling algorithm for multi-agent cooperation planning,” in Proc. of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), Edmonton, Alberta, Canada, pp. 1676–1681, August 2–6, 2005.
[9] S. M. LaValle and J. J. Kuffner, “Rapidly-exploring random trees: Progress and prospects,” in Algorithmic and Computational Robotics: New Directions (B. R. Donald, K. M. Lynch, and D. Rus, eds.), pp. 293–308, Wellesley, MA: A K Peters, 2001.
[10] Y. Inoue, An implementation of a framework for cooperative multi-agent system using humanoid robots (in Japanese). PhD thesis, Graduate School of Frontier Sciences, The University of Tokyo, March 2005.
[11] D. Fox, W. Burgard, F. Dellaert, and S. Thrun, “Monte Carlo Localization: Efficient position estimation for mobile robots,” in AAAI/IAAI, pp. 343–349, 1999.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Cognitive map plasticity and imitation strategies to extend the performance of a MAS P. Laroque, E. Fournier, P. H. Phong and P. Gaussier Neurocybernetic team, ETIS, CNRS (UMR 8051) / ENSEA / UCP Université de Cergy-Pontoise, 2 rue Adolphe Chauvin, 95302 Cergy-Pontoise cedex Abstract. This paper describes the second step of a collaborative work aimed at showing how a system composed of a collection of situated agents can solve several non-trivial planning and optimisation problems. This work deals with the problem of facing dynamically changing environments, and with how to identify individual agents (to optimize imitation performance), which should enable us, in a future third step, to make agents able to act on their environment. Keywords. Embodied Intelligence, Biomimetic Autonomous Systems, Cognitive map, Learning, Neural networks
1. Introduction
In previous works, we described how cognitive maps could be used by an agent to solve problems involving contradictory goals [1], and how a simple imitation (actually, agent-following) strategy could lead a population of such agents to dramatically increase its performance when facing the problem of surviving in an initially unknown environment [2]. One important characteristic of such a system is the possible emergence of stable social subgroups of agents, where each stable subgroup can be seen as a possible solution to the survival problem. Nevertheless, our study was limited to static geographical environments, and the agents were indistinguishable from one another. These limitations prevented the model from giving complete information about the emergence of subgroups. This paper reports on an evolution of our model and system in those two directions. On the one hand, we study how an agent can take advantage of its ability to reinforce preferred goal-reaching strategies (and forget sub-optimal ones) to adapt to a dynamically changing environment. On the other hand, to show how stable social groups can emerge, we designed a mechanism of dynamic individual signatures used to distinguish agents from one another; this signature can then also be used to propose another imitation strategy. Experiments have been carried out which suggest that our model and system are now mature enough to deal with complex problems such as optimisation, and to reproduce some unexpected results established, for instance, in spatial economics, such as unemployment traps [3,4].
P. Laroque et al. / Cognitive Map Plasticity and Imitation Strategies
Figure 1. An animat (the small triangle) in its environment, composed of landmarks (crosses), obstacles (rectangles), resources (labelled circles) and possibly other animats. Visible landmarks are linked to the selected animat, whose visibility radius is shown by the grey circle around it.
2. Material and Method
A complete description of the client/server system used to simulate the agents can be found in [2], so we only mention the most salient features and properties of the model. Animats [5,6] live in an initially unknown environment, made of several points of interest (Fig. 1):
• animats (triangles);
• resources (circles labelled “water”, “food” and “nest”);
• obstacles (solid squares);
• landmarks (small crosses), visible from anywhere except when occluded by an obstacle;
• other agents, if any.
Such an environment can be built using a graphical map builder which is part of the system (server part). The animat can only see objects that are within its visibility range (the disk that surrounds it), except for landmarks, which can be thought of as fixed stars, and as such visible from any location (unless occluded by obstacles). In our simulator, the perception of the environment is given by the server to the agent at each time step. To survive, the animat needs to discover, and periodically return to, the three types of resources (water, food and the animat's nest). Associated with each resource type is a numerical level (thirst, hunger and stress, a percentage of satisfaction, called essential variables in the rest of this paper) that decreases exponentially over time. When one of the essential variables falls below a given threshold, the animat tries to reach a previously discovered, corresponding source.
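The essential-variable dynamics described above can be sketched as follows; the decay rate and the threshold are hypothetical values, since the paper gives no numbers.

```python
import math

# Illustrative sketch of the "essential variables" dynamics; the decay rate
# and threshold below are assumptions, not values from the paper.
DECAY = 1e-3       # per-timestep decay rate (hypothetical)
THRESHOLD = 30.0   # satisfaction level (%) below which a source is sought

class EssentialVariables:
    def __init__(self):
        # thirst, hunger and stress start fully satisfied (100 %)
        self.levels = {"water": 100.0, "food": 100.0, "nest": 100.0}

    def step(self):
        """Exponential decay of each satisfaction level over one timestep."""
        for k in self.levels:
            self.levels[k] *= math.exp(-DECAY)

    def needed_resources(self):
        """Resources the animat should now plan to reach back."""
        return [k for k, v in self.levels.items() if v < THRESHOLD]

    def replenish(self, resource):
        """Reaching a source resets the corresponding variable to its maximum."""
        self.levels[resource] = 100.0
```

With these assumed values, a variable crosses the threshold after roughly ln(100/30)/0.001 ≈ 1200 timesteps if the animat never visits the corresponding source.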
The animat's possible strategies fall into four categories, in increasing priority order: random exploration, imitation, planning to reach back a known source, and obstacle avoidance. During its random exploration of the unknown environment, the animat stores the acquired topological knowledge in a map of transition cells [7]. Those cells are built on the principle of the place cells that have been discovered in the rat's hippocampus [8]. Each cell codes for a vector of angles (the azimuths under which each landmark is seen) representing a known position reached by the animat. At each timestep, a distance is computed between the current perception of the environment and the known positions; if the minimum distance is above a certain threshold, the current location is learned by a new cell. Those cells are then linked together, at a different level, to form the cognitive map [9] of the animat. When another animat is in sight, the agent can choose (using a probabilistic decision) to follow it, in the hope of being led to an unknown resource. When the value of one of its essential variables (hunger, thirst, or stress) reaches its bottom level, the animat needs to return to the corresponding resource, to have the variable reset to its maximum value. Finally, obstacle avoidance is based on a Braitenberg [10] algorithm. Coupling the on-line building of the cognitive map with the possibility for the agent to adopt an imitation strategy allows for knowledge transmission among agents. In that respect, the system can be compared with swarm intelligence-based systems; the main difference is that our architecture has no direct physical impact, such as pheromones, on the environment.
2.1. Cognitive map plasticity
We showed in [1] how adding a learning rule on the building and evolution of the cognitive map could help the agent acquire significantly smarter behaviours, for instance solving contradictory goals.
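The recruitment of transition/place cells described above (comparing the current azimuth vector against known cells, and recruiting a new cell when no cell is close enough) can be sketched like this; the distance measure and the threshold value are our assumptions.

```python
import math

# Sketch of place-cell recruitment: the perception is a vector of landmark
# azimuths; a new cell is recruited when every known cell is too far away.
RECRUIT_THRESHOLD = 0.3  # mean azimuth difference in radians (hypothetical)

def azimuth_distance(p, q):
    """Mean absolute angular difference between two azimuth vectors."""
    diffs = [math.atan2(math.sin(a - b), math.cos(a - b)) for a, b in zip(p, q)]
    return sum(abs(d) for d in diffs) / len(diffs)

def update_cells(cells, perception):
    """Return the index of the winning cell, recruiting a new one when the
    closest known cell is farther than the threshold."""
    if cells:
        best = min(range(len(cells)),
                   key=lambda i: azimuth_distance(cells[i], perception))
        if azimuth_distance(cells[best], perception) <= RECRUIT_THRESHOLD:
            return best
    cells.append(list(perception))
    return len(cells) - 1
```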
The equation ruling this Hebbian learning algorithm [11] is as follows:

δwij(t) = −λ wij(t) + α δij acti(t) · actj(t)   (1)
where wij(t) is the weight of the link between two successively reached cells i and j in the cognitive map, actc(t) is the activity of cell c at time t, and δij is 1 if and only if link ij is fired at time t. The current state of our model and system aims at making agents evolve in a dynamically changing environment, where some sources can disappear when intensively visited for a long time, and others can randomly appear somewhere in the environment. When a planning agent tries to reach a previously known source and realizes that this source has expired, two things happen: (i) the agent dissociates the current place cell from the formerly corresponding resource, and (ii) it removes the resource from its set of known resources. Since the place cell no longer fires when the agent feels the need for this resource, the use of transitions leading to this place is likely to be progressively forgotten. Similarly, when a new, matching resource is discovered, the paths leading to it are rapidly reinforced, making the cognitive map evolve synchronously with the environment. This evolution is illustrated in Fig. 2, where the left snapshot is taken before the (only) dynamic resource has expired (t = 10000 time steps), whereas the right picture represents the map after the agent has discovered a new
Figure 2. Cognitive map evolution induced by a changing environment. Sources are indicated with a black solid circle. The roads leading to the food source in the left snapshot (top of figure) of the cognitive map are partially forgotten when the source has changed (previous location in light grey, new location in black), whereas new paths have appeared near the new source location (right side of the right snapshot of the cognitive map).
matching source elsewhere in the environment (t = 25000 time steps). With respect to eq. (1), α = 5×10⁻² and λ = 10⁻⁶. We can see that some of the paths leading to the old resource location have been partially forgotten, and that new paths have emerged.
2.2. Identifying individual agents
The fact that agents were indistinguishable from one another was a problem in various respects: no agent could be sure of following the same agent when encountering a whole group; statistical results could not separate individuals and detect whether one agent spends most of its time in the vicinity of another; etc. To solve those problems, we decided to add an individual signature to each agent. This signature evolves over time, as agents meet each other, in a way inspired by the talking heads experiments ([12], evolution of a shared lexicon in a population of embodied software agents; the agents developed their vocabulary by observing a scene through digital cameras and communicating about what they had seen together). We chose to encode the signature as a short integer holding a vector of two eight-bit coordinates, so as to be able to map it onto a space isomorphic to the geographical space (see Fig. 3, where agents appear as triangles and signatures as diamonds); this allows the evolution of signatures to be followed both on the map and in time. Signatures are thus part of what is perceived by an agent, just like points of interest. When a new agent appears, its initial location is chosen randomly and its initial signature is the vector of this location. The evolution of agents' signatures is ruled by their meetings with other agents: each time an animat decides to imitate another animat, its signature changes slightly to get closer to that of the imitated agent.
We can expect two consequences from this dynamic evolution of signatures: on the one hand, it highlights the formation of subgroups of agents whose members imitate each other more often than they imitate members of other groups; on the other hand,
Figure 3. Agents (triangles in the left picture) and their signatures (diamonds in the right picture). Signatures, being two-coordinate vectors, can be projected onto the same physical space as the agents, which shows where emerged subgroups of agents spend most of their time. Here, for instance, one can distinguish two main subgroups: one near the upper edge of the central wall, the other near its left side.
agents can adopt a strategy based on signatures to choose which agent to imitate: they can choose the one whose signature is closest to their own, assuming that its goals are close to theirs. The equation that describes the variation of signatures is as follows:

Si(t + 1) = Si(t) + δi (Sj(t) − Si(t)) + noise(t)   (2)
where Si(t) and Sj(t) are respectively the signatures of the imitating and the imitated agent at time t, and δi represents the decision to imitate (a decreasing probabilistic function of the age of agent i: the older the animat, the less probable the decision to imitate). To avoid a global convergence to a unique signature, noise is systematically added to each agent's signature at each timestep. Indeed, without noise the basic mechanism described by this simplified equation only makes signatures get closer to each other: it is thus likely that, after a certain amount of time, random agent meetings would have led to a single, global agent signature. This mechanism is similar to the Kohonen LVQ algorithm [13].
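A minimal sketch of the signature update of eq. (2). For clarity we treat δi as a deterministic age-dependent gain rather than a probabilistic decision; the noise amplitude, the clipping to the eight-bit coordinate range, and the exact form of the age dependence are all our assumptions.

```python
import random

# Sketch of the signature-update rule of eq. (2); the constants and the
# form of the age-dependent gain are hypothetical.
NOISE_AMPLITUDE = 0.5

def imitation_rate(age):
    """Decreasing gain: the older the animat, the weaker the pull
    toward the imitated agent's signature (assumed functional form)."""
    return 1.0 / (1.0 + 0.01 * age)

def update_signature(s_i, s_j, age_i):
    """S_i(t+1) = S_i(t) + delta_i * (S_j(t) - S_i(t)) + noise(t),
    applied componentwise to the two eight-bit coordinates."""
    delta = imitation_rate(age_i)
    return tuple(
        min(255.0, max(0.0,
            si + delta * (sj - si)
            + random.uniform(-NOISE_AMPLITUDE, NOISE_AMPLITUDE)))
        for si, sj in zip(s_i, s_j)
    )
```

Repeated meetings pull an imitator's signature toward its model's, while the noise term keeps the population from collapsing onto a single signature, as discussed above.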
3. Experiments and Results
We compare the effect of two imitation strategies on the emergence of subgroups: (i) the strategy described in [2], in which an agent encountering a group of other agents decides to imitate the one whose heading is closest to its own direction (called here imitation from azimuth), and (ii) the strategy relying on signatures, in which the agent chooses to imitate the one whose signature is closest to its own (imitation from signatures). One interesting characteristic of our system is that it can deal with problems having several (families of) solutions. For instance, if we group the three types of resources together in a small area of the map, and do the same in another area
Table 1. Influence of the imitation strategy on the formation of subgroups (2 villages). We report the number of experiments that led to the emergence of 1, 2 or 3 subgroups, when imitation is based on the azimuth (left column) or on the signature (right column).

  # experiments        azimuth   signature
  1 group                   15           0
  2 subgroups                6          14
  3 subgroups                0           7
  4+ subgroups               0           0
  total experiments         21          21
of the same map, then the trajectories of the agents can stay closely tied to one of the two possible “villages”. We compared the imitation strategies from two points of view: (i) the emergence of subgroups in a “multi-village” environment, and (ii) the survival rate of a population.
3.1. Influence of the imitation strategy on the emergence of subgroups
We used the environment shown in Fig. 3, containing two separate “villages”. For each experiment, we launched 50 agents randomly into the environment and waited for 20000 time steps (approximately the time needed for the agents' cognitive maps to tile the whole environment). We then studied the set of signatures to determine the number of subgroups formed. We repeated the experiment 42 times, 21 times with each of the two imitation strategies, and reported, for each experiment, the number of subgroups that emerged. The results are given in Table 1. As expected, the possibility of distinguishing individual agents leads to a more stable way of choosing which agent to imitate, and thus to a larger number of subgroups. More interestingly, whereas the number of subgroups never exceeds the number of villages when agents imitate from azimuth, we found several cases where imitation from signatures leads to such a situation. This is a clear indication of the greater stability of the groups in this case: the “cost” of imitating someone who is not part of one's own group is higher, owing to the distance between the two signatures. Moreover, in the cases where three subgroups emerge, two of them are related to a village and one is made of individuals that “oscillate” between the two regions. It is likely that, if the number of villages increases, the number of different subgroups will grow more rapidly, but we did not have time to build experiments verifying that assertion. 3.2.
Influence of the imitation strategy on the survival rate of a population
In this series of experiments, we used the same environment and successively launched 40, 50, then 70 agents, for each of the imitation strategies, and counted the number of agents that survived, or died for not having found all three types of resource. Each experiment was conducted 7 times, and the results are summarized in Table 2. The average number of surviving agents is almost always greater with the signature-based imitation strategy, but what may seem even more curious is that the standard deviation is also significantly higher. Observing what happens during those simulations, we saw that this was due to the creation of some subgroups around one or two agents that had not found the
Table 2. Influence of the imitation strategy on the survival rate of the populations. Average number (and standard deviation) of surviving agents. The much higher standard deviation for signature-based imitation is due to the occasional disappearance of whole groups.

  # agents   azimuth        signature
  40         15.0 +/- 1.5   18.4 +/- 2.4
  50         16.3 +/- 3.1   17.6 +/- 4.9
  70         27.0 +/- 4.8   31.1 +/- 8.3
  average    19.4           22.4
three resource types at the time the group emerged. Consequently, since the chance that an agent follows its peers is high, the result is of the “all or nothing” type: either the resources are discovered by one of the group members and the whole group survives, or they are not and the whole group disappears. This phenomenon can be related to another process, observed in the field of spatial economics and known as “unemployment traps”: well-defined portions of urban territory in which the unemployment rate is significantly higher than anywhere around [4]. Although it might be possible for people living there to find a job a few kilometers away, things are as if people did not try to move outside this small region. In our simulation, although the needed resource is within the reach of the group, it is only seldom discovered because of the strength of the intra-group links.
4. Conclusion
The current step of our model and system allows non-trivial planning and optimisation problems to be solved, with only very few assumptions on the initial knowledge of the agents. The agents can adapt themselves to a changing environment, share partial knowledge of the problem with each other, handle multiple contradictory goals, and find different solutions to the problem when multiple solutions exist. It should now be possible to push the optimisation one step further by making agents able to act on their environment instead of just adapting to it: for instance, carrying some resource from “natural” sources to locations near important paths could dramatically enhance the performance of the global system, as far as the average “satisfaction” level of the agents (the average of an animat's essential-variable values) is concerned.
References
[1] P. Laroque, M. Quoy, and P. Gaussier. Learning and motivational couplings promote smarter behaviors of an animat in an unknown world. In European Workshop on Learning Robots, EWLR, pages 25–31, Prague, September 2002.
[2] P. Laroque, N. Cuperlier, and P. Gaussier. Impact of imitation on the dynamics of animat populations in a spatial cognition task. In IAS-8, pages 71–79, Amsterdam, 2004.
[3] T.C. Schelling. Dynamic models of segregation. Journal of Mathematical Sociology, 1:143–186, 1971.
[4] F. Gaschet and N. Gaussier. Concentration économique et ségrégation spatiale, chapter 12, “Les échelles du mauvais appariement spatial au sein de l'agglomération bordelaise”, pages 221–241. M.A. Buisson and D. Mignot, editors, De Boeck Université, 2005.
[5] J-A. Meyer and S.W. Wilson. From animals to animats. In First International Conference on Simulation of Adaptive Behavior. MIT Press/Bradford Books, 1991.
[6] A. Drogoul and J-A. Meyer, editors. Intelligence Artificielle Située. Hermes, 1999.
[7] P. Gaussier, A. Revel, J.P. Banquet, and V. Babeau. From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological Cybernetics, 86:15–28, 2002.
[8] J. O'Keefe and L. Nadel. The hippocampus as a cognitive map. Clarendon Press, Oxford, 1978.
[9] N. Cuperlier, M. Quoy, Ph. Laroque, and Ph. Gaussier. Transition cells and neural fields for navigation and planning. In J. Mira and J.R. Alvarez, editors, IWINAC 2005, Lecture Notes in Computer Science, pages 346–355. Springer, June 2005.
[10] V. Braitenberg. Vehicles: Experiments in Synthetic Psychology. MIT Press, Cambridge, MA, 1984.
[11] D. Hebb. The Organization of Behavior. Wiley, New York, 1949.
[12] L. Steels. The Talking Heads Experiment. Volume 1. Words and Meanings. Laboratorium, Antwerpen, 1999.
[13] T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, Berlin, 3rd edition, 1989.
Examination of Abilities based on Pseudolite System for Indoor Positioning
Isamu Kitano a,1, Keiji Suzuki a
a Future University-Hakodate
Abstract. Recently, GPS has become indispensable for positioning, and GPS technology is an important piece of social infrastructure for providing position information. However, in some situations, such as in underground spaces or inside buildings, it is very hard to navigate with a GPS receiver. Therefore, in order to improve the performance of satellite-based positioning, the use of pseudolite technology has been proposed. With pseudolites, it is expected that a seamless positioning service can be provided over a wider area without replacing existing GPS receivers. Position information is also required in order to move robots. We have developed a positioning system that makes indoor positioning possible.
Keywords. GPS, Pseudolite, Indoor positioning
1. Introduction
In recent years, GPS has been used in various fields, while the infrastructure surrounding it continues to be upgraded. GPS itself is also evolving: following the cancellation of SA (Selective Availability) in May 2000, the U.S. government is advancing the modernization of the system. Among other GNSSs (Global Navigation Satellite Systems), there are GLONASS of Russia and Galileo of the EU. Recently, Japan decided to develop the Quasi-Zenith Satellite System. Centered on Japan, this system will offer high-quality communication and navigation services for mobile users; the plan aims at deployment around 2008. However, the fundamental problem of GNSS is virtually impossible to solve completely: the system cannot be used in places that the satellite radio waves do not reach. The technology that has the potential to solve this problem is pseudolite technology.
2. Indoor Positioning Technology
At present, indoor positioning technologies such as
1. RFID tags
2. inertial navigation
3. pseudolites
1 Future University-Hakodate, 116-2, Kameda-Nakano, Hakodate, Hokkaido 041-8655 Japan. Tel.: +81 138 34 6229; Fax: +81 138 34 6229; E-mail:
[email protected]
I. Kitano and K. Suzuki / Examination of Abilities Based on Pseudolite System
etc. are appearing. When choosing among them, the cost of user units and of infrastructure preparation, and the connection with satellite-based positioning, need to be considered.
2.1. RFID Tags
Tags storing their own position information are installed, and position information is obtained by communicating with a tag over a radio link. Although each tag is cheap, an enormous number of tags is necessary to obtain reasonable positioning accuracy, so the total cost becomes high, and maintenance after installation is hard because of the large number of tags.
2.2. Inertial Navigation
This method obtains movement distance and direction using gyrocompasses, accelerometers, and similar sensors. Its disadvantage is that errors accumulate, since the travelled distance and angle are integrated from an initial point.
2.3. Pseudolites
A pseudolite is a GPS-like “satellite” installed on the ground. The idea of using pseudolites is older than GPS itself: before the U.S. Department of Defense launched the first GPS satellites, it tested the system concept with ground-based transmitters called “pseudo-satellites”, which was shortened to “pseudolite”. A pseudolite transmits a signal with code-phase and carrier-phase components, with nearly the same timing and format as the satellite signals. A GPS receiver acquires this signal and derives code-phase pseudorange or carrier-phase measurements to be used in the navigation algorithm. The major differences are that a pseudolite typically does not contain a high-accuracy atomic clock, and that the pseudolite's position must be described in geographical terms rather than in orbital elements.
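The error-accumulation drawback noted for inertial navigation (Sect. 2.2) can be illustrated with a toy dead-reckoning loop; the noise level and step length are arbitrary values chosen only to make the effect visible.

```python
import math
import random

# Toy illustration of inertial-navigation drift: small per-step heading
# errors integrate into position error that grows with travelled distance.
def dead_reckon(steps, step_len=1.0, heading_noise_deg=1.0, seed=1):
    rng = random.Random(seed)
    x = y = heading = 0.0
    for _ in range(steps):
        # each measured heading increment carries a small random error
        heading += math.radians(rng.gauss(0.0, heading_noise_deg))
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
    return x, y

# A platform that "believes" it walks straight along the x-axis ends up
# measurably off course, and the drift is unbounded over time.
```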
3. Indoor Positioning System
An indoor positioning system using pseudolites was set up in the gymnasium of Future University-Hakodate. The gymnasium space is about 25×40 m. There are two antennas, of which one plays the base role at a known point and the other the rover role. The rover position is measured by relative positioning, that is, by measuring the relative position between a point whose coordinates are known in advance and an unknown point.
3.1. Construction
The cable from the outdoor GPS antenna, used for synchronization among the pseudolites, is connected to a splitter that distributes the signal to the pseudolites. An amplifier is used between the GPS antenna and the splitter to prevent transmission loss. The GPS signal and the pseudolite signal are mixed using a mixer to synchronize the base and rover receivers. The lengths of all cables are chosen to keep the time delays as equal as possible for precise synchronization (see Figure 1).
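The idea behind relative positioning — errors common to the base and rover cancel when their measurements are differenced — can be illustrated with a one-dimensional toy example. This is a gross simplification of real carrier-phase processing; the numbers are invented.

```python
# Toy sketch of relative (differential) positioning: measurements at the
# base (a known point) and the rover share common errors (e.g. a
# transmitter clock offset), which cancel when differenced per transmitter.

def single_difference(rover_ranges, base_ranges):
    """Difference rover and base pseudoranges transmitter by transmitter,
    so that errors common to both receivers cancel."""
    return [r - b for r, b in zip(rover_ranges, base_ranges)]

# Example: a common clock error of +3.0 m corrupts both receivers equally.
true_rover = [10.0, 12.0, 15.0]
true_base = [8.0, 9.0, 11.0]
clock_error = 3.0
rover_meas = [r + clock_error for r in true_rover]
base_meas = [b + clock_error for b in true_base]
diffs = single_difference(rover_meas, base_meas)
# diffs == [2.0, 3.0, 4.0]: the common error has cancelled, leaving only
# the geometry-dependent range differences used to solve for the rover.
```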
Figure 1. Construction of the indoor positioning system
3.1.1. Transmission antennas
The transmission antennas of the pseudolites are located at the corners of the catwalks in the gymnasium. Each antenna assembly consists of fixing metal fittings and a tripod; a helical transmission antenna is installed on the tripod to give directivity (see Figure 2).
Figure 2. Transmitting antenna
3.1.2. Transmitter
The IN400 from IntegriNautics is used as the pseudolite in this experiment. The IN400 includes a GPS receiver for time synchronization and a pseudolite signal transmitter (see Figure 3).
3.1.3. Configuration of Transmitters
The configuration of the transmitters requires the following factors to be considered:
• PRN
• Pulse Delay
• Attenuation
The PRN is the identification number allotted to the GPS satellites; PRNs 33–37 are assigned to the ground transmitters.

Figure 3. IntegriNautics IN400

The pulse delay mitigates the near-far problem: each transmitter sends the pseudolite signal in pulses, and its delay setting determines when it transmits. The attenuation is a resistance-like setting that changes the strength of the pseudolite signal; the larger its value, the weaker the transmission. These settings must be made for all transmitters (see Figure 4).
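The near-far problem that the pulse delay mitigates can be quantified with a simple free-space path-loss estimate (received power falling off as 1/d²); the distances below are arbitrary examples for a hall of this size, not measured values.

```python
import math

# Numeric illustration of the near-far problem: a receiver close to one
# pseudolite sees it vastly stronger than a distant one, so the strong
# signal can jam acquisition of the weak one unless transmissions are
# pulsed (time-shared) or attenuated.
def power_ratio_db(d_near, d_far):
    """Received-power difference (dB) between two transmitters at the
    given distances, assuming free-space 1/d^2 propagation."""
    return 20.0 * math.log10(d_far / d_near)

# 1 m from one transmitter vs. 40 m from another gives a ~32 dB imbalance.
```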
Figure 4. Setting of the Transmitter
3.1.4. Base and Rover Receivers
For positioning, the base and rover antennas are set at high positions to avoid the influence of persons around them. The rover antenna is fixed on a pole on the truck, with an all-direction prism attached for the surveying instrument. A choke-ring antenna is used for the base receiver to reduce multipath effects. As the GPS receiver, the NovAtel OEM3 is used, which can receive the dual-frequency GPS signal and the pseudolite signal (see Figures 5–8).
4. Experiment
Observations were made with the rover in both static and moving situations. In the static case, measurements were taken at the 3 points shown in Figure 9, for about 5 minutes at each point. In the moving case, two kinds of observation were carried out. In the first, the rover was moved along a straight line, making 3 round trips between r1 and r2 (see Figure 10).

Figure 5. Base
Figure 6. Rover
Figure 7. All-direction prism
Figure 8. NovAtel OEM3 GPS Receiver

In the second, the rover was moved along a path including a curve, making 3 round trips between p1 and p4 (see Figure 11). The points plotted in Figures 9–11 were positioned with a surveying instrument.
5. Result The results computed from the data acquired at each point are shown in Figures 13–17. In the figures, PL1, PL2, PL3 and PL4 correspond to PL33, PL34, PL35 and PL36. The unit is meters. IPos (Indoor Positioning Software, ver. 1.0.0), provided by GNSS Technologies Inc., was used for the position calculations. (see Figure 12) This software takes the base as the datum point and calculates the relative position of each point. The positioning results of s1, s2 and s3 calculated with IPos are shown in Figures 13, 14 and 15 respectively; in each figure, the orange point indicates the real position.
Figure 9. s1-s3 plot
Figure 10. r1-r2 plot
Figure 11. p1-p4 plot
Figure 12. IPOS Tracking Panel
Figure 13. static s1
Figure 14. static s2
Figure 15. static s3
The positioning result of the r1–r2 movement calculated with IPos is shown in Figure 16; the orange line indicates the real locus. The result of the p1–p4 movement is shown in Figure 17; again, the orange line indicates the real locus.
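The positioning error can be quantified by comparing the computed fixes against the surveyed reference point. A minimal sketch in Python (the coordinates below are hypothetical, not the experimental data):

```python
import math

def horizontal_errors(computed, surveyed):
    """Distances between each computed fix and the surveyed true point, in meters."""
    sx, sy = surveyed
    return [math.hypot(x - sx, y - sy) for x, y in computed]

def rms(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical fixes scattered around a surveyed point at the origin:
fixes = [(2.1, -1.3), (1.8, -0.9), (2.5, -1.6)]
errs = horizontal_errors(fixes, (0.0, 0.0))
print(round(rms(errs), 2))  # an RMS error on the order of a few meters
```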
Figure 16. Moving r1-r2
Figure 17. Moving p1-p4
6. Summary An indoor positioning system using pseudolites was constructed, and the rover was observed in static and moving situations in this environment. As a result, positions were calculated with errors of several meters from the real position of each point. An error of several meters, however, is serious indoors, so the constructed system needs to be revised to improve the positioning accuracy.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
341
A Memory-based PID Controller for Indoor Airship Robot
Takamasa Sato 1 and Keiji Suzuki
Future University - Hakodate, JAPAN
Abstract. The purpose of our research is to realize an autonomous airship robot system. An airship robot can move safely and float in the air for a long time with low energy consumption. To control an airship, which has complex behavior, we use a Memory-based PID control method. This method generates PID parameters using input/output data stored in a database. In this paper, we describe the construction of our airship robot platform and the results of goal-directed navigation experiments in a computer simulation.
Keywords. Airship Robot, PID, Motion control, Navigation
1. Introduction In recent years, autonomous robots have become one of the fastest-growing sectors in robotics. Robots that communicate directly with humans are well researched, and the new field of human communication robots has been established. On two-dimensional surfaces, robots can move with wheels, crawlers or multiple legs. However, for communication with humans, or for searching for and rescuing victims of hazards such as earthquakes, robots do not need to move on the ground; robots that can move through the air are more efficient and safer. Today, airplanes, helicopters, airships and others are studied as autonomous mobile robots that can move in three-dimensional space. Airplanes and helicopters are called Heavier-than-air Aircraft (HTA) because they are heavier than air. Airships are called Lighter-than-air Aircraft (LTA) because they are lighter than air, using a levitation gas such as helium. The advantages of HTA are a large carrying capacity (payload) and high-speed movement. Their weak points are that they need much energy to hover and are difficult to use indoors for safety reasons. LTA are safer than HTA because they can levitate without control and have little risk of crashing. The problems of LTA are a small payload and behavior that depends on environmental conditions such as the surrounding temperature and pressure[2]. However, these problems are becoming solvable through the development of various technologies. The payload problem is eased by the small devices made possible by recent technical evolution, and a paradigm shift in the demands on robots, such as amusement applications that do not need highly accurate positioning and movement, makes it less critical. 1 Future
University - Hakodate. 116-2 Kameda-Nakano Hakodate, JAPAN.
[email protected]
T. Sato and K. Suzuki / A Memory-Based PID Controller for Indoor Airship Robot
Figure 1. An Airship Robot
Airship robots have been studied in various contexts. Motoyama et al. present an indoor miniature airship robot controlled with a combination of CMAC-based Q-learning and adaptive construction of the state space with a GA, achieving the task of traveling around three goals on a two-dimensional surface[3]. The experimental results show that their learning algorithm works well and the robot acquires a control rule that achieves the task. Nishimura et al. built a small airship simulator and experimented on a pursuit task with simple Q-learning[1]; from their results, the problem is not simple enough to be solved with plain Q-learning. On the other hand, PID controllers have been used to control many linear and non-linear systems. For non-linear systems, several control methods based on a PID controller have been proposed. These methods tune the PID parameters using artificial neural networks, fuzzy logic or genetic algorithms, but they need a long time to learn. Therefore, a new scheme of PID control based on memory-based modeling has been proposed. In memory-based modeling, local models are generated automatically from input/output data pairs of the controlled object stored in a database. In this paper, we use a Memory-based PID controller[4] for the flight control of an indoor small airship. This method, one of the memory-based modeling methods, can adjust the PID parameters on-line even if the reference signals and/or the system parameters change. We describe the construction of the airship robot hardware, the design of the flight control architecture and the results of experiments in a computer simulation. The experimental results reveal the effectiveness of the Memory-based PID method for controlling an airship.
2. Airship System We are constructing an autonomous control system for an indoor miniature airship to realize an autonomous three-dimensional mobile robot. The airship system consists of an indoor miniature airship 3.5 meters long (see figures 1, 2), a position/location sensing system, a communication system and an autonomous control system. The airship consists of a balloon filled with helium gas, a gondola with propellers, and four tail assemblies. The local coordinate system of the airship is defined as follows: the center of the balloon is the origin, the forward direction is the x-axis, and the system is right-handed. The body
Figure 2. Axis of an Airship Robot
Figure 3. System diagram of an airship
of the airship is an ellipsoid of revolution, rotationally symmetric about the x-axis: 3.5 meters long, 1.68 meters in diameter and 5.3 cubic meters in volume. The airship has three propellers: two propulsion propellers mounted on the sides of the gondola and one rotation propeller at the lower part of the tail assembly. The propulsion propellers rotate in the same direction as each other, so the airship can move forward and backward. In addition, the propulsion direction of these propellers can be rotated from horizontal to vertical, so the airship can also move up and down; we call this change of the propellers' propulsion direction the "tilt" operation. The combination of the tilt operation and normal/reverse rotation of the propulsion propellers makes the system holonomic in the x-z plane. The rotation propeller controls the yaw attitude angle. As a whole, the airship forms a nonholonomic system with a three-dimensional input (propulsion propellers, rotation propeller and tilt operation) and a four-dimensional output (x-, y- and z-position and yaw angle). The communication system carries the motor control signals and internal states such as the flight attitude angles and other sensor values. The RC module of a radio-controlled model plane is used to transmit and receive the motor control signals, and a wireless LAN is used to communicate the internal states. The computer that controls the airship is set up on the ground and controls it through this wireless communication system. A three-dimensional motion sensor that measures the attitude angles is installed in the airship, and a lightwave range-measurement instrument is used for three-dimensional location sensing. The airship system hardware architecture is shown in figure 3.
3. Control System

3.1. Memory-Based PID Controller

A Memory-based PID controller is one of the memory-based modeling methods. It generates PID parameters using input/output data stored in a database. The stored data are defined as the following vectors Φ(j):

$$\Phi(j) := [\bar\phi(j), K(j)], \quad j = 1, 2, \cdots, N(0) \quad (1)$$
$$\bar\phi(t) := [r(t+1), r(t), y(t), \cdots, y(t-n_y+1), u(t-1), \cdots, u(t-n_u+1)] \quad (2)$$
$$K(t) := [K_P(t), K_I(t), K_D(t)] \quad (3)$$

where N(0) is the initial number of database entries, r(t) is the target value at time t, y(t) is the system output at time t, u(t) is the system input at time t, and n_y, n_u are the orders of y(t) and u(t). The PID parameters are updated by the following algorithm.

STEP 1: Create the initial database. The initial vectors of the database Φ are created from pre-calculated PID parameters together with the input/output data obtained experimentally under those parameters. In Takao's original paper this initial PID parameter is fixed, but in this paper we use several parameter sets; as described later, this is due to the reference-model problem in this algorithm.

STEP 2: Calculate the distances and choose neighbors. The distance between the query point $\bar\phi(t)$ and each stored point $\bar\phi(j)$ is calculated with a normalized L1 norm:

$$d(\bar\phi(t), \bar\phi(j)) = \sum_{l=1}^{n_y+n_u+1} \frac{|\bar\phi_l(t) - \bar\phi_l(j)|}{\max_m \bar\phi_l(m) - \min_m \bar\phi_l(m)}, \quad j = 1, 2, \cdots, N(t) \quad (4)$$

where N(t) is the database size at time t. Then the k nearest vectors are chosen in ascending order of the distance d.

STEP 3: Compose a local model from the neighbors. A local model is composed from the k neighbor vectors using a linearly weighted average:

$$K'(t) = \sum_{i=1}^{k} w_i K(i) \quad (5)$$
$$w_i = \sum_{l=1}^{n_y+n_u+1} \left( 1 - \frac{[\bar\phi_l(t) - \bar\phi_l(i)]^2}{[\max_m \bar\phi_l(m) - \min_m \bar\phi_l(m)]^2} \right) \quad (6)$$

After calculating the $w_i$, they are normalized to meet $\sum_i w_i = 1$.
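Steps 2 and 3 above (the normalized-L1 neighbor search of eq. (4) and the linearly weighted average of eqs. (5) and (6)) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the database below is random hypothetical data, and numpy is assumed.

```python
import numpy as np

def local_pid(phi_query, phi_db, K_db, k=3):
    """Eqs. (4)-(6): neighbor search with a normalized L1 distance and a
    linearly weighted average of the stored PID gains."""
    phi_db = np.asarray(phi_db, dtype=float)   # N x (ny+nu+1) information vectors
    K_db = np.asarray(K_db, dtype=float)       # N x 3 stored [KP, KI, KD] triples
    span = phi_db.max(axis=0) - phi_db.min(axis=0)
    span[span == 0] = 1.0                      # guard against degenerate components
    # Eq. (4): normalized L1 distance from the query to every stored point
    d = (np.abs(phi_db - phi_query) / span).sum(axis=1)
    idx = np.argsort(d)[:k]                    # the k nearest neighbors
    # Eq. (6): weights from normalized squared component-wise distances
    w = (1.0 - ((phi_db[idx] - phi_query) / span) ** 2).sum(axis=1)
    w = w / w.sum()                            # normalize so the weights sum to 1
    # Eq. (5): linearly weighted average of the neighbors' gains
    return w @ K_db[idx]

# Hypothetical database: 5 entries with 4-dimensional information vectors.
rng = np.random.default_rng(0)
phi_db = rng.uniform(-1.0, 1.0, size=(5, 4))
K_db = rng.uniform(0.1, 1.0, size=(5, 3))
Kp, Ki, Kd = local_pid(phi_db[0], phi_db, K_db, k=3)
```

Because the normalized weights are non-negative, the returned gains are a convex combination of the gains stored for the neighbors.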
Figure 4. Block diagram of Memory-based PID
In the next step of Takao's paper, K'(t) is modified to decrease the error with the steepest descent method, referring to a reference model of the controlled object. The airship model, however, is highly complex, and the objective of this paper is to simplify the flight control without complex analysis, so we omit this step.

STEP 4: Remove redundant data. To reduce the database size and the calculation cost, redundant data are removed. Redundant data are selected by the following conditions: • points close to the target point that are not among the chosen neighbors, and • points whose PID parameters are similar to those of other points.

STEP 5: Return to STEP 2.

A block diagram of this controller is shown in figure 4. The algorithm above is the outline of the Memory-based PID controller.

3.2. Flight Control Architecture of the Airship Robot

This architecture controls the airship for goal-directed navigation. As described above, the airship is a nonholonomic system. To reduce complexity, the flight control is divided into two phases: rotation (yaw) control and movement control.

Phase 1: Rotation control. First, the airship's heading is turned to the target direction using the Memory-based PID. The deviation is the angle $e_\theta$ between the current direction and the target direction (see figure 5), and the control value corresponds to the propulsion power $m_\theta$ of the tail propeller:

$$m_\theta(t) = K_{P\theta} \cdot e_\theta(t) + K_{D\theta} \cdot \frac{d}{dt} e_\theta(t) \quad (7)$$
No integral term is used in the controller, because unintended movements and rotations of the airship could increase instability[5]. To compensate for the airship's inertia and the control delay, we implement a simple prediction that assumes the airship continues in uniform linear motion:
Figure 5. Rotation Control
Figure 6. Deviation from the Airship to the Target Position
$$e'_\theta(t) = e_\theta(t) + (e_\theta(t) - e_\theta(t-1)) \quad (8)$$

where $e'_\theta(t)$ is the corrected current deviation. Equation (7) is rewritten in discrete form using $e'_\theta(t)$:

$$m_\theta(t+1) = K_{P\theta} \cdot e'_\theta(t) + K_{D\theta} \cdot \frac{e'_\theta(t) - e'_\theta(t-1)}{T_s} \quad (9)$$

When the deviation $e_\theta(t)$ satisfies

$$|e_\theta(t)| < \alpha_\theta \quad (10)$$

where $\alpha_\theta$ is a user-specified parameter, the controller moves to the next phase.

Phase 2: Movement control. Next, the straight movement control starts, using a simple PID method (see figure 6). The propellers' tilt angle is controlled toward the target direction $e_p$. No closed control loop is needed for the tilt-angle servo motor, because it has its own position control circuit. The three-dimensional distance $e_l$ from the current position to the target position is the deviation for the PID:

$$m_l(t) = K_{Pl} \cdot e_l(t) + K_{Dl} \cdot \frac{d}{dt} e_l(t) \quad (11)$$
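The rotation phase, eqs. (8) and (9) with the phase switch of eq. (10), can be sketched as a small controller class. This is an illustrative sketch, not the authors' code; the gains, sampling period and threshold below are hypothetical.

```python
class RotationPD:
    """Discrete PD yaw controller with linear-motion prediction (eqs. 8-10)."""

    def __init__(self, kp, kd, ts, alpha):
        self.kp, self.kd, self.ts, self.alpha = kp, kd, ts, alpha
        self.prev_raw = None    # e_theta(t-1)
        self.prev_corr = 0.0    # e'_theta(t-1)

    def step(self, e_theta):
        """Return (m_theta, done): tail-propeller command and phase-switch flag."""
        # Eq. (8): corrected deviation assuming uniform linear motion
        if self.prev_raw is None:
            e_corr = e_theta
        else:
            e_corr = e_theta + (e_theta - self.prev_raw)
        # Eq. (9): discrete PD on the corrected deviation (no integral term)
        m = self.kp * e_corr + self.kd * (e_corr - self.prev_corr) / self.ts
        self.prev_raw, self.prev_corr = e_theta, e_corr
        # Eq. (10): switch to the movement phase once the heading is aligned
        return m, abs(e_theta) < self.alpha

# Hypothetical gains; feed a decaying heading error toward alignment.
ctrl = RotationPD(kp=0.5, kd=0.5, ts=0.1, alpha=0.05)
for e in [1.0, 0.6, 0.3, 0.1, 0.02]:
    m, done = ctrl.step(e)
print(done)  # True once |e_theta| < alpha
```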
In this phase, the airship's heading can also be controlled arbitrarily. These steps are iterated as the target position changes. 4. Experiment 4.1. The Task To evaluate the proposed method, we perform goal-directed navigation experiments in the computer simulation (see figure 7). The simulation world, the goal positions and the airship's initial position are shown in figure 8. The goal positions are given in advance and are updated when the airship reaches the current goal; the initial goal is g1, the next is g2, and the goal after g6 is g1 again.
Figure 8. Initial and Goal Positions
Figure 7. Screen Shot of the Simulator
Table 1. Parameters of the Memory-based PID method
  Orders of the information vector: ny = 3, nu = 3
  Number of neighbors: k = 6
  Coefficients to inhibit the data: α1 = 0.8, α2 = 0.2
  Initial number of data: N(0) = 180

Table 2. User-specified parameters
  PID parameters (for movement control): KPl = 5.0, KIl = 0, KDl = 20
  Goal margin: m = 1.5 [m]
4.2. Settings The user-specified control parameters are shown in Table 1 and Table 2. The initial database vectors for the Memory-based PID method are stored through preliminary experiments with the following PID parameter sets: KPθ = 0.1, 0.5, 0.9; KIθ = 0; KDθ = 0.1, 0.5, 0.9. For each combination of KPθ and KDθ, 20 vectors are stored. 4.3. Results Figure 9 shows the trajectory of the airship traveling to each goal using the Memory-based PID; the airship reaches the goals correctly. For comparison, the trajectory under a simple (parameter-fixed) PID controller is shown in figure 10; its parameters were set manually by repeated trial and error in preliminary experiments. Comparing the two, the proposed method achieves performance equivalent to the simple PID method without parameter tuning. This is because suitable parameters are selected from the database at each step by the Memory-based PID controller. Figures 11 and 12 show the trajectories of the rotation deviation eθ, and figure 13 shows the trajectories of KPθ and KDθ corresponding to figure 11. These graphs show that the parameters change according to the state of the airship.
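As a consistency check, the initial database of section 4.2 can be enumerated: 3 values of KPθ times 3 values of KDθ, with 20 vectors per combination, gives the N(0) = 180 of Table 1. A sketch (the stored input/output data themselves are omitted):

```python
from itertools import product

KP_VALUES = [0.1, 0.5, 0.9]
KD_VALUES = [0.1, 0.5, 0.9]
VECTORS_PER_SET = 20

# One (KP, KI=0, KD) triple per stored vector, as in section 4.2.
initial_db = [
    {"KP": kp, "KI": 0.0, "KD": kd}
    for kp, kd in product(KP_VALUES, KD_VALUES)
    for _ in range(VECTORS_PER_SET)
]
print(len(initial_db))  # 180, matching N(0) in Table 1
```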
Figure 9. Trajectory of the Traveling goals (with Memory-based PID)
Figure 10. Trajectory of the Traveling goals (without Memory-based PID)
Figure 11. Trajectory of Rotate Deviation (with Memory-based PID)
Figure 12. Trajectory of Rotate Deviation (without Memory-based PID)
Figure 13. Trajectories of PD Parameters (with Memory-based PID)
5. SUMMARY In this paper, we developed an airship robot system and a simple flight control system using the Memory-based PID method. The results of the computer simulation experiments show that the proposed method achieves performance equivalent to a simple PID method without difficult parameter tuning. As future work, we will try to control the real airship robot in the real world.
References
[1] A. Nishimura, H. Kawamura, M. Yamamoto and A. Ohuchi, Learning control of autonomous airship for three-dimensional pursuit problem, in: Proceedings of the 8th International Symposium on Artificial Life and Robotics, 2003, pp. 194-197.
[2] G.A. Khoury and J.D. Gillett, Airship Technology, Cambridge University Press, 1999.
[3] K. Motoyama, K. Suzuki, M. Yamamoto and A. Ohuchi, Effects of evolutionary configuration of reinforcement learning applied to airship control, in: Intelligent Autonomous Systems 6 (IAS-6), 2000, pp. 567-572.
[4] K. Takao, T. Yamamoto and T. Hinamoto, A design of memory-based PID controllers, SICE, 40(9) (2004), 898-905 (in Japanese).
[5] S. Zwaan, A. Bernardino and J. Santos, Vision based station keeping and docking for an aerial blimp, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2000.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
349
Co-evolutionary Design for AGV Systems
Ryosuke Chiba, Jun Ota and Tamio Arai
Dept. of Precision Engineering, School of Engineering, The University of Tokyo
Abstract. A competitive co-evolutionary design process is proposed in this paper. The method is applied to the design of a robust flow-path network in an AGV system. The number of possible tasks in an AGV system is very large, so it is impossible to test a promising flow-path network against all possible tasks. This problem is solved by designing difficult tasks with a GA. An effective flow-path network is designed with a GA simultaneously, because which tasks are difficult depends on the flow-path network. Competitive co-evolution is applied to this simultaneous design. Design results are shown through simulations.
Keywords. AGV transportation system, Competitive co-evolution, Packaging
1. Introduction Recently, material-handling systems with Automated Guided Vehicles (AGVs) have come into use in manufacturing factories. These programmable AGVs circulate on a guide-path and transport materials in factories (see Fig.1). One of the important design problems in the development of AGV systems is the flow-path network design problem [1][2][3][4][5]: how to place the guides on the factory floor. The network restricts the area where AGVs run, but it is necessary in AGV systems to improve the precision of AGV movements. Previous researchers, however, designed only uni-directional flow-path networks and assumed that AGVs would not collide with each other on the path; a bi-directional flow-path network cannot be modeled in the same way. Since a bi-directional flow-path network is more efficient than a uni-directional one, only bi-directional flow-paths are treated in this paper. The design problem of the bi-directional network was solved in our previous research [6]. However, the designed flow-path network was optimized for a certain task, and our previous research shows that an AGV system with a network optimized for one task is not effective for other tasks. All of the above researchers also optimized for a certain task. In such a design process, when the task changes, the flow-path network must be designed again from the beginning. Flexibility is one of the strong merits of AGV systems; therefore a robust AGV system is a more efficient transportation system than the alternatives.
R. Chiba et al. / Co-Evolutionary Design for AGV Systems
Figure 1. AGV system
Figure 2. Robust flow-path network design process
In this paper, a method to design a robust flow-path network with competitive co-evolution is proposed. Both the network and the tasks evolve competitively as in Fig.2, and finally a robust network and difficult tasks are designed. A robust network is one for which the minimum number of conveyances over the tasks is large; that is, a mini-max problem must be solved. To realize the method, three issues should be solved. 1. A bi-directional flow-path network should be designed efficiently. 2. Difficult tasks should be designed against AGV systems. For robust system design, one task is not enough as a test-set to design and evaluate a robust system; an effective task-set should be designed. 3. A "solution rotation problem" in the competitive co-evolution should be solved to obtain a network robust against the tasks. The details of the solution rotation problem are given in section 4. The first issue is resolved by dividing the environment into square cells and calculating the connections between cells with a Genetic Algorithm. The second issue is also solved with a GA, because the number of possible tasks is very large. These two issues are addressed in section 3. The design of a task-set is achieved by the "packaging" described in section 4, which also resolves the third issue within the competitive co-evolution. In section 5, the proposed method is applied to a design simulation of AGV systems, and the paper is concluded in section 6.
2. Model of AGV systems 2.1. Settings Environment refers to the factory layout in which the AGVs operate, including the pick-up points where supplied items are stored and the drop-off points where machines wait for items. Similarly, task refers to the set of conveyance requirements. A conveyance requirement consists of a pick-up point number, a drop-off point number, and the time at which the requirement occurs. An example environment for a material-handling system is shown in Fig.3. The shaded areas in this figure indicate environmental obstacles; P and D indicate a pick-up point (storage) and a drop-off point (machine) respectively.
Figure 3. Work environment
This environment is illustrated on a two-dimensional plane divided into square cells and is based on a real car factory's environment. As described above, a task is a group of conveyance requirements, which occur at every fixed period. When an AGV is assigned a requirement of the task, it acts on the order, conveying an item from the pick-up point to the drop-off point. The transporter routing given as input is a reactive approach [7], in which every AGV takes the shortest-distance path to its destination, ignoring the other AGVs. This approach is very simple and often applied in real systems; however, it easily causes deadlocks. Therefore, it is very important, and difficult, to design a network that is deadlock-free under this routing. 2.2. Problem definition The AGV systems must be designed with the following considerations. • The task should be completed within a certain period. • The network path length should be kept to a minimum. The systems are evaluated in this order, so completion of the task is the top priority. Fewer aisles and a smaller network are better for the workers, because they share the floor with the AGVs. As mentioned later, the fitness function of the systems is defined according to these requirements.
3. Design processes 3.1. Flow-path design The flow-path network should be designed in accordance with the routing and the number of AGVs. As mentioned in section 1, the bi-directional flow-path network design problem should be solved for an effective AGV system. A bi-directional flow-path system is, however, very complex to design. In this paper, simulation-based optimization using a Genetic Algorithm is applied to the problem. In our method, binary genes indicate the connections between cells; each gene governs the connection to the upper or right cell. For example, if the gene indicating the connection to the upper cell is zero, AGVs cannot move to the upper cell. Through these connections, an individual represents a flow path. The number of genes of an individual is about twice the number of cells. The individuals are tested through simulations.
One of the reasons for using a GA is that the number of possible networks is very large. In this representation, the gene length is about 2 times the number of cells. For example, in Fig.3 the number of cells is 60, so the gene length is 120 and the AGV system has 2^120 possible networks. A Genetic Algorithm is strong at such large-scale optimization. The details of this design method are given in Chiba2002 [6]. 3.2. Task In the environment of Fig.3, the number of pick-up points is two and the number of drop-off points is also two. Thus, the number of possible transport requirements is 2 × 2 = 4. A task in this paper is a set of such requirements. For instance, when a task consists of 50 requirements, the number of possible tasks is 4^50 = 2^100. This number is so large that a flow-path network cannot be evaluated against all tasks. For this reason, a GA is applied to design a difficult task. The length of the gene is 2 times the number of transport requirements; in the previous instance, the length is 100. As mentioned in the previous subsection, this kind of problem can be solved with a GA.
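The binary connection encoding of section 3.1 can be sketched as follows. Each cell carries one gene for the link to its upper neighbor and one for the link to its right neighbor, so a grid of W x H cells needs about 2WH genes. The gene layout and the 10 x 6 grid below are assumptions for illustration (a full 10 x 6 grid happens to have 104 internal connections, the count quoted in section 5.1):

```python
def decode_network(genes, width, height):
    """Decode a binary gene string into a set of bi-directional cell links.

    Gene layout (an assumption for illustration): for the cell at (x, y),
    gene 2*(y*width + x) enables the link to the upper cell and the next
    gene enables the link to the right cell.
    """
    assert len(genes) == 2 * width * height
    links = set()
    for y in range(height):
        for x in range(width):
            base = 2 * (y * width + x)
            up, right = genes[base], genes[base + 1]
            if up and y + 1 < height:
                links.add(frozenset([(x, y), (x, y + 1)]))
            if right and x + 1 < width:
                links.add(frozenset([(x, y), (x + 1, y)]))
    return links

# A 10 x 6 grid (60 cells) gives a gene length of 120, as in the text.
all_on = decode_network([1] * 120, width=10, height=6)
print(len(all_on))  # 104 usable connections in a fully connected 10 x 6 grid
```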
4. Competitive Co-evolution This competitive co-evolutionary design method was proposed in [8], where the details are described; it is explained briefly in this section. 4.1. Methodology An effective flow-path network and difficult tasks should be designed simultaneously, because the evaluation of each element depends on the other. To solve this problem, competitive co-evolution is adopted. In this method, some individuals for flow-path networks and some individuals for tasks are prepared at the beginning, and then the group of networks and the group of tasks evolve in turn. When an individual is evaluated, it chooses a competitor from among the other group. Such a system is described in detail by Sims1994 [9]. The whole algorithm is as follows. 1. Set the initial groups for networks and tasks, each containing some individuals, and set the generation counter to 0. 2. Evaluate the network individuals, selecting a competitor from the task group for each evaluation. 3. When all of the network individuals are evaluated, evaluate the task individuals. 4. Evolve the group of networks and evaluate them. 5. Evolve the group of tasks and evaluate them. Increase the generation counter by 1. 6. If the generation counter has reached the specified end number, finish; otherwise, go back to 4.
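The alternating loop of steps 1-6 can be sketched schematically. The evaluate and evolve callbacks below are placeholders standing in for the GA machinery of section 3, and the 5-round/10-round alternation follows the setting given later in section 5.1:

```python
import random

def coevolve(networks, tasks, evaluate, evolve, generations,
             net_rounds=5, task_rounds=10):
    """Alternate evolution of network and task populations (steps 1-6).

    evaluate(network, task) -> score (higher is better for the network);
    evolve(pop, scores) -> next population. Both are placeholders.
    """
    cycle = net_rounds + task_rounds
    for gen in range(generations):
        # Every individual competes against a member of the other group.
        net_scores = [evaluate(n, random.choice(tasks)) for n in networks]
        task_scores = [1.0 - evaluate(random.choice(networks), t) for t in tasks]
        # Networks evolve for net_rounds generations, then tasks for task_rounds.
        if gen % cycle < net_rounds:
            networks = evolve(networks, net_scores)
        else:
            tasks = evolve(tasks, task_scores)
    return networks, tasks

# Toy run: individuals are plain numbers and "evolve" keeps the best one.
best_nets, best_tasks = coevolve(
    networks=[0.1, 0.5, 0.9], tasks=[0.2, 0.4],
    evaluate=lambda n, t: n - t,
    evolve=lambda pop, scores: [max(pop)] * len(pop),
    generations=30,
)
print(best_nets)  # [0.9, 0.9, 0.9]
```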
Figure 4. Packaging
4.2. Packaging Generally, in competitive co-evolution each individual is tested against the best competitor [9][10]; this strategy is called "best vs. all". With this strategy, however, an effective flow-path network and difficult tasks cannot be designed, because of the "solution rotation problem". An example of this phenomenon follows. When a network na is generated against a certain task ta, the tasks try to beat na and a difficult task tb is generated. Then, to be effective against tb, a network nb is obtained. If the hard task against nb is again ta, the networks evolve back to na, and these solutions are generated in rotation repeatedly. To solve this problem, a strategy named "packaging" is applied [11]. In this method, when a difficult task tk is found, that individual is added into a package P as pi, and each network individual is evaluated against all of the packaged tasks (∀p). With this method, a network robust against various tasks can be generated. Task individuals are added to and removed from the package according to the following rules. • Addition: the best-performing individual after the tasks are evaluated. • Removal: any individual that is not the most difficult against any network after the networks are evaluated. In [8], a proper package could not be designed because this removal operation was not applied. This packaging method makes it possible to generate a robust, effective flow-path network and a group of difficult tasks without the solution rotation problem. 4.3. Fitness function The fitness of a flow-path network is calculated against all of the tasks in the package, and the lowest value becomes the fitness of that network individual. The fitness function of the GA is designed as follows:

$$fitness(n_i) = \min_{\forall p \in P} \{ K_1 \times x_1(n_i, p) + (1.0 - K_1) \times x_2(n_i) \} \quad (1)$$
354
R. Chiba et al. / Co-Evolutionary Design for AGV Systems
where K1 is a constant, set to 0.8 in this paper. The values in this formula are as follows. • x1: the number of requirements that the AGVs achieve. • x2: the degree of flow-path shortness; the shorter the path, the higher the value. The fitness function favors the following kinds of individuals. • Systems whose AGVs convey many items. • Path networks that are not crowded. The suitable system described in section 2 can be constructed with this fitness function. For the evolution of tasks, the fitness is calculated as follows:

$$fitness = 1.0 - \{ K_1 \times x_1 + (1.0 - K_1) \times x_2 \} \quad (2)$$
This function indicates that hard tasks survive easily.
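The two fitness functions, the mini-max of eq. (1) for networks and eq. (2) for tasks, can be sketched directly. The x1 and x2 scores below are hypothetical placeholders for the transport simulation:

```python
K1 = 0.8  # weight between the conveyance-count and path-length terms

def network_fitness(x1_per_task, x2):
    """Eq. (1): worst-case score of a network over all packaged tasks.

    x1_per_task: achieved-requirement score against each task in the package.
    x2: shortness score of the flow-path (higher = shorter network).
    """
    return min(K1 * x1 + (1.0 - K1) * x2 for x1 in x1_per_task)

def task_fitness(x1, x2):
    """Eq. (2): a task is fitter the worse the network performs on it."""
    return 1.0 - (K1 * x1 + (1.0 - K1) * x2)

# Hypothetical scores: a network scoring [1.0, 0.75, 0.9] across a 3-task
# package, with path-shortness 0.5, is judged by its weakest case.
f = network_fitness([1.0, 0.75, 0.9], 0.5)
print(round(f, 2))  # 0.7
```

The mini-max form means a single hard task in the package can dominate a network's fitness, which is exactly what drives the design toward robustness.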
5. Experiment and Comparison 5.1. Settings To verify the proposed method, experiments are carried out through simulations. Three kinds of settings must be described: i) the Genetic Algorithm, ii) the AGV system and iii) the proposed co-evolutionary method. The details are as follows. • The number of individuals is 100 for flow-path networks and 100 for tasks. • The number of generations is 4500, including both networks and tasks: 1500 generations of network evolution and 3000 of task evolution. • The environment of the AGV system is shown in Fig.3. In this environment, the number of connections between the cells is 104; that is, the length of the genes representing a flow-path network is 104. • All tasks consist of 48 conveyance requirements. Therefore, the number of possible tasks is 4^48 = 2^96, and the length of the genes representing a task is 96. • The number of AGVs is 3. • AGVs can move in 4 directions (up, down, left and right) within a certain time, defined as one step. • The transporter routing is the reactive approach mentioned in section 2. • A simulation run ends at 350 steps. • Networks evolve 5 times, then tasks evolve 10 times, and this cycle is repeated. 5.2. Results of experiments The results of the above simulation are shown in this subsection. At the end of the simulation, the flow-path network shown in Fig.5 is generated. This network does not look crowded: the number of connections is 29 out of the 104 possible. Figure 6 shows the movements of the AGVs.
R. Chiba et al. / Co-Evolutionary Design for AGV Systems
Figure 5. Result of flow-path network
Figure 6. Routing rule of AGVs
Figure 7. Fitness
Figure 8. Number of packaged tasks
The transition of the system effectiveness is shown in Fig. 7. At the beginning, the value changes greatly every five or ten generations. This swing indicates that the package is not yet large enough to make a robust flow-path network. As the network becomes robust, tasks perform poorly against it, and it becomes difficult to design a difficult task. The transition of the number of packaged tasks is illustrated in Fig. 8. The number increases every ten generations at the beginning. Later, the number hardly grows, because the package is large enough. The maximum size of the package is 89 and the final size is 55.

5.3. Discussion

Several insights can be drawn from these results. First, the network designed with our method is not crowded and makes it possible for the AGVs to transport all 48 items. This indicates that applying a GA to the design of a flow-path network is very effective for AGV systems. These two requirements, a simple network and effective transportation, are not easy to achieve together, because there is a trade-off between keeping the network uncrowded and conveying many items. Secondly, with our method, an effective task-set can be obtained. The number of tasks in the package is 89, whereas the number of possible tasks is 2^96 ≈ 10^29. The package is thus very compact but sufficient to make the network robust. It is very important that handling only 89 tasks leads to completing all possible tasks. One feature of the designed network is the loop structures around the pick-up and drop-off points. These structures arise from the application of the reactive approach: all AGVs have planning rules in which horizontal movements take precedence over vertical movements on paths of equal length. That is why all AGVs take the same directions (see Fig. 6), which reduces collisions during transportation. This robust network is very reasonable and is evidence of the efficiency of our method.
Thirdly, with competitive co-evolution, both a robust network and an effective task-set can be generated at the same time. In Fig. 7, the system values swing between low and high values at the beginning but, after sufficient evolution, converge to a certain value. This is evidence that the designed flow-path network has become robust and the package includes enough tasks. Figure 8 illustrates the same point, because the individual representing the most difficult task is added to the package at the beginning, and no individual is added to the package after the 1500th generation. Given that an AGV system should be robust, the results shown above demonstrate that our method achieves this; the method is useful for the design of a robust system.

6. Conclusion

A design method for an effective and robust flow-path network for an AGV system is proposed in this paper. First, a flow-path network can be designed, using a genetic algorithm, so as not to be crowded for a given task; the results are shown in section 5. Secondly, an effective task-set for a robust AGV system can be designed with the GA and a packaging strategy; these results are also shown in section 5. The size of the package is 81, which is very compact compared with the very large number of possible tasks. Finally, both a robust network and an effective task-set can be obtained simultaneously using competitive co-evolution. Only by designing both at the same time can a robust flow-path network be obtained, because networks depend on tasks and tasks depend on networks. Evolving both through their interactions makes it possible to generate effective versions of each.

References
[1] Egbelu P. J. and Tanchoco J. M. A., International Journal of Production Research, 24(5), pp. 1075-1097, 1986.
[2] Gaskins R. J. and Tanchoco J. M. A., International Journal of Production Research, 25(5), pp. 667-676, 1987.
[3] Kaspi M. and Tanchoco J. M. A., International Journal of Production Research, 28(6), pp. 1023-1030, 1990.
[4] Gaskins R. J., Tanchoco J. M. A. and Taghaboni F., International Journal of Production Research, 27(1), pp. 91-100, 1989.
[5] Riopel D., Langevin A. and Savard G., G-98-32, Les Cahiers du GERAD, 1998.
[6] R. Chiba, J. Ota and T. Arai, Proc. of Int. Conf. on Intelligent Robots and Systems (IROS 2002), pp. 1820-1825, 2002.
[7] L. Qiu, W. Hsu, S. Huang and H. Wang, International Journal of Production Research, 40, pp. 745-760, 2002.
[8] R. Chiba, J. Ota and T. Arai, Proc. of Int. Conf. on Intelligent Robots and Systems (IROS 2005), pp. 3164-3169, 2005.
[9] K. Sims, Artificial Life IV Proceedings, ed. by Brooks & Maes, MIT Press, pp. 28-39, 1994.
[10] D. Floreano and S. Nolfi, In Proceedings of the Second International Conference on Genetic Programming, pp. 398-406, 1997.
[11] M. Nerome, K. Yamada, S. Endo and H. Miyagi, SMC'97 Conference Proceedings, Vol. 5, pp. 4418-4423, 1997.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
An Update Method of Computer Simulation for Evolutionary Robotics

Yoshiaki KATADA a,1 and Kazuhiro OHKURA b
a Faculty of Engineering, Setsunan University
b Faculty of Engineering, Kobe University

Abstract. One of the advantages of evolutionary robotics over other approaches is its parallel population search. However, it generally takes an unrealistically long time to evaluate all candidate solutions on a real robot. Thus, computer simulation techniques are among the most important topics in evolutionary robotics. Although it is quite difficult to provide a precise computer simulation model of a physical experiment, the simulated robot/environment interaction dynamics must be kept consistent with the physical dynamics during evolution in order to reduce the gap between the simulated and physical systems. In this paper, to overcome this problem, we propose a method that updates the simulated robot/environment interaction dynamics during evolution based on a statistical test, and then investigate the consistency between the performance of a robot in the simulated and physical environments. A series of experiments with a mobile robot illustrates the effectiveness of the proposed method in practice.

Keywords. Simulation, Real robot, Evolutionary robotics, Embodied cognitive science
1. Introduction

Embodied cognitive science [1] has attracted much research interest in recent years; in this field, a robot must interact with a real-world environment in order to obtain successful and robust behaviors. One approach in embodied cognitive science is evolutionary robotics (ER) [2], in which robot control systems are designed using evolutionary techniques. It has been pointed out that evolutionary approaches are potentially advantageous over other approaches due to their parallel population search. However, the number of trials needed to evaluate all individuals in a real environment makes the use of physical robots during evolution difficult, especially for tasks requiring human intervention. One approach to overcoming this problem would be to multiply the number of identical real robots used for the evaluation of individuals [3]. However, a controller must then be robust enough to work well on all robots, because they may perform differently due to slight differences in their electronics and mechanics, even if they are apparently identical. In addition, this approach may be significantly expensive because a number of robots must be prepared.

1 Correspondence to: Yoshiaki KATADA, 17-8 Ikeda-nakamachi, Neyagawa, Osaka 572-8508, JAPAN. Tel./FAX: +81 728 39 9148; E-mail:
[email protected]
Y. Katada and K. Ohkura / An Update Method of Computer Simulation for Evolutionary Robotics
The other approach would be the use of simulations. Computer simulations may help to reduce the experimental time needed to evaluate the individuals in a population. However, controllers evolved in a simulated environment do not always work well in the real one because of uncertain effects, e.g., noise and differences in the electronics and mechanics of robots, as Brooks pointed out [4]. Therefore, the validity of simulations is a particularly relevant problem. Some researchers have claimed that this problem can be overcome by carefully designing simulators. Jakobi et al. [5] claimed that adding noise to the parts of a simulated environment relevant to the control task can reduce the discrepancy between the simulated and real environments. However, the relevant parts differ from task to task, and detecting them depends crucially on the prior knowledge of the experimenter. Miglino et al. [6] developed a simulator that requires experimental and repeated sampling to construct a detailed model of sensors and motors, and claimed that such a method can reduce the performance gap between simulated and real environments. The authors also claimed that the decrease in performance after transferring a system from a simulated environment to a real one can be recovered by continuing the evolutionary process in the real environment. However, this method cannot adapt to unpredictable changes in a real environment during evolution, because the conditions of the simulation are fixed. Moreover, performance does not always improve after transferring a system from a simulated environment to the real one without further consideration. An approach to overcoming this problem is to calibrate the model of robot/environment interaction dynamics during the robot's lifetime. Such an approach is found in [7], where a world model is learned during the robots' lifetime; however, that model is deterministic and composed only of the most recent data set of sensory inputs and motor outputs.
In this paper, we propose a method that updates a probabilistic model of the robot/environment interaction dynamics of a mobile robot during its evolution, employing a statistical test in order to maintain the validity of the model in simulations. The paper is organized as follows. Section 2 describes the procedure of the proposed method. Section 3 defines the robot control problem in which the evolved neural networks are evaluated, and the representation of the robot/environment interaction dynamics. Section 4 gives the results of our experiments. Section 5 discusses the effect of the update method on the performance of a robot after transfer to a real environment. Conclusions are given in the last section.
2. An Update Method of a Probabilistic Model of Robot/Environment Interaction

The simulated and real robot/environment interaction dynamics are denoted M(sM, aM) and R(sR, aR) respectively, where s denotes sensory inputs and a denotes actions. The procedure of the evolutionary algorithm with the update method can be summarized as follows:
0. Create an initial population and an initial model, M0(sM, aM).
1. Divide the individuals in the population into those to be evaluated in the real environment and those to be evaluated in the simulated one.
2. Gather data, R(sR, aR), while robots are evaluated in the real environment.
3. Update M(sM, aM) based on R(sR, aR).
4. Evaluate the remaining individuals in the simulated environment, M(sM, aM).
5. Select individuals according to the fitness values evaluated in the simulated and real environments, then apply genetic operations to create a new population of the same size.
6. Repeat from step 1 until the terminal conditions are satisfied.
In step 3, Student's t-test is employed to examine the validity of the model in the simulation: R(sR, aR) and M(sM, aM) are regarded as statistically identical if we fail to reject the null hypothesis that their means are equal. Otherwise, M(sM, aM) is corrected repeatedly until we fail to reject it. Such a definition is appropriate for a probabilistic model like the one proposed here. The details are described in the experimental setup.
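The six-step procedure above can be outlined as follows. All callables are problem-specific placeholders of our own naming, not the authors' implementation:

```python
def evolutionary_run(population, model, generations,
                     evaluate_real, update_model,
                     evaluate_sim, select_and_reproduce):
    """Skeleton of the evolutionary algorithm with model updates (steps 1-6)."""
    for _ in range(generations):
        # Step 1: a few individuals go to the real robot, the rest to simulation.
        real_group, sim_group = population[:2], population[2:]
        # Step 2: evaluate on the real robot while gathering interaction data R.
        fitness_real, data_real = evaluate_real(real_group)
        # Step 3: update the simulated dynamics M based on the real data R.
        model = update_model(model, data_real)
        # Step 4: evaluate the remaining individuals in the simulator.
        fitness_sim = evaluate_sim(sim_group, model)
        # Step 5: selection and genetic operations, keeping the population size.
        population = select_and_reproduce(population, fitness_real + fitness_sim)
    return population, model
```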
3. Experimental Setup

3.1. The Task and the Fitness Function

A two-wheeled robot was used in the experiment (Figure 1). The robot's sensory input comes from an omni-directional camera. The environment of the robot was a rectangular arena, 2400 × 2200 mm, surrounded by walls colored black, with a red target placed at the upper right corner (Figure 2). The control task used in this paper was a goal-reaching problem, in which the robot approaches a target (goal). Each individual of the population was tested on the robot 4 times, for around 40 seconds each (40 sensory-motor steps). At the beginning of each trial, the robot was always placed at the same initial position, the bottom left corner, at either an upward or a rightward orientation (Figure 2). A trial ends either when the robot reaches the goal or when 40 steps are performed without reaching it. The performance measure to be maximized was as follows:

Fitness = (1/NumTrials) Σ_{i=1}^{NumTrials} (1 − Step_i/MaxStep)

where NumTrials is the number of trials for a robot (2 trials for each initial orientation) and MaxStep is set at 40. The fitness function increases as the robot reaches the goal more quickly.

3.2. Representation of Robot/Environment Interaction Dynamics

3.2.1. Simulating Motors

The number of combinations of motor commands used for operating the wheels of the robot is 400 (20 motor commands are available for each wheel). Thus, aR is composed of the set of displacements of the robot along each axis in Figure 3, (XR(i, j), YR(i, j), ΘR(i, j)),
Figure 1. A mobile robot
Figure 2. Experimental setup for a goal reach problem
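The fitness measure defined in Section 3.1 can be sketched as follows (our naming; a trial that fails to reach the goal contributes Step = MaxStep, i.e. zero):

```python
def fitness(steps_per_trial, max_step=40):
    """Average of (1 - Step/MaxStep) over the trials, so reaching the
    goal in fewer steps yields a higher score."""
    return sum(1.0 - s / max_step for s in steps_per_trial) / len(steps_per_trial)
```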
Figure 3. Displacement of a mobile robot in a simulated environment
for a motor command (i, j) (i, j = 1, ..., 20 for the left and right wheels). Based on these data, the set of displacements along each axis in the simulated environment, (XM(i, j), YM(i, j), ΘM(i, j)), is constructed as aM. In probabilistic form, the displacements of a two-wheeled robot for each motor command (i, j) are represented as follows [8]:

x(i, j) = N(μ(XM(i, j)), σ(XM(i, j))),
y(i, j) = N(μ(YM(i, j)), σ(YM(i, j))),
θ(i, j) = N(μ(ΘM(i, j)), σ(ΘM(i, j))),

where N(μ, σ) is a normal distribution with mean μ and standard deviation σ, derived from the sampled data set (XM(i, j), YM(i, j), ΘM(i, j)). Thus, the position of the simulated robot at the t-th step along each axis is computed as follows: x_{t+1} = x_t + x(i, j), y_{t+1} = y_t + y(i, j), θ_{t+1} = θ_t + θ(i, j).

3.2.2. Simulating Sensors

For this experiment, an omni-directional visual sensor was used for the sensory inputs of the robot (Figure 4). The sensory inputs of the omni-directional vision were computed as follows. A binarized image is obtained from the omni-directional image, where the center of the image is identified with the center of the robot (Figure 4(a)). In this image, the direction from the robot to the center of gravity of the target, and the distance between the robot and the nearest point of the target, are first calculated using the pixel information. Next, the image is divided into I cells in the circumferential direction and J cells in the radial direction (Figure 4(b)) to detect the target, using the above two values (Figure 4(c)), and the walls, using the pixel information with a threshold (Figure 4(d)), in each cell (q_ij ∈ {0, 1}, i = 1, ..., I; j = 1, ..., J), where q_ij signifies the presence or absence of the object. If an object whose height is almost the same as that of the robot is detected in the j-th cell, it is also detected in the (j+1)-th cell due to the shape of the omni-directional mirror.
The walls are also detected for each cell (Figure 4(d)). In the simulation, the omni-directional visual sensors were computed in the same way as the real ones, assuming binarized images taken from the simulated image. In this experiment, we kept sM constant during evolution; that is, the model of sensory inputs was not updated, because of the reliability of omni-directional visual sensors and the precision of the model based on their characteristics.

3.3. An Update Method of Displacements of a Simulated Robot

Following the procedure described in section 2, we explain the details of updating and correcting the model of displacements of the robot, aM.
Figure 4. Omni-directional image plane: (a) camera image; (b) binarized image; (c) target detection; (d) wall detection
To measure the positions of the real robot during its evaluations, a CCD camera was positioned above the testing environment. Every 0.5 seconds, the displacements are computed as follows: xR(i, j) = x_{t+1}(i, j) − x_t(i, j), yR(i, j) = y_{t+1}(i, j) − y_t(i, j), θR(i, j) = θ_{t+1}(i, j) − θ_t(i, j). After the evaluations, all the data in (XR(i, j), YR(i, j), ΘR(i, j)) are added to (XM(i, j), YM(i, j), ΘM(i, j)), and the data collected more than 500 generations earlier are deleted to keep the data set fresh. To examine the validity of aM, Student's t-test was conducted with the null hypotheses μ(XM(i, j)) = μ(XR(i, j)), μ(YM(i, j)) = μ(YR(i, j)) and μ(ΘM(i, j)) = μ(ΘR(i, j)). If a hypothesis is rejected, that is, if there is a discrepancy between aM and aR, then XM(i, j), YM(i, j) or ΘM(i, j) is corrected accordingly. For instance, if μ(XM(i, j)) < μ(XR(i, j)) and the hypothesis is rejected at a significance level α, then the smallest datum in XM(i, j) is removed to reduce the discrepancy. These corrections are repeated until all the hypotheses are accepted. Based on the corrected data set (XM(i, j), YM(i, j), ΘM(i, j)), the displacements of the simulated robot are computed as follows: x(i, j) = N(μ(XM(i, j)), σ(XM(i, j))), y(i, j) = N(μ(YM(i, j)), σ(YM(i, j))), θ(i, j) = N(μ(ΘM(i, j)), σ(ΘM(i, j))). In the experiments carried out in the next section, we take α = 0.05. To construct the initial model aM0, 10 data sets are sampled for each motor command in advance. aM is updated every 50 generations, after evaluations of real robots equipped with the best and second-best controllers.

3.4. Simulation Conditions

For this experiment, the agent's controller was a pulsed neural network (PNN) with 128 sensory neurons, corresponding to the number of cells I × J in Figures 4(c) and 4(d) for detecting the target and walls, 2 fully interconnected motor neurons, and a fully interconnected hidden neuron.
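The correction procedure of Section 3.3 can be sketched as follows. This is a minimal illustration assuming a pooled two-sample t statistic and approximating the 5% two-sided critical value by 1.96 for large samples; exact degree-of-freedom-dependent critical values would be used in practice, and the function names are ours:

```python
import statistics

def t_statistic(a, b):
    """Pooled two-sample Student t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def correct_model_data(model_data, real_data, t_crit=1.96):
    """Trim extreme values from the simulated data set until its mean is
    statistically indistinguishable from the real data, as in Section 3.3."""
    data = sorted(model_data)
    while len(data) > 2 and abs(t_statistic(data, real_data)) > t_crit:
        if statistics.mean(data) < statistics.mean(real_data):
            data.pop(0)   # drop the smallest datum to raise the mean
        else:
            data.pop()    # drop the largest datum to lower the mean
    return data
```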
The network's connection weights and the firing threshold of each neuron were genetically encoded and evolved. The total number of parameters was 396. The parameters were mapped linearly onto the following ranges: connection weights ω ∈ [−1.0, 1.0], and thresholds θ ∈ [0.0, 3.9]. The parameters of the neurons and synapses were set as follows: τ_m = 4, τ_s = 10, Δ_ax = 2 for all neurons and synapses in the network, following the recommendations given in [9]. The experiment was conducted using populations of size 50. Each individual was encoded as a binary string with 10 bits per parameter; therefore, the total length of the genotype was L = 3960. An extended GA, called the operon-GA (OGA) [10], was employed
to evolve the PNN parameters.1 The OGA uses standard bit mutation and five additional genetic operators: connection, division, duplication, deletion and inversion. The probabilities for the genetic operations were set at 0.3 for connection and division, 0.6 for duplication, and 0.3 for deletion and inversion, based on our previous results in [11,12]. The length of the value list in a locus was 6. The per-bit mutation rate, q, was set at 1/L = 0.00025. Crossover was not used in the OGA, following van Nimwegen's suggestion [13]. Tournament selection was adopted, and elitism was optionally applied. The tournament size was set at 6 because the OGA prefers high selection pressure. A generational model was used. Each run lasted 2,000 generations. The individuals in the population were divided into two groups before the evaluations in the real environment, as described in section 2: two individuals (the best and second best) of the population were evaluated in the real environment and the others in the simulated one.

4. Experimental Results

Figure 5 shows the maximum and average fitness at each generation for the OGA. The maximum fitness increased in the early generations without being trapped in local optima, although the fluctuations are large in the middle generations. In order to verify what happens at the level of behaviors, we compared the trajectories of the best individual evolved over 2,000 generations in the simulated and real environments. Figures 6 and 7 show typical behaviors of the best evolved robot from the initial orientation. In the simulated environment (Figure 6), the robot at the initial rightward orientation approached the goal, turning left to head toward it. This trajectory matched almost perfectly the one in the real environment (Figure 7). This demonstrates that the proposed update method very significantly reduces the discrepancy between behaviors in the simulated and real environments.
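The linear genotype-to-parameter mapping described in the simulation conditions (10 bits per parameter, mapped linearly onto a range) can be sketched as follows; the helper name is ours, not the authors':

```python
def decode_parameter(bits, lo, hi):
    """Map a 10-bit substring linearly onto the interval [lo, hi]."""
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)
```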
Due to space restrictions, we only show results for the behavior at the initial rightward orientation; however, the discrepancy between behaviors in the simulated and real environments was also reduced at the initial upward orientation.

5. Discussion

In order to investigate the validity of the proposed update method, we compared the performance of the robot controller evolved in both a simulated and real environment with

1 In preliminary runs, it was confirmed that the OGA outperformed the standard GA on this problem.
Figure 5. Maximum and average fitness at each generation for the OGA
Figure 6. Behavior of the best evolved robot in the simulated environment
Figure 7. Behavior of the best evolved robot in the real environment

Table 1. Success Rate (%) for Each Method

Initial orientation   Upward   Rightward   Total
Without update        10       0           5
With update           40       100         70
the updates (the method with update), with the performance of a controller evolved only in a simulated environment without any update (the method without update). An additional experiment was therefore conducted. We evolved robot controllers only in a simulated environment, without any update, for 2,000 generations and then transferred the obtained control system to the real environment. In the method without update, the initial data set, aM0, was used until the last generation. The representation of the robot/environment interaction dynamics in the method without update was the same as in the method with update. A controller evolved only in a simulated environment is likely not robust enough to work well when transferred to the real environment, because of discrepancies between the simulated and real environments. Table 1 shows the rates of achieving the task in 10 trials for the controllers evolved by the methods with and without update. As predicted, degradations in performance after the transfer of the system from the simulated environment to the real one are observed for the controllers evolved only in the simulated environment. Moreover, the controller evolved with the update method performed better than the one without it for both initial orientations of the robot. This result indicates the effectiveness of the proposed method.
6. Conclusions

In this work, we proposed a method that updates the simulated robot/environment interaction dynamics of an evolutionary mobile robot during its evolution, employing a statistical test in order to maintain the validity of the model in the simulation. Our results can be summarized as follows:
• An update method for a probabilistic model of robot/environment interaction dynamics was designed to reduce the discrepancy between the simulated and real environments.
• The proposed method successfully reduced the gap between the performance observed in the simulated environment and the performance obtained in the real environment.
• In order to investigate the benefits of the proposed method, the performance of the best evolved robot after transfer to the real environment was compared with the
one obtained by the method employing only simulation. The proposed method performed better than the simulation-only method. These results suggest guidelines for reducing the gap between the performance observed in a simulated environment and the performance obtained in the real environment for evolved robots. Future work will investigate how well these updating guidelines apply to more complex experiments, e.g., updating a model of sensory inputs, and to humanoid robots.
References
[1] R. Pfeifer and C. Scheier, Understanding Intelligence, MIT Press, Cambridge, 1999.
[2] S. Nolfi and D. Floreano, Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines, MIT Press, 2000.
[3] R. A. Watson, S. G. Ficici and J. B. Pollack, Embodied Evolution: Embodying an Evolutionary Algorithm in a Population of Robots, In Proceedings of the Congress on Evolutionary Computation, pp. 335-342, 1999.
[4] R. A. Brooks, Artificial Life and Real Robots, In Proceedings of the First European Conference on Artificial Life, pp. 3-10, 1992.
[5] N. Jakobi, Half-baked, Ad-hoc and Noisy: Minimal Simulation for Evolutionary Robotics, In Proceedings of the Fourth European Conference on Artificial Life, pp. 348-357, 1997.
[6] O. Miglino, H. H. Lund and S. Nolfi, Evolving Mobile Robots in Simulated and Real Environments, Artificial Life, 2, pp. 417-434, 1995.
[7] D. Keymeulen, M. Iwata, K. Konaka, R. Suzuki, Y. Kuniyoshi and T. Higuchi, Off-line Model-free and On-line Model-based Evolution for Tracking Navigation Using Evolvable Hardware, In Proceedings of the First European Workshop on Evolutionary Robotics, Springer-Verlag, Paris, 1998.
[8] K. Komoriya, E. Oyama and K. Tani, Planning of Landmark Measurement for the Navigation of a Mobile Robot, Journal of the Robotics Society of Japan, Vol. 11, No. 4, pp. 533-540, 1993 (in Japanese).
[9] D. Floreano and C. Mattiussi, Evolution of Spiking Neural Controllers, In Gomi, T. (ed.): Evolutionary Robotics: From Intelligent Robots to Artificial Life (ER'01), AAI Books, Springer-Verlag, pp. 38-61, 2001.
[10] K. Ohkura and K. Ueda, Adaptation in Dynamic Environment by Using GA with Neutral Mutations, International Journal of Smart Engineering System Design, 2, pp. 17-31, 1999.
[11] Y. Katada, K. Ohkura and K. Ueda, Tuning Genetic Algorithms for Problems Including Neutral Networks - A More Complex Case: The Terraced NK Problem -, In Proceedings of the 7th Joint Conference on Information Sciences, pp. 1661-1664, 2003.
[12] Y. Katada, K. Ohkura and K. Ueda, An Approach to Evolutionary Robotics Using the Genetic Algorithm with Variable Mutation Rate Strategy, In Proceedings of the 8th Parallel Problem Solving from Nature (PPSN VIII), pp. 952-961, 2004.
[13] E. van Nimwegen, J. Crutchfield and M. Mitchell, Statistical Dynamics of the Royal Road Genetic Algorithm, Theoretical Computer Science, Vol. 229, No. 1, pp. 41-102, 1999.
Vision-based Teleoperation of a Mobile Robot with Visual Assistance

Naoyuki KUBOTA a,b, Daisuke KOUDU a, Shinichi KAMIJIMA c, Kazuhiko TANIGUCHI d and Yasutsugu NOGAWA d
a Tokyo Metropolitan University, Japan
b PRESTO, Japan Science and Technology Agency (JST)
c University of Fukui, Japan
d Kinden Corporation, Kyoto R&D Center
Abstract. This paper discusses a vision-based teleoperation system providing a human-friendly interface for a mobile robot used under office automation floors. We proposed a navigation method based on simple human commands, a perceptual system for estimating the robot's self-location, and an action system for controlling the posture of the robot. This paper proposes a human-friendly information display system to improve the operability of robot navigation. Finally, we show several experimental results of the proposed method.

Keywords. computational intelligence, teleoperation, mobile robot, visual perception, human assistance
1. Introduction

Recently, various types of robots have been developed to support humans in dangerous areas. Human-robot collaboration has been discussed from various viewpoints, such as human assistance, teleoperation, and cooperative behaviors [1]. Cooperative behavior emphasizes the importance of interaction between the robot and the human [2], while teleoperation emphasizes the importance of role assignment between them. An autonomous robot, however, does not need to follow human instructions completely: when no instruction is given, it should act autonomously, based on the instruction information stored through its interaction with the human [3]. In teleoperation, human operability should be improved, and there are two approaches to doing so. One is to reduce the operator's actual operations; the robot should then move appropriately in response to simple human commands, which requires a mechanism for interpreting human commands according to the current situation. The other is to display the situation to the human operator as realistically as possible. To integrate these two approaches, a touch panel display is very suitable for navigating a robot in teleoperation. In this paper, we focus on a teleoperation system based on a touch panel display for a mobile robot working under floors. Office automation (OA) floors are designed so that various cables can be wired efficiently under the floor. However, several panels must be removed whenever a cable is added to a network, and this task takes much time and effort. We developed a prototype of a mobile robot working under OA floors, and used
N. Kubota et al. / Vision-Based Teleoperation of a Mobile Robot with Visual Assistance
a CCD camera as a vision sensor, so that the human and the robot share the perception of the situation as much as possible. The robot detects poles in the images, estimates its self-location, and decides its moving direction. Various intelligent methodologies have been successfully applied to pattern recognition and motion planning problems [1,4-8]. First, the k-means algorithm is applied to extract possible pole areas. Next, a steady-state genetic algorithm (SSGA) [9] is applied to detect poles in the images using a template matching method. Furthermore, we propose a method for estimating the self-location based on a series of visual perceptions, and a teleoperation method for the robot.
2. Perception and Action for a Vision-based Mobile Robot

2.1. An Experimental Mobile Robot

Figure 1 shows the mobile robot; its size is 300 × 120 [mm]. The human operator inputs commands to a handheld computer, and the commands are sent to the mobile robot through a wireless LAN. In the same way, the robot's camera image is sent back to the handheld computer over the wireless LAN (Figure 2). The mobile robot has crawlers driven by two DC motors, with the right and left sides controlled independently. Therefore, the robot can go straight, turn, and spin.
Figure 1. Mobile robot.
Figure 2. Control architecture.
Figure 3 shows an example of the OA floor from the viewpoint of the mobile robot. Poles stand at intervals of 500 [mm], and one block is defined as the square formed by four poles. The robot must avoid colliding with the poles. Furthermore, some cables may lie on the floor, so the robot must detour around or climb over them.
Figure 3. An example of the space under an OA floor.
2.2 Pole Detection based on Image Processing
After the captured image is transformed into the HSV color space, the color corresponding to poles is extracted using thresholds. Next, the k-means algorithm is used to extract candidate pole positions. The k-means algorithm is one of the most popular iterative descent clustering methods [10]. The inputs to the k-means algorithm are the positions v_j = (x_j, y_j), j = 1, 2, ..., m, of the pixels corresponding to pole colors, where m is the number of pixels. With r_i denoting the reference vector of the i-th cluster, the Euclidean distance d_{i,j} between the reference vector r_i and the j-th input vector is

d_{i,j} = || v_j − r_i ||   (1)

Next, the cluster whose reference vector is at the shortest Euclidean distance is chosen:

c_j = arg min_i { || v_j − r_i || }   (2)
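As an illustration, the clustering of Eqs. (1)-(2) can be sketched in Python; the iteration count, the random initialization, and the mean-update step are standard k-means choices, not values prescribed by the paper (the experiments later use k = 10):

```python
import numpy as np

def kmeans_pole_candidates(pixels, k=10, iters=20, seed=0):
    """Cluster the pole-colored pixel positions v_j into k candidate pole
    locations. Each pixel is assigned to the nearest reference vector r_i
    (Eqs. (1)-(2)); the means are then recomputed, as in standard k-means."""
    rng = np.random.default_rng(seed)
    refs = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # d_{i,j} = ||v_j - r_i||  (Eq. 1); c_j = argmin_i d_{i,j}  (Eq. 2)
        d = np.linalg.norm(pixels[:, None, :] - refs[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(k):
            members = pixels[labels == i]
            if len(members) > 0:
                refs[i] = members.mean(axis=0)
    return refs, labels
```

The resulting reference vectors serve as the candidate pole positions passed on to the SSGA search.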
The SSGA simulates a continuous model of generations, eliminating and generating only a few individuals per generation (iteration). A candidate solution (individual) is composed of numerical parameters of the position and size of a pole. We use the central position of a pole (g_{i,1}, g_{i,2}) in the image and the magnifying rate of the pole (g_{i,3}), because the ratio of width to height of a pole is known. Therefore, we use the following adaptive mutation,

g_{i,j} ← g_{i,j} + ( α_j · (fit_max − fit_i) / (fit_max − fit_min) + β_j ) · N(0,1)   (3)

where fit_i is the fitness value of the i-th individual; fit_max and fit_min are the maximum and minimum fitness values in the population; N(0,1) indicates a normally distributed random value; and α_j and β_j are the coefficient and offset, respectively [11,12]. The fitness value is calculated by the following equation,

fit_i = p_1 · C_Edge + p_2 · C_Pl − p_3 · C_Other   (4)
where p_1, p_2, p_3 are coefficients, the p_3 term acting as a penalty. C_Edge is the number of pixels on vertical edges; C_Pl and C_Other indicate the numbers of pixels of the color corresponding to a pole and of other colors, respectively. Pole detection is thus formulated as a maximization problem.
2.3 Posture Control and Visualization for Teleoperation
The robot can estimate its posture from its view of the grid space formed by the poles. The relative angle to the grid is approximately represented as a state space. Figure 4 shows an example of the image from the robot and a picture from outside. The view from the robot changes according to the robot's posture. Therefore, we divide the image into five segments (R1-R5), as shown in Figure 4 (b). The robot perceives its posture as a combination of 0s and 1s, where s_Ri = 1 if segment Ri includes poles detected by the SSGA. Consequently, the state in this example is represented by (s_R1, s_R2, s_R3, s_R4, s_R5) = (0, 0, 1, 0, 0). By tracing the state transitions, the robot can recognize that it has completed a one-block move without dead-reckoning. In this way, the robot can go straight and turn right or left in the grid space. Furthermore, because of the robot's motion, the change of view between consecutive images is limited, except for moving objects. This means the candidate solutions for the next image can inherit the search results of the SSGA from the previous candidate solutions. Therefore, half of the candidate solutions are replaced with new candidate solutions generated from the clustering results of the k-means algorithm.
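The adaptive mutation of Eq. (3) used in this SSGA search can be sketched as follows; scalar alpha and beta stand in for the per-gene coefficients α_j and β_j, and the guard against a zero fitness spread is an illustrative assumption:

```python
import numpy as np

def adaptive_mutation(g, fit, alpha, beta, rng):
    """Adaptive mutation of Eq. (3): the further an individual's fitness is
    from the population best, the larger its perturbation. g is (pop, 3),
    holding the pole position (g_i1, g_i2) and magnifying rate g_i3."""
    fit_max, fit_min = fit.max(), fit.min()
    spread = max(fit_max - fit_min, 1e-12)   # guard against a uniform population
    scale = alpha * (fit_max - fit[:, None]) / spread + beta
    return g + scale * rng.standard_normal(g.shape)
```

With beta = 0, the best individual is left untouched while poorer individuals are perturbed more strongly, which concentrates the search around good template matches.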
Figure 4. State recognition: (a) the posture of the robot; (b) the view from the robot (an original image and an extracted image).
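The five-segment state coding (s_R1, ..., s_R5) of Figure 4 (b) can be sketched as follows; representing each detected pole by its horizontal image coordinate and splitting the image into equal-width segments are assumptions made for illustration:

```python
def view_state(detected_pole_xs, image_width, n_segments=5):
    """Encode the view as (s_R1, ..., s_R5): s_Ri = 1 if segment Ri
    contains a detected pole, given each pole's horizontal coordinate."""
    seg_w = image_width / n_segments
    state = [0] * n_segments
    for x in detected_pole_xs:
        idx = min(int(x // seg_w), n_segments - 1)
        state[idx] = 1
    return tuple(state)
```

A pole in the center of a 320-pixel-wide image yields the state (0, 0, 1, 0, 0) of the example in the text.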
Furthermore, the attention range can be restricted, because the action output, together with the dynamics of the environment, determines the next view. The robot makes decisions on state transitions using simple if-then rules. After performing one behavior (a one-block move, a left turn, or a right turn), the robot corrects its posture to the proper position. Thus the robot moves toward the target point along the given path. A human operator performs teleoperation using the images sent from the robot. Because an image might be noisy, useful information such as poles and cables should be displayed to the human operator. Here, colored cables are extracted from the
original image. Using the extracted image, the human operator can send meta-level commands to the robot. On this operation panel, the robot is controlled by how long each button is pressed.
2.4 Human Interface
We use a touch panel display to improve operability in teleoperation. A human interface plays an important role in connecting the workspace and the human operator, because the information the operator can recognize about the remote area is restricted. General computers are operated mainly with a keyboard and mouse, but touch operation is more direct and easier for a human. Figure 5 shows a snapshot of human operation using a small touch panel display, and Figure 6 shows an example of the human interface on the touch panel display. When the human touches the display, the system extracts the area surrounding the touched point and shows a magnified image on the screen. In this way, the system easily shows the area to which the human is paying attention.
Figure 5. Touch panel display used for human interface.
Figure 6. Zooming function of the human interface: (a) the normal display (an original image); (b) the screen specifying the attention area (the original image with the extracted image superimposed).
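The zooming function described in Section 2.4 (extracting and magnifying the area around the touched point) can be sketched as follows; the nested-list image representation and the fixed crop half-size are assumptions, since the paper does not specify them:

```python
def attention_crop(image, tx, ty, half=40):
    """Extract the area around a touched point (tx, ty) so it can be shown
    magnified; 'half' is an assumed crop half-size in pixels. The image is
    a nested list indexed as image[row][col]."""
    h, w = len(image), len(image[0])
    x0, x1 = max(tx - half, 0), min(tx + half, w)
    y0, y1 = max(ty - half, 0), min(ty + half, h)
    return [row[x0:x1] for row in image[y0:y1]]
```

The crop is clipped at the image borders, so a touch near a corner simply returns a smaller region.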
3. Experiments
This section shows experimental results for the vision-based mobile robot. Figure 7 shows an example. The number of individuals in the SSGA is 60, and the number of clusters (k) is 10. Figure 7 (b) shows the result of detecting poles by the SSGA. The robot detects the poles required to recognize the state transitions. We also conducted a teleoperation experiment. Figure 8 shows snapshots of the attention area displayed to the human operator. It is very difficult to detect cables
when the robot turns right or left, because the images change very quickly. In this experiment, however, the attention area is marked by a white rectangle, which makes it easy for the human operator to check the candidate area.
Figure 7. Display for teleoperation to a human operator: (a) an original image; (b) the extracted image for teleoperation; (c) the operating command panel.
Figure 8. Snapshots of the attention area displayed to the human operator.
Next, we show an experiment to extract human operating patterns. The relationship between the state transitions and the human operations is extracted while the human detects the poles and the cable in manual operation. Figure 9 shows a human operating path in an experimental environment. The tasks of the robot are "2-block forward" while getting over a cable, "90-degree left wheel", "2-block forward", "90-degree right wheel", and "1-block forward". Figure 10 shows, over time, the number of pixels corresponding to cables in the camera image, the number of pixels corresponding to poles in each of the five areas into which the image is divided, and the human operating commands. The commands Neutral, Forward, Right-turn, and Left-turn are expressed as the values 900, 800, 700, and 600, respectively. In this graph, the operator judges that the posture is correct when there are poles in areas 1 and 5. Conversely, when there are poles in area 2, 3, or 4, the operator judges that the posture is not correct. In this experiment, the threshold for correcting the posture of the robot is 150 pixels.
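One reading of the operator's posture judgment above can be written as a simple rule; the 150-pixel threshold and the roles of areas 1, 5 versus 2-4 follow the text, while expressing the rule as a single boolean function is an illustrative interpretation:

```python
def posture_correct(pole_pixels, threshold=150):
    """Judge the robot's posture from per-area pole pixel counts
    (areas 1..5): poles in both outer areas (1 and 5) and no area 2-4
    above the threshold means the posture is correct. The 150-pixel
    threshold is the value reported in the experiment."""
    a1, a2, a3, a4, a5 = [c >= threshold for c in pole_pixels]
    return (a1 and a5) and not (a2 or a3 or a4)
```

This is the rule later reused for posture correction in the semi-autonomous control mode.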
Figure 9. The route of the teleoperation experiment.
Figure 10. State transition in the teleoperation.
This threshold value is then used for posture correction in the semi-autonomous control of the robot.
4. Summary
This paper applied a steady-state genetic algorithm based on time series of images to a vision-based robot working under OA floors. We proposed a teleoperation system based on the visualization of useful information. The experimental results show that the proposed method can navigate the robot to the target point along a given path and can draw the operator's attention to useful information for perceiving the situation. Furthermore, we proposed a human interface that makes command input easier. Future work includes cooperative work by multiple robots, semi-autonomous control, and miniaturization of the mobile robot.
References
[1] R. Pfeifer and C. Scheier, Understanding Intelligence, The MIT Press (1999).
[2] N. Kubota, S. Kamijima, Fuzzy Control for a Vision-Based Soccer, The 5th International Conference on Simulated Evolution And Learning (2004), p. 337.
[3] D. Katagami, S. Yamada, Interactive Evolutionary Computation for Real Robot from Different Viewpoints of Observation, System Control Information, Vol. 16, No. 6 (2003), pp. 279-286.
[4] J.-S. R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Inc. (1997).
[5] D. B. Fogel, Evolutionary Computation, IEEE Press (1995).
[6] S. J. Russell and P. Norvig, Artificial Intelligence, Prentice-Hall, Inc. (1995).
[7] R. S. Sutton, A. G. Barto, Reinforcement Learning, The MIT Press (1998).
[8] R. A. Brooks, Cambrian Intelligence, The MIT Press (1999).
[9] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer-Verlag (2001).
[10] T. Kohonen, Self-Organization and Associative Memory, Springer (1984).
[11] N. Kubota, S. Kamijima, K. Taniguchi, Teleoperation of a Mobile Robot under Office Automation Floors with Visual Assistance, The 3rd International Symposium on Autonomous Minirobots for Research and Edutainment (2005), pp. 66-72.
[12] N. Kubota, S. Kamijima, K. Taniguchi, Teleoperation Assistance based on Visual State Transition for Vision-based Mobile Robot, The 15th Fuzzy Intelligence System Symposium (2005), pp. 403-406.
Part 7 Adaptation
Adaptive Control Strategy for Micro/Nano Manipulation Systems
Hwee Choo Liaw a, Denny Oetomo a, Bijan Shirinzadeh a and Gursel Alici b
a Robotics and Mechatronics Research Laboratory, Department of Mechanical Engineering, Monash University, Clayton, VIC 3800, Australia
{hwee.liaw, denny.oetomo, bijan.shirinzadeh}@eng.monash.edu.au
b School of Mechanical, Materials, and Mechatronic Engineering, University of Wollongong, NSW 2522, Australia
[email protected]
Abstract. Micro/nano manipulation has been identified as one of the key enabling technologies for many emerging areas of application. Within this scope, piezoelectric actuators have played a major role in achieving the required nano-resolution motion. This paper proposes an adaptive control strategy for manipulation systems driven by piezoelectric actuators to follow specified motion trajectories. It is motivated by the search for an effective control methodology to deal with parametric uncertainties such as disturbance and hysteresis effects. The proposed adaptive strategy is formulated by introducing a parameter compensator that drives the system's position tracking error to converge to zero. The fundamental concept lies in the properties of a quasi-natural potential function, which allows a saturated position error function in the control formulation. Implementation of the control law requires only knowledge of the initial estimate of the system parameters. A feasibility study of the control strategy for a micro/nano manipulation system is presented, and simulations are performed to validate the suitability of the proposed control methodology.
Keywords. Micro/Nano Manipulation, Piezoelectric Actuator, Adaptive Control
1. Introduction
Micro/nano manipulation has been identified as one of the key enabling technologies for many research areas such as biomedical engineering, micro manufacturing and assembly, and micro surgery. In achieving these high-precision tasks, piezoelectric actuation systems are employed due to their high stiffness, fast response, and physically unlimited resolution. In recent years, advancements in piezoelectric actuator (PEA) designs and sensing devices such as laser interferometer encoders, capacitive sensors, strain gauges, and linear variable displacement transducers, together with developments of flexure-based mechanisms [1], have enabled progress in micro/nano positioning and manipulation. One major drawback of PEAs is the presence of highly nonlinear hysteretic behaviour between the input (applied) voltage and the output displacement. This prevents the PEA from providing the desired high-precision motion resolution and accuracy. Research has been conducted in this area to model and compensate for the hysteresis effect. Some examples include the modelling of physical hysteresis [2] and a comprehensive voltage-input electromechanical model [3]. On the other hand, appropriate control strategies can be formulated to take the nonlinearity into account to achieve high-precision positioning of PEA systems. Recent examples include an adaptive control using a back-stepping approach [4] and a nonlinear observer-based variable structure control [5]. These control strategies are formulated mainly for specific applications.
In this paper, an adaptive control strategy is proposed. This is motivated by the presence of nonlinear behaviour in the PEA system, which makes the exact parameter values of the model difficult to identify. The approach employs the idea of a quasi-natural potential function [6], which leads to saturated position error feedback and ensures stability of the control strategy. In the scheme, a parameter compensator is introduced to adjust the control signal to accommodate the unknown system parameters and uncertainties due to disturbances, including the hysteresis effect. The stability of the control law is proven theoretically, and the controller is able to steer a system to closely track a desired motion trajectory in position, velocity, and acceleration. Implementation of the control system requires only knowledge of the estimated system parameters. The simulation conducted has validated the feasibility of the proposed control approach.
This paper is organised as follows. The properties of a quasi-natural potential function are introduced in Section 2. The formulation of the proposed adaptive control strategy is presented in Section 3, followed by the stability analysis in Section 4. The simulation study is detailed in Section 5, and the results are presented and discussed in Section 6. Finally, conclusions are drawn in Section 7.

Figure 1. (a) Quasi-natural potential ρ(θ), and (b) its derivative s(θ).
2. Quasi-natural Potential Function
Consider a quasi-natural potential function ρ(θ), as shown in Figure 1, for the purpose of formulating the control strategy. This potential function is assumed to satisfy the following conditions [6]:
1. ρ(θ) > 0 for θ ≠ 0 and ρ(0) = 0;
2. ρ(θ) is twice continuously differentiable, and the derivative, s(θ) = dρ(θ)/dθ, is strictly increasing in θ for −γ < θ < γ, and saturated for |θ| ≥ γ, i.e. s(θ) = s_o for θ ≥ γ and s(θ) = −s_o for θ ≤ −γ;
3. there are constants c_1 > 0, c_2 > 0 such that for θ ≠ 0
ρ(θ) ≥ c_1 s²(θ) > 0,   (1)

θ s(θ) ≥ c_2 s²(θ) > 0.   (2)
Additional properties can also be observed:
4. the second derivative, s′(θ) = ds(θ)/dθ, satisfies s′(θ) ≥ 0 for all θ;
5. the rate of change of s(θ) is ṡ(θ) = s′(θ) θ̇, which leads to ṡ(θ) θ̇ = s′(θ) θ̇² ≥ 0;
6. there are constants c_3 > 0, c_3⁺ > 0, c_3⁺ > c_3, such that for s′(θ) ≠ 0 and θ̇ ≠ 0

c_3⁺ θ̇² ≥ ṡ(θ) θ̇ ≥ c_3 θ̇² > 0.   (3)
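As a concrete illustration of conditions 1-2, a sine-shaped s(θ) (the same shape the simulation study later adopts in Eq. (28)) and its potential ρ(θ) obtained by numerical integration can be sketched as follows; the values γ = s_o = 1 and the trapezoid-rule integration are illustrative choices:

```python
import math

def s(theta, gamma=1.0, so=1.0):
    """A saturated derivative s(θ): strictly increasing on (-γ, γ) and
    equal to ±s_o outside, here with a sine shape."""
    if theta >= gamma:
        return so
    if theta <= -gamma:
        return -so
    return so * math.sin(0.5 * math.pi * theta / gamma)

def rho(theta, n=2000):
    """Quasi-natural potential ρ(θ) = ∫₀^θ s(u) du, evaluated with the
    trapezoid rule, so ρ(0) = 0 and ρ(θ) > 0 for θ ≠ 0."""
    h = theta / n
    acc = 0.5 * (s(0.0) + s(theta))
    for i in range(1, n):
        acc += s(i * h)
    return acc * h
```

Since s is odd and saturated, ρ grows quadratically near the origin and linearly beyond γ, matching the shape sketched in Figure 1 (a).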
3. Adaptive Control Strategy
The control problem of tracking a desired motion trajectory x_d(t) can be formulated by designing an adaptive control strategy for the PEA system described by [7]:

m ẍ + b ẋ + k x + v_h + f_e = v_in,   (4)

where m, b, and k are the effective mass, damping, and stiffness, respectively, x is the PEA displacement, v_h is the voltage due to hysteresis, f_e is the force imposed by the external mechanical load, and v_in represents the applied voltage. Under the proposed control approach, the physical parameters of the system in (4) are assumed to be unknown or uncertain. The trajectory x_d(t) is assumed to be twice continuously differentiable, and both ẋ_d(t) and ẍ_d(t) are bounded and uniformly continuous in t ∈ [0, ∞). A parameter compensator is employed in the control law such that the closed loop system will follow the required trajectory x_d(t). To derive the control law, the PEA model of (4) is re-written in terms of a set of physical parameters, ϕ = [m, b, k, v_h]^T, and the PEA model becomes

x^T ϕ + f_e = v_in,   (5)

where x = [ẍ, ẋ, x, 1]^T. A set of estimated parameters ϕ̂ = [m̂, b̂, k̂, v̂_h]^T is defined, such that a parameter compensator ϕ̂(t) is introduced to continuously update the control system based on the following equation:

ϕ̂(t) = ϕ̂(0) − ∫₀ᵗ K⁻¹ x_d y(τ) dτ,   (6)

where ϕ̂(0) is the initial estimate of ϕ(t) at t = 0, K is a 4 × 4 constant positive definite diagonal matrix, x_d = [ẍ_d, ẋ_d, x_d, 1]^T, and y(t) is defined as

y = ė_p + α s(e_p),   (7)
where e_p(t) = x(t) − x_d(t), α is a positive scalar, and s(e_p) = dρ(e_p)/de_p is the saturated position error function described in Section 2. An estimated control signal v̂_in can therefore be established by

v̂_in = x_d^T ϕ̂.   (8)

The proposed adaptive control input is given as

v_in = −k_p e_p − k_v ė_p + v̂_in + f_e,   (9)

where k_p and k_v are the proportional and derivative gains, respectively.
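A sketch of one step of the control law, Eqs. (6)-(9), with the compensator integral of Eq. (6) advanced by a single Euler step; the simple clamp used for s(e_p) is an illustrative stand-in for the sine-shaped saturation the simulation study adopts in Eq. (28):

```python
import numpy as np

def sat(e, e_s=1e-6):
    """Saturated position error s(e_p): a simple clamp with limit e_s.
    (The paper's simulation uses a sine-shaped version, Eq. (28).)"""
    return max(-e_s, min(e, e_s))

def adaptive_control_step(x, x_dot, xd, xd_dot, xd_ddot,
                          phi_hat, K_inv, kp, kv, alpha, fe, dt):
    """One control step: y from Eq. (7), an Euler-integrated parameter
    compensator from Eq. (6), and the control voltage of Eqs. (8)-(9)."""
    ep = x - xd                                    # position tracking error
    ep_dot = x_dot - xd_dot
    y = ep_dot + alpha * sat(ep)                   # Eq. (7)
    xd_vec = np.array([xd_ddot, xd_dot, xd, 1.0])
    phi_hat = phi_hat - (K_inv @ xd_vec) * y * dt  # Eq. (6), one Euler step
    v_hat = xd_vec @ phi_hat                       # Eq. (8)
    v_in = -kp * ep - kv * ep_dot + v_hat + fe     # Eq. (9)
    return v_in, phi_hat
```

Iterating this step along the trajectory with the gains reported in Section 5 (k_p = 4 × 10⁶, k_v = 1 × 10³, K⁻¹ = 2 × 10⁸ I, α = 1) reproduces the closed-loop behaviour of the simulation.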
4. Stability Analysis
To study the closed loop stability under the proposed control strategy, the dynamics of the control system must be examined. For this reason, the error dynamics of the system will be derived and used in the stability analysis of the closed loop system. To examine the closed loop dynamics, the control input (9) is substituted into the PEA model (5), using (8) to eliminate v̂_in:

x^T ϕ = −k_p e_p − k_v ė_p + x_d^T ϕ̂.   (10)

An estimated control error Δv due to the parameter compensator is defined as

Δv = x_d^T Δϕ = x_e^T ϕ − x^T ϕ + x_d^T ϕ̂,   (11)

where Δϕ = ϕ̂ − ϕ and x_e = x − x_d = [ë_p, ė_p, e_p, 0]^T. Noting the closed loop dynamics in (10), the estimated control error (11) becomes

Δv = x_e^T ϕ + k_p e_p + k_v ė_p,   (12)

which is the error dynamics of the closed loop system.
which is the error dynamics of the closed loop system. For the purpose of stability analysis, the error dynamics in (12) is multiplied by (7), Δv y = [xTe ϕ + kp ep + kv e˙ p ] [e˙ p + α s(ep )] =
du1 +w, dt
(13)
where u1 =
1 1 m e˙ 2p + α (b + kv ) ρ(ep ) + (k + kp ) e2p + α m s(ep ) e˙ p , 2 2
w = (b + kv ) e˙ 2p + α (k + kp ) ep s(ep ) − α m s(e ˙ p ) e˙ p ,
(14) (15)
while ρ(ep ) is the quasi-natural potential function described in Section 2. Re-writing u 1 in (14) as 1 1 1 m [e˙ p + 2α s(ep )]2 + m e˙ 2p − α2 m s2 (ep ) + α (b + kv ) ρ(ep ) + (k + kp ) e2p , 4 4 2 (16) and using (1) for ρ(e p ) in (16),
u1 =
u1 ≥
1 1 1 m [e˙ p +2α s(ep )]2 + m e˙ 2p +α [−α m+c1 (b+kv )] s2 (ep )+ (k+kp ) e2p . (17) 4 4 2
H.C. Liaw et al. / Adaptive Control Strategy for Micro/Nano Manipulation Systems
379
Thus, u_1 is always positive if the control gains k_p and k_v in (9) are chosen as

k_p > −k  and  k_v > (α m / c_1) − b.   (18)

Using (2) and (3) for e_p s(e_p) and ṡ(e_p) ė_p, respectively, in (15), the constants c_2 and c_3⁺ are selected in such a way that

w ≥ (b + k_v) ė_p² + c_2 α (k + k_p) s²(e_p) − c_3⁺ α m ė_p²
  ≥ (b + k_v − c_3⁺ α m) ė_p² + c_2 α (k + k_p) s²(e_p).   (19)

To ensure a positive w, the control gains in (9) must be selected so that

k_p > −k  and  k_v > c_3⁺ α m − b.   (20)
It is possible to relate the left-hand side of (13) to a positive function. Consider a positive function u_2 given as

u_2 = (1/2) Δϕ^T K Δϕ.   (21)

Differentiating u_2 with respect to time yields

du_2/dt = Δϕ^T K (dΔϕ/dt).   (22)

Due to the fact that the parameters ϕ of system (5) are time-invariant, i.e. ϕ̇ = 0, we have dΔϕ/dt = dϕ̂/dt, and dϕ̂/dt can be obtained from (6) as

dϕ̂/dt = −K⁻¹ x_d y.   (23)

Substituting (23) into (22),

du_2/dt = Δϕ^T K [−K⁻¹ x_d y] = −[x_d^T Δϕ]^T y = −Δv y,   (24)
where the scalar Δv is defined in (11).
Theorem. For the PEA system described by (4), the adaptive control law (9) and the parameter compensator (6) assure the convergence of motion trajectory tracking, with e_p(t) → 0 and ė_p(t) → 0 as t → ∞, under the conditions of (18) and (20).
Proof. In the closed loop system formed by the system described by (4), the control law (9), and the parameter compensator (6), the functions u_1 and w from (14) and (15), respectively, are always positive in e_p(t) and ė_p(t) under the conditions of (18) and (20). Furthermore, the function u_2 is chosen to be positive in (21). A continuous and non-negative Lyapunov function u can therefore be proposed as

u = u_1 + u_2.   (25)

The time derivative of u can be obtained by combining (13) and (24) as

d(u_1 + u_2)/dt = −w,   (26)
which shows that u → 0 and implies e_p(t) → 0 and ė_p(t) → 0 as t → ∞. Both system stability and tracking convergence are guaranteed by the control law (9) and the parameter compensator (6) driving the system (4) to follow the desired motion trajectory x_d(t).

5. Simulation Study
In the process of developing a PEA system for micro/nano manipulation, a computer simulation was developed to study the feasibility of the proposed control strategy. The simulation model is constructed from the PEA model (4) and the adaptive control law (9), and is implemented in Matlab-Simulink [8], as shown in Figure 2. The objective of the simulation is to study the effectiveness of the proposed control algorithm in tracking a desired motion trajectory. The desired motion trajectory used in the simulation is shown in Figure 3 for position, velocity, and acceleration. For a more realistic study, a model of the hysteresis v_h is added in the simulation, as shown in Figure 2, and v_h is approximated by

v_hf(t) = (x_d − (5 × 10⁻⁶ + 0.05 x_d − 120000 x_d² + 4.9 × 10⁹ x_d³)) × 10⁶ (V),
v_hb(t) = (x_d − (5 × 10⁻⁶ + 6 x_d − 318300 x_d² + 4.9 × 10⁹ x_d³)) × 10⁶ (V),   (27)

where v_hf and v_hb are the voltage-drops representing the hysteresis effect when the PEA is moving forward or backward, respectively. The resulting effect of the hysteresis model is shown in Figure 4.
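The hysteresis model of Eq. (27) translates directly into code; the branch is selected by the direction of motion, positions are in metres, and the returned voltage-drop is in volts:

```python
def hysteresis_voltage(xd, forward=True):
    """Hysteresis voltage-drop model of Eq. (27): xd is the desired
    position in metres; returns v_hf (forward) or v_hb (backward) in V."""
    if forward:
        poly = 5e-6 + 0.05 * xd - 120000.0 * xd**2 + 4.9e9 * xd**3
    else:
        poly = 5e-6 + 6.0 * xd - 318300.0 * xd**2 + 4.9e9 * xd**3
    return (xd - poly) * 1e6
```

Sweeping xd forward over the 0-30 μm range with the first branch and backward with the second traces the loop plotted in Figure 4.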
Figure 2. Simulation of the piezoelectric actuation system in Matlab-Simulink.
Figure 3. Desired motion trajectory (position, velocity, and acceleration vs. time).
Figure 4. Hysteresis model (forward and backward hysteresis vs. desired position).
It is assumed that no external force is applied to the system, and the term f_e is ignored in both (4) and (9). The PEA model (4) is simulated with the dynamic coefficients m = 1 (V s²/m), b = 2.5 × 10³ (V s/m), and k = 2.0 × 10⁶ (V/m). The initial estimate ϕ̂(0) in (6) is chosen to be zero. The control gains k_p and k_v in (9) are tuned as k_p = 4 × 10⁶ (V/m) and k_v = 1 × 10³ (V s/m). Furthermore, the diagonal constant matrix K in (6) is selected as K⁻¹ = 2 × 10⁸ diag{1, 1, 1, 1}, where the units of K⁻¹ are (V s⁴/m³), (V s²/m³), (V/m³), and (V/m), respectively. The positive scalar α in (7) is set at α = 1 (1/s), and the saturated position error function s(e_p) in (7) is implemented in the S-function of Simulink as

s(e_p) = e_s for e_p > e_s;  e_s sin(π e_p / (2 e_s)) for −e_s ≤ e_p ≤ e_s;  −e_s for e_p < −e_s,   (28)

where e_s is the specified position error, chosen as e_s = 1 (μm).

6. Results and Discussion
Given the desired motion trajectory shown in Figure 3, the PEA was commanded to travel over a range of 30 μm, with the maximum velocity and acceleration reaching 1.12 mm/s and 0.07 m/s², respectively. The resulting PEA position and the estimated velocity are shown in Figure 5. Despite system uncertainties and zero initial conditions for the parameter compensator (6), the adaptive control law (9) was shown to be stable, with the control input to the PEA as shown in Figure 5. The hysteresis effect and the position tracking error of the experiment are shown in Figure 6. The position tracking result shows that the control law successfully accommodated the hysteresis, treating it purely as a disturbance. In the simulation, the disturbance caused by the hysteresis v_h was significant, yet the resulting position tracking error in Figure 6 was small, confined within 0.4 μm.

7. Conclusions
An adaptive control strategy has been proposed for micro/nano manipulators actuated by PEAs to follow specified motion trajectories.
Figure 5. Command and resulting trajectory.
Figure 6. Hysteresis effect and position error.

In this approach, a parameter compensator is employed to formulate the adaptive scheme for a system to drive its position tracking error to converge to zero. The proposed control system is shown to be stable with feasible performance in the simulation. Being capable of motion tracking under unknown system parameters and uncertainties due to disturbances and hysteresis, the adaptive control strategy is very attractive in the field of micro/nano manipulation.
Acknowledgment This work is supported by an Australian Research Council LIEF grant and an Australian Research Council Discovery grant.
References
[1] K. Spanner and S. Vorndran, "Advances in piezo-nanopositioning technology," in Proc. IEEE/ASME International Conference on Advanced Intelligent Mechatronics, vol. 2, Kobe, Japan, 20-24 July 2003, pp. 1338–1343.
[2] Y. I. Somov, "Modelling physical hysteresis and control of a fine piezo-drive," in Proc. International Conference Physics and Control, vol. 4, St. Petersburg, Russia, 20-22 August 2003, pp. 1189–1194.
[3] M. Goldfarb and N. Celanovic, "Modeling piezoelectric stack actuators for control of micromanipulation," IEEE Control Systems Magazine, vol. 17, no. 3, pp. 69–79, June 1997.
[4] H. J. Shieh, F. J. Lin, P. K. Huang, and L. T. Teng, "Adaptive tracking control solely using displacement feedback for a piezo-positioning mechanism," IEE Proc. Control Theory and Applications, vol. 151, no. 5, pp. 653–660, September 2004.
[5] C. L. Hwang, Y. M. Chen, and C. Jan, "Trajectory tracking of large-displacement piezoelectric actuators using a nonlinear observer-based variable structure control," IEEE Transactions on Control Systems Technology, vol. 13, no. 1, pp. 56–66, January 2005.
[6] S. Arimoto, Control Theory of Non-linear Mechanical Systems: A Passivity-based and Circuit-theoretic Approach. New York: Oxford University Press, 1996.
[7] H. C. Liaw, D. Oetomo, B. Shirinzadeh, and G. Alici, "Robust control framework for piezoelectric actuation systems in micro/nano manipulation," in Proc. IEEE Region 10 International Technical Conference, Melbourne, Australia, 21-24 November 2005.
[8] The MathWorks. [Online]. Available: http://www.mathworks.com/
Smart Roadster Project: Setting up Drive-by-Wire or How to Remote-Control your Car
Joachim Schröder, Udo Müller, Rüdiger Dillmann
Institute of Computer Science and Engineering, University of Karlsruhe (TH), Karlsruhe, Germany. Tel.: +49 721 608 4243; E-mail: [email protected]
Abstract. Since research in intelligent autonomous road vehicles is becoming more and more popular, many interested researchers are confronted with the need to set up a commercially available vehicle with drive-by-wire. Modern road vehicles contain many mechatronic components, sensors, and driver assistance systems, which can be interfaced and thus reduce the need for expensive modifications and additional actuators to a minimum. This paper describes how to interface the steering, throttle, and brake as well as built-in sensors, and shows an inexpensive way to provide a safe research platform for road traffic.
Keywords. Drive-by-Wire, Autonomous Vehicle, Vehicle Control, Driver Assistance, ABS, ESP, Electric Power Steering
1. Introduction
In the last decades, vehicles have been equipped with more and more mechatronic components: starting with conventional cruise control, over the first safety systems like the anti-lock braking system (ABS) or the Electronic Stability Program (ESP), towards higher-level assistance systems such as adaptive cruise control (ACC), lane change warning systems, or semi-autonomous parking assistants. The development of such intelligent assistants has also raised the interest of various research labs and universities, and has established a connection between the robotics area and the automotive industry. Recent events like the Grand Challenge show this trend of applying the experience gained in mobile robot research to road vehicles. It can be expected that traditional robotics research topics like computer vision, object recognition, situation estimation, decision control, as well as knowledge extension and learning, will be included in future driver assistance systems. With the integration of cognitive capabilities, such systems will be able to understand and evaluate traffic situations and derive reasonable and safe behaviors, which will be the basis for fully autonomous road vehicles. Whereas the automotive industry focuses its research on applying existing technologies to new driver assistance systems, universities and research centers
J. Schröder et al. / Smart Roadster Project
Figure 1. Experimental Vehicle
are able to deal with more complex problems that do not need to result in products in the near future, i.e. within 4-5 years. It will be their major task to perform basic research in the field of cognitive road vehicles. This includes finding answers to the questions of how traffic situations and vehicle knowledge can be represented, which learning algorithms qualify for these systems, and how the safety of autonomous vehicles can be verified and quantified. To apply the experience gained in the fields of mobile and humanoid robotics to road vehicles, the Institute of Computer Science and Engineering [1] and the group Interactive Diagnosis- and Servicesystems [2] started a collaborative project, the Smart Roadster Project, to extend their research towards this area. A Smart Roadster (see Figure 1), donated by the Smart AG, serves as experimental vehicle. The aim of this project is to set up a fully autonomous road vehicle and to equip it with cognitive abilities to negotiate road traffic environments. While autonomous lane-following and lane-changing maneuvers on highways have been performed by various research groups [3,4,5], the emphasis of this project is to develop a cognitive platform with an understanding of a wide variety of traffic scenarios, including inner-city and interurban driving. Even if this goal seems very far away, and autonomous vehicles cannot be expected in traffic within the next two decades, it is clear that the trend goes in this direction. Therefore, research in this area is necessary and will be very helpful to improve assistance systems on the way to a fully autonomous vehicle. Figure 2 shows the architecture which is going to be used in the Smart Roadster project. The interaction with the physical world occurs through a sensor/actuator level. A camera head with stereo and tele-cameras is going to be used as the main sensor.
Even though other sensor concepts like lidar or radar devices might be added in the beginning, the authors think that in the long term non-invasive sensors are to be preferred. At the next level, sensor data is processed to obtain information about infrastructure like signs, street course and other static obstacles, as well as pedestrians, cyclists, other cars and moving objects. The control part in Figure 2 generates control values for throttle, brake and steering according to higher-level input, taking the vehicle dynamics into consideration. The third level from the bottom contains the database and the behavior execution. The database contains spatiotemporal information about recognized objects, their state and behavior, as well as the own vehicle state. All capabilities of the vehicle are stored in a behavior execution net. It has an internal structure that consists
of different layers of behaviors, which have a pre-defined coupling and progression as they are active.
Figure 2. Smart Roadster Control Architecture
The cognitive level contains the most important components: the scene estimation to understand and evaluate a traffic scenario, and the behavior decision to trigger an appropriate vehicle movement. It is obvious that not all possible situations can be identified or stored. Also, there are many different ways of executing behaviors and reacting to situations. Therefore, the questions of how to learn unknown situations, how to improve the behavior, and how to assert safety for this extended knowledge will be the key tasks at this level. According to the architecture described above, the development of our cognitive car can be divided into the following phases:
Phase 1: Setting up drive-by-wire capabilities
Phase 2: Implementation of sensor data processing and vehicle control to provide basic features
Phase 3: Integration of simple situation estimation to perform low-level driving behaviors
Phase 4: Extension of situation estimation to a wide area of traffic situations and derivation of complex tasks
Phase 5: Evaluation and integration of safe learning algorithms to improve cognitive abilities
In the first phase, the goal is to provide a complete drive-by-wire ability for the system, and to make the modifications necessary to guarantee safe operation. Where in former times researchers had to make expensive modifications to set up a fully controllable road vehicle, up-to-date cars already include many assistance systems and mechatronic components. These systems include various sensors and actuators, and provide interface options which reduce modifications to a minimum. This paper describes an inexpensive and effective way to set up a commercially available car for drive-by-wire. Special regulations concerning safety are considered, against the background of obtaining a later permission for use in road traffic.
2. Built-In Assistance Systems
2.1. Anti-Blocking-System (ABS), Electronic Stability Program (ESP)
Nowadays, almost every new car comes with ABS, which prevents individual wheels from locking in case of heavy braking. The main advantages over conventional braking systems are a shorter braking distance and the preservation of steerability. To determine the wheel speeds, four sensors are connected to a control device via a Controller Area Network (CAN). This CAN-bus is usually referred to as Motor-CAN. ESP keeps the vehicle dynamically stable in critical skidding situations by braking wheels individually, and by counteracting understeer or oversteer. The existence of a critical situation is computed by a control box by comparing the driver's intention (steering angle, brake pressure) with information on the state of the vehicle (wheel speeds, engine speed, lateral acceleration and vehicle rotation) [7]. This sensory information is again available on the Motor-CAN-bus (except the brake pressure, see Section 3.3), which provides the ability to use this on-board information.
2.2. Electric Power Steering (EPS)
Electric power steering is used in more and more new models instead of hydraulic power steering. The smaller and lighter electric motors can be attached directly to the steering rod and need no lines, valves or hoses like hydraulic systems. They consume less power and are thus much more efficient, as well as easier to control. To be able to support the driver, the steering control box needs to measure the applied torque. Therefore, a torque sensor is directly connected to the steering control box instead of the Motor-CAN-bus, since no other component needs this information and fail-safe operation is easier to guarantee.
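The sensor values described above arrive on the Motor-CAN-bus as raw frames that have to be decoded. As a hedged illustration of this decoding step, the following Python sketch unpacks a hypothetical wheel-speed frame; the message ID, byte layout and scaling factor are invented for the example, since the real protocol is proprietary and has to be reverse-engineered from logged traffic.

```python
import struct

# Hypothetical layout, NOT the real Smart Roadster protocol:
# four little-endian uint16 wheel speeds, 0.01 km/h per bit.
WHEEL_SPEED_ID = 0x200  # assumed CAN message ID

def decode_wheel_speeds(data: bytes):
    """Decode an 8-byte CAN payload into four wheel speeds in km/h."""
    if len(data) != 8:
        raise ValueError("expected an 8-byte payload")
    raw = struct.unpack("<4H", data)  # FL, FR, RL, RR (assumed order)
    return [r * 0.01 for r in raw]

# Example: all four wheels at 50 km/h (raw value 5000)
payload = struct.pack("<4H", 5000, 5000, 5000, 5000)
print([round(v, 2) for v in decode_wheel_speeds(payload)])  # [50.0, 50.0, 50.0, 50.0]
```

In practice such a decoder would be fed by cyclic frames filtered by ID from the bus; recording the bus in known driving situations is what allows the layout and scaling to be guessed in the first place.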
3. Vehicle Modifications
3.1. Power Supply, Motor-CAN Interface and Safety Modifications
The vehicle itself does not leave much space for voluminous power supply devices like additional batteries, alternators and large converters. These restrictions hardly seem to qualify the car as a research vehicle. On the other hand, the vehicle is equipped with many power-consuming components like power windows, windshield defrosters and seat heating, so a comparatively strong alternator with 900 W or 75 A is installed. Disabling the seat heating, rear window heating and fog lights saves up to 40 A, which can be used to supply additional control equipment. This is sufficient to run up to six PCs and three DSP boards with a maximum of 30 A, plus a DC/AC converter for TFT-monitor and notebook supply. To be able to read the internal Motor-CAN-bus, it was extended from the instrument board towards the control rack. From this bus, cyclic and event-triggered messages are read to derive the sensor values described in Section 2. Since automotive manufacturers are usually very reserved in terms of disclosing communication protocols, recording sets of CAN-bus messages in specific situations and reverse-engineering the data was the only, but overall quite enthralling, solution.
Figure 3. Interface and Safety Components (Base Drawing: Smart AG)
As software framework for the control, the open-source Modular Controller Architecture [6] is used. This framework provides CAN and Ethernet communication routines, supports real-time Linux to realize time-critical control tasks on PCs, and provides the ability to set up a graphical user interface based on Qt. Due to the various control devices listening on this CAN-bus, it is not possible to insert additional messages with unknown IDs. This means listening is the only option, and CAN communication between our own devices needs to occur on a second Control-CAN-bus. To meet the safety requirements, it must be possible to electrically switch off all components that interface throttle, brake and steering, and to immediately set the vehicle back into its original state. Since having any logical parts within this safety chain makes proof of safe operation very complicated, a separate circuit was added in addition to the Control-CAN-bus. All emergency components are connected to this circuit, and only in case of proper operation (high signal) are throttle, brake and steering contacted. To implement redundancy, all DSP modules check the correct functioning of both the Motor-CAN-bus and the Control-CAN-bus as well as the activity of the other DSP modules and PCs by sending ping-pong messages. If any of the safety criteria is violated, each DSP can cut the emergency circuit and hand vehicle control back to the driver. Driver and co-driver have the option to take over control at any time by hitting emergency buttons. Figure 3 illustrates the integrated sensors and the additional components to interface throttle, brake and steering, and their connection to the control box.
3.2.
Steering Interface
As described in Section 2.2, the torque applied by the driver is measured by a sensor and amplified by a control box driving an electric motor. This mode of operation leads to the obvious idea of separating the sensor from the control box, generating imaginary torque values, and thereby forcing a resulting steering movement. After several tests and measurements, a circuit was built (see Figure 4)
to switch between the Control-CAN-bus and the regular sensor as input for the steering control device. The state of the relay depends on the emergency circuit, namely on the level of the NOTAUS SENSE line.
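The ping-pong liveness checking behind this emergency circuit (Section 3.1) can be summarized in a small sketch: each node tracks when its peers last answered, and the circuit is only held high while every peer has responded recently. The node names and the timeout value are illustrative assumptions, not values from the actual setup.

```python
import time

class SafetyWatchdog:
    """Sketch of the ping-pong liveness check each DSP performs."""

    def __init__(self, nodes, timeout=0.1):
        self.timeout = timeout            # assumed 100 ms staleness limit
        self.last_seen = {n: None for n in nodes}

    def pong_received(self, node, now=None):
        """Record a pong from a peer (explicit `now` eases testing)."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def emergency_circuit_ok(self, now=None):
        """High signal only while every peer has answered recently;
        any stale or missing peer cuts the circuit, handing control
        back to the driver."""
        now = time.monotonic() if now is None else now
        return all(t is not None and now - t <= self.timeout
                   for t in self.last_seen.values())

wd = SafetyWatchdog(["motor-can", "control-can", "pc-1"])
for n in ["motor-can", "control-can", "pc-1"]:
    wd.pong_received(n, now=0.0)
print(wd.emergency_circuit_ok(now=0.05))  # True: all peers fresh
print(wd.emergency_circuit_ok(now=0.5))   # False: pongs stale, cut the circuit
```

The design point is that the watchdog itself contains no complex logic: it only decides whether to keep the separate hardware circuit energized, so the safety argument stays simple.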
Figure 4. Control Circuit to Interface Electric Power Steering
This solution to interface the existing power steering is cheaper, less complex and less vulnerable than mechanical constructions. Good emergency behavior is guaranteed, since the driver gains immediate control over the vehicle and no additional actuators are able to interfere with his decisions. One problem of this practice is the amplification behavior of the steering control device. As discussed earlier, the steering support changes according to vehicle speed: the faster the car drives, the less support is given by the electric motor, even if it would be strong enough to give more. This makes sense in practical use, so that drivers are unable to make a hard turn when driving fast. But it also means that automatic steering works only at low speeds (up to 60 km/h with the Smart Roadster). A possible solution to this problem would be to add the missing torque to the supporting torque by using different amplification characteristics in autonomous mode. Since such different sets of characteristics are used to run different car models (which need different steering support) with the same electromechanical component, another EPROM from the particular manufacturer will help. This drawback also has a good side: using pre-defined characteristic curves from manufacturers in the original control devices simplifies the lateral control, since nonlinear friction is partially considered, and the lateral movement is limited to a safe level no matter what the higher-level control computes as set-point.
3.3. Throttle and Brake Interface
Interfacing the throttle was done almost exactly the same way as interfacing the electric steering described in the previous paragraph. A throttle position value is given through the Control-CAN-bus and is adjusted to the corresponding electrical interface by a circuit as in Figure 4. The connection between the CAN node and the motor control device is doubly protected and only established if the level of NOTAUS SENSE is high.
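The speed-dependent amplification can be illustrated with a small sketch: if the support gain drops with vehicle speed, the virtual torque fed to the control box must be scaled up accordingly to obtain a desired motor torque. The curve shape follows the behavior described above (less support at higher speed), but the numbers are invented for the example.

```python
# Illustrative amplification curve: (speed in km/h, support gain).
# The decreasing shape follows the paper; the values are assumptions.
GAIN_CURVE = [(0, 1.0), (30, 0.7), (60, 0.4), (90, 0.15)]

def support_gain(speed):
    """Piecewise-linear interpolation of the EPS support gain."""
    pts = GAIN_CURVE
    if speed <= pts[0][0]:
        return pts[0][1]
    for (s0, g0), (s1, g1) in zip(pts, pts[1:]):
        if speed <= s1:
            return g0 + (g1 - g0) * (speed - s0) / (s1 - s0)
    return pts[-1][1]

def required_virtual_torque(desired_motor_torque, speed):
    """Virtual torque the interface circuit must fake so that, after the
    speed-dependent amplification, the motor applies the desired torque."""
    return desired_motor_torque / support_gain(speed)

print(round(required_virtual_torque(6.2, 0), 1))   # 6.2: full support at standstill
print(round(required_virtual_torque(6.2, 60), 1))  # 15.5: gain dropped to 0.4
```

This is exactly the role a replacement EPROM with autonomous-mode characteristics would play inside the original control device, only expressed on the input side of the torque signal.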
Setting up braking-by-wire is probably the most critical part of interfacing
vehicle components. Several solutions seem possible; perhaps the existing ESP module could be told to brake. However, since ESP is only designed to brake individual wheels, it would not be strong enough to decelerate the entire vehicle. Interfacing an active brake booster could be an option, but one is unfortunately not available in our test car. Modification of the hydraulic system might work, but needs valves and pumps and would probably cause unintended interaction with the existing ESP and ABS modules. Therefore, the easiest way to interface the brake is a mechanical connection to the brake pedal. A drum connected to a gear motor pulls the brake pedal via a steel cable, and adjusts to a desired brake pressure. The brake pressure sensor is interfaced through another Control-CAN-bus module, which converts the value and puts it on the bus. The steel cable connection is able to transmit force only in one direction, so that it is not possible to counteract the driver's wish to brake. In a second step, the brake actuator will be extended with the possibility of pre-stressing the cable. This will allow completely unmanned driving, since the system will then be able to brake automatically in case of emergency or power loss.
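A minimal sketch of the pressure regulation loop for this cable actuator, assuming a simple proportional law and a toy plant model (the gain, units and dynamics are invented; the real actuator works against the pedal spring and the hydraulic system):

```python
def brake_step(current_pressure, setpoint, gain=0.5):
    """One proportional control step for the gear motor on the drum.

    Positive command: wind up the drum (pull the pedal harder);
    negative: pay out cable. Because the steel cable only transmits
    force one way, paying out cable merely releases the actuator's own
    tension and can never push back against a harder-braking driver.
    """
    return gain * (setpoint - current_pressure)

def simulate(setpoint, steps=20):
    """Toy plant: brake pressure follows the motor command directly."""
    p = 0.0
    for _ in range(steps):
        p += brake_step(p, setpoint)
    return p

print(round(simulate(30.0), 3))  # 30.0: converges to the setpoint
```

With gain 0.5 the remaining error halves each step, so after 20 steps the pressure is within a few hundred-thousandths of the setpoint; a real controller would additionally need rate limits and the pre-stressing logic mentioned above.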
4. Results
Figure 5 shows the converted passenger compartment with touchscreen, camera head, console and emergency buttons (left), and a screenshot of the MCA2 interface displaying sensor values (right). The result is a very slim system in terms of power consumption, complexity and price. First experiments with the steering and throttle interface circuits showed that it is possible to apply a virtual torque, which causes the steering unit to run autonomously, and to set the throttle position via joystick. Figure 6 (top) shows the steering angles resulting from applying a constant virtual torque of 6.2 Nm at different vehicle speeds. At higher speed, the resulting angle is lower due to the velocity-dependent characteristics of the amplification curve. The bottom diagram of Figure 6 shows that it is possible, even without a torque applied by the driver and with the standard amplification characteristics, to drive a slalom course at a speed of 20 km/h. Using an emergency button to switch between the real sensors and the interface circuits was possible at any time.
Figure 5. Display and Camera Head (left) and MCA2 Screenshot (right)
Figure 6. Velocity-Dependent Steering Support (top) and Slalom Test Drive (bottom)
5. Conclusion
This paper showed that it is possible to set up a current vehicle type with drive-by-wire capabilities very easily and without severe changes to the system. It was pointed out how integrated sensors and actuators can be used and interfaced, and how existing control devices have to be combined with additional units so as not to interfere with their usual mode of operation. The next steps will be to put the brake actuator into operation, and to implement a longitudinal and lateral control in order to perform basic vehicle movements and lay the basis for further research.
References
[1] Institute of Computer Science and Engineering (ITEC). University of Karlsruhe (TH), Karlsruhe, Germany. http://wwwiaim.ira.uka.de/
[2] Interactive Diagnosis- and Servicesystems (IDS). Center of Research (FZI), Karlsruhe, Germany. http://www.fzi.de/ids/
[3] A. Broggi, M. Bertozzi, A. Fascioli, G. Conte. The Experience of the ARGO Autonomous Vehicle. World Scientific Publishing, Singapore, 1999.
[4] R. Gregor, M. Luetzeler, M. Pellkofer, K.-H. Siedersberger, E.D. Dickmanns. EMS-Vision: A Perceptual System for Autonomous Vehicles. Proceedings of the Int. Symposium on Intelligent Vehicles (IV), Dearborn, Michigan, October 2002.
[5] M. Chen, T.M. Jochem, D.A. Pomerleau. AURORA: A Vision-Based Roadway Departure Warning System. IEEE Conference on Intelligent Robots and Systems, August 5-9, 1995, Pittsburgh, Pennsylvania, USA.
[6] Modular Controller Architecture 2 (MCA2). http://mca2.sourceforge.net/
[7] P. Rieth. Elektronisches Stabilitätsprogramm. Verlag Moderne Industrie, Landsberg/Lech, Germany, 2001.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A Reactive Approach for Object Finding in Real World Environments
Abdelbaki BOUGUERRA
Mobile Robotics Lab, University of Örebro, Sweden
Email:
[email protected]
Abstract. In this paper we propose an approach that lets mobile robots handle requests to find objects in real world environments. The proposed approach checks candidate objects in a reactive way, based on the likelihood that they constitute an answer to the request. As a result, objects perceived at run-time are handled "on the fly" without extra cost. We present the theoretical concepts of the proposed approach, and describe the experiments we ran to validate it.
Keywords. Active Search, Mobile Robots, Description Matching
(This work has been supported by the Swedish KK foundation.)
1. Introduction
In [1], AI conditional planning was used to recover from failures resulting from ambiguities in anchoring an object with a specific description. Basically, this kind of failure occurs when the robot's sensors detect several objects that fit the description of the object to anchor (please refer to [2] for details about the anchoring problem). In this paper we present a robust approach that can be used by mobile robots to find objects that fit a specific description by actively checking the features of candidate objects. As a result, the proposed approach can be used to recover from anchoring failures involving ambiguities. Mobile robots face many challenges while acting in real world environments, mainly because of the high dynamics of the environment and the lack of complete information about it. We opted for a reactive approach to deal with the problem of missing information, and used the likelihood of objects fitting the desired description to select which candidate object to check first. This results in a robust handling of newly discovered objects at run-time without extra overhead, as opposed to techniques using planning [1]. However, being greedy in selecting a candidate, our approach does not guarantee an optimal strategy of actively observing the candidate objects. We believe that the proposed approach performs better than using a planning technique for two reasons. First, a planner has to have access to a model of the world to be able to generate (optimal) plans that specify how to check the properties of candidate objects. Moreover, when observation actions have several outcomes, planning quickly becomes intractable for problems involving a high number of candidate objects. Second, because real environments are dynamic, plans quickly become out of date, which dictates
A. Bouguerra / A Reactive Approach for Object Finding in Real World Environments
replanning. As replanning produces a new plan, the same problem might occur again, implying (re)replanning. In the worst case, the robot will spend most of its time replanning in response to perceiving new objects. This paper is organized as follows: the next section reviews some of the literature related to our research work. In Section 3 we specify how queries about objects are handled and how candidate objects are generated. Then, we present the algorithms used to actively infer which candidate object is an answer to the query. Before concluding, we describe the experiments that we ran to validate the proposed approach.
2. Related Work
The work presented here constitutes an alternative to the work reported in [1,3], where AI conditional planning was used to disambiguate objects, in order to recover from anchoring failures, under restricting assumptions: mainly ignoring the fact that new candidate objects might be perceived at run-time. The proposed approach bears some resemblance to active sensing strategies [4,5], where the aim is to collect information to reduce the uncertainty about the state of the mobile robot, such as in self-localization [6], by selecting the action that minimizes the entropy of the next state. In our approach, the focus is on finding objects that fit a high-level description. There are also active approaches that try to solve low-level issues such as scene analysis and object recognition using single sensing modalities [7]. In comparison, within the proposed approach different sensing modalities can be used together to infer the values of the properties of candidate objects. Partially Observable Markov Decision Processes (POMDPs) [8] are a powerful tool for acting in uncertain environments, but they are not able to scale up to problems involving a high number of objects. Moreover, POMDPs need a model of the environment to generate optimal policies. As with AI planning, handling newly discovered objects at run-time would involve recomputing the policy from scratch. The natural language system Ludwig [9] is designed to answer natural language queries about objects of a (vision-based) scene. Its main focus is the semantics of the visual entities contained in the scene. Our focus is on acting to acquire more information rather than on scene analysis. In fact, the approach we propose can sit on top of a system such as Ludwig and use its services to compute the values of object properties.
3. Handling Queries
We are interested in scenarios where a mobile robot, acting in an unstructured environment, gets requests (queries) to look for an object that fits a specific description, either for the purpose of anchoring or to satisfy user requests about objects of interest. The description specifies the properties that an object must satisfy to be an answer to the query. A query is specified using the following syntax:
find(object(?x)):(and f1(?x) = v1, f2(?x) = v2, ..., fn(?x) = vn)
Each fi(?x) = vi specifies a feature and its value; when the value vi is omitted, it is assumed to be True. For instance, if we want to find a cup that contains tea, we write:
find(object(?x)):(and <shape(?x) = cup>, <contains(?x) = tea>).
Features can be either complex or simple. Simple features are properties that can be checked in the real world using observation actions. An example of a simple feature is the color of an object. Complex features, on the other hand, are combinations of other features (which can themselves be simple or complex). Checking that a cup contains tea involves checking two simple features: (1) the cup is not empty, and (2) the content is tea. Thus the feature contains is a complex one. In this work, complex features can be represented either as a conjunction or a disjunction of other features. A conjunctive complex feature holds if all its sub-features hold, whereas a disjunctive complex feature holds if at least one of its sub-features holds.
3.1. Candidate Objects
To generate the set of candidate objects that could constitute an answer to the query, a monitoring process uses the percepts detected by the onboard sensors to compute a representation of the outside world that comprises all detected objects and their observed properties. The features specified in the query are then matched against the properties of each detected object, resulting in three possible outcomes:
• The object does not match the description. This occurs when a feature of the checked object has a different value than the one specified in the query.
• The object is a complete match of the desired object. This happens when all the features of the query are satisfied by the properties of the checked object.
• The object is a partial match, i.e. there is at least one feature whose value cannot be asserted from the available properties of the checked object. An example of a partially matching object for the previous query is a cup whose content has not been determined yet.
If an object is a complete match, it is returned as an answer to the query. In case no complete match is found, partially matching objects are considered as candidates, and therefore their undecided features are checked.
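The three matching outcomes above can be sketched in a few lines of Python. This is a simplified illustration over flat dictionaries of simple-feature values; the real system matches against structured percepts, and the feature names are invented for the example.

```python
MATCH, PARTIAL, MISMATCH = "match", "partial", "mismatch"

def match_object(query, observed):
    """Match query features against the observed properties of one object.

    `query` and `observed` map simple-feature names to values; a feature
    missing from `observed` has not been determined by the sensors yet.
    """
    undecided = False
    for feature, wanted in query.items():
        if feature not in observed:
            undecided = True           # value cannot be asserted yet
        elif observed[feature] != wanted:
            return MISMATCH            # a feature has a different value
    return PARTIAL if undecided else MATCH

query = {"shape": "cup", "content": "tea"}
print(match_object(query, {"shape": "cup", "content": "tea"}))  # match
print(match_object(query, {"shape": "cup"}))                    # partial
print(match_object(query, {"shape": "box", "content": "tea"}))  # mismatch
```

Objects classified as partial matches are exactly those whose undecided features the active inference process will go on to check.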
4. Active Query Handling
In this section we detail how the mobile robot actively checks whether a candidate object is an answer to the query by checking its undecided features. The mobile robot has a repertoire of observation action templates that it can use to check simple features of candidate objects. An observation-action template specifies a sequence of robot actions to execute as a prerequisite to infer the value of the associated feature. Determining the value of complex features is done through determining the value of their sub-features.
Example 1 The following template specifies the robot actions to be executed in order to determine the value of the simple feature smell-of(?obj):
smell-of(?obj : object){
  robot-actions: (approach(?obj); smell(?obj))
}
394
A. Bouguerra / A Reactive Approach for Object Finding in Real World Environments
4.1. Feature Checking
The algorithm Check-Feature in Figure 1 implements the process responsible for actively checking features in the real world. Checking a simple feature fi amounts to instantiating the associated observation action template and launching its execution by the robot (step 3) (notice that this blocks the current Check-Feature process). Upon finishing the execution of the observation action, the process is awakened so it can compute the new state of the world (step 4), which is used to evaluate the truth value of the feature (step 5). The result of the evaluation can be one of three values {0, 1, u}. Value u indicates that the truth value of the feature is not known (u for undecided). This can arise when the observation action has failed to observe the object (due to occlusion, for instance). Truth value 1 (resp. 0) indicates that the feature holds (resp. does not hold) in the world state. If the feature fi is complex, the process selects one of its best sub-features fij (step 8) (according to a selection criterion) for further checking (step 9). The evaluation result of the sub-feature fij is then used to evaluate the truth value of the parent feature fi according to Table 1.
Process Active-Inference(candidates)
1.  if candidates = {} return failure endif
2.  c ← best-candidate(candidates)
3.  if undec(c) = {} return c endif
4.  f ← select-best-feature(undec(c))
5.  result ← Check-Feature(f)
6.  if result = 1
7.    sat(c) ← sat(c) ∪ {f}
8.    undec(c) ← undec(c) − {f}
9.  endif
10. if result = 0
11.   candidates ← candidates − {c}
12. endif
13. candidates ← candidates ∪ newcand
14. Active-Inference(candidates)
END

Process Check-Feature(fi)
1.  if simple(fi)
2.    action ← instantiate-action(fi)
3.    Execute(action)
4.    ws ← new-world-state()
5.    result ← evaluate(fi, ws)
6.    return result
7.  else
8.    fij ← best-sub-feature(fi)
9.    result ← Check-Feature(fij)
10.   truth ← evaluate-truth(fi, result)
11.   return truth
12. endif
END

Figure 1. Processes for actively checking the features of candidate objects
4.2. Active Inference
The aim of active inference is to try to find the object that fully matches the query, and to eliminate at run-time all those candidate objects that turn out not to fit the description of the desired object. The process of actively searching for the fully matching candidate is implemented by the algorithm Active-Inference shown in Figure 1. Each candidate object c is associated with two sets undec(c) and sat(c), where undec(c) contains the undecided features of object c, and sat(c) stores the features of the query that were satisfied by the properties of c. The process starts by checking that there is at least one candidate object, otherwise it returns failure to report that the query has no answer. This kind of failure triggers the
Table 1. Truth-value evaluation for a complex feature fi
case | type(fi) | truth(fij) | truth(fi)
1    | OR       | 0          | 0 if ∀k≠j: truth(fik) = 0;  u if ∃k≠j: truth(fik) = u
2    | OR       | 1          | 1
3    | OR       | u          | u
4    | AND      | 0          | 0
5    | AND      | 1          | 1 if ∀k≠j: truth(fik) = 1;  u if ∃k≠j: truth(fik) = u
6    | AND      | u          | u
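These three-valued rules can be expressed compactly in code. The sketch below is an equivalent formulation that evaluates a complex feature from the full set of its sub-feature truth values (rather than incrementally from one newly checked sub-feature, as the processes in Figure 1 do); the two views agree case by case.

```python
U = "u"  # undecided truth value

def evaluate_complex(ftype, sub_truths):
    """Three-valued truth of a complex feature from its sub-feature
    truths (each 1, 0 or "u"); ftype is "or" or "and"."""
    if ftype == "or":
        if 1 in sub_truths:
            return 1      # one true sub-feature suffices (case 2)
        if U in sub_truths:
            return U      # no true one yet, undecided siblings (cases 1, 3)
        return 0          # all sub-features are false
    if ftype == "and":
        if 0 in sub_truths:
            return 0      # one false sub-feature suffices (case 4)
        if U in sub_truths:
            return U      # no false one yet, undecided siblings (cases 5, 6)
        return 1          # all sub-features hold
    raise ValueError("unknown feature type")

print(evaluate_complex("or",  [0, U, 0]))  # u
print(evaluate_complex("or",  [0, 0, 0]))  # 0
print(evaluate_complex("and", [1, U]))     # u
print(evaluate_complex("and", [1, 1]))     # 1
```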
generation of an exploration task to look for candidate objects in a non-visited map sector (room, corridor, ...). When the set of candidate objects is not empty, the process selects the candidate object judged to be the best, which is returned as the answer to the query in case it is a complete match. This is equivalent to checking that the set of undecided features of the candidate is empty (step 3). Should the candidate be a partial match, i.e. its undec set is not empty, the process selects the best feature out of the undecided ones for further checking by the process Check-Feature. As outlined in the previous sub-section, the result of checking the best feature can be one of three values {0, 1, u}. Value 1 means that the feature holds in the real world, so the process removes the feature from undec(c) (step 8) and adds it to sat(c) (step 7). If, on the other hand, the feature does not hold in the real world state, i.e. the result is 0, the process considers object c a mismatch and therefore drops it from the set of candidate objects (step 11). When the feature checking process returns u (meaning that the truth value of the feature is unknown in the real world state), the process need not do anything, since the feature will be checked further at a later stage. Finally, the process updates the set of candidate objects by adding any new candidate objects observed while checking the feature, and recursively performs the same operations on the updated set of candidates (step 14). Note that computing the set of new candidates is the task of the monitoring process, as discussed earlier for the initial set of candidate objects.
4.3. Selection Criteria
To select the best candidate object, as well as the best feature and sub-feature to check, one can use different criteria, such as the Euclidean distance between the current location of the robot and the location where the robot should move to observe a feature.
However, using distance as a selection measure ignores how likely an object is to be the best match. Thus, we opted for using the probability of a candidate object matching the description of the query as a selection measure. The object with the highest probability is considered to be the best candidate. Notice that one can also use a utility-based criterion to select candidate objects, where the expected cost of checking an object is combined with its expected utility. Because of space limitations we only describe the probability criterion. To simplify the computation of the probability P(c) of a candidate object c fitting the description of the query, we make the assumption that all complex features are independent.
P(c) = \prod_{f_i \in undec(c)} P(f_i)    (1)

Simple feature probability distributions are given by the user. Computing the probability of a complex feature fi, on the other hand, depends on its type (conjunctive or disjunctive) as well as on the probabilities of all its n sub-features fik (k = 1, ..., n), i.e.

P(f_i) = \prod_{k=1,...,n} P(f_{ik})               if f_i is conjunctive    (2)
P(f_i) = 1 - \prod_{k=1,...,n} (1 - P(f_{ik}))     if f_i is disjunctive
Equation (2) is used to determine the best feature and sub-feature to actively check in the real world.
Example 2 Figure 2 shows a scenario where the robot tries to find a marked object (query: find(object(?x)):(<marked(?x)>)). The feature marked is a disjunctive complex feature whose sub-features are instances of the simple feature observed-mark(?x,?loc), which is true if a mark is observed on object ?x from location ?loc. Say that the robot has instantiated the feature marked(?x) for objects box-1 and gb-1 as follows:
marked(box-1) = (or f1, f2, f3, f4), where fi = observed-mark(box-1, li)
marked(gb-1) = (or f5, f6), where fi = observed-mark(gb-1, li)
Assuming that P(fi = T) = P(fi = F) = 1/2 for all simple features fi, then using Eqs. (1) and (2) we have:
P(box-1) = 1 − (1 − P(f1))(1 − P(f2))(1 − P(f3))(1 − P(f4)) = 15/16
P(gb-1) = 1 − (1 − P(f5))(1 − P(f6)) = 3/4
This shows that object box-1 is more likely to be marked than object gb-1. Consequently, object box-1 is the best candidate object.
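Equations (1) and (2) and the numbers of Example 2 can be reproduced with a few lines of Python (a direct transcription of the formulas, not the robot's actual implementation):

```python
from math import prod

def p_complex(ftype, sub_ps):
    """Probability of a complex feature per Eq. (2)."""
    if ftype == "and":
        return prod(sub_ps)
    if ftype == "or":
        return 1 - prod(1 - p for p in sub_ps)
    raise ValueError("unknown feature type")

def p_candidate(undecided_feature_ps):
    """Probability that a candidate fits the query per Eq. (1),
    assuming its undecided features are independent."""
    return prod(undecided_feature_ps)

# Example 2: every observed-mark sub-feature has probability 1/2.
p_box1 = p_complex("or", [0.5] * 4)  # four observation locations
p_gb1 = p_complex("or", [0.5] * 2)   # two observation locations
print(p_box1)  # 0.9375 (= 15/16)
print(p_gb1)   # 0.75   (= 3/4)
```

The disjunctive case is the familiar complement rule: the object is marked unless the mark is missed from every location, which is why more candidate observation locations raise the probability.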
Figure 2. A scenario where the robot is looking for a marked object.
5. Experiments The active inference approach described in this paper has been implemented onboard a Magellan Pro mobile robot running the ThinkingCap behavior-based control architecture
[10]. Two sensing modalities were used: a pan-tilt CCD camera and a commercial electronic nose (Cyranose Sciences Inc. 2000). We validated the proposed approach by running several experiments for two scenarios.

5.1. Scenarios

The first series of experiments involved only the vision system. The robot was asked to find a marked object in a room containing several scattered objects with different shapes and colors. Figure 3 shows an experiment where the robot is asked to find a marked gas bottle. At the beginning, only one gas bottle is perceived (gb-1), and the robot determines three observation locations to check whether gb-1 is marked. As the robot moves to the first observation location, it perceives the other two gas bottles (gb-2 and gb-3) with no mark on them. At the new observation location, the robot checks gb-1 and perceives no mark on it. The robot then updates the set of candidate objects to add gb-2 and gb-3, and computes their probabilities. At this stage the probability of gb-1 has dropped, and the two new objects have equal probability. Object gb-2 is randomly selected as the best candidate to be checked for the mark from a new location. The same process continues until the robot detects that gb-3 is an answer (fifth observation). The chart in Figure 3 shows the evolution of the probabilities of the three objects, where the X-axis represents the observation action number. The experiment was repeated several times with different random configurations and occlusions of candidate objects. We used up to 5 candidate objects and fixed the number of observation locations for each new candidate at 3, determined at run time around the object. It is worth noting that the success rate of the approach ranged from 60% to 90%, and all failures to identify the correct object were due to the vision system.
This happened mostly because the vision system did not detect a mark on an object that had one, or because it failed to detect a candidate object.
Figure 3. Experiments. (left) looking for a marked gas-bottle, (middle) smelling a garbage can, (right) probability of each of the three gas-bottles.
The second series of experiments used both sensing modalities: vision and olfaction. The experiments consisted of finding an object that has the shape of a garbage can or a cup and that smells of ethanol (see Figure 3 (middle)). All candidate objects were first acquired using vision, and then the electronic nose was used to smell their contents until finding the one that contained ethanol. It is worth noting that we used different configurations of the layout of the objects and made some of them initially invisible, to test how objects perceived at run time are handled. We also considered the case where the probability distribution of the simple smell feature
is conditioned on the shape of objects (it is more probable that cups smell of alcohol than garbage cans do). As a result, candidate objects with the shape of a cup were always selected first for smelling. As in the first series of experiments, most failures were caused by the vision system not detecting candidate objects.
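The shape-conditioned selection can be sketched as follows; the prior values are purely illustrative assumptions, not measurements from the paper:

```python
# Hypothetical shape-conditioned priors P(smells-of-ethanol | shape):
# cups are assumed more likely to smell of ethanol than garbage cans.
P_SMELL_GIVEN_SHAPE = {"cup": 0.7, "garbage-can": 0.3}

def rank_candidates(candidates):
    """Order candidate objects for active smelling by the probability,
    conditioned on their visually acquired shape, that they match the
    smell part of the query."""
    return sorted(candidates,
                  key=lambda c: P_SMELL_GIVEN_SHAPE[c["shape"]],
                  reverse=True)

objs = [{"id": "gc-1", "shape": "garbage-can"},
        {"id": "cup-1", "shape": "cup"},
        {"id": "cup-2", "shape": "cup"}]
print([o["id"] for o in rank_candidates(objs)])  # cups come first
```

With these priors, cup-shaped candidates are smelled before the garbage can, matching the behaviour reported in the experiments.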
6. Conclusion

We have presented a reactive approach to searching for objects that fit a specific description. Being reactive, the approach has the advantage of handling objects perceived at run time without any extra overhead; in comparison, a solution based on planning would have to replan to take the newly discovered candidate objects into consideration. However, because of its greediness, our approach does not guarantee that the right object is found in an optimal way. Moreover, we do not handle relations between objects as is done in [1]. The aim of this work was to provide an alternative solution for recovering from failures resulting from ambiguities in anchoring objects. We believe the approach can also serve other tasks in which a mobile robot must look for objects. Regarding future work, we will try to relax the independence assumption on features and use Bayesian networks to represent complex features. We also intend to compare the performance of different selection criteria.
References
[1] M. Broxvall, S. Coradeschi, L. Karlsson, and A. Saffiotti. Recovery planning for ambiguous cases in perceptual anchoring. In Proc. of the 20th Nat. Conf. on Artificial Intelligence, pages 1254–1260, 2005.
[2] S. Coradeschi and A. Saffiotti. An introduction to the anchoring problem. Robotics and Autonomous Systems, 43(2-3):85–96, 2003.
[3] M. Broxvall, L. Karlsson, and A. Saffiotti. Steps toward detecting and recovering from perceptual failures. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pages 793–800, 2004.
[4] D. Fox, W. Burgard, and S. Thrun. Active Markov localization for mobile robots. Robotics and Autonomous Systems, 25:195–207, 1998.
[5] J. Porta, B. Terwijn, and B. Kröse. Efficient entropy-based action selection for appearance-based robot localization. In Proc. of the Int. Conf. on Robotics and Automation, pages 2842–2847, 2003.
[6] W. Burgard, D. Fox, and S. Thrun. Active mobile robot localization by entropy minimization. In Proc. of the 2nd Euromicro Workshop on Advanced Mobile Robots, pages 155–162, 1997.
[7] T. Arbel and F. P. Ferrie. On the sequential accumulation of evidence. Int. Journal of Computer Vision, 43(3):205–230, 2001.
[8] L. Kaelbling, M. Littman, and A. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.
[9] I. Horswill. Visual architecture and cognitive architecture. Journal of Experimental and Theoretical Artificial Intelligence, 9(2-3):277–292, 1997.
[10] A. Saffiotti, K. Konolige, and E. H. Ruspini. A multivalued-logic approach to integrating planning and control. Artificial Intelligence, 76(1-2):481–526, 1995.
A Geometric Approach for an Intuitive Perception System of Humanoids

David Israel Gonzalez-Aguirre, Eduardo Jose Bayro-Corrochano
Center for Research and Advanced Studies, National Polytechnic Institute (CINVESTAV-IPN), Guadalajara, Mexico.

Abstract. In this paper we propose a conformal model for stereoscopic perception systems. In our model the two views are fused in an extended 3D horopter concept. The inverse transformation of the horopter spheres and Poncelet points leads, remarkably, to a 3D log-polar representation of the visual space. This representation is quite effective for processing visual space information, and the model is very suitable for building a powerful binocular head for applications in humanoids, intuitive man-machine interfaces and advanced 3D automated visual inspection.

Keywords. Geometric Computing, Early and Stereo vision, Humanoids
1. Introduction

This paper presents a novel approach to building stereoscopic perception systems. The traditional 2D horopter is reformulated as a 3D horopter using conformal geometric algebra. In this framework the visual space is represented as a family of horopter spheres which, together with their Poncelet points, lead remarkably to a 3D log-polar representation of the visual space. There is abundant research activity on image processing using 2D log-polar schemes with either monocular or stereo systems [2],[10]. However, this kind of work basically represents the Cartesian world using polar coordinates, and it fails to fuse the data of the two cameras of the stereoscopic vision system in a single framework. We believe that our human-like computational scheme looks promising for the processing of visual space data. The structure of this paper is as follows. Section two gives a brief introduction to conformal geometric algebra. Section three describes the conformal model for stereoscopic perception and provides insight into the implementation details that support a human-like processing of visual perception. Section four presents real-time applications and accurate results using real images. The conclusions are given in section five.
2. Geometric Algebra: An Outline

Let G_n denote the geometric algebra of n dimensions; this is a graded linear space. As well as vector addition and scalar multiplication we have a non-commutative product

1 López Mateos Sur No. 590, Guadalajara, México. Tel: +52(33)3134 5570, [email protected]
2 [email protected]
which is associative and distributive over addition – this is the geometric or Clifford product. A further distinguishing feature of the algebra is that any vector squares to give a scalar. The geometric product of two vectors a and b is written ab and can be expressed as a sum of its symmetric and antisymmetric parts, ab = a·b + a∧b. The inner product a·b of two vectors is the standard scalar or dot product and produces a scalar. The outer or wedge product a∧b of two vectors is a new quantity which we call a bivector. We think of a bivector as an oriented area in the plane containing a and b, formed by sweeping a along b. Thus b∧a will have the opposite orientation, making the wedge product anticommutative. The outer product is immediately generalizable to higher dimensions – for example, (a∧b)∧c, a trivector, is interpreted as the oriented volume formed by sweeping the bivector a∧b along the vector c. The outer product of k vectors is a k-vector or k-blade, and such a quantity is said to have grade k. A multivector (a linear combination of objects of different grade) is homogeneous if it contains terms of only a single grade.
2.1. The Geometric algebra of n-D space

In this paper we will specify a geometric algebra of the n-dimensional space by G_{p,q,r}, where p, q and r stand for the number of basis vectors which square to 1, −1 and 0 respectively and fulfill n = p + q + r. We will use e_i to denote the i-th basis vector. In a geometric algebra G_{p,q,r}, the geometric product of two basis vectors is defined as

e_i e_j = 1 for i = j ∈ {1,...,p};  −1 for i = j ∈ {p+1,...,p+q};  0 for i = j ∈ {p+q+1,...,n};  e_i ∧ e_j for i ≠ j.

This leads to a basis for the entire algebra: {1}, {e_i}, {e_i ∧ e_j}, ..., {e_1 ∧ e_2 ∧ ... ∧ e_n}. Any multivector can be expressed in terms of this basis. In the n-D space there are multivectors of grade 0 (scalars), grade 1 (vectors), grade 2 (bivectors), grade 3 (trivectors), and so on up to grade n. Any two such multivectors can be multiplied using the geometric product. Consider two multivectors A_r and B_s of grades r and s respectively. Their geometric product can be written as

A_r B_s = ⟨AB⟩_{r+s} + ⟨AB⟩_{r+s−2} + ... + ⟨AB⟩_{|r−s|},

where ⟨M⟩_t is used to denote the t-grade part of multivector M; e.g. consider the geometric product of two vectors: ab = ⟨ab⟩_0 + ⟨ab⟩_2 = a·b + a∧b.

2.2. Conformal Geometry

The geometric algebra G_{4,1} can be used to treat conformal geometry. To see how this is possible, we follow the same formulation presented in [3] and show how the Euclidean vector space R^3 is represented in R^{4,1}. This space has an orthonormal vector basis given by {e_1, e_2, e_3, e_+, e_−}, where e_+^2 = 1, e_−^2 = −1, and {e_23, e_31, e_12} are the bivectorial basis. The unit Euclidean pseudo-scalar is I_E = e_1 ∧ e_2 ∧ e_3 and the unit conformal pseudo-scalar is I_C = I_E ∧ E, where E = e_+ ∧ e_−; these are used to compute the inverses and duals of multivectors. The two added basis vectors are used to define two null vectors e_0 = (e_− − e_+)/2 and e_∞ = e_− + e_+, where e_0 is interpreted as the origin of the coordinate system and e_∞ is the point at infinity. A point x of the Euclidean space is then expressed in conformal space as

X = x + (1/2) x^2 e_∞ + e_0.

The equation of a sphere of radius ρ centered at point p can be written as (x − p)^2 = ρ^2. Since
conformal points satisfy X·Y = −(1/2)(x − y)^2, we can rewrite the formula above in terms of homogeneous conformal coordinates as X·P = −(1/2)ρ^2. Since X·e_∞ = −1 we can factor the expression above, which finally yields the simplified equation for the sphere, X·s = 0, with

s = P − (1/2) ρ^2 e_∞.

Note from this equation that a point is just a sphere with zero radius. Alternatively, the dual of the sphere is represented as the 4-vector s* = s I_C. The advantage of the dual form is that the sphere can be directly computed from four points (in 3D) as s* = A ∧ B ∧ C ∧ D. If we replace one of these points by the point at infinity we get the equation of a plane, π* = A ∧ B ∧ C ∧ e_∞, so that π becomes, in the standard form, π = n + d e_∞, where n is the normal vector and d represents the Hesse distance. A circle can be regarded as the intersection of two spheres, z = s_1 ∧ s_2. The dual form of the circle (in 3D) can be expressed by three points lying on it as z* = A ∧ B ∧ C. Similar to the case of planes, lines can be defined by circles passing through the point at infinity as L* = A ∧ B ∧ e_∞; for a more intuitive and deeper insight into conformal geometric algebra see [4].
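As a sanity check on the conformal embedding, the point and sphere equations above can be verified numerically with plain linear algebra in R^{4,1} (a sketch; the basis layout and helper names are our own):

```python
import numpy as np

# Basis order (e1, e2, e3, e+, e-) with signature (+, +, +, +, -).
M = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])
E0 = np.array([0, 0, 0, -0.5, 0.5])    # e0 = (e- - e+)/2, the origin
EINF = np.array([0, 0, 0, 1.0, 1.0])   # einf = e- + e+, point at infinity

def ip(a, b):
    """Inner product of vectors in R^{4,1}."""
    return a @ M @ b

def point(x):
    """Conformal embedding X = x + (1/2)|x|^2 einf + e0."""
    x = np.asarray(x, dtype=float)
    return np.r_[x, 0.0, 0.0] + 0.5 * (x @ x) * EINF + E0

def sphere(center, radius):
    """Sphere s = P - (1/2) rho^2 einf, with P the conformal center."""
    return point(center) - 0.5 * radius**2 * EINF

X = point([1.0, 2.0, 3.0])
print(ip(X, X))      # 0: conformal points are null vectors
Y = point([1.0, 2.0, 0.0])
print(ip(X, Y))      # -4.5 = -(1/2)|x - y|^2
S = sphere([1.0, 2.0, 0.0], 3.0)
print(ip(X, S))      # 0: X lies on the sphere (distance 3 from the center)
```

The three printed values confirm the null property of points, the distance formula X·Y = −(1/2)(x − y)^2, and the incidence test X·s = 0 used throughout the paper.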
2.3. Conformal Model for Stereoscopic Perception Systems

Now we explain how we developed a conformal model for representing 3D information of the visual space. We start by explaining the relationship between the horopter and our conformal model; thereafter we describe how we build a real-time system using the model.

2.4. Horopter and the conformal model

The horopter is the 3D geometric locus in space where an object has to be placed in order to stimulate exactly two points in correspondence on the left and right retinas of a biological binocular vision system [9]. In Figure 1.c-d we see a horopter depending on an azimuth angle. In other words, the horopter represents a set of points that cause minimal (almost zero) disparity on the retinas. We draw the horopter by tracing an arc through the fixation point and the nodal points of the two retinas, see Figure 1.a. The theoretical horopter is known as the Vieth-Müller circle; note that each fixation distance has its own Vieth-Müller circle. This theoretical view rests on the following assumptions: each retina is a perfect circle, both retinas are of the same size, corresponding points are perfectly matched in their retinal locations, and points in correspondence are evenly spaced across the nasal and temporal retinas of the right and left eyes. If an object is located on either side of the horopter, a small amount of disparity is caused on the retinas. The brain analyzes this disparity and computes the relative distance of the object with respect to the horopter. In a narrow range near the horopter, stereopsis does not occur, because the very small disparities are not enough to stimulate it. Empirical horopter measurements (even those done using the Nonius method [7]) do not agree with the Vieth-Müller circle.
There are two obvious reasons for this inconsistency: irregularities in the distribution of visual directions in the two eyes, or optical distortion in the retinal image. There are various physiological reasons why the horopter can be distorted. Another cause of distortion is the asymmetric distribution of oculocentric visual directions. In addition to a regional asymmetry in local signs in one eye, the distributions of the two eyes may not be congruent (the correspondence problem); this may be another cause of horopter distortion. Asymmetric mapping from the retina to the neocortex in both eyes also causes a deviation of the horopter from the Vieth-Müller circle. In this work we attempt to build an artificial binocular system. Deviations
Figure 1. a) The horopter; sphere centers move along the bisector line as the depth grows. b) Spherical horopter and the unit sphere; horopter depending on the azimuth angle. c) Conformal horopter configuration: the left camera center, right camera center and fixation point define a circle which, by varying the fixation point, defines a family of spheres, or horopter spheres.
of the horopter in this artificial system will depend on whether the digital cameras fulfill the above-summarized requirements for obtaining the Vieth-Müller circles. In the next subsections we explain the measures we take in order to avoid distortion of the horopters. One important contribution of this work is to show that the horopter is naturally embedded in the unit sphere of the conformal geometric algebra, remarkably enough to have a certain degree of biological plausibility. The simple configuration of the horopter shown in Figure 1.a is nothing more than a very naive geometric representation, in polar coordinates, of the geometric locus of the visual space. In contrast, using the tools of conformal geometric algebra we can claim that binocular vision can be reformulated advantageously using a spherical retina. Now we show how we find the horopter in the sphere of conformal geometry. We are actually dealing with a bunch of spheres intersecting the centers of the cameras, see Figure 1.b; this is a pencil of spheres in the projective space of spheres, and the camera centers are Poncelet points. Since a stereo system only sees in front of it, we consider the spheres emerging towards the front. When the spatial locus of objects expands, i.e. when the depth grows, the centers of the spheres move along the bisector line of the stereo rig, see Figure 1.a. From now on we will use the term horopter sphere rather than horopter circle, because when we change the azimuth of the horopter circle we are simply selecting a different circle of a particular horopter sphere, see Figure 1.b. As a result we can consider that all the points of the visual space lie on the pencil of horopter spheres. Let us translate this description into equations of the conformal geometric algebra.
We call the unit horopter sphere the one whose center is the egocenter of the stereo rig, see Figure 1.c; using the sphere equation, s_1 = C − (1/2)ρ^2 e_∞, its center (the egocenter) is attached to the true origin of the space of the conformal geometric algebra and the radius ρ is half the stereo-rig length. The center c_n of any horopter sphere, a Euclidean 3D vector, moves towards the point at infinity as the depth grows. Thus we can write the equation of such a sphere as s_n = C_n − (1/2)ρ_n^2 e_∞, where the radius ρ_n is computed in terms of the stereopsis depth. Considering the model of the human visual system in Figure 1.c, we see that the horopter circles lie on a pencil of planes. We can obtain the same circles in our conformal model simply by intersecting such a pencil of
planes with the pencil of tangent spheres, as depicted in Figure 1.c. The intersection is computed using the meet operation on the duals of the plane and sphere, taking the dual of the result: z = (π* ∧ s*)*. Now, taking the meet of any couple of horopter spheres we gain a circle which lies on the frontoparallel plane with respect to the common plane of the digital cameras. Taking the meet of this circle with the unit horopter sphere we regain the Poncelet points, which in our terms are called a point pair PP; the point pair can equivalently be computed by wedging the dual of the sphere with the line crossing the camera centers. If we further consider Figure 1.b, the intersecting plane cuts the horopter spheres, generating geometric loci on the plane as depicted in Figure 1.b-c. These horopter circles fulfill an interesting property: if one takes an inversion of all horopter circles with centers on the line, one obtains the radial lines of the polar diagram, see the right of Figure 3.a. Now, since the plane (by varying the azimuth angle) intersects the family of horopter spheres, producing the horopter circles of Figure 3.a whose inverse is a 2D log-polar diagram, we can conclude that the inverse of the arrangement of horopter spheres and Poncelet points is equivalent to a 3D log-polar diagram, as depicted in Figure 3.a. To understand this better, let us take any radial line of the 2D log-polar diagram and express it in conformal geometric algebra as L* = X ∧ Y ∧ e_∞. Applying an inversion to this line we obtain a circle; this inversion is implemented as a reflection in a sphere. The 3D log-polar diagram is an extraordinary result: contrary to the general belief that conformal image processing takes place in a 2D log-polar diagram, we can consider that visual processing rather takes place in a 3D log-polar diagram.
This claim is novel and promising, because this framework can be used for 3D flow estimation, as opposed to the use of one view or even an arrangement of two log-polar cameras. In this paper we use our conformal model in a real binocular head which will be used for a humanoid, as an intuitive vision-based man-machine interface, and for advanced 3D automated visual inspection.
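The inversion underlying the log-polar reading can be illustrated in 2D with a minimal numeric sketch (unit-circle inversion; the particular circle is illustrative): a circle passing through the center of inversion maps to a straight line, and conversely radial lines map to such circles.

```python
import numpy as np

def invert(p):
    """Inversion in the unit circle: p -> p / |p|^2."""
    p = np.asarray(p, dtype=float)
    return p / (p @ p)

# A circle through the origin (center (0.5, 0), radius 0.5) inverts to
# the straight line x = 1; a pencil of such circles therefore maps to a
# family of straight lines, which is the mechanism behind the log-polar
# reading of the horopter-circle arrangement.
theta = np.linspace(0.2, np.pi - 0.2, 40)       # avoid the origin itself
circle = np.c_[0.5 + 0.5 * np.cos(theta), 0.5 * np.sin(theta)]
images = np.array([invert(p) for p in circle])
print(np.allclose(images[:, 0], 1.0))           # True: all points have x = 1
```

The same computation in 3D, with spheres through the Poncelet points in place of circles, gives the claimed 3D log-polar diagram.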
3. Building the conformal horopter using a binocular head

A half-body humanoid is being built in order to research models and algorithms for perception-action systems based on conformal geometric algebra; this system includes a robotic arm, hand, neck (pan-tilt unit) and stereoscopic vision (see Figure 2.a-b). The
Figure 2. a) Humanoid's pan-tilt unit and stereoscopic vision system. b) Humanoid's virtual reality scene. c) High-frequency region limited by the cut-off radius and half of the image size. d) Maximal energy amount in the high-frequency region over time.
front-parallel camera system provides an excellent platform to implement the conformal horopter (see Figure 3.b). In order to support the assumptions made in the model,
we use effective techniques to adjust focal distances and to manage calibration and shooting control of both cameras. Since both cameras are the same model and have the same lenses mounted, we can assume equal image resolution; in terms of human vision parameters this would be taken as equal retina size on both eyeballs. This is very important to ensure, so that the image and geometric processing is done on a non-degenerated horopter. Prior to standard camera calibration, the focal distance is set as follows. First we capture a gray-scale image at time t on the left camera; then we map the image into its Fourier representation. The spectral density is computed in order to determine the energy distribution in the bidimensional signal. The high-frequency region can be thought of as the zone which extends beyond a certain cut-off radius, i.e. where the frequency variable u or v (or both) represents the high-frequency components of the image, see Figure 2.c. The contribution of a certain component to the zone
Figure 3. a) Log-polar representation of visual space. b) Nodal points (camera centers), image planes, and the azimuth angle.
is determined (weighted to avoid rippling). This zone comprises the components of the signal that constitute corners or borders in the space domain, features found in a well-focused camera. The energy amount in this region at a certain time is the sharpness of the image; we calculate it as a discrete weighted integration. While these calculations take place, we physically adjust the focal length of the lens until the maximum is reached, see Figure 2.d. Once the left camera has gone through this stage we do the same with the right one. Finally, since we wish not only to obtain the best possible focus for each camera but also to make the system evenly focused, the strategy consists in establishing a subwindow in the left image (such that its content appears in the 3D field of view, so it can be found in both images) and applying this subwindow as a correlation kernel on the right image in order to find the best match. This kernel was improved for our purposes with an edge-detector filter (3x3 pixels).
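A minimal sketch of such a spectral focus measure, under our own assumptions for the cut-off radius and without the paper's exact weighting scheme:

```python
import numpy as np

def focus_measure(image, cutoff_ratio=0.25):
    """Sharpness score: energy of the spectral density beyond a cut-off
    radius r0 around the (centered) zero frequency."""
    F = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    density = np.abs(F) ** 2
    h, w = image.shape
    v, u = np.mgrid[0:h, 0:w]
    r = np.hypot(u - w / 2.0, v - h / 2.0)   # distance to DC component
    r0 = cutoff_ratio * min(h, w)
    return density[r > r0].sum()

# Sharp edges put energy into high frequencies, so a checkerboard scores
# far higher than a constant (defocused-like) image.
sharp = np.indices((64, 64)).sum(axis=0) % 2 * 255.0
flat = np.full((64, 64), 128.0)
print(focus_measure(sharp) > focus_measure(flat))  # True
```

Adjusting the lens until this score reaches its maximum corresponds to the physical focusing loop described above.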
The point with maximal gray-scale amplitude is the center of the highest correlation; a subwindow around it is set on the right image, with its radius chosen so that the content of both subwindows is almost the same. Similarity in both images is obtained by insulating the subwindows with the best-match function. We then take both insulated signals (left and right images with subwindows previously matched) into their Fourier representations and compute the spectral energy for each side, as above.
Later on, we calibrated the stereo rig with a well-known method [11]. Uniform temporal distribution of features in the images is an important restriction, which has been tackled by computational thread schemes. The spatial distribution of features is assured with a full camera calibration including lens distortion (radial-tangential) and projective compensations. The disparities found in the images are handled as head-centric iso-disparity surfaces; these are the meshes that support stereo matching and, moreover, scene reconstruction. Once the off-line calibration procedure is finished, the next stage consists in computing those iso-disparity surfaces using a multiresolution algorithm, see Figure 4. This method is inspired by census mapping for correspondence estimation [6]; however, our algorithm uses a central pixel difference. The process has been accelerated by a set of constraints (ordering, uniqueness, disparity limit, disparity continuity, etc.) [11]. When a dense mesh reconstruction is desired, a median-like filter must be applied to remove those outliers which were not caught by the disparity-limit constraint. This median-like filter considers only pixels with non-zero disparity, and the value set for every filtered pixel depends on the mean and median of the neighborhood. The variation of the azimuth angle generates a pencil of planes which can be seen as epipolar planes whose intersections with the image planes are the scan lines of the images (see Figure 3.b); this is possible because the epipolar calibration has been done during the warping phase (where lens distortion and the centers of the cameras are compensated) of the on-line 3D scene-mesh generation. Finally, the 3D mesh of the field of view is formed by two sets of vertices (conformal 5D points). Due to the geometric nature of the approach and the features
Figure 4. Disparity Map by means of a Multiresolution Algorithm
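One possible reading of the median-like post-filter is sketched below; the tolerance test and window size are our assumptions, since the text only states that non-zero-disparity pixels are considered and that the replacement value depends on the neighborhood mean/median:

```python
import numpy as np

def median_like_filter(disp, k=3, tol=2.0):
    """Median-like post-filter sketch for a disparity map: only pixels
    with non-zero disparity are considered; a pixel whose value strays
    more than `tol` from the median of its non-zero k x k neighborhood
    is replaced by that median (assumed behaviour)."""
    out = disp.astype(float).copy()
    h, w = disp.shape
    r = k // 2
    for y in range(h):
        for x in range(w):
            if disp[y, x] == 0:            # unmatched pixel: left untouched
                continue
            win = disp[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            med = np.median(win[win > 0])  # median of matched neighbors
            if abs(disp[y, x] - med) > tol:
                out[y, x] = med            # snap outlier to the median
    return out

d = np.full((5, 5), 10.0)
d[2, 2] = 50.0   # an outlier that survived the disparity-limit constraint
d[0, 0] = 0.0    # unmatched pixel
f = median_like_filter(d)
print(f[2, 2])   # 10.0: outlier replaced by the neighborhood median
print(f[0, 0])   # 0.0: unmatched pixels are not filled in
```

In a real pipeline this would run on the multiresolution disparity map before the dense mesh is built.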
provided by the incidence algebra [12], the usage of geometric primitives (planes, lines,
spheres, etc.) provides a powerful toolbox for further applications; however, up to this point the content of the visual space has been encoded only with conformal points. The idea is to find these geometric primitives from the mesh in such a manner that the effects of noise and quantization induced by the digital images are diminished. In this way we arrive at an intuitive geometric approach for visual-space perception on humanoids, among other applications. A technique to do this on laser range data has been used by [14]; to improve
Figure 5. a) Segmented image for geometric rectification. b) Selected patch and its centroid. c) Plane fitting. d) Ellipsoid represented by the eigenvectors of the covariance matrix. e) 3D scene reconstruction, depth segmentation and ego-sphere visualization in virtual reality for human-computer interaction.
the performance and quality (on stereo images), we use patches in the images which may be suitable for a certain geometric primitive; i.e., in the case of a plane, the image patch will be similar in texture or color. This similarity can be determined by a segmentation phase using well-known algorithms [13]. Once the image segmentation is done, it is possible to define the list of conformal points from the iso-disparity mesh which correspond to a particular image patch; j is the index of a particular conformal point and the size of the list is N. The 3D centroid c of the points within the list is then calculated, see Figure 5:

c = (1/N) Σ_{j=1,...,N} x_j.    (7)

After that, a plane-fitting phase is applied to every patch using a covariance matrix

C = (1/N) Σ_{j=1,...,N} (x_j − c)(x_j − c)^T.    (8)

We then calculate the eigenvalues (λ_1, λ_2 and λ_3) and eigenvectors (v_1, v_2 and v_3) of C, where the eigenvector of the smallest eigenvalue represents the normal n of the plane in the plane equation n·x + d = 0. In this fashion we use this geometric primitive for space rectification; this is done by intersecting planes to obtain the lines which describe the boundaries of the patch, among other geometric corrections.
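The centroid/covariance plane fit of Eqs. (7)-(8) can be sketched as follows (function names are ours):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a 3D patch: the eigenvector of the
    covariance matrix with the smallest eigenvalue is the plane normal
    (Eqs. (7)-(8)); the plane passes through the centroid."""
    P = np.asarray(points, dtype=float)
    c = P.mean(axis=0)                    # centroid, Eq. (7)
    C = (P - c).T @ (P - c) / len(P)      # covariance matrix, Eq. (8)
    w, V = np.linalg.eigh(C)              # eigenvalues in ascending order
    n = V[:, 0]                           # smallest-eigenvalue eigenvector
    d = -n @ c                            # plane equation: n.x + d = 0
    return n, d

# Noisy samples of the plane z = 0: the recovered normal is the z-axis.
rng = np.random.default_rng(1)
pts = np.c_[rng.uniform(-1, 1, (200, 2)), rng.normal(0, 1e-3, 200)]
n, d = fit_plane(pts)
print(np.round(np.abs(n), 2))   # approximately [0, 0, 1]
```

Intersecting two such fitted planes (via their conformal representations π = n + d e_∞) yields the boundary lines used for the geometric rectification of the patch.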
4. Applications for Robotic Vision
Figure 6. Distance and motion calculation for automated visual inspection, using conformal horopter.
Once the model was implemented using appropriate programming techniques, we measured the directed distance between the corners of a geometric entity (a cube). The distances obtained fulfilled the precision desired for most manipulations at 850 millimeters from the camera-rig center. Based on the reconstruction of two scenes, we determined the motion; the results were compared with the ground truth at 900 millimeters from the camera-rig center, with the rotation calculated from the outlines in the scene. In the context of human-computer interaction it is important to use the visual information to determine the position, pose and gesture of the user in front of the system. The conformal model has been used to build an application where scene reconstruction and modeling (done with the geometric space rectification) are used to segment elements in the field of view based on their depth (in a multiresolution manner). Further information (skin color, shape, etc.) is used to estimate the user's pose or gesture. Once this pose is known, we can define assets in the 3D virtual world, such that if the user touches an asset some command is dispatched, see Figure 5.e.

5. Conclusions

In this work we propose a novel conformal model for robot vision. The old concept of horopter circles is extended to horopter spheres, which leads to a 3D log-polar representation of the visual space. This model has a certain biological plausibility and looks promising for making binocular heads more effective for real-time tasks. We use our model to build a powerful binocular system which will be used for a humanoid, intuitive man-machine interfaces and advanced 3D automated visual inspection.

References
[1] A. Yeung and N. Barnes, “Active Monocular Fixation using the Log-polar Sensor,” Proc. of the 2003 Australasian Conf. on Robotics and Automation, 2003.
[2] R. Mandelbaum, L. McDowell, L. Bogoni, B. Reich and M.
Hansen, “Real-time stereo processing, obstacle detection, and terrain estimation from vehicle-mounted stereo cameras,” Proc. of the Fourth IEEE Workshop on Applications of Computer Vision, 1998.
[3] B. Rosenhahn and G. Sommer, Pose Estimation in Conformal Geometric Algebra, Report 206, Christian-Albrechts-Universität, Kiel, 2002.
[4] E. Bayro-Corrochano, Handbook of Geometric Computing, Springer, Berlin, 2005.
[5] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, UK, 2000.
[6] E. Trucco, K. Plakas, N. Brandenburg, P. Kauff, M. Karl and O. Schreer, “Real-time Disparity Maps for Immersive 3-D Teleconferencing by Hybrid Recursive Matching and Census Transform”.
[7] J. Hakkinen, “Using interocular apparent movement to measure head-mounted display misalignment”, 2004.
[8] Z. Zhang, “Flexible Camera Calibration By Viewing a Plane From Unknown Orientations”, 1999.
[9] S. Steinman, “Binocular Vision Module – The Empirical Horopter”, Addison-Wesley, 1994.
[10] C. Capurro, F. Panerai and G. Sandini, “Dynamic Vergence Using Log-Polar Images”, 1997.
[11] M. Pollefeys, “Tutorial on 3D Modeling from Images”, 2000.
[12] E. Bayro-Corrochano, Geometric Computing for Perception Action Systems, Springer Verlag, Boston, 2001.
[13] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis”, IEEE Trans. Pattern Anal. Machine Intell., 24, 603–619, 2002.
[14] J. Castaño, E. Zalama and J. Gómez, “Reconstruction of Three Dimensional Models of Environments with a Mobile Robot”, International Conference on Robotics and Automation, 2003.
Autonomous learning of a topological model in a road network
Gabriel Aviña Cervantès and Michel Devy
FIMEE, University of Guanajuato, Mexico; LAAS-CNRS, Toulouse, France
Abstract. This paper concerns the construction of a topological model by a robot which must learn a road network. The topological graph is incrementally built from images acquired by a single color camera. In every color image, the road is extracted as a homogeneous region in a color-texture space; the road boundary is extracted and described by the Shape Context descriptor. When moving on a road from a node A, the road configuration (straight, turning, crossing...) is recognized from this descriptor; when a crossing is detected, a node B and an edge A-B are created. The road extraction and classification functions have been validated on a large image database; results on topological modelling are provided from a sequence of synthetic images created on a virtual road network.
Keywords. Topological navigation, road crossings, Shape Context descriptor
1. Introduction
This work deals with the automatic construction of a topological graph by a robot mapping a road network (figure 1(left)). In such an outdoor environment, the robot can navigate using sensor-based motions executed to follow a road, traverse a road crossing, or turn left or right. The motion primitives integrated in our robot are described in [1]; figures 1(middle) and (right) show our Dala testbed executing a path defined in the current image, in order to follow an earthen road. In such a navigation mode, a qualitative localization is sufficient to express the robot position with respect to the topological graph: the robot is in a road crossing, represented by a node, or is moving along a road, represented by an edge between two nodes. Many works on topological modelling are devoted to indoor navigation in a corridor network. The model is given by a Generalized Voronoi Graph (GVG), generally built from sonar or laser data [2] to detect walls. The problem is far more complex for outdoor navigation: vision must be used to detect the road and the road crossings. Indeed, automatic road extraction has been an important research subject in the last few years, motivated by robot applications in natural environments. Nevertheless, the real applicability of such a function is still limited [3] due to unsatisfactory results or to the complexity of
1 The author thanks the support of CONACyT. This work has been partially funded by the French-Mexican Laboratory on Computer Science (LAFMI, Laboratoire Franco-Mexicain d'Informatique).
2 Correspondence to: Michel Devy, LAAS-CNRS, 7 avenue du Colonel Roche, 31077 Toulouse Cedex 04, France. Tel.: +33 5 61 33 63 31; Fax: +33 5 61 33 64 55; E-mail: [email protected]
G. Aviña Cervantès and M. Devy / Autonomous Learning of a Topological Model
Figure 1. (left) A typical road network. (middle) Road following: robot view. (right) operator view.
the proposed algorithms. In this work, the road is efficiently extracted from monocular color images by the scene modeling approach described in [4]. This paper is focused on the recognition and categorization of road configurations: such a function is required in order to detect a new node while building a topological graph on a road network, or while executing a trajectory expressed as a path in such a graph. Appearance-based features like color and texture allow the road to be extracted from the background. The road shape is exploited to recognize its configuration. A discriminant shape representation [5,6] must be selected; it must at the same time be unique for a given object and able to accept certain variations so as to characterize a whole class. It requires robustness with respect to translation, rotation, scale variations or affine transformations, so that it can be invariant to the view point. Contour-based shape representations rely only on the knowledge of the object contour, while region-based ones depend on the whole shape region [5]. This paper exploits the versatility of the Shape Context representation [10] to classify roads according to their morphological characteristics, into crossroads, straight lines, T-intersections, dead-ends, etc. The Shape Context coarsely describes a shape in an image by the relative positions between points of its contour. Thus, the matching between two shapes consists in finding, for each point on a shape, the most similar point on the other shape, comparing the shape context descriptors. An overall similarity score between two shapes can be computed using the bipartite weighted graph matching algorithm [9]. This paper is organized as follows: first, the road extraction and representation are described in section 2. In section 3 the shape matching is discussed; the results in topological modelling are analyzed in section 5 and this work is concluded in section 6.
2. Image Processing and Feature extraction
This section presents the Shape Context descriptor [10] used to characterize a road configuration. At first, the road region is extracted from a color image [4]; a segmentation function provides regions, characterized and labelled by a Support Vector Machine (SVM) classifier, using color and texture attributes (figure 2). The road contours are extracted as a set of connected points; a sampling process is applied to obtain a predefined number of points. These points need not have any particularity, i.e. they are neither the maxima nor minima nor inflection points. It is recommended to work with uniformly separated points in order to speed up the matching process. The raw road boundary (figure 2) is often employed to characterize the road shape by its perimeter, its surface and its compactness, among others; obviously, it is better to apply a smoothing algorithm to the extracted 2D curve.
Figure 2. 2D scene modeling and road extraction: from left to right, original image, main identified regions (the red ones are labelled unknown) and road contour extraction (without smoothing).
Shape Context describes the relative position between contour points (figure 3 left). It is made of vectors, each one built from a reference point $p_i$ on the contour, in order to describe the configuration of the other points with respect to $p_i$. This configuration (or Shape Context of $p_i$) is given by a coarse histogram $h_i$ representing the relative distribution of the positions of the other points. $h_i$ is calculated by using the coordinates of the other points relative to $p_i$; this is formulated [10] by the following relation:

$h_i(k) = \#\{\, q_j \neq p_i : (q_j - p_i) \in \mathrm{bin}(k) \,\}$

Figure 3. (left) Polar histogram for a point $p_i$ of the road contour (8 directions, 12 distances); (right) a Shape Context descriptor for a point selected on the road contour.
The bins are uniformly selected in a log-polar space, so that the largest sensitivity is obtained for the closest points. In practice, a point on the shape is characterized by a normalized radial distance and an angle: the radial distance $d$ from the point to the reference point is normalized by the mean distance $\bar d$ calculated over the shape, $r = d / \bar d$, and $\log r$ is quantized into $n_r$ bins up to the maximal distance $d_{max}$, while the angle $\theta$ is quantized into $n_\theta$ uniform bins, where $n_r$ and $n_\theta$ are respectively the number of selected bins for the radius and the angle. In fact, the use of the average distance makes the method invariant with respect to scale variations. By definition, it is invariant to translation. Nevertheless, the shape context is not invariant to an arbitrary affine transform, but using logarithmically
spaced bins ensures a very small variability with respect to small local distortions. Last, the shape context is robust against noise and small occlusions. Figure 3(right) shows the Shape Context histogram obtained for a point using the previous definitions.
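To make the construction above concrete, the log-polar histogram can be sketched as follows. This is an illustrative implementation, not the authors' code: the function name is hypothetical, while the bin counts (12 distances, 8 directions, as in figure 3) and the normalization by the mean distance come from the text.

```python
import numpy as np

def shape_context(points, ref, n_r=12, n_theta=8):
    """Log-polar histogram of the positions of `points` relative to `ref`.

    Radial distances are normalized by the mean distance to the reference
    point (scale invariance); logarithmically spaced radial bins make the
    descriptor most sensitive to nearby points.
    """
    pts = np.asarray(points, dtype=float)
    d = pts - np.asarray(ref, dtype=float)
    dist = np.hypot(d[:, 0], d[:, 1])
    mask = dist > 0                       # exclude the reference point itself
    dist, ang = dist[mask], np.arctan2(d[mask, 1], d[mask, 0])

    r = dist / dist.mean()                # normalize by the mean distance
    # logarithmically spaced radial bin edges between r_min and r_max
    edges = np.logspace(np.log10(r.min()), np.log10(r.max()), n_r + 1)
    r_bin = np.clip(np.searchsorted(edges, r, side="right") - 1, 0, n_r - 1)
    t_bin = ((ang % (2 * np.pi)) / (2 * np.pi) * n_theta).astype(int) % n_theta

    h = np.zeros((n_r, n_theta))
    np.add.at(h, (r_bin, t_bin), 1)       # count points falling in each bin
    return h / h.sum()                    # standardized (normalized) histogram
```

Because distances are divided by their mean before binning, rescaling the whole contour leaves the histogram unchanged, which is the scale invariance claimed in the text.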
3. Shape Matching Criterion
The correspondence between two different shapes consists, at first, in finding for every point in a shape the point in the other shape that has the nearest Shape Context. Over the last few years, there has been an increasing use of distributions to represent the attributes to be measured, instead of representing them in some space of characteristics [7,11]. Moreover, using histograms to represent or characterize shapes brings computational advantages, because histograms can be matched using probabilistic methods [12,13]. Although there are several similarity measurements (distances between histograms) which give satisfactory results [12,8], the similarity between two points characterized by histograms is calculated from the $\chi^2$ statistical test:

$C_{ij} = C(p_i, q_j) = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$
where $C_{ij}$ represents the cost of pairing the two points, $K$ symbolizes the number of bins, and $h_i$ and $h_j$ respectively indicate the standardized histograms of points $p_i$ and $q_j$. It is possible to enrich the shape descriptor by adding other terms to this local similarity value. Once all the costs between all the pairs of points of the two shapes are known, the best matching must minimize the total matching cost, represented by:

$H(\pi) = \sum_{i} C(p_i, q_{\pi(i)})$
where $\pi$ represents the permutation vector which minimizes the function $H$. The solution is subject to only one constraint, i.e. the mapping should be bijective. This leads us to seek the solution of an optimization problem known as the optimal assignment problem. The method employed to maximize the similarities is based on the BWGM algorithm, bipartite weighted graph matching [9]. The BWGM is an approach based on the application of graph theory to the assignment of functional units problem. One of the most effective algorithms was developed by Jonker and Volgenant [9]; as input, this method receives a square cost matrix and gives us the permutation which minimizes the total matching cost. The distance between two shapes $S_1$ and $S_2$ is measured from the permutation vector $\pi$, solution of the BWGM, i.e. the symmetrical addition of the costs on the pairs of points which give the best matching:
Figure 4. Some samples of the road categorization database.
$D(S_1, S_2) = \frac{1}{n} \sum_{i=1}^{n} C(p_i, q_{\pi(i)})$

where $n$ is the number of points on one of the shapes ($n$ should be equal in both shapes), $C$ is the cost matrix and $\pi$ the vector solution of the BWGM.
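The matching pipeline just described can be sketched as below. SciPy's `linear_sum_assignment` (a Jonker-Volgenant-style assignment solver) stands in for the BWGM implementation of [9]; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Jonker-Volgenant-style solver

def chi2_cost(h1, h2):
    """Chi-square pairing cost C(p_i, q_j) between two normalized histograms."""
    num = (h1 - h2) ** 2
    den = h1 + h2
    return 0.5 * np.sum(num[den > 0] / den[den > 0])

def shape_distance(H1, H2):
    """Total matching cost between two shapes described by per-point
    histograms H1[i] and H2[j]: build the full cost matrix, solve the
    optimal bijective assignment, and sum the matched costs."""
    n = len(H1)
    assert n == len(H2), "both shapes must be sampled with the same number of points"
    C = np.array([[chi2_cost(H1[i], H2[j]) for j in range(n)] for i in range(n)])
    rows, cols = linear_sum_assignment(C)   # permutation minimizing the total cost
    return C[rows, cols].sum() / n
```

Two identical shapes yield a zero-cost perfect matching, so their distance is zero, which is the behavior the nearest-prototype classification of section 4 relies on.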
4. Shape Context descriptor for road classification
This measurement of similarity between shapes can be applied to object recognition, using a set of prototypes for every class to be recognized. In our context, road classes can be represented either by learnt shapes extracted from actual images, or by ideal synthetic shapes. The matching with prototypes generally leads to an algorithmic implementation of the nearest-neighbor type. In our implementation the value of k is a function of the number of elements in the database [14], but other choices are also valid. In practice, a too large database implies considerable computing time for certain applications, and sometimes a multidimensional analysis of the data (PCA or ICA) is used to reduce redundant information. For our application to road classification, the original shape context method has a reduced applicability. This is mostly due to time constraints. One single comparison between two images can take 200 ms using only 100 points from the boundary. If we need more resolution, this approach becomes prohibitive. A light and fast procedure with acceptable recognition rates in road classification from natural scenes allows us to overcome this drawback. A road region extracted from an image is represented by only one shape context signature. This signature can be obtained either from a single point (e.g. the centroid) or from the most representative points on the contour. In our application, this signature is calculated at the following point:
where $x$ and $y$ are the point coordinates. Typically, if the road is seen from a camera mounted on a machine navigating on it, the reference point will be at the bottom of the image and on the middle of the road. In order to compensate errors in matching, we increase both the histogram resolution (radial and angular bin numbers in the shape context) and the number of sampled points used in the model. Descriptors are normalized to be invariant to translation, rotation and scale. Furthermore, it is unnecessary to compute the complex BWGM solution. Therefore, Shape Context descriptors have been reduced to one or a few histogram signatures, and matching uses a histogram-based distance. In this case, the matching cost can be negligible for online retrieval.
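The fast variant above, one signature per road region compared by a histogram distance, reduces to a nearest-prototype lookup. The sketch below assumes the prototypes are pre-computed normalized signature histograms; the class labels and function name are illustrative.

```python
import numpy as np

def classify_road(signature, prototypes):
    """Nearest-prototype road classification from a single shape-context
    signature.  `prototypes` maps a class label ('straight', 'crossroads',
    ...) to its learnt signature histogram; the label with the smallest
    chi-square distance wins.  Returns (best_label, all_scores)."""
    def chi2(h1, h2):
        den = h1 + h2
        diff = (h1 - h2) ** 2
        return 0.5 * np.sum(diff[den > 0] / den[den > 0])

    scores = {label: chi2(signature, proto) for label, proto in prototypes.items()}
    return min(scores, key=scores.get), scores
```

Because only one histogram comparison per prototype is needed (no assignment problem), the matching cost becomes negligible for online retrieval, as the text notes.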
5. Experimental Results
This section presents some results using the shape context descriptor in the road classification problem. We have developed a scheme for road classification and categorization using road shape information: straight, turn or crossing to the right, turn or crossing to the left, cross-roads, dead-ends, etc. By using the shape context descriptor, we have only little sensitivity to strong variations of orientation, position and size.
5.1. Road classification
Several experiments have been done to study the behavior of the whole process on a dataset of 250 images (some images in figure 4); the global results give about 90% of good classification, using several resolutions. Figure 5 shows, on the right, the similarity scores between the shape extracted from the image presented on the left and all learnt shapes: the minimal score is obtained for image 43 without any ambiguity.
Figure 5. Road classification from a database of actual images: (a) test image, result one; (b) road extraction; (c) scores with learnt images.
The construction of a database of learnt shapes from actual images can be a bulky task; a more convenient method consists in considering only ideal shapes extracted from synthetic images. Figure 7(left) presents a top view of a virtual road network; textures have been chosen on the road and off-road in order to make the road extraction function easier. 1280 synthetic images have been generated from this network; figure 6 presents, on the right, the similarity scores between the shape extracted from an actual image (on the left) and all shapes extracted from synthetic images: there exist several solutions, because several synthetic images give the same shape (like a dead-end when arriving at node C or D). An ICA operation on the database is mandatory to keep only one learnt shape for every ideal road configuration. On a Sun Ultra 5 station, the road extraction and classification using Shape Context descriptors, on one hundred points, requires a CPU time of 150 ms; 90% of the road configurations are correctly recognized. The complexity is directly related to the number of sampled points and to the number of bins used to build the histograms. In current experiments, histograms are built using twelve bins for the radial distance and eight bins for the angular spacing (cf. figure 3). When selecting two hundred points to represent the shape, the computation time becomes 200 ms and the success rate is slightly improved to 92%.
Figure 6. Road classification from synthetic patterns: (a) test image, result one; (b) road boundaries; (c) scores with learnt images.
Figure 7. Top view of a synthetic road network, and the built topological model (nodes A, A', B1, B2, B3, C, D, E, F).
Figure 8. Images acquired from node B to node E
5.2. Topological modelling
In [1,4], several results are presented on the road extraction functions and on the integration of these functions on our Dala robot, to execute sensor-based motion primitives like Follow the road. A more global experiment is prepared to evaluate the autonomous learning of a topological model on a simple road network. First, the modelling function has been evaluated on our virtual road network, using an image sequence generated on the path shown in figure 7(left). Figure 8 presents some images generated from node B to node E. In figure 7(right), the topological graph is shown. The robot goes three times across node B and, because it is impossible to detect that it is the same node, three nodes B1, B2 and B3 are created. In the same way, the original position in node A could not be recognized at the end of the sequence, so that a new node A' is created. We could cope with these problems using another visual function to recognize nodes that have already been traversed, using for example a panoramic camera (like in [12]).
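The incremental graph construction described in this section can be sketched as follows: a per-image road classifier (assumed given) drives node creation and, exactly as observed in the experiment, a revisited crossing receives a fresh label because no place recognition is performed. The class names and the `TopologicalMap` structure are illustrative, not the authors' code.

```python
class TopologicalMap:
    """Minimal incremental topological graph built from classified road images."""

    def __init__(self):
        self.nodes = []          # node labels, e.g. 'N0', 'N1', ...
        self.edges = []          # (node, node) pairs = travelled road segments
        self._last = None

    def observe(self, road_class):
        """Call once per classified image while the robot follows the road."""
        if road_class in ("crossroads", "T-intersection", "dead-end"):
            node = f"N{len(self.nodes)}"   # no place recognition: a revisited
            self.nodes.append(node)        # crossing always gets a new label
            if self._last is not None:
                self.edges.append((self._last, node))
            self._last = node

map_ = TopologicalMap()
for c in ["straight", "crossroads", "straight", "T-intersection",
          "straight", "crossroads"]:
    map_.observe(c)
# three nodes N0, N1, N2 and two edges: (N0, N1), (N1, N2)
```

Adding a node-recognition function (e.g. panoramic-image indexing as in [12]) would amount to replacing the unconditional label creation with a lookup against already-visited nodes.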
6. Conclusions
This article has presented an approach to topological modelling of a road network, based on visual functions for road detection and classification from color images acquired in natural real scenes. From each image, a global scene model is built, using an approach described in previous articles. From this scene model, the road contours are extracted. The classification and categorization of the road is based on the Shape Context descriptor [10] of the road contours. From our experiments, this shape descriptor has shown great potential for object recognition, with reasonable computing times (a few hundred ms) for real-time applications such as mobile robot navigation. Preliminary results in topological modelling have been presented on a sequence of synthetic images; integration on our Dala testbed is under way.
References
[1] D. Mateus, J.G. Avina-Cervantes, and M. Devy, "Robot visual navigation in semi-structured outdoor environments," in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2005.
[2] P. Ranganathan, J.B. Hayet, M. Devy, S. Hutchinson, and F. Lerasle, "Topological navigation and qualitative localization for indoor environment using multisensory perception," Robotics and Autonomous Systems, pp. 1–8, 2002.
[3] A. Baumgartner, C. Steger, H. Mayer, W. Eckstein, and H. Ebner, "Automatic road extraction in rural areas," in Int. Archives of Photogrammetry and Remote Sensing, vol. XXXII, Part 3-2W5, pp. 107–112, 1999.
[4] G. Avina, M. Devy, and A. Marin, "Lane extraction and tracking for robot navigation in agricultural applications," in Proc. 11th Int. Conf. on Advanced Robotics (ICAR), 2003.
[5] R.C. Veltkamp, "Shape matching: similarity measures and algorithms," Technical report UU-CS-2001-03, Utrecht University, Netherlands, 2001.
[6] D. Zhang and G. Lu, "Review of shape representation and description techniques," Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.
[7] Y. Rubner, C. Tomasi, and L.J. Guibas, "A metric for distributions with applications to image databases," in Proc. Int. Conf. on Computer Vision (ICCV), 1998.
[8] S.H. Cha and S.N. Srihari, "On measuring the distance between histograms," Pattern Recognition, vol. 35, no. 6, pp. 1355–1370, 2002.
[9] R. Jonker and A. Volgenant, "A shortest augmenting path algorithm for dense and sparse linear assignment problems," Computing, vol. 38, no. 4, pp. 325–340, 1987.
[10] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.
[11] A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik, "Recognizing objects in range data using regional point descriptors," in Proc. European Conf. on Computer Vision (ECCV), 2004.
[12] J.J. Gonzalez and S. Lacroix, "Rover localization in natural environments by indexing panoramic images," in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2002.
[13] S. Antani, R. Kasturi, and R. Jain, "A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video," Pattern Recognition, vol. 35, no. 4, pp. 945–965, 2002.
[14] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley & Sons, 2001.
Reinforcement Learning Performance Evaluation: An Evolutionary Approach
Genci CAPI and Masao YOKOTA
Department of System Management, Fukuoka Institute of Technology, Japan
Abstract. In this paper, we analyze the effect of metaparameters on the performance of actor-critic reinforcement learning. We considered two different applications: first the pole-balancing benchmark task, and then a surviving behavior where the Cyber Rodent robot has to capture battery packs and increase its own energy level. The results show that metaparameters highly influence the learning time. In addition, the optimal metaparameters generated by evolutionary algorithms are mutually related.
Keywords: Reinforcement learning, metaparameters, evolutionary algorithm.
Introduction
The performance of reinforcement learning (RL) ([1], [2], [3], [4], [5]) is critically dependent on the setting of a few metaparameters, which control how the agent parameters, such as the weights of a neural network defining the policy, change with learning. The appropriate settings of metaparameters depend on the environmental dynamics, the goal of the task, and the time allowed for learning. The permissible ranges of such metaparameters depend on particular tasks and environments, making it necessary for a human expert to tune them, usually by trial and error. But tuning multiple metaparameters is quite difficult. In addition, hand tuning of metaparameters is in marked contrast with learning in animals, which can adjust themselves to unpredicted environments without any help from a supervisor. The main goal of this work is to analyze the effect of metaparameters on RL, to match the conditions of the task and reduce the learning time. We compare the performance of RL with arbitrarily selected metaparameters and with metaparameters optimized by an evolutionary algorithm. The basic idea in applying an evolutionary algorithm is to encode the metaparameters of the RL algorithm as the agent's genes, and to take the metaparameters of the best-performing agents into the next generation. In order to test the performance of RL with different settings of metaparameters, we first considered a pole-balancing benchmark task [5]. Then we move to a more complex RL application using the CR robot ([6]). The CR has to learn to survive by capturing active battery packs distributed in the environment and recharging its own battery. In the pole-balancing task, the learning rate and discount factor are evolved. In the survival task the following cases are considered: 1) arbitrarily selected metaparameters and random initial weight connections; 2) evolved initial weight connections and arbitrarily selected
G. Capi and M. Yokota / Reinforcement Learning Performance Evaluation
metaparameters; 3) evolved metaparameters and random initial weight connections; 4) evolved metaparameters and initial weight connections. The results show that metaparameters have a great effect on the performance of RL. In addition, they are strongly related to each other, making it difficult to adjust them by trial and error. On the other hand, the appropriate settings of metaparameters can be found by evolution. The evolutionary algorithm is able to localize the best values of the metaparameters within the searching space throughout the course of evolution. The learning time is significantly reduced using the optimized metaparameters and initial weight connections.
1. Tasks
1.1 Pole balancing
The pole-balancing task involves a pole hinged to the top of a wheeled cart that travels along a track, as shown in Figure 1. In order to verify the effect of metaparameters on learning time and compare the results, the simulated model of the pole-balancing system and the RL algorithm are the same as the model presented in [5]. The state at time $t$ is specified by four real-valued variables: $h_t$, the horizontal position of the cart relative to the track, in meters; $\dot h_t$, the horizontal velocity of the cart, in meters/second; $\theta_t$, the angle between the pole and vertical; $\dot\theta_t$, the angular velocity of the pole, in degrees/second. The resulting discrete-time variables are labeled $h[t]$, $\dot h[t]$, $\theta[t]$, and $\dot\theta[t]$. The state of the cart-pole system must be kept out of certain regions of the state space. The only information regarding the goal of the task is provided by the failure signal $r[t]$, which signals either the pole falling past $\pm 12°$ or the cart hitting the bounds of the track at $\pm 2.4$ m. The failure signal is defined as:

$r[t] = \begin{cases} -1 & \text{if } |\theta[t]| > 12° \text{ or } |h[t]| > 2.4\,\mathrm{m} \\ 0 & \text{otherwise} \end{cases}$ (1)

Figure 1. Pole-balancing environment.
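The failure signal of Eq. (1) and one step of the simulated cart-pole can be sketched as below. The -1 failure value and the physical constants are those of the classic cart-pole benchmark and are assumptions here (the paper uses the model of [5]); the function names are illustrative.

```python
import math

def failure_signal(theta_deg, h_m):
    """Failure signal r[t] of Eq. (1): -1 when the pole falls past +/-12
    degrees or the cart hits the +/-2.4 m track bounds, 0 otherwise.
    (The -1 value follows the classic benchmark; the scanned equation
    only shows the magnitude.)"""
    return -1 if abs(theta_deg) > 12.0 or abs(h_m) > 2.4 else 0

def step(state, force, dt=0.02, m_cart=1.0, m_pole=0.1, length=0.5, g=9.8):
    """One Euler step of the standard cart-pole dynamics with textbook
    parameters (assumed, not taken from the paper)."""
    h, h_dot, th, th_dot = state          # th in radians
    total = m_cart + m_pole
    tmp = (force + m_pole * length * th_dot ** 2 * math.sin(th)) / total
    th_acc = (g * math.sin(th) - math.cos(th) * tmp) / (
        length * (4.0 / 3.0 - m_pole * math.cos(th) ** 2 / total))
    h_acc = tmp - m_pole * length * th_acc * math.cos(th) / total
    return (h + dt * h_dot, h_dot + dt * h_acc,
            th + dt * th_dot, th_dot + dt * th_acc)
```

In a learning trial, `step` is called once per action and `failure_signal` provides the only reward information, exactly as the text describes.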
1.2 Surviving
The Cyber Rodent robot is shown in Figure 2(a). Its body is 22 cm in length and 1.75 kg in weight. The CR has a variety of sensory inputs, including a wide-range C-MOS camera, an IR range sensor, seven IR proximity sensors, a 3-axis acceleration sensor and a 2-axis gyro sensor. Its motion system consists of two wheels that allow the CR to move at a maximum speed of 1.3 m/s, and a magnetic jaw that latches the battery packs. The CR has a Hitachi SH-4 CPU with 32 MB memory, which allows fully on-board real-time learning and control. The task for the CR robot is to survive and increase its energy level. The environment has 8 battery packs, as shown in Figure 2(b). The positions of the battery packs are fixed in the environment, and the CR robot is initially placed at a random position and orientation. The CR can recharge its own battery by capturing active battery packs, which are indicated by a red LED. After the robot captures a battery pack, it can recharge its own battery for a determined period of time (the charging time); then the battery pack becomes inactive and its LED color changes to green. The battery becomes active again after the reactivation time. Based on the energy level and the distance to the nearest active battery pack, the agent can select among three actions: 1) capture the active battery pack; 2) search for a battery pack; or 3) wait until a battery pack becomes active again. The wait behavior is interrupted if a battery becomes active or after a pre-determined period of time. In our environment settings, the battery packs have a long reactivation time. In addition, the energy consumed by the CR robot for 1 m of motion is low. Therefore, the best policy is to capture any visible active battery pack. When there is no visible active battery pack, the agent must search the environment. As the sensory input we use a constant bias, the battery level and the distance to the nearest active battery pack (both normalized between 0 and 1).
The reward is determined from the battery energy level after each action as:

$r_{t+1} = (En\_level_{t+1} - En\_level_t)/50$ (2)

Figure 2. Cyber Rodent robot and the environment.
2. Actor-Critic RL
2.1 The Critic
In the critic network, a single output cell is directly connected with the input layer and with the hidden layer. Denoting by $a$, $b$, and $c$ respectively the weight connections between input-hidden, input-output and hidden-output, the output cell firing rate is calculated as follows:

$O_c = \sum_{i=1}^{n\_i} b_i x_i + \sum_{i=1}^{n\_h} c_i y_i$ (3)

where $n\_i$ and $n\_h$ are the numbers of input and hidden neurons and $y_i$ are the outputs of the hidden neurons. The TD error is calculated as follows:

$\hat r[t+1] = \begin{cases} 0 & \text{if the start state} \\ r[t+1] + \gamma\, O_c[t+1] - O_c[t] & \text{otherwise} \end{cases}$ (4)

TD reduces the error by changing the weights, as follows:

$b_i[t+1] = b_i[t] + \rho_1 \hat r[t+1]\, x_i[t]$
$c_i[t+1] = c_i[t] + \rho_1 \hat r[t+1]\, y_i[t]$ (5)
$a_{ij}[t+1] = a_{ij}[t] + \rho_2 \hat r[t+1]\, y_i[t](1 - y_i[t])\,\mathrm{sgn}(c_i[t])\, x_j[t]$

where $\rho_1$, $\rho_2$ are the learning rates.
2.2 The actor
In the pole-balancing task the agent can select between two actions, push left or right, while in the survival task the agent can select one of three actions, so the actor makes use of three action cells $p_j$, $j = 1, 2, 3$. The capture behavior is considered pre-learned ([7]). When the search behavior is activated, the agent rotates 10 degrees clockwise, and the agent does not move when the wait behavior becomes active. The structure of the actor network is the same as that of the critic network. The weight connections between input-hidden, input-output and hidden-output are $d$, $e$ and $f$ respectively. The output of the action neurons is calculated as follows:
$z_i = g\Big(\sum_{j=1}^{n\_i} d_{ij} x_j\Big)$ (6)

$p_i = g\Big(\sum_{j=1}^{n\_i} e_{ij} x_j + \sum_{j=1}^{n\_h} f_{ij} z_j\Big)$ (7)

where $g(s) = \dfrac{1}{1 + e^{-s}}$. In the survival task, a winner-take-all rule prevents the actor from performing two actions at the same time. The action is selected based on the softmax method as follows:
$P(a, s_t) = \dfrac{e^{p_i(a,s)/\tau}}{\sum_{i=1}^{3} e^{p_i(a,s)/\tau}}$ (8)
where $\tau$ is the temperature of the algorithm. To choose the best action, a signal is required from the critic about the change in value that results from taking an action. It turns out that an appropriate signal is the same prediction error, $\delta_t$, used in the learning of the value function. For example, consider what happens when appropriate values have been learned that are consistent with the actions specified by the actor, i.e. $O_c(x) = V(x)$. At this point, the currently specified actions should produce, on average, zero prediction error ($\delta_t = 0$). However, other actions will produce, on average, nonzero $\delta_t$. In particular, if $\delta_t > 0$ the new action is a better one, and if $\delta_t < 0$ it is a worse one. The actor weights are adapted as follows:
$e_i[t+1] = e_i[t] + \alpha_1 \hat r[t+1](q[t] - p[t])\, x_i[t]$
$f_i[t+1] = f_i[t] + \alpha_1 \hat r[t+1](q[t] - p[t])\, z_i[t]$ (9)
$d_{ij}[t+1] = d_{ij}[t] + \alpha_2 \hat r[t+1]\, z_i[t](1 - z_i[t])\,\mathrm{sgn}(f_i[t])(q[t] - p[t])\, x_j[t]$
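A minimal sketch of the actor-critic loop built from Eqs. (3)-(9) is given below, simplified to a linear critic and actor (no hidden layer) and with a standard softmax policy-gradient actor update standing in for the $(q[t]-p[t])$ form of Eq. (9). The class and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

class ActorCritic:
    """Linear actor-critic sketch: value = b.x (cf. Eq. 3), TD error as in
    Eq. (4), critic update as in Eq. (5), softmax action selection (Eq. 8)."""

    def __init__(self, n_in, n_act, rho=0.1, alpha=0.1, gamma=0.9, tau=1.0):
        self.b = np.zeros(n_in)             # critic weights -> value O_c
        self.e = np.zeros((n_act, n_in))    # actor weights  -> action cells p_j
        self.rho, self.alpha, self.gamma, self.tau = rho, alpha, gamma, tau

    def value(self, x):
        return self.b @ x

    def policy(self, x):
        p = self.e @ x
        z = np.exp((p - p.max()) / self.tau)   # softmax (Eq. 8), shifted for stability
        return z / z.sum()

    def update(self, x, a, r, x_next, done):
        # TD error (Eq. 4): no successor value contribution at a terminal state
        v_next = 0.0 if done else self.value(x_next)
        td = r + self.gamma * v_next - self.value(x)
        self.b += self.rho * td * x            # critic update (Eq. 5)
        probs = self.policy(x)
        for j in range(len(probs)):            # actor update in the spirit of Eq. (9)
            grad = (1.0 if j == a else 0.0) - probs[j]
            self.e[j] += self.alpha * td * grad * x
        return td
```

Repeatedly rewarding one action raises its softmax probability while the critic's value estimate converges, which is the interplay between $\delta_t$ and the actor described in the text.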
3. Evolution of Metaparameters
Our main goal is to reduce the learning time by selecting appropriate metaparameters. Therefore, in the pole-balancing task the fitness value is calculated as the number of trials needed by the agent to learn the policy. We consider that the agent has learned the policy if it continually balances the pole and does not hit the end of the track for 1000 actions. The fitness function for the surviving behavior is defined as follows:

$\text{Fitness} = \begin{cases} \dfrac{CR_{life}}{learn\_t} & \text{if the agent dies} \\[4pt] 1 + \dfrac{En_{level}}{En_{max}} & \text{if the agent survives} \\[4pt] \dfrac{learn\_t}{CR_{life}} & \text{if the battery is fully recharged} \end{cases}$ (10)

where $CR_{life}$ is the CR life in seconds, $learn\_t$ is the maximum learning time, $En_{level}$ is the level of energy if the agent survives but the battery is not fully recharged, and $En_{max}$ is the maximum level of energy. The maximum learning time is 7200 s. $En_{level}$ varies between 0 and 1, where 0 is empty and 1 is fully charged ($En_{max}$). Based on this fitness function, the agents that cannot survive get a better fitness if they live longer. The agents that survive get a better fitness if the energy level at the end of the maximum learning time is higher. The agents that learn to fully recharge the battery faster get a higher fitness value. In order to see the effect of metaparameters and initial weight connections on learning time, we considered the following cases: (a) evolution of metaparameters, initial weight connections of the actor and critic networks, and the number of hidden neurons; (b) evolution of metaparameters with randomly initialized weight connections; (c) evolution of initial weight connections with arbitrarily selected metaparameters; (d) learning with arbitrarily selected metaparameters and randomly initialized weight connections.
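The evolutionary tuning just described can be sketched as a simple GA over (learning rate, discount factor) genes. The selection and mutation operators below are illustrative (truncation selection, Gaussian mutation), not the exact algorithm of the paper; `evaluate` is the user-supplied learning experiment, returning for instance the number of trials to learn the policy (lower is better).

```python
import random

def evolve_metaparameters(evaluate, pop_size=20, generations=10, seed=0):
    """Encode (learning_rate, gamma) in [0, 1]^2 as an individual's genes,
    keep the best-performing quarter of each generation, and refill the
    population with mutated copies of the elite."""
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: evaluate(*g))   # lower fitness = better
        elite = scored[: pop_size // 4]
        pop = list(elite)                                  # elitism
        while len(pop) < pop_size:
            lr, gamma = rng.choice(elite)
            pop.append((min(1.0, max(0.0, lr + rng.gauss(0, 0.05))),
                        min(1.0, max(0.0, gamma + rng.gauss(0, 0.05)))))
    return min(pop, key=lambda g: evaluate(*g))
```

In the paper's setting, `evaluate` would run a full pole-balancing learning experiment per individual, which is why the GA's population and generation counts must stay small.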
G. Capi and M. Yokota / Reinforcement Learning Performance Evaluation
421
4. Results

4.1 Pole-balancing task

In the pole-balancing task, we evolved the learning rate and the discount factor. In order to better understand the relation between these metaparameters, we considered the same learning rate for the actor and critic networks and between different layers. The searching interval for the learning rate and discount factor is between 0 and 1. Figure 3 shows the fitness value of the best individual of each generation. The number of trials needed by the agent to learn the policy is reduced during the course of evolution. The optimal values of γ and the learning rate generated by the GA are 0.82 and 0.64, respectively. The agent learned the task after 408 trials. When we applied the same values of metaparameters as in [5], the task was learned after 3423 trials. In addition, Figure 4 shows how the 100 individuals of generations 5, 10, 15 and 20 are distributed in the searching interval. The GA is able to localize a small region within the searching space of metaparameters. Based on the results of the last generation, we can see that γ lies in a very narrow interval while the learning rate lies in a wider region. In addition, Figure 5 shows a linear relation between the learning rate and the discount factor, based on the distribution of optimal values selected from the last population. The learning time for the metaparameters shown in this graph is smaller than 500 trials.

4.2 Surviving task

In order to determine the energy level after each action, we measured the battery level of the CR when the robot moves with a nearly constant velocity of 0.3 m/s. The collected data are shown in Figure 6. The graph shows that there is a nonlinear relationship between time and energy level. Therefore, we used the virtual life time to determine the energy level after each action. Except for capturing the battery pack, the search and wait actions increase the virtual life time.
Figure 3. Fitness value (number of trials) of the best individual at each generation.
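The evolutionary search over metaparameters described in Section 3 can be sketched as a simple real-coded genetic algorithm. The operators below (truncation selection, Gaussian mutation) are a generic stand-in, since the paper does not specify its GA internals:

```python
import random

def evolve(fitness, n_params=2, pop_size=100, gens=20,
           lo=0.0, hi=1.0, mut_sigma=0.05, elite=10):
    """Minimal real-coded GA: truncation selection plus Gaussian mutation."""
    pop = [[random.uniform(lo, hi) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)   # best individuals first
        parents = pop[:elite]
        children = []
        while len(children) < pop_size - elite:
            parent = random.choice(parents)
            # mutate and clip back into the searching interval [lo, hi]
            children.append([min(hi, max(lo, g + random.gauss(0, mut_sigma)))
                             for g in parent])
        pop = parents + children
    return max(pop, key=fitness)
```

Population size (100) and generation count (20) match the values reported for the surviving task; the fitness callback would wrap an entire learning run, which is why metaparameter evolution is computationally expensive.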
Figure 4. Distribution of individuals during evolution (number of individuals vs. learning rate and γ, at generations 5, 10, 15 and 20).
Figure 5. Evolved metaparameters (learning rate vs. gamma) from the last generation.
Figure 6. Battery level (energy level vs. time [s]) during CR motion.
Table 1. Values of metaparameters

Parameters          Values                           Optimal values
Initial weights     Randomly between [−0.5, 0.5]     –
ρ1, ρ2, α1, α2      0.2; 0.1; 0.8; 0.3               0.9210; 0.3022; 0.9256; 0.6310
τ0                  10                               5.3718
γ                   0.9                              0.794
Figure 7. Performance of the best agent (energy level vs. time [s]) for different settings of metaparameters: evolved metaparameters and initial weights; evolved metaparameters with random initial weights; evolved initial weights with arbitrary metaparameters; arbitrary metaparameters with random initial weights.

First, learning took place with arbitrarily selected metaparameters (Table 1) and random initial weight connections. The agent continued learning for nearly 5500 s until the battery was fully recharged (Figure 7). Then, using the same metaparameters and neural structure, we evolved the initial weight connections. Initially, 100 individuals were created, and the evolution terminated after 20 generations. Figure 7 shows that the learning time is reduced. When the metaparameters are evolved and the initial weight connections are randomly selected, the learning time is significantly reduced (Figure 7). This result is important because it shows that metaparameters have a larger effect on learning time than initial weight connections. The searching intervals and GA results, when both metaparameters and initial weight connections are optimized by the GA, are shown in Table 1. The initial values of the weight connections are very near their respective values after learning. In addition, the optimized value of the cooling factor is low. Therefore, the agent starts to exploit the environment and takes greedy actions soon after learning starts, recharging its own battery in a very short time.
5. Conclusion

In this paper, we analyzed the effect of metaparameters on RL performance. First, we applied our method to the pole-balancing problem, which is often used as a benchmark task for RL.
Then we considered a surviving behaviour for the CR robot. Based on the simulation and experimental results, we conclude: (1) Evolutionary algorithms can be successfully applied to determine the optimal values of metaparameters. (2) Metaparameters play an important role in the policy learned by the agent. (3) Optimized metaparameters can significantly reduce the learning time. In the future, we are interested in investigating in more detail the relations between other metaparameters. Furthermore, it would be interesting to see if evolution can also be applied to shape the reward function used during the learning process.
References
[1] A. G. Barto, Reinforcement learning. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 804–809). Cambridge, MA: MIT Press, 1995.
[2] K. Doya, Reinforcement learning in continuous time and space, Neural Computation, 12 (2000), 215–245.
[3] K. Doya, H. Kimura, M. Kawato, Computational approaches to neural mechanism of learning and control, IEEE Control Systems Magazine, 21 (2001), 42–54.
[4] N. Schweighofer, K. Doya, Meta-learning in reinforcement learning, Neural Networks, 16 (2003), 5–9.
[5] R. Sutton, A. G. Barto, Reinforcement learning. Cambridge, MA: MIT Press, 1998.
[6] C. W. Anderson, Strategy learning with multilayer connectionist representations, Proc. of the Fourth Int. Workshop on Machine Learning, 103–114, 1987.
[7] G. Capi, E. Uchibe, K. Doya, Selection of neural architecture and the environment complexity, Advanced Intelligence, 6 (2002), 311–317, Ed. K. Murase and T. Asakura.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
425
Quantify Distinguishability in Robotics
Aurélien Hazan a,1, Frédéric Davesne a, Vincent Vigneron a and Hichem Maaref a
a Image and Data Processing Group, Complex Systems Laboratory (LSC)
Abstract. Through a Dynamical Systems approach to robotics, we introduce a measure of distinguishability between different behaviours of a robot, in an uncertain context, thanks to interval analysis.
Keywords. mobile robotics, dynamical systems, discrimination, interval analysis
1. Introduction

Our long-term goal is (i) to perform predictable and efficient environment recognition with a simple low-cost robot, and (ii) to know before the experiment whether such an aim is reachable given the environmental constraints. To achieve this goal, we make two hypotheses: first, the states of the robot form a continuum, but not all of them can be distinguished by the robot itself. Secondly, the recognition capabilities of a robot depend strongly on the density of those distinguishable states within the continuous state-space. Furthermore, we think that environment recognition must not be defined in isolation, but comes side-by-side with a main interaction with the environment (e.g. obstacle avoidance or homing). Admitting those hypotheses, this article sets the basis for a quantified and predictive approach to robotics, in order to know before any experiment what one can or cannot expect from a robot as regards its capacity to distinguish one of its own behaviours from another, in an uncertain context. To do so, we adapt mathematical tools belonging to Dynamical Systems Theory, and mix them with interval analysis so as to deal with uncertainty. More precisely, our proposition to fulfill this aim is to quantify how many states a robot can distinguish per volume unit in the state-space. In future work, we will consider improving that number with an appropriate feedback. Section 2 is devoted to stating some definitions, especially (n, ε)-separation, and focuses on a toy problem. Section 3 extends those considerations to an experimental context, while Section 5 summarizes the results and gives some insight into future work.

1 Correspondence to: Aurélien Hazan, Laboratoire Systèmes Complexes, CE 1433 Courcouronnes, 40 rue du Pelvoux, 91020 Evry Cedex. Tel.: +33 1 69 47 75 36; Fax: +33 1 69 47 06 03; E-mail: [email protected]
426
A. Hazan et al. / Quantify Distinguishability in Robotics
2. Theoretical state-counting

2.1. Background

From a macroscopic point of view, the robot is a system on which forces apply, open to a flow of energy. To give a kinematic description of its state, we set a state-space in which each point represents a unique state of the system. But we are also interested in a dynamical description: which transformation in the phase-space does the robot undergo during its activity? In this paper we adopt a dynamical systems approach (see [1]): we consider discrete time and model the state of a robot as a point in a multidimensional state-space X. The interaction of the robot with its environment is viewed as an application T : X → X that associates T(x) to every x ∈ X.

2.2. Noise

Our system is mainly driven by deterministic forces, but we also need to take fluctuations into account, since the interaction itself can be partly random, or because of the data acquisition process. For example, we could take a purely deterministic application T from X to X, and ω a random variable taking its values in X. The application S would act on x as shown by Eq. (1). Then, our aim would be to find the probability density function (pdf) of S^n(x0), given the pdf of x0, where S^n denotes the n-th iteration of S. In the literature, tools such as stochastic differential equations or Fokker-Planck equations could do the job. Instead, we propose in this article interval analysis (see [3]) as a first approach to model uncertainty, because we need to catch a glimpse of the overall behaviour of the system without being hindered by the intractable calculus that quickly comes with those frameworks. From now on we will consider the interval [x] instead of x to model the intrinsic randomness of T, as well as [ω] instead of ω to model the uncertainty due to the measurement process. In the rest of this article, we will study the trajectory resulting from the repeated action of S on some initial condition [x0], as shown by Eq. (2).
S: x → T(x) + ω                                                            (1)
S: [x] → T([x]) + [ω]                                                      (2)
where a trajectory is the set {x, S(x), . . . , S^n(x)}. We will need the following notations: ω and ω̄ are respectively the lower and upper bounds of [ω], hence [ω] = [ω, ω̄]. Besides, we call W([ω]) the width of the interval [ω], i.e. the length ω̄ − ω.

2.3. Tools from Dynamical Systems Theory

As a consequence of the hypotheses made in Section 1, we look for a way to count the number of "distinguishable" states in a given subspace of the state-space. To define the notion of "distinguishability", we refer to some definitions given by R. Bowen on the way to redefining topological entropy (see [2]). Let's start by introducing two distances on X, with T an application from X to X:
A. Hazan et al. / Quantify Distinguishability in Robotics
427
Definition 1
d(x, y) = |x − y|, i.e. the Euclidean norm on X                            (3)
d_{n,T}(x, y) = max over 0 ≤ i ≤ n of d(T^i(x), T^i(y))                    (4)
an illustration of which is given by Fig. 1(a). When there is no ambiguity concerning the application, we will simply put dn instead of d_{n,T}. Now we introduce the main definition: Definition 2: let K be a subset of E ⊂ X, n an integer and ε ∈ R, ε > 0. We say that K ⊂ E is an (n, ε)-separated set with respect to T if, for every (x, y) ∈ K, we have: x ≠ y ⇒ dn(x, y) > ε. Given an arbitrary finite precision ε, two points are thus considered distinct if the application T makes them more distant than ε after n iterations. Knowing if a set of points is separated, we now count the maximum number of points satisfying that constraint, in a limited domain of the state-space: Definition 3: in the conditions of Definition 2, sn(ε, E) is the largest cardinality of any (n, ε)-separated set K contained in E.

2.4. Example

Let T : x → αx be a deterministic scalar application defined on R that models the dynamics of the robot in state space. Though it looks much too simple an example, it will prove useful, since such exponential behaviours arise frequently in robotics. First, we look for a set K of (n, ε)-separated points in some subset E of R, admitting that it has a predefined geometry governed by a small number of parameters. In this example, as shown by Fig. 1(b), E = [a, b] and the points of K are separated by a constant distance d along the real line. The question we ask is how to choose d such that the points of K are (n, ε)-separated, and what the minimal admissible value of d is. As mentioned in 2.2, the answer we propose is given in an interval analysis framework, a basic tool of which is set inversion. Let f : R^n → R^m be a function, and V a subset of R^m. Set inversion allows one to characterize U = f^−1(V) = {x ∈ R^n | f(x) ∈ V}, thanks to subpavings U− and U+ such that U− ⊂ U ⊂ U+. In the present case, we use the algorithm SIVIA (see [3] for more details) to perform this characterization and yield U− and U+ with an arbitrary resolution.
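The Bowen distance of Eq. (4) and the (n, ε)-separation test of Definition 2 are straightforward to compute for a scalar map; below is a minimal sketch with our own naming:

```python
def bowen_distance(x, y, T, n):
    """d_{n,T}(x, y) = max over 0 <= i <= n of |T^i(x) - T^i(y)|."""
    d = abs(x - y)
    for _ in range(n):
        x, y = T(x), T(y)
        d = max(d, abs(x - y))
    return d

def is_separated(points, T, n, eps):
    """True if every pair of distinct points in the set is (n, eps)-separated."""
    return all(bowen_distance(p, q, T, n) > eps
               for i, p in enumerate(points)
               for q in points[i + 1:])
```

For the expanding map T : x → αx with α > 1, the maximum is attained at the last iterate, so two points at distance d are (n, ε)-separated exactly when α^n d > ε, which is the threshold the SIVIA computation localizes.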
Applying this method to find d such that the points of K are (n, ε)-separated, we get a set of solutions as shown by Fig. 1(c), where red indicates guaranteed solutions, blue stands for rejected points, and yellow shows areas where the resolution was insufficient to conclude. According to the results, d must exceed approximately 2/3 so that all points of K ⊂ E are (n, ε)-separated by T. Now we add some measurement noise to our model, represented here by an interval [ω], and note S : x → T(x) + ω the studied application. We wish to maintain the geometrical hypothesis that constrains the position of the points in K, and look both for the geometry parameter d and for the noise parameter, such that the set of points remains (n, ε)-separated by S. To simplify the problem, call y a fixed point chosen in E, with no uncertainty attached to its location, so that we can note [y] = [y, y]. Now call [x] the location of another point of K, separated from [y] by an unknown distance [d], such that [x] = [y] + [d]. Then we apply S to both points and look for the Bowen distance that
Figure 1. (a) Bowen distance between x0 and x1. (b) Set K in E = [a, b], with predefined geometry parametrized by d. (c) Minimal distance d between points of K ⊂ E that guarantees their (n, ε)-separation by the application T : x → αx; here α = 1.1, n = 10, ε = 1.5. (d) Comparison between w([ω] − [ω]) and W([ω]).
must satisfy the constraint dn([x], [y]) = |S^n([x]) − S^n([y])| > ε. Elementary calculus shows that:

S: [y] → α[y] + [ω],   S: [x] → α([y] + [d]) + [ω]                          (5)

S^n: [y] → α^n [y] + Σ_{i=0}^{n−1} α^i [ω],   S^n: [x] → α^n ([y] + [d]) + Σ_{i=0}^{n−1} α^i [ω]    (6)

dn([x], [y]) = α^n [d] + [ Σ_{i=0}^{n−1} α^i (ω − ω̄), Σ_{i=0}^{n−1} α^i (ω̄ − ω) ]    (7)
with the hypothesis that α > 1. We notice that, because of the properties of set computation, [y] vanishes while ω̄ − ω (which characterizes the width of [ω]) appears. We explain this by remarking that the interval [ω] − [ω] is centered around zero; hence the variable that matters is no longer [ω] but its width w([ω] − [ω]), as shown by Fig. 1(d). Fig. 3(a) displays the numerical solution of the inequation dn([x], [y]) > ε computed by SIVIA, as a function of [d], the distance between [x] and [y], and of W([ω]). Note that red areas stand for a guaranteed solution, blue ones show points that are not solutions, and yellow denotes points for which nothing can be said. Compared to the previous case, when no perturbation was added during the interaction, we find the same qualitative behaviour:
Figure 2. Experimental scheme. Top: n steps of a trajectory, starting from x0. Center: back to x0. Bottom: next initial condition x0.
d must exceed a given value for the separation condition to be verified. However, it is worth noting that the minimum required distance d between two consecutive points must increase when the width W([ω]) of the additive noise increases. This intuitive result shows that increasing the uncertainty lowers the maximum number of (n, ε)-separated points per volume unit in our simple example.

3. Experimental state-counting

Dealing with the theoretical case in 2, we made a simplifying hypothesis that deserves criticism. Indeed, in order to verify whether a set K complies with a given condition, we first have to specify that set of initial conditions K in the state-space. Then the robot must span it and, for each initial point, perform a trajectory, then jump to the following initial point, as shown in Fig. 2. Nevertheless, the robot is not supposed to know how to reach any arbitrary point in the state-space; consequently, K cannot be defined before the experiment, and the approximation method presented in 2.4 fails. Instead, as we still want the robot to go along the trajectory driven by S, then to come back near the initial condition and start again with another one, we suggest two hypotheses:

H1: the robot knows T^−1, thus it can come back approximately to its starting point.
H2: the robot can jump randomly in the state space and reach a new initial condition.

Our aim here is twofold: first to approximate S^−n ◦ S^n([x0]), then to inject it into the scheme developed in 2.4 to characterize the necessary jump between two initial conditions belonging to K so that they remain (n, ε)-separated in spite of the uncertainty induced by S^−n ◦ S^n.

3.1. Image of y by S^−n ◦ S^n

In 2.4, we considered a fixed point y, which was written as a trivial interval [y] = [y, y], and the uncertainty concerned the position of the next point [x]. In the next section, y will no longer be considered fixed, because it will be produced by a back-and-forth movement given by S^−n ◦ S^n.
Let's focus on the image of a point z, with no uncertainty attached to its position, by S^−n ◦ S^n. We use hypothesis H1, stated at the beginning of 3, to define S^−1. First we write (6) again:
Figure 3. Couples (d, w(ω)) such that two points of K are (n, ε)-separated by an application. In both cases, α = 1.1, n = 10, ε = 1.5.
S^n: [z] → α^n [z] + Σ_{i=0}^{n−1} α^i [ω],   S^−n: [z] → (1/α^n) [z] + Σ_{i=0}^{n−1} (1/α^i) [ω]    (8)

S^−n ◦ S^n: [z] → [z] + Σ_{i=0}^{n−1} α^(i−n) [ω] + Σ_{i=0}^{n−1} α^(−i) [ω]    (9)
which cannot be easily simplified because of the particular properties of set computation. We verify that although [z] is reduced to a trivial set [z, z], its image by S^−n ◦ S^n is a non-trivial interval in the general case where [ω] is non-trivial.

3.2. Characterization of the jump [d]

Reconsidering the results given in 2.4, we suppress the hypothesis on [y] that makes it a trivial interval. Instead we rewrite (7), replacing [y] by (9), and get:

dn([x], [y]) = α^n [d] + [ Σ_{i=0}^{n−1} β_{i,n} (ω − ω̄), Σ_{i=0}^{n−1} β_{i,n} (ω̄ − ω) ]    (10)
where β_{i,n} = α^i + α^(−i) + α^(i−n). Obviously, this constraint dn([x], [y]) > ε is more stringent than in (7), because β_{i,n} > α^i and hence the uncertainty interval is larger. Numerical resolution of the inequation verifies this statement, as shown in Fig. 3(b). Compared to Fig. 3(a), the solution area (in red) is pushed towards higher values of d. The back-and-forth movement due to S^−n ◦ S^n has increased the uncertainty, and made the verification of the (n, ε)-separation constraint more difficult.

3.3. Comments

However, the purpose of this method must be discussed: it can give predictive insight to the experimenter concerning the limits of the robot, especially concerning its discriminative capabilities. It allows one to answer, before the experiment, questions such as "given that level of noise and that application, what will be the minimum distance between two points in phase space such that they can be separated by T?". But since the robot has access neither to intervals (only to individual trajectories) nor to interval calculus, it cannot determine on-line whether two points are (n, ε)-separated.
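To make the width bookkeeping behind Eqs. (5)-(10) concrete, here is a minimal interval-arithmetic sketch. It is a toy stand-in for the SIVIA machinery of [3], and the class and function names are ours:

```python
import math

class Interval:
    """Closed interval [lo, hi] with the basic set-arithmetic rules."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):
        # note: [w] - [w] is NOT {0}; widths add, the result is centered at zero
        return Interval(self.lo - other.hi, self.hi - other.lo)
    def scale(self, a):
        # multiplication by a scalar a > 0
        return Interval(a * self.lo, a * self.hi)
    def width(self):
        return self.hi - self.lo

def iterate_S(x, omega, alpha, n):
    """n-fold application of S : [x] -> alpha [x] + [omega]."""
    for _ in range(n):
        x = x.scale(alpha) + omega
    return x
```

Starting from a trivial interval, the iterated image acquires exactly the accumulated width Σ_{i=0}^{n−1} α^i W([ω]) of Eq. (6), and subtracting an interval from itself yields an interval centered at zero, which is the remark made after Eq. (7).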
4. Environment recognition

Since the robot knows nothing of the environment except the data it acquires during its interaction, what we call environment recognition is (i) the ability to compare trajectories in the phase space, so as to make similarity statements about them, and (ii) the ability to infer similarity between locations in the environment using (i). It is important to remark that the coordinates of a system in an environment may be part of the state space, but that the environment and the state-space must not be confused.

Protocol. We keep modelling the interaction of the robot by the application S : x → T(x) + ω, where T(x) = αx. Now, instead of being modelled as an interval, ω is a random variable that obeys a uniform law U(−a, a). Thus S is defined on the state-space X = R, which in this particular example coincides with the environment. Then the robot is placed at different initial locations {xi}, i ∈ [1, p], in the state-space, and we let it go along fixed-length trajectories si, i ∈ [1, p], each being defined by si = {xi, S(xi), . . . , S^n(xi)}. Among the initial conditions {xi}, one is the origin of two trajectories s1 and s2 belonging to {si}. Nevertheless, s1 and s2 are not identical because of the additive random variable. The task for the robot is to pick s2 among all the others, being given s1. To do so, it is provided with the distance <si, sj> = d_{n,S}(xi, xj) between trajectories. Hence the robot computes all the distances <s1, sk>, k ∈ [1, p], and finally picks the trajectory with the shortest distance to s1. Then we simply compute the empirical probability of success, as a function of α and of the parameter a.

Results. An elementary experiment is the realisation of p trajectories of length n, with different initial conditions, and the pairing of s1 and s2 through minimization of <s1, sk>, k ∈ [1, p]. This elementary experiment is repeated N times so as to compute an empirical probability of success.
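The protocol just described can be sketched as follows; the helper names and default parameter values are illustrative, not the authors':

```python
import random

def trajectory(x0, alpha, a, n):
    """One noisy trajectory s = {x0, S(x0), ..., S^n(x0)}, S(x) = alpha*x + U(-a, a)."""
    s = [x0]
    for _ in range(n):
        s.append(alpha * s[-1] + random.uniform(-a, a))
    return s

def traj_distance(s, t):
    """Bowen-style distance between two sampled trajectories."""
    return max(abs(u - v) for u, v in zip(s, t))

def success_probability(alpha, a, n=50, p=10, runs=200):
    """Empirical probability of pairing the two trajectories that share a start."""
    hits = 0
    for _ in range(runs):
        starts = [random.uniform(0.0, 1.0) for _ in range(p)]
        trajs = [trajectory(x, alpha, a, n) for x in starts]
        probe = trajectory(starts[0], alpha, a, n)   # second run from the same start
        best = min(range(p), key=lambda k: traj_distance(probe, trajs[k]))
        hits += (best == 0)
    return hits / runs
```

As in the paper, the success rate rises with α (the separating expansion strengthens) and falls with a (the noise range widens).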
Then we repeat this with different values of α and a. Results are shown by Fig. 4(a). It appears clearly that the probability of success increases when α increases, that is, when the exponential behaviour of the application x → S^n(x) strengthens. Conversely, when a rises, that is, when the range of the uncertainty term driven by the uniform law U(−a, a) widens, the probability of success decreases.

Comments. First, we note that the given result is not predictive, though we emphasized the importance of such a goal in the introduction. Further examination must allow us to show, before any simulation, the expected probability of success. Secondly, suppose we keep the parameter of the random variable ω constant. Above, we saw that raising α resulted in an increased probability of success. Now we wish to emphasize the relation between α and sn(ε, E), the maximum cardinality of an (n, ε)-separated set K ⊂ E. Indeed, the relation s_{n,T}(ε, E) = [α^n l / ε] holds in the deterministic case between α and sn, where l is the length of E. Consequently, if we temporarily admit the identification between s_{n,T}(ε, E) and s_{n,S}(ε, E), there is a functional relation between sn and the probability of success for environment recognition, represented by Fig. 4(b).
Figure 4. Probability of success. For each couple of parameters (α, a), we run 1000 trajectories of length n = 50 steps in order to compute the empirical probability. (a) Probability of success as a function of α and of the parameter a of the random variable ω. (b) Probability of success as a function of sn, when the parameter of the random variable ω is fixed at a = 0.15.
5. Conclusion, perspectives

In this article we gave a measurable definition of discrimination in robotics, which is a rather rare topic in that field. We proposed simple mathematical tools to quantify that notion in both a descriptive and a predictive manner, and gave a worked example. The functional link between our measure and the probability of success of an environment recognition task was underlined. Next, we plan first to make the example we worked on more complex, then to propose a method that makes the robot able to measure by itself the number of separated points in a given volume of the state-space, and to apply it experimentally. For that purpose, we may extend our mathematical scope and involve stochastic tools. Third, we plan to give predictive results on the probability of success of the recognition task. Finally, we will look for optimality constraints concerning the whole robot (e.g. architecture, number and technology of sensors and effectors), so as to maximize on-line the density of separated points per volume unit and to design feedback laws.
References [1] R. Beer, The dynamics of adaptive behavior: A research program, Robotics and Autonomous Systems 20:257-289, 1997. [2] R. Bowen, Entropy for group endomorphisms and homogeneous spaces, Trans. Am. Math. Soc., 153, p. 401-414, 1971. [3] L. Jaulin, M. Kieffer, O. Didrit and E. Walter, Applied Interval Analysis, Springer-Verlag, London, 2001.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
433
Group Transport Along a Robot Chain in a Self-Organised Robot Colony
Shervin Nouyan a, Roderich Groß a,b, Marco Dorigo a, Michael Bonani c and Francesco Mondada c
a IRIDIA, Université Libre de Bruxelles, Belgium
b Yamakita Lab, Tokyo Institute of Technology, Japan
c Autonomous Systems Lab, École Polytechnique Fédérale de Lausanne, Switzerland
Abstract. We study groups of autonomous robots engaged in a foraging task as typically found in some ant colonies. The task is to find a prey object and a nest object, establish a path between the two, and transport the prey to the nest. Once a path is established, robots are recruited to the prey, self-assemble into a pulling structure, and collectively transport the prey, which is too heavy for a single robot to move, along the path to the nest. We follow a swarm-intelligence based control approach. All robots have the same controller. They self-organise into teams and sub-teams that accomplish a number of different tasks concurrently. To solve the subtask of exploration and path formation, we propose a new approach: chain formation based on cyclic directional patterns (CDP-chains). At present, we believe this study to be the most complex example of self-organisation in the robotics field. Experimental results with groups of 2, 4 and 8 physical robots confirm the reliability and robustness of the system.
Keywords. Swarm intelligence, swarm robotics, self-organisation, prey retrieval, path formation, self-assembly, group transport
1. Introduction

There are several advantages in using a group of robots instead of a single one: (i) a single robot may lack the ability to accomplish the task on its own, (ii) increased efficiency, (iii) increased redundancy and fault tolerance, and (iv) reduced costs [1]. However, many challenges arise when controlling a group of robots. Especially for large group sizes, centralised control architectures and complex communication protocols rapidly reach their limits due to individual failure or limited bandwidth. To overcome these problems, and to reach both tight cooperation and scalability at the same time, swarm robotic control algorithms [4] emphasise principles such as decentralisation and the exploitation of local sensory information and communication. These principles are assumed to form the basis of the behaviour of social insects when addressing challenging tasks [5]. In this paper, we demonstrate the utility of swarm robotic control algorithms in a complex foraging scenario as typically found in some ant colonies: the robots are randomly scattered in a bounded arena containing two objects, the prey and the nest. The former has to be retrieved to the latter. The following constraints are given:
434
S. Nouyan et al. / Group Transport Along a Robot Chain in a Self-Organised Robot Colony
• C1: the prey requires the cooperative effort of n robots to be moved, with n > 1.
• C2: the robots have no a priori knowledge about the dimensions of the environment, or about the position of any robot or other object.
• C3: the robot's perception range is small when compared to the distance between the nest and the prey.
• C4: communication is unreliable and limited to a small set of simple signals that can only be perceived by robots in the immediate neighbourhood.

These constraints have strong implications for the organisation of labour within the group. To illustrate this, we refer to the generic definition of teamwork recently proposed by Anderson and Franks [1]. To solve our task, some robots are required to engage in the transport (C1), while others have to direct the transporters towards the nest (C2, C3 and C4). Thus, two different subtasks have to be performed concurrently. Therefore, our task can be considered a team task. Both subtasks require the cooperation of multiple robots; moreover, each subtask can itself be considered a team task [1]. In this paper, we present a distributed controller for the team task described above. To the best of our knowledge, the three tasks of path formation, self-assembly, and group transport have previously been tackled only separately with real robots. We present the first attempt to solve these three tasks as parts of an integrated scenario, using a robot team that is homogeneous both in hardware and control. Roles are assigned dynamically as the result of a self-organised process. Furthermore, we introduce the concept of chains with cyclic directional patterns (termed CDP-chains). CDP-chains are a new method for the robotic exploration of unknown environments. These chains serve (a) to explore the environment, (b) to establish a path between prey and nest, (c) to recruit workers to the prey along this path, and (d) to guide the transport group back to the nest.
The remainder of this paper is organised as follows. In Section 2, we give a brief overview of related work in path formation, self-assembly, and group transport. In Sections 3 and 4, we detail the robot’s hardware and control. In Section 5, we present the experimental results. Finally, in Section 6 we draw some conclusions.
2. Related Work

Path Formation. When foraging, ants of many species lay trails of pheromone, a chemical substance that attracts other ants. Deneubourg et al. [6] showed that laying pheromone trails is a good strategy for finding the shortest path between a nest and a food source. Even though a colony of social insects is capable of solving such complex tasks, the individuals are governed by simple rules. These often serve as a source of inspiration when designing distributed exploration strategies. Robotic chains, where the robots act as trail markers themselves, mimic the idea of pheromone trails. The concept of robotic chains stems from Goss and Deneubourg [9]. In their approach, every robot in a chain emits a signal indicating its position in the chain. A similar system was implemented by Drogoul and Ferber [7]. Both works were carried out in simulation. Werger and Matarić [21] used real robots to form a chain in a prey retrieval task. Neighbouring robots within a chain sense each other by means of physical contact: one robot in the chain has to regularly touch the next one in order to maintain the chain.
S. Nouyan et al. / Group Transport Along a Robot Chain in a Self-Organised Robot Colony
The use of virtual pheromones for environment exploration has been studied by Payton et al. [18] and by Mamei and Zambonelli [15]. Self-Assembly. Self-assembly is a particularly interesting mechanism in social insects [2]. Insects physically connect to each other to form aggregate structures with capabilities exceeding those of an individual insect. Some observed uses have strong implications for robotic system design (e.g., the formation of pulling structures [12]). Most modular robotic systems are not capable of self-assembly: modules are preassembled by the experimenter or by a separate machine [22]. Other systems can self-assemble if the modules are pre-arranged in specific patterns. Rare instances of less constrained self-assembly with up to three robots have been reported [8]. Recently, self-assembly has been demonstrated with the swarm-bots system [16]. Experiments were conducted on different terrains and with up to 16 physical robots [10]. Group Transport. Almost half a century ago, Sudd [19] studied solitary and group transport of prey by ants of the species Pheidole crassinoda. Although he observed that single ants mostly behave similarly to those engaged in group transport, he reported that group transport "showed co-operative features". Object transportation has been studied extensively in groups of mobile robots. In multi-robot box pushing, most studies consider two robots pushing a wide box simultaneously from a single side; a few systems with more than two robots have been studied [14]. Another strategy is to grasp and/or lift the object. In this case, each robot's motion is highly constrained. Typically, systems of 2–3 physical robots have been studied. Often the planning is accomplished by a leader robot. While in some systems the leader sends explicit high- or low-level commands to the followers [20], in others robots communicate through the object being transported [13].
3. Hardware We use a robotic system called swarm-bot, lying at the intersection of collective and self-reconfigurable robotics [16]. A swarm-bot is formed by a number of basic robotic units, called s-bots, which are fully autonomous, mobile, and capable of autonomously connecting to each other. Fig. 1a shows the physical implementation of the s-bot. The robot has a diameter of 12 cm and weighs approximately 700 g. In the following, we briefly overview the actuators and sensors that are used in this study. For a more comprehensive description of the s-bot's hardware, see [16]. The s-bot has five degrees of freedom (DOF), all of which are rotational: two DOF for the traction system, one DOF to rotate the s-bot's upper part (called the turret) with respect to the lower part (called the chassis), one DOF for the grasping mechanism of the gripper (in what we define to be the s-bot's front), and one DOF for elevating the arm to which the gripper is attached (e.g., to lift another s-bot). The robot's traction system consists of a combination of tracks and two external wheels, called treels. An s-bot can connect with another s-bot by grasping its connection ring, and can receive connections on more than two thirds of its perimeter. In this study we make use of a variety of sensors, including 15 proximity sensors distributed around the turret, four ground sensors mounted underneath, an accelerometer, two optical barriers integrated in the gripper, four omni-directional microphones, one X-Y force sensor between the turret and the chassis, as well as proprioceptive sensors. Moreover, a VGA camera directed towards a spherical mirror provides an omni-directional view.
Figure 1. The hardware. (a) The s-toy and the s-bot. (b) An image taken with the omni-directional camera of the s-bot. It shows other s-bots and an s-toy activating their red LEDs at various distances.
Next to the s-bot, Fig. 1a shows the s-toy, an object which we use either as nest or as prey (depending on its colour). It has a diameter of 20 cm and, like the s-bot, it is equipped with an RGB LED ring. The nest is immobile. The prey weighs 800 g and requires the cooperative effort of two or more s-bots to be moved. A snapshot taken from an s-bot's camera is shown in Fig. 1b. Due to differences among the robots' cameras, there are some variations in the perceptual range. The software we use to detect coloured objects allows recognition of the red-coloured prey up to a distance of 70–90 cm, and of the three chain colours (blue, green and yellow) up to 35–60 cm (depending on which robot is used).
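The colour-dependent detection ranges reported above can be captured in a simple perception model. The following sketch is ours, not the authors' software: the per-robot variation is folded into a hypothetical `camera_quality` parameter that interpolates between the worst and best ranges observed.

```python
# Hypothetical perception model reflecting the reported detection ranges:
# red (the prey) is visible farther away than the chain colours.
RANGES_CM = {"red": (70.0, 90.0), "blue": (35.0, 60.0),
             "green": (35.0, 60.0), "yellow": (35.0, 60.0)}

def perceives(color, distance_cm, camera_quality=1.0):
    """True if an object of the given colour is within the camera's range.

    camera_quality in [0, 1] interpolates between the worst and best
    ranges observed across robots (an assumption of this sketch).
    """
    lo, hi = RANGES_CM[color]
    return distance_cm <= lo + camera_quality * (hi - lo)
```

For example, a prey at 80 cm is seen by a good camera but missed by a poor one, which is one way to model the reported inter-robot variability.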
4. Controller We decompose the task into two subtasks: (i) exploration of the environment to form a path between nest and prey, and (ii) assembly to the prey, or to already connected robots, in order to transport the prey along the path towards the nest. We realized our controller using the behaviour-based architecture illustrated in Fig. 2. The exploration module and the transport module are detailed in the following. In addition to these two main modules, there is the reset behaviour, in which a robot moves back to the nest and rests. 4.1. Exploration Module The robots are initially located at random positions. They have to search for the nest, which can be considered the root of each chain. Robots that have found the nest start self-organising into visually connected chains, relying on the concept of cyclic directional patterns. As displayed in Fig. 3a, each robot emits one out of three signals depending on its position in the chain. By taking into account the sequence of the signals, a robot
Figure 2. State diagram of the controller. Each circle represents a state (i.e., a behaviour). Edges are labelled with the corresponding conditions that trigger a state switch. The initial state is the search state. Pec (and Pce, respectively) is a boolean variable that is set to True if R ≤ Pec (R ≤ Pce), and to False otherwise, where R is a stochastic variable sampled from the uniform distribution over [0, 1].
can determine the direction towards the nest. Each signal is realized by the activation of a dedicated colour with the LED ring. Robots that are part of a chain may leave it under certain conditions. This is fundamental for the exploration of the environment, as it allows the formation of new chains in unexplored areas. The process of chain formation and decomposition continues until a chain encounters the prey. The members of this chain no longer disband and automatically lead the other robots to the prey. The prey, the nest, and the members of a chain can be recognised by their colour. A chain member is either blue, green or yellow. The nest is considered a chain member, and is blue. The prey is red. Behaviours. Three behaviours are employed to realize the exploration module: • Search: the robot performs a random walk which consists of straight motion and turning on the spot when an obstacle is encountered. No LEDs are activated. • Explore: an explorer moves along a chain towards its tail. In case a robot becomes an explorer by leaving a chain, it moves back to the nest from where it might then
Figure 3. (a) CDP-chains. The small coloured circles represent robots that have formed a CDP-chain connecting a nest with a prey. Three colours are sufficient to give a directionality to the chain. The large circles surrounding the robots indicate their sensing range. (b) Alignment of a chain member. If the angle α is less than 120°, the central chain member aligns with respect to its closest neighbours.
start to follow a different chain. No LEDs are activated. • Chain: a chain member activates the appropriate colour with its LEDs. To avoid loops in chains and to increase their length, we implemented an alignment behaviour: the robot aligns with its two closest neighbours in the chain in case the angle between them is smaller than 120° (see Fig. 3b). Otherwise there is no movement. The behaviours are realized following the motor schema paradigm [3]. For each behaviour, a set of motor schemas is active in parallel. Active motor schemas are added and translated into motor activation at the beginning of each control time step.1 Common to all behaviours is a motor schema for collision avoidance. Behaviour Transitions. A set of conditions triggers behaviour transitions: • Search → Explore: if a chain member is perceived. • Explore → Search: if no chain member is perceived. • Explore → Chain: (i) if the tail of a chain is reached (i.e., only one chain member is perceived), the robot joins the chain with probability Pec per time step, or (ii) if the prey is detected at a distance larger than 30 cm. • Explore → Assemble: if the prey is detected at a close distance (i.e., less than 30 cm). • Chain → Search: if the previous neighbour in the chain is no longer detected. • Chain → Explore: if a chain member is situated at the tail of a chain, it leaves the chain with probability Pce per time step. • Chain → Reset: if the prey is perceived at a very close distance (i.e., less than 5 cm), which only occurs if the prey is transported towards the chain member. • Reset → Search: after resting for the time Treset = 60 s. The two probabilistic parameters Pec and Pce have a significant effect on the overall behaviour of the robot group. This concerns in particular the number and length of the formed chains, and the speed of the processes that lead to the formation and to the destruction of chains.
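The cyclic colour pattern and the probabilistic join/leave rules described above can be sketched in a few lines. This is an illustration under our own names, not the authors' implementation; the colour order (blue nest, then green, yellow, blue, ... away from the nest) and the per-time-step probability test follow the description in the text.

```python
import random

# Colour cycle of a CDP-chain, repeating away from the blue nest:
# blue -> green -> yellow -> blue -> ...
CYCLE = ["blue", "green", "yellow"]

def next_color_away_from_nest(color):
    """Colour of the chain member one position farther from the nest."""
    return CYCLE[(CYCLE.index(color) + 1) % len(CYCLE)]

def toward_nest(color_a, color_b):
    """Of two adjacent chain members' colours, return the one nearer the
    nest. Assumes the two colours are consecutive in the cycle."""
    return color_a if next_color_away_from_nest(color_a) == color_b else color_b

# Per-time-step probabilistic switches (values from this section): an
# explorer at the tail joins the chain with probability Pec; a tail
# chain member leaves it with probability Pce.
P_EC, P_CE = 0.14, 0.007

def fires(probability, rng=random):
    """True if the probabilistic transition fires in this control step."""
    return rng.random() <= probability
```

For instance, a robot that perceives a green and a yellow chain member can conclude that the green one is nearer the nest, since yellow follows green in the cycle.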
For instance, low values of Pec result in a rather patient behaviour; in most cases a single chain is formed slowly. For Pec close to 1, several chains are formed quickly and in parallel. The second parameter, Pce, determines the stability of the formed chains, directly influencing their lifetime and the frequency of chain disbandment. After having conducted tests in simulation [17] and on the real s-bot, we fixed the probabilities to Pec = 0.14 and Pce = 0.007. 4.2. Transport Module The transport module controls the s-bots to form a pulling structure, a swarm-bot, connected with the prey. This swarm-bot transports the prey along a path established by other robots back to the nest. In the following, the behaviours and behaviour transitions are detailed. Behaviours. The transport module comprises three behaviours: • Assemble: the robot approaches and connects with a red object (e.g., the prey). It is controlled by a reactive neural network taking input from the camera and the 1 A control time step has a length of approximately 120 ms. This value is not constant because it depends on the time required for image processing.
proximity sensors [10]. The moment the robot connects, it activates its LED ring in red, thereby itself becoming an object with which a connection can be established. • Transport-Target: the robot aligns its chassis towards the closest chain member, which indicates the direction to the nest, and starts pulling. • Transport-Blind: if no chain member is perceived, the robot monitors the force acting between the turret and the chassis. Moreover, it estimates whether there is stagnation using the torque sensors of the tracks. Based on this information, a recurrent neural network computes the speed and the desired direction of the chassis [11]. Behaviour Transitions. Again, a set of conditions triggers behaviour transitions: • Assemble → Reset: if the robot does not succeed in connecting to an object within Tas = 90 s. • Assemble → Transport-Target: if the robot succeeds in connecting to an object. • Transport-Target → Transport-Blind: if the robot perceives no chain member. • Transport-Blind → Transport-Target: if the robot perceives a chain member. 5. Results We conducted experiments in a bounded arena of size 500 cm × 300 cm. The nest is positioned in the centre and the prey is placed at distance D (in cm) from it. The N robots are positioned on a grid composed of 60 points uniformly distributed over the arena. The initial position of each s-bot is assigned randomly by uniformly sampling without replacement. The s-bot's initial orientation is chosen randomly from a set of 12 possible directions. We examined different setups (N, D), keeping a linear relationship between N and D. We studied distances (D) of 30, 60 and 120 cm, for group sizes (N) of 2, 4 and 8 s-bots, respectively. In each case, at least two robots are required to transport the object. In the case (N, D) = (2, 30), the completion of the task does not require a chain, as the prey can be seen from the nest and vice versa.
In the case (N, D) = (4, 60), it is possible that the prey is recognised from within the vicinity of the nest. However, the nest cannot be perceived from within the proximity of the prey (recall that s-bots can perceive red at a greater distance than the other colours). Therefore, the transport requires a chain consisting of one or more s-bots. In the case (N, D) = (8, 120), a chain of three or more s-bots is required to complete the task. The criterion of success is satisfied if the prey retrieval is completed, that is, if the prey, or a robot transporting it, touches the nest. We conducted 10 trials for each setup. In total, 30 trials were performed. In 29 cases the task was successfully completed; only in one replication of the setup (N, D) = (8, 120) was this not the case. This failure was due to an s-bot that incorrectly assumed that it was gripping the prey. Fig. 4 shows a series of six images taken from a typical trial with N = 8 s-bots. Within 120 s, three s-bots found the nest and the prey, and established a path between them. After another 60 s, one of the five remaining s-bots connected with the prey and signalled this by activating its red LEDs. This robot alone was not strong enough to pull the prey. However, shortly after, a second s-bot connected and the prey started to move. The group of s-bots transporting the prey reached the nest after a total of 300 s. There are three main phases in the accomplishment of the task: path formation, assembly and transport. We denote the completion times of these phases by Tp, Ta and Tt.
Figure 4. Sequence of images taken at different moments of a typical trial with 8 s-bots.
We consider the path formation phase to be completed as soon as a path connecting the prey with the nest can be traversed in both directions. The assembly phase is considered to be completed as soon as two s-bots are connected to the prey so that it can be moved. Table 1 summarises the results for the three completion times. For the experiments with 2 and 4 s-bots, the shortest phase was path formation. In the case of 2 s-bots, no path needs to be formed at all. In the case of 4 s-bots, one s-bot finding the nest and forming a chain in the direction of the prey is sufficient to complete the first stage. Indeed, in 9 out of 10 trials, the time to form a path is approximately equivalent to the time until the first s-bot found the nest (see Fig. 5). The largest fraction of time was required for the assembly phase: 89.9% and 63.6% for 2 and 4 s-bots, respectively. This phase is dominated by the relatively long time it takes to gather at least two s-bots at the prey location from their random starting positions. In fact, to find the nest or a chain, an s-bot performs a random walk. As the arena is rather large compared to an s-bot's perceptual range, it can take a considerable amount of time until 2 out of 2, or 3 out of 4, s-bots have encountered the area from which they can perceive either the nest or a chain connected to it. The situation is different for 8 s-bots. Only 31.4% of the time was spent in the assembly phase, far less than for groups of 2 and 4 s-bots. One possible explanation for this observation is the higher degree of redundancy in the system; only 5 out of 8 s-bots are required to accomplish the overall task, which is a lower fraction than for the other group sizes. However, the time until a sufficient number of s-bots have found the nest drops from slightly more than 100 seconds for the group sizes 2 and 4, to approximately 50 seconds for group size 8 (see Fig.
5): this can no longer be explained by the smaller fraction of s-bots required. A possible explanation is that the larger the group size, the more s-bots join the chains, thereby extending the area from which a path to the nest can be found. This in turn makes it more likely to encounter a chain member and thus a connection to the nest.

Table 1. Summary of the results. The value of Tp denotes the time required to form a path between nest and prey, Ta is the time until at least two s-bots are assembled to the prey so that the transport phase can start, and Tt is the time required to finish the transport. Means and standard deviations are over ten experiments.

                      2 s-bots                    4 s-bots                    8 s-bots
            Tp (s)   Ta (s)    Tt (s)    Tp (s)   Ta (s)    Tt (s)    Tp (s)   Ta (s)    Tt (s)
Mean        0        211.9     23.9      24.5     133.3     51.8      93.6     87.4      95.6
            (0%)     (89.9%)   (10.1%)   (11.7%)  (63.6%)   (24.7%)   (33.8%)  (31.6%)   (34.6%)
Std. Dev.   0        127.5     6.3       10.8     72.8      39.8      51.9     59.7      46.6
Figure 5. Time until the n-th s-bot finds either the nest or a chain connected to it.
The absolute time spent on transport grows approximately linearly with the distance between nest and prey: 23.9, 51.8, and 95.6 seconds for the three setups. This suggests that increasing the number of s-bots is not beneficial for the transport itself. Indeed, we observed that a pulling structure of 2–3 s-bots seems to be the optimal configuration for this particular transport task.
6. Conclusions We presented an experimental study in which a group of autonomous robots engages in a foraging scenario, as found in some ant colonies. The task comprised three complex subtasks: (i) environment exploration and path formation, (ii) self-assembly to form a pulling structure connected with an object, and (iii) group transport of a heavy object. Inspired by the natural counterparts, we developed a swarm robotic control algorithm. The system is fully decentralised and homogeneous in control. Experimental results with up to eight physical robots confirm the reliability and robustness of the system. Since (i) the number of robots is relatively large when compared to most other examples of teamwork in multi-robot systems [1], (ii) not only the task but also its subtasks can be considered team tasks, and (iii) our homogeneous robots dynamically solve the problem of task allocation, we believe that to date this study is the most complex example of self-organisation in the robotics field.
Acknowledgments This work was supported by the "ANTS" project, an "Action de Recherche Concertée" funded by the Scientific Research Directorate of the French Community of Belgium, and by the "SWARM-BOTS" project, funded by the Future and Emerging Technologies programme (IST-FET) of the European Commission under grant IST-2000-31010. The Swiss participants of the project were supported under grant 01.0012 by the Swiss Government. The information provided is the sole responsibility of the authors and does not reflect the Community's opinion. The Community is not responsible for any use that might be made of data appearing in this publication. Marco Dorigo acknowledges support from the Belgian FNRS, of which he is a Research Director. Roderich Groß acknowledges additional support from JSPS.
References
[1] C. Anderson and N. R. Franks. Teamwork in ants, robots and humans. Adv. Stud. Behav., 33:1–48, 2004.
[2] C. Anderson, G. Theraulaz, and J.-L. Deneubourg. Self-assemblages in insect societies. Insect. Soc., 49(2):99–110, 2002.
[3] R. C. Arkin. Behavior-Based Robotics. MIT Press, Cambridge, MA, 1998.
[4] E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York, NY, 1999.
[5] S. Camazine, J.-L. Deneubourg, N. Franks, J. Sneyd, G. Theraulaz, and E. Bonabeau. Self-Organization in Biological Systems. Princeton University Press, Princeton, NJ, 2001.
[6] J.-L. Deneubourg, S. Aron, S. Goss, and J.-M. Pasteels. The self-organizing exploratory pattern of the Argentine ant. J. Insect Behavior, 3:159–168, 1990.
[7] A. Drogoul and J. Ferber. From Tom Thumb to the Dockers: Some experiments with foraging robots. In From Animals to Animats 2. Proc. of the 2nd Int. Conf. on Simulation of Adaptive Behavior (SAB92), pages 451–459. MIT Press, Cambridge, MA, 1992.
[8] T. Fukuda and T. Ueyama. Cellular Robotics and Micro Robotic Systems. World Scientific Publishing, London, UK, 1994.
[9] S. Goss and J.-L. Deneubourg. Harvesting by a group of robots. In Proc. of the 1st European Conf. on Artificial Life, pages 195–204. MIT Press, Cambridge, MA, 1992.
[10] R. Groß, M. Bonani, F. Mondada, and M. Dorigo. Autonomous self-assembly in a swarm-bot. In Proc. of the 3rd Int. Symp. on Autonomous Minirobots for Research and Edutainment (AMiRE 2005), pages 314–322. Springer, Berlin, Germany, 2006.
[11] R. Groß and M. Dorigo. Group transport of an object to a target that only some group members may sense. In Parallel Problem Solving from Nature – 8th Int. Conf. (PPSN VIII), volume 3242 of LNCS, pages 852–861. Springer Verlag, Berlin, Germany, 2004.
[12] B. Hölldobler and E. O. Wilson. The Ants. Harvard University Press, Cambridge, MA, 1990.
[13] K. Kosuge and T. Oosumi. Decentralized control of multiple robots handling an object. In Proc. of the 1996 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, volume 1, pages 318–323. IEEE Computer Society Press, Los Alamitos, CA, 1996.
[14] C. R. Kube and E. Bonabeau. Cooperative transport by ants and robots. Robot. Auton. Syst., 30(1–2):85–101, 2000.
[15] M. Mamei and F. Zambonelli. Spreading pheromones in everyday environments through RFID technology. In Proc. of the 2nd IEEE Symposium on Swarm Intelligence, pages 281–288. IEEE Press, Piscataway, NJ, 2005.
[16] F. Mondada, L. M. Gambardella, D. Floreano, S. Nolfi, J.-L. Deneubourg, and M. Dorigo. The cooperation of swarm-bots: Physical interactions in collective robotics. IEEE Robot. Automat. Mag., 12(2):21–28, 2005.
[17] S. Nouyan. Path formation and goal search in swarm robotics. Technical Report TR/IRIDIA/2004-14, Université Libre de Bruxelles, Belgium, September 2004. DEA Thesis.
[18] D. Payton, M. Daily, R. Estowski, M. Howard, and C. Lee. Pheromone robotics. Auton. Robots, 11:319–324, 2001.
[19] J. Sudd. The transport of prey by an ant, Pheidole crassinoda. Behav., 16:295–308, 1960.
[20] T. G. Sugar and V. Kumar. Control of cooperating mobile manipulators. IEEE Trans. Robot. Automat., 18(1):94–103, 2002.
[21] B. Werger and M. Matarić. Robotic food chains: Externalization of state and program for minimal-agent foraging. In From Animals to Animats 4, Proc. of the 4th Int. Conf. on Simulation of Adaptive Behavior (SAB96), pages 625–634. MIT Press, Cambridge, MA, 1996.
[22] M. Yim, Y. Zhang, and D. Duff. Modular robots. IEEE Spectr., 39(2):30–34, 2002.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Growing Virtual Neural Tissue: Binding Spiking Neurons through Sensory Input
Pascal Kaufmann and Gabriel Gómez
Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Switzerland
Abstract. Based on the previously introduced Virtual Neural Tissue (VNT) architecture, growing artificial neural networks subject to spike-timing-dependent plasticity (STDP) are provided with sensory input. Both environmental cues and neural morphology are exploited in order to minimize computational expense. Applying recent neuroscientific insight, it is demonstrated that the implemented spike-timing-dependent plasticity leads to a functional reshaping of neural circuitry in response to either intrinsic neural activity or sensory stimulation. It is found that intrinsically generated neural activity leads to a fine-tuning of the underlying neural circuitry. This is reflected in dynamically segregating and forming groups of neurons that engage in synchronously locked firing patterns. Keywords. Virtual neural tissue, spike-timing-dependent plasticity, synchronicity, binding problem
1. Introduction The Virtual Neural Tissue (VNT) framework [1] enables the concurrent analysis and visualization of individual neurons and synapses undergoing dynamic plastic changes. In this study, artificial sensory input is fed into neural tissue subject to spike-timing-dependent plasticity, while its neural activity in terms of plasticity and spike patterns is analyzed. Spike-timing-dependent plasticity (STDP) was introduced by Levy [2], who showed that presynaptic activity can induce plastic changes in synaptic transmission that depend on the timing of stimulus arrival. The relative timing between presynaptic and postsynaptic spiking determines the direction and the extent of synaptic changes (reviewed in [3]). If the presynaptic spike precedes the postsynaptic spike, the synapse is potentiated, while it is depressed if the temporal order is reversed. STDP may balance the delicate ratio between synaptic long-term potentiation (LTP) and long-term depression (LTD) by attenuating the efficacy of randomly active synapses. It has been reasoned that a slight asymmetry between LTD and LTP in favor of a higher extent of LTD may lead to a fine-tuned distribution of synaptic weights [3]. STDP as a mechanism automatically introduces competition into Hebbian plasticity, as functional clusters
Correspondence to: Pascal Kaufmann, Artificial Intelligence Laboratory, University of Zurich, Andreasstrasse 15, CH-8050 Zurich, Switzerland. Tel.: +41 44 635 6722; Fax: +41 44 635 4507; E-mail: kpascal@ifi.unizh.ch
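The pair-based STDP rule described above can be sketched with a common exponential kernel. The paper does not specify the exact kernel used in VNT, so the amplitudes and time constants below are illustrative assumptions, with the LTD amplitude slightly larger than the LTP amplitude to reflect the asymmetry discussed in the text.

```python
import math

# Illustrative pair-based exponential STDP rule (not the VNT kernel).
# dt = t_post - t_pre: positive means pre precedes post (potentiation);
# negative means post precedes pre (depression).
A_PLUS = 0.010    # LTP amplitude (assumed)
A_MINUS = 0.012   # LTD amplitude, slightly larger: LTD bias (cf. [3])
TAU_PLUS = 20.0   # LTP time constant in ms (assumed)
TAU_MINUS = 20.0  # LTD time constant in ms (assumed)

def stdp_dw(dt_ms):
    """Weight change for a single pre/post spike pair."""
    if dt_ms > 0:    # pre before post: potentiate
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    elif dt_ms < 0:  # post before pre: depress
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)
    return 0.0
```

With these parameters, a spike pair at the same relative timing produces a slightly larger depression than potentiation, which is one way the LTD bias can keep randomly active synapses weak.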
P. Kaufmann and G. Gómez / Growing VNT: Binding Spiking Neurons Through Sensory Input
Figure 1. Artificial chemical environment created by symmetrically arranged neurons emitting red growth factors decaying to gray (neural cell bodies not visible). Note the two sensory arcs at the upper and the lower half of the screen at the left side. Each arc consists of 60 sensory cells. Remaining non-sensory neurons (right half) are extending their first neurites (white structures).
of synapses that are effective at rapidly generating spikes are potentiated by STDP, rendering them even more effective at controlling the timing of volleys of spikes. This group of orchestrated synapses may then control other synapses so as to segregate the responsiveness of neural tissue into functional units [4,5]. Given sufficient competition between synapses, STDP provides an elegant strategy for learning temporal sequences, which may be reflected in clusters of synapses. VNT thus allows us to test this hypothesis based on plasticity assessment and spike panels. In the following section, we detail the modified VNT architecture capable of processing external stimuli. Section 3 explains how sensory input is administered to the neural substrate. Results are presented in section 4, while section 5 discusses the obtained findings with reference to recent literature. 2. Experimental setup The VNT architecture provides the framework of the present study. Simulated neurons implemented in VNT are spontaneously active. If, by chance, several neurons fire action potentials at the same time, other neurons may become actively triggered. Adapting the standard protocol introduced in [1], sensory neurons sensitive to the r-, g- and b-components of colors are added to the neural tissue in order to feed correlated input into the network. If the option is enabled, neurons are randomly distributed over the upper half of the screen before they are mirrored at the horizontal axis of the monitor. This arrangement gives rise to a neural cell body distribution that possesses a horizontal axis of reflection going through the center of the network (see fig. 1). The symmetric arrangement ensures that neurites of sensory cells at one side of the network may encounter the same number of target cells as the sensory cells at the other side of the neural network. The imposed symmetry counteracts biased sensory input that may result from a high density of neural cells at one
Figure 2. Sensory field of sensory neurons created by color sensitive rays (dotted lines). The number and types of sensory cells are equal on each side. Only a few sensory axons are extending towards the right side (with reference to the screen), while others are repulsed by the lack of growth factor on the left side. Ultimately, all axons will extend their neurites towards the right side, executing a defined developmental program.
side compared to a lower density at the other. Nevertheless, the stochastic developmental process may distort the initially imposed symmetry. Sensory cells are arranged along two curved lines (fig. 1, left side). The shape and the arrangement of the cells are adjustable at the beginning of the growth process, so that various sensory arrangements may be investigated. However, for all experiments with fed-in sensory information, the standard arrangement was applied. Sensory neurons are arranged at the left side of the screen and extend their axons towards the right side in order to form synapses with non-sensory neurons. In contrast to other neurons, sensory neurons lack dendrites. Moreover, a wall-like boundary (black color) blocks the spread of growth factor towards the left side, so that aberrantly growing axons retract. As sensory axons may regrow several times, ultimately all axons spread out into the attractive regions on the right side. Their input is provided by "color sensitive rays", which are reminiscent of biological whiskers. These rays originate at the sensory cells and extend perpendicularly to the arc underlying the arrangement of the sensory cells. Three types of sensory neurons were implemented that differ in their color sensitivity: according to the RGB color scheme, r-neurons are exclusively sensitive to the red component of a given color, while g- and b-cells sample only the respective green or blue component. R-, g-, and b-cells are arranged in an alternating manner, starting with r-cells at the right side of the sensory arcs (rostral to the left) and ending with b-cells at the respective left endings of the two sensory arcs. Thus the number of sensory cells is always a multiple of three, so as to avoid a (unilaterally) biased color sensitivity. The arrangement of sensory cells and the resulting "sensory field" is shown in figure 2.
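The mirrored neuron placement described above can be sketched as follows. This is our illustration of the scatter-then-reflect step; the function name and parameters are hypothetical.

```python
import random

# Sketch of the mirrored placement: neurons are scattered over the upper
# half of the screen, then reflected across the horizontal midline so
# that the tissue is symmetric about y = height / 2.
def mirrored_positions(n, width, height, rng=random):
    """Return 2*n positions symmetric about the horizontal midline."""
    upper = [(rng.uniform(0, width), rng.uniform(0, height / 2))
             for _ in range(n)]
    lower = [(x, height - y) for (x, y) in upper]
    return upper + lower
```

Each upper-half neuron at (x, y) gets a mirror partner at (x, height − y), so both halves offer sensory axons the same number of potential targets, as intended by the symmetric arrangement.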
Color sensitive rays detect sensory stimuli and convey this information to the neural cell body where a conversion of graded visual input into a frequency code is performed.
446
P. Kaufmann and G. Gómez / Growing VNT: Binding Spiking Neurons Through Sensory Input
Figure 3. Sensory axons projecting from the left sensory arch, seeking synaptic contacts. Target neurons were assigned a minimal material amount so that they become visible (dispersed dots). Yet no spines have grown, which causes the sensory axons to engage in spiral growth (see arrows), releasing numerous axonal sprouts to increase the probability of establishing synaptic contacts.
The elicited spiking frequency is directly and linearly related to the intensity of the respective color component sampled. An r-cell thus responds to a pure red color with a maximal firing rate, while the same neuron would not fire if, e.g., a pure blue color were presented. The maximal firing frequency is adjustable and must be defined before the growth process is initiated. Hence, if a colored object is moved along a straight trajectory from the top towards the bottom of the screen and crosses a number of sensory rays, the sensory neurons engage in a characteristic firing pattern that is related to the color as well as the shape and speed of the given object. In VNT, the trajectory, velocity, shape, and color of a given object are all adjustable; in the current investigation, however, these parameters remained unchanged. If the axon of a sensory cell shrinks because it did not form any synapses, it is regrown with the same material amount until all sensory cells have established synaptic contacts. Sensory axons are equipped with a pioneering growth cone. However, growth cones are implemented to be subject to a constant shift in their growth factor sensitization. Thus, with progressing time, the main axon becomes more likely to engage in significant turning (“searching behavior”) as its sensitivity to high growth factor concentrations builds up. Recall that the highest growth factor concentrations are sensed adjacent to neural cell bodies. The main axon will thus ultimately grow along a spiral if no target was encountered before (fig. 3). This mechanism increases the probability of forming synaptic contacts, as curved pathways give rise to axonal sprouts [1]. The sensory-innervation procedure can be triggered and stopped manually by the experimenter.
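The linear conversion of a sampled color component into a firing rate can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes 8-bit color channels and the 300 Hz maximum of table 1, and the function name is ours.

```python
# Maximum firing frequency of sensory cells, as listed in table 1.
MAX_RATE_HZ = 300.0

def firing_rate(cell_type, rgb):
    """Linear rate code: map the 8-bit color component a cell is tuned to
    onto a spike frequency between 0 Hz and MAX_RATE_HZ."""
    component = {"r": 0, "g": 1, "b": 2}[cell_type]
    return MAX_RATE_HZ * rgb[component] / 255.0

# A pure red stimulus drives an r-cell maximally and a b-cell not at all,
# matching the behavior described for the sensory neurons above.
assert firing_rate("r", (255, 0, 0)) == 300.0
assert firing_rate("b", (255, 0, 0)) == 0.0
```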
Figure 4. Toti-activity panel of 60 sensory neurons, listed on the left side. Duration of recording: 500 ms. An object is moved from right to left (with reference to the horizontal symmetry axis of the network, see panel top right), crossing the sensory field of the 60 sensory neurons within 400 ms. The uppermost trace corresponds to the most temporal sensory neuron at the right side of the VNT; each subsequent trace shows recordings from the consecutively arranged sensory neurons. Note that right sensory neurons carry even numbers and left ones odd numbers. Thus, two declining traces emerge, representing the right and subsequently the left sensory arch. The inset shows the respective sensory neurons together with the RGB-colored object (r=255, g=155, b=55) that is moved through the visual field along a straight trajectory. Also note that r-neurons are maximally stimulated while b-neurons engage in only moderate spiking activity.
3. Methods and protocol: feeding sensory input to VNT
In the following stimulation experiments, both sensory and “tissue neurons” are recorded and analyzed on toti-activity panels. In order to distinguish sensory from tissue neurons, sensory neurons are listed in the lower part of the activity panel and tagged with a small asterisk. In any case, the characteristic, discrete spike patterns elicited by moving objects are clearly discernible from those of other neurons in the tissue (see fig. 4).
3.1. Standard protocol with sensory input
VNT sensory neurons physiologically correspond to non-sensory neurons, so their parameter settings are equivalent. If sensory input is enabled, 60 additional sensory neurons are arranged at the left side of the 46 standard neurons (see above). The 2 × 30 sensory neurons are arranged on two arches according to fig. 2. The two sensor arches give rise to a sensory overlap of 2 × 35 degrees. The origin of each arch is located at 30% screen width, while the radius is set to 60 pixels. Sensory cells are equally spaced over an angular extent of 110 degrees. The sensory field of each sensory cell is restricted to a “sensory ray” starting at 80 pixels radius and ending at 400 pixels radius with reference to the two origins of the sensory arches. Each sensory ray is assigned a width of 1 pixel. Under default settings, a pure white RGB square (255, 255, 255) of side length 50 pixels is moved through the sensory field. Within a time window of 500 ms the
Table 1. Standard parameter settings for VNT with visual input

Property                                      Units     Value
Number of total neurons                                 106
Number of sensory neurons                               60
Dendrites per sensory neuron                            0
Dendrites per non-sensory neuron                        30
Max dendrite length                           μm        150
Max main axon length                          μm        1000
Max axonal sprouts length                     μm        200
1st axon generation branching angle           deg       90
2nd axon generation branching angle           deg       45
Intrinsic curvature                           deg       0
Sprouting curvature for main axons            deg       ~60
Threshold Vthr (ηE ΔE)                        signals   100
Signal lifespan tsl                           μs        2000
Refractory period tref                        μs        1000
Spontaneous activity                          Hz        20
Maximum firing frequency of sensory cells     Hz        300
Recording time                                ms        500
Table 2. STDP parameters

STDP activated   LTDwindow [μs]   LTPwindow [μs]
Exp. A           2000             1000
Exp. B           5000             1000
object is presented 6 times in sequence, with a delay of 20 ms between presentations. The trajectory of the object was defined so as to intersect the visual field in a straight line, originating at the rightmost and ending at the leftmost sensory ray (with reference to the network axis) (fig. 2). The maximum firing frequency of sensory cells was adjusted to 300 Hz for a pure respective color component and 0 Hz for colors that do not contain the respective component. Parameter settings for the standard protocol with sensory input are summarized in table 1. The standard procedure is equivalent to the procedure outlined without sensory input, as the stimulus presentation occurs automatically. Thus, upon enabling the option for neural firing, a white visual stimulus is moved several times through the sensory field of each sensory neuron. The activity of the neural tissue is recorded for 500 ms while all parameters are logged automatically. Based on earlier studies of the impact of the LTD and LTP STDP window widths on plasticity and neural firing (see [1]), the STDP parameters shown in table 2 were chosen.
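To make the role of the two window widths concrete, a generic pair-based STDP rule with rectangular windows of the kind listed in table 2 can be sketched as follows. This is our own illustration, not the authors' implementation; the learning rates A_PLUS and A_MINUS are invented for the example.

```python
# Illustrative learning rates; the paper does not specify weight increments.
A_PLUS, A_MINUS = 0.05, 0.05

def stdp_update(weight, dt_us, ltp_window_us, ltd_window_us):
    """Pair-based STDP with rectangular windows. dt_us = t_post - t_pre:
    causal pairings inside the LTP window potentiate the synapse, anti-causal
    pairings inside the LTD window depress it, anything else leaves it alone."""
    if 0 < dt_us <= ltp_window_us:
        return weight + A_PLUS
    if -ltd_window_us <= dt_us < 0:
        return weight - A_MINUS
    return weight

# Exp. B of table 2 (LTD 5000 us, LTP 1000 us): an anti-causal pairing at
# -3000 us is depressed, while +3000 us lies outside the narrower LTP window
# and leaves the weight unchanged - the asymmetry favoring LTD.
assert abs(stdp_update(1.0, -3000, 1000, 5000) - 0.95) < 1e-9
assert stdp_update(1.0, 3000, 1000, 5000) == 1.0
```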
Figure 5. Neural activity in response to repeated sensory stimulation. A1: An asymmetric time window is applied with LTDwindow: 2000 μs and LTPwindow: 1000 μs; note that the ratio LTDwindow:LTPwindow is 2:1. Spike trains of non-sensory neurons are created at the onset of the first stimulation. The largest fraction of spiking neurons is recruited at the time the fourth sensory neuron at the right side of the network is stimulated. Upon stimulation of neurons from the left sensory arch, a second increase in neural activity occurs, caused by a 30% fraction of simultaneously firing non-sensory neurons. Note the highly synchronized event at time step 690 ms. Larger fractions of concurrently firing neurons occur during sensory stimulation, while the duration of synchronous events decreases steadily. A2: The ratio LTDwindow:LTPwindow is 2:1; the smaller LTDwindow width is reflected in an increased number of potentiated synapses. B1: An asymmetric time window is applied with LTDwindow: 5000 μs and LTPwindow: 1000 μs; compared to the previous example, the ratio LTDwindow:LTPwindow is 5:1. Note the fractured onset of spike trains across the non-sensory neurons. A fraction of neurons starts bursting at the onset of sensory stimulation and ceases firing after 15 ms of intense bursting. As this former group of neurons stops firing, another fraction of simultaneously firing neurons emerges, which again ceases firing after 15 ms of intense action potential release. A new fraction of neurons, partially constituted by previously active neurons, then engages in synchronized action potential release in response to left sensory arch stimulation. B2: The 5:1 predominance of the LTDwindow over the LTPwindow is reflected at the level of synapses: most synapses undergo LTD. However, note the three diffuse, horizontally arranged clusters of potentiated synapses below the sensory arches. They are paralleled by three pronounced, vertically arranged clusters of potentiated synapses surrounded by synapses undergoing LTD.
4. Results
The following data show toti-activity recordings and their corresponding virtual neural tissue. The sensory arches would be located at the top (not visible) of the networks. Neural dynamics are depicted with white bars marking actively triggered spikes, listed per neuron and numbered on the left side of the panel. Green dots depict potentiated synapses (weights above 1.0), while red dots depict depressed synapses (weights below 1.0); unchanged weights of value 1.0 are not visualized. Note the highly synchronized activity of several neurons triggered by the sensory input. Interestingly, three diffuse, horizontally arranged clusters of potentiated synapses form below the sensory arches. They are paralleled by three pronounced, vertically arranged clusters of potentiated synapses surrounded by synapses undergoing LTD. The demonstrated results show only one set of experiments; the recordings are nevertheless representative and could be verified in different preparations and instantiations of VNT. An exhaustive comparison study with statistical relevance, however, would go beyond the scope of this study. Results are shown in figure 5.
5. Discussion
In the presented set of experiments, sensory stimulation was applied to neurons constituting the Virtual Neural Tissue (VNT) architecture. As is discernible in the toti-activity recordings in figure 5, groups of concurrently spiking neurons form and give rise to pronounced network activity in response to artificial sensory stimulation. As stated above, the effects of differing STDP time windows on plasticity and neural dynamics are traceable by virtue of the analytical tools provided by the VNT environment. Note that circuitry assessments, activity localization, and synaptic weight development over time are not shown, as a comprehensive analysis of the highly complex spiking patterns would lie beyond the scope of this study. With regard to stated hypotheses on STDP (e.g. [3]; [6]), several experiments can be conducted and their outcomes analyzed. STDP may balance the delicate ratio between synaptic long-term potentiation (LTP) and long-term depression (LTD) by attenuating the efficacy of randomly active synapses. It has been reasoned that a slight asymmetry between LTD and LTP, in favor of a higher extent of LTD, may lead to a finely tuned distribution of synaptic weights [3]. STDP as a mechanism automatically introduces competition into Hebbian plasticity: functional clusters of synapses that are effective at rapidly generating spikes are potentiated by STDP, rendering them even more effective at controlling the timing of volleys of spikes. This group of orchestrated synapses may then control other synapses, so as to segregate the responsiveness of the neural tissue into functional units ([4]; [5]). Given sufficient competition between synapses, STDP provides an elegant strategy for learning temporal sequences. It is noticeable that potentiated synapses are generally outnumbered by depressed synapses even when symmetric LTD and LTP window values are applied.
This may be the result of a filtering effect that is common to all integrate-and-fire models operating in an input-averaging mode. The results obtained demonstrate the highly dynamic character of spike timing and its susceptibility to various parameter settings. The toti-activity recordings presented show transient synchronization of fractions of neurons that are temporarily responsive to intrinsically generated fluctuations before they break up into clusters that are either driven by correlated sensory input (not shown here) or sensitive to the activity of other neural clusters. The emerging dynamics have been reviewed by Rolls and colleagues [7], who proposed syntactic binding of separate neuronal ensembles by temporal synchronization. Unfortunately, current neuroscientific methods do not yet allow an analysis of clusters of synapses, nor is it possible to record a significant number of synapses concurrently, so the results obtained cannot be set in relation to physiological data. More detailed investigations with VNT may nevertheless provide guidelines for future simulations, so as to shed light on the dynamics of both virtual and biological brain tissue. The framework established may stimulate ongoing research at the interface between neuroscience and robotics. VNT may serve as a test bed for digital neuroscience, bridging the gap between analytical approaches and synthetic modeling by equipping real-world agents with growing virtual neural tissue.
Acknowledgements This research was supported by the EU-Project ADAPT (Artificial Development Approach to Presence Technologies) IST 2001-37173.
References
[1] P. Kaufmann, “Virtual neural tissue: On the application of digital neuroscience.” Master’s thesis, University of Zurich, 2003.
[2] W. Levy and D. Steward, “Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus,” Neuroscience, vol. 8, pp. 791–797, 1983.
[3] L. Abbott and S. Nelson, “Synaptic plasticity: taming the beast,” Nature Neuroscience supplement, vol. 3, pp. 1178–1183, 2000.
[4] G. M. Edelman and G. Tononi, Consciousness: How Matter Becomes Imagination. Penguin Books Ltd, London, 2000.
[5] C. Koch and I. Segev, “The role of single neurons in information processing,” Nature Neuroscience supplement, vol. 3, pp. 1171–1177, 2000.
[6] Y. Dan and M. Poo, “Spike timing-dependent plasticity of neural circuits,” Neuron, vol. 44, pp. 23–30, 2004.
[7] E. Rolls and A. Treves, Neural Networks and Brain Function. Oxford University Press, 1998.
Part 8 Emergent Synthesis
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
455
Hormone-Inspired Adaptive Distributed Synchronization of Reconfigurable Robots
Feili Hou, Wei-Min Shen
Information Sciences Institute, University of Southern California
Abstract In this paper, we present a hormone-inspired adaptive distributed synchronization method for reconfigurable robots. The main approach is to use a combination of discrete, event-driven hormone communication and continuous, time-controlled motor motion. Without any prior knowledge, all the modules’ speeds self-adjust to one another to achieve a collaborative motion. Simulation and experimental results show that our method is robust to local timer inaccuracy, configuration changes, and unexpected disturbances.
Keywords Adaptive distributed synchronization, reconfigurable robot
1. Introduction
Made from a network of homogeneous reconfigurable modules, self-reconfigurable robots can change their shape and size by themselves to accomplish different tasks in dynamic, uncertain, and even unforeseen environments. Due to this versatility, self-reconfigurable robots have a number of potential applications, such as fire fighting, battlefield reconnaissance, and undersea mining. One of the most fundamental issues in realizing these potentials is synchronization. To make a large number of connected modules accomplish a robust, flexible, and adaptive global behavior, the actions of the different modules must be coordinated and synchronized for maximum effectiveness. Since centralized synchronization causes poor scalability and high vulnerability, distributed communication by means of local interaction is preferable. The research challenges for distributed synchronization are:
1. Timer constraint: In the absence of a uniform global clock, all the modules need to act at the right time autonomously to make the global behavior smooth and efficient.
2. Communication constraint: In distributed synchronization, there is no global broadcast or central controller. All the modules must detect each other’s states and coordinate with one another purely through local interactions with direct neighbors.
3. Motor speed regulation: To obtain a collaborative global behavior, all the modules need autonomous speed control to adapt to each other in the absence of a centralized controller. For example, for the creep motion (described in detail in Section 5), the two legs must be controlled to have the same speed so that the robot moves straight forward.
4. Robustness requirement: If some modules cannot reach the desired angles as expected, or some modules fail partway through, or the robot configuration is changed, how can the robot adapt to these situations and keep moving without jamming?
Focusing on these problems, we propose a hormone-inspired adaptive distributed synchronization protocol in this paper. The idea of virtual disconnection is introduced to prevent messages from circulating in cyclic robot configurations. After that, a
456
F. Hou and W.-M. Shen / Hormone-Inspired Adaptive Distributed Synchronization
combination of discrete, event-driven hormone communication and continuous, time-controlled motor motion is developed to achieve scalable and robust behavior. In the following, related work is discussed in Section 2. Section 3 proposes the idea of virtual disconnection, while Section 4 describes the adaptive distributed synchronization protocol. Section 5 demonstrates the capability of our method by implementing it both in the physics-based GALINA simulation environment and on our SUPERBOT self-reconfigurable robot. Finally, conclusions and future work are given in Section 6.
2. Related Work
In the past, much research has been devoted to synchronization control of locomotion in reconfigurable robots. Yim [1-2] uses a gait control table to synchronize the modules’ motion transitions from one gait to another. The need for a central controller is an obvious performance bottleneck that causes poor scalability and high vulnerability. As an alternative, many distributed approaches have been studied. Stoy [3-4] gave a distributed synchronization based on role-based control, where all the servos’ motions are confined to sinusoids. This restriction limits the robot to certain repetitive locomotions; for some specific gaits, such as the rolling track, it is hard to implement. Shen [5] proposed to use artificial hormones to synchronize global locomotion among modules. In the early version, a hormone is generated every predefined period to start the next motion step. This raises the question of what the exact period should be: if it is too short, modules cannot reach the triggered position in time; if too long, time is wasted waiting between motion steps. To set the time period, good prior knowledge of all the modules’ motor attributes is required. Moreover, this open-loop control scheme is vulnerable to unpredictable changes.
In [6], a synchronization method is proposed in which every module waits for the completion of its local actions before it relays the hormone. This avoids the timer issue, but limits scalability: for example, a malfunctioning module will block message transmission and bottleneck the whole behavior. As an integration and improvement of these previous synchronization methods, the adaptive distributed synchronization presented in this paper makes it possible for a self-reconfigurable robot to achieve versatile and robust movements autonomously.
3. Virtual Disconnection
In our distributed synchronization protocol, a module selects appropriate actions based on received “command” hormones, its local state, and its location in the current configuration. To be robust to configuration changes, a hormone is directed to the module performing a specific function rather than to a specific module; hence there is no name or identifier for each module, and a module can automatically switch behavior if its location in the configuration changes. In this section, we define the local topology vector that represents a module’s location in the robot, and propose the idea of virtual disconnection to prevent messages from circulating in robots whose configurations contain loops. Each module has a local topology that denotes how its connectors are connected to the connectors of its neighbor modules. Suppose that every module has n different connectors, and each connector can either have no connection or connect to any of the n connectors of another module. Since each of the n connectors has n+1 possible connection choices, the total number of local topologies for each module is (n+1)^n. As n
increases, (n+1)^n grows exponentially. Therefore, instead of representing the local topology as a number as in [5], we express it as a vector for more general application. To illustrate the idea of the local topology vector, consider a module that has six connection sides: front (F), left (L), right (R), up (U), down (D), and back (B). The local topology is represented by a vector that specifies the connection direction of the six connectors in the order “F L R U D B”. If a connector has no connection, its entry is N. For example, if a module’s topology type is [N,U,D,N,N,F], the module has the topology link[F]=N, link[L]=U, link[R]=D, link[U]=N, link[D]=N, link[B]=F. Here, link[x]=y (x=F,B,U,D,L,R; y=F,B,U,D,L,R,N) means that the module’s x connector is connected to the y connector of another module. If the robot configuration is acyclic, it can be viewed as a tree: each node denotes a module, and each link corresponds to a physical connection between two modules. A “command” hormone can propagate through the entire tree of modules, yet cause different modules to react differently. After receiving a “command” hormone, a module checks its connections in its topology vector and sends the hormone out on all active links except the link from which the hormone was received. The generator module can be viewed as the root of the tree, where each module receives the hormone from its parent and sends it to all its children. Since the tree includes every node, the hormone reaches every module. For a robot with loops in its configuration, a scheme is needed to break the transmission loops so that each “command” hormone is received once and only once. Since there is no identifier for each module, it is a challenge to decide where hormone transmission should be stopped. In [5], the control mechanism of individual modules was changed to break the loop of communication.
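The local topology vector described above can be illustrated with a small sketch; the helper name as_links is ours, and the vector values come directly from the example in the text.

```python
# Connector order used by the local topology vector, as given in the text.
CONNECTORS = ["F", "L", "R", "U", "D", "B"]

def as_links(topology):
    """Expand a topology vector into the link[x] = y form used in the text."""
    return {own: peer for own, peer in zip(CONNECTORS, topology)}

links = as_links(["N", "U", "D", "N", "N", "F"])
assert links["L"] == "U" and links["R"] == "D" and links["B"] == "F"

# With n = 6 connectors, each having n + 1 possible connection choices,
# there are (n + 1) ** n = 117649 distinct local topologies per module.
assert (6 + 1) ** 6 == 117649
```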
Involving communication with the low-level control of modules makes that approach not general enough to be applied to all robot configurations with loops. Inspired by the spanning tree algorithm, we propose here the idea of virtual disconnection to break the loops. The benefit of virtual disconnection is that it prevents a hormone from propagating to the same module again and again, without interfering with the low-level module control. First, starting from an arbitrary module as the root, we use breadth-first search (BFS) to explore all the modules and obtain a spanning tree; we then virtually cut all the connections that are not tree edges. To virtually cut a link, we simply change the corresponding element in the module’s local topology vector from uppercase to lowercase, and prohibit a module from sending hormone messages through a link whose topology value is lowercase.
Fig. 1 Virtual disconnection for a robot configuration with loops
For example, given the robot configuration of Fig. 1, we start from module 1 and traverse all the modules by BFS. Three links that are not in the traversal path (shown by the crosses) are virtually cut, and the corresponding topology values of the affected modules are changed accordingly. For example, the local topology of module 2 is changed from [N,N,N,R,F,F] to [N,N,N,R,F,f]. So module 2 will know that its B
connector is connected to another module’s F connector, but it will not send any messages out through its B connector. After the loops are broken, the robot can again be viewed as a tree, and hormone messages will no longer circulate.
4. Adaptive Distributed Synchronization Protocol
After virtual disconnection, the adaptive distributed synchronization protocol is used to coordinate the modules for optimal, robust, and scalable locomotion. The main approach is to use a combination of discrete, event-driven hormone communication and continuous, time-controlled motor motion. Each module is treated as an independent subsystem that works in a distributed way and runs the same program. Fig. 2 shows the main procedure of the program. Basically, it is a loop of receiving and sending hormones between neighbors and executing local actions based on these messages. For each received hormone, the module checks its type and then performs either time-controlled local actions or event-driven communication. The complex global behavior of the robot emerges from the interconnection and communication between individual modules. Here, a hormone is defined as [type, data, src], where type can be “command” or “feedback”, data is the hormone data, and src is the connector through which the message was received. The “command” hormone is used to trigger different actions on the modules, while the “feedback” hormone coordinates all the modules’ actions.

Loop do
  For each received hormone [type, data, src] do
    if (type == command) then
      Continuous time-controlled local actions
    else
      Event-driven hormone communication
  ExecuteLocalAction()
  LocalTime++
End Loop
Fig. 2 The adaptive distributed synchronization protocol
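The virtual disconnection step of Section 3 can be sketched as a short BFS routine. This is a minimal illustration under simplifying assumptions: the configuration is given as an adjacency list over module ids, whereas the real protocol operates on connector-level topology vectors without module identifiers; the function and variable names are ours.

```python
from collections import deque

def virtual_disconnect(adjacency, root):
    """BFS from root; return the set of links to virtually cut, i.e. every
    physical connection that is not an edge of the BFS spanning tree."""
    visited = {root}
    tree_edges = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                tree_edges.add(frozenset((node, neighbor)))
                queue.append(neighbor)
    all_edges = {frozenset((a, b)) for a in adjacency for b in adjacency[a]}
    return all_edges - tree_edges  # hormones are never relayed over these links

# A square of four modules contains one loop, so exactly one link is cut;
# the remaining links form a tree, and hormone messages no longer circulate.
square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
assert virtual_disconnect(square, 1) == {frozenset((3, 4))}
```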
4.1. Continuous Time-Controlled Local Actions

function PropagateHormoneAndGetDesiredAngles(Hormone[command, data, src])
  NewHormone[type, newData, src] <- generate hormone data according to the RULEBASE
  Cp = src
  Send out NewHormone to all active connectors except Cp
  DesiredAngles <- select actions according to the RULEBASE
  Determine the time DT for the last received hormone
  Move servos from LastDesiredAngles to DesiredAngles according to equation (1)
Fig. 3 The continuous time-controlled local actions
Fig. 3 shows the pseudocode of the continuous time-controlled local actions. Originating from the head module, the “command” hormone is modified during its transmission, propagates through the entire tree, and stops at the leaf nodes. Each module reacts to the “command” hormone differently, based on its own topology and local state information, under a predefined RULEBASE. After a module receives a “command” hormone [command, data, src], it saves src as Cp (the connector that links to its parent module), relays the hormone to all active connectors except Cp, sets the DesiredAngles of its servos, and moves the servos accordingly. Different locomotions have different RULEBASEs; for details on how to construct one, refer to hormone-inspired control [5].
As well as determining the DesiredAngles, the servo movement should have autonomous speed control so that the modules can collaborate with each other. Here, we make each module predict the time interval of each motion step from history values and self-regulate its own speed accordingly. For each received hormone [command, data, src], DT[data] represents the elapsed time before the module receives the next “command” hormone, and is used as the estimated time interval after the module receives the same hormone again. Suppose the module’s servos reached LastDesiredAngles in the previous step and are now triggered towards DesiredAngles by the hormone [command, data, src]. The module regulates the motor speed to move from LastDesiredAngles to DesiredAngles within time DT[data] according to equation (1).
ServoAngle[i](t) = (L[i] + D[i])/2 + ((L[i] - D[i])/2) · sin(π·t/DT[data] + π/2),  for t ≤ DT[data]
ServoAngle[i](t) = D[i],  for t > DT[data]    (1)
where t is the module’s local time, L[i] is the LastDesiredAngles of the ith servo, and D[i] is the DesiredAngles of the ith servo. Instead of moving in a steady speed straight line that may suffer sudden speed changes between any two steps, we make the servos move in a sinusoid pattern. The speed at the angle of LastDesiredAngles and DesiredAngles are all zero, so that the robot moves smoothly without speed discontinuity between any two motion steps. If the servos’ speed can not achieve the action within DT[data], it will move directly towards DesiredAngles after that. 4.2. Event-driven Hormone Communication Event-driven hormone communication is used to coordinate all the modules, including synchronizing local clock, giving feedback on action state etc. Here, the event is the state that a module and all its descendents in the tree have completed their actions in the current step. Once reaching this state, the non-head module will generate a “feedback” hormone to send its action state to its parent from Cp, while the head module will generate a new “command” hormone to start another motion step. Hereby, “feedback” hormone traces back from the leaves to the root module when all the actions are finished in that step. Fig.4 shows the routine of event-triggered hormone communication. When a module receives a “feedback” hormone through src, it is informed that all the modules in the sub-tree rooted by src have finished their actions, and thus set the flag[src] to be 1.If the module has reached the DesiredAngles, the flag[self] is set 1. When flag[self] and flag[c] both reaches 1 for all active connectors c except Cp, a new “command” hormone is engendered for the head module while a “feedback” hormone will be sent to its parent module for a non-head module. 
function FeedbackHormone(Hormone[feedback, data, src])
  flag[src] = 1
  if (|DesiredAngles - CurrentAngles| < δ) then flag[self] = 1
  if (flag[self] == 1 and flag[c] == 1 for all active connectors c except Cp)
     or (LocalTime > MaxClock) then
    clear all flag values to 0
    if (Cp != NIL) then
      send out [feedback, 1, nil] to the parent module through the Cp connector
    else
      generate a new “command” hormone in sequence, and send it out on all active links
Fig. 4 The event-triggered hormone communication
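The feedback routine of Fig. 4 can also be sketched as runnable code. This is an illustrative translation under our own assumptions: the Module class, its attribute names, and the returned tuples are invented for the example and are not part of the SUPERBOT implementation.

```python
class Module:
    """Illustrative module state: completion flags per connector plus "self"."""
    def __init__(self, active, cp, desired, current):
        self.active, self.cp = active, cp
        self.desired, self.current, self.delta = desired, current, 1.0
        self.flag = {c: 0 for c in active}
        self.flag["self"] = 0
        self.local_time, self.max_clock = 0, 10_000

def on_feedback(module, src):
    """Handle a "feedback" hormone arriving on connector src (Fig. 4 logic)."""
    module.flag[src] = 1
    if abs(module.desired - module.current) < module.delta:
        module.flag["self"] = 1
    children = [c for c in module.active if c != module.cp]
    done = module.flag["self"] == 1 and all(module.flag[c] == 1 for c in children)
    if done or module.local_time > module.max_clock:
        for key in module.flag:
            module.flag[key] = 0
        if module.cp is not None:
            return ("feedback", module.cp)  # non-head: report completion upward
        return ("command", "next-step")     # head: trigger the next motion step
    return None                             # still waiting on some subtree

# A head module (Cp = NIL) whose only subtree reports completion starts the
# next step; a module still short of its desired angles keeps waiting.
head = Module(active=["B"], cp=None, desired=45.0, current=45.0)
assert on_feedback(head, "B") == ("command", "next-step")
```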
To increase robustness, we also set up a wake-up mechanism. Due to the potential for communication errors, there may be situations where a “feedback” hormone is lost, or where the DesiredAngles are unreachable for the servos. To prevent the robot from
waiting forever in these situations, we set a threshold MaxClock. If the local timer reaches MaxClock, the module wakes up, and a new “command” or “feedback” hormone is generated.
4.3. Discussion
The adaptive distributed synchronization protocol has the following properties, which are essential for reconfigurable robots.
It is independent of the accuracy of each module’s local timer rate. Even though the clock rates differ among the modules, the time interval of each motion step is well synchronized by the feedback mechanism and DT prediction. DT is self-adjusted: if it is too short for a module, i.e. some module cannot complete the triggered action within that interval, the “feedback” message will be delayed by that module. The next “command” hormone will then be generated later than DT, and DT will be updated to a longer value. In each motion step, DT is updated by the longest action time among all the modules in the robot; it gradually increases until it is long enough for all the modules to complete their actions, and becomes stable in the end. Based on the DT value, all the modules can regulate their speeds automatically to be consistent with each other, without any prior knowledge of each other’s motion attributes.
It achieves collaborative behaviors purely by local communication between neighboring modules. Once the actions of a motion step are completed by all modules in the robot tree, the head module is notified by the “feedback” hormone and immediately generates a new “command” hormone. Therefore, the robot can move continuously, without gaps between motion steps.
It is resistant to disturbance. Thanks to the feedback mechanism, it is robust to interference: if the speed of some modules is slowed unexpectedly, the delayed “feedback” hormone postpones the generation of the next “command” hormone, and thus increases the DT value for all modules.
In this way, all other modules slow down to maintain the collaborative behavior. And if some module cannot reach the desired angles as expected, the wake-up mechanism prevents it from being stuck forever.

It is robust to shape changes, as modules can be added, deleted, or rearranged. The head module is not a specific module, but whichever module has the specific topology. The robot tree is therefore dynamic, and any module can become a leader when its local topology is appropriate. All the modules work in a distributed way, so there is no single point of failure. For example, in the caterpillar robot, the leftmost module with topology [N,N,N,N,N,B] acts as the head module and generates "command" hormones to engender the movement. If the head module is torn apart, the new leftmost module will quickly update its topology, become the head based on its new topology, and generate another sequence of "command" hormones to keep the remaining modules moving.

It is highly scalable. The robot is reduced to an interconnection of subsystems. Since all the modules work independently under the same rule, the protocol is independent of the robot size. Moreover, it works for all robot configurations, because any configuration can be regarded as a graph that evolves into a tree under our virtual disconnection method.

5. Experimental results

The adaptive and robust properties of our synchronization method have been demonstrated both in a physics-based simulation environment, GALINA, and on our self-reconfigurable robot, SUPERBOT.

5.1. SUPERBOT Module
Fig. 5 (Schema of SUPERBOT module) shows our design of the SUPERBOT module. It has enough flexibility of movement to combine M-TRAN[8], ATRON[9], and CONRO[5] into one. As shown in Fig. 5, each module essentially has three parts: two segments and a central part. Each of the two segments has three universal connectors on three different sides, so that a module can connect to any other module in any of the six directions in 3D (up, down, front, back, left, and right). Joined by a gimbal mechanism, the module has three rotational degrees of freedom: two pitch/yaw joints connected by a roll joint. This design allows the module to pitch and yaw up to 180° and roll up to 90° in each direction. It gives the SUPERBOT module highly flexible movements and provides the flexibility needed for the different locomotion and reconfiguration behaviors of multiple modules. Every module is able to detect whether a connector is docked, and to talk with up to six neighboring modules over a high-speed and reliable communication bus.

5.2. Simulation Result

We have developed a simulation software called GALINA to create SUPERBOT modules and testing environments that are as realistic as possible. To emulate concurrent execution of control programs for different modules and the resulting communication issues, we use threads to emulate simultaneously running modules in GALINA. In the implementation of all these behaviors, all the modules are loaded with the same program.

The first example is a creep motion. As shown in Fig. 6 (the robot with creep motion), the creeping robot is composed of two modules that perform a butterfly action. Intuitively, the two legs should be lifted from the ground when moving forward and touch the ground when moving backwards. To show the independence from the internal clock and the ability to self-regulate speed, we deliberately set the clock frequencies of the two modules to 1000 Hz and 1100 kHz, and the maximum motor speeds of the left and right modules to 180°/s and 220°/s, respectively. From creep.avi (http://www.isi.edu/robots/superbot/movies/sim/creep.avi), we can see that the two modules' timers and speeds are well synchronized, and the robot moves in a straight line. Moreover, even though the leg-lifting motion has a different amplitude from the forward motion, the speeds are well regulated so that the legs do not touch the ground before reaching the destination when swinging forward. To demonstrate the ability to adapt to unexpected disturbances, we deliberately decrease the maximum speed of the left module halfway through, making it unable to complete the desired motion in time. As we can see from creepRecover.avi (http://www.isi.edu/robots/superbot/movies/sim/creepRecover.avi), the robot makes a turn when the left leg is suddenly slowed down. After that, the feedback mechanism automatically increases the DT for each hormone, and the right module quickly adapts to the new speed of the left module. In a short while, the robot moves straight again.
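The feedback-driven DT adjustment at work in this example can be sketched in a few lines. This is an illustrative reconstruction, not the actual SUPERBOT controller: the helper names (`step_dt`, `regulate`) and the rotation angles are our assumptions, while the motor speed limits follow the creep example.

```python
# Illustrative sketch (not the actual SUPERBOT code) of the DT feedback loop:
# each module tries to finish its rotation within DT but is limited by its
# motor, so the "feedback" hormone returns only after the slowest module
# finishes, and the head adopts that time as the next DT.

def step_dt(dt, angles, max_speeds):
    """One motion step: return the time at which the feedback hormone arrives."""
    # a module's action time is DT if its motor is fast enough, otherwise longer
    finish_times = [max(dt, a / s) for a, s in zip(angles, max_speeds)]
    return max(finish_times)  # the head sees the latest feedback

def regulate(dt, angles):
    """Speeds each module chooses so that all finish together within DT."""
    return [a / dt for a in angles]

angles = [90.0, 120.0]        # degrees to rotate per step (hypothetical values)
max_speeds = [180.0, 220.0]   # deg/s, as in the creep example
dt = 0.1                      # deliberately too short at first
for _ in range(5):
    dt = step_dt(dt, angles, max_speeds)
# DT settles at the longest required action time: 120/220 ≈ 0.545 s

max_speeds[0] = 90.0          # disturbance: the left module slows down
for _ in range(5):
    dt = step_dt(dt, angles, max_speeds)
speeds = regulate(dt, angles)
# DT grows to 90/90 = 1.0 s and the right module regulates down to 120 deg/s
```

The key design point the paper describes is visible here: DT only grows toward the slowest module's action time and then stays stable, so no module needs prior knowledge of the others' motor capabilities.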
Multiple locomotion gaits, including caterpillar, sidewinder, rolling track, and crawler, are also implemented. To test the ability to recover from dynamic configuration changes, we deliberately "cut" an 8-module caterpillar, a sidewinder, and an eight-legged crawler into halves partway through their motion. Without any modification to the program, all these segments adapt to the new configuration immediately and move as two smaller caterpillars, sidewinders, or crawlers.

5.3. Experimental Result

We have also loaded our program into real SUPERBOT modules for different locomotion gaits. As of the writing of this paper, only two SUPERBOT modules have been built, so the possible locomotion is very limited. After connecting the two modules in a line, we can make the robot perform different motions, such as caterpillar and creep. By changing the head module, the robot can move forward or backward. Both modules are loaded with the same program and are well synchronized. For videos of the different behaviors in simulation and on the real robot, please visit http://www.isi.edu/robots/superbot/movies/

6. Conclusion

This paper presents a hormone-inspired adaptive distributed synchronization method for reconfigurable robots. The communication protocol for both cyclic and acyclic configurations is improved with the idea of virtual disconnection. The "feedback" hormone is proposed to make the behavior robust to dynamic changes and unexpected disturbances. Collaborative locomotion is achieved by autonomously adjusting each module's motor speed so that the modules adapt to one another. The effectiveness of our algorithm is demonstrated with SUPERBOT modules both in physics-based simulation and on real robots. Multiple locomotion modes are implemented without increasing the hardware or software complexity. Our future work will add sensor information so that the robot can react to the environment.

References
[1] M. H. Yim, Y. Zhang, D. G. Duff, "Modular robots," IEEE Spectrum, 39(2), pp. 30-34, February 2002.
[2] M. Yim, Locomotion with a unit-modular reconfigurable robot (Ph.D. Thesis), Department of Mechanical Engineering, Stanford University, 1994.
[3] K. Støy, W.-M. Shen, P. Will, "Using Role-Based Control to Produce Locomotion in Chain-type Self-Reconfigurable Robots," IEEE Transactions on Mechatronics, 7(4), 410-417, Dec. 2002.
[4] K. Støy, W.-M. Shen, and P. Will, "How to Make a Self-Reconfigurable Robot Run," Proc. of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems, July 2002.
[5] W.-M. Shen, B. Salemi, and P. Will, "Hormone-Inspired Adaptive Communication and Distributed Control for CONRO Self-Reconfigurable Robots," IEEE Transactions on Robotics and Automation, 18(5), October 2002.
[6] W.-M. Shen, B. Salemi and P. Will, "Hormones for self-reconfigurable robots," Proc. of Intl. Conf. on Intelligent Autonomous Systems, IOS Press, pp. 918-925, 2000.
[7] W.-M. Shen, Y. Lu and P. Will, "Hormone-based control for self-reconfigurable robots," Proc. Intl. Conf. on Autonomous Agents, 2000.
[8] S. Murata, E. Yoshida, et al., "Hardware Design of Modular Robotic System," Proc. of 2000 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, 2000.
[9] M. W. Jørgensen, E. H. Østergaard, H. H. Lund, "Modular ATRON: Modules for a self-reconfigurable robot," Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2068-2073, 2004.
Spatial Prisoner's Dilemma in a Network Environment Introducing Heterogeneous Information Distribution

Hiroshi Kuraoka a,1, Nobutada Fujii b and Kanji Ueda b
a Department of Precision Engineering, The University of Tokyo
b Research into Artifacts, Center for Engineering, The University of Tokyo
Abstract. Recent studies reveal that the features of scale-free and small-world phenomena enhance the performance of networks such as the Internet or protein networks. Numerous studies have analyzed the manner in which cooperation emerges in such complex networks. It is necessary to take into account simultaneously the fact that the network structure itself emerges through interactions among agents and the dynamics arising on the network, because these points are inseparable in the real world. Furthermore, it is important to take into account the heterogeneous information distribution among elements, because elements in the real world have structural restrictions and act under incomplete information. In this paper, heterogeneous information distribution is introduced to a spatial prisoner's dilemma, and the emergence of cooperation through the interactions among agents is examined. The resulting network has characteristics of a scale-free network and a small-world network. Results also show that a robust form of cooperation emerges when heterogeneous information distribution is considered.

Keywords. spatial prisoner's dilemma, complex network, heterogeneous information distribution
1. Introduction

Recent studies of complex networks have revealed that features of network topology are associated with the performance of networks such as the Internet, food webs, and protein networks. Most of these networks share three prominent features[6]. (a) The average shortest path length L is small: typically, only a few edges must be passed to connect two nodes of the graph. (b) The clustering coefficient C is large: two nodes having a common neighbor are far more likely to be mutually connected than are two randomly chosen nodes. (c) The distribution of the degree is scale-free, i.e., it decays as a power law. Small-world properties, comprising a small average path length and a high degree of clustering, are realized in the pioneering mathematical model[9]. The model of a scale-free network, which realizes scale-free distribution in the context of growing networks, has received much attention[2]. Recent studies have revealed that the features of small-world and scale-free phenomena enhance network performance.

1 Correspondence to: Hiroshi Kuraoka, The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8568, Japan; Tel: 04-7136-4271; Fax: 04-7136-4242; E-mail: [email protected]
H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment
Meanwhile, the Prisoner's Dilemma models an interaction involving two (or more) elements, each with two or more "strategies" or states, such that the outcome depends on the choices of all the interacting elements. The Prisoner's Dilemma is used to investigate how cooperation emerges, and many studies have analyzed the manner in which cooperation emerges in complex networks. In most of those studies, the spatial Prisoner's Dilemma, which introduces a spatial structure similar to that found in the real world, is used on a complex network to investigate how cooperation emerges. Similarly, the emergence of cooperation in a network environment is discussed in this paper. New knowledge of methodology for creating decision-making solutions for artificial systems is expected to be obtained by investigating the emergence of cooperation in a network environment.

Previous studies of complex networks have mainly concluded that the network topology is important in the expression of system function (Fig. 1(a)). For example, Abramson applied a spatial Prisoner's Dilemma to many kinds of networks and found that cooperation is promoted mostly in small-world networks[1]. This work analyzes the manner in which cooperation emerges on a static network; it presents no discussion of the dynamics of the network itself. However, it is more realistic to consider situations in which network interactions evolve dynamically by adapting themselves to the emerging global structure. It is important to consider that networks are dynamic entities that evolve and adapt according to the actions of the elements that form the network from the bottom up[8][10]. In other words, it is necessary to take into account simultaneously the dynamics that prevail within the network and the dynamics of the network, which is the process of network evolution: these two dynamics are inseparable in the real world.
Furthermore, it is natural to consider that the function of the system appears through these two network dynamics (see Fig. 1(b)).
Figure 1. Relation between network structure and system function
On the other hand, it is important to take into account the heterogeneous information distribution among the elements, because real-world elements have structural restrictions and act under incomplete information. The Zimmermann study indicates that small-world phenomena emerge through the two network dynamics using the spatial Prisoner's Dilemma, but it does not examine the actions of players based on heterogeneous information distribution[10]. Heterogeneity and structural restrictions have much to do with system performance. Examples are shown below.
• Kauffman posits that heterogeneity and structural restrictions based on physical laws are indispensable for biological evolution[5].
• In the field of Artificial Intelligence, the coupling of simple sensors and actions engenders intelligent behavior such as finding a way to the goal of a maze[7].
• Synchronization phenomena of fireflies are attributable to information distribution and a primitive information-handling ability[4].

In this paper, heterogeneous information distribution is introduced to a spatial Prisoner's Dilemma. The emergence of cooperation is analyzed through the two network dynamics using a computer simulation. This study is intended to examine the emergence of cooperation in a network environment introducing heterogeneous information distribution. It presents new knowledge of methodology for creating decision-making solutions for artificial systems.
2. Model

The system considered in this paper is defined as a 200 by 200 lattice upon which N agents are placed randomly. The location of agent i is denoted xi. Each agent chooses n agents as opponents against whom to play the game, and creates links to them. The neighborhood of agent i (Ψi) is composed of those agents who are connected directly to i by one link; the size of Ψi defines its degree ki. The degree ki is the sum of the agents that i chooses as opponents and the agents who choose i as an opponent. Only one link is created between two agents when they choose one another as opponents. The network N is structured from the links among agents. The state of each agent si can be (1,0) or (0,1). Because the Prisoner's Dilemma is used in this model, si is interpreted as either cooperation or defection: each agent chooses C (cooperation, si = (1,0)) or D (defection, si = (0,1)). The simulation flow is shown below (Fig. 2).

1. Agent i plays games with its neighbors Ψi and obtains a total payoff Πi, defined as

   Πi = Σj∈Ψi si J sj^T.   (1)
Πi depends on the chosen state si and the payoff matrix

   J = ( R  S
         T  P ).   (2)
If both agents choose C, each gets payoff R; if one defects while the other cooperates, the former gets payoff T > R, whereas the latter gets the "sucker's payoff" S < R. If both defect, each gets P. Under the standard restrictions T > R > P > S and T + S < 2R, defection is the best choice in a one-shot game. The result is a Nash equilibrium in which both agents defect.

2. Agent i updates its current state by comparing its total payoff with those of all its neighbors, and imitates the state of the wealthiest agent. Agent i does not update its current state when its own payoff is the highest among its neighbors.

3. Agent i diffuses an influence field Fi(l) into the system according to its total payoff Πi, where l is the distance from agent i. Here, Fi(l) is defined as
Figure 2. Simulation model
   Fi(l) = Πi exp(−l²/λ²).   (3)
In this equation, λ is defined as the influence parameter. The higher the total payoff Πi, the stronger the influence field Fi(l) becomes, and Fi(l) is attenuated as the distance from agent i grows. Agent i perceives the total influence field Ii from the system, defined as

   Ii = Σj≠i Fj(|xi − xj|).   (4)
4. When the state of agent i before updating is D, agent i replaces, with probability p, the link to an opponent whose state before updating is also D with a new one. The higher the probability p, the more flexible the network structure becomes. The probability Φik that agent k is chosen as a new opponent by agent i is defined as

   Φik = Fk(|xi − xk|) / Ii.   (5)
An agent whose influence field is strong will be chosen as a new opponent with high probability. In this paper, the heterogeneity of information distribution is defined as the difference of Ii among agents, and the degree of homogeneity of information distribution is controlled by the influence parameter λ. In the case of large λ, Fi(l) is not attenuated much and the
information distribution becomes homogeneous, since the influence field Fi(l) is diffused globally through the system. On the other hand, when λ is small, Fi(l) is attenuated notably and the information distribution becomes heterogeneous, since the influence field Fi(l) is diffused only locally. The indicators used in the model are (a) ρc(t), the percentage of agents whose state is C; (b) LN, the average shortest path length of network N, which measures the typical separation between any pair of elements in the network; and (c) CN, the clustering coefficient of network N, which measures the clustering of an element's neighborhood.

3. Simulation results and discussion

3.1. Simulation settings

In this paper, simulations with various influence parameters λ are shown to analyze the effect of heterogeneous information distribution. The relationship between the influence field, according to the influence parameter λ, and the scale of the field is shown in Figure 3.
Figure 3. The relationship between Fi (l) and the scale of the field
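To make the update cycle concrete, the following sketch implements one round of the model: payoffs via the matrix J (Eq. (1)), imitation of the wealthiest neighbor (step 2), and the influence field of Eq. (3). This is an illustrative reconstruction under the paper's parameter values; the function names are ours, and a small ring of agents stands in for the 200 by 200 lattice.

```python
import math

R, S, T, P = 1.0, 0.0, 1.6, 0.1       # paper's settings: T > R > P > S
J = [[R, S], [T, P]]                  # payoff matrix; state 0 = C, 1 = D

def total_payoffs(states, neighbors):
    # Pi_i = sum over j in Psi_i of s_i J s_j^T  (Eq. (1))
    return [sum(J[states[i]][states[j]] for j in neighbors[i])
            for i in range(len(states))]

def imitate(states, payoffs, neighbors):
    # step 2: copy the state of the agent with the highest payoff in the
    # neighborhood (including oneself)
    return [states[max(list(neighbors[i]) + [i], key=lambda k: payoffs[k])]
            for i in range(len(states))]

def influence(total_payoff, distance, lam):
    # F_i(l) = Pi_i * exp(-l^2 / lambda^2)  (Eq. (3)); a small lambda
    # attenuates the field quickly, making information heterogeneous
    return total_payoff * math.exp(-(distance ** 2) / lam ** 2)

# ring of five agents, one defector in the middle
states = [0, 0, 1, 0, 0]
neighbors = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
pf = total_payoffs(states, neighbors)      # the lone defector earns T + T = 3.2
states = imitate(states, pf, neighbors)    # defection spreads to its neighbors
```

One round already shows the one-shot logic of the dilemma: the lone defector is the wealthiest agent in its neighborhood, so both of its neighbors imitate D.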
In this paper, λ = 10000 is defined as the situation in which information distribution is completely homogeneous. The degree of heterogeneity of information distribution becomes stronger as λ decreases. Three cases (p = 0.01, 0.1, 1) are tested. The model is characterized by N = 1000 and n = 2. The initial states and opponents of the agents are chosen randomly. The parameters used in the Prisoner's Dilemma matrix are R = 1.0, T = 1.6, P = 0.1, S = 0.

3.2. Network emergence

3.2.1. Relation between ρc(100) and heterogeneous information distribution

Figure 4 shows the average ρc(100) according to the influence parameter λ for each p. Here, ρc(100) is the percentage of agents whose state is C at time step 100. For p = 0.1 and 1, the emergence of cooperation is observed at about λ = 50, because ρc(100) converges there to about 0.8 and 0.9, respectively. A notable difference is inferred to exist between λ = 5 and λ = 10. For small λ, such as λ = 5, game opponents will only be nearby agents, because the perceived influence field is the one diffused by near agents. For that reason, agents lose the chance to update their state to C by playing games with distant agents, and no emergence of cooperation is observed. It can be said that cooperation does not emerge if information distribution is notably heterogeneous (λ = 5) among agents. In the case of p = 0.01, the value of ρc(t) is low. If the network structure is not sufficiently flexible, the emergence of cooperation is difficult to observe, because there is less chance of updating one's state to C by playing games with other agents.
Figure 4. Transition of average ρc (100) according to λ
3.2.2. Relation between LN, CN and heterogeneous information distribution
Figure 5. Transition of LN and CN according to λ: (a) LN; (b) CN
Figure 5(a) shows the average LN according to the influence parameter λ for each p. LN is highest at λ = 5 and decreases as λ becomes larger. For p = 1, LN converges to 4. In particular, when λ = 10000, the situation in which information distribution is completely homogeneous, small-world phenomena are observed (LN = 3.67). In a small-world network there are shortcuts that link distant agents, but for small λ such as λ = 5, game opponents will be nearby agents, so there are no shortcuts to make the network small. For small p, no significant difference in LN exists among λ; this is considered to be because the network is not flexible and the agents cannot change opponents often. Figure 5(b) shows the average CN according to the influence parameter λ for each p. As with LN, CN is highest at λ = 5 and decreases as λ becomes larger, because for small λ the game opponents are nearby agents, so there is a sufficient chance of two agents having a common neighbor. For all λ at p = 1, the network structure is highly clustered, considering that the clustering coefficient of a random network is crand = ki/N = 0.004 and that the clustering coefficient for all λ at p = 1 is greater than crand. Small-world phenomena thus emerged from the perspective of the clustering coefficient as well.
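For concreteness, the clustering coefficient CN referred to here can be computed as in the following sketch, which implements the standard definition: the fraction of an agent's neighbor pairs that are themselves linked, averaged over all agents. The code and the toy graphs are ours, not the authors'.

```python
def clustering_coefficient(adj):
    """Average clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors. For a node i with degree
    k_i >= 2, C_i = (# links among i's neighbors) / (k_i * (k_i - 1) / 2);
    nodes with degree < 2 contribute C_i = 0 by convention.
    """
    total = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # count each neighbor pair once (u < v) and check whether it is linked
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

# a triangle is fully clustered; a 3-node path has no closed triples at all
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

For a random graph the expected value is the text's crand = k/N, which is why any CN well above that baseline indicates genuine clustering.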
3.3. Robustness of cooperation
Figure 6. Connectivity distributions: (a) λ = 5; (b) λ = 129; (c) λ = 198; (d) λ = 10000
Figure 6 shows the degree distributions for λ = 5, 129, 198, and 10000 at p = 1. Apparently, no hub agent exists (that is, no agent has many more links than the others) in the case of λ = 5, but there is a hub agent in the case of λ = 10000. A power-law distribution is visible in the case of λ = 129, which shows the property of a scale-free network. The influence field of an agent becomes stronger as the agent's total payoff grows, and it spreads through the system homogeneously when information is homogeneous (λ = 10000). It is considered that, when information is homogeneous, a positive feedback mechanism acts on an agent that starts to grow into a hub, because that agent will be chosen as an opponent with high probability by other agents. A network structure that contains a few hubs is considered to be robust against random failure but vulnerable to the failure of the hubs[3], because the hubs chiefly sustain the network structure.
Figure 7. Transition of ρc (t) according to λ
Figure 7 shows the transition of ρc(t) for λ = 5, 129, 198, and 10000 in the case of p = 1. The figure illustrates the sensitivity of the stationary network structure to perturbations acting on the highly connected agents, which reflects their key role in sustaining cooperation. At time step 100, the most connected agent is externally forced to change its state from C to D, triggering an avalanche. It is apparent that in the case of λ = 10000, in which information distribution is completely homogeneous, the avalanche leads to ρc(t) = 0: no cooperation remains. It is considered that a large avalanche occurs when there are only a few hub agents. In other
words, cooperation might emerge on a network structured under homogeneous information distribution, but that cooperation will not be robust. For λ = 129 and 198, in which heterogeneous information distribution is considered and several hubs exist, the emergent cooperation is robust and exhibits transient dynamics in which the system searches for a new stationary, globally cooperative structure. This robustness arises because several clusters, each sustained by a hub, stop the avalanche. After the avalanche settles, a new network is structured centering on one hub, which subsequently engenders the reproduction of cooperation.
4. Conclusion

This paper introduced heterogeneous information distribution to the spatial Prisoner's Dilemma and analyzed its effects using computer simulation. Results demonstrated that: (a) the resulting network has characteristics of a small-world and scale-free network when information distribution is heterogeneous; (b) cooperation that emerges on a network under homogeneous information distribution is fragile; and (c) robust cooperation emerges when heterogeneous information distribution is considered. In other words, the perspective of heterogeneous information distribution is important in creating decision-making solutions for artificial systems. Further studies should explore the effect of heterogeneous information distribution on real-world problems such as manufacturing systems.
References

[1] G. Abramson, M. Kuperman: Social games in a social network, Physical Review E 63 (2001), 030901.
[2] A.-L. Barabasi, R. Albert, H. Jeong: Mean-field theory for scale-free random networks, Physica A 272 (1999), 173-187.
[3] R. Albert, H. Jeong, A.-L. Barabasi: Error and attack tolerance of complex networks, Nature 406 (2001), 378-382.
[4] M. Buchanan: Nexus: Small Worlds and the Groundbreaking Science of Networks, W. W. Norton & Company, 2002.
[5] S. Kauffman: Investigations, Oxford University Press, 2003.
[6] K. Klemm, V.M. Eguiluz: Highly clustered scale-free networks, Physical Review E 65 (2002), 036123.
[7] R. Pfeifer, C. Scheier: Understanding Intelligence, MIT Press, 1999.
[8] B. Skyrms, R. Pemantle: A dynamic model of social network formation, Proceedings of the National Academy of Sciences 97(16) (2000), 9340-9346.
[9] D.J. Watts, S.H. Strogatz: Collective dynamics of 'small-world' networks, Nature 393 (1998), 440-442.
[10] M.G. Zimmermann, V.M. Eguiluz, M.S. Miguel: Coevolution of dynamical states and interactions in dynamic networks, Physical Review E 69 (2004), 065102.
Behavioral Decision for Multi-agent Systems with Dynamic Interaction

Yusuke Ikemoto a,1 and Toshio Fukuda a
a Nagoya University

Abstract. In this study, we discuss behavioral decision for multi-agent systems. Merging behaviors of vehicles at a non-signalized intersection are treated as an example of behavioral decision. The algorithm enables vehicles to pass through an intersection one by one, in weaving and zipping manners, without collision. These behaviors are achieved by an emerging spatio-temporal pattern of a coupled oscillator network, where mutual communication between vehicles is realized using only an IVC device. The spatio-temporal pattern formation can be realized in the diffusion system through mutual communication, so that only locally broadcast communication is needed. The algorithm is therefore independent of any infrastructure at a junction, such as a signal system or road-vehicle communication systems. Finally, the proposed algorithm is experimentally verified using actual autonomous mobile robots.

Keywords. Local Interaction, Spatio-temporal Pattern Formation, Homogeneous System, Globally Coupled Oscillator Network, Experiment by Robots.
1. Introduction

We discuss behavioral decision for multi-agent systems with dynamic interaction. As an example, merging behaviors of vehicles are treated. Automated transportation is one of the desired systems for improving traffic efficiency and reducing traffic accidents. In the ITS research field, most research on automated vehicles takes the approach of developing an intelligent system that automatically and safely drives a vehicle on behalf of a human or that assists a driver. These automation technologies are not sufficient for an automated transport system, because a cooperative manner of driving is one of the important factors for smooth and efficient traffic flow control, especially in congested areas. Ideally efficient traffic flow will be realized when each driver cooperatively selects his own path, locally or globally checking other cars on the driving paths. For example, several studies on cooperative driving using IVC (Inter-Vehicle Communication)[1][6] have been reported. A vehicle equipped with intelligent systems such as IVC and car navigation can drive cooperatively without any advanced infrastructure. The authors have also proposed a cooperative planning algorithm in which each car revises its own path at every intersection based on other vehicles' destinations exchanged in a local

1 Yusuke Ikemoto, Department of Micro-Nano System Engineering, Graduate School of Engineering, Nagoya University, Japan. Tel.: +81 52 789 4481; Fax: +81 52 789 3909; E-mail: [email protected]
Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems
Figure 1. Definition of blue zone and red zone
Figure 2. Transitions of the target phase-locked state. (a) There are two cars around an intersection. (b) An additional car joins the communication. (c) One of the three cars has passed the intersection.
area. This is a planning approach to improving traffic flow; local performance improvement is another approach to the same purpose. In this study, we therefore focus on cooperative merging control for intersection flow, using only IVC, as one such local performance improvement approach. We have previously proposed zipping and weaving behaviors, which are observed in merging at non-signalized intersections[7]. The merging control for the zipping and weaving manners covers more general cases. Finally, we conduct experiments using actual mobile robots and confirm the effectiveness of the proposed algorithm.
2. Target Behavior and Assumption

A merging control algorithm at a general non-signalized intersection is dealt with as an extension of the zipping and weaving manners. Figure 2 shows the target merging behaviors at a general intersection. Time allocation depending on the traffic state is informative for traffic management. For example, in the case of figure 2(a), the time for passing through is allocated between two kinds of roads. Only when some vehicles trying to turn right approach the intersection, as shown in figure 2(b), is the time allocated among three kinds of roads. After all vehicles located in the right-turn lane finish passing through, as shown in figure 2(c), the time is allocated between two kinds of roads again.

2.1. Common Reference Signal

To realize synchronization among vehicles, each vehicle behaves based on its own reference signal, and the signals are synchronized through interaction. The reference signal is defined as follows:

   θ̇i = ωn,   (1)
where θi is the phase of vehicle i and θ̇i is the time derivative of θi. ωn is the natural frequency, which is determined by the intersection configuration and so on. The blue zone and red zone are defined in figure 1, corresponding to the value of the reference signal θi. When the value of θi is in the blue zone, vehicle i is allowed to enter the intersection; when it is in the red zone, vehicle i must stop before entering. In this experiment, equal-length times are allocated to the two zones as the ideal state, independent of the intersection configuration, such as the number of roads or the size of the intersection. Consider the state shown in figure 2 as an example.
Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems
473
2.2. Synchronization of Traffic from Different Roads Reference signals, which oscillate independently, need to be synchronized by simple mutual communication. A coupled oscillator network is used to synchronize the reference signals. Many kinds of oscillator models exist, and any of them could be installed in the proposed algorithm; the reaction-diffusion system expressed in equation (2) is used here because its parameters are simple to set.
θ̇_i = ω_n + K Σ_{j=1}^{N} a · f(θ_i − θ_j),  (2)

where a = −1 if j = i and a = 1 if j ≠ i, and where N (≥ 2) and K are the number of roads or lanes and the coupling strength of the diffusion, respectively. The function f, which takes a phase difference as its input, represents the oscillators' coupling. When two vehicles on the same road communicate, the suffix j equals i, which means their reference signals synchronize with zero phase difference. On the other hand, the reference signals synchronize with a certain phase difference when two vehicles on different roads communicate. Figure 3 shows the time series of the reference signals in a numerical simulation in which four vehicles approach an intersection from two different roads. θ_{pi} and θ_{qi} are the reference signals of vehicles on roads p and q, respectively. As shown in the figure, the phase difference between θ_{p1} and θ_{p2} converges to zero, while the phase difference between θ_{pi} and θ_{qi} converges to π. Using this coupled oscillator network, the reference signals of equation (1) can be synchronized. The stability of synchronization in the general case, where an intersection consists of N roads, is investigated in the next section. 3. Synchronization Stability All vehicles are supposed to communicate with all other vehicles in a local area around an intersection using broadcast communication. The coupled oscillator network can therefore be considered a globally coupled oscillator network [14]. To be synchronized with an even phase difference, the function f in equation (2) has to be set properly. In order to stabilize the globally coupled oscillators, f is set as follows:

f(φ) = −(1/π)φ + 1,  (3)
where φ_i = θ_i − θ_{i+1} denotes the phase difference between oscillators with neighboring indices (indices taken cyclically, so that Σ_{i=1}^{N} φ_i = 2π). By symmetry, it is enough to prove that φ_1 = θ_1 − θ_2 converges. The time derivative of φ_1 is

φ̇_1 = θ̇_1 − θ̇_2 = K Σ_{j=1}^{N} {f(θ_1 − θ_j) − f(θ_2 − θ_j)}
Figure 3. Example of phase-locked transitions. The phase difference between θ_{p1} and θ_{p2} converges to zero, while the phase difference between θ_{pi} and θ_{qi} converges to π. This means that vehicles can generate a virtual phase signal through local communication alone.
= K Σ_{j=1}^{N−1} { f(Σ_{l=1}^{j} φ_l) − f(Σ_{l=1}^{j} φ_{l+1}) }.  (4)
From equation (3),

f(Σ_{i=m}^{n} φ_i) = Σ_{i=m}^{n} f(φ_i) − (n − m)  (n > m).  (5)
Here, substituting equation (5) into equation (4),

φ̇_1 = K Σ_{j=1}^{N−1} (N − j){f(φ_j) − f(φ_{j+1})}  (6)
     = K{N f(φ_1) − (N − 2)}  (7)
     = −(KN/π)(φ_1 − 2π/N),

where φ_1 = 2π − Σ_{i=2}^{N} φ_i has been used. As a consequence, φ_1 is solved as follows:

φ_1 = C exp(−(KN/π) t) + 2π/N,  (8)

where C is a constant determined by the initial condition. Equation (8) proves that the phase converges exponentially and that the phase difference becomes 2π/N after enough time has passed.
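The convergence claim of equation (8) can be checked numerically. The sketch below is an illustration under the paper's model, not the authors' code: it integrates N globally coupled oscillators with the coupling function of equation (3), wrapping phase differences into [0, 2π), and omits the constant self term (a = −1, f(0) = 1), which only shifts the common frequency.

```python
import math

def step(thetas, K, omega_n, dt):
    """One Euler step of the globally coupled oscillators (equation (2))
    with f(phi) = -phi/pi + 1 (equation (3)). Phase differences are
    wrapped into [0, 2*pi); the self term is a common constant and omitted."""
    new = []
    for i, ti in enumerate(thetas):
        coupling = 0.0
        for j, tj in enumerate(thetas):
            if j != i:
                phi = (ti - tj) % (2.0 * math.pi)
                coupling += -phi / math.pi + 1.0
        new.append(ti + (omega_n + K * coupling) * dt)
    return new

N, K, omega_n, dt = 3, 0.4, 0.075, 0.05
thetas = [0.0, 0.3, 1.1]            # arbitrary initial phases
for _ in range(4000):               # total time 200 >> pi/(K*N)
    thetas = step(thetas, K, omega_n, dt)

# Phase differences relative to oscillator 0, sorted; after convergence
# the phases are evenly spaced by 2*pi/N.
diffs = sorted((t - thetas[0]) % (2.0 * math.pi) for t in thetas)
```

With K = 0.4 and N = 3 the decay rate KN/π ≈ 0.38, so the differences settle to approximately [0, 2π/3, 4π/3] well before the end of the run.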
4. Implementation 4.1. Behavior Rules A vehicle decides whether to enter or to stop before an intersection according to its reference signal. A vehicle moving at the maximum velocity v_max needs v_max²/2a [m] to come to a complete stop, where a is the deceleration. The decision line is therefore drawn v_max²/2a [m] ahead of the intersection. At the decision line, a vehicle has to decide whether it can enter the intersection
or wait for the next phase. Before the decision line, a vehicle may move regardless of whether its signal is in the blue or red zone; it is then allowed to enter the intersection if it is in a blue zone at the limit line. On the other hand, a vehicle decelerates if it is in a red zone when it passes the decision line. The decelerating vehicle accelerates again and enters the intersection once the blue zone of the next cycle begins. 4.2. Parameter Settings One cycle is defined as the interval from one blue zone to the next. Let t_{bi} be the blue-zone interval of vehicle i, and let t_{ra} be the red-zone interval during which the other vehicles must decelerate or stop. The necessary condition for collision avoidance is that once cars in one lane enter the intersection, cars in the other lanes never enter until the cars already inside have passed through completely. This means the red (deceleration) interval t_{ra} must be longer than the blue interval. The sum of the blue-zone intervals over all lanes is

Σ_{i=1}^{N} θ_{bi}/ω_n = N θ_b/ω_n,  (9)
where θ_b (0 ≤ θ_b < 2π) is the angle of the blue zone shown in figure 1. Therefore, t_{ra} is calculated as

t_{ra} = (1/N)(2π/ω_n − N θ_b/ω_n) = (1/ω_n)(2π/N − θ_b).  (10)
Additionally, the following geometrical condition should also be satisfied:

∫_0^{t_{ra}} v(t) dt > L + l,  (11)
where l and L are the vehicle length and the road width, respectively. Assuming that the vehicle velocity is constant, the condition for collision avoidance is obtained as follows:

(1/ω_n)(2π/N − θ_b) > (L + l)/v_const.  (12)

Based on this condition, ω_n and θ_b are designed. These parameters depend on the intersection specifications, which include the number of roads N, the road width L and the vehicle velocity v_const. This information is available since it can be embedded in car navigation systems.
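Condition (12) can be turned into a small design check. The sketch below is illustrative only: l = 65 mm, v = 50 mm/s and θ_b = 1.57 are the experimental values from Section 5, while the road width L = 200 mm is an assumed value, since the paper does not state it.

```python
import math

def max_natural_frequency(N, L, l, v_const, theta_b):
    """Largest omega_n satisfying condition (12):
    (1/omega_n) * (2*pi/N - theta_b) > (L + l) / v_const."""
    if not 0.0 <= theta_b < 2.0 * math.pi / N:
        raise ValueError("need 0 <= theta_b < 2*pi/N")
    return (2.0 * math.pi / N - theta_b) * v_const / (L + l)

# l and v from Section 5; L = 200 mm is an assumed road width.
omega_max = max_natural_frequency(N=2, L=200.0, l=65.0, v_const=50.0,
                                  theta_b=1.57)
```

Under these assumed dimensions, the experimental choice ω_n = 0.075 lies below the resulting bound.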
Figure 4. Phase locking.
4.3. Communication Range for Convergence The reference signals must be synchronized at the latest before a vehicle enters an intersection. The necessary communication range R [m] is calculated from the convergence time for phase locking using equation (8). Regarding the initial condition, the necessary convergence time is maximal in the case of φ_i|_{t=0} = 0. Moreover, the convergence time is maximal for N = 2, since the phase locks exponentially with stability constant KN/π. Consequently, the necessary communication distance R is calculated under the condition that the number of roads N is two and the constant C is −π. Figure 4 shows transitions of θ_1, θ_2 and θ_3 for N = 3, K = 0.4 and ω_n = 0.075. The phase essentially converges and is locked after the seventh mutual broadcast communication. The necessary communication distance is therefore

R > 2 × 7 v t_c + L = 14 v t_c + L,  (13)
where L[m] is the width of an intersection.
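Equation (13) then fixes the communication range. A minimal sketch follows; the broadcast interval t_c = 0.25 s corresponds to the 250 ms transmission interval of Section 5.1, while the intersection width L = 200 mm is an assumed value for illustration.

```python
def required_comm_range(v, t_c, L, n_locks=7):
    """Minimum communication range from equation (13): phase locking takes
    about n_locks broadcast rounds while two vehicles approach head-on,
    so R > 2 * n_locks * v * t_c + L."""
    return 2.0 * n_locks * v * t_c + L

# v = 50 mm/s (experimental scale), t_c = 0.25 s; L = 200 mm is assumed.
R_min = required_comm_range(v=50.0, t_c=0.25, L=200.0)
```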
5. Experiment 5.1. Experimental Settings In this experiment, the vehicle velocity is simplified to be binary (constant speed or stopped), with no gradual acceleration or deceleration. The velocity and length of a vehicle are set to v_max = 50 [mm/s] and l = 65 [mm], respectively. Considering the scaling factor, this velocity corresponds to about 20 [km/h] for a current passenger vehicle. Only vehicles located in the communication area are allowed to receive information from vehicles on other roads. A vehicle recognizes the intersection by sensing white lines drawn on the floor with infrared sensors, so that it can pass through the crossing, turn right or left, and stop before the crossing. The internal value is transmitted at 250 [msec] intervals, triggered by a timer interrupt of the microprocessor. The internal data received from other vehicles are buffered and used to update the vehicle's own oscillator value. If no information is received before the next timer interrupt, the internal value is updated with K = 0 in equation (2). Figure 6 shows the intersection model in this experiment. The dimensions of the intersection are designed according to the ratio between an actual vehicle and an experimental vehicle. Four kinds of roads connect at the intersection, and N, S, W and E denote the roads coming from the north, south, west and east directions. Each road has three kinds of lanes, named i, r and o, which respectively enter the intersection for passing through, enter it for turning right, and leave it. For example, the vehicles that come from road S_i or E_r will leave the intersection on road N_o; in this case the vehicle routes are written as route S_i-N_o and route E_r-N_o, respectively.

Figure 5. Experimental view: experiment (a) and experiment (b).

Figure 6. Intersection model.

Figure 7. Experimental results for experiments (a) and (b).

5.2. Results There are two kinds of experiments: (a) merging between route E_i-W_o and route W_r-S_o, and (b) merging between route S_i-N_o, route E_i-W_o and route S_r-E_o. In both experiments, the parameters are set to ω_n = 0.075, θ_b = 1.57 and K = 0.4 according to equation (12). In experiment (a), two vehicles approach the intersection to pass along route W_r-S_o at t = 5, 20 while six vehicles pass along route S_i-N_o. In experiment (b), two vehicles approach the intersection to pass along route S_r-E_o at t = 10, 30 while five vehicles pass along route S_i-N_o and five vehicles pass along route E_i-W_o. Figure 7 plots the distance from the center of the intersection every second. Both results clearly show that vehicles alternately pass through the intersection one by one. Some vehicles stop before reaching the limit line because they keep a distance from the vehicle ahead stopped on the limit line. The velocity of some vehicles wiggles because they detect the line and rotate while line-tracing.
6. Conclusion In this study, we discussed a merging algorithm for non-signalized intersections as a behavioral decision problem for multi-agent systems, and the effectiveness of the algorithm was confirmed with mobile robots in real space. Additionally, the stability of the synchronization algorithm between vehicles using a coupled oscillator network was proved by mathematical analysis. The proposed algorithm is effective because vehicles can automatically generate a global order using only local broadcasting, not only in merging but also in other cooperative driving situations. Widespread applications are expected, such as mutual concessions on general roads and allocating time for lane changes.
References [1] M. Parent, “Distributed motion control of an electric vehicle,” Proc. 5th International Workshop on Advanced Motion Control, pp. 573, 1998. [2] C. Thorpe, T. Jochem, and D. Pomerleau, “The 1997 Automated Highway Free Agent Demonstration,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [3] O. Gehring and H. Fritz, “Practical Results of a Longitudinal Control Concept for Truck Platooning with Vehicle to Vehicle Communication,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [4] A. Cochran, “AHS Communications Overview,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [5] K.S. Chang, J.K. Hedrick, W.B. Zhang, P. Varaiya, M. Tomizuka, and S. Shladover, “Automated Highway System experiments in the PATH Program,” IVHS J., vol. 1, no. 1, pp. 63-87, 1993. [6] S. Kato, S. Tsugawa, K. Tokuda, T. Matsui and H. Fujii, “Vehicle Control Algorithms for Cooperative Driving With Automated Vehicles and Intervehicle Communications,” IEEE Trans. Intell. Transport. Syst., vol. 3, pp. 155-161, Sep. 2002. [7] Y. Ikemoto, Y. Hasegawa, T. Fukuda and K. Matsuda, “Zipping and Weaving: Control of Vehicle Group Behavior in Non-signal Intersection,” Proc. IEEE Int. Conf. on RA, 2004, CD-ROM. [8] A. Bejan and R. Lawrence, “Peer-to-Peer Cooperative Driving,” 17th Int. Symp. Computer and Information Sciences, pp. 259-263. [9] A. Uno, T. Sakaguchi and S. Tsugawa, “A Merging Control Algorithm based on Inter-Vehicle Communication,” Proc. IEEE Conf. ITS, 1999, pp. 783-787. [10] Y. Kuramoto, “Chemical Oscillations, Waves, and Turbulence,” Springer-Verlag, Berlin, 1984. [11] J.S. Kwak and J.H. Lee, “Infrared Transmission for Intervehicle Ranging and Vehicle-to-Roadside Communication Systems Using Spread-Spectrum Technique,” IEEE Trans. Intell. Transport. Syst., vol. 5, pp. 12-19, Mar. 2004. [12] M. Akanegawa, Y. Tanaka and M. Nakagawa, “Basic Study on Traffic Information System Using LED Traffic Lights,” IEEE Trans. Intell. Transport. Syst., vol. 2, pp. 197-203, Dec. 2001.
[13] “Optical Transmitter and Receiver for Inter-vehicle Communication,” OKI Technical Review, OKI Electric Industry Co., Ltd. homepage, http://www.oki.com/ . [14] E. Brown, P. Holmes, and J. Moehlis, “Globally coupled oscillator networks,” in Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, E. Kaplan, J. Marsden, K. Sreenivasan, Eds., pp. 183-215. Springer, New York, 2003.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
479
Co-creative Composition Using Multiagent Learning: Toward the Emergence of Musical Structure Shintaro Suzuki a,2, Takeshi Takenaka a and Kanji Ueda a a Research into Artifacts, Center for Engineering (RACE), University of Tokyo Abstract. This paper presents Co-creative Composition, in which a musical note agent decides its next position while interacting with other agents in a musical-sheet-like environment. Each agent determines its action with reinforcement learning according to the sensory consonance theory. The generated chords are evaluated through comparison with music theory and through psychological experimentation. Results demonstrate that the generated chords were similar to those obtained through music theory, and show that subjects perceived their clarity of sound. Keywords. Co-creation, Reinforcement Learning, Consonance, Multi-agent
1. Introduction From the perspective of music cognition, we listen to the entire impression of multiple tones rather than to each tone in music. Each musical note depends on other notes and has a strong relation to them based on listeners' cognitive features. Musical composition, therefore, can be thought of as a problem of designing these relations. Prior studies of human composition in cognitive psychology have employed analytical approaches and have attempted to create a model of the human composition process. Such approaches seem insufficient because the processes of human composition are varied and depend on the person and situation. Consequently, we aim not to create a model, but to understand the synthetic aspects of human composition, which involve relationships between cognitive features and the emergence of musical structure, through solving the design problem. However, it is very difficult to design proper relations between tones because many layouts of tones exist. A multi-agent approach has been applied to this complex system. We propose the Co-creative Composition system, in which music is generated through the interaction of autonomous musical notes based on cognitive features. Musical note agents
2 Correspondence to: Shintaro SUZUKI, Research into Artifacts, Center for Engineering (RACE), University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba. Tel.: 04 7136 4276; Fax: 04 7136 4276; E-mail: [email protected].
480
S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning
learn suitable relationships with other agents through reinforcement learning. In addition, we specifically examine sensory consonance, a basic feature of music cognition, as the relationship among agents. We generate chords (chord sequences) through simulation using our system and examine what kind of musical structure emerges in them, through comparison with music theory and through psychological experimentation. Previous studies on automatic composition have mainly used databases of existing music and reconstructed the data with emergence-based calculation [3]. Prior studies have also examined automatic harmonization and automatic counterpoint [1] [2]. Such approaches seem unsuitable for our object of understanding the synthetic aspects of human composition, because we explore the generation of musical phrases, musical theory, and composition techniques themselves, which are given in advance in those approaches.
2. Multi-agent Simulation with Reinforcement Learning for Serial Chord Generation 2.1. Outline of the simulation model Figure 1 shows the structure of the simulation model in the Co-creative Composition system. Musical note agents are placed in a musical-sheet-like environment. We use Q-learning in this study. The procedure repeats the following two-step loop: 1. Agents perceive the state of the environment and select an action at each discrete time step. 2. As a result of their actions, agents are given a reward and update Q(s,a) according to the following function:

Q(s, a) ← Q(s, a) + α[r + γ max_{a′∈A(s′)} Q(s′, a′) − Q(s, a)],  (1)
where s is the state, a is the action, α is the step-size parameter, and γ is the discount rate. The action selection strategy is the Boltzmann strategy. The “state” is defined as the positional relationship to the other agents (how far the agent is from the other agents, in half-tones). The “action” is defined as the distance from the agent's present position to its next one (how far the agent moves up or down, in half-tones). The “reward” is the value of the new position generated through the actions of the agents. In the next section, we explain the process of evaluating chords. 2.2. Evaluation Method based on Sensory Consonance Theory We use the sensory consonance theory to evaluate actions. This theory is the most widely supported explanation of the concept of consonance. It treats dissonance as the roughness brought about by beating, which is determined by pitch, tone color, and velocity [4]. In this study, we used the Sethares model to compute dissonance [5]. Consonance was originally used for the relationship between two tones existing simultaneously (spatial consonance). In this study, however, we propose extending the concept of consonance to the time dimension (temporal consonance). We consider short-term memory (STM) during listening and set a range of memories equivalent to a
Figure 1. Structure of the simulation model.
certain fixed period from the present into the past on a time-pitch plane to realize this idea. Generated notes are located in this range. Dissonance between the present tone and a past tone in the range is defined as temporal dissonance (Fig. 2). Music is an art based on tempo; it therefore seems reasonable to employ STM. In this study, temporal consonance is applied only to each agent's own past tones within the memory range, because we conceptualize each agent's motion as a melody and regard the generated music as polyphonic music in which each melody is highly independent. Dissonance between two tones is evaluated based on the sensory consonance theory and converted into a consonance point using a linear transformation. The spatial consonance R_s and the temporal consonance R_t are each given by the average of the points between the tones relevant to them. Using R_s and R_t, the reward R is given as

R = R_s × R_t.  (2)
Figure 2. Model of short term memory during listening.
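The evaluation pipeline above can be sketched as follows. The pairwise dissonance uses a commonly cited parameterization of the Sethares model [5]; the linear mapping from dissonance to a consonance point, the example frequencies, and the example state are assumptions for illustration, and the Q-update is the standard form of equation (1), not the authors' code.

```python
import math

def dissonance(f1, f2, l1=1.0, l2=1.0):
    """Roughness of two pure tones, after Sethares [5]; the constants are
    a commonly cited parameterization of the model (an assumption here)."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_lo + 19.0)     # scales the roughest interval
    x = s * (f_hi - f_lo)
    return min(l1, l2) * (math.exp(-3.5 * x) - math.exp(-5.75 * x))

def consonance_point(freqs):
    """Average pairwise dissonance mapped to a consonance score; the paper's
    exact linear transformation is unspecified, so 1 - d is assumed."""
    pairs = [(a, b) for i, a in enumerate(freqs) for b in freqs[i + 1:]]
    d = sum(dissonance(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - d

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning update of equation (1) on a dict-backed table."""
    best = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best - old)

# Reward of equation (2): spatial consonance (simultaneous tones) times
# temporal consonance (present tone vs. remembered tone).
R_s = consonance_point([261.6, 329.6, 392.0])   # C-E-G sounded together
R_t = consonance_point([261.6, 293.7])          # present vs. past tone
Q = {}
q_update(Q, s=(4, 7), a=0, r=R_s * R_t, s_next=(4, 7), actions=[-1, 0, 1])
```

As expected under this model, a near-unison or octave pair scores as far less rough than a semitone clash.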
2.3. Experiment 1 2.3.1. Simulation Experiment: Outline Using the model explained in the preceding section, we conducted an experiment with four cases. In each case, we used a different kind of consonance to evaluate the actions. Through this experiment we verified the effects of temporal consonance.
In this experiment, the model had three musical note agents, which made their decisions simultaneously. All actions had equal duration (the length of a crotchet), so the generated chords consisted of three notes at a constant tempo. The task for the agents was to generate eight constant-paced chords. In case 4 of experiment 1, we attempted to generate consonant and melodious chords using this system. case 1 Spatial consonance and temporal consonance (with a memory range equivalent to 16 tones, each tone being half of a crotchet) were used. We call this case “Md” henceforth. case 2 Spatial consonance and temporal consonance (with a memory range equivalent to 1 tone, each being half of a crotchet) were used. We call this case “Next”. case 3 Only spatial consonance was used. We call this case “notM”. case 4 Although the consonance was the same as in case 1, agents were prohibited from crossing each other, to make their actions more melodious. This additional condition for melodizing the chords was set as an application of gestalt theory to music. We call this case “notCross”.
Figure 3. An example of the learning curve in case 1.
Figures 6(a) and 6(b) show the frequency distributions of the component tones of the chords generated in case 1 and case 3. Figure 6(a) shows that the chords generated in case 1 consist only of tones of a major scale (Fig. 5). On the other hand, Fig. 6(b) shows that the chords generated in case 3 consist of all tones within the two octaves. The component tones of the chords generated in case 2 almost correspond to the tones of a major scale. This analysis shows that we were able to obtain chords of a major scale using temporal consonance when the memory range has an appropriate length. This result suggests that humans created tonal music and certain kinds of folk music based on both spatial consonance and temporal consonance.
Figure 4. Score: An example of chords simulated in case 1.
Figure 5. Major scale on key of F.
Figure 6. Frequency distribution of component tones of generated chords: (a) case 1; (b) case 3.
In case 4, the component tones of the generated chords have characteristics identical to those in case 1. We thus successfully generated consonant and melodious chords using our system.
2.3.3. Psychological Experiment: Method We conducted a psychological experiment using impression ratings. As stimuli, we used 25 chords generated through simulation (Md: 5, Next: 5, notM: 5, notCross: 5, random: 5). The experiment enrolled 18 volunteer participants (17 males, 1 female, 22-26 years old). The task was to evaluate their impressions of the stimuli. We used a semantic differential technique in which the participants rated impressions on 14 adjective pairs using a 5-point scale.
Figure 7. Results of the psychological experiment: (a) radar chart of stimuli; (b) mapping of stimuli.
Furthermore, we performed factor extraction on the evaluated values of each stimulus using factor analysis. The principal factor method was used for extraction, and orthogonal varimax rotation was used to transform the values. Two main factors were extracted. The first factor, called “sense of affinity”, comprised likability, naturalness, coherency and ease of remembrance. The second factor, called “sound clarity”, comprised lightness, brightness, beauty of sound and reduction in tension. Figure 7(b) shows that participants evaluated the stimuli along two main axes, suggesting that participants formed different impressions depending on the generation type. Chords using temporal consonance give participants an impression of “sound clarity”, because the stimuli generated under the temporal consonance conditions (Md, notCross) scored high on the vertical axis of the map.
2.4. Experiment 2 2.4.1. Simulation Experiment: Outline In experiment 2, we set a different rhythm pattern in each case. The model again has three musical note agents, but they make decisions at different times, and their actions have different durations. The generated chords therefore consisted of three melody lines with different rhythms (onset times and durations). In both cases, the consonance used to evaluate the actions was identical to that of case 1 in experiment 1.
Figure 8. Rhythm pattern in case 1.
case 1 The task for the agents was to generate chords according to a rhythm pattern we made (Fig. 8). Through this case we verified the effect of temporal consonance on chord generation with a rhythm pattern. case 2 The task for the agents was to generate chords according to the rhythm pattern extracted from the opening of Bach's Dreistimmige Inventionen (Sinfonien), Sinfonie No. 14 B-dur. We compared the characteristics of the generated chords with Bach's original. 2.4.2. Simulation Experiment: Results and Analysis In each case, we obtained 10 patterns (generated chords) as convergent results. In case 1, the generated chords consist of tones of a major scale: chords in a major scale were generated in the simulation with a rhythm pattern just as in the simulation with constant rhythm. This result further supports our idea that humans created tonal music based on both spatial and temporal consonance. In case 2, compared with Bach's original, the generated chords have a less smooth melody and an unnatural sound (Fig. 9). These strange characteristics are inferred to arise from the fact that our system ignores some cognitive features of melody. Consequently, evaluation based on such cognitive features of melody seems necessary to generate chords with a smooth and natural melody. 2.5. Summary This paper presented Co-creative Composition, in which a musical note agent decides its next position while interacting with other agents in a musical-sheet-like environment. Using temporal consonance, the generated chords exhibit order in two senses: their component tones coincide with a major scale from the perspective of music theory, and they convey “sound clarity” in terms of human impressions. We therefore infer that these orders were generated through structural determination based on sensory
Figure 9. Example of generated pattern in case 2.
consonance. As shown in case 2 of experiment 2, however, evaluations based on other cognitive features of melody should also be incorporated into our system to generate more musical chords with various types of musical order.
References [1] K. Sugawara, R. Yoneda, T. Nishimoto and S. Sagayama, Automatic Harmonization for Melodies based on HMMs Including Note-chain Probability, ASJ 1-9-2 (2004), 665-666. [2] S. Nakagata, T. Nishimoto and S. Sagayama, Automatic Generation of Counterpoints Based on Dynamic Programming and Probability of Tone Series, MUS 56 (2004), 65-70. [3] K. Verbeurgt, M. Fayer and M. Dinolfo, A Hybrid Neural-Markov Approach for Learning to Compose Music by Example, Canadian AI 3060 (2004), 480-484. [4] R. Plomp and W.J.M. Levelt, Tonal Consonance and Critical Bandwidth, Journal of the Acoustical Society of America 38 (1965), 548-560. [5] W.A. Sethares, Local consonance and the relationship between timbre and scale, Journal of the Acoustical Society of America 94(3) (1993), 1218-1228.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
487
Self-assembly of Mobile Robots: From Swarm-bot to Super-mechano Colony Roderich Groß a,b, Marco Dorigo a, and Masaki Yamakita b a IRIDIA, Université Libre de Bruxelles, Belgium b Yamakita Lab, Tokyo Institute of Technology, Japan Abstract. Up to now, only a few collective or modular robot systems have proven capable of letting separate and autonomous units, or groups of units, self-assemble. In each case, ad hoc control algorithms have been developed. The aim of this paper is to show that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multi-robot platform (i.e., a super-mechano colony system). Although there are substantial differences between the two robotic platforms, it is possible to qualitatively reproduce the functionality of the source platform on the target platform; the transfer requires neither modifications of the hardware nor an extensive redesign of the control. The results of a set of experiments demonstrate that a controller that was developed for the source platform lets robots of the target platform self-assemble with high reliability. Finally, we investigate mechanisms that control the patterns formed by autonomous self-assembly. Keywords. Self-assembly, self-reconfigurable robots, pattern formation, collective robotics, swarm robotics, super-mechano colony, swarm-bot, swarm intelligence
1. Introduction Self-assembly is a reversible process by which discrete entities bind to each other without being directed externally. It may involve components from the molecular scale (e.g., DNA strands forming a double helix) to the planetary scale (e.g., weather systems) [23]. The study of self-assembling systems has attracted much interest, as it may lead to self-constructing, self-repairing and self-replicating machines [21,26,11]. Self-assembly has been widely observed in social insects [1]. Insects physically connect to each other to form aggregate structures with capabilities exceeding those of an individual insect. Oecophylla longinoda worker ants, for example, form connected living bridges which other colony members traverse [15]. Understanding how insects use self-assembly to manipulate objects or to navigate all terrain may have strong implications for robotic system design. Self-reconfigurable robots [24,20] hold the potential to self-assemble and thus to mimic the complex behavior of social insects. In current implementations [18,24,20,
488
R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony
14], however, single modules usually have very limited autonomous capabilities (compared to an insect). Typically, they are not equipped with sensors to perceive the environment, nor are they capable of autonomous motion. These limitations, common to most self-reconfigurable robotic systems, make it difficult to let separate modules, or groups of modules, connect autonomously. In some systems, self-assembly was demonstrated with the modules pre-arranged at known positions [25,26]. Rare instances of less constrained self-assembly have been reported, which are detailed in the following: • Fukuda et al. [9] demonstrated self-assembly among cells of the CEBOT system [10]. In an experiment, a cell approached and connected with another one. Communication among connected cells was studied to enable the group to approach and connect with additional cells [8]. • Bererton et al. [3] studied docking in the context of self-repair. One robot was equipped with a fork-lift mechanism to replace a component of its teammate. Successful docking was demonstrated for distances of up to 30 cm and angular displacements of up to 30°. Image processing was accomplished on an external PC. • Rubenstein et al. [19] demonstrated the ability of two modular robots to self-assemble. Each robot consisted of a chain of two linearly linked CONRO modules [5]. The robot chains were set up at distances of 15 cm, facing each other with an angular displacement of no more than 45°. • White et al. [22], Griffith et al. [11], and Bishop et al. [4] studied simple, programmable parts capable of self-assembling in the context of self-replication and pattern formation. The parts slid passively on an air table and bound to each other upon random collisions. • Groß et al. [12] demonstrated self-assembly of 16 physical robots using the swarm-bot system [16,7]. The robots self-assembled from arbitrary initial positions and on different types of terrain.
At present, swarm-bot is the state of the art in autonomous self-assembly. The aim of this paper is to show that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform to a different target multi-robot platform. The source platform is the swarm-bot system [16,7]. The target platform is a super-mechano colony (SMC) system [6]. Although there are substantial differences between the two robotic platforms, it is possible to qualitatively reproduce the functionality of the source platform on the target platform. This requires neither modifications to the hardware nor an extensive redesign of the control. The paper is organized as follows. Section 2 overviews the hardware of the source and target platforms. Section 3 presents the control as implemented on the source platform and details the transfer to the target platform. Section 4 presents the results of a set of experiments. Finally, Sections 5 and 6 discuss the results and conclude the paper.
2. Hardware 2.1. Source Platform (Swarm-bot) Fig. 1a shows an s-bot, the basic robotic component of the swarm-bot system [16,7, see also www.swarm-bots.org]. The s-bot has a diameter of 12 cm, a height of 19 cm
Figure 1. Swarm-bot: (a) the s-bot and (b) two s-bots connecting to each other. Super-mechano colony: (c) a child robot (top view) and (d) two child robots connecting to each other.
(including the transparent cylinder on top), and weighs approximately 700 g. The s-bot has nine degrees of freedom (DOF), all of which are rotational: two DOF for the traction system, one DOF to rotate the s-bot's upper part with respect to the lower part, one DOF for the grasping mechanism of the gripper (located in what we define to be the s-bot's front), and one DOF for elevating the arm to which the gripper is attached. The robot's traction system consists of a combination of tracks and two external wheels, called treels©. An s-bot can connect with another s-bot by grasping its connection ring (see Fig. 1b). The s-bot can receive connections on more than two thirds of its perimeter. For the purpose of robot-to-robot communication, the s-bot has eight RGB LEDs. The s-bot is equipped with a variety of sensors, including 15 proximity sensors distributed around the robot, two optical barriers integrated in the gripper, and a VGA omnidirectional vision system. The control is executed on an XScale board running a Linux operating system at 400 MHz. A battery provides full autonomy.
2.2. Target Platform (SMC)
Super-mechano colony (SMC) [6,17] is a modular robotic concept composed of a single main body, called the mother ship, and many child robots attached to it. The child robots are an integral part of the system's locomotion. They can disband to accomplish separate, autonomous missions, and reconnect once the missions are accomplished. Furthermore, child robots have the potential to connect to each other. Fig. 1c shows the physical implementation of a child robot of an SMC system [6]. The robot has a diameter of 26 cm, a total height of 51 cm, and weighs 11 kg.
The child robot has five DOF: two DOF for the traction system, one DOF to rotate the robot's upper part with respect to the lower part, one DOF for elevating a manipulation arm (located in what we define to be the robot's front), and one DOF to open and close a gripper that is attached to the manipulation arm. The traction system consists of two active wheels on the left and right sides, and two passive wheels at the front and back. Each child robot is equipped with a coupling cylinder on its back that allows it to receive connections from a teammate (see Fig. 1d). A directional VGA stereo vision system is mounted on top of the robot. An additional camera is attached to the manipulation arm. The vision system can detect the relative position of the mark attached to the top of the coupling cylinder of another robot. The control is executed on an on-board Pentium MMX computer running a Microsoft Windows operating system at 233 MHz. A battery provides full autonomy; in the experiments presented in this paper, however, we used an external power supply instead.
Algorithm 1 Self-assembly module for the source platform
 1: set-color-ring(blue)
 2: repeat
 3:   if risk-of-stagnation() then
 4:     set-traction-system-recovery()
 5:   end if
 6:   (i1, i2) ← camera-feature-extraction()
 7:   (i3, i4) ← proximity-sensor-readings()
 8:   (o1, o2, o3) ← neural-network(i1, i2, i3, i4)
 9:   (o1, o2) ← moving-average(o1, o2)
10:   if (o3 > 0.5) ∧ grasping-requirements-fulfilled() then
11:     set-gripper(close)
12:     if successfully-connected() then
13:       set-color-ring(red)
14:       halt()
15:     else
16:       set-gripper(open)
17:     end if
18:   end if
19:   set-traction-system(o1, o2)
20: until timeout-reached()
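For readers who prefer executable pseudocode, the control loop of Algorithm 1 can be sketched in Python. Every hardware call (camera, gripper, traction) is a hypothetical stand-in for the platform API, and the SimRobot stub exists only to make the sketch runnable; it is not a model of the real s-bot.

```python
def self_assembly_module(robot, max_cycles=1000):
    """Python sketch of Algorithm 1. Returns True once connected."""
    robot.set_color_ring("blue")
    cycles = 0
    while cycles < max_cycles:                        # stands in for timeout-reached()
        cycles += 1
        if robot.risk_of_stagnation():
            robot.set_traction_system_recovery()      # back off with lateral drift
        i1, i2 = robot.camera_feature_extraction()    # red-object direction bits
        i3, i4 = robot.proximity_sensor_readings()    # left-front / right-front
        o1, o2, o3 = robot.neural_network(i1, i2, i3, i4)
        if o3 > 0.5 and robot.grasping_requirements_fulfilled():
            robot.set_gripper("close")
            if robot.successfully_connected():
                robot.set_color_ring("red")           # signal: now a grasp target
                return True                           # halt()
            robot.set_gripper("open")
        robot.set_traction_system(o1, o2)             # left / right track speeds
    return False


class SimRobot:
    """Minimal stub so the sketch runs; all behavior is illustrative."""
    def __init__(self, cycles_to_contact=3):
        self.ring, self.gripper, self.calls = None, "open", 0
        self.cycles_to_contact = cycles_to_contact
    def risk_of_stagnation(self): return False
    def set_traction_system_recovery(self): pass
    def camera_feature_extraction(self): return 1, 1  # pretend red object ahead
    def proximity_sensor_readings(self): return 0.0, 0.0
    def neural_network(self, *inputs): return 1.0, 1.0, 1.0
    def grasping_requirements_fulfilled(self):
        self.calls += 1                               # "reach" target after N cycles
        return self.calls >= self.cycles_to_contact
    def set_gripper(self, state): self.gripper = state
    def successfully_connected(self): return self.gripper == "close"
    def set_color_ring(self, color): self.ring = color
    def set_traction_system(self, left, right): pass
```

Note how the color ring doubles as the seeding mechanism: a connected robot turns red and thereby becomes a valid grasp target for the next robot.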
3. Control Transfer
3.1. Control of the Source Platform (Swarm-bot)
Algorithm 1 details the control module for self-assembly. It is assumed that the process is seeded by the presence of a non-moving red object (e.g., the color ring of a robot). A reactive neural network (line 8) constitutes the principal control mechanism. The network takes as input the binary values i1 and i2 from the robot's vision system (line 6) and the values i3 and i4 from the robot's left-front and right-front proximity sensors (line 7). The network's output (o1, o2, o3) is used to control the speeds of the left and right sides of the traction system (line 19) and the connection mechanism (lines 10 to 18). Before being applied to the traction system, o1 and o2 are smoothed by a moving average function. By default, the tuple (i1, i2) is assigned (0, 0). Any other assignment indicates the presence of a red object (in the front, or to the left or right side). If an unconnected robot (i.e., a blue object) is perceived in between, i1 and i2 are set to zero. To avoid damage to the traction system and to improve the reliability of the control, a recovery move is launched if high torque is continuously present on the traction system (lines 3 to 5). During recovery, the s-bot moves backwards with a small lateral displacement. The neural network had been shaped by artificial evolution in a computer simulation [13] and subsequently transferred to the physical swarm-bot system [12].
3.2. Transfer to the Target Platform (SMC)
Algorithm 1 describes the control algorithm as it was developed for the source multi-robot platform. In the following, we explain how the sensing and acting functions of
Figure 2. Self-assembly of two robots: influence of the initial orientation of the approaching robot. Examples of (a) initial and (b) final configurations. (c) Box-and-whisker plot [2] of the completion times (in s) grouped according to the initial orientation of the approaching robot (39 observations in total).
the source platform were realized on the target platform so that Algorithm 1 could be ported without any change. Some functions, such as neural-network(), remained identical (except for the time required for processing). Many other functions, such as set-traction-system(), could be transferred with minor modifications (e.g., by scaling the speed values to an appropriate range). In the following, we detail those functions which required a different implementation on the target platform to qualitatively reproduce the original function of the source platform.
• risk-of-stagnation(): to detect the risk of stagnation of an s-bot, the torque on each side of the traction system was monitored. For the SMC child robot, we use the camera vision system instead. If there is a risk that the left side of the manipulation arm will collide with another robot, the recovery move is executed.1
• proximity-sensor-readings(): as the target platform is not equipped with proximity sensors, we emulate virtual proximity sensors facing in the front-left and front-right directions by making use of the vision system. The readings of the virtual sensors are computed from the relative positions of other robots.
• grasping-requirements-fulfilled(), successfully-connected(): to test whether the grasping requirements are fulfilled, the stereo vision system is used. The system allows the relative position of the coupling cylinder to be computed. Consequently, no additional tests must be performed to validate the connection.
• set-color-ring(): as the current prototype of the SMC system is not equipped with communication mechanisms other than a wireless network, the robots do not signal their connection state. Therefore, each robot can receive connections at any time.
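A virtual proximity sensor of the kind described above can be sketched as follows. The cone geometry, maximum range, and linear decay law are illustrative assumptions; the paper does not specify the mapping from relative position to reading.

```python
import math

def virtual_proximity(robots_rel, heading_deg, fov_deg=30.0, max_range=0.5):
    """Hypothetical virtual proximity sensor: returns a reading in [0, 1]
    for a sensor facing `heading_deg`, given the relative (x, y) positions
    (in meters) of the other robots. 0 means nothing in the sensor cone."""
    best = 0.0
    for x, y in robots_rel:
        dist = math.hypot(x, y)
        if dist == 0 or dist > max_range:
            continue                                  # out of range
        bearing = math.degrees(math.atan2(y, x))
        # signed angular difference, wrapped to (-180, 180]
        diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= fov_deg / 2:
            best = max(best, 1.0 - dist / max_range)  # closer -> higher reading
    return best

# front-left (+30 deg) and front-right (-30 deg) virtual sensors,
# mirroring the two proximity inputs i3 and i4 of Algorithm 1
others = [(0.2, 0.1)]
i3 = virtual_proximity(others, +30.0)
i4 = virtual_proximity(others, -30.0)
```

The key point, independent of the exact geometry, is that the control algorithm only ever sees the sensor interface, so the vision-derived readings can be substituted for real proximity sensors without touching Algorithm 1.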
[Footnote 1: Note that the neural network lets the robot approach the object either straight on or by turning anti-clockwise. If the right side of the manipulation arm collides with the object, the neural network lets the robot retreat as a result of the high readings from the front-right proximity sensor.]
4. Experiments on the Target Platform (SMC)
4.1. Influence of Initial Orientation
We examine the ability of a robot to approach and connect with a passive teammate (see Fig. 2). The two robots have identical hardware. The approaching robot is placed at a
Figure 3. Self-assembly of two robots: influence of the approaching angle. Examples of (a) initial and (b) final configurations. (c) Box-and-whisker plot [2] of the completion times (in s) grouped according to the approaching angle (50 observations in total).
distance of 100 cm and orientation α with respect to its teammate. The latter is oriented so that its back, with the coupling cylinder, faces the approaching robot. For each initial orientation α ∈ {0°, 90°, 180°, 270°}, 10 repetitions are carried out, for a total of 40 trials. If the robots have not established a connection within 300 s, the trial is stopped. In 39 out of 40 cases, the robots self-assembled successfully. Fig. 2c shows the observed completion times (in s). If no robot to approach is perceived, the neural network controller lets the robot turn anti-clockwise; this explains the differences in performance. Overall, the success rate does not seem to depend on the initial orientation of the approaching robot.
4.2. Influence of Approaching Angle
We examine the ability of a single robot to connect with a passive teammate when approaching it from different angles (see Fig. 3). Due to the mechanical design, the robot cannot connect with the coupling cylinder of the teammate from every angle. In fact, if the angular mismatch between the orientations of the two robots exceeds 85°, it is impossible to establish a connection. Therefore, potential approaching angles for a successful grasp are limited to the range [−85°, 85°]. For approaching angles in the range [−45°, 45°], there should be no difference in performance, as the jaws of the gripper element are not likely to collide with the body of the teammate. The larger the angular deviation, the more difficult the task becomes. We study the approaching angles α ∈ {−75°, −45°, 0°, 45°, 75°}. Initially, the approaching robot is oriented towards the teammate. For each angle, ten repetitions are carried out, for a total of 50 trials. If the robots have not established a connection within 300 s, the trial is stopped. In all 50 cases, the robots correctly self-assembled. Fig. 3c shows the observed completion times (in s).
The fluctuations in performance are remarkably low: all completion times lie in the range [50, 63] s.
4.3. Start Positions That Require Turning Away or Retreating
We examine the ability of two robots to self-assemble when their starting position and orientation are such that self-assembly is particularly difficult. To create such a situation, we take two robots forming a linear, connected chain and generate the start positions
Figure 4. Sensor readings and actuator commands over time (in s) for the two difficult initial arrangements: translation of the approaching robot (a) to the left and (b) to the right.
from this situation by translating the grasping robot 10 cm to either the left or the right side. These start positions oblige the grasping robot to turn away or retreat before approaching the target. In fact, aligning the robot on the spot in the direction of the target would result in a collision between one side of the manipulation arm and the coupling cylinder. The robots correctly self-assembled in both situations. Figs. 4a and 4b show the corresponding sensor readings and actuator commands as monitored at the end of each control cycle for the whole duration of the trial. In the first case (see Fig. 4a), the entire situation was handled by the neural network, which caused the robot to retreat. In the second case (see Fig. 4b), a recovery move was instead launched during three control cycles (at times 0, 2, and 7 s).
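The recovery trigger used in these trials (sustained high torque on the source platform, a predicted arm collision on the target) can be sketched as a sustained-condition detector. The threshold and window length below are illustrative assumptions; the paper does not report the actual values.

```python
class StagnationDetector:
    """Flags stagnation when the load signal stays above a threshold for a
    sustained number of control cycles. Threshold and window are
    hypothetical; on the real s-bot the signal is the traction torque."""
    def __init__(self, load_limit=0.8, cycles=5):
        self.load_limit = load_limit
        self.cycles = cycles
        self.count = 0

    def update(self, load_left, load_right):
        """Feed one control cycle's load readings; returns True when a
        recovery move should be launched."""
        if max(load_left, load_right) > self.load_limit:
            self.count += 1            # load high: keep counting
        else:
            self.count = 0             # load dropped: reset the window
        return self.count >= self.cycles
```

Requiring the condition to persist over several cycles avoids launching a recovery move on a single noisy reading, which matters because each recovery interrupts the approach.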
5. Discussion
We have shown that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multi-robot platform (i.e., a super-mechano colony system). A set of experiments demonstrated the validity of the approach. The robots self-assembled reliably: in 91 out of 92 trials the robots correctly established a connection. A robot can approach another and connect to it from a wide range of angles (at least 150°).2 The initial orientation of the approaching robot does not affect the reliability. In addition, the control proved capable of dealing with two particularly challenging arrangements in which the approaching robot is required to retreat. All experiments have been recorded on video tape (see http://iridia.ulb.ac.be/~rgross/ias2006/).
[Footnote 2: Note that the range of possible approaching angles is limited for the source platform as well. When starting from outside this range, the approaching robot can search for another position, or the target robot may change its orientation.]
The success can be partially attributed to some design choices made when developing the control for the source platform. First, the use of sensory feedback was reduced to a small set of input variables that appeared indispensable for performing the task. For instance, the perceptual range of the robot vision was limited substantially. Consequently, no major difficulties arose when porting the control to the target platform, which
Figure 5. A group of four robots self-assembling: (a) initial configuration and (b) final configuration reached after 475 s. Once a robot had completed its task, a visual mark was manually attached to the coupling cylinder.
Figure 6. Pattern formation: a group of four robots self-assembles starting from a specific initial arrangement, as shown in (a). Depending on the type of visual mark of the robot seeding the process (i.e., the robot on the right side in the figures), different patterns emerge. The final configurations shown in (b) and (c) were reached after 102 s and 207 s, respectively. During the experiments, there was no human intervention.
was equipped with directional vision only. Second, a simple neural network controller was chosen to compute the motor commands based on the sensory feedback. The neural network controller proved robust with respect to changes in the hardware as well as in the provided control functions. An open issue to address is the problem of scalability. To enable tens or more SMC child robots to self-assemble, we believe it is beneficial if each robot can signal whether or not it is connected (as is the case on the swarm-bot platform). Although it is possible to mimic such a function using the existing actuators of the target platform, it might be more appropriate to equip the robot with a communication mechanism (e.g., a controllable LED). To illustrate the use of such a mechanism, we conducted a preliminary set of experiments. Fig. 5 shows a group of four robots self-assembling. In this experiment, the (visual) marks on top of the coupling cylinder of each robot were attached manually as soon as the robot completed its task. In a second experiment, we adjusted the type of visual mark of the robot seeding the process prior to experimentation. It was shown that, depending on the visual mark present, distinct patterns emerged (see Fig. 6). Video recordings are available at http://iridia.ulb.ac.be/~rgross/ias2006/.
6. Conclusions The aim of this study was to show that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multi-robot platform (i.e., a super-mechano colony system) if the control
algorithm is designed adequately. Although there were substantial differences between the two robotic platforms, we showed that it is possible to qualitatively reproduce the basic functionality of the source platform on the target platform. In fact, the transfer required neither modifications to the hardware nor an extensive redesign of the control. The task of letting two mobile robots with identical hardware self-assemble starting from random positions had so far been demonstrated only with the CONRO system [19] and the swarm-bot [12]. The results presented in this paper demonstrate that the control ported from the swarm-bot to the super-mechano colony lets the robots accomplish the task while preserving high reliability and low fluctuations in performance. This suggests that the control algorithm is based on some generic principles for the design of self-assembling systems. Finally, we presented a first attempt to control the patterns that are formed by autonomous self-assembly. An interesting research direction will be the study of mechanisms that lead to patterns that have a functional value for the robotic swarm.
Acknowledgments This work was accomplished during the visit of Roderich Groß to the Tokyo Institute of Technology in 2005, supported by the Japan Society for the Promotion of Science (JSPS). Marco Dorigo acknowledges support from the Belgian FNRS, of which he is a Research Director, and from the “ANTS” project, an “Action de Recherche Concertée” funded by the Scientific Research Directorate of the French Community of Belgium.
References
[1] C. Anderson, G. Theraulaz, and J.-L. Deneubourg. Self-assemblages in insect societies. Insect. Soc., 49(2):99–110, 2002.
[2] R. A. Becker, J. M. Chambers, and A. R. Wilks. The New S Language. A Programming Environment for Data Analysis and Graphics. Chapman & Hall, London, UK, 1988.
[3] C. A. Bererton and P. K. Khosla. Towards a team of robots with repair capabilities: a visual docking system. In Proc. of the 7th Int. Symp. on Experimental Robotics, volume 271 of Lecture Notes in Control and Information Sciences, pages 333–342. Springer Verlag, Berlin, Germany, 2000.
[4] J. Bishop, S. Burden, E. Klavins, R. Kreisberg, W. Malone, N. Napp, and T. Nguyen. Programmable parts: A demonstration of the grammatical approach to self-organization. In Proc. of the 2005 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 2644–2651. IEEE Computer Society Press, Los Alamitos, CA, 2005.
[5] A. Castano, A. Behar, and P. M. Will. The CONRO modules for reconfigurable robots. IEEE/ASME Trans. Mechatron., 7(4):403–409, 2002.
[6] R. Damoto, A. Kawakami, and S. Hirose. Study of super-mechano colony: concept and basic experimental set-up. Adv. Robots, 15(4):391–408, 2001.
[7] M. Dorigo, V. Trianni, E. Şahin, R. Groß, T. H. Labella, G. Baldassarre, S. Nolfi, J.-L. Deneubourg, F. Mondada, D. Floreano, and L. M. Gambardella. Evolving self-organizing behaviors for a Swarm-Bot. Auton. Robots, 17(2–3):223–245, 2004.
[8] T. Fukuda, M. Buss, H. Hosokai, and Y. Kawauchi. Cell structured robotic system CEBOT—control, planning and communication. In Proc. of the 2nd Int. Conf. on Intelligent Autonomous Systems, volume 2, pages 661–671. IOS Press, Amsterdam, The Netherlands, 1989.
[9] T. Fukuda, S. Nakagawa, Y. Kawauchi, and M. Buss. Self organizing robots based on cell structures - CEBOT. In Proc. of the 1988 IEEE/RSJ Int. Workshop on Intelligent Robots and Systems, pages 145–150. IEEE Computer Society Press, Los Alamitos, CA, 1988.
[10] T. Fukuda and T. Ueyama. Cellular Robotics and Micro Robotic Systems. World Scientific Publishing, London, UK, 1994.
[11] S. Griffith, D. Goldwater, and J. M. Jacobson. Self-replication from random parts. Nature, 437(7059):636, 2005.
[12] R. Groß, M. Bonani, F. Mondada, and M. Dorigo. Autonomous self-assembly in a swarm-bot. In Proc. of the 3rd Int. Symp. on Autonomous Minirobots for Research and Edutainment (AMiRE 2005), pages 314–322. Springer Verlag, Berlin, Germany, 2006.
[13] R. Groß and M. Dorigo. Group transport of an object to a target that only some group members may sense. In Proc. of the 8th Int. Conf. on Parallel Problem Solving from Nature, volume 3242 of Lecture Notes in Computer Science, pages 852–861. Springer Verlag, Berlin, Germany, 2004.
[14] M. W. Jørgensen, E. H. Østergaard, and H. H. Lund. Modular ATRON: Modules for a self-reconfigurable robot. In Proc. of the 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, volume 2, pages 2068–2073. IEEE Computer Society Press, Los Alamitos, CA, 2004.
[15] A. Lioni, C. Sauwens, G. Theraulaz, and J.-L. Deneubourg. Chain formation in Oecophylla longinoda. J. Insect Behav., 14(5):679–696, 2001.
[16] F. Mondada, L. M. Gambardella, D. Floreano, S. Nolfi, J.-L. Deneubourg, and M. Dorigo. The cooperation of swarm-bots: Physical interactions in collective robotics. IEEE Robot. Automat. Mag., 12(2):21–28, 2005.
[17] K. Motomura, A. Kawakami, and S. Hirose. Development of arm equipped single wheel rover: Effective arm-posture-based steering method. Auton. Robots, 18(2):215–229, 2005.
[18] S. Murata, E. Yoshida, A. Kamimura, H. Kurokawa, K. Tomita, and S. Kokaji. M-TRAN: Self-reconfigurable modular robotic system. IEEE/ASME Trans. Mechatron., 7(4):431–441, 2002.
[19] M. Rubenstein, K. Payne, P. Will, and W.-M. Shen. Docking among independent and autonomous CONRO self-reconfigurable robots. In Proc. of the 2004 IEEE Int. Conf. on Robotics and Automation, volume 3, pages 2877–2882. IEEE Computer Society Press, Los Alamitos, CA, 2004.
[20] D. Rus, Z. Butler, K. Kotay, and M. Vona. Self-reconfiguring robots. Commun. ACM, 45(3):39–45, 2002.
[21] J. von Neumann. Theory of Self-Reproducing Automata. Univ. Illinois Press, Urbana, IL, 1966.
[22] P. J. White, K. Kopanski, and H. Lipson. Stochastic self-reconfigurable cellular robotics. In Proc. of the 2004 IEEE Int. Conf. on Robotics and Automation, volume 3, pages 2888–2893. IEEE Computer Society Press, Los Alamitos, CA, 2004.
[23] G. M. Whitesides and B. Grzybowski. Self-assembly at all scales. Science, 295(5564):2418–2421, 2002.
[24] M. Yim, Y. Zhang, and D. Duff. Modular robots. IEEE Spectr., 39(2):30–34, 2002.
[25] M. Yim, Y. Zhang, K. Roufas, D. Duff, and C. Eldershaw. Connecting and disconnecting for chain self-reconfiguration with PolyBot. IEEE/ASME Trans. Mechatron., 7(4):442–451, 2002.
[26] V. Zykov, E. Mytilinaios, B. Adams, and H. Lipson. Self-reproducing machines. Nature, 435(7039):163, 2005.
Lot Release Control Using Genetics Based Machine Learning in a Semiconductor Manufacturing System
Ryohei TAKASU a,1, Nobutada FUJII b, Kanji UEDA b and Motohiro KOBAYASHI c
a School of Engineering, The University of Tokyo
b Research into Artifacts, Center for Engineering, The University of Tokyo
c Sony Corporation
Abstract. This paper proposes a new lot release control method employing genetics based machine learning (GBML) in a semiconductor manufacturing system. A scheduling agent with a classifier system (CS), a GBML system, is introduced to generate suitable rules for deciding when to release new products onto a production floor according to the distribution of the work in process (WIP) number. The feasibility of the proposed lot release control method is assessed based on results of computer simulations modeling a multi-product semiconductor manufacturing system. Comparisons with existing lot release control methods in terms of throughput, turn around time (TAT), and WIP were performed. Analyses of the lot release rules acquired by the scheduling agent were also performed.
Keywords. Lot release control, Reinforcement learning, Semiconductor manufacturing system
1. Introduction
Because of the diversification of consumer needs and the intensification of international competition, semiconductor manufacturers must make huge investments and respond quickly to sharp demand fluctuations. Therefore, semiconductor manufacturing systems must realize efficient facility utilization and the flexibility to produce a wide variety of products and work volumes. A semiconductor manufacturing system is a representative example of a complex manufacturing system because it usually comprises hundreds of machines. A semiconductor product involves a complex and reentrant sequence of hundreds of processing steps. Figure 1 shows an example of a product's process flow. In Figure 1, the named nodes represent the kinds of processing steps, and the arcs represent the input-output relations between processing steps. The huge problem size and the complex process relationships imply a vast number of combinations. Moreover, the objective function is dynamic, not static. It is important
1 Correspondence to: Ryohei TAKASU, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8568, Japan. Tel.: +81 04 7136 4271; Fax: +81 04 7136 4271; E-mail: [email protected]
R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System
Figure 1. Example of process flow
to plan a proper production schedule to achieve high productivity and low cost in the production system. However, the specific properties of semiconductor manufacturing make it difficult to determine an optimal solution using traditional static optimization. This study mainly addresses lot release control problems in semiconductor manufacturing. Several lot release control methods have been proposed: constant work in process number (CONWIP) [1], workload regulation (WR) [2], and starvation avoidance (SA) [3]. CONWIP manages product-dispatching timing to stabilize the work in process (WIP) number throughout the production floor, or on a part of it. WR regulates product-release timing depending on the WIP number of the whole production floor, so as to maintain the workload of the bottleneck facility at a certain level. SA releases products so that the WIP number of the bottleneck facility maintains a proper level. These lot release control methods are reportedly effective and can determine a proper schedule that realizes a short turn around time (TAT) of products on a static production floor. However, it is difficult to determine a suitable static threshold for the scheduling criteria on a large-scale and complex production floor. Moreover, even if a suitable threshold value is found, it is no longer effective once the production situation changes in a turbulent semiconductor manufacturing environment. A new method must therefore be developed that can produce a proper schedule without a static, predetermined threshold in the scheduling criteria. This study adopts the biological manufacturing systems (BMS) approach [4] and proposes to employ genetics based machine learning (GBML) to control lot release timing, thereby realizing adaptive lot release control without predetermined thresholds in the scheduling criteria.
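As a concrete illustration of the static thresholds discussed above, a CONWIP-style release rule reduces to a single hand-tuned cap on the total WIP number; the cap value below is purely illustrative.

```python
def conwip_release(wip_total, wip_cap):
    """CONWIP-style rule: release a new lot only while the total WIP number
    on the (sub)floor stays below a fixed cap. The cap is exactly the kind
    of predetermined static threshold that must be re-tuned whenever the
    production situation changes."""
    return wip_total < wip_cap

# hypothetical cap tuned by hand for one particular product mix
release_now = conwip_release(wip_total=38, wip_cap=40)
```

The GBML method proposed in this paper instead learns when to release from the observed floor state, so no such cap has to be chosen up front or re-tuned after a demand shift.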
2. Lot Release Control Using GBML
2.1. GBML Based Scheduling
A lot release scheduling agent equipped with a learning classifier system [5], which is a GBML method, is introduced to decide the timing of dispatching products onto the production floor. The proposed scheduling agent is expected to plan a proper lot release schedule without predetermined thresholds, because the agent can find the thresholds by itself through trial-and-error learning. Because of this learning capability, the proposed scheduling method is also expected to be applicable to large-scale scheduling problems and to adapt properly to dynamic fluctuations on the production floor.
Figure 2. Lot release scheduling agent and the production floor
Figure 3. Configuration of lot release scheduling agent
Figure 4. WIP conversion into a 2-bit binary string
Figure 2 depicts the proposed system using the scheduling agent with a classifier system. The input to the agent is the state of the production floor, i.e., the WIP number of each machine group, where a machine group consists of alternative machines. The output from the agent is an action: lot release. The agent's action is evaluated using the evaluation criteria. Consequently, the agent obtains a suitable state-action chain. The scheduling agent is modeled using the classifier system (see Figure 3). The classifier system consists of production rules encoded as bit strings. In this paper, the condition part of a rule consists of a 2m-bit binary string that represents the WIP numbers of m machine groups. Condition {10} means that the WIP number is larger than twice the number of machines in the machine group, condition {11} means that the WIP number is less than twice, but more than, the number of machines, condition {01} means that the WIP number is less than the number of machines, and condition {00} means that no product exists in the machine group (see Figure 4). Only the condition part of a classifier may contain a "do not care" mark
(#) that can match both 0 and 1, so that generalization of rules occurs. The action part consists of an n-bit binary string when n kinds of product are produced. For each product, an action bit of {1} means that one lot of that product is dispatched; {0} means that no lot is dispatched. The scheduling agent's objective is to obtain rules that fulfill high productivity and a preset throughput rate among product kinds. The scheduling agent's action is evaluated as follows: the scheduling agent is rewarded if a finished product meets its due date, and penalized if it does not. Credit values are apportioned among classifiers using the bucket brigade algorithm. The action is also evaluated with respect to the throughput rate among product kinds: the scheduling agent receives a reward if the throughput rate meets or exceeds the target throughput rate; otherwise, it is penalized. These credit values are assigned among classifiers using a profit sharing plan (PSP) [6]. Genetic algorithms [7] are used to create new classifiers, to maintain classifier diversity, and to avoid entrapment in locally optimal solutions.
2.2. Scheduling on the Production Floor
The scheduling problem on the production floor must also be solved properly. A dynamic scheduling method using self-organization [8] is employed to solve it. In self-organization based scheduling, production proceeds through local interactions among machines and the AGVs transporting products, without global control. A potential field, which is a superposition of attraction and repulsion fields, is employed to realize the interactions on the production floor.
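The 2-bit WIP encoding and the "#" wildcard described in Section 2.1 can be sketched as follows. How exact equality at the boundaries (WIP equal to the machine number, or to twice it) is classified is an assumption here; the paper only gives strict inequalities.

```python
def encode_wip(wip, n_machines):
    """Encode one machine group's WIP number into its 2-bit condition code:
    '00' empty, '01' below the machine number, '11' between the machine
    number and twice it, '10' above twice the machine number.
    Boundary cases at exact equality are an assumption."""
    if wip == 0:
        return "00"
    if wip <= n_machines:
        return "01"
    if wip <= 2 * n_machines:
        return "11"
    return "10"

def matches(condition, state):
    """Classifier matching: a '#' in the condition matches either bit,
    which is what allows rules to generalize over floor states."""
    return len(condition) == len(state) and all(
        c in ("#", s) for c, s in zip(condition, state))

# two machine groups: 3 lots on 2 machines, 0 lots on 4 machines
state = encode_wip(3, 2) + encode_wip(0, 4)   # '1100'
```

A classifier such as "11##" would then fire whenever the first machine group is moderately loaded, regardless of the second group's state.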
3. Simulation Results and Discussion

3.1. Simulation Setting

Computer simulations are performed to verify the feasibility of the proposed lot-release control method using the classifier system. Figure 5 illustrates the modeled production floor. One scheduling agent, which determines whether to release a new lot, is introduced onto it. The production floor has 40 processing machines, classified into 22 alternative machine groups. The number of AGVs is set to four. Six stockers are also introduced to hold products waiting for the next process. The system has one product dispatching point and one collection point.
Figure 5. Production floor model
Two product types are produced on a production floor whose process flow has approximately 1/4 the number of steps of the actual process flow: TYPE1 has 90 production steps and a total production time of 1.95 days; TYPE2 has 60 steps and a production time of 2.49 days. The respective due dates are set to about twice the production times. These data are listed in Table 1.

Table 1. Process data of products: TYPE1 and TYPE2

                     TYPE1   TYPE2
Process number          90      60
Process time [day]    1.95    2.49
Due date [day]           4       5

As a consequence, the classifiers that constitute the lot-release rules acquired by the scheduling agent have 44-bit input condition strings to detect the situations of the machine groups and 2-bit output action strings to dispatch each product. The target throughput rate is set as TYPE1 : TYPE2 = 1:1. The simulation runs for 15 trials; each trial is a 1000-day simulation. A trial is stopped and the simulation moves to the next trial if irrelevant dispatching fills the floor.

3.2. Simulation Results

Figure 6 shows the respective transitions over 50 days of (a) average WIP number, (b) daily throughput, and (c) average TAT throughout the simulation trials. The horizontal axis indicates simulation steps; p_q represents the qth day of the pth trial.

Figure 6. Transitions of (a) WIP, (b) throughput, and (c) TAT

In the initial trials, the WIP number takes poor and fluctuating values because of overloading caused by random lot release. After learning, the scheduling agent achieves better and steadier values by the 15th trial. This simulation takes about one day to run. Table 2 shows the average WIP number, daily throughput, and average TAT during the last 50 days of the 15th trial.

Table 2. Average of WIP, throughput, and TAT in the final 50 days

                       TYPE1   TYPE2
WIP                     15.5    17.2
Throughput [lot/day]    5.46    5.50
TAT [day]               2.82    3.11

It is readily apparent that the throughput rate of the two product types is almost 1:1 and that the average TAT of both products is within the due date. The highest machine availability also reaches 96.9%. These results clarify that the obtained scheduling rules achieve not only the target throughput rate but also high productivity on the production floor.

3.3. Comparison with Other Lot Release Control Methods

We compared this system with two other lot release control methods to verify its performance. The first is Uniform (UNIF), which dispatches lots at fixed intervals. The second is Constant WIP (CONWIP), which maintains the WIP numbers at set values. The threshold criteria of the respective methods were determined a priori by preliminary simulations, taking the result of the proposed method into consideration, and were set to give the best performance with each method. Table 3 shows the results of the comparison in terms of average WIP number, daily throughput, and average TAT. As shown in Table 3, the proposed lot release control method obtains approximately equal performance to UNIF and CONWIP. Whereas existing methods, including UNIF and CONWIP, must be preset with appropriate criteria thresholds, the proposed method acquires lot release rules autonomously by capturing changes in the production floor's condition. Its effectiveness is therefore expected to stand out under fluctuations of the production environment, such as changes in the demanded throughput rate.

Table 3.
Comparison with other methods

            WIP              Throughput [lot/day]   TAT [day]
            TYPE1   TYPE2    TYPE1   TYPE2          TYPE1   TYPE2
Proposed    15.54   17.24    5.46    5.50           2.82    3.11
UNIF        14.52   16.12    5.56    5.48           2.61    2.91
CONWIP      15      17       5.56    5.72           2.71    2.97
3.4. Analysis of Acquired Rules

We analyzed the acquired dispatching rules to verify that the machine groups' conditions are determinative to the scheduling agent's action. Figure 7 shows all rules executed during
Figure 7. Executed classifiers in the final 5 days
Figure 8. Difference of machine group condition by action
the final 5 days of the simulation. Here, the (2m − 1)th and 2mth loci in the condition part of each classifier represent the WIP of machine group No. m, and the nth locus in the action part represents whether to release a new TYPEn lot onto the production floor. In the final 5 days, 10 classifiers were executed. Their action parts were action {11}, which dispatches one TYPE1 lot and one TYPE2 lot, and action {00}, which dispatches no lot. The condition parts include numerous "do not care" marks, so it is difficult to tell from Figure 7 alone which loci dictate the action. Figure 8 shows, for each condition of each machine group, the difference in frequency between the two actions. The horizontal axis indicates the machine group number; the vertical axis shows the frequency difference of each condition. We can infer that the machine groups showing the largest frequency differences are determinative for the timing of switching actions. For instance, for the No. 21 machine group, action {11} tended to be executed when its condition was {11}, and action {00} tended to be executed when its condition was {10}. It is therefore presumable that the condition change of the No. 21 machine group triggers the action change of the scheduling agent. Considering the frequency differences in Figure 8 together with the loci of do-not-care marks in Figure 7 confirms machine groups No. 1, No. 13, No. 21, and so on as determinative. These machine groups are characterized by machines with high availability, machines without alternatives, and machines frequently accessed by products. Consequently, the scheduling agent realizes production with high productivity and fulfillment of demands by detecting the condition changes of these machine groups, and it obtains dispatching rules that correspond to those condition changes.
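The frequency-difference analysis behind Figure 8 can be sketched as follows: for each machine group, count how often each 2-bit condition appears in the rules that fired with action {11} versus action {00}, and compare. The rule strings below are invented for illustration; the actual executed classifiers are only shown in Figure 7.

```python
# Sketch of the Figure 8 analysis: condition-frequency differences between
# the action {11} and action {00} rule sets, per machine group.
from collections import Counter

def condition_of(rule: str, m: int) -> str:
    """Condition (2 bits) of machine group No. m in a classifier's condition part."""
    return rule[2 * (m - 1): 2 * m]

def frequency_difference(rules_11, rules_00, m):
    """Difference of condition frequencies between the two action classes."""
    f11 = Counter(condition_of(r, m) for r in rules_11)
    f00 = Counter(condition_of(r, m) for r in rules_00)
    return {c: f11[c] - f00[c] for c in set(f11) | set(f00)}

rules_11 = ["11##", "11#1"]      # fired with action {11} (illustrative)
rules_00 = ["10##", "10#0"]      # fired with action {00} (illustrative)
# Group No. 1 flips from {11} to {10} between the action classes,
# so it is a candidate determinative group:
assert frequency_difference(rules_11, rules_00, 1) == {"11": 2, "10": -2}
```

A large positive or negative difference for some condition singles out the machine groups whose state changes drive the agent's switch between dispatching and holding.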
4. Concluding Remarks

This paper presented a new lot release control method using GBML in a semiconductor manufacturing system. In this approach, a scheduling agent acquires proper dispatching rules autonomously using a classifier system, without the premeditated criteria thresholds that current scheduling methods require. Computer simulations were performed using data extracted from an actual semiconductor fabrication facility to verify the feasibility of the proposed method. The simulation results show that the proposed scheduling agent achieves high productivity and fulfillment of demands without criteria thresholds. Comparison with other methods showed that its performance is approximately equal to that of other lot release control methods, such as UNIF and CONWIP, in terms of WIP number, throughput, and TAT. By analyzing the acquired rules, we demonstrated that the scheduling agent obtained proper dispatching rules by focusing on machine groups with specific characteristics. In future studies, the adaptability of the proposed method will be verified on more difficult problems: manufacturing systems with more product types, larger-scale manufacturing systems, and manufacturing systems subject to fluctuating production demands.
Acknowledgements This study was supported in part by Grants-in-Aid for Young Scientists (B), 17760326, and for Scientific Research on Priority Areas (2), 16016228 from the Ministry of Education, Culture, Sports, Science and Technology.
References

[1] Spearman, M.L.; Woodruff, D.L. & Hopp, W.J. (1990). CONWIP: a pull alternative to kanban, International Journal of Production Research, Vol. 28, No. 5, pp. 879-894
[2] Wein, L.M. (1988). Scheduling Semiconductor Wafer Fabrication, IEEE Transactions on Semiconductor Manufacturing, Vol. 1, No. 3, pp. 115-130
[3] Glassey, C.R. & Resende, M.G.C. (1988). Closed-Loop Job Release Control for VLSI Circuit Manufacturing, IEEE Transactions on Semiconductor Manufacturing, Vol. 1, No. 1, pp. 36-46
[4] Ueda, K. (1992). A concept for bionic manufacturing systems based on DNA-type information, Proceedings of the 8th International PROLAMAT Conference, IFIP, pp. 853-863
[5] Holland, J.H.; Holyoak, K.J.; Nisbett, R.E. & Thagard, P.R. (1985). Induction: Processes of Inference, Learning, and Discovery, The MIT Press
[6] Grefenstette, J.J. (1988). Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms, Machine Learning, Vol. 3, pp. 225-245
[7] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley
[8] Vaario, J. & Ueda, K. (1998). An emergent modeling method for dynamic scheduling, Journal of Intelligent Manufacturing, Vol. 9, pp. 129-140
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Design of an AGV Transportation System by Considering Management Model in an ACT
Satoshi Hoshino a,1, Jun Ota a, Akiko Shinozaki b, and Hideki Hashimoto b
a Dept. of Precision Engineering, The University of Tokyo, Japan
b Mitsubishi Heavy Industries, LTD.
Abstract. This paper focuses on the design of a highly efficient Automated Guided Vehicle (AGV) transportation system. For this purpose, we construct detailed system management models that include agent cooperation, container storage scheduling, and container transportation planning. We then optimally design systems constructed with these management models. The systems are compared on the basis of their total construction costs to evaluate their operation efficiency. Finally, we describe a highly efficient management model for an AGV transportation system and discuss the need to consider an efficient system management methodology.
Keywords. Design, AGV transportation system, management model, ACT
1. Introduction

The amount of container trade, especially in Asia, has increased significantly in recent years [1]. Following this trend, several studies have analyzed automation from various viewpoints [2,3]. In this paper, we consider the machines that operate in an automated container terminal (ACT) for unloading/loading, transportation, and storage as operational agents, and we design a highly efficient automated transportation system in which these agents operate. For this purpose, it is necessary to consider the following problems: (I) optimal design of a transportation system [4], (II) comparison, evaluation, and analysis of a transportation system [5,6], and (III) efficient management of a transportation system. For problem (I), we have proposed a hybrid design methodology for an automated guided vehicle (AGV) transportation system using queuing network theory and a transportation simulator [4]. For problem (II), we have compared and evaluated the characteristics of transportation system layouts, such as vertical and horizontal ones [5]. As an integrated study of problems (I) and (II), we have also evaluated and analyzed the total construction cost of the transportation system on the basis of a comparison of optimal design results. A comparison with the vertical system used in conventional ACTs indicated that the horizontal system is
1 Correspondence to: The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656 JAPAN. Tel.: +81 3 5841 6486; Fax: +81 3 5841 6487; E-mail: [email protected].
S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model
Figure 1. Actual Horizontal AGV Transportation System in an Automated Container Terminal (ACT)
more efficient [6]. Therefore, in this study, we focus on the horizontal AGV transportation system shown in Fig. 1; we design a highly efficient transportation system while considering an efficient management methodology, to address the remaining problem (III). In a conventional study of ACT management, Hartmann introduced an approach to generating scenarios of operations [7]. The generator is based on a large number of parameters, and he outlined the parameters that are important for producing realistic scenarios of high practical relevance. That study, however, focuses mainly on scenarios of quay-side operation; for yard-side container transportation and storage operations, it considers only the transportation destination of a container. Therefore, in this study, we construct yard-side management models, defined as follows.
• Agent cooperation among heterogeneous multi-agents.
• Container storage scheduling at a storage location.
• Container transportation planning in a container storage area.
After that, we implement the constructed models in a horizontal AGV transportation system and derive the numbers of AGVs and rubber-tired gantry cranes (RTGCs) for various transportation demands by means of the proposed design methodology [4]. Based on the derived design parameters, we compare the construction costs of the transportation systems and evaluate the influence of the management model on operation efficiency. Finally, we describe a highly efficient management model and discuss the need to consider an efficient management methodology.
2. AGV Transportation System in an ACT

2.1. Transportation System and Layout Settings

With regard to the layout of a transportation system for an optimal design, the horizontal AGV transportation system shown in Fig. 1 is divided into three parts and four areas, namely, a quay, two means of transportation, and a container storage area. The number of deployed quay container cranes (QCCs) is given as three; it is not a design objective because of a space constraint. Container transportation orders are generated equally by
the three QCCs in the quay area. In the container storage area, the container storage locations provide temporary container storage space between sea and land. Two RTGCs are used at each location, and one work path is adjacent to each location. Each location has a container storage capacity of 320 (4 Rows × 20 Bays × 4 Tiers); we define the capacity of one stored container as one twenty-foot equivalent unit (TEU). The operation specifications of the AGV, RTGC, and QCC are based on the cited literature [8].

2.2. Problem Description

In addition to the numbers of AGVs and RTGCs, the management models, i.e. agent cooperation, container storage scheduling, and container transportation planning, are the design objectives. The detailed design process for the numbers of AGVs and RTGCs is discussed in the cited literature [4]. The derived numbers of AGVs and RTGCs in the designed system are the evaluation factors. Concerning storage constraints, the final storage rate at each location must not exceed the given limits. This constraint is imposed by actual ACT practice, because effective land use is a serious issue.

2.3. Transportation Procedure

The AGVs circularly transport containers in the system shown in Fig. 1 while cooperating with the RTGCs as follows.
step 1. A QCC loads a container onto an AGV from a container ship.
step 2. A container storage location is assigned as the destination of the container.
step 3. The AGV begins to transport the container to the assigned location.
step 4. The AGV goes onto the work path and arrives at a container transfer point.
step 5. The AGV transfers the container to an RTGC.
step 6. The RTGC stores the container at the storage point and waits for the next order.
step 7. The AGV that has transferred the container goes back to a QCC. Back to step 1.
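The seven-step cycle above can be sketched as a short runnable loop. All classes and attributes below are stubs invented for illustration; the actual simulator used in the paper is far more detailed.

```python
# Minimal runnable sketch of the seven-step AGV transport cycle (steps 1-7).

class QCC:
    def load(self, agv):
        return "container"                       # step 1: QCC loads a container

class RTGC:
    def __init__(self):
        self.stored = []
    def receive_and_store(self, container):      # steps 5-6: transfer and store
        self.stored.append(container)

class AGV:
    def __init__(self):
        self.position = "quay"
    def move_to(self, place):                    # steps 3-4 and 7: movement
        self.position = place

def transport_cycle(qcc, agv, locations):
    container = qcc.load(agv)                    # step 1
    location = locations[0]                      # step 2: assignment rule (Section 5)
    agv.move_to(location)                        # steps 3-4
    rtgc = location["rtgcs"][0]                  # RTGC selection (Section 3)
    rtgc.receive_and_store(container)            # steps 5-6
    agv.move_to("quay")                          # step 7: back to a QCC (step 1)

locations = [{"rtgcs": [RTGC(), RTGC()]}]
agv = AGV()
transport_cycle(QCC(), agv, locations)
assert locations[0]["rtgcs"][0].stored == ["container"]
assert agv.position == "quay"
```

Sections 3–5 then refine the two placeholder decisions in this loop: which RTGC the AGV selects, and which storage location the container is assigned to.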
3. Agent Cooperation

3.1. Framework of Agent Cooperation

In conventional studies, Yoshimura et al. and Ota et al. focused on agent cooperation, such as handing over and transporting objects, using homogeneous multi-agents [9,10]. Since we deal with heterogeneous multi-agents, however, it is impossible to realize highly efficient agent cooperation with a uniform agent-behavior strategy for the various types of agents. Therefore, we focus on and design the following aspects of agent cooperation: (i) the RTGC selection method used by an AGV (distance-based RTGC selection and workspace-based RTGC selection), and (ii) the RTGC selection and call-out timing used by an AGV (immediately after the AGV arrives at the destination point on the work path; immediately after the AGV goes onto the work path).
Figure 2. Cooperation Between the AGVs and RTGCs (top view)
3.2. Construction of Agent Cooperation

3.2.1. Distance-based RTGC selection at the destination point

As shown by the dotted line in Fig. 2, AGV_A measures the relative distance of each RTGC at the location when it arrives at the destination point. If both RTGCs are idling, RTGC_B, which is nearer to AGV_A than RTGC_A, is selected and identified as the transfer partner. However, if RTGC_B (a) has already been selected by another AGV or (b) is in a transfer or storage operation, RTGC_A is selected as the transfer partner. If both RTGCs are in state (a) or (b), AGV_A must wait at the destination point.

3.2.2. Workspace-based RTGC selection at the destination point

The RTGCs have their own assigned workspaces and operate within them. Therefore, as shown in Fig. 2, AGV_A selects and calls RTGC_B, which operates in the workspace assigned to the storage point. In this study, workspaces of equal size were given to the RTGCs at each location; that is, each RTGC has a workspace of 4 Rows × 10 Bays × 4 Tiers.

3.2.3. Distance-based RTGC selection on the work path

As shown by the solid line in Fig. 2, immediately after AGV_A goes onto the work path, it begins to communicate with each RTGC to collect situational information and measure the relative distance to its destination point; it then selects RTGC_B as the transfer partner. The selected RTGC_B begins to move while AGV_A is transporting. As in 3.2.1, the AGV has to select the more distant RTGC if the nearer RTGC is in state (a) or (b). In addition, as the number of AGVs increases, the workload on the work path grows, and with it the number of AGVs already communicating with the RTGCs.
When this increase occurs, the rear AGV does not begin to communicate with the RTGC until the lead AGV finishes communicating with it.
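The two selection methods in 3.2.1–3.2.3 can be contrasted in a few lines. The data representation (positions as bay indices, workspaces as bay ranges, flags for states (a) and (b)) is an assumption made for illustration.

```python
# Sketch of the two RTGC selection methods. An RTGC is unavailable if
# (a) it is already selected by another AGV or (b) it is in a transfer
# or storage operation.

def distance_based(rtgcs, destination):
    """Pick the nearest available RTGC; None means the AGV must wait."""
    available = [r for r in rtgcs if not (r["selected"] or r["busy"])]
    if not available:
        return None
    return min(available, key=lambda r: abs(r["position"] - destination))

def workspace_based(rtgcs, storage_bay):
    """Pick the RTGC whose assigned workspace contains the storage bay."""
    for r in rtgcs:
        lo, hi = r["workspace"]                  # e.g. bays 1-10 and 11-20
        if lo <= storage_bay <= hi:
            return r
    return None

rtgcs = [
    {"position": 2,  "workspace": (1, 10),  "selected": False, "busy": True},
    {"position": 15, "workspace": (11, 20), "selected": False, "busy": False},
]
assert distance_based(rtgcs, destination=3) is rtgcs[1]   # the nearer one is busy
assert workspace_based(rtgcs, storage_bay=4) is rtgcs[0]
```

The contrast visible here mirrors the text: distance-based selection can route an AGV to a distant crane when the near one is in state (a) or (b), whereas workspace-based selection always binds a storage bay to one crane.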
[Initial orders] Sequence 1-12, Task: J B D F E C L K I A H G
[Scheduled orders] Sequence 1-12, Task: A B C D E F G H I J K L
[Rescheduled orders] Sequence 1-12, Task: A H B C I J D K E F G L
Figure 3. Example of Container Storage Scheduling (top view)
3.2.4. Workspace-based RTGC selection on the work path

As shown by the solid line in Fig. 2, immediately after AGV_A goes onto the work path, it begins to communicate with each RTGC to identify its workspace and collect situational information. AGV_A then selects RTGC_B, in whose workspace the container is to be stored, and orders it to move to the destination point. The selected RTGC_B begins to move while AGV_A is transporting. If the RTGC is in state (a) or (b), the AGV maintains communication or waits on the work path until the RTGC becomes idle.
4. Container Storage Scheduling

4.1. Framework of Container Storage Scheduling

Among conventional studies of container storage scheduling in ACTs, several have focused on system performance, with container storage orders given on the basis of certain dispatching rules [11,12]. These studies, however, have been limited to how the generated transportation and storage orders are assigned to the AGVs; no studies have considered the sequence of orders at the container storage locations. Therefore, in this study, we construct the following container storage models: random container storage, in which the sequence of orders is not considered, and container storage scheduling, in which it is.

4.2. Random Container Storage

Random container storage is a container storage model that does not consider the sequence of operation orders at the location. In Fig. 3, the "initial orders" represent container transportation and storage orders that are randomly generated in the initial state.
Here, every container carries information about its storage situation, including the "rows" and "bays" of the available container storage spaces A∼L. In the case of random container storage, the AGVs execute the orders in sequence, and the transported containers are then stored in sequence by the RTGCs.

4.3. Container Storage Scheduling

For container storage scheduling, we rearrange the initial orders with the following objectives: (A) minimizing the total moving distance of the RTGCs and (B) equalizing the workloads at the locations and on the work paths, using the RTGC workspaces. All orders are known a priori. First, the initial positions of RTGC_A and RTGC_B are defined as shown in Fig. 3. We then reorder the initial orders based on (A) and define the result as "scheduled orders." These scheduled orders, however, cause a heavier workload in the workspace of RTGC_A. Therefore, as a second scheduling step, we reorder the scheduled orders within each workspace based on (B) and define the result as "rescheduled orders." The AGVs execute the rescheduled orders.
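The two-pass idea can be sketched as follows. The concrete heuristics (sorting by storage position for objective A, round-robin interleaving of the two workspaces for objective B) and the task positions are assumptions for illustration; the paper does not state the exact algorithm.

```python
# Two-pass sketch of container storage scheduling: objective (A) shortens
# RTGC travel, objective (B) balances the two workspaces' workloads.

def schedule(orders, position, workspace_of):
    # Pass A: sort by storage position so each RTGC sweeps the location.
    ordered = sorted(orders, key=position)
    # Pass B: round-robin between the two workspaces to equalize workloads.
    ws_a = [o for o in ordered if workspace_of(o) == "A"]
    ws_b = [o for o in ordered if workspace_of(o) == "B"]
    rescheduled = []
    while ws_a or ws_b:
        if ws_a:
            rescheduled.append(ws_a.pop(0))
        if ws_b:
            rescheduled.append(ws_b.pop(0))
    return rescheduled

# Tasks A-L with invented positions; A-F in workspace A, G-L in workspace B.
orders = list("JBDFECLKIAHG")                       # the "initial orders" of Fig. 3
pos = {t: i for i, t in enumerate("ABCDEFGHIJKL")}
result = schedule(orders, position=pos.get,
                  workspace_of=lambda t: "A" if t <= "F" else "B")
assert result == list("AGBHCIDJEKFL")
```

Under these assumed positions the output alternates between the two workspaces, which is the qualitative effect the "rescheduled orders" in Fig. 3 achieve.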
5. Container Transportation Planning

5.1. Framework of Container Transportation Planning

For container transportation planning, we construct the following models: random, equivalent, and quay-side-weighted transportation. Based on the constructed models, the number of containers transported to and stored at each location is determined in a few seconds under a storage constraint: the maximum number of stored containers at a location, i.e., its container storage capacity, is 320 [TEU].

5.2. Random Transportation

Under the random transportation rule, the destination locations of the containers are planned randomly, so the number of containers transported to each location is determined by random numbers.

5.3. Equivalent Transportation

Under the equivalent transportation rule, the destination locations of the containers are planned equivalently. For instance, if the number of containers is 600 [TEU] and the number of locations is 4, then 150 [TEU] are planned for each location (see Fig. 4). On the x-axis of Fig. 4, the locations are arranged from the quay side in order of location number. Given the container storage capacity of one location (320 [TEU]), the final storage rate at each location reaches 46.9 [%], where the final storage rate denotes (the number of stored containers) / (the container storage capacity).
Figure 4. Example of the equivalent and quay-side-weighted transportation rule
5.4. Quay-side-weighted Transportation

As shown in Fig. 4, under the quay-side-weighted transportation rule, the destination locations of the containers are determined preferentially from the quay side. With regard to the storage constraint described in 2.2, the storage rate at each location must stay within ±30 [%] of the equivalent storage rate. Under this constraint, the numbers of transported and stored containers are planned on a slant from the quay side. For instance, if the total number of containers is 600 [TEU] and the number of locations is 4, the numbers of containers transported to the 1st ∼ 4th locations are planned as 246, 198, 102, and 54 [TEU], respectively.
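Both planning rules can be reproduced numerically. The offset schedule (+30, +15, −15, −30 [%] from the quay side) is inferred from the numbers in Fig. 4 and is an assumption; the paper states only the ±30 [%] bound.

```python
# Sketch of the equivalent (5.3) and quay-side-weighted (5.4) planning rules.

CAPACITY = 320  # container storage capacity per location [TEU]

def equivalent(total, n_locations):
    return [total // n_locations] * n_locations

def quay_side_weighted(total, n_locations, offsets):
    base = total / (n_locations * CAPACITY)          # equivalent storage rate
    return [round((base + off) * CAPACITY) for off in offsets]

assert equivalent(600, 4) == [150, 150, 150, 150]    # 46.9 % of capacity each
assert quay_side_weighted(600, 4, [0.30, 0.15, -0.15, -0.30]) == [246, 198, 102, 54]
```

With this inferred offset schedule the example in the text comes out exactly: 76.9 %, 61.9 %, 31.9 %, and 16.9 % of the 320 [TEU] capacity, summing to 600 [TEU].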
6. Optimal Design of the AGV Transportation System

6.1. Design Process

The design process runs from the given demand, i.e., the required throughput, to the design solutions, as follows. Here, the throughput is given as (total number of containers) / (required transportation time). First, under random transportation, we construct transportation systems based on management models 1)∼6), as shown in Table 1. To derive the numbers of AGVs and RTGCs, we apply the design methodology of [4]. Model 1) represents the conventional management model assumed in the literature [4]: the AGV selects an RTGC at random, with random container storage, when it arrives at the destination point.
Process 1. Input a throughput.
Process 2. Implement the management models.
Process 3. Derive the deployed number of agents, evaluate operation efficiency, and derive a more efficient management model.
Process 4. Apply the equivalent and quay-side-weighted transportation rules to the management model derived in process 3.
Table 1. System management models under random transportation

Management model    RTGC selection method    RTGC selection & call-out timing    Container storage scheduling
1)                  RS                       DP                                  ×
2)                  DS                       DP                                  ×
3)                  DS                       WP                                  ×
–                   DS                       DP                                  ✓
–                   DS                       WP                                  ✓
4)                  WS                       DP                                  ×
–                   WS                       WP                                  ×
5)                  WS                       DP                                  ✓
6)                  WS                       WP                                  ✓

DS: Distance-based Selection, WS: Workspace-based Selection, RS: Random Selection; DP: at the Destination Point, WP: on the Work Path; ×: Random storage, ✓: Storage scheduling
Figure 5. Comparison Results of the Total Construction Costs: (a) under random container transportation (management models 1)∼6)); (b) under all transportation rules (random, equivalent, and quay-side-weighted storage with model 6)). In both panels, the horizontal axis is the required throughput [TEU/hour] (10∼130) and the vertical axis is the construction cost [point] (0∼50).
Process 5. Derive the deployed number of agents, evaluate the operation efficiency, and design the most efficient management model; the numbers of AGVs and RTGCs are the design solutions.
In processes 3 and 4, we use the following cost model to derive the construction cost. With a and b denoting the equipment costs of an AGV and an RTGC, the coefficients are set as a : b = 1 : 2:
Construction cost = a × (number of AGVs) + b × (number of RTGCs).

6.2. Evaluation of the Operation Efficiency Based on the Total Construction Cost

As shown in Fig. 5(a), except for model 1), notable differences in the total construction costs appear as the required throughput becomes higher. At a required throughput of 120, the difference in total construction cost between models 1) and 6) is 18 [point]; that is, model 6) is more efficient than the conventional management model by about 38 [%]. From the results of models 2)∼4), obtained to evaluate the validity of agent cooperation under random container storage, model 4), in which the AGV selects and calls out the RTGC by workspace-based selection at the destination point, is the most efficient. Under container storage scheduling, model 6) is more efficient than model 5).
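The cost model above is straightforward to compute. The AGV/RTGC counts below are illustrative assumptions only, chosen to reproduce the reported 18-point gap (about 38 %) between models 1) and 6) at a required throughput of 120; the paper does not list the individual design solutions.

```python
# The cost model of Section 6.1 with a : b = 1 : 2.

def construction_cost(n_agvs, n_rtgcs, a=1, b=2):
    return a * n_agvs + b * n_rtgcs

cost_model1 = construction_cost(n_agvs=20, n_rtgcs=14)   # 48 points (assumed split)
cost_model6 = construction_cost(n_agvs=14, n_rtgcs=8)    # 30 points (assumed split)
assert cost_model1 - cost_model6 == 18                   # the reported gap
assert round((cost_model1 - cost_model6) / cost_model1, 3) == 0.375   # about 38 %
```

Because an RTGC costs twice as much as an AGV in this model, management models that save cranes (rather than vehicles) pay off disproportionately in the total construction cost.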
Therefore, model 6), the management model in which the AGV selects and calls out the RTGC by workspace-based selection on the work path under container storage scheduling, is the most efficient.

6.3. Discussion of a Highly Efficient Management Methodology

Fig. 5(b) shows a comparison of the total construction costs of transportation systems designed using equivalent and quay-side-weighted transportation in addition to management model 6). At a required throughput of 130, the total construction costs under equivalent and quay-side-weighted transportation differ by 32 and 34 [point], respectively. In terms of operation efficiency, the system efficiency improves by 16.3 [%] under equivalent transportation and by 10.5 [%] under quay-side-weighted transportation, compared with model 6) using random storage under random transportation. Even for greater required throughputs, the tendency for equivalent transportation to be more efficient will not change, because the operating RTGCs are deployed equally at each location. From these results, we designed the following highly efficient management model: the AGV selects and calls out the RTGC using workspace-based selection on the work path under container storage scheduling, after the container transportation and storage destinations are planned on the basis of equivalent transportation. Furthermore, these results clarify the need to take efficient system management into account.
7. Conclusion and Future Work

In this paper, the focus was on the design of a highly efficient AGV transportation system. We constructed system management models that included agent cooperation, container storage scheduling, and container transportation planning. As a result, we designed a highly efficient management model and clarified the need to consider efficient system management. In future work, we will address the reliability and robustness of the transportation system in case one of the operations fails. In addition, the problem discussed in this paper could probably be formulated from the viewpoint of mathematical programming; in that case, the computational cost will have to be considered.
References

[1] D. Steenken et al., Container Terminal Operation and Operations Research - A Classification and Literature Review, OR Spectrum, 26 1 (2004) 3–49.
[2] P. Ioannou et al., Advanced Material Handling: Automated Guided Vehicles in Agile Ports, CCDoTT Technical Report, Center for Advanced Transportation Technologies, (2001).
[3] J. Zhang et al., Automated Container Transport System between Inland Port and Terminals, 83rd TRB Annual Meeting Interactive Program, (2004).
[4] S. Hoshino et al., Optimal Design Methodology for an AGV Transportation System by Using the Queuing Network Theory, 7th International Symposium on Distributed Autonomous Robotic Systems, (2004) 391–400.
S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Analysis of Purchase Decision Making: Network Externalities and Asymmetric Information

Yohei Kaneko a,1, Nariaki Nishino b, Sobei H. Oda c and Kanji Ueda b
a School of Engineering, The University of Tokyo
b Research into Artifacts, Center for Engineering (RACE), The University of Tokyo
c Faculty of Economics, Kyoto Sangyo University

Abstract. Product markets in which network externalities are present have been expanding through technological development. In such markets, information about users strongly affects purchase decision-making because a consumer's utility depends on the number of users. Although complete information is assumed in most studies of network externalities, it is difficult for consumers to obtain complete information in the real world. We construct a model of a multiple-product market with network externalities. The model assumes that each consumer has asymmetric information about others' purchase decision-making. We analyze purchase decisions using game theory, multi-agent simulation, and experiments with human subjects. The results imply the following important features: (1) a monopolistic market arises when consumers neglect their own preferences and purchase the same products that others do; (2) a market becomes inefficient because of asymmetric information; (3) irrational decision-making might increase total surplus; and (4) more information does not always increase consumers' utilities.

Keywords. Network externalities, Asymmetric information, Nash equilibrium, Multi-agent simulation, Experiments with human subjects
1. Introduction

This study investigates consumers' purchase decision-making under asymmetric information in product markets where network externalities are present. Markets with network externalities, such as those for mobile telephones or Internet services, have been growing. Network externalities are externalities by which a consumer's utility depends on the number of users who consume the same product. For example, a fax machine is worth little to a person if no other consumer uses one. Church et al. [1] analyzed the equilibria of product diffusion in a market with a mathematical model including the variety of software and prices. Katz et al. [2] specifically examined excess inertia and excess momentum, and investigated market entry and product compatibility. Ruebeck et al. [3] undertook experiments with human subjects and showed that switching costs between products prevented the diffusion of superior products.

1 Correspondence to: Yohei Kaneko, Research into Artifacts, Center for Engineering (RACE), The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba, 277-8568, Japan. Tel.: +81 4 7136 4271; Fax: +81 4 7136 4242; E-mail: [email protected]
Oda et al. [4] constructed a market model with network externalities and analyzed the product diffusion process using cellular automata. Two problems are inherent in those studies. First, they assumed that the information consumers possess about products is complete and symmetric. Consumers in the real world, however, obtain various information from different media; accordingly, information among consumers is asymmetric. Second, most studies examine only the diffusion of products. Considering asymmetric information and the variety of consumer preferences, it is necessary to examine not only product diffusion but also consumers' behavior. We developed a model of a market with network externalities that introduces asymmetric information, and used it to analyze consumers' behavior with game theory, multi-agent simulation, and experiments with human subjects. This paper is organized as follows. Section 2 explains our model. The set of market equilibria is characterized in section 3. Results of multi-agent simulation and of experiments with human subjects are presented in sections 4 and 5. We conclude in section 6.

2. The Model

We explain the model of a product market, a formulation of network externalities, and consumers' behavior under asymmetric information, which is an extension of the model constructed in Davis et al. [5].

2.1. Product Market

We constructed a model of a product market with network externalities. Consumers and multiple products (A, B) exist in the market. Some consumers prefer product A; others prefer product B. A consumer's utility from using a product depends on the number of users of the same product because of network externalities. Only consumers exist in the market: for simplicity, no producers are included. It is assumed that the price of product A equals that of product B; both prices are constant.

2.2. Network Externalities

Each consumer chooses a product for which

  (Utility) = Rp × Ne − (Price)    (1)
is maximized [5]. Here, Rp indicates the reservation price and Ne represents the number of users of each product. Equation (1) expresses that a consumer's utility increases when more consumers purchase the identical product. The set of a consumer's reservation prices for the products represents his or her preference. Table 1 shows an example of Rp: the maximum amount a consumer who prefers product A will pay for product A is $8; that for product B is $3.

Table 1. An exemplary reservation price

                                        Product A    Product B
  Rp of a consumer who prefers A (B)    $8 ($3)      $3 ($8)
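The choice rule implied by Eq. (1) can be sketched in a few lines of Python, using the Table 1 reservation prices and an assumed price of $25. This is an illustrative reading, not the authors' code; in particular, counting the buyer among the Ne users of the chosen product is our assumption.

```python
# Hypothetical sketch of the product choice implied by Eq. (1):
# a consumer buys the product maximizing Utility = Rp * Ne - Price,
# and buys nothing if no product yields positive utility.

def choose_product(rp, n_users, price):
    """rp: reservation price per product; n_users: current users per product."""
    # assumption: the buyer is counted among the Ne users of the product bought
    utilities = {p: rp[p] * (n_users[p] + 1) - price for p in rp}
    best = max(utilities, key=utilities.get)
    return best if utilities[best] > 0 else None  # None = buy nothing

# Consumer who prefers A (Table 1): Rp = $8 for A, $3 for B; assumed price $25.
choice = choose_product({"A": 8, "B": 3}, {"A": 4, "B": 0}, 25)
print(choice)  # with 4 existing users of A: 8 * 5 - 25 = 15 > 0, so "A"
```

With no existing users the same consumer buys nothing, since a lone purchase yields 8 × 1 − 25 < 0.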
2.3. Consumers' Behavior under Asymmetric Information

Figure 1 summarizes consumers' behavior. A consumer's behavior in the market with network externalities is modeled as follows:
• A consumer has a reservation price for each product.
• Each consumer has different information concerning others' purchase decisions. For example, one consumer may know the decisions of only three of ten consumers, whereas another may know eight of them.
• A consumer decides to buy product A, product B, or nothing.
• A consumer derives zero utility from buying nothing.
Figure 1. Consumers’ purchase behavior model
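The behavior model above can be made concrete with a small sketch: a consumer forms an expected number of users for each product from the subset of decisions he or she can observe, then applies Eq. (1). The linear extrapolation and all names below are illustrative assumptions, not the paper's exact estimator.

```python
def expected_utilities(rp, observed, n_total, n_observed, price):
    """rp: reservation prices; observed: the decisions visible to this consumer."""
    eu = {}
    for p in rp:
        # linearly extrapolate the observed users of p to all n_total consumers
        est_users = observed.count(p) * n_total / max(n_observed, 1)
        eu[p] = rp[p] * est_users - price
    return eu

# A consumer who prefers A sees 3 of 10 decisions: two bought A, one bought B.
eu = expected_utilities({"A": 8, "B": 3}, ["A", "A", "B"], 10, 3, 25)
# buy the best product only if its expected utility is positive
best = max(eu, key=eu.get)
choice = best if eu[best] > 0 else None
```

Here the extrapolated user base of A (about 6.7 of 10 consumers) makes buying A worthwhile, while B's expected utility stays negative.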
3. Theoretical Analysis

This section shows results of theoretical analyses of product diffusion in three ways: consumers' utility maximization, social surplus maximization, and Nash equilibria. Table 2 defines the parameters; we assume that they satisfy conditions (2) and (3).

Table 2. Definitions of parameters for theoretical analysis

  N       Total number of consumers
  r^A     Number of consumers who prefer product A
  r^B     Number of consumers who prefer product B
  R^J_a   Reservation price for product A held by consumers who prefer product J (J = A, B)
  R^J_b   Reservation price for product B held by consumers who prefer product J (J = A, B)
  P       Product price
  n_a     Number of consumers who use product A
  n_b     Number of consumers who use product B
  n_0     Number of consumers who use nothing

  P > R^A_a (= R^B_b) > R^A_b (= R^B_a)    (2)
  N = r^A + r^B = n_a + n_b + n_0          (3)
3.1. Consumer's Utility Maximization

We classified nine cases and examined product diffusion for maximizing each consumer's utility 1. Table 3 shows the respective users in each case.

1 Although the details of the equations are omitted, these cases are classified by inequalities of R^J_a, R^J_b, N, r^J and P.
Table 3. Number of users, in equilibrium, for whom consumer's utility is maximized

  Case 1: (n_a, n_b, n_0) = (0, 0, N)
  Case 2: (0, 0, N)
  Case 3: (r^A, 0, r^B)
  Case 4: (0, r^B, r^A)
  Case 5: (r^A, r^B, 0)
  Case 6: (r^A, r^B, 0)
  Case 7: (N, 0, 0)
  Case 8: (0, N, 0)
  Case 9: (N, 0, 0) or (0, N, 0)
In Cases 1 and 2, no one purchases a product because products are expensive. In Case 1 in particular, no consumer can obtain positive utility even if all consumers purchase the identical product. In Case 2, by contrast, some consumers obtain positive utility and others obtain negative utility when all buy the identical product. In Case 3, consumers who prefer product A purchase product A, while consumers who prefer product B never purchase any product; Cases 3 and 4 are symmetric. In Cases 5 and 6, each consumer buys the product matching his or her preference. In Case 5, consumers who buy the product opposite to their preference always obtain negative utility; in Case 6 they do not necessarily do so. All consumers purchase one kind of product in Cases 7, 8 and 9; accordingly, some consumers buy products opposite to their preferences.

3.2. Social Surplus Maximization

This section examines product diffusion that maximizes social surplus in each case obtained in section 3.1. Social surplus means the sum of all consumers' utilities. Table 4 shows the results.

Table 4. Number of users, in equilibrium, for whom social surplus is maximized

  Case 1: (n_a, n_b, n_0) = (0, 0, N)
  Case 2: (N, 0, 0) or (0, N, 0)
  Case 3: (N, 0, 0)
  Case 4: (0, N, 0)
  Case 5: (N, 0, 0) or (0, N, 0)
  Case 6: (N, 0, 0) or (0, N, 0)
  Case 7: (N, 0, 0)
  Case 8: (0, N, 0)
  Case 9: (N, 0, 0) or (0, N, 0)
In Case 1, the social surplus is maximized when no product is purchased, because the product price is overly expensive; the social surplus then equals zero. In every other case, monopolistic diffusion of one product maximizes the social surplus. In Cases 2, 5, 6 and 9, social surplus is maximized whichever of products A and B diffuses. In Cases 3, 4, 7 and 8, however, only one kind of product should diffuse, because the distribution of consumers' preferences is biased. Compared with the diffusion in section 3.1, these results indicate that the private incentives of consumers do not coincide with the social incentives (Cases 2, 3, 4, 5 and 6).

3.3. Nash Equilibria

We examined the Nash equilibria in each case obtained in section 3.1. Table 5 shows those results. Our analyses revealed the following points.
Table 5. Nash equilibria (for each of Cases 1–9, the equilibrium user distributions (n_a, n_b, n_0); the no-purchase outcome (0, 0, N) is an equilibrium in every case, and the starred equilibria hold only if R^A_a · r^A ≥ P)
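Whether a given user distribution is a Nash equilibrium can be verified by a unilateral-deviation check: no single consumer may gain by switching products or by dropping out. The sketch below is an illustrative implementation under Eq. (1) (again assuming a buyer is counted among the users of the product bought); it is not the authors' code.

```python
def is_nash(consumers, P):
    """consumers: list of (rpA, rpB, action), action in {"A", "B", None}."""
    counts = {"A": 0, "B": 0}
    for _, _, act in consumers:
        if act is not None:
            counts[act] += 1

    def u(rpA, rpB, act, counts):
        if act is None:
            return 0                      # buying nothing yields zero utility
        return (rpA if act == "A" else rpB) * counts[act] - P

    for rpA, rpB, act in consumers:
        current = u(rpA, rpB, act, counts)
        for alt in ("A", "B", None):
            if alt == act:
                continue
            c = dict(counts)              # recount users after the deviation
            if act is not None:
                c[act] -= 1
            if alt is not None:
                c[alt] += 1
            if u(rpA, rpB, alt, c) > current:
                return False              # a profitable deviation exists
    return True

# Case 6 (Ra=8, Rb=3, P=25, rA=rB=5): both the A-monopoly and the
# no-purchase outcome pass the check (a lone buyer would lose money).
monopoly = [(8, 3, "A")] * 5 + [(3, 8, "A")] * 5
nobody = [(8, 3, None)] * 5 + [(3, 8, None)] * 5
print(is_nash(monopoly, 25), is_nash(nobody, 25))  # True True
```

This illustrates the last bullet below: several distinct distributions can be equilibria at once, and not all of them maximize social surplus.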
• Product diffusion in each case of section 3.1 is one of the Nash equilibria.
• In addition to the results of section 3.1, extra patterns of product diffusion were derived.
• Although some Nash equilibria maximize social surplus, product diffusion that maximizes social surplus is not always a Nash equilibrium.

4. Multi-agent Simulation

We analyzed decision-making under asymmetric information with multi-agent simulation. Cases 1 and 2 were not examined because no consumer purchases any product in their equilibria. Furthermore, Cases 3 and 4 and Cases 7 and 8 are symmetric and essentially identical. For those reasons, this study simulated Cases 3, 5, 6, 8 and 9.

4.1. Agents' Decision Making

A consumer agent behaves in the following manner.
1. A consumer agent prefers product A or product B.
2. The number of users that each consumer agent can observe differs from agent to agent.
3. The N agents make decisions sequentially and can observe the results of previous agents' selections.
4. Consumer agents calculate expected utilities linearly in one of the following two modes: (a) using information on the preference distribution, or (b) without using information on the preference distribution.
5. The agent purchases the product that maximizes expected utility if that utility is positive; otherwise, the agent purchases nothing.

4.2. Simulation Results

Table 6 presents the parameters used for the simulations. Table 7 summarizes the simulation results. Three features are remarkable.
• Information cascades: regardless of preference, a consumer mimics others' decision-making [6].
• Social surplus: the sum of all consumers' utilities.
• Information and utility: the relation between the number of users a consumer can observe and that consumer's average utility. Figure 2 shows this relation for Cases 5 and 6.

Table 6. Parameters in simulation (N = 10 in all cases)

  Case    R^A_a    R^A_b    P     r^A    r^B
  3       8        3        35    7      3
  5       8        3        35    5      5
  6       8        3        25    5      5
  8       8        4        25    3      7
  9       5        3        25    5      5

Table 7. Summary of simulation results (IC: information cascades, SS: social surplus)

          Without considering preference distribution     Considering preference distribution
  Case    IC     SS      Information and utility          IC     SS      Information and utility
  3       No     39%     information⇒utility              Yes    34%     nothing
  5       Yes    5%      information⇒utility              Yes    3%      information⇒utility
  6       Yes    68%     information⇒utility              Yes    70%     information⇒utility
  8       Yes    85%     nothing                          Yes    100%    nothing
  9       Yes    100%    nothing                          Yes    38%     nothing

Figure 2. Relation between information and utility: (a) simulation, Case 5; (b) simulation, Case 6
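The cascade mechanism summarized in Table 7 can be reproduced with a toy sequential market. The seeded history and the linear estimator below are our illustrative assumptions (mode (b) of section 4.1, without the preference distribution); parameters are Case 6 of Table 6.

```python
def decide(pref, seen, N, Ra=8, Rb=3, P=25):
    """One agent's purchase decision from the decisions it can observe."""
    rp = {"A": Ra, "B": Rb} if pref == "A" else {"A": Rb, "B": Ra}
    counts = {"A": seen.count("A"), "B": seen.count("B")}
    scale = N / max(len(seen), 1)        # linear extrapolation to all N consumers
    eu = {p: rp[p] * counts[p] * scale - P for p in ("A", "B")}
    best = max(eu, key=eu.get)
    return best if eu[best] > 0 else None

history = ["A", "A", "A"]                # assumed early purchases of product A
for pref in ["B", "B", "A", "B"]:        # later agents observe the full history
    history.append(decide(pref, history, N=10))
print(history)  # every later agent buys A, including those preferring B
```

The B-preferrers join the cascade: the extrapolated user base of A outweighs their own preference, exactly the mimicking behavior the table reports.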
First, information cascades are observed in all cases except Case 3 (without considering the preference distribution). For example, a consumer who prefers product B purchases product A, following other consumers' decision-making, to increase his or her utility. Second, in almost all cases, the social surplus is inefficient. Accordingly, the social surplus is not optimal even though consumers try to maximize their own utilities. We infer two reasons.
• Product diffusion that maximizes each private utility differs from that which maximizes social surplus (Cases 3, 5 and 6).
• Agents fail to make correct decisions because of asymmetric information.
Finally, we describe the relation between information and utility. In Case 5, more information increases utility (Fig. 2(a)). Agents tend to obtain negative utilities because products are expensive. Their utilities decrease if the products which the initial agents purchase
do not diffuse in the market. On the other hand, more information decreases utility in Case 6 (Fig. 2(b)). In this case, the initial agents purchase products according to their preferences, and later agents buy the identical products through information cascades. Consequently, the initial agents' utilities are higher than those of later agents.

5. Experiments with Human Subjects

5.1. Settings of Experiments

We conducted experiments with human subjects based on the methods of experimental economics [7]. The parameters were identical to those of the simulation cases. The settings were as follows.
• Only one human subject existed in each market; the other consumers were replaced with computerized agents.
• In the experiments using preference-distribution information, the other consumers were replaced with agents that consider it.
• In the experiments without preference-distribution information, they were replaced with agents that do not consider it.

5.2. Results of Experiments

Table 8 summarizes the results of the experiments with human subjects. Figure 3 shows the relation between information and utility (Cases 5 and 6). Table 9 shows private utility and social surplus.

Table 8. Summary of experimental results (IC: information cascades, SS: social surplus)

          Without considering preference distribution     Considering preference distribution
  Case    IC     SS      Information and utility          IC     SS      Information and utility
  3       Yes    39%     information⇒utility              Yes    35%     nothing
  5       Yes    5%      information⇒utility              Yes    0%      information⇒utility
  6       Yes    62%     information⇒utility              Yes    66%     information⇒utility
  8       Yes    80%     information⇒utility              Yes    100%    nothing
  9       Yes    95%     nothing                          Yes    26%     nothing
Figure 3. Relation between information and utility: (a) experimental results in Case 5; (b) experimental results in Case 6

Table 9. Effect on all consumers of one consumer's decision-making: Case 6 (without information of the preference distribution)

                   Private utility    Social surplus
  Agent            96%                64%
  Human subject    84%                99%
First, information cascades were also observed in the experiments with human subjects: consumers in the real world buy the same products as others do. Second, the social surplus is inefficient in almost all cases; compared with the simulation results it is even less efficient, because the human subjects behave irrationally despite obtaining the same information as the agents in the simulation. The private utility of the human subjects is worse than that of the computerized agents. However, the social surplus in a market containing a human subject can be more efficient (Table 9), perhaps because human subjects fail to maximize their own utilities and consequently increase other consumers' utilities. Finally, we describe the relation between information and utility. The experiments yielded nearly the same results as the simulation. In Case 6 (Figure 3(b)), however, the consumer who could observe 8 users obtained more utility than the computerized agent, which indicates that he or she made purchase decisions more rational than the linear expectation.

6. Concluding Remarks

In this paper, we analyzed consumers' purchase decisions under asymmetric information in a product market with network externalities. Introducing asymmetric information, whereby each consumer has different information about product diffusion, we developed a model of a market in which network externalities are present. Our analyses using game theory, multi-agent simulation, and experiments with human subjects indicated the following salient points.
• Monopolistic diffusion arises when consumers neglect their preferences and buy the same products as others do.
• Social surplus is especially inefficient when the private incentives of consumers differ from the social incentives.
• Irrational decision-making decreases a consumer's own utility. However, it might increase the social surplus.
• In cases where products are expensive, more information increases consumers' utility.
On the other hand, in cases where products are cheap, less information increases consumers' utility, because the products that are purchased first diffuse in the market through information cascades.

References
[1] J. Church and N. Gandal: Network Effects, Software Provision, and Standardization, The Journal of Industrial Economics, Vol. 40 (1992), No. 1, pp. 85–103.
[2] M. L. Katz and C. Shapiro: Product Introduction with Network Externalities, The Journal of Industrial Economics, Vol. 40 (1992), No. 1, pp. 55–83.
[3] C. Ruebeck, S. Stafford, N. Tynan, W. Alpert, G. Ball and B. Butkevich: Network Externalities and Standardization: A Classroom Demonstration, Southern Economic Journal, Vol. 69 (2002), No. 4, pp. 1000–1008.
[4] S. H. Oda, K. Iyori, K. Miura and K. Ueda: The Application of Cellular Automata to the Consumer's Theory: Simulating the Duopolistic Market, Simulated Evolution and Learning, Springer (1999), pp. 454–461.
[5] D. D. Davis and C. A. Holt: Experimental Economics, pp. 249–282, Princeton University Press, Princeton, N.J., 1992.
[6] L. R. Anderson and C. A. Holt: Information Cascades in the Laboratory, The American Economic Review, Vol. 87 (1997), No. 5, pp. 847–862.
[7] D. Friedman and S. Sunder: Experimental Methods: A Primer for Economists, Cambridge University Press, Cambridge, 1994.
Part 9 Dynamics, Morphology, and Materials in Intelligent Behavior
Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling Simon Bovet 1 Artificial Intelligence Laboratory, University of Zurich Abstract. When natural intelligent systems are studied with artificial agents, control architectures and body dynamics are usually designed for specific tasks. This contrasts with natural evolution, considered to have no goal or intention. An alternative approach consists in investigating how emergent behaviors can be observed with systems not designed for any particular task. This paper shows how two different homing strategies observed in insects (path integration and landmark navigation) can emerge from a general model of autonomous agent control. This recently proposed control architecture consists of homogeneous, non-hierarchical sensorimotor coupling. The synaptic coupling between each pair of modalities uses a Hebbian learning rule that captures correlations between sensory or motor activity generated by the agent itself. Activity is allowed to propagate across the different modalities, thus generating the observed behaviors. The experiments are performed both in simulation and with a real robot. Keywords. Emergent behaviors; Sensorimotor coordination; Insect navigation
1. Introduction

In the endeavor of building "intelligent" agents, designers very often decide to concentrate on a specific task the agent has to perform, towards which much of the design is actually biased. For instance, consider an agent built by a researcher focusing on bipedal locomotion: it is very likely that no one will be surprised (nor will really argue) if the robot is equipped with two "legs", if its control model contains some sort of central pattern generator, or if some proprioceptive information from the legs or the feet is used as feedback to modulate parameters of the pattern generator. Also, providing the agent with an ambient temperature sensor will most likely be considered irrelevant or off-topic. On the one hand, there is no need to argue that this scientific approach – i.e. designing agents for specific tasks – has led and continues to lead to substantial progress and understanding. On the other hand, several difficulties of this approach have been recognized [1]. To mention just a few: What kind of sensory or motor information is relevant and should be used for a given task? How should a particular observed behavior be implemented? Very often, designers settle at least partially on ad-hoc solutions tailored in actual fact for

1 Current address: AI Laboratory, Andreasstrasse 15, 8050 Zurich, Switzerland. E-mail: bovet@ifi.unizh.ch
Figure 1. (a) General structure of the control architecture model. Each pair of sensory or motor modalities is fully connected by synaptic coupling (represented symbolically by the arrows). (b) Implementation of the model at the neural level for any sensory or motor modality. Arrows represent full connectivity from and to all other modalities. See text for details.
the task the agent has to perform. For example, designers almost always provide the control system of autonomous robotic agents with a set of basic behaviors such as reflexes, or with a general hierarchical structure in the control architecture. However, it is not clear how arbitrary the designer's assumptions about how the external world is perceived by the situated agent actually are. Moreover, since most "intelligent" systems that we know and from which we often draw inspiration are in fact natural systems, it is legitimate to ask the following question: how much can we understand about intelligent systems that are products of natural evolution – i.e. that were not intentionally designed for any specific task – by investigating artificial agents precisely constructed to perform given tasks? Consequently, an alternative approach consists of building a system a priori not designed for any particular task, and observing what behaviors emerge from its interaction with the environment. The present work follows this approach with a focus on the control part of robotic agents. In [2] and [3], we proposed a new model of a homogeneous and non-hierarchical robot control architecture. We showed that by systematically coupling every pair of sensory or motor modalities with a structure kept as simple as possible (using only Hebbian learning), the interaction of the agent with the environment can lead to the emergence of coherent behaviors, such as the agent approaching and following an object without any built-in reflex, or solving a delayed reward task with no sensory memory.

The present paper extends this initial research and shows that, for a situated agent using the same control architecture model, sufficiently rich body dynamics lead to the emergence of two different navigation strategies observed in insects such as bees and ants [4]: one relying on egocentric information (dead reckoning or path integration [5,6]), the other using a geocentric system of reference (Snapshot model [7] or Average Landmark Vector model [8]).
2. Model

The control architecture model used in this work is described in detail in [2,3]. Its essential aspects are briefly discussed here.
This model consists essentially of a systematic and homogeneous synaptic coupling of each pair of sensory or motor modalities, as shown in Figure 1(a). The role of the synaptic coupling between each pair of sensory or motor modalities is twofold: first, it learns the correlations of simultaneous activity in the respective modalities (with a simple Hebbian learning rule); second, it allows activity to propagate from one modality to the other, according to correlations learned by the agent itself. The activity thus induced in the motor modalities generates actions, leading to observable behaviors of the agent. Note that at this level, no distinction is made between sensors and motors. Moreover, there is no hierarchical structure that could bias how the external world is perceived by the robot: the agent has no built-in knowledge about any relation between sensory input and motor action. In our model, each modality consists of five populations of real-valued linear artificial neurons (see Figure 1(b)): 1. Current State: the activity of the neurons in this population reflects the current reading of the corresponding sensor or motor. In the visual modality for instance, the activity of each neuron corresponds to the brightness of a pixel in the camera image. 2. Delayed State: like the current state, but delayed by a small amount of time, typically one time step. 3. “Desired”1 State: contains as many neurons as the current state, but is only activated by the synaptic input coming from all other sensory or motor modalities. 4. State Change: a population of neurons whose activity reflects a change in the current state, i.e. a difference between activity in the delayed and current states. For example, the activity of this population corresponds in the visual modality to the visual flow2 in the camera image. 5. “Desired” Change: like the state change, but taking this time the difference of activity between the current and “desired” states. 
The activity of the neurons in this population propagates through the synaptic couplings to the neurons in the "desired" state of all other sensory or motor modalities. The synaptic coupling between each pair of sensory or motor modalities fully connects the neurons in the "desired" change population of the first modality to the neurons in the "desired" state population of the second modality. The weights (initially set to zero) are continually modified by a simple Hebbian learning rule using the activity of the current state and state change populations in the first and second modalities, respectively. A detailed mathematical description of the model is provided in the appendix.
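A minimal numerical sketch of one such coupling (our notation and toy sizes, not the authors' implementation): Hebbian weights correlate the first modality's current state with the second modality's state change, and a "desired change" in the first modality then evokes a "desired" state in the second.

```python
import numpy as np

class Coupling:
    """Synaptic coupling from modality 1 to modality 2 (cf. Figure 1)."""
    def __init__(self, n1, n2, eta=0.5):
        self.eta = eta
        self.w = np.zeros((n2, n1))           # weights start at zero

    def learn(self, current_1, change_2):
        # Hebbian rule: correlate modality 1's current state with
        # modality 2's state change
        self.w += self.eta * np.outer(change_2, current_1)

    def propagate(self, desired_change_1):
        # activity propagates to modality 2's "desired" state population
        return self.w @ desired_change_1

# toy example: a 2-neuron motor modality coupled to a 4-pixel visual modality
c = Coupling(n1=2, n2=4)
c.learn(current_1=np.array([1.0, 0.0]),              # motor reading
        change_2=np.array([0.2, 0.0, -0.1, 0.0]))    # visual flow it coincided with
# a "desired" motor change now evokes the visual change it used to accompany
print(c.propagate(np.array([1.0, 0.0])))             # [0.1, 0., -0.05, 0.]
```

The same structure is instantiated between every pair of modalities, with no distinction between sensors and motors.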
3. Experiments

3.1. Agent and Environment

Experiments are performed both in simulation and in the real world with the mobile robot shown in Figure 2(a). The agent is situated in a flat environment similar to the ecological niche

1 The term "desired" is purely arbitrary, chosen only to help the reader better understand the model.
2 The visual flow is defined here as the ratio of the intensity change of one pixel over the gradient of neighboring pixel intensity. If the absolute value of the gradient is below a given threshold, the flow is set to zero.
Figure 2. (a) The mobile robot “Samurai” used for the real world experiments, equipped with an omnidirectional camera and differential steering. (b) Image processing for the visual input. The image from the 360◦ camera (top) is used to extract the visual input for the system (300 × 1 pixel, bottom). (c) Motor input of the system. Each component is represented by a radial bar whose length indicates the magnitude, and whose position indicates the direction. The gray arrow represents the resulting net input. See text for details.
of a desert ant. The home location – corresponding to the "nest" – consists of a dark hole in the ground. An external light and heat source – similar to the sun – shines upon the whole surroundings. The agent is equipped with the following sensors:
1. Omnidirectional camera: the one-dimensional 360° view is obtained by thresholding, averaging and low-pass filtering a portion of the polar-transformed image from the camera, as shown in Figure 2(b). The polar image is aligned to a geocentric coordinate system by using a compass 3.
2. Temperature sensor: since the agent is constantly exposed to the external heat source when out of its nest, its temperature constantly increases with time.
3. Light sensor: detects light only when the agent is outside the nest hole 4.
The motor modality of the agent is a crucial aspect of the following experiments and needs a more detailed description. Even though the experiments are performed in the real world with a robot equipped with differential steering, a redundant motor representation is chosen. Instead of a two-component input indicating the left and right motor speeds, the input consists of several components indicating the magnitudes of "speed vectors" applied to the agent in equally spaced azimuth directions, as shown in Figure 2(c). The different vectors applied together produce a net speed vector, which is finally transformed to the two wheel motors in the following way: if the agent is aligned with the direction of the net speed vector, it moves forward proportionally to the vector's magnitude; otherwise, it turns towards its direction.

3 With the real robot, the compass value is obtained by triangulation, using the omnidirectional camera image and three salient landmarks of known positions.
4 Obviously – no holes were bored in the floor of the indoor office arena! – the temperature and light sensors are simulated on the real robot.
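The mapping from the redundant speed-vector representation to the differential drive can be sketched as follows; the turning gain and alignment threshold are our illustrative choices, not values from the paper.

```python
import math

def net_speed_vector(magnitudes):
    """Sum speed vectors of given magnitudes in equally spaced azimuths."""
    n = len(magnitudes)
    vx = sum(m * math.cos(2 * math.pi * k / n) for k, m in enumerate(magnitudes))
    vy = sum(m * math.sin(2 * math.pi * k / n) for k, m in enumerate(magnitudes))
    return vx, vy

def wheel_speeds(heading, magnitudes, k_turn=2.0, align_tol=0.1):
    """Transform the net speed vector into (left, right) wheel speeds."""
    vx, vy = net_speed_vector(magnitudes)
    target = math.atan2(vy, vx)
    # signed heading error, wrapped to (-pi, pi]
    err = (target - heading + math.pi) % (2 * math.pi) - math.pi
    if abs(err) < align_tol:
        v = math.hypot(vx, vy)            # aligned: drive forward
        return v, v
    return -k_turn * err, k_turn * err    # otherwise: turn toward the vector

left, right = wheel_speeds(0.0, [1.0, 0.0, 0.0, 0.0])   # already aligned
```

With four azimuth components and all the magnitude in direction 0, an agent heading that way drives straight; a misaligned agent rotates toward the net vector first.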
Figure 3. Trajectories of the agent whose behavior corresponds to an egocentric homing strategy (path integration). The solid circle indicates the home location. The dot specifies the location where the “homing behavior” is triggered. In (d), (e) and (f), the agent was displaced to a different location (dotted arrow); when released, the agent returns to a fictive home location corresponding to the integrated path (dotted circle). (a) and (d) show trajectories obtained in simulation, (b) and (e) plot trajectories of the real robot, (c) and (f) show for comparison trajectories of a real desert ant (adapted from [9] with permission).
Figure 4. Path integration homing behavior. (a) Graphical representation of the synaptic weights connecting the temperature modality to the motor modality, corresponding to the run shown in Figure 3(e), just before the homing behavior is triggered. (b) Propagation of activity across the temperature and motor modalities. The activity in each population of neurons is illustrated symbolically. See text for details.
3.2. Emergence of Egocentric Navigation Strategy

During the initial foraging excursion, the neurons in the "desired" state population of the motor modality are externally activated with random values. The agent therefore moves around randomly in the environment. At a given point, the activity of the "desired" state neuron in the temperature modality is set to an arbitrarily low value, and the agent is left to move on its own. The resulting trajectories – obtained either in simulation or with the real robot – are shown in Figure 3(a), (b), (d) and (e): the agent returns on an almost straight course to the starting location and stops there. If the agent is first displaced and released from a different location, it returns to a fictive home location corresponding to the integrated path.
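The integrator effect underlying this behavior can be previewed with a toy calculation: when the temperature state-change neuron is constantly active, the Hebbian weight update accumulates the motor readings step by step, so the temperature-to-motor weights end up storing the summed (integrated) path. All numeric values here are illustrative:

```python
eta = 1.0                                      # learning rate (illustrative)
weights = [0.0, 0.0]                           # temperature -> motor (x, y) weights
path = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]    # motor readings, one per time step

temp_change = 1.0                              # temperature rises at every step
for mx, my in path:
    # Hebbian update: with constant presynaptic activity it reduces to a sum
    weights[0] += eta * temp_change * mx
    weights[1] += eta * temp_change * my

print(weights)  # [2.0, 1.0] -- the time-integrated motor readings
```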
This behavior thus matches almost exactly what is described in the literature as the egocentric homing strategy (path integration; see Figure 3(c) and (f), or e.g. [5,6,9]). A consideration of the coupling between the motor and temperature modalities explains the emergence of this behavior. In the described environment, the value of the temperature sensor constantly increases, so the state change neuron in the corresponding modality displays a constant activity. As a consequence, the Hebbian learning rule modifying the synaptic weights of the connections from the temperature modality in fact acts like an integrator. The values of the weights connecting the temperature modality to the motor modality are therefore the time-integrated values of the motor readings, i.e. the actual integrated path (see Figure 4(a)). Activity in the temperature "desired" change neuron (provoked by reducing the activity of the "desired" state neuron) will thus propagate into the motor modality according to the integrated path, generating the homing behavior (see Figure 4(b)).

3.3. Emergence of Geocentric Navigation Strategy

In this second experiment, we observe the behavior of the agent provoked by setting the activity of the "desired" state neuron in the light modality to an arbitrarily low value. The resulting behavior is again quite surprising (see Figure 5(a) and (b)): irrespective of the release position, the agent returns to the home position and stops there. This behavior corresponds closely to what is known as geocentric homing strategies (landmark navigation; see Figure 5(c), or e.g. [7,8,10]). It is also interesting to note that the agent does not necessarily return along a straight trajectory, and most of the time it avoids obstacles on its route – an additional emergent behavior already observed with traditional visual homing strategies [10].
An explanation for the emergence of this behavior requires considering neuronal activity across three modalities, namely the light, vision and motor modalities. The neuron in the light modality corresponding to the change of detected light (i.e. in the state change population) is only active at the beginning of the foraging excursion, namely when the agent exits its dark hole and surfaces into the bright external environment. The Hebbian learning rule will thus capture a "snapshot" of the states of all the other modalities in the weights of the synapses leaving the light modality. Therefore, when the activity of the "desired" state neuron in the light modality is arbitrarily set to a low value (corresponding to darkness), the synaptic coupling will propagate activity into the "desired" state population of the visual modality, projecting, so to speak, the view of the environment as seen from the home location (Figure 6(a), inner ring). Now, if the agent is located anywhere else, the discrepancy between the neural activity corresponding to the perceived environment (current state) and the projected view ("desired" state) will lead to activity in the "desired" change of the visual modality, as shown in Figure 6(b). This activity will in turn propagate into the motor modality according to the correlation learned by the agent itself between motor activity and visual flow, thus leading to the observed homing behavior (see Figure 6(c)).
4. Discussion

This investigation sheds new light on the situated perspective of an agent performing a given behavior. It shows how two homing strategies emerged from a single control
Figure 5. Trajectories of the agent whose behavior corresponds to a geocentric homing strategy (landmark navigation). Objects are shown in dark gray and the circle marks the home location. Each trajectory corresponds to a different release position. (a) Simulation results. (b) Trajectories obtained with the real robot. (c) Trajectories of an agent using the Average Landmark Vector model (adapted from [10] with permission).
Figure 6. Visual homing behavior. (a) The difference between the visual current state (outer ring) and the "desired" state (inner ring) produces activity in the "desired" change corresponding to the visual flow indicated by the arrows. Landmarks are represented by black discs, the home location by the gray dot. (b) Propagation of activity across the light, vision and motor modalities. The activity in each population of neurons is illustrated symbolically. (c) Generated motor activity. The resulting behavior is a motion of the agent in the east direction, i.e. towards the home location. See text for details.
model, each strategy triggered by only changing the activity of one single neuron in two different sensory modalities: in the temperature modality (whose sensor value continually increases with time) for the egocentric strategy; or in the light modality (whose sensor value only changes when the agent leaves or returns to the home location) for the geocentric strategy. Note that the activation of these single neurons, presented here as an external intervention, can be generated by the agent itself. Suppose that the agent were provided with an additional sensory modality measuring its hunger (as a matter of fact an internal, situated value). The nest being a source of food, the model would learn – as the agent would eat inside the nest – a correlation between decrease of hunger and darkness as well as low temperature. As soon as the hunger level of the agent would exceed a given value during the foraging behavior, activity would propagate from the “desired” change of the hunger modality into both the light and temperature modalities, generating the same kind of activation as the one presented so far as external intervention, and thus triggering the homing behaviors.
This work also demonstrates that observable behaviors need not rely on specifically dedicated control "modules" – an instantiation of the well-known frame-of-reference problem [1]. It contrasts with other models of these navigation strategies proposed so far (for a survey, see e.g. [8,9,11,12,13]), which – to our knowledge – all present definite designer biases5 at the control level. Firstly, only activity in specific modalities (e.g. vision for geocentric strategies) is defined as system input. Secondly, all these models contain a specific "module" responsible for converting an explicit internal state (e.g. the components of the integrated home vector) into precise motor commands. An additional conceptual difficulty is related to the symbol-grounding problem: what triggers the homing behaviors? An explicit "go home" action seems indeed quite questionable from a biological and evolutionary perspective. In our model, the homing behaviors are produced by activity generated (either externally or by the agent itself) in sensory modalities that seem a priori quite irrelevant – an attribution that would certainly be made differently by an observer focusing only on the model without taking the agent-environment interaction into account.
5. Conclusion This work shows how emergent behaviors can be observed with situated agents using a control model not designed for any particular task. More specifically, it describes how two insect navigation strategies (path integration and landmark navigation) emerge from a recently proposed model of control architecture [2,3], consisting basically of homogeneous sensorimotor coupling using Hebbian-like synaptic plasticity. The presented set of experiments highlights the relevance of using an agent equipped with a number of different sensory modalities and redundant body dynamics, engaged in a sufficiently rich environment (a concept also referred to as morphological computation [15]). This also stresses the importance of taking into account a priori irrelevant sensors (such as temperature or light sensors), which turn out to play a critical role in grounding the observer-based concept of “homing” behavior.
Acknowledgment This research is supported by the IST-2000-28127 European project (AMOUSE). Thanks to Hanspeter Kunz for kindly providing the “Samurai” robot platform and to Miriam Fend for helpful comments on the paper.
Appendix: Mathematical description of the model

Let $x^M = (x^M_1, x^M_2, \ldots)$ be the current state of a modality $M$. The state change $y^M$ is defined at any time step $t$ by means of the delayed state $\breve{x}^M(t) := x^M(t - \tau)$:

$$y^M(t) := \delta^M(\breve{x}^M(t), x^M(t))$$

5 Even if some specific parts have been shown to be learnable by the agent itself [14].
where $\tau > 0$ is a small time delay and $\delta^M(\cdot, \cdot)$ a function computing a change between two states, typically $\delta^M(a, b) := b - a$. The synaptic weight matrix $W^{MN} = (W^{MN}_{ij})$ connecting any two different modalities $M$ and $N$ is initially set to zero, $W^{MN}(0) := 0$, and is updated with a Hebbian learning rule:

$$W^{MN}_{ij}(t + 1) := W^{MN}_{ij}(t) + \eta \cdot x^M_i(t) \cdot y^N_j(t)$$
with the learning rate $\eta > 0$. The "desired" state $\tilde{x}^M$ is defined as the synaptic input coming from all other modalities. Finally, the "desired" change $\tilde{y}^M$ is defined in a similar way as the state change $y^M$:

$$\tilde{x}^M(t + 1) := \sum_{N \neq M} W^{MN}(t) \cdot \tilde{y}^N(t)$$

$$\tilde{y}^M(t) := \delta^M(x^M(t), \tilde{x}^M(t))$$
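A minimal NumPy sketch of these update equations follows. The two-modality wiring, the sizes, the learning rate and the choice of one time step for the delay τ are illustrative assumptions, not values from the paper:

```python
import numpy as np

def delta(a, b):
    """delta^M(a, b) := b - a (the typical choice in the text)."""
    return b - a

class Modality:
    def __init__(self, size):
        self.x = np.zeros(size)        # current state x^M
        self.x_prev = np.zeros(size)   # delayed state (tau = one step)
        self.x_des = np.zeros(size)    # "desired" state
    @property
    def y(self):                       # state change y^M
        return delta(self.x_prev, self.x)
    @property
    def y_des(self):                   # "desired" change
        return delta(self.x, self.x_des)

def hebbian_step(W, pre, post, eta=0.01):
    """W_ij^{MN}(t+1) = W_ij^{MN}(t) + eta * x_i^M * y_j^N."""
    return W + eta * np.outer(pre.x, post.y)

def desired_state(weights_and_sources):
    """x~^M(t+1) := sum over N != M of W^{MN} . y~^N."""
    return sum(W @ src.y_des for W, src in weights_and_sources)
```

For two modalities, `hebbian_step` accumulates the correlation between, say, the temperature state and the motor change, and `desired_state` propagates a "desired" change back through those weights – the mechanism invoked in Sections 3.2 and 3.3.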
References
[1] R. Pfeifer and C. Scheier, Understanding Intelligence. Cambridge, Mass.: MIT Press, 1999.
[2] S. Bovet and R. Pfeifer, "Emergence of coherent behaviors from homogenous sensorimotor coupling," in Proc. 12th Int. Conf. on Adv. Robotics (ICAR), Seattle, 2005, pp. 324–330.
[3] S. Bovet and R. Pfeifer, "Emergence of delayed reward learning from sensorimotor coordination," in Proc. of the IEEE/RSJ Int. Conf. on Int. Robots and Sys. (IROS), Edmonton, 2005, pp. 841–846.
[4] R. Wehner, B. Michel, and P. Antonsen, "Visual navigation in insects: Coupling of egocentric and geocentric information," Journal of Experimental Biology, vol. 199, pp. 129–140, 1996.
[5] M. Collett and T. S. Collett, "How do insects use path integration for their navigation?" Biological Cybernetics, vol. 83, pp. 245–259, 2000.
[6] D. Andel and R. Wehner, "Path integration in desert ants, Cataglyphis: how to make a homing ant run away from home," Proc. R. Soc. Lond., vol. 271, pp. 1485–1489, 2004.
[7] B. A. Cartwright and T. S. Collett, "Landmark learning in bees," Journal of Comparative Physiology, vol. 151, pp. 521–543, 1983.
[8] D. Lambrinos, R. Möller, T. Labhart, R. Pfeifer, and R. Wehner, "A mobile robot employing insect strategies for navigation," Robotics and Autonomous Systems, special issue on Biomimetic Robots, vol. 30, pp. 39–64, 2000.
[9] M. Müller and R. Wehner, "Path integration in desert ants, Cataglyphis fortis," Proc. Natl. Acad. Sci. USA, vol. 85, pp. 5287–5290, 1988.
[10] V. V. Hafner, H. Kunz, and R. Pfeifer, "An investigation into obstacle avoidance as an 'emergent' behaviour from two different perspectives," in Proc. of the EPSRC/BBSRC Int. Workshop on Biol.-Inspired Robotics: The Legacy of W. Grey Walter, Bristol, 2002, pp. 166–173.
[11] M. O. Franz and H. A. Mallot, "Biomimetic robot navigation," Robotics and Autonomous Systems, vol. 30, pp. 133–153, 2000.
[12] A. S. Etienne and K. J. Jeffrey, "Path integration in mammals," Hippocampus, vol. 14, pp. 180–192, 2004.
[13] R. Möller, "Insect visual homing strategies in a robot with analog processing," Biological Cybernetics, vol. 83, pp. 231–243, 2000.
[14] V. V. Hafner and R. Möller, "Learning of visual navigation strategies," in Proc. of the Eur. Workshop on Learning Robots (EWLR-9), M. Quoy, P. Gaussier, and J. Wyatt, Eds., Prague, 2001, pp. 47–56.
[15] R. Pfeifer and F. Iida, "Morphological computation: Connecting body, brain and environment," Japanese Scientific Monthly, vol. 58, no. 2, pp. 48–54, 2005.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Active Learning of Local Structures from Attentive and Multi-Resolution Vision 1
Maxime Cottret a and Michel Devy a,2
a LAAS-CNRS, Toulouse, France

Abstract. In order to execute tasks, a robot needs a complex visual system to cope with the detection, characterization and recognition of places and objects. This paper presents ongoing developments on detection and characterization functions, integrated on a companion robot equipped with an omnidirectional color camera and a PTZ camera with pan, tilt and zoom controls. When moving in an indoor environment, our robot executes a pre-attentive process and an attentive one concurrently; the former extracts coarsely localized regions of interest (ROIs) from the omnidirectional images. The latter analyzes each ROI more accurately by focusing the PTZ camera, grabbing a succession of viewpoints and generating an appearance-based model of the region. Based on discriminant and invariant patches, this model provides a set of visual features which will be used to categorize region classes and to recognize region instances in subsequent images.

Keywords. local structures, attention, active vision, multi-resolution.
1. Introduction

In ten or twenty years, everyone will have a companion robot at home to perform some simple domotic tasks. The cognitive functions of such a robot are currently studied by several robotics teams, in the framework of the COGNIRON European project. Several testbeds have been integrated, with many sensors used to learn knowledge, to control navigation and manipulation tasks, and to interact with the user. The embedded system of a companion robot integrates functional and decisional capabilities, especially to learn knowledge about the environment, autonomously or interactively with a user. In order to generate motions, several spatial models must be learnt; at the most abstract level, the environment is described by places, labelled a posteriori by the user according to some semantic information (labels like kitchen, room, living room...) and connected in a topological graph. In order to control the execution of these motions, the robot will require other knowledge to locate itself, i.e. discriminant landmarks linked to the places where they have been detected, or characteristic images for every area. Classical free-space representations like occupancy grids could be generated in cluttered areas.

1 This work is funded by the FP6 European project COGNIRON: http://www.cogniron.org
2 Correspondence to: Michel Devy, LAAS-CNRS, 7 avenue du Colonel Roche, 31077 Toulouse Cedex 04, FRANCE. Tel.: +33 5 61 33 63 31; Fax: +33 5 61 33 64 55; E-mail: [email protected]
M. Cottret and M. Devy / Active Learning of Local Structures
A companion robot has to recognize, locate and grasp objects. First, a user could ask the robot to bring him an object (a pen, a glass, the TV remote control...) or to execute autonomously a more complex task that has been learnt a priori, like Water the plants. Moreover, the companion robot could execute by itself some simpler tasks, planned from an analysis of the user's behaviour to recognize his intention, e.g. Bring an object the user is looking for. Many learning functions required on a companion robot have been studied in the robotics community, taking advantage of the Bayesian framework to deal with uncertainties and noisy measurements: e.g. the automatic construction of stochastic maps [13] to represent landmarks detected from visual or laser data [5,15], the autonomous learning of topological models from panoramic images, including place categorization [17,14], or the automatic construction of a geometrical model of a 3D object to be grasped by the robot. The robot could learn these models interactively with the user: e.g. it could construct navigation maps following an operator who guides the robot during a home tour and gives a semantic label to every place [16], or it could construct the model of an object that the operator moves so that every aspect is viewed [4]. This paper gives preliminary results on a more autonomous strategy applied by our robot to learn local knowledge, from two different visual modalities: a color omnidirectional camera and an active camera with pan, tilt and zoom parameters that can be controlled by the system (PTZ camera). When learning the environment map from panoramic images, possibly following an operator, our robot will also detect and track regions of interest (ROIs): a ROI is a salient image region, i.e. a region that can be easily distinguished in a local neighborhood.
These ROIs correspond to local structures of the environment, discriminant and invariant enough to be tracked in an image sequence and to be exploited in further motions: either planar objects like quadrangles [5], or isolated 3D objects laid on the ground (chairs...) or on tables (glasses, plates...). From high resolution images acquired by the PTZ camera, the robot will build an invariant representation for such local structures, fusing data acquired from several viewpoints and asking the operator to label this entity for further interactions. So, two processes will be executed asynchronously to extract local knowledge: the panoramic images will be analyzed by a pre-attentive process, while high resolution images will be acquired and analyzed by the attentive process. Section 2 gives a short state of the art and details our general approach. In Section 3, the pre-attentive process is described: it provides a list of salient regions detected in low resolution images. Section 4 is devoted to the attentive process, for the categorization of local structures extracted from high resolution images acquired on every ROI. Section 5 gives preliminary results on the integration of these visual processes on our experimental robot. Finally, Section 6 summarizes our contribution and our current work.
2. General approach

In recent years, numerous authors in the vision community have provided contributions on cognitive vision, visual learning and object categorization. Some authors [1] propose neuronal approaches to analyze the whole image, but many others exploit only invariant features or patches. Numerous region descriptors have been proposed [9]:
Scale Saliency Patches [7] are mostly used in object recognition and categorization [3,2], while SIFT features [8] have become very popular in robotics [17]. J. Ponce and C. Schmid [11,12,10] build representations from patches inspired by the classical Harris points [9]. In these contributions, an object class is characterized by a set of invariant patches extracted from numerous images acquired from different viewpoints. The described entities generally correspond to a single aspect of a 3D object (the back side of a car, the right side of an airplane), as in [3]. Typically these authors do not address the localization problem, which is mandatory for a robot that will interact with the described object (landmark-based navigation, object grasping, visual servoing). Moreover, images are simple, with isolated objects during the learning step; only one passive visual modality is exploited to acquire images; only view-based representations are built, assuming that all patches extracted from every image belong to the same object. Here we take advantage of active and multi-focal vision to improve the learning of local structures extracted and tracked from sequences of complex images. At this step, the learning task is autonomous; interactions with the operator will be considered in a later step, in order to add semantics to the model. The robot moves along a known trajectory and, during this motion, executes two asynchronous visual processes concurrently. The pre-attentive process detects salient regions in low resolution images. It exploits quick operators [6] to create, from every low resolution image, a saliency map, segmented by a clustering method in order to generate regions of interest, memorized in a ROI list. The attentive process will learn only from these regions. It is executed asynchronously, taking the ROI list as input; it controls the PTZ camera to acquire high resolution images focused successively on the salient regions.
For each one, it generates an appearance-based representation, more precisely a set of discriminant scale-invariant patches [7]. This representation will then be used to categorize local structures into object classes, and to recognize object instances in subsequent images [3]. The pre-attentive process provides image-based information only for short-term memorization; on the contrary, the attentive process generates knowledge for long-term memorization. Such a structure could correspond to an isolated object; but in general it will be difficult to obtain a correct object segmentation from an image, so a user interaction will be required later for the final interpretation.
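The two asynchronous processes sharing a ROI list can be sketched with a producer-consumer pattern. This is a schematic stand-in, not the robot's actual software: the ROI fields, timing values and the learning step are placeholders.

```python
import queue
import threading
import time

roi_queue = queue.Queue()          # shared ROI list (short-term memory)

def pre_attentive(stop):
    """Fast loop: extract salient ROIs from each panoramic image."""
    frame = 0
    while not stop.is_set():
        roi_queue.put({"frame": frame, "bbox": (0, 0, 10, 10), "saliency": 0.9})
        frame += 1
        time.sleep(0.01)           # panoramic frame rate (illustrative)

def attentive(stop, model):
    """Slow loop: focus the PTZ camera on each queued ROI and learn from it."""
    while not stop.is_set() or not roi_queue.empty():
        try:
            roi = roi_queue.get(timeout=0.05)
        except queue.Empty:
            continue
        model.append(roi["frame"])  # stand-in for patch extraction + learning

stop = threading.Event()
model = []
threads = [threading.Thread(target=pre_attentive, args=(stop,)),
           threading.Thread(target=attentive, args=(stop, model))]
for t in threads:
    t.start()
time.sleep(0.1)
stop.set()
for t in threads:
    t.join()
```

The queue decouples the two rates: the pre-attentive producer never waits for the slower attentive consumer, which matches the asynchronous execution described above.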
3. Pre-attentive vision

This process takes place before any decision. The robot needs to detect highly salient regions in order to choose where to focus. Numerous definitions of visual saliency exist in the literature, such as those of Jagersand, Itti or Kadir. As our robot will interact with humans, it has to focus on the same discriminant features as humans. Therefore, our pre-attentive vision system is based on the saliency detector developed by L. Itti [6]: according to neurobiological studies on primates, only particular locations in the scene are processed in the visual cortex, in a bottom-up process and without a priori knowledge. These studies also showed that three types of visual components play the largest part in pre-attentive vision: color, intensity and orientation. Thus, after extraction of these components into three maps, they are put in competition and only highly informative locations remain, leading to a saliency map of the visual field.
Figure 1. Saliency map construction
Saliency map construction. Figure 1 summarizes the pipeline of operations from the input image to the saliency map. Three filters are applied to extract intensity, color and orientation discontinuities at several resolution levels of the input image. Three feature maps (one for each filter) are computed at several image resolutions, using center-surround comparisons, in order to model the difference in sensitivity between the center and the edge of the retina in the human visual system:

$$I(c, s) = |I(c) \ominus I(s)|$$
$$RG(c, s) = |(R(c) - G(c)) \ominus (G(s) - R(s))|$$
$$BY(c, s) = |(B(c) - Y(c)) \ominus (Y(s) - B(s))|$$
$$O(c, s, \theta) = |O(c, \theta) \ominus O(s, \theta)|$$

where $\ominus$ denotes the across-scale difference between a center scale $c$ and a surround scale $s$,
with $c \in \{2, 3, 4\}$ and $s = c + \delta$, $\delta \in \{3, 4\}$. So, the first step consists in computing Gaussian pyramids, with scale $\sigma \in [0..8]$, to detect discontinuities on the intensity $I(\sigma)$, the color $RG(\sigma)$ for $G - R$ and $BY(\sigma)$ for $B - Y$, and the orientation $O(\sigma, \theta)$, using Gabor filters with angles $\theta = 0°, 45°, 90°$ and $135°$. Feature maps are normalized and integrated to form a single conspicuity map for each visual attribute $I$, $G - R$, $B - Y$ and $O$. A non-linear normalization $N(\cdot)$ is applied to each conspicuity map to amplify peaks of contrast relative to noise in the background. In the final stage, conspicuity maps are combined at a given scale (here $\sigma = 3$) to produce a single saliency map $S$:

$$\bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(I(c, s))$$

$$\bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} |N(RG(c, s)) + N(BY(c, s))|$$

$$\bar{O} = \sum_{\theta \in \{0°, 45°, 90°, 135°\}} N\Big( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(O(c, s, \theta)) \Big)$$

$$S = \frac{N(\bar{I}) + N(\bar{C}) + N(\bar{O})}{3}$$
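The center-surround and across-scale combination steps can be sketched in NumPy as follows. This is a simplified illustration of the scheme, not Itti's implementation: a box-filter pyramid, a plain [0, 1] rescaling and strided subsampling stand in for the Gaussian pyramid, the non-linear N(.) and proper across-scale addition, and only the intensity channel is shown:

```python
import numpy as np

def gaussian_pyramid(img, levels=9):
    """Crude pyramid: 2x2 box blur and subsample by 2 at each level."""
    pyr = [img]
    for _ in range(levels - 1):
        a = pyr[-1]
        a = (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        pyr.append(a)
    return pyr

def center_surround(pyr, c, s):
    """|I(c) - I(s)| after upsampling the surround scale to the center scale."""
    f = 2 ** (s - c)
    surround = np.kron(pyr[s], np.ones((f, f)))
    return np.abs(pyr[c] - surround)

def normalize(m):
    """Simplified N(.): rescale to [0, 1] (the paper uses a non-linear boost)."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else m

def intensity_conspicuity(img):
    """Combine the six center-surround maps into one conspicuity map."""
    pyr = gaussian_pyramid(img)
    maps = [normalize(center_surround(pyr, c, c + d))
            for c in (2, 3, 4) for d in (3, 4)]
    target = maps[-1].shape          # combine at the coarsest center scale
    total = np.zeros(target)
    for m in maps:
        f = m.shape[0] // target[0]
        total += m[::f, ::f]         # strided subsampling down to target size
    return normalize(total)
```

The color and orientation channels follow the same pattern, and the final saliency map averages the three normalized conspicuity maps.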
Figure 2(b) presents the saliency map and the intermediate intensity, color and orientation conspicuity maps, computed on a low resolution image acquired in a kitchen: note that the most salient local structure is the coffee machine.
Figure 2. (a) input image, (b) saliency map, (c) intensity conspicuity, (d) color conspicuity, (e) orientation conspicuity (from Gabor filters)
Regions of interest extraction. Our selection algorithm treats the saliency map as a probability map for the purpose of Monte-Carlo selection, in order to extract the N most salient points. It uses rejection sampling, i.e. it randomly rejects points according to a randomly drawn threshold. The resulting point cloud, distributed over the most salient regions of the map, is then clustered in R3 (spatial position and saliency) using the K-means method, with K fixed a priori; we are evaluating a Mean Shift algorithm to avoid the classical problem of selecting K. Each region of interest is stored as the barycenter – weighted by the saliency – and the bounding box of the points of a resulting class, together with the mean saliency.

The pre-attentive process. While the robot moves through its environment, ROIs are detected using the omnidirectional camera and stored with a timestamp and a position tag. The order in which ROIs are treated by the attentive process follows the mean saliency weighted by:
– the distance from the current position of the robot: the ROI must be in the field of view of the PTZ camera;
– the distance from the current timestamp: even if most of the environment is almost static, some objects can be moved, involving a change in the interest of the ROI. So, we consider that the saliency of a ROI becomes insignificant with time.
ROIs are erased once treated, or if they become too old (the time threshold is manually fixed).
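The rejection-sampling and clustering steps described above can be sketched as follows. This is a pure-Python illustration: the ROI summary fields follow the text, but the sample count, K and all numeric choices are our assumptions:

```python
import random

def sample_salient_points(saliency, n):
    """Rejection sampling: keep a random pixel when its saliency exceeds
    a uniformly drawn threshold, so salient regions are sampled more often."""
    h, w = len(saliency), len(saliency[0])
    smax = max(max(row) for row in saliency)
    points = []
    while len(points) < n:
        y, x = random.randrange(h), random.randrange(w)
        if random.uniform(0, smax) < saliency[y][x]:
            points.append((x, y, saliency[y][x]))
    return points

def kmeans_rois(points, k, iters=20):
    """Cluster (x, y, saliency) points; each cluster becomes one ROI
    summarized by saliency-weighted barycenter, bounding box and mean saliency."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[i].append(p)
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    rois = []
    for cl in clusters:
        if not cl:
            continue
        w_total = sum(p[2] for p in cl)
        bary = (sum(p[0] * p[2] for p in cl) / w_total,
                sum(p[1] * p[2] for p in cl) / w_total)
        bbox = (min(p[0] for p in cl), min(p[1] for p in cl),
                max(p[0] for p in cl), max(p[1] for p in cl))
        rois.append({"barycenter": bary, "bbox": bbox,
                     "mean_saliency": w_total / len(cl)})
    return rois
```

Substituting a Mean Shift step for `kmeans_rois` would remove the a priori choice of K, as the authors note.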
4. Attentive vision

Once the robot "knows" where salient structures lie in its environment, it can address planning or decision-making problems of higher abstraction, in order to build a long-term model for these entities. For now, our developments are focused on the learning of appearance-based models; later, geometrical models will be built from the same image sequences.
Figure 3. Extracted features on focalized images
Our companion robot evolves in a cluttered environment, where plenty of partially occluded objects, hardly separable from the background, can be found. Therefore the robot's model has to be built and/or updated incrementally. It appears that global object descriptions like PCA or moments are not adapted to our purpose (they encode the whole shape and appearance of the object in one description). Therefore a local structure is modelled by a constellation of features.

Scale Saliency Patches. T. Kadir and M. Brady [7] introduce a local invariant descriptor based on the information-theoretic and scale-space frameworks. Each point $x$ of the input image is characterized by a descriptor vector $D$ that takes values $(d_1, \ldots, d_r)^T$. A local probability density function (pdf) $p_{x,s}$ is estimated using a histogram on a circular window, for a range of scales $s$ (radius). The entropy $H_{x,s}$ of this pdf is:

$$H_{x,s} = -\sum_i p_{x,s}(d_i) \log_2 p_{x,s}(d_i)$$

The local maximum of entropy over scale gives the patch characteristic scale $\bar{s}$. Thus, a local feature is given by a pixel $x$ with its scale $\bar{s}$ and a score $S$:

$$S = H_{x,\bar{s}} \cdot \sum_i |p_{x,\bar{s}}(d_i) - p_{x,\bar{s}-1}(d_i)|$$

In order to reduce the number of features, extracted features are clustered in R3 (spatial position and scale): a cluster of Scale Saliency Patches has a local support (nearby points with similar scores and scales). Each cluster must be sufficiently distant from all others. Thus, a local structure corresponds to such a cluster, given by the barycenter of a selected local support (mean pixel, mean scale and mean score): Figure 3 presents Scale Saliency Patches extracted from focalized views acquired on the coffee machine seen in the global view of Figure 2.

Figure 4. (top left) input from omnidirectional camera; (top right) saliency map (boxes mark centers of ROIs); (bottom) focalized image on the bottom right salient region

The attentive process. The attentive process focuses on the most salient ROI given by the pre-attentive process. For each ROI, Scale Saliency Patches are extracted and stored as a thumbnail cluster with a position tag, and labelled as a "location" of the environment. As described in [10], we use pixel correlation for thumbnail matching and thus "location" recognition: there is recognition if most features in the new viewpoint are matched with the location's ones. Then, the location's list of thumbnails is updated by merging the new viewpoint's thumbnails and eliminating those that never match. A coarse 3D localization can also be computed, using the positions of the different viewpoints, because the robot motion is known.
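The scale-selection computation can be sketched for a single pixel as follows. This is a simplified version of the scheme, not Kadir and Brady's implementation: a square window replaces the circular one, the grey-level distribution is exact rather than binned, and the score keeps only the inter-scale distribution change term:

```python
import math

def local_entropy(values):
    """Shannon entropy (bits) of the grey-level distribution in a window."""
    hist = {}
    for v in values:
        hist[v] = hist.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in hist.values())

def patch(img, x, y, s):
    """Square window of radius s around (x, y), clipped at the borders."""
    return [img[j][i]
            for j in range(max(0, y - s), min(len(img), y + s + 1))
            for i in range(max(0, x - s), min(len(img[0]), x + s + 1))]

def pdf(img, x, y, s):
    """Empirical grey-level distribution of the window at scale s."""
    vals = patch(img, x, y, s)
    return {v: vals.count(v) / len(vals) for v in set(vals)}

def characteristic_scale(img, x, y, scales=range(1, 6)):
    """Scale where the local entropy peaks, scored by the inter-scale
    change of the local distribution (cf. the score S in the text)."""
    entropies = {s: local_entropy(patch(img, x, y, s)) for s in scales}
    s_best = max(entropies, key=entropies.get)
    p1 = pdf(img, x, y, s_best)
    p0 = pdf(img, x, y, s_best - 1) if s_best > 1 else p1
    keys = set(p1) | set(p0)
    score = entropies[s_best] * sum(abs(p1.get(k, 0.0) - p0.get(k, 0.0))
                                    for k in keys)
    return s_best, score
```

For a bright square centered in a dark image, the entropy peaks at the radius where the window first mixes foreground and background, which is the local support that the clustering step then exploits.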
5. Preliminary results

So far, the operations involved in the pre-attentive and attentive processes have been evaluated on several images manually acquired at home. Only the pre-attentive process is integrated on a robot. Panoramic images are acquired while moving the robot with a joystick; salient regions must correspond to the same areas in the environment, and we are studying how to match salient regions extracted from a sequence of panoramic images. The omnidirectional camera is coarsely calibrated, so we can compute a 3D direction from each ROI's center; in Figure 4, salient ROIs are extracted from the panoramic image (top left), and a focalized image is acquired from the PTZ camera pointed toward the most salient region, here around the light-colored cupboard behind a person.
6. Conclusion

This paper has proposed an original approach to learn local structures that a robot could detect by itself indoors. These models will be used to understand how the environment is structured: they could be combined either with a topological map built by another learning process, to recognize places (such an entity being characteristic of such a place), or with a metrical stochastic map to locate the robot. Current work is devoted to two problems:
1) how to take into account contextual knowledge, mainly to ease the segmentation of the local structures (for example, by assuming that these structures lie on large uniform planar faces of the environment: ground, walls...); and 2) how to categorize and recognize objects, from their patch representations built from several viewpoints, using the Bayesian framework proposed in [2]?
References
[1] N. Dohuu, W. Paquier, and R. Chatila. Combining structural descriptions and image-based representations for image, object and scene recognition. In Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI), 2005.
[2] L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In Proc. Int. Conf. on Computer Vision (ICCV), 2004.
[3] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003.
[4] P. Fitzpatrick, G. Metta, L. Natale, S. Rao, and G. Sandini. Learning about objects through action: Initial steps towards artificial cognition. In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2003.
[5] J.B. Hayet, F. Lerasle, and M. Devy. Visual landmarks detection and recognition for mobile robot navigation. In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Vol. II, pages 313–318, 2003.
[6] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 20, No. 11, pages 1254–1259, 1998.
[7] T. Kadir and M. Brady. Scale, saliency and image description. Int. Journal of Computer Vision (IJCV), 2001.
[8] D.G. Lowe. Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision, 60(2), pages 91–110, 2004.
[9] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In Proc. Int. Conf. on Computer Vision (ICCV), 2001.
[10] J. Ponce, S. Lazebnik, F. Rothganger, and C. Schmid. Toward true 3D object recognition. In Proc. AFRIF/AFIA Conf. Reconnaissance de Formes et Intelligence Artificielle (RFIA), 2004.
[11] F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003.
[12] C. Schmid. Constructing models for content-based image retrieval. In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Vol. II, pages 39–45, 2003.
[13] R.C. Smith and P. Cheeseman. On the representation of spatial uncertainty. Int. Journal on Robotics Research, 1987.
[14] A. Tapus, S. Heinzer, and R. Siegwart. Bayesian programming for topological global localization with fingerprints. In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2004.
[15] S. Thrun and Y. Liu. Multi-robot SLAM with sparse extended information filters. In Proc. Int. Symp. of Robotics Research (ISRR), 2003.
[16] E.A. Topp and H.I. Christensen. Tracking for following and passing persons. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2005.
[17] Z. Zivkovic, B. Bakker, and B. Krose. Hierarchical map building using visual landmarks and geometric constraints. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2005.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Modular Design of Home Service Robot System with Hierarchical Colored Petri Net
Guohui TIAN a,1, Feng DUAN b and Tamio ARAI b
a School of Control Science and Engineering, Shandong University, China
b Department of Precision Machinery Engineering, University of Tokyo, Japan
Abstract. A modular design procedure is proposed for the autonomous and distributed control of a home service robot system using Hierarchical Colored Petri nets. To handle the complexity of robot configurations and to provide the flexibility needed to cover the various tasks and dynamic environments of home service, each kind of device or task is treated as a module, and interaction mechanisms for the coordination and cooperation of the devices are presented. Through an example, a control structure for a home service robot system is developed in detail using the simulation software CPN Tools.
Keywords. Home service robots, modular design, Hierarchical Colored Petri net
Introduction
To cope with the many different kinds of housework and to increase the whole system's flexibility, adaptability, and reliability in complicated and dynamic home environments, it is necessary to develop effective home service robot systems consisting of two or more robots and other important assistant devices. The complex characteristics of a home service robot system can be summarized as follows: (1) Home service robots may have many different configurations of kinematical and sensor modules. For example, one robot may have only a mobile base and a vacuum; some have a head module in addition, and others have arms as well. As the robots are composed of several modules (a mobile base, arms, a head, sensor units, and so on), the architecture has to be able to handle the complexity of diverse robot configurations.
1 Corresponding Author: Guohui TIAN, School of Control Science and Engineering, Shandong University, No. 73, Jingshi Road, Jinan City, 250061, P.R. China; E-mail: [email protected].
(2) Home service robots may have to perform many different kinds of tasks, including exploration, guidance, transportation, cleaning, surveillance, teaching, and playing with children. Since these tasks differ from each other, it is better for each task to have a corresponding module [1, 2]. (3) The home environment is dynamic. For example, the orientation of a chair may change, the chair may be moved away from its former position, or a person may suddenly enter the house. An efficient development procedure is therefore needed to handle the complexity of robot configurations and to provide the flexibility necessary for covering various tasks and dynamic environments. It should comprise upper-level control, providing high-level intelligence to interact with humans and to plan tasks, as well as lower-level control, providing low-level intelligence for prompt reaction in each module [3, 4]. In this paper, a modular design procedure is proposed using Hierarchical Colored Petri nets [5]. The design methodology is based on a concept of discrete states and actions: each kind of device or task is represented as one autonomous module composed of a sequence of primitive actions. The home service robot system is viewed as a collection of asynchronous, concurrent processes that execute different tasks. For cooperative or mutually exclusive tasks of multiple modules, discrete event-driven control is implemented as a data-flow network of concurrent processes communicating with each other. The procedure is developed by modeling a home service robot system using the simulation software CPN Tools [6].
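This view of each device or task as an autonomous module, i.e. a concurrent process exchanging request and acknowledgement messages, can be sketched in plain Python (a toy illustration, not CPN Tools output; the projector behavior and all names are simplified assumptions):

```python
import queue
import threading

def projector(requests: queue.Queue, acks: queue.Queue, log: list):
    """A device module as an autonomous process: it idles until a request
    arrives, then holds its 'sending' state until acknowledged."""
    cmd = requests.get()            # blocks while the module is idle
    log.append(f"projector: guiding line for {cmd}")
    acks.get()                      # wait for the acknowledgement message
    log.append("projector: line cancelled, back to idle")

requests, acks, log = queue.Queue(), queue.Queue(), []
t = threading.Thread(target=projector, args=(requests, acks, log))
t.start()
requests.put("path A->B")          # the coordinator issues a request
acks.put("done")                   # the conveying robot acknowledges
t.join()
print(log)
```

Each device module described later follows this same idle / busy / acknowledged cycle; the Petri net models make the states and message flows explicit.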
1. The Structure of a Home Service Robot System
A general home service robot system is depicted in Figure 1. It is composed of a hand robot with an arm camera, a mobile robot carrying a hand with an arm camera, a conveying mobile robot with a pallet, a projector, and 4 sets of ceiling cameras. In addition, RF-tags are used by the conveying mobile robot to navigate in the home environment, and QR-codes are adopted to recognize and handle the many different kinds of goods in the house conveniently and efficiently. All of these devices are connected by a local area network and controlled in a distributed and autonomous way. We will modularly develop the control structure for this system in a bottom-up way using Hierarchical Colored Petri Nets.

[Figure: schematic of the home service robot system, including RF tags and an Internet connection.]
Figure 1. The structure of a home service robot system
2. Petri Net Models for the Modules

2.1. Petri Net Model for the Projector
The projector can project a colored line onto the floor at a specified position and orientation. The mobile robots are guided by this line as global reference path information, while also moving autonomously with collision avoidance based on local environmental information from their own sensors, such as laser range scanners and infrared detectors. Guided by this line, a mobile robot can navigate more robustly given the dynamic characteristics of the home environment. This module is modeled in Figure 2. When the projector receives a request command from the coordinator and is idle at that time, it adjusts its orientation and then projects a guiding line at the correct position and direction. The projector maintains the message-sending state until it receives an acknowledgement message to cancel the guiding line, and then returns to idle.

Figure 2. Petri net model for the projector module

2.2. Petri Net Model for the Ceiling Cameras
Each of the 4 sets of cameras provides local scene information, and together they cooperate to provide global scene information about the home environment. Based on this scene information, the conveying mobile robot can understand the home environment more efficiently. This module is modeled in Figure 3. When the cameras receive a request command from the coordinator, they capture images, send out image messages, and maintain these messages until they receive an acknowledgement message from the conveying robot; they then return to idle.

Figure 3. Petri net model for the ceiling cameras module

2.3. Petri Net Model for the Hand Robot
The hand robot, which carries an arm camera, can load or unload goods. Its model is composed of an arm module and an arm camera module; its global description is depicted in Figure 4(a). When the hand robot receives a request command, the arm starts to move near the goods to be handled, and at the same time the arm camera receives a command message to start capturing images in order to detect the goal object in detail. After the goal object is detected, the arm holds it and moves it near the pallet installed on the conveying robot; at the same time a command request is sent to the arm camera again in order to detect the exact position of the pallet. If the pallet is available and detected, the hand, on approaching it, places the goods onto it, and at the same time a message is sent to the conveying robot. The arm waits until it receives a message from the conveying robot that the goods have been loaded successfully, and then releases the goods. Finally, a message is sent to the conveying robot so that it can start to transfer, and an acknowledgement message is sent to the coordinator indicating that the loading action is over; the hand returns to its home state and becomes idle, as depicted in Figure 4(c). When the arm camera receives a request command message, it captures images until it has detected the goal object, then sends the goal image message to the arm module and returns to idle, as depicted in Figure 4(b).

(a) Global Petri net model for hand robot
(b) Petri net model for arm camera module
(c) Petri net model for arm module
Figure 4. Petri net models for the hand robot module

2.4. Petri Net Model for the Mobile Robot
The mobile robot carries an arm with an arm camera to load or unload goods just like the hand robot; it can also move autonomously in some local areas of the house. Its model is composed of a wheel module, an arm module, and an arm camera module; its global description is depicted in Figure 5(a). When the mobile robot receives a request command, it first moves to a position near the goods to be loaded or unloaded, and then loads/unloads goods onto/from the conveying robot in the same way as the hand robot does. The moving action of the wheel module is modeled in Figure 5(b). The wheel module continues moving until it approaches the destination, and then sends a request command to the arm module. Here the arm camera module acts in the same way as in the hand robot case depicted in Figure 4(b). The loading/unloading action of the arm module is the same as that of the hand robot's arm module depicted in Figure 4(c), except that here it
(a) Global Petri net model for mobile robot
(b) Petri net model for wheel module
Figure 5. Petri net models for mobile robot module
receives the request command signal from the wheel module instead of from the coordinator as in the hand robot case.

2.5. Petri Net Model for the Conveying Mobile Robot
The conveying mobile robot carries a pallet and can move to convey goods from one position to a destination position. Goods can be loaded onto or unloaded from it by the hand robot or the mobile robot. The conveying robot can navigate in the house by sensing the local environment with its own installed sensors. In addition, it can use the image messages provided by the ceiling cameras as local and global scene information about the environment, and can be guided by the line provided by the projector to navigate robustly in the whole house. This module is modeled in Figure 6(a). When the conveying robot receives a request command, guided by the ceiling cameras and the projector it first moves to a position near the loading robot, i.e., the hand robot or the mobile robot. It then sends an approaching signal to the loading robot and waits to be loaded. Once loaded, it sends a message to the loading robot and waits to transfer the loaded goods. When the loading robot moves its arm away, it starts to transfer the loaded goods to the destination position. After it is unloaded, it sends an acknowledgement message to the coordinator and returns to the idle state.
(a) Global Petri net model for conveying mobile robot
(b) Petri net model for its wheel module
Figure 6. Petri net model for conveying mobile robot module
The actions of the conveying robot, moving near the loading robot and transferring the loaded goods to the destination position, are both described by the wheel module depicted in Figure 6(b). The former is treated as the first moving request to the wheel module, and the latter as the second. The places Wheel Req and Approach Message are typed as INT, so that the two kinds of request commands can be distinguished by the variable wreq, i.e., 1 and 2, respectively. The initial marking of the Idle place is 1`e, indicating that the wheel module can respond to only one request at a time, i.e., mutual exclusion. The unloading action is similar to the loading action and is represented as a single transition for simplicity.
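The colored-token mechanism described above, a single 'e' token in Idle serializing integer-typed requests, can be approximated by a toy interpreter (illustrative only; real CPN Tools markings are multisets with richer arc inscriptions):

```python
# Toy colored Petri net marking: each place holds a list of tokens.
# The wheel transition consumes one 'e' token from Idle plus one integer
# request token; the single 'e' token enforces mutual exclusion.
marking = {"Idle": ["e"], "WheelReq": [1, 2], "Done": []}

def fire_wheel(m):
    """Fire the wheel transition once, if it is enabled."""
    if m["Idle"] and m["WheelReq"]:
        m["Idle"].pop()
        wreq = m["WheelReq"].pop(0)  # colored token: 1 = approach, 2 = transfer
        m["Done"].append(wreq)       # the requested move completes
        m["Idle"].append("e")        # the wheel module returns to idle
        return True
    return False

while fire_wheel(marking):
    pass
print(marking["Done"])  # both requests were served, one at a time
```

Because only one 'e' token exists, the two requests can never be in service simultaneously, which is exactly what the 1`e initial marking expresses.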
3. Coordinating Mechanism for the Above Devices
All the above devices are connected by the local area network and are scheduled by the coordinator. The Petri net model of the coordinating mechanism is depicted in Figure 7. When we want to move goods on the desk to another destination position, the coordinator sends request commands to the hand robot, the conveying robot, the projector, and the ceiling cameras, respectively. The hand robot loads the goods onto the conveying robot, and the conveying robot transfers them to the destination position, guided by the messages provided by the projector and the ceiling cameras. In the same way, when we want to fetch goods from the refrigerator, the coordinator sends request commands to the mobile robot, the conveying robot, the projector, and the ceiling cameras, respectively. The mobile robot loads the goods onto the conveying robot, and the conveying robot transfers them to the destination position, guided by the messages provided by the projector and the ceiling cameras.
Figure 7. Petri net models for coordinator
Figure 8. Top level Petri net model for the whole system
4. Top-Level Global Petri Net for the Whole System
The whole system is modeled as the top-level net of the Hierarchical Colored Petri net, depicted in Figure 8. The substitution transitions have their own subpages, corresponding to the representations of the hand robot, the mobile robot, the conveying robot, the projector, the ceiling cameras, and the coordinator described above.
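The substitution-transition idea, a top-level transition whose behavior is defined by a subpage, can be approximated in Python as follows (hypothetical names; this mirrors the structure of Figure 8, not actual CPN Tools syntax):

```python
# Each subpage is a callable that refines one top-level transition.
def hand_robot(env):
    env.append("hand robot: load goods")

def conveying_robot(env):
    env.append("conveying robot: transfer goods")

def coordinator(env):
    env.append("coordinator: task done")

# Top-level page: substitution transitions in their firing order.
top_level = [("Load", hand_robot),
             ("Transfer", conveying_robot),
             ("Ack", coordinator)]

trace = []
for name, subpage in top_level:
    subpage(trace)  # firing the substitution transition runs its subpage
print(trace)
```

The hierarchy keeps the top-level page small: swapping in a different device only means binding a different subpage to the same transition.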
5. Conclusion
Based on the modular design procedure, we can provide the flexibility needed for many different kinematical configurations and tasks. In addition, it is very easy to add or remove robot configuration modules, and additional devices of the same kind can be represented by the same module model simply by changing its color type.

References
[1] G. Yasuda and K. Tachibana. A parallel distributed control architecture for multiple robot systems using a network of microcomputers. Computers & Industrial Engineering, 27 (1994), 63-66.
[2] G. Yasuda and K. Tachibana. A computer network based control architecture for autonomous distributed multirobot systems. Computers & Industrial Engineering, 31 (1996), 697-702.
[3] G. Cohen. Concurrent system to resolve real-time conflicts in multi-robot systems. Engineering Applications of Artificial Intelligence, Volume 8, Issue 2, April 1995, 169-175.
[4] J.P.T. Mo and B.C.K. Tang. Petri net modeling and design of task oriented messaging system for robot control. Computers & Industrial Engineering, Volume 34, Issue 4, 1998, 729-742.
[5] K. Jensen. An introduction to the practical use of Colored Petri nets. In: W. Reisig and G. Rozenberg (eds.), Lectures on Petri Nets II, LNCS 1492, Springer-Verlag, New York, 1998, pp. 237-292.
[6] CPN Tools. Online: http://wiki.daimi.au.dk/cpntools/cpntools.wiki.
Auctions for task allocation to robots
Maitreyi Nanjanath a and Maria Gini a,1
a Department of Computer Science and Engineering, University of Minnesota
Abstract. We present an auction-based method for dynamic allocation of tasks to robots. The robots operate in a 2D environment for which they have a map. Tasks are locations in the map that have to be visited by the robots, in any order. Unexpected obstacles and other delays may prevent a robot from completing its allocated tasks. Therefore, tasks not yet achieved are rebid every time a task has been completed. This provides an opportunity to improve the allocation of the remaining tasks and to reduce the overall time for task completion. We present experimental results that we have obtained in simulation using Player/Stage.
Keywords. multi-robot systems, auctions, task allocation, Player/Stage
1. Introduction
There are many real-world problems in which a set of tasks has to be distributed to a group of robots. We are interested in situations where, although a single robot could do all the tasks, sharing the work with other robots reduces the time to complete the tasks and increases the success rate. Search and retrieval tasks, which have been studied extensively in robotics (see, for instance, [10,14]), are examples of the types of tasks we are interested in. In our study, tasks are locations in the map that have to be visited by the robots, but we could easily add other activities the robot has to perform at each task location. What distinguishes task allocation to robots from other task allocation problems is the fact that robots have to physically move to reach the task locations, so the cost of accomplishing a task depends strongly on the current robot location. We describe an efficient method based on auctions to perform task allocation. The method does not guarantee an optimal allocation, but it is especially suited to dynamic environments, where execution time might deviate significantly from estimates, and where it is important to adapt dynamically to changing conditions. The method is totally distributed: there is no central controller and no central auctioneer, which increases robustness. The auction mechanism we propose attempts to minimize the total time to complete all the tasks. Given the simplifying assumption of constant and equal speed of travel for all the robots, this is equivalent to minimizing the sum of path costs over all the robots. We are not so much interested in obtaining a theoretically optimal solution as in providing a method that is both simple and robust to failure during execution.
1 Correspondence to: Maria Gini, Dept. of Computer Science and Engineering, 200 Union St SE, Minneapolis, MN 55455. Tel.: +1 612 625 5582; Fax: +1 612 625 0572; E-mail: [email protected].
If a
robot finds an unexpected obstacle, or experiences any other delay, or is disabled, the system continues to operate and tasks get accomplished. Our algorithm is greedy, and finds close-to-optimal solutions that are fast to compute. It is flexible, allowing robots to rebid when solutions are unobtainable, rather than forcing a costly re-computation of the entire optimal solution.
2. Proposed Algorithm
In this work we assume that each robot is given a map that shows its own location and the positions of walls and rooms in the environment. No information is given about where the other robots are located. The map allows a robot to estimate its cost of traveling to the task locations, and to compute the path to reach them from its original location. Suppose a user has a set R of m robots, R = {r1, r2, ..., rm}, and a set T of n tasks, T = {t1, t2, ..., tn}. In this study tasks are simply locations in the map that have to be visited, but the algorithm can take into account additional costs of doing the task once its location has been reached. The user partitions the tasks into m disjoint subsets, such that T1 ∪ T2 ∪ ... ∪ Tm = T and Ti ∩ Tj = ∅ for all i ≠ j, 1 ≤ i, j ≤ m, and allocates each subset to a robot. Note that a subset can be empty. The initial task distribution done by the user might not be optimal: some robots might have no task at all assigned to them while others have too many, the tasks assigned to a robot might be spread all over the environment or be within easy reach of another robot, and some tasks may be in unreachable locations. A robot must complete all its tasks unless it can pass its commitments to other robots. Since the robots are cooperative, they pass their commitments only if this reduces the estimated task completion time. The ability to pass tasks to other robots is especially useful when robots become disabled, since it allows the group as a whole to increase the chances of completing all the tasks. This process is accomplished via single-item reverse auctions, in which the lowest bid wins, run independently by each robot for its tasks. Each bid is an estimate of the time it would take for that robot to reach that task location (assuming for simplicity a constant speed) from its current location. To generate paths efficiently, robots use Rapidly-exploring Random Trees (RRTs) [12].
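A minimal RRT grower in an empty workspace might look like the following sketch (obstacle checking, which the robots' actual planner needs, is omitted; the 33 × 14 extents match the hospital world used later, while the step size, goal bias, and tolerance are assumptions):

```python
import math
import random

def rrt(start, goal, step=1.0, iters=2000, goal_tol=1.5, seed=0):
    """Grow a rapidly-exploring random tree from start toward goal in an
    empty 33 x 14 workspace. Returns the start-to-goal path, or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # Goal bias: occasionally sample the goal itself.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 33),
                                                  rng.uniform(0, 14))
        # Nearest tree node to the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        d = math.dist(nodes[i], sample)
        if d == 0.0:
            continue
        # Extend one step from the nearest node toward the sample.
        new = (nodes[i][0] + step * (sample[0] - nodes[i][0]) / d,
               nodes[i][1] + step * (sample[1] - nodes[i][1]) / d)
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            path, k = [], len(nodes) - 1
            while k is not None:  # walk back to the root
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

path = rrt((1.0, 1.0), (30.0, 12.0))
print("reached" if path else "no path")
```

The path cost (number of steps times step length, divided by travel speed) is what a robot would submit as its bid for the corresponding task.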
Generation of RRTs is very fast and scales well to large environments. An example of an RRT is shown later in Figure 2. Auctions are simultaneous, i.e., many auctioneers may put up their auctions at once, but since each bidder generates bids in each auction independently of the other auctions, the effect is the same as if the auctions were done sequentially. The algorithm each robot follows is outlined in Figure 1. We assume the robots can communicate with each other, for the purposes of notifying potential bidders about auctioned tasks, submitting their own bids, and receiving notification when they have won a bid. When estimating costs for tasks in different auctions, a robot uses only its current position, without accounting for possible movements in between task locations. A robot can choose not to bid on a particular task, based on its distance from and accessibility to that task. Once the auctioned tasks are assigned, the robots begin to move to their task locations, attempting the nearest task first (i.e., the task with the lowest cost). When a robot
Repeat for each robot ri ∈ R:
1. Activate ri with a set of tasks Ti and a list of robots R−i = R - {ri}.
2. Create an RRT using ri's start position as root.
3. Find paths in the RRT to each task location in Ti.
4. Assign cost estimate cj to each task tj ∈ Ti based on the path found.
5. Order task list Ti by ascending order of cj.
6. ri does in parallel:
   (a) Auction the assigned tasks:
      i. Create a Request For Quotes (RFQ) with tasks Ti.
      ii. Broadcast the RFQ to R−i and wait for bids.
      iii. Find the lowest bid bjk among all the bids for task tj.
      iv. If bjk < cj then assign tj to robot rk, else keep tj. Mark tj as assigned.
      v. Ask rk to update its bids for the tasks left (rk now has new tasks).
      vi. Repeat from 6(a)iii until all tasks are assigned.
   (b) Bid on RFQs received from other robots:
      i. Find an RRT path for each task tr in the RFQ.
      ii. Create a cost estimate cr for each tr that the robot found a path to.
      iii. Send the list of costs to the auctioneer that sent the RFQ.
   (c) Begin execution of first assigned task:
      i. Start executing the first task tj by finding a path in the RRT and following it as closely as possible.
      ii. If new tasks are added as a result of winning auctions, insert them in Ti keeping it sorted in ascending order of cost, and repeat from 6(c)i.
      iii. If ri is stuck, auction ri's tasks.
      iv. If tj is completed successfully, restart from 4.
until timeout.
Figure 1. Task allocation algorithm.
completes its first task, it again starts an auction for its remaining tasks, in an effort to improve the task allocation. In case robots get delayed by unexpected obstacles, this redistribution of tasks allows them to change their commitments and adapt more rapidly to the new situation. If a robot is unable to complete a task it has committed to, it can auction that task. Any task that cannot be completed by any of the robots is abandoned. We assume that there is value in accomplishing the remaining tasks. The robots are given a time limit to complete the tasks, so that they do not keep trying indefinitely. When all the achievable tasks (determined by whether at least one robot was able to find a path to them) are completed, the robots idle until the remainder of the time given to them is over. The algorithm allows for dynamic addition of new tasks during execution, but for simplicity, in the experiments described in Section 3, the sets of tasks and robots are known at the start and do not change during execution.
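A single auction round (steps 6(a)iii-iv of Figure 1) can be sketched as follows; straight-line travel time stands in for the RRT path cost, the bid-update step 6(a)v is omitted for brevity, and all robot names and positions are made up:

```python
import math

def bid(pos, task):
    """A robot's bid for a task: estimated travel time at unit speed
    (a straight-line stand-in for the RRT path cost)."""
    return math.dist(pos, task)

def auction_tasks(auctioneer, own_pos, tasks, others):
    """One round of single-item reverse auctions: each task goes to the
    lowest bidder if that bid beats the auctioneer's own estimate;
    otherwise the auctioneer keeps the task."""
    allocation = {name: [] for name in [auctioneer, *others]}
    for task in tasks:
        own = bid(own_pos, task)
        winner, best = min(((name, bid(pos, task))
                            for name, pos in others.items()),
                           key=lambda nb: nb[1])
        if best < own:
            allocation[winner].append(task)   # lowest bid wins the task
        else:
            allocation[auctioneer].append(task)
    return allocation

others = {"r2": (10.0, 0.0), "r3": (0.0, 10.0)}
alloc = auction_tasks("r1", (0.0, 0.0), [(1.0, 0.0), (9.0, 0.0)], others)
print(alloc)  # r1 keeps the nearby task; r2 wins the far one
```

In the full algorithm this round is repeated by every robot after each completed task, which is what lets the allocation adapt to delays and failures.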
Figure 2. The hospital environment. The top part of the figure shows the Stage simulation, with the locations of the tasks and of the robots. (The active robot has its range sensor traces shown). The lower part shows the paths generated by the RRT algorithm, with the location of the active robot on the paths indicated by a square. This is one of the single robot experimental runs, where only one robot is active.
3. Experimental setup and analysis
We conducted experiments in the Player/Stage simulation environment [9]. We simulated robot deployment in complex 2-D worlds, using as our test environment the section of the hospital world from Player/Stage shown in Figure 2. The hospital world consists of several rooms with small doorways and limited accessibility, covering a total area of 33 × 14 m². Each robot is a small differential drive vehicle placed at an arbitrary location in the world. It is equipped with 5 sonar sensors mounted at 45° angles across its front, which are used for obstacle avoidance. While these sensors allow the robot to avoid colliding with straight walls, robots tend to get stuck on corners, which they cannot detect before colliding with them. This tends to produce unexpected delays during execution. Tasks are modeled as beacons placed at different positions in the environment. We used different experimental setups, each with 16 tasks placed in different rooms. We tested the setups with 1, 3, and 10 robots, and ran a final set of experiments with a single auction (with no rebidding) to use as a baseline. The experiments were run for 10 minutes each, to avoid long runs when robots were unable to make much progress. This also allowed us to test how often the robots could
not accomplish all the tasks in the allocated amount of time. We ran each experiment 10 times, with the same initial conditions but with different initial task allocations. The auction algorithm is sensitive to the order in which tasks are given to the robots. To reduce this effect we supplied the tasks to the robots in a random order each time an experiment was run. This, combined with the inherently random nature of the RRT generation algorithm, resulted in significant variations across runs, both in the allocation of tasks and in the time taken to complete the tasks.

[Figure: completion times in seconds (0-600) vs. experiment number (1-10), for Single Round Auction, Single Robot, 3 Robots with Multiple Auctions, and 10 Robots with Multiple Auctions.]
Figure 3. Time spent trying to complete tasks in different robot-auction combinations.
Performance results are shown in Figure 3. The results show the time taken to complete all the tasks that were accomplished in each run. We can observe that a single robot takes longer but, as expected, the speedup when using multiple robots is sublinear. A single-round auction tends to perform worse than multiple auctions and shows more variability in the time needed to complete the tasks. This is consistent with the observation that reallocation of tasks via additional bidding tends to produce, on average, a better allocation. The results are best when the numbers of robots and tasks are balanced: when the tasks are few, some of the robots stay idle; when the tasks are too many with respect to the number of robots, the completion time increases, since each robot has more work to do. Figure 4 shows the percentage of tasks completed for each run. Since the number of tasks was relatively large with respect to the time available and the distance the robots had to travel, very few runs had all the tasks completed. We can observe that with a single robot only a small percentage of the 16 tasks gets accomplished in the time allocated. With a more balanced number of tasks and robots a much larger percentage of tasks gets done. We can see differences between runs when using a single-round auction versus multiple rounds. The performance of multiple rounds of auctions is not consistently better than that of a single round. Recall that in each experiment the initial allocation of tasks to robots was different, and some allocations were clearly better than others.
[Figure: percentage of tasks completed (0-100) vs. experiment number (1-10), for Single Round Auction, Single Robot, 3 Robots with Multiple Auctions, and 10 Robots with Multiple Auctions.]
Figure 4. Relative task completion rates for different robot-auction combinations.
4. Related Work
The problem we studied is a subset of the larger problem of coordination in a team. Our robots have to coordinate so that all the locations of a given set are reached by a robot, but are otherwise independent. A recent survey [5] covers in detail the state of the art in using auctions to coordinate robots for accomplishing tasks such as exploration [4,11], navigation to different locations [15], or box pushing [7]. Auction-based methods for task allocation are becoming popular in robotics [4,8,15] as an alternative to other allocation methods, such as centralized scheduling [3], blackboard systems [6], or application-specific methods, which do not easily generalize to other domains [1]. Combinatorial auctions have been tried as a method to allocate navigation tasks to robots [2], but they are too slow to be practical and do not scale well. They allow tasks to be accomplished with maximum efficiency, but the time taken in determining which tasks to assign to whom often ends up being more than the time for the tasks themselves. Single-item auctions tend to miss opportunities for optimal allocation, even though they can be computed in polynomial time. Our approach tries to find a tradeoff between computational complexity and optimality of allocation: we do not use combinatorial auctions, but we reauction tasks multiple times while they are being executed, thus allowing for a better allocation. Recent work [13,15] has focused on producing bidding rules for robot navigation tasks that lower computational costs while providing an optimal solution. The method uses multi-round auctions, where each robot bids in each round on the task for which its bid is the lowest. The overall lowest bid on any task is accepted, and the next round of the auction starts for the remaining tasks. Once all the tasks have been allocated, each robot plans its path to visit all the sites for the tasks it won.
The bidding rules are such that there is no need for a central controller: as long as each robot receives all the bids from all the robots, each robot can determine the winner of the auction. Our approach differs in many ways. First, the auctioneer determines the winner of the
auction, so if a robot fails to submit a bid (perhaps because of communication failure), the auction can continue. Second, our approach is designed for dynamic situations where unexpected delays during execution can prevent a robot from accomplishing its tasks, or can make task accomplishment more expensive than originally thought. By continuously rebidding and reallocating tasks among themselves during task execution, the robots react and adjust to changing situations.
5. Conclusions and Future Work We have presented an algorithm for allocation of tasks to robots. The algorithm is designed for environments that are dynamic and where failures are likely. We assume the robots are cooperative and try to minimize the total time to complete all the tasks assigned to the group. Each robot acts as an auctioneer for its own tasks and tries to reallocate its tasks to other robots whenever this reduces the cost. Robots also reassess the current situation and attempt to improve the current task allocation by putting their remaining tasks up for bid whenever they complete a task. The process continues until all the tasks have been completed or the allocated time has expired. We removed any need for central coordination; tasks are assigned in a distributed fashion, so that the system can recover from single or even multiple points of failure. This prevents us from using any centralized component, such as a blackboard system [6], since it would create a single point of failure. Future work will include considering additional costs to do tasks beyond the cost of reaching the task location, and introducing heterogeneous robots having different speeds and capabilities. In addition, we have left addressing communication malfunctions for future research. It is our experience that robots can become disabled but rarely lose the ability to communicate. Disabling communication will introduce new challenges. Since each robot is initially given its own tasks, it will have to maintain separately the complete list of tasks given to the system as a whole. This can be done by having an initial communication phase that involves broadcasting the list of tasks to all the robots (assuming no communication failure at that time). Each robot will also need to track task completion by the other robots, and periodically broadcast its own state. Acknowledgments Work supported in part by NSF under grants EIA-0324864 and IIS-0414466.
References [1] W. Agassounon and A. Martinoli. Efficiency and robustness of threshold-based distributed allocation algorithms in multi-agent systems. In Proc. of the 1st Int’l Conf. on Autonomous Agents and Multi-Agent Systems, pages 1090–1097, July 2002. [2] M. Berhault, H. Huang, P. Keskinocak, S. Koenig, W. Elmaghraby, P. Griffin, and A. Kleywegt. Robot exploration with combinatorial auctions. In Proc. Int’l Conf. on Robotics and Automation, 2003. [3] S. Chien, A. Barrett, T. Estlin, and G. Rabideau. A comparison of coordinated planning methods for cooperating rovers. In Proc. of the Int’l Conf. on Autonomous Agents, pages 100–101. ACM Press, 2000.
[4] M. B. Dias and A. Stentz. A free market architecture for distributed control of a multirobot system. In Proc. of the Int’l Conf. on Intelligent Autonomous Systems, pages 115–122, Venice, Italy, July 2000. [5] M. B. Dias, R. M. Zlot, N. Kalra, and A. T. Stentz. Market-based multirobot coordination: A survey and analysis. Technical Report CMU-RI-TR-05-13, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, April 2005. [6] R. S. Engelmore and A. Morgan, editors. Blackboard Systems. Addison-Wesley, 1988. [7] B. P. Gerkey and M. J. Matarić. Sold!: Auction methods for multi-robot coordination. IEEE Trans. on Robotics and Automation, 18(5), Oct. 2002. [8] B. P. Gerkey and M. J. Matarić. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. In Proc. Int’l Conf. on Robotics and Automation, Sept. 2003. [9] B. P. Gerkey, R. T. Vaughan, and A. Howard. The Player/Stage project: Tools for multi-robot and distributed sensor systems. In Proc. Int’l Conf. on Advanced Robotics, pages 317–323, June 2003. [10] D. Goldberg and M. J. Matarić. Design and evaluation of robust behavior-based controllers. In T. Balch and L. E. Parker, editors, Robot Teams: From Diversity to Polymorphism. A K Peters Ltd, Natick, MA, Apr. 2002. [11] N. Kalra, D. Ferguson, and A. Stentz. Hoplites: A market-based framework for planned tight coordination in multirobot teams. In Proc. Int’l Conf. on Robotics and Automation, 2005. [12] J. J. Kuffner and S. M. LaValle. RRT-connect: An efficient approach to single-query path planning. In Proc. Int’l Conf. on Robotics and Automation, pages 995–1001, 2000. [13] M. G. Lagoudakis, E. Markakis, D. Kempe, P. Keskinocak, A. Kleywegt, S. Koenig, C. Tovey, A. Meyerson, and S. Jain. Auction-based multi-robot routing. In Robotics: Science and Systems, Cambridge, USA, June 2005. [14] K. Sugawara and T. Watanabe. Swarming robots – foraging behavior of simple multi-robot systems. In Proc. IEEE/RSJ Int’l Conf. 
on Intelligent Robots and Systems, Lausanne, Switzerland, 2002. [15] C. Tovey, M. Lagoudakis, S. Jain, and S. Koenig. The generation of bidding rules for auction-based robot coordination. In Multi-Robot Systems Workshop, Mar. 2005.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Exploration of Natural Dynamics through Resonance and Chaos Alex Pitti, Max Lungarella, and Yasuo Kuniyoshi Intelligent Systems and Informatics Lab Dept. of Mechano-Informatics The University of Tokyo, Japan Email:{alex,maxl,kuniyosh}@isi.imi.i.u-tokyo.ac.jp Abstract. Self-exploration of movement possibilities and exploitation of natural dynamics are two crucial aspects of intelligent autonomous systems. We introduce a dynamical exploration strategy which combines chaotic neural activity with feedback-induced resonance. The underlying mechanism satisfies three conditions: (1) resonant patterns are discovered even in the presence of noisy sensory feedback; (2) no preprocessing of the feedback signal is required; and (3) exploration is an integral part of the mechanism. An embodied system endowed with our exploration strategy can autonomously discover and tune into the intrinsic resonant modes most relevant to its body morphology. The mechanism makes it possible to explore not only stable behavioral patterns but also unstable ones. We demonstrate the role played by our exploration strategy in the emergence of movements through a simulation of a bipedal robot. We analyze with spectral and temporal methods the synchronicity of movements, spatio-temporal coordinations, and bifurcations in the couplings between neural, body, and environmental dynamics. Keywords. Natural dynamics, feedback resonance, coupled chaotic fields, morphology, neural control
1. Introduction The main question addressed in this paper is how an embodied system can autonomously discover modes of movement coordination which exploit the natural dynamics of the body. Traditional approaches to robot control [1] as well as soft-computing methodologies (e.g. reinforcement learning, neural networks, or genetic algorithms) conceive of the control of actions as a mathematical problem, neglecting the importance of the interaction with the real world. The emphasis is placed more on precision, reproducibility, and planning of optimal movements guided by feedback. Although these approaches are successful in particular well-controlled application domains (e.g. industrial environments), they face considerable problems when the system’s parameters are unknown, the number of sensors and actuators is subject to change, or robustness and adaptivity to unforeseen situations is of the essence. The difficulties have not gone unnoticed, and there have been recent attempts at devising more adaptive control strategies grounded in dynamical systems theory (e.g. [5,10]). The focus of this research effort is on the coupling of brain, body, and environment, and the exploitation of the intrinsic dynamics of the individual components. The "control loop" explicitly and purposively exploits factors such as morphology, material properties, natural dynamics, and, more generally, the physical interaction with the real world. In this paper, building on these ideas and on previous work [6,11], we present and discuss a framework for exploring and exploiting the bodily capabilities of a complex embodied system. The core intuition is that small but well-timed perturbations can be used to push the system into behavioral modes characterized by high behavioral flexibility and low energetic cost. For this purpose, we investigate neural and physical mechanisms involved in the emergence of actions that are driven by the interaction of brain, body, and environment, and propose several tools to quantify them. We focus on two key components governing such emergence: (1) feedback resonance (physical component), and (2) coupled chaotic fields (neural component). In the following section, we detail how these two components affect the emergence of movements. We then describe our experimental setup and a set of methods aimed at shedding some light on the complex interplay between brain, body, and environment. We finally analyze the movement patterns produced by our system and discuss the results.
2. Exploration Mechanism 2.1. Physical component: feedback resonance Feedback resonance is a mechanism to drive a nonlinear oscillator into resonance by varying the frequency of an external perturbation as a function of the oscillation amplitude [8]. The rationale is that such a perturbation destabilizes the system and induces it to move from its current orbit to another more stable one, potentially driving the system closer to resonance. In addition to external perturbations, changes of the system’s global dynamics are also regulated through perturbations induced by the system’s internal dynamics. Modulating the internal dynamics can therefore also provide clues about which external perturbations the system is sensitive to. This approach is inspired by the OGY method [7], which demonstrates how a small time-dependent change of a control parameter of a chaotic system can significantly affect the behavior of the system, e.g. turn a chaotic behavior into a periodic one. It is also somewhat related to the notion of "intervention dynamics", which was introduced as a means of funneling the global dynamics of a system through particular characteristic via-points known as "focuses" [5]. Given two coupled systems A and B, we can formulate the principle underlying feedback resonance as:

$F_i^A(t+1) = F_i^A(t) + \gamma F_i^B(t)$, with $F_i^A(t) \gg \gamma F_i^B(t)$, (1)

where $F_i^A(t)$ denotes the "force" exerted on the $i$th degree of freedom of system A at time $t$, and $\gamma$ is a coupling constant. The term $\gamma F_i^B(t)$ represents the influence of system B on system A. Conversely, the same perturbation scheme can be applied to system B:

$F_i^B(t+1) = F_i^B(t) + \eta F_i^A(t)$, with $F_i^B(t) \gg \eta F_i^A(t)$, (2)

where $F_i^B(t)$ is the value of the $i$th degree of freedom of B, and $F_i^A(t)$ indicates the influence of system A on system B, modulated by the coupling constant $\eta$. Eqs. 1 and 2 establish a coupling between the two systems which allows internal dynamics and external perturbations to mutually entrain. Such mutual entrainment is typically accompanied by resonant modes. In our framework, A and B are the neuro-body and the body-environment sub-systems (Fig. 1 a).
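Eqs. 1 and 2 amount to one simple mutual-perturbation update per time step; a minimal sketch (function and variable names are our own, and both updates use the values from time t, as in the equations):

```python
def coupled_step(FA, FB, gamma, eta):
    """One update of the feedback-resonance coupling (Eqs. 1 and 2):
    each system is nudged by a weakly weighted copy of the other's
    state. gamma*FB and eta*FA must stay small relative to FA and FB
    for the perturbation assumption to hold."""
    FA_next = [fa + gamma * fb for fa, fb in zip(FA, FB)]
    FB_next = [fb + eta * fa for fb, fa in zip(FB, FA)]
    return FA_next, FB_next
```

Iterating this step with small coupling constants lets the two state vectors entrain each other without either one dominating.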
Figure 1. Our model: a) theoretical framework, and b) simulated bipedal robot.
2.2. Neural component: dynamically coupled chaotic field We model the neural part of our system as a dynamical network of decoupled chaotic units. These units are connected through afferent connections to the sensors (input), and through efferent connections to the actuators (output). The chaotic system is coupled to the mechanical system (body) through feedback resonance (Eqs. 1 and 2). The effect of entrainment is the emergence of correlations and modes of coordination accompanied by an integrative field of neural units which cannot be produced by a predefined and static coupling. We suggest that the interaction between brain and body-environment dynamics couples (or decouples) the chaotic units connected to the body parts involved in the movement. Each chaotic unit is modeled as a logistic map whose chaoticity is determined by the control parameter α:

$x_i(t+1) = f_\alpha(x_i(t))$, (3)

where $f_\alpha(x_i(t)) = 1 - \alpha x_i(t)^2 + \eta F_i(t)$, with $1 - \alpha x_i(t)^2 \gg \eta F_i(t)$. In all our experiments and for all units: α ∈ [0.0, 2.0]. The use of chaotic units is partially justified by findings on human locomotion showing that variations between steps are not mere noise but have a chaotic structure [2]. Some evidence also indicates that chaos might occur at higher levels of the neuromuscular control system [2,3].
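A chaotic unit of Eq. 3 is a one-line update; the sketch below iterates it without feedback (η = 0) to show the map's intrinsic dynamics (the feedback term F would be whatever sensory value the unit receives):

```python
def chaotic_unit(x, alpha, eta, F):
    """Logistic-map update of Eq. 3 with a weak sensory feedback
    term eta * F added to the intrinsic dynamics 1 - alpha * x^2."""
    return 1.0 - alpha * x * x + eta * F

# Iterating the map without feedback: for alpha in the chaotic regime
# the trajectory wanders within a bounded interval without settling.
x = 0.1
for _ in range(100):
    x = chaotic_unit(x, alpha=1.5, eta=0.0, F=0.0)
```

Coupling several such units to the body through Eqs. 1 and 2 is what turns this intrinsic variability into an exploration signal.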
3. Methods In this section, we introduce measures for quantifying movement patterns: the duty factor (DF), the spectral bifurcation diagram (SBD), the wavelet transform (WT) and a novel method derived from the wavelet transform, the wavelet bifurcation diagram (WBD). The duty factor is typically used for quantifying and classifying different types of locomotive patterns [14]. It is defined as the fraction of time the legs are on the ground versus the duration of one locomotive cycle. In walking humans, for instance, each foot is on the ground for more than half of the time (DF > 0.5). In running humans, however,
Table 1. Parameters used in our experiments: γ = 10, η = 0.01, density = 0.6, damping = 10, stiffness = 10, sampling time = 0.05 s.
it is on the ground for less than half of the time (DF < 0.5). The spectral bifurcation diagram is an attempt to analyze qualitatively the dynamics of high dimensional dynamical systems [9]. It superimposes the spectral density distribution of each variable relative to the current value of the control parameter of the system under study. The wavelet transform is a popular and powerful tool for time series analysis and signal compression [12]. It extends the Fourier representation to the temporal domain. An interesting property of the wavelet transform is that it can be used to identify short- and long-range temporal correlations in time series. We extended the wavelet transform to the analysis of correlations among the chaotic units composing the neural system and the individual body parts. In order to visualize the spatio-temporal patterns and the bifurcations at different scales, we added a dimension spanning the index of the units of the system. We call this measure the wavelet bifurcation diagram.
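Two of the measures just introduced are straightforward to sketch (illustrative code, not the authors' implementation; the naive DFT keeps the example dependency-free and is only meant for short signals):

```python
import cmath

def duty_factor(contact):
    """Fraction of samples during which the foot touches the ground;
    contact is a boolean ground-contact sequence covering an integer
    number of locomotion cycles (walking: DF > 0.5, running: DF < 0.5)."""
    return sum(contact) / len(contact)

def power_spectrum(x):
    """Naive O(N^2) DFT power spectrum of a short, mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xs = [v - mean for v in x]
    return [abs(sum(xs[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def spectral_bifurcation_diagram(series_per_alpha):
    """One power spectrum per value of the control parameter; stacking
    them row by row exposes spectral bifurcations as alpha varies (cf. [9])."""
    return [power_spectrum(x) for x in series_per_alpha]
```

The wavelet-based measures follow the same pattern but replace the Fourier basis with scaled wavelets, adding the temporal and unit-index dimensions described above.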
4. Experiments For our experiments we used a simulated bipedal (two-legged) robot moving in the sagittal plane (Fig. 1 b). Each leg was composed of two limbs linked by a damper-spring joint and had two active and two passive joints. The legs were actuated at the hip by two motors controlled by a neural system consisting of two chaotic units. The input to the chaotic units was provided by angle sensors located in the passive joints (the sensors measured the angle between upper and lower limb). The parameters used for the simulation are given in Table 1. All our simulations were performed with Matlab. By changing the control parameter α, we observed many different movement patterns. For 0 < α < 0.7, the dynamics of the chaotic system and of the body were weakly coupled and perturbed each other only slightly. The small perturbations transmitted by both systems, however, were not strong enough for any movements to occur. For α = 0.7, the system started to rock slowly back and forth without actually changing its position. By increasing α up to 0.8, the robot had enough energy to balance but not enough to lift its legs and locomote; that is, the system began to "crawl" and from time to time returned to its initial (balanced) position (Fig. 3 a). For α = 0.85 the system finally had enough inertia to lift one of its legs and started moving. The external perturbations increased, and the speed increased too. This change is reflected by a drop of the duty factor from about 0.65 to 0.40 for speeds v < 0.3 m/s. For α = 0.9, the internal excitations of the chaotic system fed to the motors had enough energy so that both legs could move separately. The system coordinated its legs alternately by hopping (Fig. 3 b; duty factor DF < 0.20 and speeds v < 0.5 m/s), walking (Fig. 3 c), and jumping (Fig. 3 d; DF < 0.20, 0.5 m/s < v < 1.0 m/s). 
Note that for α < 1.0 hopping was a more stable and reproducible behavior compared to walking and jumping, which required a more precise synchronization between the legs. Systems with hard (k = 50) and soft (k = 10) springs displayed the same stable behaviors (Fig. 2). For α > 1.0 we observed a change of the dynamics depending on the spring stiffness. In the case of hard springs, the mechanical system moved in the same stable fashion, quite immune to changes of the chaoticity of the neural system. The system with soft springs was able to perform faster coordinated movements, but as the control parameter increased the movements became less stable. Its dynamics allowed it to run and jump at v > 1.5 m/s for DF < 0.20. For α > 1.4, the system was highly unstable and did not display any coordinated movements. The perturbations of the chaotic system strongly affected the body movements. External perturbations due to the body-environment interaction had no measurable influence on the neural system.
Figure 2. Duty factor for different values of α and for joints with different stiffness. (a) Hard springs (k = 50); and (b) soft springs (k = 10). α < 1.0 (circles); α > 1.0 (crosses). 600 samples for each category.
Figure 3. Gaits for different values of the control parameter: a) crawling, b) hopping, c) walking, and d) jumping.
5. Analysis Movement patterns have spatio-temporal structure resulting from changes in the coordination dynamics of both body and neural system. The spectral bifurcation diagram illustrates the spectral distribution of the coupling between the neural system and the body as a function of the control parameter α (Fig. 4). The frequencies with high density correspond to those of synchronization between the two dynamical systems. They represent the inner resonant states for which perturbations (or moments of energy exchange) between the two coupled systems are significant. Interestingly, for α < 1.0, the body dynamics has little effect on the neural dynamics and vice versa. We can distinguish the fundamental mode at a frequency of 10 Hz, and harmonics at regular intervals from the fundamental, with a harmonic mode around 256 Hz. This corresponds to the excitation of the passive dynamics of the damped springs (the low frequencies) by the chaotic system (the higher frequencies) when the mechanical system balances, crawls, and starts walking and hopping. These frequencies
Figure 4. Spectral Bifurcation Diagram: (a) sensors, and (b) chaotic units.
represent the lowest energetic cost required by the system to move and locomote, and at the same time the maximum amount of external perturbation that the system can absorb while performing the required actions. The wavelet transform for different values of α for the time series extracted from the sensor located in the knee of the rear leg is plotted in Fig. 5 (a). This figure also illustrates the stability and low complexity of the behavior when the robot is poised in balance. Note that for α < 1.0 every scale lacks long-range temporal correlations. Conversely, for α = 0.97, a qualitative change occurs in the interaction between the two dynamical systems. As a result, a spectral bifurcation at frequency 128 Hz appears in the SBD, indicating more powerful hopping and walking patterns. The wavelet transform in Fig. 5 (a) shows this qualitative change for s = 128 and for s < 50. Temporal correlations are formed by the appearance of long, almost single-scale (i.e. periodic) temporal patterns. The wavelet bifurcation diagram (WBD) in Fig. 6 (a) lets us visualize the emergence of rhythmically stable spatio-temporal coordinations of the two legs (units 1 and 2) when the system starts to walk and hop (appearance of stripes at scales 1 and 2). When α increases a little, these dynamical patterns become more unstable (Fig. 6 b). For α > 1.13, the spectral bifurcation at 128 Hz diffuses and gradually activates all the surrounding frequencies. This new modulation affects the stability of the coordination between the chaotic system and the body, producing fast walking and running behaviors in addition to hopping and crawling. These frequencies represent the amount of energy that the system can actually absorb to perform these behaviors with regard to internal/external perturbations. 
These irregular harmonics are characteristic of a change of stiffness in the springs, as observed in [13], giving the system the ability to "harden" or "soften" its springs and thus attain higher flexibility. As can be seen in Fig. 5 (c), short-lasting unstable motions at scales s < 25 form stable and repetitive long-range temporal patterns at higher scales (s > 25, 128, 155 and higher). The stability of the dynamics has changed scale with α, implying that stability is scale-dependent. The same result is observed in the case of the WBD (Fig. 6 c). For scales s > 2 the embodied system performs stable "long-range" movements which are accompanied by short-range, perturbation-like movements at lower scales. The structure of the spatio-temporal patterns is fractal, as in human locomotion [2]. The patterns are formed at the lowest scales and are integrated and coordinated in time at higher scales. Long-lasting movements are thus composed of movements of smaller duration. Increasing α beyond 1.25 gives rise to a second spectral bifurcation at frequency 256 Hz, leading to chaotic interaction dynamics between the coupled systems. In this mode, the embodied system is highly sensitive to any kind of perturbation and is unstable (almost all perturbations give rise to resonance). This result seems to contradict those obtained with the wavelet transform (Fig. 5 d) and with the WBD (Fig. 6 d). The movements, although unstable and dynamic at scales s < 25, are highly correlated at the higher scales, presenting stable movement patterns with long-range temporal correlations. In other words, the stability of the dynamics has changed scale.
Figure 5. Wavelet Transform. From top-left to bottom-right the intervals of the control parameter α are: [0.90; 1.00]; [1.00; 1.10]; [1.10; 1.20]; and [1.20; 1.30]. The arrow indicates a spectral bifurcation.
Figure 6. Wavelet Bifurcation Diagram. From top to bottom the intervals of the control parameter α are: [0.90; 1.00]; [1.00; 1.10]; [1.10; 1.20]; and [1.20; 1.30]. The horizontal axis denotes time, the vertical axis denotes the index of the chaotic unit. The individual graphs have different time-scale resolutions.
6. Conclusion In this paper, we addressed the question of how an embodied system can autonomously explore its movement possibilities and exploit its natural dynamics. We proposed a general framework based on two core phenomena: resonance and chaos. We also introduced a set of quantitative measures to analyze the dynamics of coupled nonlinear systems. We observed that the stability of the emergent movement patterns in a simulated bipedal robot is scale-dependent and presents a spatio-temporal fractal structure. Our future work will be aimed at exploring how our framework might be combined with learning, planning, and intentional behavior.
7. Acknowledgments The authors would like to thank Fumiya Iida for kindly providing the simulation of "Puppy – the dog robot", and the Japanese Society for the Promotion of Science (JSPS) for supporting this research.
References [1] Spong, M.W. and Vidyasagar, M. (1989). Robot Dynamics and Control. New York: Wiley. [2] Hausdorff, J. M., C.-K. Peng, Z. Ladin, J. Y. Wei, and A. L. Goldberger (1995). Is walking a random walk? Evidence for long-range correlations in the stride interval of human gait. J. Appl. Physiol. 78: 349-358. [3] Kurz MJ, Stergiou N. (2005). An artificial neural network that utilizes hip joint actuations to control bifurcations and chaos in a passive dynamic bipedal walking model. Biological Cybernetics 93(3):213-21. [4] Kaneko, K. and Tsuda, I. (2000). Complex Systems: Chaos and Beyond. New York: Springer-Verlag. [5] Yamamoto, T. and Kuniyoshi, Y. (2002). Global dynamics: a new concept for design of dynamical behavior. Proc. of 2nd Int. Workshop on Epigenetic Robotics, pp. 177-180. [6] Kuniyoshi, Y. and Suzuki, S. (2004). Dynamic emergence and adaptation of behavior through embodiment as coupled chaotic field. Proc. of 17th Int. Conf. on Intelligent Robots and Systems, pp. 2042-2048. [7] Ott, E., Grebogi, C. and Yorke, J. (1990). Controlling Chaos. Physical Review Letters, 64(11):1196-1199. [8] Fradkov, A.L. (1999). Investigation of physical systems by means of feedback. Autom. Remote Control, 60(3). [9] Orrell, D. and Smith, L.A. (2003). Visualising bifurcations in high dimensional systems: the spectral bifurcation diagram. Int. J. of Bifurcation and Chaos, 13(10):3015-3027. [10] Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3/4):223-241. [11] Pitti A., Lungarella, M. and Kuniyoshi, Y. (2005). To appear in Proc. of 2nd Australian Conf. on Artificial Life. [12] Mallat S. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(7):674-693. [13] Thompson, J.M.T. and Stewart, H.B. (2002). Nonlinear Dynamics and Chaos. New York: Wiley. [14] McNeill Alexander, R. 
(2003). Principles of Animal Locomotion. Princeton, NJ: Princeton University Press.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
One-Legged Locomotion with a Compliant Passive Joint
Juergen Rummel a, Fumiya Iida a,b, Andre Seyfarth a a Locomotion Laboratory, University of Jena, Dornburger Str. 23, D-07743 Jena, Germany, email: [email protected], web: www.lauflabor.de b Artificial Intelligence Laboratory, University of Zurich, Andreasstrasse 15, CH-8050 Zurich, Switzerland Abstract. Compliant materials are attracting increasing attention for the purpose of legged locomotion, because they provide significant advantages in locomotion performance with respect to energy efficiency and stability. Toward establishing a fundamental basis for this line of research, a minimalistic locomotion model of a single-legged system is explored in this paper. By analyzing the dynamic behavior of the system in simulation and on a physical robotic platform, it is shown that a stable locomotion process can be achieved without the need for sensory feedback. In addition, further analysis characterizes the relation between motor control and the natural body dynamics determined by morphological properties such as body mass and spring constant. Keywords. Legged locomotion, compliance, hopping, two-segmented leg, feedforward control, morphological properties
1. Introduction While most legged robots are composed of rigid materials and controlled by high-gain control, increasing attention has been paid to legged locomotion that exploits elastic materials. In nature, biologists have identified a few important reasons why animals make use of elasticity to achieve rapid legged locomotion. Elastic materials can be used for storing energy, which results in energy-efficient locomotion [1,2]. In particular, the so-called spring-mass model has been explored for the theoretical analysis of animal running behavior [3,4]. This simple theoretical model, which consists of a body represented as a point mass and a leg represented as a linear spring, explains how legged locomotion can be approximated. Based on these biomechanical studies, a number of legged robots have been developed to understand how elasticity could be exploited for the purpose of locomotion [5,6,7,8,9,10]. Although all of these robotic platforms are designed for the purpose of legged locomotion, there are a few different approaches. One class of locomotion mechanisms makes use of prismatic actuation in the legs: a prismatic actuator pushes the body off when the leg touches the ground [5,6]. In the second approach, robots recognize the stance phases of a leg but do not use prismatic actuation; the leg is controlled to a particular angle during the swing phase.
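The spring-mass model mentioned above reduces the runner to a point mass riding on a linear leg spring; a minimal sketch of its vertical stance dynamics (an illustration of the general model, not this paper's simulation; parameter values and names are our own):

```python
def spring_mass_step(y, vy, m, k, L0, g=9.81, dt=1e-4):
    """One Euler step of the vertical spring-mass hopper during stance:
    the leg spring (rest length L0) pushes the point mass m upward in
    proportion to its compression L0 - y; gravity pulls it down."""
    compression = max(0.0, L0 - y)   # zero compression means flight
    ay = k * compression / m - g
    vy += ay * dt
    y += vy * dt
    return y, vy
```

During flight the compression term vanishes and the mass is in free fall; a full model switches between the two regimes at touchdown and takeoff, which is what produces the hopping cycle.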
Figure 1. A: Fujubot, a one-legged robot with two segments and a compliant passive joint. A beam mounted on a base holds the robot's body. B: Schematic diagram of the simplified dynamic model used in simulations.
We aim to propose another class of legged locomotion with compliant legs, which uses no sensory information. Previously, it has been shown that, if the system has good morphological properties such as spring stiffness and body dimensions, it is able to achieve rapid four-legged locomotion [8]. In this model, by simply swinging the legs back and forth, the system is able to achieve a hopping locomotion behavior. Moreover, the system has to exploit the body dynamics derived from its morphological properties, because there is no sensory information. For a better understanding of dynamic legged locomotion, we reduced the model to a single spring-like but also segmented leg, which will be introduced in the next section. In this study, we investigate the characteristics of self-organized locomotion in relation to actuation and morphological properties, using a physical robotic platform and a simulation model. The performance criteria of speed and actuator torque will be analyzed in detail to characterize the behaviors of one-legged locomotion.
2. Methods 2.1. Mechanical platform The mechanical platform (Fujubot) used in this investigation is a one-legged robot with two segments, shown in Figure 1A. The robot is approximately 15 cm in height and has a total weight of about 0.2 kg, including additional masses behind the horizontal bar. A single digital servomotor serves as the only actuator. This motor actuates the hip joint, which is connected to the thigh segment (90 mm). A centered shank segment (80 mm) is connected to the thigh via a passive knee joint. In addition, the shank is elastically linked with the thigh by a spring. The spring stiffness k is 0.2 N/mm with a rest length of 25 mm. The rest angle of the knee joint is circa 130°. Therefore, the leg length is
approximately 12 cm. Two rubber elements at both ends of the shank segment serve as dampers when hitting the ground. As shown in the background of Figure 1A, the body of Fujubot is held by a beam (0.34 m) and a base. This mounting, similar to Raibert's tether mechanism [5], constrains the motion of the robot's body to two degrees of freedom: it can move forward and backward and up and down. This robot is built imperfectly, however, so the retaining mechanism has play in its joints; the pitch axis in particular has a play of up to ±5°. Nevertheless, with the help of this mounting the robot does not need to be stabilized, and we are able to investigate the behavior of a segmented spring-like leg without using complicated control strategies.

2.2. Motor control
The hip actuator is realized by a position-controlled motor. In the experiments we used a simple sine oscillation as the angular position signal:

α(t) = (ωmax / (2πf)) · sin(2πf t) + α0    (1)
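As a sketch, the position signal of Eq. (1) could be generated as follows. The function name is a hypothetical helper, not from the paper; angles are in degrees, and the 38° amplitude cap for frequencies below 3 Hz follows the experimental description in the text.

```python
import math

def hip_angle(t, f, alpha0_deg, omega_max_deg=360.0):
    """Position signal of Eq. (1): alpha(t) = omega_max/(2*pi*f) * sin(2*pi*f*t) + alpha0.

    Angles in degrees. The amplitude is capped at 38 deg for f < 3 Hz,
    as in the experiments (the motor's controllable range is about +/-64 deg).
    """
    amplitude = omega_max_deg / (2.0 * math.pi * f)
    if f < 3.0:
        amplitude = min(amplitude, 38.0)
    return amplitude * math.sin(2.0 * math.pi * f * t) + alpha0_deg
```

Note that for f ≥ 3 Hz the amplitude 360/(2πf) is already below the cap, so the limit only matters at the lowest frequencies.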
The maximum angular velocity ωmax was held constant at 360°/s; it is a motor-specific property. The frequency f and the offset angle α0 remain as independent control parameters. In the experiments the frequency was varied from 1.5 to 8 Hz in steps of 0.5 Hz, and the offset angle from −25° to +25° in increments of 5°. Note that the maximum controllable position of the motor is about ±64°; we therefore limited the amplitude of the position signal to 38° for frequencies lower than 3 Hz. According to [8], this control strategy does not need global sensory feedback.

2.3. Experimental method
In the experiments the robot moved in a circle on a flat ground surface. For a kinematic analysis of the robot's behavior and locomotion we attached reflective markers to the robot. A high-speed (240 Hz) camera system (Qualisys) with six motion capture units recorded the three-dimensional coordinates of the markers. For the data analysis we converted the measured trajectories into a two-dimensional plane.

2.4. Simulation model
In this study we also used a simplified simulation model, illustrated in Figure 1B. It has four degrees of freedom: the body coordinates x1 and x2 and the angular displacements of thigh and shank, x5 and x8. We replaced the equation of motion for x5 by kinematic control, which was the same as described in Section 2.2, except for the limitation of the amplitude at lower frequencies. The ground is modelled as a nonlinear spring and damper in the vertical direction [11]. For the horizontal direction, our ground reaction model is more complex: we implemented a state machine describing sliding friction and the adhesive force when the foot stands still. It switches from sliding to sticking when the velocity of the foot becomes lower than a specified threshold, and it switches back when the horizontal force becomes higher
Figure 2. Average speed in horizontal direction of the robot (A) and the simulation model (B), plotted over the frequency f of hip oscillation [Hz] and the offset angle α0 [deg]. The velocity is measured in leg lengths per second, where the leg length is 12 cm.
than the force of sliding friction. See the appendix for a detailed description of the ground model. One difference between this model and the real leg is that the model has no play in the joints. We simulated the model using Matlab 7.01 with the additional toolboxes Simulink and SimMechanics.
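The sliding/sticking switch described above can be sketched as a small state machine. This is an illustrative reconstruction, not the authors' implementation: the function and parameter names are assumptions, while the friction coefficients and velocity threshold follow the constants given in the appendix.

```python
def update_contact_state(state, foot_speed, tangential_force, normal_force,
                         mu_static=0.40, v_lim=0.01):
    """Stick-slip state machine for the horizontal ground reaction (sketch).

    Switches sliding -> sticking when the foot speed drops below the
    threshold v_lim, and sticking -> sliding when the tangential force
    exceeds the friction limit mu_static * |normal_force|.
    """
    if state == "sliding" and abs(foot_speed) < v_lim:
        return "sticking"
    if state == "sticking" and abs(tangential_force) > mu_static * abs(normal_force):
        return "sliding"
    return state
```

In the sticking state the horizontal force would come from the spring-damper law of the ground model; in the sliding state it is the Coulomb friction force opposing the foot velocity.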
3. Results

The locomotion behavior of the two-segmented leg described in the previous section was tested and analyzed using both the robotic platform and the simulation model. First, we explored the relation between locomotion behavior and the control parameters, i.e. the offset angle α0 and the frequency f of the hip oscillation. For every set of these two parameters we conducted experiments of 30 seconds each, repeated them twice, and recorded the kinematic data of the robot with the motion tracking system described in Section 2.3. By varying these parameters, we observed four clearly distinguishable patterns of locomotion: forward locomotion with one or with two ground contacts, hopping in place, and backward locomotion. Figure 3B shows typical forward locomotion, in which the robot touches down with the right end of the lower segment. The spring is extended by the ground reaction force; because the robot uses the spring efficiently, this behavior generally results in faster locomotion. Another type of behavior shows ground contacts of both ends of the lower segment (Figure 3A): after the right point touches down, the left end of the segment also hits the ground, which results in slower but more stable locomotion. This type of behavior depends mostly on the offset angle, as shown in Figure 3E. The backward locomotion shown in Figure 3C hardly uses the passive joint, which leads to an unstable speed, as explained later in this section; it can result in fast locomotion, but only rarely.

Figure 3. A-C: The stick figures show two cycles of the robot's locomotion behavior; the direction of movement is indicated by the arrows. Motor control parameters: A: f = 3 Hz, α0 = −15°; B: f = 5 Hz, α0 = 10°; C: f = 7.5 Hz, α0 = 25°. The landscape in D illustrates the standard deviation of the robot's horizontal velocity, measured in leg lengths per second. E shows the length of time during which the left foot-point touches the ground in addition to the right foot-point, normalized to the cycle time of the motor oscillation.

To understand the mechanisms underlying these behaviors, we investigated the locomotion speed with respect to the control parameters (Figure 2A). For this analysis, we split the two-dimensional trajectories into cycles defined by the motor oscillation. For every set of parameters (offset angle and frequency) we randomly selected 120 movement cycles, calculated the average speed of each cycle, and averaged these values again. Three peaks of speed appear in the landscape of control parameters: one at low frequency and low offset angle, one in the middle, and a negative peak at high frequency and high offset angle. These peaks correspond to forward locomotion with two ground contacts, normal forward locomotion, and backward locomotion, respectively. We conducted simulation experiments in the same manner, and they yield a similar landscape (Figure 2B). Here, the boundary at which the direction of hopping changes is displaced toward lower frequencies, whereas the region of low offset angles, where high-speed locomotion is observed, spreads out to higher frequencies. These differences could be caused by the absence of compliance and play in the hip joint of the model.

For a further characterization of the locomotion behavior we explored the periodicity of the cyclic motion, analyzed via the standard deviation of the average speeds of the individual motion cycles (Figure 3D), computed from the locomotion speed data above. Here the standard deviation is to be read as the inverse of periodicity: a low deviation of the cycle speeds indicates high periodicity and vice versa. The figure shows that more periodic locomotion can be achieved with lower offset angles, and that the motion becomes more unstable as the offset is increased. Considering the locomotion speed shown in Figure 2A, the best performance is achieved by locomotion with two points in ground contact. Another region of higher performance appears around a frequency of 5 Hz and an offset angle of approximately 10°; here a higher speed and a local minimum in the deviation of speed are achieved, and the robot shows normal forward locomotion.

So far we have investigated the influence of control on the hopping behavior. An additional analysis of the locomotion performance was carried out in simulation with respect to the spring stiffness k and the body mass M. Here we investigated the effects of these mechanical properties on the hip-joint torque and, again, on the speed, as illustrated in Figure 4. Both quantities provide information about the energy efficiency of locomotion. We calculated the torque from the ground reaction forces during stance in conjunction with the leg vector; the data were averaged over 10 locomotion cycles in which the simulated leg was in a steady state of hopping.

Figure 4. Simulation results: absolute values of the averaged hip torques [Ncm] during the stance phase for f = 3, 4 and 5 Hz, plotted over spring stiffness k [N/mm] and body mass M [kg]. The contour lines illustrate the forward velocity as an additional performance criterion, measured in leg lengths per second. The offset angle α0 is held constant at 10°. Cleared regions indicate that the foot does not leave the ground.

The figure shows that the torque mostly increases with higher mass, and therefore the energy expended increases as well. Indeed, there is a linear relation between body mass and stiffness that results in higher velocities, as expected when the natural oscillation frequency of the leg is considered. Furthermore, we observe that with a low mass, i.e. 0.2 kg, the behavior is less dependent on the stiffness, but the speed is lower. Therefore, an exact adjustment of the spring stiffness is not essential for achieving locomotion with compliant legs.
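The per-cycle speed statistics behind Figure 2 and the periodicity measure of Figure 3D could be computed along these lines. This is a sketch with assumed helper names, not the authors' analysis code; it splits a horizontal trajectory into motor-oscillation cycles and returns the mean and standard deviation of the per-cycle average speed.

```python
import numpy as np

def cycle_speed_stats(x, t, f):
    """Split a horizontal trajectory x(t) into motor-oscillation cycles of
    period 1/f, then return the mean and standard deviation of the per-cycle
    average speed. A low standard deviation corresponds to high periodicity."""
    period = 1.0 / f
    n_cycles = int((t[-1] - t[0]) / period)
    speeds = []
    for k in range(n_cycles):
        t0 = t[0] + k * period
        t1 = t[0] + (k + 1) * period
        i0 = np.searchsorted(t, t0)
        i1 = np.searchsorted(t, t1)
        if i1 - 1 > i0:
            # average speed over this cycle: net displacement / elapsed time
            speeds.append((x[i1 - 1] - x[i0]) / (t[i1 - 1] - t[i0]))
    return float(np.mean(speeds)), float(np.std(speeds))
```

Dividing by the leg length (12 cm) would convert the speeds to the leg-lengths-per-second units used in the figures.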
4. Conclusion

This paper presented preliminary experimental results on single-legged locomotion using a robotic platform and simulation. Despite its simplicity, this locomotion model yields several implications that are potentially important for a better understanding of dynamic locomotion behavior. Even though the model is attached to a supporting beam that prevents rotational movement of the body, the locomotion behavior is achieved without any sensory feedback. This is possible mainly because the system can self-stabilize
572
J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint
F2 T1
F1
F1 Mg
F2
Fg2y Fg2x
Fspr Fspr
j2 F3 m2 g F4
j1 F4
m1 g F3
Fg1y Fg1x
Figure 5. Dynamics of the two-segmented leg.
into a stable periodic movement by exploiting the body dynamics derived from its mechanical properties and its interaction with the ground. In particular, it is important to note that morphological properties (e.g. body mass and spring stiffness) are essential design parameters for making this locomotion possible. One direct application of this approach could be four- (or more) legged locomotion, as shown in some of the previous work [6,8,9], although the relation between the proposed locomotion model and the control of rotational movement remains to be investigated. The minimalistic model explored in this paper provides further insight into the self-stabilizing character of locomotion processes in general.
Appendix

Here we describe the ground reaction model in detail. The ground reaction forces Fg.. are given exemplarily for the right foot-point (xr, yr). All variables correspond to Figure 1A and Figure 5.

Fg1y = [ −p1 (yr / l0)^e1 sgn(yr) + p2 (ẏr / √(g l0))^e2 ] · mt g    for yr < 0,
Fg1y = 0    otherwise.

Fg1x = [ −p1 (Δxr / l0)^e1 sgn(Δxr) + p3 (ẋr / √(g l0))^e2 ] · mt g    for yr < 0, and so long as Fg1x − μ1 Fg1y > 0,
Fg1x = −sgn(ẋr) μ2 Fg1y    for yr < 0, and so long as |ẋr| − vlim > 0,
Fg1x = 0    otherwise.
Δxr = xr − xr0
mt = M + m1 + m2
The moment the state machine describing the horizontal ground reaction forces switches into adhesive friction, the horizontal displacement xr is saved as the reference xr0. In our simulations we used the following constants:

g = 9.81 m s−2, M = 0.2 kg, m1 = m2 = 0.01 kg, l0 = 120 mm, μ1 = 0.40, μ2 = 0.45, p1 = 2.5·10^5, p2 = 3, p3 = 1, e1 = 3, e2 = 1,
vlim = 0.01 m s−1.

Acknowledgements
This research is supported by the German Research Foundation (DFG).

References
[1] Cavagna, G.A., Heglund, N.C. and Taylor, C.R., Mechanical work in terrestrial locomotion: two basic mechanisms of minimizing energy expenditure, Am. J. Physiology 233, R243–R261, 1977.
[2] Alexander, R.McN., Three uses for springs in legged locomotion, Int. J. Robotics Res. 9(2), 53–61, 1990.
[3] Blickhan, R., The spring-mass model for running and hopping, J. Biomech. 22, 1217–1227, 1989.
[4] McMahon, T.A. and Cheng, G.C., The mechanics of running: how does stiffness couple with speed? J. Biomech. 23, 65–78, 1990.
[5] Raibert, M.H., Legged Robots That Balance, The MIT Press, Cambridge, 1986.
[6] Cham, J.G., Bailey, S.A., Clark, J.E., Full, R.J. and Cutkosky, M.R., Fast and robust: hexapedal robots via shape deposition manufacturing, Int. J. Robotics Res. 21(10), 869–883, 2002.
[7] Buehler, M., Saranli, U., Papadopoulos, D. and Koditschek, D., Dynamic locomotion with four- and six-legged robots, Int. Symposium on Adaptive Motion of Animals and Machines, Montreal, 2000.
[8] Iida, F. and Pfeifer, R., Cheap rapid locomotion of a quadruped robot: self-stabilization of bounding gait, Intelligent Autonomous Systems 8, IOS Press, 642–649, 2004.
[9] Fukuoka, Y., Kimura, H. and Cohen, A.H., Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts, Int. J. Robotics Res. 22(3–4), 187–202, 2003.
[10] Zhang, Z.G., Fukuoka, Y. and Kimura, H., Stable quadrupedal running based on a spring-loaded two-segment legged model, Proc. ICRA 2004, 2601–2606, 2004.
[11] Guenther, M., Computersimulationen zur Synthetisierung des muskulaer erzeugten menschlichen Gehens unter Verwendung eines biomechanischen Mehrkoerpermodells, PhD thesis, University of Jena, 1997.
[12] Altendorfer, R., Moore, N., Komsuoglu, H., Buehler, M., Brown Jr., H.B., McMordie, D., Saranli, U., Full, R.J. and Koditschek, D.E., RHex: a biologically inspired hexapod runner, Autonom. Robots 11, 207–213, 2001.
[13] Seyfarth, A., Geyer, H., Guenther, M. and Blickhan, R., A movement criterion for running, J. Biomech. 35, 649–655, 2002.
574
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Analysis of Dynamical Locomotion of Two-Link Locomotors Kojiro MATSUSHITA, Hiroshi YOKOI, Tamio ARAI The University of Tokyo, Precision Engineering Dept. Japan
Abstract: In this paper, we demonstrate simple locomotive robots, consisting of two links connected at one joint and actuated with a simple oscillation, in both the virtual and the real world. The locomotion of these robots is then analyzed in terms of gait and of the exploitation of their own morphology. As a result, we first identified two different gaits: a kicking-ground type and a falling-over type. A third gait was then obtained, which provides a clue toward achieving dynamic bipedal locomotion. Keywords: Legged robot, dynamical locomotion, morphology, edutainment
Introduction

In the field of legged locomotion robotics, the implementation of control has been the main focus, and many types of gait control and balance control have been theorized and applied to legged robots [7,11]. These robots typically share common mechanical characteristics: they are made of rectangular solid materials, actuated by many high-torque electric motors, and controlled with high-speed information processing. With these configurations, conventional robots are able to keep their balance (balance control), and their gaits are generated with the same methodology as robot-arm manipulators: stance legs are regarded as fixed links and swing legs as end-effectors during one step of motion. By iterating this manipulation, the robots achieve locomotion. Meanwhile, in recent years, legged robots have been designed with biologically inspired knowledge from anatomy, physiology, and neuroscience [1]. For example, the passive dynamic walker [2,4,10] is a robot well designed for bipedal locomotion (straight legs, curved feet, passive hip joints, and latched knee joints as human body components) that achieves dynamically stable walking on an incline, driven not by motor actuation but by gravity acting through its own morphology. Thus, it is known that morphological characteristics help a robot exploit its own physics, which can be regarded as a computational functionality (morphological computation [5]) in embodied artificial intelligence [6]. In our research, we focus on the morphological characteristics of legged robots. Our hypothesis is that the interdependence between morphology and controller is an important factor [8] in achieving dynamical locomotion, which we define as locomotion that exploits gravity and inertial forces. In this paper, we first introduce an edutainment course as a heuristic design methodology for obtaining locomotive robots.
Secondly, we show the results of an analysis of the most remarkable legged robots in both the real and the virtual world. Finally, we identify one type of dynamical locomotion arising from the interdependence between morphology and controller.
K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors
575
Robot Edutainment as a Heuristic Design Method

We used an edutainment course as a heuristic design methodology to obtain various locomotive robots. The course was organized for 20 master's students in the engineering department and aimed at teaching the importance of morphology. Its main characteristic was that students could build their own robot structures with the rapid prototyping method described in the next section. Students designed locomotive robots with one or two motors and electric-circuit controllers, and we instructed them in the use of the controller with sample programs generating simple oscillations.
A. Electric Circuit: Simple Oscillation Controller

The controllers are built around a microchip H8/3664 (16 MHz) and can control three RC-servo motors, read two light sensors and two touch sensors, and communicate with a computer through a serial port (Fig. 1, Table 1). The oscillation algorithm for the RC-servo motors is as follows: the motor axis moves alternately between two angular positions at a certain cycle. Students could modify the angle positions and the cycle as control parameters. The students could also use the light sensor; however, we do not describe it in this paper.
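The two-position oscillation described above could be sketched as follows. This is a hypothetical helper, not the course's actual sample program; angle_a, angle_b, and cycle correspond to the angle positions and cycle time the students could modify.

```python
def servo_target(t, angle_a, angle_b, cycle):
    """Alternate the servo between two angular positions with a given cycle
    time: angle_a during the first half of each cycle, angle_b during the
    second half. This reproduces the simple oscillation pattern (sketch)."""
    phase = (t % cycle) / cycle
    return angle_a if phase < 0.5 else angle_b
```

On the real hardware, the controller would translate the returned target angle into the corresponding RC-servo pulse width each control tick.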
Fig.1 Controller circuit
Table 1. Control circuit for edutainment
Microchip: H8/3664 (16 MHz)
Communication: RS232
RC servo control: 3 ch
Sensor input: 4 ch
LED: 4 ch
Battery: 9 V

B. Rapid Prototyping Method: Plastic-Bottle-Based Robots
In the robot edutainment course we applied a rapid prototyping method: plastic bottles are used as body parts, RC-servo motors as actuated joints, and the parts are connected with hot glue. The advantages are easy assembly, easy modification, and low cost, because building robots normally takes a long time and changing their morphology is difficult. The precision is certainly lower than with metal materials, but it is sufficient to realize the desired behaviors in prototype robots.
C. Results: Locomotive Robots Designed by Students

The locomotive robots designed by the students, shown in Fig. 2, were qualitatively categorized into three types according to their locomotion gaits: shuffling, kicking-ground, and falling-over. The shuffling type moves forward by a high-frequency oscillation of one motor, approximately 4 Hz; its locomotion is generated by friction forces at the contact points. The kicking-ground (KG) type moves forward by
kicking the ground with its rear leg; in other words, the forward driving force is generated by the kicking motion. The falling-over (FO) type moves forward by falling over after the motor lifts up its fore leg (Fig. 3); in short, the FO type changes its body shape and then exploits gravity to fall forward. Table 2 lists the times the designed robots took to travel 1 meter forward. The shuffling type was much slower than the other types and had difficulty moving straight forward, so these robots normally could not reach the goal, whereas the best KG and FO types covered 1 meter in approximately 10 seconds. We assume that the shuffling type, being in contact with the ground all the time, was strongly influenced by the floor condition: the actuation force is reduced by friction, and the driving force is relatively small and not unidirectional. It is no wonder that this type is slower than the KG and FO types, which convert actuation force into driving force efficiently (we describe the KG and FO types in the next chapter).
Fig. 2. Students' locomotors.

Table 2. Categorization and ranking of students' robots

Locomotion type              | Locomotive energy        | Number of robots
Inversed pendulum (falling)  | Gravity                  | 5
Shuffling                    | Inertia force / friction | 7
Kicking ground               | Actuation                | 6

Ranking | Record time of 1-meter race  | Locomotion type
1       | 7.4 sec                      | Kicking ground
2       | 8.1 sec                      | Inversed pendulum
3       | 11.0 sec                     | Kicking ground
4       | 1 min 30 sec                 | Inversed pendulum
5-20    | More than 5 min / no goal    | Others
Fig. 3. Schematic snapshots of the two types at each step during locomotion (preparation state and moving-forward state). Kick-ground type: the rear leg moves, then kicks the ground. Inversed-pendulum type: the fore leg moves, is lifted up, and the robot falls over forward.
Investigation of the Morphology of Two Locomotors

In this chapter we analyze the locomotion characteristics of the two most remarkable locomotors obtained in the edutainment course. First, we observed their ground-contact information during locomotion in the real world, which represents their gaits. Second, the robots were modeled and simulated in a three-dimensional world, and their ground-contact information, including the ground reaction forces, was recorded.

A. Analysis of Locomotive Robots in the Real World
In the previous chapter we observed that the KG type moves forward by kicking the ground (Fig. 4a, Fig. 5a) and the FO type by falling over (Fig. 4b, Fig. 5b). In this chapter, we quantitatively analyze the locomotion by recording ground-contact information.
Fig. 4. Locomotors designed in the edutainment course. (a) KG type: L × W × H = 0.30 m × 0.25 m × 0.20 m, weight 3 kg. (b) FO type: 0.35 m × 0.30 m × 0.25 m, weight 3.5 kg.
Fig. 5. Photographic strips of (a) the KG type and (b) the FO type.

Fig. 6. Ground-contact information of the locomotors: (a) KG type (cycle = 1 sec), (b) FO type (cycle = 1 sec).
Macroscopically, the KG type and the FO type showed similar gaits: their rear legs stayed on the ground all the time, and their fore legs left the ground for approximately 0.5 second when the robots opened their legs. Microscopically, however, the two types showed different characteristics. In the case of the KG type, the fore leg was off the ground from approx. 0.2 sec before until approx. 0.2 sec after the robot started opening its legs. We assume that the center of gravity (CoG) is over the rear leg during the first 0.2 sec and that the rear leg then kicks the ground for 0.2 sec. In the case of the FO type, by contrast, the fore leg left the ground 0.1 sec after the legs started opening; it seems that the fore leg was lifted up for 0.1 sec and that the robot then fell forward over the rear leg for 0.2 sec. Thus, the ground-contact information reflects the locomotion patterns. With real robots, however, it is difficult to acquire data such as the ground reaction force, which also characterizes gaits. Therefore, we modeled the KG and FO types to analyze their locomotion patterns.
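Contact phases like those discussed above (and plotted in Fig. 6) can be extracted from a sampled on/off contact signal. This is a minimal sketch with a hypothetical helper name, not the authors' analysis code.

```python
def contact_intervals(contact, dt):
    """Given a boolean ground-contact signal sampled every dt seconds,
    return a list of (start_time, duration) tuples, one per contact phase."""
    intervals = []
    start = None
    for i, on in enumerate(contact):
        if on and start is None:
            start = i * dt  # contact phase begins
        elif not on and start is not None:
            intervals.append((start, i * dt - start))  # contact phase ends
            start = None
    if start is not None:  # signal ends while still in contact
        intervals.append((start, len(contact) * dt - start))
    return intervals
```

Applying this to each leg's signal gives the touch-down times and contact durations used to compare the KG and FO gaits.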
Fig. 7. Simulated and modeled locomotors. (a) KG type: mass 0.5 kg at 0.0 m on the fore leg; angle π/2 rad; fore leg 0.65 × 0.1 × 0.1 m, 0.5 kg; rear leg 0.35 × 0.1 × 0.1 m, 0.5 kg. (b) FO type: mass 0.5 kg at 0.35 m on the fore leg; angle π/4 rad; same leg dimensions and masses.
Fig. 8. Photographic strips of (a) the KG type and (b) the FO type.

Fig. 9. Ground-contact information and ground reaction force of the two locomotors. (a) KG type: maximum amplitude π/2, rotary speed π/2 rad/sec, cycle 1.4 sec. (b) FO type: maximum amplitude π/3, rotary speed π/2 rad/sec, cycle 1.0 sec.
B. Analysis of Locomotive Robots in Three-Dimensional Simulation
In the previous section we obtained clues for distinguishing the locomotion patterns. In this section, we model the locomotive robots and simulate them in a three-dimensional world (implemented with the Open Dynamics Engine [9]) in order to observe their internal states. Fig. 7 shows the modeled KG and FO types; the designs differ slightly from the real-world originals for the sake of simplification. Using the same body components, i.e. two links connected at a joint plus a mass ball, we compared the two locomotive robots. The morphological and controller parameters were searched heuristically, and the simulated robots showed locomotion patterns similar to the real ones, as shown in Fig. 8. We then observed their ground-contact information, including the ground reaction force (GRF). Fig. 9 shows the ground-contact information and the transition of the GRF. Fig. 9a indicates the following states of the KG type within one locomotion cycle: the CoG moves over the rear leg (the GRF at the fore leg gradually decreases while that at the rear leg gradually increases); the rear leg starts kicking the ground (the GRF at the rear leg peaks, decreases, and then stays at zero for 0.1 sec); the robot stays in the air (the GRF at both legs is zero for 0.05 sec); and the fore leg lands (the GRF at the fore leg peaks). Meanwhile, Fig. 9b shows that the FO type is closer to crawling, because the rear leg keeps a relatively constant GRF, and even the fore leg keeps a constant GRF except for its landing peak. So far we have shown the internal states during locomotion. To reveal the advantages of each gait, however, we also examined the distance traveled forward with the amplitude and frequency as control parameters. Fig. 10 shows the results: both types traveled farthest at approx. 0.5 sec on the cycle axis (the maximum motor speed is π/2 rad/sec).
However, the region with more than 2 sec on the cycle axis and more than 3π/8 rad on the amplitude axis shows a different characteristic. This indicates that the KG type achieves forward motion by directly kicking the ground and can therefore kick at both fast and slow cycles. Conversely, the FO type moves forward by lifting its fore leg, and this lifting motion requires a fast cycle: if the cycle is slow, the fore leg is not lifted and the robot does not move forward. Thus, the KG type has more controllability than the FO type.
Fig. 10. Distance traveled at different control parameters: (a) KG type, (b) FO type.
C. Same-Length Leg Type

In the previous sections we analyzed two robots with the same morphology but different actuation angle ranges. In this section we again focus on the two-link
structure and investigate a same-length leg type (SL type). We implemented both simulated and real locomotors, as shown in Fig. 11. The SL type achieved qualitatively similar locomotion in the virtual and the real world. Compared with the GRF of the KG and FO types, the SL type differs in that it has no landing peaks during locomotion but a smoother transition. Furthermore, a remarkable difference is seen in Fig. 14: the map of distance traveled is completely different from those of the KG and FO types. The SL type traveled forward more than 6 meters at one peak of the map, in the region around 1 sec on the cycle axis and π/4 rad on the amplitude axis. In terms of controllability, the SL type is lower than the others. However, the results indicate that the interdependence between morphological and control parameters yields a smooth transition, i.e. dynamical locomotion, and a better locomotion ability.
Fig. 11. Simulated and real locomotors: (a) simulated robot, (b) schematic, (c) real robot. Model: mass 0.5 kg at 0.05 m behind the motor; fore leg 0.50 × 0.1 × 0.1 m, 0.5 kg; rear leg 0.50 × 0.1 × 0.1 m, 0.5 kg.
Fig. 12. Photographic strips of (a) the simulated robot and (b) the real robot.

Fig. 13. Ground contact and ground reaction force: (a) simulated robot (cycle = 0.80 sec), (b) real robot (cycle = 0.70 sec).
Furthermore, the SL type generates a symmetrical characteristic in its locomotion at certain control parameters, as shown in Fig. 15: in simulation the robot iterates switching between swing leg and support leg at one spot, and its GRF resembles an M-shape, which is characteristic of the human walking gait [3]. We assume that this locomotion is close to bipedal locomotion and that utilizing the pitching oscillation amounts to
exploiting its own physics through its morphology. Therefore, we believe a three-link robot is an appropriate design for bipedal locomotion, because it could control its pitching oscillation.
Fig. 14. Distance traveled at different control parameters (same-length, mass-rear type; amplitude = angle between legs = 6π/64, mass position = −0.05 m).

Fig. 15. Ground-contact information and ground reaction force of the simulated robot (cycle = 0.80 sec).
Conclusion

In this paper we investigated dynamical locomotion as the exploitation of a robot's own physics through its morphology. Locomotive robots were designed in an edutainment course as a heuristic method; we then selected the most remarkable robots and analyzed them in both the real and the virtual world. As a result, the two-link locomotors showed two different gaits. Furthermore, we observed a third gait, a pitching oscillation that switches between swing leg and support leg, whose ground-reaction-force transition resembles the M-shape characteristic of human dynamic walking. We therefore found that a two-link robot with same-length legs exploits pitching oscillation during locomotion, and we believe this design can be a clue to achieving dynamical locomotion of legged robots.
References
[1] Alexander, R., Principles of Animal Locomotion, Princeton University Press, New Jersey, 2002.
[2] Collins, S.H., Wisse, M. and Ruina, A., A 3-D passive-dynamic walking robot with two legs and knees, Int. J. of Robotics Research, 20(7):607–615, 2001.
[3] Komi, P.V., Gollhofer, A., Schmidtbleicher, D. and Frick, U., Interaction between man and shoe in running: considerations for a more comprehensive measurement approach, Int. J. of Sports Medicine, 8:196–202, 1987.
[4] McGeer, T., Passive dynamic walking, Int. J. Robotics Research, 9(2):62–82, 1990.
[5] Paul, C., Morphology and computation, Proc. of Int. Conf. on the Simulation of Adaptive Behaviour, 2004.
[6] Pfeifer, R. and Scheier, C., Understanding Intelligence, MIT Press, Cambridge, MA, 1999.
[7] Raibert, M.H., Legged Robots That Balance, MIT Press, Cambridge, MA, 1986.
[8] Sims, K., Evolving 3D morphology and behavior by competition, in R. Brooks and P. Maes (eds.), Artificial Life IV Proceedings, MIT Press, pp. 28–39, 1994.
[9] Smith, R., Open Dynamics Engine, http://ode.org/, 2000.
[10] Van der Linde, R.Q., Design, analysis and control of a low power joint for walking robots, by phasic activation of McKibben muscles, IEEE Trans. Robotics Automation, 15(4):599–604, 1999.
[11] Vukobratovic, M. and Juricic, D., Contribution to the synthesis of biped gait, IEEE Trans. on Bio-Medical Engineering, 16(1):1–6, 1969.
Part 10 Mobiligence
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A Modular Robot That Self-Assembles
Akio Ishiguro a,1, Hiroaki Matsuba a, Tomoki Maegawa a and Masahiro Shimizu a
a Dept. of Computational Science and Engineering, Nagoya University
Abstract. One of the most graceful phenomena widely observed in nature is self-assembly: living systems spontaneously form their body structures through the developmental process. While this remarkable phenomenon still leaves much to be understood in biology, the concept of self-assembly also becomes undeniably indispensable in artificial systems as they increase in size and complexity. Based on this consideration, this paper discusses the realization of self-assembly with the use of a modular robot. The main contributions of this paper are twofold: the first concerns the exploitation of emergent phenomena stemming from the interplay between the control and mechanical systems; the second is related to the implementation of differential adhesiveness among the modules. Here, form generation by self-assembly is considered the result of time evolution toward the most dynamically stable state. Preliminary simulation results show that stable and spontaneous self-assembly is achieved irrespective of the initial positional relationship among the modules.
Keywords. Self-assembly and self-repair, modular robot, emergent phenomena, morphological computation, fully decentralized control
1. Introduction

Self-assembly is a phenomenon in which basic units gather together and stably and spontaneously form some specific configuration. In the natural world, this phenomenon is widely observed among living and non-living things. For example, the hydra, a primitive coelenterate, is known to reproduce itself by recreating its original body structure even if it is artificially dissociated into cells. Amphipathic molecules, composed of hydrophilic and hydrophobic groups, spontaneously form micelle or vesicle structures. Even in artificial constructions, this concept will become essential as the systems to be designed increase in size and complexity. In addition, as the hydra example shows, self-assembly is a concept strongly associated with self-repair. Discussion toward the realization of self-assembly can therefore also be expected to provide useful information for the construction of highly fault-tolerant artificial systems. Based on this viewpoint, several studies are currently underway in the field of robotics to realize self-assembly using a modular robot, sometimes also referred to as a reconfigurable robot. A modular robot, a line of research initiated by Fukuda and his colleagues, is a robotic system that consists of multiple mechanical units called modules [1]. In contrast to a conventional robot with a fixed morphology, a modular robot can reconfigure itself according to the situation encountered by actively altering
1 Corresponding Author: Department of Computational Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan. Tel.: +81 52 789 3167; Fax: +81 52 789 3166; E-mail: [email protected].
the relative positional relationship among the modules. What should be noticed is that, in many of these studies, the behavior of each module in the self-assembly process is individually determined in a fully algorithmic manner: a control algorithm — or sometimes an external operator — precisely specifies which modules should be connected physically as well as how each module should be moved. Under this kind of “rigorous” control method, however, the required control algorithm may become extremely complicated and intractable as the system increases in size and complexity. Consider the self-assembly phenomena observed in the natural world. There is no entity corresponding to an algorithm that fully specifies all events in the self-assembly process; the resultant form also emerges by exploiting physicochemical interactions between substances. For example, differences in intercellular adhesiveness are known to play a significant role in the time evolution toward the final form during morphogenesis. Since self-assembly presumes that the number of system components is extremely large, we believe it is essential to adopt an approach that explicitly considers the exploitation of physical interactions between modules from the initial stage of investigation. In light of these facts, this study aims at the development of a modular robot that achieves stable and spontaneous self-assembly from the initial to the final configuration by exploiting emergent phenomena stemming from the interplay between the control and mechanical systems. To achieve this goal, this study focuses on a “functional material” and on “mutual entrainment among nonlinear oscillators”.
By using the former as an inter-module spontaneous connectivity control mechanism and the latter as a core control mechanism for the generation of inter-module cohesive force, we attempt to realize self-assembly in a fully decentralized manner. Note that this study considers form generation by self-assembly as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state. Here, the behavior of individual modules is not determined only by a control algorithm; exploitation of the inter-module physical interaction is also well considered, which allows us to significantly reduce the computational cost required for the connectivity control.
2. Previous and Related Work

This section briefly introduces several prior studies related to self-assembly. Murata et al. proposed a method that enables a staged formation toward a goal structure using a modular robot called Fractum [2]. Yoshida et al. extended Fractum to a three-dimensional modular robot [3]. Using a modular robot called M-TRAN, Kamimura et al. realized self-assembly into a serpentine or four-legged robot [4]. These studies, however, followed a fully algorithmic approach: the control algorithm precisely specifies the behavior of each module to realize the goal structure, focusing particularly on efficient procedures. Similar approaches are also found in [5]-[10]. On the other hand, several studies have been reported on self-assembly achieved by fully exploiting physical interactions between modules (e.g., [11]). These studies, however, addressed self-assembly at micrometer or smaller scales, and it remains unclear whether they can be directly employed for centimeter-order or larger applications. Recently, inspired by a seminal study by Hosokawa et al. [12], Bishop et al. constructed a self-assembling robot [13]. In their study, however, since the
physical interaction between the modules depends heavily on random collisions, efficient time evolution toward the final form may be difficult.
3. The Proposed Method

3.1. The Design Strategies

The design strategies employed in this study for the development of a modular robot capable of self-assembly can be summarized as follows:
1. Self-assembly is considered an emergent phenomenon. More specifically, form generation by self-assembly is regarded as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state.
2. In the self-assembly process, the behaviors of individual modules are not determined solely by a control algorithm; exploitation of the physical interaction between modules is also considered in detail.
In what follows, we explain how we designed the control and mechanical systems of a modular robot with the ability to self-assemble according to the above idea.
Figure 1. A schematic of the mechanical structure of the module considered in this study. (left) top view. (right) side view. Note that no active control mechanism that precisely specifies the connection/disconnection among the modules is implemented. Instead, a spontaneous connectivity control mechanism exploiting a functional material, i.e. a genderless Velcro strap, is employed. This allows us to induce emergent properties in artificial “morphogenesis”.
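The spontaneous connection/disconnection behavior described in the caption — arms that touch connect, and a connection yields only under excessive stress — can be sketched as a simple state-update rule. The threshold values below are assumptions for illustration, not values from the paper.

```python
# Illustrative sketch (not the authors' code) of the spontaneous connectivity
# rule implied by the genderless Velcro: arms that come into contact connect;
# an existing connection breaks only when the tensile stress on it exceeds
# the Velcro's yield stress. YIELD_STRESS and CONTACT_DIST are assumed.

YIELD_STRESS = 5.0   # assumed yield stress of the Velcro (arbitrary units)
CONTACT_DIST = 0.1   # assumed distance below which two arm tips touch

def update_connection(connected: bool, gap: float, stress: float) -> bool:
    """Return the new connection state of one arm pair."""
    if not connected:
        return gap <= CONTACT_DIST    # contact -> spontaneous connection
    return stress <= YIELD_STRESS     # stress above yield -> comes apart

print(update_connection(False, 0.05, 0.0))  # touching arms connect -> True
print(update_connection(True, 0.0, 10.0))   # over-stressed -> False
```

Because the rule is purely local and passive, no module ever has to compute which connections to make, which is the source of the computational savings claimed in the text.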
3.2. The Mechanical System

As an initial step of the investigation, a two-dimensional modular robot has been considered, consisting of many identical modules, each with a mechanical structure like the one shown in Figure 1. Each module is equipped with telescopic arms and a ground friction control mechanism (explained later). Each module is also equipped with a mechanism that informs it whether it is positioned as an inner module or a surface module within the entire system.1 Note that the module is covered with a functional material. More specifically, we have used a genderless Velcro strap as a practical example, since it intrinsically has interesting properties: when two halves of Velcro come into contact, they connect easily; and when a force greater than the yield
1 This could be done by some sort of sensory mechanism or in an algorithmic manner; we do not consider the details of this mechanism in this paper.
Figure 2. A schematic of the active mode and the passive mode (A side view of two connected modules is shown for clarity). Note that a module in the passive mode sticks to the ground and acts as a supporting point.
strength is applied, the halves come apart automatically. By exploiting this property of the material itself as a “spontaneous connectivity control mechanism”, we can expect not only to significantly reduce the computational cost required for connection control, but also to induce emergent properties in morphology control. The property of the connectivity control mechanism is mainly specified by the yield stress of the Velcro employed: connection between modules is established spontaneously wherever the arms of modules make contact; disconnection occurs if the disconnection stress exceeds the Velcro's yield stress. We also assume that local communication between connected modules is possible, which is used to create the phase gradient inside the modular robot (discussed below). In this study, each module is moved by the telescopic actions of its arms and by ground friction. Note that a module by itself has no mobility; it can move only in collaboration with other modules.

3.3. The Control System

Given the above mechanical structure, we now consider a fully decentralized control method that enables the modules to move stably and efficiently toward a particular equilibrium configuration, i.e., the goal shape. It should be noted that we do not use any active control for connection/disconnection among the modules. Instead, we employ a spontaneous connectivity control mechanism exploiting a functional material, i.e., a genderless Velcro strap. To take full advantage of this mechanism, we aim to generate a cohesive force inside the modular robot, acting like surface tension. In the following, we show how we achieved this.

3.3.1. Active mode and passive mode

Before a detailed explanation of the control algorithm, we first define the basic operation of each module.
Each module of the modular robot can take one of the following two exclusive modes at any time: the active mode and the passive mode. As shown in Figure 2, a module in the active mode actively contracts/extends its connected arms and simultaneously reduces its ground friction. In contrast, a module in the passive mode increases its ground friction2 and returns its arms to their original length. Note that a module in the passive mode does not move itself but serves as a “supporting point” for efficient movement of the modules in the active mode.
2 Slime molds use this mechanism; this is called a pseudopod.
3.3.2. Basic requirements

In order to generate an appropriate cohesive force that effectively drives all the modules toward a particular equilibrium configuration, two things have to be carefully considered in developing the control algorithm:
• The mode alternation in each module.
• The extension/contraction of each telescopic arm of a module.
Of course, this control algorithm should be implemented in a fully decentralized manner. In addition, it should not depend on the number of modules or on the morphology of the entire system. To this end, we have focused on nonlinear oscillators, since they exhibit an interesting phenomenon called mutual entrainment when they interact with each other. In this study, we implement a nonlinear oscillator on each module of the modular robot. We then create a phase gradient inside the modular robot through the mutual entrainment among the locally-interacting nonlinear oscillators, realized by local communication among the physically-connected modules. By exploiting the created phase gradient as the key information for controlling the mode alternation and the arm extension/contraction in each module, we generate a cohesive force inside the modular robot in a fully decentralized manner. In what follows, we explain this in more detail.

3.3.3. Creating a phase gradient through mutual entrainment

As the model of the nonlinear oscillator, the van der Pol oscillator (hereinafter, VDP oscillator) has been employed, since this oscillator model has been well analyzed and is widely used for its significant entrainment property. The equation of the VDP oscillator implemented on module i is given by

\alpha_i \ddot{x}_i - \beta_i (1 - x_i^2)\,\dot{x}_i + x_i = 0, \quad (1)
where the parameter α_i specifies the frequency of the oscillation and β_i corresponds to the convergence rate to the limit cycle. The local communication among the physically-connected modules is done by the local interaction among the VDP oscillators of these modules, which is expressed as

x_i(t+1) = x_i(t) + \varepsilon \left\{ \frac{1}{N_i(t)} \sum_{j=1}^{N_i(t)} x_j(t) - x_i(t) \right\}, \quad (2)
where x_j represents the state of a neighboring module physically connected to module i, obtained through the local communication mechanism, and N_i(t) is the number of modules neighboring module i at time t. The parameter ε specifies the strength of the interaction. Note that this local interaction acts like diffusion. When the VDP oscillators interact according to Eq. (2), a significant phase distribution can be created effectively by varying the value of α_i in Eq. (1) for some of the oscillators. In order to create an equiphase surface effective for generating an appropriate cohesive force, we set the value of α_i as

\alpha_i = \begin{cases} 1.3 & \text{if module } i \text{ is positioned as a surface module} \\ 1.0 & \text{if module } i \text{ is positioned as an inner module} \end{cases} \quad (3)
Note that the value of αi is increased when the module is positioned as a surface module in the entire system. As a result, the frequency of the VDP oscillators in the outer modules will be relatively decreased compared to the ones in the inner modules. This allows us to create the phase gradient inside the modular robot, which can be effectively exploited to endow the entire system with the cohesive force (explained later). Figure 3 shows the phase distribution created through the mutual entrainment when the modules are arranged circularly. In the figure, arrows — each of which represents the direction of the gradient vector at the corresponding point — are also depicted for clarity. It should be noticed that the direction of each gradient vector points toward the center of the entire system.
Figure 3. Phase distribution created through the mutual entrainment among the VDP oscillators in a circular arrangement. The gray scale denotes the value of the phase at the corresponding point. Each arrow represents the direction of the gradient vector at the corresponding point. Note that all the arrows point toward the center.
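The phase-gradient mechanism of Eqs. (1)–(3) can be sketched in simulation. The following is an illustrative reconstruction, not the authors' code: a one-dimensional chain stands in for the two-dimensional arrangement, the end modules play the role of surface modules (α = 1.3), and forward-Euler integration with an assumed time step is used; β = 1.0 and ε = 0.01 follow the values given in the experiments section.

```python
import math

# Sketch of Eqs. (1)-(3) on a 1-D chain of modules. The two end modules act
# as "surface" modules (alpha = 1.3); the interior ones are "inner" modules.
N, dt, steps = 10, 0.01, 20000   # dt and steps are assumed
beta, eps = 1.0, 0.01            # beta_i and epsilon as given in Sec. 3.4.1
alpha = [1.3 if i in (0, N - 1) else 1.0 for i in range(N)]  # Eq. (3)
x = [1.0] * N   # oscillator states x_i
v = [0.0] * N   # derivatives x_i-dot

for _ in range(steps):
    # Eq. (2): diffusive coupling with the physically connected neighbours
    new_x = []
    for i in range(N):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < N]
        mean = sum(x[j] for j in nbrs) / len(nbrs)
        new_x.append(x[i] + eps * (mean - x[i]))
    x = new_x
    # Eq. (1): alpha_i x'' - beta (1 - x^2) x' + x = 0, forward Euler
    for i in range(N):
        acc = (beta * (1.0 - x[i] ** 2) * v[i] - x[i]) / alpha[i]
        v[i] += acc * dt
        x[i] += v[i] * dt

# Eq. (5) phases; atan2 is the quadrant-aware form of arctan(x'/x)
theta = [math.atan2(v[i], x[i]) for i in range(N)]
print(["%.2f" % t for t in theta])
```

Because the slower surface oscillators lag the inner ones, the phases develop a gradient along the chain, the one-dimensional analogue of the radial gradient shown in Figure 3.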
3.3.4. Generating the cohesive force

Based on the above arrangements, we now explain how the control algorithm was designed. More specifically, we show how the mode alternation and the arm extension/contraction in each module are controlled by exploiting the phase gradient created by the aforementioned mutual entrainment among the locally-interacting VDP oscillators. The mode alternation in each module is done simply according to the phase of its VDP oscillator:

0 \le \theta_i(t) < \pi : \text{active mode}, \qquad \pi \le \theta_i(t) < 2\pi : \text{passive mode}, \quad (4)

where θ_i(t) denotes the phase of the VDP oscillator in module i at time t, written as

\theta_i(t) = \arctan \frac{\dot{x}_i(t)}{x_i(t)}. \quad (5)
On the other hand, the extension/contraction of each arm mounted on module i in the active mode is determined according to the phase difference with the corresponding neighboring module:

F_i^m(t) = -k \{ \theta_j^m(t) - \theta_i(t) \}, \quad (6)
where F_i^m(t) is the force applied for the extension/contraction of the m-th arm of module i at time t, k is a coefficient, and θ_j^m(t) represents the phase of the neighboring module physically connected to the m-th arm of module i. Under this algorithm, the degree of arm extension/contraction of each module becomes most significant along the phase gradient (see Figure 3), and the timing of the mode alternation propagates from the inner modules to the outer ones as traveling waves. As a result, all the modules are encouraged to move toward the center of the entire system, by which a cohesive force is automatically generated.

3.4. Experiments

3.4.1. Simulation results

In order to verify the proposed method, simulations were conducted under the following conditions: the number of modules was set to 100; β_i = 1.0; ε = 0.01; and all the modules were initially placed so as to form a rectangular shape. Representative results are shown in Figure 4. The snapshots are in order of the time evolution of the self-assembly process. As the figures show, all the modules move such that the entire system forms a disk-like shape. This is because a disk-like shape is the equilibrium configuration — the most dynamically stable form — in this case, as observed in soap bubbles. Note that the spontaneous connectivity control mechanism stemming from the genderless Velcro straps is fully exploited in this process.
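The per-module control rules of Eqs. (4)–(6) can be condensed into two small helper functions. This is an illustrative sketch: the gain k is assumed (the paper does not give its value), and the phase is taken modulo 2π so that phases from any source fall into the ranges of Eq. (4).

```python
import math

# Sketch of Eqs. (4)-(6). K is an assumed value for the coefficient k.
K = 0.5

def mode(theta_i: float) -> str:
    """Eq. (4): first half of the cycle active, second half passive."""
    return "active" if 0.0 <= theta_i % (2 * math.pi) < math.pi else "passive"

def arm_force(theta_i: float, theta_jm: float, k: float = K) -> float:
    """Eq. (6): F_i^m = -k (theta_j^m - theta_i)."""
    return -k * (theta_jm - theta_i)

print(mode(0.5))            # active
print(mode(4.0))            # passive
print(arm_force(0.2, 0.6))  # negative: arm pulls toward the leading-phase neighbour
```

Since the phase gradient points toward the center (Figure 3), the sign of Eq. (6) makes the active modules contract toward the interior, which is the origin of the cohesive, surface-tension-like force.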
(a) Initial state
(b) 20000 steps
(c) 30000 steps
(d) 50000 steps
Figure 4. Time evolution of the self-assembly (see in alphabetical order). The entire system consists of 100 modules; in this simulation, all the modules are identical. Due to a cohesive force similar to surface tension, the modules converge to a disk-like form. Note that the spontaneous connectivity control stemming from the genderless Velcro straps is fully exploited in this process.
3.4.2. Changing the equilibrium configuration

In the above result, a disk-like form was the most dynamically stable; a module group starts its self-assembly toward this form from any initial configuration. One question then arises: how can we modify this most stable form? In other words, how can we change the equilibrium configuration of the self-assembly process? In the following, we investigate this question by focusing in particular on the differences in cell adhesiveness observed in morphogenesis, i.e., the development of multicellular organisms. At the initial stage of morphogenesis, a fertilized egg repeatedly divides to become an aggregate of many cells. As this process proceeds, the cells are differentiated
into a variety of cell types; they are grouped together depending on the genes expressed in each cell. During this process, proteins on the cell surface called cell adhesion molecules are known to specify which cells a given cell can adhere to. Steinberg showed clearly that these differences in intercellular adhesiveness strongly affect the final configuration of the organism (known as Steinberg's thermodynamic model) [14][15]. Inspired by this remarkable idea, we have investigated how the equilibrium configuration can be altered by differential adhesiveness. For simplicity, we employ a modular robot consisting of two types of modules — module A and module B — each with its own specific inter-module adhesiveness. Let the adhesion strengths between module pairs A-A, A-B, and B-B be W_AA, W_AB and W_BB, respectively. Although many relationships among W_AA, W_AB and W_BB are possible, as an initial step of the investigation we consider the following two cases: Case 1, W_AA = W_BB > W_AB; and Case 2, W_AA > W_AB > W_BB. Possible structures for the two cases are illustrated in Figure 5. Representative simulation results obtained under Case 1 and Case 2 are shown in Figure 6 and Figure 7, respectively. In both cases, the entire system consisted of 100 modules: 50 of module A and 50 of module B. The solid and empty circles in the figures denote module A and module B, respectively. As Figure 6 shows, over time the entire system separates into two groups, each consisting of the same type of module (see (d)). Note that this is the equilibrium configuration in this case. In contrast, the equilibrium configuration in Case 2 differs from that in Case 1: the modules with lower adhesive strength (module B) envelop those with higher adhesive strength (module A). These results fit well with Steinberg's thermodynamic model.
(a) Case 1: W_AA = W_BB > W_AB
(b) Case 2: W_AA > W_AB > W_BB
Figure 5. Possible structures for Case 1 and Case 2. Note that (1) the binding force between modules (corresponding to intercellular adhesiveness) is proportional to the genderless Velcro strap sticking area; (2) gendered Velcro straps, in addition to genderless ones, are also exploited in Case 2.
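The case analysis above can be captured in a tiny classifier that returns the equilibrium Steinberg's model predicts for given adhesion strengths. This is purely illustrative of the case distinction, not a simulation of the sorting dynamics.

```python
# Illustrative classifier for the two adhesion cases considered in the text.
# The numeric W values passed in are arbitrary examples.

def predicted_structure(w_aa: float, w_ab: float, w_bb: float) -> str:
    if w_aa == w_bb > w_ab:
        return "Case 1: two separate same-type groups"
    if w_aa > w_ab > w_bb:
        return "Case 2: module B envelops module A"
    return "other (not considered in this paper)"

print(predicted_structure(2.0, 1.0, 2.0))  # Case 1
print(predicted_structure(3.0, 2.0, 1.0))  # Case 2
```

The intuition is thermodynamic: the system settles into the configuration that maximizes total binding energy, so equal strong like-like adhesion yields segregation, while graded adhesion makes the weakly adhering type coat the strongly adhering core.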
4. Conclusion and Further Work

This paper has investigated the realization of self-assembly with a modular robot consisting of many mechanical modules. The main contributions of this study can be summarized as follows: first, we considered form generation by self-assembly as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state; second, the behavior of each module was not determined solely by the control algorithm; the inter-module physical interaction was also exploited in the
(a) Initial state
(b) 10000 steps
(c) 25000 steps
(d) 50000 steps
Figure 6. Time evolution of the self-assembly in Case 1 (see in alphabetical order). The entire system consists of 100 modules: 50 of module A and 50 of module B. The solid and empty circles denote module A and module B, respectively. Interestingly, over time the entire system separates into two groups, each consisting of the same type of module (see (d)). Note that this is the equilibrium configuration in this case. The result fits well with Steinberg's thermodynamic model.
(a) Initial state
(b) 10000 steps
(c) 60000 steps
(d) 180000 steps
Figure 7. Time evolution of the self-assembly in Case 2 (see in alphabetical order). The entire system consists of 100 modules: 50 of module A and 50 of module B. The solid and empty circles denote module A and module B, respectively. The equilibrium configuration in this case differs from that in Case 1: the modules with lower adhesive strength (module B) envelop those with higher adhesive strength (module A). This result also fits well with Steinberg's thermodynamic model.
process of self-assembly. To this end, we focused on a functional material, i.e., gendered/genderless Velcro straps, used as a spontaneous inter-module connectivity control mechanism; third, the self-assembly was performed in a fully decentralized manner by using the mutual entrainment among the locally-interacting nonlinear oscillators; fourth and finally, we investigated how the equilibrium configuration can be altered, borrowing the idea of Steinberg's thermodynamic model. To verify the feasibility of the proposed method, an experiment with a real physical modular robot is indispensable. A prototype of the module, currently under construction, is shown in Figure 8 [16]. In this prototype, the telescopic arm is implemented as an air-driven cylinder. After verifying the operation of the modules, we plan to construct more than 20 of them.
Acknowledgment This work has been partially supported by a Grant-in-Aid for Scientific Research on Priority Areas “Emergence of Adaptive Motor Function through Interaction between Body, Brain and Environment” from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
Figure 8. The structure of the prototype model. (left) An exploded view of the module; the telescopic arm is implemented as an air-driven cylinder, and the outer walls are covered with Velcro straps (not shown here for clarity). (right) The real module, which can be driven in a fully self-contained manner using a high-pressure-air cartridge.
References
[1] T. Fukuda and Y. Kawauchi, Cellular Robotic System (CEBOT) as One of the Realization of Self-Organizing Intelligent Universal Manipulators, Proc. of IEEE ICRA (1990), 662–667.
[2] S. Murata, H. Kurokawa, and S. Kokaji, Self-Assembling Machine, Proc. of IEEE ICRA (1994), 441–448.
[3] E. Yoshida, S. Murata, H. Kurokawa, K. Tomita and S. Kokaji, A Distributed Reconfiguration Method for 3-D Homogeneous Structure, Proc. of 1998 IEEE/RSJ IROS (1998), 852–859.
[4] A. Kamimura, S. Murata, E. Yoshida, H. Kurokawa, K. Tomita, and S. Kokaji, Self-Reconfigurable Modular Robot – Experiments on Reconfiguration and Locomotion –, Proc. of 2001 IEEE/RSJ IROS (2001), 590–597.
[5] M. Dorigo, et al., Evolving Self-Organizing Behaviors for a Swarm-bot, Autonomous Robots (2004), 223–245.
[6] M. W. Jorgensen, E. H. Ostergaard and H. H. Lund, Modular ATRON: Modules for a Self-Reconfigurable Robot, Proc. of 2004 IEEE/RSJ IROS (2004), 2068–2073.
[7] K. Kotay and D. Rus, Generic Distributed Assembly and Repair Algorithms for Self-Reconfiguring Robots, Proc. of 2004 IEEE/RSJ IROS (2004), 2362–2369.
[8] M. Yim, C. Eldershaw, Y. Zhang, and D. Duff, Self-reconfigurable Robot Systems: PolyBot, Journal of the Robotics Society of Japan, 21(8) (2003), 851–854.
[9] A. Castano, W.-M. Shen, and P. Will, CONRO: Towards Miniature Self-Sufficient Metamorphic Robots, Autonomous Robots (2000), 309–324.
[10] C.-J. Chiang and G. Chirikjian, Modular Robot Motion Planning Using Similarity Metrics, Autonomous Robots, 10(1) (2001), 91–106.
[11] K. Fujibayashi and S. Murata, A Method of Error Suppression for Self-Assembling DNA Tiles, Preliminary Proc. of 10th International Meeting on DNA Computing (DNA10) (2004), 284–293.
[12] K. Hosokawa, I. Shimoyama, and H. Miura, Dynamics of Self-Assembling Systems: Analogy with Chemical Kinetics, Artificial Life 1 (1995), 413–427.
[13] J. Bishop et al., Programmable Parts: A Demonstration of the Grammatical Approach to Self-Organization, Proc. of 2005 IEEE/RSJ IROS (2005), 2644–2651.
[14] M. S. Steinberg, Reconstruction of Tissues by Dissociated Cells, Science, 141(3579) (1963), 401–408.
[15] F. Graner and J. A. Glazier, Simulation of Biological Cell Sorting Using a Two-Dimensional Extended Potts Model, Physical Review Letters, 69(13) (1992), 2013–2016.
[16] M. Shimizu, T. Kawakatsu, and A. Ishiguro, A Modular Robot That Exploits Emergent Phenomena, Proc. of 2005 IEEE ICRA (2005), 2993–2998.
Autonomous Robust Execution of Complex Robotic Missions Paul Robertson, Robert Effinger, and Brian Williams MIT CSAIL, 32 Vassar Street, Building 32-272 Cambridge, MA 02139
Abstract. A model-based system for autonomous robot health management and execution of complex robotic explorations is described that handles software, hardware, and plan execution faults robustly. Keywords: Model-Based Autonomy, Health Management, AI, Space Exploration.
1. Introduction

Getting robots to perform even simple tasks can be fraught with difficulty. Sending a robot to a distant planet such as Mars raises a number of significant issues concerning autonomy. The intended mission itself may be quite complex, conceivably involving coordinated work with other robots. An example of such an anticipated mission is to send a team of robots to Mars where they will, among other things, construct a habitat to be used by astronauts who arrive at a later time. Quite apart from the complexity of the mission, these robots are likely to be expensive, especially when considering the cost of transporting them to the planet's surface. The loss of a robot would therefore be a serious setback that could jeopardize the feasibility of the astronauts' arrival. For this reason, the robotic explorers of Mars (Figure 1) — Sojourner, Spirit, and Opportunity — have not been autonomous except for very limited navigation capability.
Figure 1 NASA Rover Spirit
Operating rovers from Earth is an excruciating process: the delay imposed by the time it takes a signal to travel between Earth and Mars does not permit tele-operation of the rovers. Consequently, a large team of mission planners must plan and test, in excruciating detail, the plans for the next day's operations. This approach will not work for more complex missions, such as cooperative robots assembling an astronaut habitat.
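A quick back-of-the-envelope check of the delay mentioned above (the Earth–Mars distances used here are approximate figures, not from the paper): one-way light time ranges from about 3 minutes at closest approach to over 20 minutes near the far side of the orbit, so a command-and-response round trip can approach three quarters of an hour.

```python
# Rough one-way signal delay between Earth and Mars at two extremes of the
# planets' separation. Distances are approximate.

C = 299_792.458  # speed of light, km/s

for label, dist_km in [("closest approach", 54.6e6), ("farthest", 401e6)]:
    minutes = dist_km / C / 60
    print(f"{label}: ~{minutes:.1f} min one-way")
```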
In our rover testbed we are attempting to perform such complex cooperative missions using a team of iRobot ATRV robots (Figure 2a) with arms mounted on them for instrument placement/collection and construction tasks. The hostile nature of space and the uncertainty of the Martian environment mean that we must consider repair of a mission plan as well as adaptation to failed components. In modern systems these failed components are as likely to be software as hardware; indeed, many of NASA's recent problems have been software problems. In this paper we describe our approach to dealing with perturbations, with particular emphasis on the software/mission-planning aspect of robot autonomy.

In complex, concurrent, critical systems, every component is a potential point of failure. Typical attempts to make such systems more robust and secure are both brittle and incomplete: the security is easily broken, and many possible failure modes are not handled. Techniques that extend to handling component-level failures are very expensive to apply, yet are still quite brittle and incomplete. This is not because engineers are lazy; the sheer size and complexity of modern information systems overwhelm the attempts of engineers, and myriad methodologies, to systematically investigate, identify, and specify a response to every possible failure of a system. Adding dynamic, intelligent fault awareness and recovery to running systems enables the identification of unanticipated failures and the construction of novel workarounds to these failures. Our approach is pervasive and incremental. It is pervasive in that it applies to all components of a large, complex system, not just the “firewall” services. It is incremental in that it coexists with existing faulty, unsafe systems, and it makes it possible to incrementally increase the safety and reliability of large systems.
The approach aims to minimize the cost of the hand-coded specifications needed to isolate and recover from failures.
Figure 2: (a) MIT MERS Rover Testbed, (b) Deep Space 1: Flight Experiment
There are many reasons why software fails. Among the more common are the following: 1. Assumptions made by the software turn out not to be true at some point. For example, a piece of software that must open a file with a given path name will usually succeed, but if the particular disk that corresponds to the path name fails, the file will not be accessible. If the program assumed that the file was accessible, the program will fail. In highly constrained situations it is possible to enumerate all such failures and hand-code specific exception handlers, and such is the standard practice in the
industry. In many cases, however, particularly in embedded applications, the number of ways the environment can change becomes so large that the programmer cannot realistically anticipate every possible failure. 2. Software is attacked by a hostile agent. This form of failure is similar to the first, except that the change in the environment is made deliberately, with the intent of causing the software to fail. 3. Software changes introduce incompatibilities. Most software evolves over its lifetime. When incompatibilities are inadvertently introduced, software that previously did not fail in a given situation may now fail. Whatever the reason for the software failure, we would like the software to be able to recognize that it has failed and to recover from the failure. There are three steps to doing this: noticing that the software has failed; diagnosing exactly which software component has failed; and finding an alternative way of achieving the intended behavior. For the runtime system to reason about its own behavior and intended behavior in this way, certain extra information and algorithms must be present at runtime. In our system these extra pieces include models of the causal relationships between the software components, models of intended behavior, and models of correct (nominal) execution of the software. Additionally, models of known failure modes can be very helpful but are not required. Finally, the system needs to be able to sense, at least partially, its own state; it needs to be able to reason about the difference between the expected state and the observed state; and it needs to be able to modify the running software, for example by choosing alternative runtime methods. Building software systems in this way comes with a certain cost.
Models of the software components and their causal relationships that might otherwise have existed only in the programmer's head must be made explicit, the reasoning engine must be linked into the running program, and the computational cost of monitoring, diagnosis, and recovery must be considered. In some systems the memory footprint and processor speed prohibit this approach; increasingly, however, memory is becoming cheap enough that the footprint is not an issue, and processor power is similarly becoming less restrictive. While the modeling effort is an extra cost, there are benefits that offset it. Making the models explicit can often cause faults to be found earlier than would otherwise be the case. The developers can choose the fidelity of the models: more detailed models take more time to develop but allow for greater fault identification, diagnosis, and recovery. Finally, our approach to recovery assumes that there is more than one way of achieving a task, so the developer must provide a variety of ways of achieving the intended behavior. The added costs of building robust software in this way are small compared to the benefits. Among other things, the approach allows us to: build software systems that can operate autonomously to achieve goals in complex and changing environments; build software that detects and works around "bugs" resulting from incompatible software changes; build software that detects and recovers from software attacks; and build
software that automatically improves as better software components and models are added.
Prior Work: Self-adaptive software has been successfully applied to a variety of tasks ranging from robust image interpretation to automated controller synthesis [7]. Our approach, described below, builds upon a successful history of hardware diagnosis and repair. In May 1999 the spacecraft Deep Space 1 [1], shown in Figure 2b, ran autonomously for a period of a week. During that week faults were introduced, and they were detected, diagnosed, and recovered from by reconfiguring the (redundant) hardware of the spacecraft. Subsequently, Earth Observer 1 has been flying autonomously, planning and executing its own missions. Extending these technologies to software systems involves extending the modeling language to deal with the idiosyncrasies of software, such as its inherently hierarchical structure [6].
Approach: At the heart of our system is a model-based programming language called RMPL that provides a language for specifying correct and faulty behavior of the system's software components. The novel ideas in our approach include method deprecation and method regeneration in tandem with an intelligent runtime model-based executive that performs automated fault management from engineering models, and that utilizes decision-theoretic method dispatch. Once a system has been enhanced by abstract models of the nominal and faulty behavior of its components, the model-based executive monitors the state of the individual components according to the models. If faults in a system render some methods (procedures for accomplishing individual goals) inapplicable, method deprecation removes the methods from consideration by the decision-theoretic dispatch. Method regeneration involves repairing or reconfiguring the underlying services that are causing some method to be inapplicable.
This regeneration is achieved by reasoning about the consequences of actions using the component models, and by exploiting functional redundancies in the specified methods. In addition, decision-theoretic dispatch continually monitors method performance and dynamically selects the applicable method that accomplishes the intended goals with maximum safety, timeliness, and accuracy. Beyond simply modeling existing software and hardware components, we allow the specification of high-level methods. A method defines the intended state evolution of a system in terms of goals and fundamental control constructs, such as iteration, parallelism, and conditionals. Over time, the more that a system’s behavior is specified in terms of model-based methods, the more that the system will be able to take full advantage of the benefits of model-based programming and the runtime model-based executive. Implementing functionality in terms of methods enables method prognosis, which involves proactive method deprecation and regeneration, by looking ahead in time through a temporal plan for future method invocations. Our approach has the benefit that every additional modeling task performed on an existing system makes the system more robust, resulting in substantial improvements over time. As many faults and intrusions have negative impact on system performance, our approach also improves the performance of systems under stress. Our approach provides a well-grounded technology for incrementally increasing the robustness of complex, concurrent, critical applications. When applied pervasively, model-based execution will dramatically increase the security and reliability of these systems, as well as improve overall performance, especially when the system is under stress.
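As a minimal illustration of the detect, deprecate, and regenerate loop described above, the following Python sketch tries redundant methods for a goal, deprecates a method when it fails, and falls back to an alternative. The class and method names are our own assumptions for illustration, not the RMPL executive itself:

```python
# Minimal sketch of a fault-aware executive: detect a failed method,
# deprecate it, and fall back to a redundant method.
# Illustrative only -- not the authors' RMPL executive.

class MethodFailed(Exception):
    pass

class Executive:
    def __init__(self):
        self.deprecated = set()  # methods deemed inapplicable

    def run(self, goal, methods):
        """Try each redundant method for `goal` until one succeeds."""
        for name, method in methods:
            if name in self.deprecated:
                continue  # method deprecation: skip known-bad methods
            try:
                return method()
            except MethodFailed:
                # trivial diagnosis: blame the method that threw
                self.deprecated.add(name)
        raise MethodFailed(f"no applicable method left for goal {goal!r}")

def sample_rock():   # hypothetical primary method
    raise MethodFailed("grinder broken")

def image_rock():    # hypothetical fallback method
    return "high-resolution image"

exec_ = Executive()
result = exec_.run("analyze rock", [("sample", sample_rock), ("image", image_rock)])
print(result)  # falls back to imaging once sampling is deprecated
```

The real executive performs far richer model-based diagnosis; the point here is only the shape of the deprecate-and-retry control flow.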
2. Fault-Aware Processes Through Model-based Programming

Recall that, to achieve robustness pervasively, fault-adaptive processes must be created with minimal programming overhead. Model-based programming elevates this task to the specification of the intended state evolutions of each process. A model-based executive automatically synthesizes fault-adaptive processes for achieving these state evolutions, by reasoning from models of correct and faulty behavior of supporting components. Each model-based program implements a system that provides some service, such as secure data transmission, which is used as a component within a larger system. The model-based program in turn builds upon a set of services, such as name-space servers and data repositories, implemented through a set of concurrently operating components comprised of software and hardware.

2.1. Component Services Model

The service model represents the normal behavior and the known and unknown aberrant behaviors of the program's component services. It is used by a deductive controller to map sensed variables to queried states. The service model is specified as a concurrent transition system composed of probabilistic concurrent constraint automata. Each component automaton is represented by a set of component modes, a set of constraints defining the behavior within each mode, and a set of probabilistic transitions between modes. Constraints are used to represent co-temporal interactions between state variables and intercommunication between components. Constraints on continuous variables operate on qualitative abstractions of the variables, comprised of the variable's sign (positive, negative, zero) and deviation from the nominal value (high, nominal, low). Probabilistic transitions are used to model the stochastic behavior of components, such as failure and intermittency. Reward is used to assess the costs and benefits associated with particular component modes.
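The ingredients just listed (modes, per-mode constraints, probabilistic transitions, and reward) can be sketched as a toy component automaton. This is an illustrative Python sketch with hypothetical mode names, probabilities, and observations, not the actual service-model formalism:

```python
import random

# Toy probabilistic constraint automaton for a "grinder" component,
# in the spirit of the component services model. All names, numbers,
# and constraints are illustrative assumptions.

class Component:
    def __init__(self, mode, transitions, constraints, reward):
        self.mode = mode
        self.transitions = transitions    # mode -> [(next_mode, prob)]
        self.constraints = constraints    # mode -> predicate over observed state
        self.reward = reward              # mode -> reward (costs/benefits)

    def step(self, rng):
        """Take one probabilistic transition (synchronous step)."""
        r, acc = rng.random(), 0.0
        for nxt, p in self.transitions[self.mode]:
            acc += p
            if r < acc:
                self.mode = nxt
                break

grinder = Component(
    mode="ok",
    transitions={"ok": [("ok", 0.99), ("broken", 0.01)],
                 "broken": [("broken", 1.0)]},       # failure is permanent
    constraints={"ok": lambda obs: obs["torque"] == "nominal",
                 "broken": lambda obs: obs["torque"] == "zero"},
    reward={"ok": 1.0, "broken": 0.0},
)

rng = random.Random(0)
grinder.step(rng)     # one synchronous transition

# Mode estimation in miniature: the observation rules out mode "ok".
obs = {"torque": "zero"}
candidates = [m for m in ("ok", "broken") if grinder.constraints[m](obs)]
print(candidates)  # ['broken']
```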
The component automata operate concurrently and synchronously.

2.2. Self-Deprecation and Regeneration Through Predictive Method Dispatch

In model-based programming, the execution of a method will fail if one of the service components it relies upon irreparably fails. This in turn can cause the failure of any method that relies upon it, potentially cascading to a catastrophic and irrecoverable system-wide malfunction. The control sequencer enhances robustness by continuously searching for and deprecating any requisite method whose successful execution relies upon a component that is deemed faulty by mode estimation and irreparable by mode reconfiguration. Without additional action, a deprecated method will cause the deprecation of any method that relies upon it, again potentially cascading to a system-level malfunction. Model-based programmers therefore specify redundant methods for achieving each desired function. When a requisite method is deprecated, the control sequencer attempts to regenerate the lost function proactively, by selecting an applicable alternative method, while verifying the overall safety of execution. More specifically, predictive method selection first searches until it finds a set of methods that are consistent and schedulable. It then invokes the dispatcher, which passes each activity to the deductive controller as configuration goals, according to a
schedule consistent with the timing constraints. If the deductive controller indicates a failure in an activity's execution, or the dispatcher detects that an activity's duration bound is violated, then method selection is reinvoked. The control sequencer then updates its knowledge with any new constraints and selects an alternative set of methods that safely completes the RMPL program.

2.3. Self-Optimizing Methods Through Safe, Decision-Theoretic Dispatch

In addition to outright failure, component performance can degrade dramatically, reducing system performance to unacceptable levels. To maintain optimal performance, predictive method dispatch utilizes decision-theoretic method dispatch, which continuously monitors performance and selects the currently optimal available set of methods that achieve each requisite function.

2.4. A Simple Example

To describe the machinery of predictive method selection and decision-theoretic dispatch in more detail, we present a simple example. In this example a MER rover (Figure 1) is tasked with performing science operations on the surface of Mars. Specifically, the rover is asked to visit two Martian rocks and sample their composition. The primary goal is to scratch each rock using a grinder (a.k.a. rock abrasion tool) and then analyze the rock's composition. As a secondary goal, if for some reason the primary goal is unattainable, the rover should take a high-resolution image of the rock. In Figure 3, below, we show how this simple example can be encoded as a Temporal Plan Network (TPN). A TPN [8] can be viewed as a generalization of a Simple Temporal Network (STN) that supports multiple redundant methods, method deprecation and regeneration, and optimal planning.
[Figure 3 shows the TPN: from the Start node, Rover1.goto(p1) [5,10] leads to a choice between Rover1.sample(target1) [2,5] (cost 5, guarded by Ask grinder = ok) and Rover1.image(target1) [2,5] (cost 10); a parallel construct does the same for target2 via Rover1.goto(p2); Rover1.goto(p3) [5,10] leads to the End node, and a Tell grinder = ok [0,+inf] arc spans the plan.]
Figure 3: Temporal Plan Network encoding of the MER Rover Example.
In a TPN, nodes represent instantaneous time events, and arcs represent activities with duration. The duration of an activity is constrained by a lower and an upper time bound, specified in brackets as [lb,ub]. By default, a blank arc has [0,0] time bounds. Redundant methods in a TPN are denoted by a double-circle node paired with a node drawn with two horizontal lines; one thread of execution must be chosen between each such pair in the TPN. This construct allows the autonomous system to select from multiple redundant methods at runtime. Method deprecation and regeneration are enabled through Ask and Tell operators. For example, in Figure 3 we enforce that the grinder be in working condition if we wish to sample the composition of a rock. This condition is enforced by placing an arc over each sample(Target)
activity which asks that the condition grinder = ok hold true during the entire execution of the activity. The dispatcher ensures that an Ask condition is satisfied by covering it with a Tell condition. For example, at the bottom of the plan in Figure 3, we tell the plan that grinder = ok for the duration of the plan.

[Figure 4 shows four snapshots of the TPN. Step 1, Predictive Method Selection: initially select a set of methods that are consistent, schedulable, and optimal. Step 2, the deductive controller indicates failure of an activity during execution: the grinder fails and cannot sample target 1, throwing an exception (Type: sampleTarget, Reason: grinder = broken) that is caught by a handler which retracts the arm and tells grinder = broken. Step 3, Method Deprecation: the plan is updated with the new information (grinder = broken), and the sample(Target) activities are deprecated. Step 4, Predictive Method Dispatch is reinvoked: the optimal set of methods is chosen while avoiding any deprecated methods.]
Figure 4: A walkthrough example.

Additionally, a cost can be associated with each activity in the TPN. This allows the planner to choose the set of available methods that minimizes cost or, analogously, maximizes reward. The cost of an activity is denoted by a number; as shown in Figure 4, we give each sample(Target) activity a cost of 5 and each image(Target) activity a cost of 10. By giving each sample(Target) activity a lower cost than the associated image(Target) activity, we abide by the plan requirement that sampling each rock is the primary goal and imaging each rock is only the secondary goal.
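To illustrate how Ask and Tell conditions gate the redundant threads of Figure 3, the following sketch encodes a fragment of the TPN as plain data and tests arc applicability. The data layout and node names are our own assumptions, not the paper's implementation:

```python
# A toy TPN fragment: nodes are events, arcs carry [lb, ub] duration
# bounds, and a choice node selects one thread of execution.
# Layout and node names are illustrative assumptions.

tpn = {
    # arc: (from_node, to_node) -> activity, [lb, ub] bounds, Ask condition
    ("n1", "n2"):  {"activity": "Rover1.goto(p1)",        "bounds": (5, 10), "ask": None},
    ("n2", "n3"):  {"activity": "Rover1.sample(target1)", "bounds": (2, 5),
                    "ask": ("grinder", "ok")},            # Ask grinder = ok
    ("n2", "n3b"): {"activity": "Rover1.image(target1)",  "bounds": (2, 5),  "ask": None},
}
choices = {"n2": [("n2", "n3"), ("n2", "n3b")]}  # double-circle node: pick one thread

def applicable(arc, told):
    """An arc is applicable if its Ask condition is covered by a Tell."""
    ask = tpn[arc]["ask"]
    return ask is None or ask in told

told = {("grinder", "ok")}   # Tell grinder = ok over the whole plan
ok = [tpn[a]["activity"] for a in choices["n2"] if applicable(a, told)]
print(ok)       # both threads applicable while grinder = ok

broken = [tpn[a]["activity"] for a in choices["n2"] if applicable(a, set())]
print(broken)   # only imaging survives once the Tell is withdrawn
```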
Next, we walk through each step of the MER rover example depicted in Figure 4. Initially, in Step 1, predictive method selection is invoked and an optimal, consistent, and schedulable plan is selected. Among the redundant methods of sampling and imaging each rock, the planner correctly chooses to sample each rock, since this is the option with the least cost. The selected plan is then sent to the dispatcher. In Step 2, we assume that the grinder breaks, causing a failure exception to be thrown. When the failure occurs, the plan is updated with the new information (grinder = broken) and method selection is reinvoked. As shown in Step 3, since the grinder is now broken, the Ask condition associated with each sample(Target) activity is no longer satisfied, so these activities must be deprecated. In Step 4, predictive method selection is reinvoked and the optimal consistent and schedulable plan is selected. As shown, with a broken grinder the only consistent alternative for the MER rover is to take high-resolution images of the two targets.
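The four-step walkthrough can be reproduced in miniature: pick the minimum-cost applicable method per target, then deprecate sampling once the grinder is told broken and re-select. This is an illustrative sketch, not the actual dispatcher; the costs (5 and 10) follow the example:

```python
# Miniature version of the walkthrough: cost-based method selection
# with deprecation of methods whose required resource is broken.
# Data layout is an illustrative assumption; costs follow the example.

methods = {
    "target1": [("sample", 5, "grinder"), ("image", 10, None)],
    "target2": [("sample", 5, "grinder"), ("image", 10, None)],
}

def select(methods, broken):
    """Steps 1 and 4: per target, pick the minimum-cost method whose
    required resource is not known to be broken (deprecated methods
    are excluded from consideration)."""
    plan = {}
    for target, options in methods.items():
        usable = [(name, cost) for name, cost, req in options if req not in broken]
        plan[target] = min(usable, key=lambda nc: nc[1])[0]
    return plan

plan1 = select(methods, broken=set())
print(plan1)   # sampling wins while the grinder works

# Steps 2 and 3: the grinder fails during execution -> Tell grinder = broken
plan2 = select(methods, broken={"grinder"})
print(plan2)   # only imaging remains consistent
```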
3. Results

Initial testing of the described system has been performed on the MIT MERS rover test bed, which consists of a fleet of ATRV robots within a simulated Martian terrain. By way of example we describe one mission whose robustness has been enhanced by the system. Two rovers must cooperatively search for science targets in the simulated terrain. They do this by going to selected vantage points and looking for science targets with their stereo cameras. The rovers divide up the space so as to minimize the time taken in mapping the available science targets in the area. The paths of the rovers are planned in advance given existing terrain maps. In the nominal case the plan runs without failure, and between them the rovers successfully find all of the science targets that we have placed for them to find. The scenario is shown below in Figure 5.
Figure 5: Rover test bed experimental platform (two rovers, 1 and 2, and vantage points p1-p5 in the simulated terrain)
In the test scenario, two faults are induced by placing a large rock that blocks rover #1's view of one of the designated areas. When rover #1 gets into its initial position to look for science targets, its stereo cameras detect the unexpected rock obscuring its view. This results in an exception that disqualifies the current software component from looking for targets. Since the failure is external to the rover software, the plan itself is invalidated. The exception is resolved by replanning, which allows both rovers to modify their plans so that the second rover observes the obscured site from a different vantage point. The rovers continue with the new plan, but when rover #2 attempts to scan the area for science targets, the selected vision algorithm fails due
to the deep shadow cast by the large rock. Again an exception is generated, but in this case a redundant method is found: a vision algorithm that works well in low-light conditions. With this algorithm the rover successfully scans the site for science targets. Both rovers then continue to execute their plan without further failure.
[Figure 6 shows the two rovers' TPN threads between Start and End: Rover1 executes goto and findTargets activities over p1, p2, and p3, while Rover2 does so over p3, p5, and p6; the failure point is marked on Rover2's findTargets activity.]
Figure 6: The TPN for the two-rover exploration plan. Failure due to an obscuration (rock) results in automatic online replanning so that the mission can continue.
4. Comparison with Current Technology

4.1. Model-based Programming of Hidden States

The Reactive Model-based Programming Language (RMPL) is similar to reactive embedded synchronous programming languages like Esterel. In particular, both languages support conditional execution, concurrency, preemption, and parameterless recursion. The key difference is that in embedded synchronous languages, programs only read sensed variables and write to controlled variables. In contrast, RMPL specifies goals by allowing the programmer to read or write "hidden" state variables. It is then the responsibility of the language's model-based execution kernel to map between hidden states and the underlying system's sensors and control variables.

4.2. Predictive and Decision-theoretic Dispatch

RMPL supports nondeterministic or decision-theoretic choice, plus flexible timing constraints. Robotic execution languages such as RAPS [2], ESL [4], and TDL [3] offer a form of decision-theoretic choice between methods, together with timing constraints. In RAPS, for example, each method is assigned a priority; the dispatched method is one that satisfies a set of applicability constraints while maximizing priority. In contrast, RMPL dispatches on a cost that is associated with a dynamically changing performance measure. In RAPS, timing is specified as fixed numerical values; RMPL instead specifies timing in terms of upper and lower bounds on valid execution times. The set of timing constraints of an RMPL program constitutes a Simple Temporal Network (STN). RMPL execution is unique in that it predictively selects a set of future methods whose execution is temporally feasible.

4.3. Probabilistic Concurrent Constraint Automata

Probabilistic Concurrent Constraint Automata (PCCA) extend Hidden Markov Models (HMMs) by introducing four essential attributes. First, the HMM is factored into a set of concurrently operating automata. Second, probabilistic transitions are treated as conditionally independent.
Third, each state is labeled with a logical
constraint that holds whenever the automaton marks that state. This allows an efficient encoding of co-temporal processes, which interrelate states and map states to observables. Finally, a reward function is associated with each automaton and is treated as additive.

4.4. Constraint-based Trellis Diagram

Mode estimation encodes the PCCA as a constraint-based trellis diagram and searches this diagram in order to estimate the most likely system diagnoses. This encoding is similar in spirit to a SatPlan/Graphplan encoding in planning.
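As noted in Section 4.2, the timing constraints of an RMPL program form a Simple Temporal Network. A standard way to check STN consistency is to look for a negative cycle in the associated distance graph; the sketch below uses the rover example's bounds. This is textbook STN machinery, not code from the paper:

```python
# STN consistency check: each constraint lb <= x_j - x_i <= ub becomes
# two distance-graph edges, i ->(ub) j and j ->(-lb) i. The STN is
# consistent iff the distance graph has no negative cycle.
# Bounds below are taken from the rover example ([5,10] goto, [2,5] sample).

def stn_consistent(n, constraints):
    """constraints: list of (i, j, lb, ub). Bellman-Ford negative-cycle test."""
    edges = []
    for i, j, lb, ub in constraints:
        edges.append((i, j, ub))    # x_j - x_i <= ub
        edges.append((j, i, -lb))   # x_i - x_j <= -lb
    dist = [0.0] * n                # virtual source at distance 0 to every event
    for _ in range(n):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            return True             # relaxation converged: consistent
    return False                    # still relaxing after n passes: negative cycle

# goto in [5,10], then sample in [2,5]; the whole trip must finish within 12
ok = stn_consistent(3, [(0, 1, 5, 10), (1, 2, 2, 5), (0, 2, 0, 12)])
# same activities, but a 6-unit deadline is impossible (minimum is 5 + 2 = 7)
bad = stn_consistent(3, [(0, 1, 5, 10), (1, 2, 2, 5), (0, 2, 0, 6)])
print(ok, bad)  # True False
```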
5. Conclusions

We have extended a system capable of diagnosing and reconfiguring redundant hardware systems so that instrumented software systems can likewise be made robust. Modeling software components and their interconnections poses a high modeling burden. The results of our early experiments are encouraging, but much work remains to extend the current experimental system to cover the full range of software practice.
6. Acknowledgements

The work described in this article was supported in part by DARPA and NASA. NASA pictures courtesy of NASA.
7. References

[1] D. Bernard, G. Dorais, E. Gamble, B. Kanefsky, J. Kurien, G. Man, W. Millar, N. Muscettola, P. Nayak, K. Rajan, N. Rouquette, B. Smith, W. Taylor, Y. Tung, "Spacecraft Autonomy Flight Experience: The DS1 Remote Agent Experiment," Proceedings of the AIAA Space Technology Conference & Exposition, Albuquerque, NM, Sept. 28-30, 1999. AIAA-99-4512.
[2] R. Firby, "The RAP language manual," Working Note AAP-6, University of Chicago, 1995.
[3] R. Simmons, "Structured Control for Autonomous Robots," IEEE Transactions on Robotics and Automation, 10(1), 1994.
[4] E. Gat, "ESL: A Language for Supporting Robust Plan Execution in Embedded Autonomous Agents," Proceedings of the AAAI Fall Symposium on Plan Execution, 1996.
[5] B. Williams and P. Nayak, "A Reactive Planner for a Model-based Executive," Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan, August 1997.
[6] T. Mikaelian and M. Sachenbacher, "Diagnosing Mixed Hardware/Software Systems Using Constraint Optimization," Proceedings of DX-05.
[7] R. Laddaga, P. Robertson, and H. E. Shrobe, "Introduction to Self-adaptive Software: Applications," Springer Verlag LNCS 261.
[8] P. Kim, B. C. Williams, and M. Abramson, "Executing Reactive, Model-based Programs through Graph-based Temporal Planning," Proceedings of the International Joint Conference on Artificial Intelligence, Seattle, WA, 2001.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Emergence of Small-World in Ad-hoc Communication Network among Individual Agents 1

Daisuke Kurabayashi a,2, Tomohiro Inoue a, Akira Yajima b and Tetsuro Funato a
a Tokyo Institute of Technology
b Olympus, co., ltd.

Abstract. This paper describes an algorithm to realize a small-world type ad-hoc communication network among autonomous mobile agents. An ad-hoc network is a network that emerges from local negotiations without a central manager. We present procedures by which the emergent network acquires the small-world property, which yields effective communication at low cost. We formulate the expected number of communication links and the degrees of agents, and we verify the proposed method by simulations.

Keywords. Small world, Ad-hoc network, Local negotiation
1. Introduction

Our society includes many networks; sometimes we ourselves are nodes of communication networks. Today we have high-speed wireless communication systems, for example W-CDMA (UMTS) phones. However, they require heavy infrastructure, and they are not fault-tolerant when an access point is broken. In contrast, an ad-hoc network is constructed and maintained in a distributed way. It is robust, but its performance depends on the algorithms running in the individuals. In this study, we formulate the properties of a communication network that emerges from ad-hoc negotiation between autonomous agents. We examine the essential factors and conditions for an effective communication network with the "small-world" property to emerge.
2. Problem Setting

2.1. Small World (SW)

Some researchers have suggested that the structure of a graph itself has functionality. Small world is a property found in human links in our society [1][2][3][4].

1 This work has been partially supported by a Grant-in-Aid for Scientific Research 17075007 from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
2 Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552 Japan.
D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network
The "small world" property means good connectivity in a large-scale network with a rather small number of edges. Some researchers have suggested that the network of power transmission lines in the USA forms a small-world graph [5].
Figure 1. Typical graphs and a small world: (a) regular graph, (b) small world, (c) random graph.
We employ the "small world" property to evaluate an ad-hoc network that emerges from local negotiations. A small-world graph is identified by the following two measures: the Characteristic Path Length (CPL) and the Clustering Coefficient (CC) [6][7]. Equation (1) defines CPL and CC, where $\bar{d}_{v_i}$ is the average of the shortest path lengths from node $v_i$ to the other nodes, $k_{v_i}$ is the degree of node $v_i$, $\Gamma_{v_i}$ is the subgraph composed of the neighbors of $v_i$, and $\|E(\Gamma_{v_i})\|$ is the number of edges in $\Gamma_{v_i}$. CPL indicates the radius of a graph; CC denotes the connectivity of geometrically near nodes.

$$CPL = \frac{1}{N_V}\sum_{i=1}^{N_V} \bar{d}_{v_i}, \qquad CC = \frac{1}{N_V}\sum_{i=1}^{N_V} \frac{\|E(\Gamma_{v_i})\|}{{}_{k_{v_i}}C_2} \qquad (1)$$

Generally, both CPL and CC of a random graph are small, and those of a regular graph are large. A small-world graph, however, has a small CPL and a large CC; that is, it has high connectivity not only among geometrically near nodes but also among far nodes. Notice that there is no strict threshold distinguishing a small-world graph from other graphs, so CPL and CC are usually compared with those of a random graph and a regular graph (Fig. 1) [8]. We define a graph to be a small-world graph if its normalized CPL is lower than 0.4 and its normalized CC is higher than 0.6 (2).

$$NCPL = \frac{CPL - CPL_{Random}}{CPL_{Regular} - CPL_{Random}}, \qquad NCC = \frac{CC - CC_{Random}}{CC_{Regular} - CC_{Random}} \qquad (2)$$
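The measures of Eq. (1) can be computed directly for a small graph. The following sketch represents a graph as adjacency sets and follows the definitions above (BFS shortest paths for CPL, neighbor-pair counting for CC); the example graph is our own:

```python
import itertools
from math import comb

# Computing CPL and CC of Eq. (1) for a small undirected graph
# given as adjacency sets. Sketch under the paper's definitions.

def shortest_paths(adj, src):
    """BFS distances from src (graph assumed connected)."""
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

def cpl(adj):
    n = len(adj)
    total = 0.0
    for v in adj:
        d = shortest_paths(adj, v)
        total += sum(d.values()) / (n - 1)   # average shortest path from v
    return total / n

def cc(adj):
    total = 0.0
    for v in adj:
        k = len(adj[v])
        if k < 2:
            continue  # clustering undefined for k < 2; count as 0
        links = sum(1 for a, b in itertools.combinations(adj[v], 2) if b in adj[a])
        total += links / comb(k, 2)          # ||E(Gamma_v)|| / C(k_v, 2)
    return total / len(adj)

# A 4-node ring plus one chord: edges 0-1, 0-3, 1-2, 1-3, 2-3.
adj = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
print(round(cpl(adj), 3), round(cc(adj), 3))  # 1.167 0.833
```

Normalizing against a random and a regular graph, as in Eq. (2), then only requires computing the same two quantities on those reference graphs.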
2.2. Assumptions and conditions

2.2.1. Agents

Suppose that $N_r$ autonomous agents walk in a certain region W (Fig. 2(a)). An agent walks straight at speed v per step for a certain period, then changes its walking direction at random and continues to walk. We call this motion random walk. The initial positions of the agents are generated at random.
2.2.2. Targets

We also suppose $N_T$ static agents, called targets. A target never moves around, but it switches its status between non-active and active in a certain period, and all targets are synchronized. When the targets are active, an agent walks toward the position of the nearest target (Fig. 2(b)); we call this motion concentration. When the targets turn non-active, an agent returns to the random walk, and the agents diffuse in the working field; we call this diffusion. The targets play the role of schools, hospitals, and companies in the real society. To simplify the problem, we assume that the working region W is a square, $N_T$ is a square number, and the targets are located at grid positions.
Figure 2. Essential motions of agents: (a) mobile agents, (b) showing the targets, (c) communication radius $r_a$ (regions $A_0$ and $A_1$ between steps t-1 and t).
2.2.3. Generation of a communication edge

Each agent has an ID, like a telephone number. An agent can communicate with the agents whose IDs it knows; we call such a connection an edge in the communication network. An agent can talk directly to other agents within a circle of radius $r_a$. When it finds other agents in the circle, it negotiates with them to exchange IDs; they reach an agreement with probability S (0 < S ≤ 1). An agent can memorize up to D IDs. When it wants to store a (D+1)-th ID, it deletes one of the IDs in its current memory at random and then stores the new one. It notifies the agent with the deleted ID so that that agent can also delete the communication link. Under these assumptions, we examine the conditions for a small-world network to emerge among mobile agents.

3. Essential factor for SW network

In this section, we examine the qualitative conditions for a small-world communication network to emerge. We carry out simulations to evaluate the effects of the targets. Table 1 lists the parameters. Graph theory says that a connected graph needs degree $d \gg \ln(N_r)$ [8]; we set the memory limit according to this condition. In the initial state, agents are located at random and have no connections (no edges). We carried out each simulation for 5000 steps under the following two conditions.
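The negotiation rule above can be sketched as follows. The eviction order, parameter values, and data layout are illustrative assumptions; the paper does not prescribe an implementation:

```python
import random

# Sketch of the ID-exchange rule: two agents in range agree with
# probability S; each stores at most D IDs, evicting a random ID
# (and notifying its owner) when full. Parameter values are illustrative.

D = 3          # memory limit
S = 1.0        # negotiation success probability

class Agent:
    def __init__(self, ident):
        self.id = ident
        self.memory = set()   # IDs of agents this agent can call

    def store(self, other, rng):
        if len(self.memory) >= D:
            evicted = rng.choice(sorted(self.memory))
            self.memory.discard(evicted)
            # notify the evicted agent so it drops the link too
            agents[evicted].memory.discard(self.id)
        self.memory.add(other)

def negotiate(a, b, rng):
    """Exchange IDs with probability S (one undirected edge)."""
    if rng.random() < S:
        a.store(b.id, rng)
        b.store(a.id, rng)

rng = random.Random(0)
agents = {i: Agent(i) for i in range(5)}
# agent 0 meets everyone; with D = 3 only 3 links survive
for j in range(1, 5):
    negotiate(agents[0], agents[j], rng)
print(len(agents[0].memory))  # 3: one earlier link was evicted
```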
608
D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network
Table 1. Parameters for simulations
W    Size of a region                 200 × 200
Nr   Number of agents                 100
NT   Number of targets                9
ra   Communication radius             10
S    Success ratio of a negotiation   1.0
D    Limit number of memory           10–70
v    Speed of an agent                1.0
T1   Steps for non-active             100
T2   Steps for active                 100
Figure 3. Properties of emerged networks (normalized CC vs. normalized CPL, with and without targets, for D = 10 to 70)
A: NT targets are located. Each target repeats T1 steps non-active → T2 steps active.
B: No targets exist. The agents only walk at random.
Figure 3 shows the normalized CC (clustering coefficient) and CPL (characteristic path length) of the generated communication networks. We varied the memory limit D from 10 to 70. Under condition B, the generated graphs never satisfy the small-world criteria. Under condition A, however, graphs with the small-world property emerge when D is in the range 10 < D < 20. We can therefore say that targets are needed for a small world to emerge in the communication network. In the next section, we formulate the properties of the emerged network.
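The two quantities plotted in Figure 3 — the clustering coefficient (CC) and the characteristic path length (CPL) — can both be computed directly from the adjacency structure of the generated network. A minimal pure-Python sketch (our illustration, not the authors' code):

```python
from collections import deque

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.
    adj: dict mapping node -> set of neighbour nodes."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue                      # CC of a degree-0/1 node is 0
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over connected ordered pairs (BFS per node)."""
    dist_sum = pair_count = 0
    for s in adj:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen[u] = seen[v] + 1
                    queue.append(u)
        dist_sum += sum(seen.values())
        pair_count += len(seen) - 1
    return dist_sum / pair_count
```

In the small-world sense of Watts and Strogatz [6], a graph qualifies when its CC stays high while its CPL drops toward the random-graph value; Figure 3 plots normalized versions of both.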
4. Formulation of the emergence of a network
4.1. Estimated number of edges
In this section, we formulate an estimate of the number of edges in an emerged network. Suppose that the number of agents in a given area follows a Poisson distribution. Let Pij(t) denote the probability that an edge between agents i and j exists, and Qij(t) the probability that they negotiate with each other. We denote by Rij(t) the probability that the edge between i and j disappears. A new edge forms when no edge exists and a negotiation succeeds, and an existing edge survives when it is not deleted:

$$P_{ij}(t) = \bigl(1 - P_{ij}(t-1)\bigr)Q_{ij}(t)S + P_{ij}(t-1)\bigl(1 - R_{ij}(t)\bigr)$$

Then we can formulate the estimated number of edges NE(t) by (3). Below we formulate Qij(t) and Rij(t), respectively.

$$N_E(t) = P_{ij}(t)\cdot{}_{N_r}C_2 \quad (3)$$
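To see what (3) predicts, the recursion for the edge-existence probability can be iterated once Qij and Rij are known. A toy sketch with Qij and Rij held constant (illustrative values of ours; in the paper they are recomputed from the agent densities at every step):

```python
from math import comb

def estimated_edges(Nr, S, Q, R, steps):
    """Iterate the edge-existence probability P(t) and return NE(t).
    A new edge forms with probability (1 - P) * Q * S; an existing edge
    survives with probability 1 - R.  NE(t) = P(t) * C(Nr, 2)."""
    P = 0.0                               # initial state: no edges
    for _ in range(steps):
        P = (1.0 - P) * Q * S + P * (1.0 - R)
    return P * comb(Nr, 2)
```

With constant rates, P(t) settles at the fixed point QS/(QS + R), so the expected edge count saturates rather than growing without bound.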
4.2. At the random walk
Let U be the set of agents to which agent i has no communication link. Fig. 2(c) shows the transition from step t − 1 to step t. Region A0 is the part of the communicable region A that overlaps the one of step t − 1, and A1 is the remaining, newly covered part. Let ρU(t) be the density of U in W, ρ−UA(t) the density in A before negotiation, and ρ+UA(t) that after negotiation. Let NU(t) be the expected number of U after negotiation. Then

$$\rho_U(t) = \frac{N_U(t-1)}{W} = \frac{\bigl(1 - P_{ij}(t-1)\bigr)(N_r - 1)}{W}$$

$$\rho^-_{UA}(t) = \frac{1}{A}\bigl[\rho^+_{UA}(t-1)A_0 + \rho_U(t)A_1\bigr] \quad (4)$$

$$\rho^+_{UA}(t) = \frac{1}{A}\left[\sum_{x=1}^{N_r-1}\frac{\{\rho^-_{UA}(t)A\}^x}{x!}\,e^{-\rho^-_{UA}(t)A}(x - S) + \left(\frac{N_r-1}{W} - \rho^-_{UA}(t)\right)A\,R_{ij}(t)\right]$$

Then we can derive Qij(t) as (5):

$$Q_{ij}(t) = \frac{1}{N_U(t-1)}\sum_{x=1}^{N_r-1} x\,\frac{\{\rho^-_{UA}(t)A\}^x}{x!}\,e^{-\rho^-_{UA}(t)A} \quad (5)$$
4.3. At the concentration and the diffusion
An agent repeats concentration and diffusion when targets are introduced. Let τ denote the number of steps elapsed since the latest activation or inactivation of the targets. We write t = 0∗, T1∗ and T2∗ when τ = 0, τ = T1 and τ = T2, respectively. From the viewpoint of agent i, the other agents fall into two categories. Let C and X denote the sets of agents heading toward the same target as agent i and those that are not, respectively. An agent can only negotiate with agents in C (6):

$${}^{C}P_{ij}(\tau) = \bigl(1 - {}^{C}P_{ij}(\tau-1)\bigr)Q_{ij}(\tau)S + {}^{C}P_{ij}(\tau-1)\bigl(1 - R_{ij}(\tau)\bigr)$$
$${}^{X}P_{ij}(\tau) = {}^{X}P_{ij}(\tau-1)\bigl(1 - R_{ij}(\tau)\bigr) \quad (6)$$
$$P_{ij}(\tau) = {}^{C}P_{ij}(\tau) + {}^{X}P_{ij}(\tau)$$

At t = 0∗ we suppose CPij(t) = ηC Pij(t) and XPij(t) = (1 − ηC)Pij(t) at the start of a concentration, and CPij(t) = ηX Pij(t) and XPij(t) = (1 − ηX)Pij(t) at the start of a diffusion; ηC and ηX are derived by (10). The expected number of agents in C, NC, is given by (7):

$$N_C = \frac{N_r}{N_T} \quad (7)$$

We denote the expected number of U ∈ C by CNU(τ), whose initial value is determined by (9).
Through the concentration, the density of agents increases. We approximate the Voronoi region of a target as a circle whose area equals the average area per target (8). We then formulate the density of U by (9), where B0 is the part of the communication range that overlaps the one of step t − 1:

$$r_1 = \sqrt{\frac{W}{\pi N_T}} \quad (8)$$

$${}^{C}\rho^-_U(t) = \frac{1}{W_B}\bigl[{}^{C}\rho^+_U(t-1)B_0 + \rho_U(t)B_1\bigr] \quad (9)$$

$${}^{C}\rho^+_U(t) = \frac{1}{W_B}\left[{}^{C}\rho^-_U(t)W_B - Q_{ij}(t)N_U(t-1)S + \left(\frac{N_r-1}{W} - {}^{C}\rho^-_U(t)\right)W_B\,R_{ij}(t)\right]$$

$${}^{C}N_U(0^*) = {}^{C}\rho^+_U(0^*)\,W_B$$

Here, we can derive ηC and ηX by (10):

$$\eta_C = \frac{(N_C - 1) - {}^{C}N_U(0^*)}{(N_r - 1) - N_U(0^*)}, \qquad \eta_X = \frac{{}^{C}N_U(0^*)}{N_U(0^*)} \quad (10)$$

During the concentration, the approximated Voronoi region B(τ) shrinks (11); at t = T2∗ it is expressed by (12):

$$B(\tau) = \pi\,(r_1 - v_0\tau)^2 \quad (11)$$

$$B_C(T_2^*) = \begin{cases} \pi\,(r_1 - v_0 T_2)^2 & (r_1 \ge v_0 T_2) \\ 0 & (v_0 T_2 \ge r_1) \end{cases} \quad (12)$$

When T1 is quite small, the concentration is affected by the latest diffusion. We formulate B(τ) by (13) for those cases; note that B(T1∗) is the area of C just after the diffusion:

$$B(\tau) = \pi\,(r_1' - v_0\tau)^2, \qquad r_1' = \sqrt{\frac{B(T_1^*)}{\pi}} \quad (13)$$

When the targets turn non-active, an agent starts the diffusion. The Voronoi region can then be approximated by (14):

$$B(\tau) = \pi\,(r_2 + v_0\tau)^2, \qquad r_2 = \sqrt{\frac{B(T_2^*)}{\pi}} \quad (14)$$

Let ρUB(τ) be the density of U in B(τ). We derive it by (15) and (16):

$$\rho_{UB}(\tau) = \frac{{}^{C}N_U(\tau-1)}{B(\tau)} = \frac{\bigl(1 - {}^{C}P_{ij}(\tau-1)\bigr)(N_C - 1)}{B(\tau)}, \qquad \rho^-_{UB}(\tau) = \frac{1}{A}\bigl[\rho^+_{UB}(\tau-1)A_0 + \rho_{UB}(\tau)A_1\bigr] \quad (15)$$
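Equations (11)–(14) reduce to tracking a single circle radius that shrinks at speed v0 during concentration and grows back during diffusion. A small sketch (the helper name and defaulting behaviour are our own):

```python
from math import pi, sqrt

def voronoi_area(tau, phase, W, NT, v0, prev_area=None):
    """Approximate target Voronoi-region area B(tau), per Eqs. (11)-(14).
    phase: 'concentration' (shrinking) or 'diffusion' (growing).
    prev_area: area at the latest phase switch; defaults to the static
    average area per target, W/NT, whose circle radius is Eq. (8)'s r1."""
    r = sqrt((W / NT if prev_area is None else prev_area) / pi)
    if phase == 'concentration':
        return pi * max(r - v0 * tau, 0.0) ** 2   # clipped at zero, Eq. (12)
    return pi * (r + v0 * tau) ** 2               # diffusion, Eq. (14)
```

Alternating the two phases with the areas carried over reproduces the shrink-and-grow cycle the formulation assumes.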
$$\rho^+_{UB}(\tau) = \frac{1}{A}\left[\sum_{x=1}^{N_C-1}\frac{\{\rho^-_{UB}(\tau)A\}^x}{x!}\,e^{-\rho^-_{UB}(\tau)A}(x - S) + \left(\frac{N_C-1}{\tilde B(\tau)} - \rho^-_{UB}(\tau)\right)A\,R_{ij}(\tau)\right], \qquad \tilde B(\tau) = \begin{cases} B(\tau) & (B(\tau) \ge A) \\ A & (B(\tau) < A) \end{cases} \quad (16)$$

Then, we derive (17):

$$Q_{ij}(\tau) = \frac{1}{{}^{C}N_U(\tau-1)}\sum_{x=1}^{N_C-1} x\,\frac{\{\rho^-_{UB}(\tau)A\}^x}{x!}\,e^{-\rho^-_{UB}(\tau)A} \quad (17)$$
Note that when the size of a Voronoi region becomes larger than WB, we apply the random-walk model to the system.

4.4. Formulation of Rij(t)
When agent i obtains a new ID while its memory is already full, it deletes one of the IDs in its memory at random. Let DQiU(t) be the probability that an agent i without available memory tries to negotiate with U. We derive it for an agent at the random walk (18) and at the concentration (19), respectively:

$${}^{D}Q_{iU}(t) = (N_r - 1 - D)\,Q_{ij}(t) \quad (18)$$

$${}^{D}Q_{iU}(t) = \frac{N_r - 1 - D}{N_U(t-1)}\,Q_{ij}(t)\,{}^{C}N_U(t-1) \quad (19)$$

Then we can formulate Rij(t) by (20), where fn(t) denotes the probability that an agent has n edges after negotiation:

$$R_{ij}(t) = f_D(t-1)\,{}^{D}Q_{iU}(t)\,S\,\frac{1}{D}\left(2 - f_D(t-1)\,{}^{D}Q_{iU}(t)\,S\,\frac{1}{D}\right) \quad (20)$$
4.5. Verification of expected degrees
We again carried out simulations with the parameters in Table 1, and verified the expected degrees of agents with targets. Figure 4 shows the total number of edges and the distribution of degrees, respectively. Table 2 lists the errors for each parameter setting. The results show that the formulation reproduces the network within 10% error. The errors are caused by the approximations used in the formulation; the most significant is probably the approximation of the Voronoi region of a target by a circle of the same area. However, it is quite difficult to formulate the Voronoi diagram for a general layout of the targets.
Figure 4. Verification of the formulated model: (a) total number of links vs. step (formulated and simulated); (b) the distribution of degrees, i.e. number of connected edges (formulated and simulated)

Table 2. Error of number of links
D    S     Ave. Error (%)   Distribution
10   1.0   3.08             3.18e−4
10   0.5   3.77             3.25e−4
10   0.1   3.94             6.33e−4
15   1.0   5.92             3.11e−4
15   0.5   5.24             3.32e−4
15   0.1   4.27             1.34e−3
20   1.0   5.00             6.82e−4
20   0.5   4.60             5.91e−4
20   0.1   4.89             2.03e−3

5. Conclusion
In this study, we examined the essential factors and conditions for a small-world (SW) communication network to emerge among autonomous mobile agents. We incorporated concentration and diffusion motions to obtain the SW property, and formulated the estimated properties of a network emerging from ad-hoc negotiation among agents. Based on those results, we verified the derived model by simulations.
References
[1] S. Milgram: The small world problem, Psychology Today, Vol. 2, pp. 60–67, 1967.
[2] M. Marchiori, V. Latora: Harmony in the Small-world, Physica A, Vol. 285, pp. 539–546, 2000.
[3] N. Mathias, V. Gopal: Small Worlds: How and Why, Physical Review E, Vol. 63, No. 2, 2001.
[4] J.M. Montoya, R.V. Sole: Small World Patterns in Food Webs, 2000.
[5] L. Adamic: The Small World Web, Proc. ECDL99, pp. 443–452, 1999.
[6] D. Watts, S. Strogatz: Collective Dynamics of Small-world Networks, Nature, Vol. 393, pp. 440–442, 1998.
[7] D. Watts: Small Worlds, Princeton U. Press, 1999.
[8] B. Bollobas: Random Graphs, Academic Press, 1999.
Parametric Path Planning for a Car-like Robot Using CC Steers
Weerakamhaeng Yossawee a,1, Takashi Tsubouchi a, Masamitsu Kurisu b and Shigeru Sarata c
a Intelligent Robot Laboratory, University of Tsukuba, Japan
b Department of Mechanical Engineering, Faculty of Engineering, Tokyo Denki University, Japan
c Intelligent Systems Institute, AIST, Japan
Abstract. The present authors have been interested in smooth locomotion of a wheel loader between any predefined pair of a locally desired initial configuration and a locally desired destination configuration. The success comes from a smooth path generation scheme based on the design of a steering method called Continuous-Curvature Steer (CC Steer), which is composed of line segments, clothoid segments, and circular arcs.
Keywords. clothoid pair, nonholonomic path planning, upper-bounded curvature and curvature derivative path, continuous-curvature path, frame articulated steering type vehicle
1. Introduction
The present authors have been interested in autonomous control of a wheel loader (WL) (Figure 1), which is a frame articulated steering type vehicle. The WL is one of the major machine vehicles used in several fields, especially mining. Its main operations are to shovel, carry, load, and unload gravel. During these operations, a typical motion of this vehicle consists of several moves between pairs of a locally desired initial configuration and a locally desired destination configuration. Smooth locomotion is strongly preferable in order to support precise path-tracking control and to improve the vehicle's dead-reckoning ability. This paper therefore contributes the generation of a smooth path as the key to this preference. Dubins proved that, without any obstacle, the optimal (shortest) path [1] for a car that can only go forward takes one of six combinations of sequential curve segments; however, such paths contain sharp turns. For the case of a car-like robot that can move forward or backward, Reeds and Shepp [2] showed that the optimal path between two configurations takes one of 48 different forms. The curvature of this type of path is discontinuous: discontinuities occur at the transitions between straight segments and sharp turns, and between sharp turns with opposite directions of rotation.
1 Correspondence to: Weerakamhaeng Yossawee, Intelligent Robot Laboratory, University of Tsukuba, Japan. E-mail: [email protected], [email protected]
614
W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers
Barraquand and Latombe [10] proposed a heuristic brute-force approach to motion planning for tractor-trailer robots. It consists of heuristically searching a graph whose nodes are small axis-parallel cells in configuration space; two such cells are connected in the graph if there exists a feasible path between two configurations in the respective cells. The main drawback of their method is that when the heuristics fail, it requires an exhaustive search of the discretized configuration space. In addition, the resulting path is inexact because the solution to the nonholonomic constraints is approximated numerically, which implies that the goal configuration is never reached exactly. Laumond et al. [9] proposed a motion planner for a car-like robot that takes into account the nonholonomic constraint of a lower-bounded turning radius. They obtained a fast and exact planner for their mobile robot model (the HILARE mobile robot), based upon the recursive subdivision of a collision-free path generated by a lower-level geometric planner that ignores the motion constraint. The resulting trajectory is optimized to give a path of near-minimal length in its homotopy class. S.M. LaValle and J.J. Kuffner [11] presented the first randomized approach to trajectory planning. The task is to determine control inputs that drive a robot from an initial configuration and velocity to a goal configuration and velocity while obeying physically based dynamical models and avoiding obstacles in the robot's environment. They developed a randomized planning approach based upon what they called Rapidly-exploring Random Trees, which offers benefits similar to those obtained by successful randomized holonomic planning methods, but applies to a much broader class of problems. Fraichard and Scheuer [3] presented CC Steer, an algorithm for planning paths in the absence of obstacles.
The planned path possesses a) continuous curvature, b) upper-bounded curvature, and c) upper-bounded curvature derivative for a car-like vehicle; d) the length of the computed path is not minimal but close to the length of the optimal paths for the simplified car (indeed, as the curvature-derivative limit tends to infinity, the computed paths become Reeds and Shepp paths). Paths computed by CC Steer are made up of line segments, circular arcs, and clothoid arcs [4]. Kito et al. [6] considered the application of CC Steer in a workspace with a finite number of polygonal obstacles, combining the CC Steer idea with a visibility-graph search. To obtain the optimal (shortest) path, the choice of the number of tangents to the discretized circle centered at each vertex of the polygonal obstacles also plays a role in their algorithm; however, the specific criteria for this choice were not given. In this paper, the present authors propose a smooth path generation scheme in the absence of obstacles based on nonlinear programming. The novelty of this paper is a formulation of the shortest-path-finding problem suitable for nonlinear programming, considering the geometric relationships of a combination of CC Turns [3] and line segments. The objective is to find an optimal (shortest) path for a specific parameterized canonical path skeleton. Through the optimization, for a given pair of initial and destination configurations, the values of the parameters in the parameter set are obtained and used to draw a smooth planned path. The planned path consists of two turns, each of which is a CC Turn.
R σst 2
σst(steering angle)
σst 2
L
R
center pin
Pwl (representative point of the wheel loader)
Figure 1. A Wheel Loader(WL)
L
Rear axle
Figure 2. Circling Behaviors of a Wheel Loader
2. Paths of continuous and upper bounded curvature and its derivative
2.1. Wheel loader kinematics issues
It is natural to consider that the path to be planned should have the properties of continuous curvature, upper-bounded curvature, and upper-bounded curvature derivative. The upper bound on curvature arises from the maximum steering angle σst max (Figure 2). The curvature κ at any instant with respect to the steering angle σst is represented as:

$$\kappa = \tan\!\left(\frac{\sigma_{st}}{2}\right)\!\Big/L \quad (1)$$

The fact that

$$\left|\frac{d\sigma_{st}}{dt}\right| = \left|\frac{d\sigma_{st}}{ds}\cdot\frac{ds}{dt}\right| \quad (2)$$

is bounded requires the continuous-curvature property.

2.2. The combination of a CC Turn and two line segments
Referring to [3], in general a CC Turn (Figure 3) is made up of three parts:
1. a clothoid arc (from lc p0 to lc p4) of sharpness k = ±kmax whose curvature varies from 0.0 to ±κmax, where |κmax| is the upper-bounded curvature. A clothoid is a curve whose curvature κ varies linearly with its arc length s: κ(s) = ks + κ(0.0);
2. a circular arc (from lc p4 to lc p5) of radius ±κ−1max; and
3. a clothoid arc (from lc p5 to lc p9) of sharpness −k whose curvature varies from ∓κmax to 0.0.
For the local reference frame lc having lc p0 as the origin, without loss of generality, lc p0(lc xp0, lc yp0, lc θp0, lc κp0) = (0.0, 0.0, 0.0, 0.0). Defining sp4 as the distance of lc p4 from lc p0, from Figure 3, lc p4(lc xp4, lc yp4, lc θp4, lc κp4) can be calculated as follows:
$${}^{lc}\kappa_{p4} = \pm\kappa_{max}, \qquad {}^{lc}nrd^{limit}_{min} = \frac{1}{{}^{lc}\kappa_{p4}} \quad (3)$$

$$s_{p4} = \frac{{}^{lc}\kappa_{p4} - {}^{lc}\kappa_{p0}}{k} \quad (4)$$

$${}^{lc}\theta_{p4} = \frac{k\,[s_{p4}]^2}{2} = {}^{lc}\theta_m \quad (5)$$

$${}^{lc}x_{p4} = \int_{u=0.0}^{u=s_{p4}} \cos[{}^{lc}\theta(u)]\,du, \qquad {}^{lc}y_{p4} = \int_{u=0.0}^{u=s_{p4}} \sin[{}^{lc}\theta(u)]\,du \quad (6)$$

The center of the circular arc, lc ac(lc xac, lc yac), can be calculated as:

$${}^{lc}x_{ac} = {}^{lc}x_{p4} - {}^{lc}nrd^{limit}_{min}\sin[{}^{lc}\theta_{p4}], \qquad {}^{lc}y_{ac} = {}^{lc}y_{p4} + {}^{lc}nrd^{limit}_{min}\cos[{}^{lc}\theta_{p4}] \quad (7)$$

lc nrac and lc μ can be determined as:

$${}^{lc}nr_{ac} = \frac{{}^{lc}nrd^{limit}_{min}}{|{}^{lc}nrd^{limit}_{min}|}\sqrt{({}^{lc}x_{ac} - {}^{lc}x_{p0})^2 + ({}^{lc}y_{ac} - {}^{lc}y_{p0})^2} \quad (8)$$

$${}^{lc}\mu = \arctan\!\left[\frac{{}^{lc}x_{ac} - {}^{lc}x_{p0}}{{}^{lc}y_{ac} - {}^{lc}y_{p0}}\right] \quad (9)$$

Defining the deflection of the CC Turn as lc θp9 − lc θp0:

$${}^{lc}\theta_{p9} - {}^{lc}\theta_{p0} = ({}^{lc}\theta_{p4} - {}^{lc}\theta_{p0}) + {}^{lc}\theta_{p4p5} + ({}^{lc}\theta_{p9} - {}^{lc}\theta_{p5}) = {}^{lc}\theta_m + {}^{lc}\theta_{p4p5} + {}^{lc}\theta_m \quad (10)$$

The deflection of a CC Turn whose circular arc has zero length (lc θp4p5 = 0.0) is lc θp9 − lc θp0 = κ²max k−1 = 2 · lc θm, and the CC Turn then consists of only two clothoid arcs. This is called the double clothoid or the clothoid pair according to Kanayama et al. [4].
When the path is made up of two line segments connecting to a CC Turn at lc p0 and lc p9 respectively, the curvature profile is continuous and looks like the one depicted in Figure 4 (assuming CCW rotation of the CC Turn). The segment from lc pa to lc p0 is a line segment. Moving from lc p0 to lc p4 is on the clothoid arc of k = kmax. The curvature is maximum and constant from lc p4 to lc p5, which is the circular arc. Finally, moving from lc p5 to lc p9 is on another clothoid arc of k = −kmax. The curvature decreases back to 0.0, and the movement continues on the line segment from lc p9 to lc pb.
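The clothoid integrals of Eq. (6) have no closed form (they are Fresnel integrals), so numerical quadrature is the usual route. A minimal sketch (our own illustration, not the authors' code) samples the endpoint of the first clothoid arc of a CC Turn, using θ(s) = k s²/2 as in Eq. (5):

```python
from math import cos, sin

def clothoid_end(k, s_end, n=2000):
    """Endpoint of a clothoid starting at the origin with heading 0 and
    curvature 0, sharpness k (so kappa(s) = k*s and theta(s) = k*s^2/2).
    Returns (x, y, theta, kappa) at arc length s_end, evaluating the
    cos/sin integrals of Eq. (6) with the midpoint rule."""
    ds = s_end / n
    x = y = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        theta = 0.5 * k * s * s
        x += cos(theta) * ds
        y += sin(theta) * ds
    return x, y, 0.5 * k * s_end ** 2, k * s_end
```

With sharpness k = kmax the arc is cut at s_p4 = κmax/k, where the circular arc of radius 1/κmax takes over; the mirror clothoid of sharpness −k then closes the turn.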
3. Specific parameterized canonical path skeletons
Given a pair of 1Init(1xinitdsr, 1yinitdsr, 1θinitdsr) (without loss of generality, set to (0, 0, 0)) and 1Des(1xdesdsr, 1ydesdsr, 1θdesdsr) (where the leftmost superscript "1" refers to the local reference frame x1–y1), the authors propose a path generation scheme employing the following specific parameterized canonical path skeleton. In this section, parameterizations of the path skeleton suitable for the nonlinear programming are illustrated.
Figure 3. A CC Turn
Figure 4. Curvature profile of two line segments connecting to both ends of a CC Turn of CCW rotation
3.1. Skeleton consisting of two turns of two CC Steers
In Figure 5, the vehicle begins to run on this skeleton from 1O1 to 1A on the line segment 1O1 1A of displacement nl1. It then takes a CC Turn ac1 as illustrated in Section 2. The deflection of this CC Turn ac1, defined as 1θD − 1θA, equals 1θmac1 + 1θBC + 1θmac1, depicted as θM in Figure 5. The vehicle continues to move on the line segment 1D 1E of displacement nl2 to 1E. It then takes a CC Turn ac2. Finally, the vehicle moves on the line segment 1H 1J of displacement nl3 to 1J, the destination of this skeleton. The deflection of CC Turn ac2, defined as 1θH − 1θE, equals 1θmac2 + 1θFG + 1θmac2, depicted as θN in Figure 5.

3.1.1. Scheme for the path generation for this skeleton
The scheme for the path generation is to minimize the total moving distance of the vehicle from 1O1 to 1J. Therefore, the following nonlinear programming problem can be formulated:

$$\text{Minimize } Z = [{}^1O_1{}^1A]^2 + [{}^1A{}^1B]^2 + [{}^1B{}^1C]^2 + [{}^1C{}^1D]^2 + [{}^1D{}^1E]^2 + [{}^1E{}^1F]^2 + [{}^1F{}^1G]^2 + [{}^1G{}^1H]^2 + [{}^1H{}^1J]^2 \quad (11)$$

$$Z = Z({}^1nl_1, {}^1\mu_{ac1}, {}^1nr_{ac1}, {}^1\theta_{mac1}, {}^1\theta_{BC}, {}^1nl_2, {}^1\mu_{ac2}, {}^1nr_{ac2}, {}^1\theta_{mac2}, {}^1\theta_{FG}, {}^1nl_3) \quad (12)$$

Subject to:

$${}^1nl_1 - L \ge 0.0 \ \text{or}\ {}^1nl_1 + L \le 0.0, \qquad {}^1nl_3 - L \ge 0.0 \quad (13)$$

$${}^1x_{desdsr} = {}^1x_J, \qquad {}^1y_{desdsr} = {}^1y_J, \qquad {}^1\theta_{desdsr} = {}^1\theta_J \quad (14)$$

$${}^1\mu_{ac1} = \pm|{}^{lc}\mu|, \qquad {}^1nr_{ac1} = \pm|{}^{lc}nr_{ac}| \quad (15)$$

$${}^1\theta_{mac1} = \pm|{}^{lc}\theta_m| \quad (16)$$

$${}^1\mu_{ac1}\cdot{}^1nr_{ac1} \ge 0.0, \qquad {}^1\theta_{mac1}\cdot{}^1nr_{ac1} \ge 0.0 \quad (17)$$

$${}^1\mu_{ac2} = \pm|{}^{lc}\mu|, \qquad {}^1nr_{ac2} = \pm|{}^{lc}nr_{ac}| \quad (18)$$

$${}^1\theta_{mac2} = \pm|{}^{lc}\theta_m| \quad (19)$$

$${}^1\mu_{ac2}\cdot{}^1nr_{ac2} \ge 0.0, \qquad {}^1\theta_{mac2}\cdot{}^1nr_{ac2} \ge 0.0 \quad (20)$$
Note that 1nl1, 1θBC, 1nl2, 1θFG and 1nl3 can be varied continuously and independently in the nonlinear programming to minimize Z. On the other hand, 1μac1 and 1μac2, 1nrac1 and 1nrac2, and 1θmac1 and 1θmac2 may only take the values ±lcμ, ±lc nrac and ±lcθm respectively (fixed by calculation from Eq. (9), Eq. (8) and Eq. (5)); these parameters are thus given discretely when the nonlinear optimization is performed. The sharpness k in Eq. (5) is fixed to a constant kupperbound, equal to the upper-bounded curvature derivative |dκ limit max/dt| [8] divided by the absolute value of the tangential linear velocity (v = ds/dt) along a clothoid at any moment:

$$k_{upperbound} = \frac{|d\kappa^{limit}_{max}/dt|}{|v|} = \left|\frac{dt}{ds}\right|\cdot\left|\frac{d\kappa^{limit}_{max}}{dt}\right| \quad (21)$$

The deflection of CC Turn ac1 whose circular arc has zero length is θMmin = |κ limit max|² · k−1 upperbound; likewise, the deflection of CC Turn ac2 whose circular arc has zero length is θNmin = |κ limit max|² · k−1 upperbound.
The constraints of Eq. (13) guarantee that there is a line segment of lower-bounded length |L| connected to the CC Turn at 1A, that the bucket attached to the front body of the wheel loader always leads the rear body toward any given desired destination position 1Des (1nl3 must be positive), and that there is a line segment 1nl3 of minimum length L connecting CC Turn ac2 at point 1H. The constraints of Eq. (14) guarantee that the end point 1J of the planned path coincides exactly with the locally desired destination position 1Des(1xdesdsr, 1ydesdsr), with the same orientation. The constraints of Eqs. (15), (16), (18) and (19) result from the upper-bounded curvature and upper-bounded curvature derivative, which come from the physical limits of the wheel loader. The constraints of Eq. (17) and Eq. (20) guarantee the consistency of the signs of the three parameters (1μac, 1nrac, 1θmac) that characterize the direction of rotation (CW or CCW) of CC Turns ac1 and ac2, respectively.

3.2. Remark on a one-CC-Turn path
Depending on the given pair of 1Init and 1Des, there is a possibility that the path can be obtained using only one CC Turn as the skeleton. In such a case, a similar parameterization and nonlinear programming yields the path. Due to space limitations, details of this case cannot be presented here. A realistic strategy is to attempt the one-CC-Turn skeleton first and, if a specific criterion cannot be met, to apply the skeleton consisting of two turns of two CC Steers.

3.3. Nonlinear programming
For the actual nonlinear programming (NLP) of Section 3.1.1, the authors utilized a Mathematica command function. The cost function Z and all of the constraints are passed as the appropriate parameter lists of this function. The Mathematica NLP function can accept attributes of continuous and/or discrete parameters. The authors specified the differential evolution method as the option, and the desired NLP for the problem formulation can then be performed properly. Finally, for a given pair of 1Init and 1Des, the values of the parameter set (1nl1, 1μac1, 1nrac1, 1θmac1, 1θBC, 1nl2, 1μac2, 1nrac2, 1θmac2, 1θFG, 1nl3) can be obtained. On a Pentium 3 (650 MHz, 128 MB RAM), the calculation time required for a given pair of 1Init and 1Des is at most approximately 78.376368 seconds.
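As a rough, generic illustration of the optimizer class used here (not the authors' Mathematica formulation or code), a minimal rand/1/bin differential evolution in Python, applied to a toy objective of our own standing in for Eq. (11) with the terminal constraint folded in as a penalty:

```python
import random

def differential_evolution(f, bounds, pop=30, gens=200, F=0.7, CR=0.9, seed=0):
    """Minimal rand/1/bin differential evolution for continuous parameters.
    f: objective to minimize; bounds: list of (lo, hi) per parameter."""
    rng = random.Random(seed)
    dim = len(bounds)
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    cost = [f(x) for x in X]
    for _ in range(gens):
        for i in range(pop):
            a, b, c = rng.sample([j for j in range(pop) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    v = X[a][j] + F * (X[b][j] - X[c][j])
                    v = min(max(v, bounds[j][0]), bounds[j][1])  # clip to box
                else:
                    v = X[i][j]
                trial.append(v)
            tc = f(trial)
            if tc <= cost[i]:             # greedy one-to-one selection
                X[i], cost[i] = trial, tc
    best = min(range(pop), key=cost.__getitem__)
    return X[best], cost[best]

# Toy stand-in for Sec. 3.1.1: a smooth cost whose minimum sits at (3, -1),
# as if a quadratic endpoint penalty enforced the destination constraint.
sol, val = differential_evolution(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
                                  [(-10.0, 10.0), (-10.0, 10.0)])
```

Discrete parameters such as the ±lcμ sign choices can be handled by enumerating the sign combinations and running the continuous optimizer for each, which mirrors how the mixed continuous/discrete parameter set is described above.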
4. Example of path generation
In this section, typical examples of path generation results based on the proposed path generation scheme are illustrated. We assumed |ω limit st| of 20 degrees/sec, |σ limit st max| of 40 degrees, L of 12.5 cm, and |v| of 40 cm/sec.
4.1. Example 1
For a given pair of 0Init(2.0L, 3.0L, 140°) and 0Des(10.0L, 10.0L, 225°):
1. Setting 0O1 = 0Init and converting to 1O1(0, 0, 0) and 1Des(−1.62884L, −10.5046L, 85°).
2. Applying the nonlinear programming of Section 3.1.1, the obtained values of (1nl1, 1μac1, 1nrac1, 1θmac1, 1θBC, 1nl2, 1μac2, 1nrac2, 1θmac2, 1θFG, 1nl3) are (1.0L, −lcμ, −lc nrac, −lcθm, −19.585°, 6.52142L, lcμ, lc nrac, lcθm, 104.585°, 1.0L). |θM| is |−158.749°|, which is greater than |θMmin| (139.164°). |θN| is 243.749°, which is greater than |θNmin| (139.164°). The planned path is drawn in Figure 6.
4.2. Example 2
For a given pair of 0Init(2.0L, 3.0L, 140°) and 0Des(0.0L, 10.0L, −90°):
1. Setting 0O1 = 0Init and converting to 1O1(0, 0, 0) and 1Des(6.0316L, −4.07674L, −230°).
2. Applying the nonlinear programming of Section 3.1.1, the obtained values of (1nl1, 1μac1, 1nrac1, 1θmac1, 1θBC, 1nl2, 1μac2, 1nrac2, 1θmac2, 1θFG, 1nl3) are (1.70905L, −lcμ, −lc nrac, −lcθm, 122.148°, 0.661746L, −lcμ, −lc nrac, −lcθm, −73.82°, 1.44713L). |θM| is |−17.016°|, which is less than |θMmin| (139.164°). |θN| is |−212.984°|, which is greater than |θNmin| (139.164°). The planned path is drawn in Figure 7. Note that, as a result of the NLP, the steering angle σst of the vehicle did not reach its maximum physical limit; therefore the planned path is not tangent to the inner circle of CC Turn ac1. This outcome of the NLP is merely due to the value of the given pair of 1Init and 1Des.

Figure 5. Skeleton consisting of 2 turns of CC Steer
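Step 1 of each example is a rigid-frame conversion of 0Des into the frame attached to 0Init. A small sketch (our own helper, not the authors' code) reproduces the quoted numbers:

```python
from math import cos, sin, radians

def to_local(init, des):
    """Express a global configuration `des` in the local frame attached to
    `init` (positions in units of L, headings in degrees): rotate the
    position offset by -theta_init and subtract the headings."""
    xi, yi, ti = init
    xd, yd, td = des
    dx, dy, t = xd - xi, yd - yi, radians(ti)
    return (cos(t) * dx + sin(t) * dy,
            -sin(t) * dx + cos(t) * dy,
            td - ti)
```

For Example 1, to_local((2, 3, 140), (10, 10, 225)) gives approximately (−1.62884, −10.5046, 85), matching the converted 1Des.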
Figure 6. Example of 2 ccpath for a given pair of 0Init(2.0L, 3.0L, 140°) and 0Des(10.0L, 10.0L, 225°)
Figure 7. Example of 2 ccpath for a given pair of 0Init(2.0L, 3.0L, 140°) and 0Des(0.0L, 10.0L, −90°)

5. Conclusion
In this paper, the present authors have presented a path generation scheme for a wheel loader during actual operations. The focus of this scheme is the creation of a smooth path connecting any given pair of a locally desired initial configuration and a locally desired destination configuration. A specific parameterized canonical path skeleton is defined: one with two CC Turns and line segments. Path planning is achieved by finding the parameters characterizing this skeleton. The planned path might not be of the shortest distance; however, it guarantees smooth locomotion. Moreover, the proposed skeleton can be extended to cover a finite number of static obstacles and a larger number of CC Turns.
References
[1] L.E. Dubins: On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of Mathematics, 79:497–516, 1957.
[2] J.A. Reeds and L.A. Shepp: Optimal paths for a car that goes both forwards and backwards. Pacific Journal of Mathematics, 145(2):367–393, 1990.
[3] Th. Fraichard and A. Scheuer: From Reeds and Shepp's to Continuous-Curvature Paths. IEEE Trans. on Robotics and Automation, Vol. 20, No. 6, December 2004.
[4] Y. Kanayama and N. Miyake: Trajectory Generation for Mobile Robots. Robotics Research, Vol. 3, pp. 333–340, MIT Press, 1986.
[5] A. Scheuer and Th. Fraichard: Collision-free and Continuous-Curvature Path Planning for Car-Like Robots. Proc. of 1997 IEEE Int. Conf. on Robotics and Automation, pp. 867–873, 1997.
[6] T. Kito, J. Ota, et al.: Smooth Path Planning by Using Visibility Graph-like Method. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 3770–3775, September 2003.
[7] W. Yossawee, T. Tsubouchi, S. Sarata, and S. Yuta: Path Generation for Articulated Steering Type Vehicle Using Symmetrical Clothoid. Proc. of 2002 IEEE Int. Conf. on Industrial Technology (ICIT'02), Vol. 1, pp. 187–192, 2002.
[8] W. Yossawee, T. Tsubouchi, M. Kurisu, and S. Sarata: A Semi-Optimal Path Generation Scheme for Frame Articulated Steering Type Vehicle. Submitted in July 2005 to International Journal of Advanced Robotics, Japan.
[9] J.P. Laumond, P.E. Jacobs, M. Taïx and R.M. Murray: A Motion Planner for Nonholonomic Mobile Robots. IEEE Transactions on Robotics and Automation, 10(5):577–593, 1994.
[10] J. Barraquand and J.C. Latombe: Nonholonomic multibody mobile robots: Controllability and motion planning in the presence of obstacles. Algorithmica, 10:121–155, 1993.
[11] S.M. LaValle and J.J. Kuffner: Randomized kinodynamic planning. International Journal of Robotics Research, 20(5):378–400, May 2001.
[12] S. Wolfram: The Mathematica Book, 5th edition, ISBN 1-57955-022-3.
C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol
Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System using Communication
Tomohisa Fujiki a,1, Kuniaki Kawabata b and Hajime Asama c
a School of Engineering, The University of Tokyo
b RIKEN
c RACE, The University of Tokyo
Abstract. In a multi-robot system, cooperation is needed to execute tasks efficiently. The purpose of this study is to realize cooperation among multiple robots using interactive communication. An important role of communication in a multi-robot system is to make it possible to control other robots by transmitting intentions. We consider that a multi-robot system can become more adaptive by treating communication as an action. In this report, we adopt an action adjustment function to achieve cooperation between robots. We also run computer simulations of collision avoidance as an example of a cooperative task, and discuss the results.
Keywords. Q-learning, Multi-Robot System, Communication, Cooperation, Mobile Robot
1. INTRODUCTION
In multi-robot systems, communication is regarded as a necessary skill for robots to cooperate, and a number of schemes have been proposed for it [1,2]. However, these studies may not adapt well to dynamic and complex environments because they fix the rules of communication in advance. To achieve cooperation effectively in such environments, we have to discuss adaptable cooperation using communication. Yanco et al. tried to develop a method to acquire adaptive communication for the cooperation of two robots [3]. Billard et al. proposed a method for learning communication through imitation [4]; this is an interesting approach, but the system needs a teacher robot. In these methods, and in most robotics research, communication is treated as a special function of the robotic system. In developmental psychology, on the other hand, communication is considered an interaction between individuals [5]. Moreover, communication is the transmission of intention, and the receiver has to comprehend that intention. In conventional studies on cooperation of robots based on communication as signal transmission, an action is taken to be a motion of the robot's own body, and the focus is on decision making using sensory
1 Correspondence to: Tomohisa Fujiki, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8568, JAPAN. Tel.: +81-47-136-4260; Fax: +81-47-136-4242; E-mail: [email protected].
information. Communication is realized as signal transmission over wireless LAN or other devices, but in the developmental-psychological sense this alone is not communication: there should be a protocol between the robots, and intentions should be exchanged. Consequently, transmitting one's intention can be treated as an action, and receiving another's intention can be treated as perception, in a multi-robot system. By introducing this concept into their control architecture, robots can make an attempt to control other robots. This means that a robot can take an action beyond the constraint of its own D.O.F. (body expansion behavior), and the multi-robot system can become more flexible and adaptable. In this study, we incorporate communication into the robot's model both as perception and as action: to achieve cooperation between robots, not only a robot's own movement but also sending a message to another robot is treated as an action. We previously developed an action selection method [6] which treats communication in this way, but there remained the problem of how to adjust two different types of actions: a self-generated action and one requested through communication. The most effective strategy for the whole system seems to be to accept a request only when the situations of both robots improve. In this paper, we propose an action adjustment function to achieve cooperation between mobile robots. We also run computer simulations of collision avoidance as an example of a cooperative task, and discuss the results.
2. ACTION SELECTION METHOD INCLUDING INTERACTIVE COMMUNICATION
2.1. Reinforcement Learning
Reinforcement Learning (RL) [7] is widely used in robotic systems to let robots' actions emerge from interaction with the environment. However, in a multi-robot system, the same action may cause different state transitions, which misleads the learning. To avoid this problem, Q-learning for Semi-Markov Decision Processes (SMDP) [8], which can handle discrete time series, is generally used. The Q-learning algorithm for SMDPs is as follows.
1. Observe state st at time t in the environment.
2. Execute action at selected by the action selection model.
3. Receive reward r and accumulate the discounted reward Rsum until the state changes:
   Rsum = rt + γ rt+1 + γ^2 rt+2 + · · · + γ^(N−1) rt+N−1   (1)
   Here, γ is the discount factor (0 ≤ γ ≤ 1).
4. Observe state st+N at time t + N after the state change.
5. Renew the Q value by equation (2):
   Q(st, at) ← (1 − α)Q(st, at) + α[Rsum + γ^N max_{a'} Q(st+N, a')]   (2)
   Here, α is the learning rate (0 ≤ α ≤ 1) and a' ranges over the possible actions in state st+N.
6. Clear r.
7. Renew the time step t to t + N, and return to 1.
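The SMDP update above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; the dictionary-based Q table and the function name are our own choices.

```python
# Hedged sketch of the SMDP Q-learning update (equations (1)-(2)).
# ALPHA and GAMMA follow the values used later in the paper.
ALPHA, GAMMA = 0.04, 0.9

def smdp_q_update(Q, s, a, rewards, s_next, actions):
    """Update Q(s, a) once the state has changed, N steps later.

    rewards: [r_t, r_{t+1}, ..., r_{t+N-1}], the rewards collected
    while the state stayed the same (accumulated as in equation (1)).
    """
    N = len(rewards)
    # R_sum = r_t + gamma*r_{t+1} + ... + gamma^{N-1} * r_{t+N-1}
    R_sum = sum(GAMMA ** i * r for i, r in enumerate(rewards))
    # Best achievable value from the newly observed state s_next.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Equation (2): blend old estimate with the new discounted return.
    Q[(s, a)] = (1 - ALPHA) * Q.get((s, a), 0.0) + \
                ALPHA * (R_sum + GAMMA ** N * best_next)
    return Q[(s, a)]
```

With two unit rewards and an empty table, R_sum = 1 + 0.9 = 1.9 and the new estimate is 0.04 × 1.9 = 0.076.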
2.2. Basic Actions
A variety of tasks can be considered cooperative tasks, but in this paper we discuss the collision avoidance problem of mobile robots: although many rule-based schemes have been proposed for it, it lets us see the effect of communication and of the expansion in D.O.F. directly. We suppose omni-directional mobile robots equipped with omni-directional visual sensors. Treating communication as a robot action, the basic actions are set as in Table 1. Here, "Communication" means intention transmission: a requesting action that asks the other robot to take a specified action. A robot can request any action which the other robot can make. Robots acquire their state-action policy by RL.
We also configure the robot's state space as in Table 2; the numbers show the number of states for each domain. An example of the visual sensory information is shown in Figure 1. The size of the other robot on the image plane is determined by the distance threshold. Both the direction of the other robot and the direction of the goal are divided into six states. The direction of the wall has five states: the front, left, right, or back of the robot, or no wall. In Figure 1, the white wall lies beyond the distance threshold, so only the grey wall is considered an obstacle in the robot's state space. In this framework, a robot selects, evaluates and learns its actions from sensory information and other robots' intentions.

Table 1. Actions of robot
  Move Own Body                        Communication (request to other robot)
  - No changes in speed or direction   - No changes in speed or direction
  - Speed down (2[mm/sec])             - Speed down (2[mm/sec])
  - Speed up (2[mm/sec])               - Speed up (2[mm/sec])
  - Change direction (+45[deg])        - Change direction (+45[deg])
  - Change direction (-45[deg])        - Change direction (-45[deg])

2.3. Action Selection and Reward
There are many action selection models for Q-learning, such as max selection or random selection. One method that improves adaptability gradually through RL is probabilistic

Table 2. Configuration of state space
  Visual sensory information
  - Size of other robot on image plane      2
  - Direction of other robot                6
  - Direction of the goal                   6
  - Wall direction inside the sensing area  4+1(none)
  Communication
  - Other robot's request                   5+1(none)
  Other information
  - Own speed                               2
  Number of states                          4320
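The total of 4320 states in Table 2 is simply the product of the listed domain sizes, which can be checked directly (the dictionary labels follow the table; the check itself is ours):

```python
# Domain sizes from Table 2 of the paper.
domains = {
    "size of other robot on image plane": 2,
    "direction of other robot": 6,
    "direction of the goal": 6,
    "wall direction inside the sensing area": 4 + 1,  # 4 directions + none
    "other robot's request": 5 + 1,                   # 5 requests + none
    "own speed": 2,
}

total = 1
for size in domains.values():
    total *= size
print(total)  # 4320, matching "Number of states" in Table 2
```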
Figure 1. Visual Sensory Information: the other robot's size on the image plane (small/large, distance threshold 600[mm]), the direction of walls (front/left/right/back), and the six sectors used for the direction of the other robot and the direction of the goal.
action selection using the Boltzmann distribution (Boltzmann selection). It is widely used, and probabilistic selection is reported to work better than a deterministic policy in multi-agent systems [9]. In the Boltzmann selection model, the probability p(a | s) of taking action a in state s is defined by equation (3):

p(a | s) = exp(Q(s, a)/T) / Σ_{ai ∈ A} exp(Q(s, ai)/T)   (3)

Here, T is a temperature constant. If T is near zero, action selection becomes deterministic; as T grows, action selection becomes more random and searches the space of state-action policies more aggressively.
Evaluation of the selected action uses the distance g(t) from the goal at time t. The reward rt is defined by equation (4):

rt = μ(g(t) − g(t − Δt))   (4)

Here, μ is a weight value representing the effectiveness of the reward, and Δt is the cycle time for decision making.
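Equation (3) can be implemented as follows. This is a sketch; the max-subtraction before exponentiating is a standard numerical safeguard of ours, not part of the paper, and it does not change the resulting probabilities.

```python
import math

def boltzmann_probs(q_values, T=0.2):
    """Equation (3): p(a|s) = exp(Q(s,a)/T) / sum_i exp(Q(s,a_i)/T).

    q_values: dict mapping action -> Q(s, a) for the current state s.
    T = 0.2 is the temperature used in the paper's experiments.
    """
    m = max(q_values.values())
    # Shift by the max Q value to avoid overflow in exp().
    exps = {a: math.exp((q - m) / T) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}
```

As the text notes, a small T makes the choice nearly deterministic: with Q values 1.0 and 0.0 and T = 0.01, the better action receives essentially all of the probability mass.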
3. ACTION ADJUSTMENT FUNCTION
When communication is treated as an action for intention transmission, accepting every requested action will only improve the other robots' situations. For the whole system, however, the most effective way seems to be to accept a request only when the situations of both robots can be improved. Accepting such requests requires an action adjustment function that compares the self-determined action with the one requested through communication. It enables the robots to create better situations and to cooperate efficiently.
For this action adjustment, we introduce the algorithm illustrated in Figure 2. First, a robot decides whether to move itself or to make the other robot move through communication. This is a selfish action selection which does not consider the state of the other robot.
Of course, the request may be refused: whether to accept or reject it is determined by the receiver. Next, the robot determines which action to take: the selfish action decided in the first step, or the action requested by the other robot. Through these two steps, a robot can select an action while considering a request from the other robot. This adjustment algorithm can be applied generally by assigning a numeric value to each action. In this paper we use the Q-learning algorithm for SMDPs, and the Q values from RL serve as the numeric values for the two-step action selection. The implemented algorithm is shown in Figure 3.

Figure 2. Action selection process: a selfish action selection (move own body / move other robot by communication) followed by a total action selection between the selfish action and a requested action.

Figure 3. Algorithm for action selection including communication: observe state s and calculate reward r; when the state changes, renew the Q value, perform the selfish and total action selections, and execute the selected action a.
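The two-step selection of Figures 2 and 3 can be sketched as below. The Q-value interface and the greedy first step are our simplifications (the paper uses Boltzmann selection); the acceptance rule follows the idea that a request is taken only when it is at least as good for the receiving robot as its own selfish choice.

```python
def select_action(Q, state, own_actions, comm_actions, requested=None):
    """Two-step action adjustment (cf. Figure 2).

    own_actions: movements of the robot's own body (Table 1, left).
    comm_actions: requests to the other robot (Table 1, right).
    requested: action asked for by the other robot, if any.
    """
    # Step 1: selfish selection over own movements and communication,
    # ignoring the other robot's state (greedy here for clarity).
    candidates = own_actions + comm_actions
    selfish = max(candidates, key=lambda a: Q.get((state, a), 0.0))
    # Step 2: total selection -- accept the other robot's request only
    # if its Q value is at least as good for this robot as well.
    if requested is not None and \
            Q.get((state, requested), 0.0) >= Q.get((state, selfish), 0.0):
        return requested
    return selfish
```

A robot thus keeps its own choice whenever the requested action would worsen its estimated return, which is the acceptance criterion motivating the adjustment function.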
4. COMPUTER SIMULATION OF COLLISION AVOIDANCE PROBLEM
In this section, we run computer simulations to test our approach and discuss the results.
4.1. Settings
Two omni-directional mobile robots are placed in the simulation field, and the task is collision avoidance as an example of a cooperative task. To compare our approach with the common use of communication in robotics, we set three conditions. Case A is our proposed approach: robots can use communication to move the other robot by intention transmission. In Case B, robots inform each other of the action they have just taken, by signal transmission; this eliminates any influence of state-space size, since robots in Case B have the same number of states as in Case A. Robots in Case C cannot use communication.
Common settings are as follows. Each robot is cylinder-shaped with a diameter of 300[mm]. The start position of each robot is 500[mm] from the longitudinal sides of the environment, vertically symmetric. The goal position of each robot is the starting point of the other, so the robots initially face each other. The maximum speed of the robots is 40[mm/sec] and the minimum is 10[mm/sec]; we assume the robots can change speed without time lag. A trial terminates under four conditions: both robots reach their goals, the robots collide with each other, either robot collides with a wall or the edge of the simulation area, or the time step reaches 3000. The RL parameters are set experimentally to Δt = 1.0[sec], μ = 0.1, α = 0.04, γ = 0.9 and T = 0.2. Rewards are calculated by equation (4), but on any collision r = −5 is given as punishment.
4.1.1. Simulation 1
In Simulation 1, the straightway environment in Figure 4 is used. The width x is varied over 800 ≤ x ≤ 3000[mm] in steps of 100[mm]; the simulation is run four times for every setting, and learning is episodic for each simulation. The maximum number of trials is 30000 for every experiment.
4.1.2. Simulation 2
In Simulation 2, the crossroad environment in Figure 4 is used.
The simulation area is a 3000[mm] square, and the width of both roads is x[mm], varied over 600 ≤ x ≤ 3000[mm] in steps of 200[mm]. The four black pieces in Figure 4 are walls (obstacles). The simulation is run four times for every setting, the maximum number of trials is 100000 for every experiment, and learning is episodic for each simulation. All settings have the same distance to the goal, to make the results easy to compare. In Simulation 2, when a robot moves, its spatial relationship to the walls changes, which affects the robot's state space. Consequently, the robots' states change frequently when x is small, and the problem is much more difficult than for the same x in Simulation 1.
4.2. Results
Figure 5 shows the number of trials needed for convergence; in this report, "convergence" means 100 consecutive goals. The horizontal axis shows the width x of the road. The data in the graphs are averages of four trials. They show some oscillation, but the overall tendency can be seen.
Figure 4. Overview of the environments
Figure 5. Number of trials for convergence
4.2.1. Convergence Properties
When the width x is large enough for the robots and the problem can be solved easily, Case C converges faster than the other cases. We believe this is because the state space of Case C is one fifth of that of the other cases, so the state-action policy is easier to acquire. The result of Case B shows large oscillation in both graphs: in this case communication changes the state of the other robot, which makes it difficult to search for state-action policies. Communication as signal transmission does not show any superiority in our experiments; it only multiplies the number of states and prevents the system from achieving cooperation quickly. Finally, Case A is superior to the other methods when x is small, i.e., when the problem is hard to solve and cooperation with others is difficult. The results show that our approach can solve the problem cooperatively even when the other approaches cannot. Such situations are difficult for robots to handle without communication, and the comparison of Case A with Case B shows that our proposed system works better than the usual use of communication as information transmission.
4.2.2. Quality of the Solution
Figure 6 shows the number of steps at convergence, which indicates the quality of the solution achieved by the system. The data in the graphs are averages of four trials. Although there are many spikes, Case A tends to generate better solutions than the other methods. We consider that intention transmission worked effectively by affecting the other robot only when communication was needed. This result supports our approach not only in the speed of finding solutions but also in their quality.
Figure 6. Number of steps
5. CONCLUSIONS
In this paper, we proposed a method to adjust between different types of actions, including communication as intention transmission. This method lets a multi-robot system treat communication as an intention transmission action, and we examined its performance by computer simulations. The results show that our approach can find solutions in difficult situations where cooperation is hardly achievable without communication, and that it also excels in the quality of the achieved solution compared with the ordinary use of communication or with no communication at all. In future work, we will try our approach in more complex environments and on other tasks.
References
[1] Y. Arai, S. Suzuki, S. Kotosaka, H. Asama, H. Kaetsu and I. Endo: Collision Avoidance among Multiple Autonomous Mobile Robots using LOCISS (LOcally Communicable Infrared Sensory System), Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2091–2096, 1996.
[2] N. Hutin, C. Pegard and E. Brassart: A Communication Strategy for Cooperative Robots, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 114–119, 1998.
[3] H. Yanco and L. A. Stein: An Adaptive Communication Protocol for Cooperating Mobile Robots, From Animals to Animats 2, pp. 478–485, 1993.
[4] A. Billard and G. Hayes: Learning to Communicate Through Imitation in Autonomous Robots, Artificial Neural Networks – ICANN'97, pp. 763–768, 1997.
[5] W. Von Raffler-Engel (Ed.): Aspects of Nonverbal Communication, Loyola Pr, 1979.
[6] M. Hoshino, H. Asama, K. Kawabata, Y. Kunii and I. Endo: Communication Learning for Cooperation among Autonomous Robots, Proceedings of the IEEE International Conference on Industrial Electronics, Control & Instrumentation, pp. 2111–2116, 2000.
[7] R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), The MIT Press, 1998.
[8] S. J. Bradtke and M. O. Duff: Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, in G. Tesauro, D. Touretzky and T. Leen (Eds.), Advances in Neural Information Processing Systems, Vol. 7, pp. 393–400, The MIT Press, 1995.
[9] S. P. Singh, T. Jaakkola and M. I. Jordan: Learning Without State-Estimation in Partially Observable Markovian Decision Processes, International Conference on Machine Learning, pp. 284–292, 1994.
A. D'Angelo and E. Pagello / From Mobility to Autopoiesis
[Figures: a control diagram with perception Perc, direct gain Kf and inverse gain −Kb; a plot of the distance from the object along the x- and y-axes; a plot of the direct gain Kf against the inverse gain Kb.]
Part 11 RoboCup
Cooperative Agent Behavior Based on Special Interaction Nets
Oliver Zweigle 1, Reinhard Lafrenz, Thorsten Buchheim, Uwe-Philipp Käppeler, Hamid Rajaie, Frank Schreiber and Paul Levi
University of Stuttgart, Germany
Abstract. An important aim of current research in artificial intelligence and robotics is to achieve cooperative agent behavior for teams of robots in real-world scenarios. Especially in the RoboCup scenario, but also in other projects such as Nexus, agents have to cooperate efficiently to reach certain goals. In the RoboCup project, cooperative team play and team strategies similar to real-world soccer are intended. This article describes an approach that combines cooperative aspects, role assignment algorithms and the implementation of robot behavior with Interaction Nets. Based on these methods, a complete framework for team strategies was developed that is used in the RoboCup environment by the middle-size team CoPs Stuttgart and in the Nexus project, where a team of robots guides persons through a building.
Keywords. Cooperative Robotics, RoboCup, Interaction Nets
1. Introduction
Modeling and implementing group behavior in multi-agent systems, which are used for multi-robot applications, is a complex and fault-prone task. Interdependent interactions between agents have to be considered in the individual behavior model to achieve well-working coordination and synchronization of distributed agent actions. Especially for real-world applications of a team of autonomous mobile robots these problems are aggravated, because situations can be interpreted in different ways by different agents; one reason is the incomplete and fuzzy information provided by a robot's local sensors. Therefore it is a very complex task to develop a distributed plan for coordinated group behavior. In addition to the implementation, such a complex system requires a lot of testing and tracing to identify potential problems, e.g., the occurrence of special events or deadlocks, which cannot feasibly be modeled in the team plan. In this paper, we present a framework for modeling distributed multi-agent plans by introducing XPlM Nets, a simplified form of hierarchical Interaction Nets.
1 Correspondence to: O. Zweigle, Universität Stuttgart, IPVS, Universitätsstr. 38, 70569 Stuttgart, Germany. Tel.: +49 711 7816 421; Fax: +49 711 7816 250; E-mail: [email protected].
Interaction Nets [1] are a formalism based on a specialized form of Predicate/Transition Nets [3]. The approach of Interaction Nets was introduced in [4]; in [7] they were modified to describe cooperation processes in a multi-agent environment. As this approach was too complex for the RoboCup scenario, it was simplified. Motivated by the description of several team cooperation strategies in [11], we used the Shared Plans Theory as the basic idea for cooperation between the agents. In our opinion the Joint Intentions Theory is currently not useful in RoboCup, as it assumes that knowledge about the team-mates is always available; due to unstable communication in RoboCup this theory is not used at the moment, but it is promising for future work, using for example visual tracking [10] instead of communication. A behavior-based approach to control robots on a higher level of complexity was introduced in [12]; the attempt to use basic behaviors to reach a more abstract level of designing tactics in RoboCup provided a basis for our current work.
In our implementation there are two levels of abstraction in the representation of cooperative behavior: first, modeling of the behavior in the form of XPlM Nets, simplified hierarchical Interaction Nets; and secondly, the execution of decision trees specified in the XML dialect XABSL (eXtensible Agent Behavior Specification Language) [6], a specification language for agent behavior which can be directly executed by an execution engine. For easy specification of distributed behaviors, we developed the graphical editor XPlM for XPlM Nets, which generates XABSL output.
The rest of the paper is organized as follows: Section 2 describes the distributed world model and the tactical role concept, the fundamentals of cooperative behavior. In Section 3, Interaction Nets, the specialized XPlM Nets and the graphical modeling tool are introduced. Section 4 explains the modeling of a pass play in robotic soccer as a complex application example. Finally, Section 5 summarizes and gives an outlook.
2. Cooperative System Overview
As a basis for cooperation, knowledge about the environment and the objects within it is required. In complex robotic systems, this knowledge comprises environmental information locally derived from sensor data, local state information, e.g., the current role or behavior, and information transmitted by other members of the robot team. To simplify the handling of this data, we use a world model which incorporates mechanisms for multi-thread-safe data representation, data processing and communication [2].
2.1. Decentralized World Model
Each robot holds several data containers, one for the locally sensed or derived data and one for each team-mate's data, but only a (small) part of a robot's locally available information is communicated to others. The information of all robots is used to estimate abstract world states, which describe application-relevant features of the environment. In the RoboCup scenario, geometrical relations and
abstract states such as weHaveBall or iAmAttacker are used to trigger a specific behavior; they are called world model flags. Since the robots are autonomous, each one makes its decisions locally, considering all available information, including communicated status information, which can be used to model cooperative behaviors. This approach implements a cooperative system with a high degree of fault tolerance.
2.2. Basics for Team-Play Modeling
The subsystem for team play is divided into two layers. The first layer, the so-called WorldModelAdapter, is derived from the world model and is responsible for processing the data and providing results in a format suitable for the second layer. For fault tolerance, the WorldModelAdapter runs locally on each robot. Its input data are processed local sensor readings and information communicated by other robots of the team. This way, characteristics of the current game situation are derived from a merged view of the distributed system of robotic agents. The results provided by the WorldModelAdapter are data sets describing, for example, the ball position, logical information about ball possession (teamHasBall), the positions of other robots, or the most recent referee event. These flags are used, together with the robot's current tactical roles, to switch between behaviors of the agents. A behavior is a state of an agent in which certain drive commands are executed. The second, higher layer constitutes the control of the behaviors of an agent. This behavior control is executed in an XPlM Net (see Section 3).
2.3. Role Assignment
Since all robots communicate with each other, it is possible to assign tactical roles and subroles to the different robot players. Every robot has a tactical role and a unique tactical subrole during the match. Although tactical roles and subroles are assigned dynamically, a robot's tactical role and subrole will not change unless other robots fail.
We distinguish between two main tactical roles, Defender and Forward, which are subdivided into subroles such as Left Defender or Front Forward. This system is adopted from human soccer and can easily be extended with other main tactical roles and subroles, e.g., Midfielder. One aim was to switch between different strategies for the team lineup, such as:
Defender - Defender - Defender - Forward - Forward
Defender - Defender - Forward - Forward - Forward
At present, the Midfielder role is not used due to the limited field dimensions, and according to the RoboCup rules we currently change the team lineup only before the match and during the halftime break. On each level of the tactical role and subrole hierarchy, the subroles are designed so that they do not conflict with each other and define complementary behavior, e.g., covering and defending the own goal. The advantage of the chosen tactical role hierarchy is a better organization of the team coordination mechanisms.
In the near future, we will try to recognize the game situation automatically and adapt the tactical roles and the role behavior in a self-organized way [9].
The implementation uses two ordered lists. One list (the role list) contains the dedicated tactical roles; the other (the robot list) contains the available robots. Robots that break down during a match are automatically removed from the second list. The tactical roles are assigned to the robots one-to-one, following the order defined in the role list. Tactical subroles are assigned depending on the number of robots that occupy a tactical role. Every robot calculates its tactical role and subrole locally, based on a common algorithm. In addition, there are special roles that carry more weight than the tactical roles: if a special role is assigned to a robot, it keeps its tactical role and subrole but executes the behavior of the special role. Special roles are, for example, Keeper, Attacker, Pass Player or Pass Receiver.
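The list-based assignment described above can be sketched as follows. This is an illustrative sketch under our own naming; the paper specifies only the two ordered lists, the one-to-one mapping, and count-based subroles.

```python
# Sketch of the one-to-one role assignment (Sec. 2.3): an ordered
# role list is mapped onto the ordered list of available robots;
# robots that break down simply drop out of the robot list.
ROLE_LIST = ["Defender", "Defender", "Defender", "Forward", "Forward"]

def assign_roles(robots, role_list=ROLE_LIST):
    """robots: ordered list of currently available robot ids."""
    # One-to-one mapping in list order; surplus roles stay unassigned
    # when robots have failed.
    assignment = dict(zip(robots, role_list))
    # Subroles depend on how many robots already occupy a role
    # (e.g. "Defender-2" as a stand-in for "Left Defender").
    counts, subroles = {}, {}
    for robot, role in assignment.items():
        counts[role] = counts.get(role, 0) + 1
        subroles[robot] = f"{role}-{counts[role]}"
    return assignment, subroles
```

Because every robot runs the same deterministic computation on the same two lists, all robots agree on the assignment without a central coordinator, which is the point of the common algorithm mentioned above.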
3. XPlM Nets - A Simplified Version of Interaction Nets
To achieve cooperative performance, we use a simplified version of Interaction Nets to map behaviors to robotic agents. This mapping process is based on the knowledge represented in the world model as described above. Because all robots in the cooperative environment have the same capabilities, the basic behaviors, and therefore the decision-making mechanism that switches between them, should be identical for all robots of the team.1 In the context of autonomous mobile systems with unreliable communication, a centralized approach with only one deciding instance is not applicable. Hence, each agent has to make individual decisions about the current situation and the state of its team-mates based on its local belief state. This local belief state is enhanced by communication, but decisions must also be taken in the absence of working inter-team communication. To achieve this, we use decision networks based on simplified Interaction Nets and call them XPlM Nets.
Interaction Nets are a formalism to describe agent interactions in a multi-agent scenario; they show a more centralized approach than the underlying Predicate/Transition Nets introduced in [3] and allow several agents (tokens) to run in one common net. In contrast, our XPlM Net approach allows only one agent to run through a commonly known net, which is replicated on every agent and executed locally.
3.1. XPlM Nets - Phases and Transitions
An XPlM Net Net = {Ph, Trans, Arc, Tok} consists of phases, transitions, arcs from phases to transitions and from transitions to phases, and one token indicating the current phase. The phases Ph = {Ph1, . . . , Phn} are equivalent to places in Petri Nets and represent states of the agent, but imply the execution of a basic behavior. Such a basic behavior is an abstract driving command like getBall or dribble. See Fig. 1 for an example of a simple XPlM Net.
Transitions Trans : Ph → Ph define preconditions and postconditions of a phase. To switch from Phn to Phn+1, Phn must hold the token and the transition
1 Here, we do not consider special roles with different behavior patterns, e.g., the keeper.
Figure 1. Simple XPlM Net: phases getBall and dribble, connected by transitions with conditions IHaveBall, ballIntercepted, ballLost and readyToShoot.
conditions between the two phases must be satisfied. Fig. 2 shows an example of transition conditions.
Figure 2. Transition conditions in the XPlM Net editor
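The phase/transition structure of Section 3.1 can be sketched as a minimal token-stepping interpreter. The class design is ours, not the authors' implementation; the phase names and condition flags follow the simple net of Figure 1, and the conditions are stand-ins for world model flags.

```python
# Minimal XPlM Net sketch: phases imply the execution of a basic
# behavior; the single token advances when a transition condition
# holds in the current world-model flags.
class XPlMNet:
    def __init__(self, start):
        self.token = start        # exactly one token (current phase)
        self.transitions = []     # (source phase, condition flag, target phase)

    def add_transition(self, src, flag, dst):
        self.transitions.append((src, flag, dst))

    def step(self, flags):
        """Advance the token if an outgoing transition condition is
        satisfied by the given flags; return the active phase, whose
        basic behavior the agent would execute."""
        for src, flag, dst in self.transitions:
            if src == self.token and flags.get(flag, False):
                self.token = dst
                break
        return self.token

# Net modeled on Figure 1: getBall <-> dribble.
net = XPlMNet("getBall")
net.add_transition("getBall", "IHaveBall", "dribble")
net.add_transition("dribble", "ballLost", "getBall")
```

Since the net is replicated on every agent and stepped locally against each agent's own belief state, no central instance is needed, in line with the decentralized design described above.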
3.2. Modularity
Modeling the behavior of robot agents can reach a high level of complexity, with a large number of phases and transitions. To keep the system manageable, hierarchical nets are introduced, where subnets SN ⊆ Net are handled like common phases. Such a subnet has one transition for its preconditions and one for its postconditions. Regardless of which phase is reached in the subnet, its execution finishes immediately once the postcondition becomes valid. To leave a subnet after a certain process has completed inside it, a starting and an ending phase can be defined: as soon as the end phase is reached, the execution of the subnet is finished and the process continues on the level above with the postcondition of the subnet.
3.3. Modeling Interactions Between Agents
Although every agent runs a local copy of the commonly known net, cooperation between the agents is required. To implement cooperative behavior, every agent gets a unique tactical subrole; based on that subrole, different phases with different behaviors are triggered in the net. Every agent can obtain information about its team-mates through the world model flags, and with that knowledge it can coordinate its actions with those of its team-mates.
To map a behavior to an agent, our implementation uses a subsystem consisting of two layers. The lower layer, for agent control, uses XABSL [5], an
XML-based behavior specification language for autonomous robots, which was originally developed for the 4-legged AIBO league in RoboCup. The XABSL files are compiled into an intermediate code to be executed by a runtime system (XABSL-engine). Writing these files manually is complex and error-prone, hence we developed a graphical modeling tool to design hierarchical decision networks, the so-called XPlM editor (Fig. 3). This higher layer generates XABSL output automatically.
Figure 3. XPlM Net Editor: Editor with opened subnet showing the handling of certain referee events in RoboCup
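The token-passing semantics of Section 3 — an agent may switch from one phase to the next only while it holds the token and all transition conditions between the two phases are satisfied — can be sketched as follows. This is a minimal illustrative sketch with assumed class and attribute names, not the authors' implementation:

```python
class Phase:
    def __init__(self, name, behavior=None):
        self.name = name
        self.behavior = behavior  # callable executed while the phase is active


class Net:
    def __init__(self, phases, transitions):
        self.phases = phases              # {name: Phase}
        self.transitions = transitions    # {(src, dst): [condition, ...]}
        self.active = next(iter(phases))  # name of the currently active phase
        self.has_token = True

    def step(self, world):
        """Try each outgoing transition of the active phase; fire the first
        one whose conditions all hold (conjunction), provided the agent
        holds the token. Returns the name of the active phase."""
        if not self.has_token:
            return self.active
        for (src, dst), conds in self.transitions.items():
            if src == self.active and all(c(world) for c in conds):
                self.active = dst
                break
        return self.active
```

A net built from the phases of Figure 1 would, for example, stay in getBall until the IHaveBall condition becomes true, then switch to dribble.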
4. Application

The modeling framework is currently used in two applications: the RoboCup team CoPs of the University of Stuttgart, and the Nexus project [8], a distributed robot architecture in which multiple robots guide persons through a building. In RoboCup, the aim was to model an intelligent player model for coordinated and cooperative team play. There is one global multi-agent plan for the whole team that is executed on each robot player individually and designed according to real-life soccer. According to a player's tactical role, the multi-agent plan specifies a behavior sequence to be executed depending on the current situation of the game and the team mates. The definition of the multi-agent plan is loosely connected to
real soccer strategies. On the main level we distinguish between Referee Events, such as Throw-In or Kick-Off, and the running game flow. The Referee Events subnet assigns every robot a unique tactical position tuned to the current situation. Every referee event is handled in its own subnet. Furthermore, there are different strategies for some referee events, such as Goal-Kick or Throw-In, depending on the current situation. Future development will include a learning algorithm to choose the right referee-event strategy. During the running game we distinguish between two strategies, offensive play and defensive play. Each strategy defines a proper set of behaviors according to the situation. The advantage of such an approach is that each role can make use of different behaviors depending on the current situation in the match, which results in a situation-sensitive architecture. This architecture can be used for many different player movement strategies that are modeled on real soccer movements. The Offensive and Defensive subnets are subdivided into further subnets that handle the behavior for the different tactical roles and subroles. In the subnets for tactical subroles, the different behaviors are implemented. An example of such a behavior is the pass-play scenario described in the following section.

4.1. Pass-play
Figure 4. Pass Play Net
Figure 4 shows an example of an XPlM Net for a pass-play implementation in the form of a 2-agent interaction. Here, a pass is played between two players with a fixed position of the pass receiver. This XPlM Net requires the participation of two agents, one taking the special role of pass player and the other the special role of pass receiver.
The two participating agents start in the initial phase decideRole. Here a decision is made on which player takes which role in the pass-play interaction. The function iAmNearestTo, which compares the robot's own position with the positions of its team mates relative to the distance to a given arrival point, is part of a set of special world model functions (Section 2.1). These functions are triggered regularly to gain information about the current status of the robot in relation to its team mates. The robot agent closest to the ball takes the special role pass player; the agent closest to the defined pass target position takes the special role pass receiver. If the distances of both players to one or both targets are similar, a unique robot id is used to assign the special roles. The pass player approaches the ball, oriented in the direction of the pass target, while the pass receiver drives towards the pass target position. As soon as the pass player has reached the required position relative to the ball, an action synchronization between the two agents takes place. The execution of the pass play continues only once the pass receiver has moved into a tolerance range around the target position; the position of the team mate is derived from communication over the world model. The pass-receiving agent changes to the behavior interceptBall as soon as it reaches the pass target position. This behavior lets the agent move to a position parallel to the goal line to intercept the pass. As soon as the pass receiver has received the ball, or the ball stops very close to it, it tries to shoot a goal directly. The introduced interaction contains only three transitions that result in an action coordination. The input transition of the behavior decideRole results in a complementary role assignment of the cooperating agents.
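The role decision in decideRole can be illustrated with a small sketch. The function and parameter names below are assumptions for illustration (the paper's iAmNearestTo operates on world-model data); ties are broken here by the lower robot id, matching the unique-robot-id rule described above:

```python
import math


def i_am_nearest_to(point, my_id, positions):
    """positions: {robot_id: (x, y)}. True if my_id is the robot nearest
    to `point`; equidistant robots are ordered by their unique id."""
    def key(rid):
        x, y = positions[rid]
        return (math.hypot(x - point[0], y - point[1]), rid)
    return min(positions, key=key) == my_id


def decide_role(my_id, positions, ball, pass_target):
    """Complementary role assignment: nearest to the ball becomes pass
    player, nearest to the pass target becomes pass receiver."""
    if i_am_nearest_to(ball, my_id, positions):
        return "pass_player"
    if i_am_nearest_to(pass_target, my_id, positions):
        return "pass_receiver"
    return "uninvolved"
```

Because every agent evaluates the same world-model data, both agents arrive at consistent, complementary roles without explicit negotiation.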
The transition canPass results in a time synchronization of the action phase. For clarity of the example, not all exceptional cases are presented. The whole implementation was successfully tested in a simulation environment as well as on real robots at RoboCup 2005 in Osaka, where a modified version was used to play a short pass after the referee events Throw-In and Kick-Off, to prevent the executing robot from shooting the ball directly into the goal.
5. Conclusion and Outlook

We have presented a framework for modeling and executing cooperative behavior for teams of robots, based on the newly introduced XPlM nets (simplified interaction nets) in combination with XABSL. We are currently using and evaluating the framework in RoboCup. In addition, the approach will be used in an application for the Nexus project [8], which aims to develop globally distributed world models. As part of the project, certain applications have to use these world models. One application will be a set of robot agents that navigate as visitor guides through a building. Here a high degree of cooperative agent behavior is necessary to coordinate the tasks of the different agents. Furthermore, the presented framework will be enhanced for use in the RoboCup project. To achieve a higher degree of autonomy for a team of cooperative robots, we are developing mechanisms for self-organized role decision. Work on a component for the analysis of the game situation is also in progress. Based on that analysis
the system should be able to decide autonomously which strategies are used. At the moment, a team strategy has to be defined before the match or in the halftime break. With the help of an autonomous strategy decision component, the team could react faster to events during a match. As part of that work we are evaluating learning algorithms that allow the robots to decide on situation-dependent strategies.
References

[1] M. Becht, K. M. Muscholl, and P. Levi. Transformable multi-agent systems: A specification language for cooperation processes. In World Automation Congress (WAC), Sixth International Symposium on Manufacturing with Applications (ISOMA), 1998.
[2] T. Buchheim, G. Kindermann, R. Lafrenz, and P. Levi. A dynamic environment modelling framework for selective attention. In Visser et al., editors, Proceedings of the IJCAI-03 Workshop on Issues in Designing Physical Agents for Dynamic Real-Time Environments, 2003.
[3] H. J. Genrich and K. Lautenbach. System Modelling with High-Level Petri Nets. 1981.
[4] Y. Lafont. From proof nets to interaction nets. In J.-Y. Girard, Y. Lafont, and L. Regnier, editors, Advances in Linear Logic, pages 225–247. Cambridge University Press, 1995.
[5] M. Lötzsch, J. Bach, H.-D. Burkhard, and M. Jüngel. Designing agent behavior with the extensible agent behavior specification language XABSL. In Daniel Polani, Brett Browning, and Andrea Bonarini, editors, RoboCup 2003: Robot Soccer World Cup VII, volume 3020 of Lecture Notes in Artificial Intelligence, pages 114–124, Padova, Italy, 2004. Springer.
[6] M. Lötzsch, J. Bach, H.-D. Burkhard, and M. Jüngel. Designing agent behavior with the extensible agent behavior specification language XABSL. In 7th International Workshop on RoboCup 2003, 2003.
[7] K. M. Muscholl. Interaktion und Koordination in Multiagentensystemen. Dissertation, Universität Stuttgart, 2000.
[8] K. Rothermel, D. Fritsch, B. Mitschang, P. J. Kühn, M. Bauer, C. Becker, C. Hauser, D. Nicklas, and S. Volz. SFB 627: Umgebungsmodelle für mobile kontextbezogene Systeme. Proceedings Informatik, Innovative Informatikanwendungen Band 1, pages 103–116, 2003.
[9] M. Schanz, J. Starke, R. Lafrenz, O. Zweigle, M. Oubbati, H. Rajaie, F. Schreiber, T. Buchheim, U.-P. Käppeler, and P. Levi. Dynamic task assignment in a team of agents. In P. Levi et al., editors, Autonome Mobile Systeme, pages 11–18. Springer, 2005.
[10] T. Schmitt, M. Beetz, R. Hanek, and S. Buck. Watch their moves: Applying probabilistic multiple object tracking to autonomous robot soccer. In Eighteenth National Conference on Artificial Intelligence, pages 599–604, 2002.
[11] M. Tambe. Towards flexible teamwork. Journal of Artificial Intelligence Research, 7:83–124, 1997.
[12] B. B. Werger. Cooperation without deliberation: A minimal behavior-based approach to multi-robot teams. Artificial Intelligence, 110(2):293–320, 1999.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain
[Figure: framework architecture — Lua scripts can access the available methods of the modules; the concrete modules (MindModule, CameraModule, ActionModule, SensorModule1, SensorModule2) are plugged into the OPEN-R base program through abstract modules (JPMindModule, JPCameraModule, JPActionModule, JPSensorModule, JPUDPModule, JPTCPModule); occurring events are dispatched to the modules by the core program.]
Major behavior definition of football agents through XML José Luis VEGA, Ma. de los Ángeles JUNCO, Jorge RAMÍREZ Instituto Tecnológico y de Estudios Superiores de Monterrey
Abstract. Implementing behaviors on robots or computational agents is a task that requires specific knowledge of the underlying robot programming language. In our work we propose an approach that separates low-level robot programming from behavior definition, which allows accelerated development of both tasks. The behavior models we use are hierarchical state machines in which every state is associated with an action that can be a basic action or a complex behavior defined in XML. Transitions between states are triggered by events perceived by the agent. A simulator was built to test this approach and to develop football behaviors. The simulator accepts agents with different characteristics and defines an environment similar to the actual field; it shows transitions, the active states in the internal state machines of the agent, and the events perceived. With this tool the main players of a football team (goalkeeper, defender, and attacker) have been generated easily, with little programming knowledge and a focus mainly on strategies.
Introduction

Robot programming is a difficult task for a person without robotics or programming skills. Even for expert programmers, changing code in the robot's native language is hard and error-prone. We want to avoid these inconveniences so that any person can define complex tasks on robots. To make this possible, the system must allow agent behaviors to be defined in an easy and intuitive way, without requiring any knowledge of the underlying implementation of actions and perceptions [1]. The Sony Four-Legged Robot League players of RoboCup [2] are a particular case in which a team needs to define strategies in a fast and easy way. In our team we are looking for a platform that is simple, versatile, and capable of code reuse. In addition, we want a hierarchical and well-defined structure that can be adapted easily to the development of other modules, such as perception or locomotion, and that is independent enough to be reused in future research on robotics and multi-agent systems. To speed up the development of strategies, a controlled or simulation environment is built, as in [3], in which the strategies can be tested and improved quickly. One of the most widely used techniques for defining behaviors is the Finite State Machine (FSM); in spite of its limitations, its fast and always predictable response makes it a good option for defining robot behaviors. Using an intuitive language such as XML (Extensible Markup Language) for modeling finite state machines, we can hide many programming details and facilitate the creation of graphic tools that even a non-programmer can understand. In addition, we propose a hierarchical structure in
which the behaviors can be easily reused and evolved along with the characteristics of the robot. Parallel work between separate modules can be achieved, and this system can also be used as an alternative for defining complex tasks in a generic way, just by implementing the XML engine on the specific platform of the agent.
1. Defining Behaviors in Agents

There is a lot of work related to the problem of defining behaviors in agents, and there are well-studied techniques that are very efficient in certain situations. RoboCup research, which focuses mainly on the robotics leagues, is very particular because of the environment and robot characteristics. Teams using decision systems based on planning must take into account that plans need to be generated in real time. The environment is so dynamic that a plan generated too late could be inadequate or even harmful to the global team goals. Another complication is that, even if a plan has been generated on time, its execution in the real world can produce unexpected results; therefore, systems capable of executing multiple parallel plans, plans with conditional cycles, timing of actions, and failure-handling planning have been developed. Some teams have decided not to use planning and instead rely on fast, reactive systems, for example finite state machines, decision trees [4], or behavior-based robotics [5]. In earlier work such as Rudomín and Millán [6], XML scripts are used for defining low-level animations, and the same scripts include a way in which the agent can interact with the environment. As a result they obtain a simplified way to assign a particular behavior to an agent or group of agents. XML agents are associated with a "type of agent" which, depending on the agent's perception, can switch to other behaviors. The "type of agent" consists of a set of states associated with atomic actions that are executed sequentially. In each state there are transitions to other states that validate some condition and, if it evaluates to true, switch the running state. Other useful features are mechanisms for defining variables, assigning values to them at execution time, and including them in the condition expression of a transition.
In its 2004 champion code, the "German Team" of the Four-Legged RoboCup league uses the XML-based language XABSL [8]. All the football strategies are modeled using this language. Robot behaviors are defined through a modular state machine system in which each module is called an "option". These options are written in XML files. The language has many advantages: it is flexible, modular, and hierarchical. The decision tree is based on if-else statements, and the conditions involve a large number of variables that the user must manage in order to program properly. With these tools and this behavior modeling, the team has integrated a large number of options. This flexibility allows many people to work specifically on strategy without necessarily knowing the underlying details of the C++ implementation. Graphic documentation, using DOTML [9], is another attractive feature. Another widely used method to consider is behavior-based robotics [5]; its disadvantage is that football is a game in which strategy is a key point, so hybrid methods can be used, as in [7], always seeking a balance between complexity and usability of the whole system.
2. Defining Behaviors Through XML

We developed a platform for defining robot behaviors using state machines modeled with XML, as in [8]. With it, any user can define a complex task even without knowing the underlying programming details. This facilitates the creation of graphic interfaces that are even more intuitive for a user inexperienced in computing or robotics. Like any state machine, it responds very quickly to environment changes. We propose a hierarchical structure that allows more complex strategies to be defined at the top levels. Behaviors modeled with XML avoid programming the strategies directly in the native language of the robot (such as C++ for the AIBO): we only need one implementation of a generic state machine, which can run any state machine defined in XML, avoiding a compilation stage with every strategy change. The XML document has many advantages: a well-standardized description, easy and intuitive writing, portability, and many available design tools. With this module, users can work directly with top-level strategies, encapsulating programming details. A user programming an agent only needs to know the set of permitted actions, as well as all the events the agent can perceive. The module allows the incorporation of new functionality, such as actions or events, with few changes to the source code, so it can grow as fast as other modules such as vision or locomotion. The following sections explain how the XML behavior definition document is composed.

State definition

States are the basic execution unit. Each state has an associated action, which can be a basic action or an action defined in turn by a state machine. States can be iterative or not. An iterative state causes an action to be executed repeatedly as long as the state is active. An example could be "turn till the ball is found", which causes the robot to turn until a ball is found.
A non-iterative state executes the associated action once and then waits for an event that switches the active state. Figure 1 shows an example of a state definition in XML.
<state name="begin" action="searchBall" iterate="true"> …
Figure 1. Example of a state definition through XML. The state executes an action of type "searchBall", which is repeated until a transition is triggered.
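The difference between iterative and non-iterative states can be sketched as follows; this is an illustrative sketch with assumed names, not the paper's engine:

```python
def execute_state(state, ticks):
    """One simplified run of a state for `ticks` control cycles.
    state: {"action": callable, "iterate": bool}. An iterative state
    repeats its action every tick while active; a non-iterative state
    runs it once and then idles, waiting for a transition.
    Returns how many times the action ran."""
    runs = 0
    for _ in range(ticks):
        if state["iterate"] or runs == 0:
            state["action"]()
            runs += 1
    return runs
```

With iterate="true", the searchBall action above would be re-issued on every cycle until the ball is found.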
Transition definition

Transitions are normally associated with events that the robot can perceive, although null transitions can exist too. Every transition causes a change of the execution state. Transitions keep a hierarchical structure in which the top transition represents the highest hierarchical level; with this structure a layered behavior can be easily modeled. They are represented by a tag of type "transition" whose only attribute is the next state in case the transition is activated.
A distinguishing characteristic of our state machine is the capability of relating basic actions to a transition; for example, losing the ball could trigger sending a message to the team mates. Any number of events can be placed inside a transition; for a transition to be activated, all of its events need to be true. This is a very simple way of expressing a conjunction of events; to model a disjunction, another transition is simply added. Events can be logical or analog. They are represented with the tags "event" or "analogEvent", and the possible attributes are: name, operator, and value. A logical event evaluates to a Boolean value, so only a "not" operator is available; an example of such an event is "ballLost". Analog events compare against some value, and if the condition is met the event is true: the "operator" attribute indicates the type of comparison and the "value" attribute the value against which it is compared; an example of such an event is "timer". Figure 2 shows a transition definition in XML.
<event name="ballFound" operator="not"/>
Figure 2. Example of a transition definition through XML. The transition will be triggered when the event "ballFound" has the value "false".
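The event semantics just described — conjunction within a transition, disjunction via additional transitions, logical events with an optional "not", and analog events compared through an operator attribute — can be sketched like this. The operator names and the "type" key are assumptions for illustration:

```python
import operator

# Assumed comparison operators for analog events.
OPS = {"lt": operator.lt, "gt": operator.gt, "eq": operator.eq}


def eval_event(event, world):
    """event: dict of the XML attributes; world: current perception values."""
    val = world[event["name"]]
    if event.get("type") == "analog":
        # analog event: compare the sensed value against the "value" attribute
        return OPS[event["operator"]](val, float(event["value"]))
    # logical event: Boolean, optionally negated with operator="not"
    return (not val) if event.get("operator") == "not" else bool(val)


def transition_fires(events, world):
    """A transition fires only if ALL of its events are true (conjunction)."""
    return all(eval_event(e, world) for e in events)
```

A disjunction is then obtained simply by attaching a second transition to the same state rather than by adding an "or" construct.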
Behavior definition

A behavior is composed of a set of states with their corresponding transitions; in other words, it represents a complete state machine. It has an initial state, from which execution begins, and one or many final states. Final states make sense only if the behavior cannot be fragmented, meaning that a whole sequence must be completed before the agent can leave the behavior; in that case, an "atomic" tag is added to the behavior definition. If the agent cannot reach a final state, a blockage can occur. An example of such a behavior is a goal shot, in which a whole kick sequence must be completed in order to leave the state. Figure 3 shows an example of an atomic state.
…
Figure 3. Atomic state definition.
Behaviors are defined in a hierarchical way, as in [3]. This technique provides a way to encapsulate any number of behaviors; even with three encapsulation levels, behaviors were complex enough to define football strategies.

XML validation

To prevent elements or attributes without meaning from being defined, we need tools to validate the documents. In XML this is not a problem, because many validation options exist; one of them is DTD documents, which can be generated easily and can properly restrict the elements and attributes used in our behavior and environment definitions.
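A DTD for the behavior documents could look roughly as follows. This is a hypothetical sketch: the element and attribute names are taken from the figures in this section, and the remaining attribute names (such as "nextState" and "atomic") are assumptions:

```xml
<!-- Hypothetical DTD sketch for the XML behavior documents described above. -->
<!ELEMENT behavior (state+)>
<!ATTLIST behavior name CDATA #REQUIRED
                   atomic (true|false) "false">
<!ELEMENT state (transition*)>
<!ATTLIST state name CDATA #REQUIRED
                action CDATA #REQUIRED
                iterate (true|false) "false">
<!ELEMENT transition (event | analogEvent)+>
<!ATTLIST transition nextState CDATA #REQUIRED>
<!ELEMENT event EMPTY>
<!ATTLIST event name CDATA #REQUIRED
                operator CDATA #IMPLIED>
<!ELEMENT analogEvent EMPTY>
<!ATTLIST analogEvent name CDATA #REQUIRED
                      operator CDATA #REQUIRED
                      value CDATA #REQUIRED>
```

A validating parser would then reject, for instance, a transition with no events or a state with a misspelled attribute, before the document ever reaches the robot.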
Environment definition

The simulation environment is also defined through XML, as in [6]. In it the user can define the main simulation parameters, such as speed and friction, as well as the kinds of robots, robot IDs, positions, teams, or the virtual world. This feature provides an easy way to incorporate players, change player behaviors, or even change the virtual world without needing to change the source code. Figure 4 shows an example of an environment definition in XML.
<environment robot_delay="30" object_delay="100" friction="0.3">
Figure 4. Environment definition with XML. In this script the environment includes two AIBO robots; the first belongs to the blue team and uses the state machine in the document "attacker.xml", the second belongs to the red team and uses the state machine in the document "goalKeeper.xml".
3. Simulation Results

The implementation keeps the same hierarchical relationship in each behavior: an object is created for each behavior and state. Execution likewise keeps the same relationship between states and behaviors, so the robot runs the XML code directly through the behavior and state objects without needing any intermediate code. When simulating the agents we have great control over the execution, because the user sees the behaviors exactly as they were described in the XML document. Figure 5 shows an image of the simulator with a football environment. The black arc represents the vision angle of each robot, and the color indicates the team it belongs to. When a player is selected from the menu, a contour appears around it for easy identification. Afterwards, two windows are shown: the first with its system of events and the second with the graph of the main behavior. The window of events shows the state of each event (active or inactive) and provides an interface through which the user can send events to the agent, as if they were buttons on a real robot.
Figure 5. Football agent simulation.
The simulator provides a graphic way to analyze a behavior, or even to navigate through the actions with their associated behaviors or atomic actions. It shows the transitions between them and the active states. Figure 6 shows the top state machine of a goalkeeper.
Figure 6. Example of a hierarchical state machine. The player is at the state “outToClear”, which is made up of another state machine.
The simulator has pause/resume functionality, so each agent can be analyzed at any moment. Each player can show the events perceived, the actions it is performing, the values of the analog variables, the active states, and the transitions between them. It also has other functionalities, such as relocating players, returning the ball to the initial position, unblocking the ball, and goal counting. We implemented the basic roles of football, including goalkeeper, defender, and attacker. This team played against a team with players of only two kinds: one goalkeeper, while the rest of the players only searched for the ball, went for the ball, and shot at goal. The simulator ran for 8 hours, and the results indicated that the robots with specific roles were much better. In some cases we observed emergent behaviors, such as passing between defender and attacker. Frequently used behaviors, such as "follow the ball", could be described once and used many times by the same player, or even by other players, accelerating behavior definition and simplifying code reuse. The different strategies and players could be easily incorporated in the environment definition through XML, and changing a strategy did not require programming skills. To validate that our approach is easier than traditional programming, we asked first-semester students to modify and improve the agents' strategies. They were able to understand the system and make important improvements with little help. In addition, the players exhibited complex behaviors arising from simple state machines, and the most important usability characteristic is that the top hierarchy of the behaviors results in a human-readable state machine. The hierarchical approach helps to partition the strategy into many sub-problems, which can be addressed individually while keeping the general strategy less confusing.
4. Conclusions

We have shown a behavior definition platform, used to define agent football players, that is separated from the original robot programming language. This allowed game behaviors to be generated and tested quickly. Behavior definition in XML is simple, with the aid of the tools available for manipulating XML files; in addition, it provides a comprehensible model for defining strategies that is independent of the internal programming of the robot. This accelerates the development of new strategies, as well as of the actions and events perceived by the robot, since only the lists of available actions and events need to be updated so that a strategist knows the functionalities that can be used. Furthermore, the representation of a behavior as a hierarchical state machine facilitates the continuous improvement and reuse of behaviors, and the possibility of encapsulating behaviors allows the creation of low-complexity state machines on each layer. We have also built a graphical program to show active and inactive events, active and inactive states, and transitions, offering a clear panorama of the operation of the robot. This facilitates the comprehension of behaviors by allowing the user to see which actions or events are desirable and, therefore, to develop better strategies.
References

[1] D. MacKenzie and R. Arkin. Evaluating the usability of robot programming toolsets. International Journal of Robotics Research, Vol. 4, No. 7, pp. 381–401, 1998.
[2] M. Veloso, W. Uther, M. Fujita, M. Asada, and H. Kitano. Playing soccer with legged robots. In Proceedings of IROS-98, Intelligent Robots and Systems Conference, Victoria, Canada, October 1998.
[3] T. Balch. TeamBots simulation environment. http://www.teambots.org, 2003.
[4] M. Lötzsch, J. Bach, H. D. Burkhard, and M. Jüngel. Designing agent behavior with the extensible agent behavior specification language XABSL. In 7th International Workshop on RoboCup 2003 (Robot World Cup Soccer Games and Conferences), Lecture Notes in Artificial Intelligence, 2004.
[5] R. Arkin. Behaviour Based Robotics. MIT Press, Cambridge, MA, 1998.
[6] I. Rudomín and E. Millán. XML scripting and images for specifying behavior of virtual characters and crowds. In Proceedings CASA 2004, Geneva, Switzerland, July 2004, University Press, pp. 121–128.
[7] M. Nicolescu and M. Mataric. A hierarchical architecture for behavior-based robots. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, Bologna, Italy, pages 227–233, 2002.
[8] M. Lötzsch. XABSL web site. 2003. http://www.ki.informatik.hu-berlin.de/XABSL
[9] M. Lötzsch. DotML documentation. 2003. http://www.martin-loetzsh.de/DOTML.
Getting Back on Two Feet: Reliable Standing-up Routines for a Humanoid Robot

Jörg Stückler, Johannes Schwenk, and Sven Behnke
University of Freiburg, Computer Science Institute
Georges-Koehler-Allee 52, 79110 Freiburg, Germany

Abstract. To make bipedal locomotion robust, it is not sufficient to rely on postural responses that try to prevent falls; it is also necessary to detect falls and to implement appropriate recovery procedures. This paper describes general methods for a humanoid robot to stand up from the prone and supine postures. We illustrate the use of these getting-up routines on the example of our robot Jupp. The proposed methods require only a limited number of degrees of freedom and observe common joint-angle limitations. We employed a physics-based robot simulation to analyze the kinematics and dynamics of getting up. The standing-up routines have been implemented on the real robot as well. Tests in our lab and at RoboCup 2005 showed that reliable standing-up is possible after a fall and that such recovery procedures greatly improve the overall robustness of bipedal locomotion.

Keywords. Getting up, standing up, humanoid robot
1. Introduction

Bipedal locomotion on a variety of surfaces under the influence of external disturbances is a challenging task for a humanoid robot. While many approaches exist to prevent the robot from falling, postural responses are frequently not sufficient to maintain an upright posture. If disturbances are large enough, a fall may become unavoidable. Falls are most likely in dynamic environments that contain other agents, such as humans, which interact physically with the robot. In the case of a fall, it is essential for overall robustness to have reliable recovery procedures. The robot must be able to detect the fall, recognize its posture on the ground, and get back into an upright posture. While it is relatively easy to interpret the readings of attitude sensors in order to detect a fall and to decide, e.g., whether the robot lies in a prone (facing down) or a supine (facing up) posture, it is not straightforward to come up with reliable getting-up routines. This is due to the fact that the robot's center of mass (COM) projection onto the ground leaves the convex hull spanned by the foot contact points. Hence, additional support points are needed in order to move the COM back into the foot polygon. Knees, elbows, hands, and the backside of the robot may be used to provide additional support. This results in whole-body motions with sequences of support points. The many degrees of freedom of humanoid robots and the changing contact points make it difficult to apply
J. Stückler et al. / Getting Back on Two Feet: Reliable Standing-Up Routines for a Humanoid Robot 677
conventional motion-planning techniques. On the other hand, humans have devised a variety of strategies to get up from the ground [1]. The observation of the human example served as inspiration for us during the development of getting-up motion sequences. However, compared to humans, humanoid robots often lack essential degrees of freedom, e.g. in the trunk. Furthermore, the robot joints are often restricted to a limited range of motion and can provide only limited torques. Humans overcome similar limitations by combining statically stable motion phases with dynamic phases. We follow this approach in the design of standing-up routines from the prone and supine posture. While statically stable motion sequences bring the COM projection close to the foot polygon, dynamic motion is needed to come back on two feet. Afterwards, statically stable motion brings the robot into an upright posture. We developed the standing-up routines using a physics-based simulation. This had the advantage that the stability of motion could be analyzed with perfect information about the robot dynamics on a variable time scale. We implemented the getting-up routines on two small humanoid robots, which were designed to compete in the RoboCup KidSize class. In order to assess the performance of these routines, we performed tests in our lab. We also used the behaviors during humanoid robot soccer games, which took place for the first time at RoboCup 2005 in Osaka, Japan. The remainder of this paper is organized as follows: The next section reviews some of the related work. In Section 3, we detail the mechanical and electrical design of our small humanoid robot Jupp. We also introduce the framework we use for controlling its behavior. The main part of the paper is Section 4, where we explain in detail how the robot detects falls, recognizes its posture, and stands up from the ground. Experimental results are presented in Section 5.
2. Related Work
Very few humanoid robots are designed to survive a fall. Even the most advanced humanoids, like Asimo [2] and the Toyota Partner Robots [3], have not been demonstrated to be able to go down to the ground or to get back into an upright posture; they focus on walking stability. Indeed, only a few humanoids are able to stand up. The best-known example is HRP-2P (154cm, 58kg, 30DOF) [4], which has a lightweight backpack, strong arms, and wide ranges of motion in key joints. Kanehiro et al. proposed for it a motion controller that supports static motion and ZMP-controlled [5] dynamic motion. Standing up after a controlled descent to the ground is performed by transitions between predefined states of contact between the robot's body parts and the surface. When the robot stands up from a prone position, the dynamic motion controller is applied to perform a short transition between kneeling and crouching. Dynamic motion is needed because the knees and soles of the robot cannot touch the floor simultaneously. Kuniyoshi et al. [6] investigated the dynamics of standing up from the supine posture by a roll-and-rise motion with the humanoid robot K1 (150cm, 70kg, 46DOF). They analyzed the dynamics with a simplified model of the robot and explored the parameter space in simulation. Whereas the simulated robot succeeded in standing up, the motion was not transferable to the real robot. The authors attributed this difference to the difficulty of simulating the contacts between robot and ground precisely. Another humanoid that is able to get up from the ground is Sony's QRIO [7] (58cm, 7kg, 38DOF). It checks its position after a fall, turns face up, and recovers from a variety of prone positions by static movements. There is a variety of smaller servo-driven humanoid robots, which have been designed for the RoboOne competitions [8], where robot fighters engage in martial arts and standing up is an essential feature. The CycloidII robot by Robotis [9] (41.5cm, 2.4kg, 23DOF), for example, has had some success in these competitions. It is capable of standing up statically from both the prone and the supine posture due to its powerful Dynamixel actuators and disproportionately long arms. Kondo also constructed their popular KHR-1 [10] robot (34cm, 1.2kg, 17DOF) for the RoboOne competitions. Static standing-up routines from both lying postures have been implemented for it. Robotic soccer is another domain where standing up is important. At RoboCup 2005 [11], Team Osaka [12] demonstrated static standing-up of their robot VisiON Nexta (47.5cm, 3.2kg, 23DOF) from each posture. This robot has yaw and pitch waist joints and is driven by Dynamixel actuators.

Figure 1. Lateral and sagittal view of our robot Jupp. The table shows the range of motion for its joints.

Joint     Axis    Range (driven)    Range (relaxed)
Shoulder  roll      0◦ ↔ +180◦      n/a
          pitch   −90◦ ↔  +90◦      n/a
Elbow     pitch   −45◦ ↔ +135◦      n/a
Spine     pitch   −40◦ ↔  +90◦    −45◦ ↔  +90◦
Hip       roll    −40◦ ↔  +90◦    −60◦ ↔  +90◦
          pitch   −40◦ ↔  +90◦    −60◦ ↔  +90◦
          yaw     −90◦ ↔  +90◦    −90◦ ↔ +100◦
Knee      pitch  −130◦ ↔    0◦   −150◦ ↔   +5◦
Ankle     pitch   −65◦ ↔  +65◦    −80◦ ↔  +80◦
          roll    −90◦ ↔  +30◦    −90◦ ↔  +30◦
3. KidSize Humanoid Robot
Hardware Design: Fig. 1 shows two views of our humanoid robot Jupp. It has been designed for the 2005 RoboCup Humanoid League competitions in the KidSize class. As can be seen, Jupp has human-like proportions. Its mechanical design focused on weight reduction. Jupp is 60cm tall and has a total weight of only 2.3kg. The skeleton of the robot is mostly constructed from aluminum extrusions with rectangular tube cross section. We removed all material not necessary for stability. Jupp's feet and its forearms are made from sheets of carbon composite material. The forearms are bendable in the sagittal direction and are covered by a lightweight foam tube. The robot is driven by 19 servo motors: 6 per leg, 3 in each arm, and one in the trunk. Three orthogonal servos constitute the hip joint, two orthogonal servos form the ankle joint, and one servo drives the knee joint. We selected the S9152 servos from Futaba to drive two DOFs of the hips, the knees, and the ankles. These digital servos are rated for a torque of 200Ncm, can be driven to positions of ±65◦, and weigh only 85g. The hip yaw joints need less torque. They are powered by DS 8811 digital servos (190Ncm, 66g, ±90◦). Jupp's arms do not need to be as strong as the legs. They are powered by SES640 analog servos (64Ncm, 28g, ±90◦). Two orthogonal servos constitute the shoulder joint and one servo drives the elbow joint. We use the following definition of rotational directions: positive pitch bends in on the frontal side of the robot; positive roll and yaw mean rotating outwards, looking forward. The table in Fig. 1 lists the range of motion for all joints.
Jupp is fully autonomous. It is powered by high-current lithium-polymer rechargeable batteries, which are located in its lower back. The servos are interfaced to three tiny ChipS12 microcontroller boards. One of these boards is located in each shank and one board is hidden in the chest. Every 12ms, target positions for the servos are sent from the main computer to the ChipS12 boards, which generate intermediate targets at 180Hz. This yields smooth joint movements. The microcontrollers send preprocessed sensor readings back. In addition to joint potentiometers, Jupp is equipped with an attitude sensor, located in the upper trunk, which consists of a dual-axis accelerometer and two gyroscopes. We use a Pocket PC as the main computer [13], which is located in Jupp's chest (see Fig. 1). This computer runs behavior control, computer vision, and wireless communication.
Behavior Control: We control Jupp using a framework that supports a hierarchy of reactive behaviors [14]. This framework allows for structured behavior engineering. Multiple layers that run on different time scales contain behaviors of different complexity. For Jupp we use three levels of this hierarchy: individual joint, body part, and entire robot. This structure restricts interactions between the system variables and thus reduces the complexity of behavior engineering. The lowest level of this hierarchy, the control loop within the servo, has been implemented by the servo manufacturer. It runs at about 300Hz for the digital servos. We monitor target positions, actual positions, and motor duties. At the next layer, we generate target positions for the individual joints of a body part at a rate of 83.3Hz.
Simulation: In order to be able to design behaviors without access to the real hardware, we implemented a simulation of Jupp, which is based on the Open Dynamics Engine [15]. The simulation allows examining the robot's behavior on variable time scales.
This helps the behavior engineers to design complex or dynamic motions.
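The servo interface described above (new targets from the main computer every 12ms, intermediate targets generated by the microcontrollers at 180Hz) suggests a simple interpolation scheme. A minimal sketch, under the assumption of linear interpolation between successive targets (the interpolation method and function name are not from the paper):

```python
def intermediate_targets(prev_target, next_target,
                         cmd_interval_s=0.012, update_rate_hz=180):
    """Interpolate between two successive joint targets, emitting
    intermediate set-points at the servo update rate. The 12 ms command
    interval and 180 Hz rate are from the text; linear interpolation is
    our assumption."""
    n = max(1, round(cmd_interval_s * update_rate_hz))  # ~2 steps here
    return [prev_target + (next_target - prev_target) * k / n
            for k in range(1, n + 1)]
```

A smoother scheme (e.g. cubic blending) would serve the same purpose; the point is that the microcontroller refines the coarse 12ms command stream into a dense set-point stream.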
4. Standing-Up Routines
A standing-up routine is triggered when the robot has fallen over with high certainty. To detect this state, the attitude sensor readings are interpreted. If the robot is tilted more than 45◦ for more than one second, we assume that it has fallen. As our robots cannot lie other than facing upwards or downwards on a flat surface, we only have to inspect the sign of the sagittal tilt in order to recognize the posture.
Assumptions: We developed standing-up routines for the RoboCup Soccer domain from the supine and prone posture under the following assumptions:
• The surface is flat carpet (soft, moderate friction).
• No obstacles conflict with the robot's motion during the standing-up routine.
• The robot is lying straight on the surface with all joints moved to zero positions.
Motion Generation: The standing-up motions are generated by setting target positions for individual joints. We use sinusoidal trajectories: they are smooth and natural for static movements, and for dynamic motions the distinctive points where target velocity and acceleration are zero or maximal can be read directly from the waveform.
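The fall-detection and posture-recognition rule described above (tilt of more than 45◦ for more than one second; sign of the sagittal tilt decides prone vs. supine) can be sketched as follows. The sensor interface, the class name, and the sign convention for prone vs. supine are our assumptions, not the paper's:

```python
import math
import time

FALL_TILT_RAD = math.pi / 4  # 45 degrees
FALL_DURATION_S = 1.0        # tilt must persist this long

class FallDetector:
    """Detects a fall from attitude readings and classifies the posture."""

    def __init__(self):
        self._tilted_since = None  # time when the tilt threshold was first exceeded

    def update(self, sagittal_tilt, lateral_tilt, now=None):
        """Returns None while upright (or still waiting), else 'prone' or 'supine'."""
        now = time.monotonic() if now is None else now
        tilt = max(abs(sagittal_tilt), abs(lateral_tilt))
        if tilt < FALL_TILT_RAD:
            self._tilted_since = None  # upright again: reset the timer
            return None
        if self._tilted_since is None:
            self._tilted_since = now
            return None
        if now - self._tilted_since < FALL_DURATION_S:
            return None
        # On a flat surface the robot ends up facing either up or down, so the
        # sign of the sagittal tilt decides the posture (sign convention assumed).
        return 'prone' if sagittal_tilt > 0 else 'supine'
```

The returned posture string would then select the matching standing-up routine.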
Figure 2. (a)-(e) Starting and end positions of phases I-IV when standing up from the supine posture.
Standing up from the Supine Posture: Standing up from the supine posture in a statically stable way can be achieved by folding the body over the feet, supported by the arms. In this movement the knees and the ankle pitch joints are bent in. Then, the hip pitch joints are moved positively and the trunk is leaned back by rotating the spine negatively to shift the COM over the feet. Now, the trunk is brought over the feet by bending the spine in positive direction and, finally, the whole body is straightened. The limited motion ranges of Jupp's ankle pitch, knee, and spine joints complicate this motion, because the arms are not long enough to support the motion while the hip pitch joints and the spine shift the COM over the feet. Instead, the body has to be swung forward while the arms lose contact with the surface. During this dynamic movement, the robot tips over the trailing edges of the feet, which form the support polygon. The COM projection onto the surface moves from behind the trailing edges of the feet to the front, until the whole feet are standing on the surface. The standing-up motion unfolds in four separate phases (see Fig. 2).
Phase I. Move the upper body into a sit-up posture and move the arms into a supporting position behind the back.
Phase II. Move into a bridge-like position using the arms as support.
Phase III. Move the COM over the feet by swinging the upper body to the front.
Phase IV. Move the body into an upright posture.
During Phases I and II the robot moves into an advantageous, bridge-like starting position for Phase III. To be able to bend the trunk in, the arms are angled by rotating the elbow and shoulder pitch joints. Additionally, the shoulder roll joints are moved to enlarge the distance of the elbows to the ground by moving the arms outwards. Now, the robot can sit up. The ankle pitch joints are rotated such that the feet lie flat on the surface, while the hip and the spine pitch joints are bent in.
Meanwhile, the elbows have to be rotated further to prevent the forearms from touching the ground. As soon as possible, the elbows start moving back to straighten the arms, while the shoulder pitch joints are rotated further in negative direction. The shoulder roll joints are moved back to zero. This brings the arms into a backward position, which will support the later bridge-like position. During the alignment of the arms, when the hip pitch joints and the spine have reached their target positions, the legs are bent in by folding the knees to their negative limit. The twist of the hip is compensated by rotating the spine further, such that the robot remains in the sit-up posture. In Phase II, the arms are straightened at the elbows, and the hip pitch and spine joints are moved to zero, such that the hip is lifted. The ankle pitch joints rotate positively to let the hip and the knees shift forward.
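The stability measure behind these phase transitions — the sagittal distance of the COM projection to the extreme points of the support polygon, as plotted in Fig. 3(h) — could be computed as follows. This is our sketch; the paper does not give the formula, and the sign convention (positive inside the support polygon, negative outside) is an assumption:

```python
def sagittal_stability_margin(com_x, support_xs):
    """Signed sagittal distance of the COM ground projection to the support
    polygon: positive when inside (distance to the nearer edge), negative
    when outside. `support_xs` are the sagittal coordinates (in m) of the
    current contact points (feet edges, knees, forearms, ...)."""
    front, back = max(support_xs), min(support_xs)
    if back <= com_x <= front:
        return min(front - com_x, com_x - back)   # inside: margin to nearer edge
    return -min(abs(com_x - front), abs(com_x - back))  # outside: negative
```

During a statically stable phase this value stays positive; it dips below zero exactly during the dynamic phase, which is what Fig. 3(h) and Fig. 5(h) show.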
[Figure 3 shows plots of (a) Shoulder Roll, (b) Shoulder Pitch, (c) Elbow, (d) Spine, (e) Hip Pitch, (f) Knee, (g) Ankle Pitch, and (h) Support over time (0–12 sec), with phases I–IV and the hip pitch amplitude αh marked.]
Figure 3. (a)-(g) Target positions (in rad) of individual joints when standing up from the supine posture (αh = 0.25π). (h) Distance (in m) of the COM projection to the extreme points of the support polygon in sagittal direction.
Now, in Phase III, the COM projection can be shifted close to the trailing edges of the feet by rotating the hip pitch joints in positive direction, while the spine joint is moved negatively. As soon as the robot begins to tip back over the trailing edges of the feet, because the arms are not long enough, the arms are swung forward and the spine is rotated positively to accelerate the trunk forward. As the arms lose contact with the ground, the COM projection moves out of the support polygon until the whole foot soles are touching the ground again (see Fig. 3(h)). We identified the main control parameters to be the amplitude of the hip pitch joint trajectory, αh, and the length of the dynamic phase. Both values directly influence the speed of the trunk's forward motion. Values that are too low let the motion cease before the COM has moved over the feet, whereas values that are too high increase the chance that the body overshoots and falls over the leading edges of the feet. Finally, all joints are brought back to zero to get the robot into an upright posture. The spine is temporarily bent in to stabilize the motion. Fig. 3(a)-(g) show details of the trajectories and their exact timing.
Standing up from the Prone Posture: The main problem of standing up from the prone posture is that the knees of a humanoid robot cannot be bent in positive direction. If this were possible, standing up from the prone posture would be similar to standing up from the supine posture. Instead, the COM projection has to be brought near the feet by first bending in the ankles, the knees, the hip, and the spine maximally, such that the robot touches the ground with the knees and the leading edges of the feet. Then, the arms can be utilized to tip the robot over the leading edges of the feet and to let the COM, which is situated in the hip region, be shifted over the feet. This motion is dynamic, as the COM projection leaves the support polygon at the leading edges of the feet.
Figure 4. (a)-(e) Starting and end positions of phases I-IV when standing up from the prone posture.
The mechanical limitations of Jupp, especially the limited joint angle ranges in the knees and the ankles, complicate the standing-up motion, as the legs cannot be folded in completely. Thus, the arms have to lift a longer lever, which is formed by the leading edges of the feet and the COM. For a detailed description of the motion sequence, we divide it into four phases.
Phase I. Lift the trunk and bring the forearms under the shoulders.
Phase II. Move the COM projection as close as possible to the leading edges of the feet by bending in the spine, the hip pitch, and the knee joints.
Phase III. Straighten the arms to let the robot tip over the leading edges of the feet.
Phase IV. Bring the body into an upright posture.
As indicated by Fig. 4(c)-(d), the soles have to be brought as flat as possible onto the ground, such that the dynamic motion is short. Therefore, the ankle pitch joints are rotated to their maximum position. To support this movement, the feet are lifted from the ground by bending in the knee and the hip pitch joints, letting the knees contact the surface instead of the feet. Jupp cannot bend in its arms enough to get the ends of the forearms below the shoulders while the trunk is lying straight on the surface. Thus, it prepares to support the lifting of the trunk with the arms: the elbows are bent in and the shoulder pitch joints are moved back, such that the endpoints of the forearms stay on the lateral plane of the robot. Meanwhile, the arms are rotated outwards at the shoulder roll joints, so that they can be moved back to get the forearms below the shoulders when the trunk is lifted up. The trunk lifting is accomplished by bending out the spine and hip pitch joints. The arms support the motion by pressing the forearms onto the ground, as the elbow joints are partly straightened and the shoulder pitch joints move back to zero. During this motion, the arms can be brought under the trunk by moving the shoulder roll joints back to zero.
Finally in this phase, the elbows are straightened further and the flexible forearms are strained by the weight of the trunk. In Phase II, the spine, the hip pitch joints, and the knees are folded in, bringing the COM projection close to the leading edges of the feet. The arms support this movement by rotating the shoulder pitch and elbow joints in positive direction. Now, the robot is holding a bridge-like position, leaning on the leading edges of the feet and the strained forearms. To let the body tip over the leading edges of the feet in Phase III, the shoulder pitch joints have to be rotated further to the +90◦ position and the arms are straightened by the elbow joints. The COM projection leaves the support polygon shortly behind the leading edges of the feet, as described before and shown in Fig. 5(h). The amplitude of the elbow trajectory, αe, and the length of the dynamic phase are the main control
[Figure 5 shows plots of (a) Shoulder Roll, (b) Shoulder Pitch, (c) Elbow, (d) Spine, (e) Hip Pitch, (f) Knee, (g) Ankle Pitch, and (h) Support over time (1–9 sec), with phases I–IV and the elbow amplitude αe marked.]
Figure 5. (a)-(g) Target positions (in rad) of individual joints when standing up from the prone posture (αe = 0.24π). (h) Distance (in m) of the COM projection to the extreme points of the support polygon in sagittal direction.
parameters for the dynamic motion. If they are too small, the COM projection will not leave the support polygon. If they are too high, the backward motion overshoots, such that the arms are thrown back and are not able to absorb the following forward motion, which causes the robot to fall over the leading edges of the feet again. During the final Phase IV, all joints are moved to their zero position to bring the body into an upright posture. Fig. 5(a)-(g) show details of the trajectories and their exact timing.
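The sinusoidal trajectory generation used throughout Section 4, with a phase amplitude such as αh or αe as the main control parameter, might be sketched as follows. The half-cosine form and all names are our assumptions; the paper only states that sinusoidal trajectories are used:

```python
import math

def sinusoidal_target(t, duration, amplitude, start=0.0):
    """Smooth target position moving from `start` to `start + amplitude`
    over `duration` seconds along a half-cosine. Velocity is zero at both
    ends and maximal at the midpoint; acceleration is zero at the midpoint
    and maximal at the ends - the 'distinctive points' mentioned in the
    text can be read directly off the waveform."""
    s = min(max(t / duration, 0.0), 1.0)  # normalized phase in [0, 1]
    return start + amplitude * 0.5 * (1.0 - math.cos(math.pi * s))

# Example amplitude for the hip pitch trajectory, as in Fig. 3.
alpha_h = 0.25 * math.pi
```

Scaling `amplitude` (αh, αe) or `duration` (the length of the dynamic phase) changes the peak trunk velocity, which is exactly the trade-off discussed above.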
5. Experimental Results
We designed the proposed routines using the simulation and transferred them to the real hardware. We performed extensive tests in our lab. Under normal circumstances, i.e. appropriate battery voltage, the routines worked very reliably, with success rates of 100 percent in 100 tests. We observed that the main control parameters αh and αe can be chosen from a wide range of values while still yielding highly reliable standing-up routines (e.g. αh = 0.21π ± 0.025π and αe = 0.16π ± 0.06π). This is explained by the shortness of the dynamic phases, which is achieved by moving the COM projection as close as possible to the foot polygon in the preceding static phases. Fig. 6 and Fig. 7 show image sequences of the dynamic phases of standing up from the supine and prone posture, respectively. The standing-up routines proved to be robust against changes in the initial posture and varying servo temperatures. When standing up from the prone posture, however, low battery voltage causes the supply voltage of the arm servos to break down under the high load of the dynamic phase. Although the routines have been developed based on rather restrictive
Figure 6. Image sequence showing dynamic Phase III of standing up from the supine posture (αh = 0.235π).
assumptions, the generated motions proved to be robust against relaxation of each of the assumptions. For example, the robot can also stand up on hard and slippery surfaces like stone or hardwood. The getting-up routines even worked when the joints of one arm were relaxed. The routines did not only work in our lab, but were also used during soccer games at RoboCup 2005. In this dynamic environment, multiple robots are in pursuit of the ball, which leads to unavoidable body contact. The physical interaction of the robots disturbs the walking pattern, which may lead to falls. Because human help during play is not allowed, the robots must be able to get back into an upright posture by themselves. Our robots detected falls reliably, triggered the appropriate standing-up routine, and almost always succeeded in standing up. Because the robots continued play afterwards, this greatly improved the overall robustness of our soccer team.
6. Conclusions
In this paper, we proposed general methods for standing up with a humanoid robot from both the prone and the supine posture. In order to overcome hardware limitations, such as limited ranges of motion or torque limits, we combined statically stable motion phases with dynamic phases. Statically stable motion sequences that utilize support by the arms, the knees, or the backside of the robot bring the COM projection close to the foot polygon. Dynamic motion is needed to come back on two feet. Afterwards, statically stable motion brings the robot into an upright posture. We designed the standing-up routines using a physics-based simulation and implemented them on two small humanoid robots, which were designed to compete in the RoboCup KidSize class. The routines worked very reliably in our lab. Furthermore, they were integrated into the behavior control software that was used during 2 vs. 2 soccer games at RoboCup 2005. Because unavoidable robot interactions in this dynamic environment lead to falls, reliable standing up was one of the key factors that determined the performance of our soccer team. Our team NimbRo played well, and the getting-up routines were appreciated by the crowd. We reached the final of the soccer games against the titleholder, Team Osaka. The exciting game ended 2:1 for Osaka. Videos showing the standing-up routines and their use in the RoboCup competition can be downloaded from our webpage http://www.NimbRo.net/media.html.
Figure 7. Image sequence showing dynamic Phase III of standing up from the prone posture (αe = 0.22π).
We plan to work in two directions in the future: avoiding falls and minimizing the damage of a fall. To avoid falls, we will implement various postural responses based on sensory feedback. To minimize damage, we plan to bring the robot into a protective posture and to relax some of its joints prior to ground contact.
Acknowledgements This project is supported by the DFG (Deutsche Forschungsgemeinschaft), grant BE 2556/2-1.
References
[1] A. van Sant. Rising from a supine position to erect stance. Physical Therapy, 68:185-192, 1988.
[2] M. Hirose, Y. Haikawa, T. Takenaka, and K. Hirai. Development of Humanoid Robot ASIMO. Int. Conf. on Intelligent Robots and Systems, Workshop 2, 2001.
[3] Toyota Partner Robots. http://www.toyota.co.jp/en/special/robot.
[4] F. Kanehiro et al. The First Humanoid Robot that has the Same Size as a Human and that can Lie down and Get up. Proc. of the 2003 IEEE Int. Conf. on Robotics & Automation, Taipei, Taiwan, 2003.
[5] M. Vukobratovic, A. Frank, and D. Juricic. On the Stability of Biped Locomotion. IEEE Transactions on Biomedical Engineering, 17(1), pp. 25-36, 1970.
[6] K. Terada, Y. Ohmura, and Y. Kuniyoshi. Analysis and Control of Whole Body Dynamic Humanoid Motion - Towards Experiments on a Roll-and-Rise Motion. Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2003.
[7] Sony: QRIO. http://www.sony.net/qrio.
[8] RoboOne. http://www.robo-one.com.
[9] Robotis: CycloidII. http://www.robotis.com/kor/cycloid2.html.
[10] Kondo: KHR-1. http://www.kondo-robot.com.
[11] The RoboCup Organization. http://www.robocup.org/.
[12] Team Osaka. http://www.vstone.co.jp/top/products/robot/v2/.
[13] S. Behnke, J. Müller, and M. Schreiber. Using Handheld Computers to Control Humanoid Robots. Proc. of the DARH 2005, Yverdon-les-Bains, Switzerland, May 2005.
[14] S. Behnke and R. Rojas. A Hierarchy of Reactive Behaviors Handles Complexity. In Balancing Reactivity and Social Deliberation in Multi-Agent Systems, M. Hannebauer, J. Wendler, and E. Pagello (eds.), pp. 125-136, Springer, 2001.
[15] R. Smith. Open Dynamics Engine. http://opende.sourceforge.net.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Ball tracking with velocity based on Monte-Carlo localization
Jun Inoue a,1, Akira Ishino b and Ayumi Shinohara c
a Department of Informatics, Kyushu University
b Office for Information of University Evaluation, Kyushu University
c Graduate School of Information Science, Tohoku University
Abstract. This work presents methods for tracking a ball from noisy data taken by robots. In robotic soccer, getting a good estimate of the ball's position is critical. Moreover, accurate ball tracking enables robots to play well, for example with efficient passes to other robots, fine saves by the goalie, good team play, and so on. In this paper, we present a new ball-tracking technique based on Monte-Carlo localization. The key idea of our technique is to consider two kinds of samples, one for position and the other for velocity. As a result, this technique enables us to estimate the trajectory of the ball even when it goes out of view. Keywords. Monte-Carlo Localization, Object Tracking, Object Recognition, RoboCup
1. Introduction
In RoboCup, estimating the position of a ball accurately is one of the most important tasks. Most robots in the Middle-size League have omni-directional cameras, and in the Small-size League, ceiling cameras are used to recognize objects. In the Four-legged League, however, robots have only low-resolution cameras with a narrow field of vision on their noses, while the size of the soccer field is rather large. These facts make it difficult for the robots to estimate the position of the ball correctly, especially when the ball is far away or very near. Moreover, since the shutter speed of the camera is not especially fast, and its frame rate is not very high, tracking a moving object is a demanding task for the robots. In order to tackle these problems, several techniques have been proposed [11,9]. The Kalman filter [7] is one of the most popular techniques for estimating the position of objects in a noisy environment. It has also been used to track a moving ball [2]. Methods using particle filters have been applied for tracking objects [5], and especially in the Four-legged League, a method using Rao-Blackwellised particle filters has had great success [9]. In this paper, we consider how to estimate the trajectory of a moving ball, as well as the position of a static ball, based on the Monte-Carlo localization method [1]. Monte-Carlo localization has usually been used for self-localization in the Four-legged League. We extend it to handle the localization of the ball. We have already presented a primitive method based on this idea in [6], where we ignored the velocity of the ball. 1 Correspondence
to: Jun Inoue, E-mail: [email protected].
J. Inoue et al. / Ball Tracking with Velocity Based on Monte-Carlo Localization
687
Figure 1. Cases of ball recognition. Dots on the edge show vertices of an inscribed triangle. The diameter of the ball is calculated using these vertices.
We propose three extended methods to deal with the velocity of the ball and report some experimental results. The rest of the paper is organized as follows. In Section 2, we first present our ball recognition algorithm from the images captured by the camera. Section 3 describes the details of Ball Monte-Carlo localization, and our experimental results and conclusions are given in Section 4.
2. Ball Recognition from the Image Data
As mentioned in the previous section, the camera of the robot in the Four-legged League is mounted on its nose. The field of vision is narrow (56.9◦ horizontally and 45.2◦ vertically), and the resolution is also low (412×320 at the maximum). Moreover, stereo vision is usually impossible, although some attempts to emulate it by using plural robots have been reported [8,12]. Therefore, first of all, an accurate recognition of the ball from a single image captured by the camera is indispensable for estimating the position of the ball accurately. In particular, the estimation of the distance from the robot to the ball is critical in establishing a stable method for estimating the position of the (possibly moving) ball. A naive algorithm which counts the orange pixels in the image and uses the count to estimate the distance does not work well, since the ball is often hidden by other robots, and the ball may be in the corner of the robot's view, as shown in Fig. 1. In addition, the projection of the line of sight onto the ground can be used for calculating the distance to the object, but it depends on the pose of the robot and is affected by the robot's locomotion. In the Four-legged League, the robot's sight is very shaky; therefore, we do not use this method either. In this section, we present our algorithm's ability to recognize the ball and estimate its position relative to the robot from a single image. It forms the base for estimations from multiple images, as described in the following sections. In the image, the biggest orange-colored component satisfying the following heuristic conditions is recognized as the ball. (1) The edge length of the component has to be more than 10 pixels, which excludes components that are too small. (2) The ratio of the orange pixels to the area of the bounding box must exceed 40%.
(3) If the component touches the edge of the image, the length of the longer side of the bounding box must be over 20 pixels. We use the diameter of the ball to estimate the distance to it. However, the ball is often partially hidden by other robots, and when the robot approaches the ball, only a part of the ball is visible at the corner of the camera view. Thus, the size of the bounding box and the total number of pixels are not enough to estimate the distance accurately. Fig. 1 shows two cases of diameter estimation. When the ball is completely inside the view (left image), we regard the length of the longer side of the bounding box as the
J. Inoue et al. / Ball Tracking with Velocity Based on Monte-Carlo Localization
diameter of the ball. When the bounding box touches the edges of the image (right image), we use three points on the component's edge, since any three points on the edge of a circle uniquely determine its center and diameter.
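The acceptance heuristics and the three-point circle fit above can be sketched as follows; this is our own Python illustration of the stated rules, not the authors' code, and the function and parameter names are assumptions:

```python
import math

def is_ball_candidate(edge_len, orange_pixels, bbox_w, bbox_h, touches_edge):
    """Heuristic conditions (1)-(3) for accepting the biggest orange component."""
    if edge_len <= 10:                                  # (1) component too small
        return False
    if orange_pixels / float(bbox_w * bbox_h) <= 0.40:  # (2) orange ratio of bbox
        return False
    if touches_edge and max(bbox_w, bbox_h) <= 20:      # (3) truncated at image edge
        return False
    return True

def circle_from_3_points(p1, p2, p3):
    """Center and diameter of the unique circle through three edge points
    (used when the bounding box touches the image border)."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), 2.0 * math.hypot(ax - ux, ay - uy)
```

The circumcircle formula is the standard one; once the diameter in pixels is known, the distance to the ball follows from the camera model.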
3. Ball Localization Using the Monte-Carlo Method In this section, we propose a technique that estimates the position and velocity of a moving ball based on Monte-Carlo localization. This technique was introduced for self-localization in [4] and utilized in [10]. The aim is to calculate an accurate position and velocity of the ball from a series of input images. In principle, we use the difference of the ball positions between two frames. However, these data may contain errors. The Monte-Carlo method absorbs these errors so that the estimation of the velocity becomes more accurate. In this method, instead of describing the probability density function itself, the density is represented as a set of samples randomly drawn from it. This representation is updated at every time step based on the information from the sensors. We consider the following three variations of the method, in order to estimate both the position and velocity of the moving ball. (1) Each sample holds both position and velocity, but is updated according to the information of the position only. (2) Each sample holds both position and velocity, and is updated according to the information of both the position and the velocity. (3) Two kinds of samples are considered: one for the position and the other for the velocity. They are updated separately. To evaluate the effectiveness of our ball tracking system, we experimented in both simulated and real-world environments. The details of the algorithm and experimental results are shown below.
3.1. Ball Monte-Carlo Localization without Updating Velocity First, we introduce a simple, naive extension of our ball localization technique in [6]: the velocity is appended to the samples but not updated. In this method, a sample is denoted as $a_i = (p_i, v_i, s_i)$, where $p_i$ is a position in the local coordinate system, $v_i$ is a velocity of the ball, and $s_i$ is a score ($1 \le i \le n$). We update these samples according to the procedure shown in Fig. 2. When the robot recognizes a ball in the image, each score $s_i$ is updated in the function UpdateScoresByPosition. The new score is calculated from the observed position $p_o$ and the position $p_i$ of a sample as follows:
$$s_i := \begin{cases} S_i^p & (s_i - \mathrm{maxdown}_p \le S_i^p \le s_i + \mathrm{maxup}_p) \\ s_i + \mathrm{maxup}_p & (S_i^p > s_i + \mathrm{maxup}_p) \\ s_i - \mathrm{maxdown}_p & (S_i^p < s_i - \mathrm{maxdown}_p), \end{cases}$$
where $S_i^p = e^{-l_i^p \tau_p}$, $l_i^p = |p_o - p_i|$, and $\tau_p > 0$ is a constant that defines the neighborhood of the sample. The constants $\mathrm{maxup}_p$ and $\mathrm{maxdown}_p$ control the conservativeness of the update: if the ratio $\mathrm{maxup}/\mathrm{maxdown}$ is smaller, responses to rapid changes of the ball's position are faster, but at the same time the influence of errors is stronger. The constants $t_p$ and $t_v$ are score thresholds that control the effect of misrecognitions. The function randomize generates a new sample
Algorithm BallMonteCarloLocalizationWithoutUpdatingVelocity
Input: a set of samples Q = {a_i | a_i = (p_i, v_i, s_i)} and an observed ball position p_o.
Output: estimated ball position p_e, velocity v_e, and average score avg of Q.

 1  for i := 1 to n do begin
 2      p_i := p_i + v_i;
 3      if p_i is out of the field then
 4          randomize(a_i)
 5  end;
 6  if p_o ≠ null then begin
 7      updateScoresByPosition(Q, p_o)
 8  end;
 9  avg := (Σ_i s_i) / n;
10  p_e := 0; v_e := 0; w := 0;
11  for i := 1 to n do begin
12      if s_i < avg · random(0, 1) then
13          randomize(a_i)
14      else if s_i > t_p then begin
15          p_e := p_e + s_i · p_i;
16          v_e := v_e + s_i · v_i;
17          w := w + s_i
18      end
19  end;
20  p_e := p_e / w; v_e := v_e / w;
21  output p_e, v_e, avg;

Figure 2. Procedure Ball Monte-Carlo Localization without Updating Velocity
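The procedure of Fig. 2 can be sketched in Python as follows. This is our own illustration: the helper names are ours, and the field size, speed range, and parameter values are taken from the simulation settings described below.

```python
import math
import random

FIELD, VMAX = 3000, 100  # field half-size [mm] and speed range from the simulation

def make_sample():
    """A fresh sample a_i = (p_i, v_i, s_i): random position/velocity, score 0."""
    return {"p": [random.uniform(-FIELD, FIELD), random.uniform(-FIELD, FIELD)],
            "v": [random.uniform(-VMAX, VMAX), random.uniform(-VMAX, VMAX)],
            "s": 0.0}

def update_scores_by_position(Q, p_o, tau_p=0.008, maxup=0.1, maxdown=0.05):
    """Clipped score update: move s_i toward S_i^p = exp(-l_i^p * tau_p)
    by at most maxup upward and maxdown downward."""
    for a in Q:
        l = math.hypot(p_o[0] - a["p"][0], p_o[1] - a["p"][1])
        target = math.exp(-l * tau_p)
        a["s"] = min(max(target, a["s"] - maxdown), a["s"] + maxup)

def ball_mcl_step(Q, p_o, t_p=0.15):
    """One iteration of Fig. 2: predict, score, resample, and return the
    weighted estimates p_e, v_e and the average score."""
    for a in Q:
        a["p"][0] += a["v"][0]; a["p"][1] += a["v"][1]      # p_i := p_i + v_i
        if not (-FIELD <= a["p"][0] <= FIELD and -FIELD <= a["p"][1] <= FIELD):
            a.update(make_sample())                          # randomize out-of-field samples
    if p_o is not None:
        update_scores_by_position(Q, p_o)
    avg = sum(a["s"] for a in Q) / len(Q)
    p_e, v_e, w = [0.0, 0.0], [0.0, 0.0], 0.0
    for a in Q:
        if a["s"] < avg * random.random():
            a.update(make_sample())                          # discard improbable samples
        elif a["s"] > t_p:
            p_e[0] += a["s"] * a["p"][0]; p_e[1] += a["s"] * a["p"][1]
            v_e[0] += a["s"] * a["v"][0]; v_e[1] += a["s"] * a["v"][1]
            w += a["s"]
    if w > 0:
        p_e = [x / w for x in p_e]; v_e = [x / w for x in v_e]
    return p_e, v_e, avg
```

Calling `ball_mcl_step` once per camera frame (with `p_o = None` when the ball is not seen) reproduces the estimate-while-observing / predict-while-hidden behavior described in the text.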
which has a random position and velocity and whose score is 0. The function random(0,1) generates a real number between 0 and 1. The vectors p_e and v_e are the weighted means of the samples whose scores are t_p or greater.
First we show the results of a simulation. The settings of the simulation are as follows. The observing robot stood at the center of the field, which is the origin of the coordinate system, watching forward without any movement. We assumed that the ball rolled from a left position to a right position at a constant speed while the robot observed it, and generated simulated ball positions automatically according to this assumption. The simulated positions included errors at a constant but high rate, to approximate the real environment. The robot estimated the position of the ball while observing it (first 20 frames), and predicted its trajectory based on the past observations (last 20 frames). We set the parameters of the experiment as follows: the size of the field was 6000 × 6000 (−3000 ≤ x, y ≤ 3000), the range of speed was −100 ≤ dx, dy ≤ 100, the number of samples was n = 500, and the constants were maxup_p = 0.1, maxdown_p = 0.05, t_p = 0.15, and τ_p = 0.008. Crosses in the figures mark the simulated ball positions, and the line Kalman Filter in Figure 3 shows the result of a simple Kalman filter. The line MonteCarlo (no velocity) in Figure 3 shows the result of this method in the simulation. It estimated the position of the ball accurately while observing the ball, but the velocity of each sample did not converge because it was never updated, so the prediction points moved in random directions when the ball was out of view.
[Figure 3. The result of a simulation: simulated ball positions (crosses) together with the estimates of MonteCarlo (no velocity), MonteCarlo (with velocity), MonteCarlo (separated), and the Kalman Filter.]
3.2. Ball Monte-Carlo Localization with Velocity In the above method, samples held velocities, but these were not reflected in the scores. In this method, all scores are updated by the velocity in addition to the above method. The function UpdateScoresByVelocity(Q, v_o) is inserted between Step 8 and Step 9 in Fig. 2, and the expression p_last := p_o is inserted below Step 21. To update scores by velocity, we need the velocity of the ball. However, the velocity v_o is not observed directly, so we calculate it from the difference of positions between two frames. Let the vector p_last be the last position of the ball and dt be the difference of frame numbers between two frames. Then v_o = (p_o − p_last)/dt, and the new scores are calculated based on it. We used another evaluation function for the update by velocity. It is similar to the function for updating by position, but with the constants τ_v = 0.08, maxup_v = 1.0, maxdown_v = 0.1, and t_v = 0.2, because of the difference of scale between the position and the velocity. The line MonteCarlo (with velocity) in Figure 3 shows the result of updating the position and the velocity jointly in the simulation. This method also estimated the position of the ball accurately while the ball was in sight, but the predicted trajectory after the ball went out of view was wrong.
3.3. Ball Monte-Carlo Localization with Separate Position and Velocity In the second method, the velocity was reflected in the score, but the method did not estimate an accurate trajectory, because the number of possible samples was huge and the samples did not converge. Therefore, in this method we consider two kinds of samples, pos_i = (p_i, sp_i) and vel_i = (v_i, sv_i), which are scored independently. The scores sp_i are updated by UpdateScoresByPosition and the scores sv_i are updated by UpdateScoresByVelocity. That is, the position and the velocity are estimated independently by two Monte-Carlo methods, which reduces the number of possible samples.
The details of the algorithm are presented in Fig. 4. The line MonteCarlo (separated) in Figure 3 presents the result of the method that updates position and velocity separately in the simulation. The estimated positions fitted the path of the rolling ball, and our method predicted an accurate trajectory while the ball was out of sight. The effectiveness of this method was almost equal to that of the Kalman filter.
Algorithm BallMonteCarloLocalizationSeparatePositionAndVelocity
Input: two sets of samples POS = {pos_i | pos_i = (p_i, sp_i)} and VEL = {vel_i | vel_i = (v_i, sv_i)}, an observed ball position p_o, and a calculated ball velocity v_o.
Output: estimated ball position p_e, velocity v_e, average score avg_p of POS, and average score avg_v of VEL.

 1  for i := 1 to n do begin
 2      p_i := p_i + v_e;
 3      if p_i is out of the field then
 4          randomize(pos_i)
 5  end;
 6  if p_o ≠ null then begin
 7      UpdateScoresByPosition(POS, p_o);
 8      UpdateScoresByVelocity(VEL, v_o)
 9  end;
10  p_e := 0; v_e := 0; w_p := 0; w_v := 0;
11  avg_p := (Σ_i sp_i) / n; avg_v := (Σ_i sv_i) / n;
12  for i := 1 to n do begin
13      if sp_i < avg_p · random(0, 1) then
14          randomize(pos_i)
15      else if sp_i > t_p then begin
16          p_e := p_e + sp_i · p_i;
17          w_p := w_p + sp_i
18      end;
19      if sv_i < avg_v · random(0, 1) then
20          randomize(vel_i)
21      else if sv_i > t_v then begin
22          v_e := v_e + sv_i · v_i;
23          w_v := w_v + sv_i
24      end
25  end;
26  p_e := p_e / w_p; v_e := v_e / w_v;
27  p_last := p_o;
28  output p_e, v_e, avg_p, avg_v;

Figure 4. Procedure Ball Monte-Carlo Localization with Separate Position and Velocity
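In the separated method, the velocity sample set is scored and resampled independently of the position set. The velocity side can be sketched as follows; this is our own Python illustration (parameter values from Sec. 3.2; helper names are ours):

```python
import math
import random

def clipped(s, target, maxup, maxdown):
    """Move a score toward its target by at most maxup up / maxdown down."""
    return min(max(target, s - maxdown), s + maxup)

def update_scores_by_velocity(VEL, v_o, tau_v=0.08, maxup=1.0, maxdown=0.1):
    """UpdateScoresByVelocity: score each velocity sample by closeness to the
    observed velocity v_o = (p_o - p_last) / dt."""
    for smp in VEL:
        l = math.hypot(v_o[0] - smp["v"][0], v_o[1] - smp["v"][1])
        smp["s"] = clipped(smp["s"], math.exp(-l * tau_v), maxup, maxdown)

def estimate_velocity(VEL, t_v=0.2, vmax=100):
    """Weighted mean of samples scoring above t_v; improbable samples are
    re-randomized (the velocity branch of Fig. 4)."""
    avg = sum(s["s"] for s in VEL) / len(VEL)
    v_e, w = [0.0, 0.0], 0.0
    for smp in VEL:
        if smp["s"] < avg * random.random():
            smp["v"] = [random.uniform(-vmax, vmax), random.uniform(-vmax, vmax)]
            smp["s"] = 0.0
        elif smp["s"] > t_v:
            v_e[0] += smp["s"] * smp["v"][0]; v_e[1] += smp["s"] * smp["v"][1]
            w += smp["s"]
    return [v_e[0] / w, v_e[1] / w] if w > 0 else [0.0, 0.0]
```

The position side is handled analogously with its own sample set and parameters, which is what keeps the two sample spaces small.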
3.4. Experiments Increasing the Number of Samples Fig. 5 and Fig. 6 present the results of experiments with increasing numbers of samples. We increased the number of samples from 10 to 1000. Fig. 5 shows the result of using data of uniform linear motion with errors; Fig. 6 shows the result of using the same uniform linear motion data with an obstacle. Each result is the average of five trials. All results show that the precision improved as the number of samples increased, but the gain was not large: once the number of samples exceeded 150, the result was almost unchanged in practice. The observed data includes observation errors from the beginning. For example, the no-obstacle data has a 52.6 mm error on average in the observation. In the localization result using the no-obstacle data with the third method, which includes the estimation of the trajectory, the average error was about 40 mm with 150 samples. This shows that our method certainly suppresses errors.
[Figure 5. Experiment increasing the number of samples (no obstacle). Figure 6. Experiment increasing the number of samples (with obstacle). Both plot the estimation error against the number of samples for MonteCarlo (no velocity), MonteCarlo (with velocity), and MonteCarlo (separated).]
3.5. Real-World Experiments Fig. 7 and Fig. 8 present the results of experiments in the real-world environment. The conditions of the experiments were almost the same as those of the simulation, but based on the simulation results we tested only the separated method, because the other two methods failed to predict an accurate trajectory. In these experiments, an obstacle hid the rolling ball for a while (Fig. 7), or the ball rebounded frequently (Fig. 8). Both the estimation while observing the ball and the prediction while the ball was out of view gave good results; in particular, even though the ball rebounded frequently, our method predicted an accurate trajectory while the ball was out of view.
[Figure 7. Experiment with obstacle. Figure 8. Experiment with a two-rebound trajectory. Both plot observed ball positions together with the Kalman Filter and Ball Monte-Carlo estimates.]
4. Conclusions We presented applications of Monte-Carlo localization to tracking a ball from noisy data and showed experimental results. The method that updates only by positions of the ball, i.e. the original ball Monte-Carlo localization, did not estimate the ball position and velocity well when the ball went out of view. The first extension of the method
is to use positions and, in addition, velocities. However, this extended method also did not estimate the velocity accurately. The second extension estimates positions and velocities separately. With this method, the estimation of the position and velocity was accurate, and the trajectory was a good approximation even when the ball went out of view. Naive methods to estimate the position and velocity are to use the arithmetic mean or a weighted arithmetic mean. Those are very simple and fast, but are strongly affected by errors, and it is clear that they do not estimate accurately. Some ball tracking techniques [3] cooperate with other robots and estimate the global position and velocity of the ball [8]. We used only one static robot to estimate the position and velocity, and we plan to integrate our method with multi-robot ball tracking techniques. In addition, the movement of the observing robot was not reflected in the experiments; we should introduce it in future experiments.
References
[1] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. of the IEEE International Conference on Robotics & Automation, 1998.
[2] M. Dietl, J.-S. Gutmann, and B. Nebel. Cooperative sensing in dynamic environments. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'01), 2001.
[3] A. Ferrein, G. Lakemeyer, and L. Hermanns. Comparing sensor fusion techniques for ball position estimation. In Proceedings RoboCup 2005 Symposium, 2005. To appear.
[4] D. Fox, W. Burgard, F. Dellaert, and S. Thrun. Monte Carlo localization: Efficient position estimation for mobile robots. In Proc. of the National Conference on Artificial Intelligence, 1999.
[5] C. Hue, J.-P. Le Cadre, and P. Pérez. Tracking multiple objects with particle filtering. Technical Report 1361, IRISA, Oct. 2000.
[6] J. Inoue, H. Kobayashi, A. Ishino, and A. Shinohara. Jolly Pochie 2005 in the Four Legged Robot League, 2005.
[7] R. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, pages 35–45, 1960.
[8] A. Karol and M.-A. Williams. Distributed sensor fusion for object tracking. In Proceedings RoboCup 2005 Symposium, 2005. To appear.
[9] C. Kwok and D. Fox. Map-based multiple model tracking of a moving object. In Proceedings RoboCup 2004 Symposium, 2004.
[10] T. Röfer, T. Laue, and D. Thomas. Particle-filter-based self-localization using landmarks and directed lines. In RoboCup 2005: Robot Soccer World Cup IX, Lecture Notes in Artificial Intelligence, Springer, 2005. To appear.
[11] D. Schulz, W. Burgard, D. Fox, and A. B. Cremers. Tracking multiple moving objects with a mobile robot. In Proc. of the IEEE Computer Society Conference on Robotics and Automation (ICRA), 2001.
[12] A. Stroupe, M.-C. Martin, and T. Balch. Distributed sensor fusion for object position estimation by multi-robot systems. In Proc. of the IEEE International Conference on Robotics and Automation, May 2001.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Fast Vector Quantization for State-Action Map Compression Kazutaka TAKESHITA a,1, Ryuichi UEDA a and Tamio ARAI a; a Department of Precision Engineering, School of Engineering, The University of Tokyo, Tokyo, Japan. Abstract. We propose a novel algorithm for fast vector quantization of state-action maps, which are used for quick decision making by robots. A state-action map is a multi-dimensional memory array whose elements record the behavior of a robot at every state of the robot and its surroundings. Since its size easily becomes larger than the amount of memory on a robot, vector quantization methods for map compression have been proposed. In this paper, we reduce the computation time of the compression by using a clustering algorithm that utilizes a binary tree. In simulations and experiments in a RoboCup domain, the proposed algorithm compresses state-action maps in 0.2% of the time required by our conventional algorithm. Keywords. dynamic programming, state-action map, tree structured vector quantization
1. Introduction For real-time decision making by robots, the utilization of state-action maps is effective. A state-action map (a map) is a look-up table that records the appropriate behavior of a robot for every state of the robot and its surroundings. When the state transitions caused by the robot's behavior are known beforehand, a map can be created in advance by dynamic programming (DP) [1]. A robot with the map can then decide its behavior quickly with only a reference of memory. However, the size of such maps frequently becomes a serious problem. For example, a map with 1.2 billion states has been created [2]. On the other hand, many kinds of robots do not have room for such a huge map in memory. As a solution to this problem, compression methods for maps have been proposed in [2,3,4,5] by Fukase, Ueda, Kobayashi and others. They use vector quantization (VQ), which has frequently been used for compressing images and waves [6]. A compressed map can provide the data in an element without decompression. Accordingly, the robot can refer to it quickly even if the state fluctuates quickly. In VQ, a map on a multi-dimensional memory array is evenly divided into blocks. The blocks are classified into clusters; this process is called clustering. The blocks in each cluster are replaced by an average block.
1 Intelligent Systems Division, Dept. of Precision Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan. Tel.: +81-3-5841-6486; Fax: +81-3-5841-6487; E-mail: [email protected]
If the number of clusters can be set
much smaller than the number of blocks without losing the ability of the map, the VQ algorithm can dramatically reduce the amount of memory for recording it. In the VQ method, the pairwise nearest neighbor (PNN) clustering algorithm [7] has been used for the clustering process because it is reliable, although it takes much time to compute. For compressing images or waves, many clustering algorithms that are much faster than the PNN algorithm have been proposed. However, the compression of maps is much more difficult than that of images or waves. If we want to compress maps quickly, this problem must be overcome. In this paper, we propose a novel algorithm for quick clustering that utilizes a binary tree.
The structure of this paper is as follows. In Sec. 2, DP is explained. The VQ method and the new clustering algorithm are presented in Sec. 3. The RoboCup 4-legged robot league is introduced and each method is applied to a task in the RoboCup environment in Sec. 4. Evaluation is done in Sec. 5. We conclude this paper in Sec. 6.
2. Problem Definition Suppose the following system. Time is discretized as $t = 0, 1, 2, \ldots$. The agent exists at a state $x$ in a state space $\mathcal{X}$. At every time step, the agent can choose an action from a set of actions $A = \{a_k \mid k = 1, 2, \ldots, N_a\}$. Each action changes one state to another. We consider a set of discrete states $S = \{s_j \mid j = 1, 2, \ldots, N_s\}$, which is a division of $\mathcal{X}$; $S$ is regarded as a discrete state space. All state transitions can be measured as probabilities $P^a_{ss'}$, where $s$, $s'$, and $a$ denote the previous state, one of the possible posterior states, and an action, respectively. Some states are destinations of control, and the agent stops when it enters one of them. They are called final states $s_f \in S_f$. The agent is given a reward $R^a_{ss'} \in \mathbb{R}$ at each step according to $s$, $s'$, and $a$. We want to maximize the sum of the rewards from an (initial) state to a final state by using an appropriate policy for choosing actions. The sum is called the value $V(s)$ of state $s$. When a policy is defined as a mapping $\pi : S \to A$, there exists a mapping $V^\pi : S \to \mathbb{R}$ that gives the value of each state under $\pi$. This mapping is called a state-value function. This system belongs to the Markov decision processes (MDPs). In an MDP, there is a policy that maximizes the values of all states. This policy, represented by $\pi^*$, is named the optimal policy, and its state-value function, symbolized $V^*$, is named the optimal state-value function. $V^*$ follows the Bellman equation [1]:
$$V^*(s) = \max_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + V^*(s') \right]. \quad (1)$$
To solve $V^*(s)$ for all states, the value iteration algorithm [8] is frequently used. After $V^*$ is completely solved, $\pi^*$ is obtained by:
$$\pi^*(s) = \mathop{\mathrm{argmax}}_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + V^*(s') \right]. \quad (2)$$
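Equations (1) and (2) can be sketched directly as value iteration followed by policy extraction. The toy problem below (a five-state corridor with reward −1 per step, mirroring the map-building setup in Sec. 4) is our own illustration:

```python
def value_iteration(states, actions, P, R, finals, eps=1e-6):
    """Solve the Bellman equation (1) by value iteration, then extract the
    optimal policy pi* via Eq. (2). P[(s, a)] maps successor states to
    probabilities; R is the reward per transition."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in finals:
                continue
            best = max(sum(p * (R + V[s2]) for s2, p in P[(s, a)].items())
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    pi = {s: max(actions,
                 key=lambda a: sum(p * (R + V[s2]) for s2, p in P[(s, a)].items()))
          for s in states if s not in finals}
    return V, pi

# Toy corridor: states 0..4, goal at state 4, deterministic left/right moves.
states = list(range(5))
P = {}
for s in states:
    P[(s, "R")] = {min(s + 1, 4): 1.0}
    P[(s, "L")] = {max(s - 1, 0): 1.0}
V, pi = value_iteration(states, ["L", "R"], P, R=-1, finals={4})
```

With reward −1 per step, the optimal value of each state is minus its distance to the goal, and the optimal policy always moves right.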
We define a state-action map, or a map for short, as a memory block representing $\pi$. Our purpose is to reduce the volume of a map. The volume of an uncompressed map, which is a sequence $\pi^*(s_1), \pi^*(s_2), \ldots, \pi^*(s_{N_s})$ stored in an array, is
$$I = N_s \log_2 N_a \ \mathrm{[bit]}. \quad (3)$$
We want to reduce this number.
3. Vector Quantization of State-Action Maps
3.1. Flow of the Method The first step of VQ is vectorization. In this process, all states $s_1, s_2, \ldots, s_{N_s}$ are distributed into $N_v = N_s/m$ blocks, where $m$ denotes the number of elements in a block and all blocks have the same form. Next, the elements in each block are aligned under a certain rule, and each block is regarded as an $m$-dimensional vector $v = \{v_j \mid j = 1, 2, \ldots, m\}$. An example of vectorization of a two-dimensional state-action map is shown in Fig. 1.
Figure 1. An example of vectorization (two dimensions)
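The vectorization step can be sketched for a two-dimensional map as follows; this is our own illustration, and the row-major block ordering is an assumption:

```python
def vectorize(map2d, bh, bw):
    """Split a 2-D state-action map into bh x bw blocks and flatten each
    block, row-major, into an m = bh*bw dimensional vector."""
    H, W = len(map2d), len(map2d[0])
    assert H % bh == 0 and W % bw == 0, "all blocks must have the same form"
    return [[map2d[y][x] for y in range(by, by + bh) for x in range(bx, bx + bw)]
            for by in range(0, H, bh) for bx in range(0, W, bw)]
```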
The second process is quantization. In this process, the vectors made by vectorization are classified into clusters $C_1, \ldots, C_{N_c}$; this procedure is called clustering. $N_c$ denotes the number of clusters and determines the compression ratio. All the vectors in a cluster $C$ are represented by a vector $c$, called a representative vector; this operation is called quantization. Finally, the compression result (a vector-quantized state-action map, a VQ map) is given by two look-up tables: an index table and a codebook. In the index table, the indexes of the representative vectors are written in the order of the indexes of the vectors. The codebook records the representative vectors in the order of their indexes. The amount of memory for a VQ map is
$$I_{VQ} = N_v \log_2 N_c + m N_c \log_2 N_a \ \mathrm{[bit]}. \quad (4)$$
The first term on the right side is the amount of memory for the index table; the second is that for the codebook. When we want to obtain the action of state $s_j$ from the VQ map, the following procedure is executed: 1) the vector and the position in the vector that correspond to $s_j$ are determined, 2) the representative vector for that vector is determined by the index table, and 3) the action at that position in the representative vector is obtained.
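The three-step reference procedure and Eq. (4) can be sketched as follows; this is our own illustration, and it assumes states are numbered so that consecutive indices fall in the same block:

```python
import math

def vq_map_size_bits(Nv, Nc, m, Na):
    """Eq. (4): index-table bits plus codebook bits."""
    return Nv * math.log2(Nc) + m * Nc * math.log2(Na)

def vq_lookup(j, m, index_table, codebook):
    """Action of state s_j from a VQ map, without decompression:
    1) find the block and the position inside it, 2) map the block to its
    representative via the index table, 3) read the action at that position."""
    block, pos = divmod(j, m)
    return codebook[index_table[block]][pos]
```

Because the lookup is two array references, the robot can query the compressed map at the same asymptotic cost as the uncompressed one.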
3.2. Distortion Measure When a vector is changed to a representative vector that is not identical to it, information in the map is lost; in other words, the values of some states decrease. The following value $D$,
$$D(\pi) = \sum_{s} \left[ V^*(s) - V^{\pi}(s) \right], \quad (5)$$
is a measure of this loss over the state space, where $\pi$ is the policy given by the VQ map. Such a measure is called a distortion measure in VQ. Clusters and representative vectors are chosen based on this measure. However, it takes much more time to compute this measure than the general distortion measures used for compressing images or waves, because a change of one element of a map sometimes decreases the values of many states. In [4,5], the following distortion measure is used instead. When the optimal action $a = \pi^*(s)$ at state $s$ is changed to another action $a'$, the decrease of the state value is calculable as
$$\delta(s, a') = V^*(s) - \sum_{s'} P^{a'}_{ss'} \left[ R^{a'}_{ss'} + V^*(s') \right]. \quad (6)$$
When a map gives policy $\pi$, the distortion is calculated by
$$D'(\pi) = \sum_{s} \delta(s, \pi(s)). \quad (7)$$
When a vector $v$ is changed into a representative vector $c$, the distortion is
$$D(v, c) = \sum_{i=1}^{m} \delta(s_{v_i}, c_i), \quad (8)$$
where $v_i$ and $c_i$ are the $i$-th elements of $v$ and $c$, respectively, and $s_{v_i}$ denotes the state where $v_i$ comes from. The representative vector $c$ is chosen so that this equation is minimized. If these equations are applied to clustering, we do not need to calculate $V^\pi$ at each change of an action of a state.
3.3. Clustering Algorithms In the clustering process, we cannot evaluate all possible clusterings due to the computational complexity, even if we use Eq. (7) for evaluation. Therefore, a clustering algorithm that finds feasible clusterings in feasible time should be utilized.
3.3.1. Pairwise Nearest Neighbor Clustering Algorithm This algorithm, also called the PNN clustering algorithm [7], has been used for compression in [2,3,4,5]. In this algorithm, $N_v$ clusters are prepared and each vector initially belongs to its own cluster. The clusters are then merged one by one until the number of clusters is reduced to an aimed number. The pair of clusters to be merged is chosen from all possible pairs so that the increase of distortion is kept to a minimum.
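The PNN idea can be sketched with a generic distortion as follows; this is our own illustration, using a Ward-style squared-distance merge cost in place of the value loss of Eq. (8):

```python
def pnn_cluster(vectors, n_clusters):
    """Pairwise-nearest-neighbor sketch: start with one cluster per vector
    and greedily merge the pair whose merge increases distortion least."""
    clusters = [[v] for v in vectors]

    def centroid(C):
        return [sum(x) / len(C) for x in zip(*C)]

    def merge_cost(A, B):
        # increase in total squared distortion caused by merging A and B
        ca, cb = centroid(A), centroid(B)
        d2 = sum((a - b) ** 2 for a, b in zip(ca, cb))
        return len(A) * len(B) / (len(A) + len(B)) * d2

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The all-pairs search inside the loop is what makes PNN slow: each merge step costs quadratic time in the number of remaining clusters.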
3.3.2. Proposed Clustering Algorithm We propose a novel clustering algorithm based on tree-structured vector quantization (TSVQ) for quick compression of state-action maps. The algorithm first prepares one cluster that contains all vectors. Clusters are then divided until the number of clusters reaches an aimed number. This process can be regarded as the growth of a binary tree, each of whose leaves holds a cluster. Since the number of representative vectors is no more than one-tenth of the number of vectors in the task of [2,3], dividing one cluster is more advantageous for fast computation than merging many clusters. In this algorithm, the cluster $C$ whose distortion $D_C = \sum_{v \in C} D(v, c)$ is larger than that of any other cluster is divided into two clusters. Two representative vectors $c_\alpha$ and $c_\beta$ are substituted for $c$, the representative vector of $C$. They are chosen with the following equations:
$$c_\alpha = \mathop{\mathrm{argmax}}_{v \in C} D(v, c), \quad (9)$$
$$c_\beta = \mathop{\mathrm{argmax}}_{v \in C} D(v, c_\alpha). \quad (10)$$
From these operations, a distant pair of representative vectors is chosen. After that, the generalized Lloyd algorithm (GLA) [6] is executed on the vectors in $C$, and $c_\alpha$, $c_\beta$, and the two clusters of these representative vectors are updated. The GLA stops when no further improvement of distortion is expected.
3.3.3. Generalized Lloyd Algorithm The GLA is applied after a codebook is obtained by the proposed clustering algorithm, and can improve the codebook. In the GLA, which is based on the Lloyd algorithm [10], the following procedures are iterated: 1) the representative vectors are fixed and the clusters are reassembled, so that each vector $v$ belongs to the cluster whose $c$ yields the minimal $D(v, c)$ of Eq. (8); 2) each representative vector is recomputed from the vectors in its cluster.
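The top-down splitting described in Secs. 3.3.2 and 3.3.3 can be sketched as follows. This is our own illustration under a generic distortion `d(v, c)`: the split seeding follows Eqs. (9)-(10), but we perform only one Lloyd-style reassignment pass where the GLA would iterate to convergence.

```python
def tsvq_cluster(vectors, n_clusters, d):
    """Grow a binary tree of clusters: repeatedly split the cluster with the
    largest total distortion D_C, seeding the split with Eqs. (9)-(10).
    d(v, c) is the distortion of coding vector v by representative c."""
    def centroid(C):
        return tuple(sum(x) / len(C) for x in zip(*C))

    clusters = [list(vectors)]
    while len(clusters) < n_clusters:
        # pick the cluster with the largest D_C = sum_{v in C} D(v, c)
        k = max(range(len(clusters)),
                key=lambda i: sum(d(v, centroid(clusters[i])) for v in clusters[i]))
        C = clusters.pop(k)
        c = centroid(C)
        c_a = max(C, key=lambda v: d(v, c))    # Eq. (9)
        c_b = max(C, key=lambda v: d(v, c_a))  # Eq. (10)
        # one Lloyd-style reassignment pass (the GLA would iterate this)
        A = [v for v in C if d(v, c_a) <= d(v, c_b)]
        B = [v for v in C if d(v, c_a) > d(v, c_b)]
        if not A or not B:          # degenerate split: give up on this cluster
            clusters.append(C)
            break
        clusters += [A, B]
    return clusters
```

Each split touches only one cluster, which is why growing the tree is much cheaper than PNN's repeated all-pairs merges when the target codebook is small.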
4. Application to the RoboCup 4-Legged Robot League To compare the PNN clustering algorithm and the proposed algorithm, we apply them to a task for the RoboCup 4-legged robot league. In this league, small pet robots, ERS-7s made by SONY, are used as soccer players. An ERS-7 has 3 degrees of freedom in each leg and in its head, as shown in Fig. 2. Its CPU is a 576 MHz RISC processor, it has 64 MB of RAM, and it carries a CMOS camera on its nose.
4.1. RoboCup 4-Legged League The task is to approach the ball from a proper direction with the minimum number of walking steps in the soccer field shown in Fig. 3. The state of the robot is represented by (x, y, θ), its position and orientation in the field. The position of the ball is expressed
[Figure 2. ERS-7]
[Figure 3. State variables]
Table 1. Domain and Division of State Variables

Variable  | Domain        | Number of divisions | Width
x [mm]    | [-2100, 2100) | 21                  | 200
y [mm]    | [-1350, 1350) | 15                  | 180
θ [deg]   | [-180, 180)   | 18                  | 20
r [mm]    | [150, 3150)   | 9                   | 100 (nearest interval from the robot) – 700 (farthest)
ϕ [deg]   | [-75, 75)     | 15                  | 10
by polar coordinates (r, ϕ): r is the ball's distance from the robot and ϕ is its direction from the robot. Their origins and the directions of the axes are defined in Fig. 3. The robot moves by using one of 25 kinds of walking operations, such as going forward, going sideways right/left, and rotating right/left, at each step. On the way to the ball, the robot must be able to observe the ball with its camera; |ϕ| ≤ 75 [deg] is the limit.
4.2. Obtaining the State-Action Map The domain and the division of each variable are shown in Table 1. As a result, the number of discrete states $N_s$ reaches 765,450, of which 11,445 are final states. The transition probabilities $P^a_{ss'}$ are calculated by means of random sampling of state transitions from points in a discrete state of (x, y, θ) or of (r, ϕ). As the reward $R^a_{ss'}$, −1 is given for every step. Under these conditions, we use the value iteration algorithm [8] to obtain a state-action map.
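The discretization numbers can be checked directly; with Table 1's divisions, the state count and the uncompressed map size of Eq. (3) come out as follows (our own back-of-the-envelope check):

```python
import math

# Divisions from Table 1: x, y, theta, r, phi
Ns = 21 * 15 * 18 * 9 * 15   # number of discrete states: 765450, as in the text
Na = 25                      # kinds of walking operations
bits = Ns * math.log2(Na)    # Eq. (3): I = Ns * log2(Na) [bit]
mib = bits / 8 / 2 ** 20     # uncompressed map size in MiB
```

Less than half a mebibyte fits an ERS-7's 64 MB RAM for this task, but the same construction with finer discretizations (e.g. the 1.2-billion-state map of [2]) quickly does not, which motivates the compression.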
5. Comparison of State-Action Maps We measure the time for compression and the performance of state-action maps through a simulation and an experiment. The following maps are compared: an uncompressed state-action map (uncomp. map), a VQ map obtained by the PNN algorithm (PNN map), and a VQ map obtained by the proposed algorithm (TSVQ+GLA map). Their compression ratios are shown in the left part of Table 2. First, the time for compression is measured with a 3.6 GHz Pentium 4 CPU. The result is shown in Table 2: the time is dramatically reduced by our proposed algorithm. Then we compare the performance of the maps in simulation and experiment.
Figure 4. An example trajectory of task 1
Figure 5. An example trajectory of task 2
5.1. Simulation The following simulation is performed to evaluate the performance of the maps. In a trial, an initial state is chosen at random and the number of steps from that state to a final state is counted. If the number of steps exceeds 50 or a state falls outside the state space, we regard the trial as a failure. 10,000 trials are performed for each map. As shown in Table 2, using the PNN clustering appears overly cautious for this task: the TSVQ+GLA map achieves a success rate and number of steps in this simulation comparable to the PNN map.
5.2. Experiment with an Actual ERS-7 We implement the three maps on an ERS-7 and repeat the following two tasks 50 times each. Example trials of Task 1 and Task 2 are shown in Fig. 4 and Fig. 5, respectively.
Task 1: We put the robot at a point on the x = 0 [mm] line (center line) with θ = 0 [deg], and put the ball at a point on the x = 1750 [mm] line. The y-coordinates of the robot and the ball are chosen at random. At first, the robot swings its head for 10 [sec] to localize itself. Next, the robot approaches the ball based on a map.
Task 2: We put the robot at a point on the x = 1750 [mm] line with θ = 180 [deg], and put the ball at a point on the x = 0 [mm] line. Other conditions are identical to those of Task 1.
Table 3 shows the results of the two tasks. As in the simulation, the TSVQ+GLA map and the PNN map obtain similar results.
Table 2. Result of simulation

              | comp. ratio | computation time | success rate | avg. of steps
uncomp. map   | 1 : 1       | —                | 99.9%        | 12.3
PNN map       | 1 : 0.028   | 1.7 × 10^3 [min] | 99.1%        | 12.8
TSVQ+GLA map  | 1 : 0.027   | 3.0 [min]        | 99.3%        | 12.7
(The computation time is the time [min] for creating the compressed map.)

6. Conclusion and Future Work This paper proposed a fast clustering algorithm for the compression of state-action maps. The algorithm belongs to tree-structured vector quantization (TSVQ). This algorithm
Table 3. Experimental result

  map            Task1 success rate   Task1 avg. of steps   Task2 success rate   Task2 avg. of steps
  uncomp. map    100%                 11.7                  98%                  17.1
  PNN map        94%                  11.6                  92%                  18.2
  TSVQ+GLA map   94%                  11.4                  96%                  18.3
can compress a state-action map for a robot soccer task 570 times as fast as the PNN clustering algorithm used in [2,3]. In both the simulation and the experiment, the VQ map compressed by the proposed algorithm achieved performance nearly identical to that of the VQ map compressed by the PNN algorithm. In future work, we will apply the proposed algorithm and other clustering algorithms to further problems so as to evaluate the relation between compression time and map performance.
References
[1] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[2] T. Fukase et al., "Real-time Decision Making under Uncertainty of Self-Localization Results," in G. A. Kaminka et al. (Eds.), RoboCup 2002: Robot Soccer World Cup VI, pp. 375–383, 2003.
[3] R. Ueda et al., "Vector Quantization for State-Action Map Compression," in Proc. of ICRA, 2003, pp. 2356–2361.
[4] R. Ueda et al., "Value Iteration Under the Constraint of Vector Quantization for Improving Compressed State-Action Maps," in Proc. of ICRA, 2004, pp. 4771–4776.
[5] R. Ueda et al., "Vector Quantization for State-Action Map Compression – Comparison with Coarse Discretization Techniques and Efficiency Enhancement," in Proc. of IROS, 2005, pp. 166–171.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer Academic Publishers, 1992.
[7] W. H. Equitz, "A new vector quantization clustering algorithm," IEEE Trans. Acoust. Speech Signal Process., vol. 37, no. 10, pp. 1568–1575, 1989.
[8] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998.
[9] T. Fukase, Y. Kobayashi, R. Ueda, T. Kawabe, and T. Arai, "Real-time Decision Making under Uncertainty of Self-Localization Results," in Proc. of 2002 International RoboCup Symposium, 2002, pp. 372–379.
[10] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Information Theory, vol. IT-28, pp. 127–135, 1982.
[11] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Comm., vol. COM-28, pp. 84–95, 1980.
Incremental Purposive Behavior Acquisition based on Modular Learning System
Tomoki Nishi a, Yasutake Takahashi a,b,1 and Minoru Asada a,b
a Graduate School of Engineering, Osaka University
b Handai FRC
Abstract. A simple and straightforward application of reinforcement learning methods to real robot tasks is considerably difficult due to a huge exploration space that easily scales up exponentially, since recent robots tend to have many kinds of sensors. One potential solution might be the application of the so-called "mixture of experts" proposed by Jacobs et al. [1]; it decomposes a whole state space into a number of areas so that each expert module can produce good performance in its assigned small area. This idea is very general and has a wide range of applications; however, we have to consider how to decompose the space into a number of small regions, assign each of them to a learning module or an expert, and define a goal for each of them. In order to cope with this issue, this paper presents a method of self task decomposition for a modular learning system based on self-interpretation of instructions given by a coach. Unlike conventional approaches, the system decomposes a long-term task into short-term subtasks so that one learning module with limited computational resources can acquire a purposive behavior for one of these subtasks. Since instructions are given from the viewpoint of a coach who has no idea how the system learns, they are interpreted by the learner to find the candidates for subgoals. Finally, the top layer of the hierarchical reinforcement learning system coordinates the lower learning modules to accomplish the whole task. The method is applied to a simple soccer situation in the context of RoboCup.
Keywords. Reinforcement Learning, Behavior Acquisition, Task decomposition, RoboCup
1. Introduction One of the most formidable issues in applying reinforcement learning to real robot tasks is how to find a compact state space in order to acquire a purposive behavior within a reasonable learning time. This issue has become much more serious since recent robots tend to have many kinds of sensors, such as normal and omnidirectional vision systems, touch sensors, infrared range sensors, and so on. They receive a variety of information from these sensors, especially the vision sensors, which makes applying reinforcement learning to real robot tasks even more difficult. 1 Correspondence to: Yasutake Takahashi, YamadaOka 2-1, Suita, Osaka, 565-0871, Japan. Tel.: +81 6 6879 4123; Fax: +81 6 6879 4123; E-mail: [email protected]
T. Nishi et al. / Incremental Purposive Behavior Acquisition Based on Modular Learning System
One potential solution might be the application of the so-called "mixture of experts" proposed by Jacobs et al. [1], in which a whole state space is decomposed into a number of areas so that each expert module can produce good performance in its assigned area, and one gating system weights the outputs of the expert modules for the final system output. This idea is very general and has a wide range of applications. For example, Doya et al. [2] have proposed MOdular Selection And Identification for Control (MOSAIC), a modular reinforcement learning architecture for non-linear, non-stationary control tasks. However, all of its learning modules share one state space that consists of a few variables. In order to make the search space as compact as possible, it is not enough to divide the space into a number of areas and assign an expert to each of them; it is also necessary to reduce the number of state variables that construct the whole search space. Fortunately, a long time-scale behavior can often be decomposed into a sequence of simple behaviors, each of which generally needs only a few state variables, and therefore the search space can be divided into smaller ones. In the existing studies, however, the task decomposition and behavior switching procedures are given by the designers (e.g., [3,4,5]), or rely on heuristics or assumptions that are not realistic from the viewpoint of real robot applications (e.g., [6,7,8,9]). Our basic idea for coping with this issue is that any learning module has a limited resource constraint, and this constraint on learning capability leads us to introduce a multi-module learning system in which sub-tasks are specified by the system itself rather than defined in advance by others. That is, one learning module has a compact state-action space and acquires a simple map from states to actions, and a sub-task is then defined by this module.
We have already proposed this basic idea and shown some results with simulation and simple real robot experiments [10,11]; however, the proposed algorithm needs a large amount of experience data, which is hard to acquire with a real robot. In this paper, we introduce the idea that the capability of a learning module defines the size of the sub-tasks. We assume that each module can maintain only a small number of state variables, which is a reasonable assumption for real robot applications. The system then decomposes a long-term task into short-term sub-tasks through self-interpretation of coach instructions, so that one learning module with limited computational resources can acquire a purposive behavior for one of these sub-tasks. In previous work [10,11], the system had to try all possible actions at all possible states because it used the action value Q at each state to estimate the availability of modules, which makes it almost impossible to apply to real robot tasks. Here we develop another approach that uses the state value V instead of the action value. We show experimental results with many more sensors, such as normal and omnidirectional vision systems and infrared range sensors in 8 directions.
2. Basic Idea There are a learner and a coach in a simple soccer situation (Figure 1). The coach has a priori knowledge of a task to be performed by the learner, while s/he has no idea about the learner's internal system. The learner, on the other hand, just follows the instructions without any knowledge of the task. After some instructions are given by the coach, the learner decomposes the whole task into a sequence of subtasks, acquires a behavior for each
Figure 1. A coach gives instructions to a learner
Figure 2. The learner follows the instructions and finds basic behaviors by itself
subtask, and coordinates these behaviors to accomplish the whole task by itself. In Figure 1, the coach instructs the learner to shoot a ball into the yellow goal while avoiding an obstacle. Figure 2 shows an example in which the system decomposes this task into three subtasks and assigns them to three modules whose state spaces consist of ball variables, opponent-and-goal variables, and goal variables, respectively.
3. Robot, Tasks, and Assumptions
Figure 3. A real robot
Figure 4. Captured camera images
Figure 3 shows a mobile robot we have designed and built. The robot has a normal camera on the front of its body, an omnidirectional camera on the top, and infrared distance sensors around the body. Figure 4 shows the images from both cameras. Simple color image processing is applied to detect the ball, the goal, and an opponent in the image in real time (every 33 ms). The robot also has infrared range sensors in 8 directions. In total, the robot has 39 candidate state variables; their details are omitted due to space limitations. The mobile platform is an omnidirectional vehicle (capable of any translation and rotation on the plane). The tasks for this robot are chasing a ball, navigating on the field, shooting a ball into the goal, and so on. We assume that the given task has some absorbing goals, that is, the task is accomplished if the robot reaches certain areas in state spaces that consist of a few state variables. We demonstrate only shooting behaviors for the task decomposition and coordination. Figure 5 shows four examples of the behaviors instructed by the coach. The total number of instructions is four in this experiment.
4. Task Decomposition Procedure Figure 6 shows a rough sketch of the task decomposition procedure. The top of Figure 6 shows a monolithic state space that consists of all state variables
Figure 5. Examples of Instructed behaviors
(x1, x2, · · · , xn). The red lines indicate sequences of states during the given instructions. As assumed above, the system cannot maintain such a huge state space, so it decomposes the state space into subspaces that consist of a few state variables. The system regards the ends of the instructions as goal states of the given task. It checks all subspaces and selects the one in which the most instruction ends reach a certain area (Gtask in Figure 6). The system regards this area as the subgoal state of a subtask that is a part of the given long-term task. The steps of the procedure are as follows:
1. Find areas in the instructions where no module is available and regard them as an unknown subtask.
2. Assign a new learning module:
(a) list up subgoal candidates for the unknown subtask on the state space based on all state variables;
(b) decompose the state space into subspaces that consist of a few state variables;
(c) check all subspaces and select the one in which the subgoal candidates best reach a certain area (Gsub in Figure 6);
(d) generate another learning module with the selected subspace as its state space and that area as its goal state.
3. Check the areas where the assigned modules are available.
4. Exit if the generated modules cover all segments of the instructed behaviors; else go to 1.
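Steps 2(b)-(c) of the procedure above can be sketched in code. The following is an illustrative simplification under stated assumptions, not the authors' implementation: subspaces are restricted to all two-variable projections, and "reaching a certain area best" is approximated by choosing the subspace with the smallest bounding box around the instruction endpoints; all names here are hypothetical.

```python
from itertools import combinations

def select_subspace(endpoints, n_vars):
    """Choose the pair of state variables in which the instruction-end
    states (subgoal candidates) cluster most tightly, i.e. the subspace
    whose projected endpoint cloud has the smallest total extent."""
    best, best_extent = None, float("inf")
    for pair in combinations(range(n_vars), 2):
        # extent of the endpoint cloud projected onto this 2-variable subspace
        extent = sum(
            max(e[v] for e in endpoints) - min(e[v] for e in endpoints)
            for v in pair
        )
        if extent < best_extent:
            best, best_extent = pair, extent
    return best

# Endpoints of three instructed trajectories over 4 state variables:
# variables 1 and 3 take nearly the same value at every instruction end,
# so the (1, 3) subspace should be selected as hosting the subgoal.
ends = [
    [0.9, 0.50, 0.1, 0.80],
    [0.2, 0.52, 0.7, 0.79],
    [0.5, 0.49, 0.4, 0.81],
]
print(select_subspace(ends, 4))  # -> (1, 3)
```

The selected pair would then serve as the state space of the newly generated learning module, with the endpoint region as its goal state.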
5. Availability Evaluation and New Learning Module Assignment The learner needs to check by itself the availability of learned behaviors that help to accomplish the task, because the coach neither knows what kinds of behaviors the learner has already acquired nor shows example behaviors that are perfect from the learner's viewpoint. The learner should regard a module as valid if it accomplishes the subtask, even if its greedy policy differs from the example behavior. In order to judge whether a module is valid for executing the instructed behavior, a sequence of state values can be used. Figure 7 shows a sketch of a state value function: the state value is largest at the goal state and is distributed like a mountain. When an agent follows an optimal policy, it climbs the state value function. Hence, if the agent climbs the state value mountain, the module can be regarded as valid for explaining the executed behavior.
Figure 6. Rough sketch of the idea of the task decomposition procedure (instruction trajectories, state space decomposition, goal state Gtask, subgoal candidates, module available areas, and subgoal Gsub)
Now, we introduce AE in order to evaluate how suitable the module's policy is to the subtask:

    AE(s) = γ V(s′) / V(s),                    (1)

where γ, s, s′, and V(s) indicate the discount factor, current state, next state, and state value
Figure 7. Sketch of state value function and optimal policy
Figure 8. Sketch of propagation of state value
of state s, respectively. AE becomes larger if the instructed action leads toward the goal state of the module, and smaller if it leads away from the goal state. Figure 8 shows a rough image of state value propagation in a simple case: the states are chained in a line and the agent can only move to the next state, deterministically. When the agent moves from, for example, state s3 to state s4 successfully, the state value V(s3) is propagated as γV(s4), and AE(s3) will be AE(s3) = γV(s4)/V(s3) = 1. On the other hand, if the agent moves to s2, AE(s3) will be smaller. In order to decide whether a module is valid or not, we prepare a threshold AEth, and the learner evaluates the module as valid for a period if AE > AEth. If there are several modules whose AE exceeds the threshold AEth, the learner selects the one that keeps AE > AEth for the longest period (see Figure 9). In Figure 6, "Module Available Area" indicates an area in which AE > AEth.
Figure 9. Availability identification during the given sample behavior
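The availability test sketched in Figure 9 can be illustrated as follows. This is a minimal sketch under assumptions: a scalar state value sequence along the instruction, and placeholder values for the discount factor γ and the threshold AEth; it is not the authors' implementation.

```python
def availability_periods(values, gamma=0.9, ae_th=0.85):
    """Evaluate AE(s) = gamma * V(s') / V(s) along an instructed state
    sequence and return the (start, end) step ranges in which the module
    stays available, i.e. AE > AE_th."""
    ae = [gamma * values[t + 1] / values[t] for t in range(len(values) - 1)]
    periods, start = [], None
    for t, a in enumerate(ae):
        if a > ae_th and start is None:
            start = t                     # availability period begins
        elif a <= ae_th and start is not None:
            periods.append((start, t))    # availability period ends
            start = None
    if start is not None:
        periods.append((start, len(ae)))
    return periods

# State values rise while the instruction climbs the module's value
# "mountain" (module available), then fall as it moves away from the goal.
V = [0.40, 0.45, 0.52, 0.60, 0.55, 0.30]
print(availability_periods(V))  # -> [(0, 3)]
```

A learner comparing several modules would keep, for each segment, the module whose returned period is longest.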
If there is no module with AE > AEth for a period, the learner creates a new module and assigns it to the subtask (see step 2 in Section 4). To assign a new module to such a subtask, the learner identifies the state space and the goal state. The system follows two steps to select an appropriate state space and goal state for the subtask: • selection of one state variable that specifies the goal state, and
• construction of a state space including the selected state variable. In order to find the single state variable that best specifies the goal state, the system lists the candidates for a goal region in terms of each state variable. To select the other state variables, the system evaluates the performance of Q value estimation. The details of the procedure are omitted due to space limitations.
6. Experiments
Figure 10. Sequences of the selected module, availability evaluations, and goal state activations of modules through an instruction
Figure 11. Acquired hierarchy for the shooting behavior
According to the learning procedure, the system produced four modules for the instructed behaviors: LM1 (Apb, Xpb), LM2 (θog), LM3 (Yob, Xob), and LM4 (Aob, θob). For example, LM1 (Apb, Xpb) indicates that the module has a state space consisting of the area of the ball on the normal camera image (Apb) and the x position of the ball on the normal camera image (Xpb). Figure 10 shows the sequences of the selected module, availability evaluations, and goal state activations of the modules through an instruction.
7. Conclusion We proposed a hierarchical multi-module learning system based on self-interpretation of instructions given by a coach. We applied the proposed method to our robot and showed results for a simple soccer situation in the context of RoboCup.
References
[1] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton, "Adaptive mixture of local experts," Neural Computation, vol. 3, pp. 79–87, 1991.
[2] K. Doya, K. Samejima, K. Katagiri, and M. Kawato, "Multiple model-based reinforcement learning," tech. rep., Kawato Dynamic Brain Project Technical Report, KDB-TR-08, Japan Science and Technology Corporation, June 2000.
[3] J. H. Connell and S. Mahadevan, Robot Learning. Kluwer Academic Publishers, 1993.
[4] P. Stone and M. Veloso, "A layered approach to learning client behaviors in the RoboCup soccer server," Applied Artificial Intelligence, vol. 12, no. 2–3, 1998.
[5] P. Stone and M. Veloso, "Team-partitioned, opaque-transition reinforcement learning," in RoboCup-98: Robot Soccer World Cup II (M. Asada and H. Kitano, eds.), pp. 261–272, Springer Verlag, Berlin, 1999.
[6] B. L. Digney, "Emergent hierarchical control structures: Learning reactive/hierarchical relationships in reinforcement environments," in From Animals to Animats 4: Proceedings of the Fourth Conference on the Simulation of Adaptive Behavior: SAB 96 (P. Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, and S. W. Wilson, eds.), pp. 363–372, The MIT Press, 1996.
[7] B. L. Digney, "Learning hierarchical control structures for multiple tasks and changing environments," in From Animals to Animats 5: Proceedings of the Fifth Conference on the Simulation of Adaptive Behavior: SAB 98 (R. Pfeifer, B. Blumberg, J.-A. Meyer, and S. W. Wilson, eds.), pp. 321–330, The MIT Press, 1998.
[8] B. Hengst, "Generating hierarchical structure in reinforcement learning from state variables," in 6th Pacific Rim International Conference on Artificial Intelligence (PRICAI 2000) (R. Mizoguchi and J. K. Slaney, eds.), vol. 1886 of Lecture Notes in Computer Science, Springer, 2000.
[9] B. Hengst, "Discovering hierarchy in reinforcement learning with HEXQ," in Proceedings of the Nineteenth International Conference on Machine Learning (ICML02), pp. 243–250, 2002.
[10] Y. Takahashi and M. Asada, "Multi-layered learning systems for vision-based behavior acquisition of a real mobile robot," in Proceedings of SICE Annual Conference 2003 in Fukui, vol. CD-ROM, pp. 2937–2942, Aug 2003.
[11] Y. Takahashi, T. Nishi, and M. Asada, "Self task decomposition for modular learning system through interpretation of instruction by coach," in RoboCup 2005 Symposium Papers and Team Description Papers, CD-ROM, Jul 2005.
Part 12 Real World Information Systems
Simple Form Recognition Using Bayesian Programming
Guy Ramel a,1, Adriana Tapus a, François Aspert a and Roland Siegwart a
a Autonomous Systems Lab, Swiss Federal Institute of Technology, Ecublens, Lausanne 1015, Switzerland
Abstract. The environment that surrounds us is very complex, and understanding and interpreting it is a very hard task. This paper proposes an approach to simple form recognition with a camera, using a probabilistic method based on the Bayesian Programming formalism. The main goal is to recognize several types of elementary features composing an image, such as the local orientation of a contour or corners. The Bayesian Program for feature recognition is presented and the learning stage is explained. Our approach has been validated through experiments.
Keywords. Computer Vision, Pattern Recognition, Bayesian Programming, Human-Robot Interaction
1. Introduction Nowadays, the ability to detect particular shapes in an image is a crucial issue of growing importance. Indeed, in two-dimensional images, important information about objects can most of the time be extracted from their particular shapes. For example, we can state that a door is composed of vertical lines, horizontal lines, and four corners; this information can be considered sufficient to recognize a door. This type of recognition can serve as a basis for multiple industrial or security image processing applications (e.g., video surveillance, quality control). Therefore, there is a real need to detect these features and to classify them in a robust manner in order to build stronger applications from this basic model. Biological vision is an example of a hierarchically organized system: the part of the neocortex that processes vision is composed of several layers named, for example, V1, V2, V4 and IT (a non-exhaustive enumeration). V1 is directly activated by the optic nerve and contains a population of neurons specialized in the recognition of elementary features (primitives) such as lines with a particular orientation, corners, or ends of lines. Each subsequent layer responds to more complex combinations of primitives. Finally, IT contains a population of neurons activated if specific objects are present in the field of vision (e.g., a human face), invariant to translation or rotation [2,3]. Even if this description is incomplete, it gives a brief overview of a cognitive vision system. 1 Correspondence to: Guy Ramel, EPFL - STI - I2S - LSA, ME A3 434 (Bâtiment ME), Station 9, CH-1015 Lausanne. Tel.: +41 21 693 54 65; Fax: +41 21 693 78 07; E-mail: guy.ramel@epfl.ch.
G. Ramel et al. / Simple Form Recognition Using Bayesian Programming
As for human object recognition, top-down knowledge from object recognition may be used to categorize objects in a real scene from primitive features. Some previous works use a codebook of local appearances of a particular object category to recognize it in real-world scenes [5]. Another approach, described in [6], analyzes the appearance and the contour shape to classify objects. In [8], the authors use Bayesian networks to describe a class of objects from primitive features given by the first two derivatives of the Gaussian basis function and by the 18-vector containing the responses to the basis filters of the first three derivatives at two scales. All these considerations justify the development of a robust tool for the recognition of simple forms that uses only minimal information about the image (i.e., no a priori information is assumed). Furthermore, the user should be able to easily describe and modify the forms to be recognized in the image. The approach should also deal with the uncertainty and the possible range of variations that exists for each feature we want to detect and recognize. Another strong constraint is computational efficiency. Given all these constraints, we propose a new approach based on the Bayesian Programming formalism to recognize simple forms in a robust manner. This formalism, first described by Lebeltel in [4], was designed for robot programming using conditional probabilities. It has several interesting properties that apply to our form recognition problem. One of its main strengths is that it is based on supervised learning, which can potentially enable the definition of any kind of form for recognition and the possibility of modifying them in a very flexible way. Secondly, using conditional probabilities allows some variation in the features to be recognized, permitting the detection of forms despite noise and other undesirable variations.
The rest of the paper is structured as follows. Section 2 briefly defines the Bayesian Programming formalism. Section 3 is dedicated to the probabilistic method used for the simple forms recognition. Experimental results are presented in Section 4. Finally, Section 5 draws conclusions and discusses further work.
2. Bayesian Programming Probabilistic methodologies and techniques offer possible solutions to the incompleteness and uncertainty problems encountered when programming a robot. The basic programming resources are probability distributions. Among probabilistic methods, the Bayesian Programming (BP) approach was originally proposed as a tool for robot programming (see [4]), but it is nowadays used in a wider scope of applications ([7,9] show some examples). The Bayesian Programming formalism uses a unique notation and provides a structure to describe probabilistic knowledge and its use. The elements of a Bayesian Program are illustrated in Figure 1. A BP is divided into two parts: a description and a question. 2.1. Description The purpose of a description is to specify an effective method to compute a joint distribution on a set of relevant variables X1, X2, . . . , Xn, given a set of experimental data δ and a priori knowledge π. In the specification phase of the description, it is necessary to:
Program
  Description
    Specification (π)
      • Pertinent variables
      • Decomposition
      • Parametric forms
    Identification based on data (δ)
  Question

Figure 1. Structure of a Bayesian program.
• Define a set of relevant variables X1, X2, . . . , Xn, on which the joint distribution shall be defined
• Decompose the joint distribution into simpler terms, using the conjunction rule. The conditional independence rule can allow further simplification, and such a simplified joint distribution is called the decomposition
• Define the forms for each term in the decomposition; i.e., each term is associated either with a parametric form (a function) or with another Bayesian Program

2.2. Question Given a description P(X1 ∧ X2 ∧ . . . ∧ Xn | δ ∧ π), a question is obtained by partitioning the variables X1, X2, . . . , Xn into three sets: Searched, Known and Unknown variables. A question is defined as the distribution P(Searched | Known ∧ δ ∧ π). In order to answer this question, the following general inference is used:

    P(Searched | Known ∧ δ ∧ π)
      = Σ_Unknown P(Searched ∧ Unknown ∧ Known)
        / Σ_{Unknown, Searched} P(Searched ∧ Unknown ∧ Known)          (1)
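As a toy illustration of the inference in Equation (1) (a sketch, not part of the original formalism's implementation), the following marginalizes the Unknown variable out of a small tabulated joint distribution and normalizes over the Searched variable; the variable values and probabilities are invented for the example.

```python
def ask_question(joint, searched_vals, unknown_vals, known):
    """General inference of Eq. (1): sum the joint distribution over the
    Unknown variable's values, then normalize over the Searched values."""
    num = {s: sum(joint[(s, u, known)] for u in unknown_vals)
           for s in searched_vals}
    denom = sum(num.values())
    return {s: p / denom for s, p in num.items()}

# A small joint P(Searched, Unknown, Known) over three binary variables
# (the eight probabilities sum to 1).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.25, (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}
print(ask_question(joint, [0, 1], [0, 1], known=1))
```

For Known = 1, the Searched = 1 hypothesis collects most of the mass (5/6), exactly as Equation (1) prescribes.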
3. Form Recognition with Bayesian Programming 3.1. Primitives In this work, we focus only on low-level primitive features, depicted in Figure 2. The primitives used are: vertical lines, horizontal lines, slash and backslash lines, and four corners of different orientations. Several translated primitives of each type are added and used as a Garbage Collector; this is explained in more detail in Section 4. 3.2. Pertinent Variables The choice of the pertinent variables is a crucial point in defining a Bayesian program, since they form the backbone of the program and the quality of our detection highly depends on them. These variables must not only be relevant for describing the features in terms of attributes and characteristics, but must also not be too numerous, in order to keep the decision calculation time reasonable.
Figure 2. Set of primitives: the "real" primitives (vertical, horizontal, slash and backslash lines, and the four corners) and several translated primitives of each type used as a Garbage Collector.
Only one output variable is needed in our case. It is used to determine the type of the feature that has been recognized. This variable, denoted Feat, is discrete and takes integer values over the range [0 . . . N], where N corresponds to the number of different features searched for in the image. In our case, all low-level features are recognized inside a window of a defined size that scans the entire image. As shown in Figure 3, this window is divided into several square zones. In each of these zones, the ratio between the number of white pixels (contour pixels), called whitepix, and the total number of pixels within the zone (called totpix) is employed for primitive detection. For each zone, this ratio is represented by a variable called Xi (i standing for a particular zone of a given feature) and is discrete over the interval IX = [0, 1]. This can be expressed as:

    Xi = whitepix / totpix                    (2)
These variables correspond to the Known variable subset in our inference question. Therefore, the following joint distribution stands as our description of the problem:

    P(Feat ∧ X1 ∧ . . . ∧ Xi | δ ∧ π)          (3)
Figure 3. This figure depicts how a window is divided into several square zones, each of which corresponds to a variable. In this example, an upper-right corner feature can be seen inside the window.
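Equation (2) can be sketched for a small binary window as follows. This is an illustrative example with hypothetical names and an assumed k x k zone grid over a square window; it is not the authors' implementation.

```python
def zone_ratios(window, k):
    """Compute Eq. (2) for each of the k*k square zones of a binary
    window: X_i = whitepix / totpix, with 1 marking contour pixels."""
    n = len(window)
    step = n // k  # side length of one square zone (assumes k divides n)
    ratios = []
    for zr in range(k):
        for zc in range(k):
            zone = [window[r][c]
                    for r in range(zr * step, (zr + 1) * step)
                    for c in range(zc * step, (zc + 1) * step)]
            ratios.append(sum(zone) / len(zone))
    return ratios

# A 4x4 window with a vertical contour line along the left edge:
w = [[1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
print(zone_ratios(w, 2))  # -> [0.5, 0.0, 0.5, 0.0]
```

The resulting vector (X1 . . . Xn) is what the Bayesian program receives as its Known variables for one window position.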
3.3. Decomposition The decomposition consists of breaking the joint distribution into a product of simpler terms. This step also permits expressing the dependence and independence relationships between variables. This distribution is conditioned by the preliminary knowledge. Assuming the Xi variables are conditionally independent given Feat, one can express this as:

    P(X1 | Feat ∧ δ ∧ π) ⊥ . . . ⊥ P(Xi | Feat ∧ δ ∧ π)          (4)

Under this hypothesis, using the product and marginalization rules, the decomposition becomes:

    P(Feat ∧ X1 ∧ . . . ∧ Xi | δ ∧ π)
      = P(Feat | δ ∧ π) × P(X1 | Feat ∧ δ ∧ π) × . . . × P(Xi | Feat ∧ δ ∧ π)          (5)
3.4. Parametric forms Since no a priori information about the distribution of the features is available, we assume Feat to be uniformly distributed over [0 . . . N]:

    P(Feat = i) = 1/N    ∀ i ∈ [0 . . . N]          (6)

We assume that the Xi variables follow a Gaussian probability law whose mean and standard deviation depend on the particular zone and on the feature corresponding to the variable:

    P(Xi = x) = Gauss(μ(zone, Feat), σ(zone, Feat))          (7)

where μ(zone, Feat) and σ(zone, Feat) are computed during learning. This choice of probability law is natural, since a Gaussian law expresses the fact that, for each primitive, each variable has a typical value corresponding to the mean of the Gaussian: the more a value differs from the mean, the less probable it is. 3.5. Identification The identification process consists of determining the free parameters of the previously chosen probability laws associated with the variables. In our case, we only need to determine the parameters of the zone ratio variables (the mean and the standard deviation), since the Feat variable is already completely determined. This is done by performing learning over a defined feature data set.
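The identification step, estimating μ(zone, Feat) and σ(zone, Feat) of Equation (7) from labeled examples, can be sketched as follows. This is a minimal illustration with hypothetical names and data, not the authors' code; the paper computes the statistics iteratively, while this sketch computes them in one batch pass.

```python
from statistics import mean, pstdev

def identify(examples):
    """Learn the free parameters mu(zone, Feat) and sigma(zone, Feat) of
    Eq. (7): examples maps each feature id to a list of zone-ratio
    vectors (X1 ... Xn) observed for that primitive in the learning set."""
    params = {}
    for feat, vectors in examples.items():
        zones = list(zip(*vectors))  # one tuple of samples per zone
        params[feat] = [(mean(z), pstdev(z)) for z in zones]
    return params

# Two labeled windows of a single (hypothetical) primitive, each described
# by two zone ratios; the learned parameters are (mu, sigma) per zone.
learned = identify({0: [[0.1, 0.9], [0.3, 0.7]]})
print(learned[0])  # -> [(0.2, 0.1), (0.8, 0.1)]
```

A degenerate sigma of 0 (all training ratios identical in a zone) would need a small floor in practice so that Equation (7) remains well defined.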
3.6. Question
According to Equation (1), the Bayesian inference will therefore be of the following form:

P(Feat | X1 ∧ X2 ∧ . . . ∧ Xi ∧ δ ∧ π)
(8)
where i depends on the number of divisions used to describe the selected features. Figure 4 summarises the Bayesian program used in this work:

Program
   Description
      Specification (π)
         Pertinent variables
            • X1 . . . Xn: input variables designating all zones
            • Feat: output variable corresponding to the feature to detect
         Decomposition
            • P(Feat ∧ X1 ∧ . . . ∧ Xn | δ ∧ π) = P(Feat | δ ∧ π) · Π_{i=1..n} P(Xi | Feat ∧ δ ∧ π)
         Forms
            • P(Feat) = Uniform(Feat)
            • P(X1 | Feat ∧ δ ∧ π), . . . , P(Xn | Feat ∧ δ ∧ π) = N(μi(F), σi(F))
      Identification based on data (learning) (δ)
   Question
      • P(Feat | X1 ∧ . . . ∧ Xn ∧ δ ∧ π)
Figure 4. Structure of our Bayesian program for primitives recognition.
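Answering the question amounts to multiplying the uniform prior by the per-zone Gaussian likelihoods and normalizing. The sketch below shows this structure with illustrative names; `params` (mapping each feature to per-zone means and standard deviations) is an assumed encoding, since the paper relies on a probability library:

```python
import math

def gauss(x, mu, sigma):
    """Gaussian probability density N(mu, sigma) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(zone_ratios, params):
    """P(Feat | X1 .. Xn): product of per-zone Gaussian likelihoods,
    normalized over all features.  The uniform prior P(Feat) cancels
    in the normalization.  `params[feat] = (mu_list, sigma_list)`."""
    scores = {}
    for feat, (mu, sigma) in params.items():
        p = 1.0
        for x, m, s in zip(zone_ratios, mu, sigma):
            p *= gauss(x, m, s)
        scores[feat] = p
    z = sum(scores.values())
    return {f: p / z for f, p in scores.items()}
```

The recognised primitive is then simply the feature with the highest posterior probability.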
4. Experimental Results
In order to validate our Bayesian Programming based approach for visual primitive detection, some experiments were conducted. The first step is a pre-processing stage used to obtain a black and white image, by applying a simple thresholding method. This is followed by a Gaussian blurring filter in order to smooth the image and remove the noise. Finally, edges in the resulting image are detected using the well-known Canny edge detector [1]. After this pre-processing step, our method for simple form recognition can be applied. The second step of the experiments is to determine the parameters of the Gaussian distributions for our variables. This is done by calculating the mean and the standard deviation iteratively from a set of examples. The learning set contains examples of all kinds of primitives, with variations in orientation and in position with respect to the center of the window. Nevertheless, the positions stay close to the centre, because if primitives are learned over the whole surface of the window, all variables take similar values for all primitives, and the recognition capacity is strongly affected. In order to avoid this, we
add garbage examples to the set of primitives. These garbage collectors allow capturing all features outside the correct intervals. The third step is to use the Bayesian Program in order to recognise primitives similar to those previously learned. To do that, the program scans the entire image with a window and analyses the window at each step. Since a window can contain several primitives at the same time, a new intermediary process is needed. Blob detection is performed so as to detect all features. This process allows us to treat the features separately and to remove blobs that are too small to correspond to a possible feature.
4.1. Results
To validate our program, a first test was realised on the learning set itself, with different configurations of variables and sizes for the windows. Hence, we can choose the best configuration for the real tests with features not included in the learning set. Table 1 depicts the results of this test. It can be noticed that the best configuration is a size of 12 × 12 pixels with 36 variables (6 × 6 variables made from 2 × 2 pixels). That the best results are obtained with the largest number of variables is explained by the fact that we do not take into account the configuration of black and white pixels to evaluate a variable, but only the number of black and white pixels. So, the fewer the variables, the more confusion between primitives there is.

               12×12 px, 36 var.   8×8 px, 16 var.      —
  Set A Mean   83.6%               74.1%                68.4%
  Set B Mean   85.9%               77.4%                70.0%
Table 1. Results for the validation of the method and the determination of the best type of window.
Finally, Table 2 shows results obtained with a window of 12 × 12 pixels and 36 variables, on an image composed of several lines drawn in various orientations, corners of different angles, and a real image of a detached house front. These are shown in Figure 5.
Figure 5. Example of a testing image after pre-processing.
Table 2 shows a percentage of correct recognition greater than 68%, with an average of 81%. Another remark is that the column percentages do not sum to 100%. This is a result of misclassifications between the core primitives and the garbage collector, which are not shown in this table. Indeed, we consider that a garbage detection
[Table 2: confusion matrix over the eight primitive classes. The per-class correct-recognition rates (diagonal entries) are 68%, 76%, 77%, 80%, 81%, 82%, 89% and 98%; off-diagonal confusions range from 1% to 13%. Number of measures per class: 129, 104, 246, 205, 217, 82, 106, 206.]
Table 2. Results with a real image. Abbreviations: VL = vertical line; HL = horizontal line; SL = slash line; BL = backslash line; UL = upper-left corner; UR = upper-right corner; LL = lower-left corner; LR = lower-right corner.
is not pertinent for the statistics, since it also depends on the position of the primitive and not only on its type. Another difficulty is to interpret why a given primitive is better recognised than another one. Some configurations of features inside the window can be ambiguous and, in order to build statistics, we have to decide whether the Bayesian Program chose the wrong class for the feature or not. This decision can depend on variations in visual appreciation. Moreover, some primitives can be more similar to a class belonging to the garbage collector than to another one. This last fact can strongly depend on the chosen learning set. Moreover, our scanning scheme allows one feature to be detected multiple times. This can increase the quality of the detection. Indeed, if the same feature is detected N times, it is not absolutely necessary that the detector recognises the right feature N times. We can imagine that a majority of correct decisions for each feature should be sufficient for further processing. The importance of detection relies not only on the ground-truth results but also on the limitation of false positives for each feature type. By taking into account all these requirements, it can be noticed that Table 2 shows good results, with an average correct-detection rate of 81%. This high capability of avoiding confusion between features is also very important for robustness. Another important point is the computational complexity. Solving a Bayesian Program involves two major difficulties: marginalisation and random sampling. These are non-trivial and have no simple answer in high-dimensional problems. In our case we use a library which offers general methods to work with probability distributions. The number of variables Xi in our problem is D = 6 × 6 = 36; each variable takes values in the same range (Xi ∈ IX = {0, 0.25, 0.5, 0.75, 1}, so r = #(IX) = 5), and the number of possible configurations is thus 5^36.
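To make the magnitude concrete (the feature count N = 8 below is only illustrative, not a number given in the paper): exhaustively enumerating the joint space of the zone variables is hopeless, while answering the question for one window requires only N · D Gaussian evaluations:

```python
D = 36                 # number of zone variables (6 x 6)
r = 5                  # values per variable: {0, 0.25, 0.5, 0.75, 1}
joint_states = r ** D  # size of the joint configuration space (5**36)

N = 8                  # illustrative number of feature classes
evaluations = N * D    # Gaussian evaluations for one posterior query
```

The factorized decomposition of Section 3.3 is what makes this reduction from 5^36 states to N · D evaluations possible.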
At present, our program analyses static images.
5. Conclusion and Future Work
This paper presented a new method for visual primitive recognition using the Bayesian Programming methodology. This work takes place in the context of perception and interpretation of the environment using a probabilistic method. From the experiments, we conclude that the presented approach is practical and robust, with a global result of 81% correct matching. In order to improve the current results, we propose to analyse the same feature several times, in several positions inside the window. This will eliminate the cases where primitive features are recognised as a class belonging to the garbage collector. Moreover, we now have to experiment with different sets of variables in order to decrease the dimensionality of the problem. This in turn will allow image processing and interpretation in near real-time in the field of human-robot interaction. Other future work will focus on the extension of the whole approach towards the recognition of more complex objects, by using a hierarchical Bayesian Program. This will help us to construct a tool for recognising high-level features that can be used to model the environment in a more natural manner, similar to the way humans do it, and to add semantics to it.
References
[1] J. Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, Nov. 1986.
[2] D. Hubel and T. Wiesel, Receptive Fields of Single Neurones in the Cat's Striate Cortex, Journal of Physiology, Vol. 148, pp. 574–591, 1959.
[3] D. Hubel and T. Wiesel, Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex, Journal of Physiology, Vol. 160, pp. 106–154, 1962.
[4] O. Lebeltel, Programmation Bayésienne des Robots, INP Grenoble, 1999.
[5] B. Leibe and B. Schiele, Interleaved Object Categorization and Segmentation, in British Machine Vision Conference (BMVC'03), Norwich, UK, Sept. 2003.
[6] B. Leibe and B. Schiele, Analyzing Appearance and Contour Based Methods for Object Categorization, CVPR (2) 2003, pp. 409–415.
[7] K. Mekhnacha, E. Mazer and P. Bessière, A Robotic CAD System Using a Bayesian Framework, in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Takamatsu (JP), Best Paper Award, 2000.
[8] J.H. Piater and R.A. Grupen, Feature Learning for Recognition with Bayesian Networks, in Proceedings of the Fifteenth International Conference on Pattern Recognition (ICPR 2000), Barcelona, Spain, 2000.
[9] A. Tapus, G. Ramel, L. Dobler and R. Siegwart, Topology Learning and Recognition Using Bayesian Programming for Mobile Robot Navigation, in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Towards Robust State Estimation with Bayesian Networks: A New Perspective on Belief Propagation 1

Jan Nunnink 2, Gregor Pavlin
IAS Group, Informatics Institute, University of Amsterdam

Abstract. We investigate properties of Bayesian networks (BNs) in the context of state estimation. We introduce a coarse perspective on the inference processes and use this perspective to identify conditions under which state estimation with BNs can be very robust, even if the quality of the model is very low. By making plausible assumptions we can formulate asymptotic properties of the estimation performance with respect to the network topology. In addition, we introduce techniques that support detection of potentially inaccurate inference results.

Keywords. Bayesian Networks, Classification, Accuracy, Robustness
1. Introduction
Autonomous systems and decision making processes must be able to respond to the relevant aspects of their environment, which requires knowledge about the situation. For example, in crisis management applications decision makers require reliable assessment of the presence of toxic gases in order to make adequate decisions. Situation assessment often involves reasoning about facts which cannot be observed directly. Consequently, we must infer hidden events of interest through an appropriate interpretation of large quantities of heterogeneous and uncertain information, obtained through the existing communication, data storage and sensing infrastructure. Such interpretation requires automated information fusion systems that can cope with uncertain models and process large amounts of noisy and subjective information. At the same time the estimation results have a mission critical impact, because misleading assessment can have devastating consequences on the further course of events. Furthermore, there are applications that make use of filtering techniques, such as Kalman and particle filters, which require reliable state estimation in order to prevent filtering divergence. For example, if a robot uses a particle filter for localization, then the sampling phase will be more effective if the divergence between the true and the estimated distribution over the possible states is minimized. We assume that a situation can be described through a set of discrete random variables and that there exists a mapping between the states and optimal decisions or actions. 1

1 This work was done within the Combined project, funded by the Dutch Ministry of Economic Affairs.
2 Correspondence to: Jan Nunnink, Kruislaan 403, 1098 SJ Amsterdam. E-mail: [email protected]
A situation corresponds to a set of hidden and observed states of variables that nature ‘sampled’ from some true distribution over the combinations of possible states. Thus, in a particular situation certain states materialized while others did not, which corresponds to a point-mass distribution over the possible states. Consequently, the state estimation can be reduced to a classification of the possible combinations of relevant states. Since we deal with stochastic processes, classification is based on the estimated probability distributions (i.e. beliefs) over the hidden states. We assume that the probability distributions are estimated with the help of Bayesian networks (BNs), which are rigorous probabilistic models that define a mapping between observations and hypotheses in uncertain domains. BNs facilitate a systematic fusion of information about observations with the prior knowledge about the stochastic processes. Since BNs provide the mapping between the evidence and the hypotheses of interest, they have a crucial impact on the classification accuracy. We emphasize a fundamental difference between the accuracy of the model and the accuracy of the fusion result. A BN is a generalization over many possible situations that captures the probability distributions over the possible events in the observed domain. However, even a perfect generalization does not necessarily support accurate classification in a particular situation. For example, consider a domain in which 90% of fires cause smoke. While it is common to have fire and smoke coexisting in a case, rare cases will occur where we have a fire but no smoke. Observing the absence of smoke would in such a rare case decrease our belief in the presence of fire, leading our belief away from the truth, even if the used BN were a perfect generalization.
In the context of mission critical applications we consider classification of the state accurate if it is equivalent to the truth in the sense that knowing the truth would not change the action based on the classification. This is the case if the divergence between the estimated and the true point-mass distributions is sufficiently small. Obviously, if the current situation is a rare case, then that divergence is increased. In addition, in real world applications we usually cannot model the true distributions with high precision. In this paper we show that probabilistic inference based on BNs can be very robust under certain circumstances and can cope with rare cases and modeling imprecisions. In addition, by introducing a coarse runtime perspective on inference processes we can obtain additional information about the classification quality and show interesting properties of inference processes with BNs. This paper merely illustrates relevant aspects of reliable inference and, due to the lack of space, interesting properties are presented without rigorous proofs, which are described in [5]. In Section 2 we discuss classification using BNs with poly-tree topologies. Section 3 analyzes the role of individual parameters in the inference process, Section 3.1 describes inference from a coarse runtime perspective, and Section 4 relates this description to the inference processes. In Section 5 we list several possible applications.
2. Bayesian Network Classification In a decision making context, state estimation can often be reduced to a classification problem, which is a process of computing certain scores over several alternatives and choosing one of the alternatives based on a certain threshold. In the case of Bayesian network classifiers (BNCs), the scores are a probability distribution over different values of a variable given a set of observations about other variables.
Figure 1. An example Bayesian network with a poly-tree topology (A and B are root nodes; A is a parent of C and D, B is a parent of D; C is the parent of E, and D is the parent of F and G).
Probabilistic inference (also called belief propagation) in BNs consists of a series of multiplication and marginalization steps that combine predefined modeling parameters and consider the observed evidence. For the sake of clarity, we limit our discussion to BNs with poly-tree topologies, in which any two vertices are connected through a single undirected path 1. For example, consider a BN whose graph is shown in Figure 1. We will use capital letters to denote variables (or nodes) and lower case letters to indicate their values or states. Suppose we would like to classify the state of variable D, while we have observed that G = g2, F = f1 and E = e1. We call E = {e1, f1, g2} the evidence (or observation) set. Conditional probability tables (CPTs) describe for each variable X the conditional probabilities p(X | parents(X)) for all possible combinations of the states. The numbers in the CPTs are the parameters of the BN. For an overview of BNs see [7,3]. Classification requires the computation of the posterior probability over the possible values of D, which can be expressed through the following factorization:

p(D = di | E) = α · [ ΣA p(A) ( ΣC p(C|A) p(e1|C) ) ( ΣB p(B) p(di|A,B) ) ] · p(f1|di) · p(g2|di)

with φ1 ∝ p(di|e1) denoting the bracketed factor, φ2 = p(f1|di), and φ3 = p(g2|di),
where α is a normalizing constant over all values of D. We can identify three factors, φ1 , φ2 and φ3 which reflect the conditional independence encoded in the BN graph. All factors are functions of separate BN fragments that are conditionally independent given node D. Considering BNs as causal models, we see that φ1 represents belief propagation in the causal direction with respect to node D, which is called predictive inference. Factors φ2 and φ3 represent belief propagation towards D against the causal relations, which is called diagnostic inference or retrospective support to the belief over the states of variable D ([6]). In a poly-tree a classification node always d-separates its parents from its children. In general, for poly-trees we can observe the following factorization property: predictive inference with respect to the classification node always results in a single factor. For example, for the network in Figure 1 any instantiation of the nodes {A, B, C, E} will be captured by a single factor φ1 . 1 The
discussion can be extended to more general topologies which, however, is out of scope of this paper.
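The factorization above can be checked numerically. The sketch below uses the poly-tree of Figure 1 with binary variables and made-up CPTs supplied by the caller; all names and the 0/1 index conventions are assumptions of this sketch:

```python
def posterior_D(pA, pB, pC_A, pD_AB, pE_C, pF_D, pG_D, e=0, f=0, g=1):
    """p(D | E=e, F=f, G=g) for the poly-tree A->C, A->D, B->D, C->E,
    D->F, D->G.  CPT indexing: pC_A[c][a] = p(C=c | A=a), etc.
    Defaults e=0, f=0, g=1 encode the evidence {e1, f1, g2}."""
    post = []
    for d in range(2):
        # phi1: predictive factor over the fragment {A, B, C, E}
        phi1 = sum(pA[a]
                   * sum(pC_A[c][a] * pE_C[e][c] for c in range(2))
                   * sum(pB[b] * pD_AB[d][a][b] for b in range(2))
                   for a in range(2))
        # phi2, phi3: diagnostic factors from the children F and G
        phi2 = pF_D[f][d]
        phi3 = pG_D[g][d]
        post.append(phi1 * phi2 * phi3)
    z = sum(post)  # the normalizing constant alpha
    return [p / z for p in post]
```

Note that the three factors are computed independently, mirroring the conditional independence of the three fragments given D.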
3. Model Parameters
The modeling parameters (CPTs) obviously play a significant role. In general the parameters are obtained from domain experts, automated learning methods, or a combination of both, typically maximizing the training data likelihood. The accuracy of the resulting BN can then be estimated through:
* Measuring the percentage of correct classifications over a set of test cases.
* Measuring the divergence between the estimated distribution and the true distribution over the hypothesis variable H, i.e. the error.
The fact that these methods evaluate the model as a whole over a complete data set can cause the following problems:
* It is possible that several modeling errors within the network cancel each other out, which may not be detectable.
* These methods require that the true distribution or class label is known.
Therefore, we are interested in a methodology that can help us analyze the accuracy of a BNC (i) at the parameter level, and (ii) without knowing the true value of all variables. This raises the interesting question of what accuracy means in the context of a parameter. An obvious answer could be that a parameter is accurate if it is equal to its ‘true’ value. When talking about Bayesian networks, a parameter is an estimated conditional probability and the corresponding true value would be its value in the modeled domain. For example, if in some particular domain fire causes smoke with probability p̂(s|f) = 0.8, then that is the true value of the model parameter p(s|f), even though we might have set it to p(s|f) = 0.7 in our model. We specifically distinguish between estimated and true values. In practice we apply a model to a range of relatively similar but slightly different domains. For example, if we are modeling crises in different environments, then fire could cause smoke with slightly different probability depending on the particular environment. However, when constructing the model we generalize over many possible environments.
The point is that a BN always describes a generalization over many small processes which can differ from case to case. Even if we assume that there is a fixed true value for every parameter, the probability that we can estimate a value exactly equal to it is extremely small. In that sense parameters are never really accurate. Fortunately, precise parameters are often not crucial for accurate classification. In the next section we give a rationale for this and derive a weaker definition of parameter accuracy.
3.1. Parameter Accuracy
To illustrate the impact of BN parameters on a classification task, let us take a look at their influence on the calculation of a posterior probability distribution. Consider again the example from the previous section based on the network in Figure 1, where we want to classify D. Recall that each instantiation of a network fragment that is d-separated from the other parts of the network by D corresponds to a factor in the expression describing the distribution over D. In addition, for each such conditionally independent fragment of the BN we can observe that if we multiply the conditional equation with the corresponding factor and normalize over all states of D, there exists a state whose posterior probability will increase the most. For example, suppose the parameters were p(f1|d1) = 0.8 and p(f1|d2) = 0.3. The first step, observing F = f1, thus increased the
posterior of d1 the most. One could say that for this observation (F = f1) state d1 ‘wins’. Obviously, the state that wins sufficiently often will end up with the highest posterior probability. This suggests that it is not the exact factor values, but the relations between them, that matter most with respect to the estimation accuracy. Therefore, we introduce the notion of inference reinforcements, defined as follows:

r(xi) = argmax_{yj} p(xi | yj)

s(x1i, . . . , xni) = argmax_{zj} p(zj | x1i ∧ . . . ∧ xni)
where Y is an ancestor of X while Z is a common descendant of X1 , . . . , Xn . We call these functions the reinforcement, because they return the state of an ancestor or a descendant of X whose posterior probability is increased the most (i.e. reinforced) by observing X = xi . Functions r(xi ) and s(x1i , . . . , xni ) denote reinforcements resulting from diagnostic and predictive inference processes, respectively. In the example in Section 2, we can identify reinforcements r(f1 ) = arg maxdj p(f1 |dj ), r(g2 ) = arg maxdj p(g2 |dj ) and s(e1 ) = arg maxdj p(dj |e1 ). Moreover, we can define an accurate reinforcement: Definition 1 (Accurate Reinforcement) Let H be any reinforced variable and let h∗ be its hidden true value. A reinforcement is accurate iff either h∗ = r(xi ) or h∗ = s(x1i , . . . , xni ). In other words, the true state of H should be reinforced. We illustrate this with an example. Let’s assume binary variables C and E and the CPT p(E|C) containing modeling parameters p(e1 |c1 ) = 0.7 and p(e1 |c2 ) = 0.2. Given these parameters an observation of E = e1 yields reinforcement r(e1 ) = c1 . If c1 is indeed the true value of C (i.e. the ground truth) then the model reinforces the true value and we consider the reinforcement accurate (see Definition 1). Consequently, we consider modeling parameters p(e1 |c1 ) and p(e1 |c2 ) accurate. Moreover, one can see that in this particular case we will obtain an accurate reinforcement as long as the parameters in the CPT p(E|C) satisfy condition p(e1 |c1 ) > p(e1 |c2 ), which defines an interval of accurate parameter values. Given Definition 1, we can identify the following property of belief propagation in BNs: Property 1 Let H be the classification variable. 
If a factor φi of the posterior probability formula corresponds to an accurate reinforcement, then the estimated distribution p(H|E) approaches the true distribution over H, independently of the assumed prior distribution p(H) and the parameters associated with other factors. This property results from the fact that exactly one of the possible states of H materialized and the true distribution over H is a point-mass (or delta) distribution; i.e. the probability of one state is 1 while the other states have probability 0. Next, we would like to know with what probability pre we can encounter situations in which a given model p(E|C) will support accurate reinforcements. By considering dependent variables C and E and the true conditional probability distributions p̂(E|C), we can identify the following property:
Property 2 Let βi be the set of states of E which reinforce state ci of C. We can show that pre > 0.5 if for all ci the following is true: Σ_{ek ∈ βi} p̂(ek|ci) > 0.5. In that case the lower bound on pre is given by pre ≥ min_i ( Σ_{ek ∈ βi} p̂(ek|ci) ).
Figure 2. CPTs for p(E|C).

(a)        c1     c2             (b)        c1     c2
    e1     0.8    0.4                e1     0.7    0.6
    e2     0.2    0.6                e2     0.3    0.4

Figure 3. The decay of the value pch for different values of pre and branching factors k: 3 (solid), 5 (dotted) and 5 (dashed). [Plot: pch (0 to 1) as a function of pre (1.0 down to 0.4).]
Let’s assume an example, where the true distributions pˆ(E|C) are given by the CPT in Figure 2a. We see that c1 would cause observation E = e1 with probability 0.8, while c2 would cause E = e2 with probability 0.6. Suppose the modeling parameters would satisfy p(e1 |c1 ) > p(e1 |c2 ) and p(e2 |c2 ) > p(e2 |c1 ), corresponding to reinforcements r(e1 ) = c1 and r(e2 ) = c2 . Given the true distribution, we would obtain an accurate reinforcement with probability 0.8 if state c1 occurred, since c1 causes e1 with 0.8 probability and e1 reinforces c1 . Analogously, an accurate reinforcement occurs with probability 0.6 if state c2 materialized. Hence, the lower bound on pre for this example would be 0.6, since we do not know in advance which of the states of C will actually occur. Note also, that in this case pre ≥ 0.6 as long as the greater-than/less-than relations between the modeling parameters for each possible state of E are identical to the relations between the corresponding probabilities in the true distribution (see Figure 2a). Since in many applications the learning algorithm or the designer could easily recognize such relations, we can consider assumption pre > 0.5 realistic. However, we can identify cases in which the lower bound for pre is less than 0.5. Assume a model where variables C and E are related by a CPT, whose parameters are identical to the true distribution depicted in Figure 2b (i.e. we have a perfect model). By considering Definition 1 we see that in the case that c2 occurs we will in 40% of the cases observe e2 , which will yield a correct reinforcement. In other words, there exist domains in which the true distribution over modeled events is such that the expected classification performance can be very poor, even if we had perfect models. If we consider binary variables, we see that this is the case if for both possible causes the true probability of getting the same symptom/effect is greater than 0.5. 
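Both the reinforcement function of Section 3.1 and the lower bound of Property 2 can be computed directly from a CPT. The sketch below (the dictionary encoding of the CPTs is an assumption of this sketch) reproduces the values discussed above: r(e1) = c1, a bound of 0.6 for the CPT of Figure 2a and 0.4 for the CPT of Figure 2b:

```python
def r(cpt, e):
    """Diagnostic reinforcement: the cause state whose likelihood of
    the observed effect e is largest.  cpt[e][c] = p(E = e | C = c)."""
    return max(cpt[e], key=cpt[e].get)

def pre_lower_bound(cpt):
    """Property 2: minimum over causes c of the total probability that
    c produces an observation which reinforces c."""
    causes = next(iter(cpt.values())).keys()
    return min(sum(cpt[e][c] for e in cpt if r(cpt, e) == c)
               for c in causes)

# True distributions from Figure 2
cpt_a = {"e1": {"c1": 0.8, "c2": 0.4}, "e2": {"c1": 0.2, "c2": 0.6}}
cpt_b = {"e1": {"c1": 0.7, "c2": 0.6}, "e2": {"c1": 0.3, "c2": 0.4}}
```

For cpt_b the bound is only 0.4, illustrating that even a perfect model can yield poor expected classification performance when both causes most probably produce the same effect.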
In the next section we show that after we assume pre for all CPTs in a BN we can combine them into an estimated probability of accurate classification.
4. Inference and Accuracy By considering the reinforcements we can reduce belief updating to a counting problem, which simplifies further analysis of the inference processes. For the sake of clarity, in this section we consider only reinforcements of type r(xi ), that are contributed through factors φi implementing diagnostic inference (see section 3.1). For each state hi of some variable H we define a reinforcement counter ni , which is the total number of factors
that reinforced state hi. If we believe that the true value of a variable X is xi, then we can look up which state r(xi) of X's parent H gets reinforced. The probability that the true state of H gets reinforced by X can be expressed as

ptrue = pch · pre + α (1 − pch)(1 − pre),
(1)
where pch is the probability that xi is the true state of variable X and pre is the probability that the reinforcement given a correct state xi is accurate (see Definition 1). The second term represents the probability of a situation where the reinforcement would be inaccurate given a correct state of X, but where the chosen incorrect state xi is such that the errors cancel out and the true state of H gets reinforced. If we have binary variables, then α = 1. For each node X which is not an evidence node, we assume that the state xi with the greatest counter ni is the true state of X. A lower bound for the probability that xi is indeed the hidden true state of X can be expressed as

pch ≥ Σ_{m = ⌈k/2⌉}^{k} (k choose m) · ptrue^m · (1 − ptrue)^(k−m),

(2)
where m is the number of accurate reinforcements (see Definition 1), while k is the total number of reinforcements, i.e. the number of all factors φi contributing diagnostic belief updates. Consequently, k is equivalent to a branching factor at node X (see Section 2). We can compute pch for any node in a BN by recursively applying operations (1) and (2). For the sake of clarity, we illustrate this process with the help of a BN with a simple tree structure, where each variable has at most one parent and the leaves are evidence nodes. We start from the evidence variables in order to compute pch for every variable in the tree. We know that pch = 1 for the evidence variables because we have observed their true value. By assuming pre > 0.5 we can compute for each leaf ptrue > 0.5, the probability that correct state at the leaf’s parent gets reinforced. Note that assumption pre > 0.5 is realistic in many applications (see Section 3.1). Based on these probabilities, we can compute pch for each leaf’s parent. For each parent node we then consider the state with the maximum reinforcement counter as hard evidence and continue with this process until the root of the tree is reached. If we make the assumption that the probability pre > 0.5 for all CPTs in a BN with a tree topology then we can show the following property, resulting from (2): Property 3 Given a BN with binary nodes and odd branching factors k, for every node the probability pch that the true state corresponds to the maximum reinforcement counter is greater than 0.5. In addition, pch grows and approaches 1 as k increases. In addition, for sufficiently great k, the probability pch is greater than 0.5 also in the case of BNs with multi-state nodes and even k. In other words, the distribution over the constellations of reinforcement counters at any variable is a function of pre and the branching factors, which is reflected in the experimental results shown in Figure 3. 
The curves show how pch at a node in simple tree networks with different branching factors changes as a function of pre . In this experiment, the variables had three states. Property 3 implies that reinforcement counters can be used for a coarse-grained state estimation.
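As an illustration, the lower bound (2) can be evaluated numerically; the following is a minimal sketch (the function name is ours, not the paper's):

```python
from math import ceil, comb

def pch_lower_bound(p_true: float, k: int) -> float:
    """Lower bound (2): probability that the state with the maximum
    reinforcement counter is the true state of X, given k reinforcements,
    each accurate with probability p_true."""
    return sum(comb(k, m) * p_true ** m * (1 - p_true) ** (k - m)
               for m in range(ceil(k / 2), k + 1))

# Property 3: for p_true > 0.5, the bound exceeds 0.5 and grows with k.
print(pch_lower_bound(0.6, 3))   # ≈ 0.648
print(pch_lower_bound(0.6, 9))   # ≈ 0.733
```

This reproduces the qualitative behavior of the curves in Figure 3: for a fixed pre (and hence ptrue) above 0.5, increasing the branching factor k pushes pch towards 1.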
J. Nunnink and G. Pavlin / Towards Robust State Estimation with Bayesian Networks
Moreover, with the help of the reinforcement counters, we can explain under which circumstances inference processes in BNs will be robust with respect to the variation of parameters. We assume a δ such that the estimated distribution p(H|E) over node H will result in a correct decision or classification if the Kullback-Leibler divergence satisfies KL(p̂(H) ‖ p(H|E)) < δ, where p̂(H) is the true distribution. Note that p̂(H) is a point mass distribution, reflecting the fact that exactly one of the possible states of H materialized. For the classification variable we also define na, the number of factors contributing accurate reinforcements (see Definition 1), and ne, the number of factors introducing inaccurate reinforcements. We can also assume that the posterior probability over H remains approximately constant if the numbers of accurate and inaccurate reinforcements are the same. This is the case if for every factor φi corresponding to an inaccurate reinforcement we can find a factor φj corresponding to an accurate reinforcement such that the factors cancel out after the normalization. By considering Property 3 we see that, if a BN features sufficiently large branching factors and pre > 0.5 for every CPT, then the probability that na − ne > 0 is greater than 0.5. Moreover, this probability grows with increasing branching factors. Also, according to Property 1, na − ne accurate reinforcements will reduce KL(p̂(H) ‖ p(H|E)), irrespective of the magnitude of the corresponding factors φi. However, for a given δ, these factors must be large enough to satisfy KL(p̂(H) ‖ p(H|E)) < δ in na − ne steps. Moreover, if for a fixed ratio na/ne > 1 the number of factors is increased, the absolute difference na − ne grows as well.
Consequently, we can use a greater variety of factor values in order to satisfy KL(p̂(H) ‖ p(H|E)) < δ; due to a larger na − ne we can satisfy KL(p̂(H) ‖ p(H|E)) < δ also with factors that result in smaller belief updates, i.e. smaller changes of the estimate p(H|E). This means that the intervals from which we can choose adequate parameters grow, which increases the likelihood of having factors φi that will support accurate classification in a given situation. In other words, an increasing number of factors in p(H|E) that satisfy pre > 0.5 improves the robustness, since the classification is less sensitive to the parameter precision. Hence, by considering the relations between the BN topology and the factorization (see Section 2), we can observe that:
Property 4 For a given classification node H in a BN with a tree topology and pre > 0.5, the classification robustness improves with the number of conditionally independent BN fragments that are rooted in H and support diagnostic belief updates.
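Because p̂(H) is a point mass, the divergence criterion collapses to a single logarithm; a minimal sketch (function names are ours):

```python
import math

def kl_point_mass(true_state: int, posterior: list) -> float:
    """KL(p_hat || p(H|E)) for a point-mass p_hat on true_state.
    All terms with p_hat(x) = 0 vanish, so the divergence equals
    -log of the posterior probability assigned to the true state."""
    return -math.log(posterior[true_state])

def decision_ok(true_state: int, posterior: list, delta: float) -> bool:
    # The classification counts as correct if KL(p_hat || p(H|E)) < delta.
    return kl_point_mass(true_state, posterior) < delta
```

For example, with posterior [0.9, 0.1] over a binary H and true state 0, the divergence is −log 0.9 ≈ 0.105, so any δ above that value yields a correct decision; belief updates only have to push the posterior mass of the true state above e^(−δ).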
5. Applications There are many types of applications that benefit from robust models and the presented inference analysis techniques. Namely, in real world applications we often have to cope with imprecise models and uncertain or subjective observations. For example, in crisis management applications it is often difficult to obtain precise causal models of underlying processes, since such applications focus on relatively rare events for which we cannot obtain good training examples. On the other hand, causal models are indispensable for systematic fusion and the question is whether we can still achieve reliable estimation based on imprecise models. We explain how we can cope with uncertain models and show how the reinforcement counters can be used to analyze the classification quality.
5.1. Coping with Imprecise Models
Properties 3 and 4 imply that, under weak assumptions, diagnostic inference supports very robust classification with asymptotic properties, even if we use imprecise models and noisy evidence. These properties are very important for real-world applications. We can often obtain large amounts of very noisy and heterogeneous information for which we cannot specify precise models relating the observations to possible hidden facts. Such information can originate from humans and can be obtained through databases, GSM networks, etc. It is impossible to obtain precise sensor models relating human reports and observed events. On the other hand, we can assume that more than 50% of the reports indicate the true hidden cause (see the rationale in section 3.1) and that we can obtain heterogeneous information corresponding to BNs with large branching factors. In such cases we can still achieve high-quality fusion, despite uncertain models and evidence.
5.2. Analysis of the Inference Quality
We can use the reinforcement counters to determine a lower bound on the accuracy of the inference results. If we can assume the lowest probability of an accurate reinforcement, then we can compute the lower bound pch on the probability that the maximum reinforcement counter corresponds to the true state (see Property 3). If the maximum counter is associated with the state that has the greatest posterior, then we know that this is the true state with probability greater than pch. The greater the counter, the greater the likelihood that the state with the maximum posterior is the true state. In other words, the value of the maximum counter indicates the classification quality. Why is this relevant? If we had a perfect model, then the distribution obtained through propagation would already be the best estimate. However, we can still encounter rare cases in which even a perfect model will not support correct classification.
Namely, we estimate a state in a particular situation by using causal models that generalize over many situations (see Section 3). Thus, the value of the counter can provide additional information on the quality. If the counter does not exceed a certain value, then we might be dealing with a rare case. In such a case we have to gather more evidence, or we know that we should not rely on the estimation results, since they might be misleading. In other words, the reinforcement counters provide a flagging mechanism signaling potentially misleading estimation results. In addition, we can identify BN fragments that will provide misleading results with high probability, even if they precisely model the true distributions, i.e. pre < 0.5.
5.3. Topological Analysis
Property 4 suggests that robustness depends on the branching factors and the direction of inference. If we can assume that each reinforcement in a particular model is accurate in more than 50% of the cases, but we do not know the exact percentage, we can still compute the reinforcement counters with an assumed worst-case pre. In this way we can see how we should construct a model in order to maximize the accuracy, i.e. where in the model we should increase the branching factor in order to improve the classification quality. Although we cannot determine the actual improvement in terms of error margins, we can see the tendencies of the accuracy improvement. In other words, such tendencies provide guidance for the construction of robust fusion systems (e.g. see [4]).
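The flagging mechanism described above can be sketched in a few lines; the function name and threshold are our illustration, not part of the paper:

```python
def flag_estimate(counters: dict, min_counter: int):
    """Reinforcement-counter quality check (Section 5.2): return the
    state with the maximum counter and whether the estimate should be
    trusted. A low maximum counter signals a potentially rare case,
    calling for more evidence before relying on the estimate."""
    winner = max(counters, key=counters.get)
    trusted = counters[winner] >= min_counter
    return winner, trusted
```

For instance, if 9 of 11 contributing factors reinforce state "x1", `flag_estimate({"x1": 9, "x2": 2}, 5)` returns `("x1", True)`; with only 3 of 5 reinforcements the same call flags the estimate as untrusted.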
6. Discussion
In this paper we focus on relevant and challenging aspects of state estimation accuracy. State estimation can often be formulated as classification of possible states. We consider classification with Bayesian networks (BNs) and investigate the classification accuracy by putting belief propagation processes into a coarse perspective. We introduce the notion of belief reinforcements, with which we can identify important properties of inference in BNs w.r.t. the classification accuracy. We show that information fusion based on diagnostic inference in BNs can have asymptotic properties. Namely, if we use BNs with sufficiently great branching factors, the classification accuracy converges to 1 as the number of independent factors or reinforcements increases. This property is based on the assumption that the probability of having accurate reinforcements is greater than 0.5, which in turn is related to the notion of the adequacy of CPTs. We show that such an assumption is realistic in many domains, in which accurate reinforcements can be obtained if even very weak relations of the true distributions are captured by the modeling parameters in the CPTs. Moreover, we can show that asymptotic fusion properties can be very insensitive to the parameter values. This implies that the fusion can be very robust, which is especially relevant in domains where we cannot obtain precise models due to the lack of sufficient training data or expertise. Note that these properties reflect empirical results discussed in [2] and the analysis of simple Bayesian classifiers [1]. In addition, we show that there exist domains which simply do not allow robust fusion, even if we knew the true distributions over the events in the domain. The identified properties in the context of fusion accuracy are very relevant for real world applications. Firstly, the properties guide the design of inherently robust estimation systems.
Secondly, we can use the reinforcement counters as a confidence or quality measure that supports alerting mechanisms in the case of potentially misleading estimation results; consistency among the reinforcements indicates accurate estimation.
References
[1] P. Domingos and M. J. Pazzani. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In International Conference on Machine Learning, pages 105–112, 1996.
[2] M. Henrion, M. Pradhan, B. Del Favero, K. Huang, G. Provan, and P. O'Rorke. Why is diagnosis using belief networks insensitive to imprecision in probabilities? In Twelfth Conference on Uncertainty in Artificial Intelligence, pages 307–314. Morgan Kaufmann, 1996.
[3] F. V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag, New York, 2001.
[4] G. Pavlin, M. Maris, and J. Nunnink. An agent-based approach to distributed data and information fusion. In IEEE/WIC/ACM Joint Conference on Intelligent Agent Technology, pages 466–470, 2004.
[5] G. Pavlin and J. Nunnink. Inference meta models: A new perspective on inference with Bayesian networks. Technical Report IAS-UVA-05-03, Informatics Institute, University of Amsterdam, The Netherlands, December 2005.
[6] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29:241–288, 1986.
[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
HRP-2W: A Humanoid Platform for Research on Support Behavior in Daily Life Environments
Tetsunari Inamura, Kei Okada, Masayuki Inaba and Hirochika Inoue
Dept. of Mechano-Informatics, Graduate School of Information Science and Technology, Univ. of Tokyo
Abstract. In this paper, we introduce a concept of real-world oriented humanoid robots that can support humans' daily-life behavior. Such robots have to watch humans, understand their behavior, and support daily-life tasks; in particular, real-world behaviors such as handling tableware and hand-delivering daily commodities are required. We developed a humanoid robot, HRP-2W, which has a wheel module instead of legs, as a research platform for this aim. We also developed a basic software configuration so that the work of several research groups can easily be integrated. Through experiments, we show the feasibility of the humanoid robot platform and the availability of the software architecture for project fusion.
Keywords. Humanoid Robots, Human Robot Interaction, Daily Life Environment, Behavior Learning and Acquisition
1. Introduction
Agent systems that can communicate with humans and offer solutions to problems have recently been gaining a great deal of attention. At the same time, the development of humanoid robots is under intense study. By integrating these two technologies, the realization of real-world oriented humanoid robots is becoming more feasible. Such robots have to watch humans, understand their behavior, and support daily-life tasks. In particular, real-world behaviors such as handling tableware and hand-delivering daily commodities are becoming more important in real-world environments. Against this background, research on humanoid robots is trending toward content-oriented systems rather than individual functions such as manipulation, image understanding, and motion control. Interaction with humans, i.e. gesture understanding, natural-language communication, teaching by showing, and the storage and reuse of communication experience, has been well studied. Bischoff et al. have developed HERMES [1], which can interact with humans and understand their instructions in natural language. Thrun et al. have also developed a tour-guide robot for a museum environment. It can build environment maps using distance sensors,
1 Correspondence to: 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan. E-mail: [email protected].
T. Inamura et al. / HRP-2W: A Humanoid Platform for Research on Support Behavior
Figure 1. Image of daily life support by humanoid robots
navigate in a museum, and communicate with audiences. Our approach considers not only such communication functions but also physical interaction with humans and functions for watching humans' behavior. Figure 1 shows the ultimate goal of our project: robots always pay attention to humans' behavior, forecast their intentions, and support daily-life behavior before humans ask for help. In this paper, we describe a methodology for realizing real-world oriented humanoid robots.
2. HRP-2W: A humanoid robot platform for support behavior
2.1. Concept
We developed a humanoid robot platform, "HRP-2W", for research on support behavior in daily-life environments. An outline of the robot is shown in Fig. 2. Similar humanoids with wheel units have already been proposed [1][2]. One difference from those studies is the realization of long-term behavior in both indoor and outdoor environments. Four large battery packs inside the body enable about four hours of behavior without any external power supply. Through such long-term behavior, the robot stores plentiful shared experiences by communicating with humans, and then learns how to act in complex daily-life environments. Another concept of the platform is that researchers can focus on intelligent applications without having to consider delicate balance control. Ishiguro et al. have developed the interactive humanoids "Robovie" [3]. Thrun et al. have also developed an interactive tour-guide robot [4]. These robots have functions for communicating with human beings; however, manipulation and handling of daily-life goods are not well studied. Therefore, we developed a humanoid robot platform that has communication, learning, and handling functions in the daily-life real world.
Figure 2. An outline of HRP-2W.
Figure 3. Software configuration
2.2. Hardware Configuration
The humanoid platform is equipped with the following actuators and sensors:
- 20 DoFs: 3 for each shoulder, 1 for each elbow, 3 for each wrist, 1 for the finger on each hand, 2 for the head, and 2 for the waist.
- Binocular color cameras for stereo vision.
- Stereo microphones for speech recognition and sound-source localization.
- A speaker for speech utterance.
- A six-axis force sensor on each hand.
- A laser range finder for building maps, localization, and navigation.
- An independent power system based on large-capacity batteries, with wireless LAN.
A wheelchair frame is adopted as the under-body of the humanoid. The wheelchair is designed for welfare use, so the humanoid can easily go out into outdoor environments. Two high-capacity batteries are loaded on the wheel module for long-term behavior.
2.3. Software Configuration
Figure 3 shows the software configuration of the system. The central sensor layer provides perception results from the auditory, visual, and distance sensors. At the library layer, the following functions operate using the sensor information:
1. Understanding of humans' behavior based on vision
Figure 4. Front view and side view of HRP-2W.
2. Models of daily-life commodities for manipulation and handling
3. Memorization, learning, and recognition of time-series patterns of the environment and human behavior
4. Acceptance of humans' instructions and operations
Applications are developed on top of these libraries. This research project is premised on sharing the development of various functions among several project teams. Therefore we adopted a common software interface and provided basic common software modules for the project fusion. The distributed application architecture CORBA is adopted as the integration framework. With the help of CORBA, researchers can easily connect their applications to the central software basis without considering differences among programming languages and operating systems. We have developed a behavior planning method based on geometric and functional models of the room configuration and daily-life tools. Using these models, we have performed complex behaviors such as washing dishes, cleaning up with a broom, and so on [5].
3. Understanding of Daily Life Environment based on Communication
3.1. Understanding of pointing behavior
Figure 5 shows an experiment in which a human instructs the humanoid robot in a daily-life behavior by pointing gestures. First, the user indicates a target bottle with a pointing gesture. By accessing a vision processor module, the robot obtains 3D information about the user's hand and pointing finger. In picture No. 2, the robot checks whether the vector of the pointing finger collides with any object in the environment model. In picture No. 3, the user shows the robot which trash box is suitable for the trash. Finally, the robot discards the trash in the pointed trash box.
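The collision check in picture No. 2 amounts to casting a ray along the pointing direction and testing it against objects in the environment model. The following is a hedged sketch of that idea; the function name, spherical object approximation, and radius are our assumptions, not the paper's implementation:

```python
import math

def pointed_object(hand, fingertip, objects, radius=0.15):
    """Cast a ray from the hand through the fingertip and return the
    nearest object (approximated by a sphere of `radius`) hit by the
    ray, or None if the pointing vector misses everything."""
    d = [f - h for f, h in zip(fingertip, hand)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]                      # unit pointing direction
    best, best_t = None, math.inf
    for name, centre in objects.items():
        v = [c - h for c, h in zip(centre, hand)]
        t = sum(vc * dc for vc, dc in zip(v, d))   # distance along the ray
        if t <= 0:
            continue                               # object behind the hand
        miss = math.sqrt(sum((vc - t * dc) ** 2 for vc, dc in zip(v, d)))
        if miss < radius and t < best_t:
            best, best_t = name, t
    return best
```

For example, pointing along the x-axis from the origin selects a bottle at (2, 0.05, 0) while ignoring a can well off the ray at (1, 1, 0).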
Figure 5. Understanding of Object Pointing Behavior
Figure 6. Garbage separation behavior using vision and force sensors
4. Behaviors in Daily-life environment
4.1. Object manipulation and usage of tools in daily life environments
Figure 6 shows an experiment in which the humanoid sorts pieces of trash and throws them into trash boxes. The humanoid detects the position of a piece of trash, then moves its hand to grasp it (pictures No. 1 to No. 3). Next, it distinguishes between cans and bottles by visual pattern matching (pictures No. 4 to No. 5). Finally, the robot throws the trash into the suitable trash box (pictures No. 6 and No. 9).
4.2. Object understanding behavior by vision and audio
To understand daily-life environments, not only visual but also auditory information is effective. The humanoid has sound-localization and sound-identification functions. These functions are used in situations where object identification by visual information alone does not succeed.
Figure 7. Identifying a target object by vision and sound localization
Figure 7 shows an experiment in which the humanoid identifies a ringing mobile phone and then hand-delivers it to a human. In picture No. 1, the humanoid can localize the sound source, but it cannot identify the ringing phone because there are two similar phones in the same direction. Therefore, the humanoid moves one of them and listens to the ring to distinguish the target phone. If the humanoid confirms that the end-point position and the sound-source direction are synchronized, the phone in the moved hand is identified as the ringing phone. Pictures No. 2 and 3 show this confirmation behavior.
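The synchrony check between the end-point position and the sound-source direction can be sketched as a simple angular comparison in the horizontal plane; this is our illustration (the names and the tolerance are assumptions, not the paper's implementation):

```python
import math

def directions_match(hand_xy, sound_dir_deg, tol_deg=15.0):
    """True if the localized sound direction agrees with the direction
    of the hand holding a candidate phone (robot head at the origin);
    a match identifies the moved phone as the ringing one."""
    x, y = hand_xy
    hand_dir = math.degrees(math.atan2(y, x))
    # wrap the angular difference into (-180, 180] before comparing
    diff = abs((sound_dir_deg - hand_dir + 180.0) % 360.0 - 180.0)
    return diff <= tol_deg
```

For example, a hand at (1, 1) lies at 45° and matches a sound localized at 45°, whereas a hand at (1, 0) does not match a sound at 90°.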
5. On-line behavior learning and acquisition
The main target tasks of this research are daily-life behaviors, such as handling tableware, cleaning furniture, and operating home information appliances. Designing autonomous behavior in advance to achieve such tasks is difficult for developers, because models of the tools can hardly be predicted and acquired in unknown environments. Therefore, on-line behavior learning and acquisition functions are adopted to act in each case. As user interfaces for the learning and acquisition of behavior, we developed two systems: 1) integration of on-line remote operation and autonomous behavior, and 2) on-line imitation with a motion capturing device.
5.1. Integration of on-line remote operation and autonomous behavior
As a method for teaching behavior to HRP-2W, we have developed a remote operation system using a game-pad controller. The system continuously observes the operator's instructions and learns the motion patterns with hidden Markov models (HMMs). During the user's operation, the system also recognizes what kind of behavior the user is inputting. If the operation is not suitable for the situation, the robot selects the autonomous behavior output by the HMM recognizer. The details of this mechanism are described in [6]. We have confirmed the usability of the on-line operation system in an outdoor environment. Figure 8 shows the behavior at a festival event, where the operator could improvise several behaviors requested by the audience.
5.2. On-line imitation with a motion capturing device
Recently, motion capturing systems have been widely used in behavior learning and teaching for humanoids. Almost all motion capturing systems adopt optical or magnetic devices; however, these are inconvenient in daily-life environments because of the restriction of the movable area.
Figure 8. Improvisation of the robot's behavior in an outdoor environment
Figure 9. (left) Portable motion capturing system with 18 gyro sensors. (right) Motion modification from the operator's body to the humanoid's
We adopted a wearable motion capturing system without such a restriction. The motion capture system used is "GypsyGyro", manufactured by Spice Inc. and Animazoo Inc. This capture device has 18 gyro sensors, which are attached to the operator's body as shown in Fig. 9. Each gyro sensor measures motion in three degrees of freedom and sends the measured data to the central unit via wireless transmission. The sampling rate is 120 Hz, the resolution is 0.03 deg, and the maximum measurement error is about 1 deg. These properties satisfy the requirements for use in daily-life environments and for imitation of the target behavior. Raw motion patterns performed by an operator cannot be applied to the humanoid directly, because the body configuration of the operator differs from that of the humanoid. Therefore, the positions and postures of both hands measured by the motion capturing device are applied to the humanoid's behavior, while inconsequential patterns such as the joint angles of the elbows are not used for the imitation. This arrangement of focus points for imitation resolves the body-configuration differences and yields effective handling and manipulation behaviors. We have confirmed that the humanoid can imitate several daily-life behaviors such as cleaning up and carrying a glass of water. Figure 10 shows the imitation behavior of carrying a glass of water. If the operator fails to keep the hand's posture, the robot modifies the failure using a model of the task, as shown in Fig. 11. The detailed methodology is discussed in [7].
6. Summary and Conclusions
In this paper, we introduced a concept of real-world oriented humanoid robots that can support humans' daily-life behavior. As a research platform, a humanoid robot HRP-2W
Figure 10. On-line motion teaching and modifying
Figure 11. On-line motion modification based on knowledge of tasks
which has a wheel module instead of legs, was developed. To realize supporting behavior in daily-life environments, the following functions are embedded as a basic library: 1) understanding of humans' behavior based on vision, 2) models of daily-life commodities for manipulation and handling, 3) memorization, learning, and recognition of time-series patterns of the environment and human behavior, and 4) acceptance of humans' instructions and operations. Through experiments, we showed the feasibility of the humanoid robot platform and the availability of the software architecture for project fusion. We are planning to develop a waiter robot for a simple restaurant environment based on these software modules and the platform. In such a complex environment, the modularity of the developed software architecture should prove effective.
References
[1] R. Bischoff and V. Graefe. HERMES - an intelligent humanoid robot, designed and tested for dependability. In Experimental Robotics VIII, B. Siciliano and P. Dario (eds.), Springer Tracts in Advanced Robotics 5, Springer, pages 64–74, 2003.
[2] S. Hashimoto, H. Takanobu, et al. Humanoid robots in Waseda University - Hadaly-2 and WABIAN. In Proc. of IEEE-RAS International Conference on Humanoid Robots, 2000.
[3] H. Ishiguro, T. Ono, M. Imai, T. Maeda, T. Kanda, and R. Nakatsu. Robovie: an interactive humanoid robot. Int'l Journal of Industrial Robotics, 28(6):498–503, 2001.
[4] S. Thrun, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Minerva: A second generation mobile tour-guide robot. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1999–2005, 1999.
[5] K. Okada, T. Ogura, A. Haneda, J. Fujimoto, F. Gravot, and M. Inaba. Humanoid motion generation system on HRP2-JSK for daily life environment. In Proc. of IEEE Int'l Conf. on Mechatronics and Automation, pages 1772–1777, 2005.
[6] N. Kojo, T. Inamura, and M. Inaba. Behavior induction by geometric relation between symbols of multi-sensory pattern. In Proc. of Int'l Conference on Intelligent Autonomous Systems, 2006.
[7] T. Inamura, N. Kojo, T. Sonoda, K. Sakamoto, K. Okada, and M. Inaba. Intent imitation using wearable motion capturing system with on-line teaching of task attention. In Proc. of IEEE-RAS Int'l Conf. on Humanoid Robots, pages 469–474, 2005.
Human Supporting Production Cell "Attentive Workbench"
Masao Sugi 1, Yusuke Tamura, Makoto Nikaido, Jun Ota, Tamio Arai, Kiyoshi Takamasu, Kiyoshi Kotani, Akio Yamamoto, Hiromasa Suzuki, Yoichi Sato, Fumihiko Kimura, and Seiichi Shin
The University of Tokyo
Abstract. We have proposed the "Attentive Workbench (AWB)", a new cell production system in which an intelligent system supports human workers. Recognizing the workers' conditions and intentions, the system supports them in both physical and informational aspects. This paper deals with physical assembly support using self-moving parts trays. The system delivers the necessary assembly parts to workers and quickly clears away finished products. For transporting large products, multiple trays are combined to form a rigid body that works as a single large tray. To realize this arrangement of trays in real time, a planning method based on a priority scheme and heuristic rules is proposed. Self-moving trays and a simple gesture-based interface are integrated to show a real example of deskwork support.
Keywords. Cell production systems, assembly support, human-robot interfaces, gesture recognition.
1. Introduction
In recent years, manufacturers have been expected to maintain variety in their product lines while retaining the ability to quickly produce appropriate products in adequate quantities. For this reason, instead of automated assembly lines, cell production systems have come into wide use in manufacturing environments. In a cell production system, a single human worker manually assembles each product from start to finish [2,3]. Multiple skilled human workers enable cell production systems to accommodate diversified products and production quantities more flexibly than fully automated manufacturing systems. There are, however, dynamic changes in the age distribution of the working population. With negative or zero population growth, together with the tendency of young people to avoid manufacturing jobs, we will face a shortage of skilled workers, and hence great difficulty in maintaining cell production systems. We have proposed the attentive workbench (AWB) [1]: an intelligent cell production system. Figure 1 shows a schematic view of the attentive workbench. The system recognizes the intentions and conditions of human workers through cameras and vital-signs monitors, and presents information through projectors. The system delivers assembly parts to the
1 Department of Precision Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656 Japan. Tel: +81 3 5841 6486; Fax: +81 3 5841 6487; Email: [email protected].
M. Sugi et al. / Human Supporting Production Cell “Attentive Workbench”
Figure 1. (a) Schematic view of attentive workbench (AWB). (b) Self-moving parts trays driven by Sawyer planar motors running on a passive iron platen. (c) Basic structure of a Sawyer planar motor and a platen with checkered grooves.
workers and clears away the finished products using self-moving parts trays. In these ways, the system supports human workers from both the informational and the physical side. As a consequence, the system will achieve a higher yield rate and productivity by reducing failures and the time spent picking up assembly parts, and it enables less experienced people to enter the workforce. As related work, Roland DG Corp. (Hamamatsu, Japan) uses a production cell that monitors assembly progress and presents information about the next assembly process [2]; a visual display is used there as the presentation device. Molineros and Sharma [4] and Reinhart and Patron [5] introduce augmented reality for information presentation using a semi-transparent head-mounted display. These studies deal with assembly support for human workers from the information side. Assembly support from the physical side seems to have drawn less attention. Hayakawa, Kitagishi, and Sugano [6] propose an assembly supporting system with a manipulator that grasps assembly parts to minimize wobbling. Raven Engineering Inc. (Bloomington, IN, U.S.A.) invented a new production cell called the "Raven Super Cell" [3], which combines two tables: a horseshoe-shaped fixed table and a round movable worktable powered by an automatic rotary indexer. This improvement of the assembly cell in spatial layout and mechanical function brings about higher productivity. The attentive workbench will also work effectively in home environments, where needs focus on responding to a wide variety of user activities, e.g. computer work, reading and writing documents, eating, and so forth. For this application, we have proposed a deskwork support system that uses a pointing-gesture interface. We have proposed a method for estimating the user's intention by integrating sensor observations with the history of the user's activity in the framework of a Bayesian network [7].
In section 2, the attentive workbench is outlined and the components of AWB are presented. In section 3, we focus on physical assembly support; planning for the self-moving trays is described, covering two kinds of planning, i.e. path planning for each tray and arrangement planning for multiple trays. In section 4, the self-moving trays and a simple
M. Sugi et al. / Human Supporting Production Cell “Attentive Workbench”
gesture-based interface are integrated to show a real example of physical support. We conclude this paper in section 5.
2. Overview of Attentive Workbench (AWB) The key technologies and devices of the attentive workbench consist of the following three items: Self-moving parts trays: In usual manufacturing systems, parts trays have no mobile mechanism of their own; they are transported by other manufacturing devices (e.g. belt conveyors). We introduce self-moving trays driven by a Sawyer-type 2-DOF stepping motor [8] (see Fig. 1-(b)), aiming for higher productivity than usual cell production systems. A self-moving parts tray carries assembly parts and supplies them to the human worker. An empty tray receives a product or a subassembly from the worker and clears it away. When the object to be transported is larger than a single tray, multiple trays are organized to form a rigid body working as a large single tray. The Sawyer planar motor is known for its high speed, high positioning accuracy, and high thrust. The motors move on a platen, an iron plate with checkered grooves filled with insulating plastic, as illustrated in Fig. 1-(c). The motors float on compressed air about 20 (μm) above the platen surface. Figure 1-(b) shows the implemented self-moving trays, each an 8 (cm) × 8 (cm) square, with a speed of up to about 80 (cm/s), a positioning accuracy of 40 (μm), and a maximum thrust of 40 (N). Currently three trays are available. EnhancedDesk: A desk-type human-computer interface with augmented reality proposed by Nakanishi, Sato, and Koike [10]. Users express their intentions by hand gestures, which the system recognizes using an infrared camera [9]; the system then presents information using an LCD projector or a plasma display. Estimation of the state of workers based on bio-measurement technologies: The state of human workers can be estimated from their heart rate and respiration, both measured by vital signs monitors. In this respect, Kotani et al. [11] have introduced a new method for analyzing respiratory sinus arrhythmia (RSA) with respect to respiratory phase.
3. Planning for Multiple Self-Moving Trays 3.1. Path planning for the self-moving trays We use the priority scheme [12] proposed by Erdmann and Lozano-Pérez for the path planning of the self-moving trays [1]. The robots (i.e. the self-moving trays in this case) are given priority values and plan their paths one after another in order of priority. When planning its path, each robot treats every robot with a higher priority value as an obstacle in the configuration-time space. The computational cost of this method is not exponential but linear in the number of robots; the method is therefore suitable for cases with a large number of robots. The configuration-time space is approximated as a discrete 3D space. The x–y plane is tessellated into square cells with a side of 8 (cm). For a tray with a moving speed of 8 (cm/s), it takes 1 (s) to move from a cell to one of its neighbors. Therefore
Figure 2. Two decisions necessary for tray arrangement. (a1) An example of tray selection. (a2) Another (but worse) example of tray selection. (b1) An example of start-goal pairing for the selected trays. (b2) Another example.
the temporal axis is divided into units of 1 (s). The total length of the temporal axis is set to 20 (s). For the path searching, we use breadth-first search, considering the completeness and optimality of this method. 3.2. Transporting Large Objects Using Multiple Trays When an object (a part or a product) is larger than a single tray, a group of trays is assembled to form a rigid body working as a large single tray. Referring to the shape and size of the object, the control system of the attentive workbench calculates the necessary number of trays and their arrangement for carrying it. 3.3. Tray Arrangement Problem The control system of the attentive workbench then gathers empty trays (i.e. trays not carrying any objects) according to this calculation. The control system should (a) select the necessary trays from all the empty trays, and (b) determine the goal position of each selected tray. After these two decisions, the control system can plan the paths of the trays. Hereafter we refer to the two decisions (a) and (b) as "tray selection" and "start-goal pairing," respectively. These two decisions considerably affect the path lengths of the trays and the time necessary to build an objective arrangement of trays. Fig. 2-(a1) and (a2) show two examples of tray selection for the same objective arrangement. The selection in Fig. 2-(a2) will bring worse performance (i.e. a longer time for the trays to build the objective arrangement) than that in Fig. 2-(a1), because the selected trays are located farther away. Fig. 2-(b1) and (b2) show two examples of start-goal pairing for the selected trays. The case in (b2) seems to bring worse performance than that in (b1), since conflicts between trays are inevitable in (b2). Having the freedom of tray selection and start-goal pairing, our problem is quite different from conventional multi-robot path planning (e.g. [13,14,15]), in which the goal position of each robot is given a priori.
Because the number of combinations of tray selection and start-goal pairing is very large, we cannot obtain the optimal solution in real time. For this problem, we use a method based on the priority scheme and heuristic rules.
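The priority scheme for path planning can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the grid size, time horizon, and function names are assumptions. Each tray is planned by breadth-first search in the discretized (x, y, t) configuration-time space, with the space-time cells of higher-priority trays treated as moving obstacles:

```python
from collections import deque

def plan_with_priorities(starts, goals, grid_w, grid_h, t_max=20):
    """Plan tray paths in priority order (index 0 = highest priority).

    Each tray is planned with BFS in the discrete (x, y, t) configuration
    time space; cells reserved by already-planned (higher-priority) trays
    act as obstacles."""
    reserved = set()          # space-time cells (x, y, t) taken by planned trays
    paths = []
    for start, goal in zip(starts, goals):
        prev = {(start[0], start[1], 0): None}
        queue = deque([(start[0], start[1], 0)])
        found = None
        while queue:
            x, y, t = queue.popleft()
            if (x, y) == goal:
                found = (x, y, t)
                break
            if t == t_max:
                continue
            # A tray may wait in place or move to a 4-neighbour cell per step.
            for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (x + dx, y + dy, t + 1)
                if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h):
                    continue
                if nxt in reserved or nxt in prev:
                    continue
                # Forbid swapping positions with a higher-priority tray.
                if (x, y, t + 1) in reserved and (x + dx, y + dy, t) in reserved:
                    continue
                prev[nxt] = (x, y, t)
                queue.append(nxt)
        if found is None:
            return None       # no conflict-free path within the time horizon
        path, node = [], found
        while node is not None:
            path.append(node[:2])
            node = prev[node]
        path.reverse()
        # Reserve the path, and the goal cell afterwards, for lower priorities.
        for t, cell in enumerate(path):
            reserved.add((cell[0], cell[1], t))
        for t in range(len(path), t_max + 1):
            reserved.add((path[-1][0], path[-1][1], t))
        paths.append(path)
    return paths
```

Because each tray searches only against a fixed set of reservations, the total cost grows linearly with the number of trays, as noted above.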
Figure 3. (a) Setting priorities to each goal position in the objective arrangement of trays. (b) Manhattan distance of each cell from a goal position with obstacles taken into account.
3.4. Tray Arrangement Planning Based on Priority Scheme and Heuristic Rule We extend the priority scheme mentioned in subsection 3.1 to arrange trays. The task manager determines the paths of the trays in the following two steps: 1) Set priorities for the goal positions of the trays in the objective arrangement (see Fig. 3-(a)). 2) In the order of the priorities of the goal positions, choose an empty tray to compose a start-goal pair. In step 1), in setting priorities for the goal positions, we employ a heuristic rule in which the priority numbers are assigned in order of Manhattan distance to the center of gravity of the trays, as shown in Fig. 3-(a). This rule reduces the occurrence of deadlock in path planning. Manhattan distance is used because the movement of the actual trays (see Fig. 1-(b)) is limited to the x or y direction, with no diagonal motion; this is due to a constraint of the trays' control boards, which do not support diagonal motion. In step 2), for start-goal pairing, we employ the following heuristic rule: select the tray nearest to each goal position in terms of Manhattan distance, with obstacles taken into account (i.e. shortest path length; see Fig. 3-(b)). Note that "obstacles" include other trays (i.e. trays not selected to compose the objective arrangement). By the two procedures above, a complete set of start-goal pairs is obtained. The actual paths are then calculated using the priority scheme described in subsection 3.1.
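The two heuristics above can be sketched as follows. The grid representation and function names are illustrative assumptions: goal positions are prioritized by Manhattan distance to the trays' center of gravity, and each goal is then paired with the nearest remaining empty tray found by obstacle-aware breadth-first search, so that unpaired trays act as obstacles as in Fig. 3-(b):

```python
from collections import deque

def nearest_tray(goal, trays, grid_w, grid_h):
    """BFS outward from `goal` over free cells.

    Tray cells are targets that also terminate the search branch (they
    cannot be traversed), so the first tray reached is the nearest one in
    obstacle-aware Manhattan (x/y only, no diagonals) path length."""
    seen = {goal}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h) or nxt in seen:
                continue
            seen.add(nxt)
            if nxt in trays:
                return nxt          # nearest remaining empty tray
            queue.append(nxt)
    return None

def arrange_trays(goal_cells, empty_trays, grid_w, grid_h):
    """Tray selection and start-goal pairing with the two heuristic rules:
    goals ordered by Manhattan distance to the trays' centre of gravity,
    each goal paired with the nearest remaining tray."""
    cgx = sum(x for x, _ in empty_trays) / len(empty_trays)
    cgy = sum(y for _, y in empty_trays) / len(empty_trays)
    ordered = sorted(goal_cells, key=lambda g: abs(g[0] - cgx) + abs(g[1] - cgy))
    remaining = set(empty_trays)
    pairs = []
    for goal in ordered:
        tray = nearest_tray(goal, remaining, grid_w, grid_h)
        if tray is None:
            return None             # not enough reachable empty trays
        remaining.discard(tray)
        pairs.append((tray, goal))  # (start, goal) for the path planner
    return pairs
```

The resulting (start, goal) pairs are then handed to the prioritized path planner of subsection 3.1.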
4. Demonstration of Physical Assembly Support We have integrated the self-moving trays and the gesture-based interface into the prototype of the attentive workbench. Using this system, we demonstrated physical deskwork support, as shown in Fig. 4. In the demonstration, a human worker assembles a toy car made of LEGO, composed of four wheels, two wheel shafts, and a body, shown in Fig. 5-(a). Although we employed an infrared camera for gesture recognition in [7], it is not suitable for this implementation due to the heat generated by the Sawyer planar motors. In this paper, we use a color camera along with the colored glove shown in Fig. 5-(b). The glove has three colors: blue for the tip of the index finger, red for the tip of the thumb, and yellow for the rest of the fingers and the palm. The process of the demonstration is:
1. The user points at a necessary tray. The tray moves toward the worker.
2. The user picks up assembly parts from the tray and orders it to go back to its home position.
3. Processes 1 and 2 are repeated until the assembly is finished.
4. When the assembly is finished, the trays clear away the finished product. If the product is larger than a single tray, two trays are used to transport it.
Figure 4. Demonstration of physical assembly support.
Fig. 4-(a–h) shows the progress of the demonstration. In Fig. 4-(a), there are three self-moving trays, in which three kinds of parts (wheels, wheel shafts, and a body) are stored. In Fig. 4-(b), the user selects a tray necessary for his work by pointing his forefinger at it. In Fig. 4-(c), the user specifies the goal position for the selected tray, and the tray moves toward its goal position. The user picks up car wheels from the tray in Fig. 4-(d). In Fig. 4-(e), the user orders the empty tray to go back to its home position by pointing his forefinger at it. The tray is going back to its home position in Fig. 4-(f). The assembly is finished in Fig. 4-(g), and the user hands over the finished product to the trays. The product is larger than each tray; therefore two trays are used to transport it. In the above demonstration, the pointing gesture has two different meanings: one is to show a direction, and the other is to show a specific position on the desk. The former
Figure 5. (a) A toy car made of LEGO. (b) Pointing gesture showing the pointing direction. (c) Pointing gesture showing a specific position. (d) Foot switch.
usage is shown in Fig. 4-(b), and the latter in Fig. 4-(c) and (e). To avoid ambiguity, we introduced the two kinds of gestures shown in Fig. 5-(b,c), which correspond one-to-one to these two meanings. Although the colored glove reduces the occurrence of recognition errors, errors still occur when the system mistakes a worker's non-intentional motion for an intentional pointing gesture. To avoid this kind of error, a footswitch, shown in Fig. 5-(d), is introduced as an additional input device. When making a gesture, the user steps on the footswitch. The on/off information from the footswitch helps the system recognize the current status of the user (making a gesture or assembling parts) unambiguously.
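The footswitch gating can be summarized in a small dispatch function. This is a sketch only; the gesture labels are hypothetical placeholders for the two pointing gestures of Fig. 5-(b,c):

```python
def interpret_input(footswitch_on, gesture):
    """Gate gesture recognition with the footswitch state.

    A recognized pointing gesture counts as an intentional command only
    while the footswitch is pressed; otherwise the user is assumed to be
    assembling parts and the motion is ignored."""
    if not footswitch_on:
        return ("assembling", None)       # ignore non-intentional motion
    if gesture == "point_direction":      # select a tray by pointing at it
        return ("select_tray", gesture)
    if gesture == "point_position":       # specify a goal position on the desk
        return ("set_goal", gesture)
    return ("waiting", None)              # switch pressed, no gesture yet
```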
5. Conclusions We have presented the attentive workbench (AWB), a new cell production system in which an intelligent system supports human workers from both the information and the physical side. The present system enables us to cope with dynamically diversifying consumer demands with fewer workers. In this paper, the attentive workbench has been outlined first. AWB consists of an augmented human-computer interface, vital signs monitors, and self-moving parts trays driven by planar motors. Next, we have dealt with physical assembly support using the self-moving trays. The trays supply assembly parts to the workers and clear away the finished products. Multiple trays are gathered to transport large objects; for this gathering, we have proposed a planning method based on the priority scheme and heuristic rules. Integrating the prototype of the attentive workbench with the gesture-based interface, we demonstrated physical deskwork support. In the next stage, the attentive workbench will be evaluated in real assembly experiments. The present system will be compared with a conventional assembly cell with respect to both sensory evaluation by workers and productivity. We are currently constructing the second generation of the prototype system.
Acknowledgments This research is partially supported by Grant-in-Aid for COE Research “Information Science and Technology Strategic Core” and for Young Scientists (B) (No. 16700165), both
from the Ministry of Education, Culture, Sports, Science and Technology, and by the Electro-Mechanic Technology Advancing Foundation. The authors thank Dr. Kenji Oka, formerly a student at the Institute of Industrial Science, the University of Tokyo, for his useful suggestions and comments.
References
[1] M. Sugi, M. Nikaido, Y. Tamura, J. Ota, T. Arai, K. Kotani, K. Takamasu, S. Shin, H. Suzuki, Y. Sato, "Motion Control of Self-Moving Trays for Human Supporting Production Cell 'Attentive Workbench'," Proc. 2005 IEEE Int'l Conf. Robotics and Automation, pp. 4091–4096, 2005.
[2] S. Seki, "One by One Production in the 'Digital Yatai' — Practical Use of 3D-CAD Data in the Fabrication," J. Japan Society of Mechanical Engineering, Vol. 106, No. 1013, pp. 32–36, 2003 (in Japanese).
[3] G. Vasilash, "Cell for Assembly," Automotive Manufacturing & Production, June 1999.
[4] J. Molineros, R. Sharma, "Computer Vision for Guiding Manual Assembly," Proc. 4th IEEE Int'l Symp. Assembly and Task Planning, pp. 362–368, 2001.
[5] C. Reinhart, C. Patron, "Integrating Augmented Reality in the Assembly Domain — Fundamentals, Benefits and Applications," Ann. CIRP, Vol. 52/1/2003, pp. 5–8, 2003.
[6] Y. Hayakawa, I. Kitagishi, S. Sugano, "Human Intention Based Physical Support Robot in Assembling Work," Proc. 1998 IEEE/RSJ Int'l Conf. Intelligent Robots and Systems, pp. 930–935, 1998.
[7] Y. Tamura, M. Sugi, J. Ota, T. Arai, "Deskwork Support System Based on the Estimation of Human Intention," Proc. 13th IEEE Int'l Workshop on Robot and Human Interactive Communication, pp. 413–418, 2004.
[8] B.A. Sawyer, "Magnetic Positioning Device," US patent 3,457,482, 1969.
[9] K. Oka, Y. Sato, H. Koike, "Real-Time Fingertip Tracking and Gesture Recognition," IEEE Computer Graphics and Applications, Vol. 22, No. 6, pp. 64–71, 2002.
[10] Y. Nakanishi, Y. Sato, H. Koike, "EnhancedDesk and EnhancedWall: Augmented Desk and Wall Interfaces with Real-Time Tracking of User's Motion," Proc. Ubicomp2002 Workshop on Collaborations with Interactive Walls and Tables, pp. 27–30, 2002.
[11] K. Kotani, I. Hidaka, Y. Yamamoto, S. Ozono, "Analysis of Respiratory Sinus Arrhythmia with Respect to Respiratory Phase," Methods of Information in Medicine, Vol. 39, pp. 153–156, 2000.
[12] M. Erdmann, T. Lozano-Pérez, "On Multiple Moving Objects," Algorithmica, Vol. 2, pp. 477–521, 1987.
[13] S.J. Buckley, "Fast Motion Planning for Multiple Moving Robots," Proc. 1989 IEEE Int'l Conf. Robotics and Automation, pp. 322–326, 1989.
[14] T. Yoshioka, H. Noborio, "Sensor-based Traffic Rules for Multiple Automata Based on a Geometric Deadlock-free Characteristic," J. Robotics and Mechatronics, Vol. 8, No. 1, pp. 48–56, 1996.
[15] T. Arai, J. Ota, "Motion Planning of Multiple Mobile Robots using Virtual Impedance," J. Robotics and Mechatronics, Vol. 8, No. 1, pp. 67–74, 1996.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A foveal 3D laser scanner integrating texture into range data Marcus Walther, Peter Steinhaus and Rüdiger Dillmann Institute for Computer Science and Engineering Universität Karlsruhe email: {mwalther,steinhaus,dillmann}@ira.uka.de Abstract. The acquisition of dense point clouds or triangulated surface data with 3D laser range scanners is by now a widely discussed topic. Especially in the areas of world modelling and autonomous navigation, reliable 3D data is necessary. Mobile system applications like collision avoidance or simultaneous localisation and mapping (SLAM) require high data rates, dense point clouds, constant availability of data, and exact single scans. Commercial 3D laser scanners are mostly expensive, not mobile, too slow, or not exact enough. We present in this article a 3D laser range scanner based on a commercial 2D scanner with a movable origin. In contrast to other "self-built" 3D scanners, our scanner reaches a high data density in the heading direction and a lower density at the boundaries (foveal vision), which is important for autonomous navigation. Keywords. 3D vision, depth image, texture integration, world modelling
1. Introduction In autonomous navigation, collision avoidance and world modeling are still major topics on the way towards real autonomy. An open question is how to collect, arrange, and store environment information suitably. Depending on the task, different types and amounts of data have to be collected. In general, dense 3D point clouds suit most applications well. Hence the task for the robot's sensory equipment is as follows:
• Scanning of a wide region of the surrounding space
• Scanner range from some centimeters up to several meters
• High accuracy of the laser scanner
• High scanning velocity
• Possibility to configure the device for different resolutions of the point cloud
• Easy and robust construction
• Foveal vision for having the highest resolution in the heading direction
Different variations of moving 2D scanners have already been proposed. In [1], [2], [3], [4], [5] or [6], single scan lines are taken with a horizontal movement. This approach can generate a good model of the environment, but cannot be used for collision avoidance. Depth-image-based approaches can be found in [7], [8], [9]; here, first steps towards autonomy or semi-autonomy are taken. The acquired depth images, however, show a high resolution in (for the robot)
M. Walther et al. / A Foveal 3D Laser Scanner Integrating Texture into Range Data
uninteresting areas like the ceiling [7] or the areas lateral to the robot [8]. As stated above, it is important for us to have the highest resolution in the heading direction of the robot, especially for collision avoidance and model building with a single sensor device. The currently commercially available 3D scanners have a very small viewing cone, which is bad for registration, and do not support foveal vision. To overcome these problems we developed a 3D sensor called RoSi (Rotating Sick), a common Sick LMS 200 scanner rotating continuously around its optical axis. This setup results in the foveal resolution of the point cloud. To improve navigation capabilities we also detect remission values and integrate texture and colour information into the model. The remainder of the article is organized as follows. Chapter 2 describes the setup of the RoSi scanner. Chapter 3 shows the implemented texture integration. In chapter 4 we show some experimental results.
2. The RoSi Scanner 2.1. Hardware Setup The RoSi scanner consists of a commercial Sick LMS 200 which is attached to a rotation axis. In contrast to the known systems, the rotational axis lies in the middle of the scanner's field of view, and hence in the observed area. To rotate the scanner we use a common DC motor. In figure 1a the setup of the first RoSi scanner (RoSi I) is depicted. A 24 V DC motor is used with an epicyclic gear and an optical encoder. For power and data transmission, a 10-channel rotary feedthrough with slip rings is used. The optical encoder has a resolution of 2000 impulses per revolution. With the 36:1 gear, a theoretical rotatory resolution of 0.005° can be achieved. Since the motor, feedthrough, and gear are oversized, a smaller setup was developed (figure 1b). Here a small 24 V
Figure 1. RoSi I (left), RoSi II (middle) and the corresponding coordinate system (right)
DC motor is used with a commercial 7-channel rotary feedthrough. Since we use a Sick LMS, the depth resolution lies within a ±2 cm boundary. The angular resolution depends on the rotational velocity of the scanner, but also on the configuration of the LMS. It is possible to configure the LMS resolution to 0.25°, 0.5° or 1°. To get an angular resolution of 1° in both angular directions, the rotational velocity vrot has to be (assuming a single scan takes tscan = 15 ms)

vrot = 1°/tscan = 180°/(NrofDeg · tscan) ≈ 67°/s
since the scanner acquires a full model after NrofDeg = 180°. In that case, a full point cloud acquisition takes

T = 180°/vrot ≈ 2.7 s

with a depth image consisting of about 32000 single points. It can easily be seen that the spatial resolution in the center tends to infinity, while the resolution gets lower towards the boundaries of the depth image, even though the angular resolution is still 1° in both angular directions. The scanning cone is variable: 100° or 180° (the latter results in a complete hemisphere). The maximal range is 8 m (for indoor applications) or 80 m (for outdoor applications). The impulses from the optical encoder are counted by a microcontroller and sent to the PC via a digital I/O interface card. 2.2. Spatial Data Acquisition With the coordinate system shown in fig. 1c, a horizontal scan point without rotation would have the coordinates

xi = cos(αi) · ri
yi = 0
zi = sin(αi) · ri

with αi as scanner angle and ri as corresponding distance. Taking into account the additional rotation around the optical axis leads to

xi = cos(β) · cos(αi) · ri
yi = sin(β) · cos(αi) · ri
zi = sin(αi) · ri

with β as the rotation angle taken after the scan. Since the scanner needs some time tscan to do a single line scan, the rotation angle varies within one scan line. Taking this into account leads to

xi = cos(β − (1 − i/MaxVal) · vrot · tscan) · cos(αi) · ri
yi = sin(β − (1 − i/MaxVal) · vrot · tscan) · cos(αi) · ri
zi = sin(αi) · ri

with MaxVal as the number of maximal values per line scan (e.g. 180). 2.3. Sample Depth Images Figure 2 shows a sample depth image with 80 m maximum range and the corresponding original. Figure 3 shows several point clouds (left column) taken with different configurations. The same data triangulated as a surface model (see chapter 4) is depicted in the right column. Each point cloud was acquired with a maximum range of 8 m. The rotational velocity vrot was kept constant.
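The coordinate equations of section 2.2 translate directly into code. The following sketch (units of degrees and metres are an assumption, as are the function and parameter names) converts one scan line into 3D points, including the correction for the rotation that continues during the line scan:

```python
import math

def scan_to_points(scan, beta, v_rot, t_scan=0.015, max_val=180):
    """Convert one RoSi scan line into 3D points.

    `scan` is a list of (alpha_i, r_i) pairs (scanner angle in degrees,
    range in metres); `beta` is the rotation angle in degrees read *after*
    the scan.  Because the scanner keeps rotating during the line scan,
    measurement i uses the corrected rotation angle
    beta_i = beta - (1 - i / max_val) * v_rot * t_scan."""
    points = []
    for i, (alpha, r) in enumerate(scan):
        b = math.radians(beta - (1 - i / max_val) * v_rot * t_scan)
        a = math.radians(alpha)
        points.append((math.cos(b) * math.cos(a) * r,
                       math.sin(b) * math.cos(a) * r,
                       math.sin(a) * r))
    return points
```

With v_rot = 0 the correction vanishes and the formula reduces to the simple rotated-scan-point equations above.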
Figure 2. Sample with 80m range (original left, triangulated point cloud right)
3. Texture integration The texture of a scene is mapped onto the model as follows: the texture image is acquired with a digital camera, which is mounted below the RoSi sensor and has a fixed but unknown offset and orientation relative to the scanner (see figure 4a). Therefore, a calibration between the 3D coordinate system of the RoSi sensor and the 2D coordinate system of the digital camera is necessary. In the first step, the point cloud is triangulated, so that every triangle patch can be filled with texture from the grabbed image. The triangulation algorithm uses known neighborship relations between the points, due to the specific architecture of the sensor system. The triangles are built by connecting two neighboring points in the same scan row (Pi,j and Pi+1,j) with the corresponding point in the next row (Pi+1,j+1) (respectively the former row, Pi,j−1; see fig. 4b). A variable range threshold prevents the system from triangulating areas which do not belong to a reflected object. The texture is mapped onto every triangle by calculating the corresponding 2D image pixel for every 3D point belonging to the triangle. Therefore a transformation from the scanner coordinate system SCS to the camera coordinate system CCS is necessary. An intermediate coordinate system ICS is used to connect these two coordinate systems. The ICS is located in the lower left front corner of a calibration object, a box which contains 16 pin heads on two different levels (see fig. 5a). A transformation CS is calculated to transform the SCS to the ICS, and another transformation CC is estimated to transform the ICS to the CCS. The complete transformation is Ccomplete = CC · CS. As soon as the transformation matrix Ccomplete is computed, the calibration object is not needed anymore. Contrary to the camera transformation, the scanner transformation is determined directly from three intersecting orthogonal planes: the left side, the floor, and the back plane of the calibration object.
Due to the low number of laser measurements on the pin heads and the scanner uncertainty, the pin heads are not used here. The determination of these three planes is interactive: the user selects three points on each plane, and the calibration algorithm calculates the three planes from these points. To take the scanner uncertainty into account, it uses the mean value of a number of surrounding points which are less distant than a threshold DMAX. The mutual intersections between two planes determine the axes of the ICS. The origin of the ICS is determined by the intersection of the three planes and by the known distance of the back wall from the front corner (see figure 5b). The camera calibration matrix CC is calculated via the 16 measurement points within the calibration object. In a first step, a color image segmentation of the pin heads is performed, and a connected components labeling algorithm is executed on the resulting binary image. The image coordinates of the pin head center points make a computation of the calibration matrix CC possible. With the two calibration matrices, the scanner-to-camera transformation matrix Ccomplete is given. A triangle point can be calculated in image coordinates as

xc = Ccomplete · xs

Note that if the texture coordinates xc lie outside the field of view of the color camera FOVcc, the texture of the corresponding 3D triangle point xs will remain undefined. Also, it is possible for the transformation Ccomplete to map distinct 3D points onto the same texture coordinate; the color camera then sees only the texture value I(xc) of the point closest to its origin Occ. The other 3D points are occluded and must therefore not be textured from this view. Therefore, the algorithm for single-view texture assignment is summarized by the following formula, which is applied to all triangle points:

I(xs) = I(xc)      if ||xs − Occ|| = min ||x − Occ||
I(xs) = undefined  otherwise

where the minimum is taken over all 3D points x that map to the same texture coordinate xc.

Figure 3. Depth images with all variations in configuration concerning opening angle and scanner angular resolution (from top to bottom): 100°/0.25°, 100°/0.5°, 100°/1.0°, 180°/0.5°, 180°/1.0°
Figure 4. The camera of the RoSi scanner (left) and a visualization of the triangulation algorithm
Figure 5. Calibration object (left) and found planes (right)
Nevertheless, the same point xs that remains untextured from this view may receive its texture from another view in which it is not occluded by another object. To cope with this problem, a method similar to [10] was implemented. Here, different memory areas
with the size of the camera image are allocated to store distance information and the corresponding triangle. In this way it is easy to determine the closest point for a specific texture value and to delete texture information from occluded areas.
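The distance-buffer idea can be sketched as follows. The `project` callback, the image layout, and all names here are illustrative assumptions, not the authors' implementation: a per-pixel buffer keeps only the 3D point closest to the camera origin, and every other point mapping to the same pixel stays untextured (undefined):

```python
import math

def assign_textures(points, project, camera_origin, image):
    """Single-view texture assignment with occlusion handling.

    `project(p)` maps a 3D point to integer pixel coordinates (col, row),
    or None when the point lies outside the camera's field of view.  A
    distance buffer the size of the camera image keeps, per pixel, only
    the point closest to the camera origin; all other points mapping to
    that pixel remain untextured."""
    height, width = len(image), len(image[0])
    zbuf = {}                                   # pixel -> (distance, point index)
    for idx, p in enumerate(points):
        px = project(p)
        if px is None or not (0 <= px[0] < width and 0 <= px[1] < height):
            continue                            # outside the field of view
        d = math.dist(p, camera_origin)
        if px not in zbuf or d < zbuf[px][0]:
            zbuf[px] = (d, idx)                 # keep the closest point only
    textures = [None] * len(points)             # None = undefined texture
    for (col, row), (_, idx) in zbuf.items():
        textures[idx] = image[row][col]
    return textures
```

Points left with an undefined texture here may still be textured from another view, as described above.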
4. Experimental Results Figure 6 depicts an occlusion situation. It can be seen that the areas which lie out of the camera's view stay untextured. The bottle with the yellow cap occludes the wall, and the texture of the bottle is incorrectly projected onto the wall (a). The occluded area is marked in red (b) and removed (c). Figure 6d shows another occlusion situation. The occlusion is removed (e), or an estimated texture is put on the undefined areas (f). The texture is estimated using a mean value between the last valid point before and the first valid point after the occlusion.
Figure 6. Occlusion, its detection and removal (a,b,c), and another occlusion, its removal, and an estimated texture for the untextured area (d,e,f)
Acknowledgments The work described in this paper was conducted within the EU Integrated Project COGNIRON ("The Cognitive Robot Companion", see http://www.cogniron.com) and was funded by the European Commission Division FP6-IST Future and Emerging Technologies under Contract FP6-002020.
References
[1] C. Früh, Automated 3D Model Generation for Urban Environments, PhD thesis, Universität Karlsruhe, 2002.
[2] D. Hähnel, W. Burgard, S. Thrun, "Learning Compact 3D Models of Indoor and Outdoor Environments with a Mobile Robot," in IJCAI, 2001.
[3] S. Kristensen, P. Jensfelt, "Active Global Localisation for a Mobile Robot Using Multiple Hypothesis Tracking," in Proceedings of the IJCAI-99 Workshop on Reasoning with Uncertainty in Robot Navigation, pp. 13–22, Stockholm, Sweden, 1999.
[4] S. Thrun, W. Burgard, D. Fox, "A Real-Time Algorithm for Mobile Robot Mapping with Applications to Multi-Robot and 3D Mapping," in IEEE International Conference on Robotics and Automation, San Francisco, 2000.
[5] S.F. El-Hakim, P. Boulanger, F. Blais, J.A. Beraldin, G. Roth, "A Mobile System for Indoor 3-D Mapping and Positioning," in Proceedings of the International Symposium on Real-Time Imaging and Dynamic Analysis, 1998.
[6] A. Walthelm, A.M. Momlouk, "Multisensoric Active Spatial Exploration and Modeling," Dynamische Perzeption: Workshop der GI-Fachgruppe 1.0.4, pp. 141–146, Ulm, 2000.
[7] O. Wulf, B. Wagner, M. Khalaf-Allah, "Using 3D Data for Monte Carlo Localization in Complex Indoor Environments," European Conference on Mobile Robotics (ECMR), Ancona, 2005.
[8] A. Nüchter, K. Lingemann, J. Hertzberg, H. Surmann, "Heuristic-Based Laser Scan Matching for Outdoor 6D SLAM," in KI 2005: Advances in Artificial Intelligence, 28th Annual German Conference on AI, Proceedings, Springer LNAI, Koblenz, Germany, September 2005.
[9] H. Andreasson, R. Triebel, W. Burgard, "Improving Plane Extraction from 3D Data by Fusing Laser Data and Vision," in Proc. of the International Conference on Intelligent Robots and Systems (IROS), 2005.
[10] K. Ng et al., "An Integrated Multi-Sensory System for Photo-Realistic 3D Scene Reconstruction," School of Computer Studies, University of Leeds, and European Commission Joint Research Centre.
Autonomous Collaborative Environment for Project Based Learning Mihoko Otake, Ryo Fukano, Shinji Sako, Masao Sugi, Kiyoshi Kotani, Junya Hayashi, Hiroshi Noguchi, Ryuichi Yoneda, Kenjiro Taura, Nobuyuki Otsu, and Tomomasa Sato Division of Project Coordination, The University of Tokyo; PRESTO program, Japan Science and Technology Agency; Graduate School of Information Science and Technology, The University of Tokyo; Graduate School of Engineering, The University of Tokyo; National Institute of Advanced Industrial Science and Technology Abstract. The importance of integrating different disciplines of information science and technology is growing for the realization of information systems that work robustly in the real world. Such systems should be achieved by integrating super-robust algorithms, large-scale dependable networks, and devices or interfaces which sense and/or actuate in the real world. It is necessary to encourage collaboration among specialists of different disciplines, since it is difficult for any one person or group to cover all areas of information science and technology. In order to support collaboration among various kinds of experts, an autonomous collaborative environment for project based learning was developed. The environment comprises a workshop protocol and a community site. The workshop, named "A Hundred Hour Workshop", for graduate students who belong to different departments, was carried out during the summer holidays. The community site, named "WS100H.NET", was developed for supporting and analyzing the autonomous collaboration process. A novel interdisciplinary technology, a fusion of recognition and parallel computation, was successfully developed. The collaboration process among the participants was autonomous, with minimal facilitation by the tutor. Keywords. computer supported collaborative learning (CSCL), workshop, project based learning (PBL), contents management system (CMS), real world information system
1. Introduction
Information technology of the 20th century, centering on information equipment represented by the Internet and personal computers, is about to reach a turning point after the start of the new millennium. The real-world-based information science and technology of the 21st century should stand on a wide range of research areas, namely computer science, mathematical informatics, information physics and computing, information and communication engineering, mechano-informatics, and precision engineering. Information systems of the next generation should be realized by fusing the different disciplines of information science and technology, which are studied in different departments. A collaborative learning environment will promote cross-disciplinary research projects, and the participation of graduate students with flexibility and mobility would accelerate such projects. The goal of this study is to develop a collaborative learning environment which supports PBL for such next-generation information systems. PBL stands for People-, Problem-, Process-, Product-, and Project-Based Learning [1]. It originates from medical education [2,3], and nowadays is widely used in engineering education. The PBL Lab was created to provide a learning environment where architecture/engineering/construction students collaborate in multi-disciplinary, geographically-distributed teams on project-centered activities that produce a product for an industry client [4]. In PBL, cross-disciplinary learning is conducted as a journey from a state of islands of knowledge (discipline-centric) to a state of understanding the goals, language, and representations of the other disciplines. Students learn through solving problems and reflecting on their experience. They work in small groups guided by a facilitator, whose role is to guide students through the learning process, push them to think deeply, and model the kinds of questions that students need to be asking themselves [5]. The role of the tutor in a PBL learning environment is thus changing, from the traditional teacher who delivers the course material in class to a coach. In order to support such collaborative learning processes, a community known as computer-supported collaborative learning (CSCL) [6,7,8] has emerged in the past decade among researchers working on computer-assisted learning [9]. One of the aims of CSCL is to promote mutual learning through interactions and discussions among learners [10].
Correspondence to: M. Otake, E-mail: [email protected]
Advanced technology can facilitate the students' collaborative effort, but does not, by itself, provide sufficient support without appropriate pedagogical arrangements and scaffolding of the collaborative learning endeavor [11]. Therefore, the goals and strategies of expert facilitators have been studied, as well as how to implement systems with built-in facilitation functions [5]. In this study, an autonomous collaborative environment for project based learning was developed which respects the initiative of the participants in order to bring out autonomous collaboration. The workshop named "A Hundred Hour Workshop" for graduate students was carried out during the summer holidays by some of the members of the real world information systems project in the 21st Century COE program "Information Science and Technology Strategic Core" at the Graduate School of Information Science and Technology of the University of Tokyo [12]. The community site named "WS100H.NET" [13] was also developed for supporting and analyzing the autonomous collaboration process. The originality of this research lies in proposing a set of protocols as well as a platform which shapes a collaborative learning environment with practical embodiments.
2. A Hundred Hour Workshop
An activity-centered workshop protocol was designed and conducted. The authors of this paper formed the committee, and the first author played the role of facilitator, instructor, tutor, or coach of the workshop; these terms are used interchangeably in this paper. The committee set the deadline and called for collaborative project proposals for the workshop. The intended applicants were groups of graduate students belonging to different labs of different departments of the Graduate School of Information Science and Technology and the Graduate School of Engineering. The committee provided research
Figure 1. The community site named WS100H.NET: it was developed to support the autonomous collaboration process of the workshop.
funding of about a thousand dollars per team, which roughly equals the salary of a teaching assistant for a hundred hours at the university; hence the name "A Hundred Hour Workshop". The participants had to develop cross-disciplinary technology demonstrating the information system of the near future based on their proposals, and had the responsibility and obligation to show their results at the final presentation. The application period was the 15 days after the orientation: the orientation was on July 22, and the application period ran from July 25 to August 8. The kickoff was the day after the application deadline, on August 9. The development period started the day after the kickoff and lasted 52 days, up to the final presentation on September 30.
3. The Community Site, WS100H.NET
The community site named WS100H.NET [13] was developed for supporting the autonomous collaboration process of the workshop (Figure 1). We expected it to accumulate the empirical knowledge which usually remains inside the participants. The site is based on XOOPS [14], a typical content management system. XOOPS allows administrators to easily create dynamic websites, and is a practical tool for developing small to large community websites, intra-company portals, corporate portals, and weblogs. XOOPS uses a relational database (MySQL) to store the data required for running a web-based content management system, and modules can be installed, uninstalled, activated, and deactivated with a click using the XOOPS module administration system. We installed the following modules: forum, album, calendar, news, link, download, wiki, FAQ, and questionnaire. Among them, participants mainly submitted their contents to the forum, album, and calendar modules; in the other modules, the tutor submitted the contents beforehand. The contents submitted by the committee are described below. On the news module, the tutor announced the orientation, kickoff, and final presentation schedules and the acceptance
Figure 2. One of the results of the questionnaire to the participants: the reasons for participation in the workshop were skill improvement, networking, collaboration experience, part-time job, and research interests.
of the proposals. On the link module, the committee offered useful links for the participants. On the download module, the tutor posted application forms and documents. On the wiki module, the committee offered background information on the project. On the FAQ module, the tutor offered general information for students and professors interested in the projects. On the questionnaire module, the tutor prepared questions for the participants. The contents submitted by the participants are summarized below. On the forum module, three forums were created for participants: entry, progress, and meeting minutes. The album was used to record the meetings and presentations, and the calendar for announcing the meeting schedules. The site offered participants a place for submitting self-introductions, meeting information, and progress reports.
4. Results
4.1. Participants of the Workshop
Fifty people attended the orientation, and 13 students participated in the workshop, forming two teams: Team A with 7 members and Team B with 6 members. Members of both teams belong to either Lab X or Lab Y: the research area of Lab X, in the department of information and communication engineering, is parallel computation; Lab Y has been working on recognition in the department of mechano-informatics, a cross-disciplinary area of informatics and mechanical engineering. Four students of Team A belong to Lab X and three to Lab Y, while Team B is split evenly between the two labs. As for the grade of the students, one was in the fourth year of the bachelor course, six were in the master course, and four were in the doctor course. Both teams tried to speed up motion recognition software based on the Cubic Higher-order Local Auto-Correlation (CHLAC) feature extraction algorithm [15] through parallel computation in a GRID computing environment [16]. The goal of Team A was to speed up the recognition software so as to operate in real time; the goal of Team B was to improve the accuracy of recognition, which takes an unrealistically long time without parallel computation. Team A developed a human motion recognition system which can discriminate strong and weak punching and kicking motions, while Team B developed a human gait recognition system which can identify both the person and the motion from time series of image data. The development was a big success.
Figure 3. Numbers of meetings and attendees: the horizontal axis shows the date of each meeting while the vertical axis represents the number of participants from each lab at each meeting.
It is very important to clarify which aspects of the workshop are attractive to students, in order to broaden the reach of the workshop. Thus, we sent out a questionnaire asking the participants about their motivation. The items of the questionnaire were skill improvement, networking, collaboration experience, part-time job, and research interests; the workshop appealed to participants in this order (Figure 2).
4.2. Emergence of Autonomous Collaboration
The day of the final presentation was fixed by the committee, 52 days after the kickoff, and the tutor directed the students to decide the rehearsal date themselves. Meetings were held almost once a week, on days 2, 9, 13, 20, 23, 27, 30, and 44 from the start. There were eight meetings in total: six for discussion, one for data acquisition, and one for rehearsal. They were scheduled autonomously, and there was no intervention by the tutor during these meetings; only students participated, except for one rehearsal day. The tutor answered questions and requests via e-mail. Between six and twelve of the thirteen students attended each meeting (Figure 3).
4.3. Submission of Contents
The numbers of submissions are shown in Figure 4. There were a total of 130 submissions before the presentation: 73 articles were posted to the forums, 62 of them by participants; 17 schedules were posted to the calendar, 7 of them by participants; and participants submitted 25 of the 40 photos in total. All members submitted to the forums. One of the doctor course third year students played an important role in submitting photos to the album, meeting schedules to the calendar, and minutes to the meeting forum, all of which were registered smoothly. This assignment of roles emerged autonomously. The tutor continued asking the students to register entry data in the self-introduction forum until all participants had submitted their articles.
Everyday tasks were submitted to the progress forum by the active participants. Once the first record was registered, its author submitted further records successively. We have to keep in mind that each article sometimes contains multiple days of records, although only the numbers of articles were counted. Also, the numbers of photos were counted for the albums,
Figure 4. Numbers of submissions to the community site: the total number of submissions was 130; 73 articles were posted to forums, 62 of them by participants; 17 schedules were posted to the calendar, 7 of them by participants; participants submitted 25 of the 40 photos in total.
and the numbers of schedules were counted for the calendars; both were easy to submit compared to the forum articles. The tutor could observe the development of the project through the content management system without attending the meetings, and could also comprehend the tasks of each participant. This kind of process is not usually recorded in real time, which limits the reproducibility and accumulation of the empirical knowledge. These recorded processes are good references for people who want to practice project based learning workshops.
5. Discussion
Did Students Participate Autonomously? The answer to this question is partially yes and partially no. Some of the participants were acquaintances of the tutor, who invited them to participate in the workshop; this process was not autonomous. These students then called for participation among their colleagues, who got interested and participated; this process was autonomous. Three other groups of orientation attendees also considered entering the workshop, but gave up because of their research schedules. In order to increase the number of participants, the workshop committee should advertise the workshop more effectively. We sent out a questionnaire in order to clarify what attracted the participants. The results show that the major motivations for participation were skill acquisition, networking, and collaboration experience. Funding was not the main reason for participation, although it attracted many students to the orientation; it is effective for attracting public interest in the projects, but not sufficient for participation. We should emphasize the merits of skill acquisition, networking, and collaboration experience to attract more students at the beginning.
Did Autonomous Collaboration Emerge? Yes. The progress of the project was extremely autonomous. The committee determined the deadline, but the tutor did not lead the development process after the kickoff; the tutor only asked the participants to submit progress reports and meeting minutes to the community site, and monitored progress through the system and communication via e-mail without attending the meetings. As the results show, not every member could attend every meeting. Members who were absent from a meeting could
catch up at the next meeting using the minutes and photos archived on the system, and could check the schedule of the next meeting with the calendar. In summary, the purpose of the workshop, supporting the emergence of autonomous collaboration, was achieved by the community site.
Was Knowledge Sharing Achieved? Meeting data were successfully recorded in the meeting forum, calendar, and album, which supported the progress of the project. Technical discussions were held on the e-mail list, and the forum was not used for that purpose. In order to share the generated technical knowledge among the participants, discussions should be recorded by the participants or curated by the tutor. One of the participants played a leading role during the project: he submitted most of the meeting records except for those submitted by the tutor as examples. His contribution was key to this success, and we would like to note that his assignment of roles emerged autonomously.
6. Conclusion
In this study, we developed an autonomous collaborative environment which supports project based learning. The environment promoted cross-disciplinary study among graduate students of different departments, which would not have happened without it. The goal was to realize the next generation of information systems by integrating different domains of information science and technology through collaboration. The environment comprises the workshop protocol and the community site. The workshop was proposal based, and teams consisting of members from different labs were recommended for participation. The site was built on a typical content management system. Autonomous collaboration was achieved, which is summarized as follows. Graduate students and an undergraduate student who were interested in skill improvement, networking, and collaboration participated during the summer vacation. Meetings were held autonomously, and members of different labs shared their expert knowledge. The tutor and committee followed the autonomous collaboration process mainly through the community site, without attending the meetings after the kickoff, and helped the participants when they needed help. Facilitation by the tutor was required at first for content submission; once the participants started to submit, they continued successively. Empirical knowledge such as scheduling and role assignment was recorded successfully. An interdisciplinary technology, a fusion of recognition and parallel computation, was newly and successfully developed. The goal of this workshop was fulfilled. Future work includes representing the developed technology with thorough description, in order to share the results among participants and non-participants, and working out a general format or scaffold of a community site for project based learning in order to provide a universal platform.
The autonomous collaboration environment which we developed is one role model for supporting cross-disciplinary project based learning.
Acknowledgements The authors thank T. Shirai, K. Ishiguro, T. Shiraki, H. Saito, Y. Kamoshida, H. Yoshimoto, R. Fukano, S. Itou, T. Nanri, K. Takahashi, Y. Horita, Y. Shimohata, and K. Kaneda for their active participation in the workshop. This study was supported by the 21st Century COE program "Information Science and Technology Strategic Core" at the Graduate School of Information Science and Technology, and by the Science Integration Program - Humans, of the University of Tokyo.
References [1] R. Fruchter and K. Emery, "Teamwork: Assessing cross-disciplinary learning," in Proc. Computer Support for Collaborative Learning (CSCL1999), Palo Alto, CA, 1999, pp. 166–174. [2] H. Barrows, The Tutorial Process. Springfield, IL: Southern Illinois University Press, 1988. [3] H. G. Schmidt, M. Machiels-Bongaerts, H. Hermans, T. J. ten Cate, R. Venekamp, and H. P. A. Boshuizen, "The development of diagnostic competence: Comparison of a problem-based, an integrated, and a conventional medical curriculum," Academic Medicine, vol. 71, pp. 658–664, 1996. [4] R. Fruchter, "Architecture/engineering/construction teamwork: A collaborative design and learning space," Journal of Computing in Civil Engineering, vol. 13, no. 4, pp. 261–270, 1999. [5] C. E. Hmelo-Silver, "Collaborative ways of knowing: Issues in facilitation," in Proc. Computer Support for Collaborative Learning (CSCL2002), Boulder, CO, 2002. [6] C. Crook, Computers and the Collaborative Experience of Learning. London, UK: Routledge, 1994. [7] C. O'Malley, Computer Supported Collaborative Learning. Berlin, Germany: Springer Verlag, 1995. [8] P. Dillenbourg, ed., Collaborative Learning: Cognitive and Computational Approaches. Amsterdam, the Netherlands: Pergamon, Elsevier Science, 1999. [9] G. Stahl, "Group cognition in computer-assisted collaborative learning," Journal of Computer Assisted Learning, vol. 21, pp. 79–90, 2005. [10] T. E. Koschmann, CSCL: Theory and Practice of an Emerging Paradigm. Mahwah, NJ: Lawrence Erlbaum Associates, 1996. [11] L. Lipponen and J. Lallimo, "From collaborative technology to collaborative use of technology: Designing learning oriented infrastructures," Educational Media International, vol. 41, pp. 111–117, 2004. [12] Information Science and Technology Strategic Core. The 21st Century COE program at the Graduate School of Information Science and Technology of the University of Tokyo website. [Online]. Available: http://www.i.u-tokyo.ac.jp/coe/index-e.html [13] WS100H.NET. A hundred hour workshop website. [Online]. Available: http://www.ws100h.net/ [14] XOOPS. An acronym of eXtensible Object Oriented Portal System. [Online]. Available: http://www.xoops.org/ [15] N. Otsu, "Towards Flexible and Intelligent Vision Systems - From Thresholding to CHLAC -," in Proc. IAPR Conf. on Machine Vision Applications, May 2005, pp. 430–439. [16] Phoenix. Grid programming and tools project. [Online]. Available: http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/
Part 13 Humanoid Robots
Pedaling Motion of a Cycle by a Musculo-skeletal Humanoid with Adapting Ability Based on an Evaluation of the Muscle Loads Tomoaki Yoshikai a, Yuto Nakanishi a, Ikuo Mizuuchi a, Masayuki Inaba a, a The University of Tokyo Abstract. An adaptation mechanism during motions with constraints is an indispensable function for a robot to avoid destroying itself. In particular, pedaling imposes a strong constraint on the motion of the legs, so this kind of adaptation ability is necessary. To realize adapting ability in a pedaling motion for a musculo-skeletal humanoid, this paper develops a function for monitoring and evaluating the muscle loads, and a reinforcement learning method based on the evaluated loads. Finally, the feasibility of the proposed system is confirmed on an actual musculo-skeletal humanoid, Kenji. Keywords. Pedaling motion, R-learning, parallel monitoring and evaluation of muscle loads, musculo-skeletal humanoid
1. Introduction
Adaptation ability is an indispensable function when executing motions with constraints. Without it, every interference with other objects would have to be modeled and predicted in advance, which is almost unrealistic. Humans can instantly adapt to unexpected interference with various objects in an environment, and at the same time unconsciously learn the constraints on how to interact with each object. In pedaling a cycle, adaptation to the constraints of the cycle's motion is necessary, making it a good example of adapting while executing a motion under certain constraints. Accordingly, the pedaling motion of a cycle by a humanoid robot is the main issue of this paper. When realizing adaptation ability in humanoids, a humanoid with mechanical redundancy has an advantage because of its wide solution space. A musculo-skeletal design is one way to realize such mechanical redundancy. So far, many musculo-skeletal robots have used pneumatic actuators because of their soft and springy properties [1][2]. However, pneumatic actuators have some demerits: they are difficult to control because of their nonlinearity, and a large compressor is usually needed to drive them. Therefore, a series of musculo-skeletal humanoids with a wire take-up mechanism consisting of a pulley and a motor has been developed and studied in the authors' group [3][4][5][6][7]. The main issues in those studies were how to construct the redundant humanoid itself and how to build the software environment for it. The issue discussed in this paper, not addressed in the previous studies, is a method to realize adaptation ability using such mechanical redundancy.
T. Yoshikai et al. / Pedaling Motion of a Cycle by Musculo-Skeletal Humanoid with Adapting Ability
From the control point of view, adaptation against contact objects for humanoid robots is usually realized by impedance control, monitoring a force sensor set near the end-effectors [8][9]. However, in those methods, the modeled relationship between the adapting motion and the force applied to the end-effector is fixed. In the musculo-skeletal humanoids stated above, situations where those relationships easily change are assumed.¹ How to deal with such cases by utilizing the redundancy of the body is one of the main issues for our musculo-skeletal humanoids. Accordingly, the adaptation method for our musculo-skeletal humanoids has to be flexible depending on the body state. Our previous approach to a pedaling motion included a learning mechanism for impedance control, in which the optimized relationship between muscle tensions and the pedal reaction force is searched for by a kind of hill-climbing method through repeated training [10]. However, that method needs repeated trials, and it is difficult for it to follow unexpected changes of the body state, such as actuator or sensor troubles. Consequently, the main objective of this paper is a mechanism by which a musculo-skeletal humanoid learns online to adapt to the constraints during the pedaling motion of a cycle. For realizing that adaptation mechanism, we propose an online reinforcement learning method based on the actuator loads. In the following sections, the framework for realizing automatic adaptation during a pedaling motion is explained in Section 2. In Section 3, the method to evaluate the actuator loads using redundant sensor information is described. The design of the R-learning environment, the reinforcement learning method adopted here, is described in Section 4. In Section 5, experiments using the actual musculo-skeletal humanoid Kenji are shown. Lastly, concluding remarks and future work are stated in Section 6.
2. Adapting Ability in a Motion with Some Constraints
2.1. Learning to Adapt to the Constraints
Some behaviors, such as going through a very narrow space or pedaling a cycle, are executed with the robot's body in contact with the environment. In those behaviors, there are usually constraints on which directions are easy for the robot to move in. The internal force can become very high and destroy the robot's body unless the constraints are considered precisely in a motion planned in advance. One method to avoid self-destroying motion is to introduce adaptation ability with respect to the constraints. In this paper, 'adaptation' is defined as the modification of part of the original motion into one with smaller actuator loads, and the aim here is a mechanism to learn to adapt to the constraints. Learning for adaptation is needed since how the robot adapts to the constraints can vary with the body state. For example, actuator condition differs from time to time, and sometimes some actuators may need to rest because of too much load. In a redundant robot system, such flexibility is necessary for utilizing the redundancy effectively.
¹ For example, troubles of some actuators or sensors easily occur when a behavior is repeated.
Figure 1. Humanoid system architecture with parallel evaluating monitors
Figure 2. Automatic adaptation mechanism for a pedaling motion
2.2. Parallel Evaluating Monitors with Learning Mechanism
For realizing the adapting ability defined above, a system is needed that monitors the actuator loads in real time and modifies the original motion at the same time. Therefore, the framework of parallel evaluating monitors with a learning mechanism is proposed in this paper. The general idea of parallel evaluating monitors is shown in Fig. 1. In the figure, the objective behavior program coded by humans is expressed by the upper loop, in which the system receives the sensor inputs, calculates the action outputs, and then acts on the environment. In the background of the objective behavior program, the parallel evaluating monitors run. They always monitor sensor changes and evaluate the necessity of automatic behavior modification; if there is some need for modifying the behavior, they modify the objective behavior's outputs. Parallel evaluating monitors evaluate various things, such as the causality between the robot's own motions and sensor changes [11], the danger of the robot's body state, or sudden environmental changes. Here, the parallel evaluating monitor for learning adaptation is constructed as in Fig. 2. In this case, the system monitors the actuator loads of the leg used in the pedaling motion. Based on the evaluated actuator loads and the direction in which force is applied by the constraint of the cycle, the modification output is learned by a reinforcement learning method. In the following sections, the detailed implementations of the monitors on actuator loads and the reinforcement learning mechanism are described.
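The monitor loop described above can be sketched as a background thread that repeatedly reads the sensors, evaluates whether modification is needed, and patches the behavior's output. This is an illustrative sketch only; the class and callback names are assumptions, not the paper's implementation.

```python
import threading
import time

class ParallelEvaluatingMonitor(threading.Thread):
    """Background monitor: watches sensor values and, when its evaluation
    deems modification necessary, overrides part of the objective
    behavior's output (all names here are hypothetical)."""

    def __init__(self, read_sensors, evaluate, modify, period=0.05):
        super().__init__(daemon=True)
        self.read_sensors = read_sensors  # () -> sensor snapshot
        self.evaluate = evaluate          # snapshot -> bool (modify needed?)
        self.modify = modify              # snapshot -> None (patches output)
        self.period = period              # monitoring cycle in seconds
        self.running = True

    def run(self):
        # The objective behavior runs in its own loop elsewhere; this
        # monitor runs in parallel, as in Fig. 1.
        while self.running:
            snapshot = self.read_sensors()
            if self.evaluate(snapshot):
                self.modify(snapshot)
            time.sleep(self.period)
```

In the pedaling case, `read_sensors` would return the leg's muscle loads, `evaluate` the load criterion of Section 3, and `modify` the learned modification output of Section 4.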
3. Monitoring & Evaluation of the Muscle Loads
Monitoring and evaluation of the robot's actuator loads are necessary functions for avoiding self-destruction while executing behaviors. These functions are especially effective in musculo-skeletal robots, since each joint is driven by a set of agonist and antagonist muscles, and actuator loads rise easily even when the joint keeps still. In a musculo-skeletal robot, an actuation unit in which a motor pulls a wire can be considered 'a muscle' by analogy to humans; accordingly, we use the term 'muscle load' for the load on an actuator in this paper. By monitoring the muscle loads and modifying behaviors so that the actuators do not work too hard, the actuators' lifetime can be lengthened, and overall behavior performance is expected to improve since the frequency of repairing the robot can be lowered.
3.1. Evaluation of Muscle Loads Based on Redundant Sensor Information
The sensor information that can be used for evaluating an actuator varies between robots. For some robots, only the difference between the reference and actual actuator angle (referred to as 'error' below) can be used for evaluating the actuator load. In other cases, the current or temperature of the actuator can be used in addition to the error information. Therefore, an evaluation function for each sensor is defined separately, and the load of one actuator is decided by the weighted mean of the evaluations (Eq. (1)):

$$\mathrm{load} = \begin{cases} f_i(\mathrm{val}_i) & \text{if } f_i(\mathrm{val}_i) > 0.9 \\ \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\alpha_i f_i(\mathrm{val}_i) & \text{otherwise} \end{cases} \quad (1)$$

In Eq. (1), $\mathrm{val}_i$ indicates the sensor value for sensor $i$, and $f_i(x)$ is the load evaluation function for that sensor, whose result is assumed to be normalized between 0.0 and 1.0. $N$ indicates the total number of sensors which express the actuator state directly or indirectly, and $\alpha_i$ is the weight for sensor $i$. The equation returns a single sensor's evaluation value when it exceeds 0.9; otherwise, it returns the weighted mean as the muscle's load. In this paper, four sensors are assumed to be used for the evaluation of the muscle's load in a musculo-skeletal humanoid: the error, current, tension, and temperature of the actuator. Each evaluation function is calculated in the form of a sigmoid function (Eq. (2)), where $\alpha$ and $\beta$ are parameters decided empirically for each sensor.
$$f(x) = \frac{1}{1 + e^{\alpha(x - \beta)}} \quad (2)$$
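The two-branch rule of Eq. (1) and the per-sensor sigmoid of Eq. (2) can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the function names and the parameter values in the comments are assumptions.

```python
import math

def sigmoid_load(x, alpha, beta):
    # Eq. (2): per-sensor load evaluation in (0.0, 1.0); alpha and beta
    # are tuned empirically for each sensor.
    return 1.0 / (1.0 + math.exp(alpha * (x - beta)))

def muscle_load(vals, alphas, betas, weights):
    # Eq. (1): if any single sensor's evaluation exceeds 0.9, that value
    # alone is returned; otherwise the weighted mean over the N sensors
    # (error, current, tension, temperature in the paper) is used.
    scores = [sigmoid_load(v, a, b) for v, a, b in zip(vals, alphas, betas)]
    for s in scores:
        if s > 0.9:
            return s
    return sum(w * s for w, s in zip(weights, scores)) / len(scores)
```

Note that with a negative $\alpha$ the sigmoid increases with the sensor value, so a large error or temperature drives the load score toward 1.0 and triggers the single-sensor branch.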
3.2. Evaluation of the Direction for Adapting Based on the Load Distribution When the robot is interfered by some objects or humans, forces applied to the robot’s endeffector can be estimated from the load distribution. The direction of the force applied to the robot’s end-effectors can be used when the robot decide which direction to move its body for adapting to the interferences. For instance, when a muscle-driven robot is interfered by human and the robot tries to keep its posture, each muscle generate a power in proportion to the loads. Accordingly tension of each muscle is expressed as Eq.(3). tensioni = γi · loadi
(3)
In the equation, γ_i is the constant of proportionality for load_i. Since the magnitude of the tension itself is not important for deciding the adapting direction, γ_i is set to 1. From the tensions of the muscles, the joint torques of the target limb limb_a can be calculated using the robot's geometric model, and from those joint torques the force and moment FM_{limb_a} applied to the end-effector of limb_a is calculated as Eq. (4):

FM_{limb_a} = −(J_e(θ_{limb_a})^T)^+ J_w(θ_{limb_a})^T tension_{limb_a}
(4)
In the equation, J_e(θ_{limb_a}) is the Jacobian matrix of the end-effector position of limb_a with respect to the joint angles, (J_e(θ_{limb_a})^T)^+ is the pseudo-inverse of J_e(θ_{limb_a})^T, J_w(θ_{limb_a}) is the Jacobian matrix of the muscle lengths with respect to the joint angles in limb_a, and tension_{limb_a} is the tension vector for limb_a. Thus, the direction of the force applied to the muscle-driven robot's end-effector can be estimated from the distribution of the muscle loads.
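The mapping of Eq. (4) can be sketched for a toy two-joint limb with three muscles; all Jacobian entries and tensions below are made-up placeholders, and a plain 2x2 inverse stands in for the pseudo-inverse since J_e^T is square here.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def inv2(m):
    """Inverse of a 2x2 matrix (stands in for the pseudo-inverse when
    J_e^T is square and invertible)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def end_effector_force(J_e, J_w, tension):
    """FM = -(J_e^T)^+ J_w^T tension (Eq. (4)): muscle tensions are mapped
    to joint torques through J_w^T, then to an end-effector force through
    the (pseudo-)inverse of J_e^T."""
    joint_torque = matvec(transpose(J_w), tension)
    return [-f for f in matvec(inv2(transpose(J_e)), joint_torque)]

# Toy 2-joint limb with 3 muscles (all numbers are illustrative).
J_e = [[1.0, 0.5], [0.0, 1.0]]                     # end-effector Jacobian
J_w = [[0.02, 0.0], [-0.02, 0.01], [0.0, -0.015]]  # muscle-length Jacobian
FM = end_effector_force(J_e, J_w, [1.0, 0.5, 0.8])
```

Only the direction of FM matters for adaptation, so the result would typically be normalized before discretizing it into the state space of Section 4.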
4. Online Learning of Adapting Ability Based on the Muscle Loads

For learning to adapt to constraints while the robot is executing a motion, the following conditions are required:
1. The learning method should be adaptable to various changes of the learning environment.
2. The learning process should proceed by actually moving and evaluating the result.
3. The learning method should be able to deal with continuing tasks.
4. The learning process should run online.

The first and second conditions arise because, in this paper, we are interested in problems where the robot's body state is changeable and the robot has to manage those changes with its own redundancy during learning. These two conditions make reinforcement learning suitable. In the third condition, 'continuing tasks' means tasks that have no explicit start or end. When a robot adapts to a constraint, how it adapts is determined by how it moves, and it has to keep adapting as long as the motion continues. Q-learning, one of the most popular reinforcement learning methods in robot learning, assumes that there are explicit starts and ends and that learning proceeds by repeating some sequence; these assumptions conflict with the characteristics of learning to adapt to a constraint. For this kind of online learning, R-learning is suitable, as the authors showed when learning an imitation behavior in a previous study [12]. R-learning was originally proposed by Schwartz [13] in 1993 as a reinforcement learning method for maximizing undiscounted reward, and it is considered appropriate for continuing tasks [14]. In R-learning, suitable actions for each state are learned by taking actions that maximize the average reward per time step. In the rest of this section, the design of the learning environment, i.e.
the state-action space, the reward function, and the action selection method for learning to adapt to constraints with R-learning, is explained. Since the state-action space is described by the estimated force direction at the end-effector of each limb, as explained below, a separate learning environment is prepared for each limb.

First, how to describe the state space is essential for adapting in real time. If the state space were described by the joint angles or muscle loads, its dimension would be proportional to the number of joints or actuators (which are usually not the same in tendon-driven robots), and the space can be huge in redundant robot systems such as humanoid robots. Therefore, the estimated force direction (three dimensions) applied to each end-effector is adopted to describe the state space in this paper. Even if the robot touches the environment not with an end-effector but with the middle of a limb, the adapting direction can still be calculated correctly, since the force direction is estimated from all the muscle loads in the limb. Each state space is discretized to some extent, because subtle differences in the adapting direction have little influence on how effectively adaptation reduces the muscle loads. The state space is therefore constructed as a set of areas described in spherical coordinates. Since we focus only on the direction of the force, the states are divided into 25 parts on the sphere's surface (Fig. 3, left); in addition to those 25 states, a state in which the load is under the threshold for generating adaptation is added.

Figure 3. Reinforcement learning environment for adapting behavior
Figure 4. Pedaling motion with one leg where the robot's hip is fixed at the saddle

The actions that can be selected in each state are basically determined by the direction in which the force is applied. To keep the learning process flexible, actions are defined in a continuous space and, as Santamaria et al. did [15], discretized only in the action selection phase. To express the continuous action space, tile coding is adopted: three different types of tiles are used, and four tiles of each type are accumulated alternately in one state. Each action is described by two variables corresponding to a position on the state space and has a value for the adapting behavior (Fig. 3, middle). A point with a higher value is evaluated as contributing more to reducing the muscle loads; if the point is selected, the robot moves the corresponding limb slightly in the direction opposite to the one the point describes. In this paper, the action space is initialized to a two-dimensional normal distribution around the central point of each area.

The reward function is designed so that the robot gets higher rewards when the estimated muscle loads become low (Fig. 3, right). Using the evaluated muscle loads described in Section 3, the reward function is expressed as follows:

x = Σ_{i=1}^{n} AcLoad_i
Rwd = −(MaxRwd / MaxX²) x² + MaxRwd          (5)
In the equation, x is the sum of the muscle loads for one limb, and AcLoad_i is the evaluated load of actuator i in the limb. Rwd is the instant reward, MaxRwd is the maximum value of the instant reward, and MaxX is the maximum value of the sum of the limb's muscle loads. In this paper, MaxRwd and MaxX are decided empirically.
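The quadratic reward of Eq. (5) can be written as a short sketch; the MaxRwd and MaxX defaults below are illustrative (the paper sets them empirically), with MaxX chosen here as if all 13 leg muscles were fully loaded.

```python
def reward(actuator_loads, max_rwd=1.0, max_x=13.0):
    """Instant reward of Eq. (5): highest (MaxRwd) when the limb's summed
    muscle load is zero, falling quadratically to zero at x = MaxX."""
    x = sum(actuator_loads)
    return -max_rwd / max_x ** 2 * x ** 2 + max_rwd
```

The quadratic shape penalizes large total loads disproportionately, so the learner is pushed hardest away from the worst configurations.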
Figure 5. Transfer of the average actuator load during pedaling by one leg
Figure 6. Transfer of the height of the left foot during pedaling by one leg
When the robot selects an action in a state, the continuous action space is discretized into 9 areas (tripartition of each axis). First, one area is selected by the soft-max method based on the value P of the area, where P is calculated as the sum of the values at five representative points: the central point of the square-shaped area and the four midpoints of the perpendiculars from the central point to each side. The probability of selecting an area is proportional to e^{P/τ}, where τ is a parameter called the temperature and is set to 0.9 in this paper. Then, by the ε-greedy method, one point in the area is selected from the representative points used in the calculation of P. The two variables describing the selected point determine the adapting direction. After executing the selected action, the result is evaluated and the action values are updated; in updating the action values of the current state, the result is also propagated to nearby states.
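The soft-max selection and the R-learning update described above can be sketched as follows. Only the soft-max rule (p ∝ e^{P/τ}) and Schwartz's average-reward update follow the text; the step sizes, table shapes, and the tabular (rather than tile-coded) representation are simplifying assumptions.

```python
import math
import random

def softmax_pick(values, tau=0.9):
    """Pick an area index with probability proportional to e^{P/tau}."""
    weights = [math.exp(v / tau) for v in values]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(values) - 1

def r_learning_update(R, rho, s, a, reward, s_next, beta=0.1, alpha=0.01):
    """One step of Schwartz's R-learning for continuing tasks:
    R[s][a] += beta * (reward - rho + max_a' R[s'][a'] - R[s][a]),
    and the average reward rho is nudged when the greedy action was taken.
    Returns the updated rho (R is updated in place)."""
    best_next = max(R[s_next])
    R[s][a] += beta * (reward - rho + best_next - R[s][a])
    if R[s][a] == max(R[s]):  # greedy action: update the average reward
        rho += alpha * (reward + best_next - max(R[s]) - rho)
    return rho
```

Because R-learning optimizes the undiscounted average reward, it needs no episode boundaries, matching the 'continuing task' requirement of Section 4.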
5. Experiments and Discussion of the Humanoid's Pedaling Motion with Adapting Ability

With the reinforcement learning environment explained above, experiments on learning to adapt to constraints with an actual humanoid are shown in this section. We use the musculo-skeletal humanoid Kenji [6] (Fig. 4, left), developed based on the tendon-driven humanoid Kenta [3][4]. Among Kenji's distinctive features are a flexible spine, a design with reinforceable muscles [16], and redundancy of actuators and sensors. As shown in Fig. 4, in this experiment a half cycle of the pedaling motion is performed by the left leg with the robot's hip fixed at the saddle. The objective behavior program for the pedaling motion is realized by controlling the wire lengths in the left leg according to a posture sequence for half-cycle pedaling; these wire-length sequences are generated by minimum-jerk interpolation among three representative postures. In learning to adapt to constraints during the pedaling motion, a relative modification of the current wire lengths is acquired through R-learning. When the action for adaptation is decided, the P-gain of the actuator that lets out the wire is lowered to prevent the actuator load from increasing through antagonistic activity.

Without the ability to adapt, the muscle loads during the pedaling motion are expected to become higher because of internal forces. We therefore compare two pedaling motions:
one with the learning mechanism for adaptation, and one without it. Fig. 5 shows the transfer of the average actuator load of the left leg. The horizontal axis indicates time in seconds, and the vertical axis the average load of the motors driving the muscles of the left leg; 13 muscles drive the left leg, and the actuator load is evaluated every 40 [ms]. The red solid line shows the average load for the pedaling motion without adaptation, and the blue dashed line the motion with adaptation. The figure confirms that the actuator load becomes higher when the leg moves against the constraint. With the adaptation ability, the average actuator load is clearly kept low, with a maximum of only 0.4. Thus the adaptation learning system is effective in reducing the actuator load while executing the pedaling motion.

Next, the position of the tip of the left leg during the pedaling motion was recorded at 30 [Hz] by a 3D position sensor using an alternating magnetic field (Fastrak by Polhemus). Fig. 6 shows the vertical transfer of the tip of the left leg for both pedaling motions; the horizontal axis indicates time in seconds, and the vertical axis the height of the tip of the left leg in centimeters. The red solid line corresponds to the pedaling motion without adaptation and the blue dashed line to the motion with adaptation. In principle, the completion time of the half-cycle pedaling should be the same for the two motions, since both the number of representative postures and the interpolation time are the same. However, the figure shows that the completion time of the pedaling motion with adaptation is shorter than that without adaptation. This fact suggests that the muscle-drive pattern for the pedaling motion is reorganized so as to be executed more smoothly, and that the performance of the motion is improved by the adaptation mechanism.
6. Conclusion and Future Work

To realize an adapting mechanism for a pedaling motion, a reinforcement learning mechanism based on the muscle loads was proposed in this paper. The learning mechanism is implemented within a framework of parallel evaluating monitors, in which the muscle loads are monitored and evaluated every 40 [ms] from several kinds of sensor information indicating the actuator state. Based on the evaluated muscle loads, suitable actions are learned online with R-learning, a reinforcement learning method for continuing tasks. Using the actual muscle-driven humanoid Kenji, the feasibility of the learning mechanism for adaptation in the pedaling motion of a cycle was confirmed. As future work, application to human-robot interaction with close contact is considered. The proposed system can be used to adapt to interference from humans, which can lead to a seamless teaching and recalling system. In addition, by learning the relationships among the limbs, a human could teach the motion of one limb without any constraints, with the other limbs adapting in relation to it.
References

[1] B. Hannaford, J. M. Winters, C. Chou and P. Marbot, The Anthroform Biorobotic Arm: A system for the study of spinal circuits, Annals of Biomedical Engineering, vol. 23, pp. 399-408, 1995.
[2] T. Takuma, S. Nakajima, K. Hosoda and M. Asada, Design of Self-Contained Biped Walker with Pneumatic Actuators, SICE Annual Conference 2004, WPI-2-4, 2004.
[3] M. Inaba, I. Mizuuchi, R. Tajima, T. Yoshikai, D. Sato, K. Nagashima and H. Inoue, Building Spined Muscle-Tendon Humanoid, Robotics Research: The Tenth International Symposium, pp. 113-127, Springer, 2003.
[4] I. Mizuuchi, T. Yoshikai, D. Sato, S. Yoshida, M. Inaba and H. Inoue, Behavior Developing Environment for the Large-DOF Muscle-driven Humanoid Equipped with Numerous Sensors, in Proc. of the IEEE International Conference on Robotics & Automation, pp. 1940-1945, 2003.
[5] Y. Sodeyama, I. Mizuuchi, T. Yoshikai, Y. Nakanishi and M. Inaba, A Shoulder Structure of Muscle-Driven Humanoid with Shoulder Blades, in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1077-1082, 2005.
[6] I. Mizuuchi, T. Yoshikai, Y. Nakanishi and M. Inaba, A Reinforceable-Muscle Flexible-Spine Humanoid "Kenji", in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 692-697, 2005.
[7] I. Mizuuchi, T. Yoshikai, Y. Nakanishi, Y. Sodeyama, T. Yamamoto, A. Miyadera, T. Niemela, M. Hayashi, J. Urata and M. Inaba, Development of Muscle-Driven Flexible-Spine Humanoids, in Proc. of International Conference on Humanoid Robots (Humanoids'05), 2005.
[8] K. Harada, S. Kajita, F. Kanehiro, K. Fujiwara, K. Kaneko, K. Yokoi and H. Hirukawa, Real-Time Planning of Humanoid Robot's Gait for Force Controlled Manipulation, in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 616-622, 2004.
[9] K. Inoue, Y. Nishihama, T. Arai and Y. Mae, Mobile Manipulation of Humanoid Robots - Body and Leg Control for Dual Arm Manipulation -, in Proc. of the IEEE International Conference on Robotics & Automation, pp. 2259-2264, 2002.
[10] Y. Nakanishi, I. Mizuuchi, T. Yoshikai, T. Inamura and M. Inaba, Pedaling by a Redundant Musculo-Skeletal Humanoid Robot, in Proc. of International Conference on Humanoid Robots (Humanoids'05), 2005.
[11] T. Yoshikai, I. Mizuuchi and M. Inaba, A Humanoid Behavior Modification System by Monitoring & Evaluating Causality between Sensors & Actions, in Proc. of International Conference on Humanoid Robots (Humanoids'05), 2005.
[12] T. Yoshikai, N. Otake, I. Mizuuchi, M. Inaba and H. Inoue, Development of an Imitation Behavior in Humanoid Kenta with Reinforcement Learning Algorithm Based on the Attention During Imitation, in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1192-1197, 2004.
[13] A. Schwartz, A reinforcement learning method for maximizing undiscounted reward, in Proc. of the 10th International Conference on Machine Learning, pp. 298-305, 1993.
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), The MIT Press, 1998.
[15] J. C. Santamaria, R. S. Sutton and A. Ram, Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, Adaptive Behavior 6 (2), pp. 163-218, 1998.
[16] I. Mizuuchi, H. Waita, Y. Nakanishi, T. Yoshikai, M. Inaba and H. Inoue, Design and Implementation of Reinforceable Muscle Humanoid, in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 828-833, 2004.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Behavior Transition between Biped and Quadruped Walking by Using Bifurcation

Kenji Asa, Kosei Ishimura and Mitsuo Wada
Graduate School of Information Science and Technology, Hokkaido University

Abstract. Animals select a gait suited to their moving speed, environment and purpose. Such behavior transitions are carried out to minimize energy consumption or to adapt to the environment. The transition from one gait to another is generally discontinuous, even though the changes in moving speed, energy consumption and so on are continuous. In this research, we focus on the discontinuous dynamic behavior transition of a walking robot, which transits from biped to quadruped walking to adapt to the environment. For the transition, bifurcation of a potential function is utilized. We demonstrate the behavior transition of the walking robot between biped and quadruped walking: an adaptive behavior, depending on the gradient of a slope, carried out by bifurcation of the parameters used for control. We also show a transition using hysteresis on a slope where a robot with a simple transition between biped and quadruped walking cannot walk. The hysteresis is generated by a potential function whose shape changes dynamically depending on certain parameters.

Keywords. Walking Robot, Bifurcation, Behavior Transition, CPG, Adaptive Behavior
1. Introduction

Recently, various studies on walking have been carried out as one of the locomotion mechanisms for robots. Among them, biologically inspired walking control methods are studied actively, because a robot with a biologically inspired method does not need a reference trajectory and can perform robust, adaptive walking against unknown disturbances and environments. Some biologically inspired methods apply a CPG (Central Pattern Generator) [1], which generates and controls periodic motions such as walking [2][7]. Concerning walking control with a CPG, biped walking based on "global entrainment" has been realized in computer simulations [2][5], and experiments with real walking robots with CPGs have been carried out for both biped and quadruped walking [5][7].

Animals select a gait suited to their moving speed, environment and purpose, and behavior transitions using these indicators have also been studied energetically. Fujii et al. proposed a new CPG circuit model based on neurophysiology and demonstrated the behavior transition between a stepping behavior and a walking behavior of a biped robot [8]. However, that research dealt only with the transition between stepping and walking. On the other hand, Yuasa et al. presented a pattern generator model that uses a bifurcation phenomenon of a potential function for quadrupeds [9]. They realized transitions of discontinuous gait patterns (walk, trot and gallop) through continuous change of a parameter (moving velocity), and attempted to relate the potential function to energy consumption. However, their research dealt only with quadruped walking. We attempt to apply their method to a wider range of behavior transitions. In particular, we focus on bilateral adaptive transition between biped and quadruped walking, in order to maintain locomotion against disturbances and changes of the environment.

Bipeds such as humans and monkeys can choose biped or quadruped walking to suit their environment and purposes. As research on the transition from quadruped to biped walking, Mori conducted experiments on the behavior transition of a Japanese monkey from the viewpoint of neurophysiology [10], discussing how the quadrupedal monkey acquires bipedal walking a posteriori. From the viewpoint of robotics, Aoi et al. achieved a smooth transition from quadruped to biped walking [11]. However, these works deal only with a unilateral transition. As mentioned above, we deal with the bilateral transition between biped and quadruped walking; therefore a transition from biped to quadruped walking is also treated. This transition is used to maintain steady locomotion against environmental variation. We have previously demonstrated that the robot can perform an adaptive behavior that avoids falling over through a transition from biped to quadruped walking [12].

Correspondence to: Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, 060-0814, Japan. Tel.: +81-11-706-7105; Fax: +81-11-706-7105; E-mail: [email protected]
The transition is carried out to obtain a wider support polygon and thereby avoid falling over. In that work, the adaptive behavior was realized by simple switching between two control laws (one for biped walking, one for quadruped walking) based on an indicator of walking stability. In this research, however, we use a single control law for walking with adaptive transition. Although the control law is single, the transition from one behavior to another can be realized by moving from one limit cycle of the dynamics to another through bifurcation. Furthermore, hysteresis in the transition can also be realized by this method. Hysteresis is an important characteristic, because without it, excessively frequent bilateral transitions could occur in some situations.
2. Controller of the Walking Robot

The walking robot consists of a mechanical system and a controller. We apply a CPG, a biologically inspired control method, as the controller. In addition to the CPG, a posture controller that acts on the mechanical system is applied independently; it controls the posture of the robot's body to stay parallel to the gravity direction. The control diagram is shown in Fig. 1.
2.1. CPG

In this research, the walking rhythm is generated by the neural oscillator proposed by Matsuoka [13]. We designed the neural oscillator network as in Fig. 2. In the standard form of Matsuoka's model, the oscillator is represented by:

τ u̇_i = −u_i − Σ_j w_ij y_j − β v_i + u_0 + Feed_i          (1)
τ' v̇_i = −v_i + y_i          (2)
y_i = max(0, u_i)          (3)

where the subscript i is the number of the neuron, u_i is the inner state of the i-th neuron, and w_ij is the weight of the connection from the j-th neuron to the i-th neuron: a positive value means an inhibitory connection, and a negative value an excitatory connection. y_i is the output of the i-th neuron, u_0 is an external input with a constant rate that determines the steady-state firing rate, v_i is a variable representing the degree of adaptation (self-inhibition) of the i-th neuron, Feed_i is a feedback signal from the mechanical system to the i-th neuron, τ is the time constant of u_i, and τ' is the time constant of v_i.

Torques are given from the neural oscillators to each joint of the walking robot as follows:

T_i = p_f y_{f,i} − p_e y_{e,i}          (4)

where y_{f,i} and y_{e,i} are the outputs of the flexor and extensor neurons, p_f and p_e are coefficients that transform neuron output into torque, subscript f indicates flexor, and subscript e indicates extensor.

2.2. Posture Controller

With the CPG alone, it is difficult for the walking robot to generate stable walking. Therefore, independently of the CPG, a controller that maintains the standing posture is needed [3], and we introduce a posture controller based on reference [8] in parallel to the CPG. The posture
Figure 1. Control diagram
Figure 2. Neural oscillator network
Table 1. Left: Degrees of freedom. Right: Parameters of the biped robot

Degrees of freedom: position of the robot; posture of the robot; joint angles of the right leg, left leg, right arm and left arm.

Link                      Length      Mass
Trunk                     0.700 [m]   27.76 [kg]
Thigh                     0.435 [m]   7.450 [kg]
Shank                     0.425 [m]   3.600 [kg]
Foot (Fig. 5)             0.250 [m]   1.110 [kg]
Dorsum of foot (Fig. 5)   0.175 [m]   0.777 [kg]
Upper arm                 0.350 [m]   3.000 [kg]
Lower arm                 0.350 [m]   3.000 [kg]
Total                     1.560 [m]   64.08 [kg]
controller acts only on the support legs and is a PD controller that drives each link to a desired angle, following Eq. (5):

T_i^{pc} = y^{pc} { K_p (θ̂_i − θ_i) − K_d θ̇_i }          (5)

where the subscript i is the number of the robot's joint, T_i^{pc} is the torque given to the i-th joint by the posture controller, K_p and K_d are positive constants, θ̂_i is the desired angle of the i-th joint, and y^{pc} is the output of the neuron for the posture controller.
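The Matsuoka-type oscillator of Section 2.1 can be sketched numerically for a single flexor-extensor pair with mutual inhibition. All constants below (β, τ, τ', the inhibition weight, and the torque coefficients) are illustrative placeholders, not the paper's values, and feedback from the mechanical system is omitted.

```python
def step_oscillator(u, v, w, u0=1.0, beta=2.5, tau=0.05, tau_p=0.6, dt=0.001):
    """One Euler step of: tau u_i' = -u_i - sum_j w_ij y_j - beta v_i + u0,
    tau' v_i' = -v_i + y_i, with output y_i = max(0, u_i) (Eqs. (1)-(3))."""
    n = len(u)
    y = [max(0.0, ui) for ui in u]
    du = [(-u[i] - sum(w[i][j] * y[j] for j in range(n))
           - beta * v[i] + u0) / tau for i in range(n)]
    dv = [(-v[i] + y[i]) / tau_p for i in range(n)]
    u = [u[i] + dt * du[i] for i in range(n)]
    v = [v[i] + dt * dv[i] for i in range(n)]
    return u, v

# Flexor-extensor pair: positive weights are inhibitory connections.
w = [[0.0, 2.0], [2.0, 0.0]]
u, v = [0.1, 0.0], [0.0, 0.0]   # small asymmetry starts the rhythm
for _ in range(5000):           # 5 s of simulated time
    u, v = step_oscillator(u, v, w)
torque = 1.0 * max(0.0, u[0]) - 1.0 * max(0.0, u[1])  # Eq. (4) form
```

The self-inhibition state v is what makes the firing of each neuron die away and hand activity to its antagonist, producing the alternating rhythm used as the walking pattern.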
3. Biped Robot and Environment Model

We deal with a 2D walking robot with 11 rigid links and 13 DOF (Fig. 3, Table 1, left); each joint has passive friction, and the movable range of the elbow joint is not considered. The absolute coordinate system is set as shown in Fig. 4. The link parameters are decided with reference to those of a human (Table 1, right). The right and left toes, heels, knees, hands and elbows are assumed as contact points between the robot and the ground, and the contacts are modeled by a virtual spring-damper system. Fig. 5 is the conceptual figure, and the system follows Eq. (6).
Figure 3. Joint angles
Figure 4. Absolute coordinate system
Figure 5. Model of ground reaction force
F_x = −K_x (x − x_0) − D_x ẋ,    F_y = −K_y y − D_y ẏ          (6)

where K_x is the spring constant in the horizontal direction of the absolute coordinate system, D_x the corresponding damping constant, K_y the spring constant in the vertical direction, D_y the corresponding damping constant, (x, y) the position of the tip of a link in the absolute coordinate system, and x_0 the stationary point of the horizontal virtual spring. In the dynamical simulation, we used the Newton-Euler method and the fourth-order Runge-Kutta method (time step 0.25 [ms]).
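The virtual spring-damper contact of Eq. (6) can be sketched as below. The spring and damping constants are illustrative, and the contact test (force only while the point is below ground level) and the no-pull clamp on the vertical force are common modeling assumptions, not details stated in the paper.

```python
def ground_reaction(x, y, xdot, ydot, x0,
                    kx=1e4, dx=1e2, ky=1e4, dy=1e2):
    """(Fx, Fy) on a link tip at (x, y) per Eq. (6); x0 is the stationary
    point of the horizontal virtual spring, typically fixed at touchdown."""
    if y >= 0.0:
        return (0.0, 0.0)           # not in contact
    fx = -kx * (x - x0) - dx * xdot
    fy = -ky * y - dy * ydot
    return (fx, max(0.0, fy))       # the ground can push but not pull
```

In the full simulation, each of the assumed contact points (toes, heels, knees, hands, elbows) would get its own x0 recorded at the moment of touchdown, so the horizontal spring models static friction at that point.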
4. Behavior Transition by Using the Bifurcation Phenomenon

We aim at realizing various behavior transitions by using a bifurcation phenomenon, and try transitions among upright standing, static biped walking, dynamic biped walking and quadruped walking (Fig. 6).

4.1. Concept of the Behavior Transition System Using Bifurcation

Our viewpoint is not the analysis of bifurcations but the design of a bifurcation of the potential function for the transition. Each parameter used for control (for example, the constants in Eqs. (1)-(5) for each joint and the desired angles of the posture controller) is given the value of the minimum point of the potential. We call these the control parameters. Fig. 7 shows the concept of the bifurcation phenomenon. We consider a potential function whose shape changes dynamically depending on a parameter, which we call the bifurcation parameter. For example, the gradient of a slope is used as the bifurcation parameter in the numerical simulation of transitions depending on slope gradient. The potential function can have a single minimum point in some states (Fig. 7, top and bottom) and two or more local minimum points in others (Fig. 7, center). The local minimum points of the potential function are changed by operating the continuous bifurcation parameter, and a discontinuous jump of each control parameter is caused by the sudden annihilation of a minimum point (Fig. 7). We use the creation and annihilation of minimum points to cause the bifurcation phenomenon; the desired bifurcation can be obtained by designing an appropriate potential function.
Figure 6. Behavior transition
Figure 7. Concept of bifurcation
4.2. Configuration of the Behavior Transition System

In this section, the configuration of the behavior transition system is explained. First, partial potential functions (Fig. 8, Eq. (7)), each of which has a single minimum at a desired value of a control parameter, are prepared. Second, in order to cause a bifurcation at a desired value of the bifurcation parameter, support functions (Fig. 9, Eq. (8)) are used. A global potential function consists of the sum of the partial potential functions (Eq. (9)). As in Fig. 10, the global potential function has not only a global minimum point but also multiple local minimum points, depending on the value of the bifurcation parameters. Each control parameter is derived as the value at which the global potential function takes a minimum; the minimum point is defined as a point where the gradient of the global potential function (Eq. (10)) becomes 0, and it is derived by the Newton-Raphson method. A sudden jump from one local minimum point to another, that is, a jump of the value of a control parameter, corresponds to a discontinuous behavior transition of the robot. We can obtain the desired bifurcation phenomenon by designing this potential function and operating the bifurcation parameters.
(7) (8)
(9)
(10)

In Eqs. (7)-(10), the subscripts index the behavior pattern and the control parameter; the remaining constants specify the desired value of each control parameter, the sharpness of the partial potential function, the bifurcation parameter, and a constant for the transition, and the sum represents the global potential. Behavior transitions of animals generally show discontinuous transitions with hysteresis, so hysteresis seems to be important. In this study, hysteresis is
Figure 8. A partial potential function
Figure 9. A support function
Figure 10. The transition of a control parameter depending on the bifurcation parameter
considered in the trajectory of the control parameters: the trajectory at increasing bifurcation parameter differs from that at decreasing (Fig. 10). Moreover, the shape of the hysteresis shown in the x-y plane of Fig. 10 can be adjusted by arranging the asymmetry of the support functions.
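The jump-with-hysteresis mechanism can be illustrated with a minimal sketch. A canonical quartic double-well stands in for the paper's sum of partial potentials and support functions (an assumption for illustration); the control parameter q follows its current local minimum while the bifurcation parameter b is swept, so the up- and down-sweeps trace different branches until a minimum annihilates.

```python
def grad_U(q, b):
    """dU/dq for U(q; b) = q^4 - q^2 + b*q: two wells, tilted by the
    bifurcation parameter b; one well annihilates at |b| ~ 0.544."""
    return 4.0 * q ** 3 - 2.0 * q + b

def track_minimum(q, b, step=2e-3, iters=20000):
    """Follow the local minimum nearest the current q by gradient descent
    (stands in for the paper's Newton-Raphson root-finding on Eq. (10))."""
    for _ in range(iters):
        q -= step * grad_U(q, b)
    return q

bs = [i / 5.0 - 1.0 for i in range(11)]   # b swept over -1.0 ... 1.0
q, up, down = 1.0, {}, {}
for b in bs:                              # up-sweep
    q = track_minimum(q, b); up[b] = q
for b in reversed(bs):                    # down-sweep
    q = track_minimum(q, b); down[b] = q
# At b = 0 the two sweeps sit in different wells (hysteresis):
# up[0.0] stays near +0.71, down[0.0] near -0.71.
```

This reproduces, in one dimension, the property the section relies on: the jump happens at different bifurcation-parameter values depending on sweep direction, so a gradient hovering near the bifurcation point cannot trigger rapid back-and-forth transitions.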
5. Results of Simulation

First, we try the behavior transition of the robot from dynamic biped walking to static biped walking to upright standing. In addition, we realize adaptive behavior to changes of a slope: the robot transits to quadruped walking on a steep slope where it cannot maintain biped walking, and back to biped walking on a gentle slope where biped walking is possible. Moreover, by using hysteresis, the robot avoids frequent transitions between biped and quadruped walking on a slope whose gradient stays close to the bifurcation point. As control parameters, we used 29 parameters: the constants of Eqs. (1)-(5) for each joint and the desired angles of the posture controller.

5.1. Behavior Transition 1 (Dynamic Biped Walking → Static Biped Walking → Upright)

The elapsed time from the start of walking is used as the bifurcation parameter for the transition from dynamic biped walking to static biped walking to upright standing. The system is designed to cause the transition from dynamic to static biped walking at 5 [s], and from static biped walking to upright at 10 [s]. Fig. 11 shows snapshots (1 [s] intervals) of the results, and Fig. 12 shows one of the control parameters, a coefficient that transforms the output of the hip-joint CPG into torque.
Figure 11. Snapshots of the transition among dynamic walking, static walking and upright (1 [s] intervals)
Figure 12. The behavior transition 1 of the control parameter
5.2. Behavior Transition 2 (Transition between Biped and Quadruped Walking Depending on the Gradient)

In this section, we show the transition between biped and quadruped walking depending on the gradient of the slope. As the bifurcation parameter, Eq. (11) is used:

(11)

where the derivative value of the bifurcation parameter is determined by the gradient at the ground projection point of the center of mass and a positive constant. In this case, although we use the gradient at the ground projection point of the center of mass as the bifurcation parameter, in practice the gradient can be measured with potentiometers attached to each joint of the robot and with accelerometers. Fig. 13 shows that the robot transits to quadruped walking on a steep slope where biped walking cannot be maintained, and as a result it can climb the steep slope adaptively. Moreover, it returns to biped walking on a gentle slope where biped walking can be maintained. When the robot returns to biped walking, a reflex is introduced as the auxiliary torque of Eq. (12):
(12)

where the gain is a constant, and the start time and finish time of the reflex are given. Next, we attempt stable walking on a slope whose gradient stays close to the bifurcation point. Figs. 14 and 15 show the results of the numerical simulation. Fig. 14 (top) shows the result without the behavior transition system; that is, the control parameters of the CPG and the posture controller are fixed for biped walking. Fig. 14 (center) shows the result with the behavior transition system but without hysteresis; that is, the control parameters switch between those for biped walking and those for quadruped walking without hysteresis. In this case, the transition between biped and quadruped walking occurs frequently, and the robot falls over. In Fig. 14 (bottom), the control system with hysteresis, as shown in the x-y plane of Fig. 15, is used. In this case, stable quadruped walking continues after the transition from biped to quadruped walking, since the system has hysteresis. Therefore, it is important that the history of the control parameters exhibit hysteresis on a slope whose gradient stays close to the bifurcation point.
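The role of the hysteresis can be illustrated with a minimal state-machine sketch. The switching gradients and the gradient sequence below are assumed values for illustration, not the paper's bifurcation dynamics: the point is only that the biped-to-quadruped switch fires at a steeper gradient than the reverse switch, so a gradient hovering near one threshold cannot toggle the gait on every step.

```python
# Minimal sketch of gait transition with hysteresis (illustrative values,
# not the paper's CPG/bifurcation model): the switch up to quadruped walking
# fires at a steeper gradient than the switch back down to biped walking.
UP_DEG, DOWN_DEG = 20.0, 12.0  # assumed switching gradients in degrees

def step(gait, gradient_deg):
    """Advance the gait state machine by one control step."""
    if gait == "biped" and gradient_deg > UP_DEG:
        return "quadruped"
    if gait == "quadruped" and gradient_deg < DOWN_DEG:
        return "biped"
    return gait  # inside the hysteresis band: keep the current gait

gait, history = "biped", []
for g in [5, 15, 21, 19, 21, 19, 13, 11, 15]:  # gradient hovers near 20 deg
    gait = step(gait, g)
    history.append(gait)
print(history)
```

Once quadruped walking starts, gradients of 19-21 degrees no longer toggle the gait; only dropping below the lower threshold brings back biped walking. A single shared threshold would instead switch the gait on almost every step of this sequence, which is the fall-inducing behavior of Fig. 14 (center).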
Figure 13. Adaptive behavior to a steep grade (1 s intervals); the robot has the improved system.

Figure 14. Snapshots of the robot (0.5 s intervals). Top: the case of only the CPG and posture controller. Center: the case of simple transition without hysteresis. Bottom: the case of the transition system with hysteresis.

Figure 15. The transition of the control parameter
6. Conclusion

In this paper, it was shown that transitions between discontinuous behaviors can be performed by using the bifurcation phenomenon. The robot to which the behavior transition system is applied can perform transitions among upright standing, static biped walking and dynamic biped walking. In addition, it was shown that even if biped walking cannot be maintained on a steep slope, the robot can maintain locomotion adaptively through the transition from biped to quadruped walking. By transiting from quadruped back to biped walking at slope gradients where biped walking is possible, the robot realized adaptive behavior that is efficient from the viewpoint of moving speed. Moreover, the robot can avoid frequent transitions between biped and quadruped walking on a slope whose gradient stays close to the bifurcation point. Through the numerical simulations, the importance of hysteresis in the control parameters was also shown. Therefore, it is confirmed that the proposed method is useful for adaptive behavior transition in various environments. As future work, we will study more behavior transitions, not only among upright standing, static biped walking, dynamic biped walking and quadruped walking, but also running, jumping, climbing stairs, etc., and demonstrate the usability of our behavior transition system with real robots. Furthermore, we will study autonomous acquisition of the shape of the potential function in relation to the mechanical properties (e.g., body dynamics and musculoskeletal properties).
References

[1] U. Bässler: On the Definition of Central Pattern Generator and its Sensory Control, Biol. Cybern., Vol. 54, pp. 65-69, 1986.
[2] G. Taga, Y. Yamaguchi and H. Shimizu: Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment, Biol. Cybern., Vol. 65, pp. 147-159, 1991.
[3] G. Taga: A model of the neuro-musculo-skeletal system for anticipatory adjustment of human locomotion during obstacle avoidance, Biol. Cybern., Vol. 78, pp. 9-17, 1998.
[4] S. Miyakoshi, G. Taga, Y. Kuniyoshi and A. Nagakubo: Three Dimensional Bipedal Stepping Motion using Neural Oscillators - Towards Humanoid Motion in the Real World -, 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 84-89, 1998.
[5] M. Matsuura, K. Ishimura and M. Wada: Emergence of Biped Robot Locomotion through Neural Entrainment, JRSJ, Vol. 22, No. 3, pp. 56-62, 2004 (in Japanese).
[6] Y. Fukuoka, H. Kimura and A.H. Cohen: Adaptive Dynamic Walking of a Quadruped Robot on Irregular Terrain Based on Biological Concepts, The International Journal of Robotics Research, Vol. 22, No. 3-4, pp. 187-202, 2003.
[7] K. Tsujita, K. Tsuchiya and A. Onat: Adaptive Gait Pattern Control of a Quadruped Locomotion Robot, 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2318-2325, 2001.
[8] A. Ishiguro, A. Fujii and P.E. Hotz: Neuromodulated Control of Bipedal Locomotion Using a Polymorphic CPG Circuit, Adaptive Behavior, Vol. 11, No. 1, pp. 7-17, 2003.
[9] H. Yuasa and M. Ito: Generation of Locomotive Patterns and Self-Organization, Journal of Robotics and Mechatronics, Vol. 4, No. 2, pp. 142-147, 1992.
[10] S. Mori: Higher Nervous Control of Quadrupedal vs Bipedal Locomotion in Non-Human Primates; Common and Specific Properties, Proceedings of the 2nd International Symposium on Adaptive Motion of Animals and Machines, 2003.
[11] S. Aoi and K. Tsuchiya: Transition from Quadrupedal to Bipedal Locomotion, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3419-3424, 2005.
[12] K. Asa, K. Ishimura and M. Wada: Adaptive Behavior to Environment of a Humanoid Robot with CPG, SICE Annual Conference 2004.
[13] K. Matsuoka: Mechanisms of Frequency and Pattern Control in the Neural Rhythm Generators, Biol. Cybern., Vol. 52, pp. 367-376, 1985.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Tendon Arrangement Based on Joint Torque Requirements for a Reinforceable Musculo-Skeletal Humanoid

Yuto Nakanishi a,1, Ikuo Mizuuchi a, Tomoaki Yoshikai a, Tetsunari Inamura a and Masayuki Inaba a
a University of Tokyo

Abstract. It is difficult to decide the most suitable muscle configuration of a musculo-skeletal robot at the design phase, because the optimal arrangement depends on the robot's task. In this paper, muscle configuration means the types of muscle actuators and the tendon arrangement, i.e. the attachment positions, paths and number of tendons. In order to evaluate more suitable muscle configurations after the design and development of a robot, we have developed a musculo-skeletal humanoid whose muscles can be added and rearranged easily. We call such robots reinforceable musculo-skeletal robots. This paper describes how to decide a robot tendon arrangement suitable for each task based on the joint torque requirements. Using the proposed method, we actually obtained the proper tendon arrangement minimizing the sum of muscle forces for two typical motions, bending the knee and kicking.

Keywords. Decision of tendon arrangement, Changing muscle configuration, Reinforceable musculo-skeletal humanoids
1. Introduction

We human beings can easily perform various motions. Several humanoids have been developed so far [1,2], but they cannot do what even children can do, such as climbing, swimming and cycling. This is partly because present general humanoids have less flexibility, redundancy and complexity than humans. We have developed fully tendon-driven redundant humanoids [3,4] in order to perform various natural motions like human beings. The tendon positions of these musculo-skeletal humanoids are often decided according to those of human beings, because of their structural similarities. When we actually develop humanoids, however, we have to select the tendons to implement from a large number of human tendons. The proper selection of tendons depends on the tasks which a humanoid performs. Therefore, if we can easily rearrange tendons after developing a humanoid, we can obtain the proper tendon configuration for the robot's tasks. We have newly developed the musculo-skeletal humanoid Kotaro [5] (Figure 1), whose tendons can be rearranged easily. In this paper, we call such robots reinforceable musculo-skeletal humanoids.
1 Correspondence
to: Yuto Nakanishi, 7-3-1,Hongo,Bunkyo-ku,Tokyo. Tel: +81 3 5841 7416; E-mail: [email protected].
Y. Nakanishi et al. / Tendon Arrangement Based on Joint Torque Requirements
On the other hand, in order to utilize the tendon rearrangement ability, we need a method to select a proper tendon arrangement from a large number of candidates based on some criteria. Pollard and Gilbert [6] give an overview of tendon arrangement based on muscle force requirements in a robotic finger. However, their main purpose is to confirm whether typical tendon-driven finger designs can be tuned to have force capabilities equivalent to those of the human finger, rather than to search for the proper tendon positions. As a result, they treat only two specific finger tendon arrangements. In this paper, we explore a general method to search for a proper tendon arrangement with which a reinforceable musculo-skeletal humanoid can generate the joint torques needed to realize a task. Using the proposed method, we actually obtain a tendon arrangement suitable for a bending-knee motion and one suitable for a kicking motion, and show the feasibility of the method.
Table 1. The specifications of Kotaro
category
DOF
actuators pivots
type weight height spine neck legs arms head DC motor spine legs arms head
quantity 19.8kg 133cm 3 × 5 = 15 3×3=9 8 × 2 = 16 9 × 2 = 18 3 90 80 124 70 30
Figure 1. Kotaro
2. A Reinforceable Musculo-Skeletal Humanoid Kotaro

In this section, we briefly describe the mechanical structure of the musculo-skeletal humanoid Kotaro, whose tendon and muscle positions and quantities can be easily rearranged. Table 1 shows the specifications of Kotaro. There are currently 90 muscles on Kotaro, but we can easily increase or decrease their number. As the muscle and tendon of this tendon-driven robot, we adopt a geared DC motor which reels a chemical string (e.g. Vectran fiber) in and out by rotating a pulley. Each muscle has tension, length, current and temperature sensors, and can be driven by tension control or length control using these sensors. We combine these muscle components (sensors, an actuator and a pulley) into one unit, which we call a muscle unit. This muscle unit can easily be attached to and detached from the robot at only two points: one is the end of the string, the other the end of the muscle unit (Figure 2). The bone structure is an endoskeleton (Figure 3), into which boards and cables for the actuators are built to avoid interference between added muscles, boards and cables. The tendon arrangement can be changed more easily by setting up pivots, which are attachment points of tendons, on the bone structure (Figure 4). Many pivots are put on the bones
especially around the hip joints, the shoulder joints and the vertebral joints, because variety in the tendon arrangement is needed around these joints. Kotaro has many three-dimensional spherical joints, which enable it to perform natural motions like human beings. In order to assure the variety of tendon arrangements at these spherical joints, pulleys are not installed on the tendon paths. Removing the pulleys expands the movable range of a spherical joint and lets the moment arms of the tendons vary depending on the robot posture, as in human beings. Kotaro realizes various tendon arrangements by adopting the following three architectural points:
• An easily attachable muscle unit
• An endoskeleton with many pivots
• Three-dimensional spherical joints
Figure 2. A muscle unit
Figure 3. An endoskeleton
Figure 4. Pivots
3. The Method to Decide the Optimized Tendon Arrangement

In this paper, we propose a method to automatically decide the proper tendon arrangement of a musculo-skeletal humanoid for a given task. Figure 5 shows the flowchart of this method. We select the proper tendon arrangement from the possible candidates by a searching process, because multiple tendon arrangements can realize a task and an optimization problem must therefore be solved. In this searching process, we have to evaluate each tendon arrangement by some criterion; as one criterion, for example, minimizing the sum of muscle forces can be considered. In order to obtain the muscle forces, we need to compute the time series of joint torques required for a given motion. The overview of the method is as follows:

• Evaluating a tendon arrangement
• Calculating the time series of joint torques required for a goal motion
• Calculating the muscle forces required to generate the joint torques
• Searching for the proper tendon arrangement
3.1. Evaluation of tendon arrangement

A tendon arrangement may be evaluated by the following criteria.
Figure 5. Flowchart of the method to decide tendon arrangement
• Capability of generating joint torque
The primary condition is that the selected tendon arrangement can generate the joint torques required for the motion.
• Minimizing the sum of muscle forces
In a muscle driven by a DC motor, the motor current is roughly proportional to the motor torque, i.e. to the muscle force. A large current raises the temperature and degrades the performance of the motor, so the sum of muscle forces should be as small as possible. For McKibben-style muscles as well, minimizing the sum of muscle forces leads to compactness, because in these muscles the cross-sectional area is proportional to the force.
• Maximizing the movable range of the end effector
The movable range of the joints depends on the tendon arrangement, because the tendon stroke between extension and contraction is limited. When the robot performs large-range movements, it is important to change the tendon configuration so as to maximize the movable range.
• Simplifying tendon paths
In order to avoid breakdown of the robot, we must attach the tendons without crossing tendon paths. This can be realized by picking out proper candidates of tendon arrangements (Section 3.4).

In this paper, we choose the tendon arrangement which realizes the joint torques and minimizes the sum of muscle forces during a motion:

E = \sum_{k=1}^{T} \sum_{i=1}^{R} p_{i,k}   (1)

\text{minimize } E   (2)
where T denotes the number of time steps of the motion, R denotes the number of robot muscles, p_{i,k} is the force required of muscle i at phase k of the motion, and E is the evaluation criterion of a tendon arrangement.

3.2. Calculating the time series of joint torques required for a goal motion

There are many studies on calculating the time series of joint torques required to realize robot motions in various cases, such as static motions, dynamic motions, and motions in contact with an environment [7]. The methods presented in those studies can be applied to the humanoid Kotaro. This section presents how to calculate the joint torques realizing a static motion. The joint torque vector \tau is computed as follows:

\tau = J^{+} f_{end} + (I - J^{+} J)\,k + q_g   (3)

where J denotes the Jacobian between the end-effector velocity and the joint velocities, J^{+} is the pseudo-inverse of J, I is the identity matrix, f_{end} is the force vector at the end-effector, k is an arbitrary vector, and q_g is the gravity compensation torque.

3.3. Calculating the muscle forces required to generate the joint torques

In order to evaluate Eq. (1), we have to obtain the muscle forces p from the joint torques \tau by the following equation:

\tau = J_t\, p   (4)

where J_t denotes the Jacobian between the joint torques and the muscle forces, which can be calculated from the moment arms of the tendons around each joint. Each tendon path is approximated as a line between the two attachment points of the tendon.

If there is not a unique set of muscle forces that achieves the torques \tau, it is not obvious what the best p is. Although it does not in general produce a global minimum, the following approach can be used to find a good initial guess: the force needed to achieve the given torques is formulated as the linear programming problem

\text{minimize} \sum_{i=1}^{R} p_i   (5)

subject to:

\tau = J_t\, p   (6)

p = (p_1, \ldots, p_R) \geq 0   (7)

p is obtained by solving the problem of Eqs. (5)-(7).

3.4. Searching for the proper tendon arrangement

The proper tendon arrangement minimizing E is selected from many possible arrangements. This section describes how to decide the possible arrangements, which have to be chosen from the many pivots put on the robot bone structure.
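For intuition about the minimum-force problem of Eqs. (5)-(7), consider the special case of a single joint, where J_t reduces to one row of signed moment arms. With one equality constraint and non-negative forces, the linear program's optimum loads only the muscle whose moment arm has the sign of the required torque and the largest magnitude. The sketch below implements just this 1-DOF special case with hypothetical moment-arm values; the paper's general multi-joint problem needs a full LP solver.

```python
# Sketch of Eqs. (5)-(7) for a single joint: minimize sum(p)
# subject to sum(r_i * p_i) = tau and p_i >= 0, where r_i are signed
# tendon moment arms. This closed form covers only the 1-DOF case;
# the moment arms used in the example are hypothetical.
def min_muscle_forces_1dof(moment_arms, tau):
    best = None
    for i, r in enumerate(moment_arms):
        # candidate muscles must produce torque of the required sign;
        # among them, the largest |r| needs the least force tau/r
        if r * tau > 0 and (best is None or abs(r) > abs(moment_arms[best])):
            best = i
    if best is None:
        raise ValueError("no muscle can generate torque of the required sign")
    forces = [0.0] * len(moment_arms)
    forces[best] = tau / moment_arms[best]
    return forces

# Four hip muscles with signed moment arms [m], required torque 6 N.m:
# the 0.05 m extensor carries the whole load (~120 N), which beats
# using the 0.03 m muscle (~200 N).
print(min_muscle_forces_1dof([0.03, -0.02, 0.05, -0.04], 6.0))
```

Summing this minimal force over all joints and all time steps of the motion yields the evaluation value E of Eq. (1) for one candidate arrangement.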
Figure 6. Selection of possible tendon arrangements
In the case of a tendon arrangement composed of only one-joint muscles, the tendon arrangement of each joint can be treated independently, which enables parallel searching. At a three-dimensional spherical joint, however, at least four tendons are needed to control the joint sufficiently, and the number of possible tendon arrangements can be much larger than at other joints. By separating the tendon paths into four groups and selecting one tendon from each group (Figure 6), we can not only reduce the number of possible arrangements but also prevent tendon paths from crossing each other. In Figure 6, a red cross mark denotes a pivot for the end of a tendon string, and a blue circle mark denotes a pivot for the end of a muscle unit. If the number of red cross marks N_r is 8, the number of blue circle marks N_b is 8 and the number of muscles R is 4, the total number of selectable tendon arrangements is:

(N_r \times N_b)^R = (8 \times 8)^4 = 16777216   (8)
But if we select one muscle from each tendon-path group as in Figure 6, the number of combinations becomes:

((N_r/2) \times (N_b/2))^R = (4 \times 4)^4 = 65536   (9)
We can sufficiently search for the proper tendon arrangement among these reduced combinations. However, for tendon arrangements that include two-joint muscles, or when more muscles are added, the number of combinations becomes too large for a simple search. In such cases, a randomized method such as RRT (Rapidly-exploring Random Trees) [8] should be used for the search.
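The counts of Eqs. (8) and (9) are easy to reproduce, and the grouped candidate set can be enumerated directly with a Cartesian product. The pivot indices below are placeholders for the actual pivot positions on the bone structure:

```python
from itertools import product

# Combination counts of Eqs. (8) and (9): Nr string pivots, Nb unit pivots,
# R muscles around the spherical joint.
Nr, Nb, R = 8, 8, 4
full = (Nr * Nb) ** R                    # Eq. (8): unrestricted selection
grouped = ((Nr // 2) * (Nb // 2)) ** R   # Eq. (9): one muscle per path group
print(full, grouped)  # 16777216 65536

# The grouped arrangements can be enumerated directly: muscle i chooses one
# (string pivot, unit pivot) pair from its own group of 4 x 4 candidates.
groups = [[(s, u) for s in range(4) for u in range(4)] for _ in range(R)]
n = sum(1 for _ in product(*groups))
assert n == grouped
```

In the search, each of these 65536 tuples would be scored by the evaluation value E and the minimum kept, which is tractable; the unrestricted 16-million-arrangement space already strains such exhaustive scoring.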
4. Results and Discussion

In this section, we show that the proposed method optimizes the tendon arrangement around Kotaro's hip joint for two kinds of motion: a bending-knee motion and a
kicking motion. The robot has to use different muscles in these two motions. In a bending-knee motion, it is important to support the robot body, so the joint torques need to generate a foot force in the direction of gravity. In a kicking motion, on the other hand, the robot has to raise its leg, so the joint torques need to generate a foot force against gravity. Regarding the muscles around the hip, the extensors should be powered up for bending and stretching, and the flexors should be powered up for kicking.

First, we calculated the time series of hip joint torque during each motion (Figure 7), using Eq. (3) with the geometric model of Kotaro, which also contains the mass properties of each robot link and the pivot position information. The number of time steps is ten. Figure 7 shows that the joint torque around the pitch axis of the hip differs between the kicking motion and the bending-knee motion. Second, we chose the possible tendon arrangements of the four one-joint muscles around the hip by selecting one tendon from each path group as in Figure 6; the number of possible tendon arrangements was 20736. Third, we evaluated each tendon arrangement by computing the sum of muscle forces required for the joint torques of each motion, according to Eqs. (5)-(7). From the 20736 combinations, we obtained 252 tendon arrangements which can realize both motions. Figure 8 shows the evaluation of these tendon arrangements, where each point indicates one tendon arrangement, the vertical axis indicates the evaluation value for the kicking motion and the horizontal axis indicates the evaluation value for the bending-knee motion. The evaluation value is the sum of muscle forces over the ten time steps. In Figure 8, the tendon arrangement at circle A is the most efficient for the bending-knee motion (E = 18.08); for the same arrangement, the kicking evaluation is 378.33.
The tendon arrangement at circle B is the most efficient for the kicking motion (E = 30.39); for this arrangement, the bending-knee evaluation is 431.79. On the other hand, for the tendon arrangement adopted on the real robot Kotaro, the bending-knee evaluation value is 268.303 and the kicking evaluation value is 62.11. Therefore, by tuning the tendon arrangement to each motion, Kotaro's muscle forces can be reduced substantially.
Figure 7. Hip joint torque (hip pitch torque over the time line, for the bending-knee and kicking motions)

Figure 8. Evaluation of tendon arrangements (evaluation at kicking vs. evaluation at bending-knee; A and B mark the arrangements most efficient at bending the knee and at kicking, respectively)
5. Conclusion

This paper proposed a general method to obtain an optimized tendon arrangement with which a reinforceable musculo-skeletal humanoid can generate the joint torques to realize a given
motion. We actually searched for the optimized tendon arrangements for two motions using this method. With this method, we can change the tendon arrangement of the robot without trial and error on the real robot, which is very burdensome. In order to fully utilize a tendon-driven humanoid, two-joint muscles should be installed on the robot, but this increases the number of tendon arrangement combinations. This problem will be addressed by using a more efficient search algorithm and a more efficient selection of possible tendon arrangements.
Acknowledgements

This research has been partly supported by the Ministry of Education, Culture, Sports, Science and Technology: Grant-in-Aid for Scientific Research on Priority Areas (#15017220) and Grant-in-Aid for Young Scientists (B) (#15700147); and also supported by NEDO (New Energy and Industrial Technology Development Organization) through the Project for the Practical Application of Next-Generation Robots.
References

[1] Kazuo Hirai, Masato Hirose, Yuji Haikawa, and Toru Takenaka. The Development of Honda Humanoid Robot. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation, pp. 1321-1326, Leuven, Belgium, 1998.
[2] Koichi Nishiwaki, Tomomichi Sugihara, Satoshi Kagami, Fumio Kanehiro, Masayuki Inaba, and Hirochika Inoue. Design and Development of Research Platform for Perception-Action Integration in Humanoid Robot: H6. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'00), Vol. 3, pp. 1559-1564, 2000.
[3] Masayuki Inaba, Ikuo Mizuuchi, Ryosuke Tajima, Tomoaki Yoshikai, Daisuke Sato, Koichi Nagashima, and Hirochika Inoue. Building spined muscle-tendon humanoid. In Robotics Research: The Tenth International Symposium, pp. 113-130. Springer Verlag, 2003.
[4] Ikuo Mizuuchi, Ryosuke Tajima, Tomoaki Yoshikai, Daisuke Sato, Koichi Nagashima, Masayuki Inaba, Yasuo Kuniyoshi, and Hirochika Inoue. The Design and Control of the Flexible Spine of a Fully Tendon-Driven Humanoid "Kenta". In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2527-2532, 2002.
[5] Ikuo Mizuuchi, Tomoaki Yoshikai, Yuto Nakanishi, Yoshinao Sodeyama, Taichi Yamamoto, Akihiko Miyadera, Tuomas Niemela, Marika Hayashi, Junichi Urata, and Masayuki Inaba. Development of muscle-driven flexible-spine humanoids. In Proceedings of the International Conference on Humanoid Robots (Humanoids'05), December 2005.
[6] Nancy S. Pollard and Richards C. Gilbert. Tendon arrangement and muscle force requirements for humanlike force capabilities in a robotic finger. In Proceedings of the 2002 IEEE International Conference on Robotics & Automation, pp. 3755-3762, 2002.
[7] Yoshihiko Nakamura. Advanced Robotics: Redundancy and Optimization. Addison-Wesley, Boston, MA, USA, 1991.
[8] James J. Kuffner and Steven M. LaValle. RRT-Connect: An efficient approach to single-query path planning. pp. 995-1001, 2000.
Vision-based Virtual Information and Semi-autonomous Behaviours for a Humanoid Robot

Olivier Stasse a,1, Jean Semere a, Neo Ee Sian a, Takashi Yoshimi b and Kazuhito Yokoi a
a AIST/ISRI-CNRS/STIC, Joint Robotics Laboratory
b Intelligent Systems Research Institute

Keywords. Semi-autonomous behaviours, vision based, humanoid robot
1. Introduction

This paper considers the problem of a humanoid robot evolving semi-autonomously in an unknown environment under high-level human supervision. The problems in this case are to build a representation of the world, perceive objects, plan a path to explore the environment, and move accordingly. The sensing system considered here is a stereoscopic vision system. Compared to wheeled robots, for which a 2D representation [1] is often enough to move about an environment, a humanoid robot needs a 3D representation of the world. Indeed, humanoids can step on and over obstacles [2][3], go through narrow spaces and crawl [4]; therefore, 3D knowledge of the environment is needed to plan motions. Gutmann [5] suggests the construction of a 3D grid map used to update a 2.5D navigation map on which an A* search is performed. Kagami [6] used an online dense local 3D world reconstruction obtained by merging depth maps, self-localised using the RANSAC algorithm, and planned a trajectory on a projected 2.5D map. In both cases, interestingly, there is no justification of the choice of the destination. One possibility is to use the unknown part of the environment to plan the next position of the robot, as proposed by Sujan [7]. This is especially useful for full coverage of a closed area. But the most likely application is in conjunction with a remote operator: a human is able to analyse the scene and to guess that a path is possible for the robot. Our goal is to enhance the information available to the remote operator, and to develop the robot's autonomy for efficient collaborative work. In this
1 Correspondence to: Olivier Stasse, Joint Japanese-French Robotics Laboratory, Intelligent Systems Research Institute, Centre National de la Recherche Scientifique / National Institute of Advanced Industrial Science and Technology, AIST-Central 2, Umezono 1-1-1. Tel.: +81 29 861 9244; Fax: +81 29 861 3443; E-mail: [email protected].
O. Stasse et al. / Vision-Based Virtual Information and Semi-Autonomous Behaviours
Figure 1. (left) A polytope describing the full perception range of the HRP-2's stereoscopic system. (right) The set of reconstructed points for the polytope
context, a high level of local autonomy is very important because humanoid robots are almost impossible to operate at a low level: stability and motion generation are among the key problems which have to be solved for proper teleoperation. Those points were addressed in previous work in our group by Sian [8]. For a multi-purpose platform such as a humanoid, it is also important to perceive objects precisely in order to localise and grasp them [8]. In order to meet both challenges, i.e. long-range perception for map building and short-range perception for objects, several humanoid robots such as COG from MIT, DB from ATR, and several HRP-2s (DHRC, NAIST, JRL) have cameras with different fields of view. In both situations the precision and the range of the field of view are important and should be presented to the user in a coherent manner. In this paper, we characterise a stereoscopic system by a polytope and a resolution function obtained by covariance analysis. The polytope gives a geometrical representation of the field of view and the depth of field. We show how these two pieces of information can be used to enhance the information presented to a user: by showing the 3D space where the robot can see in 3D, by showing the perception error of the 3D sensing, and by displaying the unknown space. We also present a simple autonomous behaviour in which the robot plans a trajectory whose destination is specified by the user. In this context, we do not use the full capabilities of the robot, but follow the line of Gutmann [5], i.e. the configuration space is split into simple, dynamically stable motions. The work presented here differs by the use of the walking pattern generator proposed by Kajita [9]. The planner is designed to be fast, so that it can re-plan on-line if an obstacle appears in front of the robot. It can also serve as a high-level planner used in conjunction with more sophisticated approaches.
The remainder of this paper is organised as follows. The next section describes the method used to characterise a stereoscopic system in order to display virtual information to a remote operator. The following section describes how the 3D representation of the world is built and how unknown space is presented to the user. The semi-autonomous path planning behaviour is described in Section 4, which is followed by the conclusion.
2. Characterisation of a stereoscopic system

2.1. Introduction

We consider a stereoscopic rig in which the two cameras Cr and Cl are modelled by two projective matrices Pr and Pl. A 3D point Q is projected by these two matrices onto two pixels qr and ql. Considering that qr and ql are measured, and thus noisy, 3D triangulation consists in estimating {Qs}, the space in which Q lies. We are interested in representing the 3D space P where such an estimation is possible. We propose to use a polytope representation such that

P = {x ∈ R^4 : Ax ≥ 0}
(1)
Considering that the stereoscopic rig has a position C and an orientation R, let H be the homogeneous matrix representing the rigid transformation {R, C}. It is then very straightforward to test whether a 3D point Q of the space might be perceived by the 3D sensor: if AHQ ≥ 0, then Q can be seen by the sensor. However, in order to make this computationally efficient, it is necessary to find a minimal description of P, i.e. a minimal A. This section describes how to compute such a minimal description. It also shows how to compute a polytope for a given resolution.

2.2. Building the polytope

To build the polytope we consider all the possible combinations of qr and ql. For each pair, the corresponding Q is computed using the DLT algorithm as described in [10]. We recall the formulation of this system as it is used in the remainder of the paper:
1 (x1 p31 p31 Q 1 ⎢ 3 (y1 p3 1 ⎢ p1 Q ⎢ 1 ⎣ p32 Q (x2 p32 1 (y p3 p32 Q 2 2
⎤ − p11 )Q − p21 )Q ⎥ ⎥ ⎥ = WAQ − p12 )Q⎦ − p22 )Q
(2)
where pik the i-th row of the projective matrix Pk , and W has only diagonal element given by the following vector { p31Q , p31Q , p31Q , p31Q }. Q is found by computing the 1
1
2
2
Q is found by computing the least-squares estimate minimising (AX)^T(AX). This gives the set of possible 3D points which can be perceived by the stereoscopic system. The convex hull of this set is found using the Quickhull algorithm [11] as implemented by the qhull library. The main problem with this algorithm is the degeneracy of its output: since several points belong to only one facet of the convex hull, the resulting polytope is not minimal. In order to find the minimal representation we used the implementation of the double description method provided by Fukuda [12].
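The homogeneous DLT triangulation recalled in equation (2) can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard linear triangulation of [10], not the authors' code; the function name is ours:

```python
import numpy as np

def triangulate_dlt(P_l, P_r, q_l, q_r):
    """Linear (DLT) triangulation: stack the four rows
    x*p3 - p1 and y*p3 - p2 of both cameras into A and take the
    least-squares solution of A Q = 0 as the last right-singular
    vector of A."""
    A = np.vstack([
        q_l[0] * P_l[2] - P_l[0],
        q_l[1] * P_l[2] - P_l[1],
        q_r[0] * P_r[2] - P_r[0],
        q_r[1] * P_r[2] - P_r[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    Q = Vt[-1]
    return Q / Q[3]  # de-homogenise
```

With noisy pixels the recovered Q is the algebraic least-squares estimate; covariance backpropagation (Section 2.3) then quantifies its uncertainty.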
2.3. Error of the 3D triangulation

For each 3D point, the position uncertainty of Q is computed using covariance backpropagation as described in [10]. More precisely, the covariance matrix Σ_Q of Q is given by:

Σ_Q = (J^T Σ_X^{-1} J)^+    (3)

where M^+ denotes the Moore-Penrose generalised inverse of a matrix M. The Jacobian J is given by

J = \frac{\partial WAQ}{\partial Q} = WA + \frac{\partial W}{\partial Q} AQ    (4)

where \frac{\partial W}{\partial Q} is a 4 × 4 matrix whose diagonal elements are \{-\frac{p^3_1}{(p^3_1 Q)^2}, -\frac{p^3_1}{(p^3_1 Q)^2}, -\frac{p^3_2}{(p^3_2 Q)^2}, -\frac{p^3_2}{(p^3_2 Q)^2}\}, and all the other elements are 0.
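Equation (3) can be evaluated with a pseudo-inverse, which absorbs the rank deficiency discussed below; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def point_covariance(J, Sigma_x):
    """Backpropagate the pixel-measurement covariance Sigma_x
    through the triangulation Jacobian J (equation (3)).
    The Moore-Penrose pseudo-inverse handles the rank-3
    information matrix that arises from the scale ambiguity."""
    info = J.T @ np.linalg.inv(Sigma_x) @ J
    return np.linalg.pinv(info)
```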
In practice, \frac{\partial W}{\partial Q} is not computed because it is multiplied by AQ, which should be equal to zero. As the reconstruction is done up to scale, although J has 4 rows its rank is 3. This singularity is easily handled using an SVD decomposition. We call the resulting ellipsoid an uncertain point; it is used to update the occupancy grid.

2.4. Minimal representation

The minimal representation of the polytope consists in finding the matrix A without redundancy. The result of the brute-force approach described in the previous paragraph is a polyhedron. The first step is to compute the half-space representation of this polyhedron; a half-space representation is the name used in computational geometry for the equivalent set of inequalities. Fukuda and others proposed the double description method to compute such a minimal description. Other methods exist to compute the convex hull and the half-space representation of a polytope. As indicated previously, we used the QuickHull algorithm to compute the convex hull: although it does not provide the minimal representation, it is very fast, and it also provides a Delaunay triangulation useful for drawing the polytope in a GUI. Other solutions exist, such as lrs [13], but this software accepts only rational entries.

2.5. Results and comparison

Using the method described above, we computed the set of reconstructed points for the two cameras of our humanoid, each providing a 340x480 pixel image. The result is depicted in figure 1-right. It is straightforward to notice that the precision of the reconstruction decreases for points far from the center of the cameras. Figure 2 depicts a scene where the humanoid robot looks at close range at a complex object. The raw reconstructed surface is displayed in an OpenGL window depicted in figure 3-left. The corresponding image with uncertainty is depicted in figure 3-right.
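The hull-then-half-space step can be sketched with SciPy, whose `ConvexHull` wraps the same qhull library used in the paper. The facet equations it returns are a valid half-space description but, like qhull's own output, not necessarily minimal (coplanar facets are triangulated); a tool such as cdd is still needed for the truly minimal A. Function names are ours:

```python
import numpy as np
from scipy.spatial import ConvexHull

def halfspace_representation(points):
    """Convex hull of the reconstructible 3D points (Quickhull via
    scipy). hull.equations rows are [n, d] with n.x + d <= 0 for
    interior points: the inequalities of the field-of-view polytope."""
    hull = ConvexHull(points)
    return hull.equations

def visible(A, Q):
    """Point-in-polytope test: Q (3-vector) lies inside the
    field-of-view polytope iff every facet inequality holds."""
    return bool(np.all(A[:, :3] @ Q + A[:, 3] <= 1e-9))
```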
We used a similar approach based on interval analysis [14] to design a pre-attentive behaviour in which the robot autonomously finds an object of interest in the environment.
Figure 2. Unknown object to analyse
Figure 3. (left) Reconstructed surface without uncertainty ellipsoids (right) Reconstructed surface with uncertainty ellipsoids
The polytope representation computed by qhull has 250 points. We used the Delaunay triangulation functionality to generate the surfaces needed for display, as in figure 1-right. The use of cdd reduced the number of points to 132. Although this representation is interesting for display purposes, it is still too large to be used in vision computation. Indeed, generally only the two camera projective matrices are used to model a stereoscopic system: it is possible to test whether a 3D point Q is seen by the stereoscopic system at a given location using the projective matrix of each camera, Pl and Pr. If the projected points qr and ql are inside the images, then the 3D point can be perceived in 3D. This test can be done with 4 matrix products (2 for changing the camera location, and 2 for projecting the point) and 8 comparisons. In this situation, however, the projective matrices do not capture the depth limitation inherent to the discretisation of the space, so any method dealing with this kind of problem must add some qualitative information. During disparity computation it is usual to limit the depth of field in order to reduce the noise due to the background; this can easily be represented by a constraint on the distance of the point to the camera system.
Figure 4. (left) Original scene: situation of the robot. (right) Original scene: view of the robot.
Figure 5. (left) Reconstruction of the scene in figure 4 (right) Space explored after 5 views.
3. Occupancy grid and explored space

The world is represented using an octree. This data structure, originally proposed by [15], recursively divides the 3D space. It stores the information detected by the stereoscopic system, the unseen space, and the empty space. The cells of the octree are updated following the usual occupancy grid algorithm as described in [1]. An example of such an occupancy grid is given in figure 5-left. A piece of information important both for a remote operator and for developing an autonomous behaviour is the explored space: in both cases it can be used to decide where to look next. Thus, for each view, each point Q of the occupancy grid has to be tested to decide whether it has been seen or not. This is done using the projective matrices of the cameras. If the coordinates of the projection ql = Pl Q are inside the images, the point is tested against the range image. The range image follows the coordinates of the left image and gives the corresponding 3D point RI(ql). If Q is closer than RI(ql), it no longer exists and its occupancy probability should be decreased. If Q is more distant, then it cannot be perceived and stays unknown. The last case is when Q corresponds to the perceived point, i.e. Q = RI(ql). This operation has been tested in the context of the octree and happens to be quite time consuming (20 seconds on average for one view), partly because its implementation has not been optimised, and
partly because several thousand points are processed each time. Thus, for the unknown space, a 3D occupancy grid stored as an array is used. Figure 5-right depicts in blue the space explored by the humanoid robot after 5 views. It can be seen that the perception of some elements in the scene hides parts of the environment; other views will be needed to cover those hidden places. It is possible to implement an automatic Next Best View algorithm [7]. However, as the function to minimise depends entirely on the environment, the current common practice is to discretise the environment and choose the view which minimises the motion to perform and maximises the covered unknown area.
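The three-way test against the range image described above can be sketched as follows. This is a hypothetical helper mirroring Section 3's logic, not the authors' implementation; the tolerance `eps` is an assumption:

```python
import numpy as np

def classify_point(P_l, range_image, Q, width, height, eps=0.02):
    """Classify one occupancy-grid point Q against the current view:
    'free' if it lies in front of the perceived surface (probability
    should be decreased), 'occupied' if it matches RI(ql), and
    'unknown' if it is occluded or falls outside the image."""
    q = P_l @ np.append(Q, 1.0)
    if q[2] <= 0:
        return "unknown"                  # behind the camera
    u, v = int(q[0] / q[2]), int(q[1] / q[2])
    if not (0 <= u < width and 0 <= v < height):
        return "unknown"                  # projects outside the image
    depth = range_image[v, u]             # depth perceived at that pixel
    if q[2] < depth - eps:
        return "free"                     # Q is in front of the surface
    if abs(q[2] - depth) <= eps:
        return "occupied"                 # Q is the perceived point
    return "unknown"                      # Q is occluded by the surface
```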
4. Path planning

This section presents a semi-autonomous behaviour, based on the visual information presented previously, which allows a remote operator to control the humanoid at a high level. The aim was to have a very fast planner, able to re-plan the path of the robot between two images; this is especially useful for obstacle avoidance in dynamic environments. The main challenges are to plan the foot positions and to take into account the resulting trajectory of the waist. Kuffner [16] used a Rapidly-exploring Random Tree to generate a path and a dynamic filter to realise a stable trajectory. Chesnutt [17] performs foot planning using an automaton describing all the possible transitions from one foot position to another while the robot is stable. As we are using an HRP-2 humanoid robot with the initial foot planner and the stabiliser described in [18], the planner described here has been tailored for this setup. The approach is inspired by [5]. However, we do not consider 3D information in this paper. The main reason is that we believe that in some situations, specifically where complex motions or even contacts with the environment have to be considered, a more complex planner is needed, which is outside the scope of this paper. The main point of this planner is to provide the exploration process with a feasible path in real time.

4.1. Pattern generator

The pattern generator provided by [18] offers three functionalities: translational motion with change of orientation, motion on an arc, and next-step positioning. Translational motion takes the targeted position and orientation relative to the current position. Motion on an arc takes the center of the arc and the targeted position. In order to decrease the complexity of the search, following [5], the set of possible motions is split into 5 directions: forward, right, left, forward-right, forward-left.
Because the cost of a path going backward is worse than that of a path which goes forward all the time, there is no backward motion. For right, left, forward-right and forward-left, there are several ways to realise the motion depending on the starting direction and the final direction. We consider in general two different modes of walking: forward and sideways. Thus, for each node of the grid, we consider a configuration consisting of a pair of one mode and one direction. For each configuration, a mask represents the 2D projection of the maximum space used by the robot in order to realise the motion. These masks
Figure 6. (left) Generated path by the planner described in the text. (right) Sequence of actions for the footstep positioning method
have been built by simulation and represent the boundary within which the motion can be realised.

4.2. A*

The algorithm used to explore the 2D grid is A* with 4 kinds of heuristics. Three are classical: a cost-from-start term, a cost-to-goal term, and an obstacle term. The fourth is based on the path shape and favours the creation of straight trajectories.

4.2.1. Path execution

We developed a path execution system which either uses the basic functionalities of the original HRP-2, or generates foot positions once the path is found. Because the robot space considered during path generation is an upper bound, and because the map is 2D, the steps can be positioned after the path has been found. An extension of this planner would be to consider 3D and to take into account, for instance, stepping-over-obstacle motions. Once the steps have been planned, as depicted in figure 6-right, a stack handler sends the next step according to the delay imposed by the preview control described in [9]. This delay is the time needed by the robot to stop in a smooth manner, which is two steps for HRP-2.

4.3. Experiments

For our experiments an HRP-2 humanoid robot has been used. It has 30 DOFs, its height is 1.64 m and its weight 54 kg [19]. It has two CPU boards, each embedding a Pentium III at 1 GHz; one is dedicated to control, while the other is used for vision. The head of the robot contains 4 cameras: one has a 90-degree field of view, while each of the three others has a 25-degree field of view. The robot is put in the situation depicted in figure 4-left. It builds its own representation of the world by taking several views of the environment. The space coverage has been designed using the 3D polytope associated with the 3D sensor system. More precisely, the field of view of the 3D sensor system has been found by searching for the points with the minimum and maximum angular coordinates relative to the origin of the polytope frame.
More precisely, this origin is the center of the baseline.

Figure 7. Execution of the generated path

Numerically, we found that our robot has a stereoscopic field of view of 33 degrees in the horizontal plane and 33 degrees in the sagittal plane. The limits of the robot's head are [-45; 45] degrees in the horizontal plane and [-30; 45] degrees in the sagittal plane. Therefore, for a complete coverage of the space that can be seen by moving only the head, 9 different viewpoints are necessary. The result, inserted into the occupancy grid described previously, is depicted in figure 5. From the occupancy grid, a 2D map is extracted by projecting all the obstacles onto the floor. The remote operator chooses the destination on the map represented in figure 6. The map, the starting point and the destination point are transmitted to the path planner, and a path is generated, depicted in green in figure 6. The realisation of the path is depicted in figure 7.
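The grid search of Section 4.2 can be sketched as a standard A* with an extra term for path straightness. This is a minimal sketch, not the authors' planner: the penalty value, the obstacle encoding (1 = occupied) and the 4-connected neighbourhood are assumptions, whereas the real planner works on configurations pairing a walking mode and a direction:

```python
import heapq
import itertools

def plan(grid, start, goal, turn_penalty=0.2):
    """A* over a 2D occupancy grid. Besides the usual cost-from-start
    and cost-to-goal (Manhattan) terms, a small penalty on direction
    changes favours straight trajectories, in the spirit of the
    fourth heuristic of Section 4.2."""
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    tie = itertools.count()               # unique tiebreaker for the heap
    open_set = [(0.0, 0.0, next(tie), start, None, [start])]
    best = {}
    while open_set:
        _, g, _, cell, d, path = heapq.heappop(open_set)
        if cell == goal:
            return path
        if best.get((cell, d), float("inf")) <= g:
            continue                      # already expanded more cheaply
        best[(cell, d)] = g
        for move in moves:
            r, c = cell[0] + move[0], cell[1] + move[1]
            if not (0 <= r < rows and 0 <= c < cols) or grid[r][c]:
                continue                  # off the grid or inside an obstacle
            g2 = g + 1.0 + (turn_penalty if d is not None and move != d else 0.0)
            h = abs(goal[0] - r) + abs(goal[1] - c)
            heapq.heappush(open_set,
                           (g2 + h, g2, next(tie), (r, c), move, path + [(r, c)]))
    return None                           # goal unreachable
```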
5. Conclusion

We have presented a set of tools which enhance the visual information reported to a remote operator. These tools allow the operator to see where the 3D vision sensor of a humanoid robot can perceive, and to visualise the precision of the 3D vision sensor by displaying its volume of uncertainty inside an OpenGL window. In order to further improve the efficiency of the remote control of a humanoid robot, we have also presented an autonomous behaviour in which the robot creates its own representation of a 3D world and finds a path within it.
References

[1] S. Thrun, "Learning occupancy grid maps with forward sensor models," Autonomous Robots, vol. 15, pp. 111–127, 2003.
[2] K. Okada, T. Ogura, A. Haneda, and M. Inaba, "Autonomous 3d walking system for a humanoid robot based on visual step recognition and 3d foot step planner," in Proc. IEEE Int. Conf. on Robotics and Automation, 2005, pp. 625–630.
[3] Y. Guan, K. Yokoi, and K. Tanie, "Feasibility: Can humanoid robots overcome given obstacles?" in Proc. IEEE Int. Conf. on Robotics and Automation, May 2005, pp. 1066–1071.
[4] F. Kanehiro, T. Yoshimi, S. Kajita, M. Morisawa, K. Fujiwara, K. Harada, K. Kaneko, H. Hirukawa, and F. Tomita, "Whole body locomotion planning of humanoid robots based on a 3d grid map," in Proc. IEEE Int. Conf. on Robotics and Automation, 2005, pp. 1084–1090.
[5] J.-S. Gutmann, M. Fukuchi, and M. Fujita, "Real-time path planning for humanoid robot navigation," in International Joint Conference on Artificial Intelligence, July 2005.
[6] S. Kagami, Y. Takaoka, Y. Kida, K. Nishiwaki, and T. Kanade, "Online dense local 3d world reconstruction from stereo image sequences," in IEEE Int. Conf. on Intelligent Robots and Systems, 2005, pp. 2999–3004.
[7] V. A. Sujan and S. Dubowsky, "Efficient information-based visual robotic mapping in unstructured environments," The International Journal of Robotics Research, vol. 24, no. 4, pp. 275–293, April 2005.
[8] N. E. Sian, K. Yokoi, S. Kajita, F. Kanehiro, and K. Tanie, "Whole body teleoperation of a humanoid robot: a method of integrating operator's intention and robot's autonomy," in International Conference on Robotics and Automation, 2003, pp. 1613–1619.
[9] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa, "Biped walking pattern generation by using preview control of zero-moment point," in IEEE International Conference on Robotics and Automation, ICRA '03, vol. 2, Sept. 2003, pp. 1620–1626.
[10] R. Hartley and A. Zisserman, Multiple View Geometry. Cambridge University Press, 2003.
[11] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, "The quickhull algorithm for convex hulls," ACM Transactions on Mathematical Software, vol. 22, no. 4, pp. 469–483, December 1996, http://www.qhull.org.
[12] K. Fukuda and A. Prodon, "Double description method revisited," Lecture Notes in Computer Science, vol. 1120, pp. 91–111, 1995; selected papers from the 8th Franco-Japanese and 4th Franco-Chinese Conference on Combinatorics and Computer Science.
[13] D. Avis, "lrs: A revised implementation of the reverse search vertex enumeration algorithm," in Polytopes: Combinatorics and Computation, G. Kalai and G. Ziegler, Eds. Birkhauser-Verlag, 2000, pp. 177–198.
[14] O. Stasse, B. Telle, and K. Yokoi, "3d segmentation using interval analysis and pre-attentive behavior for a humanoid robot," in IEEE Int. Conf. on Robotics and Biomimetics, 2005, pp. 284–289.
[15] C. I. Connolly, "The determination of next best views," in International Conference on Robotics and Automation, 1985, p. 432.
[16] J. J. Kuffner, K. Nishiwaki, S. Kagami, M. Inaba, and H. Inoue, "Dynamically-stable motion planning for humanoid robots," Autonomous Robots, vol. 12, no. 1, pp. 105–118, January 2002.
[17] J. Chesnutt, M. Lau, G. Cheung, J. Kuffner, J. Hodgins, and T. Kanade, "Footstep planning for the Honda ASIMO humanoid," in International Conference on Robotics and Automation, Barcelona, Spain, April 2005, pp. 631–636.
[18] S. Kajita, F. Kanehiro, K. Kaneko, K. Yokoi, and H. Hirukawa, "The 3d linear inverted pendulum mode: A simple modeling of a biped walking pattern generation," in International Conference on Intelligent Robots and Systems, Maui, Hawaii, USA, November 2001, pp. 239–246.
[19] K. Kaneko, F. Kanehiro, S. Kajita, H. Hirukawa, T. Kawasaki, M. Hirata, K. Akachi, and T. Isozumi, "Humanoid robot HRP-2," in Proceedings of the 2004 IEEE International Conference on Robotics & Automation, vol. 2, 2004, pp. 1083–1090.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
R. Adachi et al. / Load Distributed Whole-Body Motion Generation Method for Humanoids
\frac{1}{N} \sum_{i=0}^{N} \frac{\tau_i}{\tau_{\max_i}}

(N = number of joints, τ = joint torque)
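The average torque ratio above can be evaluated directly; a minimal sketch (function name ours):

```python
import numpy as np

def average_torque_ratio(tau, tau_max):
    """Mean over all joints of the joint torque normalised by that
    joint's maximum torque, the evaluation index defined above."""
    tau = np.asarray(tau, dtype=float)
    tau_max = np.asarray(tau_max, dtype=float)
    return float(np.mean(np.abs(tau) / tau_max))
```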
f = -G m g + N k
τ = J^T f + h_g

with G = \frac{\partial P}{\partial x_B}, h_g = \frac{\partial P}{\partial θ},

N = \begin{bmatrix}
e_{01} & e_{02} & e_{03} & 0 & 0 & 0 \\
-e_{01} & 0 & 0 & e_{12} & e_{13} & 0 \\
0 & -e_{02} & 0 & -e_{12} & 0 & e_{23} \\
0 & 0 & -e_{03} & 0 & -e_{13} & -e_{23}
\end{bmatrix} ∈ R^{12×6},

k = [k_0\ k_1\ k_2\ k_3\ k_4\ k_5]^T.

For a symmetric posture, k = [k_0\ k_1\ k_2\ k_2\ k_1\ k_5]^T with

k_0 = -e_{02y} k_1 - e_{03y} k_2
k_5 = e_{02y} k_1 + e_{12y} k_2

where e_{ijy} is the y component of e_{ij}.
Figure. Arm and leg contact postures: (b) symmetric posture; (c) cancelling the lateral force.
Figure. arm force norm / (arm force norm + leg force norm) for k1 = -15548, k1 = -15235 and for k1 = 10918, k2 = -11968; 250 [mm] and 650 [mm].
Figure. (left) arm force norm / (arm force norm + leg force norm) and average torque ratio as functions of k1 (-16000 to -14000). (right) Torque ratio per joint number for k1 = 10918, k2 = -11968 (min. torque ratio) and for k1 = 0, k2 = 0.
Figure. Torque ratio over frames 0 to 9 for the arm joints (shoulder pitch/roll/yaw, elbow pitch, wrist yaw/roll/pitch) and the leg joints (crotch yaw/roll/pitch, knee pitch, ankle pitch/roll).
Measurement and Simulation Verification of Reflexive Responses to Perturbation during Walking Shahed Sarwar, Wenwei Yu, Masaru Kumagai, Masaki Sekine, Tamotsu Katane, Toshiyo Tamura, Osami Saitou Graduate School of Science and Technology, Chiba University
Abstract. Recently, walking assist systems, such as Functional Electrical Stimulation (FES) for hemiplegic walking assist, have been widely studied with the purpose of improving Activities of Daily Living (ADL) for paralyzed people. However, most of these systems are not usable in the real-world environment, where perturbations resulting from uneven terrain, slopes and obstacles must be dealt with ceaselessly. It is evident that human beings cope with those unpredictable perturbations during walking through reflexes, which produce relatively fixed responsive patterns to perturbations within tens of milliseconds to 200 ms. Our purpose was to analyze the reflex mechanism qualitatively and quantitatively, and to realize artificial reflexes in walking support systems for paralyzed people, whose reflexive system is also impaired to a certain degree because of weakened afferent and efferent neural pathways. In this study, we investigated reflex responses by experiment and computer simulation. We measured EMG and physical motions from an unimpaired person during walking with perturbation. The experimental results were compared with the simulation output, in order to elucidate the general reflexive mechanism.

Keywords. Walking, reflexes, simulation, Functional Electrical Stimulation
1. Introduction

Recently, walking assist systems, such as Functional Electrical Stimulation (FES) for hemiplegic walking assist, have been widely studied with the purpose of improving Activities of Daily Living (ADL) for paralyzed people [1]. However, most of these systems are not usable in the real-world environment, where perturbations resulting from uneven terrain, slopes and obstacles must be dealt with ceaselessly. On the other hand, human beings cope with those perturbations, especially when they cannot be predicted or perceived in advance, through reflexes, which unconsciously produce relatively fixed responsive patterns within tens of milliseconds to 200 ms. Reflexes of different functional organs and limbs (e.g., upper limbs [2], the heart [3], and lower limbs), in different contexts (e.g., during flexion/extension [2], during free fall [4], during walking), have been studied in the kinesiology and neuroscience research fields, but the spatiotemporal relation between muscle activity and physical motions is still unknown. Our ultimate goal is to bring artificial reflexes to real-world walking support systems for paralyzed people, whose reflexive system is also impaired to a certain degree because of weakened afferent and efferent neural pathways. This goal requires both qualitative and quantitative analysis of the human reflexive mechanism during walking. However, there is
S. Sarwar et al. / Measurement and Simulation Verification of Reflexive Responses
no unified theory, nor clear experimental results, that could be directly quoted from the discipline of physiology. Our approach consists in developing a neuro-musculo-skeletal simulation model that can present behavior conformable to human normal walking and perturbed walking, in order to investigate various hypotheses on the neuro-reflexive mechanism. In this paper, we report our effort to build a basic neuro-musculo-skeletal model and to evaluate it by comparing its torque output with EMG profiles obtained by recording and processing electromyographic (EMG) signals of muscle activation patterns during normal walking and perturbed walking. The results are reported and discussed.
2. A Neuro-Musculo-Skeletal Simulator Model

Computer simulation has been employed as one approach to study the role of afferent information during animal walking [5] and human walking [6][7]. In [5], virtual reflexes were realized by a set of if-then rules, and the gaits of cat walking with and without the virtual reflexes were compared. The results showed that walking with virtual reflexes was more stable and perturbation-resistant. In [6] and [7], each neuron was expressed by a set of simultaneous differential equations, which originated from [8]. Neurons innervating lower-limb muscles were mutually coupled, their oscillations were entrained to each other, and the skeletal system controlled by the nervous system presented coordinated motion. The simultaneous differential equations are shown in expression (1).
τ \dot{u}_n = -u_n + \sum_s w_{ns} f_{max}(u_s) - β v_n + z_n + FEEDBACK
τ' \dot{v}_n = -v_n + f_{max}(u_n)
f_{max}(·) = max(·, 0)    (1)

where u_n is the inner state of the n-th neuron; f_{max}(u_n) is the output of the n-th neuron; v_n is a variable representing the degree of adaptation, i.e. the self-inhibition effect, of the n-th neuron; z_n is an external input with a constant rate; w_{ns} is a connecting weight; and τ and τ' are the time constants of the inner state and of the adaptation effect, respectively. This neural expression has also been widely used in other walking simulations, for example [9], [10]. The FEEDBACK term in equation (1) contains quantities proportional to the velocity or reaction force of the skeletal system, so that the interaction between the internal neuro-musculo-skeletal system and the external world can be expressed in the model. Our simulation model also employs this expression. For each lower limb, one hip joint, one knee joint and one ankle joint were simulated; the two legs share one hip joint. The framework of the model is illustrated in Figure 1. Each circle indicates a nonlinear oscillator, whose output is used as the torque of the muscle it controls. The difference within each antagonized muscle pair is used to drive the skeletal frame. Currently, the bipedal simulation model is confined to move in the sagittal plane. The mass, inertial moment and size of the skeletal frame were set according to [6].
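One integration step of the oscillator network of equation (1) can be sketched as follows. This is an illustrative Euler step, not the authors' MATLAB/Working Model implementation; the gains used in the usage below are assumptions, not the tuned values of [8]:

```python
import numpy as np

def step_oscillators(u, v, z, W, beta, tau, tau_p, feedback, dt):
    """One Euler step of equation (1):
    tau  u'_n = -u_n + sum_s w_ns f(u_s) - beta v_n + z_n + feedback_n
    tau' v'_n = -v_n + f(u_n),  with f(x) = max(x, 0)."""
    f = np.maximum(u, 0.0)                       # f_max: rectified output
    du = (-u + W @ f - beta * v + z + feedback) / tau
    dv = (-v + f) / tau_p
    return u + dt * du, v + dt * dv
```

Iterating this step with mutually inhibiting weights in W produces the entrained oscillations whose rectified outputs drive the muscle torques.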
Figure 1. Framework of the model
The model was developed using MATLAB version 7.0 software (The MathWorks, USA) and Working Model 2D version 7.0 software (MSC Software, USA). They are coupled by DDE (Dynamic Data Exchange).
3. Results of the Simulation Model

3.1. Normal walking

Figure 2 shows the simulated motion of the legs on level ground. Figure 3 shows the height of the hip joint during walking, and Figure 4 shows the phase portrait of the right thigh angle. From Figures 3 and 4 it is clear that the gait for the first 3 steps is transitional; only after 3 steps is a stable gait presented. The walking speed is about 5 km/h.
Figure 2. Simulated normal walking
Figure 3. Height of the hip joint
Figure 4. Phase portrait of the right thigh angle
3.2. Perturbed walking

After a stable gait was reached, a slip perturbation was introduced by setting the coefficient of friction of the floor for one leg to 1/10 of its former value. Figure 5 shows the simulated motion of the perturbed walking, Figure 6 shows the height of the hip, and Figure 7 shows the phase portrait of the right thigh angle.
Figure 5. Generated locomotion
Figure 6. Height of the hip joint
Figure 7. Phase portrait of the right thigh angle
The outputs of oscillator u2 (corresponding to the hip extensor) during normal walking and perturbed walking are shown in Figure 8 and Figure 9. The data between one heel strike and the next are extracted and their time axes adjusted via interpolation, so that torque data can be compared with EMG profiles point by point. In the following description, the adjusted time axis is referred to as standardized time. A red circle in Figure 7 denotes the moment of slip; the difference can be seen by comparison with Figure 4.
Figure 8. Output of oscillator 2 at the time of normal locomotion
Figure 9. Output of oscillator 2 at the time of perturbed locomotion
It is clear that the hip extensor tended to respond to the slip perturbation; however, it is not yet clear whether this response is conformable to human reflexive responses. It is also clear that this response is not sufficient for balance maintenance.
4. Correlation between the actions of the simulator and of humans

In order to investigate the simulation model's conformity to human responses, EMG profiles of muscle activation patterns and trajectories of body links, obtained from video camera images during normal walking and slip-perturbed walking, were acquired through the following experiments.

4.1. Experiment method

4.1.1. Equipment

Eight dry-type EMG surface differential electrodes (Model DE-2.3, DELSYS) were fitted on the skin surface above the bellies of the following muscles: gluteus medius, rectus femoris, vastus lateralis, vastus medialis, semitendinosus, tibialis anterior, gastrocnemius medialis and gastrocnemius lateralis. Two A/D cards (DAQ3024, NI) were used to collect all the sensor data at a 1.6 kHz sampling rate. The EMG signals were amplified 2000 times by the pre-amplifier equipped in the surface EMG electrodes and a hand-made EMG amplifier. Perturbation of the gait was realized by a split-belt treadmill (PW-21, HITACHI), for which the speed of each belt could be independently and accurately controlled from 0 to 3 km/hr within 270 ms (200 ms for PC/split-belt machine communication and 70 ms for the speed change), which could be treated as a constant delay of the control signal. A video camera (BASLER A601f) was fixed near the split-belt to capture joint trajectories at a 60 fps (frames per second) sampling rate. Markers were attached to the hip, knee and ankle joints, as well as the pastern and toe.

4.1.2. Subjects

Ten male subjects, of age 24.8 ± 5.1 (SD) years, height 176.9 ± 5.7 cm and weight 68.2 ± 5.9 kg, with no previous history of musculoskeletal or neurological disease, participated in the experiments. The subjects signed written informed consent and were asked to report their dominant lower limbs prior to the experimental procedures.

4.1.3. Procedure

Each subject was asked to walk on the split-belt at a speed of 3 km/hr. The gait of the subject was monitored using an x-axis accelerometer fitted to the ankle of the subject.
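Heel-strike detection from such an ankle accelerometer can be sketched as threshold-crossing peak picking. This is a simplified stand-in for the method of [11]; the threshold and the refractory period (in samples) are assumptions:

```python
def detect_heel_strikes(acc_x, threshold, refractory):
    """Return sample indices of local maxima of the x-axis ankle
    acceleration that exceed a preset threshold, enforcing a minimum
    spacing (refractory period) between consecutive detections."""
    strikes, last = [], -refractory
    for i in range(1, len(acc_x) - 1):
        is_peak = acc_x[i] >= acc_x[i - 1] and acc_x[i] > acc_x[i + 1]
        if is_peak and acc_x[i] > threshold and i - last >= refractory:
            strikes.append(i)
            last = i
    return strikes
```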
Concretely, the heel strike was detected according to the method described in [11], which identifies a heel strike by comparing the peaks of the x-axis of the accelerometer fitted to the ankle with a preset threshold. After the gait reaches a steady state, that is, when the heel-strike-to-heel-strike intervals become similar, the perturbation, a speed-down of the belt (3 km/hr to 0 km/hr), is given on the dominant side of the subject at the moment of heel strike of the dominant lower limb. Ten perturbations were given to a subject in one trial; the perturbation was then given to the other side of the same subject 10 times in another trial. That is, for each subject, 2 trials, corresponding to the left and right sides, were conducted, with a rest of at least 10 minutes between trials. For each trial, 10 sets of 5 s data centered at the heel strike during normal walking were extracted and averaged to form a normal walking profile M. For each perturbation, a 5 s data set centered at the heel strike was extracted to form a perturbation response profile P.

4.2. Conformity of simulated walking to human walking

The comparison was made between the simulated torque outputs and the EMG profiles.

4.2.1. Normal walking

Figure 10 shows the output of nonlinear oscillator u10 (corresponding to the ankle extensor). Figure 11 shows the EMG profile of the gastrocnemius.
Figure 10. Output of oscillator 10 during normal walking
Figure 11. EMG profile of the gastrocnemius
In both graphs there is a peak at the center, which represents the muscle activation for the kick at the end of the stance phase and can be considered a characteristic action during walking. This comparison shows that the simulation model produces quite similar responses during normal walking. Figure 12 shows the averaged joint trajectories of human gait, and Figure 13 shows the joint trajectories of our simulator. To compare these trajectories, the correlation coefficients between them were calculated; they are shown in Table 1. It is clear that human gait and simulated gait are quite similar during normal walking.
Figure 12. Body trajectory of human gait
Figure 13. Body trajectory of simulator gait
Table 1. Coefficients of correlation of body trajectories during normal walking

           x axis     y axis
hip        0.8369     0.8717
knee       0.82555    0.87758
ankle      0.85656    0.82091
pastern    0.81895    0.76496
toe        0.79123    0.52558
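The per-joint, per-axis correlation coefficients reported in Tables 1 and 2 can be computed as sketched below. The data layout (one (N, 2) array of x/y coordinates per joint over the averaged gait cycle) and the function name are our assumptions; the paper does not give its exact data format.

```python
import numpy as np

def trajectory_correlations(human, simulated):
    """Pearson correlation per joint and per axis, as in Tables 1 and 2.

    `human` and `simulated` map a joint name to an (N, 2) array of
    x/y coordinates over one averaged gait cycle.
    Returns {joint: (r_x, r_y)}.
    """
    out = {}
    for joint, h in human.items():
        s = simulated[joint]
        rx = np.corrcoef(h[:, 0], s[:, 0])[0, 1]
        ry = np.corrcoef(h[:, 1], s[:, 1])[0, 1]
        out[joint] = (rx, ry)
    return out
```

A negative value of r_x or r_y, as appears in Table 2, indicates that the simulated coordinate moves opposite to the measured one over the cycle.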
4.2.2. Perturbed walking

Figure 14 shows the output of nonlinear oscillator u10 (corresponding to the ankle extensor) during perturbed walking. Figure 15 shows the EMG profile of the gastrocnemius.
Figure 14. Output of oscillator 10 during perturbed walking
Figure 15. EMG profile of the gastrocnemius
It is clear that, during perturbed walking, not only the kinematic behavior but also the torque outputs of the simulation model were not conformable to the human reflexive responses. Again, the correlation coefficients of the two sets of trajectories were calculated; they are shown in Table 2. For the knee, ankle, pastern, and toe, either the x- or the y-coordinate shows a negative or low correlation. This reveals the nonconformity of the kinematic behavior during perturbed walking.

Table 2. Coefficients of correlation of body trajectories during perturbed walking

           x axis      y axis
hip        0.8328      0.81945
knee       0.81152     -0.6861
ankle      -0.80381    0.57138
pastern    -0.85327    0.50128
toe        -0.86751    0.066192
5. Summary and Future Work

In this study, in order to realize an artificial reflex function for a walking support system, the reflexive mechanism during walking was investigated by developing a neuro-musculoskeletal simulation model. This paper reported the verification of the simulation model, comparing its kinematic and torque outputs with EMG profiles acquired through a set of experiments. From the comparison, the following can be concluded.

During normal walking:
- The simulator could generate a stable gait and is dynamically conformable to human walking.
- The transition of the torque output of the simulator is conformable to that of the human EMG profiles.

During perturbed walking:
- The simulator could not express the reflexive responses to a slip perturbation.

That is, although the simulation model conforms to human normal walking, in order to examine the reflexive mechanism the model should be enhanced with reflex-related hypotheses and then verified against the EMG profiles. This "hypothesize and prove" process should be repeated until the responses to different perturbations can be sufficiently interpreted, so that the artificial reflex function can be realized. Moreover, in order to build an artificial reflex system, a trigger signal from artificial sensors is also necessary. For this purpose, the present model should be improved to represent not only the lower limbs but also the trunk and upper limbs, since those body parts also play important roles in reflexive responses to perturbation.
6. Acknowledgement

This work was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 2004, No. 16300178.
7. References

[1] W. Yu, H. Yamaguchi, H. Yokoi, M. Maruishi, Y. Mano & Y. Kakazu: EMG automatic switch for FES control for hemiplegics using artificial neural network, Robotics and Autonomous Systems, Vol. 40, pp. 213-224 (2002)
[2] Cathers, N. O'Dwyer, P. Neilson: Variation of magnitude and timing of wrist flexor stretch reflex across the full range of voluntary activation, Exp. Brain Res., 157: 324-335 (2004)
[3] R. Nakamura, H. Saito: Basic Kinesiology, 4th Ed., ISHIYAKU Publication (1992)
[4] R. Bisdorff, A. M. Bronstein, C. Wolsley, M. A. Gresty, A. Davies, A. Young: EMG responses to free fall in elderly subjects and akinetic rigid patients, J. Neurol. Neurosurg. Psychiatry, 66: 447-455 (1999)
[5] A. Prochazka, S. Yakovenko: Locomotor control: from spring-like reactions of muscles to neural prediction. In: The Somatosensory System: Deciphering the Brain's Own Body Image, ed. R. J. Nelson, Boca Raton, CRC Press, 141-181 (2001)
[6] G. Taga, Y. Yamaguchi & H. Shimizu: Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment, Biol. Cybern. 65, 147-159 (1991)
[7] G. Taga: Emergence of bipedal locomotion through entrainment among the neuro-musculo-skeletal system and the environment, Physica D 75, 190-208 (1994)
[8] K. Matsuoka: Sustained oscillations generated by mutually inhibiting neurons with adaptation, Biol. Cybern. 52, 367-376 (1985)
[9] N. Ogihara, N. Yamazaki: Generation of human bipedal locomotion by a bio-mimetic neuro-musculo-skeletal model, Biol. Cybern. 84, 1-11 (2001)
[10] Y. Fukuoka, H. Kimura & A. H. Cohen: Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts, International Journal of Robotics Research, Vol. 22, No. 3-4, pp. 187-202 (2003)
[11] K. Aminian, K. Rezakhanlou, E. De Andres, C. Fritsch, P.-F. Leyvraz & P. Robert: Temporal feature estimation during walking using miniature accelerometers: an analysis of gait improvement after hip arthroplasty, Medical & Biological Engineering & Computing, Vol. 37, pp. 686-691 (1999)
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Toward a Human-like Biped Robot with Compliant Legs

Fumiya Iida a,b,1, Yohei Minekawa a, Juergen Rummel a and Andre Seyfarth a
a Locomotion Laboratory, University of Jena, Germany
b Artificial Intelligence Laboratory, University of Zurich, Switzerland

Abstract. Conventional models of bipedal walking generally assume rigid body structures, while elastic material properties seem to play an essential role in nature. On the basis of a novel theoretical model of bipedal walking, this paper investigates a model of a biped robot which makes use of minimal control and elastic passive joints inspired by the structures of biological systems. The model is evaluated in simulation and on a physical robotic platform with respect to its kinematics and ground reaction forces. The experimental results show that the behavior of this simple locomotion model bears a considerable similarity to that of human walking.

Keywords. Legged Locomotion, Biped Walking, Passive Dynamic Walking
1. Introduction

In the fields of biomechanics and robotics, bipedal walking has been investigated to further our understanding of the versatile locomotion mechanisms of humans and robots. Based on biomechanical investigations, a few different approaches have previously been proposed. Bipedal locomotion in artificial systems was first engineered by using predetermined trajectories of the leg joints (e.g. [1]). Although this approach demonstrated an outstanding versatility of locomotion behaviors, its adaptivity is highly restricted, because it requires a precise environment model and a demanding computational load for calculating the trajectories. Research on Passive Dynamic Walking has questioned the conventional approach. On the basis of biomechanical models, the so-called "compass gait model" or "ballistic walking model" [2], a number of Passive Dynamic Walkers (PDWs) have been developed and have demonstrated natural walking behavior (e.g. [3,4,5,6]). Inspired by the muscle activities of the swing leg during human walking, these models use no actuation; purely mechanical pendulum dynamics are used to achieve walking behavior. More recently, a theoretical model, so-called "spring-mass walking", has been proposed, which demonstrated walking dynamics with a considerable similarity to those of humans [7]. Although the spring-mass model was originally proposed for running behavior [8,9,10], an extension of the original model can be applied to walking behavior,

1 Correspondence to: Fumiya Iida, Andreasstrasse 15, CH-8057 Zurich, Switzerland. Tel.: +41 44 635 4354; Fax: +41 44 635 4507; E-mail: iida@ifi.unizh.ch.
F. Iida et al. / Toward a Human-Like Biped Robot with Compliant Legs
Figure 1. Behavior of human walking. Vertical movement of the body (top), angular trajectories of the knee and ankle joints (middle), and vertical ground reaction forces (bottom) are shown over a time series of four leg steps (a) and aligned with respect to the ground reaction force (b). The vertical lines in (b) denote the moments at which the swing and stance phases switch.
which leads to a further understanding of the underlying mechanisms of human locomotion. Based on this novel theoretical model, the goal of this paper is to explore a mechanically realistic model of human-like bipedal walking which can actually be implemented on a robotic platform. With a simple motor control scheme, we analyze how the morphological properties of the leg design can be used for dynamic bipedal locomotion, and we compare the behavior of the system with human locomotion. The structure of this paper is as follows. First, we briefly introduce the walking behavior of a human and the spring-mass walking model. Then we propose a new locomotion model and test it in simulation and on a robotic platform.
2. Bipedal Walking with Compliant Legs

2.1. Bipedal Walking of Humans

In order to understand the influence of compliant legs on walking behavior, we first investigated the walking behavior of a human subject. In this experiment, we used a treadmill together with a motion capture system (6 Qualisys motion capture units; sampling frequency 240 Hz) and force plates (Kistler 9281B11; sampling frequency 1000 Hz), with which we analyzed the kinematics of the human behavior as well as the ground reaction force.
Figure 2. (a) Compass gait model and (b) spring-mass walking model. The dashed lines represent the different body excursions in these models.
A human subject was asked to walk on a treadmill at a constant speed of 1 m/s for 20 seconds. The kinematics and the ground reaction force were analyzed, and we found a few unique aspects of human walking that cannot be explained by the compass gait model, as shown in Figure 1. Firstly, there is a significant difference in the vertical movement of the body. While the compass gait model produces a circular trajectory of the body around the foot-ground contact during the stance phase, the human body excursion is generally more complicated: the vertical position of the body decreases at the beginning of the stance phase, then increases and decreases, and toward the end of the stance phase it starts increasing again. An advantage of this movement could be a smaller vertical oscillation of the body. Secondly, the behavior of the knee joint during the stance phase is also more complicated than that of the compass gait model: at the beginning of the stance phase, the knee angle first decreases before a large peak. A possible advantage of this initial decrease is that the knee joint can absorb the impact force at touch-down. Thirdly, the ankle joint also shows complex behavior, although the compass gait model has no ankle joint: there is a small peak at the beginning, and the angle increases toward the end of the stance phase. Finally, the ground reaction force generally shows an M-shape, whereas the compass gait model generally shows a single peak.

2.2. Spring-Mass Model for Walking

The spring-mass model was originally proposed for characterizing the running behavior of animals [8,9,10]. The model consists of a body, represented as a point mass, and a leg, approximated by a linear spring. This extremely simple theoretical model has explained a number of eminent features of running behavior in animals, including humans.
Recently, this model was extended to walking behavior, explaining several aspects of human bipedal locomotion, including the complex dynamics introduced in the previous subsection [7]. Although the spring-mass model for walking shows significant plausibility as a model of human walking, this theoretical model has to be elaborated for robotic implementation. In the rest of this paper, we explore a mechanically realistic model based on the theoretical spring-mass walking model and the anatomical structure of biological systems.
Table 1. Specification of the robot. ∗Spring constants are dimensionless.
Figure 3. Simulation model of a biped robot. Only one of the two legs is shown in this figure. The model consists of a joint controlled by a motor (represented by a black circle) and three leg segments which are connected through two passive joints (white circles). Two ground contact points are defined in the foot body segment.
Param.   Description      Value
l1       Thigh            10 cm
l2       Shank            10 cm
l3       Foot             4 cm
l4       Spring Attach.   2 cm
l5       Heel             2 cm
S1       Spring const.∗   100
S2       Spring const.∗   200
S3       Spring const.∗   800
S4       Spring const.∗   450
M        Total mass       950 g
2.3. Model of a Biped Robot

Figure 3 shows the simulation model of the biped robot investigated in this paper. The model consists of 7 body segments, two position-controlled motors at the hip joints, four passive joints in the knees and ankles, and 8 linear springs. The springs are implemented as substitutes for the muscle-tendon systems, which constrain the passive joints. A unique feature of this robot is that 6 of the springs are connected over two joints, as in the so-called biarticular muscles of biological systems (i.e. 4 springs are attached between the hip and the shank, and 2 between the thigh and the heel). For the sake of real-world implementation on a robotic platform, the dimensions of this model are scaled down as shown in Table 1. Two ground contact points per leg are defined in this model. In the simulation, we test the model on a level ground surface with a physically realistic interaction model based on a biomechanical study [11]. The vertical ground reaction forces are approximated by a nonlinear spring-damper interaction, and the horizontal forces are calculated by a sliding-stiction model, which switches between sliding and stiction when the velocity of the foot becomes higher or lower than a specified limit. We used 0.55 and 0.75 for the sliding and stiction friction coefficients, respectively. For the control of the motors, we employed a simple oscillation, in which the angular position of each hip joint is determined by a sinusoidal curve as follows:

Pr(t) = A sin(ωt) + B          (1)
Pl(t) = A sin(ωt + π) + B      (2)

where the three parameters are the amplitude A, the frequency ω, and the offset B. By using this simple control scheme, we can evaluate how the morphological constraints contribute to walking behavior. This model is implemented in a planar space for the
Figure 4. Stick figures of the walking behavior in simulation. (a) The behavior of one leg during four leg steps, and (b) the behavior of two legs during a single step.
Figure 5. Walking behavior of the simulated robot. Vertical movement of the body (top), angular trajectories of the knee and ankle joints (middle), and vertical ground reaction forces (bottom) are shown over 4 leg steps (a) and aligned with respect to the ground reaction force (b).
sake of simplicity; thus, no rotational movement of the upper body (i.e. the hip segment) is considered.
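The open-loop hip control of Eqs. (1) and (2) can be sketched as below, using the human-like parameter values applied later in the paper (A = 20°, B = 5°, ω = 2 Hz). This is an illustrative Python sketch: we interpret ω = 2 Hz as the oscillation frequency, and the function name is ours.

```python
import math

def hip_targets(t, A=20.0, B=5.0, freq=2.0):
    """Open-loop hip joint targets from Eqs. (1)-(2).

    Returns the right and left hip angles in degrees at time t (s).
    The left hip runs in antiphase (phase shift of pi), so the two
    targets always sum to 2B.
    """
    w = 2.0 * math.pi * freq
    p_r = A * math.sin(w * t) + B          # Eq. (1)
    p_l = A * math.sin(w * t + math.pi) + B  # Eq. (2)
    return p_r, p_l
```

Because the scheme is purely feed-forward, any adaptation to the ground must come from the passive elastic joints rather than from the controller.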
3. Experiments

This section explores the dynamic locomotion behavior of the proposed model in simulation and discusses how the model can be implemented on a robotic platform.
Figure 6. (a) Schematics of the robot design, (b) photograph of the robot with the supporting rotational bar.
3.1. Simulation Results

We constructed the simulation model of the proposed biped robot in Mathworks Matlab 7.01 together with the SimMechanics toolbox. In order to achieve biologically plausible locomotion behavior, we set the hip-motor control parameters according to human walking behavior as follows: A = 20 degrees, B = 5 degrees, and ω = 2 Hz. With these control and morphological parameters, the system exhibits relatively stable walking behavior. Even without any sensory feedback, the system exhibits a stable locomotion process regardless of initial conditions. Figure 4 shows a sequence of single-leg movement during a few leg steps and the movement of both legs during one leg step. A few points should be mentioned. Firstly, even without actuation, the natural movement of the passive joints is achieved thanks to the morphological constraints of segment length and elasticity. Secondly, the body (i.e. the upper end of the shank segment) shows a sinking movement at the beginning of the stance phase because of the geometrical constraints of the foot segment and the knee bending at touch-down. These points are shown more clearly in Figure 5. The behavior of this simulation model resembles the human locomotion explained in the previous section: the vertical rise of the body starts after leg touch-down and before leg lift-off; there is significant knee flexion at the beginning of the stance phase; the ankle joint extends toward the end of the stance phase; and there are multiple peaks in the ground reaction force.

3.2. Robot Experiments

The simulation model was then implemented on a physical robot platform, as shown in Figure 6. This robot has passive joints in the knees and ankles, and two commercial servomotors (Conrad HS-9454) are used in the hip joints, as in the simulation model. We used four tension springs, and rubber materials at the ground contact points in order to increase ground friction and to attenuate the impact force at touch-down.
The same control parameters were used to conduct a set of experiments. Then behaviors were measured by both a
Figure 7. (a) Leg trajectory of the biped robot during one step. Black and gray lines depict the right and left legs, respectively. (b) Body excursion, angular trajectories of the knee and ankle joints, and the ground reaction force of the robot during a few leg steps (aligned with respect to the ground reaction force).
standard motion capture system and a force plate (the same setup mentioned earlier) for detailed analysis. Figure 7(a) shows a stick figure of a typical locomotion behavior, and more detailed trajectories are shown in Figure 7(b). Although the kinematics of the locomotion behavior is not as stable as in simulation, we can clearly see the salient features of the proposed model (i.e. the body excursion, the knee flexion at the beginning of the stance phase, and the multiple peaks in the ground reaction force).
4. Conclusion

This paper presented a novel control scheme for bipedal locomotion and a set of experimental results in simulation and on a robotic platform. Compared to conventional PDWs, it is shown that the dynamic behavior of the proposed model is remarkably comparable to that of humans, although it was tested only in a planar environment (i.e. the out-of-plane rotational movements are fixed).
The significance of the results presented in this paper is twofold. Firstly, it is shown that biologically plausible walking dynamics can be achieved by extremely simple control if the leg design takes mechanical constraints into account. We expect that this model could help in understanding some of the underlying mechanisms of human bipedal walking. In particular, it would be interesting to explore further to what extent the behavior of the springs is comparable to the muscle activities of a human during walking. According to prosthetics research, a leg design such as the one proposed in this paper seems to play a significant role in human walking [12]. Secondly, the approach of utilizing compliant properties in the leg can be extended to running behavior, as previously demonstrated by the spring-mass model. It would be particularly interesting to use the same leg design for both walking and running. The transition between the two gaits can also be investigated along the same line of research using the proposed model.
Acknowledgements This research is supported by the German Research Foundation (DFG, SE1042) and the Swiss National Science Foundation (Grant No. 200021-109210/1).
References

[1] Vukobratovic, M. and Stokic, D.: Dynamic control of unstable locomotion robots, Mathematical Biosciences, 24, 129-157, 1975.
[2] McMahon, T. A.: Muscles, Reflexes, and Locomotion, Princeton University Press, 1984.
[3] McGeer, T.: Passive dynamic walking, The International Journal of Robotics Research, Vol. 9, No. 2, 62-82, 1990.
[4] Wisse, M. and van Frankenhuyzen, J.: Design and construction of MIKE: A 2D autonomous biped based on passive dynamic walking, Proceedings of the International Symposium on Adaptive Motion of Animals and Machines (AMAM03), 2003.
[5] Collins, S. H., Wisse, M., and Ruina, A.: A three-dimensional passive-dynamic walking robot with two legs and knees, International Journal of Robotics Research, 20, 607-615, 2001.
[6] Collins, S., Ruina, A., Tedrake, R., and Wisse, M.: Efficient bipedal robots based on passive-dynamic walkers, Science, Vol. 307, 1082-1085, 2005.
[7] Geyer, H., Seyfarth, A., and Blickhan, R.: Spring-mass model for walking, (in preparation).
[8] McMahon, T. A., Cheng, G. C.: The mechanics of running: How does stiffness couple with speed?, J. Biomechanics, Vol. 23, Suppl. 1, 65-78, 1990.
[9] Blickhan, R.: The spring-mass model for running and hopping, J. Biomech., 22, 1217-1227, 1989.
[10] Seyfarth, A., Geyer, H., Guenther, M., Blickhan, R.: A movement criterion for running, Journal of Biomechanics, 35, 649-655, 2002.
[11] Guenther, M.: Computersimulationen zur Synthetisierung des muskulaer erzeugten menschlichen Gehens unter Verwendung eines biomechanischen Mehrkoerpermodells, PhD Thesis, University of Jena, 1997.
[12] Herr, H.: Variable-mechanical-impedance artificial legs, patent application, (in preparation).
Part 14 Service Robotics and Human Support
Using JavaSpace for a PEIS Ecology

BeomSu Seo, Mathias Broxvall, Marco Gritti, Alessandro Saffiotti and JungBae Kim
{bsseo, jjkim}@etri.re.kr, {mbl, mgi, asaffio}@aass.oru.se
ETRI (Electronics and Telecommunications Research Institute), Korea
AASS Mobile Robotics Laboratory, University of Örebro, Sweden

Abstract. The ecology of Physically Embedded Intelligent Systems (PEIS) is a new multi-robotic framework conceived by integrating insights from the fields of autonomous robotics and ambient intelligence. A PEIS-Ecology is a network of intelligent robotic devices that can provide the user with assistance, information, communication, and entertainment services. In this paper we introduce the concept of PEIS-Ecology, and we investigate the use of JavaSpace to build a middleware infrastructure that meets its special requirements. Finally, we illustrate a concrete realization of a PEIS-Ecology that we implemented using JavaSpace as a communication middleware.

Keywords. Multi-robot system, Ecological robotics, Ambient intelligence, Distributed architecture, Service robotics, JavaSpace, Jini
1. Introduction

By virtue of the prodigious developments in embedded systems, ambient intelligence, and autonomous robotics, in a decade or two we will live in a world of Physically Embedded Intelligent Systems (PEIS), which will improve the quality of life of every citizen by providing physical and/or cognitive assistance [1]. PEIS do not exist and operate in isolation: they can cooperate and communicate in an ecology, both inside and outside our houses. The overall functionality of the full system emerges from the interaction between a number of simpler units, and the functionality of each individual unit is improved by this interaction. As an example, a hypothetical table-setting robot waiter would not need to use its vision sensors to detect the position, shape, and weight of the various objects in order to grasp them; instead, every object, enriched with a micro-PEIS, would hold this information and communicate it to the robot. Most current approaches to building a "robot companion" aim at building one isolated robotic device (often humanlike) empowered with extraordinary abilities for perception, action, and cognition (e.g. [2] [3]). By contrast, the PEIS-Ecology approach redefines the very notion of a robot to encompass the entire environment. Perception and manipulation of objects are thus replaced, or at least complemented, by direct communication between sub-systems in the environment. In the PEIS-Ecology vision, the robot will disappear into the environment in much the same way as computers should disappear according to the well-known vision of ubiquitous computing.

1 Correspondence to: BeomSu Seo, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea. Tel.: +82 42 860 6242; Fax: +82 42 860 6790; E-mail: [email protected]
B.S. Seo et al. / Using JavaSpace for a PEIS Ecology
The PEIS-Ecology inherits from ubiquitous computing the idea of distributing resources over adequate systems spread throughout the working environment. In every application field of computer science where distributed computation is needed, the presence of a middleware is a point of discussion. Middlewares built specifically for sensor network [4] systems and for ambient intelligence [5] systems are currently a topic of research and development [6] [7]. In the robotics community, when service distribution is needed, the focus is instead on exploiting general-purpose middlewares, as happens in projects like OROCOS [8] and Miro [9], both based on TAO/ACE [10]. The possibility of using UPnP [11] for this kind of ubiquitous robotics application has also been analyzed [12]. On the other hand, recent work in cooperative robotics shows that advances in multi-agent coordination techniques can still be made without exploiting any middleware, relying just on simple message passing at the application level. The question of whether the PEIS-Ecology needs a dedicated middleware, or whether there exists a general-purpose middleware that can suit all its requirements, is still open. In this paper we concentrate on identifying the peculiarities of the PEIS-Ecology network as a distributed computing environment, and on describing why Jini [13], enriched with the JavaSpace [14] service, can constitute an adequate middleware for it. In the subsequent sections we introduce the basic concept of PEIS-Ecology, analyze the peculiarities of this networked eco-system, show how Jini and JavaSpace can meet our requirements, and detail an implementation of the PEIS-Ecology using this platform. Before concluding, we describe the experiment we have run.

2. The concept of PEIS-Ecology

The concept of PEIS-Ecology builds upon the following ingredients.
First, any robot in the environment is abstracted by the uniform notion of a PEIS: any device incorporating some computational and communication resources, possibly able to interact with the environment through sensors and/or actuators. A PEIS can be as simple as a toaster or as complex as a humanoid robot. In general, we define a PEIS as a set of inter-connected software components residing in one physical entity. Each component may include links to sensors and actuators that connect it to the physical environment, as well as input and output ports that connect it to other components in the same PEIS or in other PEIS. Second, all PEIS are connected by a uniform communication model, which allows the exchange of information among the individual PEIS and can cope with their dynamically joining and leaving the ecology. Third, all PEIS can cooperate via a uniform cooperation model based on the notion of linking functional components: each participating PEIS can use functionalities from other PEIS in the ecology in order to compensate for or complement its own. The power of a PEIS-Ecology does not reside so much in the power of the individual PEIS in it; rather, it emerges from their ability to interact and cooperate. We define a PEIS-Ecology as a collection of inter-connected PEIS, all embedded in the same physical environment. We call the set of connections between components within and across the PEIS in the ecology a configuration. Note that the same ecology can be configured in many different ways depending on the current context. Relevant contextual aspects here include the current goals, situation, and resources. Stated in these terms, a PEIS-Ecology redefines the very notion of a robot to encompass the entire environment: a PEIS-Ecology may be seen as a "cognitive robotic
Figure 1. A simple example of cooperation in a PEIS-Ecology.
environment", in which perception, actuation, memory, and processing are pervasively distributed in the environment. As an illustration of these concepts, consider a simple autonomous vacuum-cleaning robot. This can be seen as a PEIS which includes simple functionalities for reactive navigation and obstacle detection. By itself, this robot cannot devise and execute complex cleaning strategies, because its perceptual abilities are too simple to reliably estimate its own position in the home. Suppose, however, that the home is equipped with an ambient monitoring system using a set of cameras, which in addition to its other functions is able to estimate the position of the vacuum cleaner. Then we can combine the monitoring system and the vacuum cleaner into a simple PEIS-Ecology, in which the former provides the latter with a global localization functionality. The vacuum cleaner can use this functionality to realize a smarter cleaning strategy (see Figure 1). Suppose next that the vacuum cleaner finds an unexpected parcel on the floor. The cleaner needs to know its properties (e.g. weight and fragility) in order to decide whether it can push it away or not. These properties can hardly be estimated using the robot's sensors. However, if the parcel is equipped with an IC tag holding information about its content, then it can be seen as a PEIS which can join the ecology and deliver this information directly to the robot.

3. Requirements for a PEIS-middleware

The basic building blocks of a PEIS-Ecology are PEIS-components: the set of available software components implementing the robotic algorithms (perception, modeling, deliberation, control, etc.). These components can generally be seen as logical processes. In order to allow interaction between PEIS-components, we need a PEIS-middleware: a software layer between the network abstraction provided by the operating system and the PEIS-components themselves.
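The notion of a PEIS-component with linkable input and output ports can be sketched as a small data structure. This is purely illustrative Python: the paper does not prescribe any such class, and all names (including the camera/navigator example) are ours.

```python
from dataclasses import dataclass, field

@dataclass
class PeisComponent:
    """A PEIS-component as described above: a named software component
    whose output ports can be linked to input ports of other components,
    within or across PEIS. The set of all such links is a configuration."""
    name: str
    inputs: dict = field(default_factory=dict)    # port name -> latest value
    outputs: dict = field(default_factory=dict)   # port name -> latest value
    links: list = field(default_factory=list)     # (out_port, target, in_port)

    def link(self, out_port, target, in_port):
        """Add one connection of the current configuration."""
        self.links.append((out_port, target, in_port))

    def publish(self, out_port, value):
        """Write a value to an output port and propagate it over links."""
        self.outputs[out_port] = value
        for op, target, ip in self.links:
            if op == out_port:
                target.inputs[ip] = value

# Example: a camera PEIS provides global localization to the cleaner.
camera = PeisComponent("camera.tracker")
navigator = PeisComponent("cleaner.navigation")
camera.link("position", navigator, "global_position")
camera.publish("position", (1.2, 3.4))
```

Reconfiguring the ecology then amounts to changing the link set at run time, without touching the components themselves.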
This middleware should support flexible and dynamic communication among PEIS, along with various services such as identification, authorization, directories, naming, security, etc. By exploiting the PEIS-middleware, PEIS-components can transparently communicate with each other without re-implementing the mechanisms needed by all these services. In this section, we briefly describe the characteristics of a PEIS-Ecology which are critical for the choice of an adequate PEIS-middleware.
High adaptation to change. A PEIS-Ecology may comprise many components, and any PEIS-component can both enter and leave a PEIS-Ecology without notification; the PEIS-middleware should therefore allow communication among these components to change dynamically according to the running context. The technology chosen to implement a PEIS-Ecology should thus be based on scalable and flexible adaptive networks.
B.S. Seo et al. / Using JavaSpace for a PEIS Ecology
Loosely coupled communication. A PEIS-Ecology is data-centric, not communication-centric: there is no direct method call or message passing between source and target components. The vacuum cleaner can use either its local navigation component or the global monitoring component, depending on the running environment. Or, as a new PEIS-component enters the ecology, it may provide navigation data to the cleaner. Under these circumstances, it is not only difficult to fix every connection path in advance, but doing so would also not follow the philosophy of the framework.
Distributed coordination-based model. In a PEIS-Ecology, what each component is interested in is which data it receives, not from whom it receives them. Since data sources may change according to the running context, we need a distributed coordination-based communication model [15]. This provides not only loosely coupled communication but also anonymity between components. With this model, a component can publish its data without any information about the clients, and a client simply receives a notification about the presence of the data of interest to it.
Security support. Security is another very important technical issue if we need to protect and separate an ecology from unwanted (possibly dangerous) interactions with other sources. Security topics in a PEIS-Ecology are data encryption/decryption, authentication, and authorization. Authentication consists in deciding whether or not a PEIS-component may access a PEIS-Ecology. Authorization consists in allowing or forbidding a PEIS-component to read and/or write data in different parts of the ecology.
4. Middleware for PEIS-Ecologies
One excellent PEIS-middleware candidate is Jini, the distributed dynamic networking architecture developed by Sun Microsystems. Among the middleware services provided by Jini, the most interesting one from the point of view of a PEIS-Ecology is JavaSpace.
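The loosely coupled, coordination-based model just described can be sketched with a minimal in-memory "tuple board". This is a hypothetical stand-in for the shared tuple space, not the actual PEIS-kernel or JavaSpace API: producers publish data under a key, consumers subscribe to keys, and neither side knows the other's identity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal sketch of coordination-based (data-centric) communication:
// producers publish tuples under a key; consumers subscribe to keys.
// Neither side knows the other's identity (anonymity).
public class TupleBoard {
    private final Map<String, Object> tuples = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    // A component publishes data without knowing who will consume it.
    public void publish(String key, Object value) {
        tuples.put(key, value);
        subscribers.getOrDefault(key, List.of()).forEach(s -> s.accept(value));
    }

    // A component states which data it wants, not whom it wants it from.
    public void subscribe(String key, Consumer<Object> callback) {
        subscribers.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(callback);
    }

    public static void main(String[] args) {
        TupleBoard board = new TupleBoard();
        StringBuilder log = new StringBuilder();
        // The vacuum cleaner subscribes to absolute positions...
        board.subscribe("AbsolutePos", pos -> log.append("cleaner got " + pos));
        // ...and the camera system (or any later replacement) publishes them.
        board.publish("AbsolutePos", "(3.2, 1.5)");
        System.out.println(log);
    }
}
```

Because the cleaner only names the key it cares about, the camera system can be replaced at runtime by any other position source without reconfiguring the cleaner.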
PEIS-components may exchange data by directly calling the Jini and JavaSpace APIs, but for the sake of extensibility and flexibility we decided to abstract from the middleware-specific API by means of an interface that we standardized for our purposes. We call this additional layer the PEIS-kernel; it provides a common abstraction that allows each component to exchange data while hiding the existence of Jini and JavaSpace. In this way, the PEIS-Ecology is independent of the choice of middleware. In this section, we briefly discuss what Jini and JavaSpace are and how they work; we then describe the architecture of the actual PEIS-middleware we implemented by means of these technologies.
4.1. Jini and JavaSpace
Jini: the distributed, dynamic networking architecture. Jini provides a set of application programming interfaces as well as a network protocol. It is an open architecture that can be used to build adaptive networks in dynamic computing environments. The objectives of Jini are to provide an infrastructure to connect anything, at any time, anywhere, and to enable network plug-and-work: every service can join the network and become available to other applications without any manual installation or configuration. In order to use Jini, there must be at least one lookup server in the network. A lookup server is a service repository, where each service (such as a JavaSpace) is stored as a Java object and made available to clients for downloading on demand. A service provider can register its services with a lookup server after having found it through a multicast message on the network it joined. A client can download the service proxy using the same discovery protocol and then communicate directly with the service provider. Usually, the downloaded service object is a Java RMI stub which delegates method calls and forwards messages to the actual server. Jini features a distributed event mechanism, so clients can register for events of interest and then receive the corresponding notifications.

Figure 2. An overview of the PEIS-middleware implementation.

JavaSpace: generative communication storage. One of the interesting services of the current Jini distribution is JavaSpace, an implementation of generative communication [16]: a coordination-based model. The main idea of generative communication is that distributed processes use a shared persistent database of tuples. A tuple is a data structure with a number of typed fields; it is inserted into a JavaSpace by means of a write operation. JavaSpace provides associative access to data, so a client can read/take a desired tuple by issuing a template-based query on a JavaSpace. A template tuple matches a tuple in a JavaSpace if all its non-null fields have exactly the same values as the corresponding fields of that tuple. To control access to a JavaSpace, the JAAS (Java Authentication and Authorization Service) [17] mechanism could be used.
4.2. PEIS-middleware implementation using JavaSpace
The basic role of the PEIS-kernel and PEIS-middleware is to allow each PEIS-component to declare its interest in particular tuples through a subscription mechanism, and to obtain, through notifications, the matching tuples published by other PEIS-components. The PEIS-kernel provides several APIs for a PEIS-component to get/set tuples and to subscribe its interests. The tuples exchanged by PEIS-components are restricted to the form ⟨key : value⟩. For example, the monitoring system in Fig. 1 can send position tuples through the JavaSpace to the cleaner, which has subscribed its interest in the AbsolutePos key.
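The template-based matching rule can be illustrated in plain Java. This is a self-contained sketch of the matching semantics only, not the real net.jini.space.JavaSpace API; the Tuple fields are invented for illustration:

```java
import java.util.Objects;

// Illustration of JavaSpace-style associative lookup: a template tuple
// matches a stored tuple when every non-null field of the template equals
// the corresponding field of the tuple; null fields act as wildcards.
public class TemplateMatch {
    record Tuple(String key, String owner, Object data) {}

    static boolean matches(Tuple template, Tuple candidate) {
        return (template.key() == null || Objects.equals(template.key(), candidate.key()))
            && (template.owner() == null || Objects.equals(template.owner(), candidate.owner()))
            && (template.data() == null || Objects.equals(template.data(), candidate.data()));
    }

    public static void main(String[] args) {
        Tuple stored = new Tuple("AbsolutePos", "camera", "(3.2, 1.5)");
        // Template: any AbsolutePos tuple, from any producer, with any payload.
        Tuple anyPos = new Tuple("AbsolutePos", null, null);
        Tuple odom = new Tuple("Odometry", null, null);
        System.out.println(matches(anyPos, stored)); // true
        System.out.println(matches(odom, stored));   // false
    }
}
```

A read operation returns a copy of a matching tuple, while take additionally removes it from the space; both use this same matching rule.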
Because the PEIS-middleware based on JavaSpace lets a PEIS-component obtain data without caring about who created it, it provides not only shared storage for tuples but also loosely coupled communication and anonymity between components. The implementation of the PEIS-middleware using JavaSpace consists of three parts: the JavaSpacePlug, running under the PEIS-kernel; the PeisKernelBroker; and the PeisJavaSpaceServer. The JavaSpacePlug is a bridge which instantiates the Java VM and delegates requests and responses between the PEIS-components and the PeisKernelBroker; it thereby allows the PEIS-kernel to use the JavaSpace through the PeisKernelBroker, which acts as a client to the PeisJavaSpaceServer. The PeisKernelBroker is a Java class responsible for reading and writing tuples using a JavaSpace proxy. Finally, the PeisJavaSpaceServer is the PEIS-middleware server which makes the JavaSpace and the PeisSecurityService available to the PEIS-Ecology. Fig. 2 gives a general overview of this implementation of the PEIS-middleware. The PeisJavaSpaceServer creates the PeisSecurityService to register its proxy into a lookup server. The PeisSecurityService creates the Authenticator and the Authorizer to check component identification and the permissions to write, read, and subscribe. The PeisSecurityService provides the PeisKernelBroker with a login method and returns a JavaSpace proxy only if the login succeeds. The PeisOutriggerWrapper intercepts operations that access the JavaSpace, and performs security checks, logging, and multi-threaded processing of tuples. In order to identify a PEIS-component during login, JAAS could be used. After the PeisKernelBroker has found the lookup server and downloaded the security service proxy, it logs in with a password or a special certificate. The security service asks the Authenticator for the client's identity at login time. Authentication can be performed with different security mechanisms depending on the actual implementation of the Authenticator interface. If the client PEIS-component is trusted, the security service returns to it a JavaSpace proxy. Whenever the component reads or writes tuples, its access permissions are checked by the PeisOutriggerWrapper through the Authorizer, which in turn verifies them against the security properties.
5. An Experimental Realization of a PEIS-Ecology
In order to verify the viability of the PEIS-Ecology framework, and to test our PEIS-middleware implementation, we have developed an experimental test-bed in which we run our experiments.
This is the PEIS-Home: a home-like environment that incorporates a number of PEIS. Some of the PEIS included in the PEIS-Home are:
The ThinkingCap PEIS-component [18], an architecture for autonomous robot control based on fuzzy logic. It provides a mobile robot with advanced navigation functionalities. It responds to tuples with the key ExecGoal, which provide a navigation goal, and produces tuples with keys Odometry, ExecStatus, etc., which reflect the robot's status.
The PTLplan PEIS-component [19], a conditional planner used for high-level task planning. It can reason about uncertainty and about observation actions. It responds to goal tuples with the key Task and to information tuples with the keys WorldStatus and ExecStatus, and produces tuples with the key ExecGoal to indicate new high-level actions to be performed (typically by the ThinkingCap).
The Player PEIS-component [20], a PEIS-wrapped instance of the Player program. It provides a low-level interface between the robot's sensors and actuators and the PEIS-Ecology's tuple space. It reads robot command tuples (e.g., VelCommand and TalkCommand) and generates sensor reading tuples (e.g., Odometry and SonarRange).
The localization PEIS-component, a simple local positioning system which uses the web cameras mounted on the ceiling to track a colored object by triangulation, and produces AbsolutePos tuples. In our experiments, we have equipped the tracked robot with a colored "hat".
The PEIS-Ecology monitor, which can be used by the experimenters to dynamically observe, and possibly influence, the internal state of the PEIS-Ecology.
The two real robots are:
Figure 3. Four snapshots taken during an actual run: (1) Emil sends commands to Pippi; (2) Pippi approaches the bedroom; (3) Pippi exits the bedroom; (4) task completed.
The Pippi PEIS, a Magellan Pro robot running the ThinkingCap and Player PEIS-components. The Emil PEIS, a Magellan Pro robot running the ThinkingCap, Player, and PTLplan PEIS-components. We assume here that Emil is a master robot and Pippi is one of the robot workers within the PEIS-Home environment. One of the experiments we performed in the PEIS-Home considers the scenario in which a robot has to wake up Johanna, the hypothetical inhabitant of the PEIS-Home. This experiment develops as follows.
1. Emil receives the high-level goal to wake up Johanna. She decides to delegate the execution of this task to Pippi and generates a plan for Pippi using PTLplan. The plan consists of three actions: GoTo(bed), Talk(wakeup), GoTo(sofa).
2. Emil sends the first action to Pippi in an ExecGoal tuple; Pippi executes it, aided by the information provided by the tracking PEIS (AbsolutePos tuples) for better localization.
3. If Pippi occasionally exits the field of view of the cameras, she reverts to using the (unreliable) self-localization information from her internal odometry.
4. Pippi constantly publishes her execution status in ExecStatus tuples and receives through the JavaSpace the next actions issued by Emil.
5. After Pippi has reached the bed and woken up Johanna (via TalkCommand tuples), she moves to the sofa.
6. After reaching the sofa, she notifies Emil of the completion of the last action.
All of the above communication and synchronization happens through the exchange of the relevant tuples via the JavaSpace, mediated by the PEIS-kernel. Fig. 3 shows four snapshots taken during an actual execution of this experiment.
6. Conclusions
PEIS-Ecology offers a new paradigm for developing pervasive robotic applications that combines ideas and technologies from the fields of autonomous robotics and ambient intelligence.
The major achievement of this approach will be the possibility of creating truly dynamic intelligent environments using the cognition, perception, and actuation capabilities of today's state-of-the-art robotic devices. In order to let intelligence emerge from cooperation, a PEIS-Ecology needs a PEIS-middleware that copes with frequent changes in the ecology configuration and that allows uncoupled and distributed communication along with security support. Jini is probably one of the best candidate architectures to fulfill these requirements. To show the feasibility of a PEIS-Ecology based on Jini and JavaSpace, we built the PEIS-Home, which is equipped with mobile robots, a web-cam localization system, and a PEIS monitoring system. In the future we will extend our experiments to include situations involving a larger number of PEIS, higher dynamicity, and dynamic configuration and re-configuration.
Acknowledgments
This work has been supported partly by the Swedish KK Foundation, and partly by ETRI (Electronics and Telecommunications Research Institute, Korea) through the project "Embedded Component Technology and Standardization for URC (2004-2008)".
References
[1] A. Saffiotti and M. Broxvall. PEIS Ecologies: Ambient intelligence meets autonomous robotics. In Proc. of the sOc-EUSAI Conference on Smart Objects and Ambient Intelligence, Grenoble, France, October 2005.
[2] The Cogniron Consortium. The Cognitive Robot Companion. www.cogniron.org
[3] The Honda Humanoid Robot ASIMO. http://world.honda.com/ASIMO/
[4] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. IEEE Pervasive Computing, Vol. 1, No. 1, 2002.
[5] IST Advisory Group. Ambient intelligence: from vision to reality. In G. Riva, F. Vatalaro, F. Davide, and M. Alcañiz, editors, Ambient Intelligence. IOS Press, 2005.
[6] E. Souto, G. Guimarães, G. Vasconcelos, M. Vieira, N. Rosa, and C. Ferraz. A message-oriented middleware for sensor networks. In Proceedings of the Workshop on Middleware for Pervasive and Ad-hoc Computing, Toronto, Ontario, Canada, 2004.
[7] D. Arregui, C. Fernstrom, F. Pacull, and J. Gilbert. Stitch: Middleware for ubiquitous applications. In Proc. of the Smart Objects Conference, Grenoble, France, 2003.
[8] Katholieke Universiteit Leuven. The Open Robot Control Software project. www.orocos.org
[9] H. Utz, S. Sablatnog, S. Enderle, and G. Kraetzschmar. Miro: middleware for mobile robot applications. IEEE Transactions on Robotics and Automation, Vol XX, No. Y, June 2002.
[10] Center for Distributed Object Computing, Washington University. TAO: A high-performance, real-time Object Request Broker (ORB). www.cs.wustl.edu/ schmidt/TAO.html
[11] The UPnP Forum. Universal Plug and Play. www.upnp.org
[12] S. Ahn, J. Kim, K. Lim, H. Ko, Y. Kwon, and H. Kim. UPnP approach for robot middleware. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, April 2005.
[13] Jini technology homepage. http://www.jini.org
[14] E. Freeman, S. Hupfer, and K. Arnold. JavaSpaces Principles, Patterns, and Practice. Pearson Education, 1999.
[15] A. Tanenbaum and M. van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, 2002.
[16] D. Gelernter. Generative communication in Linda. ACM Transactions on Programming Languages and Systems, 7(1):80-112, 1985.
[17] Java Authentication and Authorization Service. http://java.sun.com/products/jaas/index.jsp
[18] A. Saffiotti, K. Konolige, and E. H. Ruspini. A multivalued-logic approach to integrating planning and control. Artificial Intelligence, 76(1-2):481-526, 1995.
[19] L. Karlsson. Conditional progressive planning under uncertainty. In Proc. of the 17th IJCAI Conf., 2001.
[20] The Player/Stage Project. http://playerstage.sourceforge.net/
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A New Heating Method for the Actuation of the Shape Memory Alloy (SMA) Actuator
Chee Siong LOH, Kojiro MATSUSHITA, Hiroshi YOKOI and Tamio ARAI
Department of Precision Engineering, University of Tokyo
Abstract: This paper describes a new control method for the shape memory alloy (SMA) actuator: a high voltage pulse heating method, consisting of high-voltage pulses of short, millisecond-order duration that actuate the alloy in an on-off manner. The proposed method replaces the conventional heating method, which heats the alloy gradually and thereby accumulates heat in the surrounding air. When cooling occurs, the heat transfer from the SMA to the environment becomes stagnant and slow, so heat accumulates in the alloy when subsequent heating occurs. This excessive heat causes the alloy to remain in the austenite structure, which is less elastic than the low-temperature crystalline structure of martensite. This phenomenon contributes to the slowness of the SMA's control response, a major setback in the application of this alloy. Our proposed method optimizes the amount of applied heat, so excessive heating of the SMA can be prevented. To verify the advantages of our method, we present experimental results for both the conventional and the proposed heating methods.
Keywords: SMA, high voltage pulse heating, duty ratio, sensory feedback
Introduction
Shape memory alloys (SMA) are materials that exhibit the characteristics of both the shape memory effect and the pseudo-elasticity effect [1]. The shape memory effect involves a phase change driven by temperature. At low temperature, the SMA mostly remains in the crystalline structure of martensite, which displays an elastic nature. When heated, the crystalline structure transforms into the austenite structure, which is less elastic; thus, strain induced in the SMA in the lower-temperature martensite phase can be recovered in the austenite phase [2]. Pseudo-elasticity is an effect in which the phase change from martensite to austenite occurs without a change in temperature but rather with an increase in load. As unloading takes place the martensite phase is restored, which can be observed in the alloy's plastic behaviour. At present the pseudo-elasticity of the SMA is widely used, for example in spectacle frames, brassieres, corrective braces and cell-phone antennas. The shape memory effect of the SMA has long been applied in the field of robotics [3-5]. SMA is used because of its small size as an actuator, its large recovery force and its corrosion resistivity. One major disadvantage is its slow control response speed. This can be attributed to excessive heating, slow heat dissipation from the alloy to the environment, and high tensile strength in the SMA.
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 2004, No. 16360118.
C.S. Loh et al. / A New Heating Method for the Actuation of the Shape Memory Alloy Actuator
In the field of hardware research, Gorbet and Russell [6] proposed the use of a mobile rotating heat sink to facilitate the dissipation of heat from the SMA to the surroundings. However, the heat sink is itself bulky and entirely supported by a shaft, and rotational inertia prevents the heat sink from stopping precisely upon coming into contact with the SMA, which can damage the wire. Kuribayashi [7] directly measured the SMA temperature using a copper-constantan thermocouple to determine the safe amount of current applicable to the SMA, without initially limiting the current, prior to actuation. Featherstone and Yee [8] proposed the use of resistance as an indirect temperature measurement for the SMA; a maximum safe heating current was designed to prevent overheating of the alloy. Selden, Cho and Asada [9] proposed segmented heating and cooling of the SMA wire using Peltier devices; intermediate hold temperatures were also introduced to reduce the latency time of the alloy.
Fig. 1 The effects of (a) conventional (gradual) heating and (b) high voltage pulse heating on the atomic rearrangement of the SMA upon heating
In this paper, we propose a high voltage pulse heating method for the control of the SMA. A high-voltage pulse of very short, millisecond-order duration is applied for each heating of the SMA in an on-off manner. This pulse control makes heating with high power and low energy possible. In the following experiments and results, we show that high voltage pulse heating can achieve the same amount of strain with a smaller amount of energy than the conventional gradual heating method. At the same time, by applying high-power heating, the rise time of the SMA's control response can be shortened, and because a smaller amount of energy is used, only a minimal rise in temperature is expected. During cooling, only a small amount of heat energy needs to be channeled out of the SMA to the environment, thereby decreasing the settling time. We also believe that a portion of the energy supplied during high voltage pulse heating leads to the direct conversion of electrical to potential energy in the SMA without passing through a heat phase, contrary to the conventional belief, associated with gradual heating, that heat is the single element in SMA actuation. When the SMA is heated very quickly with electrical pulses, a portion of the electrical energy is absorbed in the rearrangement of its atomic structure during the phase change to the higher-potential austenite phase (Fig. 1). Therefore, the same amount of strain obtained by gradual heating can also be obtained at a lower heat energy when heated very quickly. In other words, straining the SMA at a lower heat energy will not cause the SMA to heat up to a high temperature. Heat sinking can therefore be done at a faster rate, preparing the alloy in time for the subsequent heating. As a result, consecutive cooling and heating can be performed at a faster rate, increasing the response speed of the SMA actuator as a whole.
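The energy argument can be made concrete with E = V · I · t. The energies below reproduce the values reported later in the experiments (15 J for gradual heating, about 2.3 J for a pulse); the supply voltages (10 V and 38.3 V) are back-calculated from those figures for illustration and are not stated as measurements in the paper.

```java
// Back-of-envelope check of the energy argument: E = V * I * t.
// The voltages below are inferred from the paper's reported energies
// (15 J and 2.3 J), not measured values, so treat them as assumptions.
public class PulseEnergy {
    static double energyJoules(double volts, double amps, double seconds) {
        return volts * amps * seconds;
    }

    public static void main(String[] args) {
        // Conventional heating: 0.75 A for 2 s at an assumed 10 V supply.
        double gradual = energyJoules(10.0, 0.75, 2.0);
        // High-voltage pulse: 3 A for 20 ms at an assumed 38.3 V supply.
        double pulse = Math.round(energyJoules(38.3, 3.0, 0.020) * 100) / 100.0;
        System.out.println("gradual: " + gradual + " J, pulse: " + pulse + " J");
        // The pulse delivers much higher power (~115 W vs 7.5 W) but far less total energy.
    }
}
```

The pulse thus supplies roughly 15x more power for roughly 6x less energy, which is exactly the trade-off the method exploits.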
Hardware and experimental setting
Fig. 2 (a) Position sensor; (b) thermocouple-K; (c) current transducer; (d) power transistor, H8/3664, CONTEC PC card and sensor circuits
The hardware used in our experiments is shown in Fig. 2. The shape memory alloy used is a 0.3mm diameter wire with a composition of 55% Ni and 45% Ti. When current is passed through the SMA wire, the wire undergoes a 4%-5% strain. The strain of the SMA is detected using a position sensor (GP2D12, Fig. 2(a)); the temperature of the alloy and the amount of current flowing through it are measured using a thermocouple (TCKT0022, Fig. 2(b)) and a current transducer (LTS25-NP, Fig. 2(c)), respectively. The information detected by the sensors is fed back to the computer via the 12-bit CONTEC PC card (Fig. 2(d)), and the heating of the SMA is controlled using signals generated by the H8/3664 chip (Fig. 2(d)), also connected serially to the computer. All programming is done in the C language.
Fig. 3 (a) Task 1: hauling up a 3kg weight periodically. (b) Task 2: stretching a spring on a horizontal plane
Fig. 4 Schematic diagram of the circuitry design for the experimental settings
To investigate the effects that high voltage pulse heating has on the SMA, we carried out experiments in two different settings, in which the alloy performs the tasks of hauling up an object against gravity and of stretching a spring on a horizontal plane. In both experiments, the circuitry design for the control and sensory feedback of the entire setting is shown in Fig. 4. Both experiments were carried out at a room temperature of 25ºC to 26ºC. In the first task (Fig. 3(a)), an SMA wire 0.3mm in diameter and 0.6m in length is used as an actuator to haul up a 3kg weight periodically. The thermocouple is attached to the alloy for temperature monitoring, to prevent overheating and to control the temperature of the wire. The SMA wire is passed through the current transducer for contact-less current detection. Feedback from the current transducer will be used in future research for adaptive control of the amount of current supplied to the SMA. Below the weight is placed the position sensor, which monitors the position of the weight and thus the control response of the SMA. The information obtained from the three sensors is fed back to the computer via the PC card. The sensory feedback is used in the control design and also in the investigation of the alloy's characteristics. The power supply to the SMA is controlled using the H8/3664 microchip, connected serially to the computer. In the second task (Fig. 3(b)), an SMA wire 0.3mm in diameter and 0.85m in length is used as an actuator on a horizontal plane, pulling a spring in a periodic manner. Similar to task 1, where the weight of the object attached to the SMA extends the alloy when the current is turned off, in task 2 the force from the extended spring is used to recover the SMA to its initial length.
The circuitry system in Fig. 4 is also integrated into this actuator, where the position sensor is used to trace the movement of the spring, and thus the extension and contraction of the SMA wire. The speed of the movement of the spring gives the control response of the SMA wire.
Results and discussion
In this section, we present the response of the SMA when actuated using the conventional heating method and using the proposed method, compare the results obtained, and discuss them.
TASK 1: Hauling Up a 3kg Weight Periodically (Vertical actuation)
Experiment A: Comparing rise in temperature and strain
Fig. 5 Displacement against temperature graph of SMA when heated gradually using 0.75A for 10s
Fig. 6 Temperature, displacement and current graphs when a pulse of 3A of 20ms is induced
In experiment A, the strain and the rise in temperature of the SMA under the conventional gradual heating method and under our proposed method were compared. First, a current of 0.75A was applied for 10s to the SMA, slowly heating it until the 3kg weight reached its maximum height. The strain of the alloy was plotted against its temperature, giving the graph in Fig. 5. Then, as a comparison, an electric pulse of 3A, 20ms was applied to the alloy, giving the graphs in Fig. 6. From the experimental results, we can see that the 3A, 20ms electrical pulse produced a 6mm strain of the alloy with a temperature rise of only 7.5ºC. In contrast, when the alloy was heated slowly and gradually with 0.75A for 10s, the same strain was achieved only when the temperature had risen by 13.5ºC, double the temperature rise obtained with high voltage pulse heating.
Experiment B: Comparing actuation cycle
Fig. 7 Temperature, displacement and current graphs of SMA when heated gradually using 0.75A, every 2s periodically
Fig. 8 Temperature, displacement and current graphs of SMA when heated by high voltage pulses of 3A, every 2s periodically
In experiment B, the actuation cycles of the two heating methods were compared. In gradual heating, the input current was 0.75A for 2s with an interval of 2s, applied periodically. A 0.75A, 2s gradual heating supplies 15J of energy to the SMA, and the strain achieved was approximately 4mm, as shown in Fig. 7. The frequency of actuation in this case was 0.25Hz, and the displacement graph shows an inclining tendency. This phenomenon can be explained by slow heat dissipation from the SMA to the environment before the subsequent heating occurred. Heat accumulation is therefore expected to occur within the alloy, maintaining the alloy in the form of the less elastic austenite structure. In contrast, high voltage pulse heating improved the actuation frequency and the heat dissipation of the SMA actuator, as shown in Fig. 8. A 3A, 20ms pulse supplies a mere 2.3J of energy to the SMA, yet the strain achieved was approximately 6mm. The frequency of actuation in this case was 0.5Hz, double the speed achieved with gradual heating. Moreover, the weight remained at a constant distance of 10.0cm from the position sensor and actuated from that height without any inclining tendency, suggesting that heat accumulation did not occur, or occurred only minimally, and that heat dissipation was fast enough before the subsequent heating took place. The improved response can be seen as a comparative decrease in the rise and settling times of the SMA actuator, detectable in the gradients of the displacement graphs.
TASK 2: Stretching a Spring Periodically (Horizontal actuation)
Experiment A: Comparing rise in temperature and strain
Fig. 9 Displacement against temperature graph of SMA when heated gradually using 0.75A current for 10s
Fig. 10 Temperature, displacement and current graphs of SMA when a pulse of 3A current of 30ms is induced
Experiment B: Comparing actuation cycle
Fig. 11 Temperature, displacement and current graphs of SMA when heated gradually using 0.75 A every 3 s, periodically
Fig. 12 Temperature, displacement and current graphs of SMA when heated with high voltage pulses of 3 A every 3 s, periodically
Comparative experiments similar to those in task 1 were performed for task 2. The results of experiments A and B in task 2 further show the effectiveness of the proposed heating method. In task 2, the position sensor monitored the contraction of the SMA towards the sensor (the opposite of task 1), so the displacement graphs show decreasing distance against time. In experiment A, high voltage pulse heating (30 ms duration) achieved 8 mm of SMA strain (Fig. 10) at a temperature 7°C lower than that of gradual heating (Fig. 9). In experiment B, pulse control doubled the actuation frequency of the alloy for the same strain, using a lower energy of 4 J compared with the 68 J used in the gradual heating method. Comparing the displacement graphs in Fig. 11 and Fig. 12, the rise and settling times of the SMA actuator heated with the proposed method are shorter than those with gradual heating. These experimental results verify the advantage of the proposed method in improving the response speed of the SMA in both horizontal and vertical systems.
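The energy figures quoted above follow from simple Joule heating, E = I²Rt. The sketch below reproduces the reported 15 J (gradual) versus ~2.3 J (pulse) budgets; the wire resistance R_SMA is an assumed illustrative value, not given in the paper, while the currents and durations are those reported in the text.

```python
# Sketch: Joule-heating energy E = I^2 * R * t delivered to the SMA wire
# for the two heating regimes compared in the paper.

def joule_energy(current_a: float, resistance_ohm: float, duration_s: float) -> float:
    """Electrical energy (J) dissipated in a resistive wire."""
    return current_a ** 2 * resistance_ohm * duration_s

R_SMA = 13.0  # ohm -- assumed wire resistance, for illustration only

gradual = joule_energy(0.75, R_SMA, 2.0)   # 0.75 A for 2 s
pulse = joule_energy(3.0, R_SMA, 0.020)    # 3 A for 20 ms

print(f"gradual heating: {gradual:.1f} J")  # ~14.6 J (paper reports 15 J)
print(f"pulse heating:   {pulse:.1f} J")    # ~2.3 J (paper reports 2.3 J)
```

With the same assumed resistance, the short pulse delivers roughly one sixth of the energy of one gradual-heating cycle, consistent with the comparison above.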
Application
We have applied this methodology to the actuation of a robotic finger using the new SMA actuator (Fig. 13) developed by Loh, Yokoi and Arai [10] for the actuation of a prosthetic hand. The actuation setup is shown in Fig. 14, where two of the proposed actuators were used to actuate a one-DOF joint, one for extension and one for contraction. One major disadvantage of using SMAs to actuate a joint is that a contracted SMA wire exhibits a high resistive force against extension; consecutive contraction and extension of the SMA wire for the flexion and extension motions are therefore difficult to achieve, let alone maintain. With the high voltage pulse heating method, the induced pulse produces a large impulsive force, strong enough to extend the SMA wire on the opposite side of the joint, which then contracts in the subsequent heating. As a result, the problem of achieving consecutive contraction and extension motions of the SMA can be overcome.
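The alternating actuation of the antagonistic pair in Fig. 14 can be sketched as a pulse schedule. This is our own illustration, not the authors' controller; the pulse amplitude and duration follow the experimental values (3 A, 20 ms), while the 2 s period and the naming of the two sides are hypothetical.

```python
# Illustrative sketch: alternating high-voltage pulse commands for an
# antagonistic SMA pair driving one joint. Each pulse contracts one wire
# and, via the impulsive force, stretches the previously contracted
# opposite wire.

def pulse_schedule(n_cycles, period_s=2.0, pulse_a=3.0, pulse_s=0.020):
    """Return (time, side, current, duration) commands, alternating
    between the flexor and extensor wires."""
    events = []
    for k in range(n_cycles):
        side = "flexor" if k % 2 == 0 else "extensor"
        events.append((k * period_s, side, pulse_a, pulse_s))
    return events

for t, side, amp, dur in pulse_schedule(4):
    print(f"t={t:.1f}s: pulse {side} wire, {amp} A for {dur * 1000:.0f} ms")
```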
Fig. 13 Application in actuation of a robotic finger
Fig. 14 The design of the new SMA actuator
Conclusion
We have described a high voltage pulse heating method that provides better and more efficient energy control for the actuation of the SMA, replacing the conventional gradual heating
method, which heats the wire excessively, leading to heat accumulation and a slow response of the alloy. Compared with the conventional gradual heating method, the proposed control method makes actuation with low energy possible while achieving the same or a larger strain. Moreover, high voltage pulse heating enables actuation of the SMA at a lower temperature, which leads us to believe that high electrical power applied over a short interval converts a portion of the induced electrical energy directly into potential energy in the SMA without passing through a heat phase. With a smaller amount of heat energy to dissipate upon cooling, a faster response can be obtained, as shown in the results presented.
Future Work
As a next step, an intelligent control system will be developed to further increase the response speed of the SMA in an ambient environment, since practical uses of SMAs in robotics must operate under ambient conditions. Based on our present research and future results, we will also work on improving conventional models of the SMA, to develop a better platform for further analysis of the alloy.
Acknowledgement This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 2004, No.16360118.
References
[1] Ferdinando Auricchio. "Shape Memory Alloys: applications, micromechanics, macromodelling and numerical simulations". Ph.D. dissertation, University of California, Berkeley, 1995.
[2] Abhijit Bhattacharyya et al. "Shape Memory Alloy". University of Alberta, Edmonton, Alberta, Canada, 2001. http://web.cs.ualberta.ca/~database/MEMS/sma_mems/sma.html
[3] Yavuz Eren, Constantinos Mavroidis and Jason Nikitczuk. "B-Spline Based Adaptive Control of Shape Memory Alloy Actuated Robotic Systems". Proceedings of IMECE '02, November 2002.
[4] Kathryn J. De Laurentis and Constantinos Mavroidis. "Mechanical Design of a Shape Memory Alloy Actuated Prosthetic Hand". Transactions of the ASME, Vol. 124, No. 4, pp. 652-661, 2002.
[5] Shigeo Hirose, Koji Ikuta and Koichi Sato. "Development of Shape Memory Alloy Actuator (Improvement of output performance by the introduction of an α-mechanism)". Advanced Robotics, pp. 89-108, 1989.
[6] R. Andrew Russell and Robert B. Gorbet. "Improving the Response of SMA Actuators". Proceedings of the 1995 IEEE International Conference on Robotics and Automation, Vol. 3, pp. 2299-2304, May 1995.
[7] K. Kuribayashi. "Improvement of the response of an SMA actuator using a temperature sensor". The International Journal of Robotics Research, Vol. 10, pp. 13-20, February 1991.
[8] R. Featherstone and Y. H. Teh. "Improving the Speed of Shape Memory Alloy Actuators by Faster Electrical Heating". Int. Symp. Experimental Robotics (ISER'04), Singapore, 18-21 June 2004.
[9] Brian Selden, Kyu-Jin Cho and H. Harry Asada. "Multi-Segment State Coordination for Reducing Latency Time of Shape Memory Alloy Actuator Systems". Proceedings of the International Conference on Robotics and Automation, April 2005.
[10] Loh, Yokoi and Arai. "New Shape Memory Alloy: Design and Application in the Prosthetic Hand". International Conference of the IEEE Engineering in Medicine and Biology Society, September 2005.
Informational Organization on Network among Multiple Agents
Yoshihito Shikanai, Koichi Ozaki and Sumio Yamamoto
Graduate School of Engineering, Utsunomiya University
Abstract. This paper presents an informational organization scheme for multi-agent systems consisting of a large number of agents, such as ubiquitous computing devices. In such multi-agent systems, every agent is required to behave cooperatively with the others, so an agent must search for available agents or information. We propose a conceptual model that uses a notice board (NB) to support this search among the agents. Every agent is able to trace the NBs to search for available information. In this paper, we verify the efficiency of searching the NBs as information becomes organized over the processing steps.
Keywords. Informational Organization, Communication, Multi-agent
1. Introduction
Communication is necessary for cooperative tasks among multiple agents, in order to acquire information, negotiate, reach consensus, etc. Computer networks such as the Internet are now widely adopted in robotics. Computer networks are expanding into every place; computing devices are becoming smaller and lighter while being equipped with communication functions. Various computing devices are connected to networks everywhere, and a single network may contain a large number of them. We can therefore regard each device as a ubiquitous robotic agent. For example, a sensor which transmits sensor data to the network is an agent, even though it cannot act physically. The number of agents in the network can be several hundreds or thousands, and in some situations even more. Since the agents are the various devices connected to the network, the kinds of agents are also diverse. Various communication methods have previously been reported by many researchers [1][2]. Generally, the Contract-Net Protocol (CNP) [2][3], the Black Board Model [4], etc. are applied to multi-agent systems. The CNP-based method is widely applied to systems consisting of several agents. In a vast network environment, however, the CNP-based method relies on broadcasting to the other agents, and it is difficult to set a timeout for receiving bidding messages, since it is unknown how many replies will arrive from a large number of agents. Therefore, it is not suitable for ubiquitous agents. The Black Board Model can also be applied to systems with multiple agents. Basically, the Black Board Model is a model in which the agents exchange information via a Black Board, of which only one or a few exist in the system, so the system's information is concentrated on the Black Board. Therefore, the Black Board Model
848
Y. Shikanai et al. / Informational Organization on Network Among Multiple Agents
Figure 1. Concept of exchanging information between agents by Notice Board model (NB-model)
must centrally manage the information, so it too is not suitable for a large number of agents. In this paper, we propose a conceptual model for a large number and various kinds of agents. The model is based on the Distributed Black Board Model [4], and also considers the various kinds of information, functions and abilities of the agents. In this method, every agent has a Notice Board (NB) which shows its own information, and every agent can search for available information, which is organized by relational information among the NBs. We have implemented a computer simulation to confirm the process of this informational organization. In the simulation, the agents can obtain the information they search for on the NBs, organized by the relational information. The efficiency of the informational organization is shown by the results of this simulation.
2. Communication for Agents in Large Scale Network
In multi-agent systems, it is important for the agents to cooperate with each other. We assume a large number of agents executing cooperative behaviors in a ubiquitous network. Every agent is required to search for partners among the other agents for its tasks. Since it is difficult to centrally manage all the agents, every agent must search autonomously through communication. In practice, querying all the agents is infeasible when their number is large. Therefore, the agents need to search for and acquire available information efficiently. To achieve this, we propose a new conceptual model in which every agent publishes its own information on a notice board. We call this model the Notice Board model (NB-model). The concept of acquiring information in the NB-model is illustrated in Fig. 1. In this figure, each agent shows its own information on its own Notice Board (NB). Every agent can read information about the other agents from their NBs and search for available information. The key point of this model is that the NB holds relational information about the other agents. Every agent stores "relational information" in its own NB while searching the network. An agent is then able to search for available information by tracing the relational information on the NBs. Through this relational information, the information on the NBs is linked among a large number of agents, and the linked information is organized as information on the entire
Figure 2. Hierarchical Framework of Notice Board
Name: Name of Agent
Contents: Information of Agent
Figure 3. Header in Notice Board
Name: Name of Agent
Information: Data, Knowledge and Function of Agent
Reference: Names of agents which have relational information
Figure 4. Data Tag in Notice Board
network. Therefore, all the information is shared, and every agent can obtain the information available for its own behavior. We define "informational organization" as this organization through relational information.
3. Informational Organization Algorithm
Every agent has an NB in order to show its own information to the other agents, so all agents are able to acquire available information via the NBs. We describe the NB and the algorithm for acquiring information as follows.
3.1. Notice Board
The kinds and amount of information that each agent owns differ. Therefore, the NB has to be designed with a flexible structure for each agent, from which necessary information can be acquired easily. Fig. 2 shows the hierarchical framework of the NB, which consists of three parts: "Header", "Class" and "Data Tag". Figs. 3 and 4 represent the Header and the Data Tag, respectively. The Header shows the name and contents of the agent. The Data Tag shows the agent's information: data, knowledge and functions. It also shows the relational information as a "Reference". Through this framework, an agent is able to find other agents which have available information. When a searching agent obtains the available information, it can identify the contents from the Class in the framework.
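As a rough illustration of this framework, the NB can be modeled as nested records. The field names follow Figs. 3 and 4 and the class names follow Sec. 3.1; the concrete Python types are our own assumption, not the authors' implementation.

```python
# Sketch of the hierarchical NB framework: a Header, a set of Classes,
# and Data Tags carrying the "Reference" (relational information).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Header:
    name: str       # name of the agent
    contents: str   # summary of the agent's information

@dataclass
class DataTag:
    name: str                 # name of the agent
    information: str          # data, knowledge or function of the agent
    reference: List[str] = field(default_factory=list)  # related agents

@dataclass
class NoticeBoard:
    header: Header
    # attribute classes of Sec. 3.1: Control, Physical, Procedural,
    # Knowledge, Concept -- each holding the agent's Data Tags
    classes: Dict[str, List[DataTag]] = field(default_factory=dict)

nb = NoticeBoard(Header("Agent4", "temperature sensing"))
nb.classes["Control Class"] = [DataTag("Agent4", "sensor data", ["Agent2"])]
```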
Figure 5. Procedure of Organization
The Class classifies the information by its attribute. An agent's information can be represented by five types of classes: "Control Class", "Physical Class", "Procedural Class", "Knowledge Class" and "Concept Class" [5]. The Control Class holds electric data such as control or sensory signals. The Physical Class holds physical information such as position, velocity, or force. The Procedural Class holds a procedure or a command to operate the agent, with its parameter(s) if any. The Knowledge Class holds knowledge or a condition status, extracted from the knowledge base. The Concept Class holds a concept such as an intention or a target.
3.2. Algorithm
We have designed a method of informational organization for searching agents. Fig. 5 shows the procedure of this method as an example. In step (1), the Target Agent has the information the Searching Agent wants, and that information is not yet organized. The procedure of informational organization is as follows:
1. The Searching Agent, Agent0, inquires for the searching information at Agent1, reading the Data Tag on the NB of Agent1. The searching information on the Data Tag can be compared to the Header and the Class on the NB. Agent1 does not have any information for the Searching Agent; however, Agent1 has a Reference, as relational information, to Agent2 and Agent3.
2. The Searching Agent recognizes Agent2 and Agent3, and records them in the Reference of its own Data Tag.
3. The Searching Agent also inquires for the searching information at Agent2. Agent2 does not have the searching information either; however, Agent2 has a Reference to Agent4.
4. The Searching Agent also records Agent4 in the Reference of its own Data Tag. The Searching Agent now has the Reference: Agents 1, 2, 3 and 4.
5. The Searching Agent inquires for the searching information at Agent4. Finally, the Searching Agent obtains the searching information and recognizes the Target Agent, which is Agent4.
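The tracing procedure above amounts to a breadth-first traversal of the Reference links. A minimal sketch, with a plain dict standing in for each NB (our own simplification of the framework):

```python
# Sketch of the Fig. 5 search: the Searching Agent queries agents it
# knows of; whenever a queried NB lacks the wanted information, it
# records that NB's Reference entries in its own Data Tag and keeps
# tracing them.
from collections import deque

def trace_search(boards, start_refs, wanted):
    """Return (target_agent, recorded_refs), or (None, recorded_refs)."""
    recorded = list(start_refs)        # the searcher's own Reference
    queue = deque(start_refs)
    seen = set(start_refs)
    while queue:
        agent = queue.popleft()
        board = boards[agent]
        if wanted in board["information"]:  # step 5: target found
            return agent, recorded
        for ref in board["reference"]:      # steps 2-4: record references
            if ref not in seen:
                seen.add(ref)
                recorded.append(ref)
                queue.append(ref)
    return None, recorded

boards = {
    "Agent1": {"information": [], "reference": ["Agent2", "Agent3"]},
    "Agent2": {"information": [], "reference": ["Agent4"]},
    "Agent3": {"information": [], "reference": []},
    "Agent4": {"information": ["searching info"], "reference": ["Agent2"]},
}
print(trace_search(boards, ["Agent1"], "searching info"))
# -> ('Agent4', ['Agent1', 'Agent2', 'Agent3', 'Agent4'])
```

The recorded Reference after the search matches step 4 of the procedure: Agents 1, 2, 3 and 4.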
4. Experiment by Simulation
The point of this method is that every agent builds up relational information on its NB as an informational organization. In order to evaluate the efficiency of the relational information, we have implemented a computer simulation. In this simulation, there is a large number of agents in a ubiquitous network, and each agent searches for the target agents via the NBs using inter-process communication. The simulation shows the relationship between the agents and the relational information graphically.
4.1. Condition and Algorithm
In the simulation, there are n agents in an x-y space, an area of 400 x 400. The distance between agents indicates the agents' requirement to search within the informational organization. Each agent can search within a radius of 20. The following two actions are performed repeatedly as one step:
1. Each agent obtains information with a probability of p.
2. Each agent that is organizing information collects the relational information, shown as links to the others. The agent executes the linking with a probability of q.
In this simulation, the parameters are fixed at p = 10% and q = 10%. The simulation was executed with n = 300, 400, 500 and 600 agents, respectively. The advancement of the informational organization over the processing steps is evaluated.
4.2. Progress of Informational Organization
Figs. 6 and 7 illustrate the simulation results of the informational organization in the case of n = 500. In these figures, the dots are agents which have available information to organize, and the lines are the relational information between agents. Fig. 6 shows an overview of the x-y space at 50 steps. Fig. 7 shows an expanded view of the region inside the small frame of Fig. 6. Figs. 7-a, b, c and d show the condition at steps 20, 40, 60 and 80, respectively. In these figures, the cross is the Searching Agent, which is looking
Figure 6. Overview of Simulated Result
for the searching information, and the large dot is the Target Agent, which holds the searching information. The process of the informational organization at each step is as follows:
a. The Searching Agent required a piece of searching information owned by the Target Agent. However, there was no agent with the searching or relational information around the Searching Agent, so it could not do anything.
b. As the informational organization progressed, the Searching Agent and the Target Agent became linked by relational information. The Searching Agent was able to obtain the searching information by tracing the relational information. However, the route traced via the relational information detoured to the right, and the number of hops was five.
c. The informational organization progressed further and the number of hops decreased. The Searching Agent still searched via several other agents, but was able to obtain the searching information in three hops.
d. The informational organization had progressed satisfactorily for every agent. The Searching Agent and the Target Agent were linked in a nearly straight line, so the Searching Agent was able to obtain the searching information in two hops.
In the first few steps, when the informational organization has not yet progressed, it is difficult for agents to immediately get the required information through the links. As the steps progress, the lines between the agents that hold information are
Figure 7. Progress of Organization (500 agents): a. after 20 steps; b. after 40 steps; c. after 60 steps; d. after 80 steps
increasing, and the information becomes organized. Each agent is able to search more easily as the number of lines increases.
4.3. Experimental Results
Fig. 8 shows the relationship between the number of steps and the total number of links for n = 300, 400 and 500. Although the maximum number of links differs in each case, the curves show a similar tendency for each number of agents. The flat parts of the graph show that the information on the NBs was well organized by the relational information. Therefore, the degree of organization can be recognized from the amount of relational information. Fig. 9 shows the minimum number of hops between the two agents (the Searching Agent and the Target Agent) within the informational organization for n = 300, 400, 500, 600 and step = 40, 60, 80, 100, 120. The number of hops tends to increase as the total number of agents increases or the number of steps decreases. With more agents, the number of hops at the same number of steps is higher. The number of hops decreases as the steps progress and the organization is performed, and the difference in the number
Figure 8. Relationship between Total Links (Relational Information) and Steps
Figure 9. Relationship between Shortest Hops and Total Agents
of hops between the different numbers of agents also decreases. Therefore, even with a large number of agents, the agents are still able to obtain the searching information once the informational organization is fully performed. In the case of n = 500 and step = 40, the number of hops is higher than in the other results because the progress of the organization is uneven at its early stage. The number of hops decreases as in the other conditions when the number of steps increases and the organization progresses. Therefore, the cost of communication does not rise much with the progress of the organization, even if the number of agents increases, and the agents are able to use the information efficiently.
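The simulation loop of Sec. 4.1 can be sketched as follows. This is a minimal re-implementation under our own assumptions (uniform random placement, directed links, no enforced symmetry), not the authors' simulator; p, q, the 400 x 400 area and the radius of 20 are the values given in the paper.

```python
# Sketch of one simulation step: n agents on a 400 x 400 plane, search
# radius 20; each step an agent acquires information with probability p
# and links (relational information) to informed neighbors with
# probability q.
import math
import random

RADIUS = 20.0

def step(agents, p=0.10, q=0.10, rng=random):
    """agents: list of dicts with 'x', 'y', 'informed', 'links' (set)."""
    for a in agents:
        if not a["informed"] and rng.random() < p:
            a["informed"] = True             # action 1: obtain information
    for i, a in enumerate(agents):
        if not a["informed"] or rng.random() >= q:
            continue                         # action 2: link with prob. q
        for j, b in enumerate(agents):
            if i != j and b["informed"] and \
               math.hypot(a["x"] - b["x"], a["y"] - b["y"]) <= RADIUS:
                a["links"].add(j)

rng = random.Random(0)
agents = [{"x": rng.uniform(0, 400), "y": rng.uniform(0, 400),
           "informed": False, "links": set()} for _ in range(500)]
for _ in range(100):
    step(agents, rng=rng)
total_links = sum(len(a["links"]) for a in agents)
print(total_links)
```

Plotting `total_links` against the step count for different n reproduces the kind of saturating curves discussed above for Fig. 8.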
5. Conclusion
In this paper, we have proposed the Notice Board model (NB-model) for multi-agent systems with a large number of agents, such as ubiquitous agents. In this model, each agent is able to search for available information by using relational information, the Reference. We have implemented a computer simulation to confirm the process of informational organization. In this simulation, the agents were able to obtain the searching information on the NBs, which were organized by relational information. The efficiency of the informational organization was shown by the results of this simulation. Although we have discussed the organization itself, the information necessary for each agent differs from agent to agent, so unnecessary organization is included in the links between agents. It remains necessary to evaluate the information inquiry itself.
Acknowledgement The authors would also like to thank Mr. Fairul Azni Jafar for his assistance in preparing this paper.
References
[1] K. Ozaki, H. Asama et al.: "Negotiation Method for Collaborating Team Organization among Multiple Robots", Distributed Autonomous Robotic Systems, Springer-Verlag, Tokyo, pp. 199-210 (1994).
[2] A. Kasiński and P. Skrzypczyński: "Communication Mechanism in a Distributed System of Mobile Robots", Distributed Autonomous Robotic Systems 5, Springer-Verlag, Tokyo, pp. 51-60 (2002).
[3] R. G. Smith: "The Contract Net Protocol: High-Level Communication and Control in a Distributed Problem Solver", IEEE Trans. on Computers, Vol. C-29, No. 12, pp. 1104-1113 (1980).
[4] J. Wang: "On Sign-board Based Inter-Robot Communication in Distributed Robot Systems", IEEE Int. Conf. Robotics and Automation, pp. 1045-1050 (1994).
[5] A. Matsumoto, K. Ozaki, H. Asama, Y. Ishida and I. Endo: "Communication in the Autonomous and Decentralized Robot System ACTRESS", IEEE International Workshop on Intelligent Robots and Systems, pp. 835-840 (1990).
A Flexible Task Knowledge Representation for Service Robots
Steffen Knoop, Sven R. Schmidt-Rohr, and Rüdiger Dillmann
IAIM, Institute of Computer Science and Engineering (CSE), University of Karlsruhe (TH), Germany
{knoop,srsr,dillmann}@ira.uka.de
Abstract. Robots that are designed to act in human-centered environments create the need for a flexible and adaptive representation of task knowledge. This results on the one hand from the continuously changing and hardly predictable state of an environment that is populated by humans and robots. On the other hand, the task knowledge description of a robot which cooperates with humans has to be adaptable and extendable. This paper presents a task knowledge representation for service robots called Flexible Programs (FPs) and the environment for executing FPs. Flexible Programs can be created manually, or from the results of machine learning approaches such as Programming by Demonstration. It is possible to change, adapt and extend this task knowledge at runtime.
1 Introduction
Service robots which possess manipulation capabilities require a powerful task representation. Such a representation has to provide several features, e.g. representations for goals, tasks and, possibly, conditions. Task representation has been studied for a long time in the domain of task planning, and every execution environment that can perform a sequence of actions is equipped with an (implicit or explicit) task representation. A simple task could be the goal of navigating to a target position; here, the task and the goal are both implicitly given by the target position and the algorithms, which are part of the robot model. More complex tasks, which include manipulation, variables within the task, and a larger set of actions and exceptions, have to be modeled more explicitly. In particular, the capability of generating executable tasks from observation of human demonstrators (Programming by Demonstration, see [1]) and the demand for legibility by humans call for an explicit task model. This paper describes the task model and execution environment used at IAIM for the service robot AlbertII, see Fig. 1. The robot is equipped with a differential-drive platform, a 7-DOF manipulator, a three-fingered hand, a stereo camera, speech input/output and a touch display on the front. The target application for this robot is assistance for humans in household environments.
857
S. Knoop et al. / A Flexible Task Knowledge Representation for Service Robots
2 State of the Art
Within the field of robotics, almost as many task descriptions as robots can be found. Each robot which performs complex tasks requires a task description, but no standard task description has evolved yet. This mainly results from the fact that task knowledge is very often closely linked to the dedicated execution system, and thus not very portable. Nevertheless, some task descriptions developed during the last decade are highly interesting and have proved their reliability and effectiveness in several projects, mainly evolving from the field of robot programming languages. True robot programming languages are mostly built as extensions to standard programming languages like C (see Colbert [2]) or LISP (see RAPs [3][4], Frob [5]); thus, runtime extension of tasks is not possible, or is possible only according to predefined rules. A survey can be found in [5]. Declarative task descriptions often rely on static task structures. These tree-like (e.g. TDL [6], or Hierarchical Task Networks [7], [8]) or procedural (see [9], [10]) descriptions can be extracted from user demonstrations and are closely related to PbD output action sequences. However, most existing declarative task descriptions have been designed for manual task compilation and do not provide extensive support for condition management, which is a prominent requirement for the execution of learned action sequences.
3 Framework and Software Architecture
A software architecture that is able to control a mobile robot basically has to cover three aspects to comprise all capabilities of the system: Control of the hardware, representation of the environment and integration of knowledge about skill and task execution. Including the demand for a lifelong learning system, these requirements have to be fulfilled in a way that the system remains extensible.
Fig. 1. Robot AlbertII in kitchen environment
Fig. 2. CORBA-based software architecture with flexible programs, hardware agents and communication scheme
An expedient approach to design such an architecture is therefore to identify the functional components as hardware abstraction, environment model and skill and task execution. Our software architecture consists consequently of the following components (see figure 2): The hardware agents encapsulate all hardware specific functions. There is a hardware agent for each hardware in the robot system, e.g. one for the robot arm, one for the hand and one for the voice. Skill and task knowledge is represented by Flexible Programs. Example skills are ”drive to position” and ”open door”, tasks can be ”transport an object” or ”put object on table”. All information about the environment is stored in an environment model. This can hold all kinds of data: Coordinates, objects, relations, features, images or sounds. A communication bus connects all these components. The components are in detail: – The communication infrastructure consists of a notification distribution instance, where clients can subscribe for certain notification types. Notifications may be delivered by internal or external sources. – Hardware agents (resources) represent real or virtual sensors and/or actuators. There is an agent for laser scanner and for cameras, but there may be also an agent for detection of humans which incorporates laser scanner and vision information. – The agent manager administrates all hardware agents and provides the resource management. Each notification that is passed to an agent is filtered by the agent manager. Thus, unauthorized commands (from instances which have not locked the called resource) can be intercepted. – Flexible programs (FPs) contain the skill and task knowledge. The FPs are the core of the robot control architecture. Learning, within our context, means creation, extension and adaption of flexible programs. – Domain controlling and supervision as well as FP instantiation and priority control is done by the flexible program supervisor. 
Depending on the current context, FPs are created, prioritized or deleted. In later development, learning capabilities will be incorporated in this component.
– The environment model holds environmental data as well as the robot's internal state. It is implemented as a blackboard. An intelligent XML database, which is being developed in our research group, has been integrated for storage of task, skill and object feature knowledge.
The implementation of the proposed software architecture (see figure 2) consists of a set of CORBA object types which communicate using a publish-subscribe mechanism (see also [11]). Each public object's interface is described in CORBA's Interface Definition Language, which enables the use of different programming languages within the same framework. A full description of the execution environment, including the robot hardware and the execution framework, can be found in [12].
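The publish-subscribe notification scheme used by the architecture can be illustrated with a minimal, plain-Python stand-in (the real system communicates via CORBA; the bus class, notification-type names and payloads below are hypothetical, for illustration only):

```python
from collections import defaultdict

class NotificationBus:
    """Distributes notifications to clients subscribed to a notification type."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, notification_type, callback):
        """Register a callback for all notifications of the given type."""
        self._subscribers[notification_type].append(callback)

    def publish(self, notification_type, payload):
        """Deliver a notification to every subscriber of its type."""
        for callback in self._subscribers[notification_type]:
            callback(payload)

# A hardware agent subscribes for commands addressed to it; other
# components publish commands without knowing which agent handles them.
bus = NotificationBus()
log = []
bus.subscribe("arm.command", lambda p: log.append(("arm received", p)))
bus.publish("arm.command", {"action": "move_abs", "target": [0.4, 0.1, 0.9]})
```

Here subscription uses exact type matching; the actual infrastructure presumably supports richer filtering (e.g. the agent manager intercepting commands from clients that have not locked a resource) and remote delivery.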
4 Task knowledge representation
Within our concept, the Flexible Programs hold the task knowledge. This knowledge is represented as the number, type, parameterization and order of the basic skills and other Flexible Programs used. In addition to all generic requirements for programming languages, it is important that the task description is adaptive and extendable at runtime. This is a major requirement for the integration of online learning and adaptation methods, which is essential for service robots in human-centered environments. Thus, the language must not require a compilation process; a generated or adapted program should be directly executable. Many robot programming languages have been developed over the years, some of which are very mature and used in many applications (e.g. ESL, see [13], or TDL, see [14]). But extendibility of programs at runtime in particular has not been an explicit demand. Therefore, we have developed a modular and extensible task description called Flexible Programs (FP). The tasks described as FPs are based on a set of primitive actions and on functionality to combine and select different paths through these actions.
4.1 Flexible Programs
Similar to TDL and HTN planning (see section 2), the task is described as a hierarchical network which is processed with a depth-first search strategy. In contrast to other representations, the FP task tree is not statically built, but instantiated dynamically at runtime. This section describes the tree nodes and the tree instantiation. One node is described by a parameter set N = {Id, Cpre, Cpost, Crt, R, S, P, A} with Id a unique identifier; Cpre, Cpost and Crt sets of pre-, post- and runtime conditions; R a rating function; S a success measure; P the child prospect; and A the action. Two different node types exist:
– Leaves contain actions or other Flexible Programs which have to be executed. Their prospect is always empty, P = ∅.
– Inner nodes contain execution rules which control the order and execution of the child nodes within the prospect P, which must not be empty. Their action is always A = ∅.
Consequently, all robot actions are contained in tree leaves, which are connected by inner branching nodes. The child prospect in branching nodes contains the possible child configurations. It is composed of a set of n > 0 seats with m > 0 candidates each. The seats are arranged in p ≤ n parallel groups. Tree expansion is then done by dynamically selecting one candidate for each seat at runtime. The state of a node can be virtual (no children selected yet), visited (children selected and instantiated) or completed (execution of all children finished). Candidate selection is done based on two criteria:
– The preconditions Cpre are evaluated to determine whether a candidate node can be executed. Only if the given preconditions are true can the candidate be chosen.
– If more than one candidate may be chosen for the same seat based on the precondition evaluation, each candidate's rating function R is evaluated. This function can be any kind of function based on accessible environment parameters (e.g. time of day, light conditions, etc.). The candidate with the highest rating is chosen.
Candidate selection is thus highly dependent on environmental and robot parameters. To select the right subnode for a given situation, the seats are filled only when the parent node is under execution for the first time. This means that the selection of actions can be based on the same situation in which they will be executed. Children which are arranged in seats within parallel groups in the prospect will be executed in parallel. The children are entered simultaneously, and the parallel group is treated in the parent context as a single node, which means that all dependent actions are entered when all children of the parallel group are completed. This implies that parallel seats must be arranged consecutively in the prospect. It is important to note that parallel subtrees cannot be checked for colliding actions, so it is the responsibility of the FP builder to avoid collisions in parallel subtasks. As stated before, tree leaves contain basic actions. This can either be a command to one of the hardware abstractions, or the execution command and parameters for another Flexible Program. In both cases, the action is marked as completed when a corresponding notification from the target instance returns. An example for dynamic candidate selection and tree expansion can
[Figure: FP "Put down object" — after the action Detect table (Vision detect table), either Place object on table or Speech output (Voice say sorry) is selected.]
Fig. 3. Runtime candidate selection based on world and robot state with two candidates for the following action
be seen in fig. 3. Based on the result of the first action, the second one can be chosen. Therefore, the postcondition of the first action has to be part of the precondition of the second. Tree leaves trigger primitive actions within the hardware abstraction layer. These actions can be further parameterized from the environment model by
using variables from the global database. It can be specified in each node which slot has to be filled from which variable. In the same way, action results (return values) can be stored in the global database, thus making it possible to pass parameters from one node to another.
Flexible Program format and generation: Flexible Programs are specified and stored completely in XML. An interpreter engine loads and executes the FPs according to external or internal trigger events. This approach provides a clear

<entity name="REPutObjToTable" type="node">
  <equal>
    <arg>holdObject</arg>
    <arg const="true">true</arg>
  </equal>
  <prospect seats="2">
    <seat>REMoveToTable 0 0</seat>
    <seat>REPlaceObj 1 0</seat>
  </prospect>
</entity>
Fig. 4. Flexible Program node specification example in XML notation. Some fields are left out for clarity of presentation
separation of task description and execution, which increases portability between different robots and keeps the task knowledge base open-ended. Subtasks can easily be reused in different Flexible Programs, in the same way as primitive actions are included in each FP. An example XML specification is shown in fig. 4. The task knowledge of a robot in human environments will always be incomplete due to the changing environment. Thus, it is important that the robot can learn new tasks from a user. This problem is being addressed within the framework of Programming by Demonstration (PbD, see [15],[16]). The output of a PbD learning cycle is a task description which is independent of the target robot: It only includes the segmented actions performed by the human demonstrator. The resulting task description is called a Macro-Operator (MO). It is now
possible to compile such an MO into a Flexible Program. To accomplish this, additional knowledge about the target system is required and has to be added to the task description. This work is currently being conducted to achieve (semi-)automatic robot program creation. Of course, it is also possible to create and edit Flexible Programs manually by compiling actions, alternatives, and conditions into an execution tree. This process is simplified by reusing existing task knowledge.
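The candidate selection rule of section 4.1 — filter a seat's candidates by their preconditions, then pick the one whose rating function scores highest — can be sketched as follows. This is a toy sketch: the dictionary-based node encoding and all names are hypothetical, not the actual FP implementation.

```python
def select_candidate(candidates, environment):
    """Pick one candidate for a seat: evaluate preconditions Cpre,
    then take the highest-rated executable candidate."""
    executable = [c for c in candidates
                  if all(pre(environment) for pre in c["pre"])]
    if not executable:
        return None  # no candidate can be executed in this situation
    return max(executable, key=lambda c: c["rating"](environment))

# Two candidates for the action following "Detect table" (cf. Fig. 3):
candidates = [
    {"name": "PlaceObjectOnTable",
     "pre": [lambda env: env["table_detected"]],  # needs a detected table
     "rating": lambda env: 1.0},
    {"name": "VoiceSaySorry",
     "pre": [],                                   # fallback, always executable
     "rating": lambda env: 0.1},
]
chosen = select_candidate(candidates, {"table_detected": True})
```

Because the seat is filled only when the parent node is first executed, the environment passed in reflects the very situation in which the chosen action will run.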
5 Experiments and Results
The FP description and interpreter module have been used and tested in several applications with the service robot AlbertII. Two examples are shown: a "pick" task (fig. 5) and a complex "pick and place" task (fig. 7). The first one demonstrates the candidate selection and parameter passing capabilities on a real robot system: Depending on the result of the object localization Detect object, the hand preshape, TCP position, approach direction and grasp type are chosen. Depending on the result of the grasp action, one of two speech outputs is selected. Finally, platform and arm retreat in parallel to a safe position. The execution can be seen in fig. 6.
[Figure: tree of Retrieve Object with leaves for object detection, hand preshape, approach, grasp, speech feedback (Voice success / Voice failure), and a parallel safe retreat of arm and platform.]
Fig. 5. Flexible Program Retrieve Object, processed in depth-first search
Fig. 6. Top left: Detect object, top right: Preshape hand, bottom left: Grasp object, bottom right: Retreat
A complex example FP is shown in figure 7. The depicted task moves the robot in front of the fridge, takes an object out of it, including opening and closing the fridge, and places the object on a given table. It has been executed successfully in simulation. In the given fully instantiated example (i.e. no open candidates, execution finished), several mechanisms can be found. Leaves, i.e. actions, are marked with a circle; inner nodes, containing tree functionality, are depicted as squares. Parallel actions are linked together.
[Figure: fully expanded tree GetBottleOnTableFromFridge with subtasks PrepareHandArm, GetObjOutFridge (MoveToFridge, OpenFridge, RetrieveObj) and PutObjToTable (MoveToTable, PlaceObj, RetreatSafe).]
Fig. 7. Complex task example as Flexible Program
– Sub-FPs are reused several times, see e.g. the Retreat Safe block.
– Some non-colliding children are executed in parallel, see e.g. Close Hand and ArmMoveWorkPos in PrepareHandArm.
6 Conclusion
In this paper, we described a task representation for service robots which is built from parameterized basic actions. These actions are linked within the task description in a tree-like manner, and the tree is read and executed in depth-first order. The task representation supports the exchange of variables, and program parts can be reused in other situations and contexts; the tree-like structure also supports this reusability. The presented approach to modeling task knowledge has proven to work on a real robot, making it possible to automatically choose the best alternative solution for each subtask at runtime. The design of the task knowledge representation further allows for task adaptation and extension at runtime. This will be the next step in the development of Flexible Programs.
7 Acknowledgements
The work described in this paper was partially conducted within the EU Integrated Project COGNIRON ("The Cognitive Companion") and funded by the European Commission Division FP6-IST Future and Emerging Technologies under Contract FP6-002020.
References
1. M. Ehrenmann, R. Zöllner, O. Rogalla, S. Vacek, and R. Dillmann, "Observation in programming by demonstration: Training and execution environment," in IEEE International Conference on Humanoid Robots, Karlsruhe, Germany, October 2003.
2. K. Konolige, K. L. Myers, E. H. Ruspini, and A. Saffiotti, "The Saphira architecture: A design for autonomy," Journal of Experimental & Theoretical Artificial Intelligence (JETAI), vol. 9, no. 1, pp. 215–235, 1997.
3. R. J. Firby, "The RAP language manual, version 2," I/NET Inc., Tech. Rep., 2001.
4. R. Firby, "Adaptive execution in complex dynamic worlds," Ph.D. dissertation, 1989.
5. G. Hager, J. Peterson, and P. Hudak, "A language for declarative robotic programming," in Proceedings of the International Conference on Robotics and Automation, 1999, pp. 1144–1151.
6. R. Simmons and D. Apfelbaum, "A task description language for robot control," in Proceedings of the Conference on Intelligent Robots and Systems (IROS), 1998.
7. K. Erol, J. Hendler, and D. S. Nau, "HTN planning: Complexity and expressivity," in Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), vol. 2, Seattle, Washington, USA: AAAI Press/MIT Press, 1994, pp. 1123–1128.
8. K. Erol, J. Hendler, and D. Nau, "Semantics for HTN planning," Computer Science Dept., University of Maryland, Tech. Rep. CS-TR-3239, 1994. [Online]. Available: citeseer.ist.psu.edu/erol94semantics.html
9. K. L. Myers, "A procedural knowledge approach to task-level control," in Proceedings of the 3rd International Conference on Artificial Intelligence Planning Systems (AIPS-96), B. Drabble, Ed. AAAI Press, 1996, pp. 158–165.
10. B. Jensen, G. Froidevaux, X. Greppin, A. Lorotte, L. Mayor, M. Meisser, G. Ramel, and R. Siegwart, "The interactive autonomous mobile system RoboX," in Proceedings of the 2002 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, EPFL, Lausanne, Switzerland: IEEE, 2002, pp. 1221–1227.
11. R. Dillmann, "Interactive natural programming of robots: Introductory overview," in DREH 2002, Tsukuba, Ibaraki, Japan: Tsukuba Research Center, AIST, December 2002.
12. S. Knoop, S. Vacek, R. Zöllner, C. Au, and R. Dillmann, "A CORBA-based distributed software architecture for control of service robots," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Sendai, Japan: IEEE, 2004, pp. 3656–3661.
13. R. P. Bonasso, J. Firby, E. Gat, D. Kortenkamp, D. P. Miller, and M. G. Slack, "Experiences with an architecture for intelligent, reactive agents," vol. 9, no. 2/3, 1997, pp. 237–256. [Online]. Available: citeseer.ist.psu.edu/bonasso97experiences.html
14. R. Simmons and D. Apfelbaum, "A task description language for robot control," 1998. [Online]. Available: citeseer.ist.psu.edu/simmons98task.html
15. M. Ehrenmann, O. Rogalla, R. Zöllner, and R. Dillmann, "Teaching service robots complex tasks: Programming by demonstration for workshop and household environments," in Proceedings of the 2001 International Conference on Field and Service Robots (FSR), vol. 1, Helsinki, Finland, June 11–13, 2001, pp. 397–402.
16. R. Dillmann, O. Rogalla, M. Ehrenmann, R. Zöllner, and M. Bordegoni, "Learning robot behaviour and skills based on human demonstration and advice: the machine learning paradigm," in 9th International Symposium of Robotics Research (ISRR '99), Snowbird, Utah, USA, October 9–12, 1999, pp. 229–238.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Intelligent Autonomous Japanese Comic "MANGA" Designing Support System
Yuko Ishiwaka a, Yuka Kobayasi b
a Division of Synergetic Information Science, Hokkaido University, Japan, Kita 14, Nishi 9, Kita-ku, Sapporo, 060-0814, [email protected]
b Medec Corporation, Japan
Abstract. The purpose of this research is to develop a MANGA designing support system for beginners who strive to draw comics. There are four processes in forming manga. The new system supports the NAME process, which is the most important and difficult of all the processes. The system suggests newly created NAMEs based on the characteristics of manga artists. We created a manga database and succeeded in quantifying the characteristics of professional manga artists. In this paper we present the database and explain how to create a new NAME. Finally, experimental results are shown.
Keywords. Manga, Genetic Algorithms, NAME, Manga Database
Introduction
Japanese comics are often called manga, and they have quite a different style from western comics. Manga is a kind of graphic novel, usually drawn in black and white ink, that tells a story. In Japan there are many manga artists (more than 4,000 professionals), and many times more people hope to become manga artists. Japanese manga has established an original device called "NAME", a kind of storyboard which is the most important and difficult part of the process of forming manga. Many beginning manga artists give up drawing manga at the NAME stage. Manga is categorized into several genres, for example classical, for boys, for girls and for young men. The NAME characteristics of each genre differ from the rest. The purpose of this research is to develop a new system to support NAME construction, which has not been supported before. The system suggests some patterns of "KOMAWARI", which means dividing one squared page into panels of various numbers and shapes (based on squares), together with the composition of each panel. A user selects a genre of his or her favorite manga and the total number of pages, and then the system proposes some types of NAME. KOMAWARI is generated using genetic algorithms from the characteristics in the manga artists' database.
Fig. 1 Process for forming MANGA: (a) plot, (b) NAME (with trial and error), (c) rough draft, (d) elaboration
In this paper we present the characteristic NAME data of each manga artist, the method of generating NAME based on the database, and finally some results from our proposed system.
1. Process for Forming Manga
There are four processes, i.e. plot, NAME, rough draft and elaboration, before a manga is completed. The definition of each process is:
Stage 1. Plot: make a story including characters, a rough storyline and a concept. Usually the plot is written as a list of items.
Stage 2. NAME: similar to drafting a design for the manga. In the NAME all important factors are decided, especially how each page is divided into panels (KOMAWARI), the composition, and the speech balloons. Many manga artists repeat trial and error in this process. The NAME determines the success of the manga.
Stage 3. Rough draft: draw the NAME on copy paper with pencil.
Stage 4. Elaboration: finish the rough draft with pen using special manga techniques, i.e. painting, visual effects by lines, screen tones and so on.
The process for forming manga is shown in Fig. 1; each drawing shows one stage. Our system supports the step between stage 1 and stage 2, i.e. users make stories by themselves and the system supports how to make the NAME.
Y. Ishiwaka and Y. Kobayasi / “MANGA” Designing Support System
867
2. System Architecture
Our proposed system has three main functions, called page allocation, panel allocation and composition suggestion. Each main function has a database and a generating module. The block diagram of the proposed system is shown in Fig. 2. The manga artists' database contains 12,113 pages from famous Japanese manga artists, categorized by genre and by artist. The characteristics appear prominently in each genre and stand out even more for each artist. The database is used for proposing a new NAME that fits the user's taste. Details of each function are described in the following subsections.
2.1 Page Allocation Function
2.1.1 Page Allocation Database
The page allocation database holds the allocation rates of introduction, development, turn and conclusion, which are very important for the storyline, for each artist. The differences in the page allocation rates are prominent for each artist but not for each genre. Fig. 3 shows the characteristics of each manga artist for page allocation.
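A minimal sketch of what the page allocation module computes from this database — splitting a user-given page count by an artist's rates. The rate values below are hypothetical placeholders; real values come from the artists' database.

```python
def allocate_pages(total_pages, rates):
    """Split total_pages into introduction/development/turn/conclusion
    according to an artist's allocation rates (rates sum to 1.0)."""
    parts = ["introduction", "development", "turn", "conclusion"]
    counts = {p: int(round(total_pages * rates[p])) for p in parts}
    # Assign any rounding remainder to the development part so that
    # the four parts always sum to the requested page count.
    counts["development"] += total_pages - sum(counts.values())
    return counts

# Hypothetical rates for one artist (not taken from the actual database):
rates = {"introduction": 0.15, "development": 0.40,
         "turn": 0.30, "conclusion": 0.15}
pages = allocate_pages(16, rates)
```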
Fig.3 Page allocation database of one manga artist per genre
2.1.2 Page Allocation Module
The page allocation module divides the total number of pages which the user inputs into introduction, development, turn and conclusion. The allocation rate for each part is based on the page allocation database.
2.2 Panel Allocation Function (KOMAWARI Function)
2.2.1 Panel Allocation Database (KOMAWARI Database)
The panel allocation database holds the probabilities of frameless panels, modified panels and the total number of panels. Each genre has prominent panel allocation features. Fig. 4 shows the differences in panel allocation characteristics of each genre, i.e. classic style, boys' style and girls' style. Classic style mainly uses square panels, boys' style uses square, rhomboid-shaped and trapezoidal panels, and girls' style uses more frameless and modified panels. The panel sizes used also differ by genre. In the database the size of each panel is categorized as small, middle or large.
Fig.4 Differences in panel allocation characteristics of each genre (the numbers show the average number of panels per page).
Fig.5 The gene bit string (163 bits) and its phenotype
2.2.2 Panel Allocation Generating Module (KOMAWARI Module)
Based on the above database, the panel allocation generating module generates new panel allocations by genetic algorithms (GA) [1]. The purpose of applying a GA is to create new, unexpected seeds rather than to suggest an optimized KOMAWARI. The panel genes encode the numbers, the shapes (square, frameless, modified), the sizes (small, middle, large) and the locations of the panels on each page. The gene bit string is 163 bits long, and its phenotype is shown in Fig. 5. From the database and the gene structure we generate a large number of panel allocation genes. The initial number of individuals is 300, generated based on the above database, i.e. 100 each with classic style, boys' style and girls' style gene characteristics. N-point crossover is applied, and the crossover positions are given randomly. The mutation rate is 2%. The evaluation and selection rules are as follows:
Fig.6. Example of composition of manga
– the width or height of a panel is less than 50 px;
– two consecutive frameless panels occur;
– 80% of the coordinate values are the same as a database entry.
If a generated gene fulfills one of these three rules, the gene is not selected. We generate 20,000 panel genes. These generated panel genes are categorized by the rates of frameless and modified panels. We call this gene pool the "gene database".
2.2.3 Panel Allocation Drawing Module
The function of the panel allocation drawing module is to draw panels based on the phenotype of a gene selected from the gene pool. The genes are selected following a probability distribution, which is based on the panel allocation database. The gene database is categorized by the characteristics of each manga artist. The user inputs are the total number of pages and the type of genre. The panel allocation drawing module shows some candidate NAMEs, and users can choose their favorite NAME for each page.
2.3 Composition Proposition Function
2.3.1 Composition Database
We collected manga composition data. The main manga composition types are the following:
– Updo: mainly the face is drawn in one panel. The updo composition is usually used to express a character's feelings.
– Bottom up: the waist upward is drawn in one panel. The bottom-up composition is usually used to express a character's situation.
– Long: the body as a whole is drawn in one panel. Long is used for the location or the positional relation between characters.
Fig.7 Examples of panel allocation results with the proposed system
– Background: a landscape is drawn in one panel. Background is used for the location where the character is.
Example compositions of manga are shown in Fig. 6.
2.3.2 Composition Proposition Module
Based on the composition database, this module suggests some kinds of composition with example drawings. The suggestions are shown as updo, bottom up, long or background. Categorized drawing data are prepared in advance, and the composition proposition module chooses one of the categorized drawings for each panel.
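The generate-and-filter loop of the KOMAWARI module (section 2.2.2) can be sketched as follows. This is a simplified illustration: the 163-bit gene encoding, the crossover and mutation steps and the 80%-similarity rule are omitted, and the value ranges and names are hypothetical.

```python
import random

def valid(panels):
    """Rejection rules from section 2.2.2 (simplified sketch)."""
    if any(p["w"] < 50 or p["h"] < 50 for p in panels):
        return False  # a panel's width or height is below 50 px
    if any(a["frameless"] and b["frameless"]
           for a, b in zip(panels, panels[1:])):
        return False  # two consecutive frameless panels
    return True

def random_panel(rng):
    """One panel phenotype: size plus frameless flag (hypothetical ranges)."""
    return {"w": rng.randint(30, 300), "h": rng.randint(30, 300),
            "frameless": rng.random() < 0.2}

def generate_pool(size, panels_per_page, seed=0):
    """Generate candidate pages, discarding genes that hit a rejection rule."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < size:
        page = [random_panel(rng) for _ in range(panels_per_page)]
        if valid(page):
            pool.append(page)
    return pool

pool = generate_pool(100, 5)
```

In the actual system the surviving genes (20,000 of them) are categorized into the "gene database" from which the drawing module samples.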
3. Experimental Results
We show the NAME results produced by our proposed system in Fig. 7 (a) and (b). Users set the total number of pages and choose their favorite manga genre and manga artists. Then
the system allocates the pages using the page allocation database. The system then proposes some types of NAME with panel lines based on the panel allocation database. Fig. 7 (a) shows a classical type of NAME and Fig. 7 (b) shows one for the girls' type. The classical type uses mostly square panels, while the girls' type uses deformed square panels or frameless panels. In this system frameless panels are drawn with dashed lines. These NAME characteristics are consistent with the characteristics of the manga artists.
4. Conclusion
We proposed a new intelligent autonomous Japanese comic "MANGA" designing support system. The system suggests NAMEs, which are the most important and difficult part of the process of forming manga. In order to suggest intelligent new NAMEs, we created a manga database and quantified the characteristics of manga artists. Using this new system, aspiring manga artists and beginners are supported in drawing comics. Furthermore, this system could also be applied to creating new types of photo albums or new presentation styles.
Reference:
[1] J. H. Holland: Adaptation in Natural and Artificial Systems, University of Michigan Press/MIT Press, 1975
Part 15 Human Behavior Analysis
Behavior Induction by Geometric Relation between Symbols of Multi-sensory Pattern
Naoki Kojo, Tetsunari Inamura, Masayuki Inaba
Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Abstract. In this paper, we propose a method for achieving behavior teaching, situation recognition, and behavior induction by abstracting time-series data from various sensors and acquiring symbols of behavior. We use the proto-symbol space proposed by Inamura et al. for symbolization. Our system is fast enough for real-time performance. As an implementation, we performed an experiment on a humanoid robot, HRP-2W, teaching it behaviors by directly moving its hands and letting it recognize situations and evoke behaviors.
Keywords. Behavior induction, Hidden Markov Models, Humanoids, Interaction, Learning
1. Introduction
The acquisition of new behavior through interaction with humans is important for intelligent robots which autonomously decide their behavior. As robots develop, the sensors available to them increase, such as microphones, cameras, force sensors, pressure sensors, range sensors, and so on. For such robots to learn behavior and recognize situations, it is essential to memorize the time-series correlations between various sensors. RNNs (Recurrent Neural Networks) are often used for teaching and recognition, symbolizing time-series data as a dynamic model. Ito et al. [1] applied the RNNPB (RNN with parametric bias) to behavior acquisition and induction. Though this method is applicable for relatively few sensor data, it has some difficulty when applied to large-scale data. Moreover, convergence of the learning computation is not always assured. Ogata et al. [2] utilize multi-sensory data to extract multi-modal dynamics of objects using the RNNPB. Their system is similar to ours in its usage of multi-sensory data. The advantages of our method are its capacity for multidimensional data and lower computational cost. A comparison with their system is described later. Another well-known approach is the dynamics-based approach [3][4]. In the method proposed by Okada et al. [3], they succeeded in the memorization, production, and recognition of motion. However, they do not mention how to teach robots motion interactively. On the other hand, Inamura et al. [5] are developing a learning method using CHMMs (Continuous Hidden Markov Models) to symbolize time-series data
patterns. This system can deal with large-scale data whose physical values differ. What is more, convergence of the learning computation is always assured. In this paper, we propose an interactive behavior learning and induction system using the proto-symbol space proposed by Inamura et al. The features of the proposed system are as follows: (a) symbolization of multi-sensory time-series data (combinations of encoders, vision, force sensors, and so on); (b) real-time situation recognition and behavior induction using the distance in the proto-symbol space; (c) an interactive teaching method for implementation. The proposed method is implemented on the humanoid robot HRP-2W, and realizes situation recognition and behavior induction based on the symbolization of multi-sensory data.
2. Behavior induction
2.1. Behavior induction with multi-sensory time-series data
Most behavior should be prescribed by the relation between various sensor data. For example, when learning how to pour a glass of water, robots have to learn not only joint angles but also the positional relation between items, the transition of weight obtained from force sensors, and so on. Induction is also discussed in the area of psychology. Knuf et al. [6] illustrate intentional induction, in which people tend to perform movements suited to achieving what they would like to see. Thus, we think that motion can be learned as behavior only in connection with environmental information obtained from various sensors. Unlike "teach and playback", situation recognition by comparing the current condition to a learned one requires the symbolization of behavior. With situation recognition based on symbolization, we can easily build a behavior induction system. Behavior induction often occurs in humans: when walking along a corridor and passing a room whose door is open, everyone will look at it. For humanoids, this corresponds to infectious behavior; for example, a robot should shake hands if someone takes hold of its hand. Putting behavior induction to practical use, we can also construct a system for remote-controlled robots in which they recognize the current situation and execute the remaining behavior under certain conditions. From these discussions, a system for behavior acquisition and induction must meet the following requirements: (a) management of multi-sensory data whose physical values differ markedly; (b) symbolization of time-series data patterns; (c) calculation of a physically meaningful distance between symbols; (d) production of behavior patterns from the symbol space. As the method which meets these conditions, we adopt the proto-symbol space using CHMMs. Up to now, Inamura et al.
have studied the creation of the proto-symbol space from time-series motion data, and motion induction using symbolization that combines motion data and vision. With these studies in mind, we realize situation recognition and behavior induction on a real robot by allocating symbols created from multi-sensory time-series data in the proto-symbol space.
877
N. Kojo et al. / Behavior Induction by Geometric Relation
2.2. Proto-symbol space using CHMM
A CHMM is used to create discrete symbols from time-series behavior patterns. Fig. 1 shows the relationship between a CHMM and a behavior pattern. In general, an HMM has five parameters: a finite set of states S = {s_1, ..., s_N}, a set of output symbols O = {o_1, ..., o_M}, the state-transition probability distribution A = {a_ij}, the output probability distribution B = {b_i}, and the initial-state distribution π = {π_i}. In the case of a CHMM, π is constant because the CHMM adopts a left-to-right model. As shown in Fig. 1, a given behavior pattern can be abstracted as one CHMM by relating the output vectors to the behavior data. Among the remaining four parameters, S and O do not directly affect the time-series behavior data; that is, the stochastic process is controlled only by the two parameters A and B. Consequently, a proto-symbol is defined as the dyad of these two parameters, λ = {A, B}.
[Figure 1 diagrams: (left) "CHMM and Behavior" — left-to-right states s_1 ... s_N with transition probabilities a_ij and output probabilities b_i(·) over observations o[1] ... o[T] along the motion time; (right) "Proto-symbol space" containing symbols such as pitch, bye, leftmove, pour]
Figure 1. CHMM and proto-symbol space
At this stage, it is still difficult to treat the relationships among motion patterns, i.e., among proto-symbols. In the proto-symbol space, the relation among proto-symbols is described as a distance in a reduced-dimensional space. We basically use a 3D space, for visualization and for computation cost. With the proto-symbol space, behavior recognition and regeneration of symbolized motion can be executed purely by geometric operations in 3D space. To create the proto-symbol space, the distance D_s between proto-symbols is defined using the Kullback-Leibler divergence, which quantitatively describes the discrepancy between probability distributions. Multidimensional scaling then creates the 3D space from D_s; see [5] for details. Fig. 1 shows an example of a proto-symbol space. Each symbol corresponds to an HMM. As proto-symbols are related to HMM parameters, reproduction of the symbolized data is also possible. Because the HMM is a probabilistic model, we run it many times and smooth the outputs to obtain reliable data.
2.3. Utilization of proto-symbol space
In our research, we symbolize multi-sensory data such as joint angles, raw force-sensor data, positions of colored objects obtained from cameras, and positions of both hands. In the learning phase, the start and end points are indicated by a human operator. Behavior can be symbolized from just one data sequence as well as from several; how many times we should teach is discussed later. After obtaining some symbols, we create the proto-symbol space from them. In our system, the learning phase is explicitly separated from the recognition and induction phase. During recognition, symbolization is performed successively, as shown on the right side of Fig. 2. The
resultant symbol represents the current situation. The time span for symbolization is constant and decided by the human operator beforehand; here we use 20 data points per symbolization. The current symbol is projected into the proto-symbol space, which was created before recognition; as a result, the current symbol moves around the 3D space. To put behavior induction into effect, we devised a method that selects the symbol to be induced according to its distance from the current symbol and the direction of the current symbol's transition. Once a symbol is selected, the induction phase starts: the robot recalls the behavior and executes the rest of the motion. As symbols include motion data, reproduction of the recalled motion is accomplished by a simple operation.
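As an illustrative sketch (not the authors' implementation), the embedding step of Section 2.2 — turning the pairwise distance matrix D_s into 3D proto-symbol coordinates — can be reproduced with classical multidimensional scaling. The distance matrix below is a toy example of our own.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Classical multidimensional scaling: embed a symmetric
    distance matrix D into `dim`-dimensional coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]       # keep the `dim` largest
    scale = np.sqrt(np.maximum(w[idx], 0.0))
    return V[:, idx] * scale              # n x dim coordinates

# toy pairwise distances among four proto-symbols
Ds = np.array([[0., 2., 4., 4.],
               [2., 0., 4., 4.],
               [4., 4., 0., 2.],
               [4., 4., 2., 0.]])
coords = classical_mds(Ds, dim=3)         # one 3D point per proto-symbol
```

For a distance matrix that admits an exact 3D Euclidean embedding, the pairwise distances of the returned coordinates reproduce D_s; otherwise the 3D embedding is the best rank-3 approximation.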
3. Teaching system on HRP-2W
We use the humanoid robot HRP-2W [7] as the experimental platform for behavior acquisition and induction. HRP-2W is made by KAWADA Industries, Inc.; its upper body is the same as HRP-2 and its lower body is an electric wheelchair. HRP-2W has the following sensors: a color stereo camera, two microphones, encoders for twenty joint angles, 6-axis force sensors on both hands, and a laser range finder. The proposed system can manage information from these sensors simultaneously. Fig. 2 shows the system structure for behavior acquisition and induction on HRP-2W.
Figure 2. System structure
For intuitive interaction, we adopted the method of moving the arms of HRP-2W directly while the servo gains of both arms are lowered. In our experiments, we use 53 dimensions for symbolization. Raw joint angles account for 20 dimensions; among the twenty joints, the two neck joints are not moved by the human. To obtain appropriate visual data, HRP-2W always looks at the mean point between its two arms. The positions and rotations (roll, pitch, and yaw) of both hands account for 12 dimensions, where position is described in the coordinate system of HRP-2W. Raw data from the 6-axis force sensors on both hands give 12 dimensions. The positions of three colored objects need 9 dimensions; we use the centers of the red, green, and blue
areas, whose color parameters are fixed, as target positions. If nothing is recognized, rated values are used. These dimensions add up to 53. The frequency of data acquisition was set to about 5 Hz for secure operation in the induction phase.
4. Results
4.1. Situation recognition
To verify the effectiveness of the proposed system, we carried out experiments on situation recognition. First, we taught HRP-2W thirteen different behaviors. Fig. 3 shows scenes of teaching "pitch-r", "goodbye", and "pour-pot". Each teaching was performed five times. The lengths of the sequence data all differ and are about 250 frames with 53 dimensions. On average, learning took only 179 seconds on a dual-processor machine with 3.06 GHz Xeon CPUs. In the system of [2], an RNNPB was trained on 100 sequences of 10 dimensions for 100,000 iterations, which took about 1 hour on a 2.8 GHz Pentium IV; our system is thus far faster.
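As a sketch, the 53-dimensional observation vector described in Section 3 can be assembled as follows; the function and argument names are ours, only the dimension counts come from the text.

```python
import numpy as np

def observation_vector(joint_angles, hand_poses, force_data, object_positions):
    """Concatenate one time step of multi-sensory data:
    20 joint angles + 12 hand-pose values (xyz + roll/pitch/yaw per hand)
    + 12 force values (6-axis sensor per hand)
    + 9 object positions (xyz of three colored objects) = 53 dims."""
    v = np.concatenate([joint_angles, hand_poses, force_data, object_positions])
    if v.shape != (53,):
        raise ValueError("expected a 53-dimensional observation")
    return v
```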
Teaching “pitch-r”
Teaching “goodbye”
Teaching “pour-pot”
Figure 3. Experiments of teaching various motions
For recognition, we had HRP-2W successively shake hands with its right hand ("shakehand-r"), pitch a red ball with its right hand ("pitch-r"), and pour from a blue pot into a green cup ("pour-pot"). Data were obtained three times. As an indicator of recognition accuracy, we defined the error rate ER = N_miss / N_total × 100 (%). N_total is the amount of data collected while we arbitrarily moved the robot; here N_total adds up to 2098. N_miss is incremented whenever the result of recognition differs from our intention. In this section, the behavior whose distance to the current state is smallest is deemed to be the recognized one. As the definition of ER involves our intention, a little ambiguity remains.
4.1.1. Changing the number of symbolized behaviors
We varied the number of symbolized behaviors that construct the proto-symbol space. Fig. 4 and Table 1 show the results of situation recognition with 3, 6, 9, and 13 symbols. The arrows below each plot show the behaviors we intended, i.e., the parts included in N_total. As shown in Table 1, the recognition time is fast enough for real-time performance. Although ER tends to rise as the number of symbols increases, it is still low with 13 symbols. If recognition becomes difficult with more symbols, we can cope by substituting a 4D or 5D space for the 3D space. Whether recognition succeeds depends not only on the number of symbols but also on whether similar behaviors are present; thus, it is difficult to state clearly how many behaviors this system can manage.
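The nearest-symbol rule above can be sketched as follows; the HMM-based projection of the 20-sample window into the space is omitted, and the symbol coordinates are invented for illustration.

```python
import numpy as np

def recognize(current_point, proto_symbols):
    """Return the name of the proto-symbol nearest to the point
    representing the current situation in the 3D proto-symbol space."""
    names = list(proto_symbols)
    pts = np.array([proto_symbols[n] for n in names])
    d = np.linalg.norm(pts - np.asarray(current_point), axis=1)
    return names[int(np.argmin(d))]

# toy proto-symbol coordinates (illustrative, not measured)
proto_symbols = {"pitch": (0.0, 0.0, 0.0),
                 "pour":  (3.0, 1.0, 0.0),
                 "bye":   (0.0, 4.0, 2.0)}
```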
[Figure 4 plots: recognition results over time for proto-symbol spaces built from 3, 6, 9, and 13 symbols; legends list the symbolized behaviors (pitch-r, pour-pot, shakehand-r, pitch-l, shakehand-l, uchiwa-l, trash-r, uchiwa-r, viva, goodbye, hello-l, hello-r, punches)]
Figure 4. Experiments of situation recognition with 3, 6, 9, and 13 symbols
Table 1. Frequency of error and recognition time

symbols, learning data, DOF    3,5,53   6,5,53   9,5,53   13,5,53   6,1,53   6,5,20
recognition time (msec/once)       47       85      132       184       85       72
frequency of error (%)            9.4      2.6      6.8      14.6      1.1      5.2
4.1.2. Situation recognition under various conditions
We performed two more experiments under different conditions. First, we changed the amount of learning data: only one data sequence per behavior was used for learning, with all other conditions the same as in the previous subsection. The left side of Fig. 5 shows the recognition result with 6 symbols, which is similar to the upper right of Fig. 4. As shown in Table 1, ER is not much different, and the average learning time of 35.2 seconds was shorter than before (178.8 seconds). We conclude that teaching once is sufficient, and even preferable. Second, to check the effect of using multi-sensory data, we performed experiments with motion data (20 DOF) extracted from the original learning data (53 DOF), again with other conditions unchanged. The average learning time was 111.5 seconds. The right side of Fig. 5 shows the recognition result with 6 symbols. In comparison with the upper right of Fig. 4, the overall speed of change is
[Figure 5 plots: recognition results with 6 symbols, (left) using 1 learning sequence with 53 dimensions and (right) using 5 learning sequences with 20 dimensions; legends list pitch-r, pour-pot, shakehand-r, pitch-l, shakehand-l, uchiwa-l]
Figure 5. Recognition results under various conditions
smooth and a time lag appears. From this result, we can say that recognition is more accurate when more multi-sensory data are used for learning, and that which of the 53 dimensions mainly drive recognition depends on the circumstances.
4.2. Behavior induction
For behavior induction, we use two thresholds: one on the distance, and one on the gradient calculated from the last few distance values in the proto-symbol space. As indicated in the previous section, the thresholds should be adjusted according to the number and characteristics of the symbols. One feature of our system is that nothing is selected if the current state does not satisfy the thresholds; with the thresholds we can adjust how often, or how easily, behavior induction occurs. Fig. 6 shows an experiment on behavior induction. The behaviors constructing the space were "pitch-r", "goodbye", and "pour-pot". We moved HRP-2W a little to trigger induction, and succeeded in having it recall the intended behaviors successively; recall took about 3 to 5 seconds. Owing to the space containing only three symbols and the adjustment of the thresholds, the error rate was about one fifth. However, the performance of the recalled behavior was not so good: although the motion was similar to the learned one, HRP-2W could not actually pitch the ball or pour water into the cup, because we currently use only motion data for reproducing behavior. To realize accurate manipulation of objects in the induction phase, we will have to use the positional relations between targets and hands, among other information. For now, the current system is better suited to behaviors whose motion flow itself is what matters.
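The two-threshold trigger described above can be sketched as follows; the threshold values, window length, and symbol coordinates are illustrative, not those used in the experiments.

```python
import numpy as np

def induction_trigger(recent_points, proto_symbols, d_th=0.5, g_th=-0.05):
    """Select a proto-symbol to induce, or return None.

    recent_points : (window, 3) array of the current symbol's recent
                    positions in the proto-symbol space, newest last.
    A symbol is selected only if the current point is within d_th of it
    AND the distance history is decreasing (slope below g_th), i.e. the
    current state is both close to the symbol and approaching it."""
    t = np.arange(len(recent_points))
    for name, p in proto_symbols.items():
        d = np.linalg.norm(recent_points - np.asarray(p), axis=1)
        slope = np.polyfit(t, d, 1)[0]       # gradient of the distance
        if d[-1] < d_th and slope < g_th:
            return name
    return None
```

Raising d_th or relaxing g_th makes induction fire more easily, which matches the role the thresholds play in the experiments.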
5. Conclusion
In this paper, after discussing the importance of behavior induction for humanoids, we presented a system for situation recognition and behavior induction on HRP-2W that applies a proto-symbol space to multi-sensory data. The three key characteristics of the proposed system are: (a) symbolization of behavior with multi-sensory
[Figure 6 snapshots: recalling "pitching", "bye", and "pour-pot"]
Figure 6. An experiment of recalling various motions inspired by multi-sensory clues
data, (b) real-time situation recognition and behavior induction on a real robot using distances in the proto-symbol space, and (c) an interactive teaching system based on directly moving HRP-2W's arms. Through experiments, we demonstrated the effectiveness of the proposed system on a real robot. We are currently working on how to weight the learned data: as the amount of sensor data increases, accurate situation recognition will be difficult without weighting, and we are considering an interactive approach to this problem.
References
[1] Masato Ito and Jun Tani, "Generalization in learning multiple temporal patterns using RNNPB", Int'l Conf. on Neural Information Processing (ICONIP'04), 2004.
[2] Tetsuya Ogata, Hayato Ohba, Jun Tani, Kazunori Komatani and Hiroshi G. Okuno, "Extracting Multi-Modal Dynamics of Objects using RNNPB", Proc. of the 2005 IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS'05), pp. 160–165, 2005.
[3] Masafumi Okada, Daisuke Nakamura and Yoshihiko Nakamura, "Hierarchical Design of Dynamics Based Information Processing System for Humanoid Motion Generation", Proc. of the 2nd Int'l Symposium on Adaptive Motion of Animals and Machines, Kyoto, March 4–8, SaP-III-1, 2003.
[4] Auke Jan Ijspeert, Jun Nakanishi and Stefan Schaal, "Movement Imitation with Nonlinear Dynamical Systems in Humanoid Robots", Proc. of the 2002 IEEE Int'l Conf. on Robotics and Automation, pp. 1398–1403, Washington, DC, May 2002.
[5] Tetsunari Inamura, Yoshihiko Nakamura and Iwaki Toshima, "Embodied Symbol Emergence based on Mimesis Theory", Int'l Journal of Robotics Research, Vol. 23, No. 4, pp. 363–377, 2004.
[6] Lothar Knuf, Gisa Aschersleben and Wolfgang Prinz, "An analysis of ideomotor action", Journal of Experimental Psychology: General, Vol. 130, No. 4, pp. 779–798, Dec. 2001.
[7] Tetsunari Inamura, Kei Okada, Masayuki Inaba and Hirochika Inoue, "HRP-2W: A Humanoid Platform for Research on Support Behavior in Daily-life Environments", Proc. of the 9th Int'l Conf. on Intelligent Autonomous Systems (IAS-9), 2006.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
883
Effects of Robotic Arm Orthosis Behaviors on User's Motion Structure: Qualitative Assessment Using Arm Trajectory Profiles
Yukio Horiguchi a,1, Satoshi Tsukamoto a, Hiroyuki Ono a, Tetsuo Sawaragi a, and Masahiro Sato b
a Department of Mechanical Engineering and Science, Graduate School of Engineering, Kyoto University
b Sato Giken Co., Ltd.
Abstract. This paper presents a motion pattern analysis that examines the effects of robotic power-assisting behaviors on the user's motion structures. The basis of the analysis is how much the user's original motion patterns are deformed, in terms of motion trajectory shapes, by the intervention of the robotic system. For the one-handed lift-weight motion the developed robotic orthosis is intended for, spontaneous human motions are confirmed to be characterized by bell curves of the angular velocity at the elbow joint, analogous to the arm trajectories in point-to-point movements. Using this frame of reference, comparative experiments demonstrate the deformed motion structures that fluctuate due to the purely passive, impedance-controlled orthosis behaviors. In addition, based on the hypothesis that power-assisting behaviors at the appropriate timing would preserve the user's spontaneous motion structure, another experiment reveals that attuned orthosis motions led the collaborative motion structure to the same shape as the user's original and unforced motion patterns. Keywords. Power assist orthosis, human-machine interaction, motion pattern analysis
1. Introduction
Nursing care is one of the application fields for robotic orthosis technologies, which aim at reducing human physical workloads by means of robotic systems. For instance, moving a bedridden patient and changing his or her posture are such hard physical tasks for caregivers that engineering support is expected. These interpersonal tasks basically require delicate and attentive operations from the caregivers. From this point of view, robotic orthosis systems must satisfy more than physical requirements such as work performance and safety; we must also consider how the advantages of both human skills and mechanized control can be effectively combined into their joint activities. Since orthoses work in contact with the users' bodies, they should be realized 1 Correspondence to: Yukio Horiguchi, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan. Tel.: +81 75 753 5044; Fax: +81 75 753 5044; E-mail: [email protected].
884
Y. Horiguchi et al. / Effects of Robotic Arm Orthosis Behaviors on User's Motion Structure
[Figure labels: Shoulder Joint (Joint 1), Upper Arm, Elbow Joint (Joint 2), Lower Arm, Loading Weight; coordinate axes X, Y; joint angles θ1, θ2]
Figure 1. Power assist orthosis for flexion and extension of the elbow joint
Figure 2. Upper limb model for simulating lift-weight motion dynamics
in accordance with the criterion of not changing the user's original or natural motion structures as much as possible; their assisting behaviors are to be accommodated within the joint activities with human users. In order to sort out the factors to be considered in designing such human-machine joint activities, this paper analyzes the effects of the power-assisting behaviors on the user's motion structures. The basic idea of the analysis rests on the isomorphism [1] between work behaviors performed by a human worker alone and by human-machine joint workers (i.e., a human user and the robotic orthosis together). A prototype robotic orthosis, intended for assisting flexion and extension of the elbow joint during one-handed lift-weight motions, has been developed as a testbed for discussion. Its behavior is regulated by a standard impedance control scheme. At the same time, spontaneous lift-weight motions are qualitatively assessed using the trajectories of the angular velocity at the elbow joint. This frame of reference is used to discuss how much, and why, the user's original motion patterns are deformed by the intervention of the robotic system. In addition, the possibility of attuning the orthosis behavior is briefly examined, with the aim of bringing the collaborative motion structure to the same shape as the user's original and unforced motion patterns.
2. Development of a Power Assist Orthosis for Flexion of the Elbow Joint
Figure 1 shows a picture of the power assist orthosis the authors have developed. The drive train of the orthosis is composed of one HarmonicDrive gear and two timing-belt systems, all of which are placed at the back of the upper arm and together transfer the rotation of the AC servo motor to the elbow of the orthosis with a high reduction ratio. The maximum torque and rotational speed the orthosis can produce around the elbow joint are 36 Nm and 18 rpm, respectively. These capabilities were specified, after a preliminary motion analysis experiment, so that the user can easily lift a weight of about 10 kg with one hand. The analysis exploited the rigid-link model illustrated in Figure 2 as a simulation model to calculate the loading torque at the elbow joint during a series of lift-weight motions. The user's operations, or intentions, are captured through FlexiForce load sensors, which measure relative changes in force on the contact surfaces between the user's body and the orthosis.
3. Evaluating Power Assisting Behaviors based on Morphological Similarity of Motion Trajectories
3.1. Hand Trajectory Planning Criterion for Point-to-Point Movements and Schema of Lift-Weight Motion
Evaluating the power-assisting behaviors of the robotic orthosis requires defining what characterizes natural lift-weight motions. For this purpose, we focus on trajectory planning criteria for point-to-point movements, i.e., reaching motions in which a human subject moves his or her hand from an initial to a final position. When those movements are performed naturally, the hand trajectory is straight and the profile of its tangential velocity forms a "bell curve" that is smooth and symmetrical about a single maximum [2,3]. While various models have been proposed to explain those trajectory shapes, the minimum commanded torque change model has been advocated as providing the most accurate predictions of hand movements among them [4]. This model accounts for point-to-point movements as the minimization of the total change of the torques commanded to all joints during the motion. Its trajectory planning criterion is expressed by the evaluation function of Eq. (1):

C_T = \frac{1}{2} \int_0^{t_f} \sum_{i=1}^{n} \left( \frac{d\tau_i}{dt} \right)^2 dt    (1)
where τ_i represents the estimated loading torque at the i-th joint, derived from the muscular elasticity as well as the link dynamics, and t_f represents the finish time of the reaching motion. In this work, assuming that lift-weight motions can be analyzed as a class of point-to-point motions, the trajectories computed from the minimum commanded torque change model were compared with the actual motion trajectories. For the two-dimensional arm model with two links and two joints depicted in Figure 2, the torques at the respective joints are given by Eqs. (2) and (3):

\tau_1 = [M_1 S_1^2 + I_1 + M_2 (L_1^2 + S_2^2 + 2 L_1 S_2 \cos\theta_2) + I_2] \ddot\theta_1
       + [M_2 (S_2^2 + L_1 S_2 \cos\theta_2) + I_2] \ddot\theta_2
       - M_2 L_1 S_2 \sin\theta_2 (2\dot\theta_1 \dot\theta_2 + \dot\theta_2^2)
       + M_1 S_1 g \sin\theta_1 + M_2 g [L_1 \sin\theta_1 + S_2 \sin(\theta_1 + \theta_2)]
       + [m (L_1^2 + L_2^2 + 2 L_1 L_2 \cos\theta_2)] \ddot\theta_1 + [m (L_2^2 + L_1 L_2 \cos\theta_2)] \ddot\theta_2
       - m L_1 L_2 \sin\theta_2 (2\dot\theta_1 \dot\theta_2 + \dot\theta_2^2) + m g [L_1 \sin\theta_1 + L_2 \sin(\theta_1 + \theta_2)]
       + B_{11} \dot\theta_1 + B_{12} \dot\theta_2    (2)

\tau_2 = [M_2 (S_2^2 + L_1 S_2 \cos\theta_2) + I_2] \ddot\theta_1 + (M_2 S_2^2 + I_2) \ddot\theta_2
       + M_2 L_1 S_2 \sin\theta_2 \, \dot\theta_1^2 + M_2 g S_2 \sin(\theta_1 + \theta_2)
       + [m (L_2^2 + L_1 L_2 \cos\theta_2)] \ddot\theta_1 + m L_2^2 \ddot\theta_2
       + m L_1 L_2 \sin\theta_2 \, \dot\theta_1^2 + m g L_2 \sin(\theta_1 + \theta_2)
       + B_{22} \dot\theta_2 + B_{21} \dot\theta_1    (3)
where the subscript 1 or 2 of each variable indicates whether it belongs to the upper or lower arm of the link model. I_i, M_i, and L_i represent each arm's moment of inertia, mass, and length, respectively, while S_i is the distance from the centroid of the i-th arm to the i-th joint. B_{ij} denotes the viscosity caused
Figure 3. Snapshot of the one-handed lift-weight motion with the robotic orthosis
by the muscle connecting the i-th arm with the j-th. The variables m and g denote the mass of the lifted weight and the acceleration of gravity, respectively. The penalty method was employed to compute the simulated arm trajectories in terms of the joint angles θ_1 and θ_2, with the evaluation function given by Eq. (4). This function consists of terms for the boundary conditions of the final arm posture together with the term corresponding to the criterion of Eq. (1). Trajectories were optimized stepwise, starting from a large default value of λ and decreasing it, so that the boundary conditions are enforced progressively while the commanded torque changes are minimized.

L = (\theta_f - \theta_f^d)^2 + (\dot\theta_f - 0)^2 + \frac{\lambda}{2} \int_0^{t_f} \sum_{i=1}^{n} \left( \frac{d\tau_i}{dt} \right)^2 dt    (4)
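To illustrate the flavor of this optimization, the following drastically simplified sketch performs gradient descent on a discretized single-joint trajectory with placeholder dynamics τ = I·θ̈ + c·sin θ — not the paper's two-link model or its actual penalty schedule; all parameter values are ours, and the boundary angles are simply clamped rather than penalized.

```python
import numpy as np

def optimize_trajectory(theta0=0.0, thetaf=1.0, T=1.0, n=20,
                        lam=1e-3, iters=200, lr=1e-4):
    """Gradient descent minimizing the torque-change term of Eq. (4)
    for one joint; endpoints are held fixed for brevity."""
    dt = T / n
    I, c = 0.1, 1.0                           # placeholder inertia / gravity term
    th = np.linspace(theta0, thetaf, n + 1)   # straight-line initial guess

    def cost(th):
        ddth = np.diff(th, 2) / dt**2                 # angular acceleration
        tau = I * ddth + c * np.sin(th[1:-1])         # simplified torque
        dtau = np.diff(tau) / dt                      # torque change
        return lam * 0.5 * np.sum(dtau**2) * dt

    eps = 1e-6
    for _ in range(iters):                    # finite-difference gradient descent
        base = cost(th)
        grad = np.zeros_like(th)
        for i in range(1, n):                 # interior points only
            th[i] += eps
            grad[i] = (cost(th) - base) / eps
            th[i] -= eps
        th -= lr * grad
    return th
```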
In this calculation, the values of the viscosities B_{ij} and the moments I_i were determined based on the results of the studies by Yamanaka et al. [4] and Gomi et al. [5]. The arm lengths were tailored to the experimental subject's body: L_1 and L_2 were set to 0.285 m and 0.250 m, respectively. Examining the actual motion trajectories recorded by a motion capture device, the natural lift-weight motions indeed showed "bell curves" in the angular velocity at the elbow joint, as shown in profiles (1)-(a) and (1)-(b) in Figure 4(a). This tendency is analogous to the case of point-to-point movements, where the bell shape characterizes naturally conducted reaching motions. The motion trajectory calculated under the conditions without consideration of loading weights and within the natural motion speed provided instances for comparison, suggesting a dimension for evaluating whether a motion is naturally conducted by a human performer or not.
3.2. Evaluating Power Assisting Behaviors Controlled by Impedance Control
The effects of the power-assisting behaviors of the robotic orthosis were examined through comparative experiments observing human motions with and without power assists. The motion of the orthosis was controlled by a position-based impedance control scheme, after the impedance parameters had been adjusted by trial and error according to the users' impressions. One load sensor was attached to the inside of the band fastening the user's wrist to the orthosis, thereby measuring the contact force he was applying to
the orthosis when trying to bend his elbow. The detected force was translated into a target elbow angle so that the orthosis could simulate the desired impedance properties during movements. In this way, the user's motions were assisted in that the robotic orthosis supplied part of the power necessary to lift the weight. Figure 3 presents a snapshot of the one-handed lift-weight motion with the robotic orthosis.

Table 1. Experimental conditions

                           Loading Weight
Motion Speed               (a) 0 kg   (b) 3 kg   (c) 7 kg
(1) Natural Motion Speed   (1)-(a)    (1)-(b)    (1)-(c)
(2) Slow Motion Speed      (2)-(a)    (2)-(b)    —
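The position-based impedance scheme can be sketched as follows: the measured contact force drives a virtual mass-damper-spring system whose state supplies the target elbow angle. The virtual parameters M, B, K below are illustrative; the paper's values were tuned by trial and error.

```python
def impedance_step(theta, dtheta, f_contact, dt, M=0.5, B=2.0, K=8.0):
    """One control step of position-based impedance (admittance) control:
    integrate  M*theta_dd + B*theta_d + K*theta = f_contact
    and return the updated target elbow angle and velocity."""
    ddtheta = (f_contact - B * dtheta - K * theta) / M
    dtheta = dtheta + ddtheta * dt       # semi-implicit Euler integration
    theta = theta + dtheta * dt
    return theta, dtheta
```

A constant contact force f settles the target angle at f/K, so a steady push yields a proportional elbow flexion; it is this purely reactive coupling that the next section identifies as a source of fluctuating joint motion.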
All the experimental conditions are summarized in Table 1; they are pairings of the loading-weight condition ((a) 0 kg, (b) 3 kg, and (c) 7 kg) and the motion-speed condition ((1) natural motion speed and (2) deliberately slow motion speed). Condition (2)-(c) was excluded from the experiments because it was too difficult to perform. Figure 4 compares the motion trajectories in terms of the angular velocity at the elbow joint. In those plots, the horizontal axis represents normalized time, such that one unit of time elapses from start to finish of a lift-weight motion, while the vertical axis shows the angular velocity at the elbow joint. Figure 4(a) confirms the two bell curves of (1)-(a) and (1)-(b), which are both trajectories under easy and unforced task conditions. As mentioned in Section 3.1, a bell shape in the tangential velocity characterizes naturally conducted point-to-point movements. The result suggests that natural lift-weight motions analogously manifest as bell-shaped curves of the angular velocity at the elbow joint. On the contrary, Figure 4(b) shows many inflection points in the velocity curves; that is, the angular velocity fluctuates during a motion with the robotic orthosis. The longer the motion takes, the more inflection points are observed in the curves. As with the unnatural or forced task conditions in Figure 4(a), the robotic power-assisting behaviors deform the human subjects' original motion patterns. The reason why their joint activity results in those fluctuating behaviors can be considered as follows: 1. After the user has started to bend his elbow, changes in the contact force detected by the load sensor drive the orthosis's lift-up motion. This robotic motion provides assisting power, thereby accelerating the elbow flexion; the angular velocity curve is convex downward in this phase. 2.
Whether or not the user has noticed the assisting force from the orthosis, the initiated robotic behavior acts to relax the user's operational force. The measured contact force thereby decreases, and the assisting power abates accordingly; the angular velocity curve becomes convex upward in this phase. 3. The abatement of the power-assisting behavior requires the user to take the lead again to continue the bending motion. He therefore has to increase the contact force, returning to phase 1. As long as users exhibit such trembling and deformed behavioral patterns, it can hardly be said that they perform the task without being conscious of the robotic assistant. The authors consider one critical factor in these disharmonious collaborations to be the dependency of the robotic behaviors on feedback control alone. In the current
Figure 4. Comparison of the angular velocity at the elbow joint among the different experimental settings: (a) without and (b) with power assists
implementation, all the motion trajectories are planned only by the human user; the orthosis merely follows the user's motion.
4. Consideration of Timing for Assisting Behaviors toward Coordinated Motions
Consider interpersonal collaborations in which two persons try to move something heavy together: it is a common technique to say a few words to each other so as to attune the timing at which they initiate their respective actions. If their operations are not synchronized, the joint activity becomes awkward. Regarding the lift-weight task analyzed so far as a joint activity by two different agents, i.e., a human
Figure 5. Yet another experiment in which the user can arbitrarily control the motion of the robotic orthosis by his left hand
Figure 6. Profiles of the angular velocity at the elbow joint in the second experiment
agent and a robotic agent, their collaboration differs from the former example in that one of the two collaborators is always passive: the robotic orthosis has no means of acquiring predictive cues for, or of setting the pace toward, a coordinated collaborative performance. Based on this consideration, the authors set up the hypothesis that power-assisting behaviors delivered at the appropriate timing would not deform the user's spontaneous motion structure. As a preliminary analysis, another simple experiment was conducted. Figure 5 shows the experimental setting, in which the user can arbitrarily control the motion of the robotic orthosis with his other, unencumbered hand. The human operator, who wears the orthosis on his right arm, manipulates the elbow flexion angle by pinching the isolated load sensor with his free left hand. In this experiment, a load weight of 3 kg was used. The experimental subject was instructed to operate the orthosis so as to make its assisting behaviors suit his preferences. Figure 6 presents three profiles of the angular velocity at the elbow joint in this experiment, corresponding to the motion trajectories of the first, second, and third trials, respectively. The result shows that all the trajectories form bell curves. In addition, as
trials accumulate, the curve becomes smoother and more symmetric about its single maximum. This "growth" implies that the user's adjustment of the orthosis operation led their collaborative motion structure to the same shape as the user's original and unforced motion patterns. It suggests that, to realize such natural joint activities, the orthosis should be capable of sensing predictive information about when to act, and should possess several behavioral schemata that are actualized by such triggers.
5. Conclusion
This paper presented a motion pattern analysis examining the effects of robotic power-assisting behaviors on the user's motion structures. The basis of the analysis is how much the user's original motion patterns are deformed, in terms of motion trajectory shapes, by the intervention of the robotic system. For the one-handed lift-weight motion our robotic orthosis is intended for, spontaneous human motions could be characterized by bell curves of the angular velocity at the elbow joint, analogous to the arm trajectories in point-to-point movements. Using this frame of reference, comparative experiments demonstrated the deformed motion structures that fluctuate due to the purely passive, impedance-controlled orthosis behaviors. In addition, based on the hypothesis that power-assisting behaviors at the appropriate timing would preserve the user's spontaneous motion structure, another experiment was conducted and confirmed that attuned orthosis motions led the collaborative motion structure to the same shape as the user's original and unforced motion patterns.
Acknowledgements

This research was supported in part by the Center of Excellence for Research and Education on Complex Functional Mechanical Systems (COE program of the Ministry of Education, Culture, Sports, Science and Technology, Japan). The authors are grateful for this support.
References
[1] J.H. Holland et al., Induction: Processes of Inference, Learning, and Discovery, The MIT Press, 1986.
[2] M. Kawato et al., Undou, Iwanami Kouza: Ninchi Kagaku 4, Iwanami Shoten, Publishers, 1994 (in Japanese).
[3] E. Nakano, H. Imamizu, R. Osu, Y. Uno, H. Gomi, T. Yoshioka, and M. Kawato, Quantitative Examination of Trajectory Planning Criteria Based on Many Data of Trajectories, Technical Report of The Institute of Electronics, Information and Communication Engineers, NC96-72 (1996), 111–118 (in Japanese).
[4] K. Yamanaka, Y. Wada, and M. Kawato, Quantitative Examinations for Human Arm Trajectory Planning in Three-Dimensional Space, Transactions of The Institute of Electronics, Information and Communication Engineers, J85-D-II(3) (2002), 493–503 (in Japanese).
[5] H. Gomi and R. Osu, Human Arm Stiffness and Viscosity in Interaction with Environments on a Horizontal Plane, Technical Report ISRL-96-3, NTT Basic Research Laboratories, 1996.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A Human Behavior Discrimination Method based on Motion Trajectory Measurement for Indoor Guiding Services
Hajime ASAMA a,b, Atsushi MORIMOTO c, Kuniaki KAWABATA b,a, Yasushi HADA b,a
a RACE (Research into Artifacts, Center for Engineering), The University of Tokyo, Japan
b RIKEN (The Institute of Physical and Chemical Research), Japan
c Computer Associates International, Inc., Japan
Abstract. As research on service engineering, a method to discriminate whether a person entering a space is a visitor or a staff was developed for guiding services in indoor environments. The motion trajectory of the target is measured by background image subtraction and coordinate transformation, and the positional and orientational change per unit time (v and θ) along the trajectory is calculated. The motion index is defined as v cos θ, whose value effectively characterizes the target for discrimination. The experimental setup was implemented using a sensor node with an artificial retina camera. The positional error of measurement was 350[mm] at a position 6[m] distant from the camera. The distribution of the motion index mean was obtained and approximated by a Gaussian distribution, and the probabilities of visitors and staffs were calculated by a statistical approach. As a result of the experiment, it is shown that 90% of the targets were successfully recognized.
Keywords. Human behavior discrimination, motion trajectory, motion index, image processing, statistical approach, artificial retina camera
1. Introduction

Toward the post mass-production paradigm, service engineering addresses the systematization of service synthesis to generate added value in artifact design and utilization[1]. We have been discussing the functions required of artificial systems and the methodology to realize services, regarding artificial systems as media that transmit services from service producers to users. In order to deliver suitable services, it is necessary to recognize what kind of services the users demand and what states the users are in. In this research, assuming that such information must appear in the behaviors of the users, recognition and discrimination of users for guiding services is discussed based on the measurement of users’ behavior. Human motion detection and recognition have been the subject of concentrated research [2]-[7]. At CMU, VSAM (Video Surveillance and Monitoring) has been developed as a system in which the motion of humans or vehicles is detected and recorded by cameras deployed in an outdoor environment[8][9].
H. Asama et al. / A Human Behavior Discrimination Method
Measurement of human behavior by sensors equipped in the environment has been discussed in the research on the Robotic Room[10], the Intelligent Space[11], etc. Recognition of human behavioral patterns has also been discussed based on the position and posture of the human body[12]. A system for detecting dubious persons was also developed[13]. However, to recognize the users to whom suitable services should be delivered, it is still necessary to discriminate the users or their states depending on the services, based on a model of users’ behaviors representing the relation between users’ states and users’ behaviors. In this paper, aiming at the development of a guiding and navigation service system for visitors in indoor environments, discrimination of persons walking into the entrance of an indoor space is investigated to recognize whether they are visitors or not, because it is desirable to deliver the guidance service only to the visitors and to avoid being a nuisance to the staffs, who already know the environment very well. This paper proposes a motion index utilizing the velocity and orientational change of the motion trajectory of a target person walking into the entrance, which is a characteristic variable to discriminate the persons effectively, and presents a method for the trajectory measurement by image processing and a statistical method for the discrimination based on the motion index mean. A compact sensor node device equipped with an artificial retina camera is used for the measurement, looking toward the ubiquitous computing environment.
2. Method for Human Motion Trajectory Measurement and Discrimination

The motion trajectory of a person walking into the entrance is very important for discriminating whether he/she is a visitor or not, because visitors may blunder around in the environment to find the way to their destination, while staffs, who know the map of the environment very well, must go straight to their destination at a comparatively fast speed. Some methods have been reported to measure and discriminate the trajectories of human motion by cameras set up in the environment[12][14], where statistical methods to discriminate the trajectories are employed. We pay closer attention to the characteristic properties of the trajectories, namely velocity and orientational change.

Our method for trajectory measurement uses background subtraction to extract the moving human from the image taken by a camera. Since the bottom part of the extracted image of the human is the foot, which must be in contact with the floor, the position of the foot can be calculated by coordinate transformation from the coordinates of the bottom part of the subtracted image. By measuring the foot position continuously, the trajectory of human motion can be obtained as the time series of the foot position in the world frame.

In the analysis of human behavior from the motion trajectory obtained by image processing, it should be noted that the duration of the trajectory is not constant and the starting point and end point of the motion are difficult to determine. To cope with this problem, we focused on the motion features in each unit time. The velocity of each trajectory segment and the angle between two connecting trajectory segments, namely the change of orientation, were employed as characteristic parameters which are effective to discriminate a person walking into the entrance and to recognize whether he/she is a visitor or a staff (not a visitor). In this paper, we call a person a visitor who visits the place for the first time and to
whom the guidance information should be delivered, and a staff one who knows the place very well and to whom the guidance information should not be delivered. Namely, since staffs have knowledge of the map and the path to their destination, they must walk straight to the destination, which means the velocity is expected to be large and the orientational change to be small. On the contrary, since visitors do not have enough knowledge of these, they must blunder around at the entrance, so the velocity is expected to be small and the orientational change to be large. If the position of the person at time ti is obtained as (xi, yi), the velocity of the person and his/her orientational change at time ti are represented respectively as follows:
vi = (xi+1, yi+1)^T − (xi, yi)^T    (1)

θi = cos⁻¹( (vi−1 · vi) / (|vi−1| |vi|) )    (2)
Taking into account that the motion difference between a staff and a visitor must appear in the velocity and orientational change of their trajectories, we define the motion index as v cos θ. The motion index mean X during a time period of n steps can be calculated as follows:

X = (1/n) Σ_{i=1}^{n} vi cos θi    (3)
where n denotes the number of time steps considered in the motion trajectory. Figure 1 illustrates the motion index calculated from a partial trajectory of a walking person. As mentioned above, since the velocity of a visitor must be small and his/her orientational change must be large, the motion index mean value is expected to be small. On the contrary, the motion index mean value of a staff is expected to be large. Namely, the distribution (probability density function) of the motion index mean of staffs is expected to be distinguishable from that of visitors, as shown in figure 2.
Figure 1. Motion index calculated from the partial trajectory of a walking person
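For illustration, Eqs. (1)–(3) reduce to a few lines of code. The sketch below (NumPy, with a hypothetical function name) assumes the trajectory is sampled at the fixed 1[sec] interval used in the experiments, so that segment displacements in mm are numerically equal to velocities in mm/s.

```python
import numpy as np

def motion_index_mean(positions):
    """Mean motion index X over a measured trajectory.

    positions: (N, 2) array of (x, y) foot positions in mm, sampled
    at a fixed 1-second interval so displacements equal velocities.
    """
    p = np.asarray(positions, dtype=float)
    v = np.diff(p, axis=0)             # segment vectors v_i, Eq. (1)
    speed = np.linalg.norm(v, axis=1)  # |v_i| [mm/s]
    # cos(theta_i) between consecutive segments v_{i-1} and v_i, Eq. (2)
    dots = np.einsum('ij,ij->i', v[:-1], v[1:])
    cos_theta = dots / (speed[:-1] * speed[1:])
    # Eq. (3): mean of v_i * cos(theta_i) over the interior steps
    return float(np.mean(speed[1:] * cos_theta))

# A straight, fast walk yields a large index; a meandering walk, a small one.
straight = [[0, 0], [1000, 0], [2000, 0], [3000, 0]]
zigzag = [[0, 0], [1000, 0], [1000, 1000], [0, 1000]]
```

Note that a stationary sample (zero-length segment) would cause a division by zero here; a real implementation would have to guard against that case.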
Figure 2. Probability density functions fv(X) and fs(X)
Supposing the distribution (probability density function) of X to be a Gaussian distribution, the function can be represented as follows:

fk(X) = 1 / (√(2π) σk) · exp{ −(X − μk)² / (2σk²) }    (4)

k = v (for visitor) or s (for staff), where μk and σk represent the mean value and the standard deviation of the distribution. If the probabilities of visitors p(v) and staffs p(s) are obtained as a priori probabilities, the a posteriori probabilities p(v|X=xi) and p(s|X=xi), i.e., the probabilities that a person entering the scene is a visitor or a staff, can be obtained by Bayes’ theorem as follows:

p(k|X = xi) = p(k) fk(X = xi) / ( p(v) fv(X = xi) + p(s) fs(X = xi) )    (5)
k = v (for visitor) or s (for staff). By calculating and comparing the probabilities by (5), it is possible to discriminate the person as follows:

If p(v|X = xi) ≥ p(s|X = xi), then the entering person is a visitor.
If p(v|X = xi) < p(s|X = xi), then the entering person is a staff.
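The decision rule of Eqs. (4) and (5) can be sketched as follows; the parameter values in the usage example are illustrative well-separated distributions, not the experimentally fitted ones reported later in the paper.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Eq. (4): Gaussian density with mean mu and standard deviation sigma."""
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

def discriminate(x, p_v, p_s, mu_v, sg_v, mu_s, sg_s):
    """Eq. (5): compare the posteriors p(v|X=x) and p(s|X=x) by Bayes' theorem."""
    num_v = p_v * gaussian_pdf(x, mu_v, sg_v)
    num_s = p_s * gaussian_pdf(x, mu_s, sg_s)
    post_v = num_v / (num_v + num_s)  # denominator of Eq. (5) normalizes
    return ('visitor' if post_v >= 0.5 else 'staff'), post_v

# With well-separated illustrative distributions, a small X means a visitor.
label, p = discriminate(0.0, 0.5, 0.5, 0.0, 1.0, 10.0, 1.0)
```

Since the denominator of Eq. (5) is common to both posteriors, only the numerators actually decide the comparison; the posterior is computed explicitly here for inspection.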
3. Preparatory Experiments for Human Motion Trajectory Measurement

We developed an experimental setup and evaluated the error of position measurement of a person at the entrance by experiments. The setup of the experimental system is shown in figure 3.

Figure 3. Setup of the experimental system
In this system, a compact sensor node device called μT-Engine (Mitsubishi Electric Corporation) equipped with an artificial retina camera was employed, and embedded at a location from which the landing area on the fifth floor of a building on the Komaba Campus of the University of Tokyo can be monitored; this is the entrance of our research center (RACE) for persons arriving from an elevator or the stairs. An example of the image captured by the camera is shown in figure 4.
Figure 4. Image captured by the artificial retina camera of the sensor node device
The size of the image taken by the camera is 160[pixel] by 144[pixel]. We set the camera at the location shown in figure 5; the resolution of the measurement is about 300[mm] in the vertical direction and about 70[mm] in the horizontal direction at the farthest position in the scene of the image (about 6.0[m] from the location of the camera on the floor).
Figure 5. Location of the artificial retina camera
Next, the error of the position measurement was evaluated. Figure 6 shows the experimental results for the errors, which were obtained by comparing the position measured by our system with the position measured by a motion capture device (SV-Tracker, Ohyoh Keisoku Kenkyusho; accuracy: 50[mm]). As shown in the figure, the error of position measurement at a distance of 6[m] from the location of the camera was about 350[mm]. This result is considered acceptable for human behavior discrimination, assuming the system displays the guidance information to a walking person at a location near the person and makes the person notice the guidance information, given that the human walking velocity in an indoor environment, obtained in our experiments, is about 0.86[m/s]. The preparatory experiments to measure the trajectories of the visitors and staffs, and the calculation of the motion index mean, were carried out with the experimental system. The numbers of samples for the preparatory experiments are 15 for visitors and 110 for staffs. We observed the persons walking into the entrance from the elevator or stairs, and judged manually whether they were visitors (strangers who visited the research center for the first time) or staffs (professors, students, secretaries, and clerks of the center).
Figure 6. Error of position measurement
(a) Example 1
(b) Example 2
Figure 7. Examples of measured trajectories of visitors
(a) Example 1
(b) Example 2
Figure 8. Examples of measured trajectories of staffs
Typical measured motion trajectories of persons are shown in figure 7 and figure 8. The dots in the trajectories denote the locations of the persons during motion at every 1[sec] sampling time. Figure 9 and figure 10 show the distributions of the motion index mean of the visitors and staffs. As shown in the figures, the distributions of X can be approximated by a Gaussian distribution. The mean values μk and standard deviations σk (k = v (for visitor) or s (for staff)) of the distribution of X under this approximation were obtained as follows:
μv = 2.37×10² [mm/s], σv = 2.70×10² [mm/s]
μs = 1.14×10³ [mm/s], σs = 6.56×10² [mm/s]

Figure 9. Distribution of X for visitors
Figure 10. Distribution of X for staffs
898
H. Asama et al. / A Human Behavior Discrimination Method
We evaluated the appropriateness of approximating the distribution of X by a Gaussian distribution. The kurtosis and the skewness were −0.81 and −0.13 respectively in the case of visitors, and 0.41 and 0.29 respectively in the case of staffs. Although this means the distribution of visitors is flatter than a Gaussian and slightly left-skewed, while the distribution of staffs is more peaked and right-skewed, it is concluded that these distributions can roughly be approximated by Gaussian distributions.
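This Gaussianity check can be reproduced with plain sample moments. The sketch below assumes the excess-kurtosis convention (0 for a perfect Gaussian), which is consistent with the negative value reported for visitors, though the paper does not state its convention.

```python
def skewness_kurtosis(samples):
    """Sample skewness and excess kurtosis from central moments.

    Both are 0 for a perfectly Gaussian sample: negative kurtosis
    indicates a flatter distribution, negative skewness a left tail.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis
    return skew, kurt
```

For example, the symmetric sample [-1, 0, 1] has zero skewness and negative excess kurtosis, i.e., it is flatter than a Gaussian.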
4. Verification Test for Human Behavior Discrimination

Since fv(X) and fs(X), the probability density functions of X for visitors and staffs, were obtained in the preparatory experiments as described above, verification tests of the proposed method for human behavior discrimination were carried out. In the experiments, the a priori probabilities of visitors and staffs were obtained as the ratios of the samples of persons who came into the entrance during the preparatory experiments, namely p(v)=15/125 and p(s)=110/125. The numbers of samples for the verification test are 10 for visitors and 45 for staffs, who came into the entrance on another day. The results are shown in Table 1. For the purpose of the guiding services, it is required to recognize the visitors correctly. The error ratio of the first kind, in which visitors were misrecognized as staffs, was 10%, and the error ratio of the second kind, in which staffs were misrecognized as visitors, was 4.4%, while the recognition rates of staffs and visitors for the original learning data in the preparatory experiments were 96.4% and 80.0% respectively. From these results, it is verified that the proposed method is effective for discriminating whether a person walking into the entrance is a visitor or a staff.

Table 1. Results of the verification test for human behavior discrimination
           Success   Failure
Visitors    90.0%     10.0%
Staffs      95.6%      4.4%
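The two error ratios follow directly from the verification sample counts; the misclassification counts below (1 of 10 visitors, 2 of 45 staffs) are inferred from the reported percentages and are thus assumptions.

```python
# Counts inferred from the reported error ratios (assumptions, not stated in the paper)
visitors_total, visitors_missed = 10, 1   # visitors misrecognized as staffs
staffs_total, staffs_missed = 45, 2       # staffs misrecognized as visitors

error_first_kind = visitors_missed / visitors_total    # error of the first kind
error_second_kind = staffs_missed / staffs_total       # error of the second kind
```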
While the error ratios should be as low as possible for a better service system, the error of the first kind is the most critical for reliable detection of the visitors to guide. Figure 11 shows an example in which a visitor was misrecognized as a staff. As shown in this figure, the reason for the misrecognition is that the value of the motion index mean increased due to large velocity values in some partial segments. To solve this problem, it should be effective to divide the motion trajectories into proper portions, calculate the motion index not over the whole trajectory but over each portion, and evaluate the change of the motion index. In this way, the misrecognition ratios could be improved. We also estimated the effect of noise in the trajectory measurement on the human behavior discrimination. Taking the example of figure 8 (a), in which the calculated motion index mean was 1.05×10³[mm/s], the reliability of the discrimination method was tested by adding white noise to the motion index mean value. Figure 12 shows the variation of the motion index mean value with white noise within limits of 200[mm], 300[mm], 400[mm], and 500[mm], and figure 13 shows the variation of p(v|X=xi)
with the white noise. As shown in the figures, even though the motion index mean value and the resulting probability p(v|X=xi) scatter depending on the white noise, the maximum change in the probability was only 0.25, observed in the case of the maximum noise limit of 500[mm], so it was verified that the noise does not affect the discrimination results. This shows the reliability and robustness of the proposed methods for motion trajectory measurement and human behavior discrimination.
Figure 11. Example of misrecognized motion trajectory
Figure 12. Variation of motion index mean with white noises
Figure 13. Variation of calculated probability with white noises
5. Conclusion

In this paper, taking as an example a guiding service for visitors in an indoor environment, methods were proposed to measure human motion trajectories using a sensor node device equipped with an artificial retina camera, and to discriminate persons walking into the entrance of an indoor environment so as to recognize whether they are visitors or not. As a result of the experiments, the effectiveness and robustness of the proposed method were verified.
This method is also considered applicable to security systems in which the detection of dubious persons is required. The method proposed in this paper is limited to the case where a single person is included in the scene. In the future, it is planned to improve this method to cope with multiple persons in the scene, by labelling multiple targets in image processing and maintaining probabilistic correspondence even when occlusion occurs. The integration of information obtained by multiple cameras to cover a wider area is also future work.
Acknowledgment

The authors thank Mr. Kenji Shirai (Mitsubishi Electric Corporation, Japan), Dr. Toru Shimizu (Renesas Technology Corporation, Japan), and the other staff members of these corporations who supported us in the implementation of the experimental setup using the μT-Engine.
References
[1] T. Tomiyama, Service Engineering to Intensify Service Contents in Product Life Cycles, Proc. of 2nd Int. Symp. on Environmentally Conscious Design and Inverse Manufacturing (EcoDesign 2001), (2001), 613–618.
[2] J. Yamamoto, J. Ohya, and K. Ishii, Recognizing Human Action in Time-sequential Images Using Hidden Markov Model, Proc. IEEE Conf. Computer Vision and Pattern Recognition, (1992), 379–385.
[3] A. F. Bobick and J. W. Davis, The Recognition of Human Movement Using Temporal Templates, IEEE Trans. Pattern Anal. & Mach. Intell., Vol. 23, No. 3, (2001), 257–267.
[4] I. Haritaoglu, D. Harwood, and L. S. Davis, W4: Who? When? Where? What? A Real Time System for Detecting and Tracking People, Proc. FGR ’98, (1998), 222–227.
[5] S. Ju, M. Black, and Y. Yacoob, Cardboard People: A Parameterized Model of Articulated Image Motion, Int. Conf. on Face and Gesture Analysis, 1996.
[6] M. Oren, et al., Pedestrian Detection Using Wavelet Templates, Proc. IEEE CVPR ’97, (1997), 193–199.
[7] C. Wren, et al., Pfinder: Real-time Tracking of the Human Body, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 19, No. 7, (1997), 780–785.
[8] T. Kanade, et al., Advances in Cooperative Multi-sensor Video Surveillance, Proc. of DARPA Image Understanding Workshop, Vol. 1, (1998), 3–24.
[9] H. Fujiyoshi, A. Lipton, and T. Kanade, Real-Time Human Motion Analysis by Image Skeletonization, IEICE Trans. Inf. & Syst., Vol. E87-D, No. 1, (2004), 113–120.
[10] T. Sato, Y. Nishida, H. Mizoguchi, Robotic Room: Symbiosis with Human through Behavior Media, Robotics and Autonomous Systems, No. 18, (1996), 185–194.
[11] P. Korondi, P. Szemes, and H. Hashimoto, Ubiquitous Sensory Intelligence, Proc. of International Conf. in Memoriam John von Neumann, (2003), 73–86.
[12] S. Aoki, et al., Learning and Recognizing Behavioral Patterns Using Position and Posture of Human Body and Its Application to Detection of Irregular States, IEICE, J87-D-II, 5, (2004), 1083–1093.
[13] http://www.meltec.co.jp/press/pdf/050906.pdf
[14] P. T. Szemes, T. Sasaki, H. Hashimoto, Mobile Agent in the Intelligent Space Which Can Learn Human Walking Behavior, Proc. 2005 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM 2005), (2005), 1227–1232.
Part 16 Mutual Adaptation Among Man and Machines
Effects of Shared Communicational Modality to Joint Activity of Human Operator and Robot Autonomy
Yukio Horiguchi a,1, and Tetsuo Sawaragi a
a Department of Mechanical Engineering and Science, Graduate School of Engineering, Kyoto University

Abstract. The authors have proposed a formal approach to designing effective human-machine interaction channels, i.e., shared communicational modalities, between a human operator and a machine autonomy. In this framework, Kirlik’s classification scheme of information types defined in the Generalized Lens Model is utilized to distinguish information resources in the target system from the perspective of their proximity to both human and machine agents’ perception and action. The shared communicational modalities are then embodied virtually by imposing mutual constraints among their respective [PP,PA] (i.e., proximal for both perception and action) variables. This paper evaluates the proposed interaction design after an actual implementation of such modalities into a shared-control environment, especially focusing on the mutual relationship of those agents’ cue-utilization strategies as well as their joint task performances.
Keywords. Shared autonomy, human-machine interaction, mixed-initiative interaction, shared communicational modality, Lens model
1. Introduction

The idea of “shared autonomy” has been proposed as a novel concept for human-machine collaboration styles, which is expected to encourage reciprocal complementarities emerging in joint activity [1]. Unlike supervisory control, in which a task of interest is hierarchically divided into an upper knowledge level (e.g., planning) and a lower behavior level (e.g., plan execution) that are assigned to humans and machines respectively, this style of human-machine collaboration intends their independent and parallel contributions to both levels of the task. Therefore, this concept stresses the design philosophy that a human autonomy and a machine autonomy should collaborate with each other as equivalent partners, while the comparable concept of “shared control” [2,3] simply denotes the concurrent mixture of human operation and mechanical control. The idea of shared autonomy suggests a very important perspective on how to couple a human user and a machine with highly advanced automated functions toward a good relationship, but it still remains at the conceptual level.
1 Correspondence to: Yukio Horiguchi, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan. Tel.: +81 75 753 5044; Fax: +81 75 753 5044; E-mail: [email protected].
Y. Horiguchi and T. Sawaragi / Effects of Shared Communicational Modality to Joint Activity
The essential differences in physical and cognitive capabilities between humans and machines can provide different accesses to an identical task situation, and so can enhance the total system performance. “Mixed-initiative interaction” [4] represents a style of interaction between collaborating subjects in which their roles and initiatives are not fixed in advance but are assigned appropriately depending on the situation. Composing mixed-initiative interaction between human and machine agents has large potential for truly effective human-machine collaboration [8–10]. However, any intervention other than the operator’s own decisions may disorder human control. Such interventions could impair the operability of the system from the operator’s perspective by introducing unexpected behaviors into the system. Thus, in order to realize naturalistic collaboration in such human-machine systems, we need to explore effective ways to establish and maintain correct, shared understandings of the common task situation, and especially ways to let the human operators adequately recognize interventions by the machine autonomy into their own operations. This is a key issue in human-machine interface design.

Concerning the above issue, the authors have proposed a formal approach to designing effective human-machine interaction channels, i.e., shared communicational modalities, between a human operator and a machine autonomy [14,15]. In this framework, Kirlik’s classification scheme of information types defined in the Generalized Lens Model [11] is utilized to distinguish information resources in the target system from the perspective of their proximity to both human and machine agents’ perception and action. The shared communicational modalities are then embodied virtually by imposing mutual constraints among their respective [PP,PA] (i.e., proximal for both perception and action) variables.
This paper evaluates the proposed interaction design after actually implementing such a modality into a shared-control environment, by examining the mutual relationship of the agents’ cue-utilization strategies as well as their joint task performances. The analysis especially focuses on the different development processes of their joint performances with and without the mediation of that common modality.
2. Shared Communicational Modality: Supporting Socially Epistemic Actions

As clarified in the design principle of ecological interface design [5,6], human-machine interface designs must be coherent with the ways humans think and perceive under their bounded cognitive resources. In connection with this design philosophy, there is an important empirical fact that action plays not only a performatory role but also an exploratory, knowledge-granting one [7,11]. The latter aspect of action, referred to as “epistemic action”, plays a very important part in flexible, skillful human performance in the complex world, because it is an efficient strategy to reduce cognitive burdens such as inferring the in-depth structures of work domains [11]. In conventional human-machine system designs, the actions of human operators are extremely limited in the control loop of highly automated mechanical systems, because such actions are regarded as inadmissible disturbances to stable, reliable, or efficient operations. However, in order to encourage the naturalistic collaboration expected in their joint activity, those systems should provide effective ways to let the respective agents (human or mechanical) be adequately aware of what their partners are doing and going to do. In this sense, collaborative systems need to accommodate epistemic actions directed toward each
other. As one form of such effective human-machine interaction channels, the authors have proposed to design shared communicational modalities between a human operator and a machine autonomy as their interfacing media [14,15]. The proposed approach makes use of Kirlik’s classification scheme of information types defined in the Generalized Lens Model [11], thereby depicting the latent covariant relations among the variables involved in the human-machine system. The original Lens model [3,12,13] is a functional representation of human perception and judgment in terms of dual symmetric models of a human judge (subject) and its environment (ecology). This model makes the proximal-versus-distal distinction in human perception: the “proximal” refers to what is directly accessible to the judge, while the “distal” represents what is indirect and accessed through the proximal information. Kirlik proposed to add a proximal-versus-distal structure of action to this model, because the original model only describes the view of the subject without any control over the environmental structure. With this extension, variables in the task environment are classified into the four types enumerated in Table 1.

Table 1. Four different types of variables defined in Kirlik’s Generalized Lens Model

Variable type    Definition
[PP,PA]          a variable that is proximal for both perception and action.
[PP,DA]          a variable that is proximal for perception but distal for action.
[DP,PA]          a variable that is distal for perception yet proximal for action.
[DP,DA]          a variable that is distal for both perception and action.
Using this scheme, information resources in the target system can be distinguished based on their proximity to both human and machine agents’ perception and action. The shared communicational modalities are virtually embodied by adding mutual constraints between the [PP,PA] (i.e., proximal for both perception and action) variables of the respective agents. Those constraints make each agent’s practicable actions depend on the other’s operational decisions, through which the agents can exchange their exploratory acts with each other. This composes a kind of bilateral information channel to be shared between the human and machine agents, and can afford dense and enriched interaction cycles.
3. Implementation and Experiments of the Shared Control System

A shared-control system, in which the control of a teleoperator robot is shared between a human operator and a robot autonomy, has been developed to evaluate the proposed model of shared communicational modality. In this system, a human operator operates a mobile robot (ActivMedia PIONEER1 Mobile Robot) in remote corridor environments with a joystick (Microsoft SideWinder Force Feedback Pro) on a terminal PC that is connected to the robot through radio modems. The ambulatory motions of the robot are controlled by commanding the target translational and rotational velocities designated in response to the joystick position; forward-backward and right-left inputs to the stick are translated into the robot’s translational and rotational velocities, respectively. The robot has a CCD camera capable of panning, tilting and zooming (i.e., a robotic PTZ camera) on its front, and seven supersonic range sensors to measure distances from obstacles.

Figure 1. Potential field method to generate ambulatory motion commands of the mobile robot
Figure 2. The corridor environment for experiments

The operator basically comprehends the surroundings of the remote robot using the image sent from the camera.

3.1. Obstacle-Avoidance Behavior as Robot Autonomy

Since it is difficult for the operator to completely understand the surrounding circumstances of the robot because of the camera’s large blind spot, an obstacle-avoidance behavior is equipped in the robot as its autonomy. This behavior is realized by a potential field method composed of repulsive forces from the obstacles detected by the supersonic range sensors, as illustrated in Figure 1. The velocity and steering commands to the mobile robot are computed as Eqs. (1), (2) and (3):

F_Ri = e^(−C_i d_i),    (1)

velocity_autonomy = V_MAX Σ_{i=1}^{7} F_Ri cos θ_i,    (2)

steering_autonomy = S_MAX Σ_{i=1}^{7} F_Ri sin θ_i.    (3)
Here, d_i denotes the distance measured by sensor i ∈ {1, 2, …, 7}, whose direction angle θ_i is set relative to the robot's heading. F_Ri is an intermediate variable representing the magnitude of the repulsive force from the obstacle detected by sensor i, and C_i is the variable gain parameter of the potential field, which determines the strength of sensor i's contribution. V_MAX and S_MAX are constants translating the virtual forces into ambulatory motion commands; in tune with the robot's specifications, they are set to V_MAX = 300 mm/sec and S_MAX = 20 deg/sec. Since each parameter C_i determines the incline of the cone representing the effect of a particular obstacle, the autonomy can change
Figure 3. Implementation of shared communicational modality by a force-feedback joystick: the operator and the autonomy both act on the joystick, whose commands drive the robot in the task environment while display information is returned to the operator.
its behavioral strategy by adjusting those parameters: if F_Ri has a good effect on the robot's behavior (e.g., the autonomy's decision has agreed with the operator's), the value of C_i is decreased to intensify sensor i's contribution; otherwise it is increased.

3.2. Sharing Control of the Teleoperated Robot

Figure 3 illustrates an overview of the mobile-robot shared-control system. The joystick adopted here can generate force-feedback effects, and we use this capability to embody the model of the shared communicational modality. By letting its decisions be reflected in the joystick motions through the feedback force, the autonomy can manipulate the stick controller just as the human operator does. The operator's and the autonomy's operations are thereby mutually restricted by the joystick itself, and each agent can affect the other's judgment policies through it. The joystick thus functions as an important communication channel in the shared-control system. The subsequent experiments are basically performed under this operational condition but, for comparison, another condition was prepared in which the autonomy's decisions were displayed on the GUI instead of being conveyed by the force-feedback effects of the joystick. This latter condition is named No-MII, since the two agents cannot directly affect each other's decisions, while the former condition (i.e., the proposed model) is named MII, an abbreviation of Mixed-Initiative Interaction.
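The command generation of Eqs. (1)–(3) and the gain adjustment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the adaptation step size are our own, while V_MAX, S_MAX and the exponential force law come from the paper.

```python
import math

V_MAX = 300.0   # [mm/sec] translation gain, per the paper's specification
S_MAX = 20.0    # [deg/sec] steering gain, per the paper's specification

def potential_field_commands(distances, angles, gains):
    """Compute (velocity, steering) following Eqs. (1)-(3).

    distances[i] : range reading d_i of sensor i [mm]
    angles[i]    : sensor direction theta_i [rad] relative to the heading
    gains[i]     : variable gain C_i of the potential field
    """
    velocity = steering = 0.0
    for d, th, c in zip(distances, angles, gains):
        f = math.exp(-c * d)          # Eq. (1): repulsive force F_Ri
        velocity += f * math.cos(th)  # Eq. (2) summand
        steering += f * math.sin(th)  # Eq. (3) summand
    return V_MAX * velocity, S_MAX * steering

def adapt_gain(c_i, agreed, step=0.1):
    """Sketch of the behavioral-strategy adjustment: decrease C_i
    (which intensifies sensor i's contribution, since F = exp(-C*d))
    when the autonomy's decision agreed with the operator's, and
    increase it otherwise.  The step size is a hypothetical choice."""
    return c_i - step if agreed else c_i + step
```

A single unobstructed sensor straight ahead (θ = 0) with zero gain yields the maximum forward command and no steering, which matches the structure of Eqs. (2) and (3).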
4. Evaluating Effects of Sharing the Modality to Affect the Collaboration Partner

4.1. Performance Comparison

Four operators each executed sets of No-MII and MII trials in alternation, up to ten sets. The remote task environment for this experiment was the L-shaped corridor shown in Figure 2. Its passageway is so narrow that even small mistakes in the robot operations result in collisions with the walls, and its simple form makes it easy to capture the operator's and the autonomy's judgment policies for controlling the robot. Table 2 compares several statistics computed from the data recorded in those trials. Here, P_suc and T̄ denote the success rate of the navigation task (i.e., the fraction of task completions with no collisions) and the average task completion time over all trials in each condition, respectively; T̄_S is the average task completion time among the successful trials and T̄_F among the unsuccessful ones; N̄_cw is the average number of "cut-the-wheel" operations among unsuccessful trials.

Table 2. Comparison of several statistics between the No-MII and MII conditions

          P_suc    T̄ [sec]   T̄_S [sec]   T̄_F [sec]   N̄_cw
No-MII    0.425    14.3       12.7         18.4         1.52
MII       0.625    13.6       12.6         15.5         1.4
This comparison indicates that T̄_F is the most noteworthy variable for distinguishing the two operational conditions, although MII basically outperforms No-MII on all of these statistics. The T̄_F values show that operations in the No-MII condition took longer to recover from collisions than in the MII condition. This result can be interpreted as follows: the autonomy's "tacit" interventions into human decisions may work well while the joint activity is going smoothly, but otherwise they may cause a mismatch between the actual robot behavior and the operator's anticipation of it, thereby confusing human decisions. At the same time, the result suggests that sharing the operational modality (i.e., the MII condition) contributes to the adjustment of human and machine decisions during those recovery operations.

4.2. Depiction of Judgment Policies Based on the Lens Model Formalism

Well-coordinated collaboration among independent individuals involves adequate role-assignment: each contributor should occupy its own social "niche" in the joint activity, and socially epistemic actions support the sustained effort to find those positions. To contemplate the two collaboration styles from this point of view, the social relationship between human and machine decisions is investigated here on the basis of the Lens Model formalism [3,12,13].

Figure 4 illustrates the criterion-judgments model that depicts the operator-autonomy joint judgment structure of interest in terms of the Lens Model. The operator's judgment (Y_OP) and the autonomy's judgment (Y_AT) contribute to the locomotion control of the teleoperated robot as they are joined into the joint judgment Y_J. Both Y_OP and Y_AT are rendered on the basis of available cues; the following state variables are selected as cues because of their measurability: the robot's translational (VEL) and rotational (RVEL) velocities, and the measurements of all range sensors (SEN1, SEN2, …, SEN7). The connections between those cue variables and each judgment indicate the judgment policy, or decision structure, attributed to the human operator or the machine autonomy. Y_e represents the criterion of the robot operations, i.e., the counterpart of Y_J, but its value is disregarded here because an ideal operation cannot be determined for each task situation.

Figure 4. Depiction of the joint judgment scheme in terms of the Lens Model formalism: the cues VEL, RVEL and SEN1–SEN7 feed the operator's individual judgment Y_OP and the autonomy's individual judgment Y_AT, which are joined into the joint judgment Y_J; Y_e is the criterion, and Y′_OP and Y′_AT denote the agents' covert judgments.

In this scheme, the two judgment policies are estimated using multiple regression modeling (a stepwise model-building technique), the most prevalent of the policy-capturing methodologies [13]. The captured policies Ŷ_OP and Ŷ_AT are thus cue-utilizations in the form of a weighted sum of cue variables, as shown in Eq. (4):

    Ŷ = b + b_VEL X_VEL + b_RVEL X_RVEL + b_SEN1 X_SEN1 + … + b_SEN7 X_SEN7.   (4)

4.3. Relative Relation between Individual Judgment Policies

The social relationship between human and machine decisions is measured by the distance between the two judgment policies. Eq. (5) defines the distance between Ŷ_OP and Ŷ_AT, in which β^(j)_OP,i and β^(j)_AT,i represent the standardized regression coefficients (β weights) of the operator's and the autonomy's judgment policies, respectively; the index j ranges over j ∈ {velocity, steering}, and i over every cue variable implicated in the regression models (i.e., i ∈ {VEL, RVEL, SEN1, …, SEN7}):

    distance_{Ŷ_OP−Ŷ_AT} = sqrt( Σ_j Σ_i ( β^(j)_OP,i − β^(j)_AT,i )² ).   (5)
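The policy-capturing step and the distance of Eq. (5) can be sketched as follows. The function names are our own, and ordinary least squares stands in for the stepwise model-building procedure the paper actually uses; this is an illustrative simplification, not the authors' implementation.

```python
import numpy as np

def beta_weights(cues, judgments):
    """Capture a judgment policy as standardized regression
    coefficients (beta weights): z-score the cue matrix and the
    judgment vector, then regress by ordinary least squares
    (a simplification of stepwise model building)."""
    X = (cues - cues.mean(axis=0)) / cues.std(axis=0)
    y = (judgments - judgments.mean()) / judgments.std()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def policy_distance(betas_op, betas_at):
    """Eq. (5): root of the summed squared differences of beta
    weights, over the two judgment dimensions (velocity, steering)
    and all cue variables."""
    total = 0.0
    for b_op, b_at in zip(betas_op, betas_at):   # j in {velocity, steering}
        total += np.sum((b_op - b_at) ** 2)      # i over cue variables
    return float(np.sqrt(total))
```

Two agents whose judgments are generated by the same cue weighting then have distance zero, while differing weightings yield a strictly positive distance.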
Figure 5. Comparison of the averaged values of distance_{Ŷ_OP−Ŷ_AT} between the No-MII and MII conditions.

Figure 5 compares the averaged values of distance_{Ŷ_OP−Ŷ_AT} for all the experimental subjects in the No-MII and MII conditions. This result indicates that the No-MII collaboration style induced more similar judgments between the human operator and the robot autonomy than MII did. This similarity implies redundant, or at least partly overlapping, operations by the two different agents. Such operations increase the
likelihood that extra or excessive control commands take effect on the robot's behavior beyond what the operator anticipated, and matters get worse when the operator has no knowledge of the autonomy's decision structure. Such unestablished role-assignments incur inconsistency or uncontrollability in human decisions: the other independent entity may introduce "noise" into the system and so hinder the operator's adequate reactions to the task situations.

Table 3. Transitions of distance_{Ŷ_OP−Ŷ_AT} values from early to final runs

(a) No-MII
             Early runs (avg. of 1st–3rd)   Final runs (avg. of 8th–10th)   Percentage increase
Subject A    1.610                          1.617                           +0.5%
Subject B    1.380                          1.707                           +23.7%
Subject C    1.526                          1.928                           +26.3%
Subject D    1.546                          1.708                           +10.4%

(b) MII
             Early runs (avg. of 1st–3rd)   Final runs (avg. of 8th–10th)   Percentage increase
Subject A    2.018                          2.080                           +3.1%
Subject B    2.019                          2.308                           +14.3%
Subject C    2.130                          2.222                           +4.4%
Subject D    2.195                          2.217                           +1.0%
On the other hand, the distance_{Ŷ_OP−Ŷ_AT} measurements also confirmed a gradual modification of the operator-autonomy relationship even under the No-MII condition. Table 3 compares the distance_{Ŷ_OP−Ŷ_AT} values between early (averaged among the first
three trials) and final runs (averaged among the last three trials) for all the experimental subjects under the No-MII (a) and MII (b) conditions, together with the percentage increases. In every case, the distance between Ŷ_OP and Ŷ_AT grows over the experimental trials. This result tells us that accumulated experience of the joint activity differentiated the agents' roles in the robot operations, and that consequently they could cultivate a complementary relationship with each other. At the same time, the distance_{Ŷ_OP−Ŷ_AT} values of the MII collaboration style (Table 3(b)) are considerably large even in the early runs, which is evidence of a swift development of a good human-machine partnership through the embedded shared communicational modality.
5. Discussion and Conclusion

The in-depth analysis of the operator-autonomy joint activity based on the distance between the individual policy models suggests an adaptation process in which the operators learned the autonomy's decision structure through practical work experience. It also indicates that the autonomy, in turn, accommodated to individual operational skills. The roles of the operators and the autonomy in their joint activity apparently changed from their initial states into ones composing a better function-allocation. Meanwhile, the different task performances of the two collaboration styles show how effectively the shared modality contributed to cultivating more cooperative human-machine relations. Collaboration between independent autonomies, in principle, demands common information resources to facilitate establishing coherent and consistent decisions among them; the authors regard the shared communicational modality as such a resource in their dynamic and fluent interactions.

Returning to Figure 4, the directed dashed lines in the scheme add individual policy-modification cycles to the basic joint judgment structure. As previously mentioned, the actual robot behaviors are controlled by the joint judgment Y_J, which is composed of the two judgments Y_OP and Y_AT. Adjustments or revisions of those judgment policies then assume covert judgments [13], i.e., Y′_OP and Y′_AT, which try to move the individual judgments toward compromise positions reflecting the influence of the other's policy. Thereby, the operator's introspective process is represented by the dashed lines going through Y′_OP, in which the judgment policy Y_OP can be adjusted by considering its location relative to the partner's policy Y_AT through their joint judgment Y_J. Altogether, Y′_OP plays a kind of meta-level cognitive role in the operator, organizing Y_OP's structure on the basis of the Y_OP-Y_AT relationship.
The same applies to the robot autonomy, i.e., the dependence between Y_AT and Y′_AT. In this view, Y_J plays an important role in improving the operator-autonomy collaboration because it mediates and influences both modification cycles of the human's and the autonomy's judgment policies. The key to developing a better partnership is therefore the accessibility of the joint judgment to the individual agents. In the MII collaboration style, the human operators can directly assess the state of Y_J through the shared communicational modality, and the modality also allows direct manipulation of the state of Y_J. In other words, the shared communicational modality provides direct access to the joint judgment from the perspectives of both perception and action. Coupling the respective [PP,PA] variables with each other brings together those proximal
resources the subject can act upon and get feedback from. From the viewpoint of epistemic actions, this functionality effectively contributes to the exploration of an appropriate cooperative relationship in the human-machine system. The No-MII collaboration style, on the other hand, provides no such resources. These accounts portray the capability of the shared communicational modality and explain the different collaboration performances observed with and without it.
Acknowledgements

This research was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 17760165, 2005, and by the Center of Excellence for Research and Education on Complex Functional Mechanical Systems (COE program of the Ministry of Education, Culture, Sports, Science and Technology, Japan). The authors are grateful for this support.
References

[1] S. Hirai, Theory of Shared Autonomy, Journal of the Robotics Research in Japan 11(6) (1993), 20–25 (in Japanese).
[2] T.B. Sheridan, Telerobotics, Automation, and Human Supervisory Control, The MIT Press, 1992.
[3] T.B. Sheridan, Humans and Automation: System Design and Research Issues, Wiley-Interscience, 2002.
[4] J.F. Allen, Mixed-Initiative Interaction, IEEE Intelligent Systems 14(5) (1999), 14–16.
[5] K.J. Vicente and J. Rasmussen, The Ecology of Human-Machine Systems II: Mediating "Direct Perception" in Complex Work Domain, Ecological Psychology 2(3) (1990), 207–249.
[6] K.J. Vicente and J. Rasmussen, Ecological Interface Design: Theoretical Foundations, IEEE Transactions on Systems, Man and Cybernetics 22(4) (1992), 589–606.
[7] D. Kirsh and P.P. Maglio, On Distinguishing Epistemic from Pragmatic Action, Cognitive Science 18 (1994), 513–549.
[8] G. Ferguson, J.F. Allen and B. Miller, TRAINS-95: Towards a Mixed-Initiative Planning Assistant, Proceedings of the Third Conference on Artificial Intelligence Planning Systems (1996), 70–77.
[9] G. Ferguson and J.F. Allen, TRIPS: An Integrated Intelligent Problem-Solving Assistant, Proceedings of the Fifteenth National Conference on AI (1998).
[10] E. Horvitz, Uncertainty, Action, and Interaction: In Pursuit of Mixed-Initiative Computing, IEEE Intelligent Systems 14(5) (1999), 17–20.
[11] A. Kirlik, The Ecological Expert: Acting to Create Information to Guide Action, Proceedings of the Fourth Symposium on Human Interaction with Complex Systems (1998), 15–27.
[12] E. Brunswik, Representative Design and Probabilistic Theory in a Functional Psychology, Psychological Review 62 (1955), 193–217.
[13] R.W. Cooksey, Judgment Analysis: Theory, Methods and Applications, Academic Press, 1996.
[14] Y. Horiguchi and T. Sawaragi, Design of Shared Communicational Modality for Naturalistic Human-Robot Collaboration in Teleoperating Environment, Proceedings of the Fifth International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies (2001), 854–858.
[15] Y. Horiguchi and T. Sawaragi, Naturalistic Human-Robot Collaboration Mediated by Shared Communicational Modality in Teleoperation System, Proceedings of the Sixth International Computer Science Conference on Active Media Technology (2001), 24–35.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Learning of Object Identification by Robots Commanded by Natural Language

Chandimal Jayawardena, Keigo Watanabe and Kiyotaka Izumi
Saga University, Saga 840-8502, Japan.

Abstract. Natural language communication is very important in human-robot cooperative systems. As a part of natural language communication, learning to identify natural language references such as "small red cube" is very useful. This paper proposes a method of learning object identification in which the meanings of such references can be inferred. The proposed method is demonstrated with an experiment conducted with a PA-10 redundant manipulator and a set of objects: the robot manipulator is issued commands like "pick the big red cube" to identify and pick objects placed on a table. The robot learns to interpret the meaning of such natural commands by learning the individual lexical symbols in the vocabulary and their corresponding object features.

Keywords. Natural language commands, Lexical symbols, Object identification, Object features
1. Introduction

Human-robot interaction is one of the most important developments in the field of robotics. The effectiveness of a human-robot cooperative system is enhanced by improving the naturalness of the human-robot interface, and in achieving this, the ability to communicate as peers using natural language is of utmost importance. Object identification, on the other hand, is one of the important features of intelligent robotic systems. Most research on vision-based object identification has concentrated on identifying known objects in a scene (e.g. [1][2]); in addition, there has been some research on learning and identifying unknown objects (e.g. [3]). In the experiment presented in this paper, a human user can verbally command a robot manipulator to pick objects placed on a table. The user can refer to objects naturally, using references such as "small red cube." Learning to identify objects referred to in this manner is important for natural-language-understanding robots.

1.1. Learning Object Identification

The object identification method employed in this work differs from the existing systems pointed out above. Instead of learning an object as a whole, the robot learns different object features and the lexical symbols which represent those features in the English language. That knowledge is then applied to identify new objects characterized by combinations of learned features. In natural languages, object references are composed of combinations of lexical symbols representing shapes, colors, sizes, etc. In order to infer the meaning of such a combination, one must know the meaning of each lexical symbol; for example, to identify a "large green car," one should know what is meant by large, green, and car. In the human learning process, once the grounded meaning of a lexical symbol is learned, humans are capable of interpreting it in relation to different scenarios. This is true for childhood learning as well as for new-language learning by adults. Our objective is to apply a similar strategy to learning object identification by robots.

Object perception by any robot is only via sensors. If camera images are used, it is possible to extract various features of the objects presented in a scene. This is a completely automated process with no consideration of how these objects are represented in the domain of natural language. Although the robot's perception is limited to sensory data, a human user may refer to objects with combinations of lexical symbols: "red cube", "blue cylinder", or "big yellow sphere" are some examples. In order to execute user commands consisting of such references, there must be a method to learn the meanings of these lexical symbols. There has been much important work related to this problem [4][5]; however, that work treated the problem as a fundamental cognitive problem. In this paper, learning the meaning of lexical symbols with the help of a human user is studied. Moreover, the method is not limited to acquiring knowledge of individual symbols; rather, it uses independently learned lexical symbols to understand the meaning of a composite lexical item, i.e. a complete reference to an object.

1 Correspondence to: Chandimal Jayawardena, Department of Advanced Systems Control Engineering, Saga University, Saga 840-8502, Japan. Tel.: +81 952 28 8608; Fax: +81 952 28 8587; E-mail: [email protected].
For example, once the meaning of the lexical symbol "red" is learned, it is meaningful for any red object: "red cube", "red cylinder", etc. Here we define two terms: relative features and non-relative features. An object feature whose meaning can be inferred without comparison with other objects is called a non-relative feature; object color is an example. In contrast, the meaning of a relative feature can be inferred only by comparison with other objects. For example, the meanings of "small red cube" and "big red cube" are understood only through comparison among all "red cubes."
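The distinction can be illustrated with a small sketch: non-relative features (color, shape) filter candidates in isolation, while the relative feature (size) only acquires meaning once the surviving candidates are compared with one another. The function name, object identifiers and pixel counts below are hypothetical.

```python
def resolve(reference, objects):
    """Resolve a reference like ('small', 'red', 'cube').

    Color and shape are non-relative: each candidate can be tested
    on its own.  Size is relative: it is decided only by comparing
    the remaining candidates, here via their pixel counts.
    """
    size_word, color, shape = reference
    candidates = [o for o in objects
                  if o["color"] == color and o["shape"] == shape]
    candidates.sort(key=lambda o: o["pixels"])  # ordering relation: pixel count
    index = {"small": 0, "medium": len(candidates) // 2, "big": -1}[size_word]
    return candidates[index]

# Hypothetical objects whose features are already grounded:
objects = [
    {"id": "a", "color": "red",  "shape": "cube",     "pixels": 400},
    {"id": "b", "color": "red",  "shape": "cube",     "pixels": 2000},
    {"id": "c", "color": "red",  "shape": "cube",     "pixels": 5700},
    {"id": "d", "color": "blue", "shape": "cylinder", "pixels": 900},
]
```

With these data, "small red cube" resolves to object "a" and "big red cube" to object "c", even though "red cube" alone matches all three.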
2. Object Perception by a Robot

Let the number of objects presented in a scene be n. Assume that any object possesses m non-relative feature values, which belong to m mutually exclusive non-relative feature categories. For example, color may be a feature category and, depending on the application, a vector of color components may be a feature value in the category color. Therefore, object i can be represented by a vector

    o_i = (f_i^1, f_i^2, …, f_i^m),   (1)

where f_i^k is a feature value of the k-th feature category, obtained from raw sensory data, and

    O = {o_1, o_2, …, o_n}   (2)

is the set of all object representations. Assuming that the number of distinguishable features within each non-relative feature category is finite, it should be possible to identify feature clusters within the sensory data pertaining to any feature category, i.e. clusters among the values f_1^k, …, f_n^k for the k-th feature category. Let the number of clusters identified within the sensory data of the k-th feature category be c_k. The set of objects that belong to the j-th cluster in the k-th non-relative feature category is given by

    G_j^k ⊆ O,   (3)

for j ∈ {1, …, c_k}, where G_j^k ∩ G_l^k = ∅ for any j ≠ l. All the objects in cluster G_j^k have one common feature belonging to the k-th non-relative feature category; let that feature be g_j^k. Therefore, the sets of objects which have features g_1^k, g_2^k, … are given by

    obj(g_1^k) = G_1^k,  obj(g_2^k) = G_2^k,   (4)

and so on.
3. Lexical Representations

Suppose that, in the user's lexicon, the set of lexical symbols corresponding to the k-th non-relative feature category is L_k. The learning of these non-relative lexical symbols is described by bijective functions ψ_k that map each symbol in L_k onto one of the feature clusters of that category:

    ψ_k(w) = the cluster feature denoted by the symbol w ∈ L_k.   (5)

Learning described by ψ_k is achieved by the algorithm shown in Fig. 1(a). However, the above relationship is not valid for learning relative lexical symbols such as "big" or "small", which are associated with relative features. Let the set of relative feature categories be R, and assume that each relative feature category r ∈ R is associated with an ordering relation; for example, the relative feature category size may be associated with the ordering relation "number of pixels in an object." Let the set of lexical symbols corresponding to the relative feature category r be L_r. If S_r is a well-ordered set whose elements consist of the objects under consideration, ordered according to the ordering relation, the learning of relative lexical symbols is described by a bijective function φ_r such that

    φ_r(w) = the position in S_r denoted by the symbol w ∈ L_r.   (6)

The algorithm shown in Fig. 1(a) identifies objects considering only non-relative features, and will therefore return all the objects which differ only in relative features. For example, it will return all "red cubes" irrespective of the presence of a "small red cube" and a "big red cube"; the set of such objects forms the set to be ordered into S_r. Learning described by φ_r is therefore achieved through the algorithm shown in Fig. 1(b).
Figure 1. Learning algorithms.

(a) Learning non-relative lexical symbols:
  1. Get the user command and extract the non-relative lexical symbols.
  2. IF the mapping of every extracted symbol is known, identify the object(s).
     ELSE, consult the user: "What do you mean by <symbol>?";
     the user points to an object with the feature described by the symbol;
     learn the symbol by mapping it to the cluster to which the pointed object belongs;
     go to step 2.

(b) Learning relative lexical symbols:
  1. Get the user command and extract the relative lexical symbols.
  2. Get the set of objects identified by the non-relative lexical symbols, and find the ordered sets.
  3. IF the mapping of every relative symbol is known, identify the object.
     ELSE, consult the user: "What do you mean by <symbol>?";
     the user points to an object with the feature described by the symbol;
     learn the symbol by mapping it to the position of the pointed object in the ordering relation;
     go to step 3.
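The interactive grounding loop of Fig. 1(a) can be sketched as follows. This is our own minimal rendering, not the authors' code: `identify` and its data layout are hypothetical, and the "What do you mean by …?" dialogue is reduced to a callback that returns the object the user points at.

```python
def identify(symbols, mapping, clusters, ask_user):
    """Sketch of the Fig. 1(a) algorithm for non-relative symbols.

    symbols  : list of (category, word) pairs, e.g. [("color", "red")]
    mapping  : learned word -> cluster id (the high-level knowledge)
    clusters : clusters[category][cluster_id] = set of object ids
               (the low-level knowledge, obtained by clustering)
    ask_user : callback simulating "What do you mean by <word>?";
               returns the id of the object the user points at.
    """
    # Ground every unknown symbol by asking the user to point.
    for cat, word in symbols:
        if word not in mapping:
            pointed = ask_user(word)
            # Map the word to the cluster containing the pointed object.
            mapping[word] = next(cid for cid, objs in clusters[cat].items()
                                 if pointed in objs)
    # Identify: intersect the clusters named by all symbols.
    result = None
    for cat, word in symbols:
        objs = clusters[cat][mapping[word]]
        result = set(objs) if result is None else result & objs
    return result
```

A single pointing gesture grounds a symbol permanently: after the user points once at a red cube, both "red" and "cube" are mapped to clusters and future commands need no further consultation.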
Table 1. Grammar

Action   Article   Size     Color   Shape
pick     (the)     small    red     cube
grab               medium   green   cylinder
take               big      blue
4. Overview of the System

An overview of the experimental system developed to demonstrate the above concept is shown in Fig. 2. Objects of different colors, shapes and sizes are placed on the object table. Observing the objects, a user may ask the robot to pick any one of them; for example, the user may say "pick the small red cube". The valid grammar for this experiment is given in Table 1. Any combination of the size, color, and shape lexical symbols forms a valid reference to an object.
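A command of the Table 1 grammar can be parsed with a few word sets; the word lists come from the table, while the parser itself and its function name are our own illustration.

```python
# Word classes taken from Table 1 of the paper.
ACTIONS = {"pick", "grab", "take"}
SIZES   = {"small", "medium", "big"}
COLORS  = {"red", "green", "blue"}
SHAPES  = {"cube", "cylinder"}

def parse_command(sentence):
    """Parse a command such as "pick the small red cube" into an
    action word and a reference {size, color, shape}; unfilled
    slots stay None (the article "the" is optional)."""
    words = sentence.lower().split()
    if not words or words[0] not in ACTIONS:
        raise ValueError("command must start with an action word")
    ref = {"size": None, "color": None, "shape": None}
    for w in words[1:]:
        if w == "the":
            continue            # optional article
        elif w in SIZES:
            ref["size"] = w
        elif w in COLORS:
            ref["color"] = w
        elif w in SHAPES:
            ref["shape"] = w
        else:
            raise ValueError(f"word not in the grammar: {w}")
    return words[0], ref
```

Because every slot is optional, partial references such as "grab blue cylinder" also parse, leaving the size slot empty for the identification stage to resolve.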
Figure 2. Overview of the object identification system: images of the object table are acquired and object features are extracted into a low-level knowledge base; lexical symbol identification, supported by a high-level knowledge base, interprets spoken commands such as "Pick the small red cube"; command execution drives the PA-10 manipulator.
Figure 3. A view of the experimental setup. Figure 4. Object table (objects numbered 1–18).
5. Implementation

5.1. Experimental Setup

The experimental setup consists of a PA-10 industrial manipulator and its controller, the object table, three USB cameras, a microphone and a PC running Windows XP. The three cameras are placed above, in front of, and to the left of the object table. Image acquisition uses DirectX technology, and voice recognition is performed with the IBM ViaVoice SDK. The camera placed right above the table provides a calibrated image, which is further processed to extract object features. All three images are displayed on the user's monitor to provide three-dimensional details of the workspace.

5.2. Low-Level Knowledge Base

The object feature extraction module in Fig. 2 extracts shape, color and size representations of each object. The shape representation of an object should be invariant to changes in size and location and to rotation. Although there are various descriptors, such as thinness ratio, shape elongation, spreading, and compactness, the Hu descriptors have the particularity of being invariant to scale, translation and rotation [6]. If a digital image is considered as
Table 2. Object representations.

Object No.   Pixels   First Hu Descriptor   Color (r, g, b)          Center (x, y) [pixels]
1            2081     496                   0.1067, 0.3281, 0.5652   255, 765
2            5744     206                   0.1181, 0.3465, 0.5354   262, 461
…            …        …                     …                        …
17           418      209                   0.4980, 0.2451, 0.2569   822, 451
18           1912     499                   0.6220, 0.2165, 0.1614   854, 805
a two-dimensional function, then from the normalized moments of order up to three it is possible to derive seven invariant moments, or Hu descriptors. In this work only the first Hu descriptor is used as the shape representation, because it is sufficient to distinguish among the shapes used in this experiment:

    φ1 = η20 + η02,   (7)

where η20 and η02 are the normalized central moments of order 2. If more complicated and diverse shapes are used, more descriptors may be employed to increase the representational accuracy. The number of pixels of an object is used as the size representation, and the normalized red (r), green (g), and blue (b) components are used as the color representation. These three kinds of representations are the elements of the object vector in Eq. (1), and all object representations are stored in the low-level knowledge base. It is low-level in the sense that it contains only sensory data without any lexical information; a portion of its content is shown in Table 2. According to the discussion in Section 2, color and shape are non-relative features while size is a relative feature. Since the objects belong to a finite number of colors and shapes, it should be possible to identify color and shape clusters within the sensory data shown in Table 2, as explained in Section 2. If these clusters are correctly identified, the number of color clusters should equal the number of object colors, and the number of shape clusters should equal the number of object shapes.

5.2.1. Object Clustering

Object clustering is performed according to the non-relative features described in Section 2. The number of clusters is not known a priori for either shape or color; we therefore use a leader-follower algorithm to find the clusters, because it does not need to know the number of clusters in advance. Writing c_j for the current center of cluster j, θ for the threshold, and x for a sample, the algorithm is as follows: begin — initialize with the first sample as a cluster center;
Table 3. Clustered objects.

Shape cluster   Object No.
1               1, 3, 4, 9, 10, 11, 14, 16, 18
2               2, 5, 6, 7, 8, 12, 13, 15, 17

Color cluster   Object No.
1               1, 2, 7, 8, 11, 14
2               3, 6, 9, 10, 13, 15
3               4, 5, 12, 16, 17, 18

Table 4. Lexical symbols to cluster mapping.

Lexical symbol   Shape cluster   Color cluster
cube             1               –
cylinder         2               –
red              –               3
green            –               2
blue             –               1

Table 5. Objects of same color and shape.

Colored object    Object Nos.
red cube          4, 16, 18
red cylinder      5, 12, 17
green cube        3, 9, 10
green cylinder    6, 13, 15
blue cube         1, 11, 14
blue cylinder     2, 7, 8
begin
  initialize the threshold θ; create the first cluster with center c1 set to the first sample
  do accept a new sample x
    find the nearest cluster j
    if ||x − cj|| < θ then assign x to cluster j
    else add a new cluster with center x
  until no more patterns
  return the cluster centers
end
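As a concrete illustration, the leader-follower procedure above can be sketched in Python. The threshold value and the sample vectors here are illustrative (they mimic normalized colour vectors), not the paper's actual data:

```python
import numpy as np

def leader_follower(samples, theta):
    """Leader-follower clustering: a sample joins the nearest cluster if its
    distance to that cluster's centre is below theta; otherwise it becomes
    the centre ("leader") of a new cluster. The number of clusters need not
    be known in advance."""
    centres, labels = [], []
    for x in samples:
        x = np.asarray(x, dtype=float)
        if not centres:                                  # first sample leads
            centres.append(x.copy())
            labels.append(0)
            continue
        d = [np.linalg.norm(x - c) for c in centres]     # distances to centres
        j = int(np.argmin(d))                            # nearest cluster
        if d[j] < theta:
            labels.append(j)                             # follow existing leader
        else:
            centres.append(x.copy())                     # new leader
            labels.append(len(centres) - 1)
    return centres, labels

# Illustrative normalized (r, g, b) samples: two red-ish, two green-ish, one blue-ish
rgb = [(0.80, 0.10, 0.10), (0.82, 0.09, 0.09),
       (0.10, 0.80, 0.10), (0.12, 0.78, 0.10),
       (0.10, 0.10, 0.80)]
centres, labels = leader_follower(rgb, theta=0.1)
print(len(centres), labels)  # three clusters emerge
```

With a well-chosen θ, the number of clusters found matches the number of distinct object colours, which is exactly the property the experiment relies on.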
For shape clustering, θ is taken to be 100, and the samples are the Hu moments given in the third column of Table 2. For color clustering, θ is taken to be 0.1, and the samples are the normalized color vectors given in the fourth column.
5.3. High Level Knowledge Base
This knowledge base is high-level in the sense that it contains lexical knowledge: the mappings between lexical symbols and the corresponding object features obtained from sensory data. Initially it is empty; it is filled using the algorithms in Fig. 1(a) and (b), as discussed in Section 3.
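For reference, the first Hu invariant used as the shape feature (and the pixel count used as the size feature) can be computed directly from a binary object mask. This is a minimal numpy sketch with illustrative shapes, not the paper's images:

```python
import numpy as np

def first_hu_descriptor(mask):
    """First Hu moment invariant phi1 = eta20 + eta02 of a binary object mask."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                        # pixel count: also the size feature
    xbar, ybar = xs.mean(), ys.mean()    # centroid
    mu20 = ((xs - xbar) ** 2).sum()      # central moments of order 2
    mu02 = ((ys - ybar) ** 2).sum()
    # normalized central moments: eta_pq = mu_pq / m00^((p+q)/2 + 1)
    eta20 = mu20 / m00 ** 2
    eta02 = mu02 / m00 ** 2
    return eta20 + eta02

# A filled square and an elongated bar yield clearly different phi1 values,
# which is why one descriptor suffices for simple shape discrimination.
square = np.ones((5, 5), dtype=bool)
bar = np.ones((3, 15), dtype=bool)
print(round(first_hu_descriptor(square), 4))  # → 0.16
```

phi1 is invariant to translation, scale, and rotation, so objects of the same shape cluster tightly regardless of where they sit on the table.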
6. Results and Conclusion
An image of the object table taken from the top camera is shown in Fig. 4. Table 2 shows a portion of the object representations, corresponding to objects 1 to 18, contained in the low-level knowledge base. Table 3 shows the objects clustered according to the non-relative features, shape and color. After learning with the algorithm given in Fig. 1(a), the mapping between non-relative lexical symbols and the clusters mentioned above is shown in Table 4. This mapping provides the result shown in Table 5. We can see that there are three objects of
Table 6. Final identification.

  Object No. | Lexical Representation
  1          | small blue cube
  2          | big blue cylinder
  3          | big green cube
  4          | big red cube
  5          | medium red cylinder
  6          | big green cylinder
  7          | small blue cylinder
  8          | medium blue cylinder
  9          | small green cube
  10         | medium green cube
  11         | big blue cube
  12         | big red cylinder
  13         | small green cylinder
  14         | medium blue cube
  15         | medium green cylinder
  16         | small red cube
  17         | small red cylinder
  18         | medium red cube
the same color and shape; these must be identified by their relative features, as explained in Section 3. In this experiment there is one relative feature, size. The final object identification result is shown in Table 6. In this paper, we have discussed the possibility of robots learning object identification when commanded in natural language. The proposed concept was demonstrated with an object identification experiment using a PA-10 redundant manipulator. Users could command the robot to pick objects placed on a table using natural references like "big red cube," "small blue cylinder," etc. To identify the referred objects, a composite lexical-item understanding system based on individual lexical symbol learning was presented. In this implementation, a relatively small set of lexical symbols was used; incorporating more lexical symbols and studying their interpretation is future work. We have also not considered the learning of actions, which is another possible improvement to be addressed in future work.
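Resolving the relative symbol "size" within a group of objects sharing colour and shape amounts to ranking the group by the pixel-count feature. A sketch, using the red-cube group from Table 5 with hypothetical pixel counts:

```python
# Hypothetical pixel counts for the red cubes (objects 4, 16, 18).
sizes = {4: 5200, 16: 3100, 18: 4200}           # object no. -> pixel count

# Rank within the group from smallest to largest and attach the relative
# lexical symbols in that order.
order = sorted(sizes, key=sizes.get)             # small -> big
labels = dict(zip(order, ["small", "medium", "big"]))
print(labels)  # → {16: 'small', 18: 'medium', 4: 'big'}
```

Because "small" and "big" only have meaning relative to the group, the same pixel count could map to different symbols in a different colour-shape group.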
References
[1] A. J. Baerveldt, "A vision system for object verification and localization based on local features," Robotics and Autonomous Systems, 34 (2001), 83–92.
[2] D. Kragic, M. Bjorkman, H. I. Christensen, and J. O. Eklundh, "Vision for robotic object manipulation in domestic settings," Robotics and Autonomous Systems, 52 (2005), 85–100.
[3] N. Bredeche, Y. Chevaleyre, J. D. Zucker, A. Drogoul, and G. Sabah, "A meta-learning approach to ground symbols from visual percepts," Robotics and Autonomous Systems, 43 (2003), 149–162.
[4] P. Vogt, "The physical symbol grounding problem," Cognitive Systems Research, 3 (2002), 429–457.
[5] D. Roy, K. Y. Hsiao, and N. Mavridis, "Mental imagery for a conversational robot," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, 34-3 (2004), 1374–1383.
[6] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. on Information Theory, 8 (1962), 179–187.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
An f-MRI Study of an EMG Prosthetic Hand Biofeedback System
Alejandro HERNÁNDEZ A.*, Hiroshi YOKOI*, Takashi OHNISHI**, Tamio ARAI*
* The University of Tokyo, Precision Engineering Dept., Japan, (alex,hyokoi,arai-tamio)@prince.pe.u-tokyo.ac.jp
** Department of Radiology, National Center Hospital for Mental, Nervous, and Muscular Disorders, National Center of Neurology and Psychiatry, Tokyo 187-8551, Japan
Abstract: Biofeedback in man-machine interfaces is of vital importance for the development of subconscious control of external devices. In order to include external devices in the user's body schema, we need to provide more information channels to the human body. In this case study we focus on an EMG-controlled prosthetic system. We use electrical stimulation to translate pressure information into biofeedback for the human body, and we used functional magnetic resonance imaging to measure the effectiveness of this system.
Keywords: FES, Biofeedback, Cortical Reorganization, fMRI
Introduction
The development of myoelectric prosthetic hands has advanced considerably since their introduction in 1960 [1]. Some studies report a preference for myoelectric prostheses over body-powered and cosmetic ones because of their functionality; still, these systems lack tactile feedback. Some research has been done on electrically powered prosthetic devices [2, 3] and on the increase in their efficiency when using some form of tactile feedback [4, 5], but there is little research on the application of biofeedback to provide information to the human body [6, 7]. In man-machine interfaces we find some research [8] on haptic interfaces for providing tactile feedback. Its direct application to prosthetics is limited by the fact that these studies focus on sensory substitution using the fingertips. Regrettably, those approaches cannot be applied to prosthetic devices for users with partial or complete loss of the arm, who are our interest in this study. Therefore, we need to find a different way to provide sensory information to the human body. It has been demonstrated that the brain works with correlative information: when provided with simultaneous stimuli, the brain can associate them into a unique event [9, 10]. Using this knowledge, we can generate new sensations, provided that the stimuli are simultaneous, so that a person using a prosthetic hand can have sensory feedback besides the visual.
Our proposal is to use electrical stimulation to provide tactile information to the human body. In this study, we use an EMG-controlled, electrically powered prosthetic hand with 13 DOF (Figure 1). The system uses a neural network as the classification unit to identify the movement intended by the user, based on [11].
Figure 1. Myoelectric Prosthetic Hand
As in studies on the control of Functional Electrical Stimulation (FES) using electromyography (EMG) signals [12], the electrical noise generated by the electrical stimulation affects the EMG signal processing, sometimes overwriting the signal generated by the muscles. This is a challenge for the current system. In this study we intend to provide feedback that lets the hand user interact with the environment, in order to develop extended proprioceptive control of the robot hand. In this paper we present the results of our system transmitting information to the human body without the electrical stimulation affecting the EMG acquisition process. To evaluate our system, we used a functional magnetic resonance imaging system to measure the activation in the brain due to the electrical stimulation.
Background
In medicine, biofeedback is used to treat several illnesses by providing information about the internal status of the body to the patient. For the purposes of this paper, we use the term "biofeedback" not only for the information acquired from the human body for control, but also for the information provided to the human body. There has been some work on applying different types of feedback besides visual to increase the efficiency of interaction with mechatronic systems [7]. In most cases, the objective is to provide some form of tactile feedback, either to the human body or directly to the controller to improve its performance. To provide artificial feedback to the body, we can either connect directly to the nervous system (an invasive solution) or use sensory substitution (visual, auditory, tactile). The first solution might provide more reliable information to the brain, but there is a high risk of rejection by the body, as well as the development of infectious diseases. The use of visual or auditory feedback is not reliable enough when extended proprioception is intended [13]. Tactile and proprioceptive feedback substitution for artificial limbs can use direct neural stimulation, transcutaneous stimulation, or mechanical vibrators.
There has been some research concerning the use of electrical stimulation to provide proprioceptive feedback in lower limb prostheses, mentioned in [14], and the results of these studies show the benefits of providing biofeedback. We therefore decided to provide tactile feedback to the prosthesis user by means of electrical stimulation. The use of myoelectric prostheses helps to reduce phantom pain and cortical reorganization in the brain [15]. There are studies that measure the effects of the use of myoelectric hands [16], as well as the activation in the brain during imagined hand movements [17], using magnetic resonance. In this study we use functional magnetic resonance imaging to measure the activation in the brain due to the hand movements along with the tactile feedback provided by the electrical stimulation.
Methods
This study focuses on the development of biofeedback, in this particular case the transmission of tactile information from a prosthetic hand to the human body. In this paper, we describe the effects of the electrical stimulation when providing tactile feedback to the human body; to measure these effects we used a functional magnetic resonance imaging device. The system can be divided in two parts: the EMG classification part and the sensory part. The EMG classification system is based on the work done in [11, 21]. The sensory part consists of the electrical stimulator connected to the body. The stimulator device used during the course of this study was built around a Hitachi H8/Tiny microcontroller. Figure 2a shows the stimulator design used for this study.
Figure 2. a) Stimulator design. b) Stimulator output signal.
The device has two output channels; the signal at channel 0 is generated using the PWM port of the microcontroller, while the output signal at channel 1 is generated in software, using general I/O ports. The device is controlled by receiving control commands from the computer over a serial connection at 19,200 baud. The voltage output is set at 9 V. The device uses the biphasic stimulation method [18, 19] (Figure 2b), because it requires less energy to produce the same effect as other stimulation methods, keeping the flowing current at a low level (10 mA). The device translates the signal obtained from the pressure sensors into a
pulse-based electrical stimulation signal, providing tactile feedback to the system user's body. The stimulation level is controlled by simultaneously changing the duty rate of both phases, positive and negative, while keeping the pulse frequency constant. For the purposes of this study, we used only channel 1 of the stimulator.
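The level-control scheme can be sketched as follows: both phases of the biphasic pulse grow or shrink together, so the waveform stays charge-balanced while the pulse frequency never changes. The period length, period count, and amplitude below are illustrative assumptions, not the firmware's actual constants:

```python
import numpy as np

def biphasic_train(level, period_samples=100, n_periods=4, amplitude=1.0):
    """Charge-balanced biphasic pulse train. 'level' in [0, 1] is the duty
    rate of each phase within its half-period; both phases change together,
    so the net charge per period is zero and the pulse frequency (one period
    = period_samples) stays constant."""
    on = int(round(level * period_samples / 2))      # active samples per phase
    period = np.zeros(period_samples)
    period[:on] = amplitude                          # positive phase first
    period[period_samples // 2:period_samples // 2 + on] = -amplitude  # matching negative phase
    return np.tile(period, n_periods)

w = biphasic_train(0.44)     # the paper's higher (44%) stimulation level
print(w.max(), w.min(), w.sum())  # → 1.0 -1.0 0.0
```

Mapping the pressure-sensor reading to `level` then modulates perceived intensity without shifting the stimulation frequency, which is the behavior the text describes.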
Figure 3. The diagram shows the distribution of the different components of the system.
A. Experimental Setting
In order to measure the effects of the electrical stimulation when using the EMG prosthetic hand, we performed the following experiments with the system shown in Figure 3. To identify and measure the brain activation due to the electrical stimulation, we applied two different levels of stimulation, low and high. To determine the stimulation levels, which vary between individuals, we gradually increased the stimulation until it was clearly perceptible, setting the lower level (22%). For the higher level, we continued increasing the stimulation until the sensation was stronger than the previous one but caused no unpleasant reaction (44%). We used one channel for the surface electrical stimulation. Figure 4a) shows the placement of the stimulation electrodes used during the experiment. The distance between electrodes was set to 1 cm to avoid muscle contractions. For this experiment, we used a test subject (an amputee) and a control subject (a healthy person). The test subject had a right forearm amputation dating back five years; Figure 4c) shows the amputated arm. For the pattern classification process, the surface EMG sensors were located as shown in Figures 4c) and 4b). We trained the classifier with the following motions: finger flexion/extension, wrist flexion/extension, and thumb flexion. Cerebral activity was measured with fMRI using blood oxygen level-dependent contrast [22]. After automatic shimming, a time-course series of 59 volumes was obtained using single-shot gradient-refocused echo-planar imaging (TR = 4000 ms, TE = 60 ms, flip angle = 90°, inter-scan interval 8 s, in-plane resolution 3.44 × 3.44 mm, FOV = 22 cm, contiguous 4-mm slices to cover the entire brain) with a 1.5 T MAGNETOM Vision plus MR scanner (Siemens, Erlangen, Germany) using the standard head coil. Head motion was minimized by placing tight but comfortable foam padding around the subject's head.
The first five volumes of each fMRI scan were
discarded because of non-steady magnetization, and the remaining 54 volumes were used for the analysis. The fMRI protocol was a block design with epochs of the task condition and the rest condition. Each epoch lasted 24 seconds, equivalent to 3 whole-brain fMRI volume acquisitions. Data were analyzed with Statistical Parametric Mapping software 2 [23]. Scans were realigned and transformed to the standard stereotactic space of Talairach using an EPI template [24]. Data were then smoothed in the spatial domain (full width at half-maximum = 8 × 8 × 8 mm) to improve the signal-to-noise ratio. After specifying the appropriate design matrix, with a delayed box-car function as the reference waveform, the condition effects, slow hemodynamic fluctuations unrelated to the task, and subject effects were estimated according to a general linear model taking temporal smoothness into account. Global normalization was performed with proportional scaling. To test hypotheses about regionally specific condition effects, the estimates were compared by means of linear contrasts of each rest and task period. The resulting set of voxel values for each contrast constituted a statistical parametric map of the t statistic, SPM{t}. For the analysis of each session, voxels and clusters of significant voxels were given a threshold of P < 0.005, uncorrected for multiple comparisons. To measure the effects of the stimulation while using the prosthetic hand, we used the following setup. First, we trained the EMG classification system for each subject. In this experiment we used the finger flexion/extension task to grab a sphere. To provide visual feedback, we used a video camera to show the subjects the prosthetic hand movements while they were inside the fMRI device; the subjects saw the image through a set of mirrors that transmit the image from the video camera (Figure 4e). The subjects were requested to grab the sphere as soon as they saw it approach the prosthetic hand.
When the subject grabbed the sphere, the pressure sensors placed on the hand were activated, sending the stimulation signal at 44% of the maximum output to ensure that the test subjects perceived the stimulation while inside the scanner during the acquisition process.
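The block-design analysis described above amounts to fitting a general linear model with a boxcar regressor. A schematic numpy version, with a synthetic voxel time series and ordinary least squares standing in for the full SPM2 pipeline (convolution with a hemodynamic response, drift regressors, and temporal smoothness are omitted):

```python
import numpy as np

# 54 analyzed volumes; each 24 s epoch spans 3 volumes at the 8 s
# inter-scan interval, alternating rest (0) and task (1).
n_vols, vols_per_epoch = 54, 3
boxcar = np.tile([0, 0, 0, 1, 1, 1], n_vols // (2 * vols_per_epoch))

# Synthetic voxel signal: baseline 100, task effect 2, Gaussian noise.
rng = np.random.default_rng(0)
y = 100.0 + 2.0 * boxcar + rng.normal(0.0, 0.3, n_vols)

# GLM: y = X @ beta + noise, estimated by ordinary least squares.
X = np.column_stack([boxcar, np.ones(n_vols)])   # task regressor + constant
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] estimates the task effect (~2), beta[1] the baseline (~100);
# a contrast on beta[0] across voxels yields the SPM{t} map.
```

This is only meant to make the "linear contrasts of each rest and task period" step concrete; the actual analysis used SPM2 as stated in the text.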
Figure 4. Electrode placement for the surface stimulation method. a) Gel-type surface stimulation electrodes (PALS); b) surface EMG sensors (control subject); c), d) surface EMG sensors (amputee); e) the image of the prosthetic hand was projected inside the fMRI device through a mirror array to show the movements of the prosthetic hand.
Results
From previous experiments we found that, when using surface electrical stimulation, the subjects were able to identify changes in the stimulation strength with a delay of 300 ms. Dynamic changes were discriminated properly, whereas the stable-phase level showed an error between 13% and 17%. The subjects were able to tell when the stimulation changed, but needed some training in order to differentiate the different levels of stimulation (Figure 5).
Figure 5. Response to the surface electrical stimulation.
When analyzing the effects on the EMG signal in the frequency spectrum, we found a small effect in the 50–60 Hz band, with a more influential component at higher frequencies. Since the EMG signal information is found between 1 and 500 Hz, by applying a low-pass filter at 50 Hz we nullify the effects of the stimulator on the EMG pattern classifier (Figure 6).
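The 50 Hz low-pass step can be illustrated with a windowed-sinc FIR filter in numpy. The sampling rate and the two sinusoidal components below are synthetic stand-ins for the recorded EMG and the stimulation artifact, not the paper's data:

```python
import numpy as np

def lowpass_sinc(signal, cutoff_hz, fs, numtaps=401):
    """Windowed-sinc FIR low-pass filter (numpy only). Frequencies well
    above cutoff_hz, such as the high-frequency stimulation artifact,
    are strongly attenuated."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(numtaps)
    h /= h.sum()                                   # unity gain at DC
    return np.convolve(signal, h, mode="same")     # symmetric taps: no net delay

fs = 4000.0                                        # synthetic sampling rate
t = np.arange(0, 0.5, 1 / fs)
emg = np.sin(2 * np.pi * 20 * t)                   # 20 Hz "muscle" component
artifact = 0.5 * np.sin(2 * np.pi * 400 * t)       # high-frequency stimulation artifact
clean = lowpass_sinc(emg + artifact, cutoff_hz=50, fs=fs)
```

After filtering, `clean` closely tracks the 20 Hz component while the 400 Hz artifact is suppressed by tens of decibels, which is the effect the classifier relies on.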
Figure 6. Spectrum analysis of the electrical noise produced by the stimulator over the surface EMG sensors.
For the fMRI data analysis we used a value of p = 0.005 (SPM2), resulting in a threshold value of 2.59. With these settings, we found that when applying surface electrical stimulation only, the brains of the test subjects showed activation in the frontal lobe, denoting the processing of a new sensation, but the activation in the somatosensory and parietal areas was not localized at a specific point.
When the subjects were asked to grab the sphere using the prosthetic hand, the results show a decrease in the activation of the frontal lobe, as well as a decrease in the activation of the posterior parietal lobe; instead, we found a clear activation in the primary motor cortex due to the hand activation. We also found that the activation of the visual cortex does not appear in this test. Besides the activation in the right-hand area, we found activation in the somatosensory area related to the right hand. The amputee subject presented activation in the frontal lobe, as the healthy subject did, for the application of electrical stimulation at 44% of the maximum on the left arm (the healthy arm). The activation of the visual cortex is higher than that found in the healthy subject. The electrical stimulation again is not localized in a specific area. In the grasping case, the activation in the amputee's brain increased along the primary motor cortex related to the right hand, due to the use of the right-hand muscles for the prosthetic hand control. When asked, this subject affirmed that the image of the hand persists, and that it is this image that is used to control the prosthetic hand. In the primary somatosensory cortex, we see an increase in activation, principally in the area related to the hand, but we also see activation in the area related to the left arm, where the stimulation is actually performed. After the fMRI scanning, the subjects were questioned about the sensation perceived during the grasping task. The answers verified that the subjects experienced an illusion of touching the sphere. The illusion of feeling as if the right hand were touching the object is found in both cases (healthy and amputee subjects). The control subject presents a more specific activation than the amputee.
This may be related to the lack of daily use of the right hand, in addition to the cortical reorganization present in amputees due to the lack of sensory input from the missing limb.
Figure 7. Comparison between the stimulation results. a) Electrical stimulation at 44%; b) grabbing an object while receiving stimulation at 44%.
Figure 8 shows, in the case of the healthy subject, the clear differentiation of the stimulation when it is provided along with the grasping action using the prosthetic hand.
Discussion
In the course of these experiments we confirmed the importance of simultaneous stimuli, which the brain needs in order to correlate them and identify them as a single event, opening new channels in man-machine interaction. fMRI is a useful tool for measuring in an objective way the changes due to interaction with the proposed system. Still, there are several challenges to deal with, such as the electrical noise inside the fMRI chamber, before the system becomes practical. Previous studies in neuroscience provide knowledge about the workings of the brain that can now be used to generate more efficient man-machine interfaces. In this study we confirm the need for more sensory channels in prosthetic applications. With continuous use and training, the correlation between visual and sensory stimuli can be strengthened, allowing the development of a closer relationship. In future work, we expect to continue measuring the development in the sensory cortex due to continuous use of the prosthetic hand with sensory feedback.
Acknowledgements
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 2004, No. 16360118, and by the project RobotCub: Robotic Open-architecture Technology for Cognition, Understanding and Behavior (Project no. 004370).
References
[1] Kobrinski, A. E. et al.: "Problems of bioelectric control," in Automatic and Remote Control, Proc. 1st IFAC Int. Cong., Vol. 2, Coles, J. F. (Ed.), Butterworths, p. 619, 1960.
[2] Y. A. Bertos, C. W. Heckathorne, R. F. ff. Weir, and D. S. Childress, "Microprocessor based E.P.P. position controller for electric-powered upper-limb prostheses," Proceedings of the 19th International Conference of the IEEE/EMBS, Oct. 30–Nov. 2, 1997.
[3] "SensorHand technical information booklet," Otto Bock Co., Ltd., 2001, http://www.healthcare.ottobock.com/
[4] A. Tura, A. Davalli et al., "Upper limb myoelectric prostheses: sensory control system and automatic tuning of parameters," in Intelligent Systems and Technologies in Rehabilitation Engineering, CRC Press, 2001.
[5] A. Rios Poveda, "Myoelectric prosthesis with sensorial feedback," University of New Brunswick's MyoElectric Controls/Powered Prosthetics Symposium, 2002.
[6] M. Yoshida, Y. Sasaki, N. Nakayama, "Sensory feedback for biomimetic prosthetic hand," BPES 2002 (The 17th Living Body and Physiology Engineering Symposium, in Japanese).
[7] D. Zambarbieri, M. Schmid, and G. Verni, "Sensory feedback for lower limb prostheses," in Intelligent Systems and Technologies in Rehabilitation Engineering, pp. 129–151, CRC Press, 2001.
[8] A. Agah, "Human interactions with intelligent systems: research taxonomy," Computers and Electrical Engineering, No. 27, pp. 71–107, 2001.
[9] K. Carrie Armel and V. S. Ramachandran, "Projecting sensations to external objects: evidence from skin conductance response," Proc. R. Soc. B, 270, pp. 1499–1506, 2003.
[10] V. S. Ramachandran and D. Rogers-Ramachandran, "Phantom limbs and neural plasticity," Archives of Neurology, vol. 57, pp. 317–320, March 2000.
[11] D. Nishikawa, W. Yu, M. Maruishi, I. Watanabe, H. Yokoi, Y. Mano, and Y. Kakazu, "On-line learning based electromyography to forearm motion classifier with motor skill evaluation," JSME International Journal Series C, vol. 43, no. 4, pp. 906–915, Dec. 2000.
[12] T. Keller and M. R. Popovic, "Real-time stimulation artifact removal in EMG signals for neuroprosthesis control applications," Proceedings of the IFESS 2000 Conference, Cleveland, USA, June 2001.
[13] D. H. Plettenburg, "Prosthetic control: a case for extended physiological proprioception," University of New Brunswick's Myoelectric Controls/Powered Prosthetics Symposium, pp. 73–75, 2002.
[14] D. J. Atkins, D. C. Y. Heard, W. H. Donovan, "Epidemiologic overview: individuals with upper limb loss, and their reported research priorities," Journal of Prosthetics and Orthotics International, 25, 228–227, 2001.
[15] M. Lotze et al., "Does the use of a myoelectric prosthesis prevent cortical reorganization and phantom limb pain?" Nature Neuroscience, vol. 2, no. 6, June 1999.
[16] M. Maruishi et al., "Brain activation during manipulation of the myoelectric prosthetic hand: a functional magnetic resonance imaging study," NeuroImage, 21, pp. 1604–1611, 2004.
[17] M. Lotze, "Activation of cortical and cerebellar motor areas during executed and imagined hand movements: an fMRI study," Journal of Cognitive Neuroscience, no. 11, pp. 491–501, 1999.
[18] R. M. Nelson, D. P. Currier, Clinical Electrotherapy, Second Edition, Appleton & Lange, 1991.
[19] G. Alon, G. DeDomenico, High Voltage Stimulation: An Integrated Approach to Clinical Electrotherapy, Chattanooga, Chattanooga Corp., pp. 129–146, 1987.
[20] N. Hoshimiya, T. Izumi, Y. Handa, "Sensory feedback in FES" (in Japanese), Society of Biomechanisms, Vol. 12, No. 1, 1998.
[21] A. Hernandez Arieta, W. Yu, H. Yokoi et al., "Integration of a multi-D.O.F. individually adaptable EMG prosthetic system with tactile feedback," IAS-8, F. Groen et al. (Eds.), IOS Press, pp. 1013–1021, 2004.
[22] N. K. Logothetis, "The neural basis of the BOLD fMRI signal," Phil. Trans. R. Soc. Lond., (357), pp. 1003–1037, 2002.
[23] Statistical Parametric Mapping, Wellcome Department of Imaging Neuroscience, http://www.fil.ion.ucl.ac.uk/spm/
[24] M. Young, The Technical Writer's Handbook, Mill Valley, CA: University Science, 1989.
[25] D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, et al., "Design and construction of a realistic digital brain phantom," IEEE Transactions on Medical Imaging, 1998; 17: 463–468.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Wearable inertial sensors for arm motion tracking in home-based rehabilitation
Huiyu Zhou a,1, Huosheng Hu a and Nigel Harris b
a University of Essex, Colchester, CO4 3SQ, UK
b University of Bath, Bath, BA1 1RL, UK
Abstract. In this paper we introduce a real-time human arm movement tracking system that can be used to aid the rehabilitation of stroke patients. A 3-axis inertial sensor is used to capture arm movements in 3-D space and in real time. The tracking algorithm is based on a kinematic model that, at this stage, considers only the human forearm. To improve accuracy and consistency, a Lagrangian-based filtering strategy is adopted. The experimental results demonstrate that the proposed framework can be used to track forearm motion.
Keywords. Rehabilitation, motion tracking, inertial sensor, filtering
1. Introduction
Stroke is the biggest cause of severe disability in the UK. Ten thousand people each year experience a first stroke, and a further 30,000 have a further stroke [5]. More than 75% of these people require multi-disciplinary assessments and appropriate rehabilitative treatments after they are discharged from hospital [1]. This places a large demand on community healthcare services, which often have quite limited therapy resources. As a result, there is considerable interest in training aids or intelligent systems that conduct rehabilitation in the patient's home environment rather than in hospital. The goal of rehabilitation is to enable a person who has experienced a stroke to regain the highest possible level of function. Although some functional abilities may be spontaneously restored soon after a stroke, recovery is an ongoing process, and the patient must perform repetitive movements, correcting any undesired motion behavior, in order to regain fine control of the upper or lower limbs. During these rehabilitation exercises, if the movements of stroke patients can be tracked, incorrect motion patterns can be readily identified and corrected. Therefore, devices that can accurately track the position of limbs in space are an essential component of such a rehabilitation system. Here, we briefly summarize the existing human tracking systems, which can be classified as non-vision based, vision-based with markers, vision-based without markers, and robot-guided systems. Non-vision based: These systems deploy sensors, e.g. inertial, mechanical and magnetic ones, to continuously collect movement information. For example, inertial and
1 Correspondence to: H. Zhou, Department of Computer Science, University of Essex, Colchester, UK. Tel.: +44 1206 874092; Fax: +44 1206 872788; E-mail: [email protected].
Figure 1. Illustration of the proposed rehabilitation system.
magnetic sensor-based devices (e.g. [6]) exploit micro-electromechanical systems (MEMS). These devices can be used in most circumstances without limitations (of illumination, temperature, space, etc.), and they achieve better accuracy than mechanical sensor-based devices. The main drawback of using inertial sensors is that accumulated errors (or drift) become significant after a short period of time. Vision-based with markers: Numerous marker-based tracking systems are available commercially and in academia. CODA (Charnwood Dynamics, UK) and Qualisys (Qualisys AB, Sweden) are two examples: the former uses "active" markers, and the latter exploits "passive" markers that are captured by the allocated cameras. However, these systems inevitably suffer from the occlusion problem due to the line-of-sight requirement. Vision-based without markers: As a less restrictive motion capture technique, markerless sensing is capable of partially overcoming the occlusion problem, as it concerns boundaries or features on human bodies. This approach has robust performance but is inefficient in computation and less effective in rigorous circumstances. To address the inefficiency problem, for example, Mihailidis et al. [3] designed a sensing agent for an intelligent environment that assists older people with dementia in their daily life. It has shown promising results for 2-D hand motion tracking; however, this system does not provide 3-D motion estimation. Robot-guided: To find out whether exercise therapy influences plasticity and recovery of the brain following a stroke, an automatic system named MIT-MANUS was designed to move, guide, or perturb the movement of a patient's upper limb, whilst recording motion-related quantities, e.g. position, velocity, or applied forces [2]. This system was successfully implemented.
Unfortunately, during exercise the arm has to be attached to the robot arm, so the patient is unable to carry out free motion. In this paper, a kinematic model of human forearm motion is developed, which can provide consistent estimates of forearm movements such as position and orientation based on a commercially available inertial sensor, the MT9 (Xsens Motion Technology, Holland). At this preliminary stage, the tracking system requires the elbow joint to be fixed,
Figure 2. Kinematics of a human forearm (the fixed point is the elbow joint).
however, later work will remove this constraint. The designed motion tracking framework has been integrated into a home-based rehabilitation system illustrated in Figure 1. The rest of the paper is organized as follows. Section 2 presents a novel approach to 3-D tracking based on the collected angular rates, which exploits the kinematics of human forearm movements. A weighted least-squares filtering method is proposed in Section 3 that removes the errors whose Euclidean distance is larger than a threshold. Section 4 introduces experimental results. Conclusions and future work are finally provided in Section 5.
2. Kinematic modelling of human arm motion

A kinematic model of the forearm is proposed in this section. Consider a rigid human forearm moving in 3-D inertial space. Figure 2 shows the kinematics of a human forearm, where the elbow is assumed fixed and the inertial sensor is attached near the wrist. r is the distance between the centre of the sensor and the fixed point, which is known a priori. Let the coordinates of an arbitrary point be denoted by u(x, y, z); one then has projected coordinates on the three orthogonal planes:

x-y plane:
x = r − sqrt(r² − z²) cos ωz,   y = sqrt(r² − z²) sin ωz   (1)

y-z plane:
y = sqrt(r² − x²) cos ωx,   z = sqrt(r² − x²) sin ωx   (2)

x-z plane:
x = r − sqrt(r² − y²) sin ωy,   z = sqrt(r² − y²) cos ωy   (3)

where ωx, ωy and ωz are the Euler angles around the x-, y- and z-axes, respectively.
Assume that a = (cos ωx)², b = (cos ωy)² and c = (cos ωz)². Then x is

x = [ r ± r sqrt( 1 − (1 + abc)(1 − c + bc − abc) ) ] / (1 + abc)   (4)
Substituting Equation (4) into the remainder of Equations (1), (2) and (3), the solutions for y and z become explicitly available:

y = sqrt(r² − z²) sin ωz,   z = sqrt(r² − x²) sin ωx   (5)
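As an illustration, the closed-form solution above can be evaluated directly. The following sketch (the function name, the choice of the minus root in Eq. (4), and the clamping guards are our assumptions, since the ± sign in Eq. (4) is left open) computes x from Eq. (4) and then z and y from Eq. (5):

```python
import math

def forearm_position(r, wx, wy, wz):
    """Wrist-sensor position from Euler angles, following Eqs. (1)-(5).

    r is the elbow-to-sensor distance (known a priori); wx, wy, wz are
    the Euler angles about the x-, y- and z-axes [rad].  The minus root
    of Eq. (4) is chosen so that x stays near 0 for small angles
    (this sign choice is our assumption).
    """
    a, b, c = math.cos(wx) ** 2, math.cos(wy) ** 2, math.cos(wz) ** 2
    disc = 1.0 - (1.0 + a * b * c) * (1.0 - c + b * c - a * b * c)
    disc = max(disc, 0.0)                                  # guard against rounding
    x = r * (1.0 - math.sqrt(disc)) / (1.0 + a * b * c)    # Eq. (4)
    z = math.sqrt(max(r * r - x * x, 0.0)) * math.sin(wx)  # Eq. (5)
    y = math.sqrt(max(r * r - z * z, 0.0)) * math.sin(wz)  # Eq. (5)
    return x, y, z
```

With all angles zero the sensor sits at the origin of the local frame, and a small rotation about the x-axis moves it along z, as the geometry of Figure 2 suggests.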
The solutions for x, y and z rely on the estimated Euler angles, which are obtained by integrating the collected turning-rates [7]. Owing to the inertial properties, the angles are usually accompanied by noise or drift. These errors may reach 5° or more, which can significantly bias the estimated 3-D positions. One example is shown in Figure 3, where a significant discrepancy can be seen between the x-axis position estimated by a standard motion tracker and by our method. A real-time filter is therefore needed to restore the true data.
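The drift mechanism is easy to reproduce: integrating a rate signal that carries even a small constant bias produces an angle error that grows linearly with time. The toy sketch below is our own illustration, not the MT9's actual signal chain:

```python
def integrate_rate(rates, dt, bias=0.0):
    """Euler-integrate sampled turning-rates into an angle.

    `bias` models a constant sensor offset; the resulting drift grows
    linearly with time, which is why a filtering stage is needed.
    """
    angle, history = 0.0, []
    for w in rates:
        angle += (w + bias) * dt
        history.append(angle)
    return history

# With zero true motion, a 0.01 rad/s bias drifts 0.6 rad over 60 s:
drift = integrate_rate([0.0] * 6000, dt=0.01, bias=0.01)[-1]
```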
3. Real-time filtering

We intend to reconstruct a true data point u from its observation ũ (e.g. the x, y and z positions):

ũ = u + ε   (6)

where ε is noise or an unknown error. A common denoising technique is to minimise a functional of the gradient:

min_u F_{ε,p}(u),   F_{ε,p}(u) = ∫_Ω |∇_ε u|^p dx + λ ||ũ − u||²   (7)

where λ ≥ 0 is a Lagrange multiplier and ε is a regularization coefficient:

|∇_ε u| = (|∇u|² + ε²)^{1/2}   (8)

To solve the minimisation problem (7), we use the Euler-Lagrange equation:

u_t = ∇ · ( ∇u / |∇_ε u|^{2−p} ) + β(ũ − u)   (9)

where β (= 2λ/p) is the constraint parameter (indicating the descent direction), and λ is obtained by taking the derivative of Equation (7) with respect to u and setting it to zero. The required derivatives are as follows:
Figure 3. Discrepancy between the estimates by a standard motion tracker and our method, respectively.

∂x/∂ωx = 2rab²c² sin ωx / (1 + a²b²c²)² − 2rab²c² d sin ωx / (1 + a²b²c²)² + rab²c² sin ωx (−c² + b²c² − 2a²b²c²) / ((1 + a²b²c²) d)

∂x/∂ωy = 2ra²bc² sin ωy / (1 + a²b²c²)² − 2ra²bc² d sin ωy / (1 + a²b²c²)² + ra²bc² sin ωy (1 − c² + b²c² − a²b²c²) / ((1 + a²b²c²) d) + r(−bc² sin ωy + a²bc² sin ωy) / d

∂x/∂ωz = 2ra²b²c sin ωz / (1 + a²b²c²)² − 2ra²b²c d sin ωz / (1 + a²b²c²)² + ra²b²c sin ωz (1 − c² + b²c² − a²b²c²) / ((1 + a²b²c²) d) + r(c sin ωz − b²c sin ωz + a²b²c sin ωz) / d

∂z/∂ωx = −(∂x/∂ωx) x sin ωx / sqrt(r² − x²) + sqrt(r² − x²) sqrt(a)
∂z/∂ωy = −(∂x/∂ωy) x sin ωx / sqrt(r² − x²)
∂z/∂ωz = −(∂x/∂ωz) x sin ωx / sqrt(r² − x²)

∂y/∂ωx = −(∂z/∂ωx) z sin ωz / sqrt(r² − z²)
∂y/∂ωy = −(∂z/∂ωy) z sin ωz / sqrt(r² − z²)
∂y/∂ωz = −(∂z/∂ωz) z sin ωz / sqrt(r² − z²) + sqrt(r² − z²) sqrt(c)   (10)
where d = 1 − (1 + a²b²c²)(1 − c² + b²c² − a²b²c²). When p = 1 and ε = 0, Equation (7) becomes a total variation (TV) problem [4]. However, noise or outliers often exist in the estimation because of irregular motion, the soft-skin effect, etc. These points may mislead the solution of Equation (7) to an incorrect position. To remove such errors before computing, we apply a smoothing scheme: within a small neighbourhood, a point whose Euclidean distance from the weight centre of the region exceeds twice the average variance is considered an error and removed from the computation list. For efficiency, the size of the neighbourhood is 5-7 points. This filtering strategy is illustrated in Figure 4, where square 7 is the filtered output of circle 7 given the previous seven points, and the dotted circle encloses the inliers. To conduct a fast minimization of Equation (7), we use the iterative Levenberg-Marquardt (L-M) algorithm. Although L-M only seeks a locally optimal solution, we expect it to converge to the correct solution by setting the starting point to the weight centre of the investigated point area.
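The neighbourhood-based outlier rejection can be sketched as follows. The function and parameter names are ours; the paper specifies only the 5-7-point neighbourhood and the twice-the-average threshold, so the exact windowing is an assumption:

```python
def reject_outliers(points, window=6, factor=2.0):
    """Sliding-neighbourhood outlier rejection (a sketch of the Section 3
    smoothing scheme; `window` and `factor` are our parameter names).

    A point whose squared distance from the weight centre of its small
    neighbourhood exceeds `factor` times the neighbourhood average is
    treated as an error and removed from the computation list.
    """
    kept = []
    for p in points:
        region = kept[-(window - 1):] + [p]
        if len(region) < 4:            # not enough context to judge yet
            kept.append(p)
            continue
        centre = [sum(c) / len(region) for c in zip(*region)]
        d2 = [sum((qi - ci) ** 2 for qi, ci in zip(q, centre)) for q in region]
        mean_d2 = sum(d2) / len(d2)
        if mean_d2 == 0.0 or d2[-1] <= factor * mean_d2:
            kept.append(p)             # inlier: keep
        # else: outlier, dropped before the minimisation step
    return kept
```

A gross jump far from the local weight centre is rejected, while points on a smooth trajectory pass through unchanged.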
4. Experimental work

To evaluate the performance of the filter-on kinematic model against the filter-off technique, we make use of a commercial human motion tracking system ("Qualysis"). This
Figure 4. Illustration of the denoising strategy.
system provides absolute positions of the human arm during movement. Qualysis uses retro-reflective ball markers that can be identified by the cameras surrounding the subject. It directly reconstructs the 3-D positions of the moving human limbs once proper calibration is achieved. To set up the experimental environment, a Qualysis marker is attached to one side of the wrist joint while the MT9 inertial sensor is mounted on the other side. This arrangement is more accurate than the configuration where the marker and the inertial sensor are both placed on the same side. The distance between the elbow joint (fixed point) and the centre of the sensor is 24 cm. Figure 5 illustrates the experimental set-up, in which the Qualysis system uses 3 video cameras to observe upper-limb movements via the ball marker. Before the experiments start, we run a calibration to align the coordinate system of the inertial sensor with that of the Qualysis system. Inertial measurements are obtained by the following procedure. The MT9 inertial sensor collects data and pre-filters it to remove high-frequency noise. This is followed by computing the 3-D position and orientation based on the proposed kinematic model. Due to the presence of noise and errors, a dynamic filter is applied to suppress them. This process runs continuously until it is manually terminated. In the experiments, the two systems work independently and asynchronously, so we cannot directly compare their 3-D motion estimates. A simple way around this problem is a "least-squares fitting" method, which fits the individual estimates to one-cycle curves (see Figures 6 and 7). By doing this, one can easily judge which performance is better. A number of repeated motion patterns have been captured in order to obtain comprehensive comparisons.
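A minimal version of such a least-squares fit, assuming a sinusoidal one-cycle basis (the paper does not specify the curve model, so the basis and the function name are our assumptions), could look like:

```python
import numpy as np

def fit_one_cycle(t, z, period):
    """Least-squares fit of samples z(t) to one cycle of an assumed model
    z ~ a*sin(2*pi*t/T) + b*cos(2*pi*t/T) + c.

    Returns the fitted coefficients and a callable curve, so that
    estimates from two asynchronously sampled systems can be compared
    on a common time base.
    """
    w = 2.0 * np.pi / period
    A = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef, lambda tt: (coef[0] * np.sin(w * tt)
                             + coef[1] * np.cos(w * tt) + coef[2])
```

Fitting each system's samples separately and evaluating the two curves at the same time instants gives the per-axis errors reported in the next section.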
In this paper, only two cases are investigated: (1) up-down motion, and (2) cyclic rotation in 3-D space. It should be noted that the elbow joint is kept stationary on the testing bench during the forearm movements. Case 1: up-down movements. In this experiment, the up-down forearm motion is captured by both the MT9 inertial sensor and the Qualysis tracking system. Figure 6 shows the captured 3-D motion trajectories of the forearm movements. Note that the solid line is the Qualysis data, the dashed line is the inertial data after filtering, and the dotted line is the inertial data without filtering. These results are the mean values produced by the weighted least-squares calculation. The mean error between the Qualysis data and the inertial data after filter-
Figure 5. Illustration of the experimental set-up.
ing is 1.09 cm (SD: 0.47 cm), while the mean error between the Qualysis data and the inertial data without filtering is 1.52 cm (SD: 1.05 cm). Case 2: cyclic rotation. Figure 7 shows the 3-D motion trajectories of this motion style; the symbols follow the earlier description. The mean error between the Qualysis data and the inertial data after filtering is 1.73 cm (SD: 0.93 cm), while the mean error between the Qualysis data and the inertial data without filtering is 2.27 cm (SD: 1.56 cm).
5. Conclusions and future work

We have presented a wearable inertial-sensing-based tracking system that integrates the kinematics of human arm movements and a dynamic filtering strategy. Compared to the commercial marker-based tracking system Qualysis, our system is able to deliver real-time human forearm motion estimation with a simple set-up, and it has achieved reasonably accurate results. Future work will extend the ideas presented here to implement a portable device that handles real three-joint arm movements with higher degrees of freedom. The kinematic model of the upper arm is the same as that of the forearm, although the fixed point is then located at the shoulder. Another MT9 sensor will be used to capture this motion.
Acknowledgements

This work is part of the SMART Rehabilitation Project funded by the UK EPSRC under Grant GR/S29089/01. We are grateful to Ms Yaqin Tao for helping us to set up the experiments.
References

[1] J.H. Cauraugh, S. Kim, Two coupled motor recovery protocols are better than one: electromyogram-triggered neuromuscular stimulation and bilateral movements, Stroke 33 (2002), 1589–1594.
Figure 6. Case 1 - Up-down motion trajectories: the “Qualysis" data (solid line - 1), the inertial data after filtering (dashed line - 2) and the inertial data without filtering (dotted line - 3).
Figure 7. Case 2 - Cyclic motion trajectories: the “Qualysis" data (solid line - 1), the inertial data after filtering (dashed line - 2) and the inertial data without filtering (dotted line - 3).
[2] H.I. Krebs, B.T. Volpe, M.L. Aisen, N. Hogan, Increasing productivity and quality of care: robot-aided neuro-rehabilitation, Journal of Rehabilitation Research and Development 37 (2000), 639–652.
[3] A. Mihailidis, B. Carmichael, J. Boger, The use of computer vision in an intelligent environment to support aging-in-place, safety, and independence in the home, IEEE Transactions on Information Technology in Biomedicine 8 (2004), 238–247.
[4] L. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60 (1992), 259–268.
[5] The Stroke Association, Speaking out about stroke services, London: The Stroke Association, 2001.
[6] H. Zheng, N.D. Black, N.D. Harris, Position-sensing technologies for movement analysis in stroke rehabilitation, Medical & Biological Engineering & Computing 43 (2005), 413–420.
[7] H. Zhou, H. Hu, Inertial motion tracking of human arm movements in home-based rehabilitation, Proc. of IEEE Int. Conf. on Mechatronics and Automation, Niagara Falls, Canada, 2005.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Learning and control model of arm posture

K.S. Kim a, H. Kambara a, D. Shin c, M. Sato b and Y. Koike b,c
a Department of CISS, Tokyo Institute of Technology
b Precision and Intelligence Laboratory, Tokyo Institute of Technology
c CREST, Japan Science and Technology Agency
Abstract. We introduce a learning and control model of arm posture that can be useful as a control model for a humanoid robot, such as a muscle-driven robot, in terms of generating proper tensions for each muscle. Generating proper tensions for controlling arm posture is difficult because the problem is redundant: many combinations of tensions produce the same joint torque. In simulations using the model, we obtained proper and energy-efficient tensions needed to maintain the arm at goal positions. The model uses the Actor-Critic method and the feedback-error-learning scheme, which are considered control models that the human brain uses.

Keywords. Learning, Posture control, Muscle-driven robot, Simulation, Actor-Critic method, Feedback-error learning
1. Introduction

When talking about robots, many people may imagine a motor-driven robot that has motors at the joints for making movements. However, much research has been performed on humanoid robots whose architecture is similar to that of humans [1][2]. One good example is a muscle-driven robot, where movements are made by controlling muscles attached in parallel to the limbs across the joints. However, it is hard to decide proper tensions for each muscle when controlling arm posture, since the problem of generating the tensions is redundant. Here, we acquired proper and energy-efficient tensions in a simulation of arm posture control, using the model previously proposed by Kambara et al. [3] with an energy-efficiency reward added to the Actor-Critic method [4]. The model uses the Actor-Critic method and the feedback-error-learning scheme [5]. The Actor-Critic method is one of the major frameworks in reinforcement learning. Reinforcement learning is a learning method that acquires suitable actions based on the reward given through trial-and-error interactions with a dynamic environment; it learns actions so as to obtain higher reward. Feedback-error learning, proposed by Kawato et al. [5], is a scheme that learns the outputs of an inverse dynamics model (IDM) for feedforward control so that the outputs of the feedback controller become zero. In the case of arm posture control, the IDM corresponds to the arm model in our brain that calculates the motor commands necessary to maintain the arm at a desired position. In the simulation, a two-link arm model with six muscles, shown in Fig. 2, was used as the controlled object in Fig. 1. The arm model was also used to transform
Figure 1. Learning and control model using Feedback-Error-Learning and the Actor-Critic method
motor commands to muscle tensions and joint torques. In this paper, we show the energy-efficient motor commands acquired from the learning and discuss the possibility of using Fig. 1 as a control model for a muscle-driven robot.
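Before the components are detailed in Section 2, the feedback-error-learning principle behind Fig. 1 can be illustrated with a toy one-dimensional sketch. The plant, gains, and learning rate below are our own assumptions, not values from the paper:

```python
# Toy 1-D feedback-error learning: the feedforward weight W is trained
# using the feedback command u_fb as its error signal, which drives the
# feedback controller's output toward zero.
K_PLANT = 2.0   # assumed static plant: theta = K_PLANT * u_ff
K_FB = 0.5      # assumed feedback (Actor-like) proportional gain
ALPHA = 0.1     # assumed learning rate of the feedforward model

W = 0.0         # feedforward weight; the plant inverse is 1 / K_PLANT
theta_d = 1.0   # desired position
for _ in range(200):
    u_ff = W * theta_d               # feedforward command (ISM role)
    theta = K_PLANT * u_ff           # plant response
    u_fb = K_FB * (theta_d - theta)  # feedback corrects the residual
    W += ALPHA * u_fb * theta_d      # feedback error trains the model
```

After training, W approaches the plant inverse (0.5 here) and u_fb approaches zero, which is exactly the behaviour the ISM/Actor pair of Fig. 1 is designed to reach.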
2. Method

2.1. Process of the learning and control model

The learning and control model is shown in Fig. 1. We adopted the Actor, the controller of the Actor-Critic method, as the feedback controller in the feedback-error-learning scheme. In this simulation, the trajectory space, θ = (θs, θe, θ̇s, θ̇e), consists of the angles and angular velocities of the shoulder and elbow joints. However, the angular velocities in the desired trajectory space, θd = (θs, θe, 0, 0), are always set to 0 because the task is to maintain the arm at a desired position. We therefore call the feedforward controller in Fig. 1 the inverse statics model (ISM), to distinguish it from an IDM involving angular velocity and acceleration. As shown in Fig. 1, the ISM receives the desired trajectory θd and outputs feedforward motor commands uff for the six muscles. Meanwhile, the Actor receives the trajectory error θe between the desired trajectory θd and the actual trajectory θ, and produces feedback motor commands ufb for the six muscles. The sum of uff and ufb is fed to the controlled object as the motor command u. Then the actual trajectory θ is updated according to the dynamics of the controlled object, and the reward r, a function of the updated trajectory, is calculated. Based on the reward, a reinforcement signal δ, called the temporal-difference (TD) error, is calculated [6]. Improvements of Actor and Critic are carried out using δ. At the same time, the output of the Actor, ufb, is used as the error signal for improving the ISM. It must be noted that this error signal ufb does not work correctly until the improvement of the Actor reaches a certain level. For this reason, the Gate, influenced by the TD error δ, was
Figure 2. Arm model (sh flx: shoulder flexor; sh ext: shoulder extensor; el flx: elbow flexor; el ext: elbow extensor; dj flx: double-joint flexor; dj ext: double-joint extensor).
made in Fig. 1 so that we can properly adjust the rate at which ufb influences the ISM. This idea comes from the fact that the magnitude of the TD error decreases as the improvement of Actor and Critic advances, which is one of the properties of reinforcement learning.

2.2. Implementations of ISM, Actor-Critic and Gate

We used TD(λ) learning and continuous Actor-Critic methods for the implementation of Actor-Critic [6]. Normalized Gaussian Networks (NGnet) [6] were used as function approximators for the Actor-Critic and the ISM. The network of the ISM is as follows:

u_j^ff(t) = u_j^ff_max · g( Σ_{i=1}^{N} q_ji b_i(θd(t)) )   (1)

b_i(θd(t)) is a normalized Gaussian basis function [6]. N = 225 is the number of basis functions. q_ji is the weight from the ith basis function to the jth output. u_j^ff_max is the maximum value of the jth motor command of the ISM, which is set to 1. g() is a sigmoid function. The ISM is updated according to the following rule:

q̇_ji = α u_j^fb(t) b_i(θd(t))   (2)

α is a learning coefficient that changes depending on the TD error. The Gate increases α as the TD error decreases and is expressed as follows:

α = α0 exp( −|δ̂|² / s_lr² )   (3)

|δ̂| is the average of the absolute value of δ(t) over the preceding trial. α0 = 0.08 and s_lr = 0.01 are constants.

2.3. Simulation task and controlled object

The learning and control model was tested on posture control of the upper limb in a sagittal plane. We used the two-link arm with six muscles shown in Fig. 2. The muscle
Table 1. Moment arms [m]

          Around shoulder   Around elbow
sh flx         0.04              0
sh ext        -0.04              0
el flx         0                 0.025
el ext         0                -0.025
dj flx         0.028             0.028
dj ext        -0.035            -0.035
model can be written as Eq. (4), expressing the muscle as an elastic element and a viscous element arranged in parallel [7]. Each muscle is designed as a nonlinear actuator whose tension T depends on the motor command and on the length and contraction velocity of the muscle:

T_j = (k0^(j) + k1^(j) u_j)( l0^(j) + l1^(j) u_j − Σ_{i=1}^{2} a_ij θ_i ) − (b0^(j) + b1^(j) u_j) Σ_{i=1}^{2} a_ij θ̇_i,   j = 1, 2, ..., 6   (4)
Here, u_j is the motor command to each muscle; k, b and l are the elasticity, viscosity and muscle-length coefficients, respectively; j is the muscle index and i is the joint index. Since the controlled object is a two-link arm with six muscles, i runs over 2 joints and j over 6 muscles. a is a moment arm, θ is the joint angle and θ̇ is the joint angular velocity. The coefficients were determined and adjusted for the simulation task based on the data from [3] and [7]; their values are shown in Tables 1 and 2.

τ_i = Σ_{j=1}^{6} a_ij T_j,   i = 1, 2   (5)

The joint torque τ is calculated using Eq. (5), where i and j are again the joint and muscle indices. The simulation task is to learn motor commands that bring the hand to a desired position. Once the motor commands are determined from the outputs of the ISM and the Actor, muscle tensions and torques can be calculated from Eqs. (4) and (5).

2.4. Reward

We now explain the reward r in Fig. 1. Again, note that reinforcement learning acquires actions so as to obtain higher reward; it is therefore important to set up reward functions that take their maximum value at the goal. In this task, the goal is to acquire motor commands that are energy efficient and maintain the hand at a desired position. First, we determined a reward for arm posture rp(t) as follows:

rp(t) = 2 exp( −d(t)² / s_r² ) − 0.5   (6)
Table 2. Muscle parameters

          k0 [N/m]  k1 [N/m]  b0 [Ns/m]  b1 [Ns/m]  l0 [m]  l1 [m]
sh flx     1000      6000       50         100      0.150   0.150
sh ext     1000      4000       50         100      0.055   0.150
el flx      600      2800       50         100      0.100   0.150
el ext      600      2400       50         100      0.040   0.150
dj flx      300      1200       50         100      0.250   0.150
dj ext      300      1200       50         100      0.130   0.150

k1 and b1 are the elastic and viscosity coefficients, respectively; k0 and b0 are the intrinsic elasticity and viscosity; l0 is the intrinsic rest length; l1 is a constant.
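Under the stated muscle model, Eqs. (4) and (5) with the Table 1-2 parameters can be evaluated as in the sketch below (the function names are ours):

```python
def tension(u, thetas, dthetas, k0, k1, b0, b1, l0, l1, a):
    """Muscle tension, Eq. (4):
    T = (k0 + k1*u)(l0 + l1*u - sum_i a_i*theta_i)
        - (b0 + b1*u) * sum_i a_i*dtheta_i
    """
    stretch = l0 + l1 * u - sum(ai * th for ai, th in zip(a, thetas))
    velocity = sum(ai * dth for ai, dth in zip(a, dthetas))
    return (k0 + k1 * u) * stretch - (b0 + b1 * u) * velocity

def joint_torques(tensions, moment_arms):
    """Joint torques, Eq. (5): tau_i = sum_j a_ij * T_j, i = 1, 2."""
    return [sum(a[i] * T for a, T in zip(moment_arms, tensions))
            for i in range(2)]

# Shoulder flexor at half activation, arm at rest (Table 1-2 values):
T_sh_flx = tension(0.5, (0.0, 0.0), (0.0, 0.0),
                   k0=1000, k1=6000, b0=50, b1=100,
                   l0=0.150, l1=0.150, a=(0.04, 0.0))
tau = joint_torques([T_sh_flx], [(0.04, 0.0)])
```

The remaining five muscles follow the same pattern with their own rows of Tables 1 and 2; summing all six tensions through Eq. (5) yields the shoulder and elbow torques.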
In Eq. (6), d(t) [m] is the distance error between the desired hand position and the actual one, and s_r is a constant that adjusts how severely the distance error is evaluated; its value is set to 0.1. In this simulation, rp(t) gives more reward as the hand approaches the desired position. Next, we determined a penalty ru_actor(t) proportional to the square of the Actor output ufb(t):

ru_actor(t) = ku_actor Σ_{j=1}^{6} u_j^fb(t)²   (7)

where the coefficient ku_actor is 0.3 and j is the muscle index. The role of this penalty is to drive the motor commands of the Actor to near 0 once an accurate ISM has been acquired. Since maintaining a desired arm position can be carried out successfully only with an accurate ISM, the outputs of the Actor need to become 0. Finally, we set another penalty ru_total(t) proportional to the square of the sum u(t) of the Actor output and the ISM output:

ru_total(t) = ku_total Σ_{j=1}^{6} u_j(t)²   (8)

where the coefficient ku_total is 0.05 and j is the muscle index. Even if the outputs of the Actor and the distance error of the hand are 0, Eq. (8) penalises a large u(t), which is bad in terms of energy efficiency. As a result, energy-efficient motor commands can be acquired by adding the penalty of Eq. (8) to the overall reward r(t), which is expressed as follows:

r(t) = rp(t) − ru_actor(t) − ru_total(t)   (9)
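The overall reward of Eqs. (6)-(9), with the coefficients given in the text, can be sketched as:

```python
import math

S_R = 0.1        # s_r in Eq. (6)
K_ACTOR = 0.3    # k_u_actor in Eq. (7)
K_TOTAL = 0.05   # k_u_total in Eq. (8)

def reward(d, u_fb, u):
    """Overall reward, Eqs. (6)-(9).

    d    : hand distance error [m]
    u_fb : 6-element Actor command vector
    u    : 6-element total command vector
    """
    r_p = 2.0 * math.exp(-(d ** 2) / S_R ** 2) - 0.5    # Eq. (6)
    r_actor = K_ACTOR * sum(v ** 2 for v in u_fb)       # Eq. (7)
    r_total = K_TOTAL * sum(v ** 2 for v in u)          # Eq. (8)
    return r_p - r_actor - r_total                      # Eq. (9)
```

With zero distance error and zero commands, the reward takes its maximum value of 1.5; large total commands reduce it even when the hand is exactly on target, which is what drives the learning toward energy-efficient solutions.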
2.5. Simulation conditions during the learning

Each trial continues until the simulation time reaches 2 seconds or until the updated θ exceeds the boundary range: (θs [deg], θe [deg], θ̇s [deg/sec], θ̇e [deg/sec]) = (-120 ∼ 0, 0 ∼ 130, -900 ∼ 900, -900 ∼ 900). θs is the shoulder angle measured from the horizontal axis to the upper arm; θe is the elbow angle from the extended line of the upper arm to the lower arm. At the beginning of every trial during the learning, one desired and one initial position are chosen randomly from the following range: (θs, θe) = (-90 ∼
Figure 3. Reward and Distance errors were averaged and filtered when ru_total (t) was set to 0 in the overall reward during the learning.
Figure 4. Reward and Distance errors were averaged and filtered when ru_total (t) is included in the overall reward during the learning
-50, 30 ∼ 90) [deg] (see the "· · ·" ranges in Figs. 5 and 7). The chosen desired position is fixed during one trial. Calculation of motor commands, update of the trajectory, and improvement of the ISM and Actor-Critic during a trial are all performed every 0.01 second.
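The ISM and Gate machinery of Section 2.2 can be sketched for a single motor command and a one-dimensional state. The basis centres, width, and network size below are toy choices of ours; the paper uses N = 225 bases over a 4-D state:

```python
import math

CENTRES = [0.0, 0.5, 1.0]   # toy basis centres (our choice)
SIGMA = 0.5                 # toy basis width (our choice)
U_MAX = 1.0                 # u_ff_max from the text

def bases(theta):
    """Normalized Gaussian basis functions b_i(theta); they sum to 1."""
    g = [math.exp(-((theta - c) / SIGMA) ** 2) for c in CENTRES]
    s = sum(g)
    return [gi / s for gi in g]

def ism_output(q, theta_d):
    """Eq. (1): u_ff = u_ff_max * sigmoid(sum_i q_i * b_i(theta_d))."""
    net = sum(qi * bi for qi, bi in zip(q, bases(theta_d)))
    return U_MAX / (1.0 + math.exp(-net))

def gated_update(q, theta_d, u_fb, delta_hat, alpha0=0.08, s_lr=0.01):
    """Eqs. (2)-(3): the Gate shrinks the learning rate alpha while the
    averaged TD error delta_hat is still large."""
    alpha = alpha0 * math.exp(-(delta_hat ** 2) / s_lr ** 2)
    return [qi + alpha * u_fb * bi for qi, bi in zip(q, bases(theta_d))]
```

With a large averaged TD error the gated learning rate collapses to zero and the weights are effectively frozen; as the Critic improves and the TD error shrinks, the feedback command is allowed to train the ISM.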
3. Result

Figs. 3 and 4 show the averaged reward r(t) and the distance errors d(t) from the desired positions over 30000 trials. The values in the figures were calculated as follows: (1) the averages of r(t) and d(t) during one trial were computed, and this calculation was repeated for 30000 trials; (2) the result was passed through a low-pass filter with a 100-trial time constant for smoothing. As shown in the figures, the rewards increase toward the maximum value, 1, and the distance errors decrease toward zero as the trials proceed. From these results, we can see that the proper motor commands needed to maintain arm posture are acquired whether or not the energy-efficiency reward ru_total(t) is included in Eq. (9) during the learning. To check the accuracy of maintaining arm posture, we investigated the arm posture after controlling the controlled object for 2 s. Weight files of the controllers (ISM and Actor) at the 20000th trial were used for the test; trial 20000 was chosen because the learning is considered adequate by then, judging from Figs. 3 and 4. Each start position during the test was set equal to each goal position. Figs. 5 and 7 show the results for each reward method. Each "∗" and "◦" indicates the desired and the actual hand position, respectively, and "· · ·" represents the range of desired positions used during the learning. These results indicate that arm posture control with the feedback-error-learning scheme was achieved successfully over various positions in the desired range (the desired and actual hand positions in Figs. 5 and 7 overlap). On the other hand, there is a clear difference between the motor commands acquired with and without ru_total(t) during the learning, as shown in Figs. 6 and 8. To compare the differences quantitatively, the summed motor commands and tensions of the six muscles are used. When ru_total(t) is set to 0 in the overall reward during the learning, the summed motor commands and tensions are 1.93 and 1296 [N].
When ru_total (t) is included in the overall reward during the learning, the summed motor commands and tensions are 0.56 and 671[N]. Therefore, it can be said
Figure 5. Arm positions after controlling the arm model in 2[s] using ISM and Actor: ru_total (t) was set to 0 in the overall reward during the learning.
Figure 6. Motor commands to six muscles and distance error at (-70,70)[deg] when using ISM and Actor: ru_total (t) was set to 0 in the overall reward during the learning.
Figure 7. Arm positions after controlling the arm model in 2[s] using ISM and Actor: ru_total (t) was included in the overall reward during the learning.
Figure 8. Motor commands to six muscles and distance error at (-70,70)[deg] when using ISM and Actor: ru_total (t) was included in the overall reward during the learning.
that energy-efficient motor commands are acquired by considering ru_total(t) during the learning.
4. Discussion

The learning and control model [3] with ru_total(t) included in Eq. (9) during the learning can be considered useful as a control model for a muscle-driven robot in terms of energy efficiency. It is also natural for a control model of humanoid robots to use small motor commands, since humans tend to control their bodies with less force in well-practised tasks. Another advantage of using this model in a muscle-driven robot is that proper tensions can be acquired through learning automatically and safely, avoiding situations in which the robot continuously outputs large motor commands. This is possible because the out-
put of large motor commands lowers the overall reward when Eq. (8) is included during the learning. On the other hand, some problems can occur when applying this model to a real robot. First, how should the goal position given in Cartesian coordinates be converted into the goal angles in body coordinates that the model requires as input signals? The goal angles are redundant for a single goal position if the arm has 7 DOFs like a human arm, rather than the model of Fig. 2. Secondly, delays in sensory information and in generating tensions from the motor commands can make the model learn wrong motor commands, because such delays interfere with the correct evaluation of the reward corresponding to the current motor commands. Therefore, in order to apply this model to a muscle-driven robot with fewer than 7 DOFs, one possible solution is to extend the time interval, set to 0.01 [s] in this simulation, so that it is long enough for the state caused by the current motor commands to stabilise and for the sensory information to be obtained correctly. Another solution would be to use a forward dynamics model to estimate, from the sensory information available at present, the state that the current motor commands will produce in the future; the reward for the current motor commands could then be evaluated correctly.
5. Conclusion

We introduced a learning and control model of arm posture in which an energy-efficiency reward ru_total(t) was added to a previously proposed model [3] during the learning. Comparing the motor commands acquired with the original reward method and with the new reward method, arm posture control with the new method was achieved successfully with fewer motor commands and the same accuracy as observed with the original method. We also discussed the advantages of the new method in terms of energy efficiency, safety when applying it to a humanoid robot, and its naturalness as a control model that humans use. As a next step, we plan to apply this human-like model to a muscle-driven arm, taking into consideration the points raised in the discussion.
References

[1] M. Wisse and J. van Frankenhuyzen, "Design and construction of Mike; a 2D autonomous biped based on passive dynamic walking," Proc. 2nd International Symposium on Adaptive Motion of Animals and Machines, Kyoto, Japan, 2003.
[2] I. Mizuuchi, T. Yoshikai, Y. Nakanishi, and M. Inaba, "A Reinforceable-Muscle Flexible-Spine Humanoid "Kenji"," Proc. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 692–697, 2005.
[3] H. Kambara, J. Kim, M. Sato and Y. Koike, "Learning Arm's Posture Control using Reinforcement Learning and Feedback-Error-Learning," Proc. 26th Annual International Conference of the IEEE EMBS, pp. 486–489, 2004.
[4] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[5] M. Kawato, K. Furukawa and R. Suzuki, "A hierarchical neural-network model for control and learning of voluntary movement," Biological Cybernetics, 53, pp. 57–66, 1987.
[6] K. Doya, "Reinforcement Learning in Continuous Time and Space," Neural Computation, 12, pp. 219–245, 2000.
[7] M. Katayama and M. Kawato, "Virtual trajectory and stiffness ellipse during multijoint arm movement predicted by neural inverse models," Biological Cybernetics, 69(5-6), pp. 353–362, 1993.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Competitive Learning Method for Robust EMG-to-Motion Classifier Ryu KATO, Hiroshi YOKOI, Tamio ARAI Intelligent Systems Division, Dept. of Precision Engineering, Graduate School of Engineering, The University of Tokyo, JAPAN
Abstract. This paper describes a competitive learning method for a robust EMG-to-motion classifier that adjusts to changes in the user's characteristics. The method rests on the assumptions that input motions are continuous and that taught motions are inherently ambiguous; under these assumptions, automatic addition, elimination, and selection of learning data become possible. Applying the proposed method, we conducted experiments discriminating eight forearm motions; a stable and high discrimination rate was achieved and maintained even when the user's characteristics changed. Keywords. EMG, Mutual Adaptation, ADL, Motion discrimination, f-MRI
Introduction
The surface electromyogram (EMG) is the electric potential detected at the skin surface during muscle contraction. Research on estimating human motion from the surface EMG can be applied to control signal sources for prostheses, and further innovation in this field will enable disabled people to perform basic ADL (Activities of Daily Living), fulfilling their physical and psychological needs. Many researchers have proposed EMG-to-motion classifiers that acquire the relationship between an EMG signal pattern and the corresponding motion (what we term the "user's characteristics") by machine learning [1][2][3]. They generate learning data, a mapping between EMG signal patterns and desired motions, under the assumption that the feature space expressed by the EMG pattern is invariant over time. However, this hypothesis causes difficulties: the relationship between an EMG signal and the corresponding motion is not one-to-one, because it varies with time, so many motions are difficult to discriminate stably with conventional learning methods. One known cause is physiological (e.g., the change of skin impedance with perspiration, or muscle fatigue due to continuous contraction); another is physical (e.g., slippage of an EMG sensor on the skin surface). Note that the former exhibits a gradual change, whereas the latter exhibits a drastic change over time. As a solution, this paper proposes a competitive learning method that adjusts to both gradual and drastic changes in the user's characteristics. The method consists of three functions on the learning data: automatic elimination (AE), automatic addition (AA), and selective addition (SA).
AE and AA judge the discrimination state by monitoring the discrimination results at all times, and adjust to gradual change by eliminating or adding learning data according to the judgment; the judging criterion is continuity of motion. SA helps adjust to drastic change by adding new learning data instructed by the user, according to the complexity of the latest learning data. To investigate the effectiveness of the proposed method, we conducted an experiment with amputee and non-amputee subjects discriminating eight forearm motions with the system, and analyzed brain function by f-MRI for an amputee to investigate the effect of learning on the user.
Overview of EMG-to-motion classifier system
The outline of the EMG-to-motion classifier system in our research [4] is shown in Fig. 1. Following the general methodology of pattern recognition, the system employs two processes: feature extraction and recognition. The feature extraction process generates a feature vector from the EMG signals; this vector expresses the user's motion pattern. In the proposed method, the average rectified value and the frequency components of the EMG signal are employed as components of the feature vector, represented as X (≡ {x1,…,xD}). The recognition process discriminates a desired motion θ from X. The discrimination method must adaptively acquire the user's characteristics (the relationship between X and θ) so that it can deal with individual variation. Therefore, the recognition process employs a three-layered feed-forward neural network (NN) with an embedded learning mechanism as the nonlinear mapping function (input-hidden-output units: D-42-M). The number of units in the output layer equals the total number of discriminated motions M. Let the output layer be expressed as the output vector O (≡ {O1,…,OM}, Ok ∈ [0.0, 1.0]) and the discriminated motion as Θ (≡ {θ1,…,θM, φ}). When the output of exactly one output unit Ok exceeds the threshold τ, the motion θk corresponding to the k-th unit is selected as the discriminated motion. Otherwise, the process does not discriminate a motion, judging that no single motion can be discriminated from the current feature vector; we call this outcome "reject φ". Moreover, learning data are generated from X when the user executes a desired motion θ; this is done through the user's direct instruction to the system. The management mechanism for this learning data set is called the "user model".
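The thresholding rule of the recognition process (select motion k only when exactly one output unit exceeds τ, otherwise reject φ) can be sketched as below. The function name and the integer encoding of the reject outcome are ours, chosen for illustration.

```python
import numpy as np

REJECT = -1  # stands for the "reject phi" outcome; our own encoding

def discriminate(outputs, tau=0.5):
    """Sketch of the recognition rule: a motion k is selected only when
    exactly one NN output O_k exceeds the threshold tau; otherwise the
    classifier rejects (phi) rather than guessing."""
    above = np.flatnonzero(np.asarray(outputs) > tau)
    if len(above) == 1:
        return int(above[0])   # index of the discriminated motion
    return REJECT              # zero or several units fired: reject phi
```

The reject outcome is what makes the later automatic-addition step possible: intervals of φ flanked by a single confidently discriminated motion are exactly where new learning data are harvested.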
The user model stores newly acquired learning data ψ and updates the weights w of the NN using the stored learning data set Ψ so that only the output unit corresponding to the desired motion fires. It employs the back-propagation algorithm as the learning method; this learning process is executed whenever Ψ is modified.
Figure 1. Overview of the proposed EMG-to-motion classifier system (feature extraction from human EMG, recognition, and a user model with selective addition, automatic addition/elimination, and instruction of the desired motion)
Competitive learning method for learning data in the EMG-to-motion classifier
The conventional learning process has two problems for stable motion discrimination. First, it is difficult to terminate the process in a short time because the number of stored learning data increases drastically. Second, the discrimination rate decreases because the feature vectors corresponding to the desired motions change over time. To solve these problems, we designed a competitive learning method in which the stored learning data Ψ compete in real time (Fig. 1). The method modifies Ψ according to changes in the user's characteristics and re-learns the mapping for motion discrimination in each case. It consists of three functions: automatic elimination (AE), automatic addition (AA), and selective addition (SA) of Ψ. In the conventional user model, learning data were generated only through the user's interaction; our proposed method eliminates and adds learning data automatically by monitoring the discrimination results at all times, and furthermore adds new learning data selectively according to the complexity of the latest Ψ. To begin with, we define the learning data. The learning datum ψ and the stored learning data Ψ are represented as
\psi^{\theta} \equiv \langle X, \theta, c \rangle, \qquad \Psi^{\theta} \equiv \{ \psi^{\theta}_{1}, \psi^{\theta}_{2}, \ldots, \psi^{\theta}_{|\Psi^{\theta}|} \}, \qquad \Psi \equiv \bigcup_{k=1}^{M} \Psi^{\theta_{k}} \quad (1)
where X is a feature vector corresponding to the desired motion θ (∈ Θ−{φ}); c is the degree of contribution of the learning datum to discrimination (when the c of one learning datum is relatively larger than that of another, that datum ψ is the more important in the current situation); and Ψθ is the learning data set belonging to class θ. For the EMG-to-motion classifier to autonomously acquire effective learning data, the state of motion discrimination must be evaluated. We set the following criterion for judging the success or failure of the discrimination state: the interval over which a single motion θk is continuously discriminated is monitored; if the interval is shorter than a given time Tf, the discrimination is regarded as a failure, and if it is longer than a given time Ts, it is regarded as a success. This criterion is based on the assumption that a human motion cannot change in less than a given time. Laming reported that the simple reaction time of humans averages 0.22 s [5]; therefore the failure time Tf is set to 0.22 s, and the success time Ts is set longer than Tf. Now, the input-output pair of feature vector X and discrimination result θ at time t is defined as monitoring information ξt, and its set is expressed as Ξθ:
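The interval-based success/failure criterion just described can be sketched as a small judgment function. Tf = 0.22 s comes from the text; the value of Ts is our assumption, since the paper only requires Ts > Tf.

```python
def judge_interval(duration, t_f=0.22, t_s=0.8):
    """Judge a run of identical discrimination results by its duration
    (seconds): shorter than T_f -> failure, longer than T_s -> success,
    in between -> undecided. t_s=0.8 is an assumed value for illustration."""
    if duration <= t_f:
        return "failure"    # motion changed faster than human reaction time
    if duration >= t_s:
        return "success"    # stable run of a single motion
    return "undecided"
```

This judgment drives both AE and AA: failures penalize (and eventually eliminate) the learning data responsible, while stable successes reward them.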
\xi_{t} \equiv \langle X_{t}, \theta_{t} \rangle, \qquad \Xi^{\theta}(t_{1}, t_{n}) \equiv \{ \xi_{t_{1}}, \xi_{t_{2}}, \ldots, \xi_{t_{n}} \}, \qquad |\Xi^{\theta}(t_{1}, t_{n})| \equiv t_{n} - t_{1} + 1 \quad (2)

where t is the sampling time (t_1 < t_2 < \ldots < t_n).
c_{t+1} = \gamma\, c_{t} + f \quad (3)

\mathrm{AE}:\; c_{k}/c_{\max} < \varepsilon \;\; (0 \le \varepsilon \le 1) \;\Rightarrow\; \Psi^{\theta} \leftarrow \Psi^{\theta} - \{ \psi^{\theta}_{k} \}, \qquad c_{\max} \equiv \max_{\Psi^{\theta}}(c) \quad (4)
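The automatic elimination step of Eqs. (3)-(4) can be sketched as follows. We simplify by assuming one nearest ("hit") datum per update receives the reward/penalty f while every contribution decays by γ; the data representation and variable names are ours.

```python
def automatic_elimination(data, hit_index, f_value, gamma=0.9, eps=0.025):
    """Sketch of Eqs. (3)-(4). `data` is a list of (feature, motion, c)
    learning data of one class theta. Every contribution c decays by gamma;
    the datum most similar to the current input (hit_index) also receives
    the evaluation f in {-1, 0, +1}. Data whose contribution falls below
    eps * c_max are eliminated. A simplified sketch, not the authors' code."""
    updated = []
    for i, (x, theta, c) in enumerate(data):
        c_new = gamma * c + (f_value if i == hit_index else 0.0)  # Eq. (3)
        updated.append((x, theta, c_new))
    c_max = max(c for _, _, c in updated)
    # Eq. (4): keep only data still contributing relative to c_max
    return [d for d in updated if d[2] / c_max >= eps]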
In Eq. (3), γ (0<γ<1) is the damping rate, and f (−1, 0, or +1) is the evaluation function for discrimination, defined below. The smaller the value of c, the smaller the contribution of the learning datum to the latest discrimination. In Eq. (4), ε is the threshold for eliminating learning data; when ε=0, elimination is not executed. ck is the contribution value of the most recently evaluated learning datum ψk, and cmax is the highest contribution value in Ψθ. Equation (4) states that if the ratio of ck to cmax falls below ε, the corresponding ψk is eliminated. The evaluation function f should be designed so that the less necessary a learning datum is for discrimination, the smaller its value. Moreover, it is necessary to detect which learning datum causes which discrimination state (success or failure); it is reasonable to suppose that the learning datum most similar to the feature vector produced by the feature extraction process most strongly affects the discrimination state. Based on this concept and the criterion of motion continuity, f is designed as:

f(\Xi^{\theta}) \equiv
\begin{cases}
-1 & : |\Xi^{\theta}| \le T_f \;\wedge\; \psi^{\theta} = \arg\min_{\psi \in \Psi} (\| \psi_{X} - \xi_{X} \|) \;\; (\xi \in \Xi^{\theta}) \\
+1 & : |\Xi^{\theta}| \ge T_s \;\wedge\; \psi^{\theta} = \arg\min_{\psi \in \Psi} (\| \psi_{X} - \xi_{X} \|) \;\; (\xi \in \Xi^{\theta}) \\
\;\;0 & : \text{otherwise}
\end{cases} \quad (6)
where argmin_ψ(d) returns the learning datum ψ that minimizes the function d, and ψX and ξX denote the feature vectors in the learning datum and the monitoring information. When the discrimination state is judged a failure or a success, for every monitoring information ξ, the learning datum ψθ whose feature vector ψX is most similar to ξX receives a penalty (−1) or a reward (+1). In automatic elimination, θ=φ is ignored because it cannot be judged as success or failure in the discrimination process. Meanwhile, the automatic addition function generates a learning datum ψ from the monitoring information ξ. To realize automatic addition efficiently, the class ψθ of the added datum has to differ from the discrimination result ξθ, and a condition is required under which the discrimination result can be predicted accurately. Hence, a set of monitoring information Z (≡ {Ξ1,…,Ξn}) is defined as a defective discrimination state whose discrimination results consist only of reject φ and a single motion θk. New learning data ψadd are selected from the Ξ of reject φ in Z. The three selective conditions on Z for automatic addition are:

\mathrm{i)}\; \Xi^{\theta}_{1} = \Xi^{\theta}_{n}, \qquad
\mathrm{ii)}\; \textstyle\sum |\Xi^{\varphi}| \le T_f \;\wedge\; \sum |\Xi^{\theta}| \ge T_s, \qquad
\mathrm{iii)}\; (\Xi^{\varphi}) \cup (\Xi^{\theta}) = Z \quad (7)
where Ξθ is the discrimination result θ in Ξ. Equation (7) states that the first and last discrimination results in Z are the same single motion θ, that the monitoring intervals of reject φ constitute a failure state, and that the monitoring intervals of the single motion θ constitute a success state. For a set Z satisfying conditions i)-iii), the new learning data ψadd are added as:
\psi_{\mathrm{add}} = \arg\min_{\psi \in \Psi^{\theta},\; \xi \in \Xi^{\varphi}} (\| \psi_{X} - \xi_{X} \|) \quad (8)

\mathrm{AA}:\; \Psi^{\theta} \leftarrow \{ \psi_{\mathrm{add}} \} \cup \Psi^{\theta} \quad (9)

Using these equations, the ambiguous discrimination result φ is reduced.
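The selection of ψ_add in Eqs. (8)-(9) can be sketched as picking, from the feature vectors observed during the reject interval, the one closest to the stored learning data of the surrounding single motion, and adding it to that class. The data layout and names below are our own illustration.

```python
import numpy as np

def automatic_addition(reject_window, class_data, motion_id):
    """Sketch of Eqs. (8)-(9). `reject_window` holds the feature vectors
    seen while the classifier output was reject phi; `class_data` is a
    list of (feature, motion) learning data of the surrounding motion
    theta_k. The closest rejected vector is adopted as a new learning
    datum of that motion. Illustrative sketch, not the authors' code."""
    stored = np.array([x for x, _ in class_data])   # feature vectors of class theta
    best, best_d = None, np.inf
    for xi in reject_window:
        # distance from this rejected vector to its nearest stored datum
        d = np.min(np.linalg.norm(stored - np.asarray(xi), axis=1))
        if d < best_d:
            best, best_d = xi, d
    class_data.append((best, motion_id))            # Eq. (9): AA
    return class_data
```

Intuitively, the rejected sample that is already almost classifiable as θ_k is the safest one to relabel, which is why it reduces future φ outcomes without risking mislabeled data.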
Selective addition function for learning data
The selective addition function adds new learning data received from the user, according to the degree of complexity of the latest learning data set. Its purpose is to adjust to drastic changes in the user's characteristics; we therefore assume that the complexity of the clusters in the learning data set represents the degree of change in the user's characteristics. Let Ψθadd denote the learning data newly added for desired motion θ, and E(Ψ) the degree of complexity of the latest stored learning data Ψ. The selective addition function is defined as:

\mathrm{SA}:\;
\begin{cases}
E(\Psi \cup \Psi^{\theta}_{\mathrm{add}}) \ge E_a & \Rightarrow\; \Psi \leftarrow (\Psi - \Psi^{\theta}) \cup \Psi^{\theta}_{\mathrm{add}} \cup \{ \hat{\Psi}^{\theta} \} \\
E(\Psi \cup \Psi^{\theta}_{\mathrm{add}}) < E_a & \Rightarrow\; \Psi \leftarrow \Psi \cup \Psi^{\theta}_{\mathrm{add}}
\end{cases} \quad (10)

where Ea is the allowed entropy value, and \hat{\Psi}^{\theta} is the learning datum obtained by averaging the latest Ψθ before Ψθadd is added. Put simply, when Ψ becomes too complex, the latest Ψθ is replaced by the newly added learning data Ψθadd in the mapping process for effective discrimination; otherwise, Ψθ is also used in the mapping process. The degree of complexity E of Ψ is defined via the cluster similarity of the desired motions formed by the learning data. Because the learning data feature space has high dimensionality, it is important to map it to a lower-dimensional space; this method employs a self-organizing map (SOM) [6] for the mapping. The SOM Φ, which maps X in Ψ to a two-dimensional discrete space Ω, is expressed as

\Phi(X) \equiv \arg\min_{\Omega_{j} \in \Omega,\; v_{j} \in V} (\| X - v_{j} \|) \qquad (v \in \mathbb{R}^{D},\; \Omega_{j} \in \mathbb{R}^{2}) \quad (11)
where V (≡ {v1,…,vL}) is the set of code vectors representing Ω (≡ {Ω1,…,ΩL}) placed on a lattice; this factor determines the class Ω to which X belongs. The mapping Φ(X) assigns X to the class Ω of the code vector v most similar to X. For all X in the learning data, all code vectors V are updated by iterative calculation (see Ref. [7] for details). Using Eq. (11), X is mapped onto Ω in a manner where feature vectors are grouped by vector similarity rather than by desired motion θ. A feature map Λ is thus formed, mapping all X onto Ω. The degree of complexity E of the learning data can be regarded as the degree of overlap of the desired motions θ of the learning data Ψ on the feature map Λ, defined by the following calculation based on conditional information entropy:

\Lambda(\Psi, \Phi) \equiv \{ \Lambda_{1}, \ldots, \Lambda_{|\Psi|} \} = \{ \langle \psi_{1}, \Phi(X_{1}) \rangle, \ldots, \langle \psi_{|\Psi|}, \Phi(X_{|\Psi|}) \rangle \} \quad (12)

E(\Psi) \equiv E(\Lambda) = -\sum_{i=1}^{L} p_{\Lambda}(\Omega_{i}) \sum_{j=1}^{M} p_{\Lambda}(\theta_{j} \mid \Omega_{i}) \log_{2} p_{\Lambda}(\theta_{j} \mid \Omega_{i}) \quad (13)
where pΛ(Ωi) is the probability of class Ωi over all X, and pΛ(θj | Ωi) is the conditional probability of desired motion θj given the choice Ωi. The entropy expresses uncertainty, reaching its maximum when all sub-events occur with equal probability. In other words, when feature vectors corresponding to dissimilar desired motions in the learning data resemble each other, E becomes large, and this condition represents a drastic change in the user's characteristics.
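The conditional entropy of Eq. (13) can be computed directly once each feature vector has been assigned to a SOM cluster. The sketch below takes the feature map as a list of (motion, cluster) pairs; this flat representation is our own simplification of Λ.

```python
import math
from collections import Counter, defaultdict

def complexity(assignments):
    """Sketch of Eq. (13): conditional entropy of the desired motion theta
    given the SOM cluster omega. `assignments` is a list of
    (motion, cluster) pairs standing in for the feature map Lambda."""
    n = len(assignments)
    by_cluster = defaultdict(list)
    for theta, omega in assignments:
        by_cluster[omega].append(theta)
    e = 0.0
    for thetas in by_cluster.values():
        p_omega = len(thetas) / n                 # p(omega_i)
        for count in Counter(thetas).values():
            p = count / len(thetas)               # p(theta_j | omega_i)
            e -= p_omega * p * math.log2(p)
    return e
```

Pure clusters (one motion per cluster) give E = 0; clusters mixing several motions push E toward its maximum, which is exactly the condition Eq. (10) uses to trigger the replacement branch of SA.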
Experiment
We carried out experiments to discriminate a maximum of eight forearm motions from four channels of EMG signals, using the motion discrimination system with the competitive learning method described in the previous section. First, we show the effects of the three functions (AE, AA, and SA) of the proposed method. Next, we analyze brain function by f-MRI for an amputee to investigate the effects of mutual adaptation between the human and the EMG-to-motion classifier. Figure 2 shows the experimental setup. Dry-type electrodes were attached to four regions on the right forearm of each subject, and eight motions to be discriminated (four wrist motions and four hand motions) were each executed for three seconds; each motion was assigned an ID number (Fig. 2). The measured EMG signals were processed by a PC in which the motion discrimination method was implemented in software. The discrimination result was represented by a CG hand on the monitor or by the prosthetic hand developed in our laboratory. To teach motions to the system, the keyboard was used as the interface: subjects pressed a key when they judged that the hand did not move satisfactorily for the desired motion. To evaluate the discrimination ability of the system, we carried out ability tests in which the subject controls the prosthetic hand according to instructions presented on the monitor. The subject executes each motion for three seconds (eight motions, 24 seconds in total), and this test is repeated every half hour (one trial); nine trials were carried out. Discrimination rates are calculated by comparing the instructions with the discrimination results. Three subjects took part in the experiment: two healthy subjects in their 20s (A and B) and one female amputee (severed right hand) in her 50s (C). Table 1 shows the common parameters of the experiment.
Figure 2. Experimental system, four sensor positions, and eight discriminated motions.

Table 1. Common parameters in the experiments: D = 36, M = 8, τ = 0.5, γ = 0.9, ε = 0.025, Ts = 800, Tf = 220, Ea = 1.4, T = 1000, L = 25; learning time in NN = 30000, truncation error in NN = 0.01, learning rate in NN = 0.05.
Results and discussion
Figure 3(a) shows the results of the ability test for the three subjects (A, B, C). The top graphs in Fig. 3(a) show the discrimination rate using the conventional method (which does not manage learning data as in the proposed method; ε=0, γ=1, Tf=0, Ea=∞), and the bottom graphs show the discrimination rate using our proposed method. From Fig. 3(a) we see that the proposed method maintains a high discrimination rate, whereas the conventional method shows a decreasing tendency over time. The results for the healthy subjects indicate that the proposed real-time learning method can adjust to changes in the user's characteristics effectively.

Figure 3. Experimental results: (a) discrimination rates over nine trials for subjects A, B, and C (conventional method, top; proposed method, bottom); (b,c) learning data (motions ID1-ID8) of subject A at the 2nd and 9th trials, showing AE and AA; (d,e) learning data of subject B at the 4th trial around the execution of SA (E = 2.30 and E = 1.23); (f) f-MRI analyses of subject C (significance level 0.5%; motions with discrimination rate over 80%: 3 at the 1st analysis (1st), 4 at the 1st analysis (2nd), 6 at the 2nd analysis after one month).

In addition, the learning times for updating the NN for subjects A and B at the final trial were 530 and 880 s respectively with the conventional method, as contrasted with 60 and 87 s respectively with the proposed method. These results indicate that AE, AA, and SA of the learning data help prevent an exponential increase of the learning data. Furthermore, to examine in more detail the change in the learning data caused by the real-time learning method, we plotted the feature vectors of the learning data Ψ on a two-dimensional distribution map, as shown in Fig. 3(b-e); eight kinds of markers indicate the desired motions θ (ID1-8). Figures 3(b,c) show the learning data at the 2nd trial (b) and the last trial (c) for subject A, to observe the behavior of AE and AA. As can be seen from Fig. 3(b,c), AE and AA operate near the decision boundaries of the clusters of the desired motions; they simplify the clusters and their relationships by fine adjustment of the cluster boundaries. In the maps with eliminated learning data, however, there are differences in the decision boundaries and the cluster relations: in the initial stage (b), the cluster distributions are large, and the extent of the overlapping area is also larger in (b) than in (c). The differences in cluster size between the pre-training and post-training data suggest that the clusters' complexity is higher in the initial trials than in the final trials of training; the training clearly reduces the overlap of the clusters. Figures 3(d,e) show the learning data of subject B at the 4th trial, after and before the execution of SA, respectively. At the 4th trial, the complexity of the clusters increased owing to newly added learning data (E = 2.30).
The clusters of ID4 and ID5 shifted widely because past learning data were eliminated by SA when new learning data were added and the complexity of the stored learning data exceeded Ea (= 1.4). As a result, the complexity of the learning data was reduced to E = 1.23, and the learning time required for updating the NN was reduced to 350 s compared with the case without SA. These results show that SA enables the EMG-to-motion classifier to adjust effectively to drastic changes in the user's characteristics. Lastly, we discuss the results of subject C (the amputee). From Fig. 3(a), sufficient evidence for the effectiveness of the proposed method could not be seen in subject C's discrimination rate. A few days later, we carried out the same experiments on subject C and analyzed her brain function by f-MRI to investigate the change in her adaptation; through this analysis we tried to investigate the mutual adaptation between the human and the proposed method. The brain function analysis for subject C was carried out twice. In the first analysis, we measured her brain function by f-MRI while motion discrimination was performed; the measurement was then repeated on the same day after sufficient training in discriminating motions had been done. From that point, for one month, subject C continued to use the motion discrimination system every day as part of the training process. After that, we measured her brain function in the same way as a second analysis. Figure 3(f) shows the results of the analysis for subject C; the brain regions where activation was detected are shown in color. These MRI images show that, after sufficient training, the primary motor area (M1) and primary somatosensory area (S1) were strongly activated. Moreover, the number of motions with a discrimination rate over 80% increased from 3 to 6 after sufficient training. These facts suggest that the amputee adapted to the prosthetic hand as if it were a part of her own body. After this training, she was able to perform many ADL motions (e.g., playing with building blocks), as shown in Fig. 1. This was made possible by the improved stability of motion discrimination achieved with our proposed method.
Conclusion
In this paper, we proposed a competitive learning method for a robust EMG-to-motion classifier that adjusts to changes in the user's characteristics. Experiments showed that the EMG-to-motion classifier can stably discriminate many motions by adjusting to both gradual and drastic changes in the user's characteristics. Additionally, a high discrimination rate was achieved, and it was found that not only does the learning system adapt to the user's characteristics, but the human also adapts to the classifier. The results of this research may change the direction of research on assistive devices for welfare purposes. From this point, we may go on to an even more detailed examination of the concept of mutual adaptation between human and EMG-to-motion classifier.
References
[1] K. A. Farry, "Myoelectric teleoperation of a complex robotic hand," IEEE Trans. Robotics and Automation, vol. 12, no. 5, pp. 775-778, Oct. 1996.
[2] H. Yamaguchi, "EMG Automatic Switch for FES control for Hemiplegics using Artificial Neural Network," Intelligent Autonomous Systems, pp. 339-346, 2000.
[3] T. Tsuji, "Discriminating of Forearm Motions from EMG Signals by Error Back Propagation Typed Neural Network Using Entropy," Trans. of the Society of Instrument and Control Engineers, vol. 29, no. 10, pp. 1213-1220, 1993 (in Japanese).
[4] D. Nishikawa, "On-line Learning Based Electromyogram to Forearm Motion Classifier with Motor Skill Evaluation," JSME International Journal, vol. 43, no. 4, series C, pp. 906-915, Dec. 2000.
[5] D. R. J. Laming, "Information Theory of Choice Reaction Times," Academic Press, London, 1968.
[6] T. Kohonen, "Self-Organization and Associative Memory," Springer-Verlag, 1984.
[7] R. Kato, "Evaluation of Biosignal Processing Method for Welfare Assisting Devices - Evaluation of EMG Information Extraction Processing Using Entropy -," Journal of Robotics and Mechatronics, vol. 14, no. 6, pp. 573-580, 2002.
Mutual Adaptation among Man and Machine by using f-MRI analysis Hiroshi Yokoi*, Alejandro Hernandez Arieta*, Ryu Katoh*, Takashi Ohnishi***, Wenwei Yu**, and Tamio Arai* * Department of Precision Engineering, The University of Tokyo, Tokyo 113-0033, Japan [email protected] ** Department of Medical System Engineering, University of Chiba *** Department of Radiology, National Center Hospital for Mental, Nervous, and Muscular Disorders, National Center of Neurology and Psychiatry, Tokyo 187-8551, Japan
Abstract
Prosthetic devices for disabled people require new and reliable robotic technology. This paper describes the interesting reaction of the human brain to an adaptable prosthetic system, composed of an EMG-controlled robot hand with an EMG pattern recognition learning function for transradial (below-elbow) prostheses. The mutual adaptation between the system and the human body is analyzed using functional magnetic resonance imaging (f-MRI) in order to clarify the plasticity of the motor and sensory cortex areas in response to changes in the prosthetic system. The developed prosthetic hand has 13 DOF: three motors on the thumb, two motors for each finger, and two motors for the wrist. Tactile feedback is applied by electric stimulation. The f-MRI data show the process of replacement of a phantom limb image by the prosthetic hand image.
Keywords: Human Adaptation, Learning Machine, Machine Adaptation, EMG prosthetic hand, Motion pattern recognition
1. Introduction
Robot-based prosthetic equipment is a developing technology that aims to support, in their daily lives, disabled people who have lost control of parts of their bodies, such as amputees and people with muscular debility or nerve diseases. Nowadays, advanced robotics technology provides an adaptable prosthetic controller. Using learning paradigms, this controller can discriminate the electromyographic (EMG) signal while adapting to the different characteristics of human behavior. EMG signal patterns change from person to person; to overcome these individual differences, the adaptable controller can build rules that describe the relationship between the EMG signal and the human intention by incremental learning. On the other hand, the human adapts to the behavior of the prosthetic equipment as well. In the case of forearm amputation, a phantom limb image remains in the brain for some time. This image is the representation of the forearm in the motor area of the cerebral cortex; it is part of the body image in the brain, and also a biological example of an embodied representation of the end effector. The human adaptation can be observed through the changes in the f-MRI images as an effect of the interaction with the prosthetic hand. Such a system produces mutual adaptation and gives important insight into the embodiment issue. Mutual adaptation is the result of the collaboration of two or more adaptive systems that communicate with each other for the purpose of goal-directed behavior, in this case the human body and the adaptive prosthetic system. Generally, at the beginning the systems require much information to communicate with each other, and this decreases as each system adapts to the characteristics of the other. This paper aims at understanding the underlying mechanisms of adaptive goal-directed behavior by observing the communication between its parts; more specifically, we examine how a prosthetic robot hand and a human interact with each other by observing brain activity.
Figure 1. Mutual Adaptation among man and prosthesis
Figure 2. Degrees of freedom of the human hand and fingers
The EMG, detected from the surface of the human body and reflecting muscle activity, is one of the most promising bionic signals for controlling mechanical devices. Many studies report both the great potential and the difficulties of using EMG signal patterns [1][2][3]. A powered prosthetic hand with many degrees of freedom (DOF) could imitate the movement of a natural hand, increasing its functionality [4][5][6], and some products are already in practical use in the medical and welfare areas [7]. This is significant not only for the medical and welfare areas but also for robotics and mechanical engineering, because it could be a landmark on the way to a humanoid hand. There are large differences in specifications between industrial robotic hands and externally powered prosthetic hands such as EMG-controlled ones: adequate size, weight, appearance, speed, power, and control precision. To meet these requirements, tendon-driven mechanisms have been investigated [8][9]. This paper proposes a prosthetic hand with 13 DOF and an adaptive joint mechanism based on a tendon-driven mechanism. Figure 1 shows a snapshot of the proposed system.
2. Prosthetic hand
The natural forearm consists of five fingers, a palm, and a wrist joint. Each finger has 3 joints and 4 DOF. The palm has many joints, but its motive freedom can be integrated into one DOF; the wrist joint has 3 DOF. Therefore, to reach the functional level of the human hand, an ideal prosthetic hand should have up to 24 DOF. However, a prosthetic hand is also subject to physical restrictions such as weight, size, and power. The size differs significantly depending on age, and the weight is limited by the weight of the natural hand: it is necessary to reduce the internal pressure in the socket and the load produced on the upper arm supporting the prosthetic hand, so the prosthetic hand must be designed to be light.
Figure 3. Electrodes of the EMG sensor and body ground
The grasping power should be strong enough to grip a glass of water; this condition requires more than 3.5 kg·cm. The target values of speed and torque are also severe for prosthetic hands. For this purpose, our developed prosthetic hand has 13 degrees of freedom with a tendon-driven mechanism, and is controlled by an EMG motion recognition system, as shown in Figures 3 and 4.
Figure 4. Interference wire-driven prosthetic hand
3. Main Controller for the EMG prosthetic hand
The controller needs to recognize the human intention from the motion patterns. Our system uses EMG to provide information about the human intention. The system requirements are summarized as follows: (1) The internal state and system parameters of the controller should be able to change. (2) The motion functions of the EMG prosthetic hand shall be improved. (3) The amputee should receive visual feedback of the EMG signal pattern. (4) The learning mechanism should be able to work even if the evaluation is weak. (5) The learning speed should be fast enough for real-time change. Our controller uses the On-Line Learning Method [10], which consists of three units, as shown in Figure 5: the Feature Extraction Unit, the Classification Unit, and the Evaluation and Selection Unit. Feature Extraction Unit: this unit extracts the feature vector from the EMG signal. Classification Unit: this unit classifies the predicted motion of the forearm from the feature vector produced in the Feature Extraction Unit, and generates the control command for the prosthetic hand mechanism. It receives the learning pattern set from the Evaluation and Selection Unit, and updates its inner state in order to determine the mapping functions between the feature vector and the control command. Evaluation and Selection Unit: this unit manages the learning pattern set for tracking alterations of the amputee's characteristics. These learning pattern sets are sent to the Classification Unit.
Figure 5. Control architecture of the EMG prosthetic hand (Feature Extraction by spectrum analysis; Classification by ANN; Evaluation and Selection by learning data storage; classifier generations: Evolutionary 1997-1999, AI rule base 1998-2000, Entropy based 2000-, SOM Filter 2004-)
With this classifier, the controller carries out experiments to classify up to ten patterns of forearm motion from two channels of surface EMG.
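As an illustration of the three-unit architecture described above, the following is a minimal sketch, not the authors' implementation: band-averaged spectral power stands in for the spectrum-analysis Feature Extraction Unit, a nearest-centroid classifier stands in for the ANN-based Classification Unit, and a bounded pattern store stands in for the Evaluation and Selection Unit. All class and function names are hypothetical.

```python
import numpy as np

def extract_features(emg_window, n_bands=4):
    """Feature Extraction Unit: band-averaged power spectrum of each EMG channel."""
    feats = []
    for channel in emg_window:                     # one row per EMG channel
        spectrum = np.abs(np.fft.rfft(channel)) ** 2
        bands = np.array_split(spectrum, n_bands)  # coarse frequency bands
        feats.extend(b.mean() for b in bands)
    v = np.array(feats)
    return v / (np.linalg.norm(v) + 1e-12)         # normalize amplitude away

class ClassificationUnit:
    """Maps feature vectors to motion IDs; re-trained from the pattern store."""
    def __init__(self):
        self.centroids = {}                        # motion_id -> mean feature vector
    def train(self, pattern_set):
        for motion_id, vectors in pattern_set.items():
            self.centroids[motion_id] = np.mean(vectors, axis=0)
    def classify(self, feature_vector):
        return min(self.centroids,
                   key=lambda m: np.linalg.norm(self.centroids[m] - feature_vector))

class EvaluationSelectionUnit:
    """Keeps a bounded learning-pattern set so the classifier can track
    alterations of the amputee's characteristics over time."""
    def __init__(self, max_per_motion=20):
        self.store, self.max = {}, max_per_motion
    def add(self, motion_id, feature_vector):
        samples = self.store.setdefault(motion_id, [])
        samples.append(feature_vector)
        del samples[:-self.max]                    # drop the oldest patterns
        return self.store
```

On each teaching cycle, the Evaluation and Selection Unit would add the labeled feature vector to its store and the Classification Unit would be re-trained from it, mirroring the on-line learning loop described above.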
Figure 6. Ten patterns of forearm motion
Figure 7. Electric stimulus feedback system for tactile information of finger tip
The ten forearm motions that the controller classifies comprise four patterns of wrist motion and six patterns of hand motion. Each motion is assigned an ID number for the description below. Figure 6 shows the motions and their corresponding ID numbers.
4. Tactile Sensory feedback
One of the current problems of prosthetic devices is the lack of tactile information, which results in difficult manipulation of the prosthesis and clumsy actions by the device. This lack of feedback makes the system unable to react and adapt to the changing conditions of the surrounding environment. Tactile information increases the performance of robotic hand devices [12], [13], [14]. The application of tactile sensing to prosthetic devices, giving feedback to the controller, has been addressed in [15] [16]. In order to obtain tactile feedback, the prosthetic hand has been equipped with pressure sensors based on conductive silicone rubber [17]. The sensors are placed on the fingertips and at the base of each finger; in addition, there are two sensors in the upper part of the palm. The signal provided by the sensors has the following characteristics: a non-linear response with hysteresis, a working range between 15 gf and 400 gf, and high sensitivity.
5. f-MRI analysis of human adaptation for the Prosthetic hand
This section presents the analytical results of the human body adapting to the prosthetic hand system, obtained by measuring brain activity with an f-MRI system.
Figure 8. f-MRI analysis of human adaptation (visual feedback from an LCD projector)
The prosthetic hand is designed as an adaptable machine that can predict what the patient wants to do next by processing the EMG signal. The amputee also adapts to the prosthetic hand by learning how to control it, trying to move the remaining part of the arm according to the body image, or phantom limb [18]. Using functional magnetic resonance imaging (f-MRI), we measure the activation of the cortical mapping to observe the amputee's reaction [19]. Three sets of experiments were performed, with the following setup: three Delsys EMG sensors were placed on the subject's right forearm, and the transcutaneous electrical neuro-stimulation electrodes were placed on the upper left arm. In order to provide visual feedback of the prosthetic hand's movement, the video image was shown inside the f-MRI device via a mirror image from a projector. The cerebral activity was measured with fMRI using blood oxygen level-dependent contrast [20]. After automatic shimming, a time course series of 59 volumes was obtained using single-shot gradient-refocused echo-planar imaging (TR = 4000 msec, TE = 60 msec, flip angle = 90 degrees, inter-scan interval 8 sec, in-plane resolution 3.44 x 3.44 mm, FOV = 22 cm, contiguous 4-mm slices to cover the entire brain) with a 1.5T MAGNETOM Vision plus MR scanner (Siemens, Erlangen, Germany) using the standard head coil. Head motion was minimized by placing tight but comfortable foam padding around the subject's head. The first five volumes of each fMRI scan were discarded because of non-steady magnetization, and the remaining 54 volumes were used for the analysis. The fMRI protocol was a block design alternating one epoch of the task condition with the rest condition. Each epoch lasted 24 seconds, equivalent to 3 whole-brain fMRI volume acquisitions. Data were analyzed with Statistical Parametric Mapping software 2 (SPM2, http://www.fil.ion.ucl.ac.uk/spm).
Scans were realigned and transformed to the standard stereotactic space of Talairach using an EPI template [21]. Data were then smoothed in the spatial domain (full width at half-maximum = 8 x 8 x 8 mm) to improve the signal-to-noise ratio. After specifying the appropriate design matrix, with a delayed box-car function as the reference waveform, the condition, slow hemodynamic fluctuations unrelated to the task, and subject effects were estimated according to a general linear model taking temporal smoothness into account. Global normalization was performed with proportional scaling. To test hypotheses about regionally specific condition effects, the estimates were compared by means of linear contrasts of each rest and task period. The resulting set of voxel values for each contrast constituted a statistical parametric map of the t statistic, SPM{t}. For the analysis of each session, voxels and clusters of significant voxels were thresholded at P < 0.005, uncorrected for multiple comparisons. In the first set of experiments, we compared the brain activity due to the use of the prosthetic hand, recorded the first time it was used and after a month of continuous use. Figure 9 a) shows the brain activation at the beginning of the experiment. Before using the prosthesis, the contralateral sensorimotor area of the brain (M1) presented a broad activation, whereas after prosthetic hand use the brain activation is modified, as shown in figure 9 b). The experiment shows an example of cortical plastic change during EMG prosthetic hand usage; it also shows the possibility of neuro-modulating technology [22].
In the second set of experiments, we measured the brain activity induced by tactile feedback. The sensory feedback was provided by electric stimulation to the subject's left
arm. The level of electric stimulation was proportional to the signal of the pressure sensors placed on the prosthetic hand's fingertips. In this experiment, we focused on the sensory and motor areas of the brain in order to test the adaptive behavior of the subject. Namely, we compared the brain activity with and without use of the motor function of the prosthetic hand. As shown in Figure 9(c), when no motor control is applied, brain activity is observed only on the right side of the brain, corresponding to the electric stimulation given on the left arm. When the human subject controls the hand through the EMG device, however, this brain activity disappears, while relatively large activity is observed on the other side (Figure 9(d)). It can be concluded that which regions are activated is significantly influenced by the motor action: because the subject uses the muscles of the right arm, the brain expects to obtain the sensory feedback on the corresponding side, which can be interpreted as a kind of "illusion".
a) Before training to control the prosthetic hand by the right hand without tactile feedback. b) After training. c) Only electric stimulus on the left hand (effect of tactile feedback). d) Prosthetic hand control by the right hand with tactile feedback on the left hand. e) Transition of activation through training in the use of the prosthetic hand with tactile feedback.
Figure 9. f-MRI brain analysis of human adaptation to the EMG prosthetic hand
Finally, in the third set of experiments, we conducted an experiment similar to the first one, but with sensory feedback. Figures 9(e) and (f) show the brain activity of a human subject before and after training with the prosthetic hand. A salient difference is observed particularly in the visual cortex (bottom part of the brain in the figures). The figure clearly shows that, before the training, the subject makes use of visual sensory information for controlling the device, whereas after the training it is no longer used, as the corresponding brain activity is substantially suppressed. In contrast with the first set of experiments, the activity in the visual cortex shows the opposite transition between before and after the training:
Without sensory feedback, more visual information is used after the training, whereas in the presence of tactile feedback it is not used after the training. It can therefore be concluded that, because the subject relies on the tactile feedback for controlling the device, the visual sensory feedback is no longer necessary. These changes make the mutual adaptation produce an unstable relationship between man and machine, with a local minimum of the parametric optimization for the classification unit of the controller. The f-MRI image is a somato-sensory status indicator of the skill level in using the prosthetic system. The activation of the visual cortex shows the requirement for a feedback system. We therefore face new challenges for the practical application of the prosthetic system: I) Active search is necessary for the mutual adaptation, in order to build an adaptable prosthetic system. II) Evaluation of the skill level is necessary for the active search (proprioception). III) Visual feedback should be replaced with tactile sensory feedback.
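The block-design analysis used throughout this section can be sketched minimally as follows. This is a simplified stand-in for SPM2 (no HRF convolution, no temporal-smoothness modeling, no global scaling): a shifted boxcar serves as the task regressor, and an ordinary-least-squares t statistic is computed per voxel. The function names are illustrative.

```python
import numpy as np

def boxcar_regressor(n_vols, epoch_vols, delay_vols=1):
    """Alternating rest/task blocks of epoch_vols volumes each, shifted by
    delay_vols volumes as a crude stand-in for the hemodynamic delay."""
    blocks = np.repeat([0.0, 1.0], epoch_vols)          # one rest + one task epoch
    reg = np.tile(blocks, n_vols // len(blocks) + 1)[:n_vols]
    return np.roll(reg, delay_vols)

def voxel_t_statistic(ts, regressor):
    """Fit ts = X b by least squares and return the t value of the task effect."""
    X = np.column_stack([regressor, np.ones_like(regressor)])  # task + baseline
    b, _, _, _ = np.linalg.lstsq(X, ts, rcond=None)
    resid = ts - X @ b
    dof = len(ts) - X.shape[1]
    sigma2 = resid @ resid / dof                        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)               # coefficient covariance
    return b[0] / np.sqrt(cov[0, 0])                    # t for the task regressor
```

With the 54 analyzed volumes and 3-volume epochs of the protocol above, a voxel whose time course follows the task blocks yields a large t value while a noise-only voxel does not; voxels above a chosen threshold (here P < 0.005, uncorrected) would form the activation map.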
6. Summary
Two sets of results were obtained according to the presence or absence of tactile feedback, which can be summarized as adaptation and illusion generation. I) Adaptation. Without tactile feedback from the prosthesis, the primary motor area (M1) and the primary sensory area (S1) are highly activated; we also found activation of the parietal lobe. This result indicates a sensorimotor coordination tightly coupling visual recognition and muscle control for grasping an object. With tactile feedback, the activation of the parietal lobe decreases. This shows how the information processing load is distributed, allowing for an effective sensorimotor coordination, which might produce subconscious object manipulation. II) Illusion. Without tactile feedback from the prosthesis, normal sensorimotor coordination between visual servoing and hand motion is obtained, without generating any illusion of ownership of the artificial limb. However, the tactile feedback produced an unexpected reaction in the sensory area: the stimulus acts on the left-hand side, but the reaction of the sensory area occurs on the right-hand side. This reaction is a misinterpretation of the tactile feedback, and is thus similar to an illusion. This result means that active motion control determines the priority of cortical areas, and the motor and parietal areas become supervisors of the sensory area. Therefore, the illusions appear as a result of human adaptation to controlling the prosthetic hand.
References [1] M. Uchida, H. Ide, S. P. Ninomija, "Control of a robot arm by myoelectric potential," Journal of Robotics and Mechatronics, Vol. 5, No. 3, pp. 259-265, 1993. [2] K. A. Farry, I. D. Walker, R. G. Baraniuk, "Myoelectric teleoperation of a complex robotic hand," IEEE Trans. Robotics and Automation, Vol. 12, No. 5, pp. 775-788, Oct. 1996. [3] B. Hudgins, P. Parker, R. N. Scott, "A new strategy for multifunction myoelectric control," IEEE Trans. Biomedical Engineering, Vol. 40, No. 1, pp. 82-94, Jan. 1993. [4] H. H. Sears and J. Shaperman, "Electric wrist rotation in proportional-controlled systems," Journal of Prosthetics and Orthotics, Vol. 10, No. 4, pp. 92-98, 1998.
[5] N. Dechev, W. L. Cleghorn, S. Naumann, "Multiple finger, passive adaptive grasp prosthetic hand," Mechanism and Machine Theory, Vol. 36, pp. 1157-1173, 2001. [6] M. Neal, "Coming to grips with artificial hand design," Design Engineering, pp. 26-27, 29, 32, 34, March 1993. [7] "SensorHand technical information booklet," Otto Bock Co., Ltd., 2001, http://www.healthcare.ottobock.com/ [8] Y. Ishikawa, W. Yu, H. Yokoi, Y. Kakazu, "Development of robot hands with an adjustable power transmitting mechanism," Intelligent Engineering Systems Through Neural Networks, C. H. Dagli et al. (Eds.), Vol. 10, pp. 631-636, ASME Press, ISBN 0-7918-01616, 2000. [9] S. Hirose, S. Ma, "Coupled tendon-driven multijoint manipulator," Proceedings of the IEEE International Conference on Robotics & Automation, pp. 1268-1275, 1991. [10] D. Nishikawa, W. Yu, M. Maruishi, I. Watanabe, H. Yokoi, Y. Mano, and Y. Kakazu, "On-line learning based electromyogram to forearm motion classifier with motor skill evaluation," JSME International Journal Series C, Vol. 43, No. 4, pp. 906-915, Dec. 2000. [11] R. Katoh, D. Nishikawa, W. Yu, H. Yokoi, Y. Kakazu, "Evaluation of biosignal processing method for welfare assisting devices - evaluation of EMG information extraction processing using entropy," Journal of Robotics and Mechatronics, Vol. 14, No. 6, pp. 573-580, December 2002 (ISSN 0915-3942). [12] S. Crinier, "Behavior-Based Control of a Robot Hand using Tactile Sensors," Master Thesis, Center for Autonomous Systems (CAS), Royal Institute of Technology, Sweden, 2002. [13] G. Buttazzo, A. Bicchi, and P. Dario, "Robot tactile perception," in C. S. G. Lee (Ed.), Sensor Based Robots: Algorithms and Architectures, Springer Verlag, Berlin Heidelberg, Germany, 1992. [14] A. Bicchi, "Optimal control of robotic grasping," in Proc. American Control Conference, 1992. [15] M. C. Carrozza, B. Massa, S. Micera, M. Zecca, P. Dario, "A wearable artificial hand for prosthetics and humanoid robotics applications," IEEE-RAS International Conference on Humanoid Robots HUMANOIDS 2001, Waseda University International Conference Center, Tokyo, Japan, November 22-24, 2001. [16] P. J. Kyberd and P. H. Chappell, "The Southampton Hand: an intelligent myoelectric prosthesis," Journal of Rehabilitation Research and Development, 31(4), pp. 326-334, Nov. 1994. [17] "Conductive silicone rubber sheet: characteristic performance CS577RSC (CSA)" (in Japanese). [18] M. MacLachlan, D. Desmond, O. Horgan, "Psychological correlates of illusory body experiences," Journal of Rehabilitation Research and Development, Vol. 40, No. 1, pp. 59-66, 2003. [19] H. Flor, T. Elbert, S. Knecht, C. Wienbruch, C. Pantev, N. Birbaumer, W. Larbig, and E. Taub, "Phantom-limb pain as a perceptual correlate of cortical reorganization following arm amputation," Nature, Vol. 375, pp. 482-484, June 8, 1995. [20] N. K. Logothetis, "The neural basis of the BOLD fMRI signal," Phil. Trans. R. Soc. Lond. B (357), pp. 1003-1037, 2002. [21] D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, et al., "Design and construction of a realistic digital brain phantom," IEEE Transactions on Medical Imaging, Vol. 17, pp. 463-468, 1998. [22] Y. Mano, T. Nakamuro, R. Tamura, T. Takayanagi, K. Kawanishi, S. Tamai, R. F. Mayer, "Central motor reorganization after anastomosis of the musculocutaneous and intercostal nerves following cervical root avulsion," Ann. Neurol., Vol. 38, pp. 15-20, 1995.
Part 17 Women in Robotics, Human Science and Technology
Behavior Generation of Humanoid Robots Depending on Mood Kazuko ITOH a,1, Hiroyasu MIWA b,c,d, Yuko NUKARIYA e, Massimiliano ZECCA f,g, Hideaki TAKANOBU d,h, Stefano ROCCELLA f, Maria Chiara CARROZZA f, Paolo DARIO f, and Atsuo TAKANISHI a,c,d,g a Department of Mechanical Engineering, Waseda University, Tokyo, Japan b Digital Human Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan c Institute for Biomedical Engineering, ASMeW, Waseda University, Tokyo, Japan d Humanoid Robotics Institute (HRI), Waseda University, Tokyo, Japan e Graduate School of Science and Engineering, Waseda University, Tokyo, Japan f ARTS Lab, Scuola Superiore Sant’Anna, Pontedera (PI), Italy g ROBOCASA, Waseda University, Tokyo, Japan h Department of Mechanical Systems Engineering, Kogakuin University, Tokyo, Japan
Abstract. Personal robots, which are expected to become popular in the future, are required to be active in joint work and community life with humans. Therefore, the objective of this study is the development of new mechanisms and functions for a humanoid robot to express emotions and to communicate naturally with humans. We developed both a mental model, from a psychological point of view, and the Emotion Expression Humanoid Robot WE-4RII (Waseda Eye No. 4 Refined II), from an engineering point of view. In this paper, a co-associative memory model using mutually coupled chaotic neural networks is proposed and implemented in WE-4RII as its mental model. We confirmed that the robot could generate behavior depending on its mood in response to a stimulus. Keywords. Humanoid Robot, Neural Network, Mental Model, Memory
Introduction
Industrial robots have various functions, such as assembly and conveyance. However, an operator has to define the robot's behavior through very complex processes or methods. In contrast, personal robots, which are expected to become popular in the future, have to be active in joint work and community life with humans. Adaptation to partners and the environment, and communication with partners, are necessary for them. Therefore, the objective of this study is the development of new mechanisms and functions for natural bilateral interaction between a robot and a human, e.g. expressing emotions and personality, and generating active behaviors.
1 Corresponding Author: Research Associate at Waseda University; Address: #59-308, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555 Japan; Tel: +81 3 5286 3257; Fax: +81 3 5273 2209; URL: http://www.takanishi.mech.waseda.ac.jp/; E-mail: [email protected] (K. Itoh), [email protected] (A. Takanishi).
K. Itoh et al. / Behavior Generation of Humanoid Robots Depending on Mood
Various communication robots have been studied in robotics. Brooks at MIT has developed an expressive robotic creature that produces facial expressions using its eyes, eyelids, eyebrows and mouth, and communicates with humans by using visual information from CCD cameras [1]. Sony Corporation has developed the entertainment humanoid QRIO, which is 50 cm tall and has 50 DOFs. It can walk autonomously by using information from the CCD cameras on its head, and it can control its behavior with a homeostasis regulation mechanism [2][3]. We have produced emotional expressions and active behavior with the Emotion Expression Humanoid Robot WE-4 (Waseda Eye No. 4) series, which have a face, neck, lungs, waist, 9-DOF emotion expression humanoid arms, and the humanoid robot hands RCH-1 (RoboCasa Hand No. 1) [4]. In addition, we have developed a mental model for humanoid robots in order to realize communication with humans. A mental space with three independent parameters, mood, second-order equations of emotion, a robot personality, a need model [5], a consciousness model and a behavior model have been introduced into the mental model. On the other hand, many authors have researched new models of neural networks. For the Hopfield model, the most typical neural network, it has been shown that the energy function decreases monotonically, and one energy minimum corresponds to one memory [6][7]. However, if the system falls into a spurious minimum, it cannot then escape from it. Therefore, several methods have been proposed to solve this problem by using chaos. Shimizu has proposed a chaotic neural network model, which consists of N Brownian particles [8]. It was found that the network retrieves all stored patterns and reversed patterns in the associative memory problem. We have been researching a system of multiple harmonic oscillators interacting via a chaotic force as a model of a neural network [9].
We have investigated a system of mutually coupled neural networks, in which each neuron is connected only with the corresponding neuron in the coupled network. Storing different patterns in the two networks, we have found that each network retrieves not only the pattern stored in it but also the pattern stored in the coupled network. With the previous mental model, the robot showed just one behavior in response to a stimulus, though human behavior in response to a stimulus depends on the mood at the time. Therefore, we propose a co-associative memory model using mutually coupled chaotic neural networks for generating behavior to a stimulus depending on the mood. We implemented the new mental model with the memory model in the Emotional Expression Humanoid Robot WE-4RII (Waseda Eyes No. 4 Refined II). In this paper, we describe the new memory model in detail.
1. Previous Mental Model
We have developed a mental model comprising a mental space with three independent parameters, mood, second-order equations of emotion, a robot personality, a need model, a consciousness model and a behavior model, for a humanoid robot to interact bilaterally with a human. First, we have defined the mental space consisting of the pleasantness, activation and certainty axes, as shown in Figure 1. The Emotion Vector E has been defined in the mental space as the robot's mental state:
E = (Ep, Ea, Ec),   (1)
Figure 1. 3D Mental Space and Emotion Vector E.
Figure 2. Mapping of Seven Different Emotions.
where Ep is the pleasantness component of the emotion, Ea is the activation component, and Ec is the certainty component. We have mapped seven different emotions into the 3D mental space, as shown in Figure 2. The robot expresses the emotion corresponding to the region traversed by the Emotion Vector E. We have considered that the transition of a human mental state can be expressed by equations similar to the equation of motion. Therefore, we have expanded the equations of emotion into the second-order differential equation shown in Eq. (2). Three Emotional Coefficient Matrices, the Emotional Inertia, Emotional Viscosity and Emotional Elasticity Matrices, have also been introduced:
M (d²E/dt²) + Γ (dE/dt) + K E = F_EA,   (2)
where M is the Emotional Inertia Matrix, Γ the Emotional Viscosity Matrix, K the Emotional Elasticity Matrix, and F_EA the Emotional Appraisal, which stands for the total effect of internal and external stimuli on the mental state. The robot expresses different reactions to a stimulus by changing the Emotional Coefficient Matrices. The mental state is affected not only by emotion, but also by mood. Therefore, we have defined the Mood Vector M, consisting of a pleasantness component and an activation component:
M = (Mp, Ma, 0),   (3)
Mp = ∫ Ep dt,   (4)
d²Ma/dt² − ε (1 − Ma²) (dMa/dt) + Ma = 0,   (5)
where Mp is the pleasantness component of the mood and Ma is the activation component, respectively. Since we have considered the current mental state to influence the pleasantness of the mood, Mp has been defined as the integral of the pleasantness component of the emotion (Eq. (4)). On the other hand, since the activation
component of the Mood Vector is similar to a human biological rhythm such as the internal clock, the Van der Pol equation has been applied to define Ma (Eq. (5)). In addition, we developed the need model, which consists of the Appetite, the Need for Security, and the Need for Exploration, in order to actively generate behavior for bilateral interaction with a human [5]. The need matrices Nt at time t and Nt+Δt at time t+Δt are described by the Equation of Need:
Nt+Δt = Nt + PN × ΔN,   (6)
where PN is the Need Personality Matrix and ΔN is the small difference between two need states; ΔN is determined by the stimuli from the internal and external environment, and PN is a 3×3 matrix that stands for the personality with respect to the needs. In psychology, each need is independent, so PN is a diagonal matrix. In particular, the Appetite depends on the total spent energy (described as the sum of the basal metabolism energy and the output energy):
ΔNA = f(ΔA),  ΔA = ΔABM + ΔAEA,   (7)
where ΔA is the variation in the energy spent by the robot, ABM is the basal metabolism energy, and AEA is the output energy. However, the robot with the previous mental model has been able to determine just a single kind of recognition in response to a stimulus, and has shown the behavior corresponding to that recognition. In contrast, humans recognize a stimulus and generate a behavior depending on their mood at the time. Therefore, we propose a new memory model for recognition (memory retrieval) depending on mood in response to a stimulus, using mutually coupled chaotic neural networks.
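Eqs. (2) and (4) can be illustrated with a scalar sketch of the pleasantness axis, integrated with semi-implicit Euler steps. The coefficient values, step size, and function name below are illustrative choices, not the values used on the robot:

```python
import numpy as np

def simulate_mental_state(appraisal, dt=0.01, m=1.0, gamma=2.0, kappa=5.0):
    """Scalar sketch of the pleasantness dynamics
        m * E'' + gamma * E' + kappa * E = F_EA   (Eq. (2), one axis)
    and of the mood integral  Mp = ∫ Ep dt        (Eq. (4)).
    m, gamma, kappa here are illustrative, not the paper's matrices."""
    E, dE, Mp = 0.0, 0.0, 0.0
    E_hist = []
    for F in appraisal:                     # one appraisal sample per step
        ddE = (F - gamma * dE - kappa * E) / m
        dE += ddE * dt                      # semi-implicit Euler update
        E += dE * dt
        Mp += E * dt                        # mood accumulates the emotion
        E_hist.append(E)
    return np.array(E_hist), Mp
```

Under a constant positive appraisal, E settles toward F_EA/kappa while Mp grows steadily, which illustrates the mechanism by which sustained pleasant stimulation shifts the mood in this model.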
2. Human Memory
Human memory is related to mood through mood state-dependency and mood congruency [10]. If humans store Memory A while in a certain mood, they can easily retrieve Memory A when their mood becomes the same again. This is known as mood state-dependency. On the other hand, a certain mood helps in retrieving a memory corresponding to that mood (mood congruency). Basically, humans tend to retrieve pleasant memories when they are pleasant and, conversely, unpleasant memories when they are unpleasant. Moreover, human performance is related to the activation level [11]. If the activation level becomes too high or too low (i.e. superexcitation or bleariness), active human performance becomes impossible. The best human performance comes at a medium activation level. In this paper, we propose a co-associative memory model for generating various behaviors to a stimulus, whereby the robot retrieves a pleasant memory when it is pleasant and an unpleasant memory when it is unpleasant. In addition, we control the time for retrieving a memory by the activation component of the robot's emotion, in order to realize the connection between performance and activation level.
Figure 3. Coupling of Two Neural Networks.
3. Co-associative Memory Model
3.1. Mutually Coupled Chaotic Neural Networks
We propose a system of mutually coupled chaotic neural networks, each consisting of many harmonic oscillators (neurons), as a co-associative memory model. The internal state of each neuron is represented as the position of a harmonic oscillator driven by a chaotic force. If the internal state of neuron i (i = 1, 2, …, N) in Network α (α = A or B) at time t = nτ is denoted by x_i^α(t), the time evolution of the neuron is given by Eq. (8), where f(t) is the input from the surrounding neurons and from the neuron itself, whose amplitude changes chaotically at time intervals of τ:
d²x_i^α/dt² + k (dx_i^α/dt) + ω0² x_i^α(t) = K f_i^α(t),   (8)
f_i^α(t) = h_i^α(n)/√τ,  for nτ ≤ t < (n+1)τ  (n = 0, 1, 2, …),   (9)
where k, ω0 and K are the damping constant, the eigenfrequency and the magnitude of f(t), respectively. The factor 1/√τ is needed to obtain a finite diffusion constant in the small-τ limit [12]. h(n) denotes the interaction among neurons. We coupled the two networks by connecting neuron i in Network A(B) with neuron i in Network B(A), as shown in Figure 3:
h_i^A(n) = Σ_{j=1..N} W_{i,j}^A y_j^A(n) + Z_i^{A,B} y_i^B(n),
h_i^B(n) = Σ_{j=1..N} W_{i,j}^B y_j^B(n) + Z_i^{B,A} y_i^A(n),   (10)
W_{i,j} = (1/N) Σ_{s=1..P} ξ_i^s ξ_j^s,   (11)
where W_{i,j} is the coupling constant from neuron j to neuron i, and Z_i^{A,B} (Z_i^{B,A}) is that from neuron i in Network B(A) to neuron i in Network A(B). ξ_i denotes the stored pattern vector, and ξ_i takes the value +1 or -1. To store patterns in the network, we used the Hebb rule to determine W_{i,j}, as shown in Eq. (11). The self-coupling constant is equal to 1: W_{i,i} = 1. In Eq. (10), y(n) denotes the output of the neuron and represents the nth iterate of a map. As an example of the map, we employed the Logistic map, whose bifurcation parameter r(n) is modulated by the internal state of the neuron:
y_i^α(n+1) = r_i^α(n) (0.5 + y_i^α(n)) (0.5 − y_i^α(n)) − 0.5,  (−0.5 ≤ y(n) ≤ 0.5),   (12)
r_i^α(n) = 4 − b + b cos(2β x_i^α(n)),  (0 ≤ b ≤ 4),   (13)
where b and β are control parameters. y(n) changes chaotically or periodically according to the bifurcation parameter. Since the bifurcation parameter r(n) is modulated by the internal state of the neuron, as shown in Eq. (13), the chaos is controlled by the neuron itself. A new type of feedback mechanism is thus included in this model: the internal state x(n) determines the bifurcation parameter r(n), which in turn determines the dynamics of the chaotic output y(n); the chaotic output then affects the dynamics of the neuron. Thus, the dynamics of the chaos is changed by the system itself. In particular, if we set the period of the harmonic oscillator to 2.0, the neural network can retrieve the original and reversed patterns alternately, meaning this neural network can perform very well.
3.2. Co-associative Memory Model
In this paper, we defined "Apple" as the pleasant memory associated with "Red" and "Tomato" as the unpleasant memory associated with "Red". Of course, other memories can be selected according to the robot personality. In the case of Z_i^{B,A} = 0.0, as Z_i^{A,B} becomes large, the number of times Network A retrieves "Apple" increases and the number of times it retrieves "Tomato" decreases. Therefore, we introduced the mood state-dependency and the mood congruency by modulating Z_i^{A,B} with the pleasantness component of the mood, Mp. The robot retrieves "Apple" in a pleasant mood and "Tomato" in an unpleasant mood when it recognizes "Red". However, humans retrieve their favorite food even when unpleasant, if they feel hungry. We accounted for this by defining Z_i^{A,B} as the sum of the pleasantness component of the mood Mp and the Appetite NA:
Z_i^{A,B} = Mp + NA.   (14)
Figure 4. Relation between Performance and Activation Level.
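The building blocks of Eqs. (8)-(14) can be sketched as follows. This is a schematic, not a tuned reproduction of the simulation in Section 4 (whose parameter values are K = 14.0, k = 0.1, ω0 = 31.4, β = 0.05, τ = 0.1, b = 1.2); retrieval behavior depends sensitively on those parameters, and the function names are illustrative.

```python
import numpy as np

def hebb_weights(patterns):
    """Eq. (11): W_ij = (1/N) * sum_s xi_i^s xi_j^s, with self-coupling W_ii = 1."""
    P = np.array(patterns, dtype=float)        # shape (num_patterns, N), entries ±1
    N = P.shape[1]
    W = P.T @ P / N
    np.fill_diagonal(W, 1.0)                   # self-coupling constant W_ii = 1
    return W

def interaction(W, y_own, y_other, Z):
    """Eq. (10): within-network Hebbian input plus one-to-one cross coupling Z."""
    return W @ y_own + Z * y_other

def chaotic_output_step(y, x, b=1.2, beta=0.05):
    """Eqs. (12)-(13): logistic map on [-0.5, 0.5] whose bifurcation
    parameter is modulated by the neuron's internal state x."""
    r = 4.0 - b + b * np.cos(2.0 * beta * x)
    return r * (0.5 + y) * (0.5 - y) - 0.5

def oscillator_step(x, v, f, dt, k=0.1, w0=31.4, K=14.0):
    """Eq. (8): damped harmonic oscillator driven by the chaotic force f,
    advanced by one semi-implicit Euler step."""
    a = -k * v - w0 ** 2 * x + K * f
    v = v + a * dt
    x = x + v * dt
    return x, v
```

Here `hebb_weights` stores the ±1 patterns of Eq. (11), `interaction` adds the one-to-one cross coupling of Eq. (10), whose strength would be set to Z_i^{A,B} = Mp + NA per Eq. (14), and the two step functions advance the chaotic output and the oscillator state.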
971
K. Itoh et al. / Behavior Generation of Humanoid Robots Depending on Mood
In addition, we controlled the time for retrieving a memory by the activation component of the robot's emotion, Ea, in order to realize the connection between performance and activation level, as shown in Figure 4. A robot with a suitable activation level can quickly retrieve the memory corresponding to its mood. However, the robot needs more time to retrieve the memory if Ea becomes too high or too low. Moreover, we considered that a wrong memory is sometimes retrieved at too high or too low an activation level. For Z_i^{B,A} = -4.0, Network A retrieves "Tomato" over most of the lower half of the Z_i^{A,B} range but sometimes retrieves "Apple"; conversely, it retrieves "Apple" over most of the upper half of the range but sometimes retrieves "Tomato". Therefore, we defined Z_i^{B,A} to be equal to 0.0 for suitable activation levels and to -4.0 for too high or too low activation levels.
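The switching rule just described can be written as a tiny function. The activation thresholds below are hypothetical placeholders, since the text specifies only "too high or too low":

```python
def cross_coupling_z_ba(e_a, low=-2000.0, high=2000.0):
    """Z_i^{B,A} as a function of the activation component Ea:
    0.0 in the suitable mid-range, -4.0 outside it.
    The threshold values low/high are illustrative assumptions."""
    return 0.0 if low <= e_a <= high else -4.0
```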
4. Experimental Results
We evaluated the new co-associative memory model by implementing it in the Emotion Expression Humanoid Robot WE-4RII, shown in Figure 5. We set the parameter values to K=14.0, k=0.1, ω0=31.4, β=0.05, τ=0.1 and b=1.2, and the initial values to xi(0)=25.0, ẋi(0)=0.0 and yi(0)=0.2. Figure 6 shows the time evolution of the emotion, the mood and the retrieved memory of WE-4RII. At first, we showed the red ball to the robot, and the robot became unpleasant after being hit with it. It retrieved the unpleasant memory "Tomato" and showed the "Disgust" emotional expression. Next, the robot felt hungry because it had moved considerably. Due to hunger, it retrieved the pleasant memory "Apple" in spite of its unpleasant mood. After that, the robot took the apple with its hand and generated behavior such as eating; once its hunger was satisfied, the robot became happy. Moreover, we confirmed that it took the robot only about 2 s from seeing the red ball to retrieving a memory. On the other hand, the robot needed about 9 s for memory retrieval when the activation level became very high after being hit many times; at such times the robot sometimes could not retrieve a memory at all, depending on its mood.
Figure 5. WE-4RII.

Figure 6. Experimental Results of Mood State-Dependency and Mood Congruency.
5. Conclusions and Future Work
In this paper, we investigated a co-associative memory model that includes mood state-dependency, mood congruency and the connection between performance and activation level, in order to generate behavior depending on mood in response to a stimulus. This memory model was realized using mutually coupled chaotic neural networks, which consist of many harmonic oscillators (neurons) interacting via a chaotic force. By implementing this memory model in the Emotion Expression Humanoid Robot WE-4RII, we confirmed that the robot can retrieve a memory depending on its mood in response to a stimulus and show the behavior corresponding to that memory. In the future, we will increase the number of retrievable memories. Furthermore, we will study a method for storing memories depending on mood.
Acknowledgment Part of this research was conducted at the Humanoid Robotics Institute (HRI), Waseda University. The authors would like to express their thanks to Okino Industries LTD, OSADA ELECTRIC CO. LTD, SHARP CORPORATION, Sony Corporation, Tomy Company LTD and ZMP INC. for their financial support of HRI. The authors would like to thank the Italian Ministry of Foreign Affairs, General Directorate for Cultural Promotion and Cooperation, for its support of the establishment of the ROBOCASA laboratory and the realization of the two artificial hands. In addition, this study was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 17700211, 2005. This research was also supported by a Grant-in-Aid for the WABOT-HOUSE Project by Gifu Prefecture. Finally, the authors would like to thank ARTS Lab, NTT Docomo, SolidWorks Corp., the Advanced Research Institute for Science and Engineering of Waseda University, and Prof. Hiroshi Kimura for their support of our research.
References
[1] C. Breazeal and B. Scassellati: "How to build robots that make friends and influence people", Proceedings of IROS 1999, pp. 858-863, 1999.
[2] M. Fujita, Y. Kuroki, et al.: "Autonomous Behavior Control Architecture of Entertainment Humanoid Robot SDR-4X", Proceedings of IROS 2003, pp. 960-967, 2003.
[3] T. Ishida, Y. Kuroki, et al.: "Mechanical System of a Small Biped Entertainment Robot", Proceedings of IROS 2003, pp. 1129-1134, 2003.
[4] H. Miwa, K. Itoh, et al.: "Effective Emotional Expressions with Emotion Expression Humanoid Robot WE-4RII", Proceedings of IROS 2004, pp. 2203-2208, 2004.
[5] H. Miwa, K. Itoh, et al.: "Introduction of the Need Model for Humanoid Robots to Generate Active Behavior", Proceedings of IROS 2003, pp. 1400-1406, 2003.
[6] J. J. Hopfield: "Neurons with graded response have collective computational properties like those of two-state neurons", Proceedings of the National Academy of Sciences USA 81, pp. 3088-3092, 1984.
[7] J. J. Hopfield and D. W. Tank: "Neural Computation of Decisions in Optimization Problems", Biological Cybernetics 52, pp. 141-152, 1985.
[8] T. Shimizu: "Chaotic Brownian Network", Physica A 256, pp. 163-177, 1998.
[9] K. Itoh and T. Shimizu: "The Virtual Attractor in Mutually Coupled Networks", Journal of the Korean Physical Society, Vol. 40, No. 6, pp. 1018-1022, 2002.
[10] Y. Takano: "Ninchi Shinrigaku 2 Kioku (in Japanese)", Tokyo Daigaku Shuppan Kai, pp. 11-13, 240-241, 1995.
[11] D. O. Hebb: "Koudougaku Nyumon (in Japanese)", Kinokuniya Shoten, pp. 257-262, 1975.
[12] T. Shimizu: "Relaxation and bifurcation in Brownian motion driven by a chaotic force", Physica A 164, pp. 123-146, 1990.
Construction of Human-Robot Cooperating System based on Structure/Motion Model

Fumi Seto a,1, Yasuhisa Hirata a,b, Kazuhiro Kosuge a
a Department of Bioengineering and Robotics, Tohoku University
b PRESTO, JST
Abstract. The purpose of this study is to construct a unified methodology that can deal with the many control problems existing in human-robot cooperating systems, such as self-collision, collisions with obstacles and singular configurations. With such a methodology, we could realize human-robot cooperation that is as safe and maneuverable as human-human cooperation. For this purpose, we propose the concept of "Structure/Motion Models", which represent the structures, and the motions of those structures, of the robot, of objects in the environment and of the partner in the cooperation (a human). In this paper, we explain "RoBE (Representation of Body by Elastic elements)", one of the modeling methods for constructing structure/motion models. By constructing the models using RoBE, we can deal with the problems of self-collision and collision with obstacles with a unified approach. We explain how to solve these control problems as examples of solving problems during human-robot cooperation using the structure/motion models, and the results of these examples illustrate the validity of the concept of the structure/motion model. Keywords. Human-Robot Cooperation, Structure/Motion Model, Self-collision, RoBE
1. Introduction With the coming of the aging society and the decrease in the labor pool, robots are expected to perform various tasks in our daily life, such as assistive tasks in medical and welfare fields and tasks in homes and offices. Most such daily-life tasks are difficult to automate perfectly; however, they could be executed by assuming interaction between robots and humans. To develop robots that help us in daily life, much research on robot systems that assume interaction and cooperation with humans has been reported. K. Kosuge et al. have proposed a mobile robot with dual arms, referred to as "Mobile Robot Helper (MR Helper)", for handling an object in cooperation with a human based on impedance control [1]. O. Khatib et al. have proposed the concept of a "Robot Assistant" using a mobile manipulator [2]. H. Hirukawa et al. have presented a humanoid 1 Correspondence to: Fumi Seto, Aoba-yama 6-6-01, Sendai 980-8579, Japan. Tel.: +81 22 795 4035; Fax: +81 22 795 4035; E-mail: [email protected].
F. Seto et al. / Construction of Human-Robot Cooperating System
robotics platform that consists of a humanoid robot and an open-architecture software platform, and they have used it to develop various applications, including cooperative tasks with a human [3]. These robot systems should consider how to realize cooperation with a human more safely and effectively, not just how to carry out the cooperative tasks. A human-robot cooperating system faces many control problems concerning safety and maneuverability, such as self-collisions, collisions with obstacles, singular configurations and maintaining manipulability. For example, self-collisions, which are collisions of the robot's body with its own arms and so on, can damage the robot or cause it to fall down, and the robot might then harm humans around it. When the robot's manipulability decreases, we have difficulty manipulating an object in cooperation with the robot. Much research has addressed these safety and maneuverability issues. On the maneuverability side, R. Ikeura et al. have proposed variable impedance control, which improves the manipulability of the robot in a human-robot cooperating system carrying an object, and they have evaluated this control subjectively [4]. Y. Maeda et al. have proposed a human motion estimation system using the minimum jerk model for smooth human-robot cooperation [5]. There is also much research on the safety issue during human-robot cooperation. Y. Yamada et al. have constructed a power-assist device called "Skill-Assist" for automobile assembly processes, and they have also proposed a method that allows Skill-Assist to detect human error [6]. The goal of our study is to realize human-robot cooperation that is as natural, smooth and safe as cooperation among humans. To realize such cooperation, we have to construct a methodology that unifies the control algorithms mentioned above.
By constructing such a unified methodology, we will be able to deal with the many problems existing in human-robot cooperating systems. To construct it, we focus on cooperation among humans. Humans can carry out cooperative tasks safely and effectively with no difficulty. Why can they do that? We consider that humans have information about the structures of their own bodies, of objects in the surrounding environment and of the partner in the cooperation, as well as information about the motions of these structures, and that they obtain this information through experience, through their senses and by learning. By using this information appropriately, humans can realize safe and effective cooperation. For example, by using the information about the partner, a human can estimate the partner's next motion and generate his/her own motion in harmony with it. Humans can also control their motions to handle an object easily by using information about themselves. Accordingly, we should be able to construct a unified methodology by providing the robot, just as humans have, with models of the information about the structures and motions of its own body, of objects in the surrounding environment and of the partner in the cooperation (a human). In the following, we propose "Structure/Motion Models", which represent both the information about the structures of the robot's body, objects in the environment and the partner in the cooperation (a human) and the information about the motions of these structures. There are many possible modeling methods for constructing structure/motion models, and we have to select a modeling method that suits the problem to be solved. In this paper, we address a self-collision avoidance problem and an obstacle avoidance problem as examples of problems during human-robot cooperation, and we select the modeling method referred to as "RoBE", in which robots and objects are represented by "elastic elements", to solve these two problems with a unified approach. We then explain how to solve these problems by using the proposed methodology.

Figure 1. Concept of Structure/Motion Model
2. Design of Structure/Motion Model A "Structure/Motion Model" is constructed by unifying a "Structure Model" and a "Motion Model". The "Structure Model" is defined as the model representing the structure information of robots, humans and objects, including anatomy, link structure, shape, weight and so on. The "Motion Model" describes the motion capability and motion condition of a structure model, such as its moving direction and velocity. We define three types of Structure/Motion Models to represent the structure and motion information relevant to human-robot cooperation, as shown in Figure 1: the "Self-Model", the "Object-Model" and the "Partner-Model". In this section, we describe the concept of each model and briefly explain how to deal with the control problems during human-robot cooperation by using these models. 2.1. Self-Model The "Self-Model" is defined as the model representing information about the structure of the robot itself and its motion, such as its anatomy, somatotype, link structure, the placement of its degrees of freedom and the ranges of its joint movement. By having its self-model, the robot can understand its own condition and situation, so that it can control itself to avoid dangerous situations. Furthermore, the robot can control its configuration to maintain manipulability, allowing it to carry out tasks effectively.
For example, the robot can recognize the danger of self-collision by using its self-model, which represents its somatotype, and control its motion to avoid self-collisions. The robot can also control its configuration to maintain manipulability and avoid singular configurations. 2.2. Object-Model The "Object-Model" is defined as the model representing information about the structures and motions of objects in the surrounding environment, such as their shapes, weights, locations, moving directions and velocities. Humans who are not cooperating with the robot are also represented as object-models. By having these object-models as well as the self-model, the robot can understand the condition of the environment and the properties of the object it is handling. The robot can understand or estimate the locations and motions of obstacles by using the object-models, and can then move around without colliding with them. The robot can also manipulate an object appropriately and select a manipulation method depending on the properties of the handled object. When the robot is handling a cup of coffee, for example, it can use the object-model to understand the properties of the object and keep the orientation of its hand so as not to spill the coffee. 2.3. Partner-Model The "Partner-Model" is defined as the model representing information about the structure and motion of the partner in the cooperation (a human), analogously to the self-model. By having the partner-model as well as the self-model and the object-models, the robot can understand the condition and situation of the partner and realize cooperation that takes them into account. For example, the robot can recognize the danger that the human's joint will be moved beyond its range of movement and change its motion to prevent it. The robot can also manipulate an object in cooperation with a human so as to alleviate the burden on the human.
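As one concrete illustration (an assumption of ours, not code from the paper), the three model types could be encoded as data structures that each pair structure information with motion information; the field names and example values below are hypothetical:

```python
# Illustrative encoding of the Structure/Motion Model concept: a structure
# model (link structure, joint ranges, weight) unified with a motion model
# (current velocity, motion capability). Values are made-up examples.

from dataclasses import dataclass

@dataclass
class StructureModel:
    link_lengths: list      # link structure / somatotype [m]
    joint_limits: list      # range of joint movement [(lo, hi), ...] [rad]
    mass: float             # weight [kg]

@dataclass
class MotionModel:
    velocity: tuple         # current moving direction and speed
    max_joint_velocity: float   # motion capability [rad/s]

@dataclass
class StructureMotionModel:
    structure: StructureModel
    motion: MotionModel

# The robot holds a self-model, one object-model per surrounding object,
# and a partner-model for the cooperating human, all of the same form.
self_model = StructureMotionModel(
    StructureModel([0.3, 0.25], [(-2.0, 2.0), (0.0, 2.5)], 15.0),
    MotionModel((0.0, 0.0, 0.0), 1.5),
)
```

Using one common form for self-, object- and partner-models is what allows the unified treatment of the control problems discussed below.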
3. Construction of Human-Robot Cooperating System by using Structure/Motion Models We defined the concept of Structure/Motion Models in the previous chapter. To realize a safe and effective human-robot cooperating system with them, we have to consider how to construct these models and how to design the robot's control algorithm based on them. In particular, depending on the problem to be solved, selecting the information used to construct the structure/motion models and designing the construction method of the models are very important. In this paper, we explain "RoBE (Representation of Body by Elastic elements)", one of the modeling methods for constructing structure/motion models. By constructing the models using RoBE, we can deal with the problems of self-collision and collision with obstacles with a unified approach. In this section, we explain how to solve these control problems as examples of solving problems during human-robot cooperation by using the structure/motion models.
Figure 2. Construction of Self-Model: (a) Example of the Robot; (b) Self-model constructed by Elastic Elements.
3.1. Self-collision Avoidance Control based on Self-Model The self-collision problem is one of the problems in human-robot cooperation. Self-collisions include collisions of the robot's body with its arms, collisions between its two arms or legs, and so on. When a self-collision occurs, the robot may be damaged or lose its balance and fall down, harming humans around it. Unfortunately, the possibility of self-collision increases when the robot is cooperating with a human, because the motion of the robot is affected by the intentional force/moment applied between the robot and the human. We have proposed a real-time self-collision avoidance control method, based on the self-model, for robots used in human-robot cooperation. In this method, the self-model representing the robot's body and its motion is constructed from "elastic elements", each consisting of an elastic area and a rigid area. For example, the self-model of the robot shown in Figure 2(a) is constructed from elastic elements as shown in Figure 2(b). This self-model construction method is referred to as "RoBE (Representation of Body by Elastic elements)". The self-model constructed using RoBE moves with the same movement as the actual robot, which is generated based on the intentional forces applied by the human. If contact between two elements is detected, a virtual reaction force is generated between them, and the motion of the robot is modified based on these virtual reaction forces. Therefore, the robot can avoid self-collisions by using the self-model. Figure 3 shows an example of a self-collision avoidance experiment during human-robot cooperation. In this experiment, a human applies an intentional force to the robot through the force/torque sensor attached to its wrist so that its manipulator winds around its body, and we confirm whether the robot can avoid self-collisions and complete the winding motion.
This example shows that the robot can complete the winding motion without self-collisions while cooperating with a human. A detailed explanation of this self-collision avoidance control method using RoBE is given in [7]. 3.2. Obstacle Avoidance Control based on Self-Model and Object-Model In the previous section, self-collision avoidance was realized by using the self-model constructed with RoBE. By constructing object-models representing obstacles
Figure 3. Example of Self-collision Avoidance Experiment (snapshots every 4.0 s from 0.0 s to 44.0 s).
Figure 4. Construction of Object-Model
in the environment around the robot, as shown in Figure 4, and providing these object-models to the robot, the robot can also avoid colliding with these obstacles, in the same way as self-collision avoidance, by using the self-model and the object-models constructed with RoBE. Figure 5 shows an example of a computer simulation of obstacle avoidance during human-robot cooperation. In this simulation, we use a 7-DOF manipulator and apply a virtual intentional force to the endpoint of the manipulator in the direction indicated by the arrows, instead of a force applied by a human. Furthermore, the object-model representing the location and shape of an obstacle is constructed from elastic elements; to simplify the representation, we represent the obstacle with a single ball. Figure 5(a) shows the manipulator's movement when there are no obstacles in the environment, and Figure 5(b) shows its movement when there is an obstacle in the manipulator's working space. These results show that the movement of the manipulator changes when there is an obstacle in the working space, and that the manipulator can avoid the obstacle while cooperating with a human by using the self-model and the object-model.
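Both avoidance behaviors rest on the same primitive: detecting overlap between two elastic elements and generating a virtual reaction force between them. The following minimal sketch, with spherical elements and an illustrative stiffness value of our own choosing (the paper does not give its element shapes or gains), shows that primitive:

```python
# Minimal sketch of the RoBE contact primitive: each body part or obstacle
# is an elastic element (here a sphere); when two elements' elastic areas
# overlap, a virtual spring-like force pushes them apart. Geometry and
# stiffness below are illustrative assumptions, not the authors' values.

import math

def virtual_reaction_force(c1, r1, c2, r2, stiffness=100.0):
    """Virtual force on element 1 when elements 1 and 2 overlap.

    c1, c2: sphere centres (x, y, z); r1, r2: elastic-area radii [m].
    Returns a 3-vector; zero when the elements do not touch.
    """
    dx = [a - b for a, b in zip(c1, c2)]
    dist = math.sqrt(sum(d * d for d in dx))
    penetration = (r1 + r2) - dist
    if penetration <= 0.0 or dist == 0.0:
        return (0.0, 0.0, 0.0)                 # no contact detected
    # Magnitude grows with penetration depth, directed from element 2 to 1.
    scale = stiffness * penetration / dist
    return tuple(scale * d for d in dx)

# An arm element approaching the body element: a repulsive force appears.
f = virtual_reaction_force((0.15, 0.0, 0.0), 0.10, (0.0, 0.0, 0.0), 0.10)
```

Because a body element and an obstacle element are treated identically, running this check over all element pairs covers self-collision avoidance and obstacle avoidance with the single unified approach described above.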
Figure 5. Simulation Result of Obstacle Avoidance: (a) Without Obstacles; (b) With an Obstacle.
Thus, we can deal with both self-collision avoidance and obstacle avoidance with a unified approach: the structure/motion models constructed using RoBE. In this simulation, the information about the obstacle, its location and shape, is given beforehand. In a living environment, however, the robot has to avoid obstacles whose locations and shapes change dynamically. Therefore, we have to establish methods for acquiring knowledge of the obstacles in the working space and constructing the object-models from that knowledge. A. P. del Pobil et al. have proposed a method of constructing object models using a spatial representation [8], and methods like theirs could be used to construct the object-models. Since the level of detail of the representation can be chosen in their method, we could vary it depending on the distance between the robot and the obstacle, the operating accuracy required for the task, instructions given by the partner (a human) and so on. In this simulation we focus on obstacle avoidance and construct the object-models using only the information about the obstacle. However, when the robot is handling an object, we have to construct the object-model using information about the handled object. This information includes not only its location and shape but also its weight, the position of its center of mass and so on, which cannot be modeled by
using RoBE. We will therefore have to establish modeling methods that can describe such information in future work.
4. Conclusion To construct a unified methodology for realizing human-robot cooperation that is as natural, smooth and safe as cooperation among humans, we proposed in this paper the concept of "Structure/Motion Models", which represent the information about the structures and motions of the robot's body, of objects in the environment and of the partner in the cooperation (a human). We then explained one of the modeling methods for constructing structure/motion models, referred to as "RoBE", and showed how to solve the self-collision avoidance problem and the obstacle avoidance problem by using structure/motion models constructed with RoBE, as examples of solving problems during human-robot cooperation. By constructing the models with RoBE, we could deal with these problems with a unified approach, and the results of these examples illustrated the validity of the concept of the structure/motion model. As future work, we have to construct a human-robot cooperation system that enables the robot to take the partner (a human) into account by using the partner-model. For example, most robots have larger ranges of joint movement than humans, and they can also move their joints in directions in which a human cannot move his/her joints. When such a robot cooperates with a human, it might move without considering the fact that the human cannot move in those directions, and injure the human. At present, such problems are avoided by making the robot move similarly to human motions. However, if robots come to understand these facts by using the partner-model, they could move so as to avoid injuring the human without merely imitating human motion.
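The partner-model safeguard envisioned above can be sketched as a simple check: before executing a cooperative motion, the robot compares the human joint angles that the motion would produce against the ranges stored in the partner-model. The limit values, function names and stop-or-continue policy below are hypothetical illustrations, not the authors' design:

```python
# Hypothetical sketch of a partner-model joint-limit check: the robot
# passes a motion command through only if every predicted human joint
# angle stays inside the partner-model's range of movement. The elbow
# range below is an assumed, illustrative value in radians.

HUMAN_ELBOW_LIMITS = (0.0, 2.5)       # assumed flexion range [rad]

def within_partner_limits(predicted_angles, limits):
    """True if every predicted human joint angle is inside its range."""
    return all(lo <= q <= hi for q, (lo, hi) in zip(predicted_angles, limits))

def safe_command(predicted_angles, limits, command):
    """Execute the motion only when the partner-model says it is safe."""
    if within_partner_limits(predicted_angles, limits):
        return command
    return "stop"                      # or re-plan a motion inside the limits

cmd = safe_command([1.2], [HUMAN_ELBOW_LIMITS], "continue")   # -> "continue"
```

Such a check lets the robot exploit its full joint range while still refusing motions that would force the human's joints beyond theirs, rather than simply mimicking human motion.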
References
[1] K. Kosuge, M. Sato and N. Kazamura: Mobile Robot Helper. Proceedings of the IEEE International Conference on Robotics and Automation (2000), 583-588.
[2] O. Khatib: Mobile manipulation: The robotic assistant. Robotics and Autonomous Systems 26 (1999), 175-183.
[3] H. Hirukawa et al.: Humanoid robotics platforms developed in HRP. Robotics and Autonomous Systems, Vol. 48, No. 4 (2004), 165-175.
[4] R. Ikeura, H. Inooka and K. Mizutani: Subjective Evaluation for Maneuverability of a Robot Cooperating with Human. Journal of Robotics and Mechatronics, Vol. 14, No. 5 (2002), 514-519.
[5] Y. Maeda, T. Hara and T. Arai: Cooperative Human-robot Handling of an Object with Motion Estimation. Journal of Robotics and Mechatronics, Vol. 14, No. 5 (2002), 432-438.
[6] Y. Yamada, T. Morizono, Y. Umetani and H. Konosu: Warning: To Err Is Human. IEEE Robotics and Automation Magazine, Vol. 11, No. 2 (2004), 34-45.
[7] F. Seto, K. Kosuge and Y. Hirata: Self-collision Avoidance Motion Control for Human Robot Cooperation System Based on RoBE. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (2005), 50-55.
[8] A. P. del Pobil and M. A. Serna: Spatial Representation and Motion Planning. Lecture Notes in Computer Science, Springer-Verlag, 1995.
Motion Assist Devices for Rehabilitation using Parallel Wire Mechanisms

Keiko HOMMA
National Institute of Advanced Industrial Science and Technology (AIST)
Abstract. In Japan, robotic systems for rehabilitation or assistance of elderly and disabled people have been actively studied in recent years. In developing rehabilitation/assistive robots, safety is one of the most important issues, since the robots work very close to the users or are worn by them. Parallel wire mechanisms offer prospective solutions to the safety issues of rehabilitation robots. In this paper, the author looks back on her studies on motion assist devices using parallel wire mechanisms and discusses the advantages and disadvantages of such devices. Keywords. rehabilitation robot, motion assist device, parallel wire mechanism
1. Introduction These days in Japan, robotic systems for rehabilitation or assistance of elderly and disabled people are actively studied: the need for such robots has been made apparent by the rapid aging of the population, and the maturation of robot technologies has made them feasible. Assisting an elderly or disabled person to move his/her limbs is a typical application in this field. Most of the motion assist devices being developed are exoskeletons. Although exoskeletons have advantages such as a large motion range and similarity of configuration between the system and the user's limb, there are concerns about safety. Parallel wire mechanisms offer prospective solutions to these safety issues. Since 1993, we have been focusing on parallel wire mechanisms applied to motion assist devices for rehabilitation. In this paper, I outline our studies and consider the advantages and disadvantages of the developed systems. In section 2, related studies are described. In section 3, the author's studies are presented. In section 4, the advantages and disadvantages of motion assist devices using parallel wire mechanisms are discussed.
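The kinematic core shared by the parallel wire devices discussed in this paper is simple: for a desired position of the moving plate (or assisted limb segment), each wire's required length is just the distance from its fixed winch anchor to its attachment point. A minimal sketch, with anchor coordinates of our own invention and plate orientation ignored:

```python
# Simplified inverse kinematics of a parallel wire mechanism: each wire
# length is the anchor-to-attachment distance for the commanded plate
# position. Anchor/attachment coordinates are illustrative assumptions,
# and plate orientation is ignored in this point-mass-style sketch.

import math

def wire_lengths(plate_position, anchors, attachments):
    """Wire lengths for a commanded plate position.

    plate_position: (x, y, z) of the plate reference point [m].
    anchors: fixed winch positions in the world frame.
    attachments: attachment offsets from the plate reference point.
    """
    lengths = []
    for anchor, offset in zip(anchors, attachments):
        point = [p + o for p, o in zip(plate_position, offset)]
        lengths.append(math.dist(anchor, point))   # Euclidean distance
    return lengths

# Four wires from the top corners of a 2 m cube down to a centred plate:
anchors = [(0, 0, 2), (2, 0, 2), (0, 2, 2), (2, 2, 2)]
attachments = [(0, 0, 0)] * 4
lengths = wire_lengths((1.0, 1.0, 1.0), anchors, attachments)
```

Because wires can only pull, real systems of this kind coordinate the winches so that all wire tensions stay positive, which is one root of the internal-force control mentioned in the related studies below.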
2. Related studies The literature shows that the roots of rehabilitation robots can be found in powered prostheses and orthoses [1]. Externally powered upper-limb prostheses began to be studied after World War II, and the study of powered orthoses started in the 1960s. Since then a
K. Homma / Motion Assist Devices for Rehabilitation Using Parallel Wire Mechanisms
lot of studies have been carried out up to today. Current studies on motion assist devices for upper and lower limbs are outlined below, focusing on devices worn by the user or that hold the user's limb. Yardley et al. have been developing an upper-limb orthotic exercise system [2]. The system is designed to conduct four kinds of elbow-joint exercises; its elbow-joint mechanism is a single-DOF joint attached to the user and driven by a motor through a gearbox. Kiguchi et al. are developing an exoskeleton to assist upper-limb motion in daily life or rehabilitation [3]. The system assists shoulder and elbow motion using EMG signals. Its main component is a 3-DOF arm driven by DC motors and wires; the DC motors are intended to be mounted on a wheelchair, and the system is attached to the user at the wrist and the upper arm. Lee et al. have been developing an exoskeleton-type walking aid for physically weak people [4]. This system uses DC servo motors and harmonic-drive gears attached to a frame; the actuator specification was determined from the maximum angular velocity and maximum torque required for human motion. Wheeler et al. are developing a robotic system for gait rehabilitation of stroke patients [5]. The system is designed to actively assist dorsiflexion/plantarflexion and inversion/eversion of the ankle, and to passively assist internal/external rotation of the ankle. By actuating fewer degrees of freedom than anatomically possible, the system can be installed without misalignment with the patient's joint axes. Their first prototype weighs about 3.6 kg and is attached to the user at the shank and the foot; the connecting piece is easily replaced so that the system can be adjusted to each user. Nikitczuk et al. are studying the feasibility of a powered knee orthosis using Electro-Rheological Fluid (ERF) actuators [6].
ERF actuators are expected to supply controllable resistive torques and make the system compact. In the first prototype, a commercially available knee brace is used, with ERF actuators attached to it. Therapeutic devices can also be regarded as motion assist devices. Hirata et al. have developed a rehabilitation system for the two legs of stroke patients [7]. The system carries out range-of-motion exercises as well as gait training; each leg is moved by two mechanical arms, one supporting the patient's leg at the thigh and the other at the lower leg. In the studies mentioned above, exoskeletons or robotic systems hold the user's limb to conduct a motion. These systems have advantages such as a large motion range and similarity of configuration between the system and the user's limb. However, there are safety concerns, since the user cannot easily escape in case of emergency and excessive actuator power may injure the user in case of malfunction. Of course, most of these systems have taken safety measures: Lee et al. determined the motor specification from human motion to avoid excessive output, and Wheeler et al. designed their system to actuate fewer degrees of freedom than anatomically possible. Still, a problem remains when the rotation axis of a joint of the mechanism is not consistent with the rotation axis of the human joint. One solution to the safety issue is to use pneumatic actuators. Kobayashi is developing a wearable muscular support apparatus that employs pneumatic actuators to assist upper-limb motion [8]. The use of McKibben-type actuators makes the system light and avoids rotating axes; however, an initial prototype could not generate the expected motion because of slippage and slack of the wear, so an armor-type suit with mechanical joints and a hard frame has been developed. Sasaki et al. are studying an active support splint driven by a
K. Homma / Motion Assist Devices for Rehabilitation Using Parallel Wire Mechanisms
a pneumatic soft actuator [9]. They are developing two kinds of prototypes: one for increasing the motion range and the other for increasing muscular endurance. Both prototypes use a pneumatic actuator designed to realize rotating motion. Pneumatic actuators make the system light and human-friendly; however, they must be accompanied by air cylinders and valves, which are space-consuming and noisy. To overcome the disadvantages mentioned above, an alternative approach is required. The use of parallel wire mechanisms is considered a promising approach. Parallel wire mechanisms have been studied in the fields of manufacturing, rescue systems, rehabilitation systems, haptic interfaces, and so on. Since the 1980s, researchers at NIST have been studying cable-based Stewart-Gough platform machines for large-scale manufacturing tasks such as welding and grinding a beam [10]. Osumi et al. presented a concept and discussed the inverse kinematics of a 7-DOF crane with three wires for heavy parts assembly [11]. Kawamura et al. have been studying a wire-driven parallel manipulation system for ultra-high-speed assembly operations [12]. They built a prototype with seven wires and introduced internal force control to dampen vibration of the end effector. Tadokoro et al. have been studying a portable motion platform using a cable-driven parallel mechanism for searching for victims of disasters such as earthquakes [13]. The system consists of multiple cables suspended from actuator winch units and a traveling plate; the number of cables can be changed according to the required task. Mayhew et al. are developing a cable robot for upper limb rehabilitation of stroke patients based on Multipurpose, Multiaxial Isokinetic Dynamometer technology [14]. The robot employs eight wires to realize 6-DOF motion of the end-effector, each wire driven by a DC motor mounted at a corner of the frame. Nagai et al.
have developed a power assistive device for the elderly and the disabled, which helps the user transfer from a bed to a wheelchair, for example [15]. The system hangs the user's body by wires from a frame installed in the room. It consists of two layers of parallel wire mechanisms: the upper layer conducts motion in the horizontal plane and the lower layer in the vertical plane. Surdilovic et al. have developed a gait rehabilitation system [16]. The system holds the user's body with seven wires from a frame and is controlled based on zero moment point (ZMP) estimation. Arcara et al. developed a 1-wire and a 3-wire haptic interface device for a wearable robotic system that helps visually impaired persons to move [17].
3. The use of parallel wire mechanisms for motion assist devices
In this section, our studies on motion assist devices using parallel wire mechanisms are reviewed. From 1993 to 1998, an upper limb motion assist system was studied; from 1999 to 2004, a multiple-DOF leg rehabilitation system was studied.
3.1. Upper limb motion assist system
We proposed and developed an upper limb motion assist system using a parallel wire mechanism [18-20]. This system aims to assist people with upper limb disabilities in moving the arm at their own will.
A conceptual drawing of the system is shown in Figure 1. The main idea of the upper limb motion assist system is to suspend the disabled arm with wires arranged in parallel, and to move the arm by changing the length of each wire. The system consists of orthoses, wires, pulleys, tubes, an overhanging plate, and a motor-controller unit. Two belt-like orthoses hold the forearm: one wraps around the elbow and the other around the wrist. Each orthosis is suspended from the overhanging plate by three wires. The motor-controller unit winds and unwinds the wires according to commands from the user. The input device is chosen according to the residual function and the preference of the user. The system can be mounted on a wheelchair, a desk, and so on. Since the target tasks were writing, drawing, and the like, wires were extended only from above the user's head. In the initial design process, the relationship between wire arrangement and available workspace was simulated. In the simulation, the forearm and upper arm were modeled as uniform rods and the equilibrium of forces was considered. The results showed that a larger workspace is obtained as the distance between wires increases. Following the initial design stage, a prototype was built for positioning experiments in order to evaluate the effectiveness of the proposed system. The overall view of the prototype is shown in Figure 2. The system consisted of a frame, winding units, pulleys, wires, and a controller. The frame was 1.57 m wide, 2 m long, and 1.92 m high. Six winding units, each comprising a DC servo motor, an incremental encoder, gears, timing belts, and a winding drum, were placed at the foot of the frame. When generating a motion, the target position of the wrist was input to the controller, and the target length of each wire was calculated using inverse kinematics.
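For a parallel wire mechanism of this kind, the inverse kinematics reduces to computing each target wire length as the straight-line distance between the wire's hanging point on the overhanging plate and its attachment point on the orthosis. A minimal sketch of this calculation (the coordinates below are hypothetical, not the prototype's actual layout):

```python
import math

def wire_lengths(hanging_points, attachment_points):
    # Inverse kinematics of a parallel wire mechanism: the target length of
    # each wire is the Euclidean distance between its fixed hanging point
    # and the corresponding attachment point on the orthosis.
    return [math.dist(h, a) for h, a in zip(hanging_points, attachment_points)]

# Hypothetical layout: three wires suspending the wrist orthosis from an
# overhanging plate at the frame height of 1.92 m.
hanging = [(0.00, 0.00, 1.92), (0.50, 0.00, 1.92), (0.25, 0.40, 1.92)]
# Attachment points for one target wrist position (illustrative values).
attachment = [(0.25, 0.15, 0.80), (0.30, 0.15, 0.80), (0.27, 0.20, 0.80)]
print(wire_lengths(hanging, attachment))
```

The controller would command each winding unit to the computed length; the actual prototype's geometry and wire routing over pulleys are not captured by this sketch.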
In order to execute reproducible movements for examining the arm motion, two kinds of model arms were used in the experiments. The dimensions of the model arms were approximately twice those of the human forearm and upper arm, with almost the same ratio between the two parts as in the human arm. Experimental results using the forearm model showed that the cross section of the
Figure 1. Concept of an upper limb motion assist system.
Figure 2. A prototype.
obtained motion range was almost the same as the area of a square whose side length equaled that of the forearm model. The cross section became smaller as the target position moved further from the hanging points of the wires. The positioning error between target and actual positions became larger as the target position approached the edge of the motion range. Experimental results using the forearm-upper arm model showed that displacements of less than 5 mm could not be realized accurately.
3.2. Multiple-DOF Leg Rehabilitation System
We also proposed and studied a multiple-DOF leg rehabilitation system that employs a parallel wire mechanism. Figure 3 shows the conceptual design of the proposed system. This system supports and moves the leg using multiple wires extended from frames around the user. The system aims to carry out knee flexion/extension, hip flexion/extension, hip abduction/adduction, and hip internal/external rotation. At the first stage of the study, a single-DOF prototype for knee flexion/extension was built and tested in order to study the feasibility of the proposed concept [21]. This prototype consisted of a suspension mechanism for suspending the thigh and an ankle holding mechanism for translational motion of the ankle on the bed. Both mechanisms were controlled cooperatively to conduct knee flexion/extension. This prototype had problems such as difficulty in increasing the target degrees of freedom and a difference between the direction of wire tension and that of the major axis of the leg. Based on the lessons from the single-DOF prototype, a 2-DOF prototype for both hip and knee flexion/extension was built and tested [21-22]. The target motions can be regarded as 2-DOF motions in the sagittal plane; therefore, the location devices were arranged symmetrically. When designing the location devices, winding mechanisms that move along rails, in addition to winding the wires, were introduced.
These mechanisms were controlled so that the direction of wire tension was kept constant even when the joint angle of the patient's leg changed significantly; otherwise, the change of joint angle would cause unnecessary external forces to be applied to the leg. The rail shape was determined by simulation using a 2-DOF model of a leg and three wires (Figure 4(a)). In the simulation, the condition was imposed that the length of each wire and the angle between the wire and the long axis of the leg remain constant. Thus, different rail shapes were obtained for different wires, according to the hanging position on the leg. The obtained shape was then modified for
Figure 3. Concept of a multiple-DOF leg rehabilitation system.
Figure 4. (a) Simulation of wire trajectory, (b) conceptual sketch of the flexion/extension
manufacturing. The configuration of the 2-DOF prototype is shown in Figure 4(b). Based on the 2-DOF prototype, a 4-DOF prototype was built and tested [23-25]. Hip abduction/adduction and hip internal/external rotation were added to the list of target motions. When extending the 2-DOF prototype to 4 DOF, it was decided to conduct hip abduction/adduction and hip internal/external rotation as a combined motion, using an attachment fixed to the bed and the wires of the 2-DOF mechanism. This decision was made to avoid additional actuators, which would require a more complicated mechanical design and control method. Motion of the 4-DOF prototype was conducted using position control. According to the user's body dimensions and the target motion range, a time series of target positions of the orthoses was obtained, and then the target position of each winding mechanism and the target length of each wire were calculated. In order to examine the system properties, experiments using a test dummy and experiments using human subjects were carried out. Four kinds of target motion were given: (1) hip flexion/extension, (2) knee flexion/extension with the subject lying face down, (3) knee flexion/extension with the subject lying on his back, and (4) rotating motion (hip abduction/adduction and internal/external rotation). The test dummy used
Figure 5. A 4-DOF prototype.
in the experiments was developed by the former Agency of Industrial Science and Technology for testing medical equipment; its height and weight were 1.649 m and 48.2 kg, respectively. The human subjects were five healthy men aged in their twenties to fifties. Before the experiments, the experimental procedure was explained and consent was obtained from the subjects. After the experiments, the conducted motion ranges were evaluated with reference to the target motion ranges. The conducted motion ranges were expressed as joint angles calculated from the positions of optical markers attached to the leg, measured with a stereo camera system. In the experiments using the test dummy, the magnitudes of the obtained motion ranges were from 80% to 100% of the target motion range. In the experiments using human subjects, the magnitudes of the obtained motion ranges were from 80% to 100% of the target motion range in experiments (1) and (4); in experiments (2) and (3), they were from 18% to 43% and from 25% to 52%, respectively. These significant position errors in the human-subject results were considered to be caused by the orthoses not fitting the subjects' lower legs and sliding on them.
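Computing a joint angle from three optical markers amounts to taking the angle between the two limb-segment vectors meeting at the joint. A sketch under that assumption (marker coordinates are illustrative; the paper's actual marker placement and angle conventions are not specified here):

```python
import math

def knee_angle(hip, knee, ankle):
    # Angle (degrees) at the knee marker between the thigh vector
    # (knee -> hip) and the shank vector (knee -> ankle).
    t = [h - k for h, k in zip(hip, knee)]
    s = [a - k for a, k in zip(ankle, knee)]
    dot = sum(u * v for u, v in zip(t, s))
    cos_th = dot / (math.hypot(*t) * math.hypot(*s))
    # Clamp against floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_th))))

# A fully extended leg gives collinear markers, i.e. 180 degrees.
print(knee_angle((0, 1.0, 0), (0, 0.5, 0), (0, 0.0, 0)))  # 180.0
```

A flexion angle would then be reported as 180 degrees minus this included angle, depending on the chosen convention.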
4. Discussion
The use of parallel wire mechanisms for motion assist devices is expected to have the following advantages. (1) Lightness of the components attached to the user: the use of wires eliminates the need to attach heavy components to the user's body. (2) Avoidance of misalignment of the rotational axis between the device and the user: in exoskeleton-type devices using rigid links, misalignment of the rotational axis of the device with that of the human joint may damage the user. A parallel wire mechanism avoids this problem because it has no rotational axis attached to the human body. (3) Flexibility of the user's posture: using a parallel wire mechanism, motion can be executed in various patient postures. (4) Human-friendliness: wire-driven mechanisms are light and flexible, making the system human-friendly. These advantages were confirmed through the experimental results, as follows. (1) Lightness of the components attached to the user: in the leg rehabilitation system, the thigh orthosis and the lower-leg orthosis weigh 166 g and 55 g, respectively, and the user does not need to bear the load of other heavy components. (2) Avoidance of misalignment of the rotational axis: in both of our studies, the prototypes had no rotational axis attached to the human body, except for a rotating mechanism of the leg rehabilitation system. (3) Flexibility of the user's posture: the leg rehabilitation system prototype conducted knee flexion/extension in both the supine and prone positions. (4) Human-friendliness: in order to estimate the psychological effects of the system, a subjective evaluation of the prototype was carried out [25]. The evaluation showed that the feeling of constraint was significantly lower than with a clinically used continuous passive motion (CPM) device.
At the same time, the prototypes we built showed the following disadvantages. (1) Accuracy of motion: even though the target users have motor dysfunction, not all of them are immobilized; when the user moves his/her body spontaneously, the current position changes. Also, skin and subcutaneous tissue are easily deformed by external forces. These properties of the human body tend to cause positioning errors, since it is difficult for parallel wire mechanisms to allow for such a margin of error. In the experiments on the leg rehabilitation system using human subjects, significant position errors were caused by the orthoses sliding on the leg because they did not fit properly. In order to hold the leg securely and yet comfortably, the orthoses should be made to measure for each patient. This is considered the most significant problem. (2) Design problems: in the upper limb motion assist system, a large space had to be kept clear under the user's arm because the target tasks are executed on a desk. This requirement made it impossible to generate force in the direction of gravity. In the multiple-DOF leg rehabilitation system, the obtained prototype was too bulky for actual use. However, by the nature of the proposed mechanism, the motion range of the object is surrounded by the system components, so downsizing the system would reduce the available motion range. At the next stage of development, the necessary motion patterns should be selected based on the requirements of potential uses, and the system configuration should be changed and downsized accordingly. From the viewpoint of safety, the most important design issue for motion assist devices using parallel wire mechanisms is to avoid interference of the wires with the human body. In the multiple-DOF leg rehabilitation system, the moving winding mechanisms were effective in avoiding interference to some extent.
5. Summary
In this paper, our studies of motion assist devices using parallel wire mechanisms were reviewed: one is the upper limb motion assist system and the other is the multiple-DOF leg rehabilitation system. Examination of the prototypes we developed showed that these devices have advantages such as lightness of the components attached to the user, avoidance of misalignment of the rotational axis between the device and the user, flexibility of the user's posture, and human-friendliness. The examination also showed that problems remain to be overcome, such as motion accuracy and the design problems discussed above.
References
[1] W.S. Harwin, T. Rahman and R.A. Foulds, "A Review of Design Issues in Rehabilitation Robotics with Reference to North American Research", IEEE Transactions on Rehabilitation Engineering, Vol. 3, No. 1, pp. 3-13, 1995.
[2] A. Yardley, G. Parrini, D. Carus, and J. Thorpe, "Development of an Upper Limb Orthotic Exercise System", Proc. of International Conference on Rehabilitation Robotics, pp. 59-62, 1997.
[3] K. Kiguchi, M.H. Rahman and T. Yamaguchi, "Adaptation Strategy for the 3DOF Exoskeleton for Upper-Limb Motion Assist", Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 2307-2312, 2005.
[4] S. Lee and Y. Sankai, "Power Assist Control for Walking Aid with HAL-3 Based on EMG and Impedance Adjustment around Knee Joint", Proceedings of 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1499-1504, 2002.
[5] J.W. Wheeler, H.I. Krebs and N. Hogan, "An Ankle Robot for a Modular Gait Rehabilitation System", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1680-1684, 2004.
[6] J. Nikitczuk, B. Weinberg and C. Mavroidis, "Rehabilitative Knee Orthosis Driven by Electro-Rheological Fluid Based Actuators", Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 2294-2300, 2005.
[7] R. Hirata, T. Sakaki, S. Okada, Z. Nakamoto, N. Hiraki, Y. Okajima, S. Uchida, Y. Tomita and T. Horiuchi, "BRMS: Bio-Responsive Motion System (Rehabilitation System for Stroke Patients)", Proceedings of 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1344-1348, 2002.
[8] H. Kobayashi, "Development of a Muscle Suit for Realizing All Motion of the Upper Limb", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1630-1635, 2004.
[9] D. Sasaki, T. Noritsugu and M. Takaiwa, "Development of Active Support Splint driven by Pneumatic Soft Actuator (ASSIST)", Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 522-527, 2005.
[10] R. Bostelman, A. Jacoff, F. Proctor, T. Kramer and A. Wavering, "Cable-Based Reconfigurable Machines for Large Scale Manufacturing", Proceedings of the 2000 Japan-USA Symposium on Flexible Automation, 2000.
[11] H. Osumi, T. Arai and H. Asama, "Development of a Seven Degrees of Freedom Crane with Three Wires (1st Report) - Inverse Kinematics of the Crane -", Journal of the Japan Society for Precision Engineering, Vol. 59, No. 5, pp. 767-772, 1993 (in Japanese).
[12] S. Kawamura, W. Choe, S. Tanaka and S.R. Pandian, "Development of an Ultrahigh Speed Robot FALCON using Wire Drive System", Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pp. 215-220, 1995.
[13] S. Tadokoro and S. Kobayashi, "A portable parallel motion platform for urban search and surveillance in disasters", Advanced Robotics, Vol. 16, No. 6, pp. 537-540, 2002.
[14] D. Mayhew, B. Bachrach, W.Z. Rymer and R.F. Beer, "Development of the MACARM - a Novel Cable Robot for Upper Limb Neurorehabilitation", Proceedings of the 2005 IEEE 9th International Conference on Rehabilitation Robotics, pp. 299-302, 2005.
[15] K. Nagai, H. Hanafusa, Y. Takahashi, H. Bunki, I. Nakanishi, T. Yoshinaga and T. Ehara, "Development of a Power Assistive Device for Self-Supported Transfer Motion", Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1433-1438, 2002.
[16] D. Surdilovic, R. Bernhardt, T. Schmidt and J. Zhang, "STRING-MAN: A New Wire Robotic System for Gait Rehabilitation", Proceedings of the 8th International Conference on Rehabilitation Robotics, pp. 64-67, 2003.
[17] P. Arcara, L. Di Stefano, S. Mattoccia, C. Melchiorri and G. Vassura, "Perception of Depth Information by Means of a Wire-Actuated Haptic Interface", Proceedings of the 2000 IEEE International Conference on Robotics and Automation, pp. 3443-3448, 2000.
[18] K. Homma and T. Arai, "Design of an Upper Limb Motion Assist System with Parallel Mechanism", Proc. of IEEE International Conference on Robotics and Automation, pp. 1302-1307, 1995.
[19] K. Homma and T. Arai, "Static Model for Upper Limb Motion Assist System", Proc. of International Conference on Advanced Robots, pp. 945-950, 1995.
[20] K. Homma, S. Hashino and T. Arai, "An Upper Limb Motion Assist System: Experiments with Arm Models", Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 758-763, 1998.
[21] K. Homma, O. Fukuda and Y. Nagata, "Study of a Wire-driven Leg Rehabilitation System", Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1451-1456, 2002.
[22] K. Homma, O. Fukuda and Y. Nagata, "Development of Leg Rehabilitation Assistance", Journal of Robotics and Mechatronics, Vol. 14, No. 6, pp. 589-596, 2002.
[23] K. Homma, O. Fukuda, J. Sugawara, Y. Nagata and M. Usuba, "A Wire-Driven Leg Rehabilitation System: Development of a 4-DOF Experimental System", Proceedings of the 2003 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 908-913, 2003.
[24] K. Homma, O. Fukuda, J. Sugawara, Y. Nagata and M. Usuba, "Study of a Wire-driven Leg Rehabilitation System - Development of a 2-DOF System and Extension to a 4-DOF System -", Journal of the Society of Life Support Technology, Vol. 15, No. 4, pp. 151-158, 2003 (in Japanese).
[25] K. Homma, O. Fukuda, Y. Nagata and M. Usuba, "Study of a Wire-driven Leg Rehabilitation System - Human Subject Experiments using a 4-DOF Experimental System -", Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1668-1673, 2004.
Analysis of Skill Acquisition Process - A Case Study of an Arm Reaching Task
Kahori KITA, Ryu KATO, Hiroshi YOKOI, Tamio ARAI
Intelligent Systems Division, Dept. of Precision Engineering, Graduate School of Engineering, The University of Tokyo, Japan
Abstract. Analysis of the continuous process of motor learning gives much useful knowledge for the recovery of human motion activities and for functional adaptability in new environments. This paper proposes an evaluation index for the degree of proficiency and presents results of motor skill analysis for several arm reaching tasks. The motor skills were evaluated using the reproducibility of muscle activation patterns, represented by the variance of electromyographic (EMG) signal patterns, together with the motion accuracy. We confirmed that the reproducibility is high when the motion accuracy is high, and that various skill acquisition processes exist due to individual differences. We conclude that reproducibility is an important index for evaluating the degree of proficiency.
Keywords. EMG, Skill acquisition process, Reaching task, Motor learning
Introduction
Assistive devices require user-friendliness, since they are usually used by disabled or elderly people. In order to realize this user-friendliness, it is necessary to know both the current degree of proficiency, i.e., how well the device is operated by the user, and the skill acquisition process, expressed by some simple index. The basic idea of this study is to analyze the degree of proficiency from a biomedical signal closely related to the human motor process; in this case, when performing a reaching task with the arm. Human functionality is improved by exercise [1], and a well-trained motion usually results in effective movement. The degree of this establishment is called human skill. For example, a pitcher changes his pitching form and strength through exercise in order to improve his efficiency, steadiness, and accuracy. The analysis of such a skill acquisition process will give much useful knowledge about how humans adapt their functionality to new or changing environments, and the knowledge extracted from the process gives an index for measuring the degree of human skill. In conventional approaches, performance measures such as motion accuracy and the time to complete a certain motion [2] have been used to measure the degree of human skill and to monitor the skill acquisition process. These measures, for which an external coordinate system is usually necessary, capture the "effect" of skill acquisition. Recently, however, the necessity to observe and analyze the changes from an internal point of view, in order to get a step closer to the "cause", has been recognized. For example, cardiac beats [3] and changes in muscular rigidity [4][5] have been employed to
investigate the skill acquisition process. It is expected that more detailed information about the changes in the human neuro-musculo-skeletal system can be measured based on such internal coordinate systems, so that the skill acquisition process can be investigated in more depth. As pointed out by related studies [6][7], as the skill acquisition process proceeds, motion patterns tend to converge stably to suitable patterns. Therefore, the changes and the reproducibility of motion patterns are closely related to the improvement of motor skill. In this study, we propose a method to quantify the degree of skillfulness from biomedical signals closely related to the human motor process, based on pattern reproducibility. We used the surface electromyogram (EMG) as the biomedical signal: it is detected on the skin surface during muscle contraction and represents the resultant effect of motion dynamics and body mechanisms. Using our method, we analyzed the skill acquisition process for some target reaching tasks, which are well suited to examining the skill acquisition process.
Evaluation method for analyzing the skill acquisition process
The method computes the reproducibility index in three steps (preprocessing, feature extraction, and covariance calculation), while the motion accuracy index is computed in a single step.
The reproducibility of motion patterns
Preprocessing
The raw EMG signal is measured on the skin surface when muscles contract. The EMG signal is time-series data consisting of many superimposed motor unit pulses; therefore, it is not easy to infer the body's motion from raw EMG signals, and many methods have been proposed to extract information from them. In this study, motion-related EMG signals were divided into frames of fixed length, then integrated and averaged. Let the EMG signal from electrode k at time t be o_{t,k}. The signal is rectified to |o_{t,k}|, then integrated over a frame of length F and averaged to obtain s_{t,k}:
$$ s_{t,k} = \frac{1}{F} \sum_{i=t-F}^{t} \left| o_{i,k} \right| \tag{1} $$
Eq. (1) defines the value s_{t,k}. The acquired values form the components of the feature vectors and are called amplitude values. The feature vector of each channel is normalized so that its total sum is one at time t. We define the process from the start of a motion until its completion as one trial. During one trial, we extract the feature vectors n times. Eq. (2) gives the feature vectors X extracted during one trial:
$$ X = \left[ X_1^T, \ldots, X_n^T \right]
= \begin{bmatrix}
s_{1,1} & \cdots & s_{t,1} & \cdots & s_{n,1} \\
\vdots  &        & \vdots  &        & \vdots  \\
s_{1,k} & \cdots & s_{t,k} & \cdots & s_{n,k} \\
\vdots  &        & \vdots  &        & \vdots  \\
s_{1,K} & \cdots & s_{t,K} & \cdots & s_{n,K}
\end{bmatrix} \tag{2} $$
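The preprocessing of Eqs. (1)-(2) can be sketched as follows (synthetic signal; NumPy is used for brevity, and the trial length and sampling indices are illustrative):

```python
import numpy as np

def amplitude(emg, F):
    # Eq. (1): rectify, then average over a trailing frame of length F.
    # emg has shape (T, K); returns the (T, K) amplitude values s[t, k].
    rect = np.abs(emg)
    kernel = np.ones(F) / F
    # Causal (trailing) moving average applied to each channel column.
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="full")[: len(c)], 0, rect)

def feature_matrix(emg, F, n):
    # Eq. (2): sample the amplitude n times during one trial and normalize
    # each sampled vector so its components sum to one; returns X of shape (K, n).
    s = amplitude(emg, F)
    idx = np.linspace(F, len(emg) - 1, n).astype(int)
    X = s[idx].T
    return X / X.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
emg = rng.standard_normal((1000, 6))   # synthetic 6-channel trial
X = feature_matrix(emg, F=128, n=10)
print(X.shape)  # (6, 10)
```

The frame length F = 128 and K = 6 channels match the experimental settings described later; the sampling of n = 10 feature vectors per trial is an assumption for illustration.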
where s_{t,k} is the amplitude value measured by the kth electrode, K being the total number of electrodes.
Feature extraction
Feature vector X carries a large amount of information. In this study, we adopt principal component analysis (PCA) to reduce it to a more manageable size. PCA projects d-dimensional data onto a lower-dimensional subspace in a way that is optimal in a sum-squared-error sense. PCA was applied to X, the preprocessed signals for each complete objective motion. PCA can be formulated as

$$ Z_t = L X_t \tag{3} $$

which can be written out as

$$ \begin{bmatrix} Z_1 \\ \vdots \\ Z_K \end{bmatrix}_t
= \begin{bmatrix}
L_{1,1} & \cdots & L_{1,k} & \cdots & L_{1,K} \\
\vdots  &        &         &        & \vdots  \\
L_{m,1} & \cdots & L_{m,k} & \cdots & L_{m,K} \\
\vdots  &        &         &        & \vdots  \\
L_{K,1} & \cdots & L_{K,k} & \cdots & L_{K,K}
\end{bmatrix}
\begin{bmatrix} s_{t,1} \\ \vdots \\ s_{t,K} \end{bmatrix} \tag{4} $$
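The PCA step can be sketched as an eigendecomposition of the correlation matrix of the K channel variables, keeping the first eigenvector and its contribution ratio (a minimal illustration under that reading, not the authors' exact implementation):

```python
import numpy as np

def first_eigenvector(X):
    # X: K x n feature matrix (K channels, n samples per trial).
    # Returns the first eigenvector (L_{1,1}, ..., L_{1,K}) of the
    # correlation matrix of the K channel variables, together with its
    # contribution ratio (fraction of total variance it explains).
    R = np.corrcoef(X)                  # K x K correlation matrix
    w, V = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(w)[::-1]         # sort descending
    w, V = w[order], V[:, order]
    return V[:, 0], w[0] / w.sum()

# Synthetic data: six strongly correlated channels plus small noise,
# so the first component should dominate.
rng = np.random.default_rng(1)
base = rng.random((1, 20))
X = np.vstack([base * (i + 1) for i in range(6)])
X = X + 0.01 * rng.standard_normal(X.shape)
L1, ratio = first_eigenvector(X)
print(L1.shape)  # (6,)
```

With highly correlated channels the contribution ratio approaches one, mirroring the paper's observation that the first eigenvector explained over 90% in the pilot study.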
where the L_{m,k} are components of the eigenvector group L, each row belonging to eigenvalue λ_m (m = 1, …, K) of the correlation matrix of the K variables. Z is the principal component vector, obtained by combining the variables X with the eigenvector group L; Z_1 is called the first principal component, Z_2 the second principal component, and so on. In this study, we use only the first eigenvector (L_{1,1}, …, L_{1,K}), because its contribution ratio was over 90% in a pilot study. The eigenvectors calculated by PCA express simplified basic muscle activation patterns in a transformed space, and the eigenvalues represent the degree of each muscle's contribution to the motion. The first eigenvector was used as the feature vector for the next step, the variance calculation.
Covariance calculation
Since the reproducibility of the muscle activation patterns is very important for the skill acquisition process, we need to quantify the varying extent of the feature vectors. In this study, we calculate the variance of the transformed and simplified muscle activation
patterns, i.e., the feature vectors, to express the reproducibility. Therefore, we calculate the variance of the eigenvector components (L_{1,1}, …, L_{1,K}) over N trials, and the sum of the variances over channels is set to σ_e:
$$ \sigma_e = \frac{1}{KN} \sum_{i=1}^{K} \sum_{j=s-N}^{s} \left( L_{1,i}^{\,j} - \frac{1}{N} \sum_{h=s-N}^{s} L_{1,i}^{\,h} \right)^2 \tag{5} $$
The reproducibility of motion patterns is expressed by σ_e. A smaller σ_e means that the subject uses almost the same muscle activation patterns during movement; therefore, reproducibility is higher when σ_e is smaller, and vice versa.
The accuracy of motion patterns
The accuracy of motion patterns is expressed as d, the distance between the actual and ideal trajectories:
$$ d = \frac{1}{nN} \sum_{j=s-N}^{s} \sum_{i=1}^{n} \left( \sqrt{ (x_i - \alpha_i)^2 + (y_i - \beta_i)^2 } \right)_j \tag{6} $$
Here (x, y) denotes the position of the actual arm movement and (α, β) the corresponding position on the ideal trajectory. Again, the accuracy of motion patterns is higher when d is smaller, and vice versa.
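Under a straightforward reading of Eqs. (5)-(6), the two indices can be computed over a sliding window of the last N trials as follows (all data synthetic; the array shapes are assumptions for illustration):

```python
import numpy as np

def reproducibility(L1_history, s, N):
    # Eq. (5): sigma_e, the mean variance over channels of the
    # first-eigenvector components across the N trials up to trial s
    # (smaller value = more reproducible muscle activation patterns).
    # L1_history: array of shape (num_trials, K).
    window = L1_history[s - N : s]          # N x K
    return float(np.mean(np.var(window, axis=0)))

def accuracy(actual, ideal, s, N):
    # Eq. (6): d, the mean point-wise distance between actual and ideal
    # trajectories over the N trials up to trial s (smaller = more accurate).
    # actual, ideal: arrays of shape (num_trials, n, 2).
    a, b = actual[s - N : s], ideal[s - N : s]
    return float(np.mean(np.linalg.norm(a - b, axis=2)))

rng = np.random.default_rng(2)
trials, K, n = 30, 6, 50
L1 = rng.random((trials, K))                               # synthetic eigenvectors
ideal = np.broadcast_to(np.linspace(0, 1, n)[:, None], (trials, n, 2)).copy()
actual = ideal + 0.05 * rng.standard_normal(ideal.shape)   # noisy tracking
print(reproducibility(L1, s=30, N=10), accuracy(actual, ideal, s=30, N=10))
```

A trial window of N = 10 is an assumption; the paper does not state the window size used in its experiments.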
Experiment
In order to verify the usefulness of the evaluation method described in the previous section, we carried out the following experiments. First, we verified the relation between motion accuracy and reproducibility, to confirm that the reproducibility of the muscle activation patterns is suitable for evaluating the skill acquisition process. Furthermore, we examined the changes in the reproducibility and in the motion patterns. Finally, we analyzed the relation between skill acquisition patterns and the difficulty of each task.
Setup
Figure 1 shows the experimental system. We used a target-tracking task to drive the skill acquisition process. Subjects were asked to track a target that moved on a computer display, using a joystick as shown in Figure 1. The cursor on the display moves according to the movement of the joystick. Both spatial and temporal accuracy are counted when calculating the trajectory error. We prepared three types of trajectories, linear, circular, and random, shown in Figure 2. In addition, the target could move at two speeds: 1.5 [sec/trial], called "slow", and 0.7 [sec/trial], called "fast". Combining the trajectories and the speeds of the target yields six tasks, listed in Table 1. The target moves at a constant speed from the start position to the goal position; we define this process as one trial. After the target reached the goal position, the target trajectory and the cursor trajectory were displayed on the
Figure 1. Experimental system and a joystick (FF.JOYARM™, MES) used in this experiment.
Figure 2. The trajectories of the three tasks (linear, circular, and random).
Table 1. Tasks in the experiments.
  Speed \ Trajectory        Linear   Circular   Random
  1.5 [sec/trial] (Slow)    L-S      C-S        R-S
  0.7 [sec/trial] (Fast)    L-F      C-F        R-F
Figure 3. Position of EMG electrodes (Ch. 1-Ch. 6) on the left arm.
computer monitor. The subjects looked at the two trajectories on the monitor and recognized the distance between the target and the cursor, then tried to bring the cursor trajectory closer to the target trajectory in the next trial. Surface EMG signals were recorded from six related muscles of the left arm, the manipulating arm; here the parameter K is six and F is 128, and the sampling rate is 1.6 kHz. Figure 3 shows the positions of the dry-type EMG electrodes: Ch. 1 and Ch. 2 were placed on the biceps brachii muscle, Ch. 3 on the deltoid muscle, Ch. 4 on the flexor carpi ulnaris muscle, Ch. 5 on the flexor carpi radialis muscle, and Ch. 6 on the greater pectoral muscle. Lastly, the reproducibility and accuracy of the motion patterns were calculated by the proposed method using the sampled data set. Eight healthy subjects, six males and two females, took part in the experiments; each subject carried out 150 trials per task.

Results and discussions
First, we verified the relation between motion accuracy and reproducibility. Three subjects (II, III and IV) took part in this experiment, executing the circular-fast task. Figure 4 shows the motion accuracy and the reproducibility of the three subjects. As can be seen from the graph, the motion accuracy and reproducibility became higher towards the end of the experiments. This suggests that the muscle activation patterns converged to certain stable patterns, and that the degree of skillfulness increased as the exercises went on. Furthermore, we verified the changes in the reproducibility and the motion
Figure 4. Correlation between the motion accuracy d and the reproducibility σe (d in [pixel], plotted per trial for subjects II, III and IV). If the values of d and σe are smaller, the motion accuracy and the reproducibility are higher.
Figure 5. The motion accuracy d [pixel], the reproducibility σe and the contributions of each muscle (Ch1 to Ch6) to the motion, plotted per trial. In the lower graph, each color shows one muscle and "1st" to "6th" give the order of contribution; "search" and "stable" mark the alternating phases.
patterns. Here, the eigenvectors show how much each muscle activates during the motion; in other words, they show the motion patterns and how much each muscle contributes to the motion. Therefore, the percentage of each eigenvector after normalization in one trial is defined as the contribution of each muscle to the motion. Figure 5 shows the results of subject VIII doing the circular-fast task. The upper graph shows the motion accuracy, the middle one the reproducibility, and the lower one the changes in the contribution of each eigenvector to the motion. In the lower graph, "1st" means the most useful muscle, with the highest eigenvector percentage, and "6th" means the least useful muscle in the motion.
Figure 6. Skill acquiring pattern using μσe and μd (average movement accuracy μd on the horizontal axis versus average reproducibility μσe on the vertical axis). The random movement tasks (R-S, R-F) and the constant movement tasks (tasks 1 to 4) of subjects I to VII are plotted; the constant-task points form three classes P1, P2 and P3.
In the upper and middle graphs of Figure 5, the motion accuracy became higher, but the value of σe sometimes rose steeply; thus, the reproducibility was sometimes low, for example at the red-circled points ("search") in Figure 5. We can also see from the lower graph that the motion patterns changed dramatically when the value of σe rose steeply. Therefore, the motion patterns alternate between stable states (the points in the black circles in Figure 5) and states of searching for new motion patterns (the points in the red circles in Figure 5). Finally, we analyzed the relation between skill acquisition patterns and the difficulty of each task. Seven of the subjects executed tasks in which the trajectory and speed were constant. Three subjects (II, III and IV) formed Group A and three others (V, VI and VII) formed Group B. Group A executed four tasks in the order linear-fast, circular-fast, linear-slow, circular-slow; Group B executed them in the order linear-slow, circular-slow, linear-fast, circular-fast. The remaining subject (subject I) executed the random-speed, random-trajectory tasks. In Figure 6, μd means the average value of d over the last P (P = 50) trials, and μσe means the average value of σe.
\mu_d = \frac{1}{P} \sum_{i=1}^{P} d_i \qquad (7)

\mu_{\sigma_e} = \frac{1}{P} \sum_{i=1}^{P} \sigma_{e,i} \qquad (8)
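These averages over the last P trials are straightforward to compute; the following is a minimal sketch, assuming the per-trial d and σe series are stored as sequences:

```python
import numpy as np

def skill_summary(d_series, sigma_e_series, P=50):
    """Eqs. (7)-(8): average motion accuracy mu_d and average
    reproducibility mu_sigma_e over the last P trials of a task
    (P = 50 in the paper)."""
    mu_d = float(np.mean(d_series[-P:]))
    mu_sigma_e = float(np.mean(sigma_e_series[-P:]))
    return mu_d, mu_sigma_e
```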
Figure 6 shows the results, where task 1 means the task each subject performed first in the experiment, task 2 the task performed next, and so on. Both the μσe and μd values of the random movement tasks were higher than those of the constant movement tasks. That is, the degree of skillfulness on the highly difficult tasks is lower than on the easy tasks. In turn, the reproducibility of the difficult
tasks is lower than that of the easy tasks. It can be considered that the random movement tasks put the subjects in a beginner's condition for a given movement, because there is a clear difference in the degree of skillfulness between the random movement tasks and the constant movement tasks; that is, proficiency progressed on the constant tasks. Therefore, this measure of skillfulness may be able to evaluate the skill acquisition process of beginners, for example in vocational rehabilitation. Looking at the μσe and μd values of the constant movement tasks in Figure 6, the skill acquisition patterns are classified into three classes (P1, P2, P3). In P1 both μσe and μd are low, so P1 is a condition of skillfulness. In P2 both values are high, so it is considered an intermediate state of proficiency. In P3, subjects may become further skilled in the task because the μd value is high, or may not become more skilled because the μσe value is low. Thus the degree of reproducibility is not fixed for subjects who have become skilled in the tasks. We assume that fluctuation exists in the movement pattern so that subjects can respond to environmental changes such as a change of posture during the trial, muscle fatigue, and stimuli from outside. In other words, the subjects maintain high motor redundancy, which enables them to respond to various situations, so the reproducibility is not fixed. In addition, it is clear that, owing to individual differences, various skill acquisition processes exist.
Conclusion
In this paper, we proposed an evaluation method that quantifies the degree of skillfulness from biomedical signals closely related to the human motor process, based on pattern reproducibility and motion accuracy. We confirmed that the reproducibility is high when the motion accuracy is high, and that various skill acquisition processes exist owing to individual differences. We also showed that the motion patterns alternate between stable states and unstable states. We conclude that the reproducibility is one of the important indices for evaluating the degree of skillfulness. In the future, we need to add to our evaluation method the changes accompanying the progress of time and the changes in EMG amplitude.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
999
Generation of Size-Variable Image Template for Self-Position Estimation Considering Position Shift
Kae DOKI a,1, Naohiro ISETANI a, Akihiro TORII a and Akiteru UEDA a
a Aichi Institute of Technology
Abstract. We propose a new image template generation method for the self-position estimation of an autonomous mobile robot. In the proposed method, an image template is generated with a Genetic Algorithm, and in the process of self-position estimation the size of the image template can be varied in order to change the time for self-position estimation according to the situation around the robot. Therefore, a suitable image template is searched for by the GA as the image size is varied. The position of the robot is estimated by matching the input image at the current situation with the stored image templates, which indicate certain positions. As the criterion of the template matching, the normalized correlation coefficient is applied. This criterion is sensitive to position shifts of the image; therefore, in order to realize self-position estimation robust to position shifts, the amount of position shift between the image template and the input image is compensated before the template matching. The usefulness of the proposed method is shown through some experimental results. Keywords. Image Template Generation, Vision for Autonomous Mobile Robot, Self-Position Estimation, Genetic Algorithm, Normalized Correlation Coefficient
1. Introduction
We have researched action acquisition methods for an autonomous mobile robot with the real-time search [1][2]. In action acquisition with the real-time search, a robot action is acquired through recognition of the current situation and an action search on the ground. Therefore, we have to consider the balance between the time for recognition and the time for action search according to the situation around the robot, in order to utilize the limited computational resources for action acquisition efficiently. This idea is based on anytime sensing, which has been proposed by S. Zilberstein et al. [6], and is a crucial feature of action acquisition with the real-time search. For example, in an obstacle avoidance problem, when the robot is in a safe situation such as free space, it can find a proper path to the goal with detailed information about the environment, spending enough time for environment recognition and action search. On the contrary, it has to decide on an acceptable action by minimal recognition and action search, spending
1 Correspondence to: Aichi Institute of Technology, 1247, Yachigusa, Yakusa-cho, Toyota, Aichi, 470-0392, Japan. Tel.: +81 565 48 8121 (ext. 2002); Fax: +81 565 48 0010; E-mail: [email protected]
1000
K. Doki et al. / Generation of Size-Variable Image Template for Self-Position Estimation
very short time when it is in a dangerous situation where obstacles are at the point of colliding with the robot. Therefore, we aim to realize a new recognition method in action acquisition with the real-time search. As an example of recognition, we deal with a self-position estimation problem for a robot with vision, based on the above idea. The robot estimates its current position by matching an acquired input image with stored image templates which indicate certain positions in the environment; the normalized correlation coefficient [9] is applied as the criterion of the template matching in this research. In this problem, the following features are required of the template matching for self-position estimation. First, the computational amount for template matching must vary according to the situation around the robot, in order to balance the time for recognition with the time for action search in action acquisition with the real-time search. Moreover, efficient and robust image processing is necessary so as not to deteriorate the quality of the search result. Generally speaking, template matching with the normalized correlation coefficient needs a lot of computational resources. In addition, the quality of the template matching is sensitive to position shifts of the input image caused by slip in the robot motion and so on. Therefore, it is necessary to realize robust and efficient template matching. In order to satisfy the first feature, the size of the image template and input image is varied according to the situation around the robot in this research. Then, we propose a new image template generation method to realize stable template matching even if the size of the template and input images is varied or a position shift of the input image occurs in the robot motion.
2. Self-Position Estimation
2.1. Self-Position Estimation with Normalized Correlation Coefficient
The robot estimates its position by matching an input image to stored image templates, using the normalized correlation coefficient (NCC) as the criterion for the template matching.

R = \frac{\sum_{x}^{M} \sum_{y}^{N} \{ f(x, y) - \bar{f} \} \cdot \{ g(x, y) - \bar{g} \}}{\sqrt{\sum_{x}^{M} \sum_{y}^{N} \{ f(x, y) - \bar{f} \}^2} \sqrt{\sum_{x}^{M} \sum_{y}^{N} \{ g(x, y) - \bar{g} \}^2}} \qquad (1)

Equation (1) gives the normalized correlation coefficient between two images F and G. They are grey-scale images, each of size M × N. The parameters f(x, y) and g(x, y) are the brightness at coordinate (x, y) in F and G, and f̄ and ḡ are the mean values of f(x, y) and g(x, y) respectively. When the image f(x, y) is identical to g(x, y), the value of R is 1.0; when f(x, y) is completely different from g(x, y), R is -1.0. Here, f(x, y) is an input image acquired at the current situation of the robot and g(x, y) is an image template stored in the robot. By template matching with this equation, the robot is estimated to be at the position indicated by the stored image template with the highest value of R. From this equation, we can see that the performance of the self-position estimation is influenced by position shifts of the input image caused by the robot motion
Figure 1. Compensation method of position shift
because R is calculated from the pixels of the input and template images. Moreover, the computational amount for calculating R is proportional to the image size. Based on these features, the size of the image template and the input image is varied according to the situation around the robot. In addition, in order to realize template matching that is robust to position shifts, we introduce the compensation method described in the following subsection. Then, in order to realize efficient image matching, the image templates for position estimation are generated with a Genetic Algorithm (GA) [10], described in the following section.
2.2. Compensation of Position Shift
An acquired input image may be shifted relative to the image template even if it is acquired at the same position, because of slip in the robot motion. Self-position estimation by image matching with NCC is sensitive to such position shifts; therefore, the amount of position shift between the image template and the input image should be compensated before the template matching. Fig. 1 shows the compensation method used in this research. First, one-dimensional arrays for the input image and an image template are calculated by summing the brightness of each column in each image (see Fig. 1(a)). Then the mean squared error (MSE) of the two arrays is calculated, and the same calculation is repeated while shifting one of them (see Fig. 1(b)). The amount of position shift between the image template and the input image is taken as the shift of the two arrays at which the MSE is minimal. Before the template matching for self-position estimation, the input image is shifted right or left by the calculated amount, and template matching is performed using only the overlapped area between the input image and the image template.
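The compensation and matching steps above can be sketched as follows. This is a minimal illustration under our own assumptions (grey-scale images as 2-D arrays, a fixed search range for the shift), not the authors' implementation:

```python
import numpy as np

def estimate_shift(template, image, max_shift=20):
    """Fig. 1: estimate the horizontal position shift by comparing the
    column-sum brightness profiles of the two images and minimizing the
    mean squared error over candidate shifts. The search range
    max_shift is our assumption; the paper does not state one."""
    pt = template.sum(axis=0).astype(float)
    pi = image.sum(axis=0).astype(float)
    best_shift, best_mse = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # compare the overlapping parts of the two profiles under shift s
        if s >= 0:
            a, b = pt[s:], pi[:len(pi) - s]
        else:
            a, b = pt[:len(pt) + s], pi[-s:]
        mse = np.mean((a - b) ** 2)
        if mse < best_mse:
            best_mse, best_shift = mse, s
    return best_shift

def ncc(f, g):
    """Eq. (1): normalized correlation coefficient of two equal-size
    grey-scale images (R = 1.0 for identical, -1.0 for opposite)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    f = f - f.mean()
    g = g - g.mean()
    return (f * g).sum() / (np.sqrt((f ** 2).sum()) * np.sqrt((g ** 2).sum()))
```

After `estimate_shift`, the input image would be shifted by the returned amount and `ncc` evaluated only on the overlapped area, as described above.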
Figure 2. Flow of template generation by Genetic Algorithm (reduce to 1/4^n size, bit selection by GA, expand with n = n-1, repeat until the original size, n = 0, is reached).
3. Image Template Generation
3.1. Flow of Image Template Generation
In order to realize efficient template matching, the image templates for self-position estimation are generated by a Genetic Algorithm (GA). The basic idea is that the GA selects the parts efficient for position estimation from a pre-acquired image which indicates a specific position, and only those efficient parts are used in the template matching. With this procedure, efficient self-position estimation can be realized with small computational resources. Moreover, in order to obtain templates robust to changes of the image size, the image size is varied during the template generation by the GA. Fig. 2 shows the flow of the template generation. Its detail is as follows.
1. The pre-acquired images are reduced to 1/4^n of the original size. The reduction is executed with the 2-dimensional wavelet transformation; the parameter n is the reduction level.
2. From these images, the parts efficient for position estimation are selected by the GA as a set of rectangles. (See the details in the following subsection.)
3. After the GA search has almost converged, the image templates obtained by selecting the efficient parts are enlarged by doubling the sides of the rectangles. The pre-acquired images are also enlarged to 1/4^(n-1) of the original size by the inverse wavelet transformation.
4. The size of the enlarged template is checked. When the parameter n is 0, i.e. the size of the enlarged template equals that of the original pre-acquired images, the GA operation is finished. Otherwise, go back to step 2 and keep searching for the template by GA.
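The coarse-to-fine loop of steps 1 to 4 can be sketched as follows. This is only a structural illustration under our assumptions: `select_parts` is a hypothetical stand-in for the GA bit selection, rectangles are (x, y, side) tuples, and simple subsampling replaces the wavelet transform:

```python
import numpy as np

def generate_template(image, select_parts, levels=2):
    """Coarse-to-fine template generation (Fig. 2, steps 1-4).
    select_parts(img, seed=None) is a hypothetical GA stand-in that
    returns a list of (x, y, side) rectangles on the given image."""
    n = levels
    # step 1: reduce to 1/4**n size (placeholder for the 2-D wavelet transform)
    small = image[::2 ** n, ::2 ** n]
    rects = select_parts(small)                  # step 2: select efficient parts
    while n > 0:                                 # steps 3-4: enlarge until n = 0
        rects = [(2 * x, 2 * y, 2 * a) for (x, y, a) in rects]
        n -= 1
        small = image[::2 ** n, ::2 ** n]
        rects = select_parts(small, seed=rects)  # continue the GA search
    return rects
```

Each pass doubles the rectangle coordinates and sides so the selected parts track the growing image, until the template reaches the original size.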
Figure 3. Coding of the chromosome of GA
3.2. Coding and Evaluation in Genetic Algorithm
Fig. 3 shows the chromosome used in the GA operation. In each chromosome, the parameter a is the side of a rectangle, and the parameters b and c are the x and y coordinates of the rectangle. One rectangle is thus expressed by a set of these three parameters, and each chromosome is constructed by connecting several such sets. In this paper, we constructed, as a test system, a template generation system which distinguishes a certain position from another. The evaluation function of each chromosome is

E = \frac{\sum_{i=1}^{n} R_c(i)}{n} - \frac{\sum_{k=1}^{m} R_w(k)}{m}. \qquad (2)

In Eq. (2), R_c(i) and R_w(k) indicate the values of NCC between the template and an image acquired at the same position as the image template and at a different position, respectively. The parameters n and m are the numbers of same-position and different-position images. The bigger the value of R_c(i) and the smaller the value of R_w(k), the higher this evaluation value becomes. Therefore, a template with a higher evaluation value can distinguish a certain position from another more clearly.
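The evaluation of Eq. (2) can be written down directly. In the sketch below (our own hedged transcription, not the authors' code), the chromosome's selected rectangles are represented by a boolean mask, and the NCC of Eq. (1) is computed only over the masked pixels:

```python
import numpy as np

def ncc(f, g):
    """Eq. (1): normalized correlation coefficient of two equal-size arrays."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    f = f - f.mean()
    g = g - g.mean()
    return (f * g).sum() / (np.sqrt((f ** 2).sum()) * np.sqrt((g ** 2).sum()))

def fitness(template, mask, same_images, diff_images):
    """Eq. (2): mean NCC Rc against images of the same position minus
    mean NCC Rw against images of different positions; only the pixels
    selected by the chromosome (mask == True) are used."""
    rc = np.mean([ncc(template[mask], img[mask]) for img in same_images])
    rw = np.mean([ncc(template[mask], img[mask]) for img in diff_images])
    return rc - rw
```

A chromosome whose rectangles cover position-discriminative regions yields high Rc and low Rw, and hence a high evaluation value E.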
4. Experimental Result
4.1. Experimental Setup
In order to show the usefulness of the proposed method, we describe some experiments that we performed. First, we examined how image templates were acquired with the proposed method. Fig. 4 shows the experimental setup. As the figure shows, the robot was located at three positions A, B and C, and an image was taken at each place. Figs. 4(b) to 4(d) show the acquired images; each indicates the front of room A, B or C respectively. In this experiment, an image template was generated to distinguish position A from position B, and another to distinguish position B from position C. The conditions of the experiment are as follows.
1. The initial value of the parameter n in Fig. 2 is 2. That is, the size of the pre-stored images and chromosome is 1/16 of the original at the beginning of the template generation.
Figure 4. Experimental setup: (a) experimental environment (corridor with rooms A, B and C, positions A, B and C, bulletin board, door, camera and robot); (b) acquired image at position A; (c) acquired image at position B; (d) acquired image at position C.
2. The number of generations is 5000 in the GA search at each size of the pre-acquired images and chromosome. The number of chromosomes is 100. In Eq. (2), the parameter n is 2 and the parameter m is 3.
3. The maximum number of rectangles is 10. The maximum length of the side of a rectangle is 20 [pixel].

4.2. Results and Verification
Fig. 5 shows the experimental results. In this figure, the images (a) of Fig. 5(1) and (2) are the chromosomes with the highest evaluation value in each case. The white rectangles are the parts selected for the image template; that is, only the white parts are used in the calculation of the NCC. The images (b) and (c) are the ones acquired at position A and position B (or position B and position C). The rectangles in each figure indicate the parts selected to recognize each position. From these figures, we can see that the parts efficient for distinguishing the two images are chosen well by the GA search. Next, we examined the relationship between the normalized correlation coefficient and the image size, and the influence of the position shift on the normalized correlation coefficient. Table 1 and Table 2 show the values of the normalized correlation coefficient calculated with the templates acquired in the previous experiment (Fig. 5(1) and (2)) when the image size is changed. In Table 1, A1 and A2 mean the images
Figure 5. Experimental results: (a) best chromosome (white parts: selected rectangles; black parts: non-selected); (b) image at position A; (c) image at position B. (1) Acquired template for distinguishing position A from position B (Result 1); (2) acquired template for distinguishing position B from position C (Result 2).
acquired at positions 10 cm and 20 cm away from position A in Fig. 4(a), and B1 to B3 mean the images acquired at positions 10 cm, 20 cm and 30 cm away from position B in Fig. 4(a). Similarly, in Table 2, B1 and B2 mean the images acquired at positions 10 cm and 20 cm away from position B, and C1 to C3 the images acquired at positions 10 cm, 20 cm and 30 cm away from position C. In each table, the matching time is how long it takes to calculate the normalized correlation coefficient using the acquired image template. From these tables, we can see that the value of the normalized correlation coefficient hardly varies when the image size is changed. This result indicates that an image template robust to changes of the image size can be acquired by the proposed method. Moreover, the acquired image templates can recognize the right position even if an input image is shifted from the one used for the template generation. We can also see that the time for template matching varies with the image size. Accordingly, the time for self-position estimation can be varied by changing the image size according to the situation around the robot.
5. Conclusion
In this research, we aim to realize a recognition method for action acquisition with the real-time search, and in this paper we proposed a new image template generation method for the self-position estimation of an autonomous mobile robot. We performed some experiments, and the results showed that efficient and robust template matching can be realized. In this paper, we constructed, as a test, a system which generates a template to distinguish one position from another. We will therefore expand the template generation system to generate multiple image templates simultaneously, in order to distinguish multiple positions in the environment.
Table 1. Value of normalized correlation coefficient (Result 1).
Image Size   A1      A2      B1      B2      B3      Matching time
1/1          0.809   0.879   -0.336  -0.323  -0.235  4.090
1/4          0.864   0.911   -0.364  -0.351  -0.250  1.043
1/16         0.893   0.918   -0.454  -0.374  -0.343  0.266
Table 2. Value of normalized correlation coefficient (Result 2).
Image Size   B1      B2      C1      C2      C3      Matching time
1/1          0.979   0.901   -0.240  -0.779  -0.967  4.061
1/4          0.994   0.932   -0.263  -0.795  -0.970  1.119
1/16         0.988   0.944   -0.234  -0.840  -0.971  0.272
Acknowledgments
This research has been supported by the Hibi Science Foundation, and is based on a Grant-in-Aid for Scientific Research by the Japan Society for the Promotion of Science, Ministry of Education, Culture, Sports, Science and Technology.

References
[1] K. Fujisawa et al.: "A real-time search for autonomous mobile robots", J. of the Robotics Society of Japan, Vol. 17, No. 4, pp. 503-512, 1999
[2] K. Doki et al.: "Environment Representation Method Suitable for Action Acquisition of an Autonomous Mobile Robot", Proc. of International Conference on Computer, Communication and Control Technologies, Vol. 5, pp. 283-288, 2003
[3] M. Asada et al.: "Purposive Behavior Acquisition for a Robot by Vision-Based Reinforcement Learning", Journal of the Robotics Society of Japan, Vol. 13, No. 1, pp. 68-74, 1995
[4] H. Koyasu et al.: "Recognizing Moving Obstacles for Robot Navigation Using Real-Time Omnidirectional Stereo Vision", J. of Robotics and Mechatronics, Vol. 14, No. 2, pp. 147-156, 2002
[5] Y. Abe et al.: "Vision Based Navigation System by Variable Template Matching for Autonomous Mobile Robot", Proc. of the 1998 IEEE International Conference on Robotics & Automation, pp. 952-957, 1998
[6] S. Zilberstein et al.: "Anytime Sensing, Planning and Action: A Practical Model for Robot Control", the ACM magazine, 1996
[7] K. Doki et al.: "Generation of Image Template with Variable Matching-Time for Autonomous Mobile Robot with Vision", Proc. of International Conference on Computing, Communications and Technologies, Vol. VI, pp. 365-370, 2004
[8] K. Doki et al.: "Self-position Estimation of Autonomous Mobile Robot with Vision using Matching-time Variable Image Template", Proc. of the Sixth IASTED International Conference on Intelligent Systems and Control, 446-089 (CD-ROM), 2004
[9] Y. Satoh et al.: "Robust Image Registration Using Selective Correlation Coefficient", Trans. of IEE Japan, Vol. 121-C, No. 4, pp. 1-8, 2001
[10] D.E. Goldberg: "Genetic Algorithms", Addison-Wesley, 1989
[11] S. Mallat: "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. Pattern Anal. & Mach. Intell., Vol. 11, No. 7, pp. 674-693, 1989
Subjective Age Estimation using Facial Images - the Effects of Gender, Expressions and Age Groups Noriko NAGATA a, Naoyuki MIYAMOTO a, Yumi JINNOUCHI a and Seiji INOKUCHI b a School of Science and Technology, Kwansei Gakuin University, Japan b Takarazuka University of Art and Design, Japan
Abstract. We propose a relative estimation method for subjective age, the age we imagine ourselves to be, using people's facial images and their chronological (real) age. We carried out a rating-scale experiment in which subjects evaluated, with a range of responses, whether each facial image stimulus looked older or younger than themselves. Fitting an approximation curve to this range, the zero-crossing point of the curve is defined as the subjective age. The experimental results show that the subjective age tends to lie in the negative direction (a tendency to estimate oneself as younger than one's actual age). Besides, there are other trends between genders, between age groups, and between different expressions such as ordinary and smiling. Keywords. Subjective age, facial image, rating scales methods, estimation, chronological age, affective computing.
1. Introduction
Humans can estimate the age and gender of their companions from experience, using facial color and facial features. The automated estimation of age and gender is an important factor in the study of the recognition of faces and facial expressions; however, it is difficult for it to reach the level of human estimation [1]. On the other hand, after learning a companion's actual age, we often find we have been much more humble than needed, and say, "I thought he was much older than me!" In such a case it is not that we estimated his age wrongly, but that we saw ourselves as younger, or older, than we really are. The age we see ourselves as is called our subjective age. In this paper, we propose a relative estimation method for subjective age using people's facial images and their chronological age. This can be developed into further studies, such as the range of the subjective age, or finding the causes of discrepancy between the subjective age and the chronological age according to generation and gender. There has been research estimating one's imagined subjective age on the basis of a questionnaire [2]; however, as far as our search has found, there is no research using facial images.
1008
N. Nagata et al. / Subjective Age Estimation Using Facial Images
Figure 1. Examples of the facial images in this database: (a) ordinary; (b) smiling.
In this paper, we present three points: the construction of the database of facial images which is the foundation of this research, a proposal for the definition of subjective age and its estimation method, and the results of the research.
2. Facial Image Database
This database contains a total of 20 classes covering ages from 20 to 70 for each gender. At present, each class has 10 people, and a total of 400 facial images have been recorded, including two different expressions (ordinary and smiling) for each person. Figure 1 shows examples of the facial images. The facial images are saved as high-resolution digital images by film scanning.
3. Experiment of Subjective Age Estimation
3.1. Rating Experiment We choose facial images for both male and female subjects who were of different age and gender groups, from the same age class as the subject, and the next younger and older classes. The total number of facial images used was 60, composed of 5 images (per 1 class)
× 3 classes × 2 genders (the same and the opposite gender) × 2 expressions (ordinary and smiling), as shown in Figure 2. We then conducted a rating-scale experiment using these facial images as stimuli. The subjects were shown the 60 facial images chosen as above and evaluated whether each face looked older or younger than themselves. The evaluation had 5 ranks: "Definitely older than myself (2)", "Probably older than myself (1)", "Not able to estimate (0)", "Probably younger than myself (-1)", and "Definitely younger than myself (-2)". We adopted this rating scale because the aim was not to estimate the chronological ages of the facial images, but to locate the subject's position relative to others. Figure 3 shows an example of the selection screen of the subjective age estimation system, which is implemented in Java. The facial images are 300×350 pixels. Before starting the experiment, subjects were instructed to decide quickly and intuitively, and not to be concerned about seeing multiple images of the same person.

3.2. Definition of Subjective Age and Method of Estimation

To quantify the results, we plotted them in a two-dimensional plane, with the x-axis being the relative age (the chronological age of the facial image minus the chronological age of the subject) and the y-axis being the rating value, as shown in Figure 4. The x-axis ranges from -9 to 9, because the maximum age difference a subject could evaluate was 9 years (for example, a 34-year-old subject was shown facial data from three classes: 25-29, 30-34, and 35-39). The data thus form a band rising toward the upper right.
[Figure: for a 32-year-old male subject, images are chosen from the next younger (25-29), same (30-34), and next older (35-39) classes, for both genders and both expressions (ordinary and smiling).]
Figure 2. How facial images were chosen.
1010
N. Nagata et al. / Subjective Age Estimation Using Facial Images
Figure 3. An example of the selection screen of the subjective age estimation system.
This band characterizes the subjective age of the group of subjects. Fitting an approximation curve to this band, we define the zero-crossing point of the curve as the subjective age [3]. Since we assume the curve to be sigmoidal (a nonlinear, continuous, monotonically increasing function), we adopt the logistic function [4], a kind of sigmoid function. A logit transformation can then be applied to linearize the data:
    Y = ln( y / (1 - y) )    (1)
This reduces the problem to a linear approximation of the transformed data; that is, the zero-crossing point of the approximation line defines the shift of the subjective age. The subjective age can be considered to give a standard of one's relative position (age) with respect to a companion or within a group.
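The estimation procedure above reduces to a simple fit. The following Python sketch (not part of the original study) illustrates it; the linear mapping of the -2..+2 ratings into (0, 1) before the logit is an assumption here, since the paper gives only the transformation itself:

```python
import numpy as np

def subjective_age_shift(rel_ages, ratings, eps=1e-3):
    """Estimate the shift of the subjective age.

    rel_ages: relative ages (image age minus subject age), roughly -9..9
    ratings:  rating values on the -2..+2 scale

    The mapping of ratings into (0, 1) and the clipping constant eps are
    assumptions for this sketch; the paper specifies only the logit
    transformation Y = ln(y / (1 - y)) and a linear fit.
    """
    y = (np.asarray(ratings, dtype=float) + 2.0) / 4.0  # [-2, 2] -> [0, 1]
    y = np.clip(y, eps, 1.0 - eps)                      # keep the logit finite
    Y = np.log(y / (1.0 - y))                           # logit transformation (1)
    a, b = np.polyfit(np.asarray(rel_ages, dtype=float), Y, 1)
    return -b / a                                       # zero crossing of Y = a*x + b
```

With the line reported in Section 3.3 (y = 0.5357x + 1.1602), the zero crossing -1.1602/0.5357 is about -2.17, the shift reported for the whole subject group.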
[Figure: the left panel plots the rating value against the difference in actual age with the fitted approximation curve (logistic function); the right panel plots the logit-transformed rating value against the difference in actual age with the approximation line, whose zero crossing gives the shift of the subjective age.]
Figure 4. Method of estimating subjective age.
3.3. Results of Research

Following the above method, we analyzed the results obtained by showing ordinary and smiling facial expressions to a total of 156 male and female subjects between 25 and 54 years old. After applying the logit transformation, the approximation line for the entire data set was y = 0.5357x + 1.1602, whose zero-crossing point is -2.17. This is the shift of the subjective age of the subject group in this research. For comparison, three effects were examined: gender, expression, and age. First, Figure 5 gives the shift of the subjective ages of the male and female subjects. Male subjects showed an unexpectedly lower (younger) value of -2.69, compared with -1.57 for the female subjects. Next, Figure 6 shows the shift of the subjective ages across three age groups (25-34, 35-44, 45-54). The results indicate that the older the subjects are, the nearer their subjective ages get to their real ages; however, the decrease in the shift values of male subjects with age was not as large as that of female subjects. Moreover, Figure 7 shows the difference in the subjective ages by expression (ordinary and smiling) in the three age groups. The subjective ages for smiling expressions show higher values than those for ordinary expressions for subjects under 34 years old; on the contrary, the subjective ages for ordinary expressions are higher than those for smiling expressions for subjects of 35 years old and over.
[Figure: bar chart of the shift of the subjective age by gender; -2.69 for male subjects and -1.57 for female subjects (negative values mean the subjective age is lower than the real age).]
Figure 5. Shift of the subjective age by gender.
[Figure: bar chart of the shift of the subjective age for male and female subjects in three age groups (25-34, 35-44, 45-54); the shifts approach zero with increasing age.]
Figure 6. Shift of the subjective age between age groups.
[Figure: bar chart of the shift of the subjective age by expression (ordinary vs. smiling) in the three age groups (25-34, 35-44, 45-54).]
Figure 7. Shift of the subjective age by expressions and age groups.
3.4. Discussion

In the experimental results, the subjective age was generally in the negative direction; that is, subjects tended to estimate themselves as younger than they actually are. Although we see our own faces every day, we have little opportunity to evaluate our faces relative to others. Consequently, we tend to regard a past record, such as a photograph, as showing how we look now. For instance, many people have had the experience of being laughed at after submitting an old photograph, in which they see little change, for official purposes. It is presumed that this tendency comes from a kind of conviction that we never get old. Next, there are also other interesting trends between the genders and between the age groups. The results indicate that the shift value of male subjects is lower than that of female subjects, and that the older the subjects are, the nearer their subjective ages get to their real ages; in other words, older subjects view themselves more objectively. A possible explanation is that men generally place themselves in a more complex society than women, and must rise in social rank and acquire confidence with age. There are also different tendencies between the ordinary and smiling expressions, which may be related to various psychological, physiological, and social factors. Smiling expressions commonly look younger than ordinary expressions. It is
supposed that this is because cheerfulness is associated with youth in the younger generation. It can also be thought that smiling expressions shorten the psychological distance. Therefore, the subjective ages for smiling expressions should show higher values than those for ordinary expressions. However, unusual results were obtained in the age groups over 35; we suppose this was affected by the wrinkles that appear on the face when smiling. In the future, we will carry out further research on the different tendencies based on age, gender, facial expression, skin texture, makeup, and occupation, with more images and subjects. In addition, we are considering collecting examination data through a public website; we believe this will be possible if the issue of portrait rights can be settled. Alternatively, using the "average facial image" method [1], we could present subjects with stimuli such as "the average facial image of a 25-year-old male or a 42-year-old female," which would prevent unevenness in the stimuli and also avoid the problem of portrait rights.
4. Conclusions

As a foundation for research on subjective age with facial images, we have built a facial image database and introduced a definition of subjective age together with an estimation method. The results show that the subjective age tends to be younger than the actual age and changes with the stimulus. The subjective age represents one's relative position to others in a society; such research is becoming increasingly important from the viewpoint of self-recognition in the fields of HCI and e-learning. In the future, we will improve the facial image database and further refine the estimation method of subjective age. In addition, we are planning to examine objective age, i.e., the apparent age judged by others. We are also considering adopting other stimuli, for example, images of eyes or skin texture.
References

[1] Hayashi, J., Yasumoto, M., Ito, H. and Koshimizu, H., "Age and Gender Estimation and Modeling by Means of Facial Image Processing," Proc. KES2001 Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, Part 2, pp. 1128-1136, September 2001.
[2] Sato, S., Shimonaka, Y., Nakazato, K. and Kawaai, C., "A Life-Span Developmental Study of Age Identity: Cohort and Gender Differences," The Japanese J. Developmental Psychology, Vol. 8, No. 2, pp. 88-97, 1997.
[3] Nagata, N. and Inokuchi, S., "Subjective Age Obtained from Facial Images: How Old We Feel Compared to Others?," in V. Palade, R. J. Howlett and L. Jain (Eds.), Knowledge-Based Intelligent Information and Engineering Systems II, Lecture Notes in Artificial Intelligence 2774, Springer-Verlag, pp. 877-881, 2003.
[4] Dillon, W. R. and Goldstein, M., Multivariate Analysis, John Wiley & Sons, 1984.
Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Evaluation and Suppression of Overrun of Microorganisms using Dynamics Model for Microrobotic Application

Naoko Ogawa (a), Hiromasa Oku (a), Koichi Hashimoto (b,c), Masatoshi Ishikawa (a)
(a) University of Tokyo, Japan; (b) Tohoku University, Japan; (c) PRESTO, JST, Japan

Abstract. Our goal is to utilize living microorganisms as smart, autonomous microrobots. In previous work, we constructed a dynamics model of Paramecium cells for navigation using galvanotaxis (locomotion response to an applied electric field). In this paper, an overrun phenomenon, which affects the control performance, is evaluated using the proposed dynamics model, and it is found that the turning trajectory is independent of the individual ciliary forces. A trapping experiment based on this model is performed and better performance is achieved. This is an important step toward our eventual goal of using living cells as autonomous microrobots.

Keywords. Microrobot, Paramecium, Galvanotaxis, Model, Overrun, Trapping
1. Introduction

Today, there is a great deal of interest in the measurement and control of objects at the micrometer and nanometer scales. Conventional technologies, however, have required human operators with good dexterity, a high level of expertise, and long experience. There is therefore a need for intelligent, autonomous microsystems to assist less skilled operators. However, many problems remain to be solved before the practical implementation of such microsystems becomes realistic. Our approach to overcoming these problems is to utilize microorganisms [1]. Motile microorganisms have acquired sophisticated sensors and actuators for survival through the course of their evolution. In this sense, microorganisms are smart microrobots with complete autonomy. If we can develop techniques to control them freely by exploiting their intrinsic functions and behavior, we can realize multi-purpose, programmable, organic microrobotic systems that are superior to existing micromachine systems. By controlling microorganisms in this manner, we aim to achieve various applications, such as cell manipulation, microscopic transportation, smart microsensors, and assembly of micro-electro-mechanical systems (MEMS).

One key technology for developing such microrobotic applications is the navigation of cells and microorganisms. A promising candidate for navigation is galvanotaxis, an intrinsic locomotor response to an electrical stimulus, because of its non-invasive and non-contact nature. Some recent studies successfully used galvanotaxis to achieve simple motion control of Paramecium caudatum, a protozoan exhibiting strong galvanotaxis [1-3]. In these studies, however, the motion of the microorganisms was described by simple empirical models, and only limited control performance was achieved. For instance, when turning a cell, there was a considerable time lag of several hundred milliseconds, causing the cell to go too far, which we call the "overrun" phenomenon [1,3]. To realize more precise control, it is essential to deal with Paramecium in the framework of standard robotics. Thus, as an essential first step, we constructed a dynamics model of Paramecium galvanotaxis using a bottom-up approach [4].

In this paper, we apply our previously proposed model to the suppression of the overrun phenomenon in order to improve the performance. We evaluated the overrun using this model and found the interesting property that overrun is independent of the ciliary force, which allows us to easily implement countermeasures against it. Preliminary experiments based on this finding demonstrate better performance in trapping a cell within a small region. We expect that this technique will be effective toward our eventual goal of using living cells as autonomous microrobots.
2. Dynamics Model of Paramecium Galvanotaxis In this section, we briefly describe the dynamics model of Paramecium galvanotaxis. More details and its evaluations are available in our previous work [4]. 2.1. Paramecium and Its Galvanotaxis Paramecium caudatum is a unicellular protozoan with an ellipsoidal shape, inhabiting freshwater. It swims by waving cilia on its body; thousands of cilia beat the water backward to yield forward propulsion by means of a reaction force [5]. The cilia therefore act as tiny actuators. For navigation, we adopt galvanotaxis. Paramecium cells exhibit quite strong negative galvanotaxis [6]. An external electrical stimulus modifies the membrane potential and alters the ciliary movements, thus affecting the cell motion. Viewed macroscopically, the cell is made to swim toward the cathode in a DC electric field. A Paramecium cell in an electric field shows a characteristic ciliary movement pattern. To analyze this movement, we assume an imaginary plane that is perpendicular to the electric field and located near the center of the cell, slightly closer to the cathodal end, dividing the cell into two parts, as illustrated in Figure 1. The electric field causes cilia on the anodal end to beat more frequently (ciliary augmentation) [7] and the cilia on the cathodal end to also beat more frequently but in the opposite direction (ciliary reversal) [8]. The asymmetry of the ciliary beatings generates a rotational force that orients the cell toward the cathode. In order to analyze this phenomenon quantitatively, we constructed a dynamics model of Paramecium galvanotaxis, which is described below.
[Figure: schematic of a cell in an electric field between anode and cathode; the boundary plane divides the cell into a region of ciliary augmentation and a region of ciliary reversal, whose asymmetric ciliary forces generate a torque that rotates the cell toward the cathode.]
Figure 1. Qualitative explanation of galvanotaxis.
[Figure: the global coordinate system (X, Y) with origin Og and the X-axis parallel to the electric field E, the local coordinate system (x, y) with origin Ol fixed to the cell, and the angle φ of the cell axis.]
Figure 2. The global coordinate system (X, Y).
2.2. Assumptions By making several assumptions focusing only on those properties that are essential and dominant in galvanotaxis [4], we can describe the cell motion in a two-dimensional plane including the cell axis and the electric field vector. Hereafter, we consider cell motion only in this plane. Also, we can consider the cell as a two-dimensional ellipsoid on this plane. We define a global coordinate system (X, Y ) on the plane, as shown in Figure 2. It is fixed with respect to the external world, with the X-axis parallel to the electric field E. Let φ be the angle of the cell axis in the global coordinate system (φ < 0 in Figure 2, for the sake of convenience in deriving the model). Let the cell shape be an ellipsoid E with a major axis of length 2L and a minor axis of length 2R. We assume that cilia are distributed uniformly around the edge of the ellipsoid with linear density n. In the presence of an electric field, imagine a plane perpendicular to the field (hereinafter referred to as “a boundary plane”). The shortest distance between the plane and the center of the cell is set to l. The propulsion force yielded by one cilium is assumed to be f0 in the absence of an electric field. In the presence of an electric field E, the force increases to f = (1+βE)f0 , where β is a positive parameter.
[Figure: the cell ellipsoid (major axis 2L, minor axis 2R) in the local coordinate system, the boundary plane at distance l from the center O, the sites of action P1 and P2 with position vectors r1 and r2, the resultant forces F1 and F2, and the trapezoid height w.]
Figure 3. Parameters in the local coordinate system.
2.3. Model of the Torque

The Paramecium cell swims toward the cathode due to a torque caused by the asymmetry of ciliary motion. In this section, we estimate this torque. First, consider an ellipsoid E in the local coordinate system (x, y), as illustrated in Figure 3. The asymmetry of ciliary beating exists only in the substantially trapezoidal region formed by the intersection of the boundary plane and the ellipsoid (the hatched region in Figure 3); thus, we need only consider the forces generated in this region. Because it would be too complicated to consider the individual minute forces generated by each cilium, we focus on the resultant forces for simplicity. We set sites of action, P1 and P2, at the midpoints of the sides of the trapezoid and assume the directions of the forces to be tangential to the ellipsoid. We then define position vectors r1 = OP1 and r2 = OP2. Next, suppose that the magnitude of the resultant force is proportional to the "height" w of the trapezoid, a signed value with the same sign as the cell angle. Then the propelling forces F1 and F2 at the points P1 and P2, respectively, are represented by F{1,2} = ∓ f w n m{1,2}, where m1 and m2 are the unit tangent vectors at P1 and P2, and the ∓ symbol reflects the two directions of ciliary beating. Thus, the torques at the points P{1,2} are τ{1,2} = r{1,2} × F{1,2}. (These vectors are treated as three-dimensional in calculating cross products.) The total torque rotating the cell body is given by τ = τ1 + τ2. Since its x and y components are obviously zero, hereafter we call its z component, τz, the "torque". Finally, the torque generated in a Paramecium cell oriented at angle φ is described as:

    τz(φ) = - 4 L R^2 f n s sqrt(L^2 c^2 + R^2 s^2 - l^2) / sqrt(L^4 c^4 + 2 L^2 R^2 c^2 s^2 + R^4 s^4 - L^2 l^2 c^2 + R^2 l^2 c^2),    (1)

where s = sin φ and c = cos φ.

2.4. Equations of Motion of Paramecium Cell

Using the torque estimated in the previous section, we now discuss the equations of motion of a Paramecium cell. In a micrometer-scale world, the inertial resistance of the
fluid is small enough to be negligible, and the viscous resistance becomes dominant instead; hence we can apply Stokes' law. Since a rigorous evaluation of the viscous resistance around an ellipsoid is quite complicated, we approximate the viscosity by substituting Stokes' law for a sphere. Thus, the equation of motion for the translational motion of the cell can be approximated by M Ẍ + D Ẋ = F, where X = (X, Y)^t is the cell position, F = 2 f n |x_a| e_X is the forward propulsive force, e_X = (cos φ, sin φ)^t is a unit vector along the longitudinal axis, D = 6 π μ R is the viscous friction coefficient, M = ρ V is the cell mass, ρ is the cell density, and V = 4 π L R^2 / 3 is the cell volume. In addition, the equation of motion for the rotation is given by I φ̈ + D' φ̇ = τz(φ), where I = π M (R^2 + L^2) / 5 is the moment of inertia of an ellipsoid, D' = δ π μ L^3 is the rotational viscous friction coefficient, and δ is a coefficient to compensate for errors in the model. Finally, combining the translational and rotational equations of motion leads to the state-space form:

    ẏ = A y + B(y),    (2)

with

    A = [ 0  0     1     0  0      0
          0  0     0     1  0      0
          0  0  -D/M     0  0      0
          0  0     0  -D/M  0      0
          0  0     0     0  0      1
          0  0     0     0  0  -D'/I ],

    B(y) = ( 0, 0, (P cos φ)/M, (P sin φ)/M, 0, τz(φ)/I )^t,

where y = (X, Y, Ẋ, Ẏ, φ, φ̇)^t and P = 2 f n |x_a|. With parameters determined from experimental data, numerical experiments using our model demonstrated realistic behaviors, such as U-turn motions, like those of real cells. The validity of the model was also verified by comparing it quantitatively with experimental data, where the simulated data were approximately in agreement with several experimental results. See the previous work [4] for more details.
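A direct way to exercise Eqs. (1)-(2) is to integrate them numerically. The Python sketch below (the paper's own calculations used MATLAB) does this with explicit Euler steps; the parameter values here are illustrative assumptions, not the fitted values reported in the earlier papers:

```python
import numpy as np

# Illustrative parameters (assumed values, not the fitted ones from the papers).
L, R = 100e-6, 25e-6           # cell semi-axes [m]
l = 10e-6                      # boundary-plane offset from the cell center [m]
fn = 1.0e-6                    # ciliary force density f*n [N/m]
mu = 1.0e-3                    # viscosity of water [Pa s]
rho = 1.0e3                    # cell density [kg/m^3]
delta = 8.0                    # rotational-drag correction coefficient (assumed)

V = 4.0 * np.pi * L * R**2 / 3.0      # cell volume
M = rho * V                           # cell mass
D = 6.0 * np.pi * mu * R              # translational Stokes drag
I = np.pi * M * (R**2 + L**2) / 5.0   # moment of inertia
Dr = delta * np.pi * mu * L**3        # rotational drag D'

def tau_z(phi):
    """Galvanotactic torque, Eq. (1)."""
    s, c = np.sin(phi), np.cos(phi)
    num = 4.0 * L * R**2 * fn * s * np.sqrt(max(L**2*c**2 + R**2*s**2 - l**2, 0.0))
    den = np.sqrt(L**4*c**4 + 2.0*L**2*R**2*c**2*s**2 + R**4*s**4
                  - L**2*l**2*c**2 + R**2*l**2*c**2)
    return -num / den

def step(y, P, dt):
    """One explicit Euler step of Eq. (2); y = (X, Y, Xdot, Ydot, phi, phidot)."""
    X, Yp, Xd, Yd, phi, phid = y
    return np.array([
        X + dt * Xd,
        Yp + dt * Yd,
        Xd + dt * (-D / M * Xd + P * np.cos(phi) / M),
        Yd + dt * (-D / M * Yd + P * np.sin(phi) / M),
        phi + dt * phid,
        phid + dt * (-Dr / I * phid + tau_z(phi) / I),
    ])
```

Starting from, say, φ = -150° and stepping with a small dt reproduces the expected reorientation of the cell toward the cathode (φ → 0); since the velocity dynamics are very stiff at this scale, a dedicated stiff integrator would be preferable to plain Euler in practice.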
3. Evaluation of Overrun

3.1. Overrun in Paramecium U-Turn

We have reported several experiments on motion control of Paramecium cells, such as zig-zag motion, U-turn motion, and trapping cells within a small region [1]. These studies, however, revealed that hundreds to thousands of milliseconds can elapse between applying the stimulus to turn the cell and completion of the actual turn [9]. This time delay resulted in poor performance; for example, cells overran the trapping region by a substantial distance [1]. In this paper we attempt to improve the performance by accounting for overrun in our proposed model. The overrun is, in brief, the phenomenon that the cell goes too far in U-turn motion. Figure 4 illustrates a schematic overview of a U-turn of a Paramecium cell induced by reversal of the electrical stimulus. The cell does not turn immediately at the point of stimulus application (P), but only after a substantial overrun (at Q). Thus, we define the X-overrun as the distance traveled in overshooting from P (the stimulus point) to Q (the turn point) in the
[Figure: schematic U-turn trajectory; the stimulus is reversed at P with a given initial angle, and the cell turns at Q, defining the X-overrun and Y-overrun as the distances from P to Q in each direction; the asymptote of the return path gives the eventual Y-overrun.]
Figure 4. Definition of overrun.
X-direction (parallel to the electric field). Correspondingly, the Y-overrun is defined as the distance from P to Q in the Y-direction.

3.2. Evaluation of Overrun

To evaluate the overrun using the proposed model, we performed numerical calculations with numerical analysis software (MATLAB 7, MathWorks). In the simulation, swimming cells encountered a stimulus reversal at the point P when oriented at 30° with respect to the electric field (φ = -150°). The X and Y distances between P and Q (the tip of the U-turn) were calculated as the X-overrun and the Y-overrun. Parameters were set based on several experimental results, as described in our previous work [1]. The force generated by the cilia, f0n, cannot easily be determined, due to diversity among cells and the difficulty of precise measurement; however, a rough estimate based on actual measurements is possible. We therefore compared eleven values of f0n over a suitable range, from 0.5 μN/m to 1.5 μN/m. Figure 5 shows time sequences in the X-direction for several values of f0n; the minima of the curves correspond to the X-overrun. It is noteworthy that the X-distance to the point Q is nearly constant, whereas the time to arrive at Q varies depending on the ciliary force f0n. In addition, the magnitude of the overrun agrees well with actual measurements [1]. The Y-overrun was evaluated likewise. The time sequences in the Y-direction for several f0n are shown in Figure 6; the point where the slope of a curve temporarily goes to zero marks the Y-overrun. Like the X-overrun, the Y-overrun is independent of the ciliary force f0n. The terminal points of the curves indicate that the asymptotes and the eventual Y-overrun are almost constant, whereas the U-turn times depend on the ciliary force f0n.

3.3. Evaluation of Whole Trajectory

These results indicate that the overruns in both the X and Y directions can be treated as constant, that is, independent of the ciliary force. Consequently, this leads to the natural hypothesis that the whole trajectory itself is independent of the ciliary force. We verified
[Figure: simulated time sequences of the X position (left) and Y position (right) for values of f0n from 0.5 μN/m to 1.5 μN/m; the minima of the X curves mark the X-overrun and the zero-slope points of the Y curves mark the Y-overrun, both nearly identical across f0n.]
Figure 5. Time sequences of X positions for various f0n.
Figure 6. Time sequences of Y positions for various f0n.
[Figure: trajectories for various f0n plotted in the three-dimensional (X, Y, t) space, and their projection onto the X-Y plane, where they coincide.]
Figure 7. Relations among X-position, Y-position, and time t.
Figure 8. Projection of Figure 7 onto the X-Y plane.
this hypothesis by numerical calculations. Figure 7 shows the relationships among the X-position, the Y-position, and the time t in three-dimensional space, extracted from Figure 5 and Figure 6; its projection onto the X-Y plane is shown in Figure 8, with the multiple trajectories overlaid. The trajectories are completely identical, suggesting that the whole trajectory is independent of the ciliary force. These experiments imply that, at least for the U-turn motion, disparities in the ciliary force have little or no influence on the cell trajectory. This is a very convenient property for the motion control of Paramecia, for the following reasons. First, it is unnecessary to consider the ciliary force when dealing only with the positions of cells; this is beneficial because experimental identification of ciliary motion is quite difficult. Second, disregarding individual diversity in the ciliary forces allows us to apply a priori countermeasures against the overrun phenomenon without performing calibration in advance. Finally, it will allow simple trajectory planning and control of Paramecium cells in future work.
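The overrun values discussed above can be read off a trajectory mechanically. As an illustration (a hypothetical helper, not code from the paper), the following Python sketch extracts the X- and Y-overrun from a sampled trajectory, assuming the cell travels toward -X before the reversal, as in Figure 5 where the minima of the X curves mark the X-overrun:

```python
import numpy as np

def measure_overrun(t, X, Y, t_reverse):
    """X- and Y-overrun of a U-turn from a sampled trajectory.

    P is the sample where the stimulus is reversed; Q is the tip of the
    U-turn (the furthest X excursion past P, assuming travel toward -X).
    """
    i_p = np.searchsorted(t, t_reverse)      # reversal point P
    i_q = i_p + np.argmin(X[i_p:])           # U-turn tip Q
    x_overrun = X[i_p] - X[i_q]              # overshoot along the field
    y_overrun = abs(Y[i_q] - Y[i_p])         # lateral excursion at the tip
    return x_overrun, y_overrun
```

Applied to the simulated curves of Figure 5, this kind of extraction yields the roughly 140 μm X-overrun estimate that is embedded into the trapping controller of Section 4.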
4. Trapping Experiment Considering Overrun

The discussion in the previous section suggested that we can improve the performance a priori, without detailed knowledge of the cell properties. In this section, this finding
[Figure: block diagram of the system; the I-CPV vision system on the microscope sends image features to an RTOS PC, which tracks the moving microorganism so as to keep it in the center of the field of view via the XY stage, and applies the stimulus voltage to the electrodes on the glass slide holding the specimen.]
Figure 9. System configuration.
is applied to a trapping experiment. Note that this is a preliminary experiment; a more rigorous evaluation is forthcoming.

4.1. Experimental Setup

This section briefly describes the experimental setup; for more details, see our previous work [1]. The configuration of the overall system is illustrated in Figure 9. An electrical stimulus is applied to cells swimming in a chamber mounted on an XY stage. The stage is controlled by a high-speed vision system called I-CPV so as to keep a cell in the center of the field of view. By reading the encoders on the stage and performing a coordinate transformation, we can obtain the global position of the cell. The I-CPV system [10] integrates an image intensifier with a Column Parallel Vision (CPV) system, a high-speed vision system developed for robotic applications [11]. It captures and processes 8-bit gray-scale images of 128×128 pixels at a 1-kHz frame rate; each pixel has a programmable general-purpose processing element (PE) based on the S3 PE architecture [12]. The I-CPV is mounted on an upright optical microscope (Olympus, BX50WI) and captures dark-field images, from which it provides the image moments every 1 ms. The trajectory and orientation of the target are reconstructed from the moments and used for visual feedback control of the XY stage (SMC, LAL00-X070) to achieve "lock-on" tracking of moving microorganisms [13].

4.2. Methods and Materials

The system can adjust the stimulus voltage applied to the electrodes according to the cell status captured by the I-CPV system. For example, the system can reverse the voltage when a cell leaves a certain region, causing the cell to move in the opposite direction; this allows us to trap the cell within the region. We adopt this simple closed-loop trapping experiment as a preliminary demonstration. The stimulus was adjusted in real time according to the target status. The width of the trapping region was set to 1 mm.
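The reconstruction of the target's position and orientation from image moments, mentioned in the setup above, follows the standard moment method. A Python sketch (illustrative only; the actual on-chip moment features of I-CPV are not detailed here):

```python
import numpy as np

def centroid_and_orientation(img):
    """Target centroid and axis angle from image moments.

    img: 2-D array of gray-scale pixel values (e.g., a dark-field image).
    Returns (cx, cy, theta), where theta is the major-axis angle in
    radians, computed from the second central moments in the usual way.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    m00 = img.sum()
    cx = (img * xs).sum() / m00                  # centroid from raw moments
    cy = (img * ys).sum() / m00
    mu20 = (img * (xs - cx) ** 2).sum()          # second central moments
    mu02 = (img * (ys - cy) ** 2).sum()
    mu11 = (img * (xs - cx) * (ys - cy)).sum()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return cx, cy, theta
```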
The electrical stimulus was applied in the X direction by two carbon electrodes on either side of the chamber. The strength of the voltage gradient was 4.1 V/cm.
[Figure: for each method, the applied voltage gradient and the cell X position are plotted over 10 s; left panel: not model-based, right panel: model-based.]
Figure 10. Applied voltage (dashed lines) and X position (parallel to the electric field) of a cell (solid lines). The shaded region is the bounded region for trapping. Left: conventional method in which overrun was ignored [1]. Right: proposed method in which overrun was accounted for. Overrun was suppressed in the proposed method and better trapping performance was achieved.
In previous experiments, the electrical stimulus was reversed only when the cell moved outside the boundaries. This time, however, we embedded the overrun estimate of 140 μm, derived from the numerical calculations shown in Figure 5, into the system: the stimulus is reversed 140 μm before each boundary line, an undershoot equal to the estimated overrun, so that the cell turns just at the boundary. Wild-type Paramecium caudatum cells were cultured at 20-25 °C in a soy flour solution. Cells grown to the logarithmic or stationary phase (4-10 days after incubation) were collected together with the solution, filtered through a nylon mesh to remove debris, and infused into the chamber.

4.3. Results

Figure 10 shows time sequences of the electrical stimulus (dashed lines) and the cell position in the X-direction (solid lines). The shaded region is the bounded region for trapping. Comparing the case in which overrun is considered (right) with the case in which it is not (left), it is clear that overrun was suppressed with the proposed method, implying that the trapping performance was improved. Note that this result is preliminary and further evaluation needs to be performed; in particular, more rigorous experiments and evaluation are planned using three-dimensional data from our tracking and focusing system [13,14]. In addition, we plan to evaluate the statistical properties of the overrun for several cells under various conditions.
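The reversal rule used in this experiment — flip the field one estimated overrun (140 μm) before each boundary of the 1 mm region — amounts to a simple bang-bang controller. A Python sketch (names are illustrative, not from the actual RTOS implementation; the 1 mm width and 140 μm margin are the values reported above):

```python
def stimulus_polarity(x_um, polarity, half_width_um=500.0, overrun_um=140.0):
    """Overrun-aware bang-bang trapping rule.

    x_um:     current cell X position [um], with 0 at the trap center
    polarity: +1 if the present field drives the cell toward +X, else -1

    The field is reversed `overrun_um` before the trap boundary, so that
    the cell, after overshooting by the (force-independent) overrun,
    turns just at the boundary.
    """
    threshold = half_width_um - overrun_um
    if polarity > 0 and x_um > threshold:
        return -1            # early reversal before the +X boundary
    if polarity < 0 and x_um < -threshold:
        return +1            # early reversal before the -X boundary
    return polarity
```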
5. Summary In this paper, we evaluated the overrun of Paramecium cells in galvanotactic navigation and found that the trajectory is independent of ciliary force. Using this finding, we improved the performance of a trapping task. This is an important step towards using living cells as microrobots.
N. Ogawa et al. / Evaluation and Suppression of Overrun of Microorganisms
References
[1] N. Ogawa, H. Oku, K. Hashimoto and M. Ishikawa. Microrobotic visual control of motile cells using high-speed tracking system. IEEE Trans. Robotics, 21(4), 704-712, Aug. 2005.
[2] R. S. Fearing. Control of a micro-organism as a prototype micro-robot. 2nd Int. Symp. Micromachines and Human Sciences, Oct. 1991.
[3] A. Itoh. Motion control of protozoa for bio MEMS. IEEE/ASME Trans. Mechatronics, 5(2), 181-188, June 2000.
[4] N. Ogawa, H. Oku, K. Hashimoto and M. Ishikawa. Dynamics model of Paramecium galvanotaxis for microrobotic application. Proc. 2005 IEEE Int. Conf. Robotics & Automation (ICRA 2005), 1258-1263, Apr. 2005.
[5] Y. Naitoh and K. Sugino. Ciliary movement and its control in Paramecium. J. Protozool., 31(1), 31-40, 1984.
[6] H. Machemer and J. de Peyer. Swimming sensory cells: electrical membrane parameters, receptor properties and motor control in ciliated Protozoa. Ver. Deut. Zool. Gesell., 86-110, 1977.
[7] T. Kamada. Control of galvanotropism in Paramecium. J. Fac. Sci. Imp. Univ. Tokyo, Sect. IV, Zoology, 2, 123-139, 1929.
[8] K. Ludloff. Untersuchungen über den Galvanotropismus. Arch. Ges. Physiol., 59, 525-554, 1895.
[9] N. Ogawa, H. Oku, K. Hashimoto and M. Ishikawa. Response measurement of Paramecia for realization of organized bio-modules. Proc. 2003 JSME Conf. Robotics and Mechatronics (Robomec'03), 2P2-3F-E3, May 2003 (in Japanese).
[10] H. Toyoda, N. Mukohzaka, K. Nakamura, M. Takumi, S. Mizuno and M. Ishikawa. 1ms column-parallel vision system coupled with an image intensifier: I-CPV. Proc. Symp. High Speed Photography and Photonics 2001, 5-1, 89-92, 2001 (in Japanese).
[11] Y. Nakabo, M. Ishikawa, H. Toyoda and S. Mizuno. 1ms column parallel vision system and its application to high speed target tracking. Proc. 2000 IEEE Int. Conf. Robotics & Automation (ICRA 2000), 650-655, Apr. 2000.
[12] M. Ishikawa, K. Ogawa, T. Komuro and I. Ishii. A CMOS vision chip with SIMD processing element array for 1 ms image processing. Dig. Tech. Papers of 1999 IEEE Int. Solid-State Circuits Conf. (ISSCC'99), 206-207, 1999.
[13] H. Oku, N. Ogawa, K. Hashimoto and M. Ishikawa. Two-dimensional tracking of a motile micro-organism allowing high-resolution observation with various imaging techniques. Rev. Sci. Instr., 76(3), Mar. 2005.
[14] Theodorus, H. Oku, K. Hashimoto and M. Ishikawa. Optical axis tracking of microorganism using high-speed vision. Proc. Focus on Microscopy 2005, Mar. 2005.
From Muscle to Brain: Modelling and Control of Functional Materials and Living Systems Mihoko Otake, Division of Project Coordination, The University of Tokyo; PRESTO program, Japan Science and Technology Agency Abstract. This paper describes the modelling and control of two typical open systems: electroactive polymer gel and the spinal nervous system. To navigate such systems into objective states, it is essential to estimate models based on their mechanisms. First, a wave-shape pattern control method was proposed based on a model of the gel: wave-shaped gels with varying curvature were obtained by switching the polarity of a spatially uniform electric field. Second, time series of images representing the distribution of somatic information inside the spinal cord were successfully obtained through measurement and computation using a somatotopic organization model of the spinal cord. The general problem underlying these studies is the degrees-of-freedom problem; making use of the nature of functional materials or living systems, by modelling their mechanisms, helped us to solve it. Keywords. modelling, control, electroactive polymer, spinal nervous system, somatosensory information
1. Introduction Active materials and living systems are both open systems, capable of exchanging matter and energy with their environment. Their internal states vary over time because of this characteristic. Therefore, it is essential to estimate models based on their mechanisms in order to navigate these systems into objective states: they are unstable, yet controllable, if we understand and exploit their mechanisms. In this paper, the modelling, simulation, and control of two typical open systems are described: electroactive polymer gel and the spinal nervous system. Both studies, carried out by the author, address the degrees-of-freedom problem of soft materials and neuromusculoskeletal systems. The first study takes a synthetic approach: it proposes how to control the shape of a functional material, a so-called artificial muscle with conceptually infinite degrees of freedom. In contrast, the second study takes an analytic approach: it investigates how the nervous system perceives and deals with the vast amount of muscular information in the human body. 1 Correspondence to: M. Otake, E-mail: [email protected]
M. Otake / From Muscle to Brain: Modelling and Control of Functional Materials
2. Modelling and Control of Functional Materials 2.1. Wave-Shape Pattern Generation of Electroactive Polymer Gel Electroactive polymers [1] change their shapes in electric fields. This fact motivated us to design whole-body deformable robots. The purpose of these studies has been to establish methods for deriving a variety of shapes and motions of deformable robots whose bodies are made of active materials. Shape control of such mechanisms is difficult; the difficulty is commonly called the degrees-of-freedom problem [2], namely the problem of controlling many points of a continuum with a small number of inputs. Mechanisms consisting of a typical electroactive polymer gel [3] containing poly(2-acrylamido-2-methylpropanesulfonic acid) (PAMPS), named 'gel robots', were designed, developed, and controlled experimentally [4,5,6,7,8]. We reported that straight beams of gel exhibit wave-shape patterns in a uniform constant electric field. The experimental results suggest that complex shapes can be generated by simple electric fields. This characteristic should be useful for shape control of a gel having many degrees of freedom. A method to reach a variety of wave-shapes was then proposed [9]. 2.2. Modelling of Electroactive Polymer Gel and its Mechanism of Pattern Formation The nonlinearity of electroactive polymers is the key to understanding the mechanism of wave-shape pattern formation in the gels. An ionic polymer gel in an electric field deforms through penetration of the surfactant solution [10]. The state of the gel is characterized by the distribution of adsorbed molecules, which determines its overall shape.
The adsorption state transition of the gel is approximated by the following local nonlinear differential equation [11]:

dξ/dt = ka (1 - ξ)(i · n) - kd ξ   (1)

where ξ is the adsorption rate, defined as the molar ratio of bound surfactants to the local sulfonate groups of the polymer chains inside the gel; i is the current density vector on the gel surface; n is the normal vector of the gel surface; and ka and kd are the association and dissociation constants. The equation shows that the effect of an electric field on a gel is determined by the geometry of the equi-potential surface and the gel surface. The adsorption reaction causes mechanical deformation, and the deformation determines the subsequent reaction. This interaction of deformation and reaction brings about the wave-shape pattern formation. 2.3. Wave-Shape Pattern Control of Electroactive Polymer Gel Wave-shaped gels with varying curvature were obtained by switching the polarity of a spatially uniform electric field. An originally straight-shaped gel deformed into shapes containing one, two, and three waves. The period for reversing the polarity was explored through numerical simulation. The polarity of one of the electrodes was either anodic (0) or cathodic (1). A control sequence is described by a time interval and a binary sequence. A time interval of 10 seconds was initially selected, and sequences of eight intervals were enumerated from (00000000) to (11111111). Other intervals were also considered
every 10 seconds from 20 to 120 seconds. We determined (00001111) with an 80-second time interval to be the best input sequence; it generated a three-waved shape with large curvature. 2.4. Experimental Methods The gel was prepared by radical copolymerization at 323 K for 48 hours. The total monomer concentration in N,N'-dimethylformamide was kept at 3.0 M in the presence of 0.01 M N,N'-methylenebisacrylamide (MBAA) as a cross-linking agent and 0.01 M α,α'-azobis(isobutyronitrile) (AIBN) as an initiator. The monomers were 2-acrylamido-2-methylpropanesulfonic acid (AMPS), n-stearyl acrylate (SA), and acrylic acid (AA) with the composition (AMPS : SA : AA) = (20 : 5 : 75). After the polymerization, the gel was immersed in a large amount of pure water to remove unreacted reagents until it reached an equilibrium state. In order to apply the electric field, the gel was immersed in a dilute solution of 0.01 M lauryl pyridinium chloride containing 0.03 M sodium sulphate. All experiments were carried out at a room temperature of 25 °C. The experimental setup included a pair of parallel platinum plate electrodes, each 25 [mm] wide and 40 [mm] long, placed horizontally with 40 [mm] of vertical spacing between them. A beam-shaped gel 4 [mm] wide, 21 [mm] long, and 1 [mm] thick was also placed horizontally in between, with one end fixed over 5 [mm] and the other end free. The two electrodes and the gel were immersed in the solution. A spatially uniform electric field was applied by the electrodes. The current density was kept constant by a galvanostat at 0.1 [mA/mm²]. The polarity of the electric field was reversed from anodic (0) to cathodic (1) when the tangential angle at the tip of the gel reached the same value as in the simulation. The deformation of the gel was monitored and recorded by a video microscope. 2.5.
Experimental Results The experimental snapshots and the corresponding simulation show the initial (1), transitional (2-8), and final (9) forms of the gel (Figure 1). Let θ be the angle between the tangential line of the gel at the free end and that at the fixed end. First, the gel bent toward the anode side (2). A portion of the gel near the free end started to bend in the other direction when θ exceeded π/2 (3); the deformation of the root portion remained the same. When θ fell below π/2 again, a smaller portion of the gel near the free end started to bend in the first direction (4). The polarity of the electric field was reversed at (5). Then the gel bent toward the reverse side (6). A portion of the gel near the free end started to bend in the other direction (7, 8) without changing the polarity of the electric field. Finally, the gel reached the desired shape (9).
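The exhaustive sequence search of Section 2.3 can be sketched as follows. This is a minimal stand-in: a scalar Langmuir-type adsorption variable replaces the full gel model of Eq. (1), the rate constants are invented, and the toy score is the final adsorption level rather than the curvature criterion used in the paper.

```python
from itertools import product

KA, KD = 0.05, 0.02        # assumed association/dissociation constants [1/s]
DT, INTERVAL = 0.1, 10.0   # Euler step and polarity-switching interval [s]

def simulate(seq, xi0=0.0):
    """Euler-integrate d(xi)/dt = KA*(1 - xi)*u - KD*xi over a 0/1 sequence."""
    xi, history = xi0, []
    for u in seq:                      # u = 0 (anodic) or 1 (cathodic)
        for _ in range(int(INTERVAL / DT)):
            xi += DT * (KA * (1.0 - xi) * u - KD * xi)
            history.append(xi)
    return history

# Enumerate all 2^8 eight-interval sequences, (00000000) to (11111111),
# and pick the one maximizing the (toy) score.
best = max(product((0, 1), repeat=8), key=lambda s: simulate(s)[-1])
assert best == (1, 1, 1, 1, 1, 1, 1, 1)  # all-on maximizes final adsorption
```

With the curvature-based score of the paper, the same enumeration selects (00001111) instead; only the scoring function would change.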
3. From Muscle to Brain In the previous section, the author proposed how to control the shape of a functional material, a so-called artificial muscle with conceptually infinite degrees of freedom, based on its chemical mechanism. How do living systems solve the degrees-of-freedom problem? In the human body, the nervous system perceives and deals with a vast amount of muscular information. Therefore, the author modeled the nervous system based on anatomy
Figure 1. Experimental results: the gel deforms into a three-half-waved shape
in order to figure out how motion data are observed inside the human body, as described in the following section.
4. Modelling of Living Systems 4.1. Somatosensory Information in the Spinal Nervous System For healthy and partially disabled people, the spinal nervous system, comprising the peripheral nerves and the spinal cord, connects the brain and muscles. Therefore, we should be able to estimate brain activities through motion measurement. During the rehabilitation process, analysis of walking motion is critical for monitoring functional recovery of the nervous system after stroke [12,13]. Visualization and analysis of somatosensory information in the spinal nervous system would help us understand how the brain perceives and regulates body movement, since this system interfaces the brain and muscles. The motivation is to design a motor learning support system for the acquisition of motor skills that takes the neural mechanism into account. We should be able to obtain neural information by analyzing whole-body muscle data,
Figure 2. The information flow between the musculoskeletal system and the nervous system
because the nervous system controls the musculoskeletal system, while length and force information is fed back to the nervous system (Figure 2). Muscles are controlled by different nerves and are classified by the innervating nerves. The 31 pairs of human spinal nerves consist of 8 pairs of cervical nerves (C), 12 pairs of thoracic nerves (T), 5 pairs of lumbar nerves (L), 5 pairs of sacral nerves (S), and a single pair of coccygeal nerves (Coc). These nerves pass through and exit from the gaps between the vertebrae. In order to track the motor information, we developed an anatomical model of the peripheral nerves and the spinal cord. Motor neurons were arranged in each layer of the spinal cord, and feedback signals which trigger simple reflexes were mapped onto the plane. Groups of muscular data innervated by the same layer of the spinal cord were compared for each trial, and the manner of regulation was studied over time [14]. 4.2. Computation of Motor Information through Motion Measurement We measured 'kesagiri', a sword-swinging motion. This motion was selected because it is a typical coordinated whole-body motion which requires motor learning. A snapshot of 'kesagiri' is shown in Figure 3. The 'kesagiri' motion was measured using an optical motion capture system, which provided the positions of markers distributed on the body. The frame rate was 33 [frames/sec]. The number of trials was 26; one trial was used for visualization, and two were selected for analysis. Human motion can be analyzed by combining a motion capture system with a musculoskeletal model: the body motion is mapped onto the musculoskeletal model so that the lengths and forces of the muscles can be calculated [15,16,17]. The inverse kinematics problem was solved using a musculoskeletal human model which contains 53 links and 366 muscles and has 153 degrees of freedom [18].
The forces of the muscles were also obtained from the musculoskeletal human model via inverse dynamics computation. In this way, the motor information that would be sensed at the mechanoreceptors was calculated.
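The grouping of muscle data by innervating spinal segment described in Section 4.1 amounts to a table lookup. The sketch below generates the 31 segment labels and groups per-muscle time series by segment; the innervation table entries and muscle names are placeholders, not anatomical data from the paper.

```python
# 31 spinal segments: 8 cervical, 12 thoracic, 5 lumbar, 5 sacral, 1 coccygeal.
SEGMENTS = (["C%d" % i for i in range(1, 9)]
            + ["T%d" % i for i in range(1, 13)]
            + ["L%d" % i for i in range(1, 6)]
            + ["S%d" % i for i in range(1, 6)]
            + ["Coc1"])
assert len(SEGMENTS) == 31

# Hypothetical innervation table: segment -> muscles it innervates.
INNERVATION = {"C5": ["deltoid", "biceps_brachii"]}

def muscles_for(segment, muscle_data):
    """Group per-muscle time series (e.g. computed lengths) by segment."""
    return {m: muscle_data[m] for m in INNERVATION.get(segment, [])
            if m in muscle_data}

data = {"deltoid": [0.25, 0.27], "biceps_brachii": [0.30, 0.28]}
assert set(muscles_for("C5", data)) == {"deltoid", "biceps_brachii"}
```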
Figure 3. Snapshot of sword swinging ‘kesagiri’ motion: Motion capture data are mapped onto the musculoskeletal model.
4.3. The Images of Spinal Neural Information and their Application to Analysis The time series of images of spinal neural information at C5, one of the 31 segments of the spinal cord, was mapped from the motor information by applying the rules of somatotopic organization. Figure 4 shows the spinal neural image at C5, which represents the lengths of the muscles innervated by that segment. We extracted the length data of the muscles controlled by these nerves and normalized them by the muscle lengths when standing; the resulting value is represented by image intensity. We can thus see how spinal cord segment C5 perceives the whole-body motion. C5 innervates the upper part of the body, especially the chest and upper arms, both of which move rapidly during the 'kesagiri' motion. An analysis method was proposed which calculates the correlation and phase contrast of the neural information. Corresponding periods were obtained by computing local correlations for each pattern across the different trials, from which phase contrasts were obtained. The results of the analysis indicated that intersegmental regulation occurs through repetitive trials of coordinated movements [14].
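The local-correlation step described above, which aligns corresponding periods between trials, can be illustrated with a small lag search. This is a generic normalized cross-correlation sketch under assumed window and lag parameters, not the exact procedure of [14].

```python
def correlation(a, b):
    """Normalized (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def phase_contrast(trial_a, trial_b, max_lag=5):
    """Lag (in frames) at which trial_b best matches trial_a."""
    n = min(len(trial_a), len(trial_b)) - max_lag
    scores = {lag: correlation(trial_a[:n], trial_b[lag:lag + n])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get)

# trial b is trial a delayed by two frames, so the phase contrast is 2
a = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
assert phase_contrast(a, b) == 2
```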
5. Discussion Two approaches were taken to study the degrees-of-freedom problem. The first study suggests that the hysteretic property of the material helps to solve the degrees-of-freedom problem of deformable robots containing electroactive polymers. From the deformation model and the experimental results, we can regard the gel as an integrator of a series of inputs: molecules are adsorbed onto the surface of the gel, which deforms its shape, and the deformed surface forms the reaction field for the next step. In this way, a series of inputs from a time-varying electric field is accumulated. The second study implies that the parallel processing of somatosensory information in the spinal cord reduces the difficulty of the degrees-of-freedom problem of human body movement. The distributed
Figure 4. Somatosensory image of spinal cord at C5 during ’kesagiri’ motion. Each image corresponds to each posture in Figure 3.
neural circuits in the spinal cord segments receive and process the length and force information of the innervated muscles. The anatomical model of the spinal cord extracted time series of partial images from the large volume of muscular data recorded during whole-body motion.
6. Conclusion In this paper, wave-shape pattern formation of electroactive polymer gel and estimation of spinal information through external observation were described in terms of modelling. The effectiveness of the models was demonstrated through the following results. The wave-shape pattern control method was proposed based on the gel model: wave-shaped gels with varying curvature were obtained by switching the polarity of a spatially uniform electric field. Time series of images representing the distribution of somatic information inside the spinal cord were successfully obtained through measurement and computation using the somatotopic organization model of the spinal cord. The general problem underlying these studies is the degrees-of-freedom problem. The first study took a synthetic approach, while the second took an analytic approach; making use of the nature of functional materials or living systems, by modelling their mechanisms, helped us to solve the problem. Future work includes the development of a method for navigating the internal state of the nervous system into an objective state, as was done for the electroactive polymer gel, to support motor learning and learning in general. Acknowledgement The author would like to thank Prof. H. Inoue of JSPS and AIST, Prof. T. Takagi, Prof. M. Inaba, and Prof. Y. Nakamura of the University of Tokyo, and Prof. Y. Kakazu and Prof. Y. Osada of Hokkaido University, who provided valuable suggestions and discussions. This work is supported by a Japan Science and Technology Agency Grant for
PRESTO program "Development of the Bilateral Multiscale Neural Simulator" (PI: Mihoko Otake).
References
[1] Y. E. Bar-Cohen, Electroactive Polymer (EAP) Actuators as Artificial Muscles: Reality, Potential and Challenges. SPIE Press, Bellingham, WA, 2001.
[2] N. Bernstein, The Co-ordination and Regulation of Movements. Pergamon Press, 1967.
[3] Y. Osada, H. Okuzaki, and H. Hori, "A polymer gel with electrically driven motility," Nature, vol. 355, pp. 242-244, 1992.
[4] M. Otake, M. Inaba, and H. Inoue, "Kinematics of Gel Robots made of Electro-Active Polymer PAMPS Gel," in Proceedings of the 2000 IEEE International Conference on Robotics and Automation, 2000, pp. 488-493.
[5] M. Otake, Y. Kagami, M. Inaba, and H. Inoue, "Dynamics of Gel Robots made of Electroactive Polymer Gel," in Proceedings of the 2001 IEEE International Conference on Robotics and Automation, 2001, pp. 1457-1462.
[6] M. Otake, Y. Kagami, Y. Kuniyoshi, M. Inaba, and H. Inoue, "Inverse Kinematics of Gel Robots made of Electroactive Polymer Gel," in Proceedings of the 2002 IEEE International Conference on Robotics and Automation, 2002, pp. 3224-3229.
[7] M. Otake, Y. Kagami, M. Inaba, and H. Inoue, "Motion design of a starfish-shaped gel robot made of electro-active polymer gel," Robotics and Autonomous Systems, vol. 40, pp. 185-191, 2002.
[8] M. Otake, Y. Kagami, Y. Kuniyoshi, M. Inaba, and H. Inoue, "Inverse Dynamics of Gel Robots made of Electroactive Polymer Gel," in Proceedings of the 2003 IEEE International Conference on Robotics and Automation, 2003, pp. 2299-2304.
[9] M. Otake, Y. Nakamura, M. Inaba, and H. Inoue, "Wave-shape Pattern Control of Electroactive Polymer Gel Robots," in Proceedings of the 9th International Symposium on Experimental Robotics, 2004, p. ID178.
[10] H. Okuzaki and Y. Osada, "Effects of hydrophobic interaction on the cooperative binding of a surfactant to a polymer network," Macromolecules, vol. 27, pp. 502-506, 1994.
[11] M. Otake, Y. Nakamura, and H. Inoue, "Pattern Formation Theory for Electroactive Polymer Gel Robots," in Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004, pp. 2782-2787.
[12] M. Akay, M. Sekine, T. Tamura, H. Y., and T. Fujimoto, "Fractal dynamics of body motion in post-stroke hemiplegic patients during walking," Journal of Neural Engineering, vol. 1, pp. 111-116, 2004.
[13] R. Dickstein, S. Hocherman, T. Pillar, and R. Shaham, "Stroke rehabilitation. Three exercise therapy approaches," Physical Therapy, vol. 66, pp. 1233-1238, 1986.
[14] M. Otake and Y. Nakamura, "Anatomical model of the spinal nervous system and its application to the coordination analysis for motor learning support system," in Proceedings of the 2005 IEEE International Conference on Intelligent Robots and Systems, 2005, pp. 847-853.
[15] S. L. Delp and P. J. Loan, "A computational framework for simulating and analyzing human and animal movement," IEEE Computing in Science and Engineering, vol. 2, pp. 46-55, 2000.
[16] J. Rasmussen et al., "Anybody - a software system for ergonomic optimization," in Fifth World Congress on Structural and Multidisciplinary Optimization, 2003.
[17] T. Komura and Y. Shinagawa, "Attaching physiological effects to motion-capture data," in Proceedings of Graphics Interface, 2001, pp. 27-36.
[18] Y. Nakamura et al., "Dynamic computation of musculo-skeletal human model based on efficient algorithm for closed kinematic chains," in Proceedings of the 2nd International Symposium on Adaptive Motion of Animals and Machines, 2003.
Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs
Maria Gini,a,1 Jan Pearce b and Karen Sutherland c
a Dept of Computer Science and Engineering, University of Minnesota; b Dept of Mathematics and Computer Science, Berea College; c Dept of Computer Science, Augsburg College
Abstract. We describe a proposed approach to increase diversity in undergraduate Computer Science programs and to encourage undergraduates majoring in fields such as the health professions, business, art, and education to take more computer science courses. The approach is centered around using the Sony AIBO robot dogs in the classroom, starting at the very beginning of the computer science curriculum, as a tool to attract a diverse population of students to Computer Science, to introduce them to fundamental concepts in computing, and to give them an appreciation of the importance of becoming competent with technology. Keywords. Undergraduate CS curriculum, Sony AIBO, women in CS
1. Introduction Women and minorities are underrepresented in Computer Science (CS) and related disciplines. CS departments have long had difficulties in both attracting and retaining female and minority students [5,21]. In the USA the percentage of bachelor’s degrees given to women increased from 43% in 1970 to 57% in 2001, but in fields such as engineering and science, women still lag behind [28]. Females earned 13% of the bachelor’s degrees in Computer Science in 1970. This increased to 37% in 1985, but has been declining ever since, to 28% in 1994 and 19% in 2001 [27]. To encourage undergraduates majoring in different areas, such as health professions, business, art, or education to consider a career in computer science or, at least, to take more computer science courses, we are developing a beginner course centered around the use of the Sony robot dog AIBO (http://www.sonystyle.com). We see the AIBO as a powerful tool for exposing students to concepts in Computer Science, for strengthening their programming skills and increasing their self-confidence, and for showing them how to use technology to solve real world problems. As adaptive technologies and medical devices become more pervasive, it is important that all students realize that learning more about computer technology is not only attainable but also advantageous and exciting. 1 Correspondence to: Maria Gini, Dept of Computer Science and Engineering, 200 Union St SE. Minneapolis, MN 55455. Tel.: +1 612 625 5582; Fax: +1 612 625 0572; E-mail: [email protected].
M. Gini et al. / Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs
The ability of the Sony AIBO robots to interact with people [13], their cuteness, their ability to move gracefully, to follow orders given to them via voice commands or visual cards, to communicate with each other, and to grow from a puppy stage to a fully developed personality make them ideal for our goals. Activities we have carried out over the last year using AIBOs with people of different ages, from elementary school children to graduate students, and of different backgrounds, from the computer illiterate to expert programmers, have shown consistently that people engage with the AIBOs in a way that is hard to duplicate with any other technology we have tried (such as Web browsing, working with Lego-based robots, etc.).
2. Improving Recruiting and Retention of Women in CS Studies have shown that many students stereotype computer scientists almost exclusively as "geeky white males" [31,22] and that those not identifying with this label suffer an alienation that significantly impacts their continued progress towards a degree in computer science [32]. It is a damaging cycle that reinforces the male stereotype and significantly limits the number of people trained in computer science. In addition to the negative stereotypes associated with the field [9,14,16,21,30], factors that contribute to the difficulties of recruiting and retaining female students include a feeling of insufficient preparedness when starting their undergraduate degrees [6] and a feeling of isolation within the computer science community [26]. Creating a larger community of women is not simple, as observed in [22]. Carnegie Mellon addressed the lack of "a critical mass of women, resulting in a shortage of new leaders" [12]. Their success and recommendations require much higher numbers of women in CS than many universities and colleges currently have. We believe we can mitigate some of these problems by offering a new entry point into the CS curriculum that is more appealing to women and that, at the same time, will help them fill any real or perceived gap in knowledge. Felder [8] reports a study of the effect of personality types on engineering student performance. Although it is known that, in order to be successful in an engineering career, students should experience learning styles other than the one they prefer, Felder states that "severe mismatches commonly occur between the teaching style of instructors and the learning style of their students." The study shows a positive correlation between students who do not learn well in a lecture-based environment and both women and first-generation college students.
Since many engineering courses are lecture-based, adopting different teaching styles is also important for increasing the representation of women. We will specifically address three of the major issues that affect the recruiting and retention of women: 1. Increase confidence through hands-on programming experience. Women often have little confidence in their computing ability [3]. This can result in more stress for female students, or in women changing majors even when their class performance is above average. Beyer et al. state that one of the reasons for poor self-image around computing is that women are less playful than men with computers. Robotics programming, by nature, is playful. Through the class experience we hope to give students experiences that "boost their self-confidence" [3] and that improve their self-perceptions about computer science and their career goals.
Furthermore, it is common wisdom that students gain confidence in their scientific endeavors through hands-on manipulation and by seeing concrete effects of their work. It is our experience that robot programming teaches and reinforces a variety of skills, and that most students (and faculty!) enjoy using robots. Our goal is not to create robotics experts but to give students an opportunity to develop programming skills in an environment that gives quick and concrete feedback, and which is also fun. 2. Solve real-world problems. We believe that focusing on solving problems, rather than just on learning how to program the robot, will increase initial interest among women, attracting students who do not see themselves as "computer scientists." Men are often interested in how the computer works as an entity unto itself; they see little need to relate the computer to real-world settings or to use it as a tool to solve other problems. This was very evident when the AIBOs were introduced at Augsburg College: the men were looking for the processor size, the memory stick, the battery, and the sensors, and reading the specs, while the women were interested in questions such as "If the ball was blue, would they be able to find it?", "How can they hear me?", and "Do they walk down stairs?". It has been shown that female retention in CS improves when the focus is not simply on the computer itself, but on the connections between computer science and other areas [22,19]. Finally, introducing students to research and problem solving early on is another recommendation for retaining female students [6,33]. We will address the need to introduce undergraduate women to computer science research by requiring them to solve real-world problems in innovative ways. 3. Work in groups. Work by Carol Gilligan [15] suggested that women score differently on Piaget's scale of moral development because they do not think in the same terms as men.
In particular, Gilligan claimed that women are more oriented toward cooperation than competition. Recent studies indicate that “pair programming” [34] is an effective tool for increasing the retention of women in computer science classes [33]. The students will work together in group projects. The projects they will work on will be open-ended and may not be “solved” by the end of the class. They will be designed so that success will be measured by participating in the process and generating a workable solution to the problem.
3. Centering a Beginner's Course on the Sony AIBO Robotics has become widely used in undergraduate programs as a way of introducing concepts in AI [25,10,7,24,20] and, more generally, in Computer Science [4]. We are not aware of any systematic study of the value of teaching robotics for improving the learning of computer science, but anecdotal evidence supports teaching robotics as a way to engage students. What makes our proposed course different from other robotics courses is that (1) we use the robots with beginner students, (2) we center the coursework on the Sony AIBO robots, and (3) we use a problem-based learning (PBL) approach.
1036
M. Gini et al. / Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs
3.1. Why a course for beginner students? Making students excited about their first computing course, by including creative content and building bridges to other disciplines, is recommended [18] as a way to encourage more students to consider majoring in CS. 3.2. Why the AIBO? The Sony AIBO comes with a rich set of sensors (camera, proximity, touch, microphones), a complex body (with 20 degrees of freedom), and preloaded software that allows it to interact with humans via voice and vision without any programming. We believe the AIBOs are especially suited for beginner courses because of their simple and intuitive use, while at the same time allowing students to grow with them: as students become more knowledgeable about programming, they will reprogram the AIBOs to perform more interesting movements and interactions. Compared to other robots used for educational purposes, the AIBO is more expensive than Lego-based robots, but its price is comparable to that of the Amigobot and the Khepera [17], and significantly cheaper than the popular Pioneer [1]. Given its rich set of sensors, integrated software, and sophisticated mechanical construction, we believe the AIBO is a better deal. The AIBOs are not intimidating and are intuitive to use, yet they will allow us to expose students to sophisticated concepts in Computer Science and to develop interesting and challenging projects. An informal poll of women students at the University of Minnesota suggested that they would prefer using an AIBO over a Pioneer or other robots. Currently, many college-level robotics courses require group robot-building projects using Lego robots with Handyboard controllers or the like. Many students, particularly women, are not drawn to soldering, shrink wrapping, or low-level programming in C. In contrast, the AIBO is a commercial product, designed and built to be used by a broad population.
Everything is enclosed, so the robot need not be checked for loose wires or incorrect connections each time it is run. This will allow the students to go further, since they won't have to spend time finding sensors that work or tracking down faulty wires. The AIBO is conducive to research and experimentation not just in robotics but also in human interaction, AI, and programming. We intend to use the AIBOs for projects that emphasize human-computer interaction, traditionally an area of Computer Science quite appealing to women. Projects such as developing AIBO support for persons with physical disabilities will appeal to those CS students who are not excited about the Handyboard hardware. A unique feature of the AIBO is its autonomy. AIBOs are programmed to explore their environment and to make autonomous decisions. We believe this is a very important feature for the course, since the students will understand the power of computing by understanding what it takes to make autonomous decisions in the face of a complex and unknown real world that can only be partially observed via sensors. Recent studies with adults [11] and with children [23] show that the majority of them view the AIBO as having mental states, life-like essence, and social communication skills. To the best of our knowledge, this is the first robot for which people have shown the type of psychological, cognitive, and emotional reactions they tend to have with pets [2].
What makes the AIBO special for people is one of the reasons for selecting it as the platform for our course. The fact that it has a complex body with multiple sensors and can be programmed at varying levels of sophistication is another important reason. We know that our students will have different levels of programming skills. With the AIBO, no one will be scared off by the complexity but, at the same time, no one will get bored. 3.3. Why problem-based learning? In problem-based learning, relevant problems are introduced at the beginning of the instruction and are used to provide the context and motivation for learning. As described in [29], "While no evidence proves that problem-based learning enhances academic achievement as measured by exams, there is evidence to suggest that PBL 'works' for achieving other important learning outcomes. Studies suggest PBL develops more positive student attitudes, fosters a deeper approach to learning and helps students retain knowledge longer than traditional instruction."
4. Course Material The course material is designed both to attract students to CS and to give them the initial background they need to succeed in further CS courses. The course will use the preloaded AIBO software and its scripting language, together with more sophisticated programming tools for more complex tasks. We will start with Pyro [4], an easy-to-use programming system written in Python. We will then use extensions we have developed at the University of Minnesota as part of an AIBO class for advanced students. The extensions allow for additional control (such as controlling the LEDs on the robots or the ear motions), for proper execution of sequences of commands, and for reading from and writing to sockets. The use of sockets will allow us to create clients for the AIBO in different programming languages, such as C, Scheme, Perl, or PHP. This, in turn, will allow the students who already have some programming experience to use their programming knowledge more effectively. We expect the preparation and background of the students to vary substantially, from students who have no knowledge of computing beyond perhaps some computer literacy, to students who have already taken programming courses. The course will be driven by real-world problems. By working on application areas where computer science is useful, we will shift the focus from teaching the technology to understanding the role of the technology in solving real problems. We believe this will help attract students who are more interested in the long-term results of using the technology than in the technology itself. We will ensure that different application areas are included in the course to make it appealing to all students, including those interested in human factors, social systems, psychology, cognitive science, and animal behavior. We will, in particular, focus on developing functionalities in the area of speech recognition and other forms of human interaction.
Beginner students do not know much programming, but are very interested in natural ways of interacting with the robots. By providing relevant software to support the interactions, we will give the students a better appreciation of the need for software and some understanding of the process by which software is created.
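As a rough illustration of the socket extensions described above, a client can connect to the robot and exchange simple text commands. The sketch below is only a plausible shape for such a client: the host address, port, and command strings are hypothetical placeholders, not the actual protocol of our extensions.

```python
import socket

class RobotClient:
    """Minimal sketch of a text-command client for a robot server.

    The default host/port and the command vocabulary are hypothetical;
    the real AIBO extensions define their own protocol.
    """

    def __init__(self, host="192.168.0.10", port=10001):
        # Open a TCP connection and wrap it in a line-oriented text stream.
        self.sock = socket.create_connection((host, port))
        self.stream = self.sock.makefile("rw", newline="\n")

    def send(self, command):
        """Send one command line and return the server's reply line."""
        self.stream.write(command + "\n")
        self.stream.flush()
        return self.stream.readline().strip()

    def close(self):
        self.stream.close()
        self.sock.close()
```

Because the interface is plain text over a socket, equivalent clients are straightforward to write in C, Scheme, Perl, or PHP, which is precisely the point of exposing the robot through sockets.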
Project 1.
Topic: Include AIBO in an evaluative game to assess children with Attention Deficit Hyperactivity Disorder (ADHD).
Steps: Learn about methods to assess ADHD; find an expert to help you. Design a simple game using the AIBO around an existing method used to assess ADHD. Evaluate the game with someone not from your group; ask an expert to give you feedback.

Project 2.
Topic: Include AIBO in a play activity that your grandmother would enjoy doing.
Steps: Think about how an elderly person would like to interact with a pet; find an expert to help you. Design software for the AIBO to engage in a simple interaction game. Evaluate the results with someone not from your group; ask an expert to give you feedback.

Table 1. Sample projects.
The activities for the course will be centered on lectures and laboratory sessions. The objective of the lectures is to expose students to relevant material on programming the dogs and to related Computer Science concepts. The objective of the laboratory sessions is work on group projects. Students will work in groups of 3-4, organized to mix levels of skill and of academic interest. The projects will be interdisciplinary, to appeal to the broad backgrounds of the students. They will run through the semester and will be demonstrated at its end to an outside audience, such as local girl scout troops or other groups. This will put the students who attended the course in the role of reaching out to the community and becoming role models for younger students. Table 1 provides some examples of the types of projects we will propose to the students.
5. Conclusions We have presented a proposed curriculum for a beginner course designed to increase diversity in Computer Science undergraduate programs. The course is centered on the use of the Sony AIBO as a tool for attracting new students to the study of computer science and for introducing them to fundamental concepts in Computer Science. The unique features of the AIBO, in terms of its ability to interact with humans, make it an attractive platform for the students we wish to reach. The main idea is to help students develop a deeper understanding of the Computer Science field and to appreciate the role of technology, not to make them experts in robotics. The ultimate goal is to make students more interested in and better prepared for further courses by providing them with an enjoyable learning experience, by exposing them to the excitement of being able to come up with solutions to real-world problems, and ultimately by helping them take increasing responsibility for their own learning. (Thanks to Laurel Lewis for suggesting the ADHD assessment problem in Table 1.)
Acknowledgments Work supported in part by the National Science Foundation under cooperative grants DUE-0511304, DUE-0511352, and DUE-0511282.
References [1] http://www.activrobots.com, 2004. [2] A. Beck and A. Katcher. Between pets and people. Purdue University Press, 1996. [3] S. Beyer, K. Rynes, J. Perrault, K. Hay, and S. Haller. Gender differences in computer science students. In Proc. of the 36th SIGCSE Technical Symposium on Computer Science Education, pages 49–53, 2003. [4] D. Blank, L. Meeden, and D. Kumar. Python robotics: An environment for exploring robotics beyond legos. In Computer Science Education Conference (SIGCSE), 2002. [5] T. Camp. The incredible shrinking pipeline. Comm. of the ACM, 40(10):103–110, Oct. 1997. [6] J. M. Cohoon. Recruiting and retaining women in undergraduate computing majors. In Proc. of the 36th SIGCSE Technical Symposium on Computer Science Education, pages 48–52, 2002. [7] A. P. Danyluk. Using robotics to motivate learning in an AI course for non-majors. In Accessible Hands-on Artificial Intelligence and Robotics Education, AAAI Spring Symposium, 2005. [8] R. Felder, G. Felder, and E. Dietz. The Effects of personality type on engineering student performance and attitudes. Journal of Engineering Education, 91(1):3–17, 2002. [9] A. Fisher, J. Margolis, and F. Miller. Undergraduate women in computer science: Experience, motivation, and culture. In Proc. SIGSCE Technical Symposium on Computer Science Education, pages 106–110, Feb. 1997. Published in SIGSCE Bulletin, Vol. 29, N. 1, March 1997. [10] S. E. Fox. Using robotics to introduce AI topics to a wider audience. In Accessible Hands-on Artificial Intelligence and Robotics Education, AAAI Spring Symposium, 2005. [11] B. Friedman, P. H. J. Kahn, and J. Hagman. Hardware companions? what online AIBO discussion forums reveal about the human-robotic relationship. In CHI, 2003. [12] C. Frieze and L. Blum. Building an effective computer science student organization: The Carnegie Mellon Women@SCS action plan. SIGCSE Bulletin, 34(2), June 2002. [13] M. Fujita. AIBO: Towards the era of digital creatures. 
International Journal of Robotics Research, 20(10):781–794, Oct. 2001. [14] O. N. Garcia and R. Giles. Research foundations for improving the representation of underrepresented minorities in the information technology workforce, June 2000. www.cise.nsf.gov/itminorities/it_minorities_final_report.pdf. [15] C. Gilligan. In a different voice: Psychological theory and women’s development. Harvard University Press, 1982. [16] W. Haliburton. Gender differences in personality components of computer science students: A test of Holland’s congruence hypothesis. In Proc. SIGSCE Technical Symposium on Computer Science Education, pages 77–81, Feb. 1998. Published in SIGSCE Bulletin, Vol. 30, N. 1, March 1998. [17] http://www.k-team.com/robots/khepera/, 2004. [18] M. Klawe. Blue skies ahead for it jobs. CIO Magazine, Dec 1, 2005.
[19] M. Koch. No girls allowed!! Technos Quarterly, 3(3), Fall 1994. [20] D. Kumar and L. Meeden. A robot laboratory for teaching Artificial Intelligence. In D. Joyce, editor, Proc. of the Twenty-ninth SIGCSE Technical Symposium on Computer Science Education. ACM Press, 1998. [21] M. Mannix. Getting IT right. Prism, pages 15–20, Mar. 2001. [22] J. Margolis and A. Fisher. Unlocking the Clubhouse: Women in Computing. MIT Press, 2002. [23] G. F. Melson, P. H. Kahn, A. M. Beck, B. Friedman, T. Roberts, and E. Garrett. Robots as dogs?: Children's interactions with the robotic dog AIBO and a live Australian shepherd. In CHI, pages 1649–1652, 2005. [24] D. P. Miller. Using robotics to teach computer programming and AI concepts to engineering students. In Accessible Hands-on Artificial Intelligence and Robotics Education, AAAI Spring Symposium, 2005. [25] R. Murphy. Introduction to AI Robotics. MIT Press, 2000. [26] Adviser, teacher, role model, friend: On being a mentor to students in science and engineering. National Academy Press, 1997. [27] Women, minorities, and persons with disabilities in Science and Engineering: 2004. [28] Trends in educational equity for girls and women. National Center for Education Statistics, 2004. [29] M. Prince. Does active learning work? A review of the research. Journal of Engineering Education, pages 223–231, July 2004. [30] M. G. Sackrowitz. An unlevel playing field: Women in the introductory computer science courses. In Proc. SIGCSE Technical Symposium on Computer Science Education, pages 37–41, Feb. 1996. Published in SIGCSE Bulletin, Vol. 29, N. 1, March 1996. [31] E. Spertus. Why are there so few female computer scientists? Technical Report 1315, MIT Artificial Intelligence Laboratory, Aug. 1991. [32] V. Valian. Why So Slow? The Advancement of Women. MIT Press, Cambridge, MA, 1998. [33] L. L. Werner, B. Hanks, C. McDowell, H. Bullock, and J. Fernald. Want to increase retention of your female students?
Computing Research News, 17(2), March 2005. [34] L. Williams and R. R. Kessler. All I really need to know about pair programming I learned in kindergarten. Communications of the ACM, May 2000.
Author Index

Adachi, R. 804
Adamatzky, A. 3
Alici, G. 375
Antonelli, G. 235
Arai, T. v, 165, 349, 542, 574, 622, 694, 740, 839, 921, 946, 954, 991
Arrichiello, F. 235
Asa, K. 776
Asada, M. 702
Asama, H. 632, 891
Aspert, F. 713
Aviña Cervantès, G. 408
Baba, A. 145
Balch, T. v
Ban, S.-W. 57
Battaglia, F. 104
Bayro-Corrochano, E.J. 399
Behnke, S. 676
Bischoff, R. 124
Bonani, M. 433
Bongard, J. 11
Bouguerra, A. 391
Bovet, S. 525
Broxvall, M. 831
Buchheim, T. 651
Burkhardt, H. 181
Capezio, F. 41
Capi, G. 416
Carpin, S. 199
Carrozza, M.C. 965
Castelnovi, M. 49
Chatila, R. 145
Chiaverini, S. 235
Chiba, R. 349
Cottret, M. 534
D’Angelo, A. 640
Dan, C. 243
Dario, P. 965
Davesne, F. 425
Devy, M. 408, 534
Dillmann, R. 383, 748, 856
Doki, K. 999
Dorigo, M. 433, 487
Duan, F. 542
Effinger, R. 595
Eggenberger Hotz, P. 298
Everist, J. 207
Ferguson, D. 65, 114
Floreano, D. 153
Fournier, E. 326
Fujii, M. 290
Fujii, N. 463, 497
Fujiki, T. 632
Fukano, R. 756
Fukuda, T. 471
Funato, T. 605
Garrido, S. 94
Gaussier, P. 326
Gini, M. 550, 1033
Gomez, F. 272
Gómez, G. 298, 443
Gonzalez-Aguirre, D.I. 399
Graefe, V. 124
Gritti, M. 831
Groß, R. 433, 487
Hada, Y. 891
Halawani, A. 181
Hamaoka, K. 227
Harris, N. 930
Hashimoto, H. 505
Hashimoto, K. 1015
Hashimoto, M. 135
Hayashi, J. 756
Hazan, A. 425
Hernández A., A. 921, 954
Hernandez, A. 298
Hirata, Y. 973
Homma, K. 981
Horiguchi, Y. 883, 903
Hoshino, S. 505
Hosoe, S. 30
Hou, F. 207, 455
Hu, H. 217, 930
Hutter, M. 272
Iba, H. 318
Iida, F. 566, 820
Ikemoto, Y. 471
Inaba, M. 732, 767, 786, 804, 875
Inamura, T. 732, 786, 875
Inokuchi, S. 1007
Inoue, H. 732
Inoue, J. 686
Inoue, T. 605
Isetani, N. 999
Ishiguro, A. 585
Ishikawa, M. 1015
Ishimura, K. 776
Ishino, A. 660, 686
Ishiwaka, Y. 865
Itoh, K. 965
Izumi, K. 913
Jarvis, R. 189
Jayawardena, C. 913
Jinnouchi, Y. 1007
Junco, Ma. de los Á. 668
Kakazu, Y. 255, 290
Kalra, N. 114
Kambara, H. 938
Kamijima, S. 365
Kamio, S. 318
Kaneko, T. 75
Kaneko, Y. 515
Kanzaki, S. 804
Käppeler, U.-P. 651
Kassahun, Y. 263
Katada, Y. 357
Katane, T. 812
Kato, R. 946, 991
Katoh, R. 954
Kaufmann, P. 443
Kawabata, K. 632, 891
Kawabe, T. 165
Kawakami, T. 255
Kayawake, M. 75
Kim, J.B. 831
Kim, K.S. 938
Kimura, F. 740
Kinoshita, M. 255
Kita, K. 991
Kitano, I. 334
Knoop, S. 856
Kobayashi, H. 660
Kobayashi, M. 497
Kobayashi, Y. 30
Kobayasi, Y. 865
Koike, Y. 938
Kojo, N. 875
Kosuge, K. 973
Kotani, K. 740, 756
Koudu, D. 365
Koyasu, H. 85
Kubo, M. 243
Kubota, N. 365
Kumagai, M. 812
Kuniyoshi, Y. 558
Kurabayashi, D. 605
Kuraoka, H. 463
Kurisu, M. 613
Lafrenz, R. 651
Laroque, P. 326
Lee, M. 57
Levi, P. 651
Liaw, H.C. 375
Lipson, H. 11
Loh, C.S. 839
Lungarella, M. 558
Maaref, H. 425
Maeda, Y. 165
Maegawa, T. 585
Malone, E. 11
Marzouqi, M. 189
Mastrogiovanni, F. 173
Matsuba, H. 585
Matsushita, K. 574, 839
Matubara, T. 243
Melo, F.S. 282
Minekawa, Y. 820
Miura, J. 85
Miwa, H. 965
Miyamoto, N. 1007
Mizuuchi, I. 767, 786
Mondada, F. 433
Moreno, L. 94
Morimoto, A. 891
Moriya, T. 165
Müller, U. 383
Murayama, T. 135
Nagata, N. 1007
Nakanish, Y. 767
Nakanishi, Y. 786
Nanjanath, M. 550
Nikaido, M. 740
Nishi, T. 702
Nishino, N. 515
Nogawa, Y. 365
Noguchi, H. 756
Nouyan, S. 433
Nukariya, Y. 965
Nunnink, J. 722
Oba, F. 135
Oda, S.H. 515
Oetomo, D. 375
Ogata, S. 135
Ogawa, N. 1015
Ohashi, Y. 308
Ohkura, K. 357
Ohnishi, T. 921, 954
Okada, K. 732, 804
Oku, H. 1015
Ono, H. 883
Ota, J. 349, 505, 622, 740
Otake, M. 756, 1025
Otsu, N. 756
Ozaki, K. 847
Pagello, E. 199, 640
Pavlin, G. 722
Payne, K. 207
Pearce, J. 1033
Pfeifer, R. v
Phong, P.H. 326
Pitti, A. 558
Rajaie, H. 651
Ramel, G. 713
Ramírez, J. 668
Ribeiro, I. 282
Robertson, P. 595
Roccella, S. 965
Rummel, J. 566, 820
Ryde, J. 217
Saffiotti, A. 831
Saitou, O. 812
Sako, S. 756
Sarata, S. 613
Sarwar, S. 812
Sato, H. 243
Sato, M. 883, 938
Sato, T. 19, 341, 756
Sato, Y. 740
Sawaragi, T. 883, 903
Schmidhuber, J. 272
Schmidt-Rohr, S.R. 856
Schreiber, F. 651
Schröder, J. 383
Schwenk, J. 676
Sekine, M. 812
Sekiyama, K. 308
Semere, J. 794
Seo, B.S. 831
Seto, F. 973
Setola, R. 235
Seyfarth, A. 566, 820
Sgorbissa, A. 41, 49, 173
Shen, W.-M. 207, 455
Shikanai, Y. 847
Shimizu, M. 585
Shin, D. 938
Shin, S. 740
Shinohara, A. 660, 686
Shinozaki, A. 505
Shirinzadeh, B. 375
Sian, N.E. 794
Siegwart, R. 104, 713
Sommer, G. 263
Stasse, O. 794
Steinhaus, P. 748
Stentz, A. 65, 114
Stückler, J. 676
Sugi, M. 740, 756
Sutherland, K. 1033
Suzuki, H. 740
Suzuki, K. 334, 341
Suzuki, M. 153
Suzuki, S. 479
Svinin, M. 30
Takahashi, Y. 702
Takamasu, K. 740
Takanishi, A. 965
Takanobu, H. 965
Takasu, R. 497
Takenaka, T. 479
Takeshita, K. 694
Tamimi, H. 181
Tamura, T. 812
Tamura, Y. 740
Taniguchi, K. 365
Tapus, A. 104, 713
Taura, K. 756
Tian, G. 542
Torii, A. 999
Trevai, C. 622
Tsubouchi, T. 613
Tsukamoto, S. 883
Ueda, A. 999
Ueda, K. 463, 479, 497, 515
Ueda, R. 694
van der Blij, J. 153
Vega, J.L. 668
Vigneron, V. 425
Wada, M. 227, 776
Walther, M. 748
Watanabe, K. 913
Williams, B. 595
Won, W.-J. 57
Yajima, A. 605
Yamakita, M. 487
Yamamoto, A. 740
Yamamoto, S. 847
Yamashita, A. 75
Yokoi, H. v, 574, 839, 921, 946, 954, 991
Yokoi, K. 794
Yokota, M. 416
Yoneda, R. 756
Yoshii, S. 290
Yoshikai, T. 767, 786
Yoshimi, T. 794
Yossawee, W. 613
Yu, W. 812, 954
Zaccaria, R. 41, 49, 173
Zecca, M. 965
Zell, A. 181
Zhou, H. 930
Zhumatiy, V. 272
Zweigle, O. 651
Zykov, V. 11