INTELLIGENT ROBOTS AND SYSTEMS

Selections of the International Conference on Intelligent Robots and Systems 1994, IROS 94, Munich, Germany, 12-16 September 1994

EDITED BY VOLKER GRAEFE
Institut für Messtechnik, Universität der BW München, Neubiberg, Germany

1995
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo

ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands
Library of Congress Cataloging-in-Publication Data

IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (1994 : Munich, Germany)
Intelligent robots and systems : selections of the International Conference on Intelligent Robots and Systems 1994, IROS 94, Munich, Germany, 12-16 September 1994 / edited by Volker Graefe.
p. cm.
Includes bibliographical references.
ISBN 0-444-82250-X
1. Robotics--Congresses. 2. Intelligent control systems--Congresses. I. Graefe, Volker, 1938- . II. Title.
TJ210.3.I447 1994a
629.8'92--dc20  95-36630 CIP
ISBN: 0 444 82250 X

© 1995 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.

Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA, 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified.

No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.

This book is printed on acid-free paper.

Printed in The Netherlands.
PREFACE

In September 1994 the seventh International Conference on Intelligent Robots and Systems - IROS '94 - was held in Neubiberg, a suburb of Munich, Germany, on the premises of the German Armed Forces University Munich. Of the 300 papers presented during the conference I have selected 48 which, in my subjective opinion, are particularly significant and characteristic of the present state of the technology of intelligent robots and systems. This book contains the selected papers in a revised and expanded form.

Robotics and intelligent systems constitute a very wide and truly interdisciplinary field. The papers in this book have been grouped into the following categories:

- Sensing and Perception
- Learning and Planning
- Manipulation
- Telerobotics and Space Robotics
- Multiple Robots
- Legged Locomotion
- Mobile Robot Systems
- Robotics in Medicine
They cover, however, many additional fields, e.g. control, navigation, and simulation, to mention only a few. When preparing the conference, and this book, I found that most researchers in robotics are now apparently interested in some combination of learning, mobile robots, and robot vision. Consequently, most of the articles in this book relate to at least one of these fields.

A book like this is always the result of many persons' efforts. Special thanks are due to Mr Michael Lützeler, who helped with the compilation of the book by creating and maintaining various data bases, and to my secretary, Mrs Sigrun Haussmann, who, among other things, helped some of the authors to improve the style and grammatical correctness of their texts. Thanks also to Ms Drs. Eefke Smit of Elsevier Science BV for competently and patiently managing all matters related to the publishing of the book.
Neubiberg, March 1995 Volker Graefe
TABLE OF CONTENTS
Preface
SENSING AND PERCEPTION
A new High-performance Multisonar System for Fast Mobile Robot Applications
U.D. Hanebeck and G. Schmidt  3

Proper Selection of Sonar and Visual Sensors in Vehicle Detection and Avoidance
N. Moghaddam Charkari, K. Ishii and H. Mori  15

Selective Refinement of 3-D Scene Description by Attentive Observation for Mobile Robot
H. Takizawa, Y. Shirai and J. Miura  29

Active Vision Inspired by Mammalian Fixation Mechanism
T. Yagi, N. Asano, S. Makita and Y. Uchikawa  39

Realtime Range Imaging Using An Adjustment-free Photo VLSI
A. Yokoyama, K. Sato, T. Yoshigahara and S. Inokuchi  49

A Pedestrian Discrimination Method Based on Rhythm
S. Yasutomi, H. Mori and S. Kotani  61

Visual Recognition of Obstacles on Roads
U. Regensburger and V. Graefe  73

Visual Collision Avoidance by Segmentation
I. Horswill  87
LEARNING AND PLANNING
Using Perceptions to Plan Incremental Adaptions
A.J. Hendriks and D.M. Lyons  101

Robot Task Programming by Human Demonstration: Mapping Human Grasps to Manipulator Grasps
Sing Bing Kang and Katsushi Ikeuchi  119

Robot Learning By Nonparametric Regression
S. Schaal and C.G. Atkeson  137

Behavioral Control in Mobile-Robot Navigation Using a Fuzzy Decision Making Approach
H.R. Beom, K.C. Koh and H.S. Cho  155

Learning Emergent Tasks for an Autonomous Mobile Robot
D. Gachet, M.A. Salichs and L. Moreno  169

Efficient Reinforcement Learning of Navigation Strategies in an Autonomous Robot
J. del R. Millán and C. Torras  185

A Lifelong Learning Perspective for Mobile Robot Control
S. Thrun  201

A Multilevel Learning Approach to Mobile Robot Path Planning
F. Wallner, M. Kaiser, H. Friedrich and R. Dillmann  215

A Self-Organizing Representation of Sensor Space for Mobile Robot Navigation
B.J.A. Kröse and M. Eecen  229

Path Planning and Guidance Techniques for an Autonomous Mobile Cleaning Robot
C. Hofner and G. Schmidt  241
MANIPULATION
Grasp Strategies for a Dextrous Robotic Hand
K. Woelfl and F. Pfeiffer  259

Motion Scheme for Dextrous Manipulation in the Intelligent Cooperative Manipulation System - ICMS
M. Buss and H. Hashimoto  279

Designing a Behavior of a Mobile Robot Equipped with a Manipulator to Open and Pass through a Door
K. Nagatani and S. Yuta  295

The Development of Re-usable Software Systems for Intelligent Autonomous Robots in Industrial and Space Applications
E. Freund and J. Roßmann  311
TELEROBOTICS AND SPACE ROBOTICS
Toward Integrated Operator Interface for Advanced Teleoperation under Time-Delay
A.K. Bejczy, P. Fiorini, W.S. Kim and P.S. Schenker  327

Active Understanding of Human Intention by a Robot through Monitoring of Human Behavior
T. Sato, K. Nishida, J. Ichikawa, K. Hatamura and H. Mizoguchi  349

Human/Machine Sharing Control for Telerobotic Systems
T.-J. Tarn, N. Xi, C. Guo and A.K. Bejczy  373

Task-Directed Programming of Sensor-Based Robots
B. Brunner, K. Arbter and G. Hirzinger  387

Telerobotics with Large Time Delays - the ROTEX Experience
G. Hirzinger, K. Landzettel and Ch. Fagerer  401

Feature-Based Visual Servoing and its Application to Telerobotics
G.D. Hager, G. Grunwald and K. Toyama  415

Practical Coordination Control Between Satellite Attitude and Manipulator Reaction Dynamics Based on Computed Momentum Concept
K. Yoshida  431

A 5-Axis Mini Direct Drive Robot for Teleoperation and Biotechnology
M. Moreyra, P.-H. Marbot, S. Venema and B. Hannaford  445

A Laboratory Test Bed for Space Robotics: The VES II
S. Dubowsky, W. Durfee, T. Corrigan, A. Kuklinski and U. Müller  463
MULTIPLE ROBOTS
Hierarchical Prediction Model for Intelligent Communication in Multiple Robotic Systems
K. Sekiyama and T. Fukuda  477

Proposal for Cooperative Robot "Gunryu" Composed of Autonomous Segments
S. Hirose, T. Shirasu and E.F. Fukushima  491

Unified Control for Dynamic Cooperative Manipulation
K. Kosuge, H. Yoshida, T. Fukuda, M. Sakai, K. Kanitani and K. Hariki  501
LEGGED LOCOMOTION

Comparative Study of Adaptive Controllers for a Pneumatic Driven Leg
M. Guihard  513

An Efficient Forward Gait Control for a Quadruped Walking Robot
D.J. Pack and A.C. Kak  533

The Six-Legged TUM Walking Robot
H.-J. Weidemann and F. Pfeiffer  549

Vision-Based Adaptive and Interactive Behaviors in Mechanical Animals Using the Remote-Brained Approach
M. Inaba, S. Kagami, T. Ishikawa, F. Kanehiro, K. Takeda and H. Inoue  559
MOBILE ROBOT SYSTEMS
Autonomous Sonar Navigation in Indoor, Unknown and Unstructured Environments
W.D. Rencken  577

Control and Localisation of an Autonomous Mobile Vehicle
M. Adams, N. Tschichold-Gürman, S. Vestli and D. von Flüe  589

The Robotic Travel Aid "HITOMI"
H. Mori, S. Kotani and N. Kiyohiro  607

Vision-based Navigation for an Office Messenger Robot
T. Gomi, K. Ide and P. Maheral  619

Trajectory Tracking Control for Navigation of the Inverse Pendulum Type Self-Contained Mobile Robot
Y.-S. Ha and S. Yuta  637

An Outdoor Robot System for Autonomous Mobile All-purpose Platform
T. Okui and Y. Shinoda  657

Mechanism and Control of Propeller Type Wall-Climbing Robot
A. Nishi and H. Miyagi  669

Automated Guided Vehicles in Japan
T. Tsumura  679
ROBOTICS IN MEDICINE

Robotics in Medicine
P. Dario, E. Guglielmelli and B. Allotta  691

Planning, Calibration and Collision-Avoidance for Image-Guided Radiosurgery
A. Schweikard, M. Bodduluri, R. Tombropoulos and J.R. Adler  721
Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.
A new High Performance Multisonar System for Fast Mobile Robot Applications

Uwe D. Hanebeck and Günther Schmidt
Department for Automatic Control Engineering (LSR), Technische Universität München, 80290 München, Germany

A new type of actively sensing multisonar system has been developed for obstacle detection, localization and map updating in indoor environments. It comprises 24 individual intelligent transmitter and receiver elements arranged in a horizontal plane around a high-speed motion platform. The physical sensors are electronically configured to form various types of virtual sensor arrays for specific tasks and situations. A method for high-frequency firing compensates for the low speed of sound in air. Various schemes for intelligent evaluation of the array outputs provide highly accurate localization estimates. Experiments with the developed multisonar system in structured and cluttered environments demonstrate improved perception capabilities compared to state-of-the-art approaches.

1. Introduction
Ultrasonic sensor systems in mobile robot applications serve three major objectives:
- detecting collisions with unexpected, possibly mobile obstacles,
- localizing known environmental structures,
- updating an internal map with previously unknown objects.

Different approaches have been reported. One class of systems uses one rotating time-of-flight sonar sensing device to obtain a 360 degree scan of the environment [4]. The width of the sonar beam causes a blurred image. Attempts to narrow the beam by phased arrays [2], [6] or horn constructions [3] have also been undertaken. Mechanical positioning makes these systems slow and inadequate for collecting data during fast motion. In another class of systems several (20-30) individual time-of-flight sensors surround a robot. Fixed firing schemes independent of mission and situation are employed [7], [13]. Undesired mutual influences of the sensors result in slow firing rates and make error correction algorithms necessary [1]. Crosstalk effects are eliminated by identifying and rejecting pulses received from other sensors. The fastest system known to the authors achieves a sampling interval of 60 ms for each sensor with a maximum range of 2.5 m [1]. The separate evaluation of sensor outputs results in an unsatisfactory angular resolution. This feature complicates many sensory tasks, like determining the vehicle's own position with respect to a landmark or the position of a moving obstacle. While the second class of systems tries to avoid crosstalk effects, another class takes advantage of these phenomena. Sparse sensor arrays are used and the characteristics of sound propagation are taken into account. This approach permits determination of the normal directions of walls [10], [12] or tracking
of moving obstacles as discussed in [8]. The need for mechanical aiming renders these systems too slow for the intended application.

The sensor system discussed in this paper is used with an omnidirectional locomotion platform as shown in Figure 6. For this application, omnidirectional measurement capabilities are required. In addition, the system is expected to concurrently perform various sensory tasks around the mobile robot. For example, localizing of a wall is performed while reliably scanning the current travelling direction. Data has to be provided at a sufficiently high rate to guide a fast (2 m/s) mobile robot with steering commands generated every 10 ms. The proposed system, which adopts useful aspects of today's sensor systems, is discussed in detail in Section 2. A method to alleviate complicated cross-talk error correction schemes is presented. High-frequency firing schemes compensate for the low speed of sound in air. Furthermore, it is shown how the system is dynamically configured during operation. Section 3 presents algorithms for extracting several relevant environmental features with the proposed system. Section 4 introduces a current prototypical implementation. Experiments with the multisonar system mounted on a fast motion platform are described in Section 5. Compared to state-of-the-art systems the new approach gives improved perception capabilities and simplifies subsequent algorithms like obstacle detection and localization. The techniques presented in this paper apply to what is called conformal arrays in radar technology, i.e. sensor arrays which conform with a given vehicle geometry.

2. System Overview

The proposed multisonar system is designed for operation in office or laboratory type indoor environments. Unknown objects like humans, chairs or trash cans must be detected in a wide surveillance area around the mobile robot even during fast motion. The estimate of the obstacle positions should, nevertheless, be accurate enough to avoid blocking of narrow passages. Section 2.1 details a method that allows combined firing of many sensors while avoiding cross-talk errors. We assume that static structures such as walls, edges and doorframes are stored in a 2D map as natural landmarks. These a-priori known features are used for localization estimation. A fast-firing method that takes advantage of this prior knowledge is introduced in Section 2.2. The sampling rate is increased far beyond the limits usually imposed by the low speed of sound in air. To allow for several tasks to be performed concurrently, only the most appropriate subsets of sensors are used. This form of dynamic resource management is discussed in Section 2.3.

2.1. Method of Virtual Point Source

Moving with high speed in a cluttered hallway requires early detection of the closest unknown obstacles and the determination of their positions. Usually several sensors are used separately for this purpose. Disturbing crosstalk effects require elaborate error correction schemes. A more appropriate configuration for this application uses one single transmitter with the desired beam form and a receiving array configured of several receivers in order to evaluate incoming echos as reported in [8]. A typical surveillance area is shown in Figure 1 a). Since the objective is to use only transmitters located on the
Figure 1. a) Typical surveillance area. b) Combining physical transmitters to form a virtual transmitter.

vehicle surface, this beam shape cannot be achieved with a single element. A virtual transmitter with the desired characteristics is obtained by appropriately phasing several adjacent transmitter elements with the main axes intersecting at the origin, Figure 1 b). For developing an adequate firing scheme, we assume a virtual circular wave which is starting from the origin. When the virtual wave front passes a physical transmitter, an ultrasonic pulse is fired. For M transmitters involved, an index vector I can be defined such that

$C_{I_i} < C_{I_{i+1}}, \qquad i = 1, 2, \dots, M-1,$   (1)
with $C_i$ the distance from transmitter $i$ to the origin. The time differences $t_i$ between firing transmitters $I_i$ and $I_{i+1}$ are given by

$t_i = \frac{C_{I_{i+1}} - C_{I_i}}{V_s}, \qquad i = 1, 2, \dots, M-1,$   (2)
with $V_s$ the velocity of sound in air. This idea results in the following firing scheme:

    FOR i = 1, 2, ..., M-1
        Fire Sensor I_i
        Wait t_i
    Fire Sensor I_M
Let $T_j$ denote the time difference between firing of the first sensor $I_1$ and detection of an echo at receiver $j$, $j = 1, 2, \dots, N$, where $N$ is the number of receivers. The distances $R_j$ from virtual transmitter to the object and back to receiver $j$ are then given by

$R_j = T_j \cdot V_s + C_{I_1}, \qquad j = 1, 2, \dots, N.$   (3)
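To make the firing scheme concrete, the following minimal Python sketch computes the firing order and delays of Eqs. (1)-(2) and the round-trip distances of Eq. (3). It is an illustration, not code from the paper: the function names, the example coordinates, and the assumed value for the speed of sound are all assumptions.

    # Illustrative sketch (not from the paper): virtual point source firing
    # schedule per Eqs. (1)-(3). Sensor positions are assumed for the example.
    import math

    V_S = 343.0  # assumed speed of sound in air [m/s]

    def firing_schedule(transmitter_positions):
        """Order transmitters by their distance C_i to the origin (Eq. (1))
        and compute the inter-firing delays t_i (Eq. (2)) so the pulses
        superpose like one circular wave emanating from the origin."""
        dists = [math.hypot(x, y) for (x, y) in transmitter_positions]
        order = sorted(range(len(dists)), key=lambda i: dists[i])  # index vector I
        delays = [(dists[order[i + 1]] - dists[order[i]]) / V_S
                  for i in range(len(order) - 1)]
        return order, delays, dists

    def echo_ranges(T, C_I1):
        """Eq. (3): distance from the virtual transmitter to the object and
        back to receiver j, from the time differences T_j measured relative
        to the firing of the first sensor I_1."""
        return [t_j * V_S + C_I1 for t_j in T]

    # usage sketch with made-up sensor positions and echo times
    order, delays, dists = firing_schedule([(0.30, 0.10), (0.25, 0.20), (0.32, 0.00)])
    ranges = echo_ranges([0.0060, 0.0062], C_I1=dists[order[0]])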
Figure 2. Fast firing combining windowing and interlacing.
The number M of neighbouring transmitters employed depends on the desired beam width. Since the physical sensors' main axes intersect at the origin, superposition of the pulses roughly approximates one pulse of a single centrally located virtual transmitter. Since we assume only one single pulse stemming from the origin, no decision on which physical transmitter produced the pulse detected by an individual receiver is required. Rotation of the beam to another focus of attention is achieved by electronically shifting the sensor arrangement.

2.2. Fast Firing

It is common engineering practice to focus on a certain interval around a nominal or predicted measurement value. For example, prior knowledge from a map could be used in conjunction with an estimated position to a-posteriori discard inconsistent measurements. However, we propose to go one step further and use prior knowledge at an earlier stage, i.e. during the data gathering process. Echos are received only in certain precalculated time-slots. This scheme is also applicable to reflectors not represented in the map. In this case, for windowing the distance predicted from previous data is used. Missing an obstacle occluding the expected reflector is impossible because the firing scheme is changed when the predicted time-slot is empty several times. Due to the low speed of sound in air, the elapsed time between transmitting a pulse and receiving the echo is approximately 3 ms per meter travelled. With longer distances, interlaced transmitting and receiving becomes more and more appropriate. Thus, windowing and interlacing are combined in a highly effective manner. After initializing with a standard firing method, i.e. determining the expected echo return time $T_E$, we receive the echo from the previous pulse after the next pulse has been transmitted. A window of length $T_V + T_N$, with $T_V = T_N$, is centered around the expected time of arrival. The most straightforward scheme for one sensor is shown in Figure 2. The pause time $T_{pause}$ is given by

$T_{pause} = T_E - 2 \cdot T_{dead} - 2 \cdot T_V - T_N.$   (4)
The expected echo return time $T_E$ is updated with the knowledge from every new sample. $T_{dead}$ corresponds to the dead zone of the transmitter/receiver combination. $T_V = T_N = 0.5$ ms and $T_{dead} = 1.5$ ms results in the firing rate $1/T_S$ with

$T_S = T_E - T_{dead} - T_V \approx \frac{d}{\mathrm{m}} \cdot 6\,\mathrm{ms} - 2\,\mathrm{ms}.$   (5)
Figure 3. Firing rate for fast firing of first order as a function of distance.
The dependency of the firing rate on distance d is shown in Figure 3. The order of the fast firing scheme is defined according to the number of pulses interlaced in the pause times. With simple first order fast firing, sampling frequencies above 100 Hz are achieved for distances of up to 2 m. Thanks to windowing, the scheme is invulnerable to noise and secondary reflections. It is applicable for $d > (2 \cdot T_{dead} + 2 \cdot T_V + T_N) \cdot V_s / 2$, in this case approximately 70 cm. If the tracking process breaks down because the object is moving too fast, the system restarts with initialization measurements. Fast firing of higher order is achieved by interlacing more pulses during the pause times.
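As an illustration of the fast-firing timing, here is a small Python sketch of Eqs. (4)-(5) and the applicability check; the constants are the ones quoted in the text, while the function names and code organisation are assumptions of this sketch, not the authors' implementation.

    # Illustrative sketch (not from the paper): first-order fast-firing timing.
    # All durations in seconds; T_V, T_N and T_DEAD as stated in the text.
    T_V = 0.0005      # half-window before the expected echo arrival
    T_N = 0.0005      # half-window after the expected echo arrival
    T_DEAD = 0.0015   # dead zone of the transmitter/receiver combination

    def pause_time(T_E):
        """Eq. (4): pause between interlaced pulses, given the expected
        echo return time T_E (updated from every new sample)."""
        return T_E - 2.0 * T_DEAD - 2.0 * T_V - T_N

    def fast_firing_applicable(d, v_s=343.0):
        """Fast firing applies for d > (2*T_dead + 2*T_V + T_N) * V_s / 2,
        which is roughly 0.7 m with the constants above."""
        return d > (2.0 * T_DEAD + 2.0 * T_V + T_N) * v_s / 2.0

    def echo_window(T_E):
        """Time slot [T_E - T_V, T_E + T_N] in which an echo is accepted;
        echoes outside the window are ignored, suppressing noise and
        secondary reflections."""
        return (T_E - T_V, T_E + T_N)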
2.3. Dynamic Reconfiguration

The 24 transmitter and receiver elements arranged on a horizontal plane around the fast mobile robot constitute a homogeneous multisonar system. They are, however, combined to a variety of virtual sensing arrays depending on tasks and situations. This type of operation is similar to a concept described in [5]. Since virtual sensors are aimed electronically, there is no need for moving parts. All virtual sensor arrays consist of N arbitrarily distributed receiving elements. When the angular region of interest is wide, a virtual point source combining several physical transmitters is used to illuminate the scene. On the other hand, when prior knowledge on reflector positions is available, the angular region of interest is narrow and one physical transmitter may suffice.

3. Object Localization

Based on the concept of dynamic reconfiguration, virtual sensors are formed from several physical transmitting and receiving elements. In general, we assume a virtual point source located at the vehicle's origin and an array of N arbitrarily distributed receivers. For the sake of simplicity, only two types of reflecting surface primitives are considered here: i) line reflector - walls, objects with small curvature; ii) point reflector - edges, doorframes. Both reflectors represent classes of real targets within the 2D sensor plane.
Figure 4. Measurement model for a) line type reflector b) point type reflector.
Given a reflector hypothesis and N measured distances $R_i$, $i = 1, 2, \dots, N$, an advanced object localization scheme should
- allow for arbitrary receiver distribution,
- provide position estimates with high angular accuracy,
- discard inconsistent measurements,
- quantify the estimates' uncertainty,
- use redundancy: N receivers, N equations,
- be valid for near/far-field reflectors,
- be simple, linear, and non-iterative.

The derivation of schemes that fulfill the above set of requirements is given next.

3.1. Line Reflector

Assuming specular reflection and introducing the mirror image of the transmitter, the following relationship can be obtained from Figure 4 a):

$R_i^2 = (C_i^x - 2x)^2 + (C_i^y - 2y)^2, \qquad i = 1, 2, \dots, N,$   (6)

with $(C_i^x, C_i^y)$ the position of receiver $i$ and

$C_i^2 = (C_i^x)^2 + (C_i^y)^2.$   (7)

With $r^2 = x^2 + y^2$ the set of Eqs. (6) is rewritten in vector/matrix form

$z + 4 r^2 \, \underline{1} = H \begin{pmatrix} x \\ y \end{pmatrix} + e$   (8)

with

$z = \begin{pmatrix} C_1^2 - R_1^2 \\ \vdots \\ C_N^2 - R_N^2 \end{pmatrix}, \qquad H = 4 \begin{pmatrix} C_1^x & C_1^y \\ \vdots & \vdots \\ C_N^x & C_N^y \end{pmatrix}, \qquad \underline{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix},$   (9)
where e denotes an error vector. In a first step, $x$, $y$ and $r^2$ are assumed to be independent variables. Then a weighted least-squares estimate of $\hat{x}$, $\hat{y}$ in terms of $\hat{r}^2$ is given by

$\begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} = \alpha + 4 \beta \, \hat{r}^2$   (10)

with

$\alpha = G \cdot z, \qquad \beta = G \cdot \underline{1},$   (11)

$G = (H^T E^{-1} H)^{-1} \cdot H^T \cdot E^{-1}.$   (12)
E denotes the weighting matrix. With the relationship

$\hat{r}^2 = \hat{x}^2 + \hat{y}^2,$   (13)

the following quadratic equation for $\hat{r}^2$ is obtained:

$16 \, \beta^T \beta \, \hat{r}^4 + (8 \, \alpha^T \beta - 1) \, \hat{r}^2 + \alpha^T \alpha = 0.$   (14)
Substitution of $\hat{r}^2$ back into Eq. (10) gives the desired solution. The optimal choice for E is derived next by adopting a set-theoretic point of view. The errors $\Delta_i$ in the measured distances $R_i$, $i = 1, 2, \dots, N$, are assumed to be located inside an ellipsoid defined by a symmetric, positive definite matrix $S_\Delta$, i.e.

$(\Delta_1, \Delta_2, \dots, \Delta_N) \, S_\Delta^{-1} \, (\Delta_1, \Delta_2, \dots, \Delta_N)^T \le 1.$   (15)

According to the appendix, the error e will then be constrained by an ellipsoid defined by

$E = D \cdot S_\Delta \cdot D^T$   (16)

with

$D \approx \left\{ I - \underline{1} \, \frac{8 \, (\bar{x}, \bar{y}) \, G}{8 \, (\bar{x}, \bar{y}) \, \beta - 1} \right\} R_D, \qquad R_D = -2 \, \mathrm{diag}(\bar{R}_1, \bar{R}_2, \dots, \bar{R}_N).$   (17)
Noise-free values are denoted by a bar (e.g. $\bar{x}$) and $I$ is the identity matrix. The distance of the actual error from the surface of the error ellipsoid in Eq. (16) is given by

$d = 1 - \left[ z + 4 \hat{r}^2 \underline{1} - H \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} \right]^T E^{-1} \left[ z + 4 \hat{r}^2 \underline{1} - H \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} \right].$   (18)

If d is negative, the solution is not valid within the constraints given in Eq. (15) and may be discarded. Otherwise ($0 \le d < 1$) the solution is valid and contained in an ellipsoid given by

$d \cdot (H^T E^{-1} H)^{-1},$   (19)
which is centered at $\hat{x}$, $\hat{y}$. Since the matrix D contains the true distances $\bar{R}_i$ and the true position $(\bar{x}, \bar{y})$, some approximations are required to arrive at a feasible solution for the weighting matrix E. The measured distances $R_i$ are used to approximate the unknown true distances $\bar{R}_i$. In a first step, $R_D \cdot S_\Delta \cdot R_D^T$ is used in place of E to solve for $\alpha$, and then $\hat{r}^2$. Next, the resulting $\hat{r}^2$ is used as an approximation of the true $\bar{r}^2$ in Eq. (17) to calculate D. Although the described process could be repeated to obtain even better solutions, iteration does not lead to significant improvements in practice.

Special Case: Linear Array

For a linear arrangement of receivers, a simpler solution may be obtained and may replace the above formulas. When the transmitter is also lined up with the receivers, this solution must be used to avoid singularity of G in Eq. (12). Without loss of generality the linear array is assumed to be arranged along the x-axis. Then Eq. (6) simplifies to

$C_i^2 - R_i^2 = 4 C_i x + 4 \gamma, \qquad i = 1, 2, \dots, N, \qquad \gamma = -r^2.$   (20)
In matrix form, this reads as

$z = H \begin{pmatrix} x \\ \gamma \end{pmatrix} + e, \qquad z = \begin{pmatrix} C_1^2 - R_1^2 \\ \vdots \\ C_N^2 - R_N^2 \end{pmatrix}, \qquad H = 4 \begin{pmatrix} C_1 & 1 \\ \vdots & \vdots \\ C_N & 1 \end{pmatrix}.$   (21)
With $\hat{r}^2 = \hat{x}^2 + \hat{y}^2$ an estimate $\hat{y}$ is found from

$\hat{y}^2 + \hat{x}^2 + \hat{\gamma} = 0.$   (22)
The error e is then contained in an ellipsoid given by

$E = R_D \cdot S_\Delta \cdot R_D^T.$   (23)
A straightforward first-order error propagation analysis provides

$\begin{pmatrix} \Delta \hat{x} \\ \Delta \hat{y} \end{pmatrix} \approx F \begin{pmatrix} \Delta \hat{x} \\ \Delta \hat{\gamma} \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 0 \\ -\hat{x}/\hat{y} & -1/(2\hat{y}) \end{pmatrix},$   (24)

and it follows that the error ellipsoid of $(\hat{x}, \hat{y})$ is given by

$d \cdot F \cdot (H^T E^{-1} H)^{-1} \cdot F^T.$   (25)
In the special case, the solution is obtained in one step.

3.2. Point Reflector

From the measurement model of Figure 4 b) follows that

$(R_i - r)^2 = (C_i^x - x)^2 + (C_i^y - y)^2, \qquad i = 1, 2, \dots, N.$   (26)

This can be written as

$z + 2 r R = H \begin{pmatrix} x \\ y \end{pmatrix} + e, \qquad R = \begin{pmatrix} R_1 \\ \vdots \\ R_N \end{pmatrix}, \qquad H = 2 \begin{pmatrix} C_1^x & C_1^y \\ \vdots & \vdots \\ C_N^x & C_N^y \end{pmatrix},$   (27)

with $z$ as defined in Eq. (9).
Again, $x$, $y$ and $r$ are first assumed to be independent variables. The estimate $\hat{x}$, $\hat{y}$ in terms of $\hat{r}$ is then given by

$\begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} = \alpha + 2 \beta \, \hat{r}$   (28)

with

$\alpha = G \cdot z, \qquad \beta = G \cdot R,$   (29)

$G = (H^T E^{-1} H)^{-1} \cdot H^T \cdot E^{-1}.$   (30)
With Eq. (13), a solution for $\hat{r}$ is obtained from

$(4 \, \beta^T \beta - 1) \, \hat{r}^2 + 4 \, \alpha^T \beta \, \hat{r} + \alpha^T \alpha = 0.$   (31)
Using similar arguments as for the wall reflector, the error e will be located within an ellipsoid defined by

$E = D \cdot S_\Delta \cdot D^T$   (32)

with

$D \approx \left\{ I - R \, \frac{2 \, (\bar{x}, \bar{y}) \, G}{2 \, (\bar{x}, \bar{y}) \, \beta - \bar{r}} \right\} (R_D + 2 \bar{r} I).$   (33)
Special Case: Linear Array

When the receivers and the transmitter are lined up, G in Eq. (30) becomes singular. Without loss of generality, the array is again assumed to stretch along the x-axis. A solution for $\hat{x}$, $\hat{r}$ is then obtained from

$z = H \begin{pmatrix} x \\ r \end{pmatrix} + e, \qquad H = 2 \begin{pmatrix} C_1 & -R_1 \\ C_2 & -R_2 \\ \vdots & \vdots \\ C_N & -R_N \end{pmatrix},$   (34)

and
$\hat{y}$ follows from $\hat{y}^2 = \hat{r}^2 - \hat{x}^2$   (35)

in a very similar fashion as for the line reflector.

3.3. Discriminating Reflector Types

Besides localizing objects for which the reflector type is a-priori known, it is often useful to identify reflector types based on actual measurement data. The additional surface information simplifies the determination of line endpoints and outlier elimination for subsequent segmentation algorithms. The hypothesis "line reflector" is accepted if the line condition Eq. (18) is true. In analogy, the hypothesis "point reflector" is accepted if the respective condition is true. No decision is possible if both conditions are true. Both hypotheses are rejected if no condition is true. For the prototype implementation considered, the measurement error is of the same order of magnitude as the differences between the distances for the two reflector hypotheses. Numerous misinterpretations are the result. The amplitude of the returned signals is very well suited to support decisions in conflicting cases. Unfortunately, amplitude information is not yet available in the current prototype.
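The line-reflector estimator of Section 3.1 reduces to a few linear-algebra steps. The following Python sketch implements Eqs. (8)-(14) under simplifying assumptions - unit weighting E = I and an ad-hoc root-selection rule - and is an illustration only, not the authors' implementation; the point-reflector case of Eqs. (26)-(31) follows analogously with a quadratic in $\hat{r}$ instead of $\hat{r}^2$.

    # Illustrative sketch (not from the paper): line-reflector localization
    # per Eqs. (8)-(14), with the identity as weighting matrix E.
    import numpy as np

    def locate_line_reflector(C, R):
        """C: (N,2) receiver positions, R: (N,) round-trip distances.
        Returns the estimated foot point (x, y) of the wall normal,
        or None if the quadratic has no valid root."""
        C = np.asarray(C, float)
        R = np.asarray(R, float)
        z = np.sum(C**2, axis=1) - R**2        # z_i = C_i^2 - R_i^2
        H = 4.0 * C                             # Eq. (9)
        ones = np.ones(len(R))
        G = np.linalg.pinv(H)                   # (H^T H)^{-1} H^T for E = I
        alpha = G @ z                           # Eq. (11)
        beta = G @ ones
        # Eq. (14): 16 b'b r^4 + (8 a'b - 1) r^2 + a'a = 0, quadratic in r^2
        coeffs = [16.0 * (beta @ beta), 8.0 * (alpha @ beta) - 1.0, alpha @ alpha]
        roots = np.roots(coeffs)
        r2_candidates = [w.real for w in roots
                         if abs(w.imag) < 1e-9 and w.real >= 0.0]
        if not r2_candidates:
            return None                         # inconsistent measurements
        # Root selection is a modeling choice here; the paper instead
        # discards solutions violating the validity test of Eq. (18).
        r2 = min(r2_candidates)
        return alpha + 4.0 * r2 * beta          # Eq. (10): (x, y)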
Figure 5. Overview of the ultrasonic sensing hardware.
4. Implementation Issues

The prototype implementation comprises 24 piezo-ceramic emitter and receiver elements operating at 40 kHz. They are fixed in a horizontal plane around the mobile platform. The maximum range is about 4 m for most reflector types. The minimum range (the dead zone) is approximately 0.25 m. Each sensor is connected to an analog signal processing unit which generates pulses and detects the returned echos. A tandem-arranged digital signal processing unit determines the elapsed time between transmitting a pulse and the returning echos. The evaluation units are fully decoupled. The digital electronics are mapped to the memory of a local transputer network consisting of four T805 transputers, Figure 5. This network takes care of configuration, sensor firing, data gathering, and prefiltering. For experimental evaluation of the proposed multisonar system, the omnidirectional service robot ROMAN (ROving MANipulator) has been used. It is a full scale mobile robot, Figure 6, adapted to indoor requirements: width 0.63 m x depth 0.63 m x height 1.6 m. Three independently steerable wheel systems provide excellent maneuverability. Wheel diameters of 0.2 m allow travel across rough surfaces like carpeted floor. The maximum speed of 2 m/s has been chosen to correspond with human walking speeds and wheel-chair velocities. Steering commands are emitted every 10 ms based on actual position information. A lightweight 7 DOF manipulator with a maximum range of 80 cm is mounted in one corner of the platform. Besides the ultrasonic sensors, ROMAN employs a laser navigation system, a wheel based dead-reckoning system and a low-cost gyroscope. The local sonar transputer network is part of a larger distributed network which takes care of the sensor system.

5. Applications
So far, a number of tools for enhancing the perception capabilities of sonar array sensors have been introduced. Their application depends on how structured the environment is, whether the objects are static or moving, and the degree of prior knowledge. Table 1 lists the rules that allow the system to automatically select sensing strategies depending on the current task and situation.
Table 1
Rules for automatic selection of sensing methods.

                              Localization for navigation   Obstacle detection / map updating
# of concurrent arrays        typ. 2-3                      1
# of transmitters per array   1                             6-10
# of receivers per array      5                             6-10
fast firing                   X                             -
virtual point source          -                             X
data rate per array           cf. Fig. 3                    25 ms
5.1. Experiments in Cluttered Environment

In a cluttered environment, the multisonar system is used to detect and localize obstacles. Absolute localization of the vehicle is performed by the laser navigation system with respect to known artificial landmarks. To demonstrate the performance of the proposed system, the following experiment has been conducted: While following a given course, the robot encounters two obstacles of different size. A virtual point source consisting of 8 physical transmitters points in the travelling direction of the vehicle. 6 receiver elements are employed to detect the incoming echoes. For simplicity, it is assumed that the reflecting surfaces may be approximated by lines within their 2D projection. Every 25 ms, 6 distances $R_i$ corresponding to the first echoes are measured. The procedure from Section 3.1 is used in an order-recursive manner to combine these measurements. Inconsistent combinations are discarded. Valid combinations are kept and provide the reflector position in the vehicle frame. Via the known transformation vehicle → world, the object positions are converted to the world frame and plotted in Figure 7 without further processing. Only a few outliers are generated. For the small obstacle, however, the line reflector assumption leads to errors in the localization estimate.
5.2. Experiments in Structured Environment

In a structured environment, a few a-priori known natural landmarks such as walls and edges are tracked and used for updating the vehicle's position and orientation. The number of employed virtual sensors is equal to the number of simultaneously visible landmarks, typically 2 to 3. Every sensor array employs one physical transmitter and up to 5 receiving elements. At any instant of time, the transmitter pointing closest to the predicted landmark is chosen. Fast firing is used to exploit a-priori knowledge on the reflectors' positions. This leads to very high update rates. For demonstration purposes, the vehicle moved along a wall. Based on the measured distances, the wall position is calculated with respect to the vehicle frame with the scheme discussed in Section 3.1. Since the wall position is known, the vehicle position in world coordinates may be obtained. The resulting data is compared with localization data from the laser navigation system, Figure 8. A satisfactory correspondence of data from both measurement systems is achieved. The raw data is used subsequently to calculate the actual position estimate. The previous estimate is propagated by means of dead-reckoning
Figure 6. Mobile manipulator ROMAN.
and gyroscope and fused with localization data from every visible landmark. The respective uncertainties are again modeled based on a set-theoretic approach. Statistical modeling would not capture the systematic errors caused by coordinating the three wheels. According to this point of view, the fusing process results in uncertainty set intersection [11]. Going into more detail would take us too far afield.
6. Conclusions

A new high performance multisonar system for application with fast (2 m/s) omnidirectional mobile robots has been developed. 24 transmitter and receiver elements are arranged in a horizontal plane around a mobile robot and constitute a homogeneous construction set for forming transmitting and receiving arrays. Depending on mission and situation the individual elements are combined to form several heterogeneous virtual sensing arrays, each designed for carrying out a given task. Virtual sensors are aimed electronically, i.e. there is no need for moving parts. High frequency firing schemes exploit a-priori information about the current environment to purposefully collect data in a surveillance area of 8.6 m in diameter. This makes the proposed system superior to those with fixed data gathering schemes. The results of experiments underline the system's improved perception capabilities in detecting obstacles during motion. Other tests demonstrate sonar localization results that are in good accordance with data from a laser navigation system.
Figure 7. Obstacle detection during motion.
Figure 8. Sonar localization of the vehicle compared with a laser navigation system.
REFERENCES

1. Borenstein, J. and Koren, Y., "Noise Rejection for Ultrasonic Sensors in Mobile Robot Applications". Proceedings of the 1992 IEEE International Conference on Robotics and Automation, Nice, France, pp. 1727-1732, May 1992.
2. Burks et al., "Autonomous Navigation, Exploration, and Recognition Using the HERMIES IIB Robot". IEEE Expert, Vol. 2, No. 4, pp. 18-27, 1987.
3. Crowley, J. L., "Dynamic World Modelling for an Intelligent Mobile Robot Using a Rotating Ultrasonic Ranging Device". Proceedings of the 1985 IEEE International Conference on Robotics and Automation, St. Louis, MO, 1985.
4. Drumheller, M., "Mobile Robot Localization Using Sonar". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 2, pp. 325-332, 1987.
5. Fukuda, T. and Nakagawa, S., "Dynamically Reconfigurable Robotic Systems". Proceedings of the 1988 IEEE International Conference on Robotics and Automation, Philadelphia, PA, pp. 1581-1586, 1988.
6. Kampmann, P.; Freyberger, F.; Karl, G.; and Schmidt, G., "Real-Time Knowledge Acquisition and Control of an Experimental Autonomous Vehicle". Intelligent Autonomous Systems, pp. 294-307, 1986.
7. Kortenkamp et al., "Integrated Mobile-Robot Design, Winning the AAAI '92 Robot Competition". IEEE Expert, Vol. 8, No. 4, pp. 61-73, 1993.
8. Kuc, R. and Barshan, B., "Bat-Like Sonar for Guiding Mobile Robots". IEEE Control Systems Magazine, Vol. 12, No. 4, pp. 4-12, 1992.
9. Kuc, R. and Siegel, M. W., "Physically Based Simulation Model for Acoustic Sensor Robot Navigation". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 6, pp. 766-778, 1987.
10. Nagashima, Y. and Yuta, S., "Ultrasonic Sensing for a Mobile Robot to Recognize an Environment - Measuring the Normal Direction of Walls". Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, Raleigh, NC, pp. 805-812, July 7-10, 1992.
11. Sabater, A. and Thomas, F., "Set Membership Approach to the Propagation of Uncertain Geometric Information". Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, CA, pp. 2718-2723, 1991.
12. Sabatini, A. M., "Active Hearing for External Imaging Based on an Ultrasonic Transducer Array". Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, Raleigh, NC, pp. 829-836, July 7-10, 1992.
13. Walter, S. A., "The Sonar Ring: Obstacle Detection for a Mobile Robot". Proceedings of the IEEE International Conference on Robotics and Automation, Raleigh, NC, March 31 - April 3, pp. 1574-1579, 1987.
Appendix

The error e in Eq. (8) may be written as

$e = \bar{z} + \Delta z + 4 (\bar{r}^2 + \Delta r^2) \, \underline{1} - H \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} = \Delta z + 4 \, \Delta r^2 \, \underline{1}.$   (36)

From Eq. (14) follows that

$16 \, \beta^T \beta \, (\bar{r}^2 + \Delta r^2)^2 + \left( 8 \, (\bar{\alpha} + \Delta\alpha)^T \beta - 1 \right) (\bar{r}^2 + \Delta r^2) + (\bar{\alpha} + \Delta\alpha)^T (\bar{\alpha} + \Delta\alpha) = 0.$   (37)

If squared error terms are neglected, the error $\Delta r^2$ is given by

$\Delta r^2 \approx -\frac{2 \, (4 \bar{r}^2 \beta^T + \bar{\alpha}^T)}{8 \, (4 \bar{r}^2 \beta^T \beta + \bar{\alpha}^T \beta) - 1} \, \Delta\alpha.$   (38)

From Eq. (11) it is concluded that $\Delta\alpha = G \cdot \Delta z$, and we get from Eq. (9)

$\Delta z \approx R_D \cdot (\Delta_1, \Delta_2, \dots, \Delta_N)^T.$   (39)
Intelligent Robots and Systems
V. Graefe (Editor)
© 1995 Elsevier Science B.V. All rights reserved.
Proper Selection of Sonar and Visual Sensors in Vehicle Detection and Avoidance

N. Moghaddam Charkari, Kazunori Ishii and Hideo Mori*
Department of Electrical Engineering and Computer Science, Yamanashi University

A general paradigm for vehicle detection and avoidance by proper selection of visual and sonar sensors is presented. The most appropriate sensor for detection and navigation of the robot is selected in accordance with the sensor functions and limitations. This strategy is activated when one sensor can't obtain enough information for navigation of the robot because of sensor limitations. Five types of actions based on stereotyped motion are used for the navigation system: Moving Along Wall, Moving Along, Moving Toward, Tracking Motion and Danger Estimation. The first is sonar based and the rest are vision based actions. Visual detection of the vehicle is based on the locomotion strategy "Sign Pattern based Stereotyped Motion". The sign pattern of the vehicle is the shadow underneath the vehicle, which is darker than any other parts of the paved road. The robot detects the sign pattern of a vehicle from a distance of 30 m and extracts its front shape by video rate processing.
1. Introduction

One of the most important tasks for the navigation of mobile robots in outdoor environments is the ability to detect and avoid moving or parked vehicles. In this paper, sonar sensors are used in combination with vision to compensate for some of the difficulties and limitations encountered in the machine processing of visual information. The proper use of multiple sensors is becoming an increasingly important area of research and application to improve the capabilities of intelligent machines. Because of the advantages and the limitations of each type of sensor, most mobile robots use a combination of different sensors to perform some tasks. Each of the sensors in a defined task may be operated synchronously or asynchronously with the other sensors, depending on the Selection Process. Selecting the most appropriate sensor, or group of sensors, during the navigation of the mobile robot is also performed by the Selection Process. The strategy of selection depends on the limitations of the sensors. For instance, a visual sensor observes an entire scene and can be used for detection and recognition of objects that are distant, while tactile and sonar sensors are used to obtain information about a few objects that are nearby.

*This work is partially supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, No. 03650343, and from the Okawa Institute of Information and Telecommunication, No. 91-32.
Two different approaches to the sensor Selection Process are Preselecting during initialization and On-line selection in response to changing environment or system conditions. Gaskell and Probert [1] discussed the use of Decision Theoretic Principles for sensor management in multisensor systems. Their framework provides a way of representing sensory data and choosing actions under uncertainty. Luo and Kay [2] surveyed a variety of recent approaches to the problem of multisensory information. General multisensor fusion methods, sensor selection strategy, and some other approaches and applications are discussed in their report. In the NAVLAB paper [3] sonar, a laser rangefinder and a color vision system are used for object detection as well as navigation in an outdoor environment. The design of the architecture can support parallel processing, and the development of multisensor integration has been the major goal of this research. Kimoto and Yuta [4] developed a method based on the fusion of internal sensors and ultrasonic sensors, as external sensors, using an extended Kalman filter for the navigation of the YAMABICO robot. Information on the planned path and environmental data are the prior knowledge of the robot. Sonar, a laser rangefinder and two color video cameras have been employed for road-following and object detection applications on road environments in ALV [5]. Color video information is used to locate the road, as the laser rangefinder may simply be confused if there is little difference between the road and the surroundings. On the other hand, laser range information is used to obtain an accurate description of the geometrical features of obstacles on the road. Visual object detection is one of the main subjects in robotic studies. Among such objects, vehicles are the most troublesome ones in outdoor environments. Several methods for vision based vehicle detection have been proposed by researchers. A dynamic 2-D model has been implemented on VaMoRs [6]. This model is based on the notion that the world is structured, rather than monolithic, and consists of individual objects. It allows the determination of the relative speed and distance in real time and may guide appropriate maneuvers if necessary. In this method, the right and left sides and the bottom of a car are detected horizontally and vertically by a local edge operator. Symmetry is another method which provides a useful feature, since vehicles' rears are generally contour-symmetric and spatially gray-level-symmetric about a vertical centerline [7,8]. It means that an appropriately shifted mirror image of a vehicle overlaps itself. However, the error rate may increase when the method is performed in a highly structured scene.
In our earlier article [9] we discussed a method based on "Sign Pattern based Stereotyped Motion" for vehicle detection. There the robot does not need to fully understand the obstacle as it reconstructs the 3-D shape of the obstacle in its memory. The robot only needs to obtain signs to such a degree that they inform it of the existence of an object in its way. The sign was referred to as "sign pattern" in the earlier work. We found the shadow underneath a vehicle to be a reliable sign pattern of the vehicle [10,11]. Vehicle detection based on sign pattern is one of the lower-level competencies of the robot's information set of the environment. Therefore, it does not need highly computational image processing. Moreover, it enables the system to carry out the tracking process, i.e. to keep a vehicle in track by continuously sensing its sign pattern in real time.
Figure 1. The structure of HARUNOBU-6
In section 2, the structure of the HARUNOBU-6 robot and its sensory system are explained. A simple reflection model of the road surface is discussed in section 3. Recent developments of the visual vehicle detection module can be found in section 4. A general paradigm of proper sensor selection is discussed in section 5. Finally, sections 6 and 7 are the experimental results and conclusion respectively.

2. Structure of the Robot
The work on developing the compact robot HARUNOBU-6 (54 cm x 84 cm x 84 cm) has been undertaken in our laboratory since 1992. The structure of the robot is shown in Figure 1. The undercarriage is a motor-tricycle developed by Suzuki Co. Ltd.
2.1. Sensor System

Two types of sensors have been installed on the robot: internal and external sensors. The internal sensors include encoders, a magnetic directional sensor and a gyroscope. The external sensors are a video camera and six sonar sensors. Sonar sensors are attached to the right, left and front of the robot. Two sonar sensors which are attached on the front of the robot are used for obstacle detection. The remaining sensors are used for moving along an object. Data from the sonar sensors is sampled every 50 ms, while the frequency of each sensor is about 40 kHz. The video camera is set at a height of 84 cm with a tilt angle of 12°. A VME image memory system (CPU 68040, 25 MHz) is used for image processing and navigation. The communication between the image processing system and the vehicle motion control system is performed through an RS232C connection. Data from internal and sonar sensors are fed to a UPP (universal pulse processor) for further processing. The UPP contains a 10-bit A/D converter, 1 kB RAM and a pulse modulator with counter. The control of the UPP is done by a single board control processor (CPU V25, 8 MHz).
Figure 2. Schematic of specular and diffused light. Incident angle is θi degrees in the x-y plane. View position is defined by (θr, φr).

Figure 3. Image intensity curve of the reflected light on the asphalt road when view position φr changes from 0 to 360°, with θi = 70° and θr = 60°.
3. Notes on the Terminology

The terminology employed in this study to describe the reflectance geometry and the reflectance model is illustrated in Figure 2. A surface of an object is illuminated by the sun light $I_{sun}$ and the sky light $I_{sky}$. The light source is assumed to lie in the x-y plane with the zenith angle $\theta_i$. The light is reflected in the $(\theta_r, \phi_r)$ direction, where $\theta_r$ denotes the angle between the normal vector of the surface and the reflected light which is seen from the viewpoint of the video camera. $\phi_r$ is the reflected light angle between the plane of $I_{sun}$ and the normal vector. When the view position lies in the reflected direction ($\theta_i = \theta_r$), then the luminance of a surface is mostly evaluated by the specular component of the light source. Furthermore, if $I_{sun}$ becomes almost parallel to the surface, the surface behaves like a mirror and all the light is reflected in the $\theta_i = \theta_r$ direction. The reflected light is called "specular light" and represented by $R_{spc}$. The specular component has a spectral energy distribution close to that of the incident light. The directional characteristic of $R_{spc}$ is called the "specular lobe" and is expressed by

$R_{spc} = I_{sun} \cdot \sin\theta_r \cdot \cos^4(\theta_i - \theta_r) \cdot \cos^4\phi_r.$   (1)
When $I_{sun}$ is nearly perpendicular to the surface the light penetrates into the substance of the object and is partially scattered in many layers. The scattered light is called "diffused light" and is represented by $R_{dfs}$. The diffused light of a surface is given by

$R_{dfs} = \rho \cdot I_{sun} \cdot \cos(\theta_i - \theta_r) \, (1 - k \cos 2\theta_r) / (1 + k),$   (2)
where $\rho$ is the reflectance factor of the surface and $k$ denotes the fraction of diffuse light. Equations (1) and (2) show that surface reflectance not only depends on the material properties, but also on the viewing geometry. Figure 3 shows the image intensity curve of the reflected light on the road surface in one of our experiments. $\theta_i$ and $\theta_r$ are assumed to be almost equal ($\theta_i = 70°$, $\theta_r = 60°$) while viewing position $\phi_r$ varies from 0 to 360°.
Table 1
Luminance of road surface on a clear day (cd/m²)

Road surface type (clear weather)   Dry surface    Wet surface
In specular light                   3600 to 4800   2020 to 3000
In diffused light                   2600 to 3300   850 to 2200
Shadow of vehicle                   400 to 900     275 to 500
Underneath vehicle                  100 to 200     60 to 150

Table 2
Luminance of road surface on a cloudy day (cd/m²)

Road surface type (cloudy weather)   Dry surface    Wet surface
Open parts                           2610 to 3500   2000 to 3000
Underneath vehicle                   105 to 140     80 to 100
On the other hand, the reflected light of the sun $R_{sun}$ is generally expressed by

$R_{sun} = R_{spc} + R_{dfs}.$   (3)

The total reflected light $R$ is given by

$R = \alpha_1 \cdot R_{sun} + \alpha_2 \cdot R_{sky} + \alpha_3 \cdot R_{env},$   (4)
where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are transparency factors equal to or less than 1.0. From the viewpoint of optical science, the sunny part of the asphalt road is illuminated by the sun and the sky light ($\alpha_1 = \alpha_2 = \alpha_3 = 1$) while the shaded part is only illuminated by the blue sky light ($\alpha_1 = 0$, $\alpha_2 = 1$, $\alpha_3 = 1$). The area underneath a vehicle is illuminated through the gap between the vehicle and the ground by the faint environment light $R_{env}$ ($\alpha_1 = \alpha_2 = 0$, $\alpha_3 = 1$). Equation (4) shows that although the luminance of the shadow underneath and the other shaded parts on the road appear similar to the human visual system, this is an illusion of brightness constancy. To determine precisely the above differences, the luminance of sunny regions, shadows, and the areas underneath a vehicle on dry and wet road surfaces were measured. Tables 1 and 2 show the results of the experiments on a clear day and during cloudy weather conditions respectively. It was found that the area underneath a vehicle on a clear day is markedly darker than the shaded parts and wet patches in luminance. Similarly, the area underneath is much darker than the other open parts of the asphalt road on a cloudy day when there is no shadow. Moreover, the luminance of the shadow underneath shows little change while those of sunny and shaded parts have no consistency because of the different sources of illuminating light. Finally, the shadow underneath is not responsive to change in the viewing position and weather conditions.
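For illustration, the reflection model of Eqs. (1)-(4) can be evaluated as in the Python sketch below. Note that this is doubly hedged: the equation forms above are themselves reconstructed from a damaged scan, and all function and variable names here are assumptions, not code from the paper.

    # Illustrative sketch (not from the paper): evaluating the reconstructed
    # reflection model of Eqs. (1)-(4). Angles in radians.
    import math

    def specular(I_sun, theta_i, theta_r, phi_r):
        # Eq. (1): specular lobe
        return I_sun * math.sin(theta_r) * math.cos(theta_i - theta_r)**4 \
                     * math.cos(phi_r)**4

    def diffuse(I_sun, theta_i, theta_r, rho, k):
        # Eq. (2): diffused light, with reflectance factor rho and
        # diffuse fraction k
        return rho * I_sun * math.cos(theta_i - theta_r) \
                   * (1.0 - k * math.cos(2.0 * theta_r)) / (1.0 + k)

    def total_reflected(R_sun, R_sky, R_env, a1, a2, a3):
        # Eq. (4); per the text, (a1, a2, a3) = (1, 1, 1) for the sunny
        # part, (0, 1, 1) for the shaded part, and (0, 0, 1) for the area
        # underneath a vehicle - which is why the underneath stays dark
        # regardless of viewing position and weather.
        return a1 * R_sun + a2 * R_sky + a3 * R_env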
Figure 4. Flow diagram of vehicle detection and tracking process
4. Vehicle Detection Algorithm

Visual vehicle detection and tracking is decomposed into three layers. In the lowest layer the candidate vehicles are detected by the Selection sub-module. In the middle layer, the pseudo-objects which are wrongly selected as vehicles are identified. In this layer, some additional features of vehicles are investigated by the Confirmation sub-module. In the third layer, the type of motion is investigated and the candidate vehicles are tracked. Moreover, the trajectory of vehicles in motion and the motion direction, which are important in danger estimation, are found during the tracking process. The entire system structure is composed of the following sub-modules: Window Set-up, Searching Underneath, Confirmation, Motion Estimator, Danger Estimation, and Tracking, and is shown in Figure 4. In the following the operation of each sub-module is discussed in brief.

(1) Window Set-up: A frame is digitized by its intensity and the right and left lane marks are detected by the line-fitting method. Then a thin window $W_1$, with its length equal to the road width, is set up at a position corresponding to a distance of 30 m ahead of the robot on the image plane.

(3) Searching Underneath: The vertical projection is performed on the window $W_1$. The candidate pixels are extracted and linked into line segments by comparing the projection with the threshold value $Tr_1$. Then, the short line segments are eliminated and the collinear line segments are merged. The result is a new line segment $L_i = (\xi_i, u_i, \zeta_i, u_i)^T$ representing the area underneath the vehicle.
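A minimal sketch of the "Searching Underneath" step might look as follows. The function name, the use of numpy, and the run-length bookkeeping are assumptions of this sketch; collinear-segment merging is omitted for brevity.

    # Illustrative sketch (not from the paper): extract dark runs from the
    # vertical projection of the search window, per step (3) above.
    import numpy as np

    def underneath_segments(window_gray, tr1, min_len):
        """window_gray: (rows, cols) intensity patch of the thin window W1.
        Returns [(start_col, end_col)] runs whose column-wise mean intensity
        falls below the threshold Tr1 (the dark shadow underneath a vehicle),
        discarding runs shorter than min_len pixels."""
        proj = window_gray.mean(axis=0)        # vertical projection
        dark = proj < tr1                      # candidate pixels
        segments, start = [], None
        for col, flag in enumerate(dark):
            if flag and start is None:
                start = col
            elif not flag and start is not None:
                if col - start >= min_len:     # eliminate short segments
                    segments.append((start, col - 1))
                start = None
        if start is not None and len(dark) - start >= min_len:
            segments.append((start, len(dark) - 1))
        return segments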
Figure 5. Result of vertical projection on a road partly covered by shadows. The area underneath a vehicle is clearly distinguished from other parts of the road surface when it enters the search window.
Figure 5 shows an example of the vertical projection when the area underneath a vehicle enters the search window on a road partly covered by shadows. As can be seen, the area underneath is clearly distinguished from other parts of the asphalt road by the intensity value.

(4) Oncoming/Parking Detection: Physiological studies suggest that the human visual system may use instantaneous motion information computed from two successive frames. In our work, once two feature points of the area underneath the vehicle, $(\xi_i, u_i)^T$ and $(\zeta_i, u_i)^T$ with length $L_i$, are extracted in the search window $W_1$, the next window $W_2$ is set up at a shifted position on the next image. Then, the correspondence of these points in successive images is determined. If the correspondence is established within an interval of time $t_1$ then the vehicle detected in $W_1$ is identified as an oncoming vehicle; otherwise it is a stationary vehicle. Since moving vehicles have a smooth and continuous, not a transverse, motion, we have employed a time interval $t_1$ which indicates the temporal property of the area underneath when it is in motion.

(5) Confirmation: Moving vehicle detection based on two successive frames may include some errors caused by distortion, random noise, blurring, etc. The uncertainty can be considerably reduced by the following assumptions:

(5.1) Number of Line Segments: The presence of at most two line segments in the vertical projection. When there are two line segments, one line segment corresponds to the shadow underneath an oncoming vehicle and the other to that of a departing vehicle.

(5.2) Length of the Line Segment: The length of the line segment should be greater
than the threshold value $B_i$, defined as the mean value of vehicle width on the 2-D image. The threshold value increases with decrease in the depth of view.

(5.3) Vertical Edge Operation: Vertical edge operation is performed on the beginning and end points of the line segment to detect the left and right sides of the vehicle. Confidence is defined to quantify the accuracy of measurement. It is represented by the square root of the inverse of the variance of the candidate edge points. The smaller the variance, the higher the confidence of the measurement.

(5.4) Vehicle Side Symmetry Operation: The detected edges on the left and right windows belong to the vehicle sides if they are symmetric about the vertical centerline defined by the line segment $L_v = (\xi_m, u_i, \xi_m, u_i - L_i/2)^T$, where $\xi_m$ is the mean of $\xi_i$ and $\zeta_i$, and $L_i$ denotes the length of the line segment, expressed by $(\zeta_i - \xi_i)$.

(5.5) Gray Level Symmetry: A window $W_{und}$ of size $(L_i \times K)$ is set up at the position of the candidate. Two columns of pixels in $W_{und}$ are gray level symmetric about the vertical centerline $L_v$ if
$\sum_{j=1}^{K} \left| g_{(cent-i)j} - g_{(cent+i)j} \right| < \Lambda_g,$   (5)
where $\Lambda_g$ indicates the threshold value during the symmetry operation and $g$ the pixel gray level. This process is done for $i = 1, 2, \dots, (\zeta_i - \xi_i)/2$. The extent of symmetry is found to be proportional to the number of pixels which are gray level symmetric. As the vehicle's underneath is only illuminated by faint light from the environment, the difference between two column pixels gradually increases from the vehicle's centerline to the sides of the vehicle, in the small range of $\Lambda_g$. This feature is effectively used to distinguish the area underneath the vehicle from pseudo-objects in the mid-range distance of the vehicle on the road in the vehicle detection module.

(5.6) Homogeneity: Homogeneity is one of the main properties of the area underneath a vehicle. As this area is not directly illuminated by the sun or the sky light, the intensity dispersion of pixels has little change. Therefore, the average intensity and the standard deviation of this area can be efficiently used to distinguish the shadow underneath a vehicle from pseudo-objects, such as low-contrast shaded parts of an asphalt road. If the confirmation is successfully completed then the detected segment is assumed to be the sign pattern of a vehicle, otherwise it is noise.

(6) Tracking Vehicles: The tracking process gives the robot the ability to estimate the safe distance between the robot and the vehicle. A vehicle is detected and tracked in successive images by shifting the dynamic windows $W_i$ ($i = 3, 4, \dots$) to the lower side of the image. The size and the position of tracking window $W_i$ is determined by the length, center position, and ordinate of the line segment in the tracking window $W_{i-1}$. The length of the tracking window is given by
    W_i = W_{i−1} + ΔL        (6)
where ΔL is expressed by

    ΔL = L_{i−1} [R_{i−1} − D_i · 2 tan(α/2)] / (D_i · 2 tan(α/2))        (7)
and α and R_{i−1} are the field of view (FOV) and the relative width of the area subtended by the camera at distance D_{i−1}, respectively. L_{i−1} denotes the length of the line segment in window W_{i−1}. Finally, depth, velocity, and width are determined by analyzing the time series of the ordinate of the area underneath the segment.
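To make the window update concrete, the following minimal Python sketch applies Eqs. (6) and (7); the function name, variable names, and example values are our own illustrative assumptions, not the authors' code.

    import math

    def update_tracking_window(w_prev, l_prev, r_prev, d_i, fov):
        """One tracking-window update, Eqs. (6) and (7).

        w_prev : previous window length W_{i-1} (pixels)
        l_prev : length L_{i-1} of the line segment in window W_{i-1} (pixels)
        r_prev : relative width R_{i-1} subtended by the camera at D_{i-1}
        d_i    : current depth estimate D_i (metres)
        fov    : field of view alpha (radians)
        """
        span = d_i * 2.0 * math.tan(fov / 2.0)       # width covered at depth d_i
        delta_l = l_prev * (r_prev - span) / span    # Eq. (7)
        return w_prev + delta_l                      # Eq. (6)

    # Made-up example values:
    print(update_tracking_window(40.0, 35.0, 1.8, 12.0, math.radians(40.0)))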
5. Proper Sensor Selection
Two types of external sensors, visual and sonar, are currently used for vehicle detection and avoidance in HARUNOBU-6. The proper selection of sensors is the main purpose of the control system of this robot. In the whole system, the most appropriate group of sensors, or only one type of sensor, is selected in response to changing conditions during the robot's motion. This means that the sensors are not so much combined as that, depending on the environmental conditions and the sensors' functionalities, the data from one sensor tends to dominate. This helps to simplify the control program, whereas in the case of combination or fusion the complexity of the control system grows exponentially [2]. Five types of actions are currently undertaken in HARUNOBU-6: Moving Along Wall (MAW), Tracking Motion (TM), Moving Toward (MT), Danger Estimation (DE), and Moving Along (MA). The first action is sonar-based and the remaining actions are vision-based. The sign pattern of the MA action is white lane marks or long edges along the ground, while a line segment found from the shadow underneath the vehicle is the sign pattern in the TM, MT and DE actions. The principles of the MAW and MA actions have been discussed in [9] and [12]. Each of these actions consists of simple processing which enables the robot to respond quickly to sudden changes in its environment. Selection of an action is determined by knowledge of the limitations and the advantages of each type of sensor as well as by the current action of the robot. The relevant action is reliable if its response to the environment is positive. Two new actions, MT and TM, applicable to vehicle detection and avoidance are presented in this paper. The selection of TM and MT is done in the Motion Estimator module as shown in Figure 4. TM is used for tracking an oncoming vehicle. The distance between the robot and the vehicle as well as the collision time are continuously estimated in this action. It is a useful action which ensures the robot maintains a safe distance and finds a free route for its motion. On the other hand, MT is used for tracking a parked vehicle. Continuous detection and tracking of a parked vehicle helps the robot to evaluate its distance from that vehicle as it moves. Moreover, the width of the vehicle and the free path are estimated and fed to the control system for the avoidance process. A free path in the road is found from knowledge of the road width and the estimated width of the vehicle; it indicates whether the path can be followed or not. Finally, DE is activated during the tracking process. The trajectories of vehicles as well as their direction of motion are found during this action. The output of this action indicates whether the route is safe or not. The result may be used to decide the next action of the robot: for instance, the decision to cross an intersection, make an emergency stop, or continue in motion, and so on. The details of this action have been discussed in [13].
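As an illustration of this dominant-sensor scheme, the sketch below encodes the action choice as a simple rule-based selector. The action names follow the paper, but the reliability measures, thresholds, and overall structure are our assumptions; DE, which runs concurrently with tracking, is left out.

    # Hypothetical rule-based action selector in the spirit of Section 5.
    # Reliability values lie in [0, 1]; this is an illustrative sketch, not
    # the HARUNOBU-6 control program.

    def select_action(visual_reliability, sonar_reliability,
                      vehicle_detected, vehicle_moving, current_action):
        # At close range the visual sign pattern is lost and sonar dominates.
        if sonar_reliability > visual_reliability:
            return "MAW"                              # sonar-based action
        if vehicle_detected:
            return "TM" if vehicle_moving else "MT"   # track moving / parked
        if visual_reliability > 0.5:
            return "MA"                               # follow lane marks
        return current_action                         # otherwise keep going

    print(select_action(0.9, 0.1, True, False, "MA"))   # -> MT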
Figure 6. Stereotyped motion of the robot in avoiding a vehicle parked at the side of the road
Figure 7. The state transition diagram
5.1. Order of Process
Figure 6 shows the Stereotyped Motion of the robot in the vehicle avoidance process. A general paradigm for vehicle detection and avoidance by the proper use of visual and sonar sensors is shown in Figure 7. In the first phase, one of two actions, MAW or MA, is activated. In both cases the global information concerning the environment is obtained by the visual system. This information is composed of the positions and the angular directions of the left and right lane marks. If a vehicle is detected during the robot's motion then one of two types of action, TM or MT, is started as a result of the motion estimator. At the same time, DE is activated to estimate the danger coefficient of vehicles. The relevant action is continued until the visual sensor fails to provide sufficient information about the vehicle. In other words, the uncertainty of visual sensory data progressively increases as the detection
frequency decreases. Thus, the relevance and the reliability of this sensory data to the current action is reduced and the response from other types of sensory data tends to dominate. In practice, this happens at close range, less than 6 m, when the visual sensor cannot see the entire vehicle and fails to detect the sign pattern of the vehicle. In this case, the reliability of the visual sensor reaches zero while that of the sonar sensor increases and may reach unity. The estimated width of the vehicle, the approximate distance between the robot and the vehicle, and the free path are sent to the control system at this point. In the second phase, the front sonar sensor is activated to prepare information on the position of the rear of the vehicle. The robot continues its motion until it is f meters from the vehicle, where the distance f is provided by the front sonar sensors. In the third phase, avoidance occurs. The avoidance maneuver is set up to sidestep to the right of the parked vehicle. The information obtained from the visual sensor, i.e. width and free path, as well as that from the front sonar sensors, i.e. distance f, is used for avoiding the vehicle. The robot avoids the vehicle by following the symmetrical arc R. The interior angle θ and the radius r of the arc are given by
    r = (u² + v²) / (4u)        (8)

    θ = 2 arctan(u/v)        (9)
where u and v are defined as follows:

    u = a + s + W/2        (10)

    v = f + L/2        (11)
where L and W are the length and width of the robot, a is equal to half of the vehicle width, and s is defined as the distance between the robot and the vehicle's side while it is moving along the vehicle. In the fourth phase, the MAW action is activated. MA cannot be selected, as the left lane mark is not in the visual field of the robot. Moving along the vehicle's side continues until the vehicle's side is no longer detected by the sonar sensors attached on the left side of the robot. In this case, the robot uses its location before the avoidance process and returns to its position during the MT action. Finally, phase 1 will be activated again.
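The arc geometry of Eqs. (8)-(11) can be checked numerically. The sketch below is ours, and all numeric inputs are invented examples.

    import math

    def avoidance_arc(vehicle_width, clearance, robot_w, robot_l, f):
        """Radius r and interior angle theta of the avoidance arc (Eqs. 8-11)."""
        a = vehicle_width / 2.0
        u = a + clearance + robot_w / 2.0    # Eq. (10): lateral offset
        v = f + robot_l / 2.0                # Eq. (11): longitudinal distance
        r = (u**2 + v**2) / (4.0 * u)        # Eq. (8)
        theta = 2.0 * math.atan2(u, v)       # Eq. (9)
        return r, theta

    r, theta = avoidance_arc(vehicle_width=1.6, clearance=1.0,
                             robot_w=0.6, robot_l=1.0, f=2.0)
    print(f"r = {r:.2f} m, theta = {math.degrees(theta):.1f} deg")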
6. Experimental Results
The vehicle detection and avoidance system was tested on the university campus. A vehicle was parked 30 m in front of the robot. Figure 8 shows the locus of the robot during the motion and avoidance processes. The order of processes was as explained in the previous section. The avoidance process was activated when the robot reached 2 m in front of the vehicle. The robot continued its motion, keeping 1 m from the right side of the vehicle, using the information received from the sonar sensors attached on the left side of the robot. Figure 9 shows the result of six frames of visual vehicle detection from 30 m to
Figure 8. The locus of the robot during motion
Figure 9. The result of vehicle detection in six images
5 m in distance. Figure 10 shows the result of estimating the vehicle width during the MT action. The estimated width is 165 cm, where the real width of the vehicle is 160 cm. An accuracy of more than 90% in many experiments assures the robot of safe avoidance. Finally, Figure 11 shows the time-series results of vehicle detection by vision-based and sonar-based actions from 30 m to 2 m in distance. The number of detections has been counted in a defined time interval T. Because of the limitation of visual sensors in detecting the sign pattern of the vehicle at a near distance, the number of visual vehicle detections, shown by ×, has reached 0 at a distance of 6 m. Similarly, the number of sonar-based vehicle detections, shown by •, has increased at a near distance.

7. Conclusion
A general paradigm for vehicle detection and avoidance by the proper use of visual and sonar sensors has been presented. The performance of the system has been evaluated on
Figure 10. The estimated width of the vehicle. The result has been obtained from 20 m to 5 m in distance during the MT action.
Figure 11. The number of vehicle detections by visual and sonar sensors during the robot's motion from 30 m to 3 m in distance. × indicates the number of detections by the visual sensor; • indicates the number of detections by the sonar sensor.
Table 3
Results of vehicle detection

Road surface                      Detection success   Over-detection   Under-detection   Total no. of vehicles
Clear weather (partly shadow)     99.8%               0.4%             0.2%              850
Clear weather (entirely shadow)   100%                0.8%             0.0%              1050
Clear weather (no shadow)         100%                0.2%             0.0%              880
Cloudy weather                    100%                0.6%             0.0%              860
the university campus. The order of using sensors is selected according to the sensors' capabilities. The method of selection uses knowledge of the functionalities and limitations of each type of sensor as well as the outdoor environment situation. Five types of actions, derived from Stereotyped Motion, have been applied to the detection and avoidance problems: Moving Along Wall, Moving Along, Tracking Motion, Moving Toward, and Danger Estimation. Visual vehicle detection has been discussed in brief. The method is based on Sign Pattern Stereotyped Motion. The Sign Pattern of the vehicle is the shadow underneath the vehicle, which remains distinctive under varying time, shadow, weather, and light conditions. Table 3 shows the robustness of the shadow underneath for vehicle detection in different conditions. This result has been obtained from one hour of videotape consisting of more than 3000 scenes of vehicles in different road and weather conditions. Over-detection occurs when pseudo-objects are detected as vehicles by mistake. Similarly, under-detection indicates the system's failure to detect vehicles.
This research on proper selection of sensors is still continuing. Improving the navigational system and multi-object detection are future work.

REFERENCES
1. A. Gaskell and P. Probert, "Sensor Models and a Framework for Sensor Management", SPIE 2059, Boston, United States, pp. 213, 1993.
2. R. Luo and M. Kay, "Multisensor Integration and Fusion in Intelligent Systems", IEEE Trans. on Systems, Man, and Cybernetics 19, No. 5, pp. 901-931, 1989.
3. S.A. Shafer, A. Stentz and C. Thorpe, "An Architecture for Sensor Fusion in a Mobile Robot", Proc. of IEEE Int. Conf. Robotics and Automation, Raleigh NC, pp. 402-408, 1987.
4. K. Kimoto and S. Yuta, "Sonar Based Outdoor Navigation - Navigation using Natural Landmarks -", ICAR, Tokyo, Japan, pp. 239-244, 1993.
5. M.A. Turk, D.G. Morgenthaler, K.D. Gremban and M. Marra, "Video Road Following for the Autonomous Land Vehicle", in Proc. of IEEE Int. Conf. Robotics and Automation, pp. 727-733, 1988.
6. U. Solder and V. Graefe, "Object Detection in Real Time", in Proc. of the SPIE Symposium on Advances in Intelligent Systems, Boston, pp. 104-111, 1990.
7. A. Kuehnle, "Symmetry-based Recognition of Vehicle Rears", Pattern Recognition Letters 12, North Holland, pp. 249-258, 1991.
8. T. Zielke, M. Brauckmann and W. von Seelen, "Intensity and Edge-Based Symmetry Detection Applied to Car Following", Computer Vision - ECCV '92, pp. 865-874, 1992.
9. H. Mori, M. Nakai, H. Chen, S. Nakayama and T. Yaguchi, "A Mobile Robot Strategy - Stereotyped Motion by Sign Pattern -", MIT Press, pp. 161-172, 1990.
10. N. Moghadam and H. Mori, "A New Approach for Real-time Vehicle Detection", IEEE Int. Conf. on Intelligent Robots and Systems IROS, Yokohama, Japan, pp. 273-278, 1993.
11. H. Mori, N. Moghadam Charkari and T. Matsushita, "On-Line Vehicle and Pedestrian Detections Based on Sign Pattern", IEEE Transactions on Industrial Electronics 41, No. 4, pp. 384-391, 1994.
12. H. Mori, "Guide Dog Robot HARUNOBU-5 - Stereotyped Motion and Navigation -", Springer-Verlag, Germany, pp. 135-149, 1991.
13. H. Mori, N. Moghaddam Charkari and S. Kotani, "Danger Estimation of Vehicles at Intersection", IEEE Int. Conf. on Robotics and Automation, Nagoya, Japan, 1995.
Intelligent Robots and Systems, V. Graefe (Editor) © 1995 Elsevier Science B.V. All rights reserved.
Selective Refinement of 3-D Scene Description for Mobile Robot by Attentive Observation
Hotaka Takizawa, Yoshiaki Shirai and Jun Miura
Osaka University, 2-1 Yamadaoka, Suita, Osaka 565, Japan

This paper describes a method of constructing a 3-D scene description by attentive observation to find a path to a given destination. From low-resolution stereo images, features are extracted and a scene description is constructed with the constrained Delaunay triangulation. Then, a path is searched for in the description. If no reliable paths are found, an unknown image region on the most promising path is examined in zoomed-in stereo images and new features are extracted from the zoomed-in images. The new features are integrated with the original ones. A new description is similarly constructed from the integrated features, and a path is again searched for in the description. This procedure is repeated until a path is found or no more ambiguous regions are left. Experimental results for a real indoor scene including a chair, desks and computers are shown.

1. Introduction
Vision is useful for an autonomous mobile robot to reach a given destination safely in an environment [1]. The cost of vision may be high if the robot has to observe the entire environment to make a detailed description [2][3]. There have been attempts to reduce the cost of making a scene description by selecting observation regions. Xie [4] proposed a method of viewpoint selection to determine whether or not the environment is different from a given map. Maver et al. [5] proposed to observe an unknown environment (without maps) from several different viewpoints. The viewpoints are selected so that the robot observes the entire environment and the number of observations is minimized. Das et al. [6] proposed to acquire range information of an environment with low-resolution stereo images and to observe selected regions with high-resolution images. The observation regions are selected so that a fine description of the entire scene is constructed with minimum camera movement. These methods, however, are not effective for the task of finding a path to a given destination. In Blake et al. [7], the environment is first observed to find a destination. If it is not found, a freespace to pass through is searched for. The robot moves to the freespace and observes the environment again. The search for freespace may be inefficient because the robot has no global information about the environment. In our method, the robot selects the region on the most promising path out of the candidates obtained from low-resolution images. Then, the robot observes the region attentively. Fig. 1 shows a typical situation where the method is useful. If only the features T_i (i = 1, ..., 4) are detected from the low-resolution image, the movable area for the robot
is the movable area-1, because the feature points are regarded as lying on a surface. If the destination of the robot is behind the plane containing T_2 and T_3, the region corresponding to that area is observed attentively. If any textures on the back wall are observed, the movable area expands to the movable area-2 and the path to the destination might be found.
Figure 1. Movable areas for the robot.
Figure 2. Panther-VZ on a mobile robot.
In this paper, a 3-D scene description is constructed by attentive observation to find a path to a given destination. From low-resolution stereo images, features are extracted and a scene description is constructed. Then, a path is searched for in the description. If a safe path is found, the robot goes to the destination. If no reliable paths are found, impassable image regions and unknown regions are determined and an unknown region on the most promising path is examined in a stereo pair of zoomed-in images. From the zoomed-in images, new features are extracted. The new features are integrated with the original ones. A new description is similarly constructed from the integrated features and a path is again searched for. This procedure is repeated until a path is found or no more ambiguous regions are left. We use an autonomous mobile robot equipped with an Active Vision System (see Fig. 2), named Panther-VZ [8]. The camera system has four degrees of freedom: pan, tilt and vergence for the two cameras. A motorized lens is attached to each camera. Zoom, focus and aperture of the lens are controllable.

2. Overview of the method
The procedure of the method to find a path to a given destination is as follows:
1. Take a pair of low-resolution stereo images and compute the range data.
2. Construct a scene description from the data (described in Section 3.1).
3. If a safe path is found from the description, the robot goes to the destination and the procedure ends successfully. Otherwise, extract impassable regions and undefined regions (described in Section 3.2).
4. If undefined regions are found, select a region to observe attentively. Zoom in on the region and take a pair of high-resolution stereo images of it (described in Section 4.1). Integrate the new range data with the original data (described in Section 4.2). Go to Step (2).
5. Otherwise, if there are no next viewpoints, the procedure fails; if there are, the robot goes to the nearest viewpoint. Go to Step (1).

A minimal code sketch of this loop is given below.
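The following Python skeleton restates the procedure; every helper (take_stereo_pair, build_description, and so on) is a hypothetical placeholder for the components described in Sections 3 and 4, so this is an illustration of the control flow rather than the authors' implementation.

    # Hypothetical skeleton of the attentive-observation loop (Section 2).
    # All helper functions are placeholders for the paper's components.

    def find_path_by_attentive_observation(robot, destination):
        while True:
            features = robot.take_stereo_pair(resolution="low")    # Step 1
            while True:
                description = build_description(features)          # Step 2 (Sec. 3.1)
                path = search_safe_path(description, destination)
                if path is not None:                               # Step 3
                    robot.follow(path)
                    return True
                regions = undefined_regions(description)           # Sec. 3.2
                if not regions:
                    break                                          # go to Step 5
                region = closest_to_promising_path(regions, destination)
                new = robot.take_zoomed_stereo_pair(region)        # Step 4 (Sec. 4.1)
                features = integrate(features, new)                # Sec. 4.2
            viewpoint = next_viewpoint(robot)                      # Step 5
            if viewpoint is None:
                return False
            robot.go_to(viewpoint)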
3. Scene Interpretation
3.1. 3-D Scene Description with Constrained Delaunay Triangulation
The 3-D positions of feature points (edges) are obtained by our feature-based stereo method [9]. Straight line segments are extracted by merging connected edges. The scene description is constructed by interpolating edges and segments with triangles. Delaunay triangulation is used to generate the triangles; it generates triangles such that nearer pairs of edges are preferentially linked. In this description, a generated triangle may not correspond to a real surface. We verify later whether or not the triangles are real surfaces.
3.1.1. Delaunay Triangulation
The Delaunay triangulation is calculated from a Voronoi diagram [10]. The Voronoi diagram and the Delaunay triangulation on a 2-D plane are defined as follows (see Fig. 3). For a given set of points (called generators), M_i, on a plane, the corresponding Voronoi diagram consists of polygons, each of which includes a generator such that the polygon region is closer to its generator than to any other generator. The Delaunay triangulation draws line segments between adjacent generators in the Voronoi diagram. The Voronoi diagram and the Delaunay triangulation in a 3-D space are represented by polyhedra.
""/
"
M3 - ~ ~
M 4 ~ ~ . ~ i
Voronoi diagram
M2
....
it"............
~ M5 Delaunay triangulation
Figure 3. Voronoi diagram and Delaunay triangulation.
3.1.2. 3-D Scene Description with Constrained Delaunay Triangulation
In indoor scenes, straight line segments are useful for 3-D scene description. Although the ordinary Delaunay triangulation is computed only with the generators, it is desirable to preserve straight line segments in the triangulation. The constrained Delaunay triangulation keeps the line segments as part of the edges of the final triangulation [11]. Floriani et al. [12] proposed to apply the 2-D constrained Delaunay triangulation to the 3-D description. First, the 2-D constrained Delaunay triangulation is generated from segments in an image. Then, the 2-D triangulation is backprojected into 3-D space using the 3-D positions of the segments. We apply the method to segments and points. Given the edges obtained by the stereo method, straight line segments are extracted. The 2-D constrained Delaunay triangulation is constructed from both the segments and the remaining isolated points. Finally, the 2-D triangulation is backprojected into 3-D space. Fig. 4(a) shows an example scene and the extracted segments (the white arrow indicates the destination). Fig. 4(b) shows the 3-D scene description with the constrained Delaunay triangulation.
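As a rough illustration of the triangulate-in-2-D-then-backproject idea, the sketch below uses SciPy's unconstrained Delaunay triangulation; honoring the segment constraints, as the paper requires, would need a constrained variant, which we omit here. All point values are made-up examples.

    import numpy as np
    from scipy.spatial import Delaunay

    # 2-D image coordinates of stereo features, with their recovered 3-D points.
    pts_2d = np.array([[10, 12], [80, 15], [45, 60], [90, 70], [20, 85]], float)
    pts_3d = np.array([[0.1, 0.2, 2.0], [1.2, 0.1, 2.2], [0.6, 0.9, 3.1],
                       [1.4, 1.0, 3.0], [0.2, 1.3, 2.8]])

    tri = Delaunay(pts_2d)            # triangulate in the image plane
    # Backproject: each 2-D triangle becomes a 3-D triangle via point indices.
    triangles_3d = pts_3d[tri.simplices]
    print(tri.simplices)              # vertex indices of the triangles
    print(triangles_3d.shape)         # (n_triangles, 3 vertices, xyz)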
Figure 4. Scene description.
3.2. Interpretation of Delaunay Triangulation
The safe path to a given destination is calculated using the scene description. If the destination is in front of the triangles, the path is the straight line from the current position to the destination. Otherwise, the path is undefined, because nothing is known about the space behind the triangles. For example, in Fig. 4(b), the robot can get no reliable paths, because the destination is behind the triangles. Some triangles of the Delaunay triangulation, however, are generated on a plane which is not an object surface. The robot can go into the space through those triangles. We search for such regions. Triangles of the Delaunay triangulation are classified into the following two categories:
• impassable: The robot cannot pass the triangle region.
• undefined: The passability cannot be decided.
The procedure is as follows.
1. Triangles which are higher than the height of the robot are eliminated. Triangles which are farther than the destination are also eliminated.
2. Triangles whose width is narrower than the robot are classified as impassable, because the robot cannot pass the region. The other triangles are classified as undefined. Fig. 5 shows examples of the undefined triangles.
3. The undefined triangles are merged into a polygon if they lie on a plane. For every pair of neighboring triangles (or polygons), a plane is fitted by the least-squares method. The pair which has the smallest squared error is selected. If the value is less than a threshold, the pair is merged. The merged pair is added to the set of undefined triangles (or polygons) and the procedure is repeated until no more pairs are merged. Fig. 6 shows the result of this process.
4. Undefined polygons are examined using the light intensity image to see if they are the surfaces of an object. If the angle between the normal vectors of two adjacent polygons is large, there must be a visible boundary between them. Thus, if there exist edges along a boundary between two polygons, at least one side of the boundary can be regarded as an object surface. (Broken lines in Fig. 6 indicate such boundaries.) If the robot cannot pass through either of the polygons without interfering with the boundary, both polygons are labeled impassable. The remaining polygons are labeled undefined. The image regions (see Fig. 7) which consist of the neighboring undefined polygons are the candidates for attentive observation to verify passability.

4. Attentive Observation
4.1. Determination of Region for Attentive Observation
The image region for attentive observation is selected from the candidates as the one which is closest to the promising path; a small sketch of this selection follows below. The camera direction is set to the center of the region. The zoom value (magnification) is determined so that the zoomed-in image includes the region. It is also necessary that the zoomed-in image contains segments in various directions, because they are used to register the position of the zoomed-in image against the original image; the positional difference of two segments can be defined only in the direction perpendicular to that of the segments. For example, in Fig. 4(a), the destination is the leftmost computer on the desk. The right region shown in Fig. 7 is chosen for attentive observation, because the region is on the path which is the straight line from the current position to the destination. Fig. 8 shows the zoomed-in image including the region for attentive observation and the extracted segments.
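As a toy illustration of the region-selection rule, the following sketch picks the candidate region whose center is closest to the straight path to the destination; the geometry helper and all values are our own assumptions.

    import numpy as np

    def point_segment_distance(p, a, b):
        """Distance from point p to segment ab (all 2-D numpy arrays)."""
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        return float(np.linalg.norm(p - (a + t * ab)))

    def select_attentive_region(candidates, start, goal):
        """Pick the candidate region centre closest to the straight path
        from the current position to the destination (Section 4.1)."""
        return min(candidates,
                   key=lambda c: point_segment_distance(np.asarray(c, float),
                                                        start, goal))

    start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
    regions = [(4.0, 3.0), (6.0, 0.5), (9.0, -4.0)]
    print(select_attentive_region(regions, start, goal))   # -> (6.0, 0.5)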
Figure 5. Undefined triangles.
Figure 6. Result of merging. The broken lines indicate object boundaries.
4.2. Integration of Observed Data
This section describes how to integrate the base data and the zoom data. The base data are defined as the 3-D positions of the segments and points extracted from the initial images. The zoom data are defined as the 3-D positions extracted from the zoomed-in images and transformed into the coordinate system of the base data. The zoom data include errors from the pan and tilt angles of the camera platform. We correct the errors using the positional differences in the image plane between the corresponding segments in both data sets. The following procedure is repeated until the sum of the differences is less than a threshold. First, the matching pairs of segments are determined. Second, the difference vector (described in Section 4.2.2) in the image plane between the two data sets is computed. Finally, the zoom data are modified with the difference vector. After the final difference vector is obtained, the segments and points are integrated into the base data. We assume that the zoom value is always correctly controllable.
4.2.1. Determination of Matching of Segments
We define the following evaluation values to be used for matching:
• The average contrast is the average of the contrast values of the pixels which compose a segment.
• The average direction is similarly defined.
• The non-overlap-ratio is the ratio of the non-overlapping length between the two segments to the length of the shorter segment. It is given as follows:

    N_over = (l_s − l_o) / l_s        (1)

where l_s is the length of the shorter segment and l_o is the overlap length, as shown in Fig. 9.
Figure 7. The neighboring undefined polygons are the candidates for attentive observation.
Figure 8. Zoomed-in image.
• The mean-squared-error is defined as follows:

    M_squa = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} {e(t)}² dt,        (2)
where e(t) is the distance from the point at t on the shorter segment to the longer segment, and t_1 and t_2 are the positions of the endpoints of the overlapping part along the longer segment, as shown in Fig. 10. For a segment in the base data, the segment in the zoom data is selected which minimizes the following expression:
    w_1 Cn_ij + w_2 Dr_ij + w_3 Nover_ij + w_4 Msqua_ij,        (3)
where Cn_ij is the difference of the average contrast and Dr_ij is the difference of the average direction between the i-th segment of the base data and the j-th segment of the zoom data. If the minimum value is smaller than a threshold, the selected segment in the zoom data corresponds to the segment in the base data.
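A minimal sketch of the Eq. (3) scoring follows; the weights and attribute values are placeholders, and the computation of the four attributes themselves is elided.

    def match_cost(cn, dr, nover, msqua, w=(1.0, 2.0, 5.0, 0.5)):
        """Eq. (3): weighted sum of pairwise attributes for base segment i
        versus zoom segment j. The weights are illustrative assumptions."""
        return w[0] * cn + w[1] * dr + w[2] * nover + w[3] * msqua

    # Precomputed (Cn, Dr, Nover, Msqua) for two candidate pairs (made up):
    candidates = {"j1": (1.5, 0.02, 0.20, 1.5), "j2": (6.0, 0.30, 0.05, 4.0)}
    best = min(candidates, key=lambda j: match_cost(*candidates[j]))
    print(best)   # the zoom segment matched to base segment i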
4.2.2. Computation of Difference Vector
With a set of matched pairs of segments, we determine how much the zoom data should be translated on the image plane. The optimal amount of translation is called the difference vector Z. The difference vector is determined so that the sum of the mean-squared-errors weighted by the overlapping lengths is minimized, that is, Z is given by

    Z = arg min_Z Σ_k err_k(Z) · l_{o,k},        (4)

where err_k is the mean-squared-error of the k-th matched pair and l_{o,k} is the overlapping length of the pair.
Figure 9. Non-overlap-ratio.
Figure 10. Mean-squared-error.
Let us consider Fig. 11, where the solid lines are in the base data and the broken lines are in the zoom data. The difference vector is determined to minimize the sum of the mean-squared-errors in the shaded area. The difference vector is computed by the conjugate gradient method.
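The weighted minimization of Eq. (4) can be sketched numerically as follows; we substitute a generic quasi-Newton optimizer from SciPy for the paper's gradient-based solver, and all segment data are invented for the example.

    import numpy as np
    from scipy.optimize import minimize

    # Each segment: (point p, unit direction d, overlap length l_o).
    # The zoom data are translated by z; err is the squared perpendicular
    # offset between matched base/zoom segments (a stand-in for Eq. (2)).
    base = [(np.array([10.0, 5.0]), np.array([1.0, 0.0]), 8.0),
            (np.array([4.0, 2.0]), np.array([0.0, 1.0]), 5.0)]
    zoom = [(np.array([10.5, 6.2]), np.array([1.0, 0.0]), 8.0),
            (np.array([5.1, 2.4]), np.array([0.0, 1.0]), 5.0)]

    def cost(z):
        total = 0.0
        for (pb, db, lo), (pz, dz, _) in zip(base, zoom):
            n = np.array([-db[1], db[0]])          # normal of the base segment
            err = float(n @ (pz + z - pb)) ** 2    # perpendicular offset squared
            total += err * lo                      # weight by overlap, Eq. (4)
        return total

    z_opt = minimize(cost, x0=np.zeros(2)).x
    print(z_opt)   # optimal difference vector Z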
Figure 11. Computation of the difference vector.
Figure 12. Scene description with the integrated data.
5. Experimental Results
The initial input image of a scene, shown in Fig. 4(a), is taken from the start point and the scene description is constructed with the constrained Delaunay triangulation. Fig. 4(b) shows the result of the triangulation.
The triangles are merged into polygons and the polygons are classified into two categories: impassable and undefined. Fig. 6 shows the undefined polygons. The regions which include the undefined polygons are the candidates for attentive observation. Fig. 7 shows
the attentive region candidates. If the leftmost computer in Fig. 4(a) is the destination, the path is set to the straight-line course from the current position to the destination. Since the right region in Fig. 7 is on the path to the destination, the region is selected for attentive observation. The zoomed-in image including the region is shown in Fig. 8. The constrained Delaunay triangulation with the integrated data is shown in Fig. 12. Fig. 13 shows the top views of the descriptions. With some of the newly obtained features, the surface composed of the edges of the chair and the edges of the desk is determined to be passable (see Fig. 13(b)). Finally, the path from the current position to the destination is found.
Figure 13. Top view of the description.

6. Conclusion
This paper describes a method of constructing a 3-D scene description to find a path to a given destination by attentive observation. The 3-D scene description is constructed with the constrained Delaunay triangulation and a path to a given destination is determined from the description. If the path is not found, the unknown image region which is on the most promising path is observed attentively. New data are obtained at a higher resolution. The new data, however, include a positional displacement caused by errors in the pan and tilt angles of the camera platform. The two images are registered by matching segments in both images and by computing the displacement. The new data are integrated with the original description, and a path is searched for in the new description.
Experiments have been performed for an indoor scene which includes a chair, desks, computers, and so on. The results have proved that the proposed method is valid for cluttered scenes.
Acknowledgment
This research is partially supported by Mr. Fredric Solomon, by the Science and Technology Agency of Japan as one of the fundamental technical research projects in its Sensor Fusion Program, and by a Grant-in-Aid for Scientific Research under Grant No. 06555071 from the Ministry of Education, Science, and Culture, Japanese Government. The authors would like to thank those concerned for their cooperation and support.
REFERENCES
1. "Vision-Motion Planning with Uncertainty", J. Miura and Y. Shirai, Proc. International Conference on Robotics and Automation, pp. 1772-1777, 1992.
2. "Omni-Directional Stereo for Making Global Map", Hiroshi Ishiguro, Masashi Yamamoto and Saburo Tsuji, International Conference on Computer Vision, pp. 540-547, 1990.
3. "Map Building for a Mobile Robot Equipped with a 2D Laser Rangefinder", Javier Gonzalez, Anibal Ollero and Antonio Reina, Proc. IEEE International Conference on Robotics and Automation, Vol. 3, pp. 1904-1909, 1994.
4. "View Planning for Mobile Robots", S. Xie, Proc. IEEE Int. Conf. on Robotics and Automation, pp. 748-754, 1990.
5. "Occlusions as a Guide for Planning the Next View", Jasna Maver and Ruzena Bajcsy, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 5, May 1993.
6. "Multiresolution Image Acquisition and Surface Reconstruction", Subhodev Das and Narendra Ahuja, IEEE International Conference on Computer Vision, pp. 485-488, 1990.
7. "Visual Exploration of Free Space", Andrew Blake, Andrew Zisserman and Roberto Cipolla, Active Vision (The MIT Press), pp. 175-188.
8. "Tracking a Moving Object by an Active Vision System: PANTHER-VZ", J. Miura, H. Kawarabayashi, M. Watanabe, T. Tanaka, M. Asada, Y. Shirai, Proc. International Symposium on Robotics, Mechatronics and Manufacturing Systems '92, pp. 957-962, 1992.
9. "A Multistage Stereo Method Giving Priority to Reliable Matching", O. Nakayama, A. Yamaguchi, Y. Shirai, M. Asada, Proc. International Conference on Robotics and Automation, pp. 1753-1758, 1992.
10. "Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure", Franz Aurenhammer, ACM Computing Surveys, Vol. 23, No. 3, pp. 345-405, 1991.
11. "Representing Stereo Data with the Delaunay Triangulation", O.D. Faugeras, E. Le Bras-Mehlman, J.D. Boissonnat, Artificial Intelligence, Vol. 44, pp. 41-87, 1990.
12. "Constrained Delaunay Triangulation for Multiresolution Surface Description", L. De Floriani and E. Puppo, Proc. International Conference on Pattern Recognition '88, 1988.
Intelligent Robots and Systems, V. Graefe (Editor) © 1995 Elsevier Science B.V. All rights reserved.
ACTIVE VISION INSPIRED BY MAMMALIAN FIXATION MECHANISM

T. Yagi, N. Asano, S. Makita, and Y. Uchikawa
Bio-Electronics Engineering Laboratory, Dept. of Information Electronics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-01, Japan*

Focusing on the mammalian fixation mechanism, we propose a localization technique for object detection in a scene, and implement it in an active vision system. In our technique, large and small camera movements are coordinated to detect an object precisely at the optical axis of the camera. Each type of movement is performed via two separate localization methods: approximate localization and adjustment localization. In the former, a preliminary fixation point is detected using the multi-resolution retina. After the optical axis of the camera is pointed in the direction of this point, the latter localization method determines a precise fixation point by means of visual feedback control, which minimizes the angle between the object's center and the optical axis of the camera. Experimental results show that the active vision system using our proposed technique succeeded in fixating objects one after another in a scene, even when several objects are placed within it. According to these results, we have confirmed that bottom-up processing using primitive visual features is an effective technique for localizing an object.

1. INTRODUCTION
Active vision is expected to solve ill-posed problems in the understanding of 3-D scenes from 2-D images, and to be useful for more sophisticated visual information processing in industry [1][2][3]. One of the important components of active vision is "localization," which is the detection of an object and the determination of its position. Conventional studies have used various techniques to localize an object in a scene; among them, segmentation by means of clustering several extracted features (such as intensity, color, texture, etc.) in a feature space, so-called "feature-clustering" [4][9], is widely used. The problem inherent in this technique, however, is the long processing time required, which increases in proportion to the number of extracted features. Furthermore, it is difficult to apply this technique in scenes where each object has features that vary from the others only slightly. The other segmentation techniques, "region-growing" [15] or the "split and merge procedure" [5], perform better in some cases; however, they are greatly affected by noise and sharp edges, as well as by lines which form disconnected boundaries. As a result, it is difficult to formulate simple criteria to decide whether or not they form true region boundaries. Of course, the "spatial-filtering" procedure [6] can be applied to find region boundaries; however, it also is sensitive to noise, and requires several other refining processes like noise reduction or smoothing filters in order to obtain clear and continuous edges. Consequently, conventional localization techniques have some problems in terms of processing time and noise.

* E-mail: [email protected]
In order to effect better and easier localization in active vision, we have for years focused on biological visual systems, and have studied the mammalian fixation mechanism from the computational point of view [12][13][14]. Fixation is the process of looking at a single point using eye movements [8][11]. Because spatial resolution in vision is highest near the optical axis and decreases towards the periphery, only an object placed at the center of the retina is identifiable. Therefore, fixation is one of the indispensable functions of mammalian visual systems. Fixation is usually completed with a fast eye movement called a saccadic eye movement; however, sometimes overshooting or undershooting of the object occurs. Such detection errors are compensated with a small saccade (catch-up saccade) and/or a slow eye movement known as smooth pursuit eye movement. In addition, fixation is generally considered to be driven by at least two different visual systems: the primary visual system (i.e. retina - lateral geniculate body - striate area), and the secondary visual system (i.e. retina - superior colliculus - pulvinar). Because identification of an object is carried out in the occipital lobe, fixation through the primary visual system seems to be related to top-down (or knowledge-based) visual information processing. On the other hand, some experimental results have shown that localization of an object can be achieved using only the secondary visual system, such as in lesion-induced malfunction of the primary visual system [7][10]. Therefore, fixation through the secondary visual system seems to be related to bottom-up visual processing. Moreover, it has been theorized that the secondary visual system in fact specializes in finding where objects are. Consequently, the understanding of this system is a key to accurate emulation of localization in active vision. In our previous psycho-physical and computational studies, we have hypothesized the importance of retinal signal processing in the secondary visual system, and revealed that certain fixation is correctly executed only if the mathematical model of the retina has linearly increasing dendritic field size in the retinal ganglion cells [13][14]. Based on this hypothesis, in this paper we propose a localization technique which is similar to the mammalian fixation mechanism. Implementing this technique in an active vision system, we demonstrate that the active vision system is able to fixate objects one after another in a scene where several objects exist. In the last part of this paper, we discuss the advantages of our proposed technique and our future research work using this system.

2. BIOLOGICALLY-INSPIRED LOCALIZATION
In order to fixate an object at the optical axis of a camera, the object must first be detected in a scene, and then its location determined, prior to camera movements. The following two
Figure 1. The proposed localization method.
different types of localization, approximate localization and adjustment localization, work interactively for this purpose (Fig. 1). Whenever the camera is moved, a new image is sent to a host computer, and then one of the two types of localization is executed. The former directs large camera movements and determines a preliminary fixation point, which is approximately identical to the object's position. When overshooting or undershooting occurs, small camera movements are effected for compensation by the latter localization method until the object is foveated at the optical axis of the camera. Such coordination of large and small camera movements, which correspond to saccadic eye movements and smooth pursuit eye movements, respectively, is expected to perform precise localization.
2.1. Approximate localization
Our previous studies have revealed that retinal signal processing plays a great role in the secondary visual system [12]. Moreover, fixation points seem to be computed primarily at the retina [13][14]. Therefore, we use a multi-resolution retina, the "Eye Field" (Fig. 2), in approximate localization. The Eye Field has high resolution near the optical axis and lower resolution in the periphery. It is a simplified version of the mathematical retina model, and has non-overlapping rectangular-shaped receptive fields. (Although the mathematical model could be applied, we have modified the model in order to reduce computational load.)
Figure 2. Various types of Eye Fields. Fig.3 shows how a camera-taken image is converted into the image processed by the Eye Field. Every camera-taken image is split into smaller sizes near the optical axis than in the periphery (Fig.3a). In each split area, the intensity values of each pixel are added. The total intensity is divided by the split area to obtain the mean intensity (Fig.3b). Fig.3c indicates the mean intensity in each split area. The Eye Field is expected to intensify each object differently, corresponding to its location. In addition, it is supposed to reduce the effect of noise, especially at the peripheral region. From among several peripheral split areas, the one which has the maximum intensity value is selected, and the center of that split area is defined as a preliminary fixation point. The camera is then pointed in the direction of this
Figure 3. Approximate localization.
selected position (Fig. 3d). These processes are repeated until the image of the object is located at the optical axis of the camera. As shown in Fig. 2, several types of Eye Fields could be proposed. Because the dendritic field size of the retinal ganglion cells greatly affects fixation [13][14], the localization ability of the Eye Field was hypothesized to depend on its splitting format. Therefore, we evaluated each type of Eye Field on a computer in terms of the following: suppose several objects are placed at random in a scene. When they are the same in size, intensity, shape, and color, it is known that human beings tend to fixate the object which is located closer to the fovea. From this fact, the nearness of the object, otherwise termed the proximity of the visual stimulus to the fovea, is one of the factors which induce fixation. Focusing on this tendency, we simulated the localization ability of each Eye Field in various scenes. For quantitative analysis, we employed the "Proximity Index," which was defined in our previous studies [12]. The Proximity Index refers to the normalized traveling distance of the optical axis of the camera in fixation. Assume the following distances: L from an initial fixation point to the detected fixation point, L_max to the farthest object, and L_min to the closest. Then, the Proximity Index p is given as follows:
    p = (L_max − L) / (L_max − L_min)        (1)
As shown in this equation, the index gives a value of 1.0 if the detected object is located close to the initial fixation point, and a value of 0.0 if it is farthest away; consequently, we can determine how far the detected object is located from an initial fixation point. In addition to the Proximity Index analysis, we also checked whether each localization was executed without overshooting or undershooting. In this analysis, we simply observed whether or not an object was detected at the central region of the Eye Field. Fig. 4 is the histogram where the frequencies of the Proximity Index in steps of 0.1 are graphed. It shows a very high frequency in the category 0.9-1.0 for every type of Eye Field; therefore, it can be said that the object located closer to the center of the Eye Field tended to be detected. From this analysis, we confirmed that the Type-1 Eye Field resulted in slightly better localization than the others in terms of the proximity of the visual stimulus to the fovea. Also shown below, Table 1 indicates the percentage of localizations without overshooting and undershooting. Note that 67% of localizations were performed without overshooting and undershooting in the case of Type-1, the highest score among the four Eye Fields. As a result of this superior performance, we have used the Type-1 Eye Field in the following experiments.
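Eq. (1) is a one-liner in code; this tiny sketch, with made-up distances, just illustrates the normalization.

    def proximity_index(l, l_max, l_min):
        """Proximity Index of Eq. (1): 1.0 for the closest possible target,
        0.0 for the farthest. Inputs are travel distances of the optical axis."""
        return (l_max - l) / (l_max - l_min)

    print(proximity_index(l=0.4, l_max=2.0, l_min=0.3))   # ~0.94, a near target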
Figure 4. Proximity Index analysis.
Table 1
The percentage of localizations without overshooting or undershooting.

Eye Field    Percentage
Type-1       67%
Type-2       62%
Type-3       35%
Type-4       40%
2.2. Adjustment localization
After approximate localization is carried out, adjustment localization takes place. Because approximate localization may result in overshooting or undershooting, it is necessary to compensate for such errors. In adjustment localization, only the central region of the Eye Field (the smallest split areas) is used for processing (Fig. 5a). After approximate localization has occurred, the smallest central region is divided into horizontal and vertical halves (Fig. 5b). The mean intensity values of the split areas within regions A and B, or C and D, are then added. In order to determine the traveling direction of the camera, we compare the total value in each region. If the total value in region A is greater than that in region B, the camera is pointed in an upward direction; if not, the camera is moved downward. Likewise, comparison of the total values in regions C and D determines the horizontal traveling direction, left or right, respectively. The traveling distance of the camera is computed using the sums of the intensity values in each pair of regions. Fig. 5c shows an example of this procedure. The maximum traveling distance is defined as the distance from the center of the fovea to its edge (it corresponds to 4 blocks in the figure). As shown, region A totals 14.50 and region B 4.25; hence the sum is 18.75 and the difference 10.25. The ratio of the difference to the sum is therefore 0.547. For the horizontal case, the sum, difference and ratio are 18.75, 14.75, and 0.787, respectively. The traveling distance is defined as the multiplication of this difference-sum ratio by the maximum traveling distance. Therefore, in the case of Fig. 5, the vertical traveling distance is computed as 0.547 × 4 (blocks), and the horizontal as 0.787 × 4 (blocks). These processes are repeated until the total values in regions A and B, as well as C and D, reach equality. When they become equal, the object is expected to be localized at the center of the camera-taken image.
Figure 5. Adjustment localization.
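The following sketch is our reading of this procedure in Python: the intensities of the central split areas come in as a 2-D array, and the signed travel in blocks is returned. The sign convention and the array layout are assumptions made for illustration only.

    import numpy as np

    def adjustment_step(fovea, max_travel_blocks=4.0):
        """One adjustment-localization step (Section 2.2).

        fovea: 2-D array of mean intensities of the central split areas.
        Returns (dy, dx): signed travel in blocks; negative dy means upward.
        """
        h, w = fovea.shape
        top, bottom = fovea[: h // 2], fovea[h // 2 :]
        left, right = fovea[:, : w // 2], fovea[:, w // 2 :]

        def ratio(a, b):                  # difference / sum of region totals
            s, d = a.sum() + b.sum(), a.sum() - b.sum()
            return d / s if s else 0.0

        dy = -ratio(top, bottom) * max_travel_blocks   # toward brighter half
        dx = -ratio(left, right) * max_travel_blocks
        return dy, dx

    print(adjustment_step(np.array([[5.0, 1.0], [9.5, 3.25]])))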
2.3. Subsequent object detection
Based on the position of the object detected as above, the camera is aimed so that the object is placed at the center of the Eye Field after the completion of localization. In the next phase, a new target is selected from the periphery, again using approximate localization; however, the next target might be one which has previously been localized. In order to avoid such repetitive localization, we set the following rule: whenever an object is detected, its position is memorized. During approximate localization in the next phase, the position of the newly detected split area's center is compared with the stored positions. If it coincides with a previously memorized position within a certain range, it is excluded as a candidate for the next fixation point. Accordingly, the area which has the second highest intensity is then selected.
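This rule is simple enough to state directly in code; the class name, exclusion range, and coordinates below are hypothetical examples of ours.

    import math

    # Hypothetical fixation memory implementing the rule of Section 2.3:
    # a newly detected split-area centre is skipped if it lies within a
    # given range of a previously memorised fixation point.

    class FixationMemory:
        def __init__(self, exclusion_range=10.0):
            self.points = []
            self.exclusion_range = exclusion_range

        def is_new(self, p):
            return all(math.dist(p, q) > self.exclusion_range
                       for q in self.points)

        def memorize(self, p):
            self.points.append(p)

    mem = FixationMemory(exclusion_range=10.0)
    mem.memorize((120.0, 64.0))
    print(mem.is_new((123.0, 66.0)))   # False: too close to a stored fixation
    print(mem.is_new((40.0, 200.0)))   # True: a genuinely new target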
3. EXPERIMENT
3.1. Experimental settings
Our proposed localization technique was evaluated in the experimental settings described below. We applied the technique to an active vision system called "Eye Mover" (Fig. 6). In Eye Mover, a small CCD camera (EM-102II, produced by ELMO CO.) was incorporated into a universal joint linked to a motion driver via a sliding rod. Because the camera is able to rotate around the rotational center of the joint, it is possible to point the optical axis of the camera in any direction. An X-Y plotter (GRAPHTEC WX1000, produced by GRAPHTEC CO.) was used as the motion driver of the camera. Camera-taken images were transferred to a host computer (EPSON PC-386V, produced by EPSON, with an Intel 80386 20 MHz CPU and 80387 co-processor) through an image digitizer (PHOTRON FDM 4-256, produced by PHOTRON, with 256 × 256 pixel resolution and 1/60 second sampling time). In Eye Mover, the rotational center of the camera and the focusing point are identical and fixed at one position. In addition, the angle between the optical axis of the camera and the vertical axis of Eye Mover is easily obtained from the camera-positioning data in the camera-motion driver. Therefore, the object's position in 3-D space can be geometrically computed using this angle and the height of the rotational center above the floor. The maximum traveling speed of our system was 180 degrees/second, and its acuity was within 0.4 degrees. Unlike hand-eye systems, our Eye Mover system is relatively small. In addition, the simple mechanism enables the camera to execute movements within a wide range of motion and to fixate an object quickly.
Figure 6. Active Vision System, "Eye Mover."
3.2. Results
The experiment was executed in a scene which contained several randomly placed objects on a plain background. Fig. 7 shows a typical example of approximate localization (Fig. 7a and 7b) and adjustment localization (Fig. 7c and 7d). In the figure, camera-taken images are shown on the left, and processed images on the right. The camera-taken image at the initial fixation point is depicted in Fig. 7a-left. Fig. 7a-right shows the processed image using the Eye Field. Using the approximate localization process, the most intense split area, the middle upper-right one in Fig. 7a-right, was selected. The center of the area was then defined as a preliminary fixation point, and the optical axis of the camera was pointed in that direction. Fig. 7b-left shows the camera-taken image after the camera movement. In this case, the object was correctly foveated using only approximate localization, and adjustment localization was skipped in order to select the next object for localization. Overshooting or undershooting, however, happened sometimes, as can be seen in Fig. 7c. Because the large camera movement resulted in pinpointing only the edge of the ball (Fig. 7c-left), adjustment localization was performed. Using the central region of the processed image (Fig. 7c-right), a more accurate position was then computed. After the second, small camera movement, the object was fixated precisely at the optical axis of the camera (Fig. 7d-left).
Figure 7. Experimental results. (a)(b) Approximate localization. (c)(d) Adjustment localization.
Fig. 8 indicates a locus of camera movements. Open triangles and open circles denote fixation points in approximate localization and adjustment localization, respectively. Starting from the center of the scene, Eye Mover pointed towards each object and foveated them one after another within a few small camera movements. In addition, it can be observed from this figure that repetitive detection was avoided. At the second-to-last fixation point (near the bottom of a can), the system detected the can because it was located close to the camera's optical axis. The detected position, however, coincided within the prescribed range with the previously detected position of the can, so the can was consequently excluded as the next fixation target. Finally, the last object (a wooden piece) was localized.
Figure 8. Locus of camera movements.
4. DISCUSSION AND CONCLUSION
The experimental results reveal that an active vision system implementing our proposed localization technique is able to achieve precise localization when simple scenes are displayed. The optical axis of the camera is directed close to each object by the first camera movement during approximate localization; each object is then correctly localized within a few small additional movements in the adjustment localization phase. In approximate localization, camera images are split into areas of various sizes, and the intensity values of the pixels are successively added in order to compute the mean intensity in each split area. On this point, approximate localization is similar to the conventional "split and merge" procedure. One of the differences between the two methods, however, is the splitting format. In the Eye Field, the split areas are inhomogeneous. This makes the resolution of our technique high at the center and lower at the periphery, while the resolution is uniform in the "split and merge" procedure. Therefore, the proposed technique enables every object in a scene to be weighted differently according to the distance from the center of the scene to its location. Moreover, approximate localization is more applicable to a noisy image, because the effect of noise is reduced by means of blurring the peripheral region of the image. The second difference is the time required by the splitting procedure. Because the splitting format of the Eye Field is determined before processing, the time spent in the approximate localization phase is always constant, while in the "split and merge" procedure it is variable. This constant-time property allows us to estimate the processing time of approximate localization in advance. Consequently, the simplified computation in approximate localization overcomes the disadvantages of conventional techniques in terms of noise and processing time. Our proposed technique can be seen as bottom-up processing. Based on a low-level visual feature such as intensity, an object is localized without the assistance of top-down processing. Moreover, this technique does not require any noise-reduction or segmentation processing, thus reducing processing time. Of course, some criticism can be leveled against such simple processing. Because low-level visual features mainly affect the determination of a fixation point, the localizing ability of the system is still at a primitive
level. The experimental results, however, show that our technique achieves adequate localization in a scene where several objects are placed. As we mentioned at the beginning of this paper, localization was realized without object identification. Hence it is suggested that such primitive processing is a key element of localization, and a key to understanding "attention," which involves top-down processing. Based on this assumption, we have undertaken psycho-physical experiments on human eye movements, and are formulating a mathematical model of fixation and attention in order to reveal the relationship between bottom-up and top-down processing. Results will be published soon. In addition, we are attempting to apply our proposed active vision system to robot-manipulation tasks. In this study, a robot hand is manipulated by means of the positioning data computed via our technique. After the position of an object is determined, the hand is moved to grasp the object. Preliminary analysis shows that the accuracy of the detected position is precise enough to manipulate the hand. Gradually incorporating top-down processing into our active vision system, we will seek to further refine our technique in future studies.

REFERENCES
[1] J. Aloimonos, I. Weiss, and A. Bandyopadhyay, "Active vision," Proc. 1st Int. Conf. Computer Vision, pp. 35-54, 1989.
[2] R. Bajcsy, "Active perception vs. passive perception," Proc. IEEE Workshop on Computer Vision, 1985.
[3] D.H. Ballard, "Reference Frames for Animate Vision," Proc. 11th Int. Joint Conf. Artificial Intelligence, pp. 1635-1641, 1989.
[4] G.B. Coleman and H.C. Andrews, "Image segmentation by clustering," Proc. IEEE, 67, 5, pp. 773-785, 1979.
[5] S.L. Horowitz and T. Pavlidis, "Picture segmentation by a directed split-and-merge procedure," Proc. 2nd Int. Joint Conf. on Pattern Recognition, pp. 424-433, 1974.
[6] D. Marr, Vision, W.H. Freeman and Company, New York, 1982.
[7] M. Mishkin, L.G. Ungerleider, and K.A. Macko, "Object vision and spatial vision: Two cortical pathways," Trends in Neurosciences, 6, pp. 414-417, 1983.
[8] D. Noton and L.W. Stark, "Scanpaths in eye movements during pattern perception," Science, 171, pp. 308-311, 1971.
[9] R. Ohlander, et al., "Picture segmentation using a recursive region splitting method," CGIP, 8, 3, pp. 313-333, 1978.
[10] G.E. Schneider, "Two visual systems," Science, Vol. 163, pp. 895-902, 1969.
[11] A.L. Yarbus, Eye Movements and Vision, Plenum Press, 1967.
[12] T. Yagi, K. Gouhara, and Y. Uchikawa, "An algorithm of eye movement in selective fixation," Proc. of IEEE International Conference on Neural Networks, Vol. 2, pp. 761-765, 1993.
[13] T. Yagi, K. Gouhara, and Y. Uchikawa, "The influence of the inhomogeneous dendritic field size of the retinal ganglion cells on the fixation," Proc. of the European Conference on Artificial Neural Networks, Vol. 1, pp. 26-29, 1994.
[14] T. Yagi, K. Gouhara, and Y. Uchikawa, "A computational study on fixation and the effect of an inhomogeneous morphological feature in the retina," Proc. of IEEE International Conference on Systems, Man and Cybernetics, Vol. 1, pp. 777-781, 1994.
[15] S.W. Zucker, "Region growing: childhood and adolescence," Comput. Graph. Image Process., 5, pp. 382-399, 1976.
This Page Intentionally Left Blank
Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.
Realtime Range Imaging Using An Adjustment-free
49
PhotoVLSI
Atsushi Yokoyama ~ , Kosuke Sato b , Takayuki Yoshigahara ~ and Seiji Inokuchi ~ ~Production Technology Group, Sony Corporation, 6-7-35 Kitashinagawa Shinigawa-ku Tokyo, 141 Japan bNara Institute of Science and Technology, 8916-5, Takayama, Ikoma, 630-01 Japan CDepartment of Systems Engineering, Osaka University, Toyonaka, Osal~, 560 Japan We report a VLSI sensor which gives a range image of a scene with high robustness and high accuracy at a video frame rate. No adjustment is required. Photodiodes and operational circuits are integrated into each pixel of the VLSI sensor so they function as analog/digital micro-rangefinders. The 2D array of the micro-rangefinders acquires frames of range data in parallel manner as a lightstripe is swept across the scene. Complementary pairs of photodiodes in each cell reliably acquire range data without being disturbed by temperature drift, ambient light, manufacturing irregularities or other noise. A prototype system has been built using a 24 x 24 VLSI array of sensing elements. It gives us 24 x 24 range images at a video-frame rate. 1. I n t r o d u c t i o n Obtaining 3D information of a scene at a high speed and with high accuracy is important in factory automation tasks such as inspection and assembly. Numerous studies have been made of rangefinding techniques[ill2]. The lightstripe-rangefinding method is one of the most common and reliable methods of rangefinding. In this paper we report an adjustment-free range-imaging VLSI sensor which receives a complete range image consisting of lightstripe range data with high robustness at a video frame rate. The speed of a conventional lightstripe rangefinder using a CCD camera does not exceed one image/s because a lightstripe rangefinder needs hundreds of intensity images in order to construct a 2D range image. The conventional CCD camera captures intensity images of the scene on which the lightstripe is projected. The position of the projected lightstripe is extracted from each image and used to calculate the range data. For each position of the lightstripe, only a one-dimensional range image is acquired by triangulation. In order to acquire a complete set of the range data in the camera's field of view, these operations should be iterated as the lightstripe moves across the scene in steps, which requires more than one second. Cell-parallel lightstripe rangefinders have been reported by Araki et al. [3], Kida et al. [4] and Kanade et al. [5][6]. In these rangefinders, the lightstripe continuously sweeps
50 across the scene. During one sweep of the lightstripe, the time at which the image of the lightstripe crosses a cell indicates the range between the rangefinder and the scene along the line-of-sight of each cell. The time is detected by each cell and a complete set of range data in the camera's field of view is acquired during each sweep of the lightstripe. Araki et al. [3] have built a realtime rangefinder with 47 • 47 resolution with discrete phototransistors connected to separate signal processors. This rangefinder holds the timestamp by digital latches. Kida et al. [4] have built a rangefinder with 4 • 4 resolution with discrete photodiodes. This rangefinder holds the time-stamp in the shape of an analog sawtooth voltage signal by a sample-hold circuit. For large arrays, a high bandwidth link between each photosensor and signal processor is required and is provided by a 2D array of micro-rangefinders - each cell includes a photosensor and a signal processor. Kanade et al. [5] [6] constructed a VLSI sensor which consists of an array of cells, each of which contains a photodiode and signal processor. This sensor holds the time-stamp in the shape of a charge in a MOS capacitor with an analog sawtooth signal. One disadvantage of this sensor is that the time at which the image of the lightstripe crosses each cell is determined by comparing the photocurrent of the photodiode in each cell with the external threshold. Precise adjustment of an external threshold is therefore necessary, taking into account the ambient light, the reflectance of the object's surface and the operating temperature. Another disadvantage is that leakage of the holding capacitance makes it difficult to obtain accurate range data. In this paper we report a realtime, adjustment-free range-imaging VLSI sensor which gives a complete frame of robust range-image data at a video-frame rate. We call this new sensor a SRF(Silicon Range Finder) chip. In order to determine the time at which the image of the lightstripe crosses each cell with high robustness, a photo-comparator which consists of a complementary pair of photodiodes and analog/digital circuitry is integrated in each cell of the SRF chip. This has the advantage of allowing reliable 3D data to be obtained without being disturbed by noise. A prototype system using this SRF chip and a CCD chip has been developed. This system provides both a range image (from the SRF chip) and an intensity image (from the CCD chip). 2. C e l l - p a r a l l e l L i g h t s t r i p e R a n g e f i n d e r In this technique, each cell of the sensor array acquires range data while the lightstripe continuously sweeps across the scene. As shown in Figure 1, a lightstripe sweeps across the scene from right to left at a constant angular velocity of the scanning mirror. The image of the lightstripe on the sensor array moves left to right across each cell. Each cell detects the time at which the image of the lightstripe crosses it during one sweep of the stripe. The time-stamp signal is sampled and held, and converted to the orientation of the lightstripe, then the range data is calculated based on the triangulation principle. Figure 2 illustrates the cell-parallel lightstripe rangefinder geometry. The range data Zi,j of the cell Pi,j is given using baseline Bi,j and A, the direction of the line-of-sight/3i,j and the orientation of the lightstripe ai,j,
tan(~i,jtan/~i,j A Zi j = ( Bi,j + ), ' tan(~i,j + tan/3i,j tanc~i,j where/3i,i,
Bi,j, A
are constant.
(1)
51 The range Zc,r of each cell Pc,r on the sensor array is acquired during one sweep of the lightstripe by the detection of the orientation ac,r of the lightstripe. It is obvious that robust detection of the orientation of the lightstripe is essential for accurate range finding.
Figure 1. Cell-parallel Lightstripe Rangefinder
Figure 2. Cell-parallel Lightstripe Rangefinding Geometry
3. Silicon Range Finder The reliability and robustness of the range measurement of each cell depend on accurate determination of the timing. The VLSI sensor designed by Kanade et al. [5][6] determines the time by comparing the photocurrent of a photodiode with an external fixed threshold. For highly reliable and robust rangefinding, however, the time at which the image of the lightstripe crosses each cell must be determined without being affected by the ambient light, the reflectance of the object's surface or the operating temperature.
3.1. A N e w Timing D e t e c t i o n M e t h o d As shown in Figure 3, each cell of the SRF chip consists of an optical comparator capable of distinguishing minor differences in light intensity across the cell. The optical comparator has a complementary pair of adjacent photodiodes which generate photocurrents proportional to their incident light intensity. The photocurrents are fed into a current comparator which determines which photodiode is receiving the brighter light. As the lightstripe sweeps from right to left, the dominant photocurrent moves from right to left. The time at which the image of the lightstripe crosses each cell is determined by the transition of the dominant photocurrent. Even though the lightstripe is blurred due to defocus, the transition of the dominant photocurrent occurs at the exact center of the pair of photodiodes. In addition to this, the transition of the dominant photocurrent is not affected by ambient light, thermal drift or other noise.
52
Figure 3. Optical Comparator
Figure 4. Sensing element circuitry and Clock
3.2. Sensing Element Circuitry As shown in Figure 4, the circuit consists of five main components: photodiodes, current mirrors with gain, inverting voltage buffers, a clocked comparator, and comparator read circuitry. The photodiodes are made by N + doping in a P - substrate. They are surrounded by guard rings to ensure a solid ground connection and isolation fl'om adjacent cells. A complementary pair of photodiodes forms part of the optical comparator which determines which photodiode is receiving the brighter light. The size of the photodiode pair is slightly different to prevent disturbance from weak noises- the right photodiode is 3% bigger than the left one. The current mirror acts as a current amplifier and a buffer between the photodiodes and the inverting voltage buffers. It serves three purposes: first, it amplifies the photocurrent (a ratio of 2:1 is used); second, it isolates the photodiode cathode from the buffer input, which is susceptible to the influence of clock breakthrough during buffer clamping and unclamping; and third, it allows the back bias voltage on the photodiode to be independent of the large capacitance of the photodiodes. The buffer is a simple CMOS inverter with a CMOS switch between the input and output. Closing the CMOS switch clamps the buffer to its highest gain state.
53 A clocked comparator detects very weak differential voltage of the pair of buffers. The comparator is constructed using two back-to-back low drive CMOS inverters with a CMOS switch in parallel to allow initialization. The comparator is initialized during each sensing cycle before the buffer outputs are connected to the comparator inputs. The result of the comparison is converted to a digital value and sent to the output (Yn) via the inverting tri-state buffer selected by the output control signal Xn. 3.3.
Circuit
Operation
The sensing element is operated under three phase signals - r Figure 4. These signals iterate the following four phases:
r
r
-, as shown in
9P h a s e - A The input and output of the buffer are clamped together to place the buffer
in its highest gain state. The comparator holds the previous state. 9P h a s e - B The buffer clamping is released and the buffers are fl'ee to work. The stray capacitance at the buffer input nodes will be charged, via the current mirror, at a rate proportional to the photocurrents of the two photodiodes, and therefore proportional to the incident light intensity. As the charge of the photocurrent slightly changes the input voltage, the output voltage will be changed by the gain of the inverter. If the incident light intensity is different on the photodiodes, the resultant voltage change on each buffer output will be different. This differential voltage is compared by the following latched comparator. Sufficient time is allowed for PhaseB to ensure that enough voltage movement is achieved at the buffer outputs to switch the comparator. 9P h a s e - C The comparator is clamped into a cross-coupled neutral state in preparation for comparison. During this phase, the buffer continues to output. 9P h a s e - D The comparator is unclamped and the outputs fl'om the two buffers are switched to the comparator inputs. At the end of this phase, when the buffer drive is released, the comparator flips to a definite logic state, depending upon the polarity of the differential input voltage. This state will be read out near the end of Phase-B in the next cycle. A peripheral circuit addresses each column (X~) and reads the output (7~;,) in turn at a clock of 30kHz to 100kHz. When a pixel (X~,Yj) transits low to high, the corresponding digital latch in a frame buffer acquires the time-stamp count in 16 bits. Thus, the resolution of the rangefinding depends on both scan rates - the clock rate of addressing each column and the rate of the lightstripe scanning. For example, when the lightstripe sweeps at 30Hz and when the peripheral circuit scans the output (}2) at this clock rate, the resolution of rangefinding ranges from l@00 to ~ of the measurement range. 3.4. P e r i p h e r a l C i r c u i t r y The SRF chip has n-parallel column select lines (X~) and n-parallel row read lines (?~;,). Peripheral circuitry encodes the column address lines and decodes the row read lines, as shown in Figure 5. A time-stamp counter holds the current time. A scanning mirror controller resets the time counter, so that the time stamp is synchronous to the mirror scan. The frame buffer
54 is a 2D array of digital latches, with the same pixel resolution as the SRF chip. According to the trigger generated by the decoder module, the corresponding digital latch holds the time stamp. After the lightstripe was scanned a scene, all digital latches in the frame buffer hold the time stamp. An interface module transfers the time-stamp data to a computer. The computer then calculates the range data based on the triangulation principle. .n~
7 \ qi "
C,o~ ~
i
i
Eq .
Line Xn ---
II
"
"1o 0
II -I1 --
SRF Chip
Reset
~' 'IC~ inter I
Scanning Mirror
rim e-S tam p
II' I
,..,rror Con,,o,,.r
Row Read Line Yn
*
Trigger
~IL IlL IILI ~ ~ I L IlL I I Interface ]
~~IL
IlL
I
1
Computer Frame Buffer
Figure 5. Peripheral Circuitry
3.5. S e n s o r L a y o u t a n d F a b r i c a t i o n A prototype SRF chip has been fabricated in a 1.5# CMOS using a double metal, single polysilicon process. Figures 6 and 7 show the layout of a sensing element and a 24• element range sensor array with read-out circuitry, respectively. Each cell is 250pro • 250#m and the two photodiodes in each cell are 66#m • 100/tm and 64pm • 100pro. Figure 8 shows the SRF chip within the 84 pin ceramic PGA package, which includes 24 column-address pins, 24 row-read pins, and 3 comparator-control (phase inputs) pins.
4. P r o t o t y p e R a n g e f i n d e r A prototype rangefinder has been built using the new SRF chip. Figure 9 shows the prototype system. Figure 10 shows the configuration of the prototype rangefinder system, which includes a camera module, a range-image memory unit, a lightstripe generation module, and a host computer. The camera module consists of the SRF chip, a CCD chip, a lens, and a dichroic mirror which splits the incident light into visible ray and perpendicularly reflected infrared (Figure 11). Infrared focuses on the SRF chip and visible ray focuses on the CCD. The SRF chip and the CCD are mounted at the same optical distance from the lens. A range image and an intensity image in the camera's field of view are obtained at the same time.
55
Figure 6. Layout of Sensing Element
Figure 7. 24 • 24 SRF chip
Figure 8. SRF chip within the PGA package
The range-image memory unit consists of the frame buffer and a time-stamp counter. As mentioned before, this unit outputs the control signals to each cell on the SRF chip through 24 column address lines, and reads out the comparator state through 24 row read lines from the SRF chip in parallel. The counter value - the time-stamp data - at which the comparator changes its state from low to high is stored in the frame buffer. During one scan of the lightstripe, the counter values corresponding to each pixel are stored in the frame buffer. The lightstripe generation module consists of a 30roW infrared (780nm) laser diode, cylindrical lenses through which the laser beam goes to the lightstripe, a scanning mirror mounted on a galvanometer and a controller. This module outputs the reset signal to the time-stamp counter in the range-image memory unit. A host computer converts the counter values of each cell in the frame buffer into range data through a look-up table. In the next section, we see that the look-up table which maps the counter values into range data along all sensing element line-of-sight rays is generated by a calibration process.
56
Figure 9. The Prototype Rangefinder
Figure 10. System Configuration
5. C a l i b r a t i o n For accurate measurement, the system must be calibrated to reflect the exact relationship between the time-stamp count and the range data. The conversion function f ( C ) is derived by measuring four known parallel planes, as shown in Figure 12. In this figure, the lightstripe and the SRF chip are perpendicular to the plane of the figure, a reference plane SO is located at a known distance and the time-stamp counter Co of each cell is recorded. After measuring the distance of the reference plane SO, this process is repeated for planes S1 and $2, which are located at d and 2d from SO, and time-stamp count C1 and C2 are recorded. Using these data, the conversion function f ( C z ) which maps time-stamp count C z into range data Z from the reference plane along the line of sight is obtained as 2dT~(T~ - Tz) Z = f ( C z ) = TzT~ - 2TzT~ + T~T~'
(2)
where T~ =
tan(kC1 - kC2)
T~ = T~ =
tan(kCz - kC2) tan(kC0 - kC2), k = const.
The constant k in (2) is derived from measuring another plane, $3, which is parallel to the reference plane at the known position of Zx(r 0, d, 2d). In practice, a look-up table for each cell is generated using conversion function (2) for realtime conversion. p f -
57
Figure 11. The Camera Module
Figure 12. Calibration
6. E x p e r i m e n t s 6.1. R a n g e a c c u r a c y a n d r o b u s t n e s s In order to test range accuracy and the robustness of the prototype system, a white plate was positioned on a translation stage in the measurable area and range data were obtained by moving the white plate toward the sensor. Initially, the white plate on the stage was positioned 30cm away from the sensor and stepped 2500 times at 4pro pitch, with range data measured at each position. This experiment was done both in a dark room and under ambient light (50cm below a 30W fluorescent lamp) without thermal control. An experimental system using the SRF chip was built to the following specifications. B a s e line
920cm,
Focal l e n g t h of lens : 55mm, M e a s u r a b l e r a n g e : 28cm to 32cm, Theoretical maximum resolution S c a n r a t e of l i g h t s t r i p e
930Hz,
C l o c k r a t e of r o w s c a n : 50kHz, P o w e r of l a s e r : 30roW, W a v e l e n g t h of l a s e r : 780nm.
924#m,
s
.
o
o -0.02 i l -0.04 0
,
.
,
5
.
.
,
10 15 Column
.
.
.
]: ]!}
.
'
.
"
20
(a) under ambient light
o 8e-05
I
0
.
.
.
.
-o.o: .i
m*" -0.04
0
.
.
, 5
.
.
,
,
10 15 Column
,
.
20
(b) without ambient light
I under ambient I I light -I 1
without ambient ligh.t.:-..---
"= 4e-05
>
.
e
""
0
""-I
......
5
I''"
I0 Column
I
I
15
20
(c) variance of errors Figure 13. Range Measurement Errors
Measured range data were compared to the actual range data. Averaged errors of 2500 measured range data are plotted in Figure 13(a) and (b). Figure 13(a) shows the averaged range measurement error of the each cell on the 14th row of the SRF chip under ambient light. Figure 13(b) shows the averaged measurement error in a dark room. The diamondoid points in the plot represent the mean errors and the vertical lines represent the ranges of the 2500 measurement errors. Figure 13(c) shows the variance of range measurement errors under each condition. The maximum range error under ambient light is - 2 4 # m to +10#m and that without ambient light is - 2 4 g i n to +14#m. This shows that the experimental error is less than the quantization error (+24#m). Because the accuracy and reliability of sensing data are the same in both cases, we conclude that a complementary pair of photodiodes - an optical comparator - is robust for the ambient light, and adjustments are not necessary. 6.2. R a n g e I m a g e of a Scene w i t h C o m p l e x V a r i a t i o n s in R a n g e We acquired a range image of a PCB mounting two QFP ICs, eight chip capacitors and two chip resistances. The heights varied between 0.5ram to 1.6mm. Figure 14(a) shows the PCB intensity image captured by a CCD. The PCB was positioned 180mm ahead of the SRF camera under the ambient light without thermal control. Figure 14(b) shows an elevation map of the 24 × 24 range image acquired by the SRF chip. In order to avoid occlusion, light generation modules were set at both the right and left side of the camera module. A lightstripe was projected from the right and left in turns and each range image was acquired at a video frame rate. The complete range image was generated by
59
Figure 14. ICs and Chip Parts on PCB
integrated two range images. The range image was a few pixels wider than the intensity image because the SRF chip is wider than the CCD chip. Although there are several different reflectances of the object's surface, such as mold surface and solder surface, the range image shows that the acquired range data are quite accurate. We also acquired a range image of a human face (Figure 15(a)). The face was positioned 50cm ahead of the SRF camera with a 20ram lens under ambient light without thermal control. Figure 15(b) shows an elevation map of 24 x 24 range image acquired by the SRF chip. The curved surface of the face is clearly visible and the lips, the nose and the cheeks are readily distinguishable. 7. C o n c l u s i o n \Ve have reported a new photo-sensing structure for a cell-parallel lightstripe rangefinder capable of reliably acquiring range data without disturbance from ambient light, the temperature drift of the environment, or the reflectance of the target object. We fabricated a smart vision VLSI sensor (SRF chip) based on this new structure. An experimental prototype system using the SRF chip demonstrates that it is able to measure a dynamic 24 x 24 range image with 99.9% resolution at a video fl'ame rate. Unlike other rangefinder chips, it is adjustment free. The prototype system also acquires intensity images by a CCD chip in the camera's field of view at a video flame rate. The combination of these range and intensity images of the scene should be useflll for various factory automation tasks, such as visual inspection and assembly.
50
Figure 15. Face REFERENCES
1. Y. Nishikawa and S. Inokuchi, "Range Data Entry Techniques Using a Laser Spot Scanner and Two Linear Solid-State Image Sensors", IEEE Trans. on Instrumentation and Measurement, VOL.IM-30, No.4, pp.270-273, 1981. 2. K. Sato and S. Inokuchi, "Range-Image System Utilizing Nematic Liquid Crystal Mask ", 1st Int. Conf. of Computer Vision, pp.657-661, 1987. 3. K. Araki, Y. Sato and S. Parthasarathy, "High Speed Rangefinder", Proc. of SPIE, Vol.850, pp.184-188, 1987. 4. T. Kida, K. Sato and S. Inokuchi, "A Realtime Range Imaging Sensor", Proc. of 5th Sensing Forum, pp.91-96, (1988) in Japanese. 5. T. Kanade, A. Gruss and L. R. Carley, "A Very Fast Rangefinder", Proc. of IEEE 1991 Robotics & Automation, pp.1322-1329, April 1991. 6. A. Gruss, T. Kanade and L. R. Carley, "Integrated sensor and range-finding analog signal processor", IEEE Journal of Solid-State Circuit, vol.26, pp.184-191, March 1991. 7. K. Sato, A. Yokoyama and S. Inokuchi, "Silicon Range Finder- A Realtime Range Finding VLSr', Proc. of Custom Integrated Circuit Conference '94, pp.339-342, 1994.
Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.
51
A Pedestrian Discrimination Method Based on Rhythm S. Yasutolni , H. ~h)ri and S. Kotalfi Department of Electrical Engineering and Computer Science, Yamana.shi University 4-3-11, Takeda, Kofll, Yamanashi 400, Japan A new metho(l of fin(ling pedestrians in a time sequence of visual images has beell proposed, and its great 1)ra('ti(:al al)l)li('al)ility has ])een demonstrated. Walkers exhil)it 1)eriodic motion in (luite a different way from other moving ()l)jects. This 1)eriodicity, or in other words rhythm of walking, is the result of the walking action and can ])e ol)served in two kinds of perio(lic intensity change of the 1)e(lestrian's image. So, pe(lestrians can 1)e ibun(l if we l()()k for these p('rio(licities whi('h are (llfite unique to them. To realize this idea, we estimate the teml)oral-fre(llmn(.y and the spatial-period ()f the moving ()l)je(.ts. The temporal-fie(luen('y is the os(,illati()n of intensity against time. The spatial-1)erio(l represents the fluctuation of intensity along the traje('tory of the object, an(l is referred to as the stride of walking. A pe(lestrian detection meth()(l using these two oscillations has the following advantages. First, this metho(l is in(lel)en(lent of the al)solute intensity or contrast of the ol)je('ts. Secondly, clothes, hairstyle, shape of 1)ody, illumination and the distance fi'om the camera do not affect the relial)ility of detection. Finally, the algorithm is simple enollgh for real time operation. The (letaile(l algorithm is presented here and its vali(lity and 1)racti('al appli('ability are examined 1)3" an experiment on asphalt paved road. 1. I N T R O D U C T I O N
We focus on how to discriminate pedestrians flom other moving objects such as bicy(-les, animals and swaying plants. This technique is inq)ortant ill lnany fields: for example, detecting intruders for a. security,,' gual(ling systeln, a.n(l finding an avoidance-1)ath lnechanism for mobile rol)ots. However, variety,, of clothes, hairstyle and bodyshal)e make it. difficult to realize this end. So, we propose a new (l('te('tioll method which makes fldl use of periodicity in walking [1]. The recognitioll of an ol)je(-t in motion is (me of the tol)i('s of 1)sychol)hysics. ~\}' use the word "motiolF' here to refer to the challge of shape of the ol)ject and not to the change of its position. The moving light disI)lay (5ILD) nlctho(l proposed l)x" Johansson [2] is capable of perceiving the l)ody and the lnotioll of a walker. Kozlowski and Cutting [3] have demonstrated that the MLD can tell the sex of a walker using the patterll of the motion. These facts suggest that the motion of the ol)ject ("(mid 1)e the key to its recognition. R ashid [4] has reconstructed the structure of an ol)ject k o m the MLD kom a.n ellgineering point of view. So, given the structure of the lmlnan body, pedestrians can be selected from other moving ol)jects. However, this idea will not work because it. will be difficult
62 to know the in(livi(h:al ln(~tion of tit(' lmlts (if" th(, l~(,(l(,st:ian, i.('. arlns, f'(,('t, h('a(l an(l so ()lI. \\'(' can't ('Xl)('('t ('v(u'y 1)('(h'strial~ t(~ w('ar Sl)('cial marks at ('v(,ry j()int. P(>lana's a l)l)r()a('h [5] is s(mwwhat (lose to ours" It(' has tal:('n th(' m()ti(m int() a(-(-()lmt 1)y al)l)lying F F T and its 1)(uio(li('ity against tim('. A new ln('thod w(' 1)lOl)OSC in this 1)al)(U t(,lls 1)('d('st:ians h o m oth(u ln()ving ()l)j(,('ts 1)y utilizing two tyl)CS of 1)('rio(li('ity uni(lll{' t{) th(,m. Th(, first ()1:(' is the pitch (): 1)a('(, of walking. \V(' shall ('all this 1)(':i()(li('it)' against tim(' "t(unl)()ral-fr('q~:('ncy". Th(' oth('r tyl)(' is st ri(l(' of walking whi('h wc ('all "Sl)atial-1)('rio(l"" it is a n('w ('on('('l)t a(l(l('(l t() th(' apl)roach by Polana. The pace of walking at natural Sl)r has its o.wn Ul)l)(U' and low(u limits. So has the walking stride. Thus, t lwsc two featur(,s can represent all the motion of 1)c(h'strial~s. In order to ol)sc:vc t h('sc 1)('ri()di('iti('s in mono('ular images, w(' look at th(' intcmsity diffi'r('m'(' l)etwc(,l~ suc('essivc flam(,s. The l~e:'iodi(" ('hal~g(' of intel:sity l~rought al)out 1)y t h(' motion of t h(, f(,et can 1)(, ()l)s(uv(,(l and is ~s(,(l t() ('stimat(' t h(" t('ml~oralh'('qucnt'y. Tlw aml)lit~td(' of tlw os('illati(m will not aff'('('t th(' tcml)()ral-fr('(l~:(u~('y. ()m'(' the 1)a('(' aim dire('tiol: and distan('c of a m()ving ()l)j(,(-t a:(' knr th(, sl)atial-1)('ri(~(l ('an 1)e 1):'('(li('t('(l. Again, thcs(' 1):'o('c(hu('s a:'(' n()t aff'('('t('(l 1)y vis~al (listurl)am'(' as m('nti(m('(l a])ov('. Pedestrian (h,t(,('tio:~ 1)as(,d on t h(, a l)ove i(h'a is :('aliz('(l l)y tim,(' lm~('('ss('s: 1) m()vingol)j('ct (h't('ction, 2) the t ra('kil~g 1)l'O('('ss, 3) s('l('(ti(m l~y rhythln. In section 2, w(" (lcs('ril)(' t h(' m e t h o d of (l(,t('('ting a m()ving ol)j(,('t and its 1)()siti(m in successive h'alnc data. Then section 3 will l~(, (h,v(,t(,d to a tracking m(,th()(l l)asr on th(' kincmati(" model and ('onsi(h~latioll of its :vlial)ility. Cal('~:lation (,f t(,ml)oral-h('(l,:Cn(y and sl)atial-p('riod as wcll as a mod('l mat('hing t('('hniq~:c ::,ill 1)(, Dilly lnVs('nt('d in s(,('ti(m 4. The CXl)(Uiln('ntal results ol)tain(,d (m an asphalt lmV('d roa(l at(' (h,s('ril)(,(l in s(,('ti(m 5. Th(, final section gives some conch:(ling r(una:ks.
2
MOVING
OBJECT
DETECTION
A moving ol).jcct ('an 1)e detect(,d in m a n y ways such as sul)tra('ti(m of su(('(,ssiv(' frames [6], sul)tra('tion from background imag(', ol,ti('al flows [7] and so on. Sin('(" th(, sul)tractioll of su('c'r frames can 1)e dolw in the sh(n't(,st 1)ro('cssing t inw, w(, al)l)li('d this at the l~cginning of the data pro(cssing. The detail('d dctecti(m ln'O('cdm(' is as folh)ws. 1. Th(' illt('nsity ('hang(' 1)('tween su('('('ssiv(' flam('s is ('ah'ulatc(l and its al)solut(' vah:(,s arc t ransferr('d to the following lnOc('ss as sh()wll ill figlll(' 1. 2. To d(:'fin(" the moving-ol~jcct region a('('urat('ly, a vertical ('(lg('s arc sear('h('d fin and d a t a l~oints :cmoved whi('h hay(, h'ss int(,nsity than a t hrcsh(~M vahw. Th~,se lnOCeSs('s a.rc us(,ful bccaus(' imag('s ~)f"p(,d(,strians at(' mo:'~' likc,ly t.o hay(, vcrti~'al ('(lg('s than arc ()tlwr obj('cts. Th(' t hr(,sh(fl(l int(,nsiti(,s I/'t//el and Iv'nl~2 arc fix('(l so to cut off th(' 99 % of the intensity ('hange Inought h o m the 1)a('k-ground of t lw images. 3. To get th(' l~ott(ml position of t h(, ol)j(,(t :vgion, th(, v(,rti('al and h(n'iz(mtal lnoJ('('tions of the intensity a.rc ('al('ulatcd as shown in figure 2. The ol)je(,t-rcgion width
63
F i g u r e 1. D(,t(,ctit)n ()f a m o v i n g ol)ject
F i g l m ' 2. P o s i t i o n an(l wi(lth on t h e imag('
u'i(lt:xl(;(~.) is d(~t('rmined as t h a t of t h( ~ vcrti('al 1)roj(,('ti()n, a n d tit(, h o r i z o n t a l 1)osition u(A.) p o i n t s th(, (cnt(~r of th(' wi(lth. Th(' v('rti('al 1)()sition ~,(~.) is (h't('rmin(~(l as the 1)()ttt)m of the 1)rojti~(:tioll so t() fi)(.us 111)t)n th(, m()ti(m of t ht ~ t'eet. St'e figurt 92 t'()r (l(,tails of tht' r(qations 1)(~twt~en u'i(//.xtc,.(z.), u(~.) a n d ~'(~.). Her( ~ th(" in(l('x L" s t a n d s for tht, tim(, at whi('h th(' flamt, is ol)s(,rv(,tl. Th( ~ 1)ro('t'ss('s list t'd al)ov(, art, 1)el'fOrln('(| ()vt, r t h(' wh()l(' t)f th(' imagt~s ol)tain(~(l at t h(" v('ry 1)(~ginning of th(, d a t a 1)roct'ssillg. H()w(,v(~r, if tht, 1)t)siti()n of th(, ()l)jt,('t ca.n 1)(" 1)redi('t(~d in tit(' t't)urs(, of t r a c k i n g , only th( , (lata in a s m a l l t l a ( k i n g w i n d o w ('Clltt,lt,(l Olt t h(, ol)j(,('t art, 1)r()('t'ss(,(l. T h i s grt'atly r(~thl('('s 1)r()('('ssing cost a n(l nois(' t'a.llst,(| ])y ()th('r m o v i n g ol)j(~('ts.
3. T R A C K I N G Thtu(, art, tw() k i n d s ()fmt, t ht)(l tt) realiz(' t r a c k i n g . Tit(' first is tt) s('t'k th(" mfitllu 9shal)(" of the t)l)j(-'(-t. Gill)(,rt t't al. [8] h a v e (l(,mt)nstrat(,(l this mt, th()d in t r a c k i n g an airl)lan( ~ using its twt)()rth()g()nal 1)rojt'ctit)llS. Howt, vt'r. this mt,thotl rt,tluircs t ht~ ol)jt,('t to 1)t, rigid an(l is nt)t suital)h' for ()1lr l m r p o s ( '. T h t ~ s('('t)n(l att('tho(l is l)ased ()n t ht' ldnt'mati('s of the ()l).j(,(,t. Lt, gtt,rs anti Y ( m n g [9] hart, a i)l)lit'(l a m('tht)tl whi('h us(,s kintunati(" mo(h,ls an(l t.ra('killg filters to the sinmlatt~(| scent'. Alth()ugh thi.s ln('tho(| is us(~ful a n(l suital)h' f()r ()ur goal, it is n()t 1)ow('rflfl (m()ugh. \V(' h()l)(' th(, t r a c k i n g will (-ontinu(, (,Vt,ll if t h(, m('asur(un('nt ('n('()untt'rs ()utli('rs. Th(' ilnl)r()vt'(l r(~('()v('ry svst(un for th( ~ t ra('king will 1)(, stat('d in sul)s(,('tion 3.2.
3.1. K i n e m a t i c s a n d M e a s u r e m e n t M o d e l B('t'or(' s t a t i n g th(' m('asllr(un(mt or (lata h a n d l i n g t('('hni(llt(', th(' s('tting ()f t h(, vi(h,() ('amtua sht)lfl(l l)(' ('xl)laint'(l. \\-(' assure(' it tt)1)(, a 1)in-h()h' c a m t u a an(l alignt'(l 1)aralh'l tt) th(' r()ad. T h i s makt's it 1)()ssil)h ~ to st't, a m a x i m l m ~ mmfl)('r t)f 1)edt'strians in a vi(,w. Tit(' horiz()ntal axis ()f" vi(,w is also 1)aralh'l tt) th(' ,gr()lln(| s() as tt) ltlak(' 1)(,d('strians hay(, m()rt ' vtuti('al ('(lg('s as (letaih~d in the 1)rt'vi()lls s(,('ti()n. L('t .~'(a.)and !I(L.) giv(' th( ' 1)()siti()n
64
Z 9
,,0
/'-'-'-'-'-'-'-'O ~- / D l " " ",,
(~,,v(~)
.I
~ ~ 1 ',, ...--" )- ' 4 ~ ~ . / " , ,
ax,
~(X(k) ,y(k) )-Y
X"
Figure 3. ~Ieasurelnent model and ('OOl'(linat(' s3"stelns of a 1)edestrian at tim(' k. The state v(,('tor X(t.) is conll)osed of the position .r(~.) and y(k.) an(l their (lerivatives with respect to tilne .i'(/,,) and /1(i,.), i.e. velocity. Now we suppose a pedestrian to have an ahnost constant velo('ity. Then the estimated 1)osition of the pedestrian at time (k + 1)-Ys will 1)e
(i)
X(k+t) = FX(,.) + w(t.) where
X(t.)=
,i'(t.) y(k) !)(,.)
1 00
,F-
0 0 0
1 0 0
0 0 1 Ts' 0 1
,w(t.)-
i01 ,'t(t.) 0 ,"2(~.)
"
Here Ys is the nornlalized time interval and w(i,.) is calh'd system noise, w(t.) represents the fact that no pedestrian goes exactly straight, and it. is supposed to be a sequence with zero lnean, white noise and Gaussian with ('ovaiiallce Q. The Q is set to be 0.27Tm/s 2, as determined by experiment. The relation between the position in real-world coordinates with that in the linage plane Y(k) can l)e expressed using translation operator hit. ) as
(2)
Y(~'i = hIk'l(X(~'l) + z(k)
where Y(k)
=
~'(k)
, Zlk)
--
-'2(k)
9
The operator h(t.) defines the relative position of tile video camera and the ground (see figure 3) mid is read as
" fc :r(t.) uc + fc--Y(k) sin(O-Tr/2)+h cos(O-Tr/2) h(k.j(X(k.i) =
(3)
b fc (Y(t.) cos(O-Tc/2)+h s i n ( O - r / 2 ) ) fc-Y(k) sin(O-Tr/2)+h, cos(O-Tr/2)
65 where .f(:, h, and 0 correspond to focal length, height an(l depression angle of the ('amera. The optical axis is set at tile center of the image plane and its position is (u(,, vc). a and b are the aspect, ratios needed to correct the rati() 1)(,tween wi(lth and height of the image plane. Z(k) represents the measurement error and again is set to be zero mean, white noise and Gaussian with cova.rimlce R(k.). R(k) del)ends on the distance 1)etween pedestrian and tile video camera: the measurement noise becomes smaller when a 1)edestrian goes farther from the camera. Extended h:ahnan filter (EKF) is used to get the estimated state vector X(~.]~.) and predicted measurement Y(klk-1). X(klk) is the result of the incremental procedure of EKF. The predicted ~r(klk._l) points the center of the window at the next time step as is mentioned in section 2.
3.2. Checking the Validity of M e a s u r e m e n t The pedestrian may be hidden 1)3;other moving ol)jects and noise may lessen the validity of measurement. These facts reduce the reliability of measur('ment and will lead tracking to end in failure. For these reasons, we should make s~tr(, that observation is reliable. If the observation does not satisfy the following ('()nditions, th(' ol)serve(l l)osition Y(~.) is judged to 1)e wrong and should 1)e replaced with th(" 1)redi('ted 1)ositi()n ~r(~.l~._l). 1. Occlusion When the image of a pedestrian ix covered by another moving object, occlusion may arise. In this case, the 'mcas'urcd width of the object may not lie with in the range of the pedestrian width. So. th,c width of the object is a good measure of occlusion. The width of the object in real spacc wid(~.) is determined by applying an inverse perspective transformation to tt,i(11,~l(;(~.) at (u(~.), t'(~.)), and ch(:cki'ng by thc following condition. I'VI > u'id(k) > H2
(4)
where l't'l and H,~ dcnote the maximum and minimum widths of pedestrian respectively. .
Outlier When the noise cause an outlier in th,c measurement of position, the estimation may not be valid. We check the following condition for validity.
[X(k',/~'-1)-s
p(kl/,._l)-I[X(k',~,-l)-- s 2_
(5)
where ~( is a tentative estimation by the mcas'uTvmcnt "Y at I,. P(~I~-J) is the prediction error covariance determined by EKF. This condition means that wh,en the Mahalanobis distance between ~((k'l~'-l) and ~( is greater than 3.0 .) in the chisquare distribution of 4 degrees of freedom, then the measurement ~Y ix not reliable.
4. D I S C R I M I N A T I O N B A S E D ON R H Y T H M Tile rllythm of walking is caused 1)y a two-stage 1)il)edal action: first, a pedestrian stauds still for a relatively long time on both feet" second, oue of the feet steps forward rather swiftly. This is clearly seen in figure 4 as the 1)eriodi(' intensity change of the sul)tracted image data around the feet. In this section we will r the ways of placing a small window which is especially useful for detecting rhythln and ascribing it to walking.
66
Figure 4. Sul)tra('ted images at the fi,(,t of a 1)edestrian. Th(' ('(mtrast was enhan('ed fin. clarity
Fig~lr(, 5. \Vin(h)w fin d(,te(-tion of int(,nsitv ('hauge
4.1. D e t e c t i o n of R h y t h m A small win(low which in useful fin" (h,t(,('ting th(, rh.xthm ()f 1)('(h'strians in 1)1lt within the tra('king window as shown in figure 5. Th(' small win(low in l l l in wi(lth an(l 11~ in height. The actual values of 11, and ll~ will l)(, stated ill s('('ti(m 5. Th(' int('nsitv in this window will oscillate during walking lint th(' (listan('(' 1)('tw('en th(' camera and t h(, pedestrian aftb('ts its a nll)litude. Since the (listan('(' is inv(u'se 1)rol)ortional to the win(l()w size, the total int(msity within tit(" window in divide(l 1)3" the x,~:in(h)w size ill th(' imag(' plane. The norlnalized tilne se(tuen('c of total int(ulsity (~f"the window (l('ll()t,'(l by I(~.) in pa.rti('ularl.x,, suit al)le fl)r detecting the rh.x'thm. To calculate th(, 1)ower sl)e('trum P(.f) flOln I(~.) (1,: - 0 , . . . , / ~ ) for a d m a t i o n ()f :X' 9T.s. seconds, F F T is not slfital)le be('ause res(flllti()n is lilnit('d 1)y t.h(' numl)er of d a t a 1)r in a sampling interval. So, we use the n mxim, ml cntrol)y m(,thod (~IEM) [10] for its 1)etter fl'eqllency resolution. The final 1)l'('dicti(m Crl'()r in used to estimate the (n'(h'r 3I tbr the aut(~regressive lno(h'l for ~IE~I. The f'~uMaln('ntal t elnl)oral-fl(,(l~ten('y F of I(~.) ol)tain('(l in this way is related to the sl)atial-1)(ui()d T 1)y r -
D F - N . T,s,
(6)
where D iel)resents the distan('e of the ol)je('t at _Y. J-:.s'.
4.2. M o d e l M a t c h i n g Before 1)roceeding to the matchillg 1)roc('ss. a whit(, 1lois(, ('(mtl)(ment is r~unove(l flOln the power sl)e('truln ol)tained. Peaks slnalh,r than PrH/~ are set to 1hake no ~'ontril)uti(m. Walking a.t natm'al speed has its own limitatiollS fl)r th(' flmdamental t('ml)()ral-fl('q,Wll('y F and sl)atial-1)eriod T. Let F:~l.:l.\', F~ll:v and T.~I i.\', T~t/,x l)e the Ul)l)er and lower limits of these quantities. The dimensions of tiles(' 1)aramet('rs at(' presented in sul~section 5.1. The (letaih~d process in as follows: 1. To nornmliz(, the power Sl)ectrunl shal)e: a l l / } larg('r than P/'t11,' are smnme(l:
67 ,\:p
(7) /--1
H('r(' ~ is th(' p()w('r at th(, fl'('qll('ll('y f I (/ - 1 , . . . , A ~ , ) . If th('r(' is n() 1)()W('l ~/ gr('at('r than P~H~r no 1)('(lcstrian is in th(' win(low. S(), tit(, ('vahlati()n valu(' 5" is set to () and th(' 1)ro('('ss is lmltcd. 2. F()r ('a('h fr('(lu(ul('y _F!, th(' ('()rr('sl)()n(ling Sl)atial-1)('ri()(l T/ is ('ah'ulat('(l llsing th(' r('latioll 6. 3. The fr('qu(uwy F i is found at which Pi is a m a x i n m m and slwh that
F,~ttx < F/ < F.~t.~x, T,~llx < T / < T.~I ~.x-.
(s)
4. The evaluation valu(' E is t lw normaliz('(l 1)ow('r Pi, i.(,. E - P j / S . 10() 5. If E is grcat('r than t h(" giv(,n yah1(, E/'H t~, w(, (()n(ht(h ~ that th(u(' is a 1)(,(l(,strian in th(' window. ()n th(, oth(,r hand, if E is smalh'r than ErH/r t h(u'(" is n() 1)('(l('strian. ErH I~ is ('Xl)('rim(mtally s(,t so as t() nfininfiz(' th(, dis('riminati()n ('rr()r and th(, vahw is givcll in slll)s('ction 5.2. In th(, r(,al (mvir()nment, irwgular nois(:, will fr('(lll('ntly al)l)car whi('h has almost th(' sam(' Sl)c('trunl as that a ssigne(l t o 1)edestrians. This t('n(huwy is strong if' th(' ()l)s('rvati()n is ('arri(,d out fl)r only. a sh()lt. tiln(,. . To less(u1 th(, ('()ntril)lltion ('alls(,(l 1)v nois(', th(' . 11111111)o1 of o])s(uvations _V should l)e largc. In oth(,r words, the Salnl)lillg 1)(uio(l sh()uhl 1)(' long. Howcv('r, this will 1hake t lw d a t a 1)r()('('ssing tim(' l()ng. As a r(,sult ()i'a vari('tv of ('Xl)erim(ults. we t'oluld 64 d a t a 1)oints to 1)(, ('n()l@l t() lnak(, t h(' disturl)an(-(' smalhu than E/'Hfr 1)rovid('d only tw()('y('h:'s of 1)(u'i()(li(" ('hang(' wh()s(, fr(,(llWn(-y is th(, sam(, as that of walking, at(' a('cid(,ntally m(,rg('(l in th(, (lata.
5. E X P E R I M E N T S 5.1. E x p e r i m e n t a l S y s t e m The algorithm fi)r 1)('(l('strian dete(tion dcs('ril)(,(l in th(, 1)rcvious s('('ti()ns was t('stc(l using imag(' s('(lu(uwcs of an asl)halt paved road ()n th(' university ca nll),ls. Th(' vi(h'() c a m e r a S o ~ g C C D - TRI()00 was 1)la('('d at 1.()m h('ight, with 15.0 ~ (l('l)r('ssi()n angh'. In t lw RGB dcco(l(,(l vi(h'o signal, only tlw G (,Oml)()n('nt was us('d, whi('h has al)out 60(X: of th(" total hmfinosity. The A N D R O X ICS-400X~I9 imag(' 1)ro('('ssillg system ( C P U 68030, OS-9) was (Unl)loy('d with i b m digital signal 1)r()('('ss()rs. The saml)ling of th(' images and t la('king of the ('Xl)('rim('ntal s('t-ul) arc 1)r('s(ultc(l in tal)h,s 1 an(l 2. 5.2. E x p e r i m e n t a l R e s u l t s Two exanll)los of imag(, sequences arc shown in tigmc 6. Th(u(' are six 1)('(h'strians (Nil-El6), two 1)i('.x'('l('s (NIt, :~i~) and a dog (NI~)). The video ('anwra can look at ol)j(,('ts at distan('cs of 3.0 to 20.0m. Figure 7 plots t h(' cstimatvd traj('('tori('s and v(,lo('iti('s of t h(, ol).ic('ts. Th(' time illtcnsity an(l its power Sl)(,('trlml fl)l ('a(-h ol)j(,(t arc shown in figur(' 8. The int('nsity (l(,1)('n(ls on the a ctioll of walking (n ln()titm such as riding a 1)i('y('h,
68
Table 1 Parameters for the exl)eriments Threshold for moving object detection Number of intensity change data Order of AR model Threshold of power Pace model Stride model Pedestrian width model Ankle height model
Ir..i N.ll -
-
11 64
I r H R2 =
1?
15 P r H , =10.0 F.~/i,v = 1.50
F,~,.ux,
TMI,v =0.45
Tal..~x =0.99
ll~l= li,~--
14:2=
0.66 0.13
-2.60
Hz 111
0.13
Ill 111
Table 2 Examples of the standard deviation of measurement noise Distance frolll the video camera 3.0 5.0 10.0
20.0
m
/R(t:)ll
10.4
7.5
3.2
0.5
1)ixel
R(t,)22
4.9
3.9
2.2
0.7
1)ixel
o1" the running of a dog, as well as on the intensity diff('ren('(, l)etw(,en t lw ol)ject and the background. The pedestrians MI-.NI~i show 1)(,rio(lie change of intensity, but nonpedestrians lklr-M9 exhibit no periodicity. The estimated 1)ace, stride, evahmtion value and real pace for each object are talmlated in tal)h, 3. Th(' real pace for the pedestrialls is that of an average over 10 steps of walking. \Vith regard to non-pedestrian, pace and stride are estimated kom the frequency which has the maximum power. The estimated pace for non-pedestrian Mr is very (,lose to the lower ])omld of F, i.e. FM~A'. However, the stride is estimated to be about 2m which is not within the range of pedestrians (0.45-0.99m). So, we can tell Mr is not a pedestrian. Indeed, the evaluation value for 1~I7 is equal to zero.
The polmlation of the evaluation value E for 533 1)edestrians is shown in figure 9(a) and for 109 non-1)edestrians in figure 9(b). Of tht' 1)('d(,strians 82cZ~.wear trousers, 11% are in knee-length skirts or short 1)ants and 7(X, art' in long skirts. Of tht, non-pedestrians 807c, are bicycles, 18% (logs and 2c/(:. swaying 1)lants. Figure 9(a) shows that evaluation values E for 1)edestrians have a peak at the right side of the figure. On the other hand, for non-pedestrian there is a peak on the left. If the threshold value E'rHI~ is set to be 10, 94.0% of pedestrians and 95.40/(,, of non-pedestrians are corre('tly recognized. This m('ans that 94.2~: of all moving objects are correctly identified. The reasons for error will be summarized as follows. For pedestrians, 1) jittering of the observed images disturb the detection of the rhythnl of walking, 2) with swaying 1)lants often al)l)ear in the tra('king window, 3) walkers light-coh)red shoes diminish the contrast between the images of feet and the paved road. Reasons (1) and (2) are also true for n~ Jitters between successive fram('s is the main cause of error.
69
Figure 6. hnage sequences for.the eXl)eriments
6. C O N C L U D I N G
REMARKS
\Ve have I)roposed a new method for discrimination of pedestrians fl'om other moving objects based on the rhythm of walking. This method is achieved by the following three processes: moving-object detection in successive frames, tracking of moving objects using kinematic models and measurement applying extended 1,2ahnan filter, and searching for the walking rhythm in the power spectra of intensity of the images. These processes arc realized and executed in a realtime operation. The experiment has shown 94.2% of moving ol)jects to 1)e correctly
70
20
,
10
'
'
' .,
' ' ., ' ' '~ " '\ \~2" ,,. ,,,,~, M1 'x " 5.09 9 \\\' \, km/h ,, 4.71 \ \ "\, km/h \\ "". M3 \ ", ",, '", 6.33 \,\ "\ ", km/h ".,, \,, ""
\
-,4'
'
-'2
'
20'
'
',
M5 3.92 km/h
';\ ' '\
'
'
'
'
"\'x,\\,,X M6 \'", "'x"~ 4.04 ~ '\ \~ km/h "'\, Ix\
10
~r -4 i
20r
\,, \ M4 ',,X~,),,~: X 4.27 \ ~,, \ \ _ ,_ '~ \ . \ Km/n ""'\."*'\ X
~ i,
6 x(~) m
'
i
',
',
, ,
101"-
-'2
!
6 'x(~) m
I
\
M7 '" '" 10.54 ",,
km/h
',
\
\
\
9 "",
M9
~ -4.
I
i
'
\',,,~ ~kk\\
6.86 km/h
~.. !
i
.
. -'2.
\\
xx ~
M8
7.00 \\
\\k m / h
"% N "',
"\,
6
x(;0 m
\
Figure 7. Traj(,('tori(,s of l nOVillg ol)j('cts (1)ir(l's-(,y(, vi('w)
Tal)l(, 3 Estimat('d 1)a('('. st.ri(l(,, ('valuatioll Ol)j('('t lmznl)(u" ~Ii NI.2 Estilnat(,d 1)a('(' 1.92 1.91 Estimat('(l stri(l(' 0.74 0.68 Evaluation vahw E 48 92 R('al 1)a('(' 1.91 1.93
yah1(, Nla 1.94 0.91 73 2.00
and r('al 1)a('c of ea('h ()l)j(,ct Nil NI.~ Nlci NIt his 1.86 1.85 1.89 1.39 0.42 0.64 ().59 0.61 2.11 4.63 85 100 100 0 0 1.88 1.86 1.84
M!) 0.71 2.69 0
Hz m Hz
and 1)e(l('strians t() 1)(, re('ogniz('(l w('arillg various kin(Is of cl()th('s, and with diff(uing hairstyl('s and 1)()dyshal)('s. The exp(uim(,ntal results v('riI~" the vali(lity and r()l)ustness of th(, met h()(l. Th(" a(lwmtag('s of ()ur ln(,thod al'(': 1) the ('l( )t h('s, shal)(' of 1)()(ly, hairstyh' and s() on (l() not aff('('t th(, :(,suits, 2) th(' 1)a('( 9(t('lnl)()ral-h('(lll('ll('y) and th(' stride (Sl)at.ial-1)(ui()(l) at(' well ('()ml)incd tog('th(,r t.() make th(un 1)()w('rful (,n()ugh t() r(,m()v(, mils(, su('h as swaying plants, 3) the m('th()(l is simple and can 1~(, 1)(,rf'():m(,(l in r(,al time. Pedestrians in sha(low may not 1)(, (l(,t(,(,t(,(l 1)y ()lu 1):('s('nt m('tho(l. This 1):ol)l('n: should 1)(' ()v(u'('(mw in th(' :war fltt,u('. ACKNOWLEDGMENTS
\V(' at(' thankfld to Associat(' Prof. Dr. I(. Fujima fin valual)l(, sl:gg(,sti()n to iml)r()v(' this 1)at)(u". This work has 1)(,on 1)artia.lly SUl)l)()lt('(l 1)y Grant-Ai(l for S('i(,ntifi(" R('s('al('h flom ~Iinistrv of Edu('atioll No. 03650343. ()kawa Institut(' ()f hfformation and T('l('('ommuni('ati(m No. 91-32 and OMI1ON Co. Lt(l. REFERENCES
1.
H. 1[():'i, N.1I. Charkari and T. XIats,:shita. "()n-line \),hi('l(' an(l Pedestrian D('t(,cti(ms Based on Sign Pattern", IEEE Trans. In(h:s. El(,ctroni('s. \2)1. 41, No. 4,
71 >, 6 0 t l , , , , , , ,
,
60
oo 40 c-
(D
__E 20 0"()
'
601_,
'
' :2 '
, 1, ,
,
'
' 4s
,
, ,
0
2
(~[~)
4
i-i,
6
,
,
,
,
,
,a'
,
60
;
' I-I
' '
~-o ~ 1000I
' 2
'
'
'
'
'
'
'
'
0
2
4s
0
0
i'
2
,
;,s'
20
::4~
i
(~[,~)
~
0
(3 Hz
-
4
(x[._,)
",'
,
6
Hz
'
,
'
'
'
'
'
'
4
0";
'
'
' 2 '
'
' ~,s I
60
'
'
' 2
'
,
,
, i
,
,
,
' 4s ,
0
60
,
,
,
2
(x[:~)
4
,
,
,
~ ,
,
6
Hz
,
,
4s '
'
'
'
o
0
2
4s
'
1000
0
2
'
'
'
0 0
~-I
I
I
,-
4
'
'
'
'
I
I
I
2 t
O
6 Hz
(~[~-,)
'
4s
100
20~
,
20
2
140,
'
,
1 24~
.'.',,.,
2011
I'
4~f,
'
, :
4
,
40
0 0
'
,
60i,
10I~,
2
'
20
A t
Oh--'b-'~
0 0
20
40
O"
,
40
~ 60
(D
__E 20
i
40
Hz
--(oo-9 40
o~
,
'~176176I
601~ , , , , ,
60
,
2 0 0 0 ~ ~ ~ r ~
o~ 4020
>,
,
0 0
B_
0
,
2
4~ 60~,
,
,
,
v 0
I
.
~
0
I
(~[.) ,
4
6 Hz
,
,
,
I
I
I
2 .I
I
I
,
4s I
1 4
J
11
&.
0
0
2
(~[:)
4
6 Hz
0
0
2
4
I- I
6 Hz
F i g u r e 8. I n t ( ' n s i t y ('haltgc a n d p o w e r Sl~(,('tllun ~ff' c a ( h ol~j(,(t
0
2
4
6 Hz
72
100
2oo7 1 94 OO/o
104 o
I [
100-
e-
t_ U.
0
10
50 Evaluation Value (E)
90 100
(a) Pedestrians
0 10
50 90 100 EvaluationValue (E) (b) Non-1)edestrians
Figure 9. Histograms of evaluation value pp. 384-391, 1994. 2. G. Johansson, "S1)atio-Teml)oral Differentiation and Integration in Visual Motion Perception", Psych. tt,'s., Vol. 38, Pl). 379-383, 1976. 3. L.T. I(ozlowski and J.E. Cutting, "Recognizing the Sex of a Walker fi'om a Dynanfic Point-Light Disl)lay", Perception & Psychophysics, Vol. 21, No. 6, pp. 575-580, 1977. 4. R.F. Rashid, "Towards a. System for the hlterl)retation of Moving Light Disl)lays", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-2, No. 6, pp. 574581, 1980. 5. R. Polana, "Detecting Activities", Pro('. of IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, pp. 2 7, New York, 1993. 6. M. Yachida, M. Asada and S. Tsuji, "Automatic Analysis of Moving hnages", IEEE Trans. Pattern Anal. ~[ach. Intell., Vol. PAMI-3, No. 1, pp. 12-20, 1981. 7. A. Shio and .I. Sklansky, "Segmentation of Peol)le in Motion", Proc. of IEEE Workshop on Visual Motion, pp. 325-333, 1991. 8. A.L. Gill)ert, M.G. Giles, G.M. Flachs, R.B. Rogers and Y.H. U, "A Real-Time Video Tracking System", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-2, No. 1, pp. 49-56, 1980. 9. G.R. Legters, Jr. and T.Y. Yomlg, "A Mathematical Model for Computer hnage Tracking", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-4, No. 6, pp. 583594, 1982. 10. D.F. Elliott (ed.), Handbook of Digital Signal Processing Engineering Applications, pp. 701-740, A('ademic Press, 1987.
Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.
73
Visual Recognition of Obstacles on Roads Uwe Regensburger and Volker Graefe Institut ftir MeBtechnik Universit~it der Bundeswehr Mtinchen Neubiberg, Germany Fax (+49 89) 6004-3074
As part of a driver support system for motor vehicles on freeways an obstacle recognition system was developed within the EUREKA project PROMETHEUS over the last four years. The obstacle recognition system uses a single monochrome TV camera and a multi-processor robot vision system. The recognition system includes a road tracker and an obstacle detector that were described in previous publications, and an object classification and tracking module presented here. The module is based on generic 2-D object models. It recognizes obstacles in real time and from distances of 200 to 300 m, sufficient for high-speed driving. The module was extensively tested in real-world scenes on the German Autobahn and on various other roads, including city streets with heavy traffic. 1. INTRODUCTION A driver support system for road vehicles is a great challenge for the technology of machine vision. The goal is to relieve the driver from monotonous tasks, to warn him of imminent danger (for instance, if he is about to fall asleep), and - possibly- to assume control of the vehicle in precarious situations. The necessary technology is being developed in numerous laboratories all over the world; there is hope that it will eventually lead to a significantly increased traffic safety [Masaki 1992]. Among the characteristics demanded of such a system are high reliability, low costs, and independence of any special infrastructure. Within the European research project PROMETHEUS the Institute of Measurement Science has developed real-time vision technology that may be used for a driver support system [Graefe 1993 ]. Freeways were chosen as the principal domain for testing and demonstrating the visual recognition of objects that are relevant for the understanding of traffic situations. The reason for choosing freeways, in spite of the typically high driving speeds, is that the complexity of the traffic situations and the variety of objects are much lower on freeways than on other roads. Obstacle recognition is an essential subtask of a driver support system. (In the sequel "obstacle" refers to any visible object on the road in front of the ego vehicle that is, or could possibly become, a danger for the ego vehicle.) According to a concept for obstacle recognition proposed by [Graefe 1989] the task should be decomposed into several largely independent subtasks, or modules, including the initial detection of possible obstacle candidates, the classification of obstacle candidates into false alarms (caused e.g. by shadows), and real physical objects, the tracking of obstacles, etc.. A module for detecting obstacles in distances adequate for high-speed
74 driving was introduced by [Solder 1992] and [Solder, Graefe 1993], and road modules by [Tsinas, Graefe 1992] and [Wershofen 1992]. A multitude of methods for recognizing obstacles in monocular image sequences has been described in the literature. Essentially three different classes of methods have been used. Gestalt approaches, e.g. [Davies 1990], aim at a direct reconstruction of the 3-D world as a basis for object recognition. Other approaches, e.g. [Enkelmann 1990], [Carlsson, Eklundh 1990] and [Young, G. et al. 1992] evaluate displacement vector fields and the differences in image motion between points within the road surface and points above the road surface. Due to the large amounts of computation required by these methods, they have not yet been demonstrated in real time and, thus, do not appear suitable for vehicle applications. The third class of methods is based on object models. According to this approach objects are recognized by certain combinations of visible features. This may be accomplished in a number of different ways. As one example, [Moil et al. 1989] and [Raboisson, Even 1990] first segment the image. Irregularities within the road segment are then taken as indications for objects. Additional criteria (edges, color, etc.) are used to minimize the frequency of false detections, while additional sensors, such as laser range finders [Alizon et al. 1990] or radars [Young, E. et al. 1992], may be used to measure the distance from an object. [Korn 1989] bases the tracking of vehicles on histograms of edge orientations. [Seitz et al. 1991] use a hierarchically structured multi-layer representation of edge directions for building object models and recognizing objects in images. [Hartmann, Mertsching 1992] describe object images by a "hierarchical structure code." [Ktihnle 1991] and [Zielke et al. 1992] track vehicles on the basis of edges and symmetry. [Graefe, Jacobs 1991], [Efenberger et al. 1992], [Solder 1992], and [Solder, Graefe 1993] detect and track vehicles and obstacles on roads by comparing generic 2-D object models with edges detected in image sequences. [Thomanek, Dickmanns 1992] and [Schmid, Thomanek 1993] use spatio-temporal models for tracking objects. 2. THE CONCEPT OF OBSTACLE RECOGNITION An object-oriented concept for dynamic robot vision has been proposed by [Graefe 1989; 1992]. It is based on the notion that the world is structured, rather than monolithic, and consists of individual objects. Only a few physical objects in a robot's environment affect the control of the robot in any instant. Following this concept, the task of object recognition should, therefore, be structured according to the relevant objects. This is done by associating one individual object process within the vision system with each relevant external object. The following object processes are needed for recognizing obstacles on roads (Figure 1): 9 Recognition and tracking of the road (because only objects that are actually sizuated on the road are relevant as obstacles) 9 Detection of obstacle candidates (the corresponding object is either a hypothetical object that has not yet been found in the image, or an object that has just been detected) 9 Tracking of an obstacle; multiple copies of this process may exist in the system to allow multiple obstacles to be tracked simultaneously.
75
Video section
V Road recognition
V
0otection II tr. ,o0
V Obstacle
tracking
L _ _ '_--'-'-- . . . . . _; Situation assessment
Vehicle control
Figure 1 Structure of an object-oriented obstacle recognition system Each object process receives image data (pixels) from the video section of the vision system and continuously generates a description of its assigned object. All object descriptions are evaluated by the situation assessment module and combined into a situation description. Depending on the perceived situation, vehicle control commands or messages for the human operator may be issued. Step by step the information contained in the pixels is transformed into more and more abstract forms. Starting with a large number of pixels a chain of data fusion processes eventually leads to a description of an object or a situation. At each step specific knowledge in the form of appropriate models is introduced into the process (Table 1). At the first step, the feature extraction, groups of pixels are combined into features, such as edges or corners. Knowledge about characteristics of the camera and of the features being searched is used in this process. Feature extraction may be supported by the next higher level in the processing hierarchy, the 2-D object level. Here, knowledge about the visual appearance of specific physical objects is used to generate hypotheses regarding the nature of the object being seen and expectations as to which of its features should be visible, and where in the image they should show up. Feature extraction is thus not a form of schematic image processing, but part of a model- and knowledge-based hypothesis verification or falsification process; its task is to provide answers to very specific questions as to the presence and location of certain expected features in the image. A very robust image interpretation may be realized by such an interaction between the feature extraction and the 2-D object levels. It is possible to add additional 3-D or 4-D (spatio-temporal) levels if a description of the 3-D shape of physical objects, or of their motions, is desired.
Table 1 Levels of Modeling and Data Fusion

Model Level                  Data Fusion Performed at that Level
feature                      groups of pixels → elementary features;
                             elementary features → complex features
2-D object                   groups of features → 2-D objects
4-D object                   2-D objects, time, sensor data → 4-D objects
environment and situation    4-D objects, robot state, external objects → situation
3. THE FALSE-ALARM PROBLEM
For reasons of safety an obstacle detection system must be perfectly reliable in the sense that it never fails to detect any obstacle that is actually present. Moreover, in order to allow a timely reaction, obstacles must be detected while they are still far away. These requirements make it practically unavoidable to generate false alarms occasionally, caused, for instance, by shadows, dirt, markings or patches on the road surface. Although false alarms are not directly dangerous, they must be processed and, thus, constitute an unnecessary load for the vision system. To minimize this load, false alarms should be identified and discarded as quickly as possible. Distinguishing between actual obstacles and shadows is easy in many cases, but sometimes it can be extremely difficult. We use a set of heuristics to reject most shadows within the detector module as quickly as possible. All alarms reported by the detection module are then classified by matching the visible features that have been detected in the image against generic 2-D models of obstacles. If a fairly good match is possible, the obstacle candidate is accepted as an actual obstacle. This decision is final. If, on the other hand, the detected features do not match the expectations derived from the model, the obstacle candidate is considered a possible false alarm. A final rejection, however, does not occur at this point. Rather, the detector is requested to report the same obstacle candidate again if it can be detected in one of the subsequent images. Due to the motion of the vehicle the subsequent images will differ from the previous ones with respect to perspective, specular reflections, occlusions, etc., and, accordingly, provide an independent chance to recognize an actual obstacle if there is one. A final rejection thus occurs only if an obstacle candidate has been classified many times and has not been accepted even once.
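The accept/re-examine/reject logic just described can be summarized as a small decision routine. The following C sketch is illustrative only: the match-quality score, the threshold values, and the names are assumptions, not taken from the system described here.

enum verdict { ACCEPT_OBSTACLE, KEEP_WATCHING, REJECT_FALSE_ALARM };

/* Sketch of the candidate classification logic described above. */
enum verdict classify_candidate(double match_quality, int times_classified)
{
    const double good_match = 0.8;  /* assumed acceptance threshold */
    const int    max_trials = 10;   /* assumed meaning of "many times" */

    if (match_quality >= good_match)
        return ACCEPT_OBSTACLE;       /* final: candidate is a real obstacle */
    if (times_classified < max_trials)
        return KEEP_WATCHING;         /* ask the detector to report it again */
    return REJECT_FALSE_ALARM;        /* never accepted despite many attempts */
}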
If the time required for one cycle of detection and rejection is short, many independent images may be analyzed while approaching an obstacle candidate. The probability of rejecting an actual obstacle is then very small. On the other hand, it is possible that once in a while a false alarm is accepted as an obstacle. Such false alarms must be rejected during tracking.
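The paper does not quantify this, but the independence argument is easy to make concrete: if each of $n$ independently analyzed images fails to confirm an actual obstacle with probability $p$, the probability of a final, erroneous rejection is

P(\text{final rejection of an actual obstacle}) = p^{n}.

For example (numbers chosen purely for illustration), even a per-image miss rate of $p = 0.3$ gives $p^{10} \approx 6 \times 10^{-6}$ after ten classification attempts.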
4. GENERIC 2-D OBJECT MODELS

4.1 Effects of Distance on the Visual Appearance
As Figure 2 shows, the visual appearance of an object depends strongly on the distance from which it is seen. Moreover, for a given camera and object, the area of the object's image is inversely proportional to the square of the distance. Images of nearby objects are clear, with large contrast and many visible details. Many features may, therefore, be detected in such an image. A lateral motion of the object causes a significant motion of its image. Images of distant objects have a low contrast and do not display much internal structure. Their contours tend to be somewhat blurred. Because of their small size and low contrast, hardly any internal features can be detected. Rotation of the camera is the dominant cause of motions of objects in the image. In an intermediate range of distances the contours of objects tend to be clearly visible, but the object images do not have much internal structure. Motions of the camera and of the objects have similar effects on the motions of the image. Because of these differences, specific models are used to model the appearance of physical objects, depending on their distance. The models include a number of free parameters in order to let them match objects of different shapes. Therefore, they are called generic object models.
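In pinhole-camera terms (a standard relation, not spelled out above): an object of frontal area $A$ at distance $d$, imaged through a lens of focal length $f$, covers an image area of approximately

A_{\mathrm{image}} \approx A \, \frac{f^{2}}{d^{2}},

so doubling the distance reduces the image area by a factor of four.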
4.2 The 2-D Model for the Near Range
Most of the obstacles that exist on freeways display a nearly rectangular contour with vertical and horizontal sides when seen from an approaching vehicle. Therefore, parts of rectangular contours are searched for in the image.
Figure 2 Different 2-D object models (top) take into account the significant differences in the appearance of nearby (left) and far away (right) objects.
Characteristic points on the contour that may be used for an object description are the corners or the centers of the sides of the rectangle. The upper part of an obstacle is often seen before a structured background, such as trees, mountains, or other vehicles; this makes it difficult to locate the upper part of the contour in the image. Usually numerous
horizontal edges are visible, and it is not easy to determine which one is the upper edge of an obstacle. The lower part of the contour is usually seen before the surface of the road and may, therefore, be detected and tracked more easily. Recognizing only the left, right, and lower edges is sufficient for tracking the object in the image and for measuring its distance. Therefore, only these edges are looked for in the image. Nevertheless, the large number of features that are often visible in the images of nearby objects makes it difficult to single out those belonging to the contour of the object. A statistical approach was chosen to solve this problem. The image of a typical object contains a number of homogeneous areas, delimited by approximately vertical edges. Many of these homogeneous areas begin, or end, near the left and right contours of the object. In an area of interest that encloses the image of the object candidate, the left and right ends of homogeneous areas, and their centers, are counted in each image row. Local maxima of the distribution of these features indicate a possible location of the object contour and its center line. Combining these yields strong expectations as to where the contours of the object should be found in the image. Vertical edge detectors based on controlled correlation [Kuhnert 1986] are then applied in the vicinity of the expected contours to find the exact locations of the contours. Generally, the lower edge of an object is distinctly visible. In the case of vehicles, the lower edge of the shadow underneath the vehicle is taken as the lower edge of the object. It may usually be detected reliably and robustly because the shadow appears much darker than the pavement.
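The statistical step just described can be sketched in C. Everything here is illustrative: the area-of-interest size, the homogeneity threshold, and the names are assumptions, not taken from the original system.

#include <stdlib.h>

#define AOI_W 64
#define AOI_H 48
#define HOMOG_STEP 6  /* assumed maximum grey-level step inside a run */

/* Count, per image column, the left ends, right ends, and centers of the
   homogeneous runs found in each row of the area of interest. */
void contour_histograms(const unsigned char img[AOI_H][AOI_W],
                        int left_ends[AOI_W], int right_ends[AOI_W],
                        int centers[AOI_W])
{
    int x, y;
    for (x = 0; x < AOI_W; x++)
        left_ends[x] = right_ends[x] = centers[x] = 0;

    for (y = 0; y < AOI_H; y++) {
        int run_start = 0;
        for (x = 1; x <= AOI_W; x++) {
            int run_ends = (x == AOI_W) ||
                (abs((int)img[y][x] - (int)img[y][x-1]) > HOMOG_STEP);
            if (run_ends) {                     /* run covers [run_start, x-1] */
                left_ends[run_start]++;
                right_ends[x - 1]++;
                centers[(run_start + x - 1) / 2]++;
                run_start = x;
            }
        }
    }
}

Local maxima of left_ends and right_ends then indicate candidate side contours, and maxima of centers indicate the object's center line, which is where the vertical edge detectors are subsequently applied.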
4.3 The 2-D Model for Distant Objects

Images of distant objects are typically small and of low contrast (Figure 2). Only a few features may be extracted; therefore, a statistical approach is not feasible for recognizing such objects. The left, right, and lower edges are detected directly by applying suitable edge detectors. Due to the large image motions induced by unavoidable camera motions, the tracking of distant objects would not be robust if it were based only upon a few local edge elements. Using additional features, such as corners and symmetry, improves the robustness. In contrast to the near range, where, due to the high spatial resolution of the image, object contours rarely contain any sharp corners, corners may be detected robustly in the far range. Specific corner masks are used for detecting the lower left and right corners of objects by the method of controlled correlation. The images of many distant objects, including the rear views of vehicles, have a vertical symmetry axis. Symmetry is, therefore, included as an additional feature in the object model. In the near range symmetry would be difficult to detect because of the many visible details, and finding a symmetry axis would be time-consuming due to the large areas in the image that must be processed. In the far range, where object images are small and do not contain many features, symmetry detection is fast and robust.

5. DETECTING VERTICAL SYMMETRY AXES

Numerous methods for detecting and exploiting symmetry have been proposed. If object contours have been determined and converted into B-splines, symmetries may be extracted according to [Saint-Marc, Medioni 1990]. A method for detecting symmetries of polyhedral objects has been described by [Jiang, Bunke 1990]. Shape recognition based on symmetry was introduced by [Yuen 1990]. [Friedberg 1986] determined symmetry by computing a moment matrix. A known
symmetrical object may be detected by convolving its image with a mirror image of a model; after detection it may be localized by convolving it with its own mirror image.
Figure 3 Pairs of pixels, centered around a pixel, x, are evaluated. For each pair the difference of the grey-levels is computed. The absolute values of the differences are added up. Their sum is the symmetry function at the center pixel x.
All these methods presuppose a good segmentation of the image (e.g. into object and background) or have been tested only on synthetic images or line drawings. Moreover, they require computing times of tens of seconds.
Methods that are suitable for real-time and real-world applications have also been proposed. [Zielke et al. 1992] determine for each pixel a symmetry measure, based on the even and the odd part of the symmetry function. A symmetry-based statistical approach has been used by [Kühnle 1991] to determine the center of the image of a car as seen from the rear. Histograms of symmetrical contour pixels, of symmetrical grey-level pixels, and of the lengths of horizontal edges are computed; local maxima in all three histograms indicate candidates for a vertical symmetry axis [Marola 1989]. In the sequel a method is described that detects individual symmetry points and then determines image columns containing a maximum number of symmetry points. For finding a symmetry point the even part of the symmetry function is determined. To accomplish this, the similarity of grey levels of pairs of pixels, located symmetrically to the left and to the right of a point in question, is studied (Figure 3). The absolute values of the differences of the grey levels of each pair of pixels are added up and assigned to the pixel in question. A symmetry function is obtained by repeating this procedure for all pixels within an area of interest. Local minima within a row of the symmetry function indicate symmetry points; in Figure 4 the symmetry points have been marked. Figure 5 shows the corresponding symmetry function, computed over a 40 pixel by 40 pixel area of interest around the vehicle. When symmetry points are searched for, the width of the environment upon which the symmetry function is based should be somewhat larger than the width of the object. However, if the environment is chosen too large, many background pixels are included in the computation. A strongly asymmetrical background may then severely distort the symmetry function.
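For one image row, the computation reads as follows in C; the half-width of 6 pixels (a 12-pixel wide environment, as in Figure 4) and all names are illustrative assumptions.

#include <stdlib.h>

#define HALF_WIDTH 6  /* assumed half-width of the symmetry environment */

/* Even-part symmetry function for a single image row: for each pixel x,
   sum the absolute grey-level differences of pixel pairs placed
   symmetrically around x. */
void symmetry_row(const unsigned char *row, int row_len, int *sym)
{
    int x, k, s;
    for (x = 0; x < row_len; x++) {
        if (x < HALF_WIDTH || x >= row_len - HALF_WIDTH) {
            sym[x] = -1;              /* undefined near the row borders */
            continue;
        }
        s = 0;
        for (k = 1; k <= HALF_WIDTH; k++)
            s += abs((int)row[x - k] - (int)row[x + k]);
        sym[x] = s;    /* local minima along the row mark symmetry points */
    }
}

Repeating this over all rows of the area of interest and then counting, per column, how many rows place a symmetry point there yields the columns that are candidates for the vertical symmetry axis.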
Figure 4 Symmetry points determined from 12-pixel wide environments

Figure 5 The symmetry function in an area of interest covering the car in Figure 4; each graph corresponds to one image row.
6. EXPERIMENTAL RESULTS

6.1 Test Environment
A real-time vision system, the BVV 3 [Graefe 1990], was used for testing the obstacle recognition system. Each of the three sub-processes, road recognition, obstacle detection, and obstacle tracking, was implemented on a separate parallel processor within the BVV 3. The obstacle recognition system was designed to cooperate closely with other modules of a driver support system, for instance modules for modeling the dynamics of the recognized obstacles. Such a cooperation has been tested earlier and described by [Dickmanns, Christians 1989] and [Regensburger, Graefe 1990]. The tests reported in the sequel were, however, performed without any additional modules. The results may, therefore, be directly attributed to the modules described here. The situation assessment module (cf. Figure 1) was replaced by a message passing system that allowed communication between the recognition modules, but did not perform any additional processing.

6.2 Recognition Distance
The ability to recognize obstacles from a great distance is of critical importance for driving at high speed. Experiments have shown that cars (moving and stationary) can be tracked at distances of about 200 m; under cooperative conditions, e.g. high contrast and little camera motion, cars could be tracked at distances of over 300 m. As Figure 6 shows, a car at a distance of 300 m appears rather small in the camera image. The obstacle tracking module does not include any distance measurement. The distance from the obstacle was, therefore, measured by external means. Visible markers were placed on the pavement at known intervals. As this was impossible on a public road, the runway of a closed airfield was used for the experiments to measure the maximum recognition and tracking range.
Figure 6 Passenger car seen from a distance of 300 m through the camera of the vision system, with overlaid markings showing the detected contour and symmetry axis
6.3 Robustness
Although the obstacle recognition system was designed for freeways, due to the model-based tracking module it proved to be so robust that it was also tested on ordinary highways and even in heavy city traffic. There, other cars, as well as our own car, changed lanes frequently, causing time-varying occlusions, and, together with structured backgrounds and markings on the pavement, generated a high degree of visual complexity. Even in such difficult scenes (Figure 7) vehicles could be tracked reliably over extended periods of time.
The 2-D object models employed by the recognition system assume a nearly rectangular object contour, and they have been optimized especially for cars and trucks. Nevertheless, it could be shown that various other objects that may occasionally be seen on roads, such as boxes, barrels and, in some cases, persons, may also be classified and tracked. However, tracking of such objects that do not match the generic models well is less reliable, especially under conditions of poor lighting or when they are far away and their images are small. Increasing the repertoire of different object models in the system should further improve the system's performance in this respect.
6.4 Speed
Figure 7 Tracking a vehicle in heavy city traffic

The computing time required by a module of a real-time vision system is always of prime importance. The cycle time of the object classifier and tracker was, therefore, measured in a number of experiments. One parallel processor of the robot vision system BVV 3, using an Intel 386/20-MHz CPU augmented by a special co-processor for fast feature extraction [Graefe, Fleder 1991], was used as a hardware platform in the experiments. The basic cycle time of the vision system is the image field period of the camera's video signal, i.e. 20 ms. Thus, all cycle times are integer multiples of 20 ms. Tracking a distant object always requires less than 20 ms of computing time; the cycle time is in this case 20 ms. The time required for tracking a nearby object depends on the size of its image (see Table 2). On freeways the cycle time is usually between 40 and 60 ms. Only when the distance to the preceding vehicle is exceptionally short, e.g. in stop-and-go traffic, may a cycle time of 80 ms be observed. In such situations it would be preferable anyway to evaluate the image of a wide-angle camera instead of the telescopic image used, because the lower edge of a vehicle at such a short distance may not even be visible when imaged by the telescopic camera.
Table 2 Cycle times for tracking nearby objects

Object width      Cycle time
< 45 pixels       40 ms
45 - 90 pixels    60 ms
90 - 120 pixels   80 ms
6.5 Reliability of Classification
Parameters such as scene complexity or image quality that are essential for judging the performance of the tracking module are difficult to define quantitatively. Therefore, long image sequences covering a wide variety of situations were evaluated. A 20-min run on a freeway may be used as an example. It included 60,000 images processed in real time without any preselection. The goal of the classification of obstacle candidates is the discrimination between false alarms and actual obstacles. It should be remembered that, above all, the system must never mistakenly reject any actual obstacle as a false alarm. (The other type of misclassification, accepting a shadow as an obstacle, is much less dangerous.) Table 3 shows the results obtained during the test. All objects that were reported by the detector were correctly classified, and only a very small fraction of the system resources was wasted tracking false alarms.

Table 3 Results of object classification during a 20-min run on a freeway
Number of actual objects misclassified as false alarms      0
Percentage of false alarms that were correctly rejected
    immediately                                             67%
    within a few seconds of tracking                        33%
Number of final misclassifications                          0
Fraction of time spent tracking false alarms                1.3%
6.6 Quality of Tracking
Because the true motions of the objects tracked in natural scenes were not measured, it is difficult to determine the quality of the tracking, or the accuracy of the localization of object features, quantitatively. Figures 8 and 9 are intended to give at least a qualitative impression of the quality of tracking. The ranges of applicability of the two models, for the near range and for the far range, overlap to a large extent. This allows a direct comparison between the two models by using an object in the overlap range. In each of Figures 8 and 9, the left graph shows the image coordinates of the two sides of a car driving in front of the camera vehicle. The right graph shows the width of the car's image in pixels. The scene was recorded on video tape while driving in a car, and then evaluated in real time in the laboratory, using the model for the far range (Figure 8) and the model for the near range (Figure 9) on the same analog data. Both methods yield nearly identical results. The method for the far range yields slightly smoother curves (in both cases raw, unfiltered data are shown). This is probably due to the robust localization of the lower corners. Altogether, both curves indicate that the unfiltered measured data are consistent and free from significant outliers, in spite of the image deterioration caused by the video tape recorder. This may be taken as evidence for robust tracking.
[Figure 8 graphs: image column of the two sides of the car (left panel) and width of the object in pixels (right panel), both plotted over program cycles.]

Figure 8 Results of tracking a car in an intermediate distance range using the model for the far range (1 program cycle is equivalent to 20 ms)
[Figure 9 graphs: image column of the two sides of the car (left panel) and width of the object in pixels (right panel), both plotted over program cycles.]

Figure 9 Results of tracking a car in an intermediate distance range using the model for the near range (1 program cycle is, on average, approximately equivalent to 40 ms)
7. SUMMARY

As an essential part of a driver support system for road vehicles, a module for the robust classification and tracking of objects on roads has been realized. Obstacles are recognized in real time and from great distances (200 to 300 m), as required when driving at high speed on freeways. The obstacle recognition module was originally designed for the relatively simple scenes on freeways, but it has proven sufficiently robust to allow the tracking of moving vehicles even in heavy city traffic. The module is based on generic 2-D models optimized for the rear views of cars and trucks. However, the generic models are sufficiently general to also allow the tracking and classification of various other objects that may occasionally be encountered on roads. For a truly reliable recognition of arbitrary objects the repertoire of internal object models should be expanded.
ACKNOWLEDGEMENT Part of the work reported has been performed with support from the Ministry of Research and Technology (BMFT) and from the Daimler-Benz AG within the EUREKA project PROMETHEUS.
REFERENCES

Alizon, J.; Gallice, J.; Trassoudaine, L.; Treuillet, S. (1990): Multi-sensory data fusion for obstacle detection and tracking on motorway. In Monique Thonnat (Ed.): Proceedings of the Workshop on Vision (Prometheus, Pro-Art). Sophia Antipolis, pp 179-187.
Carlsson, S.; Eklundh, J.-O. (1990): Object detection using model based prediction and motion parallax. In Monique Thonnat (Ed.): Proc. of the Workshop on Vision (Prometheus, Pro-Art). Sophia Antipolis, pp 139-145.

Davies, E.R. (1990): Machine Vision: Theory, Algorithms, Practicalities. London: Academic Press, 1990.

Dickmanns, E.D.; Christians, Th. (1989): Relative 3-D State Estimation for Autonomous Visual Guidance of Road Vehicles. Proceedings, Intelligent Autonomous Systems. Amsterdam.

Efenberger, W.; Ta, Q.-H.; Tsinas, L.; Graefe, V. (1992): Automatic Recognition of Vehicles Approaching from Behind. Proceedings of the IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 57-62.

Enkelmann, W. (1990): Obstacle detection by evaluation of optical flow fields. In Monique Thonnat (Ed.): Proceedings of the Workshop on Vision (Prometheus, Pro-Art). Sophia Antipolis, pp 146-166.

Friedberg, S.A. (1986): Finding Axes of Skewed Symmetry. Computer Vision, Graphics, and Image Processing, 34, Nr. 2. San Diego: Academic Press, pp 138-155.

Graefe, V. (1989): Dynamic Vision Systems for Autonomous Mobile Robots. Proceedings, IEEE/RSJ International Workshop on Intelligent Robots and Systems, IROS '89. Tsukuba, pp 12-23.

Graefe, V. (1990): The BVV-Family of Robot Vision Systems. In O. Kaynak (Ed.): Proceedings of the IEEE Workshop on Intelligent Motion Control. Istanbul, pp IP55-IP65.

Graefe, V. (1992): Vision for Autonomous Mobile Robots. Proceedings, IEEE Workshop on Advanced Motion Control. Nagoya, pp 57-64.

Graefe, V. (1993): Vision for Intelligent Road Vehicles. Proceedings, IEEE Symposium on Intelligent Vehicles. Tokyo, pp 135-140.

Graefe, V.; Fleder, K. (1991): A Powerful and Flexible Co-processor for Feature Extraction in a Robot Vision System. International Conference on Industrial Electronics, Control, Instrumentation and Automation (IECON '91). Kobe, pp 2019-2024.

Graefe, V.; Jacobs, U. (1991): Detection of Passing Vehicles by a Robot Car Driver. IEEE/RSJ International Workshop on Intelligent Robots and Systems, IROS '91. Osaka, pp 391-396.

Hartmann, G.; Mertsching, B. (1992): A Hierarchical Vision System. Proceedings, IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 18-23.
Jiang, X.Y.; Bunke, H. (1990): Detektion von Symmetrien polyedrischer Objekte. In Großkopf (Hrsg.): Mustererkennung 1990, 12. DAGM-Symposium, Oberkochen-Aalen, Informatik-Fachberichte 254. Berlin: Springer, pp 225-231.
Korn, A. (1989): Zur Erkennung von Bildstrukturen durch Analyse der Richtungen des Grauwertgradienten. In Burkhardt et al. (Hrsg.): Mustererkennung 1989, 11. DAGM-Symposium, Hamburg, Informatik-Fachberichte 219. Berlin: Springer, pp 507-511.

Kühnle, A. (1991): Symmetry-based recognition of vehicle rears. Pattern Recognition Letters, Volume 12, Number 4. North-Holland, pp 249-258.

Kuhnert, K.-D. (1986): A Model-driven Image Analysis System for Vehicle Guidance in Real Time. Proceedings of the Second International Electronic Image Week. CESTA, Nice, pp 216-221.

Marola, G. (1989): Using Symmetry for Detecting and Locating Objects in a Picture. Computer Vision, Graphics, and Image Processing, Vol. 46, Nr. 2. San Diego: Academic Press, pp 179-195.

Masaki, I. (Ed.) (1992): Vision-based Vehicle Guidance. New York: Springer, 1992.

Mori, H.; Nakai, M.; Chen, H. (1989): A Mobile Robot Strategy: Stereotyped Motion by Sign Pattern. Preprint of the 5th International Symposium of Robotics Research. Tokyo, pp 225-236.

Raboisson, S.; Even, G. (1990): Road Extraction and Obstacle Detection in Highway Environment. 2nd Prometheus Workshop on Collision Avoidance (CED 3). Coventry, pp 74-83.

Regensburger, U.; Graefe, V. (1990): Object Classification for Obstacle Avoidance. Proceedings, SPIE Symposium on Advances in Intelligent Systems. Boston, pp 112-119.

Saint-Marc, P.; Medioni, G. (1990): B-Spline Contour Representation and Symmetry Detection. In G. Goos, J. Hartmanis (Eds.): Computer Vision - ECCV '90, Lecture Notes in Computer Science, Vol. 427. Berlin: Springer, pp 604-606.

Seitz, P.; Lang, G.K.; Güliard, B.; Pandazis, J.C. (1991): The robust recognition of traffic signs from a moving car. In Radig (Hrsg.): Mustererkennung 1991, 13. DAGM-Symposium, München, Informatik-Fachberichte 290. Berlin: Springer, pp 287-294.

Schmid, M.; Thomanek, F. (1992): Real-time Detection and Recognition of Vehicles for an Autonomous Guidance and Control System. Pattern Recognition and Image Analysis, Vol. 3, pp 337-380.

Solder, U. (1992): Echtzeitfähige Entdeckung von Objekten in der weiten Vorausschau eines Straßenfahrzeugs. Dissertation, Fakultät für Luft- und Raumfahrttechnik der Universität der Bundeswehr München.

Solder, U.; Graefe, V. (1993): Visual Detection of Distant Objects. IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS '93. Yokohama, pp 1042-1049.

Thomanek, F.; Dickmanns, D. (1992): Obstacle Detection, Tracking and State Estimation for Autonomous Road Vehicle Guidance. Proc. 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS '92. Raleigh, pp 1399-1406.

Tsinas, L.; Graefe, V. (1992): Automatic Recognition of Lanes for Highway Driving. IFAC Conference on Motion Control for Intelligent Automation. Perugia, pp 295-300.
86 Wershofen, K. P. (1992): Real-time Road Scene Classification Based on a Multiple-lane Tracker. Proceedings, International Conference on Industrial Electronics, Control, Instrumentation and Automation (IECON '93). San Diego, pp 746-751. Young, E.; Tribe, R.; Conlong, R. (1992): Improved Obstacle Detection by Sensor Fusion. IEE Colloquium on "Prometheus and Drive", Savoy Place.
Young, G.-S.; Hong, T.-H.; Herman, M.; Yang, J.C.S. (1992): Obstacle Detection for a Vehicle Using Optical Flow. Proc. IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 24-29.

Yuen, Sh.-Y.K. (1990): Shape from Contour Using Symmetries. In G. Goos, J. Hartmanis (Eds.): Computer Vision - ECCV '90, Lecture Notes in Computer Science, Vol. 427. Berlin: Springer, pp 437-453.

Zielke, T.; Brauckmann, M.; v. Seelen, W. (1992): Intensity and Edge-Based Symmetry Detection Applied to Car-Following. In G. Sandini (Ed.): Computer Vision - ECCV '92, Lecture Notes in Computer Science, Vol. 558. Berlin: Springer, pp 865-873.
Visual Collision Avoidance by Segmentation

Ian Horswill
MIT Artificial Intelligence Laboratory, Cambridge, MA 02139 USA

Visual collision avoidance involves two difficult subproblems: obstacle recognition and depth measurement. We present a class of algorithms that use particularly simple methods for each subproblem and derive a set of sufficient conditions for their proper functioning based on a set of idealizations. We then discuss and compare different implementations of the approach and discuss their performance. Finally, we experimentally validate the idealizations.
1 Introduction
One of the major difficulties of visual obstacle avoidance is that obstacles can seemingly have arbitrary appearance. They can be tall or fat, plain or textured, rigid or flexible. Different algorithms can be seen as embodying different assumptions about the appearance of obstacles. Moravec's Stanford Cart [11] can be seen as assuming that obstacles (1) project significantly from the ground plane and (2) are densely covered with visual corners. Kriegman, Triendl and Binford's Mobi system [9] can be seen as assuming that obstacles (1) have at least a certain height and (2) have vertical edges around their perimeters. The stereo algorithm of Storjohann et al. [13] is similar, but does not require the presence of vertical edges, only some sort of texture. The motion-based systems of Coombs and Roberts [2] and of Sandini et al. [12] also make these assumptions. Each system is compatible with a particular class of obstacles and incompatible with another. For example, none of these systems would know to avoid an oil slick, a patch of ice, or a delicate cable lying on the floor, because none of these objects would project sufficiently from the floor to be geometrically distinguishable from the floor.

The other major difficulty with visual collision avoidance is depth recovery. Both the Stanford Cart and Mobi use stereo triangulation to determine position. Triangulation involves computationally expensive correspondence calculations and generally requires the use of well-calibrated cameras.

In this paper, we will discuss a class of algorithms based on assumptions about the appearance of the background. Simply put, the method is as follows: take an image, use some criterion to discard pixels that look like the floor, and avoid driving toward the remaining pixels. Our current implementations are designed to operate in environments with textureless carpeted floors. In such environments, the presence of an edge is a cue to the presence of an obstacle, so the robot need only steer to avoid image edges. The algorithm uses 2D image position to determine obstacle position: higher edges are assumed to be farther away. The algorithm has several advantages:

- All computations can be performed on 2D rasters.
- The algorithm requires very little computation.
- The algorithm is very conservative, preferring false positives to false negatives.

We have implemented the algorithm on five different robots here at MIT. The algorithm has also been implemented and tested in other labs [5][14]. The algorithm is quite robust and has seen hundreds of hours of testing in many different environments.
Figure 1: Polly (left), the first generation vision-based robot, Frankie (center), the second generation, and Elvis (right), the third generation.

The next section describes the edge-based algorithm. Section 3 shows that the algorithm's behavior is equivalent to the behavior of a system with a perfect depth map, provided that a set of assumptions about the environment and the robot dynamics are valid. Section 4 gives the details of several implementations done here at MIT. Section 5 discusses their performance and failure modes and concludes that the assumptions about environment and vehicle dynamics are accurate enough for the purposes of the analysis. Finally, section 6 provides conclusions.
2 Algorithm
We assume a forward-looking camera. Let I(t) be the image at time t and let E(.) be some edge detection operator. We define the bottom projection ("bprojection") of a binary image J to be a vector b(J), indexed by the x (horizontal) coordinate, whose x'th element is the height of the lowest marked pixel in the x'th column of J:
b_x(J) = \min \{ y : J(x, y) = 1 \}
It will be shown below that, under the right conditions, b(E(I)) is a radial depth map: a mapping from direction to the amount of freespace in that direction. Given the radial depth map b(E(I(t))), we define the left, right, and center freespaces to be the distances to the closest objects to the left, right, and center:
l(t) = \min_{x \le x_c} b_x(E(I(t)))

r(t) = \min_{x \ge x_c} b_x(E(I(t)))          (1)

c(t) = \min_{|x - x_c| \le w} b_x(E(I(t)))
where x_c is the x coordinate of the center column of the image, and w is a width parameter which in our implementations has been 1/4 the width of the image. The exact choices of the ranges over which the minima are taken are not important. Finally, we define a simple reactive control system for a holonomic base:
\frac{d\theta}{dt}(t) = c_\omega \, (l(t) - r(t))          (2)

v(t) = c_v \, (c(t) - d_{\min})          (3)
where (dθ/dt)(t) is the rotational velocity of the robot, v(t) is its translational velocity, d_min is the closest that the robot should ever come to an obstacle, and c_ω, c_v are user-defined gains. The algorithm turns in the direction of greater freespace at a rate proportional to the difference in freespace on the two sides. It drives forward if the distance to the closest object ahead is greater than d_min, otherwise it drives backward. Of course, real systems will implement velocity caps (maximum velocities), which introduce min and max terms to the definitions above. We have left these out for simplicity.
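Read directly as code, eqs. (2) and (3) with the velocity caps made explicit might look as follows. This is a sketch only: the gains, caps, and d_min value are invented for illustration; the actual on-robot code appears in the appendix.

/* Sketch of the reactive controller of eqs. (2)-(3), with velocity caps. */
static double clamp(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

void control_step(double l, double r, double c,
                  double *dtheta_dt, double *v)
{
    const double c_omega = 0.5;    /* assumed rotational gain */
    const double c_v     = 0.4;    /* assumed translational gain */
    const double d_min   = 10.0;   /* assumed minimum obstacle distance */
    const double w_cap   = 1.0;    /* assumed cap on |dtheta/dt| */
    const double v_cap   = 2.0;    /* assumed cap on |v| */

    *dtheta_dt = clamp(c_omega * (l - r), -w_cap, w_cap);   /* eq. (2) */
    *v         = clamp(c_v * (c - d_min), -v_cap, v_cap);   /* eq. (3) */
}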
3 Analysis
The discussion above leaves open a number of questions: in what environments will this algorithm work? What types of edge detectors can be used for E? Do the gains matter? In this section, we will examine these questions by comparing the algorithm to one which uses the same velocity equations (eq. 2), but which computes the distances l(t), r(t) and c(t) using a perfect oracle.¹ Space precludes a formal presentation in full detail. The interested reader is referred to the proofs of lemma 3 and theorem 2 in [7]. We can represent the perfect-information system schematically as follows:

image -> oracle -> l, r, c -> velocity eqs -> dθ/dt, v
An image is fed to an oracle that somehow determines for each direction the distance to the closest object in that direction. The l, r and c values are computed from the resulting depth map and these values are passed to the velocity equations (eq. 2), which control the motors. Suppose that we were to warp the oracle's distance estimates by some strictly increasing function f:

image -> oracle -> f -> l, r, c -> velocity eqs -> dθ/dt, v
Because f is strictly increasing, this transformation will preserve the signs of both dθ/dt and v. If we model the rotation and translation subsystems as first-order, zero-delay systems, and if we can treat them as approximately separable (see section 5), then each subsystem will always converge to the same angle/distance, regardless of our choice of f. Only the rate of convergence will be affected. To say it more abstractly, if we treat the robot-environment combination as a dynamical system (whose state space is the configuration space), then the dynamical system will have the same attractors, repellors, and fixed points, regardless of the choice of f. Thus, the qualitative behavior of the system is invariant with respect to the introduction of a strictly increasing function. The practical import of this is that we do not need to use a calibrated depth measure; any monotonic measure will do, so long as we adjust d_min accordingly. Furthermore, the system should be robust in the presence of variations in sensor parameters.

¹An oracle is a theoretical device generally used to prove that some function is not Turing computable. It is assumed to correctly compute the value of some function whether or not that function is formally computable. Oracles are not intended to be physically realizable. We use the term here simply to mean a system that somehow correctly computes the right answer.
Now suppose we have an oracle that can identify obstacle pixels in the image: its input is an image, and its output is a binary image whose pixels are 1 iff the corresponding pixels in the input were projected from an obstacle. Call this the "figure/ground" image. It has been known at least since Euclid (see Cutting [3]) that for points projected from a ground plane, image plane height is a strictly increasing function of distance. This result can be extended to show ([7], theorem 2) that in the right environments, the bottom projection of the figure/ground image is equal to the radial depth map, modulo a strictly increasing function. Therefore we can substitute the figure/ground oracle for the distance oracle:

image -> F/G oracle -> bproj -> l, r, c -> vel. eqs
without altering the behavior of the system, provided that the ground-plane constraint (GPC) is true of the robot's environment. The GPC requires that all obstacles rest on the ground plane and that their projections be contained within the interiors of their regions of contact with the ground plane (see [7]). To complete the transformation of the distance oracle algorithm into the section 2 algorithm, we need only replace the figure/ground oracle with an edge detector. We would like to justify this substitution and to know what constraints there are on the choice of edge detector. Suppose the surface markings of the ground plane (e.g. the carpet) consist of a high frequency component and a constant (DC) component. Let ω be the lowest spatial frequency in the high frequency component. If the image plane is parallel to the ground plane, looking down at the ground plane from a distance d, then the ω component will appear on the image plane with frequency
\omega' = \frac{\omega d}{f}
where f is the lens focal length. Increasing d or changing the camera orientation can only increase this frequency. If we apply a low-pass filter with cutoff ω' to the image, only the DC component will be preserved. Since an edge detector is effectively a band-pass filter, it should not respond to regions of the image projected from the ground plane, provided that its high cutoff is less than ω'.
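As an illustrative numeric check (all values are assumptions, none are from the paper): carpet texture whose lowest non-DC spatial frequency is ω = 200 cycles/m, viewed from d = 1 m through a lens of focal length f = 8 mm, appears on the image plane at

\omega' = \frac{\omega d}{f} = \frac{200 \cdot 1}{0.008} = 25\,000 \ \text{cycles/m} = 25 \ \text{cycles/mm},

far into the high-frequency range that a low-pass (smoothing) stage removes, so only the carpet's DC component reaches the edge detector.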
Thus we can substitute the edge detector for the F/G oracle:

image -> edge det. -> bproj -> l, r, c -> vel. eqs
provided that the background texture constraint (BTC) holds: that the environment is uniformly illuminated, that the ground plane's surface markings have no frequency components lower than ω other than DC, and that the scene is viewed from a distance of at least d.² The derivation of our algorithm from the oracle algorithm tells us that we can expect them to behave equivalently provided that the GPC and BTC hold of the environment, and that the robot base can be approximated as two non-interacting first-order systems. We will return to these assumptions shortly. The derivation also formalizes the computational value of the individual assumptions: the ground plane constraint allows a particular formal simplification of the depth estimation problem, and the BTC allows a particular formal simplification of the figure/ground problem.
²This substitution step is not strictly valid, since it assumes that the edge detector fires for all the object pixels. In real life, the edge detector will only fire for the outline of the object and part of its interior. To be strictly valid, the substitution step should replace the F/G oracle with an edge detector combined with a fill algorithm (the fill would mark the object pixels missed by the edge detector). However, it is easy to show that the bprojection is invariant w.r.t. the fill operation, so it can be safely omitted.
Robot     Speed    Processor   Frames/s   Cost      Power   Weight
Polly     1 m/s    TMS320C30   15         $11,000   <40 W   <15 kg
Gopher    0.2 m/s  MC68332     5-10       $700      <3 W    2 kg
Frankie   1 m/s    MC68332     5-10       $700      <3 W    2 kg
Wilt      1 m/s    MC68332     5-10       $700      <3 W    2 kg
Elvis     2 m/s    TMS320C31   15         $1,000    12 W    6 kg
Table 1: MIT implementations of the algorithm. Cost, weight and power refer only to the vision system, not to the base or power supply. The frame rates of the 68332-based systems are variable because of varying CCD integration times.
4 Implementations

To date, we are aware of several implementations of the algorithm that are presently in use. Most have been done at MIT (see table 1), although we know of two others that have been implemented by outside labs [5][14].
Polly

The first implementation was part of a vision-based mobile robot called Polly (Horswill [6][7]). Polly used vision to seek out visitors and give them primitive tours of our laboratory. Polly processed images at 15 Hz using an on-board, off-the-shelf DSP card and frame grabber mounted on an RWI B12 holonomic base. The complete robot was built for less than $20K US. Polly uses several different algorithms in parallel. The motor outputs of the collision avoidance algorithm discussed here are summed with the outputs of a simplified version of Bellutta's vanishing-point algorithm [1] for corridor following. All the MIT implementations, except for Elvis and Gopher, run at roughly 1 m/s. Higher-level navigation was performed by separate processes that attempt to recognize landmarks and choose paths. The details of these other algorithms are outside the scope of this paper. See [7] for further information.
Frankie and Wilt

Frankie and Wilt are newer implementations based on new-generation hardware. Polly's major reliability problem was that its off-the-shelf hardware was not intended to run on batteries, nor was it intended to be knocked around and shaken on board a mobile robot. Frankie and Wilt, by contrast, use only a small 68020-class microcontroller board designed to run on batteries. The board is much more robust, both mechanically and electrically, than Polly's processor. Image acquisition is performed by directly interfacing a CCD image sensor to the processor bus by way of an A/D converter (see Gavin and Yamamoto [4], Horswill and Yamamoto [8]). The CCD is mounted on a pan/tilt head built from Futaba model airplane servo motors. The complete vision system costs less than $700 US (parts cost). The collision avoidance algorithm was modified to take advantage of the pan/tilt head: instead of using the center distance to control
the rate of translation, the center distance is used to tilt the head up or down so as to keep the nearest obstacle centered in the image. The translate velocity is then based on the tilt angle of the camera. This increases the effective field of view of the system and helps compensate for various mechanical deficiencies (see below). The Polly, Frankie and Wilt implementations work by subsampling the image to 64 x 48 pixels, and then smoothing the result with a 3 x 3 filter. Edge detection is done by computing the sum of the absolute differences of each pixel with its upper and right neighbors; a pixel is marked as an edge if the sum is greater than a threshold (15 grey levels out of a possible 510 on Polly, 22 grey levels for Frankie and Wilt, which have more camera noise). The Polly system is described in detail in [7]. The important parts of the code for the Frankie system are given in the appendix.

Elvis
The Frankie and Wilt implementations were more robust and were dramatically cheaper; however, their 1-2 MIP processors severely limited the processing they could perform. Elvis, our third-generation robot, is based on much higher-performance (50 MFLOP) hardware that is still relatively inexpensive ($1,000 parts cost). Because Elvis can use a higher frame rate and has a lower center of gravity, it can run dramatically faster than the other implementations. We have reliably achieved speeds of up to 2 m/s. The Elvis vision hardware, designed by Chris Barnhart, consists of a TMS320C31 floating-point DSP with 128 KW of RAM, a memory-mapped NTSC color frame grabber, and a 24-bit memory-mapped video display for debugging. We hope to use the color capabilities of the machine to simplify place recognition.
5 Evaluation
The algorithm has been thoroughly tested, having seen many hundreds of hours of service. Polly has given over 100 tours and has patrolled the lab for periods of up to two hours before exhausting its motor batteries. Wilt, while more limited in functionality, is light enough and robust enough to permit much greater testing. It is reliable (and safe) enough that researchers regularly bring their children to play with it. The researchers often demo the system themselves, even though they did not design it and do not know how it works. Frankie and Wilt have also been demo'ed off site by researchers unfamiliar with their workings. The algorithm is dramatically more reliable than the sonar-based systems running in the lab, which are very sensitive to obstacle geometry. In all, the various implementations of the algorithm have been tested in a few dozen different environments:

- Various floors of our lab (see below).
- Various classrooms and auditoriums of MIT.
- The AAAI-93 convention center, exhibition halls, and hotel.
- The Brown University Computer Science Department.
- West Point Military Academy.
- The New Hampshire offices of DEC.
- A local elementary school.
5.1 Environmental approximations
The analysis of section 3 showed that the behavior of our algorithm is equivalent to the behavior of a system with a perfect distance oracle, provided that the environment satisfied certain properties: that all obstacles rest on a ground plane and have a certain type of shape (the GPC), and that the ground plane was textureless and uniformly lit (the BTC). These are idealizations; real environments contain shadows, real carpets have stains. The usefulness of the algorithm is determined by the range of environments which satisfy these properties well enough to allow the system to work. The ground-plane constraint has turned out to be mostly non-problematic, even though many objects, such as chairs and tables, violate its shape restrictions. It is particularly unproblematic for Frankie because it tilts its head downward as it approaches an obstacle. In the limit, when the head is pointed straight down, the equivalence of the bprojection to a depth map can be shown to be independent of object shape. Thus the active head actually makes the system somewhat more robust. Frankie's biggest problem is that its head happens to be exactly at desktop level. As Frankie approaches a desk and looks downward to keep the desk legs in view, the desktop goes out of view and Frankie hits itself in the head. The background-texture constraint is more complicated. All floor surfaces in our building satisfy its surface marking requirements, but the basement and top floor both have shiny tile floors that reflect images of the overhead lights like mirrors. The robot treats these reflections as obstacles and tries to avoid them. Depending on room geometry, the robot may successfully navigate or may back away from the lights until it collides with something in the rear (Frankie can't see backward). Another problem occurs when a room has two different carpets. Even though each carpet satisfies the BTC, the boundary between them looks like an obstacle to the robot. Polly uses a separate algorithm to identify specific carpet boundaries and override the edge detector, but we hope to handle the situation more intelligently on Frankie. Shadows have not been a problem in our environment, although scenes which are illuminated both by sunlight and by room light often have illumination gradients strong enough to trigger the edge detector. Such scenes also cause the automatic gain control of the camera to fail. Although such experiments are purely anecdotal, we have also run the robots in a number of off-site environments. Frankie has been run successfully at a number of sites on campus: classrooms, the main auditorium, and the so-called "infinite corridor." One problem in auditorium environments is that the overhead lights are often point-sources which generate strong shadows. These can sometimes confuse the robot. Polly has been run successfully in those parts of the Brown University computer science building that were well enough lit for the video camera to operate. Frankie has also been taken off-site by other researchers to give demos. A smaller robot using an early version of the Frankie vision hardware was run on-site at last year's AAAI. The robot suffered many hardware problems and did not perform at all well in the office environment that had been carefully engineered for the sonar-based robots of the AAAI robot contest. However, it was the only robot that could function effectively in the real lounge area, the sessions, or the conference hotel.
5.2 Dynamic approximations

Our analysis assumed that the translation and rotation systems of the robot are independent and that they are both first-order dynamic systems. In real life, the two systems are coupled: rotations in response to the l and r distances change the c distance, while translations in response to c can change the l and r. This coupling raises the possibility of oscillation. Fortunately, we have not experienced this problem in practice because the robot is almost always either blocked or in rotational equilibrium: it tends either to translate or rotate, but rarely both. Thus the coupling between rotation and translation rarely shows itself. While most modern robot bases have good velocity control mechanisms, they can only generate finite accelerations, so the first-order approximation is also only partly accurate. Another limitation of our robots is that their centers of gravity are high enough for them to tip over when the robots stop suddenly. Deceleration limits cause an interesting problem: if the robot fails to detect a faint obstacle until it is nearby, the robot may not have time to stop fully before the obstacle goes out of view. Since the algorithm is purely reactive, it forgets the out-of-view obstacle and accelerates into it. Any reactive control system will have this problem, whether it is based on a first-order model or not, so the problem does not really lie with our analysis. On Frankie, the problem is greatly reduced because it can compensate with its pan/tilt head. The low-mass head is not subject to the same acceleration restrictions as the body, so Frankie can always keep the object in view. Frankie is also drastically lighter and has a lower center of gravity, so its dynamic problems are less to begin with. Elvis, being still lighter and lower to the ground, can safely run at up to 2 m/s and yet stop without falling.
6 Conclusions
We have presented a simple collision avoidance algorithm and an analysis of its behavior based on a set of assumptions regarding the environment and the robot's dynamics. We also presented various implementations of the algorithm and evaluated their performance and the empirical validity of the assumptions used in the analysis. Experiments show that a surprisingly wide range of environments satisfy the constraints closely enough to allow the algorithm to work effectively and that the real robot dynamics generally match the approximations used in the analysis. Thus the environment and dynamic models, while imperfect, are sufficient for the analysis. The algorithm is simple, conservative, extremely efficient, and quite reliable within our local environment. In our environment, it is far more reliable than the sonar-based methods. The algorithm belongs to a general class of algorithms that exploit the fact that it is often easier to recognize the background than to recognize an obstacle. Depending on the context, a robot might use any number of cues to distinguish the background from the obstacles. The algorithm presented here uses an edge detector, and therefore relies on an assumption about the surface markings of the background. One can easily imagine other algorithms using other cues (e.g. color or scale) in other contexts. To change the cue, one need only change the final substitution in the analysis given in section 3. For example, one could substitute a color-based identification system (e.g. Swain [15]) for the edge detector:
image -> color-based F/G classifier -> bproj -> l, r, c -> vel. eqs
if the BTC didn't hold of the environment but the floor could be relied on to have a predictable color. In theory, one could even use an adaptive system that dynamically switched between an array of cues, depending on local context, but this is more speculative. The reader may object that this algorithm is just a particular hack for a particular environment. In a sense, this is true. However, it is a hack which has been demonstrated to operate in a larger number of environments than any other vision-based robot of which we are aware. In addition, nearly any vision system will make some set of assumptions about its environment (see Marr [10] for the canonical discussion of this point). A stereo or motion system, for example, requires the presence of texture not only on the obstacles, but on the floors and walls also, at least if it is to make accurate depth readings there. This property simply isn't true of our laboratory. We suggest that the issue is not so much whether we should specialize our systems to tasks and environments, since nearly all systems are specialized, but what specializations are most useful.
Acknowledgments Support for this research was provided in part by the Advanced Research Projects Agency under Office of Naval Research contract N00014-91-J-4038 and in part by the NASA Jet Propulsion Laboratory under contract 959333. Additional support was provided by Matsushita Electric Industrial Co., Ltd.
Appendix: code excerpts from Frankie

Below is most of the important code from the Frankie system. Some variable names have been changed to be more consistent with the text. Device drivers and obvious functions, such as abs, have been omitted. The code is written in C and makes extensive use of pointer arithmetic to speed up the inner loops of the vision code. Note that only 1D iterations are performed over the images, rather than 2D iterations. This makes the code somewhat faster and simpler, but does produce erroneous results at the edges of the image. The rest of the code is written to ignore the values it finds there.
#define image_width 64
#define image_height 48
#define image_size (image_width*image_height)

/* ipointer is the type of a pointer into an image.
   pointer() casts an image into a pointer. */
typedef unsigned char image[image_height][image_width];
typedef unsigned char *ipointer;
#define pointer(a) ((ipointer)(a))
/* Iteration macros */
#define countdown(var, count) \
    for (var = (count)-1; var >= 0; var--)
#define dotimes(var, count) \
    for (var = 0; var < (count); var++)

/* Pointer arithmetic for image pointers */
#define shift_left(i,pix)  ((i)-(pix))
#define shift_right(i,pix) ((i)+(pix))
#define shift_down(i,pix)  ((i)+image_width*(pix))
#define shift_up(i,pix)    ((i)-image_width*(pix))

/* This is a 3x3 separable low-pass filter */
void smooth(image in, image out)
{
    image scratch;
    ipointer left, middle, right, up, down, i, o, s;
    int counter;

    /* horizontal pass: (left + 2*middle + right) / 4 */
    i = pointer(in);
    s = pointer(scratch);
    o = shift_right(s, 1);
    left = i;
    middle = shift_right(i, 1);
    right = shift_right(i, 2);
    countdown(counter, image_size-2)
        *o++ = (*left++ + *right++ + ((int)*middle++ << 1)) >> 2;

    /* vertical pass: (up + 2*middle + down) / 4 */
    o = shift_down(pointer(out), 1);
    up = s;
    middle = shift_down(s, 1);
    down = shift_down(s, 2);
    countdown(counter, image_size-(image_width*2))
        *o++ = ((int)*up++ + (int)*down++ + ((int)*middle++ << 1)) >> 2;
}
void edge_detect(image in, image out, int edge_threshold)
{
    ipointer up, left, center, i, o;
    int counter, c, dx, dy;

    center = i = pointer(in);
    o = pointer(out);
    up = shift_up(i, 1);
    left = shift_left(i, 1);
    countdown(counter, image_size) {
        c = *center++;
        dx = (int)(*left++) - c;
        dy = (int)(*up++) - c;
        /* mark an edge pixel if the gradient estimate exceeds the threshold */
        *o++ = ((abs(dx) + abs(dy)) > edge_threshold) ? 255 : 0;
    }
}

#define thresh 22
/* Do all the vision to compute left, right, and center distances */
image framebuf, buf1, buf2;            /* buffer declarations assumed from
                                          the omitted code */
unsigned char depth_map[image_width];
int left, right, center;

void do_vision()
{
    grab_frame(framebuf);
    smooth(framebuf, buf1);
    edge_detect(buf1, buf2, thresh);
    bprojection(buf2, depth_map);
    /* compute l, r, and c distances */
    left   = region_min(depth_map, 4, 25);
    right  = region_min(depth_map, 35, 25);
    center = region_min(depth_map, 20, 24);
}
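/* NOTE (added sketch): region_min() and bprojection() are among the
   omitted "obvious functions"; bprojection() follows the definition in
   section 2. The following reconstruction of region_min() is an
   assumption inferred from the calls in do_vision(), not the original
   source: it takes a start column and a column count and returns the
   minimum depth-map entry over that range. */
int region_min(unsigned char *map, int start, int count)
{
    int x, m;
    m = map[start];
    for (x = start + 1; x < start + count; x++)
        if (map[x] < m)
            m = map[x];
    return m;
}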
int pan, tilt;

/* Tilt the camera so as to center the nearest obstacle */
void camera_controller()
{
    int vertical_error = center - desired_height;
    int control_signal = min(max(-(vertical_error >> 1), -1), 5);

    /* tilt the camera up/down to cancel out the error */
    tilt_camera_relative(control_signal);
}
{
base
int int int if
controller()
rotational_error = right-left; translational_error = center - desired_height; tilt_error = 20-current camera_tilt();
(current c a m e r a _ t i l t ( ) = = b o t t o m _ l i m i t ) /* s p e e d up turns w h e n v e r y c l o s e to o b s t a c l e this m a k e s it r e s p o n d f a s t e r w h e n it r e a c h e s a d e a d end */ rotational error = (rotational_error>0) ?
set_rotate_velocity
4000- (-4000);
(rotational_error*200)
/* If too c l o s e to object, m o v e b a c k full this p r o b a b l y isn't n e e d e d n o w that we m o v e the h e a d */ if (center< d e s i r e d h e i g h t -i0) set_translate_velocity(-1000); else set_translate velocity(tilt error*100);
/* D r i v e r main()
{
loop
*/
w h i l e (I) { dovi s ion ( ) ; c a m e r a _ c o n t r o l l e r () ; base controller() ;
}
; speed can
99
Using Perceptions to Plan Incremental Adaptations

A.J. Hendriks and D.M. Lyons
Philips Laboratories, Philips Electronics North America Corporation,
345 Scarborough Road, Briarcliff Manor, NY 10510, USA.
[email protected], [email protected]
Abstract
A robot system operating in an environment in which there is uncertainty and change needs to combine the ability to react with the ability to plan ahead. In a previous paper we proposed a solution to the problem of integrating planning and reaction: cast planning as adaptation of a reactive system. The planner, asynchronously tuning the reactor, decides on the appropriate parts of the reactor to be modified based on perceptions: information gathered in the reactor for the express purpose of informing the planner. In this paper, we show the benefits of using perception to adapt the reactor where it needs to be updated most, and present our first experimental results from the planner-reactor architecture.

1. INTRODUCTION
The importance of integrating deliberative ("planning") capabilities and reactive capabilities when building robust, 'real-world' robot systems is becoming widely accepted [1, 6, 13]. Purely deliberative approaches (e.g., [4]) are good at determining globally good courses of action for a robot, selecting an optimal assembly sequence for example. They are not very robust, however, in the face of changing or uncertain conditions in the robot's environment. Typically, a deliberative system expends a lot of effort deriving a good course of action for the robot, only to find on starting that the environment has changed and rendered the actions unsuitable [13]. Purely reactive approaches (e.g., [2, 7]) operate very robustly in some uncertain and dynamic environments, namely those for which the reactions have been manually pre-programmed, but lack any ability to plan ahead. However, many application domains such as mobile robots [6], autonomous explorers [1], emergency response planning [13], as well as our domain, robotic kitting, demand the integration of deliberative and reactive capabilities. In [11] we suggested a straightforward approach to this integration: allow the deliberative component to incrementally adapt the reactive component to suit current goals and operating conditions. In subsequent papers, we looked at the theoretical issues of incremental adaptation [9] and of safely making changes to an operating reactive system [10]. Here we present our first performance results, using our approach to construct a robot kitting work cell.

Some alternative approaches to this integration have been suggested. Schoppers [14], amongst others, has proposed an 'off-line' generator for reactive systems; Xiaodong and Bekey [16] also suggest something similar for an assembly cell equipped with a reactive scheduler. This approach works well only when the deliberative component can be separated 'off-line'. We address the problems that arise when both reactive and deliberative components need to be 'on-line'. Connell's SSS [3] addresses this integration for mobile robots. In SSS, though, the deliberative component is restricted to enabling/disabling behaviors, while we need to be able also to generate new behaviors. Bresina & Drummond's ERE [1] comes closest to our approach. A key difference is that our domain, kitting, is a repetitive activity. Our concept of the improvement of a reactive system exploits this repetition, whereas ERE was designed for once-off activities.

We begin with a summary of the planner-reactor architecture we developed to implement our approach. Section 3 describes the kitting robot domain. Sections 4 and 5 provide more details of our implementation as a prelude to our experimental results in Section 6.
Figure 1: Planner-Reactor-World System

2. THE PLANNER-REACTOR ARCHITECTURE
Our definition of a planner is a system that continually modifies a concurrent and separate reactive system (the reactor) so that its behavior becomes more goal-directed. Figure 1 illustrates this architecture. The reactor contains a network of reactions: hardwired compositions of sensor and motor processes. In our framework, these are sensory processes composed with effector (or action) processes in such a fashion that should the sensory process detect its triggering condition, then it will in turn trigger the effector process. The reactor has two key properties:

1. It can produce action at any time. Unlike a plan executor, a reactor can act independently of the planner; it is always actively inspecting the world, and will act should one of its reactions be triggered. A reactor should produce useful behavior even without a planner.

2. It is a real-time system. The central objective of the reactor is to produce the appropriate action at the appropriate time. To make this possible, all the 'knowledge' in a reactor is encoded into reactions and cross-connections between them, and annotated with time constraints.
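As a concrete picture of the reaction concept, consider the following minimal sketch (our illustration, not code from the paper): a reaction couples a sensory trigger to an effector process, and a guard flag stands in for the planner's ability to remove it. All names are hypothetical.

    /* Hypothetical sketch of a guarded reaction: a sensory process
       composed with an effector process. Names are illustrative. */
    typedef struct {
        int  (*trigger)(void);   /* sensory process: nonzero when its
                                    triggering condition is detected  */
        void (*effector)(void);  /* effector (action) process         */
        int  enabled;            /* guard: the planner may clear this
                                    to remove the reaction            */
    } reaction;

    /* One cycle of a reactor holding n reactions. In a real reactor
       the reactions run as concurrent processes under time
       constraints; here they are simply polled in turn. */
    void reactor_cycle(reaction *r, int n)
    {
        int k;
        for (k = 0; k < n; k++)
            if (r[k].enabled && r[k].trigger())
                r[k].effector();
    }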
The reactor encapsulates what would typically be called the plan in more traditional approaches. By making it a completely separate system, we remove the delays involved when a general deliberation subsystem is always interposed between sensing and action. The reactor is also responsible for the default behavior of the system in the following sense: it is initialized with a number of generally useful behaviors already in place, for example, distinguishing objects from background, locally avoiding obstacles, and scanning its sensors in the environment. Unlike a plan executor or a hierarchical system of planner and reactive system, our reactor acts asynchronously and independently of the planner. It is always actively inspecting the world, and will act should one of its reactions be triggered. A reactor should produce timely, useful behavior even without a planner. Rather than viewing the planner as a higher-level system that loads plans into an executor, we see the planner as an equal-level system that continually tunes the reactor to produce appropriate behavior. Interaction between the planner and reactor is entirely asynchronous. The planner continually determines if the reactor's responses to the current environment would indeed conform to the given objectives. If not, then the planner makes an incremental change to the reactor configuration to suit the reactor's behavior better to the objectives. As indicated in Figure 1, there are two routes of interaction between planner and reactor:

1. Adaptations. These are structural alteration commands issued by the planner causing changes in the reactor structure.

2. Perceptions. These are sensory data collected by the reactor to be sent to the planner.
An adaptation is essentially an instruction to delete part of the reactor structure or to add in some extra structure. Each individual adaptation should be small in effect and scope; large changes in reactor behavior only come about as the result of iterated adaptation. Large adaptations would be equivalent to downloading 'plans' to the reactor, something we need to avoid. A planner can only adapt a reactor in a well-defined way, as is explained shortly. This allows a formal analysis of the adaptation mechanism. To distinguish between reactor and planner sensory data, we refer to any data collected by the reactor as sensory data, but sensory data collected to be sent to the planner will be referred to as perceptual data. In the remainder of this section, we briefly reiterate the details of the planner-reactor approach as presented in our previous work [9, 10]. We begin by introducing the formal language T5 we use to describe the reactor and its incremental adaptation.
2.1. The T5 Model
The T5 model [8] was developed to represent and analyze the kind of programs involved in sensory-based robotics. This language allows a user to construct programs by 'gluing' together processes in various ways. This style of language is called a process-algebra language. These languages have the additional advantage that they facilitate the mathematical analysis of programs. A description of a process, or network of processes, is called a schema. For example, Locate<m>(x) denotes a process that is an instance of the schema Locate with one input parameter m and one result x. Networks are built by composing processes together using several kinds of process composition operators. This allows processes to be ordered in various ways, including concurrent, conditional and iterative orderings. At the bottom of this hierarchy, every network must be composed from a set of atomic, pre-defined processes. The composition operations are the following:
1. sequential, A;B: do A then B.
2. concurrent, (A|B): do A and B concurrently.
3. conditional, A:B: do B only if A succeeds.
4. negation, ~A: do A, but fail if A succeeds and vice-versa.
5. disabling, A#B: do A and B concurrently until one terminates.

We use two recurrent operators:
6. synchronous recurrent, A:;B = A:(B;(A:;B)): while A succeeds, do B.
7. asynchronous recurrent, A::B = A:(B|(A::B)): while A succeeds, spawn off B.
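To see how these operators compose, consider a small hypothetical network (our illustration, not an example from the paper): a reaction that keeps triggering a grasp action for as long as a part detector succeeds. Expanding the synchronous recurrent operator by its definition gives:

    Detect<m>(x) :; Grasp<x>
        = Detect<m>(x) : (Grasp<x> ; (Detect<m>(x) :; Grasp<x>))

Reading off the definitions above: the conditional ':' runs Grasp<x> only if Detect<m>(x) succeeds, and the sequential ';' then re-enters the recurrence, so the grasp is re-enabled after every successful detection.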
It is important to represent how process networks evolve over time, as processes dynamically terminate or are created. This is captured by the evolves operator: we say that process A evolves into process B under condition c if A can possibly become equal to B when condition c occurs; we write this as A -c-> B. A process must evolve to one of the set of processes to which it can possibly evolve.
2.2. Incremental Reactor Adaptation
We define the reactor R as a set of concurrent reactions, and a well-defined interface through which the planner can modify the reactor structure:

    REACTOR = ( E_1#p_1 | ... | E_n#p_n | (ADD<k> :: (E_k#p_k)) )    (1)

where there are three main items:

1. p_i is a reaction or reactor segment.
2. E_i is a guard process by which the planner can remove p_i from the reactor.
3. ADD is the interface through which the planner (by the definition of ::) can add new guarded reactions.

A reactor does not contain/execute plans in the same sense that a classical plan executor does, i.e., as a data structure. The reactions are represented procedurally, so the reactor is more like a compiled controller. Nonetheless, certain parts of the reactor structure can be identified as corresponding to the instructions necessary for a specific task or subtask; we call these parts reactor segments. It is possible to write down how the reactor is affected by the changes described in items 2 and 3 of eq. 1 above using the T5 evolves operator:

    ( R | E_k#p_k ) -D_k-> R    and    R -A_k-> ( R | E_k#p_k )    (2)

where D_k is E_k -> STOP, and A_k is ADD<k> -> STOP.

Saying that a process P successfully terminates (under some condition) is the same as saying that P evolves to STOP (under that condition). So in eq. 2 above, D_k can be used to delete reactions, and A_k can be used to add reactions. Using the evolves operation, we can express the process of reactor adaptation as follows:

    R -a-> R'    (3)

where R' is the adapted reactor and a is some combination of A_k and D_k conditions. In [10] we describe how to make these adaptations in a safe manner despite the fact that the planner does not know when or which reaction might be firing.

Ideal Reactor. The ultimate objective of the planner is the construction of a reactor that achieves its goals not only in the current environment but, because change is possible, in all environments that might occur, as indicated by the environment model. A reactor that fulfills this criterion is called an ideal reactor, and is similar in behavior to Schoppers' universal plans [14]. It is unrealistic to suppose that a planner will always be able to generate the ideal reactor in one step, and in [9] we introduced the more restricted ω-ideal reactor R_ω. In planner terms, ω is a set of assumptions under which the reactor R_ω has been constructed. We propose the following incremental reactor construction strategy: ω is initially chosen to allow the planner to quickly produce a goal-achieving reactor; then ω is gradually relaxed over time. The evolution in eq. 3 can be used to represent this strategy as follows:

    R_ω1 -a_1-> R_ω2 -a_2-> ... -a_n-> R*
        e(ω1)       e(ω2)        e(∅)       (4)

where each a_i is an adaptation command generated by the planner and e() is an evaluation function. The assumption relaxation priorities are governed by the ordering on ω dictated by e(). The conditions that trigger the planner to produce an improved reactor are described in [9], and the constraints on convergence to R* are overviewed there.
3. THE KITTING ROBOT PROBLEM
Assembly components are fed from stores to a kitting station on a conveyor belt. The kitting station consists of one or more kitting robots. Each kitting robot must take parts off the belt, place them in the appropriate slots of a kitting tray, and then place the tray onto the belt. The trays are stored in a stack in the robot's work space. The product we are kitting is a small DC servomotor manufactured by Philips. It has three parts: a cap, a motor and a body, all of which have a number of variants.

The planner-reactor approach provides a way to incrementally construct reactive systems with improving performance. At the heart of this iterative mechanism is the concept of characterizing the environment by a set of assumptions and methods to detect whether they hold or not.

Kitting Assumptions. The set of assumptions we have developed for the kitting environment is described below in order of likelihood of not holding: the ordering we have chosen for the planner to relax them (i.e., the e() function in eq. 4). The planner uses these assumptions to help it to quickly build a reasonable reactor. Later, as it has time, it relaxes assumptions and fashions improved versions of the reactor. When the environment goes through regimes where different subsets of assumptions hold, the planner can capitalize on this, especially when new factory goals have been received and resultant changes in behavior are necessary. Some of the assumptions, such as that of the quality and substitutability of parts, must eventually be relaxed to get to a robust work cell. Some other assumptions, such as that of cooperation in kit assembly and tray disturbance, describe contingencies that make a more versatile work cell, but are not necessary for a robust system. Still other assumptions pertain to unusual operating conditions, such as the assumption of no downstream disturbances. The kitting robot problem uses nine assumptions in total. In our experiments, however, only the following five were relaxed:

1. Assumption of Parts Quality (AQ): All the parts coming into the kitting work cell are of good quality and do not need to be tested.
2. Assumption of non-substitutability of parts (AS): Each part has only one type.
3. Assumption of no parts motion (AM): The kit parts do not move around once delivered on the belt to the work cell.
4. Assumption of no downstream disturbance (ADD): Downstream automation is always ready to receive finished kits.
5. Assumption of parts ordering (AO): The robot cannot modify the preplanned parts ordering.
In addition to the iterative relaxation of assumptions based on likelihood, the planner can also be forced to relax assumptions. Whenever the planner builds a reactor that depends upon a particular assumption holding, it needs to embed a monitor process that can detect whether the assumption actually holds in practice or not. The monitor process triggers a 'clean up' action if the assumption ever fails, and notifies the planner that it needs to relax this assumption next.
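As a rough illustration of such a monitor (a hypothetical sketch under our own naming, not the paper's implementation): the monitor repeatedly evaluates a predicate encoding the assumption and, on the first violation, runs a clean-up action and posts an assumption-failure perception to the planner.

    /* Hypothetical assumption monitor, e.g., for the no-parts-motion
       assumption (AM). All names are illustrative. */
    void notify_planner_assumption_failed(const char *name); /* assumed to
        post the perception over the asynchronous planner link */

    typedef struct {
        const char *name;        /* e.g., "AM"                         */
        int  (*holds)(void);     /* nonzero while the assumption holds */
        void (*clean_up)(void);  /* local recovery action              */
    } assumption_monitor;

    /* Polled once per reactor cycle. Returns 1 if a failure was
       signaled, after which the planner should remove this monitor
       and relax the assumption. */
    int monitor_step(assumption_monitor *m)
    {
        if (m->holds())
            return 0;
        m->clean_up();                        /* e.g., re-scan the belt */
        notify_planner_assumption_failed(m->name);
        return 1;
    }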
4. SITUATIONS

Reactor situations are a mechanism to group related reactions together in a hierarchical fashion. This modularity is important because it allows (human) designers to more easily understand the reactor and diagnose any problems that may occur; it provides the planner with a unit around which to define safe changes to the reactor structure [10]; and it provides a way to give computational resources to those reactions that currently need it, while 'suspending' others. Intuitively, a situation being active expresses the appropriateness for the reactor to enable the set of reactions associated with that situation. Situations can be nested hierarchically, and many situations may be active at one time so that the execution of their reactions will be interleaved.

We introduce a small set of basic processes with which to build reactor situations in T5 (Table 1). For example, an instance of a situation is asserted by the execution of the ASSERT_s<p1, p2, ...> process, where p1, p2, ... are the values for the parameters associated with the situation. The SIT_s(k, p1, ...) process, when executed, suspends itself until an instance of situation s is asserted. It then terminates and passes on the details of the situation instance as its results.

* ASSERT_s<p1, ...>: Assert situation s and initialize the situation parameters p1, .... This terminates when the situation is terminated, and stops or aborts depending on whether FAIL or SUCCEED was used.
* FAIL_s,k: Terminates instance k of situation s with fail status.
* SUCCEED_s,k: Terminates instance k of situation s with success status.
* SIT_s(k, p1, p2, ...): Terminates if an instance of situation s is currently asserted. k is the instance number and p1, p2, ... are the current values of the situation parameters.

Table 1: Situation basic processes

We use the SIT_s process to ensure that a reaction associated with a situation is only 'enabled' when an instance of that situation is asserted. For example, let p_1, ..., p_n be the reactions for situation s; then we would represent this in the reactor as follows:

    P_s = SIT_s :; p_1 | SIT_s :; p_2 | ... | SIT_s :; p_n    (5)

The ':;' operation ensures that as long as the situation is active, the reactions are continually re-enabled. Note that once enabled, a reaction cannot be disabled until it has terminated. The asserting and termination of instances of situation s happen as the side effects of ASSERT_s and FAIL_s or SUCCEED_s processes in reactions in other situations.
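A rough software analogue of eq. 5 (our sketch, with hypothetical names): keep a per-situation active flag, and gate each associated reaction on it, re-enabling the reactions each cycle while the flag is set.

    /* Hypothetical rendering of situation-gated reactions (cf. eq. 5).
       All names are illustrative. */
    typedef int  (*trigger_fn)(void);
    typedef void (*effector_fn)(void);

    typedef struct { int active; int instance; } situation;
    typedef struct { trigger_fn trigger; effector_fn effector; } sit_reaction;

    /* Each cycle, every reaction of an active situation is re-enabled,
       mirroring SIT_s :; p_i in eq. 5; inactive situations receive no
       computational resources. */
    void run_situation(const situation *s, sit_reaction *p, int n)
    {
        int k;
        if (!s->active)
            return;
        for (k = 0; k < n; k++)
            if (p[k].trigger())
                p[k].effector();
    }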
5. THE PLANNER

The planner needs to adapt the reactor to suit new goals and/or changing operating conditions. New goals may be issued by factory management. Through feedback from perceptions that the planner has inserted in the reactor, the planner is made aware of the current operating conditions, e.g., the current set of assumptions that hold in the environment. Based on these two sources of information, the planner reasons to arrive at an incremental set of adaptations, such that the reactor's behavior becomes ever more goalworthy. This section outlines the principles of operation of the planner with emphasis on its interactions with the reactor; a detailed description of the planner is given in [9].
5.1. The Planner Algorithm
At every planner iteration t, based on the environment model TM_t and current goals G_t, the planner generates what we call an expectation T_t: an abstract description of the changes it expects to make to the reactor to achieve G_t. At every iteration, the combination of the reactor model and the expectation is exactly the ω-ideal reactor (where ω is the current set of assumptions the planner is working with). The process of incremental improvement of the reactor can now be seen as incremental reduction of the planner's expectation by some ΔT_t and a corresponding increment of the reactor ΔR_t, until, when T_t is null, the current reactor is exactly the ideal reactor. Additionally, when assumptions are used to construct the reactor increment, the planner will insert assumption monitors ΔI_t into the reactor. Similarly, if the planner needs specific information for its own planning purposes, it can insert additional perception processes ΔP_t, processes that collect information in the reactor and report back to the planner.
Figure 2: Planner-Reactor Interactions
The input/output interactions between planner and reactor are as shown in Figure 2. The planner incrementally produces changes in the reactor structure ΔR_t plus the associated assumption monitors ΔI_t and additional perception processes ΔP_t. It receives back information from existing perception processes P_t and any signals FI_t from assumption monitors that have failed.
Figure 3: The Planner Architecture
5.2. The planner architecture.
The block diagram of the planner's internals is shown in Figure 3. Based on the environment model TM_t and the goals G_t, the planner accumulates a current Problem-Solving Context (PSC). Within that PSC, the planner generates an abstract plan, the expectation T_t. The current set of assumptions ω_t is used to filter the environment model to generate a simpler model in the PSC. The current expectation T_t is analyzed to determine how much of it can be reduced, ΔT_t. This incremental expectation is decomposed (in the planning sense) in the context of this filtered world, to generate an adaptation of the reactor ΔR_t and the set of assumption monitoring processes to go along with it, ΔI_t. A set of perception requests ΔP_t can also be generated at this point; these requests will return data that clarify uncertainty in TM_t and allow further decomposition of T_t. The assumption monitor processes may signal that the assumptions they protect have been violated; this results in ω_t being decremented by that assumption, and it may thus result in T_t being revised and probably increased again.
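Read as pseudocode, one planner iteration might look like the following sketch (our hypothetical rendering of Figure 3's data flow; the types and function names are ours, not the paper's):

    /* Hypothetical planner iteration (cf. Figure 3). Opaque
       forward-declared types stand in for the paper's structures. */
    typedef struct model model;             typedef struct goals goals;
    typedef struct assumptions assumptions; typedef struct expectation expectation;
    typedef struct adaptation adaptation;   typedef struct monitors monitors;
    typedef struct requests requests;       typedef struct reactor_link reactor_link;

    /* Primitives assumed implemented elsewhere. */
    model       *filter_model(model *TM, assumptions *omega);
    expectation *reducible_part(expectation *T, model *filtered);
    adaptation  *decompose(expectation *dT, model *filtered);
    monitors    *monitors_for(assumptions *omega, adaptation *dR);
    requests    *perception_requests(expectation *dT);
    void         send_adaptation(reactor_link *r, adaptation *dR,
                                 monitors *dI, requests *dP);
    expectation *remove_reduced(expectation *T, expectation *dT);

    typedef struct {
        model *TM; goals *G; assumptions *omega; expectation *T;
    } psc;   /* Problem-Solving Context at iteration t */

    void planner_iteration(psc *c, reactor_link *r)
    {
        model       *filtered = filter_model(c->TM, c->omega);
        expectation *dT = reducible_part(c->T, filtered);   /* delta-T_t */
        adaptation  *dR = decompose(dT, filtered);          /* delta-R_t */
        monitors    *dI = monitors_for(c->omega, dR);       /* delta-I_t */
        requests    *dP = perception_requests(dT);          /* delta-P_t */
        send_adaptation(r, dR, dI, dP);                     /* async link */
        c->T = remove_reduced(c->T, dT);                    /* shrink T_t */
    }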
5.3. Expectation reduction.
To reduce this expectation, the planner reasons within a Problem-Solving Context consisting of the relevant parts of the environment model TM, the action repertoire A, and the assumptions ω. The outcome will be a reactor adaptation ΔR_t, the reactions necessary to implement the expectation reduction ΔT_t. The manageable size of ΔT_t is constrained by the following factors:

1. The response time of the planner. This can come from the design specification of the planner, or from the temporal constraints on goals and actions in the plan.
2. The effects of uncertain knowledge on the plan. The planner may need to issue perception requests before more of the expectation can be reduced.
3. Resource use and mappings.

Initially, the planner reduces an expectation by looking for an abstract plan that achieves the given goals. This abstract plan, representing the structure of a more detailed plan, is maintained through insertion of situations into the reactor. The more detailed actions corresponding to the expansion of an abstract action are encoded as reactions in the situation representing the abstract action. Any reactor segment sent to the reactor must be directly executable. However, it need not necessarily be concrete, i.e., a reaction of a given situation may contain only a STUB process, a process that acts as a 'placeholder' and, when executed, only sends a perception back to the planner stating that this particular situation has become active. We call this perception signal a stub-trigger. In this way, the planner can adapt the reactor without needing to fully refine its plan first. Before inserting any adaptation into the reactor, however, the planner proceeds to detail the first step in the abstract plan, such that for this first step a concrete plan is in place (i.e., no STUB) when the adaptation is sent. This avoids an immediate stub-trigger signal from the reactor for this segment. Instead it allows the reactor to start operating immediately, while the planner can proceed in parallel to flesh out other abstract parts of the plan.

For the kitting robot domain, an initial abstract plan would be a sequential composition of the situations AcquireTray, FillTray, and RemoveTray. Only AcquireTray needs to be concrete in order for the planner to adapt the reactor. The other situations can have STUB reflexes initially. The planner can refine these while the reactor acquires a tray. The STUB processes in the reactor provide the planner with much-needed focus-of-attention information, i.e., the stub-trigger signals tell the planner which part of the reactor configuration is in immediate need of refinement. Hence the planner can concentrate on refining the relevant parts of the reactor first, as indicated by the environment through a stub-trigger.

The following process descriptions show a reactor segment generated by the planner for this initial adaptation. Each reaction in the segment below is in an if-then-else form ((~Cond : IfTrue) : Else). The REASSERT process continually reasserts failing situations. P0 is necessary to assert the topmost situation repeatedly. The process MOVE_obj,loc repositions a grasped object obj to a location loc.
    P0 = STOP :; ASSERT_Kitting

    P1 = SIT_Kitting<k> :; (~((REASSERT_AcquireTray<x1> :
             REASSERT_FillTray,x1 : REASSERT_FinishTray,x1) :
             SUCCEED_Kitting,k) : FAIL_Kitting,k)

    P2 = SIT_AcquireTray<k> :; (~(REASSERT_FindPart<x1> :
             MOVE_x1,trayarea : (UPDATE_AcquireTray,k,x1 ;
             SUCCEED_AcquireTray,k)) : FAIL_AcquireTray,k)

    P3 = SIT_FindPart<k> :; (~(Findtray<x1> : UPDATE_FindPart,k,x1 ;
             SUCCEED_FindPart,k) : FAIL_FindPart,k)

    P4 = SIT_FillTray<k,x1> :; (STUB_FillTray : SUSPEND_2000 ;
             FAIL_FillTray,k)

    P5 = SIT_FinishTray<k,x1> :; (STUB_FinishTray : SUSPEND_2000 ;
             FAIL_FinishTray,k)
Barring any perceptions received from the reactor, the planner continues this process, incrementally fleshing out abstract segments. At every iteration, the expectation is reduced by fleshing out a situation. The situation hierarchy is restricted to be an acyclic digraph, i.e., abstract actions used higher in the plan hierarchy cannot be used to construct a concrete reaction. If all the reactor segments are made concrete and all the STUB processes removed, then the planner has achieved an ω-ideal reactor, and can proceed to relax the next assumption.
5.4. Perceptions
Perceptions may change the course of operation of the planner. They provide key information for the planner to decide what part of the reactor to elaborate upon next. The planner specifies which perceptions it wants to receive, i.e., which sensory data the reactor needs to collect for the planner's purposes. Reactions are ideally suited for situations where the data yielded by the embedded sensing is of no interest to the planner, because such data has only local effect within the reflex. Unknown or uncertain data that can have an effect on future reactor segments, however, should be collected differently. For example, in the case of a constant, but unknown, fact, it is inefficient to manufacture a reaction with a conditional on this fact, since we know a priori that all but one branch of that conditional is wasted. Factoring in resource use constraints means that some problems would become impossible when implemented with conditionals: there simply won't be enough resources available to cover all the a priori options. In addition, posting perception processes is useful for longer-term information that is important to the planner, e.g., in the kitting domain, the fact of whether downstream automation is always ready to receive finished kits. In these cases, the planner produces perception requests: these are actions that report information back to the planner. The planner uses this information to refine its knowledge about the world, and thus remove the troublesome uncertainty. In contrast, insertion of communicating networks in the reactor is clearly most appropriate for transitory information, e.g., the position of a cap on the belt, or for determining which of the situations, sit(t), of a task, t, is currently present.
This fosters a fast response on the part of the reactor; it allows the reactor to deal with contingencies, and to take advantage of opportunities such as fortuitously occurring goal conditions.

Apart from informational perceptions, the planner relies on four specific perceptions to set its priorities for the planning process. These perceptions are the following:

1. Guard-kills. The planner is informed when an E_i process is killed. E_i processes ('guards') form the basis by which an adaptation can remove old structure and install new structure in the reactor. This perception indicates to the planner the completion of an adaptation.

2. Stub trigger. The execution of a STUB process in the reactor signals the planner that the reactor has started to execute a segment for which no concrete plan yet exists.

3. Situation failure. This signals the planner that a reactor segment failed (usually through an invalidated assumption), and needs to be modified as soon as possible.

4. Assumption failure. The planner installs dedicated processes (ΔI_t) in the reactor to monitor assumptions. If one of the assumptions is invalidated, its monitor will signal the planner.
Guard-kills govern the adaptation cycle and are necessary for the planner to preserve the safety of the adaptation process. If a situation is still being adapted, the planner cannot send out a next adaptation for that same situation, since the safe adaptation mechanism is only safe for one adaptation per situation at a time. Thus the planner holds back the next adaptation instructions for that situation until the guard-kill perception is received. The other three types influence the planner's priority in planning adaptations. The planner uses these to decide which part of the expectation to reduce first. In effect, the environment (through the reactor) tells the planner what parts of the reactor should be considered for adaptation. The priority ordering for planning is the following: Failed situations > Forced assumption relaxation > Active stubs > Quiescent stubs.
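A compact way to picture this bookkeeping (our hypothetical sketch, not the paper's code) is an enumeration of the perception types, ordered so that a simple comparison yields the planning priority above:

    /* Hypothetical encoding of the planner perceptions. The enum is
       ordered so that a larger value means higher planning priority,
       matching: failed situations > forced assumption relaxation >
       active stubs > quiescent stubs. Guard-kills are listed but
       handled separately, since they gate the adaptation cycle
       rather than set priorities. */
    enum perception_kind {
        QUIESCENT_STUB     = 0,
        ACTIVE_STUB        = 1,
        ASSUMPTION_FAILURE = 2,   /* forces relaxation            */
        SITUATION_FAILURE  = 3,
        GUARD_KILL         = 4    /* completion of an adaptation  */
    };

    /* Pick which pending (non-guard-kill) perception to service first. */
    int more_urgent(enum perception_kind a, enum perception_kind b)
    {
        return a > b;
    }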
In the absence of any perceptions received, i.e., in a fully cooperating environment, the normal procedure is to build an initial reactor first using all assumptions. This initial reactor is built in increments, with stub refinement proceeding along the temporal ordering of the reactor segments. Subsequently, assumptions are relaxed to widen the scope of the reactor.
5.5. Forced Relaxation.
Any of the assumption monitors or stub triggers in the initial reactor segments can potentially signal the planner and divert its attention from the a priori established ordering of assumption relaxation. Failure of a situation to successfully complete its reactions is also cause for a perception. These three sources of perceptual input have different effects on the planner.

Stub trigger. On receiving a stub-trigger signal, the planner redirects its focus of attention for refining the abstract plan to the portion of the plan containing the stub trigger. The planner inserts the appropriate refinement into the reactor. The reactor can then start executing that segment immediately, since the situation is still active.

Assumption failure. An assumption failure perception has a larger effect than simply refocusing attention. The assumption failure signal causes the planner to negate the assumption in its Problem Solving Context and begin to rebuild the affected portions of its plan.

Situation failure. If a situation relying on that assumption failed, that particular situation is given highest priority for adaptation, even though other assumptions may have priority in the assumption ranking. This means the planner may have to turn its focus back to parts of the plan it had previously considered finished.

Apart from its cause, a forced relaxation or refinement results in the same reactor segment and adaptation sequence as a normal relaxation or refinement.

6. EXPERIMENTAL RESULTS
The kitting example described in Section 3 was implemented, and several experimental runs were conducted on a PUMA-560-based kitting work cell. Trace statistics were gathered on each run, e.g., the times when assumption failures were detected, when adaptations were made, the number of assumptions in use in the reactor, and so on. In each run, the planner utilized all the assumptions described in Section 3. However, only five of the assumptions were actually relaxed in these trials. The purpose of these experimental runs was to begin to explore the behavior of a planner-reactor system in practice. Results from the runs are presented in the following subsections.

Initial Reactor Construction. Figure 4 contains trace statistics for the start-up phase on a typical experimental run of the kitting work cell. Graph (a) shows the number of assumptions in use at any time in the work cell by the planner and by the reactor. Graph (b) shows the number of adaptations issued (by the planner) and the number of discrete changes to the reactor structure (adaptations applied). Initially there is an empty reactor in operation. After 26 seconds (in this example) the planner is in a position to start to update the reactor. (This time includes planning plus communication time.) The planner begins by sending adaptations to the reactor (adaptations issued). Each adaptation involves one or more changes to the reactor structure. Thus, eventually, the adaptations applied will outnumber the adaptations issued.
The constraints involved in safe adaptation can cause a time lag in implementing changes. Active situations cannot be interrupted, as that would leave the reactor in an undefined state; instead, the adaptation waits for the situation to end before applying the changes; the theory behind this is presented in [10]. Thus the adaptations applied trace always lags the adaptations issued trace. The lag time varies, depending on the activity of the reactor. This is what causes the reactor assumptions trace in Figure 4(a) to lag the planner assumptions trace.
Figure 4: Start-up phase statistics for one run of the Planner-Reactor based Kitting Robot

The first reactor was in place roughly 28 s after start-up. Even though this reactor was not complete, the kitting robot could start kitting; its kitting actions were overlapped in time with the further refinement of its kitting 'program'. This reactor was elaborated over the following 20 s, until by 40 s after start-up the first complete reactor was in place. This reactor employed all nine assumptions. Up to this point, there was little delay between the planner reducing an assumption and the reactor following suit.

Assumption Relaxation. Then the planner began to relax assumptions to incrementally improve the reactor. It took roughly 5 s to relax the first assumption (the length of the 'plateau' in the planner trace in Figure 4(a)). It then began to issue the adaptations to the reactor. Typically one or more adaptations are necessary to relax any one assumption. Each adaptation will give rise to one or more structural changes in the reactor, again subject to the delays of the safe adaptation constraint. In the case of this first assumption relaxation, it took the reactor roughly 60 s before it was able to implement the last structural change to relax the assumption (the 'plateau' in the reactor trace in Figure 4(a)). It took the planner roughly 30 s to relax the second assumption; but the change was implemented in the reactor almost immediately after the first relaxation. These lags and overlaps in operation underscore the highly asynchronous and independent operation of planner and reactor.
In these experiments, we restricted the planner to relax only two assumptions voluntarily. Having done this, the planner will only relax other assumptions when the environment forces their failure. This could occur within the start-up phase, of course, but for convenience of presentation we have separated the two here.
Figure 5: Forced relaxation statistics

Environment Forced Relaxation. Figure 5 shows trace statistics for the ongoing operational phase in a typical experimental run of the kitting work cell. Again, graph (a) displays the assumptions in use by the planner and reactor, and graph (b) displays the trace of adaptations issued and applied. Additionally, Figure 5(b) now also shows the trace of assumption failure perceptions. These are the perceptions that force the planner to relax an assumption. The two forced assumption relaxations in Figure 5 are the no-motion (AM) and downstream disturbance (ADD) assumptions. The failure perception for AM is generated when the robot fails to acquire a part it had identified in an initial visual image of the work space. The failure perception for ADD is manually invoked, and would normally be an input signal from downstream automation. At 152 s the AM assumption failure was signaled (Fig. 5(b)). The planner responded very quickly and began to issue adaptations by 155 s, continuing for about 10 s. There was little delay in the safe adaptation procedure, and the structural changes to the reactor lag the adaptations issued by only about 1 s. This is also evident from the assumptions trace in (a). Part of the changes in this adaptation was to remove the assumption failure monitor for this assumption; thus, partway into the adaptation (161 s) the signals from the assumption monitor cease. The total forced relaxation response time, the time from failure to final change in the reactor, is roughly 11 s. The ADD failure occurred at 215 s and proceeded in a similar fashion.

Reinstating Assumptions. In our theoretical analysis [9, 10] we did not address the problem of reasserting previously failed assumptions. Nonetheless, this is convenient in practice to model operating regimes, and therefore is part of our objectives. In the experimental run, at time 357 s the ADD assumption was reinstated
(Figure 5(b)). On the relaxation of such reinstatable assumptions, the planner adds a reactor monitor that will signal when the assumption holds again. (It doesn't show up as an assumption failure, since it isn't one.) As the trace statistics show, our implementation does indeed handle the reintroduction of assumptions. However, further work is necessary to extend our convergence results to include this case.

Ordering of Assumption Relaxation. The graphs of assumptions versus time do not indicate which assumptions are relaxed at each step. Figure 6 shows the interval for which each assumption is in use in both planner (solid line) and reactor (broken line). The acronyms on the Y-axis refer to assumptions. Four assumptions, ADT (tray disturbance), ACP (cooperation), ACT (competition), and AF (filled trays), are not relaxed in this experiment and hence hold throughout. The assumptions of quality (AQ) and substitutability (AS) are introduced first. Note that the reactor does not drop these assumptions for some time after the planner has dropped them. This is also shown in Figure 4, but there it wasn't evident which assumptions were involved. The downstream disturbance assumption (ADD) is introduced, then relaxed and then reintroduced, as shown in Figure 5.
Figure 6: Reactor assumption usage over time
Figure 7: The response to immediate failure of assumptions
Immediate Assumption Failure. The assumption relaxation in the experiment shown in Figures 4 and 5 is artificial: the planner-directed relaxation of assumptions is carefully separated in time from the environment-directed relaxation. This is unlikely in practice; both planner and environment will be simultaneously directing relaxation. The assumptions graph in Figure 7 was generated in simulation (the previous graphs have been from real robot experiments) by assuming that whenever the reactor tests an assumption, the assumption fails. The first difference from the previous experiment (Figure 4) is that the planner never gets to use all the assumptions. Failure signals begin as soon as the first reactor update is made, even before the entire first ideal reactor is constructed. The planner quickly relaxes the AQ, AS and AO assumptions. It then finishes the reactor, introducing the ADD assumption, which also immediately fails (the 'blip' at 44 s) and is replaced.

7. DISCUSSION
In summary, we have reviewed our motivation and approach to casting planning as adaptation of a reactive system, and have presented the first performance results of the approach, a first for integrated systems of this kind. These results show the feasibility of our approach. The incremental reactor construction and assumption relaxation strategy minimizes planning delays. The reactor can start kitting while the planner is still refining later parts of the task. The asynchronous communication between planner and reactor allows the latter to be real-time and not bogged down by the planner. The planner stays informed through its implantation of perceptions in the reactor. Information from these perceptions gets shuttled back to the planner, which employs it to decide on the appropriate parts of the reactor to modify next. Thus, the planner is able to support the reactor's direct needs, despite the asynchronous link between the two.

REFERENCES
1. J. Bresina & M. Drummond, Integrating Planning and Reaction, in: AAAI Spring Workshop on Planning in Uncertain, Unpredictable or Changing Environments, Stanford, CA, J. Hendler (ed.), Systems Research Center, Univ. of Maryland (1990).
2. R. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics & Automation, 2 (1986).
3. J. Connell, SSS: A hybrid architecture applied to robot navigation, IEEE Int'l Conf. on Robotics & Automation (1992).
4. R.E. Fikes and N.J. Nilsson, STRIPS: A new approach to the application of theorem proving to problem solving, Artificial Intelligence 2 (1971), 189-208.
5. B.R. Fox and K.G. Kempf, Opportunistic scheduling for robotic assembly, IEEE Int'l Conf. on Robotics & Automation, St. Louis, MO (1985), 880-889.
6. Th. Fraichard & C. Laugier, On-line Reactive Planning for a Non-Holonomic Mobile Robot in a Dynamic Environment, IEEE Int'l Conf. on Robotics & Automation, CA (1991).
7. E. Gat, Alfa: A language for programming reactive robot control systems, IEEE Int'l Conf. on Robotics & Automation, Sacramento, CA (1991), 1116-1121.
8. D.M. Lyons, Representing and analyzing action plans as networks of concurrent processes, IEEE Trans. on Robotics & Automation, 9 (3) (1993).
9. D.M. Lyons & A.J. Hendriks, Planning for Reactive Robot Behavior, IEEE Int'l Conf. on Robotics & Automation, Nice, France (1992).
10. D.M. Lyons & A.J. Hendriks, Safely Adapting a Hierarchical Reactive System, SPIE Intelligent Robotics & Computer Vision, Boston, MA (1993).
11. D.M. Lyons, A.J. Hendriks & S. Mehta, Achieving Robustness by Casting Planning as Adaptation of a Reactive System, IEEE Int'l Conf. on Robotics & Automation, Sacramento, CA (1991), 198-203.
12. A.J. Hendriks and D.M. Lyons, A Methodology for Creating and Adapting Reactive Systems, IEEE Int'l Conference on Tools for AI, San Jose, CA (1991), 220-227.
13. D. McDermott, Robot Planning, Tech. Report YALEU/CSD/RR #861, Yale University (1991).
14. M. Schoppers, Representation and Automatic Synthesis of Reaction Plans, Ph.D. Thesis, Tech. Rep. UIUCDCS-R-89-1546, CS Dept., University of Illinois at Urbana-Champaign (1989).
15. C.J. Sellers & S.Y. Nof, Performance Analysis of Robotic Kitting Systems, Rob. & Comp. Integ. Manuf. 6 (1989).
16. X. Xiaodong and G.A. Bekey, SROMA: An adaptive scheduler for robotic assembly systems, IEEE Int'l Conf. on Robotics & Automation, Philadelphia, PA (1988).
Robot task programming by human demonstration: Mapping human grasps to manipulator grasps

Sing Bing Kang and Katsushi Ikeuchi
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract
To alleviate the problem of overwhelming complexity in grasp synthesis and path planning associated with robot task planning, we adopt the approach of teaching the robot by demonstrating in front of it. A system with this programming technique is able to temporally segment a task into separate and meaningful parts for further individual analysis, and to recognize the human grasp employed in the task. With such derived information, this system would then map the human grasp to that of the given manipulator, plan its trajectory, and proceed to execute the task. This paper describes how grasp mapping can be accomplished in our system. The mapping process essentially comprises three steps. The first step is local functional mapping, in which groups of functionally equivalent fingers are established. This is followed by gross physical mapping, which produces a kinematically feasible manipulator grasp. Finally, by carrying out local grasp adjustment using some task-related criterion, we arrive at a locally optimal manipulator grasp. We describe these steps in detail in this paper and show results of example grasp mappings.
1. Introduction
There is always a need for a portable, safe, easy, fast, and robust means of programming robots to perform tasks. Current conventional methods of programming robots, such as programming using a robot language and direct manipulation of the robot through the desired trajectory (i.e., manual lead-through), while useful, fall short on some of these characteristics. Completely automatic task planning (with the inputs being just the high-level specifications of the task goals) is difficult to achieve in practice. This is attributed to the high combinatorial complexity of grasp synthesis and path planning in the presence of obstacles. We seek to mitigate this problem by using cues extracted directly from human performance of the task of interest. The approach of programming a robot by allowing it to observe the human demonstrating the task is also known as Assembly Plan from Observation (APO) [9].

A major component in task planning is planning the manipulator grasp. In the following section, we briefly discuss some of the issues involved in grasp planning and reiterate the justification for our approach.
2. Grasp Planning
Grasp planning involves determining the hand placement relative to the object of interest as well as the posture of the hand assumed in grasping the object, both in a manner consistent with the requirements of the task. It is computationally intensive due to both the large search space and the incorporation of geometric and task-oriented constraints. Approaches to grasp planning can be categorized as geometric and mechanical [19]. The geometric approach considers only the spatial aspects of the objects and arm/hand system, while the mechanical approach incorporates the arm/hand kinematics, system dynamics, and their task-related constraints.

The large search space for an optimal grasp can be reduced in size or dimensionality by using certain heuristics or structural constraints. For a 2-finger gripper, for example, the surface element pair selection can be guided by a series of constraints, such as face parallelism, permissible face separation, and non-zero overlap between projected faces [24]. By imposing fixed contact locations or fixed types of grasp, Pollard computes a 6D projection of the 15-DOF configuration space for the Salisbury hand [20]. This reduced space represents the space of wrist poses for geometrically feasible grasps, and is used to search for a globally optimal grasp. In addition, a popular strategy for reducing the complexity of planning the dextrous hand grasp is to use a small set of predetermined grasps or grasping behaviours (e.g., [2], [7] and [21]).

We adopt the approach of making grasp planning more tractable by using cues extracted from human execution of the grasp. The approach first involves the temporal segmentation of the recorded task executed by the human, which locates the instant when the grasp occurs [11]. The subsequent steps are recognizing the type of grasp [12][13] and recovering the approximate grasp contact locations from observed data. This is followed by determining the initial approximate grasp which satisfies kinematic and geometric constraints, and subsequently performing local optimization of the grasp using the approximate grasp as the starting condition and a task-oriented objective function. Our grasp mapping approach is described in detail in the following sections.
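As an illustration of the kind of geometric filtering mentioned above for a 2-finger gripper (a hedged sketch of the idea, not the method of [24]; all names are ours): candidate face pairs can be pruned by checking near-antiparallel normals, a permissible separation, and non-zero overlap of the projected faces.

    #include <math.h>

    /* Hypothetical face-pair filter for a parallel-jaw gripper. A
       "face" is summarized by its unit outward normal and a point on
       its plane; the projected-overlap test is delegated to a routine
       supplied by the caller. */
    typedef struct { double n[3]; double p[3]; } face;

    static double dot3(const double a[3], const double b[3])
    {
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    }

    /* Returns 1 if the pair passes three pruning constraints:
       (1) faces roughly antiparallel, (2) separation within the
       gripper's permissible range, (3) non-zero overlap between the
       projected faces (assumed implemented elsewhere). */
    int candidate_pair(const face *a, const face *b,
                       double max_sep, double cos_tol,
                       int (*overlap)(const face *, const face *))
    {
        double d[3] = { b->p[0]-a->p[0], b->p[1]-a->p[1], b->p[2]-a->p[2] };
        double sep  = fabs(dot3(d, a->n));  /* distance along a's normal */

        if (dot3(a->n, b->n) > -cos_tol)    /* not antiparallel enough   */
            return 0;
        if (sep > max_sep)                  /* jaws cannot span it       */
            return 0;
        return overlap(a, b);               /* projected faces overlap?  */
    }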
3. Mapping grasps

3.1. General approach
At the highest level of description, planning the manipulator grasp based upon the observed human grasp comprises the following steps:

1. Local functional mapping. This is done by observing the type of grasp used by the human, and the functions and grouping of the palm and individual fingers. The latter is determined using the real-finger-to-virtual-finger mapping described in [13]. The mapping of the manipulator end-effector to the virtual fingers is dependent on the effective cohesive index and the cohesive indices of the individual virtual fingers.

2. Adjustable or gross physical mapping. This operation is carried out using the outcome of the local functional mapping and the initial location of the manipulator (near the pose of the human hand while carrying out the task). It results in an approximate form of the manipulator grasp of the object which is geometrically feasible.

3. Fine-tuning or local adjustment of grasp. The grasp determined from steps 1 and 2 may not exhibit static stability, or it may not be locally optimal according to a chosen task-oriented criterion. By analyzing the criterion and iteratively perturbing the initial grasp configuration, we arrive at a locally optimal grasp. In the case of the power grasp, which involves high kinematic constraint, this step is skipped.

3.2. Analytic measures of a grasp
Cutkosky and Howe [5] summarize a number of analytic measures that have been used by various researchers in their analyses of grasps, several of which are for the purpose of grasp synthesis. We can group these grasp analytic measures into different categories according to both the depth of knowledge of the grasp and hand mechanism, and the type of analysis (Fig. 1). These analytic grasp measures are instrumental in determining the manipulator grasp.

The overall strategy of mapping grasps is depicted in Fig. 2. The human grasp can be described in a hierarchical fashion [12]. With a priori knowledge of the human hand and data derived from observation, certain analytic grasp measures can be extracted (such as connectivity and mobility). Armed with this information, together with a priori kinematic, structural and geometric knowledge of the manipulator, the grasp translator would then attempt to produce a manipulator grasp with compatible analytic grasp measures. In the work reported in this paper, as a starting point, we employ the manipulability measure as the criterion used in generating the locally optimal manipulator grasp.
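For reference, the manipulability measure commonly used for this purpose is Yoshikawa's; assuming the paper uses the standard form (the text here does not spell it out), for a hand/arm Jacobian J(q) it reads

    w(q) = \sqrt{ \det\!\left( J(q)\, J(q)^{T} \right) }

which vanishes at singular configurations and grows as the configuration gains dexterity in all directions; perturbing the grasp to locally maximize w is one way to realize the fine-tuning step above.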
Fig. 1 Categories of analytic grasp measures. Ordered by increasing depth of knowledge required of grasp and hand mechanism: degree-of-freedom measures (connectivity, mobility); static measures (force closure, form closure, internal forces, resistance to slipping, manipulability, grasp isotropy); structural and servo parameter measures (compliance, stability).
Fig. 2 Grasp mapping strategy (observation of the human grasp yields a hierarchical description of the grasp and analytic grasp measures; these, combined with a priori knowledge of the human hand and of the manipulator, drive the grasp translator that produces the manipulator grasp).
The functions of the hierarchical description of the human grasp are local functional mapping and gross physical mapping. Meanwhile, the functions of the analytic grasp measures are:
1. Verification of the appropriateness of the gross physical mapping; and
2. Local fine-tuning or perturbation of the grasp to a locally optimal posture.
4. Local functional mapping
This first step in mapping grasps is common to all types of grasps, i.e., power and precision grasps. The local functional mapping associates the manipulator fingers to those of the hand, and is generally different for different grasps. To guide the local functional mapping, we first assign manipulator fingers as either a primary finger, a secondary finger, or a palm.
4.1. Assigning manipulator fingers
Prior to mapping the manipulator fingers to those of the human in the observed grasp, each manipulator finger has to be assigned to one of the following groups of fingers:

1. Primary finger(s). A finger in this group would correspond to the human thumb, which has at least the same number of degrees of freedom as the other fingers. It serves not only to secure the hold of the object against the palm and/or other fingers, but also to manipulate the object.

2. Secondary finger(s). This group would correspond to the fingers of the hand other than the thumb. This group of fingers would serve to secure the hold of the object against the palm and/or primary finger(s).

A special case of a "finger" which is also assigned is the palm; it is passive, with no local degree of freedom (i.e., discounting the global 6 DOF of the manipulator). Manipulators are designed with specific functions or levels of dexterity in handling objects. The repertoire of grasping abilities may be as limited and simple as just clamping the object between its jaws as a means of securing a hold on it (as in the parallel-jaw gripper). The manipulator may be anthropomorphic to some degree and be able to mimic certain grasps of the human hand; examples of such hands are the Utah/MIT hand [10], Salisbury hand [22], Okada hand [18], Minnesota hand [15], and the Southampton hand [4]. It is usually simple to label a priori each of the manipulator fingers as either a primary or secondary finger.

Once the primary and secondary fingers have been identified, the hand-to-manipulator finger mapping can proceed with the aid of another type of mapping. This mapping relates real fingers to groups of functionally equivalent fingers called virtual fingers [1][13]. A consequence of this real-finger-to-virtual-finger mapping is the cohesive index [13], which is associated with the degree to which the real fingers in a virtual finger act in a similar manner (by comparing the object normals at the contact points). A cohesive index of a virtual finger is maximum at unity, which would indicate that the fingers within that virtual finger act exactly alike.

4.2. Using the cohesive index to guide the mapping of fingers
An indication of how the cohesive index is used to guide the virtual finger mapping between those of the human hand and the manipulator is illustrated in Fig. 3. (VF_n refers to the nth virtual finger, with C_n being its associated cohesive index.) The palms, primary fingers, and secondary fingers are mapped appropriately, as seen in Fig. 3. The cohesive index serves as a check: the virtual finger corresponding to the group of secondary fingers normally has a cohesive index less than that of the collection of primary fingers. The virtual finger composition determined from the real-finger-to-virtual-finger mapping [13] takes precedence, however.
[Fig. 3 Mapping grasps according to cohesive indices of virtual fingers. For the 5-fingered cylindrical power grasp: VF_0 = {0} (palm), VF_1 = {1}, VF_2 = {2, 3, 4, 5}, with C_0 = 1.0, C_1 = 1.0, C_2 < 1.0 (VF_1 and VF_2 are decomposed this way because C_2 < C_1). The resulting mappings are: two-fingered parallel-jaw gripper: VF_0' = {0} (palm), VF_1' = {1}, VF_2' = {2}; three-fingered Salisbury hand: VF_0'' = {0} (palm), VF_1'' = {1}, VF_2'' = {2, 3}; four-fingered Utah/MIT hand: VF_0''' = {0} (palm), VF_1''' = {1}, VF_2''' = {2, 3, 4}.]
Once the functional finger assignments have been accomplished, the more detailed manipulator grasp planning can then proceed.
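The precise definition of the cohesive index is given in [13] and is not reproduced here. All code sketches in this and the following paper are ours, written in Python with NumPy; they are illustrations under stated assumptions, not the authors' implementations. As one plausible, hypothetical stand-in for the idea of comparing object normals at the contact points, the following scores a virtual finger by the magnitude of the mean of its unit contact normals:

import numpy as np

def cohesive_index(contact_normals):
    # Hypothetical formulation: the magnitude of the mean unit object
    # normal over a virtual finger's contact points. It equals 1.0 when
    # all real fingers act exactly alike (identical normals) and
    # decreases as the normals diverge. The exact definition is in [13].
    n = np.asarray(contact_normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return float(np.linalg.norm(n.mean(axis=0)))

# Two secondary fingers wrapping a cylinder touch it with slightly
# different normals, so their virtual finger has C < 1:
print(cohesive_index([[0, 0, 1], [0, 0, 1]]))        # 1.0
print(cohesive_index([[0, 0, 1], [0, 0.5, 0.866]]))  # ~0.97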
5. Approach in generating grasp and pregrasp trajectory
The diagram representing the entire manipulator grasp and trajectory planning approach is shown in Fig. 4. Prior to the more detailed manipulator grasp planning, the manipulator is assumed to hold a fixed pose and posture relative to the human hand throughout the task execution; local planning is carried out in the vicinity of this pose and posture. Note that, by virtue of the different form and function of the power and fingertip precision grasps, they are planned in different manners, as elucidated in subsequent sections. Once the local planning is done, starting with the pose and posture of the manipulator during the grasping stage, the pregrasp trajectory is planned backwards, with the goals of a smooth and collision-free path. This path closely follows that assumed by the human hand trajectory.
Depending on the type of grasp, the approach in manipulator grasp planning differs. In the subsequent description of such mapping techniques, we concentrate on the most widely used grasps, namely the power grasp and the fingertip precision grasp.
[Fig. 4 Scheme for generating the manipulator grasp and pregrasp trajectory: identify the human grasp and generate the grasp hierarchy; assign functional finger groups for the manipulator; follow the approximate hand trajectory; plan either the power grasp or the fingertip precision grasp (gross grasp posture followed by fine-tuning of the grasp posture); regenerate the pregrasp trajectory]
6. Generating the power grasp
The primary purpose of the power grasp, in which the hand generally envelops the object, is to ensure a secure hold on the object at the expense of dexterity. Hence, we can dispense with the third step of local grasp adjustment.
6.1. General approach
The approach taken to generate the power or "enveloping" grasp follows that taken in "reactive" planning (similar to that described in [2]). Note that this approach is used in generating the grasp, not in the actual execution of the grasp itself. The approach is characterized by the following phases:
1. Aligning the dextrous hand with the approach vector assumed by the human hand just prior to grasping (approach alignment phase).
2. Translating the hand (preserving its orientation) towards the object until the palm touches the object (approach phase).
3. Adjusting the hand to "fit" the object to the palm (adjustment phase).
4. Flexing the fingers to envelop the object (wrapping phase).
Central to the alignment motions of the dextrous hand is the pivot point. It is predefined to be a point on the surface of the palm between the fingers. Its frame is oriented such that its z-direction points away from the palm and its x-direction points from the thumb (or a primary finger) to the rest of the fingers.
6.2. Approach alignment of the dextrous hand
In order to minimize the problem of collision with the object during the pregrasp and post-grasp phases, we first align the pivot frame of the dextrous hand with the approach vector assumed by the human hand just prior to grasping the object. This is known as the approach alignment phase. The approach vector is determined as the translation component of the differential transformation ΔT = T_g T_{g−k}⁻¹, where T_g is the transformation of the hand at the grasp frame and T_{g−k} is the transformation of the hand k frames prior to the grasp frame (k is chosen to be 2). Once the rotational transformation (about the pivot point) has been determined, the dextrous hand is rotated in steps, with intermediate collision detection against the object. If a collision is detected, the dextrous hand is translated or "repulsed" away (preserving its orientation) from the surface of the object that makes contact with the hand. The normals of the surface or surfaces in question determine the direction of the repulsion vector.
6.3. Adjusting the hand and enveloping the object
Once the desired orientation of the hand is assumed, planning proceeds with the following behaviors:
1. Pure translation toward the object (approach phase)
The translational motion is dictated by attractive forces exerted by the object on the pivot point of the dextrous hand. This behavior is active prior to any collision between the object and the manipulator palm.
[Fig. 5 Determining direction of translation]
In Fig. 5, p is the pivot point on the manipulator palm and q_j is the jth sampled object surface point; r_j is the vector from point q_j to p, i.e., r_j = p − q_j, and n̂_j is the object normal at q_j. The motion at each stage is

M = δ d / |d|,  where  d = − Σ_{n̂_j · r_j > 0} ω_j r̂_j,

δ is the step size in 3D space, and ω_j is a weighting factor dependent on |r_j|; the function used is ω_j = |r_j|⁻². (A code sketch of this computation follows this section.)
2. Translation and rotation about the pivot point (adjustment phase)
Once a collision between the object and the manipulator palm is detected, this behavior becomes active. The motion associated with this behavior is defined by both the point or points of contact and the entire object. The points of contact determine a motion which tends to push the manipulator away (repulsive forces), while the entire object tends to pull the manipulator (via the pivot point) towards it. This has the property of trying to align the manipulator relative to the object prior to grasping. It should be noted that while these behaviors are active, the posture of the manipulator is that of full extension of the fingers (for a parallel-jaw gripper, maximally distanced fingers). Note: the circumflex (^) over a symbol indicates its normalized version. The rotation about the pivot point is the quaternion Q, where

Q = Σ_{q_j ∈ P} Q_j,

P is the palm (i.e., the sum is over the contact points on the palm), and Q_j is the quaternion associated with the rotation due to the jth contact point, given by

Q_j = [cos(θ_j / 2), m̂_j sin(θ_j / 2)]

with θ_j = δ / |r_j| and m_j = −r_j × n̂_j.
3. Flexion of fingers around the object (wrapping phase)
Once the manipulator has been adequately aligned with respect to the object, it proceeds to wrap its fingers around the object. This is done by flexing successively more distal joints (starting with the most proximal joint) until contact is made between the associated link and the object (Fig. 6 (a) and (b)).
Fig. 6 Results of mapping a cylindrical power grasp for the Utah/MIT hand (a) and the Salisbury hand (b)
This approach is a simple one, and is relatively independent of the manipulator (once the manipulator models have been built). It is made possible by the cues given in the human execution of the same task.
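As a rough illustration of the approach-phase computation above, the following sketch evaluates one translation step M from sampled surface points and normals, reading r_j as the vector from q_j to the pivot p, per the verbal definition:

import numpy as np

def approach_step(p, q, n_hat, delta):
    # One translation step of the approach phase (cf. Fig. 5).
    # p: (3,) pivot point on the palm; q: (m, 3) sampled object surface
    # points q_j; n_hat: (m, 3) outward unit normals at the q_j;
    # delta: scalar step size. Returns the motion M = delta * d / |d|.
    r = p - q                                   # vectors from q_j to p
    dist = np.linalg.norm(r, axis=1)
    r_hat = r / dist[:, None]
    # Only surface points whose normal faces the hand contribute
    facing = np.einsum('ij,ij->i', n_hat, r) > 0.0
    w = dist[facing] ** -2                      # weighting factor |r_j|^-2
    d = -np.sum(w[:, None] * r_hat[facing], axis=0)
    return delta * d / np.linalg.norm(d)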
7. Generating the fingertip precision grasp
7.1. General approach
The approach taken to plan the fingertip precision grasp is different from that for the power grasp, primarily because this precision grasp does not envelop the object. As with the power grasp, there are basically two stages:
1. Initial grasp pose and posture determination phase
Let the (N+1) contact points on the object be denoted p_c0, p_c1, ..., p_cN. The initial grasp pose is determined by calculating the pose of the contact frame F_c and orienting the pivot frame F_pivot, centered at the pivot point on the hand, by a certain fixed transformation relative to F_c (Fig. 7).
[Fig. 7 Orienting the pivot frame with respect to the contact frame]
Subsequently, the pose of the hand is searched in this neighborhood until a kinematically feasible grasp is found. The contact frame F_c is centered about the point c, defined to be
c = (1/N) Σ_{i=1}^{N} ½ (p_c0 + p_ci) = ½ p_c0 + (1/(2N)) Σ_{i=1}^{N} p_ci
The orientation of F_c is determined as follows (note that the caret ^ on a vector term indicates its normalized version):
l = Σ_{i=1}^{N} (p_ci − p_c0) = Σ_{i=1}^{N} p_ci − N p_c0

If the best-fit plane Π to the contact points is r · û = a, where r is any point on Π and û is its unit normal, then

n̂ = û if (û × l̂) · (p_cN − p_c0) > 0, and n̂ = −û otherwise,

and m̂ = n̂ × l̂. (A code sketch of this construction appears at the end of this section.)
2. Local grasp pose and posture adjustment phase
This is done by locally optimizing a task-oriented criterion with respect to the pose and posture of the hand. The local adjustment is done at two levels, associated with the optimization with respect to the pose (outer level) and with respect to the joint angles (inner level). Note that the inner-level optimization requires iterations only if redundancy of finger DOFs exists. For the Utah/MIT hand, each finger has a redundant DOF since there are 4 DOFs per finger with only the 3D position of each fingertip being fixed. There is no redundancy of DOF for the three fingers of the Salisbury hand.
Let A_j = (t_j, r_j) be the vector representing the hand pose, Θ_jk be the vector of joint angles representing the hand posture, and C(·) be the task-oriented criterion to be optimized. Then the optimization can be symbolically represented as:
max_{A_j} ( max_{Θ_jk} C(A_j, Θ_jk) ) = max_{A_j} C(A_j, Θ_j,opt)
The outer optimization level uses the Nelder-Mead simplex algorithm (see, for example, [17]), while the inner optimization is accomplished using Yoshikawa's scheme for optimizing manipulator postures with redundant DOFs [25]. For a given initial hand pose A_j, the locally optimal hand posture Θ_j,opt is determined using the following pseudocode:
Repeat until either the joint angle limitations are exceeded or the magnitude of the joint angle change vector Θ̇_jk δt is below a threshold {
    Θ̇_jk = (I − J⁺J) ξ_k, since ẋ_c = 0, the contact points being fixed at each iteration
    Θ_jk ← Θ_jk + Θ̇_jk δt
}
where J is the hand Jacobian and J⁺ is the pseudoinverse of the Jacobian, i.e.,
J⁺ = Jᵀ (J Jᵀ)⁻¹
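A minimal sketch of the contact-frame construction of step 1, under the reconstruction above (the SVD plane fit stands in for whatever plane-fitting method the authors used):

import numpy as np

def contact_frame(p_c):
    # Contact frame F_c from the (N+1) contact points p_c0 ... p_cN,
    # given as an (N+1, 3) array with the primary (thumb) contact first.
    # Returns the centre c and the axes (l_hat, n_hat, m_hat).
    p0, rest = p_c[0], p_c[1:]
    N = len(rest)
    c = 0.5 * p0 + rest.sum(axis=0) / (2.0 * N)
    l = rest.sum(axis=0) - N * p0
    l_hat = l / np.linalg.norm(l)
    # Unit normal of the best-fit plane via SVD of the centred points
    u_hat = np.linalg.svd(p_c - p_c.mean(axis=0))[2][-1]
    # Disambiguate the sign of the plane normal as in the text
    if np.dot(np.cross(u_hat, l_hat), p_c[-1] - p0) <= 0.0:
        u_hat = -u_hat
    m_hat = np.cross(u_hat, l_hat)   # third axis of F_c
    return c, l_hat, u_hat, m_hat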
8. Estimating the dextrous hand-object contact points
The approach is as follows:
1. Determine the rough positions of the human hand-object contact points. Fit these contact points with a plane Π: p · ĥ = d.
2. Cull faces and edges as potential grasping places by imposing the constraint that the associated object normal should not be too close in direction to ĥ. At each point on edge i (edge point) and point on face j (face point), let the associated normals be n̂_ei and n̂_fj respectively. Then disregard points on edge i if |n̂_ei · ĥ| > cos θ_e and points on face j if |n̂_fj · ĥ| > cos θ_f. Since we favor faces over edges, θ_e > θ_f, with θ_e = 75° and θ_f = 45°. (A code sketch of this culling test follows Fig. 8.)
3. For each virtual finger with more than one real finger, fit a line to the human hand contact points, and space the hypothesized dextrous hand-object contact points equally along this line.
4. Find the object surface points nearest to the hypothesized dextrous hand-object points; for each face point, impose a margin ε_m from the nearest edge (ε_m = 1.0 cm) if possible; otherwise place it at the center between opposite edges.
An example of a fingertip precision grasp mapping is shown in Fig. 8.
Fig. 8 Results of mapping a fingertip precision grasp for Utah/MIT hand (a) and Salisbury hand (b)
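The culling test of step 2 is a one-line check; a small sketch with hypothetical normals:

import numpy as np

def keep_candidates(normals, h_hat, theta_deg):
    # Keep candidate points whose object normal is not too close in
    # direction to the plane normal h_hat (Section 8, step 2).
    # theta_deg = 75 for edge points, 45 for face points, so that
    # faces are favoured over edges.
    return np.abs(np.asarray(normals) @ h_hat) <= np.cos(np.radians(theta_deg))

# Hypothetical face normals tested against a vertical plane normal:
h = np.array([0.0, 0.0, 1.0])
faces = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.8, 0.0, 0.6]])
print(keep_candidates(faces, h, 45))   # [False  True  True]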
8.1. The optimization criterion
The precision grasp may be generated using a variety of task-oriented measures, such as the manipulability measure [25], the task compatibility index [3], sum-of-force minimization [16], and measures based on the task ellipsoid [14], among many others. We illustrate the notion of generating the precision grasp by local perturbation using Chiu's task compatibility index [3]. The velocity and force transmission characteristics can be represented geometrically as ellipsoids defined as a function of the manipulator Jacobian. Let the kinematic transformation from joint space to task space be given by x = x(Θ), where Θ = [θ_1, θ_2, ..., θ_n]ᵀ and x = [x_1, x_2, ..., x_m]ᵀ are the joint and task coordinate vectors. The Jacobian links the time differentials: ẋ = J(Θ) Θ̇, where J(Θ) is the m×n manipulator Jacobian matrix, whose elements are J_{i,j} = ∂x_i/∂θ_j. The unit sphere in joint space ℝⁿ defined by ||Θ̇|| ≤ 1 is mapped into an ellipsoid in ℝᵐ defined by

ẋᵀ (J(Θ) Jᵀ(Θ))⁻¹ ẋ ≤ 1

This ellipsoid is variously called the manipulability ellipsoid [25] and the velocity ellipsoid [3]. Analogous to the velocity ellipsoid is the force ellipsoid defined by

fᵀ (J(Θ) Jᵀ(Θ)) f ≤ 1
where f is the force vector in task space and τ is the joint torque vector. The force transmission ratio α and the velocity transmission ratio β along a direction u in task space can be determined to be

α = [uᵀ (J Jᵀ) u]^(−1/2)  and  β = [uᵀ (J Jᵀ)⁻¹ u]^(−1/2)

respectively. Chiu defines the compatibility index as

C = Σ_i w_i α_i^(±2) + Σ_j w_j β_j^(±2) = Σ_i w_i [u_iᵀ (J Jᵀ) u_i]^(∓1) + Σ_j w_j [u_jᵀ (J Jᵀ)⁻¹ u_j]^(∓1)
where the positive exponent of α_i or β_j is used for directions in which the magnitude is of interest, and the negative exponent is used for directions in which accuracy is of interest. The w's are weighting factors that indicate the relative magnitude or accuracy requirements in the respective task directions. In our application, the finer force control directions are chosen to be along the object normals at the points of contact, while the finer velocity control directions lie in the tangent plane defined by two spanning tangents.
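A sketch of the transmission ratios and of Chiu's compatibility index as reconstructed above; the direction lists, weights, and the single sign switch for all directions are our simplifying assumptions (the paper allows the exponent to be chosen per direction):

import numpy as np

def transmission_ratios(J, u):
    # Force and velocity transmission ratios along unit direction u.
    JJt = J @ J.T
    alpha = (u @ JJt @ u) ** -0.5                   # force ratio
    beta = (u @ np.linalg.solve(JJt, u)) ** -0.5    # velocity ratio
    return alpha, beta

def compatibility_index(J, force_dirs, force_w, vel_dirs, vel_w,
                        magnitude=True):
    # Chiu's task compatibility index. magnitude=True uses the positive
    # exponents (transmission magnitude matters); magnitude=False the
    # negative ones (accuracy matters).
    s = 1 if magnitude else -1
    JJt = J @ J.T
    c = 0.0
    for w, u in zip(force_w, force_dirs):
        c += w * (u @ JJt @ u) ** (-s)              # w * alpha^(+/-2)
    for w, u in zip(vel_w, vel_dirs):
        c += w * (u @ np.linalg.solve(JJt, u)) ** (-s)  # w * beta^(+/-2)
    return c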
9. System implementation
The arm/hand system used to verify the grasp mapping technique is shown in Fig. 9. The arm used is the PUMA 560 while the dextrous hand used is the Utah/MIT hand. Both the arm and the hand are controlled via Chimera, a real-time operating system for sensor-based control developed at Carnegie Mellon University [23]. The joint angles of the arm and the hand are fed from the grasp simulator (the display of which is shown in Fig. 10) to the arm/hand system via Chimera. The two types of grasp performed by the Utah/MIT hand are shown in Fig. 11.
Fig. 9 PUMA 560 and Utah/MIT hand system
Fig. 10 Simulator display
Fig. 11 Cylindrical power grasp (left) and fingertip precision grasp (right) by the Utah/MIT hand
10. Concluding remarks
Our approach to programming a robot to perform a task is to have the system observe a human executing that task, identify and plan what needs to be done to accomplish the task, and finally execute the task. A very important component of planning the execution of the task is the generation of the manipulator grasp. In this paper, we have described our method of mapping the observed human grasp to that of the manipulator. While this method may not produce a globally optimal manipulator grasp (by some task-related measure), it greatly simplifies grasp planning. In addition, it illustrates our philosophy of easing the programming burden by reducing it, mostly, to demonstrating the task; the complexity of programming and planning the task is reduced by taking advantage of the cues derived from observing a human.
Acknowledgments
Many thanks to Richard Voyles for helping us set up the PUMA arm and Utah/MIT hand system. This research was supported in part by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio 45433-6543 under Contract F33615-90-C-1465, ARPA Order No. 7597, and in part by NSF under Contract CDA-9121797.
References
[1] M.A. Arbib, T. Iberall, and D.M. Lyons, "Coordinated control programs for movements of the hand," Hand Function and the Neocortex, A.W. Goodwin and I. Darian-Smith (eds.), Springer-Verlag, 1985, pp. 111-129.
[2] D.L. Brock and J.K. Salisbury, "Implementation of behavioral control on a robot hand/arm system," 2nd Int'l Symposium on Experimental Robotics, 1991, pp. 1-19.
[3] S.L. Chiu, "Task compatibility of manipulator postures," The Int'l Journal of Robotics Research, vol. 7, no. 5, 1988, pp. 13-21.
[4] R.M. Crowder, "An anthropomorphic robotic end effector," Robotics and Autonomous Systems, vol. 7, no. 4, 1991, pp. 253-268.
[5] M.R. Cutkosky and R.D. Howe, "Human grasp choice and robotic grasp analysis," Dextrous Robot Hands, S.T. Venkataraman and T. Iberall (eds.), Springer-Verlag, 1990, pp. 5-31.
[6] C. Ferrari and J. Canny, "Planning optimal grasps," Proc. Int'l Conf. on Robotics and Automation, 1992, pp. 2290-2295.
[7] E. Grosso and G. Vercelli, "Grasping strategies for reconstructed unknown 3D objects," Proc. IEEE/RSJ Workshop on Intelligent Robots and Systems, Osaka, Japan, 1991, pp. 70-75.
[8] K.H. Hunt, Kinematic Geometry of Mechanisms, Oxford University Press, 1978.
[9] K. Ikeuchi and T. Suehiro, "Towards an assembly plan from observation, Part I: Assembly task recognition using face-contact relations (polyhedral objects)," Proc. Int'l Conf. on Robotics and Automation, 1992, pp. 2171-2177.
[10] S.C. Jacobsen, J.E. Wood, D.F. Knutti, and K.B. Biggers, "The Utah/MIT dextrous hand: Work in progress," Int'l Journal of Robotics Research, vol. 3, no. 4, 1984, pp. 21-50.
[11] S.B. Kang and K. Ikeuchi, "Determination of motion breakpoints in a task sequence from human hand motion," Proc. IEEE Int'l Conf. on Robotics and Automation, San Diego, May 1994.
[12] S.B. Kang and K. Ikeuchi, "A grasp abstraction hierarchy for recognition of grasping tasks from observation," Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, July 1993, pp. 194-201.
[13] S.B. Kang and K. Ikeuchi, "Toward automatic robot instruction from perception - Recognizing a grasp from observation," IEEE Trans. on Robotics and Automation, vol. 9, no. 4, 1993, pp. 432-443.
[14] Z. Li and S. Sastry, "Task-oriented optimal grasping by multifingered robot hands," IEEE Journal on Robotics and Automation, vol. 4, no. 1, 1988, pp. 32-44.
[15] D. Lian, S. Peterson, and M. Donath, "A three-fingered, articulated, robotic hand," Proc. 13th Int'l Symp. on Industrial Robots, vol. 2, 1983, pp. 18/91-18/101.
[16] X. Markenscoff and C.H. Papadimitriou, "Optimum grip of a polygon," The Int'l Journal of Robotics Research, vol. 8, no. 2, 1989, pp. 17-29.
[17] J.H. Mathews, Numerical Methods for Computer Science, Engineering, and Mathematics, Prentice-Hall, 1987.
[18] T. Okada, "Object-handling system for manual industry," IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-9, no. 2, 1979, pp. 79-89.
[19] J. Pertin-Troccaz, "Grasping: A state of the art," The Robotics Review 1, O. Khatib, J.J. Craig, and T. Lozano-Pérez (eds.), MIT Press, Cambridge, MA, 1989, pp. 71-98.
[20] N. Pollard, "Planning grasps for a robot hand in the presence of obstacles," Proc. Int'l Conf. on Robotics and Automation, 1993, pp. 723-728.
[21] K. Rao, G. Medioni, H. Liu, and G.A. Bekey, "Robot hand-eye coordination: Shape description and grasping," Proc. Int'l Conf. on Robotics and Automation, 1988, pp. 407-411.
[22] J.K. Salisbury, Kinematic and Force Analysis of Articulated Hands, Ph.D. dissertation, Stanford University, Stanford, CA, 1982.
[23] D.B. Stewart and P.K. Khosla, Chimera 3.1, The Real-Time Programming Operating System for Reconfigurable Sensor-Based Control Systems, Carnegie Mellon University, 1993.
[24] J.D. Wolter, R.A. Volz, and A.C. Woo, "Automatic generation of gripping positions for assembly," Robot Grippers, D.T. Pham and W.B. Heginbotham (eds.), Springer-Verlag, 1986, pp. 55-74.
[25] T. Yoshikawa, "Manipulability of robotic mechanisms," The Int'l Journal of Robotics Research, vol. 4, no. 2, 1985, pp. 3-9.
Robot Learning By Nonparametric Regression
Stefan Schaal and Christopher G. Atkeson
Department of Brain and Cognitive Sciences & The Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139*
Abstract: We present an approach to robot learning grounded in a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-)tangent planes at every point in input space at which a prediction is to be made, i.e., at every query point. Such a model, however, is only generated at the time of prediction and is not retained. This is in contrast to other methods that use a finite set of linear models to build a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for deciding when a local model is good enough to be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: by gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: within 40 to 100 trials the robot accomplished the task goal, starting out with no initial experiences.
1. INTRODUCTION
Learning control means improving a motor skill by repeatedly practicing a task. We are exploring systems that learn by explicitly remembering their experiences. These systems will be useful in situations where the form or structure of the task to be performed is not known in advance. Previous work tested such methods by implementing learning for one-shot or static tasks, such as throwing a ball at a target (Aboaf et al., 1988), and also for repetitive or dynamic tasks, such as bouncing a ball on a paddle (Aboaf et al., 1989) and hitting a stick back and forth (a form of juggling known as devil sticking) (van Zyl, 1991). This experimental work has highlighted the importance of making sure that the control paradigm used is robust to uncertainty, that the representational scheme is able to represent what is known about the task, and how well it is known, and that there is some process that generates exploration, so that models and controllers based on insufficient data are improved.
* New address of both authors: Georgia Institute of Technology, College of Computing, 801 Atlantic Drive, Atlanta, GA 30332, USA
In this paper we discuss the current stage of our nonparametric learning methods by describing some of their recent extensions with the help of an implementation of learning with the devil sticking robot. In the given context, we are using a form of indirect learning, where a model is learned and then control actions are chosen based on the model, rather than direct learning, where appropriate control actions are learned directly (Barto et al., 1991); the demonstrated framework, however, is equally well suited to direct learning. Our starting point for modeling is that we do not know the structure or form of the system to be controlled. However, we assume we do know what constitutes a state of the system, and that we measure the complete state. Future work will discuss how to approach tasks in which complete measurements of the state are not available, or in which what constitutes a state is not even known. We adopted a recently developed statistical technique from nonparametric regression analysis, locally weighted regression (LWR) (Cleveland, 1979; Farmer & Sidorowich, 1987; Atkeson, 1992), to learn models of the system we are trying to control (Section 2). The LWR approach allows us to efficiently estimate local linear models at different points in the state space. This approach is extended here to give information about the reliability of the predictions and local linearizations generated by locally weighted regression. Thus, as will be demonstrated in Section 3, a robot can monitor its own skill level and guide its exploratory behavior, and it can ensure that good learning results are accomplished even in a high-dimensional state space.
2. LOCALLY WEIGHTED REGRESSION
The point of view adopted here is that the goal of a learning system for robots is to be able to build internal models of tasks during the execution of those tasks. These models are multidimensional functions that are approximated from sampled data (the previous experiences or attempts to perform the task). The learned models are used in a variety of ways to successfully execute the task. Models should incorporate the latest information. As they will be continuously updated with a stream of new training data, updating a model should take a short period of time. There are also time constraints on how long it can take to use a model to make a prediction. Because we are interested in control methods that make use of local linearizations of the plant model, a representation is needed that can quickly compute a local linear model of the represented transformation. Moreover, negative interference from learning new knowledge on previously stored information should be as small as possible. As the most generic approximator that satisfies many of these criteria, we explore a version of nonparametric learning called locally weighted regression (LWR). LWR is a memory-based learning (MBL) system: it is trained just by storing the training data in a memory. This allows real-time learning and avoids interference between new and old data by retaining and using all the data to answer each query. Nonparametric learning systems like LWR approximate complex functions by using simple local models, as does a Taylor series. Examples of types of local models include nearest neighbor, weighted average, and locally weighted regression.
Each of these local models combines points near to the input point of a desired prediction, the query point, to estimate the appropriate output. Locally weighted regression uses a regression procedure to form the local model, and is thus more expensive than nearest neighbor and weighted average memory-based learning procedures. However, local linear models have been shown to have a favorable balance between bias and variance
(Cleveland, 1979). They also provide the first derivative at the query point, which will be shown below to be very useful for control tasks. For each query LWR forms a new local model. The rate at which local models can be formed and evaluated limits the rate at which queries can be answered. At the end of this paper, it will be described how locally weighted regression was implemented for a real-time application.
Unweighted regression finds the solution to the equations

y = X β   (2.1a)

by minimizing

J = Σ_{i=1}^{m} (y_i − ŷ_i)²   (2.1b)

where X is an m×(n+1) matrix consisting of m data points, each represented by its n input dimensions and a "1" in the last column, y is the vector of the corresponding outputs for each data point, β is the (n+1)-vector of regression parameters to be determined, and J is the sum of squared errors over all given data points (ŷ_i is the predicted result after fitting the parameters β). The well-known result for β is

β = (Xᵀ X)⁻¹ Xᵀ y   (2.2)

and the prediction of the outcome at a query point x_q becomes

ŷ_q = x_qᵀ β   (2.3)
However, this gives distant points equal influence with nearby points on the ultimate answer to the query. To weight similar points more, locally weighted regression first calculates the distance of each of the stored data points (rows in the X matrix) to the query point x_q:

d_i² = Σ_{r=1}^{n} (x_qr − x_ir)²   (2.4)

The weight for each stored data point is a function of that distance:

w_i = f(d_i²)   (2.5)

Each ith row in X and y is multiplied by the corresponding weight w_i, and the regression problem is solved as before. In matrix notation, this introduces the diagonal m×m weight matrix W in J:

J = Σ_{i=1}^{m} (w_i (y_i − ŷ_i))² = ||W (y − ŷ)||²   (2.6)

and the locally weighted regression parameters become

β = [(WX)ᵀ WX]⁻¹ (WX)ᵀ W y   (2.7)
A simple weighting function just raises the distance to a negative power, which determines how local the regression will be (the rate of drop-off of the weights with distance):

w_i = 1 / (d_i²)^k   (2.8)

This type of weighting function goes to infinity as the query point approaches a stored data point, which forces the locally weighted regression to exactly match that stored point. If the data is noisy, exact interpolation is not desirable, and a weighting scheme with limited magnitude is desired. One such scheme, which we use in what follows, is a Gaussian kernel:

w_i = exp(−d_i² / (2k²))   (2.9)
As in (2.8), the parameter k scales the size of the kernel to determine how local the regression will be. An LWR lookup has three stages: forming the weights (Eqns. (2.4), (2.5)), forming the regression matrix WX, and solving for the regression parameters as given in (2.7). A more thorough introduction to LWR can be found in Atkeson (1992) or Schaal and Atkeson (1994). We have implemented the locally weighted regression procedure on a 33 MHz Intel i860 microprocessor. The peak computation rate of this processor is 66 MFlops. We have achieved effective computation rates of 20 MFlops on a learning problem with 10 input dimensions and 5 output dimensions, using a linear local model. This leads to a lookup time of approximately 15 milliseconds on a database of 1000 points.
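Eqns. (2.4)-(2.9) translate directly into code. The following sketch (ours, not the authors' i860 implementation) performs one LWR lookup with the Gaussian kernel:

import numpy as np

def lwr_predict(X, y, x_q, k):
    # One locally weighted regression lookup (Eqns. 2.4-2.9).
    # X: (m, n) stored inputs; y: (m,) stored outputs; x_q: (n,) query;
    # k: width of the Gaussian kernel. Returns the prediction y_q and
    # the local linear model beta (the local tangent plane at x_q).
    m = X.shape[0]
    d2 = np.sum((X - x_q) ** 2, axis=1)       # squared distances (2.4)
    w = np.exp(-d2 / (2.0 * k ** 2))          # Gaussian weights (2.9)
    Xa = np.hstack([X, np.ones((m, 1))])      # append the "1" column
    # Weight each row and solve the least-squares problem (2.6)-(2.7)
    beta, *_ = np.linalg.lstsq(w[:, None] * Xa, w * y, rcond=None)
    return np.append(x_q, 1.0) @ beta, beta

np.linalg.lstsq is used here instead of forming the normal equations of (2.7) explicitly; the two yield the same parameters, but the least-squares solver is numerically more robust.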
2.1. Tuning The Fit Parameters
LWR uses a few parameters which require tuning to optimize the quality of fit. Of most importance is the parameter k in (2.9) and, in multiple-input systems, the distance metric m, which determines the weighting of the input dimensions with respect to each other. The distance metric enters (2.4) and modifies that equation to:

d_i² = Σ_{r=1}^{n} [(x_qr − x_ir) m_r]²   (2.10)
In the past we have used off-line global cross validation to estimate reasonable values for these fit parameters. Since we are using a local model that is linear in the unknown parameters, derivatives of the cross validation error e_i = ŷ_i − y_i can be computed with respect to the fit parameters k and m, and the sum of the squared cross validation errors minimized using a Levenberg-Marquardt (nonlinear least squares) procedure (MINPACK, NL2SOL). However, it is clear that these parameters should depend on the location of the query point. In this section we describe new procedures that estimate local values of the fit parameters, optimized for the site of the current query point. We want to demonstrate the differences between local and global fitting in an example where we only focus on the kernel width k of the Gaussian weighting function (2.9). In Figure 1a, a noisy data set of the function y = x − sin³(2πx³) cos(2πx³) exp(x⁴) was fitted by locally weighted regression with a globally optimized constant k. In the left half of the plot, the regression starts to fit noise because k had to be rather small to fit the high-frequency regions in the right half of the plot. The prediction intervals, which will be introduced below, demonstrate high uncertainty in several places. To avoid such undesirable behavior, a local optimization criterion is needed. Standard linear regression analysis provides a series of well-defined statistical tools to assess the quality of fits, such as coefficients of determination, t-tests, the F-test, the PRESS statistic, Mallows' Cp-test, confidence intervals, prediction intervals, and many more (e.g., Myers, 1990). These tools can be adapted to locally weighted regression. We do not want to discuss all available statistics here but rather focus on two that have proved to be quite helpful.
Figure 1: Optimizing the locally weighted regression fit using different measures: (a) global cross validation, (b) local cross validation
Cross validation has a relative in linear regression analysis called the PRESS residual error. The PRESS statistic performs leave-one-out cross validation, however, without the need to recalculate the regression parameters for every excluded point. This is computationally very efficient. Schaal and Atkeson (1994) show how the PRESS residual can be expressed as a mean squared cross validation error MSE_cross. In Figure 1b, the same data as in Figure 1a was fitted by adjusting k to minimize MSE_cross at each query point. The outcome is much smoother than that of global cross validation, and the prediction intervals are also narrower. It should be noted that the extrapolation properties on both sides of the graph are quite appropriate (compared to the known underlying function), in contrast to Figure 1a.
Prediction intervals I_i are expected bounds of the prediction error at a query point x_i. Their derivation can be found in most textbooks on regression analysis (e.g., Myers, 1990), and an adaptation for LWR is straightforward (Schaal and Atkeson, 1994). Besides using the intervals to assess the confidence in the fit at a certain point, they provide another optimization measure. Using prediction intervals as the optimization criterion for k, a plot similar to Figure 1b is obtained. The curves in Figure 2a,b are generated by using this optimization criterion.
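The weighted PRESS statistic computes the leave-one-out error without refitting; as a brute-force stand-in that conveys the same idea, the following sketch (reusing lwr_predict and the import above) simply refits for each withheld point and minimizes the resulting local cross-validation error over the kernel width k:

def local_cv_mse(X, y, x_q, k, n_local=20):
    # Weighted leave-one-out cross-validation error near x_q. Each of
    # the n_local points nearest to the query is withheld in turn and
    # predicted from the remaining data; the squared errors are
    # averaged with the query-point weights.
    d2 = np.sum((X - x_q) ** 2, axis=1)
    idx = np.argsort(d2)[:n_local]            # only nearby points matter
    w = np.exp(-d2[idx] / (2.0 * k ** 2))
    e2 = np.empty(len(idx))
    for j, i in enumerate(idx):
        keep = np.arange(len(X)) != i         # leave point i out
        y_hat, _ = lwr_predict(X[keep], y[keep], X[i], k)
        e2[j] = (y[i] - y_hat) ** 2
    return float(np.sum(w * e2) / np.sum(w))

def tune_k_locally(X, y, x_q, candidates):
    # Pick the kernel width that minimizes the local CV error at x_q.
    return min(candidates, key=lambda k: local_cv_mse(X, y, x_q, k))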
2.2. Assessing The Quality of the Local Model
Both the local cross validation error MSE_cross and the prediction interval I_i may serve to assess the quality of the local fit:

Q_fit = √(MSE_cross) / c   or   Q_fit = I_i / c   (2.11)

The factor c makes Q_fit dimensionless and normalizes it with respect to some user-defined quantity. In our applications, we usually preferred Q_fit based on the prediction intervals, which is the more conservative assessment.
2.3. Dealing with Outliers
Linear regression analysis is not robust with respect to outliers. This also holds for locally weighted regression, although the influence of outliers will not be noticed unless the outliers lie close enough to a query point. In Figure 2a we added three outliers to the test data of Figure 1 to demonstrate this effect. Again, we would like to localize our criterion for outlier removal. The PRESS statistic can be modified to serve as an outlier detector in LWR. For this, we need the standardized individual PRESS residual e_i,cross (Schaal and Atkeson, 1994). This measure has zero mean and unit variance. If, for a given data point x_i, it deviates from zero by more than a certain threshold, the point can be called an outlier. A conservative threshold would be 1.96, discarding all points lying outside the 95% area of the normal distribution. In our applications, we used 2.57, cutting off all data outside the 99% area of the normal distribution. As can be seen in Figure 2b, the effects of the outliers are reduced.
2.4. Data Compression
Although the original version of LWR is purely memory-based and does not attempt any kind of data compression, it is straightforward to add this property. The "infinity" of local linear models can be converted into a finite set of piecewise linear models. An efficient way to do this is by means of a recursive tessellation of the input space, another nonparametric technique well known from decision tree induction (Breiman et al., 1984). K-d tree partitioning is the most common of these methods (Friedman et al., 1977). K-d trees partition the input space into hyper-rectangles by recursively splitting the partitions along one input dimension. There are many different choices of when and where to split, and we will sketch only what was used in our algorithms.
Figure 2: Influence of outliers on LWR: (a) no outlier removal, (b) with outlier removal
For building piecewise linear models, our method starts with one partition which includes all the data. Local linear models are fit by LWR to the center of the partition and to all its vertices. If the local linear model at the center is capable of predicting the outputs of the LWR lookups at the vertices within a certain error margin, the partition is not split any further, and, if the prediction intervals of the lookup at the center are narrow enough, the partition is marked valid. The outputs of all query points falling into this partition can be predicted with the linear model at the center. If the linear model at the center of the partition fails to predict the vertices well enough, the partition is split at the center along the dimension where the greatest mismatch between predicted value and real value was encountered. Then the process repeats with the newly
created partitions. The algorithm terminates when no more splitting needs to be done. Figure 3 shows the result of this method for the function introduced in Figure 1. As Figure 3 illustrates, the piecewise linear fit of the data represents the given curve very well. At locations where the data has high frequencies, more partitions are allocated than at the more linear regions. By changing the error margin for accepting a partition and its local linear model, the piecewise fit can become coarser or finer. One advantage of this method over other piecewise linear partitioning techniques is that the partitioning of the input space is only done in order to find candidate centers for the linear models. For the local linear fits, all data are always included in the LWR lookup. Thus, there is no hard but rather a soft partitioning of the input space, since each local linear model also includes data from the neighboring partitions. Due to space limitations, this data compression method cannot be discussed in detail. It can be applied in conjunction with all the methods mentioned so far. (A code sketch of the partitioning follows Figure 3.)
Figure 3: Piecewise linear fit of the data from Figure 1; the black dots denote the boundaries between two neighboring partitions.
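A compact sketch of the recursive partitioning just described, again reusing lwr_predict; the paper's split rule (the dimension of greatest mismatch) is approximated here by the simpler longest-edge rule, and the prediction-interval validity check is omitted:

import numpy as np

def build_partitions(X, y, k, tol, lo, hi, depth=0, max_depth=6):
    # Recursively construct piecewise linear models (Section 2.4).
    # lo, hi bound the current hyper-rectangle. A partition is accepted
    # when the linear model fitted at its centre predicts the LWR
    # lookups at all 2^n vertices to within tol; otherwise it is split
    # in half. Returns a list of (centre, beta) pairs.
    n = len(lo)
    centre = 0.5 * (lo + hi)
    _, beta = lwr_predict(X, y, centre, k)
    vertices = np.array([[hi[d] if (i >> d) & 1 else lo[d]
                          for d in range(n)] for i in range(2 ** n)])
    errs = [abs(lwr_predict(X, y, v, k)[0] - np.append(v, 1.0) @ beta)
            for v in vertices]
    if max(errs) <= tol or depth >= max_depth:
        return [(centre, beta)]
    d_split = int(np.argmax(hi - lo))          # longest-edge heuristic
    hi_l, lo_r = hi.copy(), lo.copy()
    hi_l[d_split] = lo_r[d_split] = centre[d_split]
    return (build_partitions(X, y, k, tol, lo, hi_l, depth + 1, max_depth)
            + build_partitions(X, y, k, tol, lo_r, hi, depth + 1, max_depth))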
3. REAL-TIME LEARNING OF ROBOT JUGGLING
We have constructed a system for experiments in real-time motor learning (van Zyl, 1991). The task is a juggling task known as "devil sticking". A center stick is batted back and forth between two handsticks (Figure 4a). Figure 4b shows a sketch of our devil sticking robot. The juggling robot uses its top two joints to perform planar devil sticking. Hand sticks are mounted on the robot with springs and dampers. This implements a passive catch. The center stick does not bounce when it hits the hand stick, and therefore requires an active throwing motion by the robot. To simplify the problem the center stick is constrained by a boom to move on the surface of a sphere. For small movements the center stick movements are approximately planar.
The boom also provides a way to measure the current state of the center stick. Ultimately we want to be able to perform unconstrained three-dimensional devil sticking. The task state is the predicted location of the center stick when it hits the hand stick held in a nominal position. Standard ballistics equations for the flight of the center stick are used to map flight trajectory measurements (x(t), y(t), θ(t)) into a task state. The dynamics of throwing the devil stick are parameterized by 5 state and 5 action variables, resulting in a 10/5-dimensional input/output model for each hand:
x = (p, θ, ẋ, ṗ, θ̇)   (3.1)

The task command is given by a displacement (x_h, y_h) of the hand stick from the nominal position, a center stick angular velocity threshold θ_t to trigger the start of a throwing motion, and a throw velocity vector (v_x, v_y):

u = (x_h, y_h, θ_t, v_x, v_y)   (3.2)
Figure 4: (a) an illustration of devil sticking, (b) sketch of our devil sticking robot: the flow of force from each motor into the robot is indicated by different shadings of the robot links, and a position change due to an application of motor 1 or motor 2, respectively, is indicated in the small sketches
Every time the robot catches and throws the devil stick it generates an experience of the form

(x_k, u_k, x_{k+1})   (3.3)
where x_k is the current state, u_k is the action performed by the robot, and x_{k+1} is the state of the center stick that results. Initially we explored learning an inverse model of the task, using nonlinear "deadbeat" control to eliminate all error on each hit. Each hand had its own inverse model of the form

u_k = f̂⁻¹(x_k, x_{k+1})   (3.4)
Before each hit the system looked up a command using the expected impact state of the devil stick and the desired state:

û_k = f̂⁻¹(x̂_k, x_d)   (3.5)
Inverse model learning was successfully used to train the system to perform the devil sticking task. Juggling runs of up to 100 hits were achieved. The system incorporated new data in real time, and used databases of several hundred hits. Lookups took less than 15 milliseconds, and therefore several lookups could be performed before the end of the flight of the center stick. Later queries incorporated more measurements of the flight of the center stick and therefore gave more accurate predictions of the state x_k of the task. However, the system required substantial structure in the initial training to achieve this performance. The system was started with a default command that was appropriate for open-loop performance of the task. Each control parameter was varied systematically to explore the space near the default command. A global linear model was made of this initial data, and a linear controller based on this model was used to generate an initial training set for the memory-based system (approximately 100 hits). Learning with no initial data was not possible. We also experimented with learning based on both inverse and forward models. After a command is generated by the inverse model, it can be evaluated using a memory-based forward model built from the same data:

x̂_{k+1} = f̂(x_k, û_k)   (3.6)
Because it produces a local linear model, the LWR procedure generates estimates of the derivatives of the forward model with respect to the commands as part of the estimated parameter vector β. These derivatives can be used to find a correction to the command vector that reduces errors in the predicted outcome based on the forward model:

−(∂x̂_{k+1} / ∂û_k) Δû_k = x̂_{k+1} − x_d   (3.7)

The pseudo-inverse of the matrix ∂x̂_{k+1}/∂û_k is used to solve the above equation for Δû_k in order to handle situations in which the matrix is singular or a different number of commands and states exists (which does not apply to devil sticking). The process of command refinement can be repeated until the forward model no longer produces accurate predictions of the outcome. This will happen when the query to the forward model requires significant extrapolation from the current database.
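A sketch of this command-refinement loop; forward_model is an assumed callable returning both the prediction and the command Jacobian (LWR provides the latter as part of the estimated parameter vector β):

import numpy as np

def refine_command(forward_model, x_k, u_init, x_desired,
                   max_iters=10, tol=1e-3):
    # Iterative command refinement with a forward model (Eqn. 3.7).
    u = np.array(u_init, dtype=float)
    for _ in range(max_iters):
        x_pred, dxdu = forward_model(x_k, u)  # prediction and dx/du
        err = x_desired - x_pred
        if np.linalg.norm(err) < tol:
            break
        # The pseudo-inverse handles singular or non-square Jacobians
        u += np.linalg.pinv(dxdu) @ err
    return u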
We investigated this method for incremental learning of devil sticking in simulations. The outcome, however, did not meet expectations: without sufficient initial data around the setpoint, the algorithm did not work. Two main reasons can be held responsible:
i) Similar to the pure inverse model approach, the inverse-forward model acts as a one-step deadbeat controller. One-step deadbeat control applies unreasonably high commands to correct for deviations from the setpoint. In the presence of errors in the model, this is detrimental since it magnifies the model errors. Additionally, the workspace bounds and command bounds of our devil sticking robot limit the size of the commands.
ii) Due to the nonlinearities in the dynamics of the robot, the 10-dimensional input space of the forward model suffers from the first symptoms of Bellman's "curse of dimensionality". Error reduction as described in (3.7) only works if sufficient data exists at the query sites. The inevitable model errors will make the robot explore randomly, leading to dispersed data. This gives little chance for model improvements. Imagine we had to place data in a (hyper-)cube of normalized edge length 0.1. In a 3-dimensional input space there are 10³ such cubes, which leaves some probability of finally arriving at the goal. In a 10-dimensional state space, there are 10¹⁰ such cubes - a prohibitive number for random exploration.
Thus, two ingredients had to be added to the devil sticking controller:
a) Control must start as soon as possible, with the primary goal of increasing the data density in the current region of the state-action space, and the secondary goal of arriving at the desired goal state.
b) Control actions must avoid deadbeat properties and must be planned to reach the goal in multiple steps.
Both requirements are fulfilled by a simple exploration method, the shifting setpoint algorithm (SSA) (Schaal and Atkeson, 1994). The SSA attempts to decompose the control problem into two separate control tasks on different time scales. On the fast time scale, it generates controls as a nonlinear regulator by trying to keep the controlled system at the setpoints. On a slower time scale, the setpoints are shifted to accomplish a desired goal. The SSA tries to explore the world by going to the fringes of its data support in the direction of the goal. It extends its knowledge at the fringes until statistically sufficient data has been collected to make a further step towards the goal. In this way the SSA builds a narrow tube of data support in which it knows how to control. This data can be used by more sophisticated control algorithms for planning or further exploration. Applied to devil sticking, the SSA proceeds as follows:
(1) Regardless of the poor juggling quality of the robot (i.e., at most two or three hits per trial), the SSA makes the robot repeat these initial actions with small random perturbations until a cloud of data is collected somewhere in the state-action space of each hand. An abstract illustration of this is given in Figure 5 (top-left to top-right).
(2) Each point in the data cloud of each hand is used as a candidate for a setpoint of the corresponding hand by trying to predict its output from its input with LWR. The point achieving the narrowest local confidence interval becomes the setpoint (x_s, u_s) of the hand, and an LQ controller is calculated for its local linear model:

x_{s,out} = A x_s + B u_s + c,  or  x_{s,out} − c = A x_s + B u_s   (3.8)
Figure 5: Abstract illustration how the SSA algorithm collects data in space: top-left: sparse data after the first few hits; top-right: high local data density due to local control in this region; bottom-left: increased data density on the way to the goals due to shifting the setpoints; bottom-right: ridge of data density after the goal was reached
149 (3) At this point, the setpoints are gradually shifted towards the goal setpoints until the data support of the local models falls below a statistical value. The shifting algorithm executes as follows: 1) calculate the error of the predicted output state: e r r s , o u t = Xdesire d - - X S , o u t 2) solve the equation: &S,out
O3Us
m u s ._
BAus = a err s out
for Au s , e.g., by singular value decomposition (Press et al, 1989); o~ e [0,1] determines how much of the error should be compensated for in one step. 3) update u s" u s = u s + Au s and calculate new predicted output setpoint state Xs,ou , with a LWR lookup. 4) assess the fit for the updated setpoint with the help of prediction intervals. If the quality is above a certain threshold, continue with 1), otherwise terminate shifting. Shifting occurs for both x s and Xs,ou ,. After shifting, the kernel k (cf. Eqn. (2.9) is optimized by minimizing the local cross validation error M S E c.... . (In Figure 5, the goal setpoints are given explicitly, but they can also develop automatically from the require ment to throw the devilstick increasingly close to a place, in which the other hand has data, i.e., X desired,lefi : X s,right and X desired,right : X S,lefi .) (4) The SSA repeats itself by collecting data in the new regions of the workspace until the setpoints can be shifted again (Fig. 5-bottom-left). The procedure terminates by reaching the goal, leaving a (hyper-) ridge of data in space (Figure 5-bottom-right). The LQ controllers play a crucial role for devil sticking. Although we statistically exploit data rather thoroughly, it is nevertheless hard to build good local linear models in the high dimensional forward models, particularly at the beginning of learning. LQ control has useful robustness even if the underlying linear models are imprecise.
Figure 6: Learning curves of devil sticking for three runs.
We tested the SSA in a noise-corrupted simulation and on the real robot. Learning curves of the real robot are given in Figure 6. The learning curves are typical for the given problem. It takes roughly 40 trials before the setpoint of each hand has moved close enough to the other hand's setpoint. For the simulation a breakthrough then occurs and the robot rarely loses the devil stick after that. The real robot (Figure 6) takes more trials to achieve longer juggling runs. In our first experiments, its performance was not very consistent. Applying LWR and an LQ controller in the form described above turned out to undersample certain directions in control space. Regression analyses need full-rank data matrices to achieve good results. By undersampling one direction, the regression matrix becomes close to singular. To avoid this problem a small amount of uniformly distributed noise was added to the controls. The performance of three runs is shown in Figure 6 (note the scale of the ordinate). On average, humans need roughly a week of one hour of practice a day before they learn to juggle the devil stick. With respect to this, the robot demonstrated rather nice learning performance, but the controller is still local and could not correct large perturbations, and due to the boom the task is slightly simplified. Future work will attempt to further improve learning rate and robustness.
4. DISCUSSION
In this paper we adopted a nonparametric approach to learning control. By means of locally weighted regression we first built models of the world, and subsequently exploited the models with statistical methods and algorithms from optimal control to design controllers. Despite the computational complexity of these methods, we demonstrated the usefulness of our algorithms in a real-time implementation of robot learning. Using models for control according to the certainty equivalence principle is nothing new and has been supported by many researchers in recent years (e.g., An et al., 1988; Sutton, 1990; Jordan, 1992; Moore, 1991). Using memory-based or nonparametric models, however, has only recently received increasing interest. One of the advantages of memory-based modeling lies in the least-commitment strategy associated with it. Since all data is kept in memory, a lookup can be optimized with respect to the few open architectural parameters. Parametric approaches do not have this ability if they discard their training data; if they retained all the training data they would essentially become memory-based. As we demonstrated in our LWR approach to nonparametric modeling, several established statistical methods may be adopted to assess the quality of a model. These statistics form the backbone of the SSA exploration algorithm. So far we have only examined some of the most obvious statistical tools which directly relate to regression analysis. Many other methods may be suitable, too, particularly from Bayesian statistics. Training a memory-based model is computationally inexpensive, as the data is simply stored in a memory. Training a nonlinear parametric model typically requires an iterative search for the appropriate parameters. Examples of iterative search are the various gradient descent techniques used to train neural network models (Hertz et al., 1991). Lookup or evaluating a memory-based model is computationally expensive, as described in this paper. Lookup for a nonlinear parametric model is often relatively inexpensive.
If there is a situation in which a fixed set of training data is available, and there will be many queries to the model after the training data is processed, then it makes sense to use a nonlinear parametric model. However,
if there is a continuous stream of new training data intermixed with queries, as there typically is in many motor learning problems, it may be less expensive to train and query a nonparametric memory-based model than it is to train and query a nonlinear parametric model. A question that often arises with memory-based models is the effect of memory limitations. Memory use can be minimized based on several methods. One approach is to store only "surprises". The system would try to predict the output of a data point before trying to store it. If the prediction is good, it is not necessary to store the point. Another approach is to forget data points. Points can be forgotten or removed from the database based on age, proximity to queries, or other criteria. Because memory-based learning retains the original training data, forgetting can be explicitly controlled. Additionally, we also presented a method for data compression. Data compression on the basis of k-d tree partitioning will be useful when sufficient data has been collected in a region such that there is enough confidence to build a local parametric model. The piecewise local models generated by our method can be used in the form of a lookup table which quickly provides predictions and derivatives at a query point. Smoothing across partitions can be done by interpolation and will guarantee continuous first derivatives. The LWR k-d tree can also be built on-line and in combination with the regular LWR learning. If the k-d tree does not provide a valid partition for a query, the system falls back to the normal LWR mode; otherwise the result is generated by the LWR k-d tree. That computational complexity does not necessarily limit real-time applications was demonstrated with our successful devil sticking robot. We are able to do lookups for memory-based local models in less than 15 milliseconds for a thousand data points modeling a 10-to-5 mapping, and we are able to build on-line LQ controllers in another 5 ms. The initial shortcomings of our deadbeat inverse or inverse-forward model controllers are not due to the LWR learning algorithm but rather to the inherent problems of this kind of control. As has been pointed out by Jordan (1992), inverse models are not goal-directed and perform data sampling in action space and not state space. They do not establish a connection between a certain sensation and a certain action but rather a connection between two sensations. Hence, they do not learn from bad actions. A forward model overcomes these problems. Pure forward model controllers, however, are still deadbeat controllers which try to cancel the plant dynamics in one step. This results in large commands if the system deviates only moderately from its desired goal, and conflicts with the workspace bounds and command bounds of our robot. Additionally, modeling errors are strongly amplified by deadbeat control. Accurate data sampling, as is necessary in high-dimensional spaces, thus becomes rather difficult. Due to the statistical properties of locally weighted regression, a simple exploration algorithm like the shifting setpoint algorithm is powerful enough to accomplish the desired task. Deadbeat control was replaced by LQ control, which naturally blends into the LWR framework. By no means was the SSA algorithm intended to replace high-level controllers.
Indeed, it remains to be explored to what extent the chaining of individual LQ controllers is actually robust, and whether an approach from trajectory optimization (Dyer and McReynolds, 1970) would not be more appropriate. In favor of the SSA algorithm stands its easy implementation for real-time systems. Two crucial prerequisites entered our explanations of robot learning. First, we assumed we know the input/output representations of the task, and second, we were able to generate a goal state for the SSA exploration. A good choice of representation is crucial in order to be able to accomplish the goal at all (Schaal and Atkeson, 1993; Schaal and Sternad, 1993), and
we have very limited insight so far into how to automate this part of the learning process. Of equal importance is a good choice of goal state. In devil sticking, the goal state developed out of the necessity that the left and right hands have to cooperate. The initial action, however, which was given by the experimenter, clearly determined in which ballpark the juggling pattern would lie. Certain patterns of devil sticking are easier than others (Schaal et al., 1992), and we picked an initial action that we knew was in the right ballpark. One part of our future work will address these issues in more detail as we search for good initial actions and strategies to approach a task.

5. CONCLUSIONS

The paper demonstrated that a real robot can indeed learn a non-trivial task. As pointed out above, by taking input/output representations and good learning goals as given, a large portion of the task was already solved in advance. Solving the remaining problems became practicable mainly because of the characteristics of the LWR learning method. The local linear models that this algorithm generated at every query point allowed us to make use of optimal control techniques which added useful robustness to the controllers. Since LWR is memory-based, the local linear models could be optimized with respect to statistical uncertainty measures. These measures also served as the basis of the SSA exploration algorithm. Such statistical tools are not generally available in learning methods. LWR is particularly suited to exploit statistics since it originates from a statistical method, and we could thus easily assure compatibility of the statistics and the learning algorithm. As a last point, LWR does not suffer from problems of interference when being trained on new data. Interference means a degradation of performance in one part of the model when training the model with data relevant for different parts. Such an effect could happen during SSA shifting if a parametric learning method were applied. But since lookups with LWR are affected only by a small neighborhood of the query point, interference problems are greatly reduced. Our future work will focus on extending LWR model-based control to multistage problems in the optimal control domain. Devil sticking should not only be stable within the validity of the local linear controllers but should rather exhibit global stability. This requires nonlinear optimal control and planning techniques which we are currently exploring. Future work must furthermore address how we could approach tasks in which complete measurements of the states are not available, or in which what constitutes a state is not even known.

ACKNOWLEDGMENTS

Support was provided by the Air Force Office of Scientific Research, by the Siemens Corporation, the German Scholarship Foundation and the Alexander von Humboldt Foundation to Stefan Schaal, and by a National Science Foundation Presidential Young Investigator Award to Christopher G. Atkeson. We thank Gideon Stein for implementing the first version of LWR on a DSP board, and Gerrie van Zyl for building the devil sticking robot and implementing the first version of learning of devil sticking.
REFERENCES

Aboaf, E. W., Atkeson, C. G., & Reinkensmeyer, D. J. (1988). "Task-level robot learning." In: Proceedings of the IEEE International Conference on Robotics and Automation, April 24-29, Philadelphia, Pennsylvania.

Aboaf, E. W., Drucker, S. M., & Atkeson, C. G. (1989). "Task-level robot learning: Juggling a tennis ball more accurately." In: Proceedings of the IEEE International Conference on Robotics and Automation, May 14-19, Scottsdale, Arizona.

An, C. H., Atkeson, C. G., & Hollerbach, J. M. (1988). Model-based control of a robot manipulator. Cambridge, MA: MIT Press.

Atkeson, C. G. (1992). "Memory-based approaches to approximating continuous functions." In: Casdagli, M., & Eubank, S. (Eds.), Nonlinear Modeling and Forecasting, pp. 503-521. Redwood City, CA: Addison Wesley.

Barto, A. G., Bradtke, S. J., & Singh, S. P. (1991). "Real-time learning and control using asynchronous dynamic programming." Technical Report 91-57, Amherst, MA: University of Massachusetts.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth International Group.

Cleveland, W. S. (1979). "Robust locally weighted regression and smoothing scatterplots." Journal of the American Statistical Association, 74, pp. 829-836.

Dyer, P., & McReynolds, S. R. (1970). The computation and theory of optimal control. New York: Academic Press.

Farmer, J. D., & Sidorowich, J. J. (1987). "Predicting chaotic time series." Phys. Rev. Lett., 59 (8), pp. 845-848.

Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). "An algorithm for finding best matches in logarithmic expected time." ACM Transactions on Mathematical Software, 3 (3), pp. 209-226.

Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison Wesley.

Jordan, M. I. (1992). "Supervised learning with a distal teacher." Cognitive Science, 16, 3, pp. 307-354.

Mallows, C. L. (1966). "Choosing a subset regression." Unpublished paper presented at the annual meeting of the American Statistical Association, Los Angeles.

Moore, A. (1991). "Fast, robust adaptive control by learning only forward models." In: Moody, J. E., Hanson, S. J., & Lippmann, R. P. (Eds.), Advances in Neural Information Processing Systems 4. San Mateo, CA: Morgan Kaufmann.

Myers, R. H. (1990). Classical and modern regression with applications. Boston, MA: PWS-KENT.

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in C - The art of scientific computing. Cambridge: Press Syndicate of the University of Cambridge.
Schaal, S., Atkeson, C. G., & Botros, S. (1992). "What should be learned?" In: Proceedings of the Seventh Yale Workshop on Adaptive and Learning Systems, pp. 199-204, New Haven, CT.

Schaal, S., & Atkeson, C. G. (1993). "Open loop stable control strategies for robot juggling." In: IEEE International Conference on Robotics and Automation, 3, pp. 913-918, Atlanta, Georgia.

Schaal, S., & Sternad, D. (1993). "Learning passive motor control strategies with genetic algorithms." In: Nadel, L., & Stein, D. (Eds.), 1992 Lectures in Complex Systems. Redwood City, CA: Addison-Wesley.

Schaal, S., & Atkeson, C. G. (1994). "Robot juggling: An implementation of memory-based learning." Control Systems Magazine, 14, 1, pp. 57-71.

Sutton, R. S. (1990). "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming." In: Proceedings of the International Machine Learning Conference.

van Zyl, G. (1991). "Design and control of a robotic platform for machine learning." MS Thesis, Cambridge, MA: Massachusetts Institute of Technology, Dept. of Mechanical Engineering.

Wahba, G., & Wold, S. (1975). "A completely automatic French curve: Fitting spline functions by cross-validation." Communications in Statistics, 4 (1).
Behavioral control in mobile-robot navigation using a fuzzy decision making approach

H.R. Beom^a, K.C. Koh^a and H.S. Cho^b

^a Researcher, Research & Development Laboratory, LG Industrial Systems Co., Ltd., 533 Hogae-dong, Dongan-gu, Anyang, 430-080, Korea

^b Professor, Center for Robotics and Automation, Department of Precision Engineering and Mechatronics, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon, 305-701, Korea
This paper proposes a sensor-based mobile-robot navigation method which utilizes a multi-criterion decision-making method in uncertain environments. In order to acquire information about the environment around a mobile robot, ultrasonic sensors mounted on the front of the mobile robot are used. The multi-criterion decision-maker uses fuzzy sets to select a reactive or purposive behavior in accordance with a global mission from a higher-level planner and the situation around the robot. The reactive and purposive behaviors correspond to avoidance and goal-seeking behaviors, respectively. The two behaviors are designed to produce the prescribed responses to currently perceived sensory information using fuzzy logic and reinforcement learning. A decision function to appropriately select one of the two behaviors is formulated by use of a fuzzy set. The optimal alternative of the two behaviors is defined as the one achieving the higher degree of membership in the fuzzy decision function. The fuzzy decision-maker provides the mobile-robot navigator with a degree of flexibility which cannot be found in conventional decision-making methods. The effectiveness of the proposed method is verified by a series of simulations.

1. INTRODUCTION

Path planning is one of the most vital tasks in navigation of autonomous mobile robots and is usually divided into two categories. One is global path planning based on a priori complete information about the environment and the other is local path planning based on sensory information in a cluttered environment. Local path planning utilizes the information provided by sensors such as an ultrasonic sensor, vision, laser rangefinder, proximity sensor or touch sensor. Brooks [1] applied the force-field concept to the obstacle avoidance problem for a mobile robot equipped with ultrasonic sensors whose readings are used to compute the resultant repulsive force. Borenstein and Koren [2] proposed the vector-field histogram method for a fast-running mobile robot equipped with ultrasonic sensors. However, with the above methods, it is difficult to find the force coefficients that influence the velocity and the direction of the mobile robot in a cluttered environment. As alternatives, fuzzy logic and neural networks have been employed in obstacle avoidance for mobile-robot navigation [3]. The neural network informs the mobile robot of its
situation at the present instant. According to the situation, fuzzy rules are brought into play to cause the mobile robot to act. However, it is difficult to construct and tune the rule bases. Thus, in an earlier study [4], a navigation method using avoidance behavior and goal-seeking behavior was proposed. Each behavior was designed to make the mobile robot act in accordance with currently perceived sensory information by using fuzzy logic and reinforcement learning. Recently, advanced behavior-based systems have been proposed to give a purposive capability to the mobile robot. Arkin [5] proposed a motor schema-based architecture describing the interaction between perception and action of the mobile robot. The behavior-based approach decomposes the system into individual modules, each of which is responsible for one behavior to be performed by the mobile-robot system. Whenever the mobile robot navigates in a cluttered environment towards the goal position, the two behaviors, avoidance behavior and goal-seeking behavior, always conflict. Hence, in order for the mobile robot to arrive at the goal position without colliding with the obstacles, a method to effectively fuse the behaviors is required. Saffiotti et al. [6] have proposed a behavior-blending method using fuzzy logic, and Salichs et al. [7] have proposed a method using reinforcement learning. In this study, a new way of combining two behaviors using a multi-criterion decision-making method is proposed. A general decision-making problem can be expressed in terms of the simultaneous satisfaction of conflicting criteria. In conventional decision-making problems, a single scalar criterion, the weighted sum of the deviations from an ideal goal value, is constructed and optimized to identify non-dominated solutions. One drawback of such a formulation is that the optimum solution tends to be sensitive to the numerical weights. The fuzzy decision-making approach models objective functions and constraints in the form of fuzzy goals and then transforms the problem into a conventional optimization using fuzzy weights. The decision function is a combination of criteria obtained by some aggregation operations. Two well-known aggregation operations are intersection and union. In multi-criterion decision-making using fuzzy sets, decision alternatives are usually evaluated by constructing a decision function. In this study, the fuzzy decision function is constructed by using the sensory information about the environment around the mobile robot and the global mission from a higher-level planner.

2. MOBILE-ROBOT NAVIGATION ARCHITECTURE

The proposed navigation method is used for a wheeled mobile robot named LCAR which has been developed in the Laboratory for Control Systems and Automation, KAIST. As shown in Fig. 1, the robot has four wheels: two drive wheels fixed at the sides of the mobile robot and two castors attached at the front and rear of the robot. The robot moves and changes its heading angle by the rotation of the two drive wheels. Four DC servo motors are used to drive the two wheels and one camera mounting device. LCAR is equipped with a camera, a laser light-stripe, ultrasonic sensors and a DSP board. The mobile-robot navigation architecture employed in this study is shown in Fig. 2. The architecture consists of the following modules: higher-level planner, perception module, primitive behaviors, fuzzy decision-maker, and lower-level controller.
The higher-level planner, giving the goal and sub-goal positions, consists of a global path planner and a mission planner. The primitive behaviors produce a prescribed robot action in response to the currently perceived sensory information. According to the intervention of the fuzzy decision-maker, the output of only one
behavior is transferred to the lower-level controller, called the dead reckoner. The lower-level controller controls the robot, taking into account the limitations on its driving angle and maximum speed. The primitive behaviors and the fuzzy decision-maker will be explained in detail later.
Figure 1. The wheeled mobile robot, LCAR
Figure 2. The mobile-robot navigation architecture
2.1. Perception system using ultrasonic sensors
Environments of mobile-robot navigation can be classified as static or dynamic. The dynamic environment is defined as follows: obstacles can move or change their shapes, and some obstacles may not initially be registered on the global map. In this study, eighteen ultrasonic sensors mounted on the front face of the mobile robot are used to understand environmental situations, for example, what types of obstacles exist in front of the mobile robot or whether the robot can proceed without colliding with obstacles. These sensors act both as transmitters and receivers. Fig. 3 shows the arrangement of the ultrasonic sensors mounted on the front face of the mobile robot, with the sensors marked as dots in the figure. The distances e_j (j = 1, 2, ..., 18) from the origin of the robot frame {R} to obstacles detected by the jth sensor S_j can be defined as e_j = δ_j + R. Here, R is the radius of the robot and δ_j is the jth sensor reading, shown as a dashed line. In this figure, φ_j denotes the orientation of the jth sensor with respect to the frame {R}. Thus, the orientations of the ultrasonic sensors can be expressed by the relationship

φ_j = (j − 9.5)ψ,   j = 1, 2, ..., 18.   (1)

Here, the angle ψ denotes the angle between two adjacent sensors and is fixed at 11.25°. The number 9.5 is required so that the orientation angles of the sensors installed on the right- and left-hand sides of the x-axis are expressed by negative and positive values, respectively.
Figure 3. The arrangement of ultrasonic sensors and sensor suites

It is assumed that the y-axis of the coordinate frame {R} is aligned with the direction of motion of the mobile robot. As shown in Fig. 3, in order to avoid increasing the dimension of the input space, sensor suites are introduced. A sensor suite consists of a group of neighboring sensors. The first and the fifth sensor suites, ss_1 and ss_5, are composed of three neighboring sensors, while the other sensor suites are composed of four neighboring sensors. The distance d_i (i = 1, 2, ..., 5) measured by the ith sensor suite from the center of the mobile robot to the obstacle, and the direction θ_i of the obstacle with respect to the frame {R}, are defined, respectively, by

d_i = R + min( δ_j | j = 4i − 4, 4i − 3, 4i − 2, 4i − 1 )   if i = 2, 3, 4
d_i = R + min( δ_j | j = 2^(i−1), 2^(i−1) + 1, 2^(i−1) + 2 )   if i = 1, 5   (2)

θ_i = (π/4)(i − 3).   (3)
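As an illustration of the perception stage, the following Python fragment computes the sensor orientations of Eq. (1) and the suite distances and directions of Eqs. (2) and (3) as reconstructed above; the indexing of the three-sensor suites ss_1 and ss_5 follows our reading of the garbled original and is therefore an assumption to be checked against Fig. 3.

import math

R = 0.2                       # robot radius in meters (value used in Section 4)
PSI = math.radians(11.25)     # angle between adjacent sensors, Eq. (1)

def sensor_orientation(j):
    # Orientation of the jth sensor (j = 1..18) w.r.t. frame {R}, Eq. (1).
    return (j - 9.5) * PSI

def suite_distances(delta):
    # Group 18 raw readings delta[0..17] (delta[j-1] = reading of sensor j)
    # into the five suite distances d_1..d_5 of Eq. (2).
    d = []
    for i in range(1, 6):
        if i in (2, 3, 4):
            js = [4 * i - 4, 4 * i - 3, 4 * i - 2, 4 * i - 1]
        else:                       # suites 1 and 5 use three sensors
            base = 2 ** (i - 1)     # 1 for i = 1, 16 for i = 5
            js = [base, base + 1, base + 2]
        d.append(R + min(delta[j - 1] for j in js))
    return d

def suite_direction(i):
    # Direction theta_i of the ith suite w.r.t. frame {R}, Eq. (3)
    # as reconstructed here.
    return (math.pi / 4.0) * (i - 3)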
The ultrasonic sensors are controlled by an 8031 microcontroller communicating with a PC/AT computer. Their control boards consist of a transmitter module to fire the pulses, a receiver/gain-control module to receive the attenuated echo pulses, and a switching module to arbitrarily select the activation of each sensor according to the sensing mode. In the simulation study, it is assumed that the sensors can ideally detect the obstacles irrespective of inclination angle.

2.2. Primitive behaviors
The primitive behaviors may be classified as follows: goal-seeking behavior, wall-following
behavior, keep-away behavior, free-space explorer, and emergency stop. A primitive behavior can be characterized by a temporal sequence of appropriate values of its heading velocity v and incremental steering angle Δθ which cause the robot to exhibit the prescribed response to sensory information. Thus, the output vector of a primitive behavior u(t) is defined as

u(t) = (v(t), Δθ(t))^T   (4)

where T denotes the transpose and t denotes the time step of the robot controller. From now on, the index t is omitted for notational simplicity. The wall-following behavior follows the boundary of stationary obstacles, maintaining a prespecified distance from the center of the mobile robot. This behavior considers the minimum radius of curvature of the mobile robot. The free-space explorer provides the mobile robot with movement in the direction with the greatest distance between the robot and obstacle boundaries. The keep-away behavior basically makes the mobile robot avoid obstacles in its proximity, and produces an output vector corresponding to movement in a direction that ensures no collision with obstacles. This behavior takes information from measurements provided by the front sensors. Since the above behaviors produce mobile-robot actions based on the currently perceived sensory information, we unify them as an avoidance behavior producing the same action as the individual behaviors do. The goal-seeking behavior produces an output which directs the robot towards a specific goal defined by a point (x, y) in the world coordinate frame {W}. This behavior receives the goal command from the higher-level planner and produces the mobile-robot action that causes the robot to move smoothly towards the goal position, taking into account the minimum radius of curvature of the mobile robot. Therefore, it corresponds to purposive behavior. In this study, we design the navigator using the avoidance behavior and the goal-seeking behavior. The fuzzy decision-maker decides the next behavior based on the fuzzy goals. Let {u_1, u_2, ..., u_n} be the set of outputs of the behaviors and {q_1, q_2, ..., q_n} be the output of the fuzzy decision-maker. If the ith behavior is chosen by the fuzzy decision-maker, then q_i(t) is equal to 1 and the others are zero. Thus, the mobile-robot motion command transferred to the lower-level controller at time step t is

u(t) = Σ_i q_i(t) u_i(t) = q_1(t)u_1(t) + q_2(t)u_2(t) + ... + q_i(t)u_i(t) + ... + q_n(t)u_n(t) = u_i(t).   (5)
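A minimal sketch of this arbitration rule in Python: with a one-hot decision vector q, Eq. (5) reduces to passing through the selected behavior's output unchanged. The function below is illustrative and not taken from the paper.

def fused_command(q, u):
    # q: one-hot list of decision weights; u: list of (v, dtheta) tuples.
    # With one-hot q, the weighted sum of Eq. (5) equals the chosen output.
    return tuple(sum(qi * comp for qi, comp in zip(q, comps))
                 for comps in zip(*u))

# Example: the decision-maker picked behavior 2 (goal seeking) of two.
u_avoid, u_goal = (0.10, 0.30), (0.25, -0.05)
print(fused_command([0, 1], [u_avoid, u_goal]))   # -> (0.25, -0.05)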
2.3. Design of two behaviors
Our goal is to design a navigator operating in an uncertain environment. In the previous study [4], we designed two behaviors using fuzzy logic and reinforcement learning. In this study we deal with how to choose one of the two behaviors using fuzzy decision-making. Design of the two behaviors proceeds in the following sequence: (A) fuzzification of the input/output variables, (B) rule-base construction through reinforcement learning, (C) reasoning process, and (D) defuzzification of output variables. The fuzzy decoder partitions the input space formed by the information from internal and external sensors into subspaces. The partitioned input spaces are represented by the rules. Each behavior is composed of a fuzzy
decoder, decision making, rule base, reinforcement-learning mechanism, and defuzzifier. The goal of navigation in an uncertain environment is to make the mobile robot arrive at a destination point without colliding with obstacles. In order to accomplish the navigation effectively, we must control the mobile-robot motion by considering the obstacle positions X_oi = (x_oi, y_oi), the robot's position X = (x, y), and its heading angle θ with respect to the world coordinate frame {W} shown in Fig. 4.
Figure 4. The coordinate frames and control variables

The motion of the mobile robot is characterized by its heading velocity v and incremental steering angle Δθ. Thus, we choose as the input variables for the avoidance behavior the distances d_i = ||X_oi − X|| (i = 1, 2, ..., 5), and as those for the goal-seeking behavior the heading-angle difference ψ and the distance to the goal z = ||X_g − X||. The output u of each of the two behaviors is expressed by the velocity v and the incremental steering angle Δθ. Here, the subscript i denotes the number of the sensor suite detecting the obstacles in front of the robot and ||·|| denotes the Euclidean norm. The variable d_i is calculated by Eq. (2). The angle ψ is that between the heading direction of the mobile robot and the direction of the goal position; z is the distance from the current position X = (x, y) to the goal position X_g = (x_g, y_g) with respect to the frame {W}. All the variables are defined in Fig. 4. Using these variables, a rule base is constructed for each behavior to infer the two actions. The avoidance behavior and the goal-seeking behavior have 324 rules and 56 rules, respectively. The output fuzzy sets are expressed by linguistic values with triangular membership functions. Initially, all the center values of the membership functions of the output fuzzy sets, representing the linear velocity and the incremental steering angle, are set to constants. When the control-action signals are applied to the mobile robot at time step t, the robot moves by the amount of the movement vector m(t). As a result of the series of movement vectors {m(0), ..., m(t − 3), m(t − 2), m(t − 1), m(t)}, shown in Fig. 5, the mobile robot may collide with an obstacle or move away from the goal position. When the robot navigates
through a section of the environment similar to a previously experienced one, the rules that contributed to generating the control action must be corrected in order to avoid collision. The rules influencing the mobile robot's failures are the rules used at the previous time steps t, t − 1, t − 2, .... Thus, such rules must be changed into correct rules. This task is accomplished by two neuronlike adaptive elements [8] consisting of an associative search element and an adaptive critic element. By use of these elements, the center values of the membership functions of the output fuzzy sets approach the correct values as the learning progresses.
Figure 5. The movement vectors associated with colliding with an obstacle or moving away from the goal position
3. BEHAVIOR SELECTION

Whenever the mobile robot navigates in an uncertain environment towards the goal position, the two behaviors, avoidance and goal-seeking, always conflict. Therefore, for a complete navigator, it is necessary for the fuzzy decision-maker to efficiently select one of the behaviors.
3.1. Multi-criterion decision-making using fuzzy sets
In 1970, Bellman and Zadeh considered the classical model of a decision and suggested a model for decision making in a fuzzy environment [9]. They considered a situation of decision making under certainty, where the objective function and the constraints are characterized by their membership functions. Since we want to satisfy the objective function and the constraints simultaneously, the decision in a fuzzy environment is defined, by analogy to that in a crisp environment, as selecting the alternative which satisfies both the objective function and the constraints. Thus, the decision in the fuzzy environment can be considered as the intersection of the fuzzy objective function and the fuzzy constraints. In this case there is no difference between the objective function and the constraints. Usually it is impossible for a single criterion to determine an optimal solution to a decision problem and to judge the suitability of the solution. Thus, multi-criterion decision-making has been proposed, and it has led to numerous evaluation schemes. Multi-criterion decision-making using fuzzy sets can be divided into two major areas: multi-objective decision-making
and multi-attribute decision-making [10]. The former concentrates on continuous decision spaces, the latter on problems with discrete decision spaces. The problem of selecting an appropriate behavior corresponds to multi-attribute decision-making. Let u = (u_1, u_2, ..., u_n) be a set of alternatives resulting from the primitive behaviors and G̃ = (G̃_1, G̃_2, ..., G̃_m) a set of fuzzy goals by which the suitability of a behavior is judged. The jth fuzzy goal G̃_j is characterized by its membership function μ_Gj(u). In the following, the tilde sign (~) representing the fuzzy sets will be dropped for notational simplicity. However, there are some cases where some goals and constraints are of greater importance than others. In such cases, the fuzzy decision function D might be expressed through goals with weighting coefficients reflecting the relative importance of the constituent terms. The importance of goal G_j is expressed by the exponent α_j. Thus, the fuzzy decision function is defined as the intersection of all the fuzzy goals

D = G_1^(α_1) ∩ G_2^(α_2) ∩ ... ∩ G_m^(α_m).   (6)

The rationale behind using weights as exponents to express the importance of a goal can be found in [10]. The greater the importance of a goal, the larger is the exponent of its fuzzy set, if the fuzzy sets are normalized. The condition α_j > 1 means more importance, while α_j < 1 means less importance. The problem is to determine the alternative u_i with the highest degree of suitability with respect to all relevant goals G_j, j = 1, 2, ..., m. The fuzzy set decision D in discrete space, as the intersection of the G_j(u_i), becomes

D = { min_j [μ_Gj(u_i)]^(α_j) | i = 1, 2, ..., n; j = 1, 2, ..., m }.   (7)
The relative importance α_j of the goals is established in the form of a matrix A by pairwise comparison as follows:

A =
[ a_1/a_1   a_1/a_2   ...   a_1/a_n ]
[ a_2/a_1   a_2/a_2   ...   a_2/a_n ]
[   ...       ...     ...     ...   ]
[ a_n/a_1   a_n/a_2   ...   a_n/a_n ]   (8)

By applying Saaty's eigenvector method [11] to the matrix A, consistent weights w_j for the membership function of the jth goal are determined. As an alternative to the above, a fuzzy set decision D may be calculated by

D = { Σ_{j=1}^{m} w_j μ_Gj(u_i) | i = 1, 2, ..., n },   (9)

where Σ_{j=1}^{m} w_j = 1.0. This method is useful when the number of fuzzy goals is small. The optimal alternative is defined as the one achieving the highest degree of membership in D.
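The following Python sketch illustrates how the weights w_j can be derived from a pairwise-comparison matrix with Saaty's eigenvector method; the final normalization is our addition, and the all-ones matrix reproduces the equal-importance case used later in Section 4.

import numpy as np

def saaty_weights(A):
    # Take the eigenvector belonging to the largest eigenvalue, keep the
    # sign convention where all elements share a sign, and normalize.
    eigvals, eigvecs = np.linalg.eig(A)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    if principal[0] < 0:
        principal = -principal
    return principal / principal.sum()

# All goals equally important: A is a 3x3 matrix of ones, the eigenvalues
# are {3, 0, 0}, and the principal eigenvector is proportional to
# {1, 1, 1}, as in the text.
A = np.ones((3, 3))
print(saaty_weights(A))   # -> [0.333.. 0.333.. 0.333..]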
3.2. Fuzzy goals in the navigation problem
In order for the mobile robot to arrive at its goal position without colliding with obstacles, an appropriate behavior must be selected according to the situation around the robot. When the obstacles are located around the mobile robot as shown in Fig. 4, the robot must reduce its speed in order not to collide with them. If the maximum acceleration a_max is applied to the mobile robot with an arbitrary initial velocity, the minimum time required to avoid the obstacle is calculated by

t_min = v cos φ_j / a_max.   (10)
If a constant acceleration less than the maximum value is applied to the mobile robot, the time required to avoid the obstacle is calculated by

t_o = 2(e_j − R) / (v cos φ_j).   (11)
Therefore, when the distance from the center of the mobile robot to the obstacle and the velocity component towards the obstacle are given, the marginal avoidance time t_mar shown in Fig. 6 can be defined as the difference between the above two required times.
Figure 6. The marginal avoidance time
Figure 7. The minimum time to transfer the mobile robot to the goal position
Since the marginal avoidance time denotes the degree of collision, it is used in this study to represent the repulsive potential. The repulsive potential E_rep is produced by the inversion of the marginal avoidance time, which is a function of the mobile robot's velocity and the distances to the obstacles measured by the ultrasonic sensor array. Thus, the repulsive potential is defined by
E_rep = K_r Σ_{j=1}^{S_N} [ a_max v cos φ_j / ( 2 a_max (e_j − R) − v² cos² φ_j ) ]   (12)
where K_r is a scale factor set to 0.0003 and S_N is the number of ultrasonic sensors. The repulsive potential increases as the mobile robot moves closer to the obstacles, and approaches zero as its velocity in the direction of an obstacle approaches zero. When the mobile-robot configuration is as given in Fig. 4, the attractive potential is defined in order to transfer the robot with initial velocity v to the goal position. This potential is calculated using the minimum time t_g as follows:
E_att = t_g = ( √( v² + 4 a_max ||X − X_g|| ) − v ) / ( 2 a_max )   (13)
The attractive potential depends on the distance to the goal position, the heading-angle difference, and the initial velocity. As can be seen from Eq. (13) and Fig. 7, the attractive potential increases as the mobile robot moves away from the goal position. The situation around the robot can be represented by the use of the two potentials, E_rep and E_att. When the two behaviors are assumed to apply at the next step, the control actions derived from each behavior are calculated. Then, the changes of potential, c_rep(u_i) = ΔE_rep(u_i) and c_att(u_i) = ΔE_att(u_i) (i = 1 (avoidance), 2 (goal-seeking)), resulting from the actions of the two behaviors are calculated. In order that the mobile robot can move through the environment to the goal position without colliding with obstacles, the changes of the weighted sum of the two potentials must be negative. However, it is very difficult to find the weighting coefficients in a cluttered environment. Furthermore, uncertainties in the sensor measurements can be introduced into the potentials. Hence, in this study, the behavior to be used at the next action step is selected by the fuzzy decision-maker. To perform this, three fuzzy goals are defined by

G_1: the change in repulsive potential c_rep(u_i) should be smaller than c_1, and
G_2: the change in attractive potential c_att(u_i) should be smaller than c_2, and
G_3: the repulsive potential E_rep(u_i) should be smaller than E_1.
Here, the connective "and" denotes the intersection of the fuzzy goals. The membership functions of the fuzzy goals are defined by

μ_G1(c_rep(u_i)) = 1.0   if c_rep(u_i) ≤ c_1
μ_G1(c_rep(u_i)) = 1.0 / ( 1.0 + M_1 (c_rep(u_i) − c_1)^(N_1) )   otherwise   (14)

μ_G2(c_att(u_i)) = 1.0   if c_att(u_i) ≤ c_2
μ_G2(c_att(u_i)) = 1.0 / ( 1.0 + M_2 (c_att(u_i) − c_2)^(N_2) )   otherwise   (15)

μ_G3(E_rep(u_i)) = 1.0   if E_rep(u_i) ≤ E_1
μ_G3(E_rep(u_i)) = 1.0 / ( 1.0 + M_3 (E_rep(u_i) − E_1)^(N_3) )   otherwise   (16)

Here, the N_j, j = 1, 2, 3, are even integers and the M_j, j = 1, 2, 3, are positive constants. The c_1, c_2 and E_1 are determined through a series of simulations. As can be seen from the above equations, all the fuzzy sets are normalized.

4. SIMULATION AND RESULTS

4.1. Simulation conditions
As an illustration of the foregoing process, a series of simulations was carried out using an arbitrarily constructed environment with sixteen obstacles. It was assumed that the radius of the mobile robot is 0.2 m and the extent of the environment is 10 m × 10 m. The maximum speed of the robot was assumed to be 35 cm/s. The measurement range of the sensors was assumed to be eight times the radius of the mobile robot. In order to acquire the information about the environment, a sensor simulator giving the range data of the ultrasonic sensors was used. For the sensor simulator, it was assumed that the ultrasonic sensors can ideally detect the nearest obstacles within their beams. The correct rule bases for each behavior were those constructed in the previous study [4], and they were employed here in the simulations to select a behavior using the fuzzy decision-maker. In the simulations, the parameters N_j, j = 1, 2, 3, determining the shape of the membership
functions of the fuzzy goals were all set to 2. The constant values c_1, c_2 and E_1 were determined through a series of simulations and were set to −0.01, −0.02 and 1.0, respectively.

4.2. Simulation of the selection of behaviors
By use of Eqs. (2) and (3), the eighteen distance values measured by the sensors are grouped into the five distance values belonging to the sensor suites. The behaviors produce the prescribed responses to these five distance values. Let us assume that the avoidance and goal-seeking behaviors are both used at the next action step. Then, we can obtain two movement vectors corresponding to the outputs of the two behaviors. Using these two vectors and the mobile robot's current position and orientation, the potentials are calculated by Eqs. (12) and (13) and their changes are calculated. Using these values, the membership degrees representing the degree of fuzzy-goal satisfaction are determined by Eqs. (14) through (16). Then, the minimum value of the membership degrees is chosen. This value denotes the degree of behavior satisfaction with respect to the fuzzy goals. Thus, the fuzzy decision set D represented by Eq. (7) is obtained. The optimal alternative, the behavior to be used at the next step, is determined as the behavior with the highest degree of membership in the fuzzy decision set D. To test the effectiveness of the method using the fuzzy decision-maker, various start positions and initial heading angles were employed. In all simulations, the goal position is fixed and assumed to be located at the upper corner of the right-hand side of the environment as shown in Fig. 8. The weights w_j (j = 1, 2, ..., m) reflecting the relative importance of the fuzzy goals must be determined. It was assumed that all the goals are of equal importance. Since a_1, a_2 and a_3 in Eq. (8) can be equally set to 1.0, we can obtain the eigenvalue set {0.0, 0.0, 3.0}. After calculating the three eigenvectors we can determine the one eigenvector with elements of the same sign, w = {w_1, w_2, w_3}^T = {1.0, 1.0, 1.0}^T, among the three.
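Tying Eqs. (10)-(16) together, the following Python sketch computes the two potentials and selects a behavior by the minimum-membership rule of Eq. (7) with equal goal weights. The value of a_max is hypothetical (the paper does not state it), the membership functions use the reconstructed form given above, and the guard against a non-positive denominator in Eq. (12) is an added safeguard.

import math

A_MAX = 0.5            # max acceleration in m/s^2 (hypothetical value)
K_R = 0.0003           # scale factor from Eq. (12)
M = (0.2, 0.2, 0.2)    # modifiers M_1..M_3 (the Fig. 8(a) setting)
N = (2, 2, 2)          # even exponents N_1..N_3
C1, C2, E1 = -0.01, -0.02, 1.0

def repulsive_potential(v, readings, R=0.2):
    # Eq. (12): inverse marginal avoidance time summed over the sensors.
    # `readings` is a list of (e_j, phi_j) pairs.
    total = 0.0
    for e_j, phi_j in readings:
        denom = 2.0 * A_MAX * (e_j - R) - (v * math.cos(phi_j)) ** 2
        if denom > 0.0:            # added guard against the singular case
            total += A_MAX * v * math.cos(phi_j) / denom
    return K_R * total

def attractive_potential(v, dist_to_goal):
    # Eq. (13): minimum time to reach the goal from initial velocity v.
    return (math.sqrt(v * v + 4.0 * A_MAX * dist_to_goal) - v) / (2.0 * A_MAX)

def membership(value, limit, m, n):
    # Eqs. (14)-(16): 1 when the goal is met, smoothly decaying otherwise.
    if value <= limit:
        return 1.0
    return 1.0 / (1.0 + m * (value - limit) ** n)

def select_behavior(candidates):
    # Eq. (7) with equal weights: each candidate is (c_rep, c_att, e_rep)
    # predicted for one behavior; pick the behavior whose worst goal
    # satisfaction is best.
    degrees = [min(membership(c_rep, C1, M[0], N[0]),
                   membership(c_att, C2, M[1], N[1]),
                   membership(e_rep, E1, M[2], N[2]))
               for c_rep, c_att, e_rep in candidates]
    return max(range(len(candidates)), key=degrees.__getitem__)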
Figure 8. The results of execution of navigation

Fig. 8(a) shows the result of execution of navigation when M_1, M_2 and M_3 are all set to 0.2. The mobile robot can arrive at the goal position by passing through the environment without colliding with any obstacles. Fig. 8(b) shows the result of execution of navigation under the same conditions as above, except that M_2 = 0.05. As can be seen from Eqs. (14) through (16), the parameter M_j plays the role of a modifier. A modifier is an operation which
changes the meaning of a fuzzy set. The case with small M_2 corresponds to the case when w_2 < 1.0. Therefore, this corresponds to the case when the fuzzy goal G_2 associated with the change in attractive potential is less important than the others. As can be seen from this figure, the smaller the value of M_2, the larger the number of time steps needed for arrival at the goal position. Fig. 8(c) shows the result of execution of navigation when a different start position and initial heading angle are given. Fig. 8(d) shows the result of execution in a case where a local minimum occurs. As can be seen from these figures, even if the mobile robot is faced with a situation where a local minimum occurs, the navigator with the fuzzy decision-maker shows the ability to escape from the local minimum. We believe that the proposed navigator with the fuzzy decision-maker would enable the mobile robot to navigate through the environment even more smoothly if more than three fuzzy goals were used.

5. CONCLUSIONS

A behavior selection method based on fuzzy decision-making has been presented for mobile-robot navigation in complex environments. The mobile robot's situation at any instant is classified by calculating the repulsive and attractive potentials using both the currently perceived sensory information and the goal position provided by the higher-level planner. The decision function to select the behavior suitable in the vicinity of the robot is formulated as a fuzzy set using the fuzzy goals. Then, the optimal selection of behaviors is determined by achieving the highest degree of membership in a fuzzy decision set. The fuzzy decision-making provides a flexibility which cannot be found in conventional decision-making. Furthermore, it is easy to formulate the relevant problems since there is no difference between goals and constraints in the fuzzy environment. In this study, even though only three fuzzy goals are used, the mobile robot can move towards the goal position through the environment without colliding with obstacles. It has been shown that the proposed method enables the robot to navigate through complex environments where a local minimum occurs.

REFERENCES
1. R.A. Brooks, "A robust layered control system for a mobile robot", IEEE J. of Robotics and Automation, vol. RA-2, no. 1, pp. 14-23, 1986.
2. J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots", IEEE Trans. Syst. Man Cybern., vol. 19, no. 5, pp. 1179-1187, Sept./Oct. 1989.
3. H.R. Beom and H.S. Cho, "A sensor-based obstacle avoidance controller for a mobile robot using fuzzy logic and neural network", in Proc. Int. Conf. Intelligent Robots and Systems, pp. 1470-1475, 1992.
4. H.R. Beom and H.S. Cho, "A sensor-based navigation for a mobile robot using fuzzy logic and reinforcement learning", IEEE Trans. Syst. Man Cybern., vol. 25, no. 3, Mar. 1995.
5. R.C. Arkin, "Motor schema-based mobile robot navigation", International Journal of Robotics Research, vol. 8, no. 4, pp. 92-112, Aug. 1989.
6. A. Saffiotti, E.H. Ruspini and K. Konolige, "Blending reactivity and goal-directedness in a fuzzy controller", in Proc. IEEE Int. Conf. Fuzzy Systems, pp. 134-139, 1993.
7. M.A. Salichs, E.A. Puente, D. Gachet and J.R. Pimentel, "Learning behavioral control by reinforcement for an autonomous mobile robot", in Proc. IECON'93, pp. 1436-1441, 1993.
8. A.G. Barto, R.S. Sutton and C.W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems", IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 5, pp. 834-846, Sept./Oct. 1983.
9. R.E. Bellman and L.A. Zadeh, "Decision making in a fuzzy environment", Management Science, vol. 17, no. 4, pp. 141-164, Dec. 1970.
10. H.J. Zimmermann, Fuzzy set theory and its applications. Kluwer-Nijhoff Publishing, 1985.
11. T.L. Saaty, "Exploring the interface between hierarchies, multiple objectives and fuzzy sets", Fuzzy Sets and Systems, 1, pp. 57-68, 1978.
Learning Emergent Tasks for an Autonomous Mobile Robot

D. Gachet, M.A. Salichs and L. Moreno

Dpto. Ingeniería, Universidad Carlos III de Madrid, Butarque 15, 28911 Leganés, Madrid, Spain
e-mail: gachet@ing.uc3m.es

We present an implementation of a reinforcement learning algorithm through the use of a special neural network topology, the AHC (Adaptive Heuristic Critic). The AHC is used as a fusion supervisor of primitive behaviors in order to execute more complex robot behaviors, for example go to goal, surveillance or follow a path. The fusion supervisor is part of an architecture for the execution of mobile-robot tasks which are composed of several primitive behaviors acting in a simultaneous or concurrent fashion. The architecture allows learning to take place at the execution level; it incorporates the experience gained in executing primitive behaviors as well as the overall task. The implementation of this autonomous learning approach has been tested within OPMOR, a simulation environment for mobile robots, and with our mobile platform Robuter. Both simulated and actual results are presented. The performance of the AHC neural network is adequate. Portions of this work have been implemented within the EEC ESPRIT 2483 PANORAMA Project.

1. Introduction

The process of guiding autonomous mobile robots in unstructured, dynamic, and complex environments is difficult. A multitude of problems arise, such as noisy measurements, lack of appropriate sensors, difficulty in achieving real-time sensor processing, and constructing an appropriate model of the environment based on sensory information. For example, when working with ultrasonic range sensors there are inaccuracies in the measured values due to noise or to the physics of the ultrasound waves. Thus obtaining an accurate state of the environment is difficult. To deal with the aforementioned problems, researchers are advocating the development and use of intelligent systems not only in mobile robots but also in diverse areas such as manufacturing, control, vision and communications. Intelligent systems are difficult to define and indeed there are several definitions in the literature. There are also several theories of intelligent systems [1-4] and intelligent machines [7-10]. Regardless of a precise definition of an intelligent system, most definitions include learning as one of its main attributes. In addition to learning, the intelligent systems that we advocate follow a behavioral
methodology where all robot tasks are performed in terms of a set of elementary tasks referred to as primitive behaviors. The fundamental idea of behavioral control is to view a robot mission or task, also referred to as an emergent behavior, as the temporal execution of a set of primitive behaviors. One of the most challenging problems in behavioral control involves learning the appropriate mixture of primitive behaviors in order to execute complex tasks. For this purpose, several approaches have been investigated in our research group [5-7]. Our experience indicates that perhaps the most important learning requirement for autonomous systems is that of autonomous learning, in other words, the ability to learn from self-exploration or observation without the need for examples or a teacher. Although there is a great deal of research on learning in a general context [1-3], research on autonomous learning is just beginning. The neural network paradigm of reinforcement learning appears promising for autonomous learning. Accordingly, the goals of this paper are to discuss reinforcement learning with a neural network AHC topology in the context of a behavioral control methodology for mobile robots, and to discuss needs for future research.

2. The AFREB Control Architecture

Several researchers have already argued the importance of looking at a mobile robot as a set of primitive behaviors. Primitive behaviors are also important components of reactive control, which is a recently emerged paradigm for guiding robots in unstructured and dynamic environments [10,11]. Mobile robots must continuously interact with their environment, and this is the essential characteristic of reactive programs. By reactive, we mean that all decisions are based on the currently perceived sensory information. An emergent behavior can be defined as a simple or complex task, or a task which is made up of more elementary (primitive) behaviors. Whereas primitive behaviors correspond to robot actions, emergent behaviors correspond to missions, goals, or subgoals. We have developed a behavioral control architecture called AFREB (Adaptive Fusion of Reactive Behaviors) for experimenting with a wide range of reactive control methodologies [5],[12]. The architecture is depicted in Figure 1 and basically consists of the following modules: behavioral analysis, fusion supervisor, behavior primitives, and executor. The functions of the behavior primitives and the fusion supervisor will be explained later; the function of the executor module is the calculation of the actual robot commands while enforcing control limitations due to non-holonomic constraints (e.g., limitation of the driving angle of the robot, maximum speed and so on) and exhibiting robot reflexive behavior (e.g., emergency stop). We have used the behavioral architecture in several simulation and actual experiments with our mobile robot Robuter. All simulation work has been performed with OPMOR [13].
2.1. Behavior Primitives
The set of primitive behaviors active at any given time depends upon the occurrence of specific events in the environment.
Figure 1. The AFREB Control Architecture
Examples of such events are the detection of an obstacle, the time for the robot to recharge itself, or the issuance of an interactive command. The primitive behaviors (i.e., actions) which have been implemented are:

c1: goal attraction
c2: perimeter following (contour left)
c3: perimeter following (contour right)
c4: free space
c5: keep away (from objects)
c6: follow a path
A primitive behavior can be characterized by a temporal sequence of appropriate values for linear velocity v(i) and curvature k(i) which cause the robot to exhibit the pre-specified response to sensory information. Thus we define the output of a primitive behavior c(i) as the vector:
c(i) = (v(i), k(i))^T   (1)
where the variable i denotes the ith cycle of the robot controller. In what follows we will drop the index i for notational simplicity. The goal attraction primitive produces an output which directs the robot towards a specific goal which is defined by a point (x, y) in a global reference frame. The perimeter following behavior follows the perimeter of a fixed (i.e., static) obstacle, maintaining a pre-specified distance away from it. This behavior takes into account the minimum radius of curvature R_min = v/ω_max of the Robuter in order to avoid uneven corners. The movement can be performed left or right. The free space behavior will cause movements in a direction such that the robot frees itself from objects. This behavior takes the information from measurements provided by the frontal sensors and chooses the direction of the longest adjacent measurements in order to ensure a safe course in that direction. The output of this behavior is the direction of motion at a constant speed. The keep away behavior basically avoids obstacles in the proximity of the robot, choosing directions that ensure no collision with nearby obstacles. The follow path behavior drives the robot as close as possible to a pre-defined path. The path is composed of several Cartesian coordinates (x_i, y_i) in the robot's environment.
2.2. Dynamic Fusion of Primitive Behaviors
The fundamental assumption of the fusion supervisor module is that more complex behaviors, also called emergent behaviors, can be obtained in terms of primitive behaviors in the following way. If we denote by c_1, c_2, ..., c_N the outputs of the primitive behaviors, then the output of an emergent behavior (i.e., mission, task, or subtask) is

c = Σ_{i=1}^{N} a_i c_i   (2)
where the a_i coefficients, with 0 ≤ a_i ≤ 1, are found by an appropriate combination of measurement information provided by the perception system. Thus, the main function of an intelligent controller is the learning of the weights a_i so that the performance of the robot in the execution of its tasks is adequate. There are two possibilities for learning the coefficients a_i: on line or off line. For autonomous systems the on-line learning method is the most interesting. Whether or not learning is present, it is to be noted that the coefficients a_i are dynamically adjusted by the controller when the robot is in motion. For the system described above, the learning problem dealt with in this paper can be formulated as follows: how can the system learn the appropriate values for the a_i coefficients (i.e., the mixture of primitive behaviors) in order to execute a given task in an on-line, real-time fashion without the intervention of a trainer? In the context of the AFREB architecture, the learning process is performed within the fusion supervisor module. To demonstrate the power of the autonomous learning approach, we have implemented the following emergent behaviors (i.e., tasks or missions):

M1: surveillance of the environment (exploration).
M2: motion between two points (i.e., towards a goal) while avoiding unexpected obstacles.
M3: follow a predefined path while avoiding unexpected obstacles.

In general, a mission is executed while taking into account the dynamic nature of the environment as sensed by the perception system of the robot. It is important to note that in the case of navigation or following a path there is a principal primitive behavior (go to goal or follow a path) and the others can be seen as auxiliary behaviors that ensure a safe motion in a dynamic and unstructured environment.

3. The Adaptive Heuristic Critic as Fusion Supervisor

In this section, we summarize a fusion supervisor module which learns the values of the relative gains a_i by the use of an external reinforcement signal [14] which rewards or punishes the actions of the mobile robot. The objective is the successful completion of missions M1 to M3 (i.e., surveillance of the environment, navigation and following a path). A learning system that is built based on this technique is not provided with any sets of instances or examples of a concept. In addition, the learning system may have to deal with a number of observations that represent several concepts rather than just concentrating on a single concept at a time. In the learning approach based on exploration or observation, the system generates both a set of control rules and appropriate examples autonomously. The neural network used is called the Adaptive Heuristic Critic (AHC). The architecture of the AHC topology is depicted in Figure 2, where r(t) represents the external reinforcement signal to be used in the learning algorithm. Our implementation of the AHC neural network consists of a classifier system which has five or six inputs, depending on the mission, that correspond to a zone arrangement of frontal and lateral ultrasonic sensors; the output of this module is a situation that
consists of a vector of 32 binary elements, only one of which has the value 1 at any time (64 elements in the case of the navigation and follow-path missions).

Figure 2. The AHC Neural Network

The AHC network has an output layer with 4 processing elements for mission M1 or 5 processing elements for missions M2 and M3. This layer is called the Associative Search Element (ASE), and it computes the values of a_i (the relative gains for the primitive behaviors) in the following way:
a_i = f( Σ_{k=0}^{m} I_k · W_ki + σ(t) )   (3)
where W_ki is the matrix of adaptive weights between the input and output layers, σ(t) is a random value in the interval [−1, 1] with a uniform distribution, and m is the number of inputs. The function f is defined by:
f(x) = { 1    if x > 0
       { −1   otherwise
The weight matrix W is modified according to the following law:

W_ki(t + 1) = W_ki(t) + α · b_i(t) · e_ki(t)   (4)
where α is the learning rate (in our case α = 3) and e_ki(t) is the eligibility of the weight W_ki for reward or punishment. This is an important concept because through it we can reward or punish not only the latest action but also its predecessors. The eligibility is modified by:
e_ki(t + 1) = δ · e_ki(t) + (1 − δ) · I_k · a_i   (5)
where δ is a value in the range [0,1], in our case 0.3. The internal reinforcement signal b_i(t) is given by the other layer, the ACE (Adaptive Critic Element); this layer computes the prediction P_i(t) of b_i(t) at time t:

P_i(t) = Σ_{k=0}^{m} V_ki · I_k   (6)

where V_ki are the weight connections between the classifier and the ACE layer. The internal reinforcement signal is computed by:
(7)
Where 7 is a positive constant in the range [0,1], r(t) is the external reinforcement signal and the weights Vki are updated b y :
(8) Where/3 is a positive constant and Xk is the eligibility for this layer and :
Xk(t +
1) = s
(1 - s
Ik(t)
(9)
With ~ between 0 and 1, in our case 0.4. The four or five output nodes correspond to the relative contributions of the last four primitive behaviors (i.e., a2 through as) for the surveillance mission, al through as for the navigation and a2 to a6 for the follow path mission. 4. S i m u l a t i o n E x p e r i m e n t s 4.1. S u r v e i l l a n c e R o b o t For the surveillance mission, the external reinforcement r(t) provides a graded performance value describing how well the system performed (-1 = robot collision and 0 -- no collision). Roughly the system works as follows: At the beginning, the weights of the AHC network were initialized with random values and the robot executes random movements with r(t) evaluated continuously and used for changing the internal networks weights. When collisions occur the robot is returned to the position it was 30 steps before the collision took place and the process is resumed from there. The learning process involves the association of actions with the external reinforcement (i.e., whether collisions are present) for each situation. As a result of the learning process, the system learns how to move across the environment without collisions. Figure 3 depicts the initial collisions of the robot in an initial simulation experiment whereas Figures 4 and 5 depict the path of the robot in subsequent simulation experiments where the weights of the NN were stored at the end of each experiment and used in the following. It can be seen that the number of collisions decrease as the robot goes through more learning cycles. After a sufficient number of collisions, the robot navigates with virtually no collisions in the simulator including unknown environments as in the case of Figure 5.
176
Figure 3. Initial Path of the Surveillance robot
Figure 4. Intermediate path of the Surveillance robot
177
Figure 5. Final path of the Surveillance robot in an unknown environment
4.2. G o t o G o a l In the case of execution of mission M2 the external reinforcement signal r(t) take a negative value-1 in the following cases:
9 When the robot is too close to an obstacle (imminent collision) 9 When the frontal sensors do not detect obstacles in the goal direction and the current heading of the robot is too far from the correct direction. In all other cases the reinforcement signal is 0. In this case there is an additional input in the AHC network as compared to the surveillance mission, this input is a binary signal indicating whether obstacles are present in the goal direction. The execution loop is the same as described above for the surveillance robot. We can see in Figure 6, the first environment used for simulating the execution of the AHC network. In this case the weights of the network were initialized with random values, the networks start by giving random values to outputs al to a5 in the first cycles because the system does not have any knowledge of the environment. In this first cycles the robot accepts many punishments until the networks decide to utilize the contour followin 9 and free areas behaviors in order to avoid collisions, some cycles later the network uses the 9oal attraction. When the robot advances in the appropriate direction and no collisions occurs, it accepts rewards. Figure 7 depicts the path of the robot in a subsequent simulation environment where the weights of the NN were stored at the end of each experiment and used in the following.
178
Figure 6. Go to goal, initial path
Figure 7. Go to goal, final path
179 4.3. Follow P a t h In the case of execution of this mission, the path is defined a priori as a set of points (x~,y~), the external reinforcement signal r(t) take a negative value -1 in the following situations: 9 When the robot is too close to an obstacle. 9 When the frontal sensors do not detect obstacles and the current heading of the robot is too far from the direction of the path's next point. In all other cases the reinforcement signal is 0.
Figure 8. Follow a path, first environment
We can see in Figures 8 and 9, the first and the last environment used for simulating the execution of this mission. The control cycle is the same described above for the missions M1 and /1//2. In the follow path mission, the primitive behaviors used are c2 to c6 5. E x p e r i m e n t s w i t h t h e R o b u t e r The final step is to execute the control software on the actual robot, Robuter, which is a mobile platform with two differential wheels, and equipped with a belt of 24 ultrasonic sensors as shown in Figure 10. Perhaps the main advantage of our software development environment is that the control software developed under the simulator is virtually the same as that used to control the actual vehicle.
Figure 9. Follow a path, second environment
Figure 10. The Robuter vehicle
In most cases the only changes made to the control software before it controls the actual robot fall into two categories: lowering the speed of the robot, and adjusting some parameters (e.g., the distance kept from contours when following them). Other changes involved additional sensor processing (e.g., a filter) to deal with highly noisy sensors, sensor failures, multiple reflections, and missing reflections. The filter used in the experiments with the actual robot works as follows: if a range measurement is below 30 cm (the lower limit of the ultrasonic sensor) it is discarded and replaced with the mean of two valid measurements at each side of the sensor in question. This filter has proven practical for solving many problems with bad range measurements. The physical laboratory environment is a small room of 6.5 m x 4.5 m where we usually put boxes as obstacles. The size of the room poses a number of practical problems, as it does not leave enough space to maneuver; thus the room is an excellent test bed for maneuvering in tight spots. By contrast, we did not have similar maneuvering problems in the simulation environment. As noted, by reducing the velocity of the robot and adjusting some parameters of some primitive behaviors, we were able to run all our simulation control algorithms with the real vehicle in the small room.
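The filter fits in a few lines of code. This is a sketch under one reading of the description (the replacement is the mean of the nearest valid reading on each side of the failing sensor in the circular belt); only the 30 cm limit comes from the text.

MIN_RANGE = 0.30   # m, lower limit of the ultrasonic sensor

def filter_ranges(ranges):
    n = len(ranges)
    filtered = list(ranges)
    for i, r in enumerate(ranges):
        if r < MIN_RANGE:
            valid = []
            for step in range(1, n):               # search outwards on both sides
                for j in (i - step, i + step):
                    if ranges[j % n] >= MIN_RANGE:
                        valid.append(ranges[j % n])
                if len(valid) >= 2:
                    break
            if valid:
                filtered[i] = sum(valid[:2]) / len(valid[:2])
    return filtered

# Example: sensor 1 returns an invalid 0.1 m echo.
print(filter_ranges([1.2, 0.1, 0.8, 2.0]))   # -> [1.2, 1.0, 0.8, 2.0]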
Figure 11. Motion of the Robuter executing the Surveillance mission
Figures 11 and 12 depict the movement of the Robuter in the room, where we have placed four boxes at different locations and orientations. Figure 11 shows the execution of mission M1, i.e., surveillance, and Figure 12 depicts the movement of the robot executing mission M2; in this case the goal point is located in the upper right corner of the lab.
Figure 12. Path of the vehicle with the Go to goal mission
As noted above, there are only two differences between the control software running on the robot and that running in the simulator: the actual robot speed is smaller, and the number of back steps after a collision is reduced from 30 to 10.
6. Discussion
A major problem with the use of reactive behavioral control is the correct fusion of the primitive behaviors in order to execute complex tasks. Autonomous learning is an attractive paradigm for solving this problem, because it liberates us from explicitly programming the system or training it via examples. The implementation of the autonomous learning paradigm presented here demonstrates the feasibility of this approach through the execution of several missions by an autonomous mobile robot; the observed overall behavior of the robot confirms that self-learning through behavioral control is possible. Some drawbacks are present, for example the incomplete coverage of situations due to the limited number of inputs of the classifier system. The characteristics of the environment are very important in the learning process of the AHC, and the choice of the best set of situations is a non-trivial problem, because we must take care not to train the network with contradictory situations. In other words, the reinforcement signal must always be coherent, and the design of a correct external reinforcement signal is perhaps the most important step in this approach.
7. Conclusions and Future Work

The successful completion of three missions through the application of autonomous learning has been presented in this paper. The use of a special neural network topology, AHC, as a fusion supervisor of primitive behaviors within the AFREB control architecture has been demonstrated. We have built a system that learns by itself how to execute the missions surveillance, go to goal, and follow a path; the simulated and real experiments demonstrate that autonomous learning through reinforcement is a viable technology for motion control of nonholonomic robots operating in complex environments. In future publications we will present the implementation of other tasks with this methodology, for example the execution of complex missions composed of several sub-missions. We are also attempting to solve the difficulties which arise due to the poor resolution of the inputs through the use of a Kohonen neural network.
Acknowledgments

We wish to thank Prof. J.R. Pimentel for his efforts in the Panorama project. The first author wishes to thank the Dirección General de Investigación Científica y Técnica (DGICYT) of the Spanish Ministry of Education and Science for research funding. This research has been funded by the Commission of the European Communities (Project ESPRIT 2483 PANORAMA) and the Comisión Interministerial de Ciencia y Tecnología (CICYT) (Projects ROB90-159 and ROB91-64).

REFERENCES
1. W. Fritz, R.G. Martinez, J. Banque, A. Rama, R.E. Adobbati, and M. Sarno, The Autonomous Intelligent System, Robotics and Autonomous Systems, Vol. 5, pp. 109-125, 1989.
2. P.F.M.J. Verschure, B.J.A. Krose, and R. Pfeifer, Distributed Adaptive Control: The Self Organization of Adaptive Behavior, Robotics and Autonomous Systems, Vol. 9, pp. 181-196, 1992.
3. A. Famili, Integrating Learning and Decision-Making in Intelligent Manufacturing Systems, Intelligent and Robotics Systems, Vol. 3, pp. 117-130, 1990.
4. W. Fritz, World View and Learning Systems, Robotics and Autonomous Systems, Vol. 10, pp. 1-7, 1992.
5. D. Gachet, M.A. Salichs, J.R. Pimentel, L. Moreno, and A. de la Escalera, A Software Architecture for Behavioral Control Strategies of Autonomous Systems, Proc. IECON'92, pp. 1002-1007, San Diego, CA, Nov. 1992.
6. E.A. Puente, D. Gachet, J.R. Pimentel, L. Moreno, and M.A. Salichs, A Neural Network Supervisor for Behavioral Primitives of Autonomous Systems, Proc. IECON'92, San Diego, CA, Nov. 1992.
7. D. Gachet, J.R. Pimentel, L. Moreno, M.A. Salichs, and V. Fernandez, Neural Network Control Approaches for Behavioral Control of Autonomous Systems, 1st IFAC Int. Workshop on Intelligent Autonomous Vehicles, Southampton, UK, pp. 330-334, April 1993.
8. G. Pang, A Framework for Intelligent Control, Intelligent and Robotics Systems, Vol. 4, pp. 109-127, 1991.
9. W. Davis, A. Jones, and A. Saleh, Generic Architecture for Intelligent Control Systems, Computer-Integrated Manufacturing Systems, Vol. 5, No. 2, pp. 105-113, 1992.
10. R.A. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal on Robotics and Automation, RA-2, pp. 14-24, Apr. 1986.
11. R.C. Arkin, Motor Schema-Based Mobile Robot Navigation, The International Journal of Robotics Research, Vol. 8, No. 4, pp. 92-112, Aug. 1989.
12. M.A. Salichs, E.A. Puente, L. Moreno, and J.R. Pimentel, A Software Development Environment for Autonomous Mobile Robots, Chapter 8 of Recent Trends in Mobile Robots, Ed. Y.F. Zheng, World Scientific, 1993.
13. J.R. Pimentel, E.A. Puente, D. Gachet, and J.M. Pelaez, OPMOR: Optimization of Motion Control Algorithms for Mobile Robots, Proc. IECON'92, pp. 853-861, San Diego, CA, Nov. 1992.
14. J. del R. Millán and C. Torras, Learning to Avoid Obstacles Through Reinforcement: Noise-tolerance, Generalization, and Dynamic Capabilities, Proc. IROS'92, Raleigh, NC, pp. 1801-1807, Jul. 1992.
Intelligent Robots and Systems, V. Graefe (Editor). © 1995 Elsevier Science B.V.
Efficient reinforcement learning of navigation strategies in an autonomous robot

José del R. Millán (a) and Carme Torras (b)

(a) Institute for Systems Engineering and Informatics, European Commission, Joint Research Centre, TP 361, 21020 Ispra (VA), Italy
(b) Institut de Cibernètica (CSIC-UPC), Diagonal 647, 08028 Barcelona, Spain
This paper describes a reinforcement connectionist learning architecture that allows an autonomous mobile robot to adapt to an unknown indoor environment. As a result, the robot learns efficient reactive navigation strategies. The robot learns on top of built-in reflexes. Also, the neural controller is a modular network which is built automatically. In this way, the robot is operational from the very start and improves its performance rapidly and incrementally as it safely explores the environment. These appealing features make this learning robot's architecture very well suited to real-world applications. The paper also reports experimental results obtained with a real mobile robot that demonstrate the feasibility of this approach.
1. INTRODUCTION
Efficient navigation is critical for autonomous robots operating in hostile environments, which are usually unknown the first time robots face them. This paper deals with the problem of controlling an autonomous mobile robot so that it efficiently reaches a goal location in an unknown indoor environment. Instances of the problem where the goal is not inside the perception range of the robot all the time are also considered. A usual approach to this problem is that of reactive systems (e.g., [1]). However, basic reactive systems suffer from two shortcomings. First, they are difficult to program. Second and most important, pure reactive controllers may generate inefficient trajectories, since they select the next action as a function of the current sensor readings and the robot's perception is limited. To address this second shortcoming, some approaches combine planning and reaction (e.g., [2, 3]). In contrast to classical planning, which acts on a perfect (or sufficiently good) model of the environment, these approaches only require a coarse global map of the environment made out of landmarks. Planning then takes place at an abstract level, and all the low-level details are handled by the reactive component as the robot actually moves. In the case of robots operating in initially unknown environments, this
global map can be built from sensory data gathered either while travelling to the goal (e.g., [2]) or in a previous exploration phase (e.g., [3]). Global maps are a valuable aid for navigation which must be used when available. When not, we claim that the addition of learning capabilities to reactive systems is sufficient to allow a robot to generate efficient trajectories after a limited experience. This paper presents experimental results that support this claim: a real autonomous mobile robot equipped with low-resolution sensors learns efficient goal-oriented obstacle-avoidance reactive strategies in a few trials. Moreover, a learning approach like ours even overcomes the first shortcoming of reactive systems. As some researchers have recently shown, the robot programming cost is considerably reduced by letting the robot learn automatically the appropriate navigation strategies (e.g., [4, 5, 6, 7, 8]). This paper describes the testing of a reinforcement connectionist architecture on a real autonomous mobile robot. A reinforcement-learning robot learns by doing and does not require a teacher who proposes correct actions for all possible situations the robot may find itself in. Instead, the robot simply tries different actions for every situation it encounters and selects the most useful ones as measured by a reinforcement or performance feedback signal. Reinforcement connectionist learning [9, 10, 11] brings four benefits to autonomous robots. First, this kind of learning robot can improve its performance continuously and can adapt itself to new environments. Second, the connectionist network does not need to represent explicitly all possible situation-action rules, as it shows good generalization capabilities. Third, connectionist networks have been shown to deal well with noisy input data, a capability which is essential for any robot working upon information close to the raw sensory data. Fourth, connectionist learning rules are well suited to on-line and real-time learning. In addition to these benefits, the architecture described in this paper also overcomes three critical limitations of basic reinforcement connectionist learning that prevent its application to autonomous robots operating in the real world. The first and most important limitation is that reinforcement learning might require an extremely long time. The main reason is that it is hard to determine rapidly promising parts of the action space where to search for suitable reactions. The second limitation has to do with the robot's behavior during learning. Practical learning robots should be operational at any moment and, most critically, they should avoid catastrophic failures such as collisions. Finally, the third limitation concerns the inability of "monolithic" connectionist networks --i.e., networks where knowledge is distributed over all the weights-- to support incremental learning. In this kind of standard network, learning a new rule (or tuning an existing one) could degrade the knowledge already acquired for other situations. An important assumption of our approach is that the robot receives a reinforcement signal after performing every action. Although this assumption is hard to satisfy in many reinforcement learning tasks, it is not in the case of goal-directed tasks, since the robot can easily evaluate its progress towards the goal at any moment. A second assumption of our approach is that the goal location is known.
In particular, the goal location is specified in relative Cartesian coordinates with respect to the starting location.
Figure 1. The Nomad 200 mobile robot.
2. EXPERIMENTAL SETUP
The system composed of the robot and the reinforcement connectionist controller has been called TESEO. The physical robot is a wheeled cylindrical mobile platform of the Nomad 200 family (see Figure 1). It has three independent motors. The first motor moves the three wheels of the robot together. The second one steers the wheels together. The third motor rotates the turret of the robot. The robot has 16 infrared sensors and 16 sonar sensors, from which distances to the nearest obstacles can be estimated, and 20 tactile sensors that detect collisions. The infrared and sonar sensors are evenly placed around the perimeter of the turret. Finally, the robot has a dead-reckoning system that keeps track of the robot's position and orientation. The connectionist controller maps the currently perceived situation into a spatially continuous action. The controller then waits until the robot has finished performing the corresponding motor command before computing the associated reinforcement signal and the next action. We will describe next what the input, output and reinforcement signals are. The input to the connectionist network consists of a vector of 40 components, all of them real numbers in the interval [0, 1]. The first 32 components correspond to the infrared and sonar sensor readings. In this case, a value close to zero means that the corresponding sensor is detecting a very near obstacle. The remaining 8 components are
derived from a virtual sensor that provides the distance between the current and goal robot locations. This sensor is based on the dead-reckoning system. The 8 components correspond to a coarse codification of an inverse exponential function of the virtual sensor reading. The main reason for using this codification scheme is that, since it achieves a sort of interpolation, it offers three theoretical advantages, namely greater robustness, greater generalization ability and faster learning. The output of the connectionist network consists of a single component that controls directly the steering motor and indirectly the translation and rotation motors. This component is a real number in the interval [-180, 180] and determines the direction of travel with respect to the vector connecting the current and goal robot locations. Once the robot has steered the commanded degrees, it translates a fixed distance (25 cm) and, at the same time, rotates its turret in order to keep the front infrared and sonar sensors oriented toward the goal. In this way, a relative codification of both sensor readings and motor commands with respect to the goal direction is always maintained. This codification scheme is directly responsible for TESEO's generalization capabilities. The reinforcement signal is a real number in the interval [-3, 0] which measures the cost of doing a particular action in a given situation. The cost of an action is directly derived from the task definition, which is to reach the goal along trajectories that are sufficiently short and, at the same time, have a wide clearance to the obstacles. Thus actions incur a cost which depends on both the step clearance and the step length. Concerning the step clearance, the robot constantly updates its sensor readings while moving; the step clearance is the shortest distance measured by any of the sensors while performing the action. Finally, TESEO is equipped with a low-level asynchronous emergency routine to prevent collisions. The robot stops and retracts whenever it detects an obstacle in front of it which is closer than a safety distance.
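The input coding can be sketched as follows; the exponential decay constant and the coarse-coding centers and width are assumptions for illustration, not values given in the paper.

import math

DECAY = 0.2                              # 1/m, assumed decay constant
CENTERS = [k / 7.0 for k in range(8)]    # 8 receptive-field centers on [0, 1] (assumed)
WIDTH = 0.15                             # receptive-field width (assumed)

def input_vector(ir_sonar_readings, goal_distance):
    assert len(ir_sonar_readings) == 32
    x = [min(max(r, 0.0), 1.0) for r in ir_sonar_readings]  # ~0 means a very near obstacle
    g = math.exp(-DECAY * goal_distance)                    # inverse exponential, in (0, 1]
    # coarse codification: overlapping Gaussian receptive fields over g
    x += [math.exp(-((g - c) / WIDTH) ** 2) for c in CENTERS]
    return x   # 40 real numbers in [0, 1]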
3. EXTENSIONS TO BASIC REINFORCEMENT LEARNING
TESEO's aim is to learn to perform those actions that optimize the total reinforcement received along the trajectory to the goal. As mentioned in the Introduction, TESEO has been designed to overcome three limitations of basic reinforcement learning: lack of incremental improvement, slow convergence, and failure to be operational from the very beginning. The following three aspects of TESEO's architecture address these limitations. First, the connectionist controller is a modular network, each module codifying a set of similar reaction rules. That is, these rules map similar sensory inputs into similar actions and, in addition, they have similar long-term consequences. Modularity guarantees that improvements in the response to a given situation will not negatively alter other unrelated reactions. Second, instead of learning from scratch, TESEO utilizes a fixed set of basic reflexes every time its connectionist controller fails to generalize correctly its previous experience to the current situation. The connectionist controller associates the selected reflex with the perceived situation in one step. This new reaction rule is tuned subsequently through reinforcement learning. Basic reflexes correspond to prior elemental knowledge about
Figure 2. Network architecture.
the task and are codified as simple reactive behaviors [1]. Each reflex selects one of the 16 directions corresponding to the current orientations of the infrared and sonar sensors. We have chosen this fixed set of directions because they are the most informative for the robot in terms of obstacle detection. It is worth noting that, except in simple cases, these reflexes alone do not generate efficient trajectories; they just provide acceptable starting points for the reinforcement learning algorithm to search for appropriate actions. Integrating learning and reaction in this way allows TESEO to focus on promising parts of the action space immediately and to be operational from the very beginning. Third, TESEO explores the action space by concentrating the search around the best actions currently known. The width of the search is determined by a counter-based scheme associated to the modules (see Section 4.2). This exploration technique allows TESEO to avoid experiencing irrelevant actions and to minimize the risk of collisions.
4. NETWORK ARCHITECTURE
The connectionist controller is a modular two-layer network (see Figure 2). The first layer consists of units with localized receptive fields, which we call exemplars, because each of them represents a point of the input space and covers a limited area around this point. The second layer is made of one single stochastic linear unit, the output unit. There exists full connectivity between the exemplars and the output unit.

4.1. Exemplars

The activation level of an exemplar is a real value in the interval [0, 1], it being 0 if
the perceived situation is outside its receptive field, and 1 if the situation corresponds to the point where the exemplar is centered. The jth module consists of the exemplars e_1, ..., e_n and their related links. All modules respond to each perceived situation, but only the module owning the exemplar with the maximum response propagates the activities of its exemplars to the output unit. If no exemplar "matches" the perceived situation --i.e., if the input does not fall in the receptive field of any exemplar--, then the basic reflexes are triggered and the current situation becomes a new exemplar. Section 5.1 provides more details on this resource-allocating procedure. The modules are not predefined, but are created dynamically as TESEO explores its environment. Every module j keeps track of four adaptive values. First, the width of the receptive fields of all the exemplars in this module, d_j. Second, the expected total future reinforcement, b_j, that the robot will receive if it uses this module for computing the next action. Third, a counter, c_j, that records how many times this module has been used without improving the robot's performance. Fourth, the prototypical action, pa_j, the robot should normally take whenever the perceived situation is classified into this module. There are as many prototypical actions as reflexes. Thus the first one is the direction of the front infrared and sonar sensors --i.e., a deviation of 0 degrees from the vector connecting the current and goal robot locations--, the second one is 22.5 degrees, and so on. Sections 5.1 and 6.2 explain how pa_j is initially determined and how it evolves, respectively. After reacting, the evaluator computes the reinforcement signal, z, as specified in Section 2. Then, if the action was computed through module j, the difference between z and b_j is used for learning (see Section 5.4). Only the weights of the links associated to the winning module j are modified.
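A sketch of this winner-take-all classification step follows; the linear radial activation profile is an assumption, since the text only states that the activation is 1 at the center and 0 outside the receptive field.

import math

def activation(center, x, d):
    # 1 at the exemplar's center, falling to 0 at the edge of the field of width d
    return max(0.0, 1.0 - math.dist(center, x) / d)

def classify(modules, x):
    """modules: list of dicts with keys 'exemplars' (list of centers) and 'd'."""
    best, best_act = None, 0.0
    for m in modules:
        for e in m['exemplars']:
            a = activation(e, x, m['d'])
            if a > best_act:
                best, best_act = m, a
    return best   # None -> no match: trigger the reflexes and allocate a new exemplar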
4.2. Output Unit

The output of the connectionist controller is a prototypical action pa, normally the prototypical action of the winning module, plus a certain variation s that depends on the location of the perceived situation in the input subspace dominated by that module (see Figure 3). In order to find a suitable action for each situation through reinforcement learning, TESEO needs to explore the action space. However, this exploration is not conducted over the whole action space, but only around the best actions currently known. Since each action is the sum of two components, pa and s, the exploration mechanism works on each of them separately. This exploration mechanism depends on c_j, the counter associated to the winning module. On the one hand, the exploration mechanism selects pa from the prototypical actions associated to all the modules that classify the perceived situation. That is, if the situation is located in the receptive field of one of the exemplars of module m, then pa_m is a candidate. Assuming j to be the winning module, the selection goes as follows. If c_j is not divisible by, say, 3, then pa_j is chosen. Otherwise, pa is taken to be the prototypical action associated to the module m with the best expected total future reinforcement, b_m. The basic idea behind this exploration mechanism is that the winning module could well benefit from the knowledge of neighboring modules. On the other hand, the deviation s from pa is computed through a stochastic process in the interval [-45, 45] degrees. Thus, TESEO will only explore actions between pa and its four
Figure 3. The output unit.

neighboring prototypical actions (two to the left and two to the right). The computation of s is done in three steps. The first step is to determine the values of the stochastic process' parameters. The mean μ is a weighted sum of the activation levels of the exemplars e_1, ..., e_n of the winning module:

μ = Σ_{k=1}^{n} w_k * a_k,    (1)

where w_k is the weight associated to the link between e_k and the output unit, and a_k is the activation level of e_k. The variance σ is proportional to c_j. This follows from the idea that the more often the module j is used without improving TESEO's performance, the higher σ must be. In the second step, the unit calculates its activation level l, which is a normally distributed random variable:

l = N(μ, σ).    (2)

In the third step, the unit computes s:

s = 45 if l > 45; -45 if l < -45; l otherwise.    (3)
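Equations (1)-(3) amount to the following sketch; SIGMA_PER_COUNT, the proportionality constant linking the variance to c_j, is an assumed value.

import random

SIGMA_PER_COUNT = 5.0   # degrees of spread added per unrewarded use of the module (assumed)

def deviation(weights, activations, c_j):
    mu = sum(w * a for w, a in zip(weights, activations))    # equation (1)
    sigma = max(1e-3, SIGMA_PER_COUNT * c_j)                 # spread grows with c_j
    l = random.gauss(mu, sigma)                              # equation (2)
    return max(-45.0, min(45.0, l))                          # equation (3)

def action(pa, weights, activations, c_j):
    return pa + deviation(weights, activations, c_j)         # prototypical action + explored deviation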
5. FOUR LEARNING MECHANISMS
5.1. Network Growth

The first learning mechanism makes the controller network grow as a function of the inputs received. Initially, there exist neither exemplars nor, consequently, modules; the resource-allocating procedure creates them as they are needed.
As mentioned in Section 4.1, if no exemplar "matches" the perceived situation, then the basic reflexes are triggered and the current situation becomes a new exemplar. That is, both represent the same point of the input space. The weight of the link from this exemplar to the output unit is initially set to zero and evolves subsequently through reinforcement learning. The new exemplar is added to one of the existing modules if its receptive field overlaps the receptive fields of the module's exemplars and the selected reflex is the same as the module's prototypical action. The first condition assures that every module will cover a connected input subspace. If either of the two conditions above is not satisfied, then the new exemplar is added to a new module. This module consists initially of the exemplar and its associated connections. Concerning the four parameters associated to this new module, they are initially set to the following values: its d equals 0.4, its c equals 0, its pa is the selected reflex, and its b is estimated on the basis of the distance from the next location to the goal and the distance from the next location to the perceived obstacles between the robot and the goal.

5.2. Tuning Exemplars

The second learning mechanism moves the position of the exemplars e_1, ..., e_n of the winning module j in order to better cover the input subspace dominated by that module. That is, the coordinates of the kth exemplar, v_k, are updated proportionally to how well they match the input coordinates x:

v_k(t + 1) = v_k(t) + ε * a_k(t) * [x(t) - v_k(t)],    (4)
where ε is the learning rate. In the experiments reported below, the value of ε is 0.1.
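In code, the rule of equation (4) is simply:

EPSILON = 0.1   # learning rate, as in the text

def tune_exemplar(v, a, x):
    """v: exemplar coordinates, a: its activation level, x: input coordinates."""
    return [v_k + EPSILON * a * (x_k - v_k) for v_k, x_k in zip(v, x)]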
5.3. Improving Reinforcement Estimates

The third learning mechanism is related to the update of the future reinforcement estimates b_j, and it is based on temporal difference (TD) methods [12]. Every value b_j is an estimate of the total future reinforcement TESEO will obtain if it performs the best currently known actions that take it from its current location (whose associated observed situation is classified into the jth module) to the goal. Consequently, the value b_j of module j should, after learning, be equal to the sum of the cost z of reaching the best next module i plus the value b_i:

b_j = max_{actions} (z) + b_i.    (5)

In order to iteratively update the values of b_j, so that finally (5) holds for all of them, we have used the simplest TD method, i.e., TD(0) (see [12] for details). If the situation perceived at time t is classified by module j and, after performing the computed action, the next situation belongs to module i and the reinforcement signal --or cost-- is z(t + 1), then:

b_j(t + 1) = b_j(t) + η * [z(t + 1) + b_i(t) - b_j(t)].    (6)
1 As described in Section 2, z takes negative values. So, minimizing future cost corresponds to maximizing future reinforcement.
The intensity of the modification is controlled by η, which takes the value 0.75 when TESEO behaves better than expected, and 0.075 otherwise. The rationale for modifying b_j less intensively when z(t + 1) + b_i(t) - b_j(t) < 0 is that the error in the estimation is probably due to the selection of an action different from the best currently known one for module j.
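A sketch of the update of equation (6) with this asymmetric learning rate:

def update_b(b_j, z_next, b_i):
    td_error = z_next + b_i - b_j          # z(t+1) + b_i(t) - b_j(t)
    eta = 0.75 if td_error > 0 else 0.075  # better than expected vs. worse
    return b_j + eta * td_error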
5.4. Weight Update
The fourth learning mechanism concerns weight modification and uses the classical associative search (AS) [9, 11]. AS uses the estimation given by TD to update the situation-action mapping, which is codified into the connectionist controller:
w_k(t + 1) = w_k(t) + α * [z(t + 1) + b_i(t) - b_j(t)] * r_k(t),    (7)
where α is the learning rate and r_k is the eligibility factor. The eligibility factor of a given weight measures how influential that weight was in choosing the action. In our experiments, r_k is computed in such a manner that the learning rule corresponds to a gradient ascent mechanism on the expected reinforcement
[11]:

r_k(t) = ∂ln N / ∂w_k(t) = ((l(t) - μ(t)) / σ²) * a_k(t),    (8)
where N is the normal distribution function in (2). The weights w_k are modified more intensively in case of reward --i.e., when TESEO behaves better than expected-- than in case of penalty. These two values of α are 0.2 and 0.02, respectively. The aim here is that TESEO maintains the best situation-action rules known so far, while exploring other reaction rules.
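Equations (7) and (8) combine into the following sketch; the eligibility term uses our reconstruction of the garbled equation (8) for the Gaussian output unit.

def update_weights(w, a, l, mu, sigma, td_error):
    """w, a: weights and activations of the winning module's exemplars;
    l, mu, sigma: sampled activation and parameters of the stochastic unit;
    td_error: z(t+1) + b_i(t) - b_j(t)."""
    alpha = 0.2 if td_error > 0 else 0.02           # reward vs. penalty
    return [w_k + alpha * td_error * ((l - mu) / sigma**2) * a_k   # eligibility, eq. (8)
            for w_k, a_k in zip(w, a)]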
6. FOUR LEARNING OPPORTUNITIES
Let us now present the four occasions on which learning takes place. The first arises during the classification phase, the next two happen after reacting, and the last one takes place when reaching the goal.
6.1. Unexperienced Situation
If the perceived situation is not classified into one of the existing modules, then the basic reflexes get control of the robot, and the resource-allocating procedure creates a new exemplar which is added either to one of the existing modules or to a new module.
6.2. Performing within Expectations
If the perceived situation is classified into module j and z(t+1) + b_i(t) - b_j(t) ≥ k_z, where k_z is a negative constant, then (i) the exemplars of that module are tuned to make them closer to the situation, (ii) the weights associated to the connections between the exemplars and the output unit are modified using the AS reinforcement learning rule, (iii) b_j is updated through TD(0), and (iv) d_j, c_j, and pa_j are adapted.
The adaptive parameters are updated differently in case of reward than in case of penalty. In case of reward, d_j is increased by 0.1, c_j is reset to 0, and if the output of the action unit, pa + s, is closer to a prototypical action other than pa_j, then pa_j becomes this new prototypical action. In case of penalty, c_j is increased by 1 and d_j is decreased by 0.1 if it is still greater than a threshold k_d.
6.3. Performing rather Badly

If the perceived situation is classified into module j and z(t+1) + b_i(t) - b_j(t) < k_z, then the topology of the network is slightly altered and d_j is decreased by 0.1 if it is still greater than a threshold k_d. If the total future reinforcement computed after reacting, z + b_i, is considerably worse than the expected one, b_j, this means that the situation was incorrectly classified and needs to be classified into a different module. The resource-allocating procedure creates a new exemplar, e', that has the same coordinates as the perceived situation, but does not add it to any module. The next time this situation is faced, e' will be the closest exemplar. Consequently, no module will classify the situation and the basic reflexes will get control of the robot. Then, the resource-allocating procedure will add e' either to one of the existing modules or to a new module, as described in Section 5.1.

6.4. Reaching the Goal

Finally, whenever the goal is reached, the value b_j of every winning module j along the path to the goal is also updated, in reverse chronological order. Thus, TESEO needs to store the pairs (j(t), z(t + 1)) along the current path. This supplementary update only accelerates the convergence of the b_j's, but does not change their steady values [13].

7. EXPERIMENTAL RESULTS
TESEO's performance has been tested on a corridor with offices at both sides. The task is to generate a short but safe trajectory from inside an office to a point at the end of the corridor. TESEO achieves the target location every time, and it never gets lost or trapped into malicious local maxima. The first time it tries to reach the goal, it relies almost all the time on the basic reflexes. Essentially, the basic reflexes make TESEO travel in the direction that (i) is the closest to the current one, (ii) is the safest, and (iii) brings TESEO toward the goal. As illustrated in Figure 4, in the first trial TESEO enters a dead-end section of the office (but does not get trapped in it) and even collides with the door frame because its sensors were not able to detect it. Collisions happened because the frame of the door is relatively thin and the incident angles of the rays drawn from the sensors were too large, resulting in specular reflections. Thus this simple task offers TESEO the opportunity to learn three skills. The first one is to smooth out certain sections of the trajectory generated by the basic reflexes. The second skill is to be able to avoid dead-ends or, in general, not to follow wrong walls. The third and most critical is to avoid obstacles that its sensors cannot detect. TESEO learns these three skills very rapidly. It reaches the goal efficiently and without colliding after travelling 10 times from the starting location to the desired goal
Figure 4. The environment and first trajectory generated for a starting location within the office. Note that TESEO has some problems in going through the doorway.
(Figure 5). The total length of the first trajectory is approximately 13 meters, while the length of the trajectory generated after TESEO has learned the suitable sequence of reactions is about 10 meters. This experiment was run several times, obtaining similar results. Concerning the acquisition of the third skill, TESEO associates safe actions to malicious situations --made out of a coarse codification of the distance to the goal and of incorrect estimated distances to the obstacles-- immediately after colliding with unperceived obstacles. Moreover, due to its noise tolerance and generalization capabilities (see below), TESEO hits obstacles that its sensors cannot detect only a few times before learning to avoid them. In addition, the learned navigation strategies have the following features. First, the trajectories are quite smooth, even though the basic reflexes have not been programmed in this way (it is rather difficult to do so!) and the reinforcement signal only penalizes long and unsafe paths. Second, the acquired reactions are robust to noisy sensory data. Since sensors are not perfect, TESEO does not perceive the same situations along similar trajectories. However, it is still able to generate suitable actions using only its acquired reactions. Figure 6 illustrates instances of the reaction rules learned. For every location considered (little circles) the move to be taken is depicted. Figure 6 shows that TESEO generates solution paths from any starting location inside the room. This simulation experiment indicates that TESEO exhibits good generalization abilities, since it can handle many more situations than those previously perceived. Once TESEO has learned efficient navigation strategies, occasional obstacles are put
Figure 5. Trajectory generated after travelling 10 times to the goal.
Figure 6. Generalization abilities: Situation-action rules applied for a sample of locations within the office and the first part of the corridor.
in the way to the goal. TESEO moves around the obstacles and then returns to the original, efficient trajectory. In other experiments, the goal is changed. TESEO also learns to navigate to this new goal in a few trials, and it is still able to reach the first goal as efficiently as before.
8. RELATED WORK
The robot learning architecture described in this paper shows some resemblances with several previous works. In the case of goal-directed tasks, it is possible to use the future reinforcement associated with situation-action pairs to define a vector field over the environment. Then, the robot will normally perform the action that follows the gradient of the vector field. Seen in this way, our approach is reminiscent of potential field approaches (e.g., [14, 15]). However, our approach differs from them in three fundamental aspects. First, any potential field algorithm relies on predetermining a potential function that will allow the robot to reach the goal efficiently. On the contrary, a reinforcement approach like ours is adaptive in that it starts with an initial and inefficient vector field generated from rough estimates of the future reinforcements, and adapts the field to generate efficient trajectories as it converges to the correct estimates through TD methods. Second, the kind of vector fields that can arise in our case are much more varied than those defined by a weighted sum of the potential fields originated by the different obstacles. Third, most of the potential field approaches require a global model of the environment, while our approach only needs local information obtained from the robot's sensors. There exist also potential field approaches which only need local workspace models built out of sensory information (e.g., [14]), but they cannot produce efficient trajectories. The benefits of letting the robot learn automatically the appropriate reaction rules have recently been emphasized by several authors. [4] combines reinforcement connectionist learning and teaching to reduce the learning time. In this framework, a human teacher shows the robot several instances of reactive sequences that achieve the task. Then, the robot learns new reaction rules from these examples. The taught reaction rules help reinforcement learning by biasing the search for suitable actions toward promising parts of the action space. In our approach, the basic reflexes play the same guidance role, requiring only a programmer instead of a teacher. [5] use Kohonen maps [16] to split the sensory input space into clusters, and then associate an appropriate action to every cluster through reinforcement learning. Their architecture maps all the situations of a given cluster to a single action, and classifies situations solely by the similarity of their representations. Our architecture classifies situations, first, based on the similarity of their input representations; but, then, it also incorporates task-specific information for classifying based on the similarity of reinforcements received. In this manner, the input space is split into consistent clusters, since similar future reinforcement corresponds to similar suitable actions for similar situations. [7] integrate inductive neural network learning and explanation-based learning. The domain theory is previously learned by a set of neural networks, one network for each discrete action the robot can perform. Our basic reflexes also represent prior knowledge about the task; however, they are much more elemental and are used in a different way. Our approach is also related to the Dyna integrated architectures introduced by
[17] and further extended by [4, 18, 19], among others. Roughly, Dyna uses planning to speed up the acquisition of reaction rules through reinforcement learning. Dyna also learns a global model of the task on which it plans.
9. SUMMARY AND FUTURE WORK
We have described a reinforcement learning architecture that makes an autonomous mobile robot rapidly learn efficient navigation strategies in an unknown indoor environment. Our robot TESEO not only is operational from the very beginning and improves its performance with experience, but also learns to avoid collisions even when its sensors cannot detect the obstacles. This is a definite advantage over non-learning reactive robots. TESEO also exhibits incremental learning, high tolerance to noisy sensory data, and good generalization abilities. All these features make our robot learning architecture very well suited to real-world applications. However, the current implementation of our approach suffers from one main limitation, namely it requires a reliable odometry system that keeps track of the robot's relative position with respect to the goal. Currently, odometry is totally based on dead-reckoning. In all the experiments we have carried out so far, dead-reckoning has proven sufficient to reach the goal. As long as its estimation of the position of the robot does not differ greatly from the actual one, the connectionist controller is still able to produce correct actions. But dead-reckoning will probably be insufficient in more complicated missions requiring long travels and many turns. Current work focuses on reliable odometry. The goal is designated by a modulated light beacon and the robot is equipped with sensors especially designed to detect that light. Whenever TESEO detects the goal it uses the beaconing system; otherwise, it relies on dead-reckoning.
ACKNOWLEDGMENTS

We thank Aristide Varfis for helpful discussions. JRM is also grateful to the people of Nomadic Technologies for their excellent support with the physical robot. The work reported in this paper was conducted in the Institute for Systems Engineering and Informatics, Joint Research Centre of the European Commission. This research has been partially supported by the ESPRIT Basic Research Action number 7274.
REFERENCES
[1] R.A. Brooks, "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation, vol. 2, pp. 14-23, 1986.
[2] D.P. Miller and M.G. Slack, "Global symbolic maps from local navigation," Proc. of the 9th National Conf. on Artificial Intelligence, pp. 750-755, 1991.
[3] J.H. Connell, "SSS: A hybrid architecture applied to robot navigation," IEEE Int. Conf. on Robotics and Automation, pp. 2719-2724, 1992.
[4] L.-J. Lin, "Programming robots using reinforcement learning and teaching," Proc. of the 9th National Conf. on Artificial Intelligence, pp. 781-786, 1991.
[5] K. Berns, R. Dillmann, and U. Zachmann, "Reinforcement-learning for the control of an autonomous mobile robot," IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 1808-1815, 1992.
[6] J. del R. Millán and C. Torras, "A reinforcement connectionist approach to robot path finding in non-maze-like environments," Machine Learning, vol. 8, pp. 363-395, 1992.
[7] T.M. Mitchell and S.B. Thrun, "Explanation-based neural networks learning for robot control," in C.L. Giles, S.J. Hanson, and J.D. Cowan (eds.), Advances in Neural Information Processing Systems 5, pp. 287-294. San Mateo, CA: Morgan Kaufmann, 1993.
[8] J. del R. Millán, "Reinforcement learning of goal-directed obstacle-avoidance reaction strategies in an autonomous mobile robot," Robotics and Autonomous Systems, in press.
[9] A.G. Barto, R.S. Sutton, and C.W. Anderson, "Neuronlike elements that can solve difficult learning control problems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 13, pp. 835-846, 1983.
[10] A.G. Barto, R.S. Sutton, and C.J. Watkins, "Learning and sequential decision making," Tech. Report 89-95, Dept. of Computer and Information Science, University of Massachusetts, Amherst, 1989.
[11] R.J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, 1992.
[12] R.S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[13] S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol. 55, pp. 311-365, 1992.
[14] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Trans. on Systems, Man, and Cybernetics, vol. 19, pp. 1179-1187, 1989.
[15] J. Barraquand and J.C. Latombe, "Robot motion planning: A distributed representation approach," The International Journal of Robotics Research, vol. 10, pp. 628-649, 1991.
[16] T. Kohonen, Self-Organization and Associative Memory, Second Edition. Berlin: Springer-Verlag, 1988.
[17] R.S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," Proc. 7th Int. Conf. on Machine Learning, pp. 216-224, 1990.
[18] L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, pp. 293-321, 1992.
[19] J. Peng and R.J. Williams, "Efficient learning and planning within the Dyna framework," Adaptive Behavior, vol. 1, pp. 437-454, 1993.
Intelligent Robots and Systems, V. Graefe (Editor). © 1995 Elsevier Science B.V. All rights reserved.
A Lifelong Learning Perspective for Mobile Robot Control

Sebastian Thrun
Universität Bonn, Institut für Informatik III, Römerstr. 164, 53117 Bonn, Germany
E-mail: [email protected], [email protected]

Designing robots that learn by themselves to perform complex real-world tasks is a still-open challenge for the field of Robotics and Artificial Intelligence. In this paper we present the robot learning problem as a lifelong problem, in which a robot faces a collection of tasks over its entire lifetime. Such a scenario provides the opportunity to gather general-purpose knowledge that transfers across tasks. We illustrate a particular learning mechanism, explanation-based neural network learning, that transfers knowledge between related tasks via neural network action models. The learning approach is illustrated using a mobile robot equipped with visual, ultrasonic and laser sensors. In less than 10 minutes of operation time, the robot is able to learn to navigate to a marked target object in a natural office environment.

1. INTRODUCTION

Throughout the last decades, the field of robotics has produced a large variety of approaches for the control of complex physical manipulators. Despite significant progress in virtually all aspects of robotics science, most of today's robots are specialized to perform a narrow set of tasks in a very particular kind of environment. Most robots employ specialized controllers that have been carefully designed by hand, using extensive knowledge of the robot, its environment and its task. If one is interested in building autonomous multi-purpose robots, such approaches face some serious bottlenecks.

- Knowledge bottleneck. Designing a robot controller requires prior knowledge about the robot, its environment and the tasks it is to perform. Some of this knowledge is usually easy to obtain (like the kinematic properties of the robot), but other knowledge might be very hard to obtain (like certain aspects of the robot dynamics, or the characteristics of the robot's sensors). Moreover, certain knowledge (like the particular configuration of the environment, or the particular task one wants a robot to do) might not be accessible at all at the design time of the robot.

- Engineering bottleneck. Making domain knowledge computer-accessible, i.e., hand-coding explicit models of robot hardware, sensors and environments, has often been found to require tremendous amounts of programming time. As robotic
hardware becomes increasingly complex, and robots are to become more reactive in more complex and less predictable environments, the task of hand-coding a robot controller will become more and more a cost-dominating factor in the design of robots.

- Tractability bottleneck. Even if the robot, its environment and its goals can be modeled in sufficient detail, generating control for a general-purpose device has often been found to be of enormous computational complexity (e.g., [18]). Moreover, the computational complexity often increases drastically with the complexity of the mechanical device.

Machine learning aims to overcome these limitations by enabling a robot to collect its knowledge on the fly, through real-world experimentation. If a robot is placed in a novel, unknown environment, or faced with a novel task for which no a priori solution is known, a robot that learns can collect new experiences, acquire new skills, and eventually perform new tasks all by itself. For example, in [5] a robot manipulator is described which learns to insert a peg into a hole without prior knowledge regarding the manipulator or the hole. Maes and Brooks [8] successfully applied learning techniques to coordinating leg motion for an insect-like robot. Their approach, too, operates in the absence of a model of the dynamics of the system. Learning techniques have frequently come to bear in situations where the physical world is extremely hard to model by hand (e.g., the characteristics of noisy sensors). For example, Pomerleau describes a computer system that learns to steer a vehicle driving at 55 mph on public highways, based on input sensor data from a video camera [15]. Learning techniques have also successfully been applied to speed up robot control, by observing the statistical regularities of "typical" situations (like typical robot and environment configurations) and compiling more compact controllers for frequently encountered situations. For example, Mitchell [11] describes an approach in which a mobile robot becomes increasingly reactive, by using observations to compile fast rules out of a database of domain knowledge. However, there is a fundamental shortcoming in most of today's rigorous learning approaches. Most robot control learning approaches focus on learning to achieve single, isolated performance tasks. If one is interested in learning with a minimum amount of initial knowledge, as is often the case in approaches to robot learning, such approaches have a critical limiting factor: the number of training examples required for successful generalization. The more complex the task at hand and the less is known about the problem beforehand, the more training data is necessary to achieve the task. In many robotics domains, the collection of training data is an expensive undertaking due to the slowness of robotics hardware. Hence, it does not surprise that the time required for real-world experimentation has frequently been found to be the limiting factor that prevents rigorous machine learning techniques from being truly successful in robotics. The task of learning from scratch can be significantly simplified by considering robots that face whole collections of control learning problems over their entire lifetime. In such a lifelong robot learning scenario [27], learning tasks are related in that they all play in the same environment, and that they involve the same robot hardware.
Lifelong learning scenarios open the opportunity for the transfer of knowledge across tasks. Complex tasks, which might require huge amounts of training data when faced in isolation, can conceivably
be achieved much faster if a robot manages to exploit previously learned knowledge. For example, a lifelong learning robot might acquire general-purpose knowledge about itself and its environment, or acquire generally useful skills that can be applied in the context of multiple tasks. Such functions, once learned, can be applied to speed up learning in new tasks. Hence, a lifelong learning perspective, as proposed in this paper, promises to reduce the number of training examples required for successful learning, and hence to scale machine learning technology to more complex robot scenarios. In this paper we will present one particular candidate approach to the lifelong learning problem: the explanation-based neural network learning algorithm (EBNN) [13,26,27]. EBNN uses a hybrid learning strategy to generalize training data. On the one hand, it can learn functions inductively from scratch, just like neural network Backpropagation [17]. On the other hand, EBNN can also learn task-independent domain knowledge, which applies to multiple tasks. More specifically, in the robot control learning scenarios studied in this paper, domain knowledge is represented by neural network action models, which model the effect of the robot's actions. These models are independent of the particular control learning task at hand, and they are acquired over the lifetime of the robot. When learning control, they are used to analyze observations, in order to generalize better to unseen situations. Since in this paper we are interested in learning robot control, we will describe EBNN in the context of Q-Learning [28]. Q-Learning is a popular method for learning to select actions from delayed and sparse reward. To illustrate the appropriateness of the EBNN learning mechanism for robotics problems, we will describe experimental results obtained in an indoor mobile robot navigation domain.

2. LEARNING CONTROL
Q-Learning [28] is a reinforcement learning technique that has frequently been applied to robot control. The goal of Q-Learning is to learn a policy for generating whole action sequences which maximizes an externally given reward function. In general, the reward may be delayed and/or sparse, adding a significant degree of complexity to the learning problem. For example, in the robot navigation experiments reported here, reward is only received upon reaching a particular goal position, or upon total failure (i.e., losing a target object out of sight, as described in Sect. 4). The goal of learning, thus, is to construct a reactive controller that carries the robot to its goal position in as few steps as possible. Q-Learning works as follows. Let S be the set of all possible robot percepts. For simplicity, let us assume that at any (discrete) point in time, the agent can observe the complete state of the world (1), denoted by s ∈ S. Based on this observation, the learner then picks an action a (from a set of actions A). As a result, the world state changes. In addition, the learner receives a scalar reward value, denoted by r(s, a) (or r_t, if we refer to the expected reward at a certain time t), which measures its current performance. Throughout this paper we will assume that reward will exclusively be received upon reaching a designated goal location, or upon total failure, respectively.

(1) This restrictive assumption is frequently referred to as the Markov assumption. See [2,7,10,22] for approaches to reinforcement learning in partially observable worlds.
Figure 1. Episode and Q-networks in Q-Learning.
Formally, in Q-Learning one seeks to find an action policy π: S → A, i.e., a mapping from environment states s to actions a which, when applied for action selection, maximizes the cumulative discounted future reward

R = Σ_{t=t0}^{∞} γ^(t-t0) * r_t.    (1)
Here γ (with 0 < γ ≤ 1) is a discount factor that favors rewards reaped sooner in time. Notice that actions may impact reward many time steps later. Hence, in order to learn to pick reward-maximizing actions, one has to solve a temporal credit assignment problem. The key of Q-Learning is to learn a value function for picking actions. A value function, denoted by Q: S x A → R, maps robot percepts s ∈ S and actions a ∈ A to scalar "utility" values. In the ideal case, Q(s, a) is, after learning, the maximum cumulative, discounted reward one can expect upon executing action a in state s (cf. Eq. (1)). Hence, Q ranks actions according to their reward: the larger the expected cumulative reward for picking action a at state s, the larger its value Q(s, a). Q, once learned, allows one to generate optimal actions by greedily picking the action which maximizes Q for the current state s:

π(s) = argmax_{a ∈ A} Q(s, a).
Initially, all values Q(s, a) are set to zero. During learning, values are refined incrementally using the following recursive update procedure. Suppose the robot just executed a whole action sequence which, starting at some initial state s_1, led to a final reward r(s_T, a_T). Such an episode is shown in Fig. 1. For all time steps t within this episode, Q(s_t, a_t) is then updated through a mixture of the values of subsequent observation-action pairs, up to the final value at the end of the episode. The exact update equation, combined with a modified temporal differencing rule [24], is:

Q^target(s_t, a_t) =
  +100, if a_t is the final action and the robot reached the goal;
  -100, if a_t is the final action and the robot failed;
  γ * [(1-λ) * max_{a ∈ actions} Q(s_{t+1}, a) + λ * Q^target(s_{t+1}, a_{t+1})], otherwise.    (2)
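A sketch of the rule of equation (2), applied backwards over a recorded episode; the discrete action set is a placeholder, and gamma and lam (the mixing parameter) are left as free parameters.

ACTIONS = range(4)   # placeholder discrete action set (assumed for illustration)

def q_targets(episode, Q, gamma, lam, reached_goal):
    """episode: list of (s, a) pairs; Q(s, a): current value estimate."""
    targets = [0.0] * len(episode)
    targets[-1] = 100.0 if reached_goal else -100.0        # final action
    for t in range(len(episode) - 2, -1, -1):              # sweep backwards in time
        s_next, _ = episode[t + 1]
        best = max(Q(s_next, a) for a in ACTIONS)          # current best estimate
        targets[t] = gamma * ((1 - lam) * best + lam * targets[t + 1])
    return targets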
Figure 2. EBNN. General-purpose action models are used to derive slope information for training examples.
Here λ (with 0 ≤ λ ≤ 1) is the mixing parameter of the update rule (2), which blends the current value estimates with the recursively computed episode targets.

3. EXPLANATION-BASED NEURAL NETWORK LEARNING
Q-Learning learns individual policies independently, ignoring the opportunity for the transfer of knowledge across different tasks. An important aspect of successful approaches to the lifelong learning problem is the ability to transfer knowledge to related tasks. In this section we will describe the explanation-based neural network learning algorithm (EBNN), which uses task-independent knowledge to bias generalization when learning robot control. Notice that EBNN is a general learning algorithm, albeit described here in the context of Q-Learning. A more detailed description can be found in [26,27]. In order to transfer knowledge, EBNN learns general-purpose predictive action models. Action models, denoted by M: S x A → S, model the effect of the robot's actions. In EBNN, these models are represented using artificial neural networks, which are trained with observed state transitions using the Backpropagation algorithm [17] (or EBNN itself). Once trained, they are used to analyze observed episodes using the following three-step procedure:

1. Explain. An explanation is a post-facto prediction of an observed episode. More specifically, when explaining an episode, action models are employed to predict each state s_t from the previously observed state-action tuple (s_{t-1}, a_{t-1}), as illustrated in Fig. 2.
Figure 3. Fitting values and slopes in EBNN: Let f be the target function for which three examples (x_1, f(x_1)), (x_2, f(x_2)), and (x_3, f(x_3)) are known. Based on these points the learner might generate the hypothesis g. If the slopes (tangents) are also known, the learner can do much better: h.
2. Analyze. The explanation is now analyzed in order to extract the slopes of the target Q function. More specifically, consider the update rule (2). The slope of the target output, Q^{target}(s_t, a_t), with respect to its input s_t is governed by:

\nabla_{s_t} Q^{target}(s_t, a_t) =
    \left. \frac{\partial r(M(s, a_t))}{\partial s} \right|_{s_t}    if a_t is the final action
    \gamma \left[ (1-\lambda) \left. \frac{\partial Q(s, a_{t+1})}{\partial s} \right|_{s_{t+1}} + \lambda \, \nabla_{s_{t+1}} Q^{target}(s_{t+1}, a_{t+1}) \right] \cdot \left. \frac{\partial M(s, a_t)}{\partial s} \right|_{s_t}    otherwise    (3)
If both Q and M are correct, Eq. (3) can be viewed as the derivative of Eq. (2) with respect to s with fixed actions. The auxiliary derivatives ∂M(s, a_t)/∂s and ∂Q(s, a_{t+1})/∂s are obtained by computing the first derivative of the neural network function which, since neural networks are differentiable functions, is straightforward to obtain. Hence, for each target value of the form ((s_t, a_t), Q^{target}(s_t, a_t)), EBNN finds, via Eq. (3), the slope of the target function Q at (s_t, a_t), denoted by \nabla_{s_t} Q^{target}(s_t, a_t). This slope vector informs the learner about the sensitivity of Q with respect to infinitesimal changes in s_t. Clearly, action models M play an integral part in the extraction of slopes.

3. Learn. Once the slopes of the target function are known, one can generalize better by incorporating them into the training procedure, as depicted in Fig. 3. In EBNN both the target values Q^{target}(s_t, a_t) and the target slopes \nabla_{s_t} Q^{target}(s_t, a_t) are used to refine the weights of the target networks. Hence, a combined error function

E = \sum_{patterns} \left( E_{value} + \eta \cdot E_{slope} \right)

is minimized, which takes both value error, E_value, and slope error, E_slope, weighted by a gain parameter η, into account. Target values are fitted using the Backpropagation algorithm [17]. Target slopes are fitted using the Tangent-Prop algorithm, which is an analogue of Backpropagation for fitting slopes [20].

Clearly, slopes extracted by EBNN can be wrong. This is because they are computed using artificial neural networks, which themselves are constructed from training examples. Consequently, target slopes can mislead the generalization. EBNN provides a simple but effective mechanism to recover from misleading slopes. Whenever a model is used for predicting the next state, EBNN measures its root mean square prediction error, denoted by δ_t, which is obtained by comparing the real state with the model prediction. This error indicates the correctness of the action model, and hence the accuracy of the slopes. Consequently, it is used to weight the slope information when refining the weights of the target network. More specifically, the weighting factor of the t-th slope when refining the weights, denoted by η_t, is governed by

\eta_t = \prod_{\tau=t}^{T} \left( 1 - \frac{\delta_\tau}{\delta_{max}} \right)

Here δ_t is the RMS error of the model M at (s_t, a_t), and δ_max is an appropriate normalization factor such that 0 ≤ δ_t ≤ δ_max. When refining the weights of the Q-network, the learning rate for the t-th slope is multiplied with η_t. This mechanism, which is called LOB*, has empirically been found to be important for successfully learning from weak action models [26].
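A minimal sketch of this slope weighting follows. The exact range of the product is stated only partially in the text; the version below accumulates the normalized model errors over the remaining steps of the episode, which is an assumption made for illustration.

```python
import numpy as np

def lobstar_weights(model_errors, delta_max):
    """Weight target slopes by estimated action-model accuracy (LOB*).

    model_errors -- RMS prediction errors delta_t of the action model
                    along the episode, with 0 <= delta_t <= delta_max
    Returns eta_t for each time step: the product of (1 - delta/delta_max)
    over the remaining steps, so slopes derived through inaccurate models
    receive small weights.
    """
    deltas = np.asarray(model_errors, dtype=float) / delta_max
    etas = np.ones(len(deltas))
    acc = 1.0
    for t in range(len(deltas) - 1, -1, -1):  # accumulate backwards
        acc *= (1.0 - deltas[t])
        etas[t] = acc
    return etas
```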
Notice that EBNN employs a neural network version of explanation-based learning [4,12]. To summarize, EBNN employs action models to analyze training episodes, and to derive slopes that are used for generalizing these episodes. These slopes generalize training instances in input space, since they indicate how small changes will affect the target function Q. They are extracted from general-purpose action models, which are acquired and used over the entire lifetime of the robot. Hence, in a lifelong learning context, EBNN transfers knowledge by virtue of its action models.

4. RESULTS IN A ROBOT NAVIGATION DOMAIN
In this section we will present empirical results obtained in a mobile robot navigation domain. Xavier², the robot at hand, is shown in Fig. 4. It is equipped with a ring of 24 sonar sensors, a laser light-stripper (range finder), and a color camera mounted on a pan/tilt unit. Sonar sensors return approximate echo distances, along with noise. The laser light-stripper measures distances more accurately, but its perceptual field is limited to a small range in front of the robot. The task of the robot was to learn to navigate to a specifically marked target location in a natural laboratory environment. In some experiments, the location of the marker was fixed throughout the course of learning; in others it was moved across the laboratory and only kept fixed for the duration of a single training episode. Sometimes parts of the environment were blocked by obstacles.

²Xavier has been built by and is property of Carnegie Mellon University, Pittsburgh, USA.
Figure 4. The CMU Xavier robot.
The marker was detected using a visual object tracking routine that recognized and tracked the marker in real-time, making heavy use of the robot's pan/tilt unit. Every 3 seconds the robot could choose one of seven possible actions, ranging from sharp turns to straight motion. In order to avoid collisions, the robot employed a pre-coded obstacle avoidance routine. Whenever the projected path of the robot was blocked by an obstacle, the robot decelerated and, if necessary, changed its motion direction (regardless of the commanded action). Xavier was operated continuously in real-time. Each learning episode corresponded to a sequence of actions which started at a random initial position and terminated either when the robot lost sight of the target object, for which it was penalized, or when it halted in front of the marker, in which case it was rewarded. Penalty/reward was only given at the end of an episode, and Q-Learning with EBNN was employed to determine optimal action policies. In our implementation, percepts were mapped into a 46-dimensional perceptual space, comprising 24 logarithmically scaled sonar distance measurements, 10 locally averaged laser distance scans, and an array of 12 camera values that indicated the horizontal angle of the marker position relative to the robot. Hence, each action model mapped 46 sensor values to 46 sensor predictions (15 hidden units), plus a prediction for the immediate reward. The models were learned beforehand using the Backpropagation training procedure, employing a cross-validation scheme to prevent over-fitting the data. Initially, we used a training corpus of approximately 800 randomly generated training examples, which was gradually increased through the course of this research to 3,000 examples, taken from some 700 episodes. These training examples were distributed roughly equally among the
seven action model networks. In all our experiments the action models were trained first, prior to learning Q, and frozen during learning control. When training the Q-networks (46 input, eight hidden and one output unit), we explicitly memorized all training data, and used a replay technique similar to the "experience replay" described in [7]. This procedure memorizes all past experiences explicitly. After each learning episode, it re-estimates the target values of the Q function by recursively replaying past episodes, as if they had just been observed. Experience replay makes extensive use of the training instances, hence allowing for more accurate evaluations of the minimum requirements on data volume. In all our experiments the update parameter γ was set to 0.9, and λ was set to 0.7, which was empirically found to work best. We performed a total of seven complete learning experiments, each consisting of 25-40 episodes. In order to evaluate the importance of the action models for rapid learning in a lifelong learning setting, we ran two sets of experiments: one in which no prior knowledge was available, and one where Xavier had access to the pre-trained action models. One of the experiments is summarized in Fig. 6. In this figure, we visualized several learning episodes from a bird's-eye view, using a sonar map technique described in [27]. In all cases Xavier learned to navigate to a static target location in less than 19 episodes (with action models) and 24 episodes (without action models). Each episode was between two and eleven actions in length. It consistently learned to navigate to arbitrary target locations (which was required in five out of seven experiments) always in less than 25 (35, respectively) episodes. The reader should notice the small number of training examples required to learn this task. Although the robot faced a high-dimensional sensation space, it always managed to learn the task in less than 10 minutes of robot operation time, and, on average, less than 20 training examples per Q-network. Of course, the training time does not include the time for collecting the training data of the action models. Almost all training examples for the action models, however, were obtained as a side effect when experimenting with the robot. When testing our approach, we also confronted Xavier with situations which were not part of its training experience, as shown in Fig. 7. In one case, we kept the location of the marker fixed during training and moved it only in the testing phase. In a second experiment, we blocked the robot's path by large obstacles, even though it had not experienced obstacles during training. It was here that the presence of appropriate action models was most important. While without prior knowledge the robot consistently failed to approach the marker under these new conditions, it reliably (>90%) managed this task when it was trained with the help of the action model networks. Obviously, this is because in EBNN the action models provide a knowledgeable bias for generalization to unseen situations.
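The replay procedure can be sketched as follows. This is a hypothetical reconstruction of the scheme described above, not the implementation used on Xavier; the target recursion of Eq. (2) is injected as a parameter so the class stays self-contained.

```python
class ReplayMemory:
    """Explicit episode memory: after each new episode, targets for ALL
    stored episodes are re-estimated as if they had just been observed,
    in the spirit of "experience replay" [7]."""

    def __init__(self, compute_targets):
        self.compute_targets = compute_targets  # e.g. the Eq. (2) recursion
        self.episodes = []                      # (episode, final_reward) pairs

    def add_episode(self, episode, final_reward):
        self.episodes.append((episode, final_reward))

    def replay(self, q_networks):
        # Recursively replay every stored episode and collect fresh
        # ((state, action), target) examples for refitting the Q-networks.
        examples = []
        for episode, reward in self.episodes:
            examples.extend(self.compute_targets(episode, q_networks, reward))
        return examples
```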
Figure 5. Prediction and slopes. A neural network action model predicts sensations and penalty/reward for the next time step. The large matrix displays the output-input slopes of the network. White boxes refer to positive and black boxes to negative values. Box sizes indicate absolute magnitudes. Notice the bulk of positive gradients along the main diagonal, and the cross-dependencies between different sensors.
Figure 6. Learning navigation. Traces of three early and three late episodes are shown. Each diagram shows a two-dimensional occupancy map of the world, constructed from sonar information. Bright regions indicate free space and dark regions indicate the presence of obstacles. Note that the location of the target object (marked by a cross) is held constant in this experiment.
Figure 7. Testing navigation. After training, the location of the target object was moved. In some experiments, the path of the robot was also blocked by obstacles. Unlike plain inductive neural network learning, EBNN almost always manages these cases successfully.
5. CONCLUSION

In this paper we have described an approach to the lifelong robot learning problem, based on Q-Learning and EBNN. We have empirically illustrated rapid learning capabilities in a mobile robot navigation domain, demonstrating the appropriateness of Q-Learning and EBNN for robot navigation with unstructured, high-dimensional perceptual spaces. The reader should notice that others have studied the transfer of knowledge. For example, in the context of reinforcement learning, models have been successfully employed for the transfer of knowledge via planning [9,25] or replay [7]. Others proposed hierarchical approaches, in which the building blocks, once learned, can be applied to multiple tasks [3,7,21]. A third way for the transfer of knowledge is concerned with the construction of better internal representations, which improve generalization across multiple tasks [1,16,19,23]. While this list is clearly incomplete, it nevertheless illustrates the importance of transferring knowledge for scaling machine learning to more complex domains [27]. From a lifelong learning perspective, much of the work presented in this paper is preliminary. While we have not yet studied robot control in the context of multiple tasks in practice, in experiments described here and elsewhere [13,26,27] we consistently found that EBNN outperforms pure inductive neural network learning, which does not employ background knowledge and hence learns from scratch. In a related paper we have illustrated superior generalization due to EBNN in a robot perception task [14]. Learning mechanisms that allow for effective knowledge transfer, like EBNN, are a necessary prerequisite for successful approaches to the lifelong robot learning problem.

ACKNOWLEDGMENT

The author thanks Tom Mitchell, Joseph O'Sullivan and the CMU robot learning group for their invaluable feedback. He also thanks CMU for granting him access to the Xavier robot, without which this research would not have been possible. This research is sponsored in part by the National Science Foundation under award IRI-9313367 to Tom Mitchell. Views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of NSF.

REFERENCES
1. Richard Caruana. Multitask learning: A knowledge-based source of inductive bias. In Paul E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 41-48, San Mateo, CA, 1993. Morgan Kaufmann.
2. Lonnie Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinction approach. In Proceedings of the 1992 AAAI Conference, Menlo Park, CA, July 1992. AAAI Press / The MIT Press.
3. Peter Dayan and Geoffrey E. Hinton. Feudal reinforcement learning. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan Kaufmann.
4. Gerald DeJong and Raymond Mooney. Explanation-based learning: An alternative view. Machine Learning, 1(2):145-176, 1986.
5. Vijaykumar Gullapalli, Judy A. Franklin, and Hamid Benbrahim. Acquiring robot skills via reinforcement learning. IEEE Control Systems, 14(1):13-24, February 1994.
6. Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Technical Report 9307, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, July 1993.
7. Long-Ji Lin. Self-supervised Learning by Reinforcement and Artificial Neural Networks. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, 1992.
8. Pattie Maes and Rodney A. Brooks. Learning to coordinate behaviors. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 796-802, Cambridge, MA, 1990. AAAI, The MIT Press.
9. Sridhar Mahadevan. Enhancing transfer in reinforcement learning by building stochastic models of robot actions. In Proceedings of the Seventh International Conference on Machine Learning, pages 290-299. Morgan Kaufmann, 1992.
10. R. Andrew McCallum. Overcoming incomplete perception with utile distinction memory. In Paul E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 190-196, San Mateo, CA, 1993. Morgan Kaufmann.
11. Tom M. Mitchell. Becoming increasingly reactive. In Proceedings of the 1990 AAAI Conference, Menlo Park, CA, August 1990. AAAI, AAAI Press / The MIT Press.
12. Tom M. Mitchell, Rich Keller, and Smadar Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine Learning, 1(1):47-80, 1986.
13. Tom M. Mitchell and Sebastian Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
14. Joseph O'Sullivan, Tom M. Mitchell, and Sebastian Thrun. Explanation-based neural network learning from mobile robot perception. In Katsushi Ikeuchi and Manuela Veloso, editors, Symbolic Visual Learning. Oxford University Press, to appear.
15. Dean A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. Technical Report CMU-CS-89-107, Computer Science Dept., Carnegie Mellon University, Pittsburgh, PA, 1989.
16. Lori Y. Pratt. Discriminability-based transfer between neural networks. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan Kaufmann.
17. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol. I + II. MIT Press, 1986.
18. Jacob T. Schwartz, Micha Sharir, and John Hopcroft. Planning, Geometry and Complexity of Robot Motion. Ablex Publishing Corporation, Norwood, NJ, 1987.
19. Noel E. Sharkey and Amanda J. C. Sharkey. Adaptive generalization and the transfer of knowledge. In Proceedings of the Second Irish Neural Networks Conference, Belfast, 1992.
20. Patrice Simard, Bernard Victorri, Yann LeCun, and John Denker. Tangent prop - a formalism for specifying selected invariances in an adaptive network. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 895-903, San Mateo, CA, 1992. Morgan Kaufmann.
21. Satinder P. Singh. Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning, 8, 1992.
22. Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan. Learning without state-estimation in partially observable Markovian decision processes. In Proceedings of the Eleventh International Conference on Machine Learning, 1994.
23. Steven C. Suddarth and Y. L. Kergosien. Rule-injection hints as a means of improving network performance and learning time. In Proceedings of the EURASIP Workshop on Neural Networks, Sesimbra, Portugal, February 1990. EURASIP.
24. Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3, 1988.
25. Richard S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216-224, San Mateo, CA, June 1990. Morgan Kaufmann.
26. Sebastian Thrun and Tom M. Mitchell. Integrating inductive neural network learning and explanation-based learning. In Proceedings of IJCAI-93, Chambéry, France, July 1993. IJCAI, Inc.
27. Sebastian Thrun and Tom M. Mitchell. Lifelong robot learning. Robotics and Autonomous Systems, to appear. Also appeared as Technical Report IAI-TR-93-7, University of Bonn, Dept. of Computer Science III, 1993.
28. Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, 1989.
A Multilevel Learning Approach to Mobile Robot Path Planning

F. Wallner, M. Kaiser, H. Friedrich, and R. Dillmann

Institute for Real-Time Computer Systems and Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, D-76128 Karlsruhe, Germany
E-Mail: wallner,kaiser,friedric,dillmann@ira.uka.de
The problem of adapting mobile robot navigation to changes in the environment is usually approached by modifying an internal world model. Descriptions on different levels of abstraction provide the information necessary for navigation and therefore influence the robot's behaviour. The effect of such indirect adaptation is limited. The approach presented in this paper describes a new technique for the direct integration of navigation experience into path planning. Thus, not only the world knowledge, but also the planning behaviour is improved over time. Experiments are carried out on a robot which is controlled by a layered architecture. It is integrated in a multi-robot control environment which is described. The focus of the article is on improving the higher navigation levels. The main idea being presented is the realization of adaptive behaviour not only on the level of reflexes, but also with respect to the planning capabilities of the robot. The application of learning techniques allows the estimation of plan costs, and therefore the inherent strategy of the topological planner, to be improved continuously. It is illustrated that a combined learning of world description and navigation allows fast and sophisticated reaction to new environmental conditions.

1. Introduction

In most applications, a mobile robot operates in a world which is not exactly known and which changes over time. A robot which is not able to adapt itself to a dynamic environment will operate inefficiently or even fail after a while. Two approaches to modifying the robot's behaviour when the environment has changed are investigated. The first can be called indirect adaptation. Changes in the environment, such as newly detected obstacles, are perceived via the robot's sensors and integrated in a world model [1]. As planning is based on world knowledge, the robot will behave according to the new situation. However, such indirect adaptation offers no possibility to integrate navigation experience made by the robot when driving along a route. A second form of adaptive behaviour can usually be found at the level of reflexes. In particular, the learning of collision avoidance strategies has already been investigated by different researchers such as [4]. In addition, adaptation on a higher level is an appealing field of research. Besides its dependency on knowledge about obstacles, the feasibility of a path
depends on additional features that are difficult to extract from a world model, such as the time since a route was last visited, the number of successful (or less successful) visits, the obstacle occurrence, and externally given constraints like maximum throughput or maximum safety.
Figure 1. The mobile system PRIAMOS
This paper describes a learning mobile robot system that improves high-level path planning through navigation experience. In combination with sensor-based map improvement, this high-level adaptation enables the robot to operate efficiently even in changing environments. Experiments are carried out on the mobile robot PRIAMOS¹. PRIAMOS is a mobile system with three degrees of freedom, i.e. motion in longitudinal and transverse directions and rotation around the centre of the vehicle. A set of ultrasonic range sensors is mounted on a circle around the centre of the vehicle. As a highly sophisticated sensor, an active stereo-vision head is also available. There are two onboard computer systems, a real-time image processing system which provides a vector description of stereo images at video rate, and a multi-processor system for the control of the vehicle. The robot is connected to a stationary control computer via a radio link.

2. The control architecture of PRIAMOS

For experiments in the field of mobile robotics, the control environment MARS² has been developed. The structure of this environment is shown in figure 2. MARS has a strictly modular concept. Higher level modules run as independent UNIX processes and are connected via sockets. Lower level modules with stronger real-time constraints are executed on the robot's internal OS/9-based computer systems.

¹PRI = permutation of the name of our Institute; AMOS = Autonomous MObile System.
²MARS = Mobile Autonomous Robot control System.
Figure 2. The control environment MARS
Central components are a man-machine interface for monitoring and control, and a database. The number of robots the operator wants to control can be specified freely. Due to the modular concept, a separate controller is initialized for each robot. For all physical components, a software emulation is available which allows off-line program development and testing in simulation. Local data structures that are updated automatically only when changes occur reduce the load on the database. The control architecture of each single robot is structured horizontally and vertically. The vertical structure consists of different layers of abstraction. Each layer contains a navigation and a perception module that are connected through a world model. This world model is constructed by the perception module and integrates sensor information and (on higher levels) information stored in the database. Figure 3 shows the levels of abstraction of the world representation. On the highest level, the world model has a topological structure. Path segments are described as edges E and intersections as nodes N in a graph. This allows simple path planning over longer distances without the need to take geometric details into account. In the vicinity of the robot, obstacle geometry is needed for precise path planning. Therefore, a local dynamic model of the perceivable environment which also integrates recent sensor information is used for geometric path planning. A problem of layered control architectures is the difficulty of reacting quickly to obstacles that are not known a priori, but newly detected. The information used for planning on higher levels must be filtered and integrated over time due to sensor noise.
Figure 3. World model hierarchy
During the time between an observation, the appearance of the corresponding object in the world model, and the planning of a new path, the robot would have to stop its motion to ensure collision-freeness. However, as path modifications are often of a local nature, a simple collision avoidance algorithm using recent sensor data can increase the average operating speed of the robot. On this reflexive level, the world knowledge consists of raw sensor data only. In some obstacle configurations, local reactions are wrong. A path that seems to be the best can be blocked by some obstacle that has not yet been observed. A similar problem can occur if an obstacle exists in the geometric map but not in the topological one. The topological planner will produce a path which the geometrical one cannot follow. Section 3.3 shows the integration of both layers and how to deal with these problems.

3. The hierarchical planning approach

Based on the world model, a navigation module on each level decomposes the task commanded by the next higher level into a set of simpler ones, such that a location the robot should reach is finally transformed into a set of incremental position commands that are passed to the robot motion controller. The reflexive level provides the basic motion capabilities that are used by all higher levels, such as the topological and the geometrical planning level described below. A collision avoidance strategy modifies a given path on the basis of recent sensor data [6]. Additionally, a set of reflexive behaviours exists that are employed by higher level planners or directly by an operator in situations requiring the robot to move through a door, to enter a narrow passage, or to dock at a recharging or loading station.
3.1. The topological planner

The idea of introducing topological graphs (see figure 5) as a more abstract way to represent free spaces and existing passages in the environment is very appealing (see also [3]). The graph representation allows easy access to the existing information, and
algorithms known from graph theory are directly applicable. However, an efficient use of topological maps in a real world requires that the following aspects are taken into account:

• The topological map itself is subject to change due to changes in the environment. It must be updated as soon as changes in the geometrical map occur.
• As soon as the robot's current position and its goal are not directly located on the topological map, selecting good or even optimal points to enter and leave the graph is not straightforward.
• In a dynamic environment it is possible that two motions, starting at the same point and aiming at the same goal, require the use of different topological paths.
• The cost of moving along a certain edge of the topological map cannot be calculated completely a priori, since it does not only depend on static features of the environment (such as the length of the corresponding path), but also on the "difficulty" of traversing the edge (i.e., on the complexity of the required reflexive behaviour), and on global constraints imposed on the whole transport subsystem.

The basic strategy of topological path planning is the following: temporary edges from the start position to the entry node and from the exit node to the goal position are generated; the two distances are obtained by a geometrical planning step. A branch and bound search algorithm is used to find a sequence of edges satisfying a given set of criteria (e.g. length of a path) with the least overall costs. Section 4.1 shows how to find good points to enter and leave the graph, respectively, given arbitrary start and goal positions. During these considerations, the topological map will be assumed to be static. This is a reasonable assumption, since changes in the topological map that affect the current plan will force the robot to perform a new planning step, which will possibly result in a completely new path and can therefore be regarded as a completely new mission. However, temporal changes such as a passage blocked by a human, a vehicle, or a large obstacle are taken into account. They are handled on the level of edge cost function estimation (see section 4.2).

3.2. The geometrical planner

Geometrical planning is restricted to a local area which is slightly larger than the perception range of the robot's sensors, as planning is performed in the local geometric model. The central part of the geometric model is obtained by a fusion process of actual sensor data and a-priori knowledge. Planning in areas further away is mainly a static procedure and can be performed on layers of higher abstraction. However, due to the local nature of the geometrical model, the next graph node might not be located within the area of geometric planning. In such cases, special care has to be taken in order to plan a path which finally leads to the correct topological node. Path planning is based on the idea of potential distance fields [6]. The algorithm described below is guaranteed to find the shortest collision-free path. Its main advantage is its computational efficiency. The holonomic driving principle of PRIAMOS offers the advantage that no additional constraints have to be taken into account. Planning is done in five steps.
• If a topological path node is located outside the local model, a goal at the border of the local model in the direction of the next topological path node is computed.
• Obstacles are enlarged by half of the robot length so that further planning can consider the robot as a point. This approximation cannot be applied in narrow passages (e.g. doorways). Here the reflexive collision avoidance is replaced by appropriate reflexive control strategies.
• A square grid that covers the planning region and which is centered around the current position of the robot is generated. Each cell of the grid contains binary information indicating whether this cell is free or not.
• The next step is path planning itself. The cell of the goal position is initialized with a start value ('1'). The 4 neighbours of a cell are recursively (beginning with the goal cell) given the value cell_value + 1 if they are free. This procedure is repeated until the cell which covers the robot position is reached. Starting there, a path is found by successively moving to the lowest-valued neighbour cell (8 neighbours are considered in this step).
• Finally, a polygon is defined by the list of cells. A path is represented by the corner points of this polygon.
The complete obstacle transformation and planning procedure is illustrated in figure 4.
Figure 4. A geometric planning example. Shown is a ground plan of an office environment as well as the result of a planning cycle. (#: obstacles; $$: robot position; filled squares indicate the path)
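The grid-based planning step can be stated compactly in code. The sketch below is an illustration of the wavefront scheme described above (values propagated over 4-neighbours from the goal, descent over 8-neighbours from the robot position), under the assumption of a simple boolean occupancy grid; all names are hypothetical.

```python
from collections import deque

def plan_grid_path(free, start, goal):
    """free  -- 2D list of booleans, True where the (enlarged-obstacle) cell is free
    start, goal -- (row, col) tuples
    Returns the path as a list of cells from start to goal, or None."""
    rows, cols = len(free), len(free[0])
    value = [[None] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 1                      # goal cell gets start value '1'
    frontier = deque([goal])
    while frontier:                                  # 4-neighbour wavefront
        r, c = frontier.popleft()
        if (r, c) == start:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc] \
                    and value[nr][nc] is None:
                value[nr][nc] = value[r][c] + 1
                frontier.append((nr, nc))
    if value[start[0]][start[1]] is None:
        return None                                  # goal not reachable
    path, cell = [start], start
    while cell != goal:                              # descend to lowest-valued 8-neighbour
        r, c = cell
        neighbours = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if not (dr == 0 and dc == 0)]
        cell = min((n for n in neighbours
                    if 0 <= n[0] < rows and 0 <= n[1] < cols
                    and value[n[0]][n[1]] is not None),
                   key=lambda n: value[n[0]][n[1]])
        path.append(cell)
    return path
```

Because every cell's value exceeds that of at least one 4-neighbour by exactly one, the descent strictly decreases the value at each step and is guaranteed to terminate at the goal cell.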
The geometric planning module provides two different modes of operation. In MOTION mode, a trajectory towards a given goal is planned within the extension of the current local model. In NO_MOTION mode, two positions are specified. In this mode, the planner obtains a geometric description of a specified area from the database and executes a planning step in this static model. The NO_MOTION mode is necessary to support the
topological planner in finding the optimal nodes for entering and leaving the topological graph by planning paths from the vehicle's start and goal positions to the preselected entry and exit nodes. In case of successful planning, the length of the generated path is returned to the topological planner.
3.3. Integration of navigation layers

Even if the hierarchical organization of the planning and navigation process is necessary to handle the complexity involved in these tasks, it also causes a number of problems. In some obstacle configurations, local reactions are wrong. A path that seems to be the best can be blocked by some obstacle that has not yet been observed. A similar problem can occur if an obstacle exists in the geometrical, but not in the topological model. The higher level planner will produce a path which cannot be followed by the lower levels. To overcome such inconsistencies, higher level planners have to be continuously re-activated. Such replanning starts at the current robot position and can therefore be used to verify whether the current path is still the best. The local nature of the geometrical planner implies constraints on the nature of the topological graph. If the distance between topological nodes is bigger than the area covered by the local model and obstacles exist in between, geometrical planning can fail to find a path between these segments. The flow of information between the different layers is strictly hierarchical. A higher level generates a path and sends it to the next lower planning layer. It is possible that a high level planner or the task decomposition module itself assigns a low level reflexive behaviour to a path segment. In such cases, layers which are located in between are not activated. The path execution is controlled by higher levels which are activated automatically. The only information which is returned by lower levels is the state of the execution, i.e., whether a certain task could be completed or whether it failed (e.g., because no path to the target position existed).

4. Multilevel adaptive behaviour

The application of mobile robots in future factories, and the subsequent definition of more complex evaluation criteria regarding the robot's behaviour (such as maximum throughput of the whole transport subsystem), require adaptive behaviour also on higher cognitive (planning) layers. In this section, adaptive behaviour on the two layers of topological and geometrical planning is described.

4.1. Experience-based selection of topological nodes

The efficient application of topological maps in a real world requires coping with the problems of a changing environment and arbitrary start and goal positions. While the next section describes a way to overcome the former, a possible solution to the problem of selecting good entry and exit points into and out of the topological graph is given now. The task to be solved is to determine good or even optimal entry and exit points into, respectively out of, the topological graph. This task is difficult to solve because of several problems (see figure 5):

• The edge and node closest to the start and end point do not necessarily represent optimal or even good entry and exit points.
Figure 5. Start point (x_s, y_s) and goal position (x_g, y_g) in the topological map T of PRIAMOS' operating environment.

• Obstacles might exist between the start/exit point and the theoretically optimal entry/exit nodes/edges.

If →_G denotes a geometrically planned path, and →_T a topologically planned one, the
problem can be stated as follows:

Given: The position (x_s, y_s) of the start and (x_g, y_g) of the goal point, the topological map T = (N, E) and the geometrical map G.
Determine: The entry node N_en and the exit node N_ex such that the path P = {(x_s, y_s) →_G N_en →_T N_ex →_G (x_g, y_g)} causes minimal costs.

Initially, entry and exit nodes are selected by applying a heuristic H*: the n nodes nearest to the start/end position are selected. On the basis of the world model, the geometrical planner determines the costs to reach each of these nodes. Afterwards, en ≤ n possible entry and ex ≤ n possible exit points exist. The topological planner determines the cost of the resulting en × ex paths (which might partially include each other), and the best (i.e., most cost-effective) combination of path and entry/exit point is selected. Since this heuristic cannot rule out that movement starts at a "bad" node, resulting in a much more costly topological path, the generated path must be checked in a global context, using experience that has been obtained from already driven paths. The applied method is based on the idea of incrementally building a decision structure, providing a good guess of useful entry and exit points. Given the start position (x_s, y_s), the goal position (x_g, y_g), and a decision structure D, the method works as follows:

1. Apply the heuristic H* to obtain a possible path P with costs c from (x_s, y_s) to (x_g, y_g).
2. If D is empty, use path P and continue with 8.
3. If D is not empty, try to classify the example B = ((x_s, y_s), (x_g, y_g)) by application of CLASSIFY-WITH-DS(D, B) in order to obtain (N_en^B, N_ex^B). If B cannot be classified, use P and continue with 8.
4. Calculate the costs c̃ of N_en^B →_T N_ex^B.
5. If c̃ < c, calculate the costs c_s and c_g for moving from the current start and goal point to the entry node N_en^B and exit node N_ex^B, respectively. Else use P and continue with 8.
6. If c < c̃ + c_s + c_g, use P and continue with 8.
7. Generate the path P = {(x_s, y_s) →_G N_en^B →_T N_ex^B →_G (x_g, y_g)} and use it.
8. After the goal has been reached, build an example B = ((x_s, y_s), (x_g, y_g), N_en, N_ex). Let c be the actual costs for moving from (x_s, y_s) to (x_g, y_g) (and vice versa).
9. Insert B with costs c into the decision structure D using BUILD-DS(D, B, c).
Summarizing, the selection of "good" nodes to enter and leave the topological map is based on the idea that typical paths connecting regions of the workspace can be identified.
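The initial heuristic H* lends itself to a compact sketch. The version below is an illustration only; the geometric and topological cost functions are injected as parameters, and all names (heuristic_entry_exit, geom_cost, topo_cost) are hypothetical.

```python
from math import hypot

def heuristic_entry_exit(topo_map, start, goal, n, geom_cost, topo_cost):
    """Heuristic H*: consider the n nodes nearest to start and goal, price
    the geometric connections, and pick the cheapest combination of entry
    node, topological path, and exit node.

    geom_cost(p, node) -- cost of the geometrically planned connection
                          (None if no collision-free path exists)
    topo_cost(a, b)    -- cost of the topological path between two nodes
    """
    near_start = sorted(topo_map.nodes,
                        key=lambda v: hypot(v.x - start[0], v.y - start[1]))[:n]
    near_goal = sorted(topo_map.nodes,
                       key=lambda v: hypot(v.x - goal[0], v.y - goal[1]))[:n]
    entries = [(v, geom_cost(start, v)) for v in near_start]
    exits = [(v, geom_cost(goal, v)) for v in near_goal]
    best = None
    for v_en, c_en in entries:           # evaluate up to en x ex combinations
        if c_en is None:
            continue
        for v_ex, c_ex in exits:
            if c_ex is None:
                continue
            total = c_en + topo_cost(v_en, v_ex) + c_ex
            if best is None or total < best[0]:
                best = (total, v_en, v_ex)
    return best   # (cost, entry node, exit node), or None
```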
Figure 6. Part of a decision structure D.
These regions are represented as nodes inside the decision structure. As shown in figure 6, these nodes as well as the connections between them are generated during operation, such that the decision structure D is built incrementally. The applied method is similar to incremental clustering [2]. Let B = ((x_s, y_s), (x_g, y_g)) be the example to be classified. Let x(N), y(N) be the center of the cluster represented by a node N of D and d(N) be the radius of this cluster. Then, classification using D works as follows:

CLASSIFY-WITH-DS(D, B):
1. Find the set S_Ns of nodes of D such that d = √((x(N) - x_s)² + (y(N) - y_s)²) < d(N).
2. If S_Ns is empty, return failure.
3. Let N_m be the node with the minimum d.
4. Find the set S_Ng of nodes connected to N_m such that d = √((x(N) - x_g)² + (y(N) - y_g)²) < d(N) for N ∈ S_Ng.
5. If S_Ng is empty, let S_Ns = S_Ns - N_m and goto 2.
6. Find the node N_g ∈ S_Ng such that the newly calculated costs c of the connection between N_m and N_g are minimal.
7. Return N_en^B := N_m, N_ex^B := N_g.

After a path has actually been used by the robot, its costs are exactly known to the system. Therefore, the combined information (path, costs) is used to incrementally build the decision structure. During the generation of the structure, the geometrical distance between two positions serves as a lower-bound estimate of the costs to move from one position to the other. With B = ((x_s, y_s), (x_g, y_g), N_en, N_ex) and c being the real costs of the path from (x_s, y_s) to (x_g, y_g), the building of D works as follows:
BUILD-DS(D, B, c):
1. Find the set S_Ns of nodes of D such that d_1 = √((x(N) - x_s)² + (y(N) - y_s)²) < d(N).
2. If S_Ns is empty, goto 12.
3. Let N_1 be the node with the minimum d_1.
4. Find the set S_Ng of nodes connected to N_1 such that d_2 = √((x(N) - x_g)² + (y(N) - y_g)²) < d(N) for N ∈ S_Ng.
5. If S_Ng is empty, let Ñ = N_1 and goto 13.
6. Let N_2 be the node with the minimum d_2.
7. Let c̃ be the newly calculated costs for the path from N_1 to N_2, Ñ_en be the entry node and Ñ_ex be the exit node associated to that path.
8. If c̃ < d_1 + d_2 + c,
   (a) if d_1 > d_2, expand the region represented by N_2 with d(N_2) = d(N_2) + d_2 and continue with 12;
   (b) else expand the region represented by N_1 with d(N_1) = d(N_1) + d_1, let Ñ = N_1 and continue with 13.
9. Else use the geometrical planner to calculate the costs c_1 to move from (x_s, y_s) to Ñ_en and the costs c_2 to move from Ñ_ex to (x_g, y_g).
10. If c̃ > c_1 + c_2 + c, move and expand the regions represented by N_1 and N_2: Let c̃ = c + c_1 + c_2, Ñ_en = N_en^B, Ñ_ex = N_ex^B, x(N_1) = (x(N_1) + x_s)/2, y(N_1) = (y(N_1) + y_s)/2, x(N_2) = (x(N_2) + x_g)/2, y(N_2) = (y(N_2) + y_g)/2, d(N_1) = d(N_1) + d_1 and d(N_2) = d(N_2) + d_2.
11. Else
   (a) if c_1 > c_2, expand the region represented by N_2 with d(N_2) = d(N_2) + d_2 and continue with 12;
   (b) else expand the region represented by N_1 with d(N_1) = d(N_1) + d_1, let Ñ = N_1 and continue with 13.
12. Insert a new node representing the current start region:
   (a) Add a node N_1 to D.
   (b) Let (x(N_1), y(N_1)) = (x_s, y_s) be the center of the cluster represented by the node and d(N_1) = d_min be its width.
   (c) Find the set S_Ng of nodes of D such that d = √((x(N) - x_g)² + (y(N) - y_g)²) < d(N) for N ∈ S_Ng.
   (d) If S_Ng is empty, let Ñ = N_1 and continue with 13.
   (e) Let N_2 ∈ S_Ng be the node with the minimum d.
   (f) Initialize the connection between N_1 and N_2 with the data obtained from B.
   (g) Exit BUILD-DS.
13. With Ñ representing the corresponding start region, insert a new node representing the current goal region:
   (a) Add a node N_2 to D.
   (b) Initialize the connection between Ñ and N_2 with the data obtained from B.

Summarized, the method works by building a structure representing (possibly overlapping) regions of different size and virtual connections, established by already driven paths between them, thus improving the selection of good entry/exit points over time. The method features a minimum overhead regarding the application of the geometrical planner, since during learning geometrical planning is only necessary in the case of very close start and goal positions. It should be noted that the costs of known paths are not stored in the decision structure, because the estimation of edge costs is improved continuously (see below).
Given the topological map, a start node, a goal node, and the costs of moving along the individual interconnections (edges) between each of the nodes in the map, an optimal path from the start to the goal can be found by several means. Provided that a lower bound estimation of the individual edge's costs exists, algorithms such as A* produce optimal solutions. However, they also require to keep track of the real costs that emerge and possibly to climb up the search tree if the current branch does not yield the desired solution. After the goal node has been reached, the best path between the start and the goal is known. Transferred to the given apphcation domain, this behaviour would result in the robot moving back and forth along paths. This solution is of course not satisfactory, since the aim must be to obtain a near optimal path before the robot starts moving. Therefore, the lower-bound condition for the edgecost estimation (as, for instance, given by the length of the corresponding passage) is not sufficient. In fact, the edge costs must be known as accurately as possible in order to produce a good path. Moreover, in a dynamic environment the costs of traversing an individual edge can also be changing. Therefore, the edge cost function C(e) must meet the following requirements:
226 9 The a-priori estimation of C must incorporate not only static components, but also information about changes in the environment. 9 After the real costs of traversing an edge are known, the edge's cost function must be updated. These requirements yield the following definition of the cost function C: Ca_priori(e) =
g(l,,p,) + h(hist(e)),
where 9 a monotonically increasing function of the length l, of the edge and the complexity p, of the required behaviour, hist(e) is the accumulated information about the costs to move along the edge e and h is a function mapping this complex information to a numerical value. After the robot has moved along the planned path, the real costs of the individual edges are known: Ca.posteriori(e) =
f(t(e),t,t(e)) + g(l,,pe) + f(o(e),o,t(e)),
where f is a monotonically increasing function of the fraction between the measured time t(e) required to move along an edge and the estimated time t,,t(e), and f is a monotonically increasing function of the fraction between the measured traversing difficulty o(e) and the estimated one o~,t(e). Summarizing, the following terms must therefore be determined: 9 The parameter te,t(e), which is first given by the length of the edge and the nominal speed of the robot, but which will change according to the measured time t(e) required to traverse edge e. t(e) 9 The function f, which is realized a s Ote t,,,(e). t(,) ). 9 The function f, which is realized as oT,t,o,(, -
9 The parameter p,, which can be calculated for each edge from the complexity of the commanded movement, which might, for instance, depend on certain security requirements. 9 The parameter le, which is a static attribute of each edge and can be calculated from the geometrical model of the environment. 9 The function g which combines the length of the edge and the commanded movement to a numerical value. This function is static as are the physical characteristics of the robot and the length of the edges. 9 The information about the edge e over time hist(e) 9 The history analyzer function h. It is obvious that the information that is most difficult to handle concerns the behaviour of a particular edge in the past. While the exploration strategy described in section 4.2.2 accounts for a changing traversability of the edge (i.e., if the edge can be used at all), dealing with continuous valued, possibly periodically changing parameters (such as the density of obstacles) requires much more sophisticated strategies.
4.2.1. Prediction of edge costs

Analyzing the history of an edge in terms of its costs over time aims at predicting the value of the function h. Therefore, the problem can be stated as follows:

Given: The history of the edge in terms of the costs C(e)_1, ..., C(e)_{|e|} (with |e| being the number of times the edge has been used) to traverse it.
Determine: ĥ(C(e)_1, ..., C(e)_{|e|}) such that ĥ(C(e)_1, ..., C(e)_{|e|}) ≈ h(hist(e)).

Given the definitions of C_{a-priori}(e) and C_{a-posteriori}(e), it can be found that h(hist(e)), and therefore ĥ, should approximate H = f(t(e), t_{est}(e)) + f̄(o(e), o_{est}(e)). As H can be calculated directly after the edge has been passed, each passing of an edge yields an example from which the general shape of the function ĥ can possibly be derived. The task to be solved is therefore to improve the prediction of the value of ĥ by means of learning from examples of h(hist(e)). An appropriate learning technique to deal with such problems is temporal difference learning [5].

4.2.2. A strategy for exploration

As mentioned in section 4.1, temporal changes such as a passage blocked by a human, a vehicle, or a large workpiece must be taken into account. In fact, allowing the robot to try again to use a passage that has been found to be blocked before requires an exploration strategy. Principally, this strategy works by assigning a temporally valid blocked attribute to edges that could not be traversed. This attribute is only valid for a certain time, with the time depending on the experiences made with the corresponding edge, i.e., how often the edge has been found to be traversable or blocked. In detail, the exploration strategy works as follows (a code sketch follows below):

1. Initialization:
   (a) Assign a blocked counter to each edge.
   (b) Assign a blocked history counter to each edge.
   (c) Set the blocked counter to zero.
   (d) Set the blocked history counter to one.
2. During planning:
   (a) If the blocked counter of an edge is zero, the edge can be traversed;
   (b) else decrease the blocked counter by one.
3. During execution:
   (a) If an edge is found to be traversable, set its blocked history counter to max(½ · blocked history counter, 1).
   (b) If an edge is found to be blocked, set its blocked counter equal to the blocked history counter and double the blocked history counter.

This strategy realizes a dynamic adaptation of the frequency of retries the robot performs on edges that have been found to be blocked.
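The strategy reduces to a few lines of state per edge. Note that the reset rule in step 3(a) is garbled in the source; the halving used below is an assumption consistent with the doubling in step 3(b).

```python
class Edge:
    def __init__(self):
        self.blocked = 0          # traversal allowed when this reaches zero
        self.blocked_history = 1  # grows while the edge keeps being blocked

    def traversable_for_planning(self):
        # During planning: usable if the counter is zero, otherwise count down.
        if self.blocked == 0:
            return True
        self.blocked -= 1
        return False

    def report_traversed(self):
        # Edge found free: relax the retry interval (assumed halving, min 1).
        self.blocked_history = max(self.blocked_history // 2, 1)

    def report_blocked(self):
        # Edge found blocked: skip it for blocked_history planning cycles,
        # then double the interval so persistent blockages are retried rarely.
        self.blocked = self.blocked_history
        self.blocked_history *= 2
```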
5. Conclusion and further work

The work described in this paper deals with an important aspect of the realization of mobile systems that are adaptive with respect to their environment. The main idea that has been followed is to realize adaptive behaviour not only on the level of reflexes, but also with respect to the planning capabilities of the robot. This approach is closely related to the hierarchical control architecture of the mobile robot PRIAMOS, which has been used for experiments. By means of continuously improving the estimation of edge costs and the evaluation criteria of the topological planner, changes in the environment are reflected not only in the explicit world model, but also in the robot's reasoning system. The main activities at the moment are concerned with the practical evaluation of the theoretically achieved results. Additionally, some more theoretical work has to be done concerning the decision structure. Especially, algorithms for shrinking, deleting, splitting and merging of regions seem to be necessary. It is not clear whether the different evaluation functions cover all situations that occur in a real factory. However, assuming that the robot's working environment is not too cluttered (which should be the case, e.g., in an automated factory), both the planning and the adaptive components can be expected to work flawlessly.

Acknowledgement

This work has partially been supported by the ESPRIT Project 7274 "B-Learn II". It has been performed at the Institute for Real-Time Computer Systems & Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, 76128 Karlsruhe, Germany. The authors would like to thank the anonymous reviewers for their helpful comments.

REFERENCES

1. A. Elfes. A sensor-based mapping and navigation system. In Proceedings of the IEEE International Conference on Robotics and Automation, 1986.
2. J. H. Gennari, P. Langley, and D. Fisher. Models of incremental concept formation. Artificial Intelligence, 40:11-61, 1989.
3. P. Kampmann and G. Schmidt. Topological structured geometric database and global motion planning for the autonomous mobile robot MACROBE. Robotersysteme, 5, 1989.
4. J. del R. Millán and C. Torras. Efficient reinforcement learning of navigation strategies in an autonomous robot. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems, 1994.
5. R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
6. F. Wallner, T. C. Lüth, and F. Langinieux. Fast local path planning for a mobile robot. In Proc. of the 2nd Int. Conf. on Automation, Robotics and Computer Vision, Singapore, 1992.
A self-organizing representation of sensor space for mobile robot navigation

Ben J. A. Kröse and Marc Eecen

Faculty of Mathematics and Computer Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, the Netherlands
krose@fwi.uva.nl
The paper describes a sensor-based navigation scheme for a mobile robot system which makes use of a global representation of the environment by means of a self-organizing map or Kohonen network. In contrast to existing methods for self-organizing environment representation, this discrete map is not represented in the world domain or in the configuration space of the vehicle, but in the sensor domain. The map is built by exploration. A conventional path planning technique then gives a path from the current state to a desired state in the sensor domain, which can be followed using sensor-based control. Collisions with obstacles are detected and used in the path planning. Results from a simulation show that the learned representation gives correct paths from an arbitrary starting point to an arbitrary end point.

1. INTRODUCTION

Goal-directed, collision-free robot navigation can be a result of planned actions, where a path planner determines off-line the optimal path from the current configuration to the desired configuration. Alternatively, it can be induced by a reactive behaviour, where sensor data is used directly to generate actions. If a planning method is used for navigation, global information about the environment is needed. Since such global information is not present in the data of sensors on the robot system, an internal representation of the environment must be used. Such a global model is usually represented in a configuration space spanned by the degrees of freedom of the robot system. For a mobile robot these are its position and its orientation. The environment is usually represented as a (geometric) model indicating the free and occupied areas. Given knowledge about the desired and current configuration, a large number of methods exist for planning in configuration space or world space [4]. The construction of such an environment model can be done on the basis of a-priori knowledge, but the maps can also be learned by exploration. Self-organizing networks have been presented as maps in the world domain or configuration space [6],[7]. The role of sensor measurements in such a navigation scheme is to estimate the actual state of the robot system in the configuration space. Controlling the system consists of sending control commands to the actuators such that the error between actual state and desired state in the configuration space is minimized.
Reactive systems, on the other hand, consider the state of the robot not in configuration space but in sensor space. A state of the robot is now defined as a constellation of sensory data or features extracted from these data. It has been shown that the difference between the actual and desired state in sensor space can be used to control the robot, either using an explicit model of the system [11] or a learned, implicit model in the form of a neural network [9]. However, since the sensor information is inherently local, reactive systems suffer from the existence of local minima. Methods have been described for escaping from those minima, but they do not result in optimal paths. The only guarantee for an optimal path is the use of global knowledge. For reactive systems this should be a representation in the sensor space, so that paths in sensor space can be planned which can be followed by a reactive system. In this paper we introduce the sensor space as the state space of the robot and describe some of the possibilities and problems for representing environment information. A self-organizing model is described which represents the environment in the sensor space. The model is used for path planning and path following for a non-holonomic mobile robot.

2. SENSOR-BASED CONTROL AND GLOBAL REPRESENTATION
Although animals pick up objects and navigate through complex environments without precise numerical three-dimensional position information, most robot systems still use an internal representation of the world in a cartesian domain. However, for certain robot tasks, robot motion can be successfully controlled directly by using sensor data, or features derived from these data, instead of computing positional information from them. For robot manipulators, "visual servoing" methods have been developed [2],[11] which use a direct mapping from image features to control commands. These methods form the basis for our sensor space representation and navigation scheme for mobile robots. A mobile robot is equipped with a variety of sensors, which may be range sensors, visual sensors which detect marker positions in the image, beacon detectors, and so on. Let the state be described by the vector $f = (f_1, \ldots, f_m) \in \mathcal{F}$, in which $f_i$ is a sensor output or the value of a feature derived from sensor data. The robot is controlled by sending it displacements $\Delta x$. For sensor-based navigation, the vehicle has to follow a trajectory $f(t)$ in sensor space $\mathcal{F}$. Similar to visual servoing methods, an error in the sensor domain $\Delta f(t)$ can be mapped into the necessary motion $\Delta x$ via the $m \times n$ "Feature Sensitivity Matrix" $J_F$:

$\Delta x = J_F^{\dagger} \Delta f$,    (1)
in which $J_F^{\dagger}$ is the generalized inverse of $J_F$; it can be derived from a model of the system or can be represented in a neural network which is trained from a learning set. For goal-directed navigation in a cluttered environment, the problem is to find a trajectory from a current state $f$ to the goal state $f_{goal}$. As for cartesian-based navigation schemes, a procedure which simply navigates the vehicle along the error vector $f(t) - f_{goal}$ will get stuck in local minima. For this reason we need a global representation of the sensor space, as well as methods to plan paths in this space. Our first question is what global environment knowledge looks like when expressed in the sensor space $\mathcal{F}$. When the robot operates within a given environment $R$, all possible measurements $f$ will lie in a fixed subspace $\mathcal{R}$ of the m-dimensional sensor space $\mathcal{F}$, since not all sensor
Figure 1. For a fixed orientation of the vehicle all sensor data f will be represented as a 2-dimensional manifold in the m-dimensional sensor space. In a) this manifold is drawn for a 3-dimensional sensor space, for the sensor configuration sketched in b). The environment is a square room.
vectors will occur in $R$. Furthermore, we know that this subspace is 3-dimensional, since the number of degrees of freedom available to generate those data is only three: the position and orientation of the vehicle. In order to study the complexity of such a subspace we have computed and visualized it for some simple situations. For example, the sensor space of a vehicle equipped with 3 range sensors is 3-dimensional. For a fixed orientation of the vehicle all sensor data $f$ will be represented as a 2-dimensional manifold in this sensor space. Figure 1 shows the complex nature of the data. A trajectory of the vehicle in the room will result in a trajectory on the manifold. The remainder of the paper is concerned with how to construct this subspace, which basically amounts to finding a non-linear projection for the mapping of sensor data onto it. The model is used for path planning and navigation.
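To make the control law of Eq. (1) concrete, the following minimal sketch maps a sensor-space error onto a displacement command with a Moore-Penrose pseudo-inverse. The numeric feature sensitivity matrix is a hypothetical stand-in; in practice $J_F$ comes from a system model or a trained network, as discussed above.

import numpy as np

def control_step(J_F, f_current, f_desired):
    """Map a sensor-domain error onto a vehicle displacement, Eq. (1)."""
    delta_f = f_desired - f_current        # error in the sensor domain
    return np.linalg.pinv(J_F) @ delta_f   # Delta x = J_F^dagger * Delta f

# Example with an assumed 3-feature, 2-DOF sensitivity matrix:
J_F = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])
dx = control_step(J_F, np.array([0.2, 0.1, 0.3]), np.zeros(3))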
3. REPRESENTATION OF THE ENVIRONMENT IN THE SENSOR DOMAIN
Because it is assumed that the environment is not known (and even if it is known, it is not a trivial task to find an analytical expression for the subspace), a learning method has to be found which builds the internal representation and is able to represent the nonlinear mapping from $\mathcal{F}$ to $\mathcal{R}$. In this paper we use a self-organizing vector quantization scheme in the form of a Kohonen network. A similar approach was presented by [3]. The Kohonen algorithm learns from examples a mapping from the input space $\mathcal{F}$ to a lattice $\mathcal{A}$ of $N$ formal neurons $r_1, \ldots, r_N$. Since we know that the intrinsic dimensionality of the sensor data is three, a three-dimensional lattice is used. The map $\Phi_w : \mathcal{F} \to \mathcal{A}$ assigns to each sensor vector $f \in \mathcal{F}$ an element $\Phi_w(f) \in \mathcal{A}$, and is defined by the "weights"
Figure 2. Different positions may result in identical sensor states. Therefore a compass was added. The value of the compass determines the winning "layer", while the value of the range sensors determines the winning neuron within a layer.
$w = (w_1, w_2, \ldots, w_N)$, $w_i \in \mathcal{F}$. The function $\Phi_w(f)$ is specified by the condition

$|w_{\Phi_w(f)} - f| = \min_{r \in \mathcal{A}} |w_r - f|$,    (2)
i.e., a sensor vector $f$ is mapped onto that neuron $r$ for which $|w_r - f|$ becomes minimal. During learning, the weights of all neurons $r$ in the network are adapted according to the Kohonen rule

$\Delta w_r = \eta \, \Lambda(r, k) (f - w_r)$,    (3)
where the learning parameter $\eta$ is multiplied by $\Lambda(r, k)$, a decreasing function of the distance of neuron $r$ to the winner $k$ [8]. The learning rate also decreases with time in order to obtain a stable network.
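A minimal sketch of Eqs. (2) and (3) follows. The Gaussian neighbourhood function and its width sigma are assumptions; the paper only requires $\Lambda(r, k)$ to decrease with the lattice distance to the winner.

import numpy as np

def kohonen_step(weights, lattice, f, eta, sigma):
    """One Kohonen adaptation step.
    weights: (N, m) weight vectors w_r; lattice: (N, 3) neuron coordinates;
    f: (m,) sensor vector. Returns the index of the winning neuron."""
    k = np.argmin(np.linalg.norm(weights - f, axis=1))   # Eq. (2)
    d = np.linalg.norm(lattice - lattice[k], axis=1)     # lattice distance
    Lam = np.exp(-d**2 / (2.0 * sigma**2))               # neighbourhood
    weights += eta * Lam[:, None] * (f - weights)        # Eq. (3)
    return k

In a full training run, eta and sigma would both be decreased over time to obtain a stable network.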
4. EXPERIMENTS ON REPRESENTATION
Our mobile robot is equipped with 16 range sensors, each of which gives the distance to the nearest object within a beamwidth of 2 degrees. The sensors are fixed on the vehicle at orientations uniformly distributed between 0 and $2\pi$. Each measurement can be considered as a vector $f$ in a 16-dimensional sensor space. Note that if the vehicle is equipped not only with range sensors but also with other external sensors such as beacon detectors, ultrasonic sensors, etc., the dimensionality of the Kohonen network can still remain three. A problem with using range data only is that (in particular for symmetric environments) the range data at a particular location can be identical to the range data at another location but for a different orientation of the vehicle. We equipped the vehicle with a compass in order to distinguish between these situations. With the data from the 16-dimensional range measurements and the compass, a 10 x 10 x 10 network was trained. The 17-dimensional sensor vector was not fed directly into the network; instead a hierarchical arrangement was used, similar to [5]. According to the output $\varphi$ of the compass, a layer in the "z" direction of the network was selected. Within this layer a winning neuron $k$ was selected as the neuron having a weight vector closest to the 16-dimensional range input vector $f$ (see Figure 2). The weights of all neurons $r$ in the network were adapted according to the Kohonen rule (Eq. 3). Periodicity in the orientation $\varphi$ was taken into account by
Figure 3. Regions in the work space where a neuron is the winner, and positions where the distance to the winning neuron is minimal (vertices of the network). This is a simple room.
a wrap-around of the network: the neurons of the upper and lower layers of the network are considered neighbours. We tested the algorithm for two environments. The first environment was a square room of 20 x 20 meters, with only two small obstacles in the lower-left and lower-right corners. The second test was done in a square room with a number of large obstacles. The latter room was also used as the test environment for the path planning. Training samples were generated by positioning the vehicle at 40,000 random positions and orientations in the room and determining the range data. In this way the 3-dimensional network in the sensor space is built. For visualization we determined which neurons were active for which configurations of the vehicle. This was done by systematically displacing the vehicle in small steps through the working space for a fixed orientation and repeating this for other orientations. For each position the label of the winning neuron was stored, as well as the distance to this neuron.
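The hierarchical winner selection and the wrap-around in the compass direction could be sketched as follows; the uniform quantization of the compass reading into layers is an assumption.

import numpy as np

def winner(weights, phi, f_range):
    """weights: (10, 10, 10, 16) network; phi: compass reading in [0, 2*pi);
    f_range: (16,) range data. Returns the winning neuron index (z, i, j)."""
    n_z = weights.shape[0]
    z = int(phi / (2.0 * np.pi) * n_z) % n_z       # compass selects the layer
    flat = weights[z].reshape(-1, weights.shape[-1])
    k = int(np.argmin(np.linalg.norm(flat - f_range, axis=1)))
    return z, k // weights.shape[2], k % weights.shape[2]

def layer_distance(z1, z2, n_z=10):
    """Wrap-around: the upper and lower layers are neighbours."""
    d = abs(z1 - z2)
    return min(d, n_z - d)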
Figure 4. Positions where the distance to the winning neuron is minimal (vertices of the network). This is a difficult room.
Results are depicted in Figure 3. The boundaries between the regions in which a neuron is the winner are plotted, as well as the positions at which the distance to the neuron is minimal (the vertices of the network). For the simple room depicted in Figure 3, the neurons form a regular square lattice in the room. However, for the more complex room used in our experiments the results are somewhat different, as shown by Figure 4. The first noticeable point is that some connections are missing. The reason is that connections are only drawn between nearest neighbours in the lattice, and apparently not all neurons become winners during the test. This may be caused by the many discontinuities which the obstacles introduce into the sensor space; the Kohonen learning procedure may place neurons at such a discontinuity. Secondly, the figure shows that very distant positions in the world space may activate neighbouring neurons in the network.
5. PLANNING AND NAVIGATION
It has been shown (e.g., [10]) that discrete representations of the working space, in the form of Kohonen or similar networks, can be used for path planning with conventional search algorithms. Also, if the neighbour relations of neurons are considered as excitatory synaptic connections, the dynamics of such a topologically organized neural net can be used for path planning and obstacle avoidance [1]. In our application, where a path in sensor space has to be planned for a mobile robot, this approach cannot be followed, for two reasons.

1. For non-holonomic systems such as our mobile robot, two neighbouring ('connected') neurons in the sensor space are not necessarily connected in the sense that an elementary action can bring the system from one state to the other.

2. If planning is done within the Kohonen network, the metric used for path planning will be the metric of the lattice of neurons. The Kohonen learning rule places the nodes according to the probability density function of the inputs: regions in input space with high probability will have a large density of neurons. This means that the length of the path would depend on the distribution of the learning samples.

For these reasons we decided not to use the neighbour relations of the Kohonen network for path planning. Instead we consider navigation as a Markovian decision problem with the neurons of the Kohonen network as a discrete set of sensor states. During exploration of the environment, the sensor values $f$ and the corresponding actions $u$ were stored. For each $f$ the winning neuron $r_i$ is calculated before and after the action $u$, and an estimate of the state transition probability $p_{ij}(u)$ is made. $p_{ij}(u)$ is defined as the probability that, given an initial state (neuron) $i$, the vehicle is found in state (neuron) $j$ after action $u$. In this way, neighbourhood relations between sensory states are imposed by the elementary actions. For navigation a greedy policy is used with respect to a cost function $f^*$ which assigns to each neuron $i$ the (expected) cost $f_i^*$ of driving toward the goal:
$u = \arg\min_u \left[ c(u) + \gamma \sum_j p_{ij}(u) f_j^* \right]$,    (4)
with $c(u)$ the immediate cost of the action. The problem is to find the cost function $f^*$ of the Markovian decision problem. For this a dynamic programming approach is chosen. The cost function can be approximated via an iterative process if the direct cost $c(u)$ of an action $u$ and the transition probabilities are known:
$f_i(k+1) = \min_u \left[ c(u) + \gamma \sum_j p_{ij}(u) f_j(k) \right]$,    (5)
in which $f_i(k)$ is the approximated cost for neuron $i$ at time step $k$. This procedure converges to the optimal cost function $f^*$ if the goal state is initialized to zero and $\gamma < 1$.
During exploration, collisions with obstacles are stored as well, so for each state $i$ we can estimate $p_{i,col}(u)$, defined as the probability that, given the state (neuron) $i$, the vehicle will collide with an obstacle after action $u$. Note that the collision is treated as a discrete state $r_{col}$, which also has a value of the cost function, $f_{col}^*$. In our model the cost for the collision neuron is always $f_{col}(k) = 1 + \max_i f_i(k)$, so the system will prefer actions which do not result in a collision.
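The following sketch combines the estimation of the transition probabilities from exploration data with the iteration of Eq. (5) and the collision cost described above; the state and action encodings are assumptions.

import numpy as np

def estimate_P(samples, n_states, n_actions):
    """samples: list of (i, u, j) winning neurons before/after action u;
    the last state index is reserved for the collision state r_col."""
    counts = np.zeros((n_actions, n_states, n_states))
    for i, u, j in samples:
        counts[u, i, j] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 0.0)

def iterate_costs(P, goal, gamma=0.95, n_iter=200):
    """Dynamic programming iteration of Eq. (5); c(u) = 1 for every action."""
    n_states = P.shape[1]
    f = np.zeros(n_states)
    for _ in range(n_iter):
        f[-1] = 1.0 + f[:-1].max()            # collision neuron cost
        f = np.min(1.0 + gamma * P @ f, axis=0)
        f[goal] = 0.0                         # goal state stays at zero
    f[-1] = 1.0 + f[:-1].max()
    return f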
6. EXPERIMENTS ON PLANNING
Experiments were carried out in simulation. In our experiments, the set of possible actions $\mathcal{U}$ was restricted to left, straight, right, indicating that the vehicle rotates by -30, 0 or 30 degrees, respectively, followed by a linear displacement of 0.75 m. The workspace of the robot is shown in Figure 4, which also shows a layer from the Kohonen network. State transition probabilities $p_{ij}(u)$ were estimated by performing 100,000 random actions at random positions. The immediate cost $c(u)$ of an action is equal to one for each action. We tested the planning algorithm by selecting an arbitrary goal state in the sensor domain. The vehicle was put at an arbitrary position and the cost function given in Eq. (5) was determined iteratively. As soon as the current neuron (the neuron which is active at the current position) has neighbours with lower cost, a steepest descent method can be applied: the action is selected for which the expected future cost is minimal. Note that this does not necessarily mean that the function $f(k)$ at time step $k$ has converged to the optimal solution $f^*$. We therefore tested how many iterations were needed to obtain a correct path. Results are shown in Figures 5 and 6. The lower part of Figure 5 shows the path of the vehicle if it starts moving after 10, 30, 40 or 200 iterations of the dynamic programming algorithm. The upper part shows the number of elementary actions of the vehicle which are needed to reach the goal state. If the number of iterations is less than 50, the vehicle does not reach the goal but collides with an obstacle. Figure 6 shows a more difficult situation: the vehicle has to make a turn in a narrow alley. At least 50 iterations are needed before the cost function is sufficiently good for collision-free navigation. Remarkably, the path length increases after 300 iterations (while a good, short path had already been found after 50 iterations). The reason for this is our choice of the cost for the collision "neuron", which is always larger than the cost of any other neuron and therefore contributes (too) much to the expected cost. The longer path is apparently a result of the trade-off between minimizing the probability of a collision and minimizing the expected number of actions needed to reach the goal.
7. DISCUSSION

Results show that an internal representation of the sensor space, acquired by an exploration of the environment, can be used for path planning and navigation. The sensor space of the robot system consists of all possible constellations of sensory data which can occur in a given static environment of the system. In this paper a Kohonen network is used for the nonlinear mapping from the high-dimensional sensor space to the 3-dimensional network. The use of such a projection also enables us to take into account the uncertainty
Figure 5. The upper part shows the number of actions needed to reach the goal. A black disk indicates termination of navigation because of a collision. The lower part of the figure shows some typical paths of the vehicle if it starts moving after 10, 30, 40 or 200 iterations of the dynamic programming algorithm.
Figure 6. As in the previous figure. The lower part of the figure shows some typical paths of the vehicle if it starts moving after 10, 50, 200 or 300 iterations of the dynamic programming algorithm.
of the sensor data. If the data of one of the sensors are more uncertain than the data from another sensor, we can take this into account by lowering the contribution $f_i$ of this sensor to the distance measure which is used to determine the winning neuron. Some shortcomings of the approach have shown up which will have to be studied in the near future. One of the major drawbacks of the approach presented in this paper is that many learning samples are needed: the Kohonen map alone already needs several thousand samples, and to estimate the transition probabilities we need even more. In a more efficient approach both the Kohonen map and the transition probabilities could be learned at the same time, while in a more realistic learning procedure the learning samples would not be generated by random placement of the vehicle but by exploratory navigation.
Acknowledgements The authors thank Leo Dorst for discussions on the shape of the sensor data subspace and his help in plotting the space for a simple situation. The work was carried out partly within the project "Active Perception and Cognition", supported by the Japanese RWC programme.
REFERENCES

1. Glasius, R., Komoda, A., Gielen, S. Biologically Inspired Neural Networks for Trajectory Formation and Obstacle Avoidance. In: S. Gielen and B. Kappen (eds.): ICANN'93, Proceedings of the International Conference on Artificial Neural Networks. Springer-Verlag, (1993) 281-284.
2. Won Jang and Zeugnam Bien. Feature-based Visual Servoing of an Eye-In-Hand Robot with Improved Tracking Performance. Proceedings of the 1991 IEEE Int. Conf. on Robotics and Automation, (1991) 2254-2260.
3. Kurtz, A. Building maps for path-planning and navigation using learning classification of external sensor data. In: I. Aleksander and J. Taylor (eds.): Artificial Neural Networks 2, North-Holland/Elsevier Science Publishers, Amsterdam, (1992) 587-591.
4. Latombe, J.C. Robot Motion Planning. Kluwer Academic Publishers, Boston, (1991).
5. Martinetz, T., Schulten, K. Hierarchical neural nets for learning control of a robot's arm and gripper. IJCNN-90 Conf. Proc., San Diego, III, (1990) 747-752.
6. Morasso, P., Vercelli, G. and Zaccaria, R. Hybrid systems for robot planning. In: I. Aleksander and J. Taylor (eds.): Artificial Neural Networks 2, North-Holland/Elsevier Science Publishers, Amsterdam, (1992) 691-697.
7. Najand, S., Zhen-Ping Lo, Bavarian, B. Application of self-organizing neural networks for mobile robot environment learning. In: G. Bekey and K. Goldberg (eds.): Neural Networks in Robotics, Kluwer Academic Publishers, Dordrecht, (1992) 85-96.
8. Ritter, H., Martinetz, T., Schulten, K. Topology conserving maps for learning visuomotor coordination. Neural Networks 2, (1989) 159-168.
9. P.P. van der Smagt and B.J.A. Kröse. A real-time learning neural robot controller. In: Kohonen, Mäkisara, Simula and Kangas (eds.): Artificial Neural Networks. Elsevier Science Publishers B.V., (1991) 351-356.
10. J.M. Vleugels, J.N. Kok and M.H. Overmars. A self-organizing neural network for robot motion planning. In: S. Gielen and B. Kappen (eds.): ICANN'93, Proceedings of the International Conference on Artificial Neural Networks. Springer-Verlag, (1993) 281-284.
11. Weiss, L.E., A.C. Sanderson and C.P. Neuman. Dynamic sensor-based control of robots with visual feedback. IEEE Journal on Robotics and Automation, RA-3, (1987) 404-417.
Path Planning and Guidance Techniques for an Autonomous Mobile Cleaning Robot

Christian Hofner and Günther Schmidt*

Department for Automatic Control Engineering (LSR), Technische Universität München, D-80290 München, Germany
Phone: (+49 89) 2105-3416, FAX: (+49 89) 2105-8340
E-mail: [email protected]

Keywords: Cleaning path planner, self-localization, sensor application planner, tracking error compensation, obstacle avoidance.

Application of freely navigating mobile robots to service tasks such as floor cleaning or inspection requires specific techniques for path planning and vehicle guidance. This article describes in its first part a rule-based geometric path planner. In addition to a 2D environmental map, the planner takes into account robot kinematics and geometry as well as technological requirements. The result of path planning is a list of motion commands which, during path execution, leads to a most complete coverage of a user-specified cleaning area. In its second part the article discusses problems of vehicle guidance, including initialization of the robot's start location, cleaning path execution, accurate path tracking, vehicle localization, as well as path replanning in case of obstacles. The article concludes with a discussion of implementation issues and results of a floor coverage experiment with the full-scale mobile robot MACROBE.

1. INTRODUCTION

Many challenging research problems arise in the application of freely navigating mobile service robots to floor cleaning or related tasks in extended public areas such as corridors, halls or platforms [11]. Although most of today's cleaning and sweeping machines are still guided by human operators, there already exist several semi-autonomous cleaning robots for commercial use [12]. These systems assist a human operator in the cleaning of rectangular or polygonal shaped areas. Obstacles cause stops until either the operator or the obstacle clears the blocked path. A few publications [4],[2] discuss the application of a mobile robot to the cleaning task without making use of explicit a-priori knowledge about the geometry and complexity of the cleaning area. In spite of these developments there remain many open problems in intelligent path planning and vehicle guidance for cleaning robots, due to advanced requirements as listed in Table 1.

*The work reported in this article was supported by the Deutsche Forschungsgemeinschaft DFG as part of an interdisciplinary research project on "Information Processing Techniques in Autonomous Mobile Robots (SFB 331)". The authors would like to thank colleagues and students of the laboratory for their valuable contributions to this work.
Table 1
Advanced requirements for an autonomous floor coverage robot

Task-related requirements:
- Identification of the cleaning area
- Path planning considering technological and economical aspects
- Accurate long-distance path tracking
- Detecting, memorizing and cleaning temporarily occupied parts of the cleaning area

Safety/operational requirements:
- Man-robot interface for operator assistance
- Collision and tumbling avoidance
- Detection of obstacles, with the result of velocity adaptation and path replanning
In [8], [9] we have introduced the basic ideas of our planning and guidance approach for autonomous floor-covering operations in extended indoor environments. In Section 2 of this article we describe in more detail the rule-based planning system which automatically generates the motion command sequence for an appropriate cleaning path according to robot geometry and kinematic restrictions. Results from automatic planning are illustrated for robots with various degrees of agility in typical cleaning environments. In Section 3 we discuss a vehicle guidance scheme comprising regular robot localization, i.e. estimation of position and heading, as well as appropriate compensation of drift errors during path execution. Unlike obstacle avoidance approaches known from point-to-point operations, we propose in Section 4 a new approach for the detection and handling of obstacle situations during cleaning motion. In Section 5 we discuss the implementation of the total control structure as well as results from a floor coverage experiment evaluating the proposed path planning and tracking techniques.

2. PLANNING OF A CLEANING PATH
Reasonable guidance of a cleaning robot in real indoor environments requires an appropriately planned cleaning path. A few publications have already introduced grid-based [13] or rule-based [4] planners for complete floor coverage. However, a flexible path planner must pay attention to the technological feasibility of a cleaning path in real indoor environments and to problems arising during path execution. Thus robot geometry and agility as well as appropriate adaptation of the cleaning path to the specific environment must be considered. In this Section we describe a semi-automatic approach for generation of a robot motion program which ensures most complete coverage of a specified area.
2.1. Geometry and Agility of the Robot and its Cleaning Units

To define the relevant design parameters of mobile robots for floor coverage operations, we refer to the features of typical cleaning machines as illustrated in Figure 1. Most of these conventional systems are equipped with two or more cleaning units [6]. The front unit comprises brushes soaked in detergent, which scrub the dirt from the floor. The dirt is then removed by a vacuum suction unit at the rear end of the vehicle. For path planning we will always assume rectangular shaped vehicles with car-like non-
Figure 1. Manual cleaning machine
holonomic kinematics but varying degrees of agility. Agility is characterized by the robot's minimum turning radius r_min. Furthermore, we will assume the existence of two cleaning units mounted at the front and rear end of the vehicle, each being as wide as the robot itself, i.e. d_clean = b.

2.2. Representation of the Cleaning Area

With respect to the environment we assume the existence of a 2D-map of a-priori known walls, pillars, staircases or other fixed objects. These items are represented by closed 2D-object polygons as illustrated in Figures 4b) and 5b). Planning starts with operator-based identification of the borders of the cleaning area in a visualized 2D-map which is part of the planning system. Both the borders and all object polygons inside the cleaning area are represented by contours. Each contour c is described by cartesian end points SP_c and EP_c with respect to an inertial reference system E, see Figure 3.

2.3. Selection of a Start Location

To determine the alignment of the motion patterns to be planned, the operator selects an appropriate start position and a reference contour in the visualized map. Typically a point at a corner and a neighbouring wall are chosen for this purpose. Based on this information the robot's start location S, as illustrated in Figures 4b) and 5b), is automatically calculated. S is defined by an initial path frame P_0 = (x_0, y_0, φ_0) representing vehicle position (x_0, y_0) and heading φ_0 with respect to E.
2.4. Automatic Planning of a Cleaning Path

For satisfactory floor coverage, a cleaning path typically consists of overlapping parallel tracks. Two basic maneuvers are used for connecting neighbouring tracks:

- U-turn: defined by a desired turning radius r_turn and a turning angle of 180°
- Side-shift: defined by a concatenation of two circular arcs with different signs of curvature

U-turn maneuvers are preferred for path planning, because no change in motion direction is required and both operating time and floor coverage redundancy can be reduced. Because the vehicle swings out during U-turns, safety distances between objects and the vehicle have to be considered [8]. Side-shift maneuvers, on the other hand, are preferably employed in narrow areas or in case of a lateral approach to a wall. Once all parameters of the robot and its environment are defined, the planning procedure automatically constructs a technologically feasible cleaning path based on basic path planning templates.
2.5. Basic Path Planning Templates

The five templates proposed for path planning are defined in Table 2 and Figure 2.

Table 2
Basic path planning templates

Template      Purpose
TM            Move straight forward/backward
UT            Move straight forward and change track using a simple U-turn
SS            Move straight backward and change track using a Side-shift
UT-Stretched  Move straight forward and change track using a stretched U-turn
UT-Squashed   Move straight forward and change track using a squashed U-turn
Application of a certain template depends on the robot's agility (r_min). For this reason, path planning distinguishes two types of robots:

- Type I robot:  r_min ≤ r_turn
- Type II robot: r_turn < r_min ≤ 2 · r_turn

with

r_turn = (d_clean − d_overlap) / 2    (1)

r_turn is the radius required for a U-turn to connect neighbouring tracks. Each track has an efficient cleaning width d_clean including an overlapping width d_overlap. Most complete floor coverage is achieved for both Type I and Type II robots if the templates illustrated in Figure 2 are used for path construction. The determination of the template parameters is discussed next.
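In code, the classification and Eq. (1) might read (a minimal sketch):

def turning_radius(d_clean, d_overlap):
    """Eq. (1): U-turn radius connecting two neighbouring tracks."""
    return (d_clean - d_overlap) / 2.0

def robot_type(r_min, r_turn):
    """Type I robots can drive a simple U-turn between tracks; Type II cannot."""
    if r_min <= r_turn:
        return "Type I"
    if r_min <= 2.0 * r_turn:
        return "Type II"
    raise ValueError("robot too restricted for U-turn based path templates")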
Figure 2. Templates for path construction

With each template a geometric motion corridor (rectangular, circular or combined) is defined. It is used for map-based collision analysis between the robot and the known contours. The result is a list of discrete subgoals P_i = (x_i, y_i, φ_i) defining a collision-free section of the robot's motion path. The procedure of subgoal calculation is discussed in more detail for the UT-template as shown in Figure 3.
Figure 3. Collision analysis for UT-template
Table 3
Algorithm for subgoal calculation in case of a UT-template

(1) Define a rectangular motion corridor RC with width d_clean from the current robot location P_{i-1}. Transform the contours with respect to P_{i-1}. Calculate the critical collision point C_crit = (x_crit, y_crit) between the contours c and RC with respect to P_{i-1}.
(2) For a circular motion corridor CC with radius r_CC, calculate its center point CP such that C_crit intersects CC. The lateral displacement of CP with respect to P_{i-1} is given by r_turn.
(3) Consider the contours to be part of CC. Determine the critical contour by calculating the point C_coll which is part of this contour while having the smallest radial distance to CP. Calculate the vertical displacement Δx of CP for CC touching the critical contour.
(4) If Δx > x_crit then return FAILURE and stop; else calculate P_i relative to P_{i-1}: P_i = (x_i, y_i, φ_i) and return SUCCESS.
(5) In case of SUCCESS, calculate the subgoals P_{i+1} = (r_turn, ±r_turn, ±90°) and P_{i+2} = (r_turn, ±r_turn, ±180°) to complete the collision-free U-turn description.

r_CC depends on the vehicle's geometric parameters and r_turn.

Assuming that the robot is currently located at P_{i-1}, the last subgoal from the application of a preceding template, collision analysis is performed by the algorithm given in Table 3. From the subgoals for the UT-template illustrated in Figure 3, the planner generates the following list of motion commands for a mobile robot with a cartesian motion controller [10].
yi,
r
Yi+I,
r r
Yi+2,
FORWARD, FORWARD, FORWARD,
i) i+l) i+2)
2.6. Automatic Selection of Basic Path Planning Templates A complete cleaning path as illustrated in Figures 4b) and 5b) is constructed by successive selection of basic path templates according to a logical sequence of rules. The rule sequence for a Type I robot is illustrated in Figure 4a) and can be easily verified by the planning example given in Figure 4b). From the operator-defined start location S = P0 inside the selected cleaning area the appropriate shift direction is determined. For the first section of the path in Figure 4b) the planner selects a UT-template with shift direction to the right. If the UT-template algorithm returns SUCCESS, the robot will be located at P3. From P3 the new shift direction is determined and again a UT-template is selected. The concatenation of feasible UT-templates leads automatically to a Snake-trail pattern with overlapping tracks, see Figure 4b). If a U-turn cannot be performed due to environmental restrictions such as a side wall, a SS-template is called upon. The radius
247 of the two arcs is automatically calculated for a collision free track change. Successful application of SS-templates leads to a Side-shift pattern, see Figure 4b). Application of a SS-template fails, if a lateral border of the cleaning area is detected inside the specified motion corridor. In this case floor coverage is assumed to be completed and path planning is terminated.
Figure 4. Construction of a complete cleaning path for Type I robot : (a) Outline of template selection procedure (b) Path planning example
Due to their size, weight and velocity Type II robots usually operate in extended cleaning areas. To achieve reasonable floor coverage with this type of robot the set of path templates needs to be extended by the so-called stretched UT-template and squashed UT-template as illustrated in Figure 2. Application of this set of templates requires a more sophisticated selection procedure, given by the rule sequence for Type II robots. In Figure 5a) only the major selection cycle is illustrated which concatenates feasible path templates to an interlaced Snake-trail pattern. This is equivalent to the UT-template cycle in Figure 4a). It is obvious that additional template selections are required for completing the cleaning path, if application of a template returns FAILURE. Application of interlaced Snake-trail patterns to cleaning areas with complex structured boundaries leads to unsatisfactory floor coverage. To cope with this drawback, the planning procedure tests before and after each feasible U-turn, whether an additional forward/backward track motion can cover additional floor by using a TM-teraplate. Examples are shown in Figure 5b).
248
Figure 5. Construction of a complete cleaning path for Type II robot: (a) Major cycle of template selection procedure (b) Path planning example 2.7. Evaluation of Floor Coverage The ultimative objective of autonomous floor cleaning is complete coverage of a specified area at minimum operating time. However, technological restrictions and navigation in real indoor environments make it rather difficult to specify an "optimal" cleaning path. Experience has shown, that reasonable floor coverage is achieved by a cleaning path composed of appropriate Snake-trail and Side-shift patterns. To evaluate floor coverage for a planned cleaning path the following performance measures are defined m
top [ m i n ]__ ~ s , ~ i=1
qtoua
[%]=
Vi
100.
cov
q~
+
(2)
taccj j=l
Acovered A~pr162 Aco~,e,-~d
(3) (4)
top
[%]-
loo.
Apath
A~o~
_
_
~)
(5)
top represents the operating time for floor coverage and is calculated according to the velocity profile of the robot along its planned path. This profile considers path sections si with constant velocity vi, as well as (de-)acceleration phases tacc~ due to kinematic restrictions and changes in motion direction.
249 qtotal represents the ratio of the actually covered area A . . . . . . d and the specified cleaning area Asp~. Due to complexity of motion patterns and the modelled environment, qtot~t is estimated by segmentation of the visualized cleaning area with methods known from image-processing. Cleaning performance pco~ is a common term in cleaning technology and represents the floor space in square meters which can be covered within one hour. qred considers the ratio of Acovered and the area covered according to the planned cleaning path Apath. This measure indicates redundancy caused by overlapping tracks, interlaced Snake-trail and Side-shift patterns. Ap~th is determined as follows .
circles m
n
i=1
j=l
(6) tracks with Ai
=
dd~an . li
A,
-
n (~'jec,~an -- h ) + (R '~ + h') ~ t ~ .
R
=
h,f
:
dd~n rtu,-n+-2
dd~
-k
+ T " (f" -
<'~~176
distance from robot's turning axis to its rear end, resp. front end
:
efficient cleaning width
l,
:
length of a single track
~j, rt ....
:
turning angle, turning radius
Table 4 Results from floor coverage evaluation Geometric data Path length Average velocity Specified floor Covered floor Coverage along the path
1 ~ A.,,e~
A ....... d Avao~
[m] [mm] 8 [rn2] [m 2] [m 2]
Performance value Operating time Total coverage Coverage redundancy Cleaning performance
top qtolat q,-ed pco,,
[min] [%] [%1 [-~]
Figure 4b)
Figure 5b)
236.6 430 124.6 113.1 122.8
144.6 460 124.6 101.1 158.3
Figure 4b)
Figure 5b)
9.5 90.8 8.5 715.7
6.4 81.0 56.6 953.0
The described path planning system automatically generates performance values for a planned cleaning path. Table 4 summarizes the evaluation results for the examples of Figures 4b) and 5b). Analysis of the quality values indicates that the achieved floor coverage and cleaning performance depend on the specific environment and on an appropriately selected robot technology (geometry, agility etc.). The proposed quality measures strongly support further improvement of path planning techniques with respect to the ultimate objectives of cleaning operations.

3. VEHICLE GUIDANCE

There are two important issues to be considered for successful nominal execution of a planned cleaning path:

- The specified start location S of the cleaning path must be found from an arbitrary robot location in the vicinity of S, an operation which we call initialization.
- Path errors must be compensated in order to achieve satisfactory floor coverage over long distances.

Both operations are supported by a combined sensor- and map-based self-localization procedure.

3.1. Initialization of Robot Motion

Cleaning motion starts at the selected start location S of the planned cleaning path, e.g. a room corner as illustrated in Figures 4b) and 5b). Since a human operator cannot be expected to position the robot precisely at S, but only somewhere in its vicinity, self-localization is performed and the robot maneuvers to S. The maneuver is planned and executed by a local path planning and execution module which is part of the robot control system as described in [1]. Self-localization, performed during initialization and path execution, is briefly discussed in the following section.

3.2. Self-localization

For estimating robot position and heading in the environment, natural landmarks detected by sensors are matched with corresponding contours in the 2D-map. For landmark detection, ultrasonic sensors with a maximum range of 1.5 m [3] are used. A minimum configuration of eight sensors mounted around a rectangular shaped robot is illustrated in Figure 6a). Taking into account typical problems arising with ultrasonic range sensors, a reliable absolute localization technique as reported in [7] was adopted for vehicle guidance. The main processing steps for estimating the robot location L = (x_L, y_L, φ_L) by means of ultrasonic range measurements and dead-reckoning data are summarized in Table 5. Position and heading of the robot cannot be completely determined if either the robot moves along or towards a single contour or if faulty sensor measurements have to be eliminated. In this case the missing components of L are replaced by the corresponding values of the dead-reckoning system. This avoids undefined situations when path tracking errors must be compensated.
Table 5
Main steps of location estimation

(1) Self-localization starts with a rough estimate of the robot's current location, characterized by an uncertainty area E, see Figure 6a).
- For robot initialization, E is parameterized by the path planner around the selected start location S.
- During path execution, E is parameterized around the current robot location O = (x_DR, y_DR, φ_DR) given by the dead-reckoning system.
(2) When approaching a landmark, a correspondence tree as shown in Figure 6b) is generated. Each branch represents the set of map contours detectable by the relevant ultrasonic sensors. Different branches are generated by variation of the robot location in E. By use of appropriate models of the sensors' field of view, the number of possible branches can be kept small.
(3) Based on map and current sensor data, a possible robot location and a quality measure are calculated for each branch using least-squares fitting, see [7]. The robot's true location L = (x_L, y_L, φ_L) is selected from the set of possible locations by use of the quality measures.
Figure 6. Self-localization at natural landmarks: (a) Ultrasonic range sensor configuration (b) Correspondence tree
3.3. Automatic Planning of Sensor Application

Automatic planning of sensor application, i.e. the selection of appropriate localization points, is supported by the subgoals already determined during path planning. Since subgoals are typically located in the vicinity of landmarks such as walls or pillars, these are easily detectable by the ultrasonic sensors. Localization points are selected from the subgoals according to the following criteria:

- Landmarks in the vicinity of a subgoal must be detectable by at least two sensors. This is a prerequisite for deriving a good estimate of the current heading, which is the most sensitive parameter for accurate path tracking during cleaning. For checking this condition, the set of intersections between the modelled sensors' field of view and the respective contours of the 2D-map is tested for the robot positioned at the subgoal, see Figure 6a).
- The path length between two localization points should not exceed a threshold value given by the errors of the dead-reckoning system and the specified overlap d_overlap of neighbouring tracks.

If a subgoal meets the above requirements, the corresponding motion command is marked by a so-called localization flag. If such a command is passed to the cartesian motion controller during path execution, vehicle guidance initiates self-localization when the robot approaches the subgoal. The localization points selected by automatic sensor application planning are given in Figure 9b).
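The two criteria could be combined into a simple marking pass over the planned subgoals; detectable_sensors is an assumed map-based visibility test, and d_max stands for the drift-dependent threshold.

import math

def mark_localization_points(subgoals, detectable_sensors, d_max):
    """Return the subgoals whose motion commands get a localization flag."""
    flagged, dist = [], 0.0
    for prev, P in zip(subgoals, subgoals[1:]):
        dist += math.hypot(P[0] - prev[0], P[1] - prev[1])
        # criterion 1: the landmark must be seen by at least two sensors
        if detectable_sensors(P) >= 2:
            flagged.append(P)
            dist = 0.0
        # criterion 2: never exceed the dead-reckoning drift bound
        elif dist > d_max:
            raise RuntimeError("no localization point within drift bound")
    return flagged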
3.4. Compensation of Path Tracking Errors

For the compensation of tracking errors during path execution, the following data are available:

- the current subgoal P_i = (x_i, y_i, φ_i)
- the current location L = (x_L, y_L, φ_L) estimated by self-localization, see Section 3.2
- the current location O = (x_DR, y_DR, φ_DR) provided by the dead-reckoning system

As all data refer to the same reference system E, transformation vectors δ_L between L and P as well as δ_DR between O and P are calculated. The robot's errors in heading and position are then given by e = |δ_L − δ_DR|. Error compensation is activated according to the following criterion:

Compensation = YES if |δ_L,i − δ_DR,i| > ε_max,i for any component i = 1, 2, 3; NO otherwise.    (7)

The threshold ε_max is known from experiments. Errors are compensated during robot motion by appropriate modification of the subgoal values generated by the path planner. If localization fails locally, compensation is not initiated and the robot continues its path to the current subgoal. A more detailed description of this procedure is given in [8].
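The activation criterion of Eq. (7) amounts to a componentwise threshold test, sketched here with the thresholds of the experiment in Section 5 used as example defaults:

import numpy as np

def needs_compensation(delta_L, delta_DR, eps_max=(50.0, 50.0, 0.5)):
    """Eq. (7): compensate if any component (x [mm], y [mm], phi [deg]) of
    the discrepancy between localization and dead reckoning exceeds its
    threshold. The default thresholds are those of Section 5."""
    e = np.abs(np.asarray(delta_L) - np.asarray(delta_DR))
    return bool(np.any(e > np.asarray(eps_max)))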
4. OBSTACLE AVOIDANCE

Obstacle avoidance strategies are constrained by the requirements of satisfactory floor coverage. This means, among other things, that temporary obstacles should not cause immediate path modification, while permanent obstacles should be passed according to task-related requirements.

4.1. Obstacle Detection

A structured-light CCD-sensor as described in [5] is used for obstacle detection. It provides 2D line segments in a 3 m × 3 m field of view in front of the vehicle. According to Figures 7a) and 7b), a line segment represents an obstacle if it intersects the surveillance corridor (SC) of the robot moving to its current subgoal. Detected objects which will never be reached by the robot along its planned path are not considered as obstacles.
Figure 7. Obstacle avoidance: (a) Obstacle detection (b) Replanned cleaning path after map update
For a blocked path the robot estimates the collision distance d. If d < d_crit, a soft-stop is performed: the robot slows down smoothly, but does not stop in case of temporary obstacles, while a safe stop occurs in front of permanent obstacles. If the path is not cleared within a given time, path replanning is initiated.
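The stop logic could be summarized as follows; the return labels are illustrative names, not part of the MACROBE interface.

def blocked_path_reaction(d, d_crit, temporary):
    """Reaction to a line segment blocking the surveillance corridor."""
    if d >= d_crit:
        return "CONTINUE"      # obstacle still far enough away
    if temporary:
        return "SOFT_STOP"     # slow down smoothly, do not stop
    return "SAFE_STOP"         # halt in front of a permanent obstacle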
4.2. Path Replanning

The path planner described in Section 2.4 is used online to construct a new cleaning path. Passing an obstacle on this path complies with the requirements of good floor coverage. For path replanning, the map of the cleaning area is updated with the line segments representing the obstacle. The current robot location is considered as the new start location S*. A simulation result of path replanning and the resulting floor coverage with a Type II robot is shown in Figure 7b).

5. RESULTS
5.1. Implementation Issues

MACROBE, which must be classified as a Type II mobile robot, is used for experimental validation of the proposed planning and guidance techniques. The overall control structure is shown in Figure 8. Two cascaded loops of cyclic perception, planning and execution ensure both accurate long-distance path tracking and adaptive cleaning motion in case of obstacles. The gray shaded blocks have been implemented on a notebook computer.
Figure 8. Overall control structure for floor coverage robot
5.2. Floor Coverage Experiment

The following experiment was performed in an FMS pilot factory. A view into the cleaning environment is shown in Figure 9a); the corresponding map and the results from planning are presented in Figure 9b).
Figure 9. Floor coverage experiment: (a) Cleaning environment (b) Path planning results
Using a joystick, the operator positions the robot in the vicinity of S. Next the robot initializes its location and maneuvers to S. The result of initial self-localization in Table 6 demonstrates that the operator had positioned the robot with deviations of up to 1 m in motion direction and up to 0.5 m in lateral direction with respect to S. During path execution, the tracking error is determined whenever the vehicle approaches one of the planned localization points LP 1 to LP 4, see Figure 9b). If the error threshold ε_max = (Δx, Δy, Δφ) = (50 mm, 50 mm, 0.5°) is exceeded, compensation is initiated. The numbers given in Table 6 demonstrate that path tracking errors are kept small and the overlapping of neighbouring tracks is guaranteed. The '*' indicates that at LP 3 and LP 4 the robot's lateral displacement and lateral error cannot be determined by self-localization. The processing time for location estimation is on the order of 300 msec on a 486/33-MHz computer. This includes generating the correspondence tree as well as calculating the
Table 6
Initialization and path tracking errors

Task              Lateral error [mm]   Longitudinal error [mm]   Orientation error [°]   Compensation
Initialization    -963                 +510                      +1.1                    yes
Tracking (LP 1)   +2                   *                         -0.4                    no
Tracking (LP 2)   +45                  +2                        -0.5                    yes
Tracking (LP 3)   *                    +10                       +0.9                    yes
Tracking (LP 4)   *                    -10                       -2.7                    yes
current location. The experiments demonstrate that vehicle guidance satisfies the real-time requirements for velocities up to 500 mm/sec, a velocity typical for conventional cleaning machines.
Figure 10. Achieved floor coverage in the experiment
Figure 10 illustrates the floor coverage achieved in the experiment according to recorded data based on MACROBE's dead-reckoning system and the localization module. This and further experiments [9] have demonstrated that the presented planning and guidance approach meets the major requirements given in Table 1.
6. CONCLUSIONS

This paper proposed an automatic path planner which generates a technologically feasible cleaning path and the corresponding list of cartesian motion commands. Execution of this path leads to most complete floor coverage in both narrow and wide cleaning areas. Stable path execution is supported by regular self-localization at preplanned points in the vicinity of natural landmarks. Obstacle detection and avoidance are achieved by the use of a new structured-light CCD-sensor and online application of the proposed path planner. Experiments in real-world environments have demonstrated that the implemented sensor and control approach can improve the capabilities of current cleaning machines at reasonable additional cost. Further work will focus on intelligent behaviour of the robot in case of obstacles. For example, memorizing temporarily inaccessible subareas can help to cover the remaining uncleaned floor after completion of the nominal cleaning task.

REFERENCES

1. Bott, W. et al.: Automatische Planung und Ausführung lokaler Fahrmanöver für Roboterfahrzeuge; VDI-Bericht, Nr. 1094, Düsseldorf, Germany, 1993, pp. 763-772
2. Burhanpurkar, V.P.: Design of Commercial Autonomous Service Robots; Publication of Cyberworks Inc., Ontario, Canada, 1993
3. Fröhlich, C. et al.: Multisensor System for an Autonomous Robot Vehicle; Information Processing in Autonomous Mobile Robots; Proc. of the International Workshop, München, 1991, Springer Verlag, Heidelberg, pp. 61-76
4. Furuhashi, H. et al.: Development of an Autonomous Cleaning Robot with an External Power Cable; Proc. of the International Conference on Motion and Vibration Control, Yokohama, Japan, 1992, pp. 499-504
5. Gilg, A.: Laser-Lichtschnittsensor mit homogener Auflösung durch asphärischen Spiegel; Internal report, Dept. for Automatic Control Engineering, TU München, München, Germany, 1993
6. HAKO-Company: Information brochures; Bad Oldesloe, Germany, 1994
7. Hanebeck, U.: Kurzübersicht zur Lokalisierung mit Hilfe von Ultraschallsensorik; Internal report, Dept. for Automatic Control Engineering, TU München, München, Germany, 1993
8. Hofner, C.; Schmidt, G.: Flächendeckende Kursplanung und Führung autonomer Reinigungsfahrzeuge; Proc. of the 9th Workshop on AMS, München, Germany, 1993, pp. 27-38
9. Hofner, C.; Schmidt, G.: Path Planning and Guidance Techniques for an Autonomous Mobile Cleaning Robot; Proc. of the International Conference on Intelligent Robots and Systems (IROS), München, Germany, 1994, Vol. 1, pp. 610-617
10. Horn, J.: Cartesian Motion Control of a Mobile Robot; Proc. of the Intelligent Vehicles 94 Symposium, Paris, France, 1994, pp. 443-448
11. Schraft, R.D. et al.: Service Robots: The Appropriate Level of Automation and the Role of Users/Operators in the Task Execution; Proc. of the International Conference on Systems, Man and Cybernetics, France, 1993, Vol. 4, pp. 163-169
12. Villmer, F.-J.: Einsatzfelder und Applikationen von Dienstleistungsautomaten in der Reinigung; Proc. of the IPA-Technologie-Forum, Innovative Technologien für Dienstleistungen, Stuttgart, Germany, 1994, pp. 81-120
13. Zelinsky, A. et al.: Planning Paths of Complete Coverage of an Unstructured Environment by a Mobile Robot; Proc. of the ICAR 93, 1993, pp. 533-538
Grasp Strategies for a Dextrous Robotic Hand

Kurt Woelfl and Friedrich Pfeiffer

Lehrstuhl B für Mechanik, Technische Universität München, 80290 Munich, Germany
Abstract

The grasp strategies presented in this paper were developed at the Lehrstuhl B für Mechanik for the TUM Hand, a dextrous hydraulic hand with four fingers. They cover three main aspects of grasping: the calculation of the grasp points and finger forces, the positioning of the object within the hand, and the regrasping of the object within the hand. A decomposition of the finger forces into their normal and tangential, rather than internal and manipulation, components is used together with a description of the object as a generalized ellipsoid. Once the grasp points and the finger forces are known, the problem of correctly placing the object within the hand can be considered. In order to maximize the range of possible applications for the TUM Hand, a strategy is presented which allows the regrasping of an object. During regrasping, a new grasp is established on the object while it remains firmly held in the hand.
1 Introduction

Since the development of the Stanford/JPL Hand [10] and the Utah/MIT Hand [1], the dextrous manipulation of objects has emerged as a new field in robotics. Work done in this area can be divided into three groups: the development of robotic hands, the control of robotic hands, and the design of grasp strategies. In recent years, grasp planning for dextrous hands has become a new area of research for robotic applications. If dextrous hands are to find industrial applications, grasp strategies are needed to automate the grasping process. These strategies must not only calculate the finger forces necessary to manipulate the object [9], but also locate the fingers on the object in such a way that a stable grasp can be achieved [8]. With a few exceptions [6], the work on grasp planning has focused on one aspect or the other. In this paper, a grasp strategy is demonstrated which accomplishes both tasks. Given the desired external forces on the object and the object geometry, the strategy calculates the grasp points and the finger forces necessary to achieve the desired external wrench on the object. Also critical to automatic grasping processes is the positioning of the hand in such a manner that the object can be efficiently grasped. Both the position and the orientation
of the hand relative to the object play an important role in reaching the desired grasp. This paper describes a method in which both the orientation of the object within the hand and the position of the object relative to the hand are determined, so that all the fingers can reach the desired grasp points. During complex manipulations, it is often necessary to change the grasp of the hand on the object. A method is presented which allows a free finger to grasp the object at a new point, thus freeing one of the other fingers from its contact with the object. During the entire regrasping process, the object remains firmly held within the hand.
2 Description of the Object and the Finger Force

2.1 Object as a Generalized Ellipsoid
a.~,
ay
clpy iz cztp +
az
-
1, Pi
> 0
}
(1)
and a z are the half-axis lenghts. With powers Pi - 2 , (i - x , y , z ) t h e equation describes an ellipsoid, whose center is located at ( c x , Cy, Cz). It is always assumed that the origin of the body-fixed coordinate system is located at the center of the object, so that c x - c y - c z - O. h lcreasing the powers Pi of the generalized ellipsoid makes the object more rectangular. ax, ay
K
Figure 1" The Geometrical Parameters Using Equation 1, each point on the ol)ject's surface can be described as a function of the two angles C and ~ (see Figure 1). The cartesian coordinates are then described by:
y -
z -
r(C,~)sinCcos
(2)
261 Substituting these equations into equation (1) leads to
~(r ' ~) cos r cos ~ ]p~ ax
+
r(r ' ~) sin r cos ~ Ipy ay I
+
~(r ~)si~ ~ Ip~
'--
-t
(3)
When the two angles r and ( are known, the distance r(r ~) to the point can be found by solving this equation numerically. The cartesian coordinates of the point can then be found through Equations (2). The base-vectors which are normal (n) and tangent (ell , el2 ) to the surface are then described by:
(~r ell
=
~5--7" =
~ ( r ~~) ~in r ~o~ ~ + ~(r ~) cos r cos
~r
~(r ~~)cos r Cos ~ - ~(r ~) cos r sin e~2 =
n
=
(~r
5~-
5r(r ~) sin r cos ~ - r(r ~) sill r Sill
(4)
&(r
6tlet2
Every generalized ellipsoid can thus be described as a function of the two paralnters and 4. With these two angles, the cartesian coordinates of the point and the base-vectors normal and tangent to the surface at this point can be found. In order to demonstrate lhe effect of increasing the power of the generalized ellipsoid, Figure 2 shows an ellipsoid with powers Pi = 2, Pi = 4 and Pi = 10. The transformation of the ellipsoid to a rectangular solid can be clearly seen.
2.2
Decomposition
of the Finger Forces
In most of the past research performed on grasping an object with a multi-fingered hand, the conta.ct finger forces have been divided into two components, the manipulation force and the internal force. The manipulation force, introduced by Yoshikawa and Nagai [12], is defined as the force wllicl~ generates the required external force on the object. The internal force was originally defined by Salisbury [10], as the component of the finger force which exerts no net resllltant force on the object. Although variations of these definitions have been proposed [13], most researchers decompose the finger" forces into these two components (for" examples see [3] [4]). This paper proposes the llse of a. novel decomposition of the finger forces into its normal and tangential directions, a met llod supported by [2]. Because the finger forces rely on the contact between tlle finger and the object, this decomposition not only allows physical insight into the roles of the normal and tangential (frictional) forces but also simplifies the mathematical description of such terms as the pressing of the finger against the object, or the slipping o[ l lie finger oll the object's surface.
262
Figure 2: Generalized Ellipsoid with Increasing Powers
3 Grasp Calculation
One major aspect of grasp planning involves the determination of the grasp points and the calculation of the finger forces necessary to achieve the desired external wrench on the object. A method is presented which, given the desired external wrenches on the object and the object geometry, calculates not only the necessary finger forces, but also the points where the fingers should grasp the object. If the grasp points are already known, a very similar method described in [11] can be used. In this paper, it is assumed that each finger acts as a point contact with friction and thus cannot exert moments. It is also assumed that the object geometry (as described in Section 2.1) and the desired external wrench which the hand should exert on the object are known. In the example presented, a three-fingered grasp is assumed, so that the fourth finger of the TUM Hand remains available for possible regrasping.
3.0.1 The Optimization Problem
The grasp strategy presented here which calculates the finger forces and the corresponding grasp points can be described as an optimization problem with specific constraints. This problem is summarized in Table 1.
Optimization Criterion:
    $G = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( |f_i|^2 - |f_j|^2 \right)^2 \to \min$

Constraints:
    Force Equilibrium:    $\sum_{i=1}^{n} (f_{n_i} + f_{t_i}) - F_e = 0$
    Moment Equilibrium:   $\sum_{i=1}^{n} r_i \times (f_{n_i} + f_{t_i}) - M_e = 0$
    Contact:              $f_{n_i} \cdot n_i < 0$
    Friction Cone:        $|f_{t_i}|^2 - \mu^2 |f_{n_i}|^2 < 0$
    Stability:            $\left| \sum_{i=1}^{n} n_i \right| \le S$
    Separation:           $|r_i - r_j| - e_{min} \ge 0, \quad i \ne j$
    Rectangular Solids:   $a_1 - |x_i| > 0, \quad a_2 - |y_i| > 0$

Table 1: Optimization of the Finger Forces and Grasp Points
Optimization Criterion: Similar to [9], the optimization criterion is chosen such that all the fingers exert the same magnitude of force. This criterion is used in order to ensure an optimal distribution, so that no one finger assumes too large a share of the external load. By minimizing the difference in the magnitudes of the finger forces, the optimization criterion divides the external wrench evenly over all the fingers and subdues the tendency for one of the fingers to become overloaded.

Force and Moment Equilibria: In order to ensure that the fingers exert the desired external wrench, the force and moment equilibria are used as constraints for the above optimization. Here $f_{n_i}$ and $f_{t_i}$ describe the normal and tangential components, respectively, of the force of the i-th finger. $F_e$ and $M_e$ are the desired external force and moment that the fingers should exert on the object, $r_i$ describes the vector from the center of gravity of the object to the grasp point of the i-th finger, and $n_i$ describes the vector normal to the object's surface at the i-th grasp point, which always points outwards. These conditions ensure that the finger forces exert the desired force and moment on the object. Although these equations describe static equilibrium, the dynamics of the object can be taken into account during the path planning stage and thus included in $F_e$ and $M_e$.
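As an illustration of how the force part of this optimization could be posed numerically, the following sketch solves for the finger forces at fixed grasp points with scipy; it covers only the criterion, the equilibria, the contact condition and the friction cone, whereas the full strategy of Table 1 also optimizes over the grasp points themselves. The variable names and the SLSQP formulation are assumptions, and the normals n[i] are assumed to be unit vectors.

import numpy as np
from scipy.optimize import minimize

def grasp_forces(r, n, F_e, M_e, mu):
    """Finger forces at fixed grasp points r[i] with outward unit normals
    n[i]; a simplified instance of the problem in Table 1."""
    k = len(r)
    unpack = lambda x: x.reshape(k, 3)

    def criterion(x):                    # equalize the force magnitudes
        m = np.sum(unpack(x) ** 2, axis=1)
        return np.sum((m[:, None] - m[None, :]) ** 2)

    cons = [{"type": "eq",               # force equilibrium
             "fun": lambda x: unpack(x).sum(axis=0) - F_e},
            {"type": "eq",               # moment equilibrium
             "fun": lambda x: sum(np.cross(ri, fi)
                                  for ri, fi in zip(r, unpack(x))) - M_e}]
    for i in range(k):
        cons.append({"type": "ineq",     # contact: press towards the object
                     "fun": lambda x, i=i: -(unpack(x)[i] @ n[i])})
        cons.append({"type": "ineq",     # friction cone
                     "fun": lambda x, i=i:
                         mu**2 * (unpack(x)[i] @ n[i])**2
                         - np.sum((unpack(x)[i]
                                   - (unpack(x)[i] @ n[i]) * n[i])**2)})

    x0 = -np.concatenate(list(n))        # start by pressing inwards
    res = minimize(criterion, x0, constraints=cons, method="SLSQP")
    return unpack(res.x)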
Contact with Friction. The contact of the fingers on the object plays an obviously important role when grasping and manipulating an object. Contact with friction adds two constraints to the optimization problem: force direction and the friction cone. The decomposition of the finger force into its normal and tangential components facilitates the understanding of the role of friction during grasping. The directional condition is an obvious one: it assures that the fingers remain in contact with the object. The normal component of the finger force must always point inwards, towards the object. The second condition ensures that the finger forces remain within the friction cone so that the fingers do not slip on the object. The tangential component of the finger force should not exceed the maximum transferable tangential force, determined by μ |fn_i|, where μ is the static coefficient of friction.

Grasp Stability. The previous conditions ensure that the fingers exert the desired external wrench on the object, and that they do not slip on the object while doing so, but they do not in any way control the stability of the grasp. Although there have been many different technical definitions of grasp stability [3], the definitions used most frequently are those of force closure and form closure (also known as complete restraint). Force closure arises when disturbance forces on the object act to maintain the contact. A grasp has form closure when it is able to resist arbitrary disturbances. For example, if the wrench suddenly changes magnitude or direction, the grasp must be able to compensate for these changes so that the object does not fall out of the hand. A grasp with complete restraint is thus one which can exert any external force on the object. A fully stable grasp can then be considered as a grasp with form closure, and a stable grasp as one which is capable of resisting certain, but not all, disturbances (force closure). In order to take this into account, a stability condition was incorporated into the optimization problem so that the grasp points can be chosen with varying degrees of stability. If one is very certain that the forces or moments will not change direction greatly, a less stable grasp may very well be the better grasp. A grasp with force closure can be chosen here, so that small disturbances would tend to maintain the contact. The fingers can then be located more efficiently on the object, with a larger share of the finger forces being used to exert the desired wrench, and a smaller share being used to maintain the ability to compensate for non-existent disturbances. The vectors normal to the object's surface at the grasp points provide a good insight into the stability of the grip: the smaller the sum of the vectors, the more stable the grasp. The grasp is less stable in the direction opposite the resulting sum, which means that it is less capable of resisting disturbances in that direction. However, the grasp can still be used in many force closure situations. A sum of zero implies equal stability in all directions. Assuming the three points do not lie on a line, a grasp with a sum equal to zero has form closure. The internal force with all fingers pressing equally strongly corresponds exactly to the normal forces at the fingertips, which means that any tangential force necessary to overcome disturbances can be brought back into the friction cone by increasing the internal forces. This stability condition can be described as follows:
    | Σ_{i=1}^{n} n_i | ≤ S                                        (5)
where S is the desired stability measure. For example, a sphere is grasped most stably with three fingers when the sum of the normal vectors is zero. With such a grasp all external wrenches can be achieved by adjusting the normal finger forces as necessary to keep the finger forces within the friction cone. As shown in Figure 3, when grasping a cube with three fingers, however, it is impossible to grasp it in such a way that the sum is zero. In this case S = 1 describes the most stable grasp. With S = 1 it is possible to have a grasp which can exert all external forces, but it does have a direction in which it is more difficult to resist disturbances, namely the direction opposite the sum of the vectors. On the other hand, disturbances which act in the direction of the sum tend to maintain contact.
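The stability measure of (5) is just the norm of the summed outward unit normals; a small sketch (illustrative, not the authors' code) reproduces the sphere and cube cases discussed above:

    import numpy as np

    def stability_measure(normals):
        """|sum of outward unit normals| -- 0 is the most stable grasp."""
        return np.linalg.norm(np.sum(normals, axis=0))

    # Three fingers 120 deg apart on a sphere's equator: S = 0 is achievable.
    phi = np.deg2rad([0.0, 120.0, 240.0])
    sphere = np.stack([np.cos(phi), np.sin(phi), np.zeros(3)], axis=1)
    print(stability_measure(sphere))     # ~0.0

    # Three fingers on three side faces of a cube: best achievable S is 1.
    cube = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(stability_measure(cube))       # 1.0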
Figure 3: Achievable S for a Sphere and a Cube (top view)

Figure 4: Achievable S for a Pyramid (side view)
This effect can be seen most easily on a pyramid (see Figure 4). It is impossible to grasp a pyramid with three fingers in such a way that S = 0. In this example the sum of the vectors points towards the untouched face. If a disturbance force were to act in this direction, it would push the object towards the fingers and thus maintain the grasp. If a force were to act in the opposite direction, however, it would be at best very difficult and under normal frictional conditions impossible to compensate for the disturbance.

Separation. This condition guarantees that a minimum separation e_min is maintained between the grasp points, so that the fingers do not come too close to one another.

Rectangular Solids. For rectangular solids, the conditions ensure that the grasp points are not located too near the edges, where a small position error could lead to very large errors in the grasp. Only the inner ε fraction of the face can be used.

Unknown Parameters. fn_i, ft_i and r_i are the unknown parameters which must be optimized. However, with a mathematical description of the object's surface, these vectors can be described as functions of the scalars fn_i, ft1_i, ft2_i, ξ_i and ζ_i:
    f_i = fn_i + ft_i = fn_i n_i + ft1_i et1_i + ft2_i et2_i
    r_i = f(ξ_i, ζ_i)                        (Section 2.1)
    n_i, et1_i, et2_i = f(ξ_i, ζ_i)          (Section 2.1)

Thus we have 5n parameters to optimize, where n is the number of fingers grasping the object. The optimization is carried out using a 'Successive Quadratic Programming' procedure with a core routine from the IMSL Software Library.
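To make the structure of this 5n-parameter problem concrete, the following sketch poses the sphere grasp in SciPy's SLSQP solver (a stand-in for the IMSL routine, not the authors' implementation). The surface parametrization, the small contact margin, the slack on S and the starting point are illustrative assumptions, and the separation and edge constraints are omitted for brevity:

    import numpy as np
    from scipy.optimize import minimize

    R, MU, S_MAX, N = 0.01, 0.5, 0.01, 3    # radius, friction, stability bound
    F_E = np.array([0.0, 0.0, 1.0])         # desired external force (N)
    M_E = np.zeros(3)                       # desired external moment (Nm)

    def unpack(q):
        """q holds [fn, ft1, ft2, xi, zeta] for each finger (5n unknowns)."""
        fn, ft1, ft2, xi, ze = q.reshape(N, 5).T
        n = np.stack([np.cos(ze) * np.cos(xi),          # outward unit normals
                      np.cos(ze) * np.sin(xi),
                      np.sin(ze)], axis=1)
        r = R * n                                       # grasp points on the sphere
        e1 = np.stack([-np.sin(xi), np.cos(xi), np.zeros(N)], axis=1)
        e2 = np.cross(n, e1)                            # tangential frame
        f = fn[:, None] * n + ft1[:, None] * e1 + ft2[:, None] * e2
        return f, r, n, fn, ft1, ft2

    def criterion(q):                       # equalize the force magnitudes
        m2 = np.sum(unpack(q)[0] ** 2, axis=1)
        return sum((m2[i] - m2[j]) ** 2 for i in range(N) for j in range(N))

    cons = [
        {"type": "eq",   "fun": lambda q: np.sum(unpack(q)[0], axis=0) - F_E},
        {"type": "eq",   "fun": lambda q: np.sum(
            np.cross(unpack(q)[1], unpack(q)[0]), axis=0) - M_E},
        {"type": "ineq", "fun": lambda q: -unpack(q)[3] - 1e-3},   # fn_i < 0
        {"type": "ineq", "fun": lambda q: MU**2 * unpack(q)[3]**2
            - unpack(q)[4]**2 - unpack(q)[5]**2},                  # friction cone
        {"type": "ineq", "fun": lambda q: S_MAX - np.linalg.norm(
            np.sum(unpack(q)[2], axis=0))},                        # |sum n_i| <= S
    ]

    q0 = np.concatenate([[-1.0, 0.0, 0.3, a, 0.0] for a in (0.0, 2.09, -2.09)])
    res = minimize(criterion, q0, method="SLSQP", constraints=cons)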
3.0.2 Results
As an example, a sphere (radius 1 cm) is held against the force of gravity (the hand should exert a net force of 1 N in the z direction and no moments). A coefficient of friction μ = 0.5 is assumed. In order to demonstrate the effects of the stability measure S on the optimization, two cases were computed, one with S = 0 and one with S ≤ 3. Tables 2 and 3 show the results of the optimizations. In each table, the first block of data shows the values of the unknown parameters delivered by the optimization. The second block shows the magnitudes of the calculated finger forces and the coefficient of friction necessary to hold the object with the calculated grasp. The third block shows the grasp points and the forces and moments that each finger exerts on the object in Cartesian coordinates. As can be seen in both tables, the optimization provides finger forces which exert the desired wrench extremely well. Additionally, the tables show that no one finger assumes too large a share of the total force, nor does any finger slip (μ_nec < μ).

                 Finger 1   Finger 2   Finger 3    Sum
    ξ (rad)        2.095      0.000     -2.095
    ζ (rad)       -0.011     -0.010     -0.011
    fn (N)        -1.018     -1.018     -1.018
    ft1 (N)        0.000      0.000      0.000
    ft2 (N)        0.323      0.323      0.323
    |f| (N)        1.068      1.068      1.068
    μ_nec          0.317      0.317      0.317
    x (m)         -0.010      0.020     -0.010
    y (m)          0.017      0.000     -0.017
    z (m)          0.000      0.000      0.000
    fx (N)         0.507     -1.014      0.507     0.000
    fy (N)        -0.878      0.000      0.878     0.000
    fz (N)         0.333      0.333      0.333     1.000
    mx (Nm)        0.006      0.000     -0.006     0.000
    my (Nm)        0.003     -0.006      0.003     0.000
    mz (Nm)        0.000      0.000      0.000     0.000

Table 2: Sphere held against gravity (S = 0)

                 Finger 1   Finger 2   Finger 3    Sum
    ξ (rad)        0.000      2.095     -2.094
    ζ (rad)       -2.560     -2.560     -2.560
    fn (N)        -0.368     -0.368     -0.368
    ft1 (N)        0.000      0.000      0.000
    ft2 (N)       -0.157     -0.157     -0.157
    x (m)         -0.008      0.004      0.004
    y (m)          0.000     -0.007      0.007
    z (m)         -0.006     -0.006     -0.006
    fx (N)         0.220     -0.110     -0.110     0.000
    fy (N)         0.000      0.191     -0.191     0.000
    fz (N)         0.333      0.333      0.333     1.000
    mx (Nm)        0.000     -0.001      0.001     0.000
    my (Nm)        0.002     -0.001     -0.001     0.000
    mz (Nm)        0.000      0.000      0.000     0.000

Table 3: Sphere held against gravity (S ≤ 3)
In order to better visualize the effects of the stability measure, the results of the optimizations are shown in Figures 5 and 6. It is important to remember that the method calculates not only the finger forces, but also the points where the object should be grasped. As can be seen, the grasp in Figure 6 is less stable than that in Figure 5. The less stable grasp relies on the fact that the gravitational force holds the object in the hand, and thus a larger percentage of the finger force can be used to generate the external force. The grasp points shifted towards the bottom of the sphere, which allows the finger forces to point more in the direction of the desired external force. This makes the grasp more efficient, but if a disturbance were suddenly to push the object upwards (opposite the direction of the sum of the normals), the grasp would be ill-conditioned to compensate for this change and the object would most likely fall out of the hand. The grasp in Figure 5, on the other hand, can hold the sphere in the presence of adverse disturbances and thus is fully stable.
Figure 5" Sphere held against gravity
(s-o)
Figure 6:
(s<3)
Sphere held against gravity
Depending on the knowledge of the environment and possible disturbances, an appropriate stability measure S can be chosen to best accomplish a desired task.
4 Hand Placement
In order to automate the grasping process, a strategy is needed which can orient and locate the hand in such a manner that all three fingers can reach their designated grasp points. The object has six degrees of freedom relative to the hand which have to be limited in such a way that the grasp points are reachable. The method presented in this paper has four parts: the definition of the grasp-triangle, a rough hand orientation, the finger assignment, and, finally, an optimization of the hand orientation and distance to the object.
4.1 Definition of the Grasp-triangle
The three grasp points, which can be obtained from the method described in Section 3, always form a triangle (assuming a relatively stable grasp so that the points do not form a line) and thus define a plane. The points Q and R are defined as the points which form the shortest side of the triangle, while E is the remaining point. The center of the grasp G is defined as the point halfway between point E and the midpoint of the segment QR. The grasp coordinate system can then be defined as follows:

x: points towards point E, lies in the grasp plane.
z: normal to the grasp plane, found by the cross product between x and the vector from E to Q.
y: lies in the grasp plane, found by the cross product between z and x.

The grasp-triangle and the corresponding grasp coordinates are thus fully defined.
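This frame construction is a few cross products; a minimal NumPy sketch under the definitions above (function name is hypothetical):

    import numpy as np

    def grasp_frame(E, Q, R_pt):
        """Grasp center G and coordinate axes from the three grasp points.

        Q and R_pt form the shortest triangle side, E is the remaining
        point (notation as in Section 4.1).
        """
        M = 0.5 * (Q + R_pt)                 # midpoint of segment QR
        G = 0.5 * (E + M)                    # grasp center
        x = (E - G) / np.linalg.norm(E - G)  # towards E, in the grasp plane
        z = np.cross(x, Q - E)               # normal to the grasp plane
        z /= np.linalg.norm(z)
        y = np.cross(z, x)                   # completes the right-handed frame
        return G, x, y, z

    # Example: E above the midpoint of QR
    G, x, y, z = grasp_frame(np.array([0.0, 2.0, 0.0]),
                             np.array([-1.0, 0.0, 0.0]),
                             np.array([1.0, 0.0, 0.0]))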
4.2 Rough Hand Orientation
A rough hand orientation then follows, in which different degrees of freedom of the object relative to the hand are eliminated. The palm of the hand is oriented so that it is parallel to the grasp plane and its center lies on the z-axis of the grasp coordinate system. Whether the hand should be located on the positive (case 1) or on the negative (case 2) z-axis depends on the desired wrench. The TUM hydraulic hand, similar to the human hand, has the property that, when grasping an object, the fingers close towards the palm. The fingers thus usually exert forces towards and not away from the palm when grasping. The robotic hand should thus be located such that the external force which the fingers should exert on the object points towards the palm. With this rough hand orientation, four of the six degrees of freedom (two translational and two rotational) are eliminated. Only two are left: the distance α between the object and the hand, and the orientation β of the grasp plane with respect to the plane of the palm (both are parallel, but the object can be rotated relative to the hand about the z-axis). Figure 7 shows the rough orientation of the hand and the grasp plane. The grasp plane formed by the points Q, R and E is parallel to the palm of the hand with a separation α. The grasp plane can be rotated about its z-axis, where β represents the angle between the x axis of the palm and the y axis of the grasp plane.
Figure 7: Rough Orientation of the Hand
4.3 Finger Assignment
After the rough hand orientation, each finger is assigned a grasp point. The current TUM Hand has four fingers, where three fingers are located opposite another one which acts as a thumb. During the assignment of the fingers, this geometry is used to find a finger for each grasp point. The two points Q and R, which form the shortest side of the grasp-triangle, are assigned to the two outer fingers shown in Figure 7. Of course, the assignment of each finger depends on whether the hand is on the positive (case 1) or the negative (case 2) z-axis. The last point, point E, is then assigned to the thumb. The fourth finger remains free for regrasping.
4.4 Optimization of the Hand Orientation and Distance
Now that each finger has been assigned a grasp point, the optimization of the hand orientation β and distance α can proceed. In this optimization, the hand orientation and distance to the object are varied and optimized under consideration of two criteria. The first and most important criterion is obviously that all three fingers can reach their assigned grasp points without penetrating the surface of the object. If any finger cannot reach its designated point, the hand orientation and distance combination is considered invalid. Every valid combination is assigned a penalty value, depending on the finger angles and the direction of the object's velocity. The combination with the smallest penalty is then the optimal combination. The penalty function r is described by the following equation
    r = Σ_i k_i D_i                                                (6)

where k_i represents the penalty factor for the i-th finger segment angle, and D_i is a function of the i-th finger segment angle γ_i and its velocity γ̇_i as shown below:
    D_i = γ_i - γl_i                       for γ̇_i > 0
    D_i = γu_i - γ_i                       for γ̇_i < 0            (7)
    D_i = | γ_i - (γu_i + γl_i) / 2 |      for γ̇_i = 0
γu_i and γl_i are the maximum and minimum angles that the i-th finger segment can achieve. This function ensures that each finger has the maximum range of movement. If an angle is increasing, for instance, the optimum starting position is the smallest angle, so that the finger has a larger range of motion before the maximum angle is reached. If there is no motion, the optimum angle is chosen to be in the middle of the allowable range. For example, for an object trajectory in which a finger must move from right to left, the optimal combination is the one in which the finger starts farthest to the left, thus allowing greater finger motion before the finger reaches its kinematic limits.
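Following the reconstruction of (6) and (7) above (so the exact form is an assumption), the penalty evaluation is a small piecewise function:

    def finger_penalty(gamma, gamma_dot, g_lo, g_hi):
        """Penalty D_i for one finger segment angle, cf. eq. (7).

        Small when the segment still has a large range of motion left in
        the direction it is about to move.
        """
        if gamma_dot > 0:                    # angle increasing: start small
            return gamma - g_lo
        if gamma_dot < 0:                    # angle decreasing: start large
            return g_hi - gamma
        return abs(gamma - 0.5 * (g_hi + g_lo))   # no motion: centre is best

    def hand_penalty(gammas, gamma_dots, limits, k):
        """Weighted sum over all finger segment angles, cf. eq. (6)."""
        return sum(ki * finger_penalty(g, gd, lo, hi)
                   for ki, g, gd, (lo, hi)
                   in zip(k, gammas, gamma_dots, limits))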
4.5 Results
The hand placement method can be coupled with the grasp calculation strategies described earlier. First, the grasp calculation strategy is used to determine the finger forces and the points at which the fingers grasp the object. The calculated finger locations can then be used as input for the hand placement method, in order to orient and locate the hand properly. The resulting system is one in which an object of defined geometry can be grasped by the TUM Hand in such a way that the finger forces and grasp points achieve the desired external wrench, and all three fingers reach the grasp points with a maximum range of motion for the desired trajectory. In order to demonstrate this procedure, a cylinder with a radius of 1 cm and a height of 1 cm is grasped and turned about its longitudinal axis in the absence of gravity. It is assumed that the cylinder represents a mating part which needs to be screwed onto another piece. Thus, the hand must grasp the cylinder and exert a moment of 0.1 Nm about the z-axis. A friction coefficient μ = 0.80 was assumed for this example. Table 4 displays the results of the grasp point and finger forces optimization. Using these data, the hand orientation optimization was then carried through and found an optimal separation α = 82 mm and orientation β = -45.0°. Figure 8 shows the cylinder being grasped by the TUM Hand. The maximization of the range of motion can be clearly seen. The manipulation represents a turning of the cylinder in a clockwise motion. As can be seen, the fingers grasp the object in such a way that a large clockwise motion is possible before the fingers reach their limits.
                 Finger 1   Finger 2   Finger 3    Sum
    ξ (rad)        0.000      2.095     -2.095
    ζ (mm)         0.000      0.000      0.000
    fn (N)        -4.444     -4.444     -4.444
    ft1 (N)        3.333      3.333      3.333
    ft2 (N)        0.000      0.000      0.000
    μ_nec          0.750      0.750      0.750
    fx (N)        -4.445     -0.663      5.109     0.000
    fy (N)         3.332     -5.516      2.183     0.000
    fz (N)         0.000      0.000      0.000     0.000
    mx (Nm)        0.000      0.000      0.000     0.000
    my (Nm)        0.000      0.000      0.000     0.000
    mz (Nm)        0.033      0.033      0.033     0.100

Table 4: Cylinder being turned in the absence of gravity (S = 0)

Figure 8: Cylinder being turned in the absence of gravity (S = 0)
5 Regrasping the Object
In order to perform more complicated manipulations, it is often necessary to change the grasp of the hand on the object. The method most often used by humans is the regrasping of the object within the hand. Typical examples are the turning of a pen or a ball within the hand. During the regrasping phase of the manipulation, a free finger makes contact with the object at a new point and thus establishes a new grip. One of the other fingers is then free to release the object and can in turn be used to establish yet another grip on the object. With this regrasping, the grip of the hand on the object can be changed without having to release the object. The object remains firmly held in the hand during the entire manipulation. Of course it is necessary to remember that three fingers are needed to form a stable grasp. Thus, in order to perform a manipulation with regrasping, a hand with at least four fingers is required. The fourth finger of the TUM Hand is located directly opposite the thumb, between the other two fingers. Although the regrasping of an object within the hand plays such an important role in human manipulations, it has not received much attention in robotic manipulations. A short formulation of the regrasping problem is presented in [5]. However, methods which propose a corresponding solution to the problem are lacking. In order to make robotic regrasping possible, a strategy is presented here which not only calculates the finger forces necessary to generate the desired external wrench, but also finds a suitable grasp point at which the new finger should grasp the object.
5.1 The Optimization Problem
During the observation of human regrasping, it becomes clear that the regrasping represents the change from one grip to another, in which the existing contact points of the fingers remain unchanged while a new contact point is established by the free finger. Only after this contact is established can another finger release its grasp on the object. The calculation of the finger forces and the new contact point can be described as an optimization problem with 3n + 2 unknowns: the 3n finger force components fn_i, ft1_i, ft2_i and the two parameters ξ_u and ζ_u for the position of the new grasp point. The new finger forces have to satisfy the equilibrium, contact, and friction conditions so that the fingers exert the desired external force on the object without slipping. In addition, the grasp point for the new finger has to satisfy the reachability, penetration and collision avoidance conditions. As in the previous optimization problems, the grasp is considered optimal when the difference between the magnitudes of the finger forces reaches a minimum. Table 5 summarizes the optimization problem.
Optimization Criterion:

    G = Σ_{i=1}^{n} Σ_{j=1}^{n} ( |f_i|² - |f_j|² )²  →  min

Necessary Conditions:

    Force Equilibrium      Σ_{i=1}^{n} ( fn_i + ft_i ) - F_e = 0
    Moment Equilibrium     Σ_{i=1}^{n} r_i × ( fn_i + ft_i ) - M_e = 0
    Contact                fn_i · n_i < 0
    Friction Cone          |ft_i|² - μ² |fn_i|² < 0
    Grasp Point            Reachability, Penetration, Collision Avoidance
    Rectangular Solid      ε - |x_1| / a_1 ≥ 0,   ε - |x_2| / a_2 ≥ 0

Table 5: Optimization with Regrasping

The first four conditions and the edge avoidance condition for rectangular solids are the same as those discussed in Section 3. The last three conditions are geometrical ones, which means that not only the desired external wrench and the object geometry have to be known, but also the entire geometry of the hand. This includes the geometry of the fingers, the arrangement of the fingers on the palm, and the position and orientation of the object within the hand. Because this geometry is different for each hand, only general principles which can be applied to every hand will be discussed here. (For a description
of the TUM Hand, which is modelled in the example, please see [7].) For the following discussion, it is assumed that the geometrical values listed in Table 6 are known.
    Hand geometry:
        H_r_HOi    Position of the i-th finger on the palm
        R_HOi      Orientation of the i-th finger on the palm
    Object position and orientation:
        H_r_HK     Position of the object in the hand system
        R_HK       Orientation of the object relative to the hand

Table 6: General geometrical values
Table 6: General geometrical values R e a c h a b i l i t y 9 If the regrasping is t.o be successful, the free finger lnust be abh: to reach the new grasp point. This reachability must be continually checked during tlle search for a suitable grasp point r K u for the free finger u. \Vith the position the grasp point, the position of the fingertip whic]~ would be necessary to establish contact at tlLis point can be calculated as follows:
    Ou_r_Osu = R_HOu⁻¹ ( R_HK K_r_Ku + H_r_HK - H_r_HOu )          (8)
With the inverse finger kinematics it is then possible to determine whether the finger can reach this grasp point. If the point cannot be reached, it is unsuitable for regrasping.

Penetration. The free finger must not penetrate the object in order to reach the new contact point. With the position of the fingertip calculated in equation (8) and the finger angles obtained through the inverse finger kinematics, the positions of critical points along the finger, Ou_r_Ocrit, can be calculated in the object's coordinate system as follows:
    K_r_Kcrit = R_HK⁻¹ ( H_r_HOu - H_r_HK + R_HOu Ou_r_Ocrit )     (9)
Now it can be checked whether any of the critical points lie within the object's surface. If a point lies within the surface, the point is unsuitable for regrasping, because the finger would have to penetrate the object in order to reach it.

Collision Avoidance. If the finger can reach the grasp point without penetrating the object, it still remains to be checked whether a collision danger exists, because a successful regrasping is only possible if the fingers do not disturb one another. The position of the new grasp point has to be chosen in such a way that it does not lead to collisions. First the separation between the new grasp point and all existing grasp points is checked, in order to be sure that small errors in the finger positions do not lead to collisions. If the minimum separation is guaranteed, the finger cross-over has to be checked. For this purpose a method was developed which uses the projection of the fingers onto the palm. If the projection of the new finger crosses over, or comes too close to, the projection of another finger on the palm, the danger of a collision is very high and the grasp point is deemed unsuitable. Using the following equation, the position of each grasp point found in equation (8) can be transformed into the hand coordinate system:
    H_r_Osi = R_HOi Oi_r_Osi                                       (10)
By considering the projection of this vector onto the plane of the palm, the collision danger can be determined. If the fingers must cross one another, the chance of collision is too high, and the point is considered unsuitable for regrasping.

Solution. Because this optimization problem is of the same form as the grasp optimization problem, the same 'Successive Quadratic Programming' method can be used to solve it. However, in this case, the success of the optimization depends strongly on the initial point. In order to achieve good convergence, a suitable starting point must be found before the optimization. The reachable point which achieves the most stable grasp (defined after the stability measure from Section 3.0.1) is used as the starting point for the optimization. As before, when grasping a rectangular object, the grasp points must remain within the allowed axis length ε, so that they do not come too close to the edges. This optimization delivers not only the finger forces for the new grasp, but also a suitable new grasping point.
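The three geometric tests reduce to the frame changes of equations (8)-(10); a sketch (rotation inverses taken as transposes, since the R matrices are rotations; the function and variable names are hypothetical):

    import numpy as np

    def fingertip_for_grasp_point(R_HOu, R_HK, K_r_Ku, H_r_HK, H_r_HOu):
        """Eq. (8): fingertip position, in the frame of the free finger u,
        needed to establish contact at the candidate grasp point K_r_Ku."""
        return R_HOu.T @ (R_HK @ K_r_Ku + H_r_HK - H_r_HOu)

    def critical_point_in_object_frame(R_HK, H_r_HOu, H_r_HK, R_HOu, Ou_r_crit):
        """Eq. (9): a critical point along the finger expressed in the
        object frame, where it can be tested against the object's surface."""
        return R_HK.T @ (H_r_HOu - H_r_HK + R_HOu @ Ou_r_crit)

    def palm_projection(R_HOi, Oi_r_Osi):
        """Eq. (10) followed by projection onto the palm plane (assumed
        z = 0 in the hand frame) for the finger cross-over test."""
        return (R_HOi @ Oi_r_Osi)[:2]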
5.2 Results
In order to demonstrate the effectiveness of this method, an example is presented in which the hand holds an ellipsoid (a_x = 13 mm, a_y = 16 mm and a_z = 25 mm) against gravity and rotates it about the vertical axis. The hand has to exert an external vertical force of 1.4 N in the z-direction and a moment of -0.03 Nm about this axis. A coefficient of friction μ = 0.55 is assumed. The orientation and position of the object within the hand are also known. The fourth finger should grasp the object and replace the second finger. Table 7 shows the results of the optimization of the finger forces before the regrasping, while Table 8 shows the results of the regrasping optimization. The first data group in the tables shows the values for the unknowns which the optimization delivers. During the optimization of the finger forces before the regrasping, the geometric parameters for the grasp points (ξ_i and ζ_i) are given. During the regrasping optimization these two parameters for the fourth finger are unknown and must be determined by the optimization. The finger forces for the fingers which grasp the object are also determined by the optimization. The other data groups show values which are calculated from the optimized values. As both tables show, the fourth finger is not used before the regrasping. During the regrasping phase, the fourth finger is used to replace the second finger's grasp on the object. The hand uses a new grasp point (and a new finger) to exert the external force on the object. The optimization computes finger forces which accurately achieve the desired external force. One can also see that no finger assumes too large a share of the total load, and that all the finger forces remain within the friction cone (μ_nec < μ). During the regrasping, the load is not distributed as well as before. Because the free finger has to be able to reach the new grasp point without penetrating the object or disturbing the other fingers, the number of possible grasp points is reduced. This reduces the number of possible finger force combinations, which in turn leads to a less ideal distribution of the total load. Figure 9 shows the results for this example. The grasp used before the regrasping is shown at the top and the grasp after the regrasping at the bottom. The point during the
regrasping at which all four fingers briefly contact the object is shown in the middle. This figure shows clearly that the reachability, penetration, and collision avoidance criteria are all fulfilled. The new grasp point is chosen such that the fourth finger replaces the second finger without penetrating the object or disturbing the other fingers.
                 Finger 1   Finger 2   Finger 3   Finger 4    Sum
    ξ (rad)       -2.094      0.000      2.094       -
    ζ (rad)        0.000      0.000      0.000       -
    fn (N)        -2.677     -2.858     -2.950       -
    ft1 (N)       -1.245     -0.694     -0.121       -
    ft2 (N)        0.443      0.513      0.443       -
    |f| (N)        2.985      2.985      2.985       -
    μ_nec          0.494      0.302      0.156       -
    x (m)         -0.007      0.013     -0.007       -
    y (m)         -0.013      0.000      0.013       -
    z (m)          0.000      0.000      0.000       -
    fx (N)         0.825     -2.858      2.033       -        0.000
    fy (N)         2.835     -0.694     -2.141       -        0.000
    fz (N)         0.443      0.514      0.443       -        1.400
    mx (Nm)       -0.006      0.000      0.006       -        0.000
    my (Nm)        0.003     -0.007      0.004       -        0.000
    mz (Nm)       -0.011     -0.009     -0.010       -       -0.030

Table 7: Finger Force Optimization: Before Regrasping

                 Finger 1   Finger 2   Finger 3   Finger 4    Sum
    ξ (rad)       -2.094       -        2.094     -0.593
    ζ (rad)        0.000       -        0.000      0.628
    fn (N)        -4.367       -       -4.632     -4.497
    ft1 (N)       -0.918       -        0.958     -1.603
    ft2 (N)        1.957       -        1.172     -0.977
    |f| (N)        4.873       -        4.873      4.873
    μ_nec          0.495       -        0.327      0.417
    x (m)         -0.007       -       -0.007      0.011
    y (m)         -0.013       -        0.013     -0.007
    z (m)          0.000       -        0.000      0.009
    fx (N)         2.184       -        2.328     -4.512      0.000
    fy (N)         3.891       -       -4.117      0.226      0.000
    fz (N)         1.955       -        1.170     -1.725      1.400
    mx (Nm)       -0.025       -        0.015      0.010      0.000
    my (Nm)        0.015       -        0.009     -0.024      0.000
    mz (Nm)       -0.001       -        0.001     -0.030     -0.030

Table 8: Optimization with Regrasping
Figure 9: Regrasping an Ellipsoid
6 Conclusion
This paper provided a brief look into three aspects of the high-level grasp strategies developed at the Lehrstuhl B für Mechanik, which represent the first steps towards the goal of complete grasp automation: the calculation of the finger forces and grasp points, the placing of the object within the hand, and the regrasping of the object within the hand. If an object is to be manipulated effectively, both the finger positions and the finger forces necessary to exert the desired force on the object must be determined. The strategy presented not only calculates the finger forces necessary to manipulate the object, but also provides points of variable stability where the fingers should grasp the object. Example grasps were shown and discussed. Once the finger forces and grasp points are known, the hand must be positioned and oriented so that the fingers can reach their assigned grasp points. A method was presented which places the object within the hand such that all grasp points are reachable and the fingers have the maximum range of movement to complete the desired task. Many manipulations require that the hand change its grasp on the object. A method was presented which plans the regrasping of the object within the hand. A free finger grasps the object at a new point and thus establishes a new grip. Another finger can then break its contact with the object and is free to perform further regrasping. During the entire regrasping process, the object is held firmly within the hand.
Acknowledgement
Funding for this work was provided by the German Ministry for Research and Technology (BMFT) through the SEKON Project (Grant: 01 IN 104 D/7).
References
[1] Biggers K.B., S.C. Jacobsen and G.E. Gerpheide. "Low-Level Control of the Utah/MIT Dextrous Hand". Proc. of IEEE Int. Conf. on Robotics and Automation, April 1987.
[2] Chen Y.-C. and I.D. Walker. "An Analysis of Squeezing and Twisting for Multifingered Grasping". Proc. of IEEE Int. Conf. on Robotics and Automation, Atlanta, Georgia, May 1993.
[3] Kerr J. and B. Roth. "Analysis of Multifingered Hands". Int. Journal of Robotics Research, 4(4):3-17, Winter 1986.
[4] Kerr J. and B. Roth. "Special Grasping Configurations with Dextrous Hands". Proc. of IEEE Int. Conf. on Robotics and Automation, April 1986.
[5] Li Z., J.F. Canny and S.S. Sastry. "On Motion Planning for Dexterous Manipulation, Part I: The Problem Formulation". Proc. of IEEE Int. Conf. on Robotics and Automation, May 1989.
[6] Li Z., P. Hsu and S.S. Sastry. "Grasping and Coordinated Manipulation by a Multifingered Robot Hand". Int. Journal of Robotics Research, 8(4):33-50, 1989.
[7] Menzel R., K. Woelfl and F. Pfeiffer. "The Development of a Hydraulic Hand". Proc. of the 2nd Conference on Mechatronics and Robotics, pages 225-238, Duisburg/Moers, Germany, September 1993.
[8] Omata T. "Fingertip positions of a multifingered hand". Proc. of IEEE Int. Conf. on Robotics and Automation, May 1990.
[9] Park Y.C. Grasping and Manipulation of an Object using a Multifingered Robot Hand. PhD thesis, University of New Mexico, May 1990.
[10] Salisbury J.K. Kinematic and Force Analysis of Articulated Hands. PhD thesis, Stanford University, May 1982.
[11] Woelfl K. and F. Pfeiffer. "Grasp Strategies for a Dextrous Robotic Hand". Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Munich, Germany, September 1994.
[12] Yoshikawa T. and K. Nagai. "Manipulating and Grasping Forces in Manipulation by Multi-Fingered Hands". Proc. of IEEE Int. Conf. on Robotics and Automation, pages 1998-2004, Raleigh, March 1987.
[13] Yoshikawa T. and K. Nagai. "Manipulating and grasping forces in manipulation by multifingered robot hands". IEEE Transactions on Robotics and Automation, 7(1):67-77, February 1991.
Motion Scheme for Dextrous Manipulation in the Intelligent Cooperative Manipulation System--ICMS

M. Buss^a and H. Hashimoto^b

^a Department of Automatic Control Engineering, Technical University of Munich, Theresienstrasse 90, D-80333 Munich, Germany, E-mail: [email protected]

^b Institute of Industrial Science, University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan, E-mail: [email protected]

This paper proposes a motion scheme for dextrous manipulation for the Intelligent Cooperative Manipulation System--ICMS. The key idea of this motion scheme is the hand manipulation skill model used for skill acquisition, transfer and manipulation assistance in the ICMS. The proposed skill mapping maps from desired object task trajectories to finger contact point locations and can therewith describe complex manipulation tasks, where contact states of the fingers may change during the task. Characteristics of this skill mapping are discussed, and a scale transformation increases its applicability for a change in object scale. Based on the motion scheme for dextrous manipulation, a skill example of pen rotation by four fingers is acquired from the human operator, and the acquired skill is transferred to a four-fingered dextrous robotic hand with a different kinematic structure and size than the human hand. Replacing the complex input device attached to the human hand by a much simpler position and orientation sensor is the idea of the manipulation assistant, also shown in this paper. The human supervisory operator inputs desired object position and orientation, and the ICMS assists in that the references for the dextrous hand are generated by the proposed motion scheme.

1. INTRODUCTION

The authors have been proposing the Intelligent Assisting System--IAS [1,2], and as a first step towards the IAS the Intelligent Cooperative Manipulation System--ICMS for human-machine cooperative manipulation. The ICMS uses a 10 degrees-of-freedom force feedback sensor glove device connected to a virtual reality including a dynamic force simulator. The human operator can interact with objects in this artificial environment, and data describing the manipulation task can be measured. This paper briefly shows the system structure of the ICMS and discusses hand manipulation skill modeling for acquisition and transfer in the motion scheme for dextrous manipulation. Dextrous manipulation using robotic hands is well studied. Salisbury has described some of the basic issues of interest [3], Kerr and Roth [4] and Yoshikawa et al. [5,6] have proposed schemes to solve for internal grasping forces subject to friction and other constraints, and Li et al. [7] and Yoshikawa et al. [8] have developed control strategies. Other
problems of sensing and hardware realization of dextrous robotic hands are issues of ongoing research, see e.g. [9]. Preshaping of the hand to prepare for manipulation task execution can be observed in humans reaching for objects to be grasped. Power, precision, force closure and other ways of grasp prehension were observed [10-12]. One important and less understood issue is how to actually generate desired trajectories for dextrous manipulation tasks. Some results using primitive based hierarchical control strategies were shown by Speeter [13] and Stansfield [14]. This paper develops the idea of a manipulation skill mapping to achieve finger contact point locations for given object task trajectories. It is noted that this can also be used during the preshaping stage, but the ideas in this paper focus mainly on time-dependent or task-dependent grasp transitions during task execution after an initial grasp has been accomplished. In Section 2 the Intelligent Cooperative Manipulation System--ICMS is briefly reviewed. Section 3 develops a motion scheme for dextrous manipulation, defines the manipulation skill mapping and shows the process from desired object task trajectory to contact wrench intensity trajectories. Applicability of the proposed skill mapping is discussed for object scale changes and a manipulation example of pen rotation is shown. Section 4 discusses how manipulation skill can be acquired, which is then transferred to an executing dextrous robotic hand in Section 5. Section 6 shows grasping assistance as an example of human operator assistance by the ICMS.

2. INTELLIGENT COOPERATIVE MANIPULATION SYSTEM--ICMS
This section briefly describes the ICMS, consisting of a sensor glove device for force feedback to the human operator, and a virtual reality including a Dynamic Force Simulator (DFS) which calculates the feedback forces to the operator (see [15] for details about the DFS). The most important characteristics of this system are the 3 information and power (IP) flows illustrated in Figure 1.

IP Flow 1--Skill Acquisition: The human operator wears a sensor glove device enabling interaction with objects in the virtual world while getting appropriate force feedback. Visual feedback information from the virtual world graphic animation module is utilized by the operator for task goal verification and overall environment perception. Measurements during manipulation by the human operator yield data for acquisition into the manipulation skill data base of the ICMS.

IP Flow 2--Skill Transfer: The acquired skill now available in the skill data base is used for more sophisticated control of the executing dextrous robotic hand.

IP Flow 3--Assisting Manipulation: The ICMS anticipates human motion and, if appropriate skill is available in the data base, assists the human operator in executing the task. In this paper an example of grasping assistance is shown to demonstrate cooperative work by human operator and ICMS.

The following section proposes a motion scheme for dextrous manipulation, also modeling hand manipulation skill. Using this motion scheme the three IP flows of skill acquisition, transfer and assisting manipulation can be realized, which is then shown in subsequent sections.
Figure 1. Intelligent Cooperative Manipulation System--ICMS.

3. MOTION SCHEME FOR DEXTROUS MANIPULATION
Motion of an object in the environment can be described by its position and orientation vector as a function of time. This is common practice for many problems in the field of robotics. In case this motion is subject to physical laws like gravitational acceleration and moments of inertia, differential equations of motion can be derived easily. Complexity of these equations increases for constrained environment conditions opposing or inducing object motion due to geometric or other constraints. One central objective is to make objects follow reference trajectories of position and orientation given by a global task planner or human teleoperator under varying environment conditions. The terms task and goal planning describe the operation of deriving such object reference trajectories, which is a rather unconscious undertaking for the human but a major effort of research in robotic systems. For redundant manipulators and dextrous robotic hands this problem has no simple straightforward solution, since there is an infinite number of solutions due to redundancy. This chapter deals with a motion scheme for dextrous manipulation useful to derive a set of task trajectories for dextrous hands from desired object motion, assumed to be given by a global task planner or, in the case of the ICMS, by the human operator. After a task definition given in the next section, a model for the performed manipulation task is shown and its characteristics are discussed.

3.1. Task Definition
Figure 2 shows a hierarchical model for task planning and its execution. First there generally is an intention and motivation of what task to execute next, then the objects of the task objectives are located in the environment and a task trajectory is planned. Here various environment conditions can change the resulting task trajectory, such as e.g. a requirement of avoiding obstacles in the direct way to the goal position of the task object, or other relations between several objects.
Figure 2. Hierarchy of task and path planning during manipulation.
To realize a specific task in the environment is considered equivalent to moving the task objects along a desired path. The manipulation task can then be defined by this object trajectory in space and time.

Definition 3.1 (Manipulation Task) A manipulation task of an object is described by its task trajectory, the position and orientation vector p(t) ∈ IR⁶.

Often this task trajectory is assumed to be given by a task and goal planner in the context of dextrous manipulation. For the ICMS it can also be given directly by the human operator interacting with the system. To show that Definition 3.1 actually describes much more than only object position and orientation, the following well known
relations to the velocity and force domain are given. The object velocity can be written in terms of p as

    [ v ; ω ] = Ψ(p) ṗ                                             (1)

where Ψ(p) ∈ IR^{6×6} is an appropriate transformation matrix. Differentiation of (1) with respect to time yields the linear and angular acceleration of the object

    [ v̇ ; ω̇ ] = Ψ̇(p) ṗ + Ψ(p) p̈                                    (2)

The position and orientation trajectory p(t) can be related to the force domain f(t) using the Newton-Euler equations of the object as

    [ m̂ 0 ; 0 J ] [ v̇ ; ω̇ ] + [ 0 ; ω × Jω ] = f + f_ext + [ m̂g ; 0 ],    (3)

where m̂ ∈ IR^{3×3} is a diagonal matrix with the object mass m in the diagonal elements, J ∈ IR^{3×3} is the object inertia tensor, f_ext ∈ IR⁶ is the generalized external force acting on the object and g ∈ IR³ is the gravitational acceleration vector. It is noted here that geometric and other environment constraints can be included in the external force f_ext.
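Solving (3) for the wrench f that the hand must apply is straightforward; a small NumPy sketch under the definitions above (function name hypothetical):

    import numpy as np

    def required_wrench(m, J, v_dot, w, w_dot, f_ext,
                        g=np.array([0.0, 0.0, -9.81])):
        """Solve eq. (3) for the wrench f the fingers must apply:
        f = [m v_dot; J w_dot + w x (J w)] - f_ext - [m g; 0]."""
        force = m * v_dot - f_ext[:3] - m * g
        moment = J @ w_dot + np.cross(w, J @ w) - f_ext[3:]
        return np.concatenate([force, moment])

    # Holding a 0.1 kg object still (all rates zero, no other external
    # load) requires exactly the weight-compensating wrench
    # [0, 0, 0.981, 0, 0, 0]:
    f = required_wrench(0.1, np.eye(3) * 1e-4,
                        np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(6))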
3.2. Definition of Motion Skill--MSK
In the following, motion skill is defined according to related results from cognitive science, see e.g. [16-19]. Other engineering approaches to human skill modeling or robot programming by human demonstration can be found in [20-23].

Definition 3.2 (Motion Skill--MSK) Motion skill (MSK) is defined as the ability to derive motor actuation commands from perceived environment states, assuming a task goal known from intention and task planning.

This definition of motion skill is based on and closely related to the skill-based human behavior proposed by Rasmussen [16], in that for most skilled sensory-motor tasks the body acts as a multivariable continuous control system synchronizing movements with environment states, see Figure 3. In summary, from several works of cognitive science one can conclude that skill-based performance is the result of processing (physical) signals perceived as time-space signals, continuous quantitative indicators from sensors to control the body actuators. These signals have no "meaning" or significance except as direct physical time-space data. At this stage there is no symbolic processing like e.g. symbolic environment state perception or symbolic goal planning, which is used and processed in the rule-based behavior level [16].
3.3. Hand Manipulation Skill Model--HMSK
In this section hand manipulation skill (HMSK) is defined as the ability to decide finger-tip positions l(t) for grasping from object task trajectories p(t). It basically makes no difference for this model where the data comes from, i.e. finger-tip locations could be
Figure 3. Motion skill (MSK) model as sensory-motor behavior.

calculated analytically as the solution of e.g. manipulability optimization, or be learned from a human operator in the case of the ICMS. As a more specific form of motion skill (MSK), the following model of hand manipulation skill (HMSK) for dextrous hands is defined.

Assumption 3.1 The desired object trajectory p_d(t) ∈ IR⁶ as of Definition 3.1 is given by a manipulation task planner or a human teleoperator.

Assumption 3.2 Object shape and center of mass are known.

Definition 3.3 (Skill Mapping--SM) The skill mapping SM is defined as

    p_d(t)  --SM-->  l(t),                                         (4)

where p_d(t) ∈ IR⁶ is the object reference trajectory, l(t) = [ l_1^T ... l_n^T ]^T ∈ IR^{3n} are the contact point positions of the n fingers used for manipulation, and l_i ∈ IR³ is a vector from the object center of mass to the i-th contact point.
Figure 4 shows the skill mapping and other essential mappings involved in dextrous manipulation tasks. The skill mapping SM gives the relation between the desired object trajectory p_d(t) and the finger-tip contact point locations, which can be used to directly compute the grip transform W(t), assuming the object shape is known. Once the grip transform has been found by SM, the second problem is to transfer this into reference trajectories for joint angles and torques in manipulating hand joint space. Solving the inverse hand kinematics from the finger-tip trajectories yields trajectories in hand joint space realizing the desired finger-tip positions; however, the problem of appropriate internal grasping force trajectories remains. Optimization for appropriate internal forces (IFO) can be solved as a nonlinear or linear programming problem, and a variety of approaches to solve the IFO can be found in the literature [24-26,4,27-33].
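The pipeline of Figure 4 can be read as a composition of mappings; a structural sketch in Python, where all callables are hypothetical placeholders for SM, the grip transform, IFO and HIK (not an implementation from the paper):

    from typing import Callable, Iterable, Iterator, Tuple
    import numpy as np

    SkillMapping = Callable[[np.ndarray], np.ndarray]   # p (6,) -> l (3n,)

    def dextrous_references(p_traj: Iterable[np.ndarray],
                            sm: SkillMapping,
                            grip_transform: Callable,
                            ifo: Callable,
                            hik: Callable) -> Iterator[Tuple]:
        """For each desired object pose p_d, derive hand joint references
        and contact wrench intensities (structure only, no control law)."""
        for p in p_traj:
            l = sm(p)                   # finger contact point locations
            W = grip_transform(l)       # unit contact wrenches in frame B
            c = ifo(W, p)               # internal force optimization (IFO)
            theta = hik(l, p)           # hand inverse kinematics (HIK)
            yield theta, c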
Figure 4. Manipulation skill mapping SM.

Remark 3.1 Note that the skill mapping SM does not depend on the kinematic structure of the manipulating hand and is therewith quite general, but naturally the resulting contact point locations l_i need to be within the operable workspace of the corresponding finger i.

Remark 3.2 This skill mapping SM maps from the 6-dimensional object position and orientation space to the 3n-dimensional contact point space. The mapping is nontrivial, difficult to solve analytically, and can include time variant contact conditions and also timely past and/or future steps during manipulation.

3.4. Skill Mapping Applicability
The skill mapping of Definition 3.3 assumes exact knowledge of the object shape. This assumption can be relaxed by transforming the desired contact point l resulting from SM for differently scaled objects. Therewith the same skill mapping SM can be used for larger and smaller objects within the same shape class, which increases the applicability and robustness of the modeled manipulation skill.

Definition 3.4 (Scale Transformation) To transfer the acquired contact point locations l to new locations l' for a differently scaled object, a scaling matrix E ∈ IR^{3×3} is defined as
    l'_i = E l_i,                                                  (5)
where i = 1, ..., n designates finger i of n manipulating fingers.
Remark 3.3 To increase the robustness during execution of the modeled manipulation skill, this scale transform might be included in the control algorithm through an adaptive mechanism, which will be a topic of future research.
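The scale transformation of Definition 3.4 is a per-contact-point linear map; a minimal sketch (illustrative, using the scale factors of the pen example treated later in Section 5.1):

    import numpy as np

    def scale_contacts(l, E):
        """Definition 3.4: map acquired contact points l_i (rows of l)
        to a differently scaled object of the same shape class,
        l'_i = E l_i."""
        return l @ E.T

    # Pen example: length scaled by 0.5/0.15, radius by 0.02/0.004, cf. (7).
    E = np.diag([0.5 / 0.15, 0.02 / 0.004, 0.02 / 0.004])
    l_new = scale_contacts(np.array([[0.02, 0.004, 0.0]]), E)   # one point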
3.5. Characteristics of SM
SM, looked upon as a static mapping from object position and orientation p, solely defines how the object should be grasped in its workspace. It is considered compact, because
1. it is independent of hand kinematics (Remark 3.1),

2. of the scale transformation for object size variance (Definition 3.4),

3. it does not depend on a specific task trajectory in time or task execution speed, i.e. once SM is learned it is applicable for arbitrary task trajectories (see grasping assistance in Section 6, where an arbitrary trajectory is given by a human operator),

4. rolling or sliding contacts are implicitly modeled. Comparing finger-tip positions between two time-instants can be used to select constraints for the programming problem to solve for internal forces resulting in sliding or rolling contact, and further can be used to select an appropriate mode in the executing controller,
5. SM can also model preshaping strategies, i.e. before contact is achieved with the object, finger-tip contact locations (virtual contact) describe the preshape,

6. it also defines impedance. Finger-tip positions depend on object position and orientation, and are realized by the executing controller. References to the controller are also the contact wrench intensities as the result of internal force optimization (IFO). Object impedance defined in the object reference coordinate system depends therewith on SM, IFO and the dextrous hand controller gains [34].

A more general SM depends also on time, l = SM(p, t), and has the following additional characteristics:

1. it may change the type of grasp drastically depending on execution speed. This can be understood by a simple example of a skill mapping SM(p, ṗ) depending on task trajectory and speed intervals as
    SM(p, ṗ) = SM_1(p)   for ṗ ∈ [ṗ_0, ṗ_1)
               SM_2(p)   for ṗ ∈ [ṗ_1, ṗ_2)
               ...                                                 (6)
               SM_n(p)   for ṗ ∈ [ṗ_{n-1}, ṗ_n)

Nevertheless, the static SM also works for differing execution speeds, but does not change the grasp state and type drastically.

3.6. Example
This example shows how the grip transformation matrix W can be derived from finger contact point locations l = SM(p).
Consider the case of a pen shown in Figure 5 and a soft finger contact model with the following unit wrenches in the contact frame C = C_1:
    C_w_{1,1}^T = [ 1 0 0 0 0 0 ]
    C_w_{1,2}^T = [ 0 1 0 0 0 0 ]
    C_w_{1,3}^T = [ 0 0 1 0 0 0 ]
    C_w_{1,4}^T = [ 0 0 0 0 0 1 ]
This yields the grip transform W as these unit wrenches transformed into the coordinate frame B fixed at the center of mass of the object
    W = [ B_w_{1,1}  B_w_{1,2}  B_w_{1,3}  B_w_{1,4} ] =

        [  1     0     0    0 ]
        [  0     1     0    0 ]
        [  0     0     1    0 ]
        [  0    -r     l_1  0 ]
        [  r     0     0    0 ]
        [ -l_1   0     0    1 ]
Figure 5. Example of contact force for a pen-like object.
4. HMSK ACQUISITION
Hand manipulation skill (HMSK) can be acquired according to Definition 3.3 by learning the skill mapping parameters. There are a number of ways this can be done:

1. Learning SM from a human operator: Using a sensory device attached to the hand of a human operator, giving exact knowledge about finger contact locations and object position and orientation, data describing the skill mapping can be measured [1,35].

2. Using a telerobotic system to implicitly get the required data for the slave robot on which the manipulation skill is to be implemented later.

Recently, methods for learning SM, in particular an artificial neural network and a memory-based learning approach, have been investigated [36].
Figure 6. Approximated finger contact point locations as acquired from the human operator wearing a VPL data glove, depending on the rotational angle φ of the object.
Figure 7. Finger joint angles of the dextrous 24 degree-of-freedom robotic hand used for execution of pen rotation manipulation skill.
4.1. Acquisition Example For a simple manipulation example of rotating a pen using 4 fingers, experimental data from a human operator wearing a VPL dataglove were acquired. The measured data were filtered, interpolated, and then approximated by a 5 element fourier series for the /x-component of the contact point and linear approximation for the ly- and/z-components, which yields a rather compact representation of the acquired manipulation skill. Figure 6 shows contact point locations for two of the four fingers depending on the rotational angle 6 of the pen. The task was carried out with approximately pa = [ rx ry r~ 0 0 ~ ]T (as exactly the human operator can handle this requirement; errors for the two zero orientation angles where neglected), where r=, ry, rz is the location of the pen center of mass and [ 0 0 ~ ]T are the RPY-angles giving the orientation of the pen (radius rp - 0.004 m, length lp - 0.15 m).
289
Figure 8. Illustration of the manipulation task execution by the 24 degree-of-freedom robotic hand (r denotes the rotational angle of the object).
5. M S K T R A N S F E R
AND EXECUTION
Once the skill mapping S M has been acquired as described in Section 4 solving inverse kinematic equations of the robotic hand structure on which the manipulation task is to be executed yields trajectories for the robotic hand in hand joint space. Further the contact wrench intensities c(t), fint(t) as the solution of the internal grasping force programming problem (IFO) need to be realized by the control law for the robotic hand. Such control algorithms are proposed in the literature [2,37,8]. 5.1. Transfer E x a m p l e The acquired manipulation task as shown in the example in section 4.1 can be transferred for execution by a robotic hand. Assuming a kinematic structure of a 4-fingered 24 degrees-of-freedom hand as developed by Namima et al. [38], which is quite larger in scale than the human hand, the acquired manipulation skill must be transformed to this
290 larger scale. The pen to be handled by the robotic hand is assumed to have a radius of rp - 0.02 m and a length of lp - 0.5 m. To transfer the acquired contact point locations the scale transformation of definition 3.4 is used, where !
!
!
!
i
lp --,rP _rp ) - d i a g ( = diag( ~ , rp rp
3.33, 5, 5 ) ,
(7)
assuming the x-axis of the pen center of mass coordinate frame is in the direction of the length. If a soft finger contact model with 4 degrees-of-freedom is assumed between finger-tip and object the grip transformation matrix W follows immediately as
W-
0
1
0
0 ...
-sign(lj,y) 0
0 0
0
0 1
lj,,,
0 ... 0 ...
0 -sign(lj,y)lj,x
0 -lj,y
-lj,x 0
0 ... 1 ...
0
0 ...
'
(8)
where sign(x) = - 1 for x < 0, sign(x) = 1 for x > 0, lj,x, lj,y are the x- and y-components of lj, respectively, and j are the fingers is contact with the object. Then the programming problem (IFO) is solved yielding wrench and internal force intensities subject to friction and other constraints. The resulting joint angle trajectories in hand joint space depending on object rotational angle r are shown in Figure 7. Figure 8 visualizes how the manipulation skill of pen rotation acquired from the human is executed.
Object orientation reference (deg)
180 160 140
Polhemus + 3D_Tracker~
object posmon/ orientation
Z
.
~...._~_.1..-.--
x "--
.
p d(k) ---. y
contact state, grip transform
__~ S M ~---~ l(k) W(k) Skill
Mapping
120 100 80 60 40
~o/ t~
u(
1
2
3
time (s)
4
5
6
Figure 9. Grasping assistance using SM. Figure 10. Rotational angle of the object commanded by the human operator.
291 6. G R A S P I N G A S S I S T A N C E It is possible to use the acquired hand manipulation skill mapping SM for grasping assistance. Figure 9 shows how the object task trajectory is given by a human operator using a 3D tracking device (here a Polhemus tracker). Applying S'M yields contact point location references and finally after solving for internal grasping forces hand joint torque references. Figure 10 and Figure 11 show an experimental result of grasping assistance.
0.1[
0.15
Contact point locations of finger 1 (m)
Contact point locations of finger 2 (m)
0.25 0.2 0.15
0.05
0.1
0
0.05 0
-0.05
-~ 2
-0.05
~
i
3
time (s)
~
;
Joint angles of finger 1 (rad)
;
7
-ol
1
4
2
3
4
5
time (s) Joint angles of finger 2 (rad)
6
7
6
7
1.5 1 0.5 0 -0.5
-2 time (s)
0
1
2
3 4 time (s)
5
Figure 11. Finger contact point locations and joint angles during grasping assistance.
7. CONCLUSIONS

This paper discussed the Intelligent Cooperative Manipulation System--ICMS and the underlying motion scheme for dextrous manipulation. Using the proposed skill mapping as a hand manipulation skill model is our approach to skill acquisition from human operators and skill transfer to an executing dextrous robotic hand; manipulation assistance is realized as human-machine cooperation in the ICMS. Specifically, the following issues were discussed:
- Within the motion scheme for dextrous manipulation, a discussion of motion skill in general and of hand manipulation skill in particular was given.
- The manipulation skill mapping is the proposed model for hand manipulation skill, mapping between the desired object task trajectory and finger contact point locations for changing contact states during manipulation. Some special characteristics of the skill mapping SM were discussed: SM is independent of hand kinematics, a scale transformation makes it adaptive to a change in object size, rolling or sliding contacts and the impedance imposed on the object are implicitly modeled, SM can also model preshaping strategies, and a time-dependent SM may change the grasp strategy drastically depending on execution speed.
- Acquisition of hand manipulation skill mappings from human operators was discussed and shown in an example of pen rotation by four fingers.
- The acquired manipulation skill example of pen rotation using four fingers was transferred to a 24 degree-of-freedom dextrous robotic hand of larger scale than the human hand and a larger scale object.
- Grasping assistance was shown as an example of human-ICMS cooperative work, where the complex input device attached to the human hand during skill acquisition can be replaced by a simpler position and orientation command input device, and actual grasping and control of the dextrous hand is achieved by the proposed motion scheme.

Future work will include robustness issues for manipulation skill transfer to cope with object model errors and disturbances from the environment.

REFERENCES
1. M. Buss and H. Hashimoto, "Skill acquisition and transfer system as approach to the intelligent assisting system--IAS," in Proceedings of the 2nd IEEE Conference on Control Applications, (Vancouver, British Columbia, Canada), pp. 451-456, 1993.
2. M. Buss and H. Hashimoto, "Human manipulative skill analysis and acquisition for the intelligent assisting system," in Proceedings of the 12th IFAC World Congress, International Federation of Automatic Control, vol. 9, (Sydney, Australia), pp. 321-326, 1993.
3. M. Mason and J. Salisbury, Robot Hands and the Mechanics of Manipulation. Cambridge, Massachusetts: MIT Press, 1985.
4. J. Kerr and B. Roth, "Analysis of multifingered hands," International Journal of Robotics Research, vol. 4, pp. 3-17, Winter 1986.
5. Y. Nakamura, K. Nagai, and T. Yoshikawa, "Mechanics of coordinative manipulation by multiple robotic mechanisms," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 991-998, 1987.
6. T. Yoshikawa, "Dynamic hybrid position/force control of robot manipulators--description of hand constraints and calculation of joint driving force," IEEE Journal of Robotics and Automation, vol. 3, pp. 386-392, October 1987.
7. Z. Li and S. Sastry, "Task-oriented optimal grasping by multifingered robot hands," IEEE Journal of Robotics and Automation, vol. 4, pp. 32-44, February 1988.
8. T. Yoshikawa and X. Zheng, "Coordinated dynamic hybrid position/force control for multiple robot manipulators handling one constrained object," International Journal of Robotics Research, vol. 12, pp. 219-230, June 1993.
9. S. Horikoshi, H. Hashimoto, and F. Harashima, "Identification of object parameters with robot hand," in Proceedings of the International Conference on Industrial Electronics, Control, and Instrumentation IECON, (Hawaii), pp. 45-50, 1993.
10. M. Cutkosky, "On grasp choice, grasp models, and the design of hands for manufacturing tasks," IEEE Transactions on Robotics and Automation, vol. 5, pp. 269-279, June 1989.
11. T. Iberall, "The nature of human prehension: Three dextrous hands in one," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 396-401, 1987.
12. T. Nguyen and H. Stephanou, "A computational model of prehensility and its applications to dextrous manipulation," in Proceedings of the IEEE International Conference on Robotics and Automation, (Sacramento, California), pp. 878-883, 1991.
13. T. Specter, "Primitive based control of the Utah/MIT dextrous hand," in Proceedings of the IEEE International Conference on Robotics and Automation, (Sacramento, California), pp. 866-877, 1991.
14. S. Stansfield, "Primitives, features and exploratory features: Building a robot tactile perception system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1274-1279, 1986.
15. H. Hashimoto, M. Buss, Y. Kunii, and F. Harashima, "Intelligent cooperative manipulation system using dynamic force simulator," in Proceedings of the IEEE International Conference on Robotics and Automation, (San Diego, California), pp. 2598-2603, 1994.
16. J. Rasmussen, "Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models," IEEE Transactions on Systems, Man and Cybernetics, vol. 13, pp. 257-266, May/June 1983.
17. R. Redding, "Taking cognitive task analysis into the field: Bridging the gap from research to application," in Proceedings of the Human Factors Society 34th Annual Meeting, pp. 1304-1308, 1990.
18. E. Roth and D. Woods, "Analyzing the cognitive demands of problem-solving environments: An approach to cognitive task analysis," in Proceedings of the Human Factors Society 34th Annual Meeting, pp. 1314-1317, 1990.
19. R. Zucco, C. Tausky, and G. Sutton, "Skill as proficiency: Introducing the process of task mastery."
20. S. Arimoto, "Learning for skill refinement," in Proceedings of the International Workshop on Intelligent Robots and Systems IROS, pp. 10-13, 1991.
21. N. Delson and H. West, "Robot programming by human demonstration: Subtask compliance controller identification," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems IROS, pp. 33-41, 1993.
22. B. Yang and H. Asada, "Skill acquisition for robot grinding: Learning of high-level feedback laws from teaching data," in Proceedings of the 1990 Japan-U.S.A. Symposium on Flexible Automation, pp. 649-655, 1990.
23. J. Yang, Y. Xu, and C. Chen, "Hidden Markov model approach to skill learning and its application in telerobotics," in Proceedings of the IEEE International Conference on Robotics and Automation, (Atlanta, Georgia), pp. 396-402, 1993.
24. A. Bicchi, "Optimization of robotic grasping forces," in Proceedings of the American Control Conference, 1992.
25. F.-T. Cheng and D. Orin, "Efficient algorithm for optimal force distribution--the compact-dual LP method," IEEE Transactions on Robotics and Automation, vol. 6, pp. 178-187, April 1990.
26. J. Jameson and L. Leifer, "Automatic grasping: An optimization approach," IEEE Transactions on Systems, Man and Cybernetics, vol. 17, pp. 806-814, September/October 1987.
27. C. Klein and S. Kittivatcharapong, "Optimal force distribution for the legs of a walking machine with friction cone constraints," IEEE Transactions on Robotics and Automation, vol. 6, pp. 73-85, February 1990.
28. V. Kumar and K. Waldron, "Sub-optimal algorithms for force distribution in multifingered grippers," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 252-257, 1987.
29. Y. Nakamura, K. Nagai, and T. Yoshikawa, "Dynamics and stability in coordination of multiple robotic mechanisms," International Journal of Robotics Research, vol. 8, pp. 44-61, April 1989.
30. V. Nguyen, "Constructing stable grasps in 3D," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 234-239, 1987.
31. D. Orin and S. Oh, "Control of force distribution in robotic mechanisms containing closed kinematic chains," Transactions of the ASME: Journal of Dynamic Systems, Measurement and Control, vol. 102, pp. 134-141, June 1981.
32. P. Sinha and J. Abel, "A contact stress model for multifingered grasps of rough objects," IEEE Transactions on Robotics and Automation, vol. 8, pp. 7-22, February 1992.
33. T. Yoshikawa and K. Nagai, "Evaluation and determination of grasping forces for multi-fingered hands," in Proceedings of the IEEE International Conference on Robotics and Automation, (Philadelphia, Pennsylvania), pp. 245-251, 1988.
34. M. Buss, Study on Intelligent Cooperative Manipulation. PhD thesis, University of Tokyo, Tokyo, June 1994.
35. K. Shimoga, "A survey of perceptual feedback issues in dexterous telemanipulation: Part I," in Proceedings of the IEEE Virtual Reality Annual International Symposium VRAIS, pp. 263-270, 1993.
36. M. Buss and H. Hashimoto, "Hand manipulation skill learning for the intelligent cooperative manipulation system," in Proceedings of the Intelligent Autonomous Systems Conference IAS-4, 1995.
37. Z. Li, P. Hsu, and S. Sastry, "Dynamic coordination of a multiple robotic system with point contact," in Proceedings of the American Control Conference, pp. 505-510, 1988.
38. T. Namima and H. Hashimoto, "Dextrous manipulation using a 24 DOF robotic hand," in Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems IROS, 1994.
Intelligent Robots and Systems
V. Graefe (Editor)
© 1995 Elsevier Science B.V. All rights reserved.
Designing a Behavior of a Mobile Robot Equipped with a Manipulator to Open and Pass through a Door

Keiji Nagatani and Shin'ichi Yuta
Intelligent Robot Laboratory, Inst. of Information Science and Electronics, University of Tsukuba, Tsukuba 305, Japan

An approach to passing through a doorway with a mobile robot is discussed. In this task-oriented approach, the proposed task is to realize a total behavior of opening a door and passing through the door-way with a mobile robot equipped with a manipulator. In this paper, we describe the design of such a behavior and show a simulation of the robot motion series for this task. The key issue in realizing such a behavior is the cooperation between the robot's functional sub-systems, such as the locomotion control system, the manipulator control system and the sensor systems.

1. Introduction

Research on mobile manipulators has received much attention. Existing approaches treat the redundancy of mobile manipulators [1][2][3] and analyze their dynamics [4][5]. Some approaches propose a control concept or an algorithm for a simple task, such as tracing surfaces [6]. However, few approaches have dealt directly with a complex task for the mobile manipulator. In this paper, we present a task-oriented approach for researching mobile manipulators on a specific mission. Our approach consists of the following steps.

1. Define a task for the robot to perform.
2. Design an algorithm and an architecture to realize the task.
3. Implement the algorithm and architecture on a real robot system for experimental verification of the design.
4. If problems appear in the implementation, repeat steps 2 and 3 until a satisfactory solution is found.

We define the task of this research as realizing a behavior to open a door and pass through the door-way using an autonomous mobile robot equipped with a manipulator. Considering the architecture of the robot, it is most reasonable to realize a distributed control system, as in the distributed architecture proposed in references [7][8]. The robot's task we propose here can be realized by cooperation between the manipulation module, the locomotion module and the sensor modules.
Section 2 presents a detailed definition of the task the robot will perform. In Section 3, a design based on behaviors to analyze the task is presented. A simulation result using the proposed design is shown in Section 4. In Section 5, the implementation and an experimental result of stage 4 on the real robot are reported. Future work is discussed in Section 6.

2. Definition of the Task

The problem task is opening a door and passing through a door-way using a wheeled mobile robot equipped with a manipulator. To realize this task, the robot finds the door knob, then grasps and rotates the knob with an end-effector mounted on the manipulator, pushes the door and passes through the door-way using both its locomotion system and the manipulator. Such a task depends on the size of the mobile robot, the size of the manipulator, the capabilities of the robot hardware and the condition of the environment. First, we define the assumptions and constraints concerning the environment and the robot.

2.1. Environment

The target environment is indoors, such as an office building. The surfaces of the floor are sufficiently smooth so that a wheeled mobile robot can move easily. Rooms are connected with corridors by open-sided doors which are usually closed. Every door has a rotating door knob which is centered about 5 centimeters from the opening side at a height of about 100 centimeters. When the knob is rotated, a prop in the door is released and the door can be opened by pushing or pulling it. Every door is motionless while no external force is applied to it. The size of the door is about 200 x 90 x 3 centimeters. Doors of this form and size are standard in Japanese offices. The described environment exists in our laboratory at the University of Tsukuba (Figure 1).
Figure 1. Target Environment
2.2. Robot

Our laboratory has developed several small-sized wheeled mobile robots, called "YAMABICO" [9], for experiments. One of these robots, called "YAMABICO type-TEN", is rather large and is equipped with a light manipulator. Figure 3 shows a photograph of "YAMABICO type-TEN". In this research project, we use this autonomous mobile robot to realize the proposed task. The configuration of the YAMABICO type-TEN robot system is as follows.
- A lightweight 6 degree-of-freedom manipulator is mounted on the mobile robot.
- A force sensor is located at the end of the manipulator arm.
- An end-effector to grasp the door knob is mounted on top of the force sensor.
- A vision sensor is mounted on the manipulator to confirm the position of the knob of the door.
- Twelve ultra-sonic sensors are mounted around the mobile robot for navigation and obstacle detection.
- The mobile robot uses a locomotion system with two driving wheels mounted on a central axis [10].
- The locomotion system uses dead-reckoning to estimate its position based on odometry.
- The schematic details of the manipulator and its degrees of freedom (DOF) are shown in Figure 2.
- The positions of the sensors are illustrated in Figure 3.
Figure 2. Schematic Diagram of the Manipulator
Figure 3. YAMABICO type-TEN
3. Designing a Behavior to Pass Through a Door-way

To realize the proposed task of robot passage through a door-way, a number of technical considerations are necessary. The cooperation between the function controllers of the robot must be considered. In this section, we analyze the complete behavior to pass through a door-way, particularly the cooperation between the manipulation and the locomotion. First, we divide the task into 9 stages from the point of view of designing cooperative actions between controllers. The stages are shown in Figure 4. Second, we developed a simulator to analyze the robot's behavior. In this simulator, the cooperation between the locomotion controller and the manipulation controller is simulated. The parameters of the initial position, the paths and the speed of opening the door can be changed in the simulator to determine the best parameters empirically. Details of the simulator are presented in Section 4. The nine stages of detailed behavior to realize the proposed task are discussed in the following. In order to generalize the environment, the width of the robot is expressed as Rw, the length of the robot as Rd, and the width of the door as Dw. These parameters are shown in Figure 5. The manipulator mounted on the robot has a length similar to the door width, and the parameters need to satisfy the condition Dw > Rw.
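For concreteness, these geometric parameters can be collected in a small C structure, as in the sketch below; the numeric values are the ones given later in Section 4, and the stop-position formula is the one used in Stage 1. The encoding itself is only illustrative.

    #include <assert.h>
    #include <stdio.h>

    /* Geometry of robot and door (values from Section 4), in meters. */
    typedef struct {
        double Dw;       /* door width   */
        double Rw;       /* robot width  */
        double Rd;       /* robot length */
        double m1, m2;   /* safety margins used in Stage 1 */
    } Geometry;

    /* Stage 1 stop position relative to the closed door:
     * (Rd/2 + m1) in front of the door, (Rw/2 + m2) from its right side. */
    static void stop_position(const Geometry *g, double *front, double *side)
    {
        *front = g->Rd / 2.0 + g->m1;
        *side  = g->Rw / 2.0 + g->m2;
    }

    int main(void)
    {
        Geometry g = { 0.90, 0.70, 0.50, 0.50, 0.05 };
        double front, side;
        assert(g.Dw > g.Rw);         /* required condition Dw > Rw */
        stop_position(&g, &front, &side);
        printf("stop position: %.2f m in front, %.2f m to the side\n",
               front, side);
        return 0;
    }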
Figure 4. Procedure to pass through a door-way (flowchart of Stages 1-9; the task is abandoned if an obstacle blocks the path, the door knob cannot be seen, the door is locked, or an obstacle is detected behind the door)

Stage 1 --Position the Robot in Front of the Door
The mobile robot is supposed to have an environment map and navigates itself to the front of the door by the navigation method reported in reference [13]. The stop position is designed to be (Rd/2 + m1) from the front of the door and (Rw/2 + m2) from the right side of the door. This position is shown in Figure 5, where m1 and m2 are the margins for safe movement of the robot.

Stage 2 --Find and Grasp the Door Knob
If the robot knew its own position and the knob's position exactly, it could grasp the knob with the end-effector by position control of the manipulator alone. In reality, however, errors exist between the estimated position and the actual position of the mobile robot. Therefore, the robot must confirm the exact position of the knob using the vision sensor mounted on the manipulator. If the robot cannot find the knob of the door, it abandons the task. The next step in the task is to move the end-effector toward the knob and grasp it with the manipulator. This behavior is controlled by a kind of visual servoing based on the vision sensor.

Figure 5. Position the Robot in front of the Door

Stage 3 --Rotate the Knob and Push the Door

The robot rotates the knob and pushes the door with a small force. If the force sensor mounted on the manipulator senses excess force, the robot recognizes that the door is locked and abandons the task.

Stage 4 --Open the Door

The total length of the manipulator is limited. Therefore the mobile robot must move forward while the manipulator pushes the door. The manipulation controller must cooperate with the locomotion controller. This behavior is designed in the following way. The locomotion controller moves the robot along a straight line, which is (Rw/2 + m2) from the right side of the door, as shown in Figure 6. At the same time, the manipulation controller controls the manipulator to push the door. The desired position of the mobile robot depends on the angle of the door. As the door angle increases, larger forces act on the end-effector. To open the door with a fixed angular velocity, the robot's velocity must be varied with the angle of the door: as the door angle increases, the robot's velocity must be decreased. The robot's position is calculated as follows.
    Position = Dw * sin( Door Angle )
The desired posture of the manipulator is also calculated from the knob's position, which likewise depends on the angle of the door. While the robot pushes the door with the mobile base moving forward, position and orientation errors of the mobile base may generate large forces at the end-effector. To absorb these forces, compliant control at the top of the end-effector, using the force sensor, is needed; this is discussed in Section 5.3. In this stage, the robot also monitors the force in the direction of the door's motion with the force sensor. If excessive force is sensed, the robot concludes that there are obstacles behind the door and abandons the task. According to the above formula, the door is opened 90 degrees when the robot arrives at the stop position.
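Assuming the reconstructed relation Position = Dw * sin(Door Angle), a minimal C sketch of the locomotion reference for this stage is given below; opening the door at a fixed angular velocity then automatically makes the position increments, and hence the robot's velocity, shrink as the door angle approaches 90 degrees. The sampling and step generation are illustrative assumptions.

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Robot position along the straight line as a function of the door
     * angle in degrees, assuming Position = Dw * sin(Door Angle). */
    static double robot_position(double door_width, double angle_deg)
    {
        return door_width * sin(angle_deg * PI / 180.0);
    }

    /* Generate n position references for a fixed door angular velocity
     * omega (deg/s) sampled every dt seconds. The increment between
     * successive references decreases as the angle approaches 90 deg,
     * i.e. the robot slows down while the manipulator keeps pushing. */
    static void opening_profile(double door_width, double omega, double dt,
                                double *pos_ref, int n)
    {
        for (int i = 0; i < n; ++i) {
            double angle = omega * dt * (double)i;
            if (angle > 90.0)
                angle = 90.0;   /* door fully open: robot at stop position */
            pos_ref[i] = robot_position(door_width, angle);
        }
    }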
Figure 6. Open the Door
Stage 5 --Release the Knob

The end-effector releases the knob, and the manipulation controller returns the manipulator to its initial position. In this stage, only the manipulation controller is active.

Stage 6 --Move Behind the Door

In order to close the door, the robot moves behind it by navigating itself [13]. The robot's path and the exact parameters are shown in Figure 7. At the stop position, the robot uses its ultra-sonic range sensors to correct its position using the wall and the opened door.
Figure 7. Move Behind the Door

Stage 7 --Find and Grasp the Knob

The robot finds and grasps the knob using the vision sensor, in the same way as in Stage 2.

Stage 8 --Close the Door

In this phase, the robot pushes the door closed while moving. This behavior is similar to that of Stage 4, but the motion of the robot is different. The straight line is {Dw - (Rw/2 + m2)} from the wall, as shown in Figure 8. The desired position of the robot is calculated as

    Position = Dw * sin( 90 degrees - Door Angle )
When the robot stops at the stop position, the door may already be closed. Confirmation of the closure of the door is done by sensing the force acting on the knob.
Figure 8. Close the Door
Stage 9 --Release the Knob

The end-effector releases the knob, and the manipulation controller returns the manipulator to its initial position. In this stage, the robot completes the task and continues to navigate itself to its destination. The activity of the controllers is shown in Figure 9: the locomotion controller needs to cooperate with the manipulation controller in Stage 4 and Stage 8, and the vision sensor unit needs to cooperate with the manipulation controller in Stage 2 and Stage 7.
Figure 9. Activity and Cooperation between Controllers
4. Simulation

We implemented a program to simulate the behavior of the mobile robot equipped with a manipulator. The model of the robot and the environment are built into the simulator, and the cooperation between the locomotion controller and the manipulation controller can be studied. The arm position is calculated using differential motion control [12]. To find the best parameters for performing the behavior, the following parameters can be changed and the simulation repeated:

- the initial position of the robot,
- the trajectory and the speed of the robot,
- the desired angular speed of the door during opening and closing.

According to the assumptions about the robot, the parameters of the detailed behavior are Dw = 90 cm, Rw = 70 cm and Rd = 50 cm. Using the simulator repeatedly under these parameters, the margin parameters were fixed empirically as follows:

    m1 = 50 cm, m2 = 5 cm, m3 = 15 cm.

An example of the simulation results is shown in Figure 10. In this simulation, we did not consider the dynamics of the robot or the accumulated error of the locomotion. These factors must be considered in a real environment.
Figure 10. An Example of Simulation Result (snapshots (1)-(25) of the door-opening and passing sequence)
5. Implementation

The next step of our task-oriented approach is to implement the proposed algorithms and architecture on a real robot system. First, we implemented the stage 4 behavior on the real robot system. One key point of this behavior is how to realize the cooperation between locomotion control and manipulation control; the other is how to realize compliant control of the manipulator. In this section, we report the control architecture of the robot system, outline how the cooperative behavior and compliant control are realized, and present the experimental result of stage 4.
5.1. Architecture of the Robot System

Our group has proposed a distributed control architecture for autonomous robot systems [7][8]. To realize the behavior proposed in Section 3, we basically use this architecture. Each function of the robot is modularized, and we refer to each distributed function as a module. The architecture forms a two-level hierarchy. At the top of the structure is the master module, which supervises the total control of the robot system. The other modules are connected by a bus, and each module communicates with the master module using dual-port memory. In this implementation, we allow direct communication between the locomotion module and the manipulation module, because fast communication is necessary in stages 4 and 8 of the robot's task. The schematic diagram of the proposed architecture is shown in Figure 11. Currently, the master module, manipulation module, locomotion module and force sensor module work together to realize the stage 4 behavior.
Figure 11. Architecture of the Robot
5.2. Cooperation between Manipulator and Locomotion

According to the design of stage 4, we realized the cooperative behavior between the manipulator and the locomotion. All parameters are the same as in the simulation. To synchronize the behavior of locomotion and manipulation, the master module supervises both modules. It sends reference positions to the locomotion module and reference joint angles to the manipulation module. The master module continuously checks the differences between the current and reference positions of the robot vehicle and joint angles of the manipulator. When both differences are small enough, the master module advances the position and joint angle references by a small amount to step forward.
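A minimal sketch of this stepping scheme is given below, with hypothetical interfaces standing in for the dual-port-memory communication; the thresholds and step generation are assumptions, not values from the implementation.

    /* Hypothetical module interfaces (the real modules communicate with
     * the master via dual-port memory). */
    double vehicle_position_error(void);   /* |current - reference| in m   */
    double joint_angle_error(void);        /* max joint error in rad       */
    double next_position_step(void);       /* small locomotion increment   */
    void   next_joint_step(double q[6]);   /* matching manipulator posture */
    void   send_vehicle_reference(double pos);
    void   send_joint_reference(const double q[6]);
    int    door_fully_open(void);

    /* Master module loop for stage 4: advance the locomotion and
     * manipulation references in small synchronized steps only when both
     * modules have converged to the current references. */
    void stage4_master_loop(void)
    {
        double pos_ref = 0.0;
        double q_ref[6] = { 0.0 };

        while (!door_fully_open()) {
            if (vehicle_position_error() < 0.005 &&  /* 5 mm, illustrative */
                joint_angle_error() < 0.01) {        /* ~0.6 deg, illustrative */
                pos_ref += next_position_step();
                next_joint_step(q_ref);
                send_vehicle_reference(pos_ref);
                send_joint_reference(q_ref);
            }
        }
    }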
5.3. Compliant Control for the Manipulator

When the robot pushes the door while moving, position and orientation errors of the robot may result in large forces at the end-effector. Therefore, we have to control the end-effector compliantly along the plane perpendicular to the axis of the knob, using the force sensor (Figure 12).
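A minimal sketch of such a force-proportional reference shift, an admittance-style update run at the 20 ms cycle described below, might look as follows; the gain value and the choice of the compliant plane are assumptions.

    /* Shift the end-effector reference position in the direction of the
     * measured force on the plane perpendicular to the knob axis (here
     * assumed to be spanned by the y- and z-axes). Run every 20 ms, this
     * absorbs the low-frequency component of the position error. */
    static void compliance_update(const double force[3], double ref_pos[3])
    {
        const double k = 0.0005;  /* m per N, illustrative compliance gain */
        ref_pos[1] += k * force[1];
        ref_pos[2] += k * force[2];
    }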
Figure 12. Compliant Control
The compliant control is realized as follows. The force sensor detects a force in the vertical plane. In proportion to the amount of the force, the reference position of the end-effector is shifted in the direction of the detected force. This is calculated every 20 milliseconds. This mechanism absorbs the low-frequency component of the position error; the high-frequency component is covered by soft rubbers attached inside the end-effector.

5.4. Experimental Results

We implemented the proposed stage 4 behavior on the target robot "YAMABICO type-TEN" and carried out a door opening experiment. In the experiment, the robot can open the door while the robot base moves forward slowly. This behavior is shown by the sequence of pictures in Figure 13.
Figure 13. An Experiment of Door Opening Behavior
6. Conclusion and Future Works

In this paper, we proposed a design for the behavior of a mobile robot equipped with a manipulator to open a door and pass through a door-way. Such a behavior is realized by cooperation between the locomotion controller module and the manipulation controller module. We verified this cooperation by simulation. Finally, we showed the implementation of stage 4 on our mobile robot "YAMABICO type-TEN" and an experimental result. To realize the total behavior of the robot passing through a door-way, we must solve a number of technical problems. An important topic is how to confirm the position of the knob and how to grasp it with the end-effector (stage 2), and our next step is to realize this grasping behavior. To confirm the exact position of the knob, we will use a vision sensor. After realizing the stage 2 behavior, we will build the total behavior of the robot passing through a door-way. In the future, we will carry out long-distance navigation for an autonomous mobile robot equipped with a manipulator, including a door opening behavior.
Acknowledgement

The authors express their gratitude to the members of the Intelligent Robot Laboratory of the University of Tsukuba.
REFERENCES
1. Wayne F. Carriker, Pradeep K. Khosla, Bruce H. Krogh: "Path Planning for Mobile Manipulators for Multiple Task Execution", in IEEE Transactions on Robotics and Automation, Vol. 7, No. 3, June 1991.
2. W. Miksch, D. Schroeder: "Performance-Functional Based Controller Design for a Mobile Manipulator", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 227-232, 1992.
3. Homayoun Seraji: "An On-line Approach to Coordinated Mobility and Manipulation", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 28-33, 1993.
4. S. Dubowsky, E. E. Vance: "Planning Mobile Manipulator Motions Considering Vehicle Dynamic Stability Constraints", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 1271-1276, 1989.
5. Norbert A. M. Hootsmans, Steven Dubowsky: "Large Motion Control of Mobile Manipulators Including Vehicle Suspension Characteristics", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 2336-2341, 1991.
6. Yoshio Yamamoto, Xiaoping Yun: "Control of Mobile Manipulators Following a Moving Surface", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 1-6, 1993.
7. Yutaka Kanayama, Shin'ichi Yuta: "Computer Architecture for Intelligent Robots", in Journal of Robotic Systems, 2(3), pp. 237-251, 1985.
8. S. Yuta and J. Iijima: "State Information Panel for Inter-Processor Communication in an Autonomous Mobile Robot Controller", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, 1990.
9. S. Yuta, S. Suzuki, S. Iida: "Implementation of a small size experimental self-contained autonomous robot", in Proc. of Second International Symposium on Experimental Robotics, 1991.
10. S. Iida and S. Yuta: "Vehicle Command System and Trajectory Control for Autonomous Mobile Robots", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, pp. 212-217, 1991.
11. K. Nakazawa, M. Shimizu, S. Yuta, M. Nakajima: "Measurement of 3-D Shape and Position by Fiber Grating Vision Sensor Installed on a Manipulator", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, pp. 611-616, 1988.
12. Richard P. Paul: "Robot Manipulators", The MIT Press, 1981.
13. K. Nagatani and S. Yuta: "Path and Sensing Point Planning for Mobile Robot Navigation to Minimize the Risk of Collision", in Proc. of IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pp. 2198-2203, 1993.
Intelligent Robots and Systems
V. Graefe (Editor)
© 1995 Elsevier Science B.V. All rights reserved.
The Development of Re-usable Software Systems for Intelligent Autonomous Robots in Industrial and Space Applications

E. Freund and J. Roßmann
Institute of Robotics Research, University of Dortmund, Otto-Hahn-Str. 8, 44227 Dortmund, Germany, Email: rossmann@damon.irf.uni-dortmund.de

ABSTRACT
The aim of the development of modern robot control systems at the Institute of Robotics Research (IRF, Institut für Roboterforschung) is to provide higher flexibility and autonomy for robot systems, together with consistent and user-transparent concepts for installation, programming and operation of robots in their working environment. An important feature of a flexible, future-oriented robot controller is the capability to incorporate information from different classes of sensors and to have standardized communication interfaces with other factory-automation components. To ease the robot systems' operation, an intuitive graphical user interface is provided. Furthermore, implicit programming and action planning methods allow robot programming on a much higher level of abstraction than today's procedural programming languages. For industrial applications, this implies easier and quicker adaptation of robot workcells to new tasks and a significant increase of the robots' flexibility, making small quantity orders feasible. This paper gives a basic description of "the principle of the hierarchical coordinator", the basic system concept for the development of complex control systems at the IRF. This concept has been developed and employed for the design and implementation of intelligent, autonomous robot systems, which are described in detail. The applications described range from the design of a flexible assembly workcell to a multirobot system for autonomous experiment servicing in a space laboratory module.

1. INTRODUCTION

The development of intelligent and autonomous robotic systems with a high degree of flexibility, good operational capabilities and the potential for efficient integration into an industrial CIM environment is still a difficult task which requires both technical and management skills. In recent years, the transfer of know-how from the area of space robotics to industrial applications has been emphasized at the Institute of Robotics Research, because robotic system capabilities for space applications have already reached a considerable standard with regard to system autonomy, safety and local intelligence that is appropriate to be applied to industrial applications. The development of robot control software at the IRF has always aimed at the realization of components which take into account industrial standards and which can be integrated into comprehensive automation concepts. Particular care in the design of precise interfaces has paid off not only when adapting various software components to control systems in an industrial environment, but has also permitted a consistent integration of modules in the design and development of new products.
2. COORDINATION OF THE DEVELOPMENT OF INTELLIGENT AUTONOMOUS ROBOTS

Fig. 1 gives an overview of the interdependence between the research activities at the IRF dedicated to the design of intelligent autonomous robot and automation systems. This overview is structured according to the activities of the four research groups at the IRF: "Autonomously Guided Vehicles and Sensor Technology", "Multirobot Systems and Space Robotics", "Automation Systems" and "Communication Structures and Intelligent Systems". In Fig. 1, the arrows symbolize the transfer of results into new projects and products. A strictly modular realization of functions with the capability for transfer and further development requires a great deal of inter-project management, especially regarding the specification of module interfaces. Experience has shown that this additional expense in the first stage of a new development is made up for by increasing efficiency and lower costs for the resulting product.
Fig. 1: Coordination of research projects at the IRF

Fig. 1 shows that the developments in the projects ROTEX (Robot Technology Experiment) and CIROS (Control of Intelligent Robots in Space) [3] can be regarded as basic research works at the IRF, whose functionality contributed significantly to the results of the evolving projects.
2.1 Space Robotics as a Motor for the Development of Intelligent Robot Control Systems

Essential parts of the path control software of the ROTEX robot, whose capabilities were demonstrated in the second German space mission D2, were developed at the IRF. One of the major challenges of this development was the cooperation with several project partners and the demand to meet the quality standards for flight software. Fig. 2 displays a simplified scheme of the ROTEX path control structure. The functionality for microgravity path planning deserves special attention due to its capability to plan robot motions which minimize the forces and torques exerted on the space shuttle. Robot motions are planned such that the amplitude of disturbances remains limited in a specified frequency band. This guarantees that payload experiments requiring zero gravity can be carried out successfully. The other path control modules, such as collision detection and avoidance, are not specifically designed for space applications and can directly be used in terrestrial robot controllers.
Fig. 2: Simplified structure of the ROTEX path control

While the ROTEX software was developed concentrating especially on the precise project timing and the quality standards for flight software, research beyond the scope of ROTEX was carried out to develop a robot control system suitable for controlling a multirobot system for space applications. In the CIROS (Control of Intelligent Robots in Space) [3] project, this multirobot control structure was built, based on the concept of the hierarchical coordinator [1][9]. Using the methodology of the hierarchical coordinator, a robot control system with a significant increase in autonomy and the operational capabilities for space laboratory servicing was designed. The implemented control system IRCS (Intelligent Robot Control System) is well suited to perform service tasks in a space laboratory environment such as the CIROS testbed shown below.
Fig. 3: The CIROS multirobot testbed
The CIROS testbed is equipped with two robots with redundant kinematics. Each has 6 revolute joints and one translational joint. The layout of the laboratory is similar to that of the German Spacelab, and it was built in a modular manner, so that it is possible to incorporate the latest experiments and to adapt the environment to reality as precisely as necessary. In four double and two single racks, switches and other operating elements of the experiments were reproduced and arranged in order to perform realistic operational sequences. A tool exchange capability and a torque/force sensor were mounted at each robot's wrist to allow the robots to operate drawers and flaps for experiment processing. Example tasks comprise the exchange of printed circuit boards and the insertion of a sample into a heater. The redundant two-armed robot configuration with the torque/force sensors at the robots' wrists permits fully coordinated operation as well as synchronized or independent action of the two robots. Furthermore, the robots are equipped with hand cameras, and the whole laboratory can be supervised by a scene camera. A vision system performs the image processing in order to provide the planning levels with necessary information, i.e. about the exact location of objects to be gripped. The control desk in the foreground contains devices for system control and supervision.
Fig. 4." Structure of the multirobot control system IRCS
315 Fig. 4 shows the structure of the CIROS multirobot control IRCS (Intelligent Robot Control System), that contains, superimposed on the level of the individual robot controllers, the additional levels for online-collision-avoidance, multirobot-coordination and automatic action planning [3][ 10][11]. The capabilities of this multirobot system include the ability to operate both robots independently or in a coordinated manner. Cooperative operation enables the operator to handle big, heavy or sensitive loads with high precision, while independent robot operation allows a parallel execution of tasks. Both, cooperative and parallel operation are presented in the following sections as coordinated operation [2]. A task oriented programming interface facilitates the programming of the multirobot system. The automatic action planning component [4] can interpret action descriptions which are formulated on a high level of abstraction. With the aid of the central world model that contains a mathematical description of the environment, tasks formulated in a "user-friendly" manner like "close drawer 1 in rack 4" or, as shown in fig. 5, "put sample 1 into heater slot 1" are decomposed into so-called elementary actions. These elementary actions are selected from a predetermined set of actions comprising the activation of endeffectors, locking or unlocking of the gripper exchange system and the path planning functionality for independently working or cooperating robots.
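To illustrate what such a decomposition might produce, the hypothetical C table below expands "put sample 1 into heater slot 1" into elementary actions; the action names, targets and ordering are assumptions, since the paper only names the categories of elementary actions.

    /* Hypothetical expansion of "put sample 1 into heater slot 1" into
     * elementary actions; names and ordering are illustrative only. */
    typedef enum {
        MOVE_TO, GRIP_OPEN, GRIP_CLOSE, LOCK_TOOL, UNLOCK_TOOL
    } ElementaryAction;

    typedef struct {
        ElementaryAction action;
        const char *target;   /* object or location from the world model */
    } PlanStep;

    static const PlanStep plan[] = {
        { MOVE_TO,    "sample_1_approach"       },
        { GRIP_OPEN,  "gripper_robot_1"         },
        { MOVE_TO,    "sample_1"                },
        { GRIP_CLOSE, "gripper_robot_1"         },
        { MOVE_TO,    "heater_slot_1_approach"  },
        { MOVE_TO,    "heater_slot_1"           },
        { GRIP_OPEN,  "gripper_robot_1"         },
    };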
Fig. 5: Automatic action planning for multirobot systems

During task execution, the robots are supervised by the level of online collision avoidance [13][12][14], which predicts potential collision dangers between robots and between the robots and their environment. In case of collision danger, it actively avoids a collision by changing the pre-programmed path of the endangered robots in real time, considering the static and dynamic limits of the robots and task-specific constraints. Fig. 6 shows an example of such an automatically generated avoidance trajectory. As part of its task, the upper robot starts its motion on the right side of the CIROS environment, moving to a drawer on the left side of the cell. The lower robot is not busy during this motion and should therefore give way without disturbing the upper robot's motion. Snapshots of the resulting paths are shown in fig. 6. It should be emphasized here that the lower robot's motion results only from the need to avoid a collision with the upper robot and to return to its starting position. With the help of the online collision avoidance level, the upper robot was able to follow its trajectory exactly as it was planned by the path planning module in the level of coordinated operation (fig. 4).
Fig. 6: Online-collision-avoidance in the CIROS-demonstrator

The description of the functionality of the CIROS multirobot control system gives an idea of the complexity of a system including capabilities for automatic action planning, coordinated operation and online collision avoidance.
Fig. 7: Structure of the multirobot system using the hierarchical coordinator

The entire control architecture was designed using the concept of the hierarchical coordinator, which provides the system theory to describe the interfaces and the assignment of functionality to the different hierarchical levels in a systematic manner. Fig. 7 shows the architecture of the multirobot system from the system-theoretical point of view of the hierarchical coordinator as introduced in [9].
2.2 Modern Virtual-Reality Based Man-Machine-Interfaces for Intuitive Operation

The use of methods for automatic action planning facilitates the programming of a multirobot system like CIROS very much, but still requires a great deal of knowledge about the workspace environment and symbolic descriptors for individual components. The project VITAL (Virtual Reality for Telerobotics) pursues the goal of developing an easy-to-use, intuitive man-machine interface for single robots and multirobot systems that uses today's virtual reality technology. Various tasks that are executed by robots in a space laboratory environment in order to conduct experiments are carried out by researchers who do not have robot-specific knowledge. In the VITAL project, these researchers can now perform their experiments in a graphically simulated virtual environment, employing a data glove for intuitive interaction with the control system. Eye-phones, small displays integrated into a helmet, give a stereoscopic view of the environment and the experiments. Any operation executed with the data glove, which appears as an animated human hand in virtual reality, is translated into robot actions with the aid of the action planning component and executed by the robots in the physical environment. Fig. 8 shows a view into the virtual reality representation of the CIROS environment, showing the animated hand, a drawer and the printed circuit boards of a computer system that the operator is manipulating. Please note that the robots are not shown in the virtual environment; the operator's hand is the only active part.
Fig. 8: A view of the CIROS environment in the virtual reality representation (Photo: CAE)

A straightforward approach to control a robotic system via Virtual Reality (VR) is to directly track the data glove with the robot's tool-center-point (TCP); this mode of operation will further be called "telemanipulation or hand-tracking mode". The theoretical advantage of this approach is its flexibility: with the robot directly tracking the operator's hand, the robot should, in theory, be able to do the same things as the human operator. In practice, this mode of operation cannot really be recommended for most of the tasks that involve contact between a robot and its environment. If, e.g., an assembly task is to be commanded by a virtual reality system in hand-tracking mode, the following problems are encountered:

- time delays between the display of a robot's movement in the VR and its physical movements are critical for the stability of the handling process, because, similar to standard telemanipulation approaches, the user is still "in a real-time control loop"
- the graphical model has to be very precise
- the measurement of the position and orientation of the data glove has to be very precise
- measures have to be taken to reduce "trembling" of the operator's hand
- online collision avoidance features are necessary to grant safe operation
- a versatile sensor control is necessary to compensate for unwanted tensions when objects are inserted into tight fittings

The problems mentioned above make it very difficult to control the robot by just tracking the operator's hand, so a new method to control the robotic system [15] was developed that helps to facilitate the user's job of controlling the robots. The aim was to hide from the user the typical robot-related difficulties like singularity treatment, collision avoidance, sensor-based guidance of the robot and the problem of coordinating different robots. By making use of modern control techniques that can provide these features, the necessary expertise to, e.g., conduct an experiment in a space laboratory environment like CIROS is thus shared between the user, who has the necessary knowledge about the experiment, and the robot control, which has the necessary "knowledge" about how to control the robots. So another mode of operation of the VR-system was developed: the task deduction mode, which makes use of the capabilities already developed for the IRCS. The solution was to enhance the VR-system in such a way that, while the user is working, the different subtasks carried out by the user are automatically recognized and task descriptions for the IRCS are deduced. These task descriptions are then sent to the action planning component of the IRCS and are carried out safely by the IRCS with its inherent capabilities. This approach has several advantages over direct hand-tracking:

- the user is no longer in the "real-time control loop"; complete subtasks are recognized and carried out as a whole without the necessity for immediate feedback to the user
- the subtasks are carried out safely using the sensor-control and supervision capabilities of the IRCS
- for physical assembly tasks, the limited accuracy of the environment model can be compensated for by automatic sensor-supported strategies
- the accuracy of the data glove tracking device is not as important as for the direct tracking mode;
the allowable tolerances when the user is gripping an object or inserting a peg into a hole can be adjusted in the VR-software
- if a heavy or fragile object is moved in the VR, the planning component is automatically capable of using the two CIROS robots in a coordinated manner to do the transport in reality
- different users working at different VR-systems can work on different tasks that are sent to the planning component of the IRCS, which can then decide, depending on the available
resources, on an adequate sequence of the tasks to be done; thus one robotic system can serve, e.g., multiple experimenters in a space laboratory environment
- if the robot control is versatile enough, there is no longer a need to even show a robot in the virtual environment displayed to the user; so the user more and more gets the impression of carrying out a task "himself", which is the highest level of intuitiveness that can be reached
- if the planning component is versatile enough, it can not only control the robots, but also other kinds of automated devices; the action planning component in the CIROS environment "knows" that to open the leftmost one of the three drawers, it doesn't need to employ a robot, since this drawer is equipped with a motor, so that it just has to control the motor to open this drawer. Robot-automated and hard-automated tasks are thus controlled under one unified framework.

The only deficiency of this mode of operation is that the IRCS must in principle know the different types of tasks carried out by the user. Therefore, for special applications that are hard to describe as parameterized tasks (e.g. the shaking of a material sample), or for applications that were not foreseen by the developers of the action planning component, the hand-tracking mode is still available in the realized VR-system. Experience showed that 95% of the tasks carried out in the CIROS environment can successfully be commanded in task-deduction mode.

In August 1994, as part of a cooperation between the Institute for Robotics and Intelligent Systems of the University of Southern California (USC) and the Institute of Robotics Research in Germany, the PC-based version of the VR-system, which builds on the PC-based robot simulation system COSIMIR described in chapter 3.2, successfully performed the control of the CIROS testbed in Germany from Southern California via INTERNET. In a research project at USC, the same VR-system is used to simulate, supervise and control a six-legged walking machine. The user here can "enter the same room" with the simulated walking machine; he can place different obstacles in the way of the walking machine and see how the different implemented algorithms behave in the simulated world before they are downloaded into the physical walking machine for testing purposes. This allows several team members to work with the machine at the same time, running several instances of the VR-system, and thus reduces the need to work with the physical machine.
3. SCALEABLE ROBOT CONTROL ARCHITECTURES: HIGH PERFORMANCE AND LOW COST ARE NO CONTRADICTION

In the research field "robot control and simulation" at the IRF, experiences and results from the projects ROTEX and CIROS were directly applied to products for industrial applications. As shown in fig. 1, the product development was conducted in the projects FRC (Flexible Robot Control), PCROB (PC-based Robot Control) and COSIMIR (Cell Oriented Simulation of Industrial Robots).
3.1 Robot Control Development

As only an Intel 8086 based computer had space qualification when the ROTEX software was developed, the control software had to be coded very efficiently. Exactly these efficient routines make up the kernel of the PC-based robot control PCROB (PC-based Robot Control).
At first, PCROB was planned as a low-cost robot control with an easy-to-learn user interface, dedicated to small robots. PCROB runs on a PC, a low-cost hardware platform, in order to provide a low-cost robot control for didactic purposes and as a starting point for automation and robotics in smaller companies. In parallel, the VME-based robot control FRC (Flexible Robot Control) was implemented, providing all the features of a modern robot control. In the FRC, a sensor interface, a network interface and a programming environment for IRL, VAL-II and IRDATA were provided. In the past two years, due to the increasing capabilities and speed of PC hardware, the development of both robot controls could be harmonized, such that more and more features could be realized for PCROB as well. Today, therefore, high performance and low cost are no longer a contradiction, because PCROB now provides the same functionality as the FRC, including a teach pendant, the programming environments for IRL, VAL-II, BAPS and IRDATA, a sensor interface and a network interface, as shown in fig. 9.
Fig. 9: The structure of the PC-based robot control PCROB
3.2 Robot Simulation Systems

Based on the results of the PCROB and CIROS projects, the robot simulation system COSIMIR (Cell Oriented Simulation of Industrial Robots) was developed. It runs either on a PC with MS-Windows or Windows NT, or on a Silicon Graphics workstation. COSIMIR consists of three major components: the control and display component, the robot controller (PCROB) and the environment model (fig. 10).
Fig. 10: Structure of the COSIMIR simulation system

In addition to providing a graphical user interface, the control and display component supervises the environment model and robot controller components. During each simulation cycle, new state values for the robot and the simulated mechanisms in the workcell (i.e. revolving tables, conveyor belts, parts feeders, etc.) are requested from the robot controller (PCROB) and then transferred to the environment model. An object manager maintains the environment model of the simulated workcell, calculating the current 3D polyhedral model of the workcell and taking into consideration the interaction between the individual mechanisms: e.g. an object on a conveyor belt, or an object placed on a revolving table, must be moved together with the moving belt or table. The object manager was adopted from the CIROS project. In the CIROS project, the object management task is principally the same; however, the volume model of the workcell there is not used for model visualization but rather forms the basis of calculation for online collision detection and avoidance. Corresponding to its task in CIROS, this component is there named the collision avoidance specific world model [13]. One of the important characteristics of the CIROS collision avoidance, and the reason that it can be incorporated in COSIMIR, is that it is extremely flexible: the model information is not hard-coded in the source code but rather formulated in a textual description language known as GEMOL. The model description is read during the initialization phase and is converted at run time into the necessary real-time data structures. COSIMIR optimally uses this flexibility, because the user just loads the GEMOL description of another work cell to work with it. To facilitate the work cell definition, i.e. the programming of a GEMOL work cell description, the program COSIMOD,
delivered with COSIMIR, offers a graphical interactive environment to develop work cell descriptions, thus relieving the user from learning GEMOL. Besides the flexible modeling of the environment, the integration of the industrial controller PCROB is a strong argument for COSIMIR. The robot programs generated with COSIMIR are capable of running in the actual robot work cell without changes, because the PCROB kernel that controls the simulated robot in COSIMIR also controls the actual robot; this allows problems such as joint limits, static and dynamic limits and reachability in the neighborhood of singularities to be found during the simulation. The integration of the tested object manager and PCROB components shortened the integration time of COSIMIR. With the implementation of an advanced communication concept, COSIMIR will always be integrated with the latest version of PCROB; thus all advances of PCROB will be utilized in COSIMIR. COSIMIR is available in English and German and has been distributed worldwide by an industrial partner since December 1993.
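The simulation cycle described above could be organized as in the following C sketch; all function names are placeholders rather than the actual COSIMIR interfaces.

    /* Placeholder interfaces; COSIMIR's actual API is not given here. */
    void pcrob_step(double dt, double *robot_state);
    void mechanisms_step(double dt, double *mechanism_state);
    void world_model_update(const double *robot_state,
                            const double *mechanism_state);
    void render_workcell(void);

    /* One COSIMIR-style simulation cycle: query new state values from
     * the robot controller and the simulated mechanisms, transfer them
     * to the environment model, then redraw the workcell. */
    void simulation_cycle(double dt, double *robot_state, double *mech_state)
    {
        pcrob_step(dt, robot_state);      /* PCROB kernel advances robot */
        mechanisms_step(dt, mech_state);  /* belts, tables, feeders ...  */
        world_model_update(robot_state, mech_state); /* object manager   */
        render_workcell();
    }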
3.3 Robot Controllers for Flexible Manufacturing
Scaleable, task-adaptable robot controllers form an important base of modern factory automation. Such controllers are developed not with only one specific industrial robot in mind, but with standardized interfaces and network connections, and they are programmable in a standard robot programming language so that they are applicable to a wide range of industrial robots. In addition, such systems must not only offer the high performance needed to make flexible computer-controlled manufacturing possible; they must also be cost effective to integrate into the factory's automation and control, and provide the means to be integrated into the factory's communication infrastructure. In the project SENROB at the IRF, several new communication techniques have been developed for flexible manufacturing cells. Fig. 11 shows a work cell and the various components that are integrated to form a manufacturing cell.
Fig. 11: The flexible manufacturing cell
In fig. 11, a SCARA-type robot and a six-revolute-axis robot work in coordination on a process plan determined by the cell controller, with the goal of minimizing the cycle time. The cell controller selects the appropriate process plan, after which the workpiece carrier is moved onto the transport system and is identified by the barcode reader. The cell controller is realized similarly to the CIROS action planning component, such that it represents a superimposed control level for the different components of a work cell. The components of the work cell receive their commands via component drivers, which provide standardized interfaces of the different components to the cell controller. Fig. 12 shows the overall communication structure of the flexible manufacturing cell, an important aspect of the practical application of industrial robots. The communication functions are implemented using standardized busses such as InterBus-S or ProfiBus, and the simple control functions are implemented via a PLC (SPS) connected to the bus system.
Fig. 12: The communication structure of the flexible manufacturing cell
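The component-driver idea, one standardized command interface hiding each device's native protocol from the cell controller, can be sketched as follows. The interface and device names here are illustrative assumptions, not the SENROB implementation.

```python
# Illustrative sketch of standardized component drivers (assumed names; the
# actual SENROB interfaces are not specified in the text). The cell controller
# talks to every work cell component through the same small driver interface.

from abc import ABC, abstractmethod

class ComponentDriver(ABC):
    """Uniform interface the cell controller uses for all cell components."""
    @abstractmethod
    def execute(self, command: str, **params) -> None: ...
    @abstractmethod
    def status(self) -> str: ...

class RobotDriver(ComponentDriver):
    def execute(self, command, **params):
        print(f"robot: {command} {params}")      # would translate to robot-control calls
    def status(self):
        return "idle"

class TransportDriver(ComponentDriver):
    def execute(self, command, **params):
        print(f"transport: {command} {params}")  # would map onto bus telegrams
    def status(self):
        return "carrier_at_station"

# The cell controller issues a process plan without knowing device specifics.
cell = {"scara": RobotDriver(), "transport": TransportDriver()}
cell["transport"].execute("move_carrier", station=3)
cell["scara"].execute("pick", part="workpiece_1")
```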
Due to the standardized communication mechanisms incorporated, the cell controller, as shown in fig. 12, can also request logistic services, e.g. from an autonomously guided vehicle that delivers new production materials. This is only one example illustrating the advantages
of the IRF's aim to connect all automation components in a consistent manner via networks and to provide the necessary interfaces and control structures to request special services as needed.
3.4 Autonomously Guided Vehicles
For material flow control within flexible manufacturing, many autonomously guided vehicles (AGVs) are available today. Experience from the European Community's project PROMETHEUS demonstrated that the problems of controlling and coordinating AGV transport systems are tightly related to the control of multirobot systems. The AGV controller contains a route planning system, a level for multiple-vehicle coordination and a level for collision detection and avoidance. It was obvious that the theoretical basis for the realization of the vehicle controller could be the principle of the hierarchical coordinator [9], the same principle that served as an architectural design guide for the CIROS multirobot system. Using the standardized communication structure shown in fig. 12, the cell controller submits a transport request. From path information provided either by sensors or by the model, the control computer generates the trajectories of all vehicles within its movement zone, taking into account physical constraints such as speed and lateral and longitudinal accelerations. The partial steps of the trajectory determination are, for example:
- Detection of conflict situations
- Generation of alternative maneuvers
- Calculation and selection of optimal maneuver combinations
The calculated desired paths are then transmitted to the vehicles, in our case by means of a communication link based on infrared light transmission.
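The three steps above suggest an enumerate-and-select scheme. The sketch below is a toy version under assumed conflict and cost models, not the IRF planner: each vehicle offers a set of alternative maneuvers, conflicting combinations are rejected, and the cheapest remaining combination is chosen.

```python
# Toy sketch of the three trajectory-determination steps (data model assumed):
# detect conflicts between candidate paths, consider alternative maneuvers,
# and select the cheapest conflict-free maneuver combination.

from itertools import product

def conflict(path_a, path_b):
    """Two paths conflict if they occupy the same cell at the same time step."""
    return any(a == b for a, b in zip(path_a, path_b))

def plan(maneuvers_per_vehicle, cost):
    """maneuvers_per_vehicle: list (one entry per vehicle) of candidate paths,
    each path a list of grid cells indexed by time step."""
    best, best_cost = None, float("inf")
    for combo in product(*maneuvers_per_vehicle):   # all maneuver combinations
        pairs = [(combo[i], combo[j]) for i in range(len(combo))
                 for j in range(i + 1, len(combo))]
        if any(conflict(a, b) for a, b in pairs):   # step 1: conflict detection
            continue
        c = sum(cost(p) for p in combo)             # step 3: select the optimum
        if c < best_cost:
            best, best_cost = combo, c
    return best

# Step 2 (alternatives) is represented by two candidate paths per vehicle.
v1 = [[(0, 0), (0, 1), (0, 2)], [(0, 0), (1, 0), (0, 2)]]
v2 = [[(0, 2), (0, 1), (0, 0)], [(0, 2), (1, 1), (0, 0)]]
print(plan([v1, v2], cost=len))
```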
Fig. 13: The IRF's autonomously guided vehicle CAESAR
In addition to the control of the vehicles, the exact determination of an AGV's position is an extremely difficult problem. Because purely inertial and odometric systems yield unacceptable accuracy, a new landmark navigation system based on a 2D laser scanner [6] was developed in the project CAESAR. A special feature of this method is that no artificial landmarks or induction wires are needed and that the process is very robust against visibility obstructions. To apply an AGV with this navigation process, for example in a manufacturing plant, no structural measures such as the laying of induction wires are required. To initialize the environment model for the navigation system, only one vehicle has to be driven over the predetermined paths once under manual control. The environment model defined in this way is then transmitted to all the other vehicles, which are thereupon immediately ready for action. Fig. 13 shows one of the three autonomously guided vehicles at the IRF equipped with this new navigation system. Above the laser, a hemisphere is visible that contains the infrared transceiver for communication with the control computer.

4. CONCLUSION
Within the scope of this paper, the principal functionalities necessary for the realization of an intelligent autonomous robot system were discussed. Based on the discussion of the CIROS multirobot controller, it was illustrated which functionalities are needed and how they can be implemented to ensure optimal cooperation. In addition, it was pointed out how the development of the various sub-components is coordinated and how their requirements are defined, such that the necessary subsystem interfaces are suitable. Done properly, this allows the re-use of various components to economically generate a whole family of compatible automation products, ranging from scaleable controllers to robot simulators and virtual-reality man-machine interfaces. Reacting to market requirements, new products such as the presented PC-based robot controller PCROB and the PC-based robot simulation system COSIMIR can be realized quickly and effectively. The control structure underlying the theoretical system principle of the hierarchical coordinator was briefly illustrated and its most important functions were presented, examples being the coordinated operation of robots and the collision avoidance. Besides the mentioned direct spin-offs for space robotics, the experience from the development of the intelligent autonomous robot systems for the CIROS and ROTEX projects could be adopted for the development of control structures for an autonomously guided vehicle that serves as a flexible transport system in a factory environment. All the examples presented here demonstrate that in the components' development careful attention is paid to maintaining the interface definitions and that, wherever possible, industrial standards are used for interfaces and communication protocols. The described realizations are characterized by high modularity and precise interfaces, which guarantee that a desired automation system can be developed piecemeal and can be easily and clearly adapted to new requirements and technologies.
REFERENCES
1. Freund, E., Fast Nonlinear Control with Arbitrary Pole-Placement for Industrial Robots, The International Journal of Robotics Research, Vol. 1, No. 1, 1982.
2. Kaever, P., Kinematische und taktile Koordination in Mehrrobotersystemen, Ph.D. Thesis, Institute of Robotics Research, University of Dortmund, February 1993.
3. Kaever, P.; Kernebeck, U.; Pesara, J.; Roßmann, J., Abschlußbericht zum Projekt CIROS, Institute of Robotics Research, Dortmund, Germany, April 1991.
4. Kernebeck, U., Action Planning for Multiple Robots in Space, The Fifth International Conference on Industrial Engineering Applications of Artificial Intelligence and Expert Systems, Paderborn, Germany, 1992.
5. Freund, E.; Buxbaum, H.-J., Control of Robot-based Flexible Manufacturing Work Cells, Proc. Int. Conf. on Advanced Mechatronics, Tokyo, Japan, 1993, pp. 348-353.
6. Freund, E.; Dierks, F.; Roßmann, J., Untersuchungen zum Arbeitsschutz bei mobilen Robotern und Mehrrobotersystemen, Schriftenreihe der Bundesanstalt für Arbeitsschutz und Unfallforschung, Wirtschaftsverlag NW, ISBN 3-89429-297-0, 1993.
7. Freund, E.; Hypki, A.; Uthoff, J.; van der Valk, U., COSIMIR und PCROB: Integration von Zellensimulation und Robotersteuerung auf PCs, Fachtagung Intelligente Steuerung und Regelung von Robotern, Langen, Germany, November 1993.
8. Freund, E.; Judaschke, U., Automated Vehicle Guidance as a Component of a Traffic Control System, Proceedings of the 2nd Prometheus Workshop, Stockholm, Sweden, October 1989, pp. 204-213.
9. Freund, E.; Roßmann, J., A Generic Hierarchical Control Structure for a Large Class of Automation Problems: Theory and Applications, Proceedings of the International Workshop on Advanced Robotics, Beijing, China, August 1991.
10. Freund, E.; Roßmann, J., Autonomous Operation of Multi-Robot-Systems in Factory-of-the-Future Scenarios, Proceedings of the 6th International Conference on CAD/CAM, Robotics and Factories of the Future, Vol. 1, pp. 269-274, London, Great Britain, August 1991.
11. Freund, E.; Roßmann, J., Control of Multi-Robot-Systems for Autonomous Space Laboratory Servicing, Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, i-SAIRAS, Vol. 1, pp. 443-453, Toulouse, France, September 1992.
12. Roßmann, J., Echtzeitfähige, kollisionsvermeidende Bahnplanung für Einzel- und Mehrrobotersysteme, 27. Colloquium on Control Techniques, Boppard, Germany, February 1992.
13. Roßmann, J., Echtzeitfähige kollisionsvermeidende Bahnplanung für Mehrrobotersysteme, Ph.D. Thesis, Institute of Robotics Research, University of Dortmund, Germany, February 1993.
14. Roßmann, J., Online Collision Avoidance for Multi-Robot-Systems, a New Security Methodology, Proceedings of the 6th Topical Meeting on Robotics and Remote Systems, Monterey, CA, February 1995.
15. Roßmann, J., Virtual Reality as a Control and Supervision Tool for Autonomous Systems, Proc. of the 4th International Conference on Intelligent Autonomous Systems, Karlsruhe, Germany, March 1995.
Toward Integrated Operator Interface for Advanced Teleoperation under Time-Delay

Antal K. Bejczy, Paolo Fiorini, Won Soo Kim, and Paul S. Schenker*

Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

This paper briefly describes an Advanced Teleoperator (ATOP) system and its control station, where a variety of computer-based operator interface devices and techniques are integrated into a functional setting accommodating a primary operator and secondary operators. The overall sensing and computer-aided system setting is intended to include all perceptive components necessary to perform sensitive, dual-arm manipulation efficiently, focused on non-repetitive and unexpected tasks. Computer graphics is a key operator interface component in the control station, where new types of manual interface devices are also employed. The results of some generic and application task experiments are summarized, including the performance of a simulated remote satellite servicing task carried out under four to eight seconds of communication time delay, using satellite TV and Internet computer communication links. In conclusion, the paper highlights the lessons learned so far.

1. INTRODUCTION

In general, teleoperation implies continuous human operator involvement in the control of remote manipulators. Typically, the human control is manual, and the basic information feedback is through visual images. Continuous human operator involvement in teleoperation has both advantages and disadvantages. The disadvantages become quite dramatic when there is an observable, two-way communication time delay between the operator and the remotely controlled equipment. Modern development trends in teleoperator control technology are aimed at amplifying the advantages and alleviating the disadvantages of the human element in teleoperation through the development and use of various non-visual sensing, intelligent or task-level computer controls, computer graphics or virtual reality displays, and new computer-based human-machine interface devices and techniques in the information and control channels between the operator and the remotely controlled manipulators. These development trends are typically summarized under the popular titles of telepresence and supervisory control technologies. In this paper, those two titles are lumped under the term advanced teleoperation.

Several notes should be added here to this description of telepresence and supervisory control technologies. First, neither of them eliminates the human operator from the control operation; both change the operator's function assignments and employ

*The research described in this paper has been carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.
human capabilities in new ways. Second, both technologies promise the performance of more complex tasks with better results; but, in doing so, both also make close reference to the capabilities of the human operators who will use the evolving new devices and techniques in the control station. Third, both telepresence and supervisory control draw on evolving capabilities and features of other technologies such as sensing, high-performance computer graphics, computerized electro-mechanical devices, algorithm-based flexible automation, expert systems for planning and error recovery, and so on. Thus, progress in both technologies is tied to rich multidisciplinary activities. Fourth, both technologies need the evaluation and validation of their results relative to application environments.

This paper is focused on the description and some practical evaluation of an integrated operator interface system for advanced teleoperation, developed at the Jet Propulsion Laboratory (JPL) during the past six to seven years, and exercised recently in realistic remote control experiments between JPL in California and the Goddard Space Flight Center (GSFC) in Maryland, 4000 km away from JPL, using satellite TV and computer communication links between JPL and GSFC. First, we describe the JPL Advanced Teleoperator (ATOP) system and its control station, where a variety of operator interface devices and techniques are integrated into a functional setting accommodating a primary operator and secondary operators. Then we summarize the results of some generic and application task experiments. In the third part of the paper, the JPL-GSFC simulated remote satellite servicing task under communication time delay is briefly described. Throughout the paper we emphasize the design and use of the operator interface elements and their functional integration. In the concluding part of the paper, we highlight the lessons learned so far.

2. ATOP AND ITS CONTROL STATION

The basic underlying idea of the JPL ATOP system setting is to provide a dual-arm robot system, together with the necessary operator interfaces, in order to extend the two-handed manipulation capabilities of a human operator to remote places. The system setting is intended to include all perceptive components necessary to perform sensitive remote manipulation efficiently, including non-repetitive and unexpected tasks. The general goal is to elevate teleoperation to a new level of task performance capability through enhanced visual and non-visual sensing, computer-aided remote control, and computer-aided human-machine interface devices and techniques.

The overall system is divided into two major parts: the remote (robot) work site and the local (control station) site, with electronic data and TV communication between the two sites. The remote site is a workcell. It comprises: (i) two redundant 8-d.o.f. AAI arms in a fixed-base setting, each covering a hemispheric work volume and each equipped with the latest JPL-developed Model C smart hands, which contain 3D force-moment sensors at the hands' base and grasp force sensing at the base of the hand claws; (ii) JPL-developed control electronics and a distributed computing system for the two arms and smart hands; and (iii) a computer-controllable multi-TV gantry robot system with controllable illumination. This gantry robot currently accommodates three color TV cameras, one on the ceiling plane, one on the rear plane, and one on the right side plane.
Figure 1. The JPL ATOP Dual Arm Workcell with Gantry TV Frame
Each camera can be position-controlled in two translational d.o.f. in the respective plane, and in two orientation directions (pan and tilt) relative to the respective moving base. Zoom, focus and iris of each TV camera can also be computer controlled. A stereo TV camera system is also available and can be mounted on either of the two side camera bases. The total size of the rectangular remote work site is about 5 m in width, about 4 m in depth, and about 2.5 m in height. See Figure 1 for the ATOP remote workcell.

The control station site organization follows the idea of accommodating the human operator at all levels of human-machine interaction, and in all forms of human-machine interfaces. Presently, it comprises: (i) two general-purpose Force-Reflecting Hand Controllers (FRHC); (ii) three TV monitors; (iii) a TV camera/monitor switchboard; (iv) a manual input device for TV control; and (v) three graphics displays: one is connected to the primary graphics workstation (IRIS 4D/310 VGX), which is used for preview/predictive displays and for various graphical user interfaces (GUIs) in four-quadrant format; the second is connected to an IRIS 4D/70 GT workstation and is solely used for sensor data display; the third is connected to a SUN workstation (SparcStation 10) and is used as a control configuration editor (CCE), an operator interface to the teleoperator control software based on the X-Window environment. See Figure 2 for the ATOP local control station.
Figure 2. The JPL ATOP Control Station
2.1. Hand Controllers
The human arm and hand are functionally both powerful mechanical tools and delicate sensory organs through which information is received from and transmitted to the world. Therefore, the human arm-hand system (hereafter simply called the hand) is a key communication medium in teleoperator control. With hand actions, complex position, rate or force commands can be formulated and, very physically, written to the controller of a remote robot arm system in all workspace directions. At the same time, the human hand can also receive force, torque, and touch information from the remote robot arm-hand system. Furthermore, the human fingers offer additional capabilities to convey commands to a remote robot controller from a suitable hand controller. Hand controller technology is, therefore, an important technology in the development of advanced teleoperation. Its importance is particularly underlined when one considers the computer control which connects the hand controller to the remote arm system. The direct and continuous (scaled or unscaled) relation of operator hand motion to the remote robot arm's motion behavior in real time through a hand controller is in sharp contrast to computer keyboard commands which, by their very nature, are symbolic, abstract and discrete (non-continuous), and require the specification of some set of parameters within the context of a desired motion.

In contrast to the standard force-reflecting, replica master-slave systems, a new form
of bilateral, force-reflecting manual control of robot arms has been implemented in the JPL ATOP project. The hand controller is a backdrivable six-d.o.f. isotonic joystick. It is dissimilar to the controlled robot arm both kinematically and dynamically. But, through computer transformations, it can control robot arm motion in six task space coordinates (three position and three orientation coordinates). Forces and moments sensed at the base of the robot hand can back-drive the hand controller through proper computer transformations, so that the operator feels the forces and moments acting at the robot hand while controlling its position and orientation. This hand controller can read the position and orientation of the hand grip within a 30 cm cube in all orientations, and can apply arbitrary force and moment vectors of up to 20 N and 1.0 Nm, respectively, at the hand grip.

The computer-based control system supports four modes of manual control: position, rate, force-reflecting, and compliant control in task space (Cartesian space) coordinates. The operator, through an on-screen menu, can designate the control mode for each task space axis independently. Position control mode servos the slave position and orientation to match the master's. In force-reflecting mode, the hand controller is back-driven based on force-moment data generated by the robot hand's sensor during the robot hand's interaction with objects and the environment. The indexing function allows slave excursions larger or smaller than the 30 cm cube hand controller work volume. Rate control mode sets the slave endpoint velocity in task space based on the displacement of the hand controller. This is implemented through a software spring in the control computer of the hand controller. Through this software spring, the operator has a sensation of the commanded rate, and the software spring also provides a zero-referenced restoring force. Rate mode is useful for tasks requiring large translations. Compliant control mode is implemented through a low-pass software filter in the hybrid position-force loop. This permits the operator to control a springy or less stiff robot. Active compliance with damping can be varied by changing the filter parameters in the software menu. Setting the spring parameter to zero in the low-pass filter reduces it to a pure damper, which results in a high-stiffness hybrid position-force control loop. The overall control organization permits a spectrum of operations between full manual, shared manual and automatic, and full automatic (called traded) control, and the control can be operated with variable active compliance referenced to force-moment sensor data. More on the hand controller and on the overall ATOP control system can be found in [1] through [6].
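As a rough illustration of the rate mode just described, the following sketch maps hand-controller displacement to a commanded slave velocity while a software spring produces the zero-referenced restoring force felt by the operator. The gains and saturation value are invented for the example, not the ATOP parameters.

```python
# Minimal sketch of a rate-mode software spring (illustrative gains, not ATOP's).
# Displacement of the hand grip from its zero reference sets the slave endpoint
# velocity; the same spring back-drives a restoring force to the operator's
# hand, giving a sensation of the commanded rate.

K_RATE = 0.5     # (m/s per m of displacement) velocity scaling, assumed
K_SPRING = 40.0  # (N per m) software spring stiffness, assumed
V_MAX = 0.25     # (m/s) commanded-rate saturation, assumed

def rate_mode(displacement):
    """displacement: hand-grip offset from center along one task-space axis [m].
    Returns (commanded slave velocity [m/s], restoring force on operator [N])."""
    v = max(-V_MAX, min(V_MAX, K_RATE * displacement))
    f = -K_SPRING * displacement      # zero-referenced restoring force
    return v, f

print(rate_mode(0.10))   # 10 cm deflection -> (0.05 m/s, -4.0 N)
```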
Figure 3. DATAHAND Switch Modules Integrated with FRHC Hand-Grip
1. Each module contains five switches.
2. Switches can give tactile and audio feedback.
3. Switches require a low strike force.
4. Switches surround the finger, creating differential feedback regarding the key that has been struck.
Figure 4. Five Key-Equivalent Switches at a DATAHAND Fingertip Switch Module
The present FRHC has a single hand grip equipped with a deadman switch and three function switches. In order to better utilize the operator's finger input capabilities, an exploratory project recently evaluated a design concept that would attach computer keyboard features to the hand grip of the FRHC. To accomplish this, three DATAHAND™ [7] switch modules were integrated with the hand grip, as shown in Figure 3. Each switch module at a finger tip contains five switches, as indicated in Figure 4. Thus, the three switch modules at the FRHC hand grip contain fifteen function keys which can communicate directly with a computer terminal. This eliminates the need for the operator to move his or her hand from the FRHC hand grip to a separate keyboard to input messages and commands to the computer. A recent test and evaluation, using a mock-up system and ten test subjects, indicated the viability of the finger-tip switch modules as part of a new hand grip unit for the FRHC, a practical step towards a more integrated operator interface device for the ATOP system. More on this concept and its evaluation can be found in [8].

2.2. Computer Graphics

Task visualization is a key problem in teleoperation, since most of the operator's control decisions are based on visual or visually conveyed information. For this reason, computer graphics plays an increasingly important role in advanced teleoperation. This role includes: (i) planning actions; (ii) previewing motions; (iii) predicting motions in real time under communication time delay; (iv) helping operator training; (v) enabling visual perception of non-visible events like forces and moments; and (vi) serving as a flexible operator interface to the computerized control system.

The capability of task planning aided by computer graphics offers flexibility, visual quality and a quantitative design base to the planning process.
Figure 5. Schematic Layout of the TCE Interface
The capability of graphically previewing motions enhances the quality of teleoperation by reducing trial-and-error strategies in the hardware control and by increasing the operator's confidence in control decision making during task execution. Predicting the consequences of motion commands in real time under communication time delay permits longer action segmentations, as opposed to the move-and-wait control strategy normally employed when no predictive display is available; it also increases operation safety and reduces total operation time. Operator training through a computer graphics display system is a convenient tool for familiarizing the operator with the teleoperated system without turning the hardware system on. Visualization of non-visible effects (like control forces) enables visual perception of different non-visual sensor data, and helps in managing system redundancy by providing a suitable geometric image of a multi-dimensional system state. Last, but not least, computer graphics as a flexible operator interface to the control system replaces complex switchboard and analog display hardware in a control station.

The actual utility of computer graphics in teleoperation depends to a high degree on the fidelity of the graphics models that represent the teleoperated system, the task and the task environment. The JPL ATOP effort in the past few years was focused on the development of high-fidelity calibration of graphics images to actual TV images of task scenes.
Figure 6. Schematic Layout of the Hierarchical Data Interface
This development has four major ingredients. First, creation of high-fidelity 3-D graphics models of robot arms and of objects of interest for robot arm tasks. Second, high-fidelity calibration of the 3-D graphics models relative to given TV camera 2-D image frames which cover the sight of both the robot arm and the objects of interest. Third, high-fidelity overlay of the calibrated graphics models over the actual robot arm and object images in a given TV camera image frame on a monitor screen. Fourth, high-fidelity motion control of the robot arm graphics image, using the same control software that drives the real robot.

The high-fidelity fused virtual and actual reality image displays became very useful tools for planning, previewing and predicting robot arm motions without commanding and moving the robot hardware. The operator can generate visual effects of robot motion by commanding and controlling the motion of the robot's graphics image superimposed over TV pictures of the live scene. Thus, the operator can see the consequences of motion commands in real time, before sending the commands to the remotely located robot. The calibrated virtual reality display system can also provide high-fidelity synthetic or artificial TV camera views to the operator. These synthetic views can make critical motion events
visible that otherwise are hidden from the operator in a given TV camera view, or for which no TV camera view is available. More on the graphics system in the ATOP control station can be found in [9] through [13].

The first development of a graphics system as an advanced operator interface was aimed at parameter acquisition and was called the Teleoperation Configuration Editor (TCE) [14]. This interface used the concepts of windows, icons, menus, and a pointing device to allow the operator to interact with, select and update single parameters as well as groups of parameters. The TCE utilizes the direct manipulation concept, the central idea being to have visible objects such as buttons, sliders and icons that can be manipulated directly, i.e. moved and selected using the mouse, to perform any operation. A graphic interface of this type has several advantages over a traditional panel of physical buttons, switches and knobs: the layout can be easily modified, and its implementation cycle, i.e. design and validation, is significantly shorter than for hardware changes. The TCE (Figure 5) was developed to incorporate all the configuration parameters of an early single-arm version of the ATOP system. It was organized as a single menu divided into several areas dedicated to the parameters of a specific function. Dependencies among different graphical objects are embedded in the interface so that, when an object is activated, the TCE checks the parameters for consistency. A significant feature of this implementation is the capability of storing and retrieving sets of parameters via macro buttons. When a macro command is invoked, it saves the current system configuration and stores it in a function button which can later restore it. The peg-in-hole task, for instance, requires mostly translational motions, but when holes have a tight clearance, compliance is necessary. An appropriate macro configuration is one that enables the x, y and z axes, with position control in the approach direction and automatic compliance on the other two axes. This configuration can be assigned to a macro button and then recalled during a task containing a peg-in-hole segment.
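A minimal sketch of this macro-button idea, snapshot the current parameter set and restore it later with one action, might look as follows; the parameter names and structure are invented for illustration, not the TCE's.

```python
# Illustrative sketch of TCE-style macro buttons (parameter names assumed).
# A macro saves the whole current configuration; invoking the button later
# restores it in one action, e.g. a peg-in-hole setup with position control on
# the approach axis and automatic compliance on the other two axes.

import copy

config = {"axes_enabled": ["x", "y", "z"],
          "control_mode": {"x": "compliant", "y": "compliant", "z": "position"},
          "gains": {"spring": 0.0, "damper": 1.0}}

macros = {}

def store_macro(name):
    macros[name] = copy.deepcopy(config)    # snapshot the whole configuration

def restore_macro(name):
    config.clear()
    config.update(copy.deepcopy(macros[name]))

store_macro("peg_in_hole")                  # assign the setup to a macro button
config["control_mode"]["z"] = "rate"        # ... operator changes things ...
restore_macro("peg_in_hole")                # one click brings the setup back
print(config["control_mode"]["z"])          # -> "position"
```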
The continuing work on a graphics system as an advanced operator interface is aimed at the data presentation structure of the interface problem and, for that purpose, uses a hierarchical architecture [13]. This hierarchical data interface looks like a menu tree, with only the last menu of the chain (the leaf) displaying data. All the ancestors of the leaf remain visible, to clearly indicate the nature of the data displayed. The content of a leaf includes data or pictures and quickly conveys the various choices available to the operator. A schematic of this layout is shown in Figure 6. Parameters have been organized into four large groups that follow the sequence of steps in a teleoperation protocol. These groups are: (i) Layout, (ii) Configuration, (iii) Tools, (iv) Execution. Each group is further subdivided into specific functions. The Layout menu tree contains the parameters defining the physical task structure, such as the relative positions of the robots and of the FRHC, servo rates, etc. The Configuration menu tree contains the parameters necessary to define task phases, such as control modes and control gains. The Tools tree contains parameters and commands for off-line support of the operator, such as planning, redundancy resolution and software development. Finally, the Execution tree contains commands and parameters needed while teleoperating the manipulators, such as data acquisition, monitoring of robots, hand controllers and smart hands, retrieval of stored configurations, and camera commands. This hierarchical data interface helps solve the problem of displaying the large amount
of data needed during a teleoperation task, but it does not address the issue of data entry by the operator. Traditional interfaces require several operators, each using a different input device for the functions under his control. A single operator using this interface still needs several input devices: joysticks to move the manipulators, buttons for camera and video control, and keyboard and mouse for parameter entry. Operating these devices is time consuming and distracts the operator from the task. An alternative approach would be to use the FRHC as the operator's only input device. This scheme is similar to a virtual reality interface, where operators can access commands and data by simply moving their arms and hands. In our implementation, the operator would use the FRHC as a pointing device on the interface menus and would use the FRHC trigger to click the selected button. The discrimination between commands for the robot and those for the data interface could be done by the deadman switch on the FRHC handle. When not pressed, this switch inhibits the generation of motion commands for the robots, but FRHC motion data are still available and are used to move the cursor. This scheme will be tested to validate its effect on operator performance. It can also be combined with the DATAHAND™ switch modules integrated with the FRHC hand grip, discussed previously in this paper.
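The deadman-switch discrimination just described amounts to a simple input-routing rule, sketched below; the function and message names are assumptions for illustration, not the ATOP software.

```python
# Sketch of routing FRHC motion by the deadman switch (names are illustrative).
# Deadman pressed  -> FRHC displacements become robot motion commands.
# Deadman released -> the same displacements move the GUI cursor instead,
#                     and the trigger clicks the selected menu button.

def route_frhc_sample(displacement, deadman_pressed, trigger_pressed):
    """displacement: (dx, dy, dz) hand-grip motion since the last sample."""
    if deadman_pressed:
        return ("robot_motion", displacement)
    if trigger_pressed:
        return ("gui_click", None)           # select the button under the cursor
    return ("gui_cursor", displacement[:2])  # 2D cursor motion on the menus

print(route_frhc_sample((0.01, 0.0, 0.0), deadman_pressed=True, trigger_pressed=False))
print(route_frhc_sample((0.01, 0.02, 0.0), deadman_pressed=False, trigger_pressed=False))
```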
3. CONTROL EXPERIMENTS
To evaluate advanced teleoperation capabilities, two types of experiments were designed and conducted: experiments with generic tasks and experiments with application tasks. Generic tasks are idealized, simplified tasks and serve the purpose of evaluating specific advanced teleoperation features. Application tasks simulate real-world uses of teleoperation.

3.1. Generic Task Experiments

In these experiments, described in detail in [15], four tasks were used: attaching and detaching velcro; peg insertion and extraction; manipulating three electrical connectors; and manipulating a bayonet connector. Each task was broken down into subtasks. The test operators were chosen from a population with some technical background but without an in-depth knowledge of robotics and teleoperation. Each test subject received 2 to 4 hours of training on the control station equipment. The practice of individuals consisted of four to eight 30-minute sessions. As pointed out in [15], performance variation among the nine subjects was surprisingly slight. Their backgrounds were similar (engineering students or recent graduates), except for one who was a physical education major with training in gymnastics and coaching. This subject showed the best overall performance by each of the measures. This apparent correlation between performance and prior background might suggest that potential operators be grouped into classes based on interest and aptitudes. The generic task experiments were focused on the evaluation of kinesthetic force feedback versus no force feedback, using the specific force feedback implementation techniques of the JPL ATOP project. The evaluation of the experimental data supports the idea that multiple measures of performance must be used to characterize human performance in sensing and computer aided teleoperation. For instance, in most cases kinesthetic force feedback significantly reduced task completion time. In some specific cases, however, it
did not, but it did sharply reduce extraneous forces. More on the results in [15,16].
3.2. Application Task Experiments

These experiments were grouped around a simulated satellite repair task. The particular repair task is the duplication of the Solar Maximum Satellite Repair (SMSR) mission, which was performed by two astronauts in Earth orbit in the Space Shuttle Bay in 1984; it thus offers a realistic performance reference data base. This repair is a very challenging task, since the satellite was not designed for repair. Very specific auxiliary subtasks must be performed (e.g. a hinge attachment) in order to accomplish the basic repair which, in our simulation, is the replacement of the Main Electric Box (MEB) of the satellite. The total repair, as performed by the two astronauts in Earth orbit, lasted about three hours and comprised the following set of subtasks: thermal blanket removal, hinge attachment for MEB opening, opening of the MEB, removal of electrical connectors, replacement of the MEB, securing parts and cables, replugging of electrical connectors, closing of the MEB, and reinstating the thermal blanket. It should be noted that the two astronauts were trained for this repair on the ground for about a year.

The SMSR repair simulation with ATOP capabilities was organized so that each repair scenario had its own technical justification and performance evaluation objective. For instance, in the first subtask-scenario performance experiments, alternative control modes, alternative visual settings, operator skill versus training, and the evaluation measures themselves were evaluated [17,18]. The first subtask-scenario performance experiments involved thermal blanket cutting and unscrewing the MEB bolts; that is, both subtasks implied the use of tools. Several important observations were made during these experiments. The two most important are: (i) the remote control problem in any teleoperation mode, using any advanced component or technique, is at least 50% a visual perception problem for the operator, influenced greatly by view angle, illumination and contrasts in color or in shading; (ii) the training or, more specifically, the training cycle has a dramatic effect on operator performance. It was found that the first cycle should be regarded as a familiarization with the system and with the task. For a novice operator, this familiarization cycle should be repeated at least twice. The real training for performance evaluation can only start after completion of a familiarization cycle. The familiarization can be considered completed when the trainee understands the system I/O details, the system response to commands, and the task sequence details. During the second cycle of training, performance measurements should be made, so that the operator understands the measures against which the performance will be evaluated. Note that it is necessary to separate the cycles, and the repetitions within cycles, by several days. Once a personal skill has been formed by the operator as a consequence of the second training cycle, the real performance evaluation experiments can start. A useful criterion for determining a sufficient level of training is, for instance, the ratio of the standard deviation of the completion time to the mean completion time (that is, the coefficient of variation). If the coefficient of variation of the last five trials of a subtask performance is less than 20%, a sufficient level of training can be declared.
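The training criterion above is easy to compute; a small sketch follows (the five-trial window and 20% threshold are from the text, the helper function itself and the trial data are ours):

```python
# Coefficient-of-variation training criterion: an operator is considered
# sufficiently trained when CoV = std/mean of the last five completion times
# drops below 20% (threshold from the text; the helper is illustrative).

from statistics import mean, pstdev

def sufficiently_trained(completion_times, window=5, threshold=0.20):
    last = completion_times[-window:]
    if len(last) < window:
        return False                       # not enough trials yet
    cov = pstdev(last) / mean(last)        # coefficient of variation
    return cov < threshold

trials = [210.0, 185.0, 150.0, 142.0, 138.0, 135.0, 133.0]  # seconds, made up
print(sufficiently_trained(trials))        # CoV of last five trials -> True
```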
In the above-quoted subtask-scenario experiments, the real training required, on average, one week per subject. More details on the application task experiments can be found in [17,18].
Figure 7. Partially Recurrent Neural Network for Peg-in-Hole Task Segmentation
The practical meaning of training is, in essence, to help the operator develop a mental model of the system and of the task. During task execution, the operator acts through the aid of this mental model. It is, therefore, critical that the operator understands very well the response characteristics of the sensing and computer-aided ATOP system, which has a variety of selectable control modes, adjustable control gains and scale factors. The procedure of operator training, and the expected behaviour of a skilled operator following an activity protocol, suggest providing the operator with performance feedback messages on the operator interface graphics, derived from a stored model of the task execution. A key element of such an advanced performance feedback tool is a program that can follow the evolution of a teleoperated task by segmenting the sensory data stream into appropriate phases. A task segmentation program of this type has been implemented by means of a neural network architecture [19], and it is able to identify the segments of a peg-in-hole task. With this architecture, the temporal sequences of sensory data generated by the wrist sensors on the manipulators are turned into spatial patterns, and a window of sensor observations is associated with the current task phase.
340 20 16
-12 -16 -20
I
I
2
I
I
4
I
I
6
I
I
8
I
I
10
I
I
12
I
I
14
I
I
16
I
I
18
I
I
20
I
I
22
I
I
24
IIII
26
III
28
30
IIII
32
34
36
Time (seconds)
Figure 8. Segmentation in the Real-Time Experiment for a Peg-in-Hole Task
A partially recurrent network algorithm [20] was employed in the computation. Partially recurrent networks represent the temporal evolution of a task well, since they include in the input layer a set of context nodes connected to the output units, creating a context memory. These units represent the task phase already executed, i.e. the previous state. A set of fixed-weight connections is established between the output and context layer units (see Figure 7). The partially recurrent network gave good results both in training and in simulation, and it has been interfaced to the ATOP telemanipulator system for real-time tests. The ATOP configuration is significantly different from the one used to generate the training data; these tests therefore also verified the robustness of the approach to hardware variations. Figure 8 shows the output of an experiment with the peg-in-hole task. Three curves are plotted in this figure: the X-axis force signal input to the network, the real-time output of the network (dotted line), and the off-line classification of the same data by the network (solid line). The values of the segments in the two lines are the indices of the peg-in-hole phases, as described in [21]. On the solid line, phase transitions are synchronous with the corresponding data, since the data rate is determined by the processing speed of the network. The dotted line, instead, shows a lag between its phase transitions and those of the solid line, due to the lower speed of the on-line segmentation. In the real-time segmentation, the delay between corresponding transitions increases as a
function of the time elapsed since the beginning of the experiment, since samples arrive at the network at a much higher rate than their propagation speed through the network. Several experiments with the peg-in-hole task have been carried out, and the results have been encouraging, with a percentage of correct segmentations of approximately 65%. At approximately time 29 s the network misclassified the end of the extract phase (index no. 8), which should have occurred at time 34 s. This can be explained by considering that the network was trained on the data of a single experiment and was therefore very sensitive to changes in the duration and/or amplitude of the force signal. Furthermore, the tests also showed that the temporal dependency built into the network was not as strong as would have been necessary. This weak temporal structure resulted in phases being classified in the wrong sequence. In other experiments, the network showed the capability of recovering after a classification error, and the unexpected feature of classifying tasks whose phase sequence was significantly different from the expected one. During peg extraction, for example, the operator decided to regrasp the peg, and the network correctly recognized the new sequence of phases by inserting an insertion phase, corresponding to the regrasp, within the extraction phase.
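The context-memory idea of this partially recurrent (Jordan-style) architecture can be sketched in a few lines of NumPy. Layer sizes, the force-window encoding and the weights below are placeholders, so this is only a structural illustration of the feedback path, not the trained network of [19,20].

```python
# Structural sketch of a partially recurrent (Jordan-style) network: the input
# is a window of force samples plus context units fed back from the previous
# output (the phase already executed). Weights are random placeholders; the
# real network is trained on recorded wrist-sensor data.

import numpy as np

rng = np.random.default_rng(0)
N_WINDOW, N_HIDDEN, N_PHASES = 8, 12, 9       # sizes assumed for illustration

W_in = rng.normal(size=(N_HIDDEN, N_WINDOW + N_PHASES))   # input+context -> hidden
W_out = rng.normal(size=(N_PHASES, N_HIDDEN))             # hidden -> phase scores

def step(force_window, context):
    """One classification step: returns (phase index, new context)."""
    x = np.concatenate([force_window, context])   # spatialized temporal pattern
    h = np.tanh(W_in @ x)
    y = W_out @ h
    # fixed-weight feedback: blend the previous context with the new decision
    new_context = 0.5 * context + 0.5 * np.eye(N_PHASES)[y.argmax()]
    return int(y.argmax()), new_context

context = np.zeros(N_PHASES)
for t in range(5):                                # stream of force windows
    window = rng.normal(size=N_WINDOW)            # stand-in for sensor data
    phase, context = step(window, context)
    print(t, phase)
```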
4. A TIME-DELAY CONTROL EXPERIMENT
The benefits of an integrated operator interface to a sensing-, computer-control- and computer-graphics-aided advanced teleoperation system become most convincing when the operation has to be performed under communication time delay. For such cases, the technical meaning of an integrated operator interface lies in two major features of the overall ATOP architecture of remote control and control station: (i) through high-fidelity overlay of computer graphics images of work scenes (virtual reality) over TV camera images of the same work scenes (actual reality), the operator can preview and predict the outcome of command and control actions in real time with high visual fidelity; (ii) the operator can, with high cognitive confidence, delegate some command and control authority to the sensor-based closed-loop remote control, based on visual pre-verification of the expected action domain of that control loop.

4.1. Calibration Method

A high-fidelity overlay of graphics and TV images of work scenes requires high-fidelity TV camera calibration and object localization relative to the displayed TV camera view. Theoretically, this can be accomplished in several ways. For the sake of simplicity and operator-controllable reliability, an operator-interactive camera calibration and object localization technique has been developed, using the robot arm itself as a calibration fixture, and combining a non-linear least-squares algorithm with a linear one as a new approach to computing accurate calibration and localization parameters. The current method uses a point-to-point mapping procedure, and the computation of the camera parameters is based on the ideal pinhole model of image formation by the camera. In the camera calibration procedure, the operator first enters into the computer the correspondence information between the 3-D graphics model points and the 2-D camera image points of the robot arm. This is performed by repeatedly clicking with a mouse on a graphics model point and its corresponding TV image point, for each corresponding pair of points, on a monitor screen which, in a four-quadrant window arrangement, shows both the graphics model and the actual TV camera image (see Figure 9).
Figure 9. Graphics User Interface for Calibrating Virtual (Graphics) Images to TV Images
To improve calibration accuracy, several poses of the manipulator within the same TV camera view can be used to enter corresponding graphics model and TV image points into the computer. The computer then computes the camera calibration parameters. Because of the ideal pinhole model assumption, the computed output is a single linear 4-by-3 calibration matrix for a linear perspective projection. Object localization is performed after camera calibration, by entering corresponding object model and TV image points into the computer for different TV camera views of the object. Again, the computational output is a single linear 4-by-3 calibration matrix for a linear perspective projection. The actual camera calibration and object localization computations are carried out by a combination of linear and non-linear least-squares algorithms. The linear algorithm, in general, does not guarantee the orthonormality of the rotation matrix, providing only an approximate solution. The non-linear algorithm provides the least-squares solution that satisfies the orthonormality of the rotation matrix, but it requires a good initial guess in order to converge without a very time-consuming random search. When a reasonable approximate solution is known, one can start with the non-linear algorithm directly. When an approximate solution is not known, the linear algorithm can be used to find one, and then one can proceed with the non-linear algorithm. More on the calibration and object localization technique can be found in [22,23].
After completion of camera calibration and object localization, the graphics models of both the robot arm and the object of interest can be overlaid with high fidelity on the corresponding actual images of a given TV camera view. The overlays can be in wire-frame or solid-shaded polygonal rendering with varying levels of transparency, providing different visual effects to the operator for different task details. In the wire-frame format, hidden lines can be removed or retained by the operator, depending on the information needs of a given task. The calibrated virtual reality display system can also provide synthetic TV camera views to the operator. Synthetic TV camera views can make critical motion events, or robot arm position and alignment states, visible which would otherwise be hidden from the operator in a given TV camera view, or for which no real TV camera views are available. The calibrated virtual reality method also enables the placement of calibrated 3-D target symbols over the live TV video, helping the operator's real-time visual guidance/control and real-time visual verification.
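The linear step of such a point-correspondence calibration is essentially a direct linear transformation (DLT) solved by least squares. The sketch below shows this standard construction for the pinhole projection matrix (written here in the common 3-by-4 homogeneous form rather than the paper's 4-by-3 row-vector convention); it is our illustration of the approach, not JPL's code. Clicked point pairs from several arm poses can be stacked into the same system, which is how additional poses improve the estimate, and, as the text notes, the result does not enforce an orthonormal rotation and only seeds the non-linear refinement.

```python
# Standard DLT sketch: estimate a pinhole projection matrix P (3x4 homogeneous
# form) from 3-D model points and their clicked 2-D image points by linear
# least squares (homogeneous solution via SVD).

import numpy as np

def dlt(points_3d, points_2d):
    """points_3d: (N,3) model points; points_2d: (N,2) image points; N >= 6."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        p = [X, Y, Z, 1.0]
        rows.append([*p, 0, 0, 0, 0, *[-u * c for c in p]])
        rows.append([0, 0, 0, 0, *p, *[-v * c for c in p]])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)        # least-squares null vector of A
    return Vt[-1].reshape(3, 4)        # projection matrix, up to scale

def project(P, point_3d):
    """Project a 3-D point with the estimated matrix (for checking residuals)."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]
```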
4.2. Performance Results

The performance capabilities of the high-fidelity graphics overlay preview/predictive display technique were demonstrated on a large laboratory scale in May 1993. A simulated life-size satellite servicing task was set up at GSFC and controlled 4000 km away from the JPL ATOP control station. Three fixed camera settings were used at the GSFC worksite, and TV images were sent to the JPL control station over the NASA-Select Satellite TV channels at video rate. Command and control data from JPL to GSFC, and status and sensor data from GSFC to JPL, were sent through the Internet computer communication network. The round-trip command/information time delay varied between four and eight seconds between the GSFC worksite and the JPL control station.

The task involved the exchange of a satellite module. This required inserting a 45 cm long power screwdriver, attached to the robot arm, through a 45 cm long hole to reach the module's latching mechanism at the module's backplane, unlatching the module from the satellite, connecting the module rigidly to the robot arm, and removing the module from the satellite. The placement of a new module back into the satellite's frame followed the reverse sequence of actions. Four camera views were calibrated for this experiment, entering 15 to 20 correspondence points in total from 3 to 4 arm poses for each view. The calibration and object localization errors at the critical tool insertion task amounted to about 0.2 cm each, well within the allowed insertion error tolerance. This 0.2 cm error is referenced to the zoom-in view (fovy = 8°) from the overhead (front view) camera, which was about 1 m away from the tool tip. For this zoom-in view, the average error on the image plane was typically 1.2 to 1.6% (3.2 to 3.4% maximum error); a 1.4% average error is equivalent to a 0.2 cm displacement error on a plane 1 m in front of the camera.

The idea of the high-fidelity graphics overlay over a real TV image is that the operator can interact with it visually in real time on a monitor, within one perceptive frame, when generating motion commands manually or by a computer algorithm. Thus, this method compensates in real time for the operator's visual absence from reality caused by the time-delayed image. Typically, the geometric dimensions of a monitor and the geometric
dimensions of the real work scene shown on the monitor are quite different. For instance, an 8-inch long trajectory on a monitor can correspond to a 24-inch long trajectory in the actual work space, that is, three times longer than the apparent trajectory on the monitor screen. Therefore, to preserve fidelity between the previewed graphics arm image and actual arm motions, all previewed actions on the monitor were scaled down very closely to the expected real motion rate of the arm hardware. The manually generated trajectories were also previewed before sending the motion commands to the GSFC control system, in order to verify that all motion data were properly recorded. Preview displays contribute to operational safety. In order to eliminate the problem associated with the varying time delay in data transfer, a robot motion trajectory command is not executed at the GSFC control system until all the data blocks for the trajectory have been received. An element of fidelity between graphics arm image and actual arm motion was given by the requirement that the motion of the graphics image of the arm on the monitor screen be controlled by the same software that controls the motion of the actual arm hardware. This required implementing the GSFC control software in the JPL graphics computer. A few seconds after the motion commands were transmitted from JPL to GSFC, the JPL operator could view the motion of the real arm on the same screen where the graphics arm image motion was previewed. If everything went well, the image of the real arm followed the same trajectory on the screen that the previewed graphics arm image had described earlier, and the real arm image motion on the screen stopped at the same position where the graphics arm image motion had stopped. After completion of a robot arm motion, the graphics images on the screen were updated with the actual final robot joint angle values. This update prevents the accumulation of motion execution errors in the graphics image of the robot arm, and retains graphics robot arm position fidelity on the screen even after the completion of a force sensor referenced compliance control action. The actual contact events (moving the tool within the hole and moving the module out of or into the satellite's frame) were automatically controlled by an appropriate compliance control algorithm referenced to data from a force-moment sensor at the end of the robot arm.
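The hold-until-complete rule for trajectory transfer can be sketched as a tiny receive protocol; the block format and names below are invented for illustration.

```python
# Sketch of the receive rule used against variable network delay (format and
# names assumed): a trajectory is buffered block by block and handed to the
# remote controller only when every block has arrived, so jitter in the
# Internet link cannot distort the executed motion timing.

class TrajectoryReceiver:
    def __init__(self):
        self.buffers = {}                       # trajectory id -> received blocks

    def on_block(self, traj_id, index, total, data, execute):
        blocks = self.buffers.setdefault(traj_id, {})
        blocks[index] = data
        if len(blocks) == total:                # all data blocks present
            trajectory = [blocks[i] for i in range(total)]
            del self.buffers[traj_id]
            execute(trajectory)                 # start the motion only now

rx = TrajectoryReceiver()
rx.on_block("move_7", 1, 3, [0.2], execute=print)   # blocks may arrive
rx.on_block("move_7", 0, 3, [0.1], execute=print)   # out of order or late
rx.on_block("move_7", 2, 3, [0.3], execute=print)   # -> prints full trajectory
```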
Figure 10. A. Predictive/Preview Display of End Point Motion. B. Status of Predicted End Point after Motion Execution, from a Different Camera View, for the Same Motion Shown Above.
The experiments were performed successfully, showing the practical utility of high-fidelity predictive/preview display techniques, combined with sensor-referenced automatic compliance control, for a demanding telerobotic servicing task under communication time delay. More on these experiments and on the related error analysis can be found in [22,23]. Figure 10 illustrates a few typical overlay views.

A few notes are in order here regarding the use of calibrated graphics overlays for time-delayed remote control. (i) There is a wealth of computational activity that the operator has to exercise. This requires very careful design considerations for an easy and user-friendly operator interface to this computational activity. (ii) The selection of the matching graphics and TV image points by the operator has an impact on the calibration results. First, the operator has to select significant points; this requires some rule-based knowledge about what constitutes a significant point in a given view. Second, the operator has to use good visual acuity when clicking the selected significant points with the mouse.

5. CONCLUSIONS

The following general conclusions have emerged so far from the development and experimental evaluation of the JPL ATOP:

1. The sensing, computer and graphics aided advanced teleoperation system truly provides new and improved technical features. In order to transform these features into new and improved task performance capabilities, the operators of the system have to be transformed from naive into skilled operators. This transformation is primarily an undertaking of education and training.

2. Carrying out an actual task requires that the operator follow a clear procedure or protocol, which has to be worked out off-line, tested, modified and finalized. It is this habit of procedure or protocol following that will finally help develop the experience and skill of an operator.

3. The final skill of an operator can be tested and graded by the ability to improvise successfully in recovering from unexpected errors in order to complete a task.

4. The variety of I/O activities in the ATOP control station requires workload distribution between two operators. The primary operator controls the sensing and computer aided robot arm system, while the secondary operator controls the TV camera and monitor system and assures protocol following. Thus, the coordinated training of two cooperating operators is essential for successfully using the ATOP system to perform realistic tasks. It is not yet known what a single operator could do, and how. To configure and integrate the current ATOP control station for successful use by a single operator is a challenging R&D task.

5. The problem of ATOP system development is not so much the improvement of technical components and subsystems, though these also present challenges. The final challenge is to integrate the improved technical features with the natural capabilities of the operator, through appropriate human-machine interface devices and techniques, to produce an improved overall system performance capability in which the operator is part of the system in some new way.
REFERENCES
[1] A.K. Bejczy and J.K. Salisbury. Controlling remote manipulators through kinesthetic coupling. Computers in Mechanical Engineering, 1(1), July 1983. Also in: Kinesthetic coupling between operator and remote manipulator, in Proc. ASME Int'l Computer Technology Conference, Vol. 1, San Francisco, CA, August 12-15, 1980, pp. 197-211. [2] A.K. Bejczy, Z. Szakaly, and W.K. Kim. A laboratory breadboard system for dual arm teleoperation. In Third Annual Workshop on Space Operations, Automation and Robotics, JSC, Houston, TX, July 25-27, 1989. [3] Z. Szakaly, W.K. Kim, and A.K. Bejczy. Force-reflective teleoperated system with shared and compliant control capabilities. In NASA Conference on Space Telerobotics, JPL Publication 89-7, Vol. IV, Jet Propulsion Laboratory, Pasadena, CA, January 31, 1989. [4] A.K. Bejczy, Z. Szakaly, and T. Ohm. Impact of end effector technology on telemanipulation performance. In Third Annual Workshop on Space Operations, Automation and Robotics, NASA Conf. Publication 3059, JSC, Houston, TX, July 25-27, 1989. [5] A.K. Bejczy and Z. Szakaly. Performance capabilities of a JPL dual-arm advanced teleoperation system. In Space Operations, Applications, and Research Symposium (SOAR '90), Albuquerque, NM, June 26, 1990. [6] A.K. Bejczy and Z. Szakaly. An 8-d.o.f. dual-arm system for advanced teleoperation performance experiments. In Space Operations, Applications, and Research Symposium (SOAR '91), Houston, TX, July 1991. See also: Lee, S. and Bejczy, A.K., Redundant arm kinematic control based on parametrization, in Proc. IEEE Int'l Conf. on Robotics and Automation, Sacramento, CA, April 1991. [7] L.W. Knight and D. Retter. DataHand: design, potential performance, and improvements in the computer keyboard and mouse. In National Human Factors Society Conference, Denver, CO, November 1989. [8] L.W. Knight. Single environment: experimental hand-grip controller for ATOP. Jet Propulsion Laboratory, DOE Summer Faculty Fellow (SFF) Reports, July 31, 1992 and July 30, 1993. [9] A.K. Bejczy, W.S. Kim, and S. Venema. The phantom robot: predictive display for teleoperation with time delay. In IEEE International Conference on Robotics and Automation, Cincinnati, OH, May 1990. [10] A.K. Bejczy and W.S. Kim. Predictive displays and shared compliance control for time delayed telemanipulation. In IEEE Int'l Workshop on Intelligent Robots and Systems (IROS '90), Tsuchiura, Japan, July 1990.
[11] W.S. Kim and A.K. Bejczy. Graphics displays for operator aid in telemanipulation. In IEEE International Conference on Systems, Man and Cybernetics, Charlottesville, VA, October 1991. [12] W.S. Kim. Graphical operator interface for space telerobotics. In IEEE International Conference on Robotics and Automation, Atlanta, GA, May 1993. [13] P. Fiorini, A.K. Bejczy, and P. Schenker. Integrated interface for advanced teleoperation. IEEE Control Systems Magazine, 13(5), October 1993. [14] P. Lee et al. Telerobot configuration editor. In IEEE International Conference on Systems, Man and Cybernetics, Los Angeles, CA, November 1990. [15] B. Hannaford et al. Performance evaluation of a six-axis generalized force-reflecting teleoperator. Publication 89-18, Jet Propulsion Laboratory, June 15, 1989. [16] B. Hannaford et al. Performance evaluation of a six-axis generalized force-reflecting teleoperator. IEEE Transactions on Systems, Man and Cybernetics, 21(3), May/June 1991. [17] H. Das et al. Performance experiments with alternative advanced teleoperated control modes for a simulated solar max satellite repair. In Space Operations, Automation and Robotics Symposium (SOAR '91), JSC, Houston, TX, July 9-11, 1991. [18] H. Das et al. Performance with alternative control modes in teleoperation. PRESENCE: Teleoperators and Virtual Environments, 1(2), Spring 1993. MIT Press. [19] P. Fiorini et al. Neural networks for segmentation of teleoperation tasks. PRESENCE: Teleoperators and Virtual Environments, 2(1), Winter 1993. MIT Press. [20] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA, 1991. [21] B. Hannaford and P. Lee. Hidden Markov model analysis of force-torque information in telemanipulation. International Journal of Robotics Research, 10(5), October 1991. [22] W.S. Kim and A.K. Bejczy. Demonstration of a high-fidelity predictive/preview display technique for telerobotic servicing in space. IEEE Transactions on Robotics and Automation, October 1993. Special Issue on Space Telerobotics. [23] W.S. Kim. Virtual reality calibration for telerobotic servicing. In IEEE International Conference on Robotics and Automation, San Diego, CA, May 1994. (Submitted).
Active Understanding of Human Intention by a Robot through Monitoring of Human Behavior

Tomomasa SATO a, Yoshifumi NISHIDA b, Junri ICHIKAWA c, Yotaro HATAMURA b and Hiroshi MIZOGUCHI a

a Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153 Japan

b Faculty of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 Japan

c Hitachi Inc.

Understanding human intention is an essential function if a robot is to adequately support human beings, and it helps smooth communication between the human and the robot. Human behavior is an expressive medium of communication. This paper proposes a new function, "active understanding of human intention," by which a robot understands intentions through monitoring human behavior. The proposed function is unique in that it utilizes multiple communication channels in parallel, i.e., human intentions are understood not only through conscious behavior but also through unconscious behavior. The paper also proposes a new robot architecture to realize the proposed function. The key architectural features are: 1) the robot possesses multiple sensors which surround the human; 2) information is processed by dual loops: a loop for information exchange between the human and the robot, and a loop for understanding human intention. As an example of a robot with the human intention understanding function, the authors constructed a micro-teleoperation robot. It can automatically understand an operator's intention through such unconscious human behavior as the operator's hand touching a desk. The understood result is utilized to change the control mode of a master-slave manipulation system from fine motion to rough motion and vice versa. The experimental results prove that the proposed function effectively simplifies the operation system, making the system operator friendly.

1. Introduction

Our aging society requires robots which can adequately support human beings. The support should be what the human wants, i.e., it should be in accord with the human's intentions. The authors refer to such a service robot as "a human-supporting robot." Robots which assist senior persons and robots which offer medical helping hands are typical examples of human-supporting robots. The human-supporting robot should be able to discern a human's intentions through communication with the human, because it is impossible for the robot to adequately
support the human without knowing what the human wants. Understanding human intention is thus a key function in realizing human-supporting robots.

Conventional robots have communicated through input devices such as teaching boxes, keyboards, or switches to obtain information from the human. A common feature of such input devices is that they can obtain information only when the human consciously generates a behavior to operate them. Although these devices have the merit that information which the human does not intend to input can be consciously suppressed, they have the demerit that they accept only the human's conscious behavior. The demerit comes from the passive utilization of human behavior as a communication medium. This imposes a burden on the human: operators must convert all their intentions into conscious behaviors in order to express them. Let us consider the example of writing a letter with a pencil. We usually move the pencil with one side of our hand touching the desk because we desire fine positioning. Moving the pencil is usually a conscious behavior; in contrast, touching the desk with the hand is usually an unconscious behavior. If a robot can automatically perceive such unconscious behavior, it can determine whether the human intends to move his or her hand in a fine motion control mode or a rough motion control mode. The touching behavior indicates that unconscious behavior carries instructive information that conveys a human's intention to the robot. Conventional robots have not utilized such behavior. Utilization of unconscious behavior requires the robot to actively perceive human behavior. Therefore, if a robot can more actively perceive a human's behavior as a communication medium, i.e., if the robot can perceive both conscious and unconscious behavior, the robot can understand human intention without forcing the human to teach all his or her desires consciously. While the conventional robot is a passive one which waits for input behavior as a command, a robot having the proposed function is an active one which understands human intention by monitoring human behaviors. This paper demonstrates a method by which a robot actively recognizes the expression of human intention through human behavior. The paper is organized as follows: the next section presents a model of communication between humans, by which the authors clarify the mechanism for understanding a partner's intention; in addition, a robot function for active understanding of human intention is proposed. An architecture realizing the proposed function is described in section 3. In section 4, the authors present a micro-teleoperation robot as an example of a robot with the proposed function; its effectiveness is proven experimentally. The conclusion is given in section 5.

2. Understanding of human intention

In this section, we first analyze the function and mechanism of understanding human intention through communication between human beings. Based on that analysis, we present the robot's function for actively understanding human intention.

2.1. Understanding of human intention by humans

Humans understand each other's intentions by interpreting information received through communication. Therefore, a function for exchanging information with the other person and a function for interpreting the received information are essential for understanding human
intention. We propose a model of communication between humans as shown in Fig. 1 and use it to analyze the mechanism by which one human understands another's intention. A human understands the partner's intentions through communication via two information processing loops, as shown in Fig. 1.
Figure 1. Loops in communication between humans
An information-exchange loop is a loop on which humans exchange information with each other; the exchange is realized via language and behavior. An intention-understanding loop is a loop on which a human understands another human's intentions; intention is understood based on knowledge concerning the partner (see the models in Fig. 1). Figure 2 shows the channels which convey information; the information flow from a speaker to a listener is illustrated in the diagram. The speaker's behavior is divided into two classes: behavior generated when the speaker is conscious of the listener, and behavior generated when the speaker is unconscious of the listener. The listener obtains about 35% of the information by means of language and about 65% of it through nonverbal communication [1][2]. The following are the characteristics of the communication channels which convey information in the information-exchange loop.

Char. 1 Hierarchy

There exists a hierarchical structure among the channels of communication between humans.

Layer (1) Language channel
Figure 2. Human communication channels conveying information
Language is the highest-level and most flexible communication channel. The speaker is conscious of the listener's presence when expressing himself or herself.

Layer (2) Gesture channel (channel of conscious behavior with consciousness of a listener)

Pointing at something with one's index finger is a typical example of a gesture. This layer generally provides a substitute communication channel for language; the speaker is trying to transmit information to a listener. Even information that cannot be conveyed via language can often be communicated easily via gesture.

Layer (3) Channel of conscious behavior without consciousness of a listener

The listener can tell that the speaker is hot by noting that the speaker fans his or her face. This is a typical example of conscious behavior without consciousness of the listener. The behavior arises not because the speaker wants to send information to the listener, but because the speaker wants to cool his or her face; i.e., the speaker behaves for his or her own sake. This channel differs from the previous two in terms of the speaker's consciousness. Since the speaker's behavior is not performed to convey information, the listener must intentionally observe it to obtain information from the speaker.

Layer (4) Channel of unconscious behavior

Turning over in sleep or cat-napping are typical examples of unconscious behavior. The speaker does not send information to a listener intentionally, and is not even conscious of the behavior. The listener can understand the speaker's intention in the same way as in (3).

Layer (5) Channel of involuntary behavior or symptoms

These are behaviors or symptoms of vital activity which have nothing to do with consciousness; breathing, sneezing, and sweating are typical examples. A listener can understand a speaker's intention in the same way as through the last two channels. Although a given behavior is not always conscious or always unconscious (conscious behavior can often change into unconscious behavior and vice versa), human behavior can always be assigned to one category if we consider a single moment.

Char. 2 Directivity

There is directivity in communication channels. The channels of layers (1) and (2) have directivity, since the information is transmitted from the speaker toward the listener. The channels of layers (3) to (5) have no directivity, since the information is not transmitted from the speaker toward the listener; the listener therefore has to actively watch the speaker's behavior to obtain information through layers (3) to (5). Human beings utilize not only directive communication channels but also indirective communication channels, through active recognition.
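The five-layer taxonomy and its directivity property can be restated compactly as a data structure. A minimal sketch; the class and field names are ours, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    layer: int
    name: str
    speaker_conscious: bool      # is the behavior itself conscious?
    directed_at_listener: bool   # does information flow toward the listener?

# Layers (1)-(2) are directive; (3)-(5) must be actively watched for.
CHANNELS = [
    Channel(1, "language",                      True,  True),
    Channel(2, "gesture",                       True,  True),
    Channel(3, "conscious, listener-unaware",   True,  False),
    Channel(4, "unconscious behavior",          False, False),
    Channel(5, "involuntary behavior/symptom",  False, False),
]

def needs_active_observation(ch: Channel) -> bool:
    """Indirective channels carry information only if the listener watches."""
    return not ch.directed_at_listener
```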
The following are characteristics of how the information flowing in the dual loops is utilized.

Char. 1 Parallel utilization of multiple channels

Depending on the situation, human beings exchange information through the multiple communication channels described above in parallel, and interpret the received information in parallel. More specifically, human beings utilize multiple channels in both the information-exchange loop and the intention-understanding loop.

Char. 2 Interpretation based on tacit rules

Rules play an important role in interpreting the received information. For example, the rules of the English language in channel (1) enable humans to interpret spoken language. Communication in channels (2) through (5) is based on tacit rules which we know instinctively or as common sense. The inner dotted square of Fig. 2 represents the common rules which make communication possible in a given culture.

Considering the above characteristics, we can summarize the features of human intention understanding by human beings as follows. Human beings use the hierarchical channels fully by transmitting and receiving information through both the directive and the indirective channels in parallel. Humans can thus receive more information than the speaker transmits consciously, because they can utilize unconscious behavior as a source of information as well. Human beings understand each other's intentions by interpreting the received information on the basis of the tacit rules. These features maximize the ability of human beings to understand each other's intentions.

2.2. Active understanding of human intention by a robot

Based on the above analysis, we will propose the essential functions of a robot which can efficiently understand a human's intention through communication with the human. To understand human intention, a robot must be able to exchange information with the human and to interpret that information so as to achieve the same understanding as in communication between humans. Let us consider the problems of a conventional robot in terms of its information exchange and information interpretation methods. Figure 3 illustrates the communication loops between a human and a conventional robot, and Figure 4 shows the communication channels between them in these loops. The problems of conventional robots can be summarized as follows.

1. Limited exchange of information

Conventional robots receive information only from channels (1) and (2). For example, when an operator inputs commands from a keyboard, the operator uses only (1). In the information-exchange loop between a conventional robot and a human, unlike that between humans, the robot imposes a burden on the human. Specifically, the human must consciously input all the information by converting the information of channels (3) and (4) into that of channels (1) or (2).
Figure 3. Loops in communication between a human and a conventional robot
Such limited utilization of the channels occurs because the robot does not have sensors to receive the information conveyed through channels (3) to (5).

2. Limited interpretation of information

Conventional robots interpret only the information conveyed by channels (1) and (2). Therefore, the information conveyed by channels (3) to (5) cannot be processed even if it is received. In other words, conventional robots do not have the function needed to interpret information from channels (3) to (5). This is another reason humans have had to convert the information of channels (3) and (4) into that of channel (1) or (2). A robot needs interpretation rules to interpret received information. The conventional robot establishes new artificial rules for interpretation which must be memorized by the human; robot language is a typical example of such artificial rules. Figure 4 illustrates a human and a robot interpreting information by artificial rules. The reason a beginner differs from a skilled worker is that the beginner is not accustomed to the artificial rules while the skilled worker is. The artificial rules established between a conventional robot and a human differ from the tacit rules of human society; therefore, the human must memorize complex operation methods. In the following, we describe how to solve these problems. To solve the first problem, i.e., to overcome the limited receipt of information from a human, a robot should be able to receive information conveyed through all the human communication channels, (1) to (5), in parallel.
Figure 4. Communication channels between a human and a conventional robot
We refer to this function as Function 1. To solve the second problem, i.e., to interpret the received information, a robot should have a function to process all the information conveyed through channels (1) to (5) in parallel; we refer to this as Function 2. The robot also requires a function to interpret the received information, based not on artificial rules which must be learned by the human, but on the tacit rules which the human knows naturally; we refer to this as Function 3. (In this paper, "efficient" means there is little the human must teach explicitly and consciously in order to convey his or her intention.) We propose a new function of "active understanding of human intention" which integrates Functions 1 to 3 stated above. The function enables a robot to efficiently understand human intention through communication with the human. Its main features are described below.

1. Activeness

We use the term "activeness" to mean that a robot with the proposed function can receive information which a human does not transmit consciously. The robot thus receives information even through the indirective channels.

2. Utilization of human behavior as communication media

While conventional robots utilize only channels (1) and (2), the robot with the proposed function can utilize not only the language channel (channel (1)), but also the human behavior channels (channels (2) to (5)) as communication media. The function for actively understanding human intention provides the advantages described below.

I. Easy communication with a robot

A human can communicate smoothly with the robot because the robot can receive information which is not transmitted consciously. The human therefore need not convert the information of channels (3) and (4) into information of channels (1) and (2).

II. Easy operation for beginners

A beginner can easily operate a system provided with the proposed function, because the system understands his or her intentions on the basis of tacit human rules. As a result, the beginner need not memorize new artificial rules.

As a practical example of a robot with the "active understanding of human intention" function, let us consider a teleoperated robot whose operator performs teleoperations while watching a CRT monitor. Depending upon the channels the robot utilizes, it can offer the various kinds of support below.

- Utilization of channels (1) and (2): When the operator points an index finger at something on the CRT monitor and says to the robot, "bring me that bolt over there," the robot can understand the operator's intention from the hand gesture as well as the spoken language. The robot can then bring the desired object to the speaker.
- Utilization of channel (3): When the operator leans toward the CRT monitor to manipulate something more precisely, the robot can display a zoomed image of the object by perceiving this behavior.

- Utilization of channel (4): The robot can prevent the operator from doing something dangerous by noticing him or her cat-napping.

- Utilization of channel (5): The robot can keep the room at a comfortable temperature by monitoring perspiration on the operator's forehead.

3. Robot architecture to realize active understanding of human intention

This section proposes an information processing architecture for a robot with the function of active understanding of human intention.

3.1. Information exchange loop and intention understanding loop

We propose the information processing architecture shown in Fig. 5. Its essence is the existence of two information processing loops: the information exchange loop and the intention understanding loop.
3.1.1. Information exchange loop: The robot exchanges information with the human via the information exchange loop. As shown in Fig. 6, information exchange for communication is realized by the multiple communication channels (1) to (5). To utilize the hierarchical human communication channels in parallel in the information exchange loop, it is necessary to equip the robot with appropriate input/output devices, such as sensors, actuators, and displays, corresponding to each of the communication channels.

3.1.2. Intention understanding loop: The robot understands human intention by processing information in the intention understanding loop. The following processing is repeated continuously: acquisition of sensor data, understanding of behavior, understanding of intention, and selection of the sensing mode. There are multiple intention understanding loops, one for each channel; all the loops run in parallel to realize hierarchical understanding of human intention. The information processing of each loop is based on models, described below.
Behavior model: Using the behavior model, the robot interprets sensor data on human behavior and tries to understand its meaning, i.e., it converts sensor data into the meaning of the behavior.
Figure 5. Conceptual diagram of the information processing architecture
Figure 6. Parallel utilization of hierarchical communication channels
Intention model: The intention model converts observed behavior into human intention. It also contains future human intentions which can be predicted from the current intention, in order to control the sensing modes.

Sensing model: The sensing model is utilized to determine the sensing modes based on the predicted human intention. A sensing mode includes the kinds of sensors to use next, the configuration of these sensors, and the handling of the sensor data. Determining the next sensing mode enables efficient sensor data processing through attention focusing.

In addition to these models for active understanding of human intention, the human-supporting robot should have a support model: the model by which the robot retrieves the support description based on the understood intention. Figure 7 shows a detailed flow chart of active understanding of human intention, and a sketch of one such loop follows the figure.
Figure 7. Flow chart of active understanding of human intention
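The cycle repeated by each intention-understanding loop can be sketched as follows. The four model interfaces mirror the behavior, intention, sensing, and support models described above, but their names and signatures are our own assumptions:

```python
# Sketch of one intention-understanding loop (there would be one per
# channel, all running in parallel). Model interfaces are illustrative.

def intention_understanding_loop(sensor, behavior_model, intention_model,
                                 sensing_model, support_model, robot):
    mode = sensing_model.initial_mode()
    while True:
        data = sensor.acquire(mode)                     # 1. acquire sensor data
        behavior = behavior_model.interpret(data)       # 2. understand behavior
        intention = intention_model.infer(behavior)     # 3. understand intention
        predicted = intention_model.predict(intention)  #    predict what comes next
        mode = sensing_model.select_mode(predicted)     # 4. focus attention
        robot.apply(support_model.support_for(intention))
```

Each loop would run as its own task or thread, so that the hierarchical channels are monitored concurrently.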
3.2. Multiple-sensor configuration surrounding a human

We propose a new configuration (Fig. 8) for human-supporting robots.
Figure 8. Multiple sensors surrounding a human
In this configuration the robot's multiple sensors surround the human. Surrounding the human with multiple sensors enables the robot to receive information from any desired direction; information from multiple directions captures various aspects of human behavior. Consequently, the configuration makes the robot sensitive to human behavior. A robot with the proposed configuration can communicate with the human differently from communication between humans, because artificial robot sensors have the following advantages: 1) some sensors can acquire data much more accurately than human senses; 2) some sensors can obtain data which cannot be obtained by human senses, e.g. an infrared camera; 3) the robot can freely select the number and configuration of such sensors, e.g. six spherically arranged eyes using six cameras.
4. Application to Microteleoperation

In this section, we present a micro-teleoperation robot as an example of a robot with the proposed function.

4.1. Microteleoperation

A micro-teleoperation system is a master-slave manipulation system in which a slave robot handles very small objects according to an operator's movement; the size of the handled objects is on the order of micrometers [3]. In the system, a pen-shaped master manipulator, referred to as a "pen-shaped master," is utilized as the master [4]. The slave robot executes tasks under a microscope. A CRT display monitor attached face-up beneath a desk surface serves as the system's human interface device; it displays the visual information from the slave robot's microscope. The operator can perform teleoperation with the hand-held pen-shaped master while monitoring the working environment displayed on the monitor. Microteleoperation requires the slave robot to execute both fine and rough motions. Consider the task of micromachine assembly as an example. Before the assembly starts, the micromachine components must be conveyed into the micro working environment from the macro world; the robot therefore needs a rough motion capability to move an object the long distance between the micro world and the macro world as quickly as possible. During the assembly, the robot must move the object with fine motion to align it precisely. After the assembly, the assembled machine has to be moved from the working environment back to the macro world, and the robot is again required to move the object a long distance by rough motion. In microteleoperation, it is thus necessary to change from the fine motion control mode to the rough motion control mode and vice versa frequently, depending upon the working situation, to execute tasks smoothly. With a conventional robot, the operator is forced to change these modes consciously. If a system can change between the two modes automatically in accordance with the operator's intention, by understanding that intention from the operator's unconscious behavior, i.e., by utilizing channels (3) and (4), the operator can perform microteleoperation smoothly because he or she is not required to switch modes consciously.

4.2. Application of the active understanding of human intention

An intention model, a support model, and a behavior model for the task of repairing a micro pattern are presented below. In this example, the operator makes the slave robot shave micro patterns painted on a sheet by maneuvering the pen-shaped master in his or her hand.

(A) Intention model

Here we explain the relation between the operator's intentions and the operator's behavior, in other words, the relation between the operator's intentions and the motions of the operator's hand. Figure 9 shows the typical situations of the operator's hand. In (a), the operator's hand is floating in the air; in this situation, the operator intends to move the hand rapidly. In (b), the operator's hand is touching the desk; in this situation, the operator intends to execute the task precisely. These are the relations between the operator's behaviors and intentions. Such intentions can be judged from
Figure 9. Typical situations of operator's hand
whether the operator's hand touches the desk or not. The intention model for judging the operator's intentions from his or her behavior is given as follows: (a) if the hand is not contacting the desk, the operator wants the slave robot to move quickly; (b) if the hand is touching the desk, the operator wants the slave robot to move precisely.

(B) Support model

The slave robot supports the operator in accordance with the understood intention. When the operator wants to move the slave manipulator quickly, the system should set the slave robot's control mode to rough motion control; when the operator wants to execute the task precisely, the system should set the fine motion control mode. The following support model realizes these activities: (a) if the operator wants to execute the task quickly, the slave robot should be in rough motion control mode; (b) if the operator wants to execute the task precisely, the slave robot should be in fine motion control mode.

(C) Behavior model

A touch-sensor panel is a suitable input device for obtaining the touch information between the hand and the desk. The following behavior model is constructed to recognize the operator's behavior from the output of the touch-sensor panel. When there are no signals from the touch-sensor panel, the operator's hand is not contacting the desk, as shown in (a) of Fig. 9. When there are signals from the touch-sensor panel, the operator's hand is touching the
desk, as shown in (b) of Fig. 9. A flow chart of the intention-understanding loop in the micro-teleoperation robot is summarized in Fig. 10, and a sketch of the resulting mode-switching rule follows the figure.
Figure 10. Flow chart of the intention understanding loop in the micro-teleoperation robot
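The three models combine into a simple mode-switching rule. A minimal sketch, assuming the touch panel reports a (possibly empty) list of touched positions; all names are illustrative:

```python
# Unconscious touching of the desk selects fine motion; a floating hand
# selects rough motion. Each function encodes one of the three models.

def understand_behavior(touch_panel_signals):
    return "touching" if touch_panel_signals else "floating"   # behavior model

def understand_intention(behavior):
    return "precise" if behavior == "touching" else "quick"    # intention model

def select_control_mode(intention):
    return "fine" if intention == "precise" else "rough"       # support model

def update_mode(touch_panel_signals):
    return select_control_mode(understand_intention(
        understand_behavior(touch_panel_signals)))

assert update_mode(touch_panel_signals=[(120, 85)]) == "fine"
assert update_mode(touch_panel_signals=[]) == "rough"
```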
4.3. Experimental System

Figure 11 shows the experimental micro-teleoperation system. The system consists of a human interface section, a slave robot, and computers and controllers.

4.3.1. Human interface section

Figure 12 shows a photograph of the human interface section, in which the information-exchange loop is formed between an operator and the system. The human interface section consists of a pen-shaped master manipulator, CCD cameras, a CRT display monitor, and a touch-sensor panel. The pen-shaped master is equipped with two LEDs to measure its three-dimensional (3D) position.
Figure 11. System configuration
Two CCD cameras are located above the CRT display monitor to obtain images of the two LEDs. The touch-sensor panel is attached to the screen of the CRT display monitor; it generates output data indicating whether the surface is touched or not, as well as the 2D touched position. The operator can perform the teleoperation with the pen-shaped master in hand while watching images of micro objects on the CRT display monitor. Note that the CRT display monitor screen is utilized as the desk surface so the operator can easily view the image, as shown in Fig. 11. With the above input devices, the system can receive both the conscious behavior of moving the pen-shaped master and the unconscious behavior of touching the desk in parallel. In this system, the touch-sensor panel and CCD cameras are "the robot's multiple sensors surrounding a human" described in the previous section.

4.3.2. Slave robot

Figure 13 shows a photograph of the slave robot. The slave robot consists of a slave manipulator, a work table, a stage, and a microscope. The slave manipulator has a handling tool (a knife) at its hand tip. The work table is equipped with a stage having three translational degrees of freedom. The microscope is located above the work table.

4.3.3. Computers and controllers

The system computers consist of a host computer (Sun Sparc IPX), a real-time image processor (MaxVideo200), and transputers (T805). The transputers control the motion of the work table and the stage; the host computer integrates the system. The intention-understanding loop is implemented in the host computer.
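For illustration, the LED position measurement can be sketched as an idealized stereo triangulation. This assumes a rectified camera pair with known focal length f and baseline b, and pixel coordinates measured from the principal point; the real system's calibration is certainly more involved:

```python
# Hedged sketch: recovering one LED's 3D position from two camera images.
# Idealized rectified stereo pair; f and b values are placeholders.

def triangulate(xl, yl, xr, f=800.0, b=0.30):
    """(xl, yl): LED pixel in the left image (from the principal point);
    xr: the LED's column in the right image. Returns (X, Y, Z) in the
    left camera frame."""
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("point at infinity or mismatched correspondence")
    Z = f * b / disparity            # depth from disparity
    return (xl * Z / f, yl * Z / f, Z)
```

With both LEDs triangulated, the pen's tip position and axis direction can be estimated together.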
Figure 12. Photo of human interface
Figure 13. Photo of slave robot
4.3.4. System architecture

The system realizes active understanding of human intention by the proposed architecture, which consists of the information-exchange loop, the intention-understanding loop, and multiple sensors surrounding the human, as shown in Fig. 14. Intention understanding loop (1) in Fig. 14 is the loop in which the system understands the operator's desired direction of motion; intention understanding loop (2) is the loop in which the system understands the desired control mode. The two intention understanding loops work in parallel.
4.4. Experiment

The following experiment was conducted: the operator shaves micro patterns on a sheet using the slave robot. The sheet is placed on the work table. Figure 15 shows a photograph of example patterns; the diameter of the dotted pattern is about 120 μm. Two identical sheets were prepared for the experiments. In the first experiment, the system was switched manually between fine motion control mode and rough motion control mode; in the second, the proposed robot switched modes automatically. We measured the time to complete the task for six subjects, who were taught in advance how to switch modes manually and how to use the touch-sensor panel for automatic switching. Figures 16 and 17 show the repairing times for each pattern in experiments (1) and (2). These results indicate that the data are reproducible, since each subject's repairing time is nearly constant, although there are differences among subjects. To clarify the difference between the two experiments, we compared repairing times averaged over the six subjects; Figure 18 shows the result. The times of the second experiment (automatic control mode switching) are shorter than those of the first, though there are individual differences. The difference can be explained as follows. The operator need not consciously change modes, because the system changes them automatically based on the operator's unconscious behavior; consequently, the operator can pay attention solely to the microteleoperation itself. Nor does the operator need to remember how to change modes, because the modes are changed on the basis of unconscious behavior, and that unconscious behavior follows tacit human rules.
Figure 14. System architecture of the micro-teleoperation robot, showing the information exchange loop, intention understanding loops (1) and (2), and the sensors (touch panel, CCD cameras) surrounding the operator
Figure 15. Example of dotted patterns on a sheet

Figure 16. Repairing time for each pattern in experiment (1)

Figure 17. Repairing time for each pattern in experiment (2)

Figure 18. Comparison of repairing time for experiment (1) and experiment (2) (average time of subjects)
5. Conclusion

This paper proposed a new function of "active understanding of human intention by a robot through monitoring of human behavior," which robots should use in order to understand human intention efficiently. Robots with this function can provide a human with appropriate support based on understood intentions. The unique features of the proposed function are: 1) the robot simultaneously receives information through the multiple hierarchical human communication channels of human behavior; 2) the robot understands human intention by interpreting the received information on the basis of tacit human rules. The authors also proposed a robot architecture to realize the function. Its key features are: 1) the robot is configured such that its multiple sensors surround the human; 2) the robot has dual loops, an intention-understanding loop for understanding human intention and an information-exchange loop for exchanging information between the human and the robot. As an example of a robot having the proposed function and architecture, the authors constructed a micro-teleoperation robot. The robot switches between the fine motion control mode and the rough motion control mode, a switch required frequently in micro master-slave manipulation. The switching is performed automatically by interpreting the touch-sensor information indicating whether the operator's hand is touching the desk or not; the hand touching the desk is a typical example of unconscious operator behavior. The experimental results proved that the function is useful for providing support based on the operator's intention, and consequently for making the system friendly to the operator.

Finally, the authors describe an example of a robot with the proposed active behavior understanding function for a medical application. A patient is taken into an intensive care unit (ICU) when seriously ill or injured, so that the patient's state can be carefully monitored. Monitoring is performed by obtaining reliable data from many sensors directly attached to the patient; however, such monitoring imposes mental and physical burdens on the patient. A patient who is less seriously ill or injured should instead be taken into a simple ICU which does not impose such burdens. We are planning to construct "a robotic sickroom" as a simple type of ICU. The robotic sickroom monitors the patient with the function proposed in this paper and provides the appropriate support. For example, it monitors a patient's turning over in sleep to prevent bedsores, and it monitors unusual behaviors¹ and breathing motion to keep the patient's situation from becoming dangerous. Turning over in sleep, unusual behavior, and breathing are further examples of unconscious human behavior. Understanding a patient's intention by recognizing such unconscious behavior enables the robotic sickroom to offer the functions of an ICU without imposing a burden on the patient. Future research should be devoted to modeling and processing such human behavior.

The authors would like to express their sincere appreciation to Yoshinobu Kotani for his suggestions.

¹Extracting the needle of an intravenous medication is a typical example. It arises from a side effect of a medicine or from depressed brain function.
REFERENCES
1. M.F. Vargas, "Louder Than Words: An Introduction to Nonverbal Communication," Iowa State University Press, 1987.
2. K. Tachibana, "The Structure of Communication," pp. 45-54, NTT-shuppan, 1993.
3. T. Sato, K. Koyano, M. Nakao, Y. Hatamura, "Novel manipulation for micro object handling as interface between micro and human world," Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 3, pp. 1674-1681, July 1993.
4. T. Sato, J. Ichikawa, M. Mitsuishi, "Micro teleoperation system concentrating visual and force information at operator's hand," Proc. of International Symposium on Experimental Robotics, pp. 118-124, Oct. 1993.
Human/machine sharing control for telerobotic systems*

Tzyh-Jong Tarn a, Ning Xi a, Chuanfan Guo a, and Antal K. Bejczy b

a Department of Systems Science and Mathematics, Washington University, St. Louis, MO 63130, U.S.A.

b Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, U.S.A.

Telerobotic systems usually consist of autonomous operations and master/slave operations. Based on our newly developed event-based planning and control theory, a new action planning and control scheme for telerobotic systems is presented. It provides a unified model for integrating a human operator's control command (supervisory command) with motion planning and control for automatic operation. As a result, the supervisory command and autonomous motion planning and control are naturally combined, so that the autonomous motion plan can be adjusted and modified by a human operator during its execution. This scheme lays a foundation for the planning and control of general robotic systems involving human operators. The scheme is implemented and tested on a PUMA 560 dual-arm system; a Spaceball is used by the human operator as a command generator to input the supervisory command. Experimental results for both single arm teleoperation and dual-arm coordinated teleoperation are presented.

1. INTRODUCTION

Applications of the teleoperation of robotic systems range from material handling and automation in hazardous environments to servicing tasks in space. A telerobotic system can be commanded by manually generated motion commands, by preplanned motion commands, or by a combination of both. Generally, manually generated commands are obtained from command generators manipulated by human operators in real time, whereas preplanned commands are given before the motion execution. Preplanned commands are sometimes called autonomous commands or autonomous functions. Since autonomous functions may not work perfectly in unexpected situations, intervention by the human operator is often needed. The focus of this paper is to present a new approach to the planning and control of telerobotic operations, such that the telerobotic system has the ability to cope with some unexpected events. In order to interface a human operator with a robotic system, many planning and control schemes for telerobotic systems have been proposed. Among them,

*Research partially supported under NSF Grants IRI-9106317 and CAD-9404949, and by Sandia National Laboratories Contract No. AC-3752-C.
the master/slave type of telerobotic system was intensively studied in [5, 6], [8]-[12]. In addition, different kinds of semiautonomous control of telerobotic systems were presented in [8]. The teleoperation of a redundant manipulator is discussed in [9], and the importance of coordination between the human operator and the remote robot is addressed in [10]. Obstacle avoidance using a semiautonomous control approach is presented in [11, 12]. In existing planning and control schemes for teleoperation, autonomous functions and manual operations are separated into two different operation modes. Based on some logic conditions, the operation modes can be switched between each other. However, if some unexpected event, such as the occurrence of an obstacle, happens during autonomous operation, the system has to be switched to the manual mode. After proper manual operation, complete replanning of the autonomous operation is required to switch the system back into the autonomous mode, since the autonomous motion plan is usually described as a function of time, and the switches do not happen at predefined times. This significantly limits the capability of the telerobotic system to deal with unexpected events. In addition, due to the uncertainty of the switching time caused by unexpected events, it can be extremely difficult to achieve stable and smooth switching between the different operation modes. In order to overcome these disadvantages, a new event-based planning and control scheme is introduced for autonomous operations. It can be easily adjusted or modified by a human operator, through a command generator, during the operation. As a result, the autonomous and manual operations are naturally combined, and problems caused by mode switching are avoided. The proposed method is applied to single arm as well as to dual-arm teleoperation. Finally, experimental results for single arm action and for dual-arm material handling are presented. The ultimate goal of this paper is to develop a new planning and control scheme for a telerobotic system that achieves intelligent and robust performance.

2. EVENT-BASED PLANNING AND CONTROL THEORY
The basic idea of the event-based planning and control theory [3] is to introduce the concept of an action reference parameter. This parameter is directly related to the system output measurement relative to the task. Instead of being parameterized by time, the planned (desired) input to the system is parameterized by the action reference parameter. As a result, at any given instant of time, the desired input is a function of the system output. This creates a mechanism for adjusting the base plan based on the output measurement, thus converting planning into a closed-loop real-time process. That is, the plan (the desired action) is an "abstract" entity, like a "desired goal"; it only becomes a "real" plan when measurements are made. The event-based planning and control scheme is compared with the traditional time-based planning and control scheme in Figure 1. In the event-based scheme, the function of the Action Reference block of Figure 1b is to compute the action reference parameter s on-line based on the system output measurement. The planner then gives the desired input to the system according to the on-line computed action reference parameter s. In addition, the action reference parameter is calculated at the same rate as the feedback control. In other words, the
planning process is adjusted at a very high rate, which enables the planner to handle unexpected or uncertain discrete and continuous events together with the control execution.
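The core idea, a desired input indexed by a nondecreasing action reference s computed from the measured output, can be sketched as follows. Taking s as arc length along the planned path is an assumption consistent with the description above, not the paper's exact definition:

```python
import numpy as np

class EventBasedPlanner:
    """Sketch of an event-based planner: the desired output is indexed by
    an action reference s (here, arc length along the planned path),
    computed from the measured output at the feedback rate. If the robot
    is blocked, s stops advancing and so does the desired input."""

    def __init__(self, path):
        self.path = np.asarray(path, dtype=float)       # (N, m) waypoints
        seg = np.linalg.norm(np.diff(self.path, axis=0), axis=1)
        self.s_grid = np.concatenate([[0.0], np.cumsum(seg)])
        self.s = 0.0

    def action_reference(self, y):
        """Nondecreasing s obtained by projecting the measured output y
        onto the planned path (nearest-waypoint projection)."""
        dist = np.linalg.norm(self.path - np.asarray(y), axis=1)
        s_measured = self.s_grid[int(np.argmin(dist))]
        self.s = max(self.s, s_measured)                # s never decreases
        return self.s

    def desired_output(self, y, lookahead=0.05):
        """Desired output y^d(s): a point slightly ahead of the current s."""
        s = min(self.action_reference(y) + lookahead, self.s_grid[-1])
        return np.array([np.interp(s, self.s_grid, self.path[:, i])
                         for i in range(self.path.shape[1])])
```

Because the desired output depends on the measured output rather than on the clock, a blocked arm simply sees a frozen reference, and the motion resumes from the same s once the blockage is cleared.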
Figure 1. Comparison of Traditional and Event-Based Planning and Control Schemes: (a) traditional planning and control scheme; (b) event-based planning and control scheme, in which a Motion Reference block computes the action reference s from the measured output y(t) and the planner issues y^d(s)
As an example, the event-based planning and control scheme has been successfully applied to a robot arm motion in which the arm was stopped by an unexpected obstacle [3]. Since the desired input of the system is directly related to the output through the action reference, the error remained constant while the robot arm was stopped. Once the obstacle was removed, the robot arm automatically continued its motion, according to the original plan, in a stable manner and without replanning. Had a conventional time-based motion plan been implemented in this case, the system error would have continued to increase, since the planner would have issued the desired input according to the original fixed plan; as a result, the system would have become unstable. In order to apply the event-based planning and control theory to the telerobotic system, the most important step is to integrate the human operator command with the action reference parameter [7]. As a result, the action plan will be based on sensed information in an autonomous operation, or on the human operator's command in a manual operation. This provides a unified planning and control scheme for operations commanded by both a human operator and an autonomous motion plan.

3. CONTROL STRATEGY FOR SINGLE ARM TELEROBOTIC SYSTEM
A single arm telerobotic system mainly consists of a robot arm and a command generator. It can perform simple teleoperations, such as master/slave operation, as well as complicated teleoperations, such as obstacle avoidance, which combine autonomous functions and master/slave commands. "Simple teleoperation" means that the commands of the teleoperation come either from a human operator or from some autonomous motion plan. "Complicated teleoperation," on the other hand, is a combination of human operator commands and an autonomous motion plan.
3.1 Robot dynamic model

The dynamic model for a robot arm can be written as

$$D(q)\ddot{q} + C(q, \dot{q}) + G(q) = \tau \qquad (1)$$

The task space is defined as $Y \subset R^6$, $Y = (x, y, z, O, A, T)^T$, where $(x, y, z)$ is the position of the tip in Cartesian space, and $(O, A, T)$ is an orientation representation (Orientation, Altitude, Tool angles) [4]. The relation between $Y$ and the joint angle $q$ is

$$Y = h(q) \qquad (2)$$

Therefore we have

$$\dot{Y} = J_h(q)\dot{q} \qquad (3)$$

where $J_h(q)$ is the Jacobian matrix. The robot dynamics at the task level can be described as follows:

$$D(q)J_h^{-1}(q)\left(\ddot{Y} - \dot{J}_h(q)\dot{q}\right) + C(q, \dot{q}) + G(q) = \tau \qquad (4)$$

The following nonlinear feedback [3]

$$\tau = D(q)J_h^{-1}\left[\ddot{Y}^d + K_v(\dot{Y}^d - \dot{Y}) + K_p(Y^d - Y) - \dot{J}_h(q)\dot{q}\right] + C(q, \dot{q}) + G(q) \qquad (5)$$

is applied to obtain totally linearized and decoupled error dynamics,

$$(\ddot{Y}^d - \ddot{Y}) + K_v(\dot{Y}^d - \dot{Y}) + K_p(Y^d - Y) = 0 \qquad (6)$$

where $Y^d$, $\dot{Y}^d$, $\ddot{Y}^d$ are the desired position, velocity and acceleration, respectively, and $K_v$, $K_p$ are the velocity and position feedback gains, which can be designed to stabilize the system.
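Equation (5) is a task-space computed-torque (feedback-linearizing) law. A sketch of how it could be coded, with the dynamics terms and Jacobians supplied by an assumed model interface, not the authors' implementation:

```python
import numpy as np

def computed_torque(q, qdot, Yd, Yd_dot, Yd_ddot, model, Kv, Kp):
    """Task-space computed-torque law of Eq. (5). `model` is assumed to
    provide D(q), C(q, qdot), G(q), h(q), J(q) and Jdot(q, qdot); this is
    an illustrative interface, not a specific library."""
    Y = model.h(q)
    J = model.J(q)
    Y_dot = J @ qdot
    # Resolved task-space acceleration enforcing the error dynamics (6):
    v = (Yd_ddot + Kv @ (Yd_dot - Y_dot) + Kp @ (Yd - Y)
         - model.Jdot(q, qdot) @ qdot)
    # tau = D * J^{-1} * v + C + G
    return model.D(q) @ np.linalg.solve(J, v) + model.C(q, qdot) + model.G(q)
```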
3.2 Command generator

Master robots which are kinematically similar to the telerobots (slave robots) are often used as command generators [11]. Another kind of command generator is the Position Hand Controller [5, 6]. In this paper, the Spaceball (Figure 2) is used as the command generator. Neither the Position Hand Controller nor the Spaceball is kinematically similar to the telerobots. The Spaceball can generate operation commands in 6 directions. From a
manipulation point of view, the end-effector of the robot should follow the Spaceball commands generated by the human operator.
Figure 2. Command Generator: The Spaceball

Scaled sampled data from the Spaceball are used as the desired velocity of the end-effector motion in Cartesian space, consisting of a translational velocity and an angular velocity. The Spaceball commands can be used as the desired velocity of the end-effector since only the relative motion of the end-effector matters. The desired position and acceleration of the end-effector can easily be obtained by integration or differentiation of the desired velocity. Assume the Spaceball output after scaling is

$$V_{sp} = (v_x,\ v_y,\ v_z,\ \omega_x,\ \omega_y,\ \omega_z)^T \qquad (7)$$
Notice that $V_{sp}$ is described as a function of the motion reference s [3]. A human operator gives a proper command according to his or her observation of the robot position and orientation. In this sense, the operator plays the same role as a sensor in the event-based planning and control scheme [3]. The desired velocity of the end-effector can be obtained as follows:

$$\dot{Y}^d_{sp} = J_1 V_{sp} \qquad (8)$$

where $J_1$ is the Jacobian matrix from Cartesian space to task space, described by [4]

$$J_1 = \begin{bmatrix} I_{3\times 3} & 0 \\ 0 & K_{OAT} \end{bmatrix} \qquad (9)$$

where $K_{OAT}$ is the $3\times 3$ matrix, expressed in terms of $S_O$, $C_O$, $S_A$, $C_A$ (the sines and cosines of the $O$ and $A$ angles), which represents the relationship between the OAT derivatives and the angular velocities. The desired position change of the end-effector can be determined by

$$\Delta Y^d_{sp} = J_1 V_{sp}\,\Delta t$$

and the desired acceleration is

$$\ddot{Y}^d_{sp} = J_1 \dot{V}_{sp} + \dot{J}_1 V_{sp}$$
379 (2) The projection of ])~d on the negative direction of yd must be modified so that the motion reference s keeps nondecreasing to ensure the stability of the motion [3].
~Pd~k. d ~
lisp2
Sf
So
(a) cos ~ >= 0 yd
d
Ysp2
sf
,
y;d So
(b) cos ot < O
Let α be the angle between $\dot{Y}^d$ and $\dot{Y}^d_{sp}$ in Figure 3, and

$$\dot{Y}^d_{sp1} = \frac{(\dot{Y}^d_{sp})^T \dot{Y}^d}{\|\dot{Y}^d\|^2}\,\dot{Y}^d \qquad (10)$$

$$\dot{Y}^d_{sp2} = \dot{Y}^d_{sp} - \dot{Y}^d_{sp1} \qquad (11)$$

The semiautonomous command is defined as follows:

$$\dot{Y}^d_{sp,s} = \begin{cases} \dot{Y}^d_{sp}, & \cos\alpha \ge 0 \\ \dot{Y}^d_{sp}, & \|\dot{Y}^d_{sp1}\| \le \|\dot{Y}^d\|,\ \cos\alpha < 0 \\ \dot{Y}^d_{sp2} - \dot{Y}^d, & \|\dot{Y}^d_{sp1}\| > \|\dot{Y}^d\|,\ \cos\alpha < 0 \end{cases}$$

The desired position change of the end-effector will be

$$\Delta Y^d_{sp,s} = J_1 V_{sp,s}\,\Delta t$$

and the desired acceleration will be

$$\ddot{Y}^d_{sp,s} = J_1 \dot{V}_{sp,s} + \dot{J}_1 V_{sp,s} \qquad (12)$$
Notice that the desired position command from the Spaceball is actually a relative position increment command superposed on the autonomous function yd. The total desired input is then/kY~dp,~ + yd, Ys d "t" ~/'d and ~2"sp,s-Jc ~.zd.
Spac .'Ball
_. e, er
gobo, I , Controller ~ - Motion Reference ]--i. Human Operator ,
I~--. I-"
Figure 4. Telerobotic System with Semiautonomous
Control
Remark:
In this case, if the autonomous commands are functions of time, it cannot be so easy to combine the Spaceball commands with the autonomous commands because the Spaceball commands are naturally functions of event: the given path of robot motion. Figure 4 shows the structure of single arm telerobotic system with autonomous control.
4. CONTROL STRATEGY FOR DUAL-ARM TELEROBOTIC SYSTEM
The dual-arm telerobotic system consists of two robot arms and a command generator. In this system, under the command of a human operator, the two robot arms can manipulate an object in six-dimensional task space with automatic coordination.

4.1 Robot dynamic models

The dynamic model of each robot can be written as
\[ D_i(q_i)\ddot q_i + C_i(q_i,\dot q_i) + G_i(q_i) + J_{h_i}^T(q_i) f_i = \tau_i, \qquad i = 1,2 \tag{13} \]

where the term $J_{h_i}^T(q_i) f_i$ represents the interaction force between the robots and the object, and $J_{h_i}$ is the Jacobian matrix defined by

\[ \dot Y_i = J_{h_i}(q_i)\,\dot q_i, \qquad i = 1,2, \]

where $Y_i = h_i(q_i)$, $i = 1,2$, are the task space outputs of each robot. The robot dynamics in task space can be described as
\[ D_i(q_i) J_{h_i}^{-1}(q_i)\bigl(\ddot Y_i - \dot J_{h_i}(q_i)\dot q_i\bigr) + C_i(q_i,\dot q_i) + G_i(q_i) + J_{h_i}^T(q_i) f_i = \tau_i \tag{14} \]
The following nonlinear feedback, which takes force/torque compensation into account [7], is applied for each robot:
\[ \tau_i = D_i(q_i) J_{h_i}^{-1}\bigl[\ddot Y_i^d + K_{iv}(\dot Y_i^d - \dot Y_i) + K_{ip}(Y_i^d - Y_i) - \dot J_{h_i}(q_i)\dot q_i\bigr] + C_i(q_i,\dot q_i) + G_i(q_i) + J_{h_i}^T(q_i) f_i \tag{15} \]
Then totally linearized and decoupled error dynamics can be obtained for each robot,

\[ (\ddot Y_i^d - \ddot Y_i) + K_{iv}(\dot Y_i^d - \dot Y_i) + K_{ip}(Y_i^d - Y_i) = 0 \tag{16} \]
where $K_{iv}$ and $K_{ip}$ can be designed to stabilize the system.
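For illustration only (this sketch is not from the paper; the dynamics terms and gain matrices are placeholders supplied by the caller), the nonlinear feedback of eq. (15), which yields the linear error dynamics of eq. (16), could be coded as:

```python
import numpy as np

def task_space_torque(D, C, G, J, J_dot, JT_f, q_dot,
                      Y, Y_dot, Yd, Yd_dot, Yd_ddot, Kv, Kp):
    """Nonlinear task-space feedback with force compensation, cf. eq. (15).

    D, C, G  : joint-space inertia matrix and Coriolis/gravity vectors
    J, J_dot : task Jacobian J_h(q) and its time derivative
    JT_f     : J_h(q)^T f, the measured interaction-force term
    Kv, Kp   : gain matrices chosen so that the error dynamics
               e_ddot + Kv e_dot + Kp e = 0 (eq. (16)) are stable
    """
    # Commanded task acceleration: feedforward plus PD on the tracking error
    v = Yd_ddot + Kv @ (Yd_dot - Y_dot) + Kp @ (Yd - Y)
    # Map to joint torques; exact cancellation yields linear error dynamics
    return D @ np.linalg.solve(J, v - J_dot @ q_dot) + C + G + JT_f
```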
4.2 Dual-arm coordinated teleoperation

In a dual-arm telerobotic system it is extremely important to achieve coordinated control. Based on the event-based planning and control scheme [3], automatic coordination in a teleoperation can easily be achieved.
Figure 5. Coordinate Frame Systems of Dual Arm Robot System

In Figure 5, $K_w$ is the world coordinate frame, and robot 1 and robot 2 hold an object by squeezing. It is assumed that the two robots manipulate the object with coordination and that there is no relative motion between each robot end-effector and the object. Therefore the following constraints must be satisfied:
(i) Position constraints

\[ P_{ob}^d = P_i^d + R_i^d (P_{ob}^0 - P_i^0), \qquad i = 1,2 \]

where $P_{ob}^d = (x_{ob}^d, y_{ob}^d, z_{ob}^d)^T$ is the desired position of the object in the world coordinate frame; $P_{ob}^0 = (x_{ob}^0, y_{ob}^0, z_{ob}^0)^T$ is the initial position of the object in the world coordinate frame; $P_1^d$, $P_1^0$ and $P_2^d$, $P_2^0$ are the desired and initial positions of robot 1 and robot 2 in the world coordinate frame; $R_1^d$ and $R_2^d$ are the desired rotation matrices of robot 1 and robot 2 in the world coordinate frame.

(ii) Orientation constraints

\[ (R_1^d)^{-1} R_2^d = (R_1^0)^{-1} R_2^0 \]

and

\[ (R_{ob}^d)^{-1} R_1^d = (R_{ob}^0)^{-1} R_1^0 \]

where $R_{ob}^d$ and $R_{ob}^0$ are the desired rotation matrix and the initial rotation matrix of the object in the world coordinate frame; $R_1^0$ and $R_2^0$ are the initial rotation matrices of robot 1 and robot 2 in the world coordinate frame. Without loss of generality it can be assumed that the initial rotation matrix of the object is equal to that of robot 1, i.e. $R_{ob}^0 = R_1^0$. Then the desired orientation of the object and the desired orientation of robot 1 remain the same at all times, i.e. $R_1^d = R_{ob}^d$. The initial position of the object can be assumed to be the center of the straight line from $Y_1^0$ to $Y_2^0$, i.e. $P_{ob}^0 = \frac{1}{2}(P_1^0 + P_2^0)$. The sampled data from the Spaceball, after scaling, is used as the desired velocity of the object motion in Cartesian space. Assume the Spaceball output after scaling is $V_{sp}$. The desired velocity of the object is
\[ \dot Y_{ob}^d = J_1 V_{sp} \]

The desired position increment of the object is

\[ \Delta Y_{ob}^d = J_1 V_{sp}\,\Delta t \]

and the desired acceleration is

\[ \ddot Y_{ob}^d = \dot J_1 V_{sp} + J_1 \dot V_{sp}. \]
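To make the constraint propagation concrete, here is a small Python sketch (illustrative only; the paper gives no code, and the rotation-matrix pose representation and function names below are our assumptions) that derives the desired end-effector poses of both arms from a commanded object pose using constraints (i) and (ii):

```python
import numpy as np

def arm_targets(P_ob_d, R_ob_d, P0_ob, R0_ob, P0, R0):
    """Propagate a desired object pose to both end-effectors.

    P_ob_d, R_ob_d : commanded object position/rotation (world frame)
    P0_ob, R0_ob   : initial object position/rotation
    P0, R0         : lists with the initial position/rotation of robots 1 and 2
    Returns (P_d, R_d): desired pose for each robot such that the grasp
    transforms stay constant (no relative end-effector/object motion).
    """
    P_d, R_d = [], []
    for i in range(2):
        # Orientation constraint (ii): R_ob^{-1} R_i is constant over time
        Ri_d = R_ob_d @ R0_ob.T @ R0[i]
        # Position constraint (i): the grasp offset is rigidly attached
        Pi_d = P_ob_d - Ri_d @ (P0_ob - P0[i])
        P_d.append(Pi_d)
        R_d.append(Ri_d)
    return P_d, R_d
```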
Using the constraints, the desired position, velocity and acceleration for each robot can be obtained. The goal is to achieve the best possible coordination. In the event-based motion plan, the desired inputs are described as functions of the motion reference $s$, which should be closely related to the goal of control. In other words, the problem is to determine $s$ such that the position constraints are satisfied and the following coordination criterion is minimized:
\[ J = k_1 \|P_1^d(s) - P_1\|^2 + k_2 \|P_2^d(s) - P_2\|^2 \tag{17} \]
where $k_1$ and $k_2$ are weighting coefficients which depend on the importance of the coordination requirements in different directions and are usually determined by the task itself. The closed-form solution can be obtained by solving
\[ \frac{\partial J}{\partial s} = 0 \]

subject to constraints (i) [7].
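Where the closed form is inconvenient, the same event-based reference can be obtained numerically; the following sketch (our illustration, not the paper's implementation; SciPy availability and the default weights are assumptions) selects $s$ by minimizing criterion (17) with a bounded scalar minimizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordination_reference(P1_d, P2_d, P1, P2, k1=1.0, k2=1.0, s_max=1.0):
    """Pick the motion reference s minimizing eq. (17).

    P1_d, P2_d : callables mapping s to the planned desired positions
    P1, P2     : current measured end-effector positions
    k1, k2     : task-dependent weighting coefficients (assumed values)
    """
    def J(s):
        return (k1 * np.sum((P1_d(s) - P1) ** 2)
                + k2 * np.sum((P2_d(s) - P2) ** 2))
    res = minimize_scalar(J, bounds=(0.0, s_max), method="bounded")
    return res.x
```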
5. EXPERIMENTAL RESULTS
The above planning and control schemes for telerobotic systems were experimentally implemented and tested on two PUMA 560 6-DOF robot arms. Each robot arm is equipped with an FSA-3254 six-axis force/torque sensor. The planning and control algorithms were implemented on an SGI 4D/340 workstation. A graphical user interface has been developed; it displays sensory information and provides a means for the human operator to adjust the system parameters.
Figure 6. Single Arm Obstacle Avoidance

Figure 6 shows the experimental results of single-arm teleoperation. At the beginning, the arm was commanded by an autonomous motion plan to move along a given straight line. When an unexpected obstacle moved toward the given path of the arm, the operator used the Spaceball to superpose an additional command on the autonomous motion plan. The arm stopped its motion and waited until the obstacle had passed the given path. Then, after the operator released the Spaceball, the robot arm continued its original autonomous motion without any replanning. It can be seen that the robot arm undergoes
smooth transfer between "simple teleoperation" and "complicated teleoperation" to cope with the unexpected event. In both modes of operation, highly accurate control has been achieved.
Figure 7. Dual-Arm Teleoperation
In Figure 7, a carton weighing 0.45 kg was manipulated by the dual-arm teleoperation system as shown in Figure 5. The motion command was generated by a human operator through the Spaceball. Dual-arm coordination was automatically maintained. The coordination error was defined as

\[ e_{coord} = \|P_1(s) - P_2(s)\| - \|P_1^d(s) - P_2^d(s)\| \]

which is the difference between the actual and the desired distances of the two contact surfaces. The experimental results show that highly accurate coordination has been achieved in the teleoperation.

6. CONCLUSION

A new action planning and control scheme for telerobotic systems has been proposed in this paper. Based on the event-based planning and control theory, a unified model has been developed to integrate the human operator command with the autonomous motion plan. This makes it possible for the human operator to adjust or modify the autonomous motion plan on-line to cope with unexpected events, and it significantly improves the robustness of the teleoperation. In addition, the method provides a scheme to automatically coordinate the two arms in a teleoperation, which significantly reduces the operating skill required of the human operator. The experimental results presented in this paper verify
the theoretical results and clearly demonstrate their advantages. More importantly, the proposed methods provide a mechanism for integrating various kinds of sensor information with human operator commands. The methods could easily be extended to a general theory for modeling, analyzing and designing human/machine interaction systems. This will be an important step in the development of intelligent machines.

REFERENCES
1. J.Y.S. Luh and Y.F. Zheng, Constrained Relations Between Two Coordinated Industrial Robots for Motion Control, The International Journal of Robotics Research, Vol. 6, No. 3, Fall 1987.
2. T.J. Tarn, A.K. Bejczy and Ning Xi, Hybrid Position/Force Control of Redundant Robot with Consideration of Motor Dynamics, Proc. of the Thirtieth Annual Allerton Conf. on Communication, Control and Computing, Urbana, 1992.
3. T.J. Tarn, A.K. Bejczy and Ning Xi, Motion Planning in Phase Space for Intelligent Robot Arm Control, Proc. of IROS '92, Raleigh, 1992.
4. T.J. Tarn, A.K. Bejczy, G.T. Marth and A.K. Ramadorai, Kinematic Characterization of the PUMA 560 Manipulator, Robotics Laboratory Report SSM-RL-92-15, Washington University, Dec. 1991.
5. A.K. Bejczy and Z.F. Szakaly, A Harmonic Motion Generator for Telerobotic Applications, Proc. of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, California, April 1991, pp. 2032-2039.
6. Z.F. Szakaly and A.K. Bejczy, Performance Capabilities of a JPL Dual-arm Advanced Teleoperation System, (Preprint) SOAR '90 Workshop, Albuquerque, NM, June 26-28, 1990.
7. Ning Xi, T.J. Tarn and A.K. Bejczy, Event-Based Planning and Control for Multi-Robot Coordination, Proc. of the 1993 IEEE International Conference on Robotics and Automation, Atlanta, Georgia, pp. 251-258, May 2-6, 1993.
8. Yasuyoshi Yokokohji, Akira Ogawa, Hitoshi Hasunuma and Tsuneo Yoshikawa, Operation Modes for Cooperating with Autonomous Functions in Intelligent Teleoperation Systems, Proc. of the IEEE International Conference on Robotics and Automation, Atlanta, Georgia, May 2-6, 1993, Vol. 3, pp. 510-515.
9. Paul G. Backes, Mark K. Long and Robert D. Steele, The Modular Telerobot Task Execution System for Space Telerobotics, Proc. of the IEEE International Conference on Robotics and Automation, Atlanta, Georgia, May 2-6, 1993, Vol. 3, pp. 524-530.
10. T. Sato and S. Hirai, MEISTER: A Model Enhanced Intelligent and Skillful Teleoperation Robot System, Robotics Research - The Fourth International Symposium, The MIT Press, pp. 155-162, 1988.
11. P.T. Boissiere and R.W. Harrigan, Telerobotic Operation of Conventional Robot Manipulators, Proc. of the 1988 IEEE International Conference on Robotics and Automation, pp. 576-583.
12. V. Lumelsky and E. Cheung, Towards Safe Real-time Robot Teleoperation: Automatic Whole-sensitive Arm Collision Avoidance Frees the Operator for Global Control, Proc. of the 1991 IEEE International Conference on Robotics and Automation, pp. 797-802.
Task-directed programming of sensor-based robots

B. Brunner, K. Arbter and G. Hirzinger

German Aerospace Research Establishment (DLR), Oberpfaffenhofen, Institute of Robotics and System Dynamics, P.O. Box 1116, D-82230 Wessling, Germany
Abstract
We propose the so-called TeleSensorProgramming concept, which uses sensory perception to achieve local autonomy in robotic manipulation. Sensor-based robot tasks are used to define elemental moves within a high-level programming environment. This approach is applicable in both the real robot's world and the simulated one. Besides the graphical off-line programming concept, the range of application lies especially in the field of teleoperation with large time delays. A shared autonomy concept is proposed that distributes intelligence between man and machine. The feasibility of graphically simulating the robot within its environment is extended by emulating different sensor functions, such as distance, force-torque, and vision sensors, to achieve as correct a copy of the real system behaviour as possible. These simulation features are embedded in a task-oriented, high-level robot programming approach. This implicit programming paradigm is supported by a sophisticated graphical man-machine interface. Programming by demonstration is performed by introducing an analytical controller generation approach. Sensor fusion aspects with respect to autonomous sensor-controlled task execution are discussed, as well as the interaction between the real and the simulated system.

1 Introduction
Programming a robot off-line has been restricted to showing a sequence of cartesian positions or joint angle configurations. But mechanical inaccuracies at the manipulator level and differences between the real and simulated world lead to difficult problems in the execution of off-line defined robot activities. Local adaptations at the path generation level have to be made via teach-in commands by a human operator, especially in the context of manipulation tasks such as gripping or handling objects. Even small deviations in position, orientation, and shape of the objects to be handled cannot be tolerated, because the task execution will fail in an uncertain environment. Graphical simulation of robot tasks and downloading of the generated commands to the real robot is limited to the joint or cartesian motion level. This approach is only useful if geometrical consistency between the real environment and the simulated one can be guaranteed. This demand cannot be met with available programming systems.
The most important requirement for achieving autonomy is the ability to successfully execute a given task by intelligent sensor data processing. Actual sensory data are matched against a predefined reference pattern to obtain information for the robot controller to achieve the desired end position. Local deviations from the desired state are detected by the sensors and handled by the robot system in an autonomous way. Therefore, sensor-controlled movements have to be used to achieve local autonomy at the machine level. High-level planning facilities for task scheduling as well as intelligent error handling mechanisms are required for full autonomy, but state-of-the-art techniques are insufficient to provide the adequate tools. Presently, full autonomy is unattainable.
2 The TeleSensorProgramming approach

Therefore, we favour a shared autonomy concept that distributes intelligence between man and machine [1]. Presuming that sufficient information about the actual environment is available from the sensor systems, partial tasks can be executed independently at the machine level. Specifications and decisions at a high task planning level have to be made by human operators. Local sensory feedback loops are executed by the robot system, while global task-level jobs have to be specified interactively by a human operator. Coarse planning activities are done at a task-oriented level by human intelligence; fine path planning at the manipulator level takes place at a sensor-based control level [2].
Figure 1 Shared autonomy

This shared autonomy approach is the basis of our programming paradigm, which is termed TeleSensorProgramming (TSP) and results in robot programming at a task-directed level. The teach-in of a robot system occurs not at the joint or cartesian manipulator level but at a high language level, i.e. the operator plans activities at a level which can be processed by the robotic system independently of human intervention. Hence, TSP means teaching by showing typical situations at a task-directed level, including nominal sensory patterns, with the aid of sensory refinement in the real, or a completely simulated, world. All the taught sensor-based actions can be activated in analogous situations. Figure 2 shows the structure of our TSP concept, consisting of two control cascades working in parallel. One of them is the real system control loop, containing internal control loops for local autonomy. The other is a simulation environment which is structurally equivalent to the real system, with few exceptions. Most important is a signal delay which may occur in the real (remote) system, e.g. in space applications, and which is not implemented in the simulation. This makes the simulation predictive with respect to the real system. Another exception is the display of internal variables in the simulation part, which cannot be observed (measured) in the real system. This gives the operator or task planner more insight into what happens, or may happen, in the system as a result of his commands. Communication between the two loops
runs via a common model data base which delivers a priori knowledge for the execution at the remote level and a posteriori knowledge for the model update of the simulated world.
Figure 2 The TeleSensorProgramming concept

For such an intelligent programming system, different tools are necessary to implement the required functionality. First, a sophisticated simulation system has to be provided to emulate the real robot system. This includes the simulation of sensory perception within the real environment. Besides this sensor simulation, the shared autonomy concept has to provide an efficient operator interface to set up task descriptions, to configure the task control parameters, to decide what kinds of sensors and control algorithms should be used, and to debug an entire job execution phase. Two applications in the field of robotics are evident for the proposed TSP approach. First, the graphical off-line programming concept is extended by the processing of simulated sensory data and by the introduction of local autonomy at the task description level. This means that not only the joint and cartesian information is gathered by graphically guiding the robot during the off-line programming task, but also the simulated sensory information is stored off-line as nominal patterns for subtask execution in a local feedback loop. The fine motion control loop for handling uncertainties operates independently of any human intervention, on the simulation side as well as the real one. A main advantage of this off-line programming scheme is the possibility of verifying all the robot actions before the real execution is performed. This includes the perception and processing of sensory data and the activation of the sensor-controlled elemental moves.
Second, the field of telerobotics with large time delays, especially in space and subsea applications, is a good area for this sensor-based, task-directed programming approach [1]. Direct visual feedback in the control loop, where a time delay of a few seconds is inherent, does not allow the human operator to handle the robot movements in a suitable way. Predictive simulation is the appropriate means for an operator to telemanipulate the remote system on-line [3]. Similar approaches are known which use force-reflecting hand controllers to feed back force sensor signals for shared and teleoperated control modes [4]. Furthermore, an interactive supervisory user interface makes it possible to set up the environmental and control parameters. In our approach, however, the sensor-based elemental move concept (see section 3), feasible for all kinds of sensors and execution tasks, is preferred. The operator only has to guide the robot through the task space in a coarse way and to activate specific sensor control phases. Sending these basic commands to the remote system enables the real robot to execute the desired task with its own local autonomy after the delay time has elapsed. The main feature of our telerobotic concept is to replace the time-delayed visual feedback by predictive stereo graphics with sensor simulation, providing a supervisory control technique that allows more and more autonomy and intelligence to be shifted to the robot system. The main focus of this paper lies on the task-directed programming approach, which facilitates the robot programming work through an intuitive task description level. Predefined complex sensor-based tasks can be described at a task-oriented level, which allows for their usage in a varying context. To describe robot activities at a high language level we introduce the elemental move concept. This concept allows us to define subtasks that can be executed by the robot system in an autonomous way. To program a complex robot task it is only necessary to define a sequence of elemental moves at an intuitive planning level. An analytical approach is proposed to define sensor-based elemental moves without an extensive controller design. Therefore, we call our method programming by demonstration, which represents an easy variant of teaching by showing.
The elemental move concept
In the past, many attempts have been started to describe robot assembly plans at a high language level. Command language approaches [5] or planning tools such as Petri nets [6] or assembly graphs [7] have been proposed. They work well in a structured, well-defined environment for a specific application field like block world assembly [8] or compliant motion tasks [9]. Our task-directed programming approach is driven by an elemental move concept that enables us to program robot tasks in an intuitive implicit manner. A complex robot task is composed of such elemental moves (EMs) that can be divided into three categories. First, pose-controlled EMs that are described by the goal pose (position and orientation) in the cartesian space; second, the sensor-controlled EMs that have to be defined via nominal sensory patterns, and, third, shared-controlled EMs. Each class of EMs has a template of pre- and postconditions that describe the prerequisites to activate or stop the execution of an EM instance. An EM instance can be regarded as a node of a state machine. The
transition between two nodes is defined by the matching of post- and preconditions of the corresponding EM nodes.

3.1 Pose-controlled elemental moves
Pose-controlled EMs are defined via the goal pose in cartesian space. Therefore, the precondition template for the pose-controlled EM class is empty; the postcondition is met if the tool frame of the robot has reached the desired pose within the predefined limits. The path generation takes into account the inverse kinematic problem, which is solved by an iterative approach. Currently, an on-line collision avoidance algorithm is implemented to find a collision-free path from the current start to the defined end pose [10].

3.2 Sensor-controlled elemental moves
Sensor-controlled EMs are defined via the nominal sensory pattern in the reference pose relative to the environment. Therefore, the precondition template for the sensor-controlled EM class includes the availability and correctness of the sensory data, which is necessary for applying the control algorithm that achieves the desired goal state. This class of EMs represents full autonomy, in the sense that the sensor-based control mechanism leads to a proper system behaviour. An EM is fully sensor-controlled if all six degrees of freedom (DOFs) in cartesian space are controlled by sensor values. With the aid of sensor-based path refinement it is possible to act in an environment under the constraints of the sensor measurements. This class of sensor-controlled EMs also gives us the possibility of handling similar tasks in analogous situations. For example, APPROACH movements, which align the robot gripper with the object to be picked up, can be defined as one sensor-controlled EM for a desired distance and/or vision sensory pattern. Similar objects at different poses can be gripped by the same EM.

3.3 Shared-controlled elemental moves
The shared-controlled EM class is a mixture of the previous two. For each instance of a shared-controlled EM the constraint space configuration has to be defined. The constraint space configuration specifies which degrees of freedom in cartesian 3D space are pose- or sensor-controlled. For instance, a contour-following task in the xy-plane can be described by the pose-controlled DOFs $z_{trans}$, $x_{rot}$, $y_{rot}$, $z_{rot}$ and the sensor-controlled DOFs $x_{trans}$ and $y_{trans}$. Based on Mason's C-frame concept [11], sensor-controlled subspaces are defined using nominal sensory patterns to control the selected DOFs. The free DOFs are controlled by the path planning algorithm or, in the case of teleoperation, by the human operator using a 6-DOF input device. The techniques used for projecting gross move commands into the pose- and sensor-controlled subspaces have been discussed in a number of previous papers, e.g. [12].
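To illustrate such a constraint space configuration (our sketch, not code from the paper; the DOF ordering and the diagonal selection matrix follow the C-frame idea but are assumptions), a gross command can be split into pose- and sensor-controlled subspaces as follows:

```python
import numpy as np

# DOF order assumed: [x_trans, y_trans, z_trans, x_rot, y_rot, z_rot]
# Contour following in the xy-plane: x_trans, y_trans are sensor-controlled.
S = np.diag([1, 1, 0, 0, 0, 0])   # selection matrix for the sensor subspace
I = np.eye(6)

def split_command(v_gross, v_sensor):
    """Project a gross 6-DOF command into the constraint space configuration.

    v_gross  : velocity from the path planner or the operator's input device
    v_sensor : velocity computed by the local sensory feedback law
    """
    return (I - S) @ v_gross + S @ v_sensor  # pose part + sensor part
```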
4 High-level task-directed programming

Task-directed robot programming has so far been discussed in the fields of generalized planning [13] and assembly plan generation as an AI paradigm [14], [15]. Especially for telerobotic applications, interactive planning facilities are of crucial importance. On integrating such planning tools into the framework of sensor-based task execution, only theoretical work has been done so far [16]. We believe that the realization of task-directed robot programming by the elemental move approach leads to an intuitive high-level programming concept. Complex robot tasks are described by an action graph of elemental moves; a minimal sketch of such a graph is given at the end of this section. The nodes of this graph are represented by the EM controllers. The transitions between the corresponding nodes are defined by matching the successor preconditions with the predecessor postconditions. A plausibility check has to be initiated to look for congruence between the end conditions of a subtask and the start conditions of the following one, so that the consistency of the subtask sequence is ensured within the entire robot task. The feasibility of an EM, i.e. the question whether its preconditions can be met, depends on the availability of the required sensor values as well as on the determination of the constraint space configuration. To implement a real task-directed programming tool we propose a hierarchical declaration of the EM framework. At the lowest level we define the pose-, sensor-, and shared-controlled EMs as so-called atomic elemental moves that cannot be divided into several control items. Each node in the action graph represents one instance of an atomic EM. A sequence of connected nodes, as a part of the action graph, can be joined into a so-called composed elemental move, which can be further used as a logical EM at a high task description level. We have to ensure that the uniqueness of the graph transitions is guaranteed by the alignment of the pre- and postconditions of successive EMs. Furthermore, a composed EM is put together from a non-empty sequence of atomic and/or composed EMs in a recursive way. One remark on the definition of the atomic EMs: in contrast to other approaches, e.g. [5] or [4], we do not have to predefine all the elemental operations which can be used to execute a task. We can interactively define and configure an atomic EM via programming by demonstration, which is the topic of the next section.
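The following Python sketch (purely illustrative; the class and field names are our own invention, not the paper's) shows how atomic EMs with pre- and postconditions could be chained into such an action graph:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ElementalMove:
    """An atomic EM: a controller plus activation/termination conditions."""
    name: str
    precondition: Callable[[dict], bool]    # e.g. "sensor data available"
    controller: Callable[[dict], None]      # pose-, sensor- or shared-controlled
    postcondition: Callable[[dict], bool]   # e.g. "goal pose reached"

@dataclass
class ComposedMove:
    """A composed EM is a non-empty sequence of atomic and/or composed EMs."""
    name: str
    children: List["ElementalMove | ComposedMove"] = field(default_factory=list)

    def execute(self, state: dict) -> None:
        for em in self.children:
            if isinstance(em, ComposedMove):
                em.execute(state)
                continue
            # Transition check: the predecessor's postcondition must match
            # the successor's precondition before activating the node.
            assert em.precondition(state), f"{em.name}: precondition failed"
            while not em.postcondition(state):
                em.controller(state)        # run the local feedback loop
```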
5 Programming by demonstration

Visual programming tools for the direct transfer of human manipulation skills to the robot have been developed, but they are restricted to location gathering using a marker-based vision system [17] or to trajectory generation from observed object features [18]. Compliance controller identification by human demonstration [19] is restricted to straight-line motion in well-defined constraint configurations. Methods proposed in recent years [20], [21], [22] for programming by demonstration utilize the real robot system to collect the required demonstration data. The same system is used for further execution, so this class of direct demonstration approaches can be regarded as simple playback methods.

5.1 Remote demonstration
We have extended this direct demonstration method by separating the demonstration and execution systems, which gives us the ability to define task-level robot actions off-line and to overcome the problems attendant upon time-delayed telerobotic applications. Before the execution of the robot task takes place in the real system, we demonstrate the task using sophisticated simulation tools, including the emulation of the interactions between the manipulator and the environment with respect to collision information from contact sensors as well as non-contact sensor devices. Currently we have implemented the functionality of laser distance sensors, a feature-based simulation of stereo cameras, and the behaviour of a stiff force-torque sensor in contact with the environment [23]. We emphasize that the use of simulated sensory behaviour is only feasible if the virtual system is properly calibrated. Especially in the emulation of computer vision, knowledge of the internal and external camera parameters is indispensable for finding the right mapping between the 3D world model and the 2D camera planes.

5.2 Definition of atomic elemental moves

It should be noted that the following concepts for teaching atomic elemental moves are applicable both to the simulation, with regard to graphical off-line programming or telerobotics, and to the real robot (including sensor equipment) without any simulation environment. Applying programming by demonstration to simple pose-controlled EMs is straightforward. The actual cartesian pose of the robot's tool frame is stored together with current manipulator data such as velocity or acceleration. By activating such a pose-controlled EM the robot moves from any pose to the desired one, taking into account the inverse kinematic problems as well as collision detection. To move the robot, a 6-DOF input device is used, which allows the operator to easily generate such target points within the robot's workspace. To define the various instances of the sensor- or shared-controlled EM class, we propose an intuitive programming by demonstration method. The key issue is to find an appropriate controller to perform the desired EM correctly.

5.3 Automatic controller generation

For the handling of the sensor-controlled EMs we outline a method to estimate the motion parameters from the known sensor values in order to achieve the predefined goal state of the robot's pose (cartesian position in translation and orientation, or joint angle position). In other words, we have to find a control sequence that transfers the robot's end-effector from a pose $x_0$ into the nominal pose $x^*$, which is described by its corresponding sensor values $y^*$; $x^*$ is assumed to be unknown or uncertain. The motion parameters relative to the sensed environment are expressed by the vector increment $\Delta x_k$, the actual sensor values by the vector $y_k$. The nominal sensor values $y^*$ are generally non-linear functions of the actual interaction between the robot, the sensor and the environment in the robot's nominal pose $x^*$:

\[ y^* = f(x^*) \tag{1} \]
In the execution phase we want to find a controller sequence that is able to reach the goal state as a function of the nominal sensor value vector for which the system was taught. Starting from a pose $x_0$, we calculate stepwise the motion parameters $\Delta x_k$ from the actual and nominal sensor values, i.e. we apply a linear controller of the form
\[ \Delta x_k = \alpha\, C\,(y_k - y^*) \tag{2} \]
where $C$ is a constant $N \times M$ matrix which maps the displacement in the $M$-dimensional sensor space to the $N$-dimensional control space, and where $\alpha$ is a scalar damping factor which determines the dynamic behaviour of the closed control loop. The optimal controller coefficients, in the sense of a least squares estimate, are expressed by the pseudoinverse

\[ C = (J^T J)^{-1} J^T \tag{3} \]
where the elements of $J$ are the partial derivatives of the sensor values $y$ with respect to the motion parameters $x$:

\[ J_{i,j} = \left.\frac{\partial y_i}{\partial x_j}\right|_{x^*} \tag{4} \]
The teach-in process is implemented in an efficient way, because the nominal sensor values $y^*$ can easily be derived from either the simulation or the real system. The controller design, i.e. the construction of $J$, is also implemented in a simple manner by experimentally estimating the Jacobian $J$, i.e. moving the simulated or real robot by increments and using the difference quotients

\[ J_{i,j} \approx \left.\frac{\Delta y_i}{\Delta x_j}\right|_{x^*} \tag{5} \]
The reason why we determine the Jacobian $J$ by experimentally estimating the partial derivatives of the sensor values with respect to the motion parameters is the robustness of the control approach in the local working space, i.e. as long as $y_k \approx y^*$ within a suitable bound. In doing so, we avoid the very high effort which would be necessary if we were to determine the partial derivatives analytically using a large non-linear model. Furthermore, the estimates determined using the real system are expected to be better than analytical ones, because they are independent of (inevitable) model abstractions. To obtain the mapping from the actual sensory pattern to the desired robot reference pose, a sensor calibration is not required, because the training process delivers the inverse relation between the robot's movements and the corresponding sensor values. In the case of programming by virtual demonstration within the graphical simulation environment, sufficiently correct world models with respect to sensor and object descriptions must be available for the generation of the controller matrix $C$. The method is plain and robust. The simplicity of the programming by demonstration paradigm is achieved by an easy way of configuring the sensor- or shared-controlled EM description. The reference sensor pattern, i.e. the goal state definition of the atomic EM, is directly shown by the current sensor value configuration. The controller for achieving this desired state is determined automatically by generating the Jacobian as described above. The human operator only has to specify the controller input variables, i.e. the sensor combination which should be used for the control task, and the controller output variables, i.e. the sensor-controlled DOFs, to activate the controller generation. This procedure can be viewed as a form of training by doing. Currently an intelligent sensor-planning system is being developed to determine the appropriate sensor usage, in combination with the determination of the sensor-controlled subspaces, in an autonomous way [24].
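The controller generation of equations (2)-(5) can be summarized in a short Python sketch (our illustration; the probe increments echo the 1 mm / 2 degree values used in the experiments of section 6 but are assumptions here, as is the damping factor alpha, whose magnitude and sign must be chosen so that the closed loop converges):

```python
import numpy as np

def estimate_jacobian(read_sensors, move_to, x_star, deltas):
    """Estimate J_ij = dy_i/dx_j at the reference pose by finite differences.

    read_sensors : returns the M sensor values at the current pose
    move_to      : moves the (simulated or real) robot to a given pose
    x_star       : reference pose (N motion parameters)
    deltas       : probe increments per DOF, e.g. 1 mm / 2 deg (assumed)
    """
    N = len(x_star)
    cols = []
    for j in range(N):
        dx = np.zeros(N); dx[j] = deltas[j]
        move_to(x_star + dx); y_plus = read_sensors()
        move_to(x_star - dx); y_minus = read_sensors()
        cols.append((y_plus - y_minus) / (2.0 * deltas[j]))
    move_to(x_star)
    return np.column_stack(cols)             # M x N Jacobian

def control_step(J, y_k, y_star, alpha):
    """One step of the linear controller, eq. (2)."""
    C = np.linalg.pinv(J)                    # C = (J^T J)^{-1} J^T, eq. (3)
    return alpha * C @ (y_k - y_star)        # alpha sets the loop damping
```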
5.4 Sensor fusion aspects

The proposed sensorimotion approach allows a simple integration of different sensor values. An efficient sensor fusion algorithm is inherently implemented by assigning all the used sensor values to the elements of the measurement vector $y$. To obtain better estimation results it is often helpful to use redundant and normalized sensor information. We have implemented this approach, among other cases, to combine stereo vision data (2 cameras) with laser range data (4 distance values) and have observed fast (exponential) as well as wide (full sensor range) convergence. In the following, the main advantages of the proposed controller generation process are briefly discussed. The controller matrix $C$ remains constant during the control process. The training of sensor-controlled EMs, i.e. setting the linear controller coefficients via the Jacobian, is necessary only once for a specific sensor/DOF combination in the desired reference state. The number of training steps for the determination of the controller is equal to the number of DOFs to be controlled. Any constraint space configuration can be selected to integrate the shared-control concept into the controller generation process. Sensor failures can easily be handled by introducing a binary-valued diagonal weight matrix $W$, extending the controller design to

\[ C = (J^T W J)^{-1} J^T W \qquad \text{with} \qquad W_{i,j} = \begin{cases} 0, & i \ne j \\ \in \{0,1\}, & i = j \end{cases} \tag{6} \]
The controller redesign only needs a failure detection; it does not require a new training process for the changed sensor configuration. If $J^T J$ is regular, then $J^T W J$ may be regular too, provided the original sensor configuration contains enough redundancy. The condition number of $J^T W J$ may be used to qualify different sensor configurations. To realize such a sensorimotion coordination and sensor fusion concept we have previously applied other methods, such as neural networks [25]. Our ongoing work compares both approaches with regard to applicability, robustness, and efficiency.
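A minimal sketch of this failure-tolerant redesign (ours, not the paper's code; the condition-number threshold is an assumed value) masks out failed sensor rows with the weight matrix of eq. (6) and checks the conditioning of the reduced problem:

```python
import numpy as np

def redesign_controller(J, sensor_ok, cond_limit=1e6):
    """Weighted controller redesign after sensor failures, cf. eq. (6).

    J         : M x N Jacobian estimated once at the reference state
    sensor_ok : boolean mask of length M (False marks a failed sensor)
    """
    W = np.diag(sensor_ok.astype(float))     # binary diagonal weight matrix
    JtWJ = J.T @ W @ J
    # Qualify the remaining sensor configuration by its condition number
    if np.linalg.cond(JtWJ) > cond_limit:
        raise ValueError("not enough sensor redundancy left")
    return np.linalg.solve(JtWJ, J.T @ W)    # C = (J^T W J)^{-1} J^T W
```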
6 Experiments

The TSP concept has been verified within ROTEX, an advanced space RObot Technology EXperiment flown in May '93 on the D2 spacelab mission [12]. The experimental environment of the ROTEX ground station is used and is currently being extended to implement our task-directed robot programming approach. A powerful man-machine interface is one of the key issues for performing programming by demonstration in an efficient way. A 6-DOF control ball is used to move the robot's tool center point in the 3D stereographics simulation. A user-friendly graphical interface based on X and OSF/Motif has been implemented to support the operator in the definition and composition of the various elemental moves. Three types of sensors are available in the real as well as the simulated gripper environment: an array of 8 laser distance sensors with a working range of 0-3 cm plus one with 0-30 cm, two 6-axis force-torque wrist sensors (a stiff strain-gauge based one and a more compliant optical one), and a tiny pair of cameras providing a stereo image viewed in the gripper's z-direction.
The image processing system used in our laboratory is able to extract features like contours or markers in real time (application dependent). In addition to this direct object feature extraction, more abstract measurements, such as the Fourier coefficients of a contour, can be delivered as an efficient tool for the determination of the pose, size, and shape of an object given by its contour line [26]. The corresponding simulation works analogously at a feature-based level with respect to the internal and external camera parameters. Especially for the acquisition of stereo vision data, this model-based method offers the advantage of avoiding the correspondence problem: corresponding object features in the different camera views can easily be detected by using the same geometric world model interface. The experimental environment consists of a powerful Silicon Graphics workstation with a multi-processor architecture to enhance the simulation and graphical visualization of the workcell environment. The graphical user interface as well as the controller task run as independent processes on the same machine. The inter-process communication occurs via shared memory protected by semaphores to guarantee mutual exclusion on access collisions. The exchange of an orbital replaceable unit (ORU) is a good example with which to explain the task-directed programming approach in more detail (see Figure 3). This manipulation task can be regarded as a sophisticated pick-and-place operation, where the pick includes a bayonet closure screwing action. The composed EM ExchangeOru is split into three more detailed EMs, FreeMotion, ApproachToOru and AttachOru, and the inverse EMs FreeMotionWithOru, ApproachWithOru and ReleaseOru. Let us focus on the EM ApproachToOru. This composed EM is further divided into three atomic EMs which differ in the choice of the sensors used and the constraint space configuration.
Figure 3 The composed EM ExchangeOru
First, only the vision sensor is available within its measurement range. ApproachByVision uses four corner points lying on the ORU contour as the reference pattern to achieve the first goal state. After the execution of the EM ApproachByVision, the distance sensors can be applied together with the visible corner points (see figures 4 and 5). The reference state of this EM ApproachByVisionAndDistance is reached if the distance sensor values are zero or non-zero force-torque sensor values appear. The last atomic EM ApproachByDistanceAndForce ends in a state where the attaching of the ORU is possible and the next EM AttachOru can be activated.
Figure 4 Configuration of the EM ApproachByVisionAndDistance
Figure 5 ApproachByVisionAndDistance, viewed out of the gripper camera

All three atomic EMs are completely sensor-controlled, but for teleoperation the constraint space configuration can easily be reconfigured to allow operator-supported shared control. The Jacobian for the EM ApproachByVisionAndDistance, for example, integrates the vision data of two visible corner points of the ORU contour and four distance sensor values. These sensor readings are represented by the sensor beams approximately perpendicular to the desired contact surface between the ORU and the gripper in the attach phase (see figures 4 and 5).
We experimentally estimate the Jacobian $J$ by moving the simulated or real robot by increments of 1 mm in each translational DOF and of 2 degrees in each rotational DOF, creating the difference quotients

\[ J_{i,j} \approx \left.\frac{\Delta y_i}{\Delta x_j}\right|_{x^*} = \frac{y_i(x^* + \Delta x_j) - y_i(x^* - \Delta x_j)}{2\,\Delta x_j} \tag{7} \]
The incremental values of 1 mm and 2 degrees depend on the geometrical dimensions of the workcell and sensor description. We have to consider that the incremental values $\Delta x_j$ must be large enough to avoid numerical error problems. The linear controller, automatically created and used by the EM ApproachByVisionAndDistance, generates a smooth behaviour of the tool center point movements in achieving the desired position. In the case of programming by demonstration and execution both on the real system, the experimental determination of the controller coefficients implicitly takes the sensor characteristics into account. This is a great advantage with respect to an analytical approach, where the particular sensor parameters must be explicitly known to obtain the correct mapping between the sensor measurements and the manipulator movements. In the case of programming in the simulated environment and execution in the real world, the alignment of the real and the virtual task execution has to be guaranteed. Due to the fact that the goal state of the sensor-controlled EMs is expressed in terms of sensor values, the real sensor data must match the sensor values which are simulated in the virtual environment. For instance, the distance sensors have to deliver the correct distance value between the sensor origin and the measured object surface, and the vision system should be able to provide features such as points, lines, or contours in the image plane without lens distortion. Therefore the congruence between the simulated and the real sensors must be guaranteed, using calibrated sensor models.
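As a hedged illustration of such sensor emulation (our sketch under simplifying assumptions; the paper's simulator is not reproduced here, and treating each face as an infinite plane is our simplification), a laser distance sensor can be emulated by intersecting the sensor ray with the planar faces of the world model:

```python
import numpy as np

def simulate_range_sensor(origin, direction, faces, max_range=0.3):
    """Emulate a laser distance sensor against a polygonal world model.

    origin, direction : sensor pose in world coordinates (direction normalized)
    faces             : list of (point_on_plane, unit_normal) tuples; for
                        simplicity each face is treated as an infinite plane
    max_range         : assumed working range of the sensor (here 0-30 cm)
    """
    best = max_range
    for p0, n in faces:
        denom = np.dot(n, direction)
        if abs(denom) < 1e-9:           # ray parallel to the face
            continue
        t = np.dot(n, p0 - origin) / denom
        if 0.0 <= t < best:             # nearest hit in front of the sensor
            best = t
    return best                          # distance reading, clipped at range
```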
7 Conclusion
We have introduced the TeleSensorProgramming concept as an easy and robust way to program a robot off-line with sensor-based control structures. The proposed task-directed programming paradigm is also applicable in the field of telerobotics with large time delays, where local autonomy is indispensable. Teaching by showing in the real as well as in the simulated world, including all the interactions between the robot and its environment, leads to a programming by demonstration approach using an elemental move concept with local sensor-based machine intelligence. The elemental moves are defined by a sensor data processing scheme which is experimentally determined via the Jacobian in the respective working area. The goal of our ongoing work is to integrate the virtual and real worlds by world model update facilities. As a first step we have to enhance the congruence between the virtual and real world using calibrated sensor models.
References
[1] G. Hirzinger, J. Heindl, K. Landzettel, and B. Brunner, "Multisensory shared autonomy - a key issue in the space robot technology experiment ROTEX", Proc. of the IEEE Conf. on Intelligent Robots and Systems (IROS), Raleigh, July 7-10, 1992.
[2] T. Sheridan, "Human supervisory control of robot systems", Proc. of the IEEE Conf. on Robotics and Automation, San Francisco, 1986.
[3] A. Bejczy and W. Kim, "Predictive displays and shared compliance control for time-delayed telemanipulation", IEEE Int. Workshop on Intelligent Robots and Systems, Ibaraki, 1990.
[4] P. Backes and K. Tso, "UMI: An Interactive Supervisory and Shared Control System for Telerobotics", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinnati, pp. 1096-1101, 1990.
[5] J. Funda and R. Paul, "Remote Control of a Robotic System by Teleprogramming", Proc. of the IEEE Int. Conf. on Robotics and Automation, Sacramento, 1991.
[6] B. McCarragher, "Task-Level Adaptation Using a Discrete Event Controller for Robotic Assembly", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993.
[7] A. Kapitanovsky and O. Maimon, "Conceptual Graph-based Synthesis of Robotic Assembly Operations", Proc. of the IEEE Conf. on Robotics and Automation, Nice, France, May 1992.
[8] G. Schrott, "An Experimental Environment for Task-Level Programming", Proc. of the Second Int. Symposium on Experimental Robotics (ISER), Toulouse, France, June 25-27, 1991.
[9] P. Van de Poel, W. Witvrouw, H. Bruyninckx, and J. de Schutter, "An Environment for Developing and Optimising Compliant Robot Motion Tasks", Proc. of the Int. Conf. on Advanced Robotics (ICAR), Tokyo, November 1993.
[10] E. Ralli and G. Hirzinger, "Fast Path Planning for Robot Manipulators using Numerical Potential Fields in the Configuration Space", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Munich, Sept. 12-16, 1994.
[11] M. Mason, "Compliance and force control for computer controlled manipulators", IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-11, No. 6, pp. 418-432, 1981.
[12] G. Hirzinger, B. Brunner, J. Dietrich, and J. Heindl, "ROTEX - The First Space Robot Technology Experiment", Proc. of the Sixth Int. Symposium on Robotics Research, Hidden Valley, PA, Oct. 2-5, 1993.
[13] W. Yared and T. Sheridan, "Plan Recognition and Generalization in Command Languages with Application to Telerobotics", IEEE Trans. on Systems, Man and Cybernetics, Vol. 21, No. 2, pp. 327-338, 1991.
[14] B. Frommherz and G. Werling, "Generating Robot Action Plans by means of a Heuristic Search", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinnati, pp. 884-889, 1990.
[15] Y. Huang and C. Lee, "An Automatic Assembly Planning System", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinnati, pp. 1594-1599, 1990.
[16] M. Gini, "Integrating Planning and Execution for sensor-based Robots", in: Ravani B. (ed.), CAD Based Programming for Sensory Robots, NATO ASI Series F50, Springer, Berlin, 1988.
[17] S. Tso and K. Liu, "Visual Programming for Capturing of Human Manipulation Skill", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993.
[18] A. Ude, "Trajectory generation from noisy positions of object features for teaching robot paths", Robotics and Autonomous Systems 11, pp. 113-127, 1993.
[19] N. Delson and H. West, "Robot Programming by Human Demonstration: Subtask Compliance Controller Identification", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993.
[20] G. Hirzinger and J. Heindl, "Sensor programming - a new way for teaching a robot paths and force/torques simultaneously", Proc. of the Third Int. Conf. on Robot Vision and Sensory Controls, 1983.
[21] H. Asada and S. Izumi, "Direct Teaching and Automatic Program Generation for the Hybrid Control of Robot Manipulators", Proc. of the IEEE Int. Conf. on Robotics and Automation, 1987.
[22] K. Kosuge, T. Fukuda, and H. Asada, "Acquisition of Human Skills for Robotic Systems", Proc. of the IEEE Symposium on Intelligent Control, Arlington, Virginia, August 1991.
[23] B. Brunner, G. Hirzinger, K. Landzettel, and J. Heindl, "Multisensory Shared Autonomy and Tele-Sensor-Programming - Key Issues in the Space Robot Technology Experiments ROTEX", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993.
[24] G. Grunwald and G. Hager, "Towards Task-Directed Planning of Cooperating Sensors", Proc. of the SPIE Conf. on Sensor Fusion V, Boston, Mass., Nov. 1992.
[25] G. Wei, G. Hirzinger, and B. Brunner, "Sensorimotion Coordination and Sensor Fusion by Neural Networks", Proc. of the IEEE Int. Conf. on Neural Networks, San Francisco, 1993.
[26] K. Arbter and H. Burkhardt, "A Fourier-method for feature extraction and parameter estimation for planar curves in the 3-D space (in German)", in: Informationstechnik it (33), Oldenbourg, Munich, 1991.
Telerobotics with large time delays - the ROTEX experience

G. Hirzinger and K. Landzettel (a) and Ch. Fagerer (b)

(a) DLR German Aerospace Research Establishment, D-82234 Wessling
(b) Univ. der Bundeswehr, D-85577 Neubiberg

The paper discusses delay-compensating techniques for operating a space robot from ground or from another remote spacecraft. These kinds of techniques have been a key element in the space robot technology experiment ROTEX, which flew successfully on the shuttle COLUMBIA at the end of April '93. During this "spacelab D2" mission, for the first time in the history of space flight, a small multisensory robot (i.e. one provided with modest local intelligence) performed prototype tasks on board a spacecraft in different operational modes, namely preprogrammed (and reprogrammed from ground), remotely controlled (teleoperated) by the astronauts, and also remotely controlled from ground via the human operator as well as via machine intelligence. In these operational modes the robot successfully closed and opened connector plugs (bayonet closure), assembled structures from single parts and captured a free-floating object. This paper focusses on the powerful delay-compensating 3D-graphics simulation (predictive simulation) concepts that were realized in the telerobotic ground station and which allowed delays of up to 7 sec to be compensated, e.g. when grasping the floating object fully automatically from ground.

1. INTRODUCTION

Several activities towards flying space robots before the year 2000 have started in a number of countries, one of the largest projects being the space station's mobile servicing center (MSC) with its two-arm special dexterous manipulator system to be built by the Canadian space agency. Other remarkable mid-term projects are the Japanese space station "remote manipulator system" (JEM-RMS) and the Japanese ETS-VII project, an experimental flight telerobotic servicer for maintenance and repair of space systems. The European Space Agency ESA is preparing the use of robots in the European part, COLUMBUS, of the space station. We have a strong belief that for complex, partly autonomous robots with extensive ground control capabilities it is risky to leap from zero experience to a fully operational system; this is the reason why we flew the space robot technology experiment ROTEX (spacelab mission D2 on shuttle flight STS 55 from April 26 to May 6, '93). ROTEX was the first real robot in space; it contained as much sensor-based on-board autonomy as possible given the present state of technology. There is a general observation which is crucial for the round-trip signal delays in the case of on-line teleoperation. Robots on a geostationary "repair and maintenance" satellite would
be permanently visible directly from one and the same ground station. Thus round-trip delays might be kept down to a fraction of a second (the pure time of flight being around 0.25 sec). But for robots in low orbits (typically 300 to 500 km, as with the shuttle) the alternative of "direct" signal contact exists only when the spacecraft has a direct line of sight to the ground station; these periods, however, are typically 10 to 20 minutes every 2 hours. When permanent control of the robot is the goal, relay satellites have to be used that may guarantee "full-time coverage". However, in this case the delays may easily sum up to a few seconds. When ROTEX became a real project around 1988, the first statements we got were that the round-trip delays would be around 15 seconds with a jitter of around 5 seconds, as the shuttle board and ground structures were not at all prepared for this kind of request. Fortunately, NASA later opened up the so-called "text and graphics channel" via the TDRS satellite, which helped essentially to reduce the delays down to 6 ± 1 sec with a jitter of a few hundred milliseconds. How we compensated these delays is the major issue of this paper.
2. A BRIEF OUTLINE OF THE ROTEX EXPERIMENT
The experiment's main features were as follows (fig. 1 and 2):

- A small, six-axis robot (working space ~ 1 m) flew inside a spacelab rack. Its gripper was provided with a number of sensors, especially two 6-axis force-torque wrist sensors, tactile arrays, grasping force control, an array of 9 laser range finders and a tiny pair of stereo cameras to provide a stereo image out of the gripper; in addition, a fixed pair of cameras provided a stereo image of the robot's working area. (For more details see e.g. [11].)
- In order to demonstrate prototype servicing capabilities, three basic tasks were performed:
  a) assembling a mechanical truss structure from three identical cube-link parts
  b) connecting/disconnecting an electrical plug (orbit-replaceable-unit (ORU) exchange using a "bayonet closure")
  c) grasping a floating object
- All operational modes which we think are important for the future were verified:
  - teleoperation on board
  - teleoperation from ground, via human operators as well as via machine intelligence
  - sensor-based off-line programming on ground
Figure 1a. ROTEX control structures

Figure 1b. Integration of ROTEX in spacelab (courtesy of DORNIER)
Figure 2. ROTEX robot and experiment set-up in DLR laboratory, where the multisensory gripper is below the three-part truss structure just in front of the bayonet closure (the "ORU")
Figure 3. Presimulation of sensory perception and path refinement in case of teleoperation from ground a) local on-board sensory feedback (e.g. tactile contact) b) sensory feedback via groundstation (grasping a free-flyer)
3. STATIC ENVIRONMENT: THE TELE-SENSOR-PROGRAMMING APPROACH
For all operations related to a static environment, the (delay-compensating) telerobotic control in ROTEX was based in a unified way on the tele-sensor-programming approach; it comprises on-line teleoperation on board and on ground as well as sensor-based off-line programming on ground with subsequent on-board execution (following a kind of "learning by showing" philosophy). Basically this approach has two main features:

- a shared control concept (see e.g. [10]) based on local sensory feedback at the robot's site (i.e. on board or in a predictive ground simulation), by which gross commands were refined autonomously, providing the robot with a modest kind of sensory intelligence (fig. 3a). Due to processor limitations, however, on-board sensory feedback in ROTEX was restricted to force-torque and range finder signals (see below). Active compliance as well as hybrid (pseudo-) force control using nominal sensory patterns in the sensor-controlled subspace, based on Mason's C-frame concept [2], was realized locally. Gross commands in this framework may originate from a human operator handling the control or sensor ball (a 6-DOF non-force-reflecting hand controller) or, alternatively, from an automatic path planner. The techniques used for projecting gross commands into the position- and sensor-controlled subspaces have been discussed
in a number of previous papers (e.g. [10]). Feedback to the human operator - in the case of on-line teleoperation - was provided only via the visual system, e.g. for the astronaut via stereo TV images, for the ground operator mainly via predictive stereo graphics (the stereo TV images being an add-on for verification). This allowed us to realize a unified control structure despite the fairly large round-trip signal delays of up to 7 seconds. Indeed, predictive 3D stereographic simulation was a key issue for the big success of this space robot experiment and, in our opinion, is the only efficient way to cope with large signal delays. Of course, for these ideas to work, the same control structures and path planners were realized on board as well as in the predictive graphics ground station. And this in turn meant that not only the robot's free motion but also its sensory perception and feedback behaviour had to be realized in the "virtual environment" on ground.

- an elemental move concept, i.e. any complex task, like the dismounting and remounting of the bayonet closure, is assumed to be composed of elemental moves, for which a certain constraint-frame and sensor-type configuration and certain nominal sensory patterns hold, so that automatic sensor-based path refinement is clearly defined during these motion primitives. For example, the bayonet closure screwing operation, but also the approach phase before grasping the closure, were typical sensor-based elemental moves; the screwing operation is a nice example of shared control as well, since the hand rotation was position- (or better, orientation-) controlled using gross command projections, while the motion along the rotation axis was locally force controlled, so that forces arising from the screwing operation were immediately compensated. For more details see e.g. [11] or [12].

All sensory systems of ROTEX worked perfectly, and the deviations between presimulated and real sensory data were minimal (fig. 5). This was one of the many positive surprises of ROTEX. With the help of this realistic graphics presimulation we were able to operate the robot via the graphics just as if the delays were not present. During the ROTEX mission we did not overlay real and simulated views, e.g. out of the hand cameras; instead, the real end-effector's position was indicated by the hand frame and the real gripper's position by two patches in the graphics scene. In addition, the graphics system permanently displayed real and simulated sensory data in the form of overlaid bars (fig. 4) or dots (in the case of the tactile arrays, see the lower left part of fig. 4), while an additional SGI system displayed the time history of simulated and real sensory signals shifted by the actual delays, and thus correlated in time (fig. 5).
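To illustrate the delay-shifted comparison of simulated and real sensor traces (a sketch of ours; the buffer layout and the way the delay is supplied are assumptions, not the ROTEX ground station code), the predictive stream can be shifted by the measured round-trip delay so that it lines up with the real downlink data:

```python
import numpy as np

def correlate_streams(simulated, real, round_trip_delay, dt):
    """Time-align a predictive sensor stream with delayed real telemetry.

    simulated        : array of presimulated sensor values (one per cycle)
    real             : array of real sensor values received from the robot
    round_trip_delay : measured up- plus downlink delay in seconds
    dt               : sampling interval of the telemetry stream
    Returns aligned (simulated, real) arrays suitable for the kind of
    overlaid sensor-trace display described above.
    """
    shift = int(round(round_trip_delay / dt))   # delay in samples
    n = min(len(simulated) - shift, len(real))
    if n <= 0:
        raise ValueError("streams too short for the given delay")
    # The simulation leads the real data by the round-trip delay:
    # simulated[k] corresponds to real[k + shift].
    return simulated[:n], real[shift:shift + n]
```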
4. DYNAMIC ENVIRONMENT: CATCHING THE FLOATING OBJECT
We feel that the local closed-loop concepts described in the preceding chapter are the only reasonable way to master contact operations in particular, where errors of e.g. 1 mm might cause excessive forces. In a non-contact situation, however, e.g. when trying to grasp a free-floating object, such errors might be admissible. Indeed, no space-qualifiable real-time image processing hardware was available for ROTEX; so, in the case of grasping the free-floating object from ground, the only chance was to try to close the loop merely via the ground station, without local feedback (fig. 3b), thus leaving the basic ROTEX concept here.
Figure 4. Sensor simulation: range-finder simulation in the "virtual" workcell environment. In addition to the 5 simulated rays out of the gripper (fig. 2), the bars in the lower right corner indicate the same simulated (bright) and the corresponding real (dark) range values as registered by the real robot.
Figure 5. The correlation between presimulated (delayed for comparison) and real sensory data (each in closed loop) was nearly perfect in ROTEX. These recordings of the four finger range finders pointing "downwards" were made during sensor-based teleoperation when removing the bayonet closure from the ORU.
Figure 6. Block structure of the predictive estimation scheme

Fortunately a free-flyer provides a very precise dynamic model, so with the (mono- or stereo-video) images available on ground, optimal observer technology (Kalman filter estimation) is realizable and has successfully been used in ROTEX. Nevertheless, the concepts of shared control (e.g. reserving several or all degrees of freedom for sensor-based automatic control) as presented in the preceding chapter remain valid here, too. Fig. 6 shows the block structure of such an estimation scheme. In ROTEX the (delayed) hand-camera information on the free-flyer's pose (relative to the gripper) was provided on ground using two alternative schemes: one was based on the "dynamic vision approach" of [8], using only one of the two tiny hand cameras, while the other was a full stereo approach realized in a multi-transputer system. In both cases the thus "measured" object poses were compared with estimates issued by an observer and predictor scheme that simulates the up- and down-link delays as well as the robot and free-flyer models (fig. 6); this scheme predicts (and graphically displays) the situation that will occur in the spacecraft after the up-link delay has elapsed, and thus allows the "grasp loop" to be closed either purely operator-controlled, or via shared control, or purely autonomously (i.e. solving an automatic rendezvous and docking problem). In [4] the observer and predictor equations have been derived in detail for a fully linearized dynamic world model

$$x(k+1) = A\,x(k) + B\,u(k) \qquad (1)$$
$$y(k) = C\,x(k) \qquad (2)$$
where k denotes the k-th sampling interval, A is the system dynamics matrix (containing robot, moving parts and disturbances), B the input matrix (vector) acting on the control commands, C the measurement matrix (vector), which indicates e.g. whether a moving object is observed out of the robot's hand, x is the state vector, u the control input to the robot (e.g. cartesian motion increments), and y contains the measured variables. In ROTEX the measured variables, as registered or computed in the ground station, were the relative pose of the free-flyer (using real-time vision as mentioned above) and the robot's joint angles. In a lengthy derivation it has been shown for the linear case in [4] that if $n_u$ is the uplink delay (in number of sampling intervals) and $n_d$ the downlink delay, then fig. 7 shows the optimal delay-compensating scheme, which has a very simple structure. For the real ROTEX implementation two essential changes had to be made:
1. The dynamic world model (e.g. due to the free-flyer's dynamics) was treated as a truly nonlinear system, so extended Kalman filter schemes were applied instead of linear observers.
2. If we are only interested in perfect prediction in order to issue optimal control commands, it is of minor interest what the present dynamic state of the on-board system really is; in other words, we may assume that there is only an uplink delay or only a downlink delay (which then has to summarize both types of delay, of course).
Thus we finally arrived at the structure of fig. 8, to be explained in more detail now. Basically we have to split equations (1) and (2) into a robot (or camera) part (index R) and a free-flyer part (index F). Originally we intended to describe the robot dynamics (and thus the camera position dynamics) by simplified second-order systems in every degree of freedom, i.e.
$$x_R(k+1) = A_R\,x_R(k) + B\,u(k) \qquad (3)$$
but it turned out that the robot dynamics were negligible compared to the very long delays, so the final representation of the robot's motion was described by the frame transitions (caused by the control input u)

$$L_{R,k+1} = \Delta L_k(u)\,L_{R,k} \qquad (4)$$
where the frame $L_R$ is composed of the orientation matrix R and the tool-center position p (or, more precisely, the camera position), i.e.

$$L_{R,k} = \left[\, R_k \;\; p_{R,k} \,\right] \qquad (5)$$
The free-flyer's motion, in contrast, has no control input (as long as it does not hit the environment); assuming rigid-body motion, it may be decomposed into a constant linear velocity of the center of mass and a constant rotational velocity around it. To avoid singularities we used quaternions instead of e.g. Euler angles. The nonlinear state equation is written as
Figure 7. Detailed predictive estimation scheme for the graphics simulation of a teleoperated robot that has sensory feedback on board and a large time delay in the transmission links.
Figure 8. The ROTEX delay compensating scheme when grasping the free-flyer from ground.
$$x_F(k) = A_F(k-1)\,x_F(k-1) \qquad (6)$$
where $A_F$ contains the nonlinear, state-dependent parameters from the quaternion equations and $x_F$ consists of position, quaternions and velocities. The state estimations for robot and free-flyer are different, too; nevertheless, in both cases we used extended Kalman filters. In the case of the robot, the on-board measured joint angles (after transmission to ground) were compared with the simulated ones (transformed into cartesian values of the camera pose), taking into account that the on-board robot might not precisely execute the control commands:
$$\hat{x}_R(k) = x^*_R(k) + K_R(k)\left(y_R(k) - y^*_R(k)\right) \qquad (7)$$
$$y^*_R(k) = C_R\,x^*_R(k) \qquad (8)$$

where $x^*_R(k) = f(L^*_k) = f(\Delta L_{k-1}\,L_{k-1})$. Here $\hat{\ }$ denotes the updated estimates, $*$ denotes the predicted (a priori) estimates, and $K_R$ is the corresponding Kalman gain matrix. The state estimation of the free-flyer was much more difficult, partly due to the difficult real-time "relative pose" determination (i.e. the measurement) from the mono or stereo images. Using the estimates of the hand-camera position from equations (7) and (8), the free-flyer's motion was estimated in the base coordinate system, too. The Kalman filter equations here are
$$\hat{x}_F(k) = x^*_F(k) + K_F(k)\left(y_F(k) - y^*_F(k)\right) \qquad (9)$$
$$y^*_F(k) = C_F\,x^*_F(k) \qquad (10)$$
$$x^*_F(k) = A_F(k-1)\,\hat{x}_F(k-1) \qquad (11)$$
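A minimal sketch of the prediction side of this scheme is given below: the free-flyer state (position, unit quaternion, constant linear and angular velocity) is propagated by the nonlinear model $A_F$ and forward-predicted over the lumped round-trip delay, as in equations (6) and (11). The state layout and all names are our own illustrative choices, not the flight code.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product a*b of two quaternions, scalar part first."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def propagate_free_flyer(x, dt):
    """One step x*(k) = A_F(k-1) x(k-1): rigid body without contact, so the
    state x = [p(3), q(4), v(3), w(3)] drifts with constant linear velocity
    v and rotates with constant angular rate w."""
    p, q, v, w = x[:3], x[3:7], x[7:10], x[10:13]
    p = p + v * dt
    angle = np.linalg.norm(w) * dt
    if angle > 1e-12:
        axis = w / np.linalg.norm(w)
        dq = np.hstack(([np.cos(angle / 2)], np.sin(angle / 2) * axis))
    else:
        dq = np.array([1.0, 0.0, 0.0, 0.0])
    q = quat_mul(dq, q)
    q = q / np.linalg.norm(q)              # keep the quaternion normalized
    return np.hstack((p, q, v, w))

def predict_over_delay(x_hat, n_delay, dt):
    """Forward-predict the latest filter estimate over the lumped round-trip
    delay (n_delay sampling intervals); this predicted state is what the
    graphics displays to the operator."""
    x = x_hat.copy()
    for _ in range(n_delay):
        x = propagate_free_flyer(x, dt)
    return x
```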
Note that the measurement matrix $C_F(k)$ as well as the system dynamics matrix $A_F(k-1)$ are not constant but depend on $x^*_F$. We tuned the Kalman filter gains in such a way that, after initializing the estimation process (with noiseless measurement data), the estimation errors were always less than 5 percent within less than 2 seconds. The rough outline of the estimation scheme given above is simplified in many respects compared to the final implementation. To mention just one example, the video data by which we measured the free-flyer's relative pose had typical down-link delays of 2 s, while the joint-angle data transmitted via another channel had a minimum delay of around 3 s. These properties were fixed by the shuttle control structure and thus not changeable. For grasping the free-flyer during the flight we allowed shared control in its two extremes only: either fully manually controlled, looking at the predictive stereo graphics and operating a control ball or "space mouse", or fully automatic, based on a tracking algorithm not discussed here. By pressing just one key these modes were switchable
in a fraction of a second. This led to an exciting situation during the mission. We first teleoperated the robot to draw the free-flyer out of its fixture, released it and, after letting it drift away, switched to the automatic tracking mode. In the graphics scenery the robot indeed grasped the object (a cube with all corners flattened), but the human operators were afraid the grasp might not be central enough and overrode the "close gripper" decision. Thus in reality (to be observed seconds later) the robot only snapped at the object and, by opening the gripper immediately, gave it quite a kick, so that it drifted through the workcell like a billiard ball. After a few minutes it came back into the robot's area of reachability; this time we decided to leave the automatic system alone, and indeed the robot performed a perfect, fully automatic, ground-controlled grasp (ground teleoperation by "machine intelligence") despite a round-trip delay of about 6 seconds. Fig. 9 shows photos of the TV scene out of one of the hand cameras immediately before the successful, fully automatic grasp from ground, following the image processing approach of [8]. This automatic, ground-controlled capture of the free-flyer was one of the many spectacular actions of ROTEX.

5. ROTEX DATA TRANSFER IN CASE OF ON-LINE GROUND OPERATION

At the ROTEX telerobotic ground station in the German Space Operation Control Center (GSOC), a set of motion commands, including information for gripper and program control, was generated every 50 ms. These commands were routed to the predictive graphics simulation. Simultaneously, the data were written into a data buffer with a depth of 10. At the same time the block number of the data set was stored in a list, together with the current system time. When the data buffer was filled with 10 data blocks, a DECNET transmit operation was initiated and the contents of the data buffer were transferred to GSOC. Fig. 10 shows the data flow from GSOC to the Shuttle and vice versa. Numerous computers within this loop receive data, check the contents of the data block and transmit the information to the next computer or satellite link. The data processing, together with the pure signal travel time on the satellite links, results in a total uplink time of about 2.5 seconds. The ROTEX board computer reads the data blocks from the HRFL (high-rate forward link) interface and releases the 10 minor frames to the on-board software modules, one frame per 50 ms. At the same time the released minor frames (including the unchanged block numbers) are fed into the ROTEX downlink buffer. Again, ten data blocks (minor frames) are combined into one major frame to minimize the input/output overhead at the computer system. The major frames are sent back to GSOC and from there to the ROTEX telerobotic ground station. A DECNET receive program checks the major frames and, if there was no transmission error, every 50 ms the contents of one minor frame are sent to the ground-station software modules. In case of a data transfer error the disturbed block is discarded. The block number of each minor frame points to the block generation time stored in the list. The total round-trip time is calculated as the difference between the block generation time and the current system time. GSOC is responsible for distributing the received data to the ground segments of all the active payload elements. Because the data processing load at the GSOC distribution computer is not constant, the data are not transmitted within equidistant time intervals; there is some jitter on the data blocks.
Figure 9. Two subsequent TV images out of one of the hand cameras shortly before grasping the free-flyer automatically from ground. The dark areas at the lower left and right are the gripper jaws.
Figure 10. ROTEX data paths from GSOC to the shuttle COLUMBIA

Additional buffering was introduced to remove the short-term jitter, which originally was 0.3 s. Thus, with an uplink delay of around 3 s, we ended up with an average delay of about 6.5 seconds, with long-term variations (typical time constants of 10 minutes) between 5 and 7 seconds. Of course, the estimation schemes for the free-flyer's ground capture had to be permanently adapted to these delay changes.
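The block-number bookkeeping used to measure the total round-trip time, as described above, can be sketched as follows; the class and method names are illustrative, not the original DECNET ground-station software.

```python
import time

class RoundTripTimer:
    """Every outgoing command block is stamped with its generation time
    under its block number; when the echoed minor frame returns from the
    shuttle, the block number recovers that timestamp."""

    def __init__(self):
        self.sent = {}                      # block number -> generation time

    def on_send(self, block_no):
        self.sent[block_no] = time.monotonic()

    def on_receive(self, block_no):
        t0 = self.sent.pop(block_no, None)
        if t0 is None:
            return None                     # disturbed block was discarded
        return time.monotonic() - t0        # total round-trip time, seconds
```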
6. CONCLUSION
For the first time in the history of space flight, a small multisensory robot (i.e. one provided with modest local intelligence) has performed a number of prototype tasks on board a spacecraft, in the most diverse operational modes feasible today. Key technologies for the success of ROTEX have been
- local (shared-autonomy) sensory feedback control concepts, refining gross commands autonomously by intelligent sensory processing;
- powerful delay-compensating 3D stereo-graphic simulation (predictive simulation), which included the robot's sensory behaviour.
The experiment also clearly showed that the information and control structures in mission control centres for future space robot applications should be improved, allowing the robot operator on ground more direct access to the different types of uplinks and providing him with a continuous TV transmission link. Close cooperation between man (astronaut or ground operator) and machine, comprising different levels of robot autonomy, was the basis of the success of ROTEX. It was clearly proven that a robot system configured in this flexible way, with arbitrary and fast switching between the most diverse operational modes, will be a powerful tool in assisting man in future space flight projects, and it was impressively shown that even large delays can be compensated by appropriate estimation and pre-simulation concepts. ROTEX had to make optimal use of the given data transmission structures, which were not at all prepared for robot teleoperation. Operational space robots of the future should have much better conditions and thus really may become a powerful extension of the human arm into space.
REFERENCES
1. S. Lee, G. Bekey, A.K. Bejczy, "Computer control of space-borne teleoperators with sensory feedback", Proceedings IEEE Conference on Robotics and Automation, pp. 205-214, St. Louis, Missouri, 25-28 March 1985.
2. M.T. Mason, "Compliance and force control for computer controlled manipulators", IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-11, No. 6 (1981), pp. 418-432.
3. T.B. Sheridan, "Human supervisory control of robot systems", Proceedings IEEE International Conference on Robotics and Automation, San Francisco, April 7-10, 1986.
4. G. Hirzinger, J. Heindl, K. Landzettel, "Predictive and knowledge-based telerobotic control concepts", IEEE Conference on Robotics and Automation, Scottsdale, Arizona, May 14-19, 1989.
5. S. Hayati, S.T. Venkataraman, "Design and implementation of a robot control system with traded and shared control capability", Proceedings IEEE Conference on Robotics and Automation, Scottsdale, 1989.
6. L. Conway, R. Volz, M. Walker, "Tele-autonomous systems: methods and architectures for intermingling autonomous and telerobotic technology", Proceedings IEEE Conference on Robotics and Automation, Raleigh, 1987.
7. J. Funda, R.P. Paul, "Efficient control of a robotic system for time-delayed environments", Proceedings of the Fifth International Conference on Robotics and Automation, pp. 133-137, 1989.
8. D. Dickmanns, "4D-dynamic scene analysis with integral spatio-temporal models", Fourth Int. Symposium on Robotics Research, Santa Cruz, Aug. 1987.
9. C. Fagerer, G. Hirzinger, "Predictive telerobotic concept for grasping a floating object", Proc. IFAC Workshop on Spacecraft Automation and On-Board Autonomous Mission Control, Darmstadt, Sept. 1992.
10. B. Brunner, G. Hirzinger, K. Landzettel, J. Heindl, "Multisensory shared autonomy and tele-sensor-programming - key issues in the space robot technology experiment ROTEX", IROS '93, International Conference on Intelligent Robots and Systems, Yokohama, Japan, July 26-30, 1993.
11. G. Hirzinger, "ROTEX - the first robot in space", ICAR '93, Sixth International Conference on Advanced Robotics, Tokyo, 1993.
12. G. Hirzinger, B. Brunner, J. Dietrich, J. Heindl, "ROTEX - an advanced space robot technology experiment", ISRR '93, Sixth International Symposium on Robotics Research, Hidden Valley, PA, Oct. 2-5, 1993.
Feature-Based Visual Servoing and its Application to Telerobotics

Gregory D. Hager,a Gerhard Grunwald,b and Kentaro Toyamaa

aDepartment of Computer Science, P.O. Box 208285 Yale Station, Yale University, New Haven, CT 06520-8285, U.S.A.

bGerman Aerospace Research Establishment (DLR), Institute of Robotics and System Dynamics, Münchenerstr. 20, D-82234 Oberpfaffenhofen, Germany
Recent advances in visual servoing theory and practice now make it possible to position a robot manipulator accurately and robustly relative to a target. Both the vision and control algorithms are extremely simple, but they must be initialized on task-relevant features in order to be applied. Consequently, they are particularly well-suited to telerobotics systems where an operator can initialize the system but round-trip delay prohibits direct operator feedback during motion. This paper describes the basic theory behind feature-based visual servoing and discusses the issues involved in integrating visual servoing into the ROTEX space teleoperation system.
1. Introduction

Automation and robotics are rapidly becoming extremely attractive areas within space technology. They hold the promise of assembly, servicing, and repair with a minimal number of expensive manned missions [7]. However, unlike a factory environment, space operations require the ability to work in an environment which is unstructured. For this reason, some amount of autonomous behavior is necessary to perform complex, diverse tasks. Such autonomy will rely heavily on vision, depth, force, torque, and tactile sensors, as well as advanced planning and decision capabilities. Unfortunately, our current lack of understanding of sensor data interpretation, robot motion control, and artificial intelligence makes the prospect of complete autonomy for complex tasks unlikely in the near future [2]. One way out of this dilemma is sensor-based telerobotics. In particular, space operation tasks for which telerobotics could play an increasingly large role include inspection, assembly, servicing, and repair [10]. When teleoperating in space, relay satellites and computer networks insert a large delay into round-trip communication. Hence sensor data such as video images cannot be used as direct, real-time feedback by a remote operator. One approach to overcoming this problem is to use a predictive computer graphics system, as was successfully demonstrated in the German space robot experiment, ROTEX. In ROTEX, the operator handles a control device (a six degree-of-freedom control ball) based on a predictive graphics model of the robot and its environment. The control commands are sent to both the local simulation
and the remote robot. If the simulation is properly calibrated to the real environment, the behavior of the remote system will mimic that of the simulation exactly. Clearly, the crucial problem in this system is to provide an extremely accurate simulation. Teleoperation based on the predictive graphics will fail if the world model does not correspond with reality. This may happen if objects are deformed or damaged (for example, the solar panel of the Hubble Space Telescope), or if an object is freely floating in space. Another disadvantage of relying on a known world model is the need for a precise calibration of the entire operational space. This requirement is difficult to meet because the mechanics are put under extreme stress by liftoff and by the subsequent temperature differences between parts facing the sun and those which are not. One way of lessening the dependence on prior geometric knowledge is to make remote operations sensor-based. Experience has shown that proper use of closed-loop control algorithms operating in orbit can significantly enhance the capabilities of the teleoperation system [6]. To date, there has been little progress in the use of visual feedback to support teleoperation. One reason is that most classical work on visual servoing relies heavily on a calibrated hand-eye system. As noted above, maintaining accurate calibration in orbit can be difficult. A second reason is that most vision algorithms are too complex to execute on hardware that is space-qualified. In [5], an approach to feature-based visual servoing using closed-loop control was developed. The main advantages of the approach are that it is extremely robust to calibration error and that it uses simple visual features that are computationally simple to track. It can be shown that the accuracy of these visual servoing algorithms is, in fact, independent of calibration error. These properties make the technique well-suited to the teleoperation problem, since the operator can choose features and servoing operations using a single static image, and the system can then, without further intervention, autonomously execute the operation with high accuracy and reliability. In this paper we discuss a family of visual servoing techniques, in which simple positioning primitives can be easily combined to perform full six degree-of-freedom relative positioning. We then describe some of the issues involved in integrating visual servoing into a teleoperation environment, discuss a preliminary design for such a system, and illustrate its use on an example problem. The remainder of this paper is organized as follows. The next section describes the visual servoing problem and our solution to it. In Section 3, we describe the integration of visual tracking and the telerobotic system together with the operator interface. In Section 4, we describe our current progress at implementing this system.
2. Feature-Based Visual Tracking and Servoing

Vision is an extremely rich and precise mode of sensing. In many ways, it is an ideal candidate for sensor-based motion and manipulation. However, in order to combine vision and robotics, two major problems must be addressed. First, vision problems that involve scene interpretation tend to be extremely difficult, and most of the vision algorithms used to solve them require specialized hardware in order to operate in real time. Second, in order to relate visual information to a robot manipulator, the spatial relationship between the manipulator and the camera(s) must be known.
Figure 1. A visual servoing system performing positioning by reducing visual disparity to zero
Determining and maintaining this relationship, the hand-eye calibration, with high accuracy is very difficult. Here, we describe an approach to visual servoing that eliminates these difficulties. Instead of relying on accurate visual reconstruction, we base motion directly on visual feedback. In humans, this process is called hand-eye coordination. We are attempting to develop similar hand-eye skills for robotic manipulation systems. We hypothesize that a sufficiently large and rich vocabulary of hand-eye skills will dramatically increase the dexterity of robotic systems. For the purposes of this article, hand-eye skills are essentially vision-based feedback control algorithms. These methods use locally defined visual features, primarily image contours, that can be tracked in real time by standard workstations or PCs. Corresponding features in two cameras comprise the input to a feedback control system. The feedback control system will provide accurate positioning despite calibration errors, provided the control gains are chosen to ensure stability.

2.1. Visual Feedback Control

We begin by describing two systems for visual feedback control: one that controls only position and one that controls both position and orientation. Figure 1 illustrates the underlying principle. We consider a visual servoing system which includes two video cameras, a robot arm, and computers that perform image processing, low-level control operations, and other interface-related functions. Any attempt to accurately calibrate this system is limited by system modeling errors, mechanical backlash, control inaccuracy, and so forth. Consequently, a system that reconstructs absolute point coordinates using vision and positions the robot based on that information will be extremely imprecise. However, as illustrated in Figures 1(a) and 1(b), the robot manipulator can be positioned extremely accurately relative to the observed target. In (a), the cameras observe the visual disparities $\theta_1$ and $\theta_2$ between a point on the manipulator and the corner of the box. We know the following property: zero disparity between the manipulator and the goal in both images
means that they are at the same point in space. This property is true independently of where the cameras are located relative to the robot (more precisely, this statement is true except in configurations where the goal or robot and the cameras are collinear). Thus, if we have a stable controller that achieves zero visual disparity, we can position accurately, even if the system is badly calibrated. The theory underlying a feedback control algorithm for this problem can be summarized as follows. Suppose that $y = f(x)$ is a mapping from a robot configuration space to an output sensor space, both of dimension n. Given a desired setpoint $y^*$, define $e_y = E(y, y^*)$, where $e_y$ approaches zero as y approaches $y^*$. Taking time derivatives,
$$\dot{e}_y = J_E\,\dot{x}, \qquad (1)$$

where

$$J_E = \frac{\partial E}{\partial x}. \qquad (2)$$

If $E(\cdot,\cdot)$ is nonlinear, $J_E$ is a function of the system state, $x$. Thus, if the system state is not directly available, it must be estimated from sensor data. We take $u = \dot{x}$ to be the control input to the system and, presuming $J_E$ is full rank, compute

$$u = -k\,J_E^{-1}\,e_y \qquad (3)$$

where $k$ is a constant set by the designer. If the Jacobian is nonsquare, the control vector is computed by

$$u = -k\,J_E^T\left(J_E J_E^T\right)^{-1} e_y. \qquad (4)$$
It is well known that a proper choice of k in discrete-time approximations of systems of this form leads to an error, e, that asymptotically approaches zero. Using these concepts, we can construct visual servoing systems that perform different types of positioning and motion by defining the appropriate image-based error term. The inputs available for defining errors are contours and fixed visual reference points observed in two cameras. The following is a brief review of camera imaging geometry for these features. Camera positions are represented by the frames $C_1 = (c_1, E_1)$ and $C_2 = (c_2, E_2)$. Points and lines in $\mathbb{R}^3$ will be written in capital letters. The projection of a point P or line L in camera i will be written $p_i$ and $l_i$, respectively. The projection of a point $P = (P_x, P_y, P_z)^T$ to a homogeneous vector $p_i = (u, v, 1)^T$ is given by
$$P' = E_i(P - c_i), \qquad (5)$$
$$p_i = (u, v, 1)^T = P'/P'_z. \qquad (6)$$
In vector form, this is written $p_i = g_i(P)$. An arbitrary infinite line, L, is parameterized by a tuple $(L_d, L_v)$, where $L_d$ is a fixed point on the line and $L_v$ is a unit vector representing the direction of the line. The vector $l_i$ parameterizing the projection of L in camera i is
$$L' = E_i\left(L_v \times (L_d - c_i)\right), \qquad (7)$$
$$l_i = L'\big/\sqrt{L'^2_x + L'^2_y}. \qquad (8)$$
In vector form, this is written $l_i = h_i(L)$. The Jacobian of g is a function of the position of the observed point, and the Jacobian of h is a function of the fixed point on the line:
$$J_g(P) = \left.\frac{\partial g}{\partial x}\right|_P, \qquad J_h(L) = \left.\frac{\partial h}{\partial x}\right|_{L_d}. \qquad (9)$$
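As a concrete illustration of the control laws (3) and (4), the following sketch computes the input u from an image-based error and its Jacobian. It is a generic resolved-rate step under the stated assumptions (full-rank $J_E$, designer-chosen gain k), not code from the authors' system.

```python
import numpy as np

def control_input(e, J_E, k=0.2):
    """u = -k J_E^{-1} e (eq. (3)); for a nonsquare Jacobian, use the
    right pseudoinverse u = -k J_E^T (J_E J_E^T)^{-1} e (eq. (4))."""
    J_E = np.atleast_2d(J_E)
    m, n = J_E.shape
    if m == n:
        return -k * np.linalg.solve(J_E, e)
    return -k * (J_E.T @ np.linalg.solve(J_E @ J_E.T, e))
```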
As described in [3], it is possible to estimate the parameters of both lines and points using relatively straightforward techniques. We refer the reader to that publication for the technical details of estimation and control and proceed to describe the structure of three representative controllers.

Problem Definition: Given a fixed reference point, P, on a manipulator and a line, L, not on the manipulator, servo the manipulator so that $P \in L$, using inputs $p_i$ and $l_i$, $i = 1, 2$.

A homogeneous vector $p_i$ in camera image i lies on the line projection $l_i$ if and only if $p_i \cdot l_i = 0$. Aside from a set of singular configurations, it can be shown that for any arbitrary line L and point P, $l_1 \cdot p_1 = l_2 \cdot p_2 = 0$ if and only if $P \in L$. This motivates the definition of a positioning error $e \in \mathbb{R}^2$ as:
$$e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} p_1 \cdot l_1 \\ p_2 \cdot l_2 \end{bmatrix} = \begin{bmatrix} g_1(P) \cdot h_1(L) \\ g_2(P) \cdot h_2(L) \end{bmatrix} = H_2(P, L). \qquad (10)$$
The Jacobian is as follows:
$$J_{H_2} = \begin{bmatrix} l_1^T J_{g_1}(P) \\ l_2^T J_{g_2}(P) \end{bmatrix}. \qquad (11)$$
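In code, the point-to-line error of eq. (10) and its Jacobian of eq. (11) reduce to a few dot products; the projection Jacobians $J_{g_i}(P)$ are assumed to be estimated elsewhere, as in [3].

```python
import numpy as np

def h2_error(p1, l1, p2, l2):
    """e = H_2(P, L) of eq. (10): p_i and l_i are the homogeneous image
    point and line coefficients in camera i; e = 0 iff the point lies
    on the line in both images."""
    return np.array([p1 @ l1, p2 @ l2])

def h2_jacobian(l1, l2, Jg1, Jg2):
    """J_{H_2} of eq. (11): one row l_i^T J_{g_i}(P) per camera."""
    return np.vstack((l1 @ Jg1, l2 @ Jg2))
```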
Applying the methods described above yields a controller for positioning a point on a line. This primitive controller is remarkably versatile and can easily be composed with itself to form controllers covering more degrees of freedom, hinting at the possibility of less and less human operator intervention. For example, we can combine two point-to-line primitives to create a line-to-line or "alignment" controller which controls four degrees of freedom:

Problem Definition: Given two fixed reference points P and R and a reference line L that is rigidly attached to a manipulator, develop a regulator that positions the manipulator so that $P \in L$ and $R \in L$, using $p_i$, $r_i$, and $l_i$, for $i = 1, 2$.
Figure 2. The geometry of the six degree-of-freedom servoing problem
The error defined for the previous problem can be extended to $e \in \mathbb{R}^4$:
$$e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} = \begin{bmatrix} p_1 \cdot l_1 \\ p_2 \cdot l_2 \\ r_1 \cdot l_1 \\ r_2 \cdot l_2 \end{bmatrix} = \begin{bmatrix} g_1(P) \cdot h_1(L) \\ g_2(P) \cdot h_2(L) \\ g_1(R) \cdot h_1(L) \\ g_2(R) \cdot h_2(L) \end{bmatrix} = H_4(P, R, L). \qquad (12)$$
With this formulation, it is possible to impose the remaining two degrees of freedom arbitrarily by choosing a rotational velocity and a translational velocity about and along the estimated value of the line L, respectively. This allows for shared control between the control system and an external agent, e.g., an operator. A similar construction with an additional constraint yields a full six degree-of-freedom regulator:

Problem Definition: Given three non-collinear, fixed reference points P, R, and S, and two non-parallel reference lines L and M that are rigidly attached to a manipulator, develop a regulator that positions the manipulator so that $P \in L$, $R \in L$, and $S \in M$, using $p_i$, $r_i$, $s_i$, $l_i$, and $m_i$, $i = 1, 2$ (see Figure 2).

The positioning error, $e \in \mathbb{R}^6$, is quickly derived:
$$e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \end{bmatrix} = \begin{bmatrix} p_1 \cdot l_1 \\ p_2 \cdot l_2 \\ r_1 \cdot l_1 \\ r_2 \cdot l_2 \\ s_1 \cdot m_1 \\ s_2 \cdot m_2 \end{bmatrix} = \begin{bmatrix} g_1(P) \cdot h_1(L) \\ g_2(P) \cdot h_2(L) \\ g_1(R) \cdot h_1(L) \\ g_2(R) \cdot h_2(L) \\ g_1(S) \cdot h_1(M) \\ g_2(S) \cdot h_2(M) \end{bmatrix} = H_6(P, R, S, L, M). \qquad (13)$$
Based on the remarks above, it follows that, aside from a small set of singular configurations, $e = 0$ if and only if $P \in L$, $R \in L$, and $S \in M$. The system Jacobian $J_{H_6}$ is square and depends on five vector quantities that can be estimated from sensor data.
Figure 3. The geometric constructions for inserting a floppy disk into a drive unit
Applying the methods described above produces a regulator that can control all six degrees of freedom of a robot manipulator. For details on the form of the Jacobian and the estimation of unknown values, we refer the reader to [3]. Note that the assignment of features in the problem above was arbitrary. That is, the roles of points and lines can be interchanged without changing the final result. Also, one way of defining L and M is by choosing two reference points and computing the line that runs through them. The problem formulation is independent of whether the cameras are stationary in the environment or mounted on the manipulator itself. Thus, this basic control formulation can operate on a wide variety of systems with different types of input. All of these controllers have been implemented and tested as described in [5].
2.2. Geometry and Visual Servoing

The controllers described above do not have to be based on directly observed features. Various types of geometric constructions can be used to define "virtual" lines and points. However, in order to maintain the accuracy of servoing, these constructions must rely on image-level geometric operations. For example, Figure 3 illustrates the use of geometric constructions to place a floppy disk into a disk drive. Geometrically, the strategy is to move the disk into the plane of the disk drive unit, align it with the slot, and then move toward the slot until the disk slides in. In the visual domain, we can perform this task if we know that the sides of the disk drive unit are parallel. In a projected image, parallel lines meet at a point, so we first construct the vanishing point V1 by tracking the indicated corners of the drive unit. The edge of the floppy slot and the vanishing point are used to construct the line L. The line M is constructed from the image of the slot itself. The three corners of the floppy are used to define the reference points P, R, and S. Servoing using this information will place the floppy disk at the mouth of the drive slot, oriented parallel to it. As this example illustrates, a variety of geometric constructions can be used to provide increased functionality for visual servoing.
As with the basic servoing routines, these constructions rely on tracking specific features in an image and computing values based on those features.
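In homogeneous image coordinates, the constructions of this section are cross products: the line through two points is their cross product, and the vanishing point V1 is the intersection (again a cross product) of the two projected parallel sides. A minimal sketch, with illustrative function names:

```python
import numpy as np

def image_line(p, q):
    """Homogeneous line through image points p, q (as (u, v, 1) vectors):
    l = p x q satisfies l . x = 0 for every point x on the line."""
    return np.cross(p, q)

def vanishing_point(l_a, l_b):
    """Intersection V = l_a x l_b of two image lines, e.g. the projections
    of the two parallel sides of the drive unit."""
    V = np.cross(l_a, l_b)
    return V / V[2] if abs(V[2]) > 1e-12 else V   # finite point if V[2] != 0
```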
2.3. Feature Tracking

In [4], a feature-based tracking system was described. The tracking system is based on two central ideas: window-based image processing and state-based programming of networks of tracked features. A window is an image region defined by its height, width, position, and orientation in device (framebuffer) coordinates. Tracking a feature means that image-processing operations are used to maintain a fixed relationship between window coordinates and the underlying feature. The low-level features currently available in the system include solid or broken contrast edges detected using convolutions, and general grey-scale patterns tracked using SSD methods [1,15]. The entire tracking system is designed to run on a standard CPU and framegrabber with no additional hardware support. For example, tracking a single contour on a Sun Sparc 2 with an Imaging Technologies 100 series framegrabber requires 1.5 ms for a 20-pixel contour, searching ±10 pixels using a mask 15 pixels wide. Tracking arbitrary patterns using SSD methods requires 70 ms to localize a 20 by 20 window. Basic features can easily be composed into more complex configurations using feature networks. Every feature or image property in our system can be characterized in terms of a state vector. For basic features, i.e. those that operate directly on images, the state of the feature tracker is usually the position and orientation of the feature relative to the framebuffer coordinate system. We define composite features to be features that compute their state from other basic and composite features. A feature network is defined as a set of basic and composite features connected by two types of directed arcs, referred to as up-links and down-links. The up-links provide image information to higher levels of abstraction. For example, a cluster of image features that all lie in a plane may contribute information to a composite feature that has, as state, the slant and tilt of the plane. The down-links propagate constraints from higher levels of abstraction. For example, if something is known about the motion of a rigid object, this information can be propagated down to the feature level to predict the positions of image features in the subsequent image. This ability to "package" features into higher-level primitives is particularly useful for providing visual constructions such as those used in the last section. Each of these constructions can be defined as an object that requires certain initialization information and provides a particular type of image information. These objects can be composed to form more complex objects, or they can be used directly to provide input to visual servoing routines.
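The SSD pattern localization mentioned above can be sketched as a brute-force search over a small window; bounds checking and any subpixel refinement of [1,15] are omitted, and the function is illustrative only.

```python
import numpy as np

def ssd_localize(image, template, u0, v0, search=10):
    """Slide an (h x w) grey-scale template over a +/- `search` pixel
    neighbourhood of the predicted position (u0, v0) and return the
    position minimizing the sum of squared differences.  Assumes the
    whole search window lies inside the image."""
    h, w = template.shape
    t = template.astype(float)
    best, best_uv = np.inf, (u0, v0)
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            v, u = v0 + dv, u0 + du
            ssd = np.sum((image[v:v + h, u:u + w].astype(float) - t) ** 2)
            if ssd < best:
                best, best_uv = ssd, (u, v)
    return best_uv
```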
3. Teleoperation and Visual Servoing

Visual servoing provides the means to achieve a particular geometric configuration reliably and accurately. However, some intelligence external to the servoing algorithm must translate a geometric task into the visually based operations and choose the visual features needed to perform those operations. In this section, we explore the issues that arise when integrating visual servoing into a teleoperation system similar to ROTEX. The information available to the human operator consists of a menu-driven programming environment, two camera images of the remote site, and a graphics simulation of
the known aspects of the remote site. The completeness of the latter depends on the availability of prior object knowledge and/or sensor reconstruction methods. The system interface offers the operator a list of the visual servoing operations, e.g., positioning or alignment, that are available in the current execution context. The operator must select an operation and, based on the selection, he or she must choose image features that initialize the tracking system for the chosen operation. These may be image features, or features defined using visual constructions. In the latter case, the system requests further initialization information until all operations are fully instantiated with image-level features. Note that the initialization of feature tracking must take place in both camera images.
3.1. Image Level Feature Specification

The effect of a visual control action is only meaningful if the chosen feature has physical relevance and if the images of the same physical feature are chosen in both camera images. As a consequence, the set of image-level features we consider are contours, intersections of contours, and unique surface markings. An example of the latter would be a barcode or similar artificial labeling device. We disallow arbitrary patterns that would be difficult to place in accurate correspondence in both camera images. At the image level, the tracking system is initialized to track a feature by supplying the image coordinates (position and orientation) of the feature within a prescribed tolerance. This information may be interactively supplied by an operator "clicking" on a feature in an image, or may be indirectly supplied through model information. In both cases, the low-level image processing can check for the presence of the correct type of feature and also ensure that there are no distracting features in the immediate vicinity that would lead to tracking ambiguity. The system can reject any feature that does not satisfy these conditions. As noted above, it is important that the features chosen in the two camera images be physically consistent with one another. One means of ensuring this is to exploit epipolar camera geometry. Given (homogeneous) projections $p_l$ and $p_r$ of a point P, there is a 3 by 3 matrix E such that $p_l^T E p_r = 0$. If the camera system is accurately calibrated, the E matrix can be computed directly from the calibration. Alternatively, the E matrix can be computed from the projections of eight points in the two images, provided the points are arranged in a non-singular configuration [9,11,12]. Hence, given a ninth feature point in one image, it is possible to determine the epipolar line along which the corresponding feature lies in the second image. Note that if the point P lies on a contour in one image, the corresponding point in the second image is determined by the intersection of the corresponding contour with the epipolar line. Consequently, standard matching methods can be applied to rank candidate matches along the epipolar line. The operator can then accept the system's best choice, or override it by choosing another contour. If the feature in the first image is an intersection of contours (e.g., a corner), then the feature in the second image is a point where two contours and the epipolar line intersect at a single point. This has a very high probability of being unique. Similar remarks hold for special patterns. These ideas rely on having either an accurate calibration or at least 8 corresponding points in both images. The latter condition is preferable, as it is independent of camera
calibration, but it may be extremely difficult to satisfy. However, the environment contains many known objects, including the robot itself. Consequently, a simple method for providing the required correspondences is to decorate the workspace with unique visual targets. These targets can be acquired automatically from static images on the ground and used for subsequent epipolar computations. Furthermore, if enough image processing bandwidth is available at the remote site, the targets can be tracked over time and do not have to be reacquired for every operation initialization.
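The epipolar check itself is one matrix-vector product: for a feature $p_l$ in the left image, candidates $p_r$ in the right image must (up to noise) lie on the line $E^T p_l$. A sketch for ranking candidate matches, with illustrative function names:

```python
import numpy as np

def epipolar_line(E, p_left):
    """Coefficients of the epipolar line in the right image: since
    p_l^T E p_r = 0, the match of p_left must satisfy (E^T p_left) . p_r = 0."""
    return E.T @ p_left

def epipolar_distance(E, p_left, p_right):
    """Distance of a candidate right-image point (u, v, 1) from the
    epipolar line, used to order candidate matches along the line."""
    l = epipolar_line(E, p_left)
    return abs(l @ p_right) / np.hypot(l[0], l[1])
```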
3.2. Using Prior Environmental Knowledge

Even if the automatic correspondence mechanisms described above functioned perfectly, it would still be onerous for the operator to specify all of the features needed for numerous six degree-of-freedom positioning operations. In many cases, image-level tracking initialization may not be necessary. Some objects in the environment, such as the robot manipulator, will be known. If the location and pose of these objects can be calculated using artificial markings or by tracking object features, then the operator can choose features from the model in the graphics simulation. By using the known correspondence between model and image, the tracking system can be initialized in both images automatically. If sufficient tracking bandwidth is available, the operator can also register features for later use and reuse. For example, if the manipulator is holding a particular tool, e.g., a screwdriver, the operator may instruct the system to track the shaft and the endpoint of the screwdriver at all times. These features can then be used again and again by referencing a symbolic label or by choosing them from an iconic representation in the graphics simulation. Likewise, some often-used operations may implicitly select certain object features. For example, one extremely common operation is to command the manipulator to approach an object along the manipulator center axis. An operation "manipulator-aligned-approach" could be defined that automatically chooses certain manipulator features to define the center axis and endpoint of the manipulator. In this case, the operator need only choose one or two features that define the goal configuration. If this can be done in the graphics simulation, visual servoing is essentially no more difficult than pointing to where the robot should go.

3.3. Integration into ROTEX

The ROTEX sensor-based telerobotic system [6] is based on the shared autonomy concept that distributes intelligence between man and machine [8]. Global tasks like visual servoing are specified interactively by a human operator and carried out by local sensory feedback loops executed by the remote robot system. Coarse task-level planning activities are performed by human intelligence, while fine path planning at the manipulator level takes place through sensor-based control [13]. The shared autonomy concept also allows shared control, in which the robot is guided by local sensory feedback as well as by the human operator via the teleoperation device. In the context of visual servoing, this allows the specification of a visual controller constraining fewer than six degrees of freedom. The remaining degrees of freedom are controlled by the operator. The ROTEX system already supports a three-dimensional graphics environment in which object models and the robot are represented. This virtual world allows the operator to move around and "interact" with the robot and its environment. In addition, all remote
Figure 4. The ROTEX workcell
operations are mimicked in the graphical simulation. In order to support visual servoing, the functionality of the ROTEX ground station [6] will be expanded to include a menu-driven programming environment, the image processing system, and the capability of overlaying remote camera images with wireframe models of known objects. These visual aids make feature selection easier for the operator. The overlay not only shows the correspondence between a model and real data but also gives a visual hint as to the validity of the system calibration. Selected visual features will appear highlighted in the camera images and, when possible, in the graphics display. Visual servoing will also be incorporated into the robot simulation so that the servo operations can be tested offline before applying them to the remote system.
4. Examples

We now describe examples of visual servoing and teleoperation in space. First, we briefly sketch how two servoing problems would be specified in the extended ROTEX system. ROTEX contains two camera systems: a stereo system mounted on the manipulator itself and an external camera system. The first example uses the external stereo cameras, and the second uses a single manipulator-mounted camera. We assume that some calibration information on both camera systems is available, although the correctness of the calibration is not guaranteed. In our third example, we describe a task which we hope to perform in the near future and offer experimental results which confirm the feasibility of visual servoing as a means to simplify teleoperation tasks. We first describe how the floppy insertion task could be performed using the ROTEX interface. We note that this is in fact a realistic task, since one of the commonly performed tasks in orbit is the removal and insertion of circuit boards. We assume that the disk drive unit is a known object registered to the graphics model. The disk itself is not an object known to the system. We assume the manipulator is holding the disk initially.
Figure 5. The view from the ROTEX manipulator camera
The first action of the operator would be to choose a servoing operation. Since insertion is a common operation, it would appear in the list of available operations. Once insertion is chosen, the operator is notified that he or she must choose specific image information, in this case points and/or contours on the manipulated and target objects. The operator then chooses the three corner points of the floppy disk as the defining points for the manipulated object. When they are chosen, the system estimates the depth of the points (using camera calibration information) and enters them into the graphics display. The system continues by requesting two lines on the target object. The operator now chooses the construction parallel-line. The system offers options for constructing parallel lines, in particular, by choosing two parallel lines in the world and a third point. The operator chooses this option. Since the floppy drive is registered in the graphics model, the operator can directly choose the sides of the drive unit as the requested parallel lines and the edge of the floppy slot as the point defining the insertion axis. This defines the line L of Figure 3. Still needing a second line, the operator indicates the drive slot of the disk unit in the graphics display. The feature specification for the task is now complete. The system analyzes the features for trackability and completeness. If the features are satisfactory, the operation proceeds both in simulation and on the remote system. When it is finished, the operator receives an image and checks that the operation was correctly performed. If so, the system proceeds to perform the insertion operation, presumably under position and force control. As a second example, we describe the task of grasping the ORU (Orbit Replaceable Unit). This task is representative of many experiments which involve the manipulator moving toward a rigidly attached object, grasping it, and moving it to some other place. Figure 5 shows a typical view from the manipulator cameras above the ORU. The visual line markings $l_1$ and $l_2$ were added to aid the human operator in positioning. The final goal position is reached if $l_1$ aligns with the horizon of the fingertips, $l_2$ aligns with the gap between the fingers, and the manipulator is perpendicular to the surface of the ORU as it touches down. In the final approach, four laser distance sensors integrated into the fingertips can provide distance and some orientation information. In this example, it is assumed that all objects are known and their models are registered in the graphics model. Unfortunately, the configuration of the manipulator cameras is such that occlusion and visual singularities severely limit their usefulness.
In particular, the "horizon" line in the figure is parallel to the baseline of the cameras. This means that it is not possible to acquire depth information on this line, or on lines parallel to it. Instead, we use visual information on the lines $l_1$ and $l_2$ to constrain the orientation of the manipulator about the center axis and translation perpendicular to the manipulator axis, while the operator controls the direction and attitude of approach. Proceeding as above, the operator begins by choosing an operation, in this case the appropriate approach operation. This operation assumes that the horizon line and the center of the manipulator gap ($P_2$ in the figure) are the manipulator setpoints. It indicates that the operator must specify a line and point in one camera. Since the ORU is registered in the graphics simulation, the operator indicates $l_1$ and $P_1$. The system verifies that these features are visible and trackable, and then begins to graphically simulate the visual servoing operation as it is carried out remotely. The operator uses the tracking ball to move the manipulator toward the ORU, simultaneously adjusting orientation, until the laser distance sensors detect the ORU surface. At this point, the distance sensors can control manipulator attitude with respect to the surface. Vision continues to control translation parallel to the surface and rotation about the center axis. The operator continues to control the distance to the object until touchdown is registered on the force sensors. As our third example, we describe a set of completed experiments which demonstrate how the system can be employed without the use of an a priori model. The actual task is to visually guide a robot arm mounted on a repair satellite so that it can grasp a separate, damaged satellite. The robot arm holds a specialized rod-shaped grasping tool, which is equipped with two triplets of laser distance sensors. Grasping the damaged satellite becomes possible when the grasping tool is inserted into the satellite's conic apogee motor. Although the distance sensors will be used to center the grasping tool once it is inside the motor, vision must be used to servo the tool into the cone. We expect to perform the task in shared-control mode, where the operator will determine only translation along and rotation about the cone axis. Therefore, the servoing algorithm must fix the remaining four degrees of freedom of the manipulator. In our experiments, we simulate the actual, expected environment with a pencil-like stick instead of the grasping tool, and a paperboard cone instead of a real apogee motor (see Figure 6). Two cameras will also be placed on the repair satellite, either in a stereo configuration (side by side), or with one camera on the robot manipulator and one at the base of the manipulator. Our present experiments approximate the first situation, where two black-and-white cameras are placed approximately 30 cm apart, about 150 cm away from the cone, in a constant geometric relationship with the manipulator base. We note that, unlike the ROTEX system, the absence of precise knowledge of the motor or of the geometric configuration of the satellites means that we cannot expect to have an accurate graphics simulation of the environment. Four corners are tracked in each image: two for the cone base (assumed to be viewed roughly from the side, so that a trapezoidal figure is projected to the cameras), and two for the grasping tool.
Each corner is tracked as the intersection of two line segments, themselves simple edge detectors, hinged together at a vertex. The geometry of the task requires only that we know the slopes of the lines that track the sides of the rod and the
428
Figure 6. A pair of beginning and ending configurations of manipulator and cone for our third example: both corners of the tip of the rod are tracked along with the two top corners of the cones
sides of the cone, which are, in turn, used to construct virtual lines for the central axes of the rod and the cone. These virtual lines are fed as input into a 4-DOF controller (see Section 2.1) that determines the motions required for visually aligning the axes. We have been able, in our simulated environment, to consistently align the rod with the cone. Once vision has been initialized by a human operator, the servoing algorithm is capable of aligning the stick to within 2 degrees and 3mm of the cone axis, which is more than accurate enough for the needs of the application. Servoing remains reliable despite slow, continuous motion on the part of the cone and motion of the stick towards the cone as commanded by the operator. 5. D i s c u s s i o n We have described a method for feature-based visual servoing and its application to remote teleoperation. The visual servoing methods described above have been tested in stand-alone systems. We are now in the process of integrating them into a larger, more flexible servoing system. We have taken some initial steps toward including visual servoing in the ROTEX environment and other space operations. While visual servoing provides robust, calibration-insensitive primitive operations, even at this early stage, we find that its performance can be greatly enhanced in a variety of ways. First, any remotely operated system will benefit from the availability of artificial visual features, even if the features are not part of a calibrated geometric model. Ensuring that eight such features are always visible is extremely useful for correspondence calculation when the camera system is not well calibrated. Second, the ability to recognize and calibrate the robot end-effector and other objects in the environment to a simulation significantly eases the burden on the operator. Careful design of the visual servoing operations can relieve the operator from the potentially onerous task of specifying visual cues. Third, the availability of a graphical simulation for the visual servoing operation is an extremely useful tool. In addition to providing a means for choosing object fea-
429 tures, it supports operator interaction using hybrid control. The latter often simplifies the specification of a visual servoing task. One issue we have not addressed in any detail is the issue of error handling. The visual servoing methods described above will fail if the system moves through (visually) singular configurations. In many cases, these errors can be detected and corrected for automatically. However, some errors should be flagged and returned to the operator. For example, in ROTEX is it possible for the operator to choose a set of features on the manipulator that lead to a globally singular system. This should be detected and the feature specification rejected with an appropriate explanation of the problem. Another issue is the choice of a set of actions to provide to the operator, actions which should be the most "intuitive" and useful. The geometry used in visual servoing is often non-intuitive, and there are many limitations on what can and cannot be done. Thus, one immediate priority is to implement a preliminary version of the visual servoing interface and to perform simulated visual servoing experiments with the R O T E X system. We are also continuing to extend our understanding of closed-loop visual servoing and its application to robot control.
Acknowledgments The work in this paper could not have been done without the help of Sidd Purl on implementing an initial version of the tracking system, and W-C. Chang and A.S. Morse for help in designing the visual feedback controller. This research was supported by DARPA grants N00014-91-J-1577 and N00014-93-1-1235, by National Science Foundation grants IRI-9109116 and DDM-9112458, by NATO Collaborative Research Grant CRG-910994 and by funds provided by DLR.
REFERENCES
1. P. Anandan. A computational framework and an algorithm for the measurement of visual motion. Int. J. Computer Vision, 2:283-310, 1989.
2. B. Brunner, G. Hirzinger, K. Landzettel, and J. Heindl. Multisensory shared autonomy and tele-sensor-programming: key issues in the space robot technology experiment ROTEX. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Yokohama, Japan, 1993.
3. G. D. Hager. Six DOF visual control of relative position. DCS RR-1018, Yale University, New Haven, CT, May 1994.
4. G. D. Hager. Real-time feature tracking and projective invariance as a basis for hand-eye coordination. DCS RR-993, Yale University, New Haven, CT, November 1993. To be presented at the 1994 International Conference on Computer Vision and Pattern Recognition.
5. G. D. Hager, W.-C. Chang, and A. S. Morse. Robot feedback control based on stereo vision: towards calibration-free hand-eye coordination. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation, pp. 2850-2856.
6. G. Hirzinger. ROTEX - the first robot in space. In International Conference on Advanced Robotics, 1993.
7. G. Hirzinger, G. Grunwald, B. Brunner, and H. Heindl. A sensor-based telerobotic system for the space robot experiment ROTEX. 2nd International Symposium on Experimental Robotics, Toulouse, France, 1991.
8. G. Hirzinger, H. Heindl, K. Landzettel, and B. Brunner. Multisensory shared autonomy: a key issue in the space robot technology experiment ROTEX. IEEE Conference on Intelligent Robots and Systems (IROS), Raleigh, USA, 1992.
9. T. S. Huang and C. H. Lee. Motion and structure from orthographic projection. IEEE Trans. Pattern Anal. Mach. Intelligence, 11(5):536-540, May 1989.
10. D. Lavery. Perspectives on future space telerobotics technology. In International Conference on Advanced Robotics, 1993.
11. H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133-135, 1981.
12. J. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT Press, Cambridge, Mass., 1992.
13. T. B. Sheridan. Merging mind and machine. Technology Review, pages 33-40, 1989.
14. C. Taylor and D. Kriegman. Structure and motion from line segments in multiple images. In IEEE Int. Conf. on Robotics and Automation, pages 1615-1620, 1992.
15. C. Tomasi and T. Kanade. Shape and motion from image streams: a factorization method (full report on the orthographic case). CMU-CS 92-104, CMU, 1992.
Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.
431
Practical Coordination Control Between Satellite Attitude and Manipulator Reaction Dynamics Based on Computed Momentum Concept Kazuya YOSHIDA ~ Dept. of Aeronautics and Space Engineering Tohoku University Aoba, Sendai 980-77, JAPAN This paper presents a practical control method of robot satellite attitude to cope with manipulator reaction on free-floating space robots, developing the Computed-Momentum based Reaction Compensation (CMRC) concept. The author proposes versions of the CMRC control schemes based on angular momentum conservation in floating multi-link systems, and these schemes require far less computation than the computed-torque based methods. The proposed schemes are demonstrated and examined by computer simulations using a realistic 3D model which involves free-floating dynamics and structural vibration of solar paddles. 1. I n t r o d u c t i o n For successful promotion of space projects, robotics and automation should be a key technology. Autonomous and dexterous space robots could reduce the workload of astronauts and increase operational efficiency. A drawback, however, in controlling space robotic systems is the lack of an inertially fixed footing base in the orbitital environment, unlike ground-based manipulators. Any motion of the manipulator arm will induce reaction forces and moments in its base, then disturb the base position and attitude, and also excite solar paddle vibration. Position change due to the manipulator reaction may not be a major issue, however attitude change is more serious because, for satellites generally, solar and global pointings are essential to keep their function. If a satellite loses its proper orientation, it will lose electric power from solar paddles and tele-commands from the ground stations. Usually a feedback attitude control is working to maintain a proper satellite attitude from disturbances, however in case of a "robot" satellite, the magnitude of disturbance torque due to the manipulator reaction is tremendously larger than the conventional solar pressure or thin-air-drag torques. To cope with the problem, a feedback control which starts working only after the sensors detect the attitude error is not effective, but such class of a new control method as a coordination of the manipulator and attitude dynamics or a feedforward control anticipating the manipulator reaction, is necessary. In this paper, we discuss a velocity level feedforward method, based on momentum conservation dynamics, and develop a Computed-Momentum based Reaction Compensation
432 (CMRC) concept, then demonstrate its practical effectiveness with computer simulations. This paper is organized in the following way: Section 2 outlines a basic modeling of space robot dynamics and Section 3 proposes the CMRC control concept to meet practical mission requirements, then computer simulations are presented in Section 4. 2. M o d e l i n g a n d C o n t r o l of F l o a t i n g R o b o t S y s t e m s In this section, the basic dynamics of an articulated space multibody system is presented. The equations shall be used for derivation of the proposing control law in Section 3, and for solution of the direct dynamics at the simulation in Section 4. To discuss floating robot dynamics, we consider a general model that a robot satellite has plural arms including solar paddles, reaction wheels or other appendages. This is regarded as a chain of links in a tree configuration consist of n + 1 rigid bodies, connected with n articulated joints. Assume that 1 pieces of arms are mounted on a satellite main L body, and the arm k has nk pieces of links, then n - ~k=l nk. Flexible arms like solar paddles can be treated as segmented virtual rigid links connected with elastic hinges. The flexibility yields joint torques on the virtual hinges according to their virtual deformation. However, to avoid complexity, this paper do not explicitly discuss the nature and modeling of flexibility, although it is accounted in the simulation. We assume that the system freely floats in the inertial space, i.e. no external forces or moments apply on the system, or no orbital motion is considered. Let linear and angular momenta with respect to the centroid of the satellite base body (link 0) be P0, L0, respectively, we obtain
Lo
w~og H~ constant.
=
w0
H~r
~b
(1)
where (~ is a joint velocity vector. We then obtain the equation of motion of the floating robot in the following form [1]" 0
1
(2)
where -1
H: I l
eo,
[ H,,,r JT,,,,]
(3)
nk
- E E(x
+ m,k - k T
-k
+ I0
(4)
k=l i=1 l
nk k-k
k
H~r = E E ( I ~ J ~ i + mi rsiJ~i)
(5)
k=l i=1 l
nk
-
(J~, k=l i=1
I~ J~i + m, J~i J~i)
(6)
433
Figure 1
l
A Japanese robot satellite, ETS-VII [2]
nk
J ~ - Z Z m~J~/~
(7)
k=l i=1
jr,
-
[kf • ( ~ - plY), k~ • ( ~ ) - , ~ ) , . . . , . . . , k~ • ( ~ - , 5 ) , 0 , . . . , 0]
(s) (9)
j~, - [k~k~, , . . . , k ,~, o , . . . , o ] ~09 -
~
-
~0
(10)
r0ki - r k - r0
(11)
k m i 9 mass of link i of arm k w 9 total mass of the system (w -- ~-~"k=l l ~-'~i=1 nk m i ) r/k : position vector of centroid of link i of arm k p/k . position vector of joint i of arm k k/k : unit vector indicating joint axis direction of link i of arm k r0 : position vector of centroid of satellite base b o d y rg : position vector of a total centroid of the system B : joint flexibility dependent t e r m r : internal torque vector applied at joints and a tilde operator stands for a cross product such t h a t ?a - r x a. All position and velocity vectors are defined with respect to the inertial reference frame.
3. C o m p u t e d - M o m e n t u m
based Reaction
Compensation
(CMRC)
control
This section reviews requirements for practical robot-satellite and, to meet the requirements, proposes a control scheme coordinating m a n i p u l a t o r operation and base satellite a t t i t u d e based on a C o m p u t e d - M o m e n t u m based Reaction C o m p e n s a t i o n (CMRC) concept.
434
3.1. Basic requirements for space robots Focusing on robot-satellite development of near future, like ETS-VII by NASDA, Japan (see Figure 1) in late 1990's, or possible next-generation models in early 21st century, we must face to the following space borne problems: 9 there is a communication time delay between ground-stations and a space robot, approximately 1 to 10 seconds in case of low orbit operation, 9 the capacity of the communication line is small, 9 the on-board computer resource and the reaction compensator's capacity are limited, 9 the control system is subject to the severe coupling of the manipulator reaction and satellite attitude dynamics, however, the control system should stably maintain satellite attitude orientation using limited resources. Among these problems, this paper concerns the latter two arguments. Looking closely into the requirements, we find that the computational power of an actual flight version of on-board processors whould be equivalent to an i80286 at 10MHz or less, or an earlier version of note-book commercial personal computer. This is because that the design policy of flight models is to make the most use of the well-established hardware resorces to guarantee the proper operation in the space hostile environment, but not to take risks by challenging processor speed or using novel, but unmatured technology. For the same reason, the capacity of the semi-conductor memories is also extremely small. On-board control algorithms, therefore, must be as simple as possible avoiding heavy computation. On the contrary, the dynamic coupling of the manipulation and the satellite attitude imposes complicated and difficult computation upon the control systems. Such dynamic coupling is expressed by the equation of motion (2) or the momentum equation (1): we must compute the inverse dynamic solution from them. The other problem is the limitation of torque/ momentum compensation capacity of reaction wheels. A feedback attitude control is working on normal satellites to correct the attitude error, however the magnitude of disturbance torque due to the manipulator reaction is tremendously larger than the conventional reaction wheels' capability. For instance, the capacity of a flight model available in ETS-VII will be in the order of 0.1 Nm for maximum compensation torque, and 10 Nms for maximum momentum: those values are good enough for the conventional compensation, but too short to cope with the manipulator reaction. To solve these problems, one possible way is to operate the manipulator arms so slowly that the reaction meets the capacity; but it would degrade operational time-cost efficiency and require intolerable patience to operators. The other, and better, solution is to employ a class of feedforward control which anticipates the manipulator reaction.
3.2. C o m p u t e d - m o m e n t u m concept In order to obtain a feedforward signal, a computed-torque method which is based on the inverse dynamics solution is commonly used in the field of manipulator control. However in the space robot systems, the robot comprises much more complicated floating-link
435 Sample & Hold
Manipulator Joint PD Control Tcmtuc
CMRC}.~~I j
~'m
Torql~
_
+ ~
xw JReaction
Satellite Attitude PD Control
F i g u r e 2 A block-diagram of the proposed feedforward and feedback scheme with Computed-Momentum based Reaction Compensation (CMRC) concept.
dynamics like equation (2) and the on-board computational power is limited as discussed in the previous subsection, computed-torque method hence may not be acceptable. There is an attempt to develop an efficient computation method [3][4], but it still requires further power beyond the available on-borad processors. To cope with it, we propose the use of the momentum conservation equation (1) in stead of equation (2), because the momentum equation has a linear form at velocity-level, much simpler than the equation of motion at acceleration-level. Even though a velocity-level description, the momentum equation fully expresses system dynamics. In contrast with the Computed-Torque method, we name our methodology Computed-Momentum concept. Equation (1) can be solved for the system angular momentum L0. l
Lo - Iswo + E Imk~ k
(12)
k=l
where l Is -
nk
E E ( I Y - m ik-k~-k s i 5i)
(13)
k=l i=0 n
link -- ~ ( I ~ J ~ -
k k m~k -rg~J~)
(14)
i=1
Eventually, since the rightest term of equation (12) comprises manipulator arms and 9 reaction wheels, we use the symbols Ira, (Pro for the arm and Iw, (P~ for the wheels. Then we have
Lo - Isw + Ira(Pro + Iw(Pw,
(15)
436 where w0 = r for simplicity. Equation (15) is a linear and simple expression at velocity level, not using acceleration, but it involves the reaction dynamics in terms of momentum conservation. It thereby plays an important role in the following Computed-Momentum schemes. By solving (15) for the satellite angular velocity w, we obtain
w = I~-l(Lo- Imam - I~qb~)
(16)
where L0 is a known and constant inintial momentum. Equation (16) is a solution for the attitude disturbance velocity when the manipulator and/or the wheels make an arbitrary motion. Or (15) can be solved for the wheels' angular velocity ( ~ , then ~b~ = I~X(L0- I~w - Imam)
(17)
We can use the latter for computing the required velocity of reaction wheels for a given attitude maneuver w and manipulator motion 4)m. 3.3. P r a c t i c a l r e a c t i o n c o m p e n s a t i o n c o n t r o l s c h e m e Reliability management policy for the space development requires us to make most use of conventional, mature, then reliable technologies and subsystems. Compatibility with the existing systems and modularity are also recommended. From this point of view, for sattellite attitude control and manipulator joint control, the conventional PD or PID feedback modules should be most supposed to use. To meet it, we propose a control system for the coordination of satellite attitude and manipulator operation, based on individual satellite and manipulator PD (can be PID) feedback control modules. F i g u r e 2 shows a block diagram of the proposed control system. It is based on a parallel instration of PD feedback modules for each joints and wheels, but these modules are connected with a feedforward loop which signal is calculated on the computed-momentum scheme. A box with "CMRC" is the part of the feedfoward computation, in which the anticipation value for reaction wheels 4)z is determined when the manipulator motion Cma and
~md are given. There are two possibility for the CMRC calculation from eqs.(16) or (17). We then propose two versions of Computed-Momentum based Reaction Compensation control methods, say CMRC-A and CMRC-B. Combining the feedback law and equation (16), the torque for the reaction wheels (reaction compensators) should be
"rw -- K l ( a d - a ) + K2(wd- w ) + K3~)z
(18)
~++ - I~-~(Lo - I m q b m a - I : $ = )
(19)
for CMRC-A. Where ft is the satellite attitude angle and w, ( ~ are satellite attitude and reaction wheels angular velocities, which all are measured by on-board sensors, and K1,2,3 are control gains. The suffix a indicates a "d"esirerd control command given by an operator or an arm motion planner.
437 For a zero-momentum, inertial-fixed sysytem, becomes much simpler. Tw : - K l a
Lo =
~ d -~ • d
-" O,
then the control law
(20)
-- K2og - K 3 l s l (lm~)md -}- Iw~w)
Note that the proposed system (Figure 2) accepts any motion commands for the manipulator, not requiring a special motion planning strategy. The manipulator command ~bmd and ~md therefore can be determined arbitrary by a certain motion planner or a human operator. For CMRC-B, the feedforward signal is obtained as the error correction of wheels' velocity between an expected one (17) and a measured value ~bw, then ~II becomes (21)
~bff -- I w l ( L 0 - Im~md -- IsC~d) -- ~w For the zero-momentum system, = -Ux
-
-
(22)
+
which is very similar to CMRC-A (20), but the inertia matrices work differently. In the block diagram, there are sample/hold switches on the manipulator command and the feedforward lines to synchronize the timimg of these signals. The control flow with the switches is described as follows: 1. At an initial state, turn off the switches and set the manipulator motion and feedforward commands, zero. 2. Give the manipulator command ~bmd and ~md from a trajectory planner or a joystick, then compute ~bff using eq.(19) or (21). 3. Evaluate the feedforward torque component torque capacity, i.e. IK3
zl >
K3~ff:
if it exceeds the reaction wheels'
(23)
then return to the arm motion planner and modify the operation command so as to meet the reaction compensation capacity. (If the mission will not allow to change the trajectory physically, the only re-planning solution is to move the arm slower.) 4. After the modifications, if the condition
]K3dpff[ < T~m.~
(24)
is satisfied, turn on the switches and put the manipulator motion command ~)md, ~md, and the feedforward command ~bz into each PD control loops, simultaneously.
438 3.4. E s t i m a t i o n of c o m p u t a t i o n a l b u r d e n In order to emphasize the advantage of the proposed computed-momentummethod, we roughly estimate the computational burden and compare it with the previously reported computed-torque method. At each computational cycle, we need the coordinate transformation of each position vectors and inertia tensors, and preparation of Jacobian matrix. It requires the following times of multiplication/division (M) and addition/substruction (A) operations. For transformation et al. (21n + 9)M (21n + 6)A For Jacobians (43n + 16)M (25n + 4)A Then, calculate the inertia tensors Is and Ira. For Is of eq.(13) (12n + 21)M (21n + 12)A For Im of eq.(14) (7.5n 2 + 7.5n)M (6n 2 + 6n)A The final step is the computation of the feedforward signal. For CMRC-A at zero-momentum
+ 39)/ (2n + 21)A For CMRC-B at zero-momentum (3n + 9)M (2n + 9)A In total, we need the following number. For CMRC-A (7.5n 2 + 86.5n + 85)M (6n 2 + 75n + 43)A For CMRC-B (7.5n 2 + 74.5n + 34)M (6n 2 + 54n + 19)A In case of n - 6, For CMRC-A For CMRC-B
874M + 709A 751M + 559A
Equation (20) resembles (22), however we should note that the latter does not need the computation of I; -1 which requires (12n + 42)M and (21n + 28)A including a 3 • 3 inversion. Instead, I~ 1 is constant once the wheels are designed, then no computation needed: this is the advantage of CMRC-B. Mukherjee and Nakamura [3] reported the computational cost of their efficient scheme of computed-torque method. Their estimation is for manipulator control, not for the attitude control, but those cost must be similar and the comparison would make sense. In case of n - 6, they say 2,259M + 1,992A. Yokokoji et al. [4] recently made a minor
439 improvement although, the burden is still in similar number. However, in our estimation of the CMRC methods, even though no special technique or optimization algorithm is employed, the computational cost of the proposed c o m p u t e d - m o m e n t u m scheme is about one-half to one-third or less of the computed-torque scheme.
F i g u r e 3 Simulation model TWIST
BEND
SWING
= -(Kr + D~b) K = Ipw~
I
D = 2Ipw,( wd = ~/I - ( 2 w .
F i g u r e 4 Model for flexible solar paddles
4. S i m u l a t i o n S t u d y In order to demonstrate the practical performance of the proposed control method, we have carried out relatively detailed numerical simulation with a realistic 3D model. The model comprises a single 6-DOF, 75 kg, 2.5 m manipulator arm; 3-axes, 0.2 Nm20 Nms class reaction wheels; and 1.2 kilo-watt class solar paddles. The total mass of the robot satellite is 2000 kg, and the wing span of solar paddles is 28.5 m. A topology
440 of the model is depicted in F i g u r e 3 and a detailed specification is listed in T a b l e 1. Zero-momentum system is assumed here. For the simulation, direct dynamics solution is computed using equation (2) for a set of torques of manipulator joints ~-m and reaction wheels 7-w commanded by a control law. Numerical integration is carried out in every 0.01 simulational second. The solar paddles are treated as a series of rigid links connected with rotational compliant hinges, as mentioned at the beginning of Section 2. For simplicity, but not loosing a practical sense, we consider the first modes of flexible deformation about three principle axes: bend, swing, and twist. Each solar paddle therefore has three virtual, rotational joints at its root, axes of which joints are perpendicular to each others, like F i g u r e 4. The torques passively generated by the flexibility are modeled based on the following simple expression -
-
+
T)
"k
(25)
where the joint number i = 1, 2, 3 corresponding to the twist, swing, and bend axis, respectively, and the arm number k = 1, 2 indicating right and left paddles. K:i is virtual joint stiffness representing natural flexibility of the paddles, :Di is corresponding damping factor. These values are selected to express a realistic natural frequency of vibration, 0.1 Hz, 0.5 Hz, 1.0 Hz for each axis as shown in T a b l e 2. In the following simulation, we suppose a manipulator operation that the hand straightly moves over the satellite surface for 1.5 m distance in 100 s, carting with a 50 kg orbital replacement unit (ORU), see F i g u r e 3. For more reality, sensory delay and computational lag-time are considered. We suppose 1.0 s time-constant in the attitude sensor for the measurement of ~, and 0.5 s time-interval of the feedforward computation, i.e. the switches in the block diagram turn on and the corrsponding sample-hold values are refreshed in every 0.5 s. F i g u r e s 5 (a)-(d) show a series of the simulation results. Due to the page limitation, the attitude responses only around the pitch, a longitudinal axis of the solar paddles in which the largest reaction is observed, are presented here. (a) shows the case with no attitude control. After the 100 s manipulator operation, the satellite attitude receives about 6 deg change due to the accumulation of the arm reaction acceleration, and it will not return to the initial orientation. The right hand figure is a close-up of the attitude vibration. After the manipulation, 0.1 Hz vibration due to the solar paddle bending flexibility is excited, but its amplitude is less than 0.01 deg then this vibration will not cause a severe effect to the system. (b) shows the case with PD feedback control, using K1 = 100,/(2 = 1000 for attitude control gains. The attitude is going to recover zero finally, although, relatively large overshoot is observed. Maximum amplitude of the orientation change is about 3 deg. This amount of pointing failure may not be severe for a low gain antenna, but very critical for a high gain antenna. To get this response, the reaction wheel generates more than +0.5 Nm torque. If we have a limitation of the torque capacity, the attitude response becomes much worse. (c) shows the case with CMRC, the proposed feedforward control. The attitude response has been drastically improved, resulting almost zero attitude disturbance. Both of proposing CMRC-A and CMRC-B show similar results with a gain selection of K1 = 10,
441 Table 1 Base II mass (Kg) mo link 0 II 1800.0
II
[ a r m II (Kg)
I
link link link link link link
1 2 3 4 5 6
II
Simulation model parameters
length (m) length l w i d t h l h e i g h t
201
length (m)
15.0 15.0 15.0 8.0 8.0 14.0
0.30 0.70 0.70 0.18 0.18 0.43
R.W.
,, (Kg)
wheel 1 wheel 2 wheel 3
201
0.15 0.35 0.35 0.09 0.09 0.20
II mi
5.0 5.0 5.0
mmt. of inertia ( Kgm 2 )
Iox I Iov [ 25 5ooo15ooo1
Ioz 5ooo
mmt. of inertia ( Kgm 2 )
0.15 0.35 0.35 0.09 0.09 0.20
(m) radius 0.1 0.1 0.1
0.1219 1.8375 1.8375 0.0100 0.0266 0.2245
0.1219 1.8375 1.8375 0.0266 0.0266 0.0175
0.0188 0.0188 0.0188 0.0266 0.0100 0.2245
(Kgm 2 )
Iix I 0.025 0.025 0.025
Iiy I Iiz
0.025 0.025 0.025
0.50 0.50 0.50
T a b l e 2 Specifications for solar paddle vibration Solar II (Kg) Paddle .~ twist 5.0 bend 0.0 swing 50.0
II
length (m)
~
I ~1 b~
3.0 1.5 0.0 0.0 10.0 5.0
1.5 0.0 5.0
mmt. of inertia ( K g m 2 )
Iiz
0.0063 0.000 37.500
Iiy I
3.753 0.0C9 416.70
II ( K g m 2IP ) I (Hzf~ I (Kgm2/s2)~) twist 37.5 0.5 370.11 bend 1666.7 0.1 657.98 swing 1704.2 1.0 6.73 x 104 note: k = (2~fa)2I.
Iiz
3.753 0.000 454.20 ~ I
0.042 0.0048 0.023
442
%
8o[/
101.22--
40
0
b
101.21 -
~
101.2o-
= lO1.19 -40 t
~
-80
0
I
100
!
200
I
300
101.18 101.17
400
0
100
TIME [sec]
200
300
400
TIME [sec] (a) No a t t i t u d e control
%
8O
0.4
40
0.2
0 /
E
"~-
~ r
-40 -
t-,
-80 0
o.o -0.2 -0.4
!
100
I
200
I
300
400
0
100
TIME [sec]
300
400
TIME [sec]
(b) P D feedback control ( K 1 -
%
200
80-
100,/{2 = 1000)
0.4
40-
,.., E z
0-
r
t--,
-40
o.o -0.2 -0.4
-80 0
0.2
I
100
I
200
I
300
400
0
100
TIME [sec]
!
200
I
300
400
TIME [sec]
(c) Proposing C M R C control (K1 = 10, K2 = 100, K3 -- 5000 for CMRC-A, or K1 -- 10, K2 = 100, K3 = 0.1 for CMRC-B) 80-
0
0.4
40
,..-, E
0 ,"r" (,J
-40
-
-80
-
0
0.2
o.o -0.2 -0.4
I
100
I
200
I
300
400
0
!
100
I
200
I
300
400
TIME [sec] TIME [sec] (d) C M R C control with 0.2 N reaction wheel torque s a t u r a t i o n F i g u r e 5 Simulation result (attitude response in the pitch axis supposing 100 s O R U - t r a n s p o r t manipulation)
443 /(2 = 100,/(3 = 5000 for CMRC-A and K 1 - - 10, /(2 = 100,/(3 = 0.1 for CMRC-B. (d) shows the case with CMRC supposing 0.2 Nm reaction wheel's torque saturation. Even for such a limited hardware resource, the attitude disturbace is maintained -+-2 deg and it quickly returns to zero after the manipulator operation. 5. C o n c l u s i o n In this paper, we have discussed practical requirements for robot satellites and proposed a new control scheme: Computed-Momentum based Reaction Compensation (CMRC) method, in order to relieve the computational burden but to offer relatively good performance for satellite attitude maintenance anticipating the manipulator reaction. The availability of the proposed method, especially its practical implementation aspect, has been confirmed by a series of numerical simulation assuming a realistic flight-scale model. In addition, the computational cost of the proposed schemes is estimated, then a far advantage to the computed-torque control is shown. The cost estimation and simulation proves that the proposed method works remarkably well to anticipate and compensate the manipulator reaction, although employing a relatively simple control algorithm and conventional-scale reaction wheels. REFERENCES
1. Space Robotics: Dynamics and Control, edited by Xu and Kanade, Kluwer Academic Publishers, 1993. 2. M.Oda et al: "The ETS-VII, the World First Telerobotic Satellite," Proc. Artificial Intelligence, Robotics and Automation, in Space '92, pp.307-318, 1992. 3. Ft.Mukherjee and Y.Nakamura: "Formulation and Efficient Computation of Inverse Dynamics of Space Robots," IEEE Trans. on Robotics and Automation, vol.8 No.3, pp.400-406, 1992. 4. Y.Yokokoji, T.Toyoshima, and T.Yoshikawa: "Efficient Computational Algorithms for Trajectory Control of Free-Flying Space Robots with Multiple Arms," IEEE Trans. on Robotics and Automation, vol.9 No.5, pp.571-580, 1993.
This Page Intentionally Left Blank
Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.
445
A 5-Axis Mini Direct Drive Robot for Teleoperation and Biotechnology Manuel Moreyra, Pierre-Henry Marbot, Steven Venema, Blake Hannaford Department of Electrical Engineering, University of Washington, Seattle, WA 98195-2500 b 1ake @ isdl.ee, w ashin gton. edu
ABSTRACT A previously developed 3-axis mini direct drive robot has been enhanced with two additional direct drive axes for general positioning and orientation of an axially symmetric tool. The arm has a work volume of about 120 cm 3 and 5-10 ~tm or better resolution and repeatability. The arm forms an initial prototype for the NASA/University of Washington MicroTrex flight telerobotics experiment. The contemplated terrestrial applications include handling submicroliter liquid samples for electrophoresis, and micro-manipulation with scaled force reflection.
INTRODUCTION In recent years teleoperation has greatly expanded its scope of applications from its beginnings in the nuclear industry and its early expansion into undersea operations. Space and terrestrial biomedical applications are now attracting increasing attention. In both of these areas, in addition to the desirability of removing humans from sources of physical risk (the vacuum of space or highly infectious agents for example), there is increasing interest in miniaturization. In space, economic and other pressures are shifting the emphasis from relatively heavy, high cost and complex systems, towards low mass, low cost systems with fewer functions and higher launch frequency. In surgery, similar pressures are encouraging replacement of traditional operations with endoscopic ones which are much less invasive. In both space and medical applications, communication links and time delays will play a key role in the overall system design. This paper will report on a new manipulator designed to explore the possibilities and challenges of small scale manipulation in these applications (Figure 1). The arm is a five axis direct drive design having a work volume of about 120 cm 3 . The arm is designed as a prototype for use in the NASA Micro Telerobotics Flight Experiment, "MicroTrex".
446 Time Delay. Some time delay is unavoidable in the communication links between master and slave manipulators in telemanipulation. If it is of sufficient magnitude, this delay effects both the control system dynamics and the human operator performance. The effects of time delays on telemanipulation performance have, for the most part, been well studied. For a recent review, see Hannaford (1994). One exception is the case of time-varying delays. Variable delays characterize all of the existing communication channels between earth and low earth orbit. These channels were designed solely for reliable accumulation of telemetry and imaging data and were not designed to latency requirements.
Flight Experiments. The ROTEX experiment was conducted by Dr. Gerd Hirzinger in May of 1993 (Hirzinger, et al., 1993). In this experiment, a human scaled, gear driven arm, flown in the Space-Lab D2 was successfully controlled from the Space-Lab flight deck, as well as from a ground control station. The arm described in the present paper is a preliminary prototype being developed for the NASA Micro Telerobotics Experiment, MicroTrex. This joint experiment between the University of Washington and the Jet Propulsion Laboratory will launch a small robot manipulator into low earth orbit and control it from the ground to perform an end-to-end test of groundbased remote control of space robots. The robot will perform tasks designed to simulate key activities of future robotic missions. The long term aim of this experiment is to provide design information for future micro and full size telerobots which will be required for human exploitation of space. In the shorter term, results are expected to benefit users and maintainers of the Space Station as well as terrestrial domestic biotechnology researchers and industries. A realtime video link to local schools is planned to make MicroTrex an educational and motivational experience for future engineers. Currently, MicroTrex has been approved for Phase-A funding under the NASA In Space Technology Experiments Program (INSTEP). Direct Drive Robots. DD robots have several advantages for manipulation including high precision and high speed (Asada & Kanade, 1983, Asada & Youcef-Toumi, 1987). Most of the previous research in direct drive robots has concerned manipulators aimed at industrial applications and having work volumes on the order of human arms. Recently we have been studying the properties of direct drive robots with much smaller dimensions. In earlier work (Marbot & Hannaford, 1991) we described a 3 axis direct drive manipulator with a work volume of about 30 cc, aimed at applications in manipulation of cells and very young embryos in the biomedical laboratory. This arm was also adapted for pick and place telemanipulation of small objects on the order of individual sand grains (Bhatti, et.al., 1992). For the current arm, we have redesigned and re-manufactured the first three links and have added 2 additional degrees of freedom. We were able to obtain high performance actuators for the robot from computer hard disk drives. Voice coil and "flat coil" actuators, used for positioning the heads of disk drives, must meet many of the same design criteria as for direct drive robots, and are mass produced at low cost. Recently we have made detailed measurements of their performance (Buttolo et. al., 1994). Laboratory Applications. Several potential biomedical applications drive this design. One
447 is loading sub-microliter fluid samples onto miniaturized electrophoresis gels for isoelectric focusing analysis. This job will require precision placement of a micro-pipette into liquid sample vials as small as 2 mm. in diameter. We intend to experiment with collecting samples from high density (864 well) micro-titer plates (molded sample vial arrays). These new plates occupy the same space as a standard 96 well plate (82 by 125 mm) with 9 times the well density. Accessing the wells will require 2 additional degrees of freedom beyond the original three since the pipette must be oriented along the major axis of the well. Arbitrary angles may be required to avoid collision with microscope objectives or other equipment. A sixth axis ("tool roll") is not needed in this application due to the axial symmetry of the pipette. The following sections will describe the Mini DD robot at its current state of development. First, the robot structure and mechanical design will be described. The design is analyzed in terms of the kinematic model, the dynamic model, and the Jacobian matrix. Then the control system will be described, and the current project status will be summarized.
MINI-ROBOT MECHANISM The first axis actuator is a linear voice coil from a 5.25 inch hard disk drive. The second joint is driven by a rotary magnetic actuator. The third axis actuator is a rotary voice coil. The body of the mini-robot contains 20 individual parts machined of aluminum and anodized.
Figure 1 Photograph of the Mini Direct Drive Robot
448 The original 3 axis design was modified in order to add two more degrees of freedom. Low inertia of the parts and high resolution and repeatability of the end effector displacements were design goals. The resulting mechanism allows generalized motion of the end-effector in five degrees of freedom: three displacements, and two rotations (pitch and yaw). Biomedical applications, such as electrophoresis procedures and laboratory sample manipulation, were used as a guideline in the current development. An additional goal was to produce a device suitable to act as a slave manipulator for experiments in scaled force reflection (Colgate, 1991, Hannaford, 1991, Kobayashi & Nakamura, 1992). The project attempts to extend the technology of mini-robotics with several particular emphases: 9 Employ direct drive actuation to maintain accuracy and repeatability, enhance force control and avoid backlash. 9 Provide good dynamic performance by minimizing inertia. 9 Use miniature disk-drive components, to get high precision, low inertia and low cost. 9 Make the two orientation axes intersect at the wrist. The electrophoresis sample handling procedure to be performed by the mini-robot consists of: 9 Picking up 500 l.tl and smaller protein samples from a well as small as two millimeters in diameter with a pipette attached to the wrist, 9 Placing the sample on the electrophoresis gel, 9 Washing and rinsing the pipette.
WORKSPACE DESIGN The mini-robot workspace is reduced to an unusual extent by the limited range of the direct drive rotary actuators. Optimization of the performance of the mini-robot involved obtaining as large a work space as possible without compromising speed, force and precision of end effector movements. The work space chosen was approximately 80 mm height, 68 mm width and 25 mm length. This will allow the robot to reach about 18 of the wells on a standard 96 well microtiter plate. This will be sufficient for the intended experimental applications. The workvolume is approximately 120 cm 3. The overall design of the first three axes is the same as reported earlier (Marbot & Hannaford, 1991). However, the parts were re-designed and re-machined to improve precision and to support the addition of two more motion axes. The same linear voice coil actuator was used for the first, linear motion axis. Two high accuracy, low friction linear ball bearings, also taken from disk drives, guide and support the first motion axis. The second joint is driven by a rotary magnetic actuator. Its angular stroke is _ 15 o and the arm is 152mm long, giving a Y-axis travel of +40 mm. The third axis actuator has angular stroke of + 13 o giving a Z-axis travel of +34 mm.
449 The fourth and fifth joints have +20 ~ of stroke. We do not consider them to affect work volume since they drive intersecting axes at the wrist. However, there is a m i n i m u m offset from wrist center to tool mount of 22 mm. Due to the low torque of actuator 3 and the relatively high weight and size of the actuator and encoder used on the fifth joint, direct drive in the strict sense of the term is not used. Instead, a belt and two pulleys are used to drive this joint with a 1:1 mechanical advantage.
ACTUATORS Table 1 shows the basic mechanical and electrical characteristics of the mini-robot actuators. The first axis actuator is a linear voice coil from a 5.25 inch hard disk drive. The second joint is driven by a rotary magnetic actuator (BEI Motion Systems Company). It has four coils in the rotor and four ceramic magnets. The peak torque is 0.08 N m when the coils are aligned between the magnets. The torque vs. angle curve is equivalent to cos 2 (202). Its total weight is 100 g. This actuator was not taken from a hard drive. In retrospect, this unit gave inferior performance at greater cost. A disk drive actuator would have had an almost constant torqueangle relation and lower mass. The third axis actuator is a rotary actuator is also made by BEI (model No. RA23-06-001), but for a disk drive application. The moving member which includes the coil weighs 6.3 g and the total weight is 46 g. The fourth and fifth joints use head actuators from 1.8" disk drives. A minor alteration was made to the actuators so that they could drive a 6.35 m m diameter shaft. Their weight is 20 g each.
Table 1 Actuator characteristics for each joint
Actuator 1
Stroke +12.5mm
Force or Torque Constant
Maximum Force or Torque
Coil Resistance Ohms
Current @ Max. Load
Type of Actuator
2.50 N / A
5.0 N
3.4
2.0 A
linear voice coil
2
+15 ~
39.0 x10-3 Nm/A
78 x10-3 Nm
5.2
2.0 A
rotary solenoid
3
+13 ~
40.0 x10-3 Nm/A
42.4 x10-3 Nm
6.8
1.05 A
rotary voice coil
4
+20 ~
13.6 x10-3 Nm/A
8.9 x10-3 Nm
5.9
0.65 A
fiat coil
5
+20 ~
13.6 x10-3 Nm/A
8.9 x10-3 Nm
5.9
0.65 A
fiat coil
450 MECHANICAL DESIGN The output of the first four actuators are directly coupled to their links, eliminating the mechanical gearing or other transmission elements. Other advantages accomplished by this direct drive configuration are no backlash, low friction, high stiffness and low weight. In the first four joints, each actuator case and actuator stator is mounted on the preceding link. Joints three, four, and five are located in the shoulder (Figure 2). The fifth joint is driven by an elastic belt and two pulleys and its actuator is mounted on the third link, non-adjacent link. This configuration reduces the mass on the wrist which would have generated much higher inertia and lower resonance frequency. The belt has enough elasticity to allow the rotation of the fourth joint. The mini-robot has the following mechanical features: 9The arm is used as a shaft to drive the fourth link (wrist). With this configuration, low inertia, high torsional stiffness and relatively high structural resonance is accomplished. This joint is directly driven although its actuator is distant from the wrist. 9The center of gravity of the mini robot moving parts is close to the shoulder. This creates a nearly balanced mechanism in which the dynamic forces in each link are almost decoupled from the dynamic forces of the remaining links. 9The rotation of the wrist joints (including a 0.5 g pipette) affects the accumulated inertias
Figure 2. Shoulder assembly.
451 on the shoulder revolute joints by +0.5 %. Therefore, in the calculation of the shoulder dynamic forces, the wrist could be considered in a fixed position for practical purposes. 9High stiffness without sacrificing low weight and low inertia was accomplished. The lowest structural resonance occurred in the arm at about 83 Hz. This structural resonance could be increased by 70% to 144 Hz by replacing the aluminum tubular arm with a steel tubular arm. A stranded Aramid filament is used as transmission belt for the fifth link,. The resulting natural frequency is about 237 Hz.
MINI-ROBOT KINEMATICS
Figure 3 depicts the mini-robot kinematic model with its workspace. We use the terms "shoulder," "arm" and "wrist" to describe the parts of the mini-robot. The shoulder is the area around axes 1, 2, and 3. The axes of the first two joints intersect, but the third one is displaced 12.5 mm along the positive direction of d I . The wrist is at the intersection of the axes of joints four and five. The arm is the link between the wrist and the shoulder. Using the DenacitHartenberg parameters from Table 2 and the standard procedure, we obtain:
Fig. 3
Mini-robot schematic diagrams. Kinematic linkage diagram (a), top view (b), and side view (c) of mechanism and workspace.
452 Table 2 Mini Robot kinematic parameters
i
(deg
~
(mm)
ai
(mm)
di
(deg)
Oi
Home Position
Motion Range
1
0
0
d1
0
d 1 = Omm
-12.5 < d I < 12.5mm
2
-90
0
0
02
02 = _90 ~
_105 ~ < 02 <_75 ~
3
-90
12.5
0
03
03 = _60 ~
_73 ~ < 03 < _47 ~
4
-90
0
152
04
04 = 180 ~
160 ~ < 04 < 200 ~
5
-90
0
0
05
05 = _120 ~
_140 ~ < 05 <_100 o
I-
a]~ll
a12 - c 0 2 c 0 3 s 0 4 + s02c04
0 la21 a22 s03s04 5T = a01 a32 sO2cO3s04 + c02c04 0
- 1 5 2 c 0 2 s 0 3 + 12.5c02 -152c03 152s02s03 -
0
12.5s0 2 +
(1) d
1
where c0i = cos 0i, s0i = sin 0i, and a l l = c02c03c04c05 + s02s04c05 + c02s03s05
(2)
a12 = - c 0 2 c 0 3 c 0 4 s 0 5 - s 0 2 s 0 4 s 0 5
(3)
+ c02s03c05
a21 = _ s 0 3 c 0 4 c 0 5 + c03s05
(4)
a22 = s 0 3 c 0 4 s 0 5 + c 0 3 c 0 5
(5)
a31 = - s 0 2 c 0 3 c 0 4 c 0 5 + c 0 2 s 0 4 c 0 5 - s02s03s05
(6)
a32 = s02c03c04s05 - c02s04s05 - s02s03c05
(7)
INVERSE KINEMATICS Space precludes the full inverse kinematic solution in this paper. In this section we develop the p r o b l e m in which the desired position of the wrist frame relative to the base frame {0 } is known, and we compute the joint position d 1 and angles 0 a and 03 required to achieve this configuration. We can equate the position to the last column of (2) giving
453
/[ -152c~176
Px
40RG --
py
Op
+ 12"5c~
]
= /
(8)
[ 152sin02sin03 1 12"5sin02 +dl ]
/
1 /
This yields the following expressions: 03 = atan2 ___/1
(py)2)~2,(~21
atan2 +_ 1-
Px 12.5_ i~-2sin 03
(9)
-]37
02 =
2
Px 12.5_~-2sin03
(lO) (11)
dl = Pz- 152sin02sin03 + 12"5sin02
If joint angles were unrestricted, there would be four solutions. For the joint angles permitted by the limited actuator ranges, there is a unique solution, and acos0 may be used in place of atan20 in (9) and (10) without the traditional difficulties (Craig, 1991).
JACOBIAN
The Jacobian matrix relates the joint rate to cartesian end effector velocity. It was computed using symbolic manipulation software (Kung, et. al., 1992)"
~
0 a12-152cos02cos03
0
0
0 0
0
0
0
0
152sin03
1 a32 152 sin 02 cos 03
=
0 0
-sin02
-cos 02 sin 03
a45
0 1
0
-cos 03
sin 03 sin 04
0 0
-cos02
sin02sin03
a65
(12)
_
(13) where a12 = 152sin02sin03- 12.5sin02
(14)
a32 = 1 5 2 c o s 0 2 s i n 0 3 - 1 2 . 5 c o s 0 2
(15)
454 a45 = sin 02 cos 04 - cos 02 cos 03 sin 04
(16)
a65 = COS02COS04 -I- sin 02 cos 03 sin 04
(17)
All the singularities of the Jacobian matrix lie outside of the workspace which is limited by the small motion range of the actuators.
MINI-ROBOT DYNAMICS Three versions of the dynamic equations of the mini-robot were derived and expressed in state space form (Table 3). The most complete of these versions included gravity torques and the torque caused by the varying tension of the axis 5 drive belt as axis 4 changes position. Pulley Belt Tension. The tension of the transmission belt of the fifth link generates internal forces which are mainly counteracted by the mini-robot structure. When the fourth joint rotates away from its home position, a component of the belt tension causes a torque that opposes this rotation. This torque increases in proportion to the magnitude of the angle relative to home position. Its magnitude can be up to 22% of the maximum torque of the fourth actuator.
The State Space Equation. The Newton-Euler equations were evaluated symbolically, using the commercial software package, Maple, to yield a dynamic equation which can be written in the form "r = M(O)O+B(O) [00] +C(O) IoEJ +D(O) +G(o)
(18)
where M ( O ) is the 5 x5 mass matrix of the mini-robot, B (O) is a 5 • 6 matrix of Coriolis coefficients, C (o) is a 5 • 4 matrix of centrifugal coefficients, D(o) is an 5 • 1 vector of torque generated by the tension of the pulley belt, and G (O) is an 5 • 1 vector of gravity terms. The matrices M ( O ) , B (O) and C (o), are extremely large. We derived one nonlinear simplified state space model and one linearized model for the mini robot dynamics. Table 3 shows each model's features.
Linearized State Space Equation. The joint strokes are relatively small for all the rotary axes (Table 2). Because of this limited change in arm configuration a linearized model is more practical than usual. We will linearize the state space model at the home position and analyze the dynamics by simulating and comparing the different models. If we substitute angles
0i
with their values at home position (Table 2) we get
455
Table 3 C o m p l e t e D y n a m i c model, and two simplified models of the m i n i - r o b o t dynamics.
3
Name
Comments
Complete model
Complete state space model (terms with coefficients smaller than 10.8 were eliminated).
Wrist decoupled model
State space model considering the wrist in a fixed position (wrist decoupled) for the calculation of joint 1 force "~1 and the joint torques x2 and "c3 . The values of the wrist joint angles were replaced with the angles at home position, 04 = 180~ and 05 = -120 ~ . Due to the decoupling of the wrist, the first three rows of these matrices do not have trigonometric functions of 04 and 05 . The number of terms is reduced to about half.
Linear model
Linearized state space model. Joint angles set to "home position", (Table 2).
0.86865 -0.00239 0.00051 0.0000313 -0.000036 -0.00239 0.00091 0.0000037 0.0000030 0.0000003 Mli n = 0.00051 0.0000037 0.000754 -0.00000950.0000034 0.000031 0.000003 -0.0000095 0.0000021 0.00 -0.000036 0.0000003 0.0000034 0.00 0.0000012
(19)
w h e r e M l i n is the 5x5 linearized mass matrix which multiplies the 5xl vector of accelerations [ 0 ] . The linearized matrices of coefficients B (19) and C(19) are not included because they m u l t i p l y n o n l i n e a r velocity vectors of joint products [001 and [02~. T h e f o l l o w i n g equation shows the linearized torque due to the tension of the transmission belt D ti n in Nm. 0 4 is expressed in radians and must be a value in the m o t i o n range s h o w n in Table 2. Oli n
-" I0 0 0 (0.00560504-0.01761) 0] T (Nm).
(20)
E q u a t i o n 20 shows the linearized gravity torques G li n . T h e angles are in radians and must be included in the m o t i o n range shown in Table 2.
G li n =
0 0 - 0.00492603 + 0.01034
(Nm).
(21)
0.000147504 + 0.00005661 9.0003522 (03 + 05) + 0.00110 The linearized state space model, with the linearized mass matrix M l i n and torque '~lin is: T, l i n
=
Mlin~) + Dlin
(19) + Gli n (19) (Nm).
Dynamic Simulation.
(22)
One reason to analyze the mini-robot d y n a m i c equations is to evaluate
their effectiveness for use in reducing disturbances in a control system.
456 We simulated each of the three models during fast trajectories in which all five joints were moved from one joint limit to the other (Table 4). An important goal of this simulation was to compare the simplified state space models with the complete model. The dynamics of the mini-robot have been analyzed within the maximum acceleration and velocity that the actuators can provide in its limited motion range. The following are observations about the relation of the dynamic forces among the links of the mini-robot: Due to the high angular acceleration capabilities of the wrist actuators, the wrist decoupled model estimates the dynamic forces at the shoulder with high precision only in the neighborhood of the wrist home position or when the angular accelerations and velocities of the wrist are low. Overall, the complete model, the wrist decoupled model and the linearized model are within about +10 % of each other for a full stroke trajectory. This is due to the efforts to design the manipulator for a minimum of dynamic coupling. For shorter trajectories, which inherently have lower velocities, the accuracy of wrist decoupled model and the linearized model improves. The linearized state space model can estimate the dynamic forces more accurately at low accelerations and velocities. The first joint can be linearized and considered decoupled from the dynamic forces of the upper links if it is driven with maximal acceleration movements. In this case the dynamic forces due to the upper links would represent less than 5% of the first actuator maximum output. The wrist decoupled model is intermediate in accuracy between the complete model and the linearized model. Overall, the robot is extremely fast. The slowest joint was joint 3, taking 0.189 seconds to traverse its 26 degree motion range. Experiments have recently confirmed these simulations.
Table 4 Minimum time required to complete a stroke of the mini-robot joints. 1
2
3
4
5
90% of max. actuator output (N or Nm)
4.5
0.070
0.038
0.0080
0.0080
accumulated masses or inertias (kg or 10-3 kgm2)
0.869
0.91
0.754
0.0021
0.0012
0.165
0.189
0.0272
0.0205
min. time to complete 0.139 one stroke (s)
457 SENSORS
Most robots today use digital incremental position encoders to estimate joint positions. This works well when the mechanism design allows several turns of the encoder disk for small joint motions. However, the direct drive feature of the mini-robot, coupled with the desire for small, very high-precision motions, implies the need for different position measurement approaches. LVDT Position sensor. For joint 1, a linear variable differential transformer (LVDT) is used. A specialized monolithic IC is used for the LVDT signal conditioning (Signetics NE5521). This chip provides an extremely stable amplitude/frequency source for the LVDT primary coil. In our implementation, a 12-bit A/D converter effectively limits the LVDT resolution to 1 part in 4096. The 1-inch travel of joint 1 therefore implies a position measurement precision of 0.0254 m/212 = 6.2 ....~tm. This was verified in the earlier mini-robot prototype by microscopic inspection of the axis displacement (Marbot & Hannaford, 1991). Analog Incremental Encoders. Joints 2-5 are instrumented with analog incremental rotary position encoders. Unlike digital encoders, these encoders output two sinusoids in quadrature phase. These analog outputs allow a high degree of interpolation between each encoder line. Determining actual angular position, 0 a f r o m the sinusoidal outputs is a two stage process (Figure 4). First a coarse position, 0 c is maintained by passing a thresholded version of the
Figure 4: Analog Optical Encoder Signal Processing sinusoids through a standard digital quadrature decoder/counter circuit. Like standard digital encoders, 0 c will have angular resolution equal to 4 times the number of lines on the encoder disk. Expressed in units of encoder-lines, 0c always yields positions that are multiples of 0.25.
458 The second stage of the process yields a finer resolution by using the relative amplitudes of the two sinusoids (called x and y) to compute a more accurate angular offset, 0 ~ from the closest encoder line. For an ideal encoder, x and y would be perfect sinusoids of identical magnitude and with exactly 90 ~ relative phase difference. In this ideal case, the line offset position 0 ~ could be computed from the relation 0o - atan 2 (y, x) 2re
(-0.5 < 0 ~ < 0.5)
(23)
where atan20 is the commonly used two-argument arc-tangent function. This offset position 0 o can then be combined with the digitally estimated position 0 c to yield an accurate estimate of the actual position
0a: 1
0 a
"-
int(Oc) +0o+
-1 0
f r a c t (Oc) - 0 ~ >
1
-1 f r a c t (Oc) - Oo < "~
(24)
otherwise
wherefract(x) is the fractional part of x. The digital line count, 0 c , is therefore corrected if the actual system, as measured by the analog outputs x, and y, is near the boundary of quadrants I and IV. This is accomplished by adding or subtracting 1 count to 0 c so that the error is propagated to the higher order bits. However, the sinusoids from most analog encoders are far from ideal. Phase differences of 110 ~ are not uncommon. When plotted against each other, these signals make an irregular potato-like shape instead of the expected perfect circle. The potato algorithm (Marbot, 1991) was developed to address these non-ideal behaviors of the encoder signals. Space limitations preclude more detailed treatment than appears below. For more details see (Marbot, 1991). The Potato Algorithm. x = Acos (0o)
The potato algorithm models the encoder outputs as (25)
y = Bsin ( 0 o - ~ ) where x and y are the quadrature outputs of the encoder, 0o is 0 o expressed in radians (-r~ < 0o < ~) is the actual joint position, A and B are the magnitudes of the quadrature signals, and 9 is the phase error between the two quadrature signals (nominally 0 ~ ). The potato algorithm uses this ellipsoidal approximation to map the "potato" shape back onto the ideal circle.
459 If, for the moment, we only consider quadrants I and IV, solving (26) for 0 o yields the relation, Ay 0o = atan Bxcos
+ tan~]
(26)
In actual encoders, A and B tend to vary in a nearly identical manner due to the geometry of the light sensors. Note also from (27) that since 9 is a measured parameter of the encoder, cos ( ~ ) and tan ( ~ ) are constants in this equation. Therefore the computation only involves one multiply, one divide, one add, and one arctangent function. Since the two-argument arctangent function should be again used here, (27) can be rewritten as 0 o = atan2 (Ay + B x s i n ~ , B x c o s ~ )
(27)
Finally, 0 o must be scaled into units of encoder lines using the relation,
0~
= m0 /1:
(28)
P a r a m e t e r Identification.
In the original implementation of the potato algorithm, the param-
eters (A, B, 9 ) were estimated off-line by moving the encoder at a constant velocity and sampling the encoder outputs at a high rate. The resulting data was then used to measure the amplitudes (A, B ) and to estimate the quadrature phase error ~ . Angular Resolution.
The sinusoidal voltages go through their complete range twice in one
encoder cycle. The 12-bit A/D converters used to sample x and y inherently limit the precision of the 0 o estimate to 1 part in 212 + 1 or 0.000122 of an encoder line. For our 1000-line encoders, this implies a theoretical angular precision of 2rt • 1000
1
= 767x10
-9
radians.
(29)
Since this mini-robot wrist is located 0.152 m from the axis of rotation, the wrist position resolution is thus 116x10 -9 m. This precision will undoubtedly be reduced by sensor noise and inaccuracies in the (A, B, 9 ) parameters of the encoder model. Actual resolution has yet to be measured. A more detailed analysis of these error sources is available (Venema, 1994).
POWER ELECTRONICS

Most full-scale robots today are driven by pulse-width modulation (PWM) power electronics. The popularity of PWM is due primarily to its ability to control large amounts of power (up to several kilowatts per joint) with very high efficiency. However, the high-speed current switching used in PWM systems generates a large amount of electromagnetic interference (EMI). Digital sensors (e.g., digital incremental encoders) can be made to operate robustly in this type of environment using good shielding and grounding techniques. However, to get less than 1 bit of noise on our analog-to-digital conversions, we must have a signal-to-noise ratio of better than 74 dB. The amount of EMI generated by PWM power electronics makes it difficult to shield the encoder signals sufficiently to achieve this desired SNR. Since the power levels needed to actuate the mini-robot are relatively small (~50 watts peak per actuator), we chose to use linear power amplifiers to drive the voice-coil actuators. High-power op-amps were configured as voltage-controlled current sources for each joint. D/A converters on our computer control system directly control the motor currents. The design of the power electronics was constrained by two factors: (1) the range of actuator resistances (3.4 to 6.5 Ω) and (2) the desired maximum current output (2 amperes per actuator). These factors, in turn, determine the desired power supply voltage Vcc and the amount of heat sink area needed to keep the amplifiers' temperatures within their operating limits. The worst-case power dissipation is

$$P_{\max} = \frac{V_{cc}^2}{2R_L}, \qquad (30)$$

where R_L is the resistance of the voice coil. The peak current output of the ideal op-amp is simply

$$I_{\max} = \frac{V_{cc}}{R_L}. \qquad (31)$$
We selected a power supply voltage of 15 volts since this was the minimum supply voltage which would meet the 2-ampere criterion for the various actuator load resistances.
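As a quick sanity check on these constraints (a back-of-the-envelope sketch; the dissipation formula follows eq. (30) as given above):

```python
# Sizing check for the linear amplifier stage, using values from the text.
Vcc = 15.0                       # selected supply voltage, volts
for RL in (3.4, 6.5):            # actuator resistance range, ohms
    I_max = Vcc / RL             # Eq. (31): about 4.4 A and 2.3 A
    P_max = Vcc**2 / (2 * RL)    # Eq. (30): about 33 W and 17 W
    print(RL, I_max, P_max)
```

Both load resistances satisfy the 2-ampere criterion at 15 V, and the worst-case dissipation bounds the required heat sink area.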
COMPUTER SYSTEM

A real-time digital computer system is used to control the mini-robot. The computer is an in-house design based upon Texas Instruments' TMS320C30 DSP chip. The processor consists of a 6U-VME form-factor circuit board containing the 32 MHz DSP chip, up to 256k 32-bit words of high-speed SRAM, a special bus for communications with other processor boards, a daughter-card bus for the addition of peripheral cards, and an RS232-compatible serial communications port for interfacing with an external host computer. Details of this design, including the processor's Neural Broadcast communications bus, can be found in MacDuff et al. (1992). The processor typically operates at a 16 MHz instruction rate but supports a peak floating-point throughput of 32 MFLOPS using a two-stage instruction pipeline. Analog and digital I/O is accomplished using an add-on daughter card. Each daughter card contains 4 separate 33 kHz 12-bit A/D and 12-bit D/A channels as well as 8 bits of digital input and 16 bits of digital output. Up to four daughter cards may be stacked on a single processor board. Software is written in C and cross-compiled on Unix workstations using a compiler provided by Texas Instruments. We developed a ROM monitor for the processor which supports downloading programs from the Unix workstations via the RS232 serial port. The end-to-end system configuration is shown in Figure 5.
Figure 5: Control system block diagram.
CONCLUSION

The design of the UW mini direct drive robot has been described. The mechanics and electronics are now complete, and closed loop control of each axis, employing the analog encoder signals, has been demonstrated. In work now underway, we will develop control methods for high performance and high precision point-to-point positioning. This will be required for the electrophoresis application. We plan to incorporate feed-forward dynamic model information (Khosla, 1988) in the controller. In the second phase of the mini-robot development, we will study the control of contact forces in scaled teleoperation using the mini-robot as a slave. The human control interface will be provided by a pen-based master device now approaching completion in our laboratory. Finally, the mini-robot forms the prototype for a small 6-axis manipulator for the MicroTrex space flight telerobotics experiment. The restricted motion range of the disk drive actuators provides a rather small workspace relative to the overall arm size. However, as with the original 3-axis prototype, this robot will be valuable for studying the control of small direct drive robots. For the eventual flight model, we expect to increase the work volume by designing new actuators of greater motion range using proven design principles from disk drive actuators.
REFERENCES
H. Asada, T. Kanade, "Design of Direct-Drive Mechanical Arms," ASME J. Vibration, Acoustics, Stress, and Reliability in Design, vol. 105, pp. 312-316, 1983.
H. Asada, K. Youcef-Toumi, "Direct Drive Robots, Theory and Practice," MIT Press, Cambridge, MA, 1987.
P. Bhatti, P.H. Marbot, B. Hannaford, "Microscopic Pick and Place with the Mini-Direct Drive Arm," SPIE Telemanipulation Symposium, Boston, November 1992.
P. Buttolo, D.Y. Hwang, B. Hannaford, "Experimental Characterization of Hard Disk Actuators for Mini Robotics," Proc. SPIE Telemanipulator and Telepresence Technologies Symposium, Boston, October 1994.
J.E. Colgate, "Power and Impedance Scaling in Bilateral Manipulation," Proc. IEEE Intl. Conf. on Robotics and Automation, pp. 2292-2297, Sacramento, CA, 1991.
J. Craig, "Introduction to Robotics: Mechanics and Control," Addison Wesley, 2nd Edition, 1991.
B. Hannaford, "Ground Experiments towards Space Teleoperation with Time Delay," in press in "Manipulation, Automation, and Robotics in Space," S.B. Skaar, Ed., AIAA Press, 1994.
G. Hirzinger, B. Brunner, J. Deitrich, J. Heindl, "Sensor-Based Space Robotics - ROTEX and its Telerobotic Features," IEEE Transactions on Robotics and Automation, vol. 9, no. 5, pp. 649-663, 1993.
P.K. Khosla, "Some Experimental Results on Model-Based Control Schemes," Proc. IEEE Intl. Conf. on Robotics and Automation, vol. 3, pp. 1380-1385, Philadelphia, PA, April 1988.
H. Kobayashi, H. Nakamura, "A Scaled Teleoperation," Proc. IEEE Intl. Workshop on Robot and Human Communication, pp. 269-274, Tokyo, 1-3 Sept. 1992.
D. Kung, J. Parsons, B. Hannaford, "Visualization of Manipulability with Mathematica," Proceedings: IASTED Control & Robotics, Vancouver, August 1992.
I. MacDuff, S. Venema, B. Hannaford, "The Anthroform Neural Controller: A System for Detailed Emulation of Neural Circuits," Proceedings, IEEE International Conference on Systems, Man, and Cybernetics, Chicago, IL, September 1992.
P.H. Marbot, "Mini Direct Drive Robot for Biomedical Applications," MSEE Thesis, University of Washington, Department of Electrical Engineering, August 1991.
S.C. Venema, "A Kalman Filter Calibration Method for Analog Quadrature Position Encoders," MSEE Thesis, University of Washington, Department of Electrical Engineering, June 1994.
ACKNOWLEDGEMENTS

We gratefully acknowledge the design assistance of Kevin Welton and Tom Jackson, and the donation of 1.8" head actuators by Dr. James Morehouse of Integral Peripherals Inc., Longmont, CO. This research was supported by a National Science Foundation Presidential Young Investigator Award, by the University of Washington Royalty Research Fund, and with equipment support from Texas Instruments Inc.
A Laboratory Test Bed for Space Robotics: The VES II
Steven Dubowsky, William Durfee, Thomas Corrigan, Andrew Kuklinski and Uwe Müller
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

The dynamic interaction between a space robotic manipulator and its base in micro-gravity can make the system difficult to control and lead to performance degradation. Control and planning algorithms proposed in the past to compensate for this dynamic interaction have lacked sufficient experimental evaluation. A laboratory test bed is described which was developed and built to emulate the dynamic behavior of space robotic systems and investigate proposed planning and control algorithms. Two methods for removing the effects of gravity in the laboratory are presented. Experimental results demonstrate the effectiveness of this test bed in studying the dynamic coupling between a space manipulator and its spacecraft.

1. INTRODUCTION

The performance of proposed space robotic systems can be degraded by the dynamic interaction between a space robot and its supporting structure or spacecraft [Papadopoulos and Dubowsky 1993]. Free-flying space robotic systems may use reaction jets to control the attitude and position of the spacecraft; other system concepts call for small dexterous manipulators mounted on large, flexible manipulator systems [Erickson et al. 1989, Crane et al. 1991]. In each of these systems, the robot base is not fixed in inertial space [Umetami and Holcomb 1990]. New control and planning algorithms are needed to accommodate these dynamic interactions [Papadopoulos and Dubowsky 1993, Xu and Kanade 1993]. Laboratory experiments will be required to more thoroughly explore the capabilities and limitations of these algorithms. Experimental evaluation of control and planning algorithms for space robotic systems is difficult for two reasons. First, the entire manipulator, including its base, must be permitted to move in six degrees of freedom. Second, the effects of gravity in a terrestrial laboratory mask much of the nature of the dynamic interaction between a manipulator and its spacecraft which would be seen in the weightless environment of space. Test beds have been designed and built to overcome these difficulties. The most common are planar systems which use air bearings [Alexander and Cannon
1990, Umatani and Yoshida 1989]. These are not capable of full three-dimensional motion, though they deal with gravity successfully. Spatial systems which employ neutral buoyancy tanks are effective for some studies, but fluid damping and inertia can corrupt the results when a system's dynamic behavior is important [Spofford and Akin 1990]. Spatial systems using suspension cables and complex mechanisms have also been proposed to eliminate the effects of gravity [Sato et al. 1991]. The complexity of these systems makes their reliable, accurate use difficult at best. Finally, other systems only simulate micro-gravity with approximations of a system's dynamics [Iwata et al. 1990, Shimoji et al. 1989]. A system called the Vehicle Emulation System Model II (VES II) developed at MIT permits the experimental evaluation of planning and control algorithms for mobile terrestrial and space robot systems by using an approach called "admittance control" [Dubowsky et al. 1988, Durfee et al. 1991, Dubowsky et al. 1994]. The VES approach has proven successful in evaluating control algorithms for mobile manipulator systems [Hootsmans and Dubowsky 1991]. The VES II system, an improved version of an earlier test bed, uses two methods to compensate for gravity and emulate micro-gravity: a learning approach and a model based approach. Experimental results show that the VES II can effectively emulate the micro-gravity behavior of space robotic systems.

2. VES II SYSTEM OVERVIEW

Figure 1 shows the VES II system carrying a PUMA 560 robot. The mechanical system consists of a Stewart platform with six position-controlled hydraulic actuators [Müller 1992]. Each actuator is attached to the platform base structure with a universal joint, and to the platform top with a combined universal and revolute joint. The system is driven by a hydraulic power supply. A six axis force/torque sensor and its amplifying electronics are attached to the platform top.

Figure 1. The VES II Experimental System
The platform can emulate the dynamic behavior of a wide range of systems using admittance control [Dubowsky et al. 1994]. The admittance control emulation concept is illustrated in Figure 2. The six axis force/torque sensor measures the interaction forces and torques, or wrench, Ws, acting between the manipulator and its base. The components of the wrench measurement due to gravity, Ŵg, must be estimated for micro-gravity emulation and subtracted from the measured values. The learning or model based methods are used to produce this estimate. The resulting wrench, Ŵa = Ws − Ŵg, is an estimate of the dynamic interaction wrench which would be experienced by the emulated space vehicle carrying the manipulator.
Figure 2. Micro-Gravity Admittance Control Schematic

The admittance controller uses Ŵa as input to a set of differential equations which describe the dynamics of the vehicle being emulated:

$$\dot{X}(t) = g\left(X(t), \hat{W}_d(t)\right) \qquad (1)$$
where X(t) is the state vector describing the inertial positions and velocities of the platform, and g() is called the admittance model, a function which characterizes the vehicle or structure emulated. Solution of Equation 1 yields the motions of the idealized vehicle, which the platform is controlled to track. The Stewart platform inverse kinematics are solved to determine the actuator lengths required to produce the desired motion of the manipulator base. A key to successful admittance control is the ability to achieve faithful platform position control within the dynamic frequency range of interest of the admittance model. The mathematical model of the vehicle, g(), can be any linear or non-linear set of differential equations which represents a physically realizable system with a dynamic response within the bandwidth of the platform control system [Kuklinski 1993].

3. MECHANICAL DESIGN

A Stewart platform mechanism was selected over a serial link mechanism because the VES II was required to carry a large payload (500 kg) and to have high stiffness to allow for accurate positioning in all directions. The performance goals of the VES II were chosen to enable it to study future space robotic systems, including both free-flying and elastically mounted systems. The kinematic workspace boundaries, payload capacities, and dynamic performance characteristics of the system were chosen based on a series of studies [Müller 1992]. The kinematic design parameters of a Stewart platform were analyzed to achieve these performance criteria. Additional criteria included actuator forces, power requirements, actuator stroke limitations, interference between actuators, joint angle limitations, and the location of kinematic singularities in the system workspace [Gosselin 1990, Fichter 1990]. Computer studies were performed to identify a feasible set of parameters for the design. The system was also analyzed to determine the demands placed on the mechanical components, which include the joints, actuators, and force/torque sensor. The large payload combined with cost and reliability constraints mandated the use of high performance, commercially available linear hydraulic actuators. The force/torque sensor is a customized design. The demands placed on the system's joints were severe: they required two rotational degrees of freedom able to carry large forces through large joint rotations. The range of motion of the joints is a critical factor in the kinematic performance of the platform. The platform dynamic performance also depends greatly on the joints. Virtually any backlash in the universal joints will introduce nonlinear effects causing vibrations and instability. The system's joints also must have good resistance to wear under partial rotation conditions, and the ability to withstand dynamic shock loading. Figure 3 shows the universal joints designed for the platform base. The design uses preloaded tapered roller bearings for smooth operation and good dynamic load distribution. Revolute joints, which connect the actuators to the top universal joints, were designed to reduce wear on the actuator seals and piston rings caused by relative rotation between the actuator rods and cylinders [Kuklinski 1993]. Hydraulic actuators were chosen for the VES II because of the need for high power and accuracy at a realizable size and cost. A critical design constraint for the actuation system was the cost and size of the hydraulic power supply, which increase dramatically as the flow and pressure requirements of the actuators increase. Keeping the size of the power supply within acceptable bounds required a series of design compromises. Relatively long actuator strokes are required to achieve a large workspace, while decreasing the cross sectional area of the actuators decreases the required flow rates. However, selecting too long and slender a cylinder would excite transverse vibrations of the cylinders. A large hydraulic pressure is required to support the load and maintain the servovalves' bandwidths, but higher pressure requires a larger, more costly hydraulic power supply.
Figure 3: Base Universal Joints

The final system configuration was chosen based on a series of design analyses, and the hydraulic components were selected accordingly. These components proved adequate in meeting the system performance specifications [Dubowsky et al. 1994].

4. CONTROL SYSTEM ARCHITECTURE

Figure 4 summarizes the VES II control system architecture [Durfee et al. 1991]. A SunStation 3/80 is the host computer which runs the user interface and provides software development tools. The platform and experiment control software is downloaded from the SunStation via an Ethernet to two single board computers called the RedSlave and the BlueSlave. As shown in Figure 4, the RedSlave reads force/torque and manifold pressure measurements on the VMEBus provided by the Analog I/O Board, which digitizes the analog signals using sixteen-bit A/D converters. The RedSlave uses the force/torque information and a vehicle admittance model, as described above, to calculate the desired actuator lengths, which are then sent across the VMEBus to a custom communications board, called the LegHost. The LegHost sends the actuator commands to six custom single board computers (Intel 8031 based), called LegSlaves.
Figure 4: VES II Control Architecture

Each LegSlave is dedicated to providing the low level, closed loop position control of a single hydraulic actuator. The length of each actuator is measured by a Temposonics transducer and is fed back to its respective LegSlave. The LegSlave position controllers operate at 1000 Hz. The control signals are converted to currents by the servo amplifier cards, shown in Figure 4, which drive the servovalves on the actuators. The actuator length is measured by sending an acoustic wave along a rod internal to the cylinder. The timing of the wave reflection gives a very accurate measure of the length of the actuator, eliminating the need for a sensor external to the cylinder. The wave timing is decoded by the Temposonics card, which outputs a pulse-width modulated signal to the LegSlave. This signal is digitized by the LegSlave and accessed by the RedSlave through the LegHost board. The LegHost is also used as an interface for the RedSlave to monitor the state of a set of panic buttons and a pressure-sensitive switching floor mat, which are part of the safety system. The BlueSlave, shown in Figure 4, contains a Motorola 68020 CPU; it controls the manipulator and calculates wrench estimates for the model based gravity compensation described below. The BlueSlave commands an 8 axis motion control board across the VMEBus. The motion control board reads the encoders and drives the motors of the manipulator mounted on top of the platform. The six axis force/torque sensor is a modified version of a commercially available unit. Extensive testing has characterized the sensor, including its accuracy and interchannel crosstalk, since accurate sensor calibration is required to achieve effective gravity compensation.
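The paper does not give the LegSlave control law; purely as an illustration, a 1000 Hz position loop of the kind such a board might run can be sketched as a simple PD update (the gains kp and kd are hypothetical):

```python
def legslave_update(target_len, measured_len, prev_error, kp, kd, dt=0.001):
    """One 1000 Hz cycle of a single-actuator position loop (illustrative
    PD law): returns the servovalve current command and the new error."""
    error = target_len - measured_len
    command = kp * error + kd * (error - prev_error) / dt
    return command, error
```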
5. SAFETY SYSTEMS

The VES II is a powerful machine, capable of exerting large forces and moving rapidly. It has the potential to damage itself and its payload and to inflict serious injury on any person within its operating area [Müller 1992]. While operating under admittance control, as opposed to trajectory control, the required motions of the platform are not generally known before an experiment begins. Hence, it is not possible to check for internal interference and obstacle avoidance in advance. This produces the need for a two-level safety interrupt system. If a first level error interrupt is generated during operation, the platform is brought to a stop using a computer controlled deceleration routine, or "soft freeze". Software safety checks test certain platform parameters inside the control loop and generate a first level interrupt if any of these are outside of pre-specified limits. These parameters are platform position, joint angles, actuator lengths, actuator velocities, actuator position errors, and the total hydraulic flow required of the system. If a second level safety interrupt is generated, the platform is immediately stopped in place. This "hard freeze" is accomplished by stopping the flow through the actuators with six Moog Series 760 Auxiliary Safety Manifolds. Also, solenoid valves in both the main supply and return lines isolate the platform from the hydraulic power supply. The computer can initiate a hard freeze if the platform position, actuator lengths or joint angles approach mechanical limits, or if a very large actuator position error is detected. Watchdog timers on the six LegSlave boards, which indicate if a software loop has unexpectedly stopped operating, a pressure-sensitive floor mat surrounding the platform, and a hydraulic manifold pressure transducer are all monitored by the safety software. The floor mat is intended either to prevent the platform from moving or to stop its motion if a person is in the workspace of the system. In addition, a watchdog timer on the LegHost board and an array of panic buttons are directly wired to the safety manifolds. Due to the rapid deceleration of the platform caused by the closing of the safety valves, a hard freeze may cause damage to the force sensor and payload. As with any complex research system, the safety of the system ultimately depends on the care exercised by its users.

6. EMULATING MICRO-GRAVITY ON THE VES II
Micro-gravity emulation is difficult to achieve if the dynamic wrench estimate is corrupted by even small errors in the gravity wrench estimate, as the gravity effects can be substantially larger than the dynamic effects of interest [Corrigan and Dubowsky 1994]. Thus the gravitational wrench must be determined and compensated for very accurately. The gravity wrench is a function of several variables:
$$W_g = W_g(q, \Phi, P) \qquad (2)$$
q is a vector of the manipulator system joint displacements, Φ is a vector of manipulator base (platform top) inertial orientation angles, and P is a vector of mass parameters of the system. Both q and Φ are measurable quantities, while P is a physical property of the manipulator. Each of these can introduce some error into the gravity wrench estimate. Typical manipulator joint encoders produce such small errors that they have a negligible effect on the gravity wrench estimation. Although errors in the measurement of the manipulator base orientation can cause significant errors in the gravity prediction, accurate sensors and the positioning accuracy of the VES II Stewart platform greatly reduce these errors. The major source of error in Ŵg is uncertainty in P. Of the two methods used to predict the gravitational wrench accurately, the model approach directly finds the manipulator mass parameters, P, and the exact relationship of equation (2), while the learning method iteratively finds Ŵg without explicitly solving for the mass parameters.

6.1 The Learning Method

Here the gravitational wrench of a manipulator is "learned" iteratively during its motion on the VES II. This produces accurate space emulations with minimal analysis and real time computation. However, the learning method can be used only for experiments where the commands to the system are known in advance and are repeatable; it is inapplicable for telerobotic experiments involving human supervisors, or for other experiments with spontaneous events. The learning method directly finds the reaction motions of the manipulator base structure (the platform top) caused by the dynamic forces for a given manipulator motion, without requiring knowledge of the manipulator's mass parameters. At each iteration a value for the motion of the manipulator base, a motion approximating the micro-gravity dynamic base motion, is used to form an estimate of the gravity wrench Ŵg. The manipulator's joints and its base are moved very slowly and the sensor readings are recorded as Ŵg. Since the gravity wrench is a function only of system positions, at low velocities with negligible dynamic effects the measured wrench will be equal to the gravity wrench. The accuracy of this estimate depends on how closely the base motions used in the iteration approximate the true micro-gravity dynamic motions. The estimate found, Ŵg, is then subtracted from the actual sensor readings during a full speed motion to yield an estimate of the dynamic base wrench, Ŵa. That value is used by the admittance controller to produce a better estimate of the micro-gravity base motion. Hence the iterative procedure consists of obtaining an improved estimate for the platform motion and then using the platform motion to produce an improved estimate of Ŵg [Corrigan and Dubowsky 1994]. If the method converges, it can be shown that the final dynamic wrench and base motion used correspond to micro-gravity conditions. Successive base
motion estimates generally converge quickly to an accuracy allowing the VES II to emulate micro-gravity [Corrigan 1994].
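A sketch of the learning iteration is given below, assuming hypothetical platform interfaces (the actual implementation details are in Corrigan [1994]):

```python
def learn_gravity_wrench(platform, joint_traj, n_iter=5):
    """Iterative gravity-wrench learning (sketch of Section 6.1)."""
    base_traj = platform.zero_base_motion()      # initial base-motion guess
    for _ in range(n_iter):
        # Replay joints and base motion very slowly: dynamic effects are
        # negligible, so the recorded sensor wrench is the estimate Wg.
        Wg = platform.replay_slowly(joint_traj, base_traj)
        # Run at full speed, subtract the estimate, and integrate the
        # admittance model to get an improved micro-gravity base motion.
        Ws = platform.run_full_speed(joint_traj, base_traj)
        base_traj = platform.integrate_admittance(Ws - Wg)
    return Wg, base_traj
```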
6.2 The Model Based Method

This method requires experimentally finding the manipulator mass parameters very accurately, as the gravity wrench is analytically formulated in terms of those parameters, the joint angles, and the orientation of the base. Once the analytical relationship and the mass parameters are formulated, Ŵg can be calculated. This allows the VES II to compensate for gravity and emulate micro-gravity without a priori knowledge of the system inputs, and permits studies such as telerobotic experimentation. The development of a mass parameter model requires finding the gravitational wrench at the base of a general manipulator with n links by applying elementary laws of statics, yielding:
$$W_g = \begin{Bmatrix} F_g \\ M_g \end{Bmatrix}, \qquad F_g = \sum_{i=1}^{n} m_i \{{}^b g\} = (M_{total})\{{}^b g\}, \qquad (3.\mathrm{a})$$

$$M_g = \sum_{i=1}^{n} \{{}^b r_i\} \times \{{}^b g\}\, m_i \qquad (3.\mathrm{b})$$
where mᵢ is the mass of link i, {ᵇg} is the gravitational force vector transformed to the coordinate frame of the manipulator base, and {ᵇrᵢ} is the position vector of the i-th link's center of mass in the manipulator base coordinate frame. Equations (3.a) and (3.b) produce a model of the manipulator's mass parameters. Equation (3.a) requires a very accurate measurement of the total mass of the manipulator, M_total. This is done using the platform and its sensor. Unfortunately, small errors in the sensor or the platform orientation can lead to excessive errors in M_total. This problem is solved by making a number of force measurements at various platform orientations; M_total is then found accurately with a least squares calculation. The moment at the manipulator base caused by gravity can be rewritten by substituting a homogeneous transformation matrix for each link into equation (3.b). This matrix equation is expanded to produce an equation for the gravity moment with 4n+1 terms in each of the three axes. A simplified linear matrix form is obtained after some mathematical manipulation [Corrigan and Dubowsky 1994]:

$$M_g = [G(q, \Phi)]\{P\} \qquad (4)$$
where [G(q, Φ)] is a configuration dependent matrix, and P is a mass parameter vector of matching dimension. Although each element of P contains several constants, only their sum is required, which produces a simplified model of the manipulator [West et al. 1989]. The best results are obtained if Equation 4 is solved for P by collecting a large set of data and using a least squares solution to minimize the errors over the entire range of system motions.
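In practice this amounts to stacking eq. (4) over many measured configurations and solving the overdetermined system; a minimal sketch (the function and argument names are assumptions, not from the paper):

```python
import numpy as np

def fit_mass_parameters(G_list, Mg_list):
    """Least-squares solution of Equation 4: stack the configuration
    matrices G(q, Phi) and the measured gravity moments, solve for P."""
    G = np.vstack(G_list)          # (3N x p) stacked configuration matrices
    m = np.concatenate(Mg_list)    # (3N,)   stacked moment measurements
    P, *_ = np.linalg.lstsq(G, m, rcond=None)
    return P
```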
Together, estimates of P and M_total form the model required to predict the forces and moments due to gravity at the base of a manipulator. Micro-gravity emulation can be achieved on the VES II by using these values to estimate gravity and subtract it from the sensor measurements.

7. EXPERIMENTAL RESULTS
The VES II has been used successfully to emulate a number of systems. The system described in Figure 5 was emulated with a PUMA 560 mounted on the VES II system.
Figure 5. Space Robotic System Description

The supporting structure model has motions {x_b, y_b, z_b, Φ_x, Φ_y, Φ_z} and is represented by the mass, stiffness and damping matrices, in SI units:

Mass = 500 kg, I_xx = 200 kg m², I_yy = I_zz = 250 kg m² (5a)

$$K = \begin{bmatrix} 50000 & 0 & 0 & 0 & 0 & 0 \\ 0 & 459 & 0 & 0 & 0 & 2295 \\ 0 & 0 & 459 & 0 & -2295 & 0 \\ 0 & 0 & 0 & 2823 & 0 & 0 \\ 0 & 0 & -2295 & 0 & 15298 & 0 \\ 0 & 2295 & 0 & 0 & 0 & 15298 \end{bmatrix} \qquad (5b)$$

$$C = \mathrm{diag}(100,\ 10,\ 10,\ 24,\ 55,\ 55) \qquad (5c)$$
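To illustrate how such a model serves as the admittance model of Equation 1, the sketch below integrates it with a simple Euler step; the rigid 6-DOF mass-spring-damper form and the matrix values are assumptions based on the data listed above, not the paper's actual code:

```python
import numpy as np

M = np.diag([500.0, 500.0, 500.0, 200.0, 250.0, 250.0])   # mass/inertia (5a)
K = np.array([[50000, 0, 0, 0, 0, 0],
              [0, 459, 0, 0, 0, 2295],
              [0, 0, 459, 0, -2295, 0],
              [0, 0, 0, 2823, 0, 0],
              [0, 0, -2295, 0, 15298, 0],
              [0, 2295, 0, 0, 0, 15298]], dtype=float)     # stiffness (5b)
C = np.diag([100.0, 10.0, 10.0, 24.0, 55.0, 55.0])         # damping   (5c)

def admittance_step(q, qd, Wd, dt=1e-3):
    """One Euler step of Equation 1 for the emulated base:
    M qdd + C qd + K q = Wd, driven by the dynamic wrench Wd."""
    qdd = np.linalg.solve(M, Wd - C @ qd - K @ q)
    return q + dt * qd, qd + dt * qdd
```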
The effectiveness of the VES II with gravity compensation can be seen in the simple case where the PUMA is commanded to move its shoulder joint q₁ through 180 degrees of motion in 2 seconds, nearly its maximum velocity, with its arm extended perpendicular to the q₁ axis. Figure 6 shows the gravity compensated dynamic moment about the x_b axis for both the learning method and the model based emulations.
Figure 6. Micro-Gravity Moment

Figure 7 shows the close correlation in the base motion produced by the two methods. The learning method converged quickly, to within one millimeter, or 0.001 radians, in each axis. Both of these methods have been used successfully to verify the effectiveness of path-planning and control algorithms for space robotics in a number of studies.
Figure 7: Micro-Gravity Rotation
8. CONCLUSIONS

The use of robotic manipulators in space is complicated by the effects of dynamic coupling between a space manipulator and its supporting spacecraft or base structure. Control algorithms developed for these systems are difficult to evaluate experimentally in a terrestrial laboratory. Here a test bed designed to perform such experiments, the VES II, has been described. The results presented show the ability of the VES II to emulate the micro-gravity behavior of space robotic systems.

9. ACKNOWLEDGMENTS

The support of this work by NASA (Langley Research Center, Automation Branch) under Grant NAG-1-801 is acknowledged. Also acknowledged are the contributions to this project of J. Penington and R. Williams of NASA's Langley Research Center and the following MIT faculty members and students: J. Baker, P. Foote, Y. Gong, D. Goodrum, H. Idris, H. West, and P. Zackin.

REFERENCES
Alexander, H. and Cannon, R., "An Extended Operational-Space Control Algorithm for Satellite Manipulators," Journal of Astronautical Sciences, Vol. 38, No. 4, 1990, pp. 473-486.
Corrigan, T. and Dubowsky, S., "Emulating Micro-Gravity in Laboratory Studies of Space Robotics," 1994 ASME Mechanisms Conference, St. Paul, MN, Sept. 1994.
Corrigan, Thomas R.J., The Development and Application of Methods for Emulating Micro-Gravity, S.M. Thesis, Department of Mechanical Engineering, MIT, Cambridge, MA, May 1994.
Crane, C.D., Duffy, J. and Carnahan, T., "A Kinematic Analysis of the Space Station Remote Manipulator System (SSRMS)," Journal of Robotic Systems, vol. 8, no. 5, 1991, pp. 637-658.
Dubowsky, S., et al., "The Design and Implementation of a Laboratory Test Bed for Space Robotics: The VES Mod II," submitted to 1994 ASME Mechanisms Conference, St. Paul, MN, September 1994.
Dubowsky, S., Paul, I. and West, H., "An Analytical and Experimental Program to Develop Control Algorithms for Mobile Manipulators," Proceedings of the VIIth Symposium on Theory and Practice of Robots and Manipulators, Udine, Italy, Sept. 1988.
Durfee, W., Idris, H. and Dubowsky, S., "Real Time Control of the MIT Vehicle Emulator System," Proceedings of the 1991 American Control Conference, Boston, MA, June 1991.
Erickson, J.D., Reuter, G.J. and Healey, K.J., "Technology for an Intelligent, Free-Flying Robot for Crew and Equipment Retrieval in Space," Proceedings Space Operations & Robotics Conference & Workshop, JSC, Houston, TX, July 1989.
Fichter, E.F., "A Stewart Platform-Based Manipulator: General Theory and Practical Construction," Journal of Mechanical Design, Sept. 1990.
Gosselin, C., "Stiffness Mapping for Parallel Manipulators," IEEE Transactions on Robotics and Automation, Vol. 6, No. 3, June 1990.
Hootsmans, N.A.M. and Dubowsky, S., "Large Motion Control of Mobile Manipulators Including Vehicle Suspension Characteristics," Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, CA, April 1991.
Iwata, T., et al., "Ground & Flying Testbed for the 2nd Generation Space Robot," Proceedings of the i-SAIRAS '90, Kobe, Japan, 1990.
Kuklinski, A., The Practical Implementation of an Experimental Tool for Dynamic Interactions in Mobile Manipulator Systems, S.M. Thesis, Department of Mechanical Engineering, MIT, Cambridge, MA, June 1993.
Müller, U., Design of a Second Generation Experimental System for Studying Mobile Manipulator Control, S.M. Thesis, Department of Mechanical Engineering, MIT, Cambridge, MA, June 1992.
Papadopoulos, E. and Dubowsky, S., "Dynamic Singularities in Free-Floating Space Manipulators," Journal of Dynamic Systems, Measurement, and Control, vol. 115, no. 1, March 1993, pp. 44-52.
Papadopoulos, E. and Dubowsky, S., "The Kinematics, Dynamics and Control of Free-Flying and Free-Floating Space Robotic Systems," Special Issue on Space Robotics, IEEE Transactions on Robotics and Automation, Vol. 9, No. 5, October 1993, pp. 1-13. Invited.
Sato, Y., et al., "Micro-G Emulation System Using Constant-Tension Suspension for a Space Manipulator," Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, CA, April 1991.
Shimoji, H., et al., "Simulation System for a Space Robot Using 6 Axes Servos," Proc. of the XIth IFAC Symposium on Automatic Control in Aerospace, 1989.
Spofford, J.R. and Akin, D.L., "Redundancy Control of a Free-Flying Telerobot," AIAA Journal of Guidance, Control, and Dynamics, vol. 13, no. 3, May 1990, pp. 515-523.
Umatani, Y. and Yoshida, K., "Experimental Study of a Two Dimensional Free-Flying Robot Satellite Model," Proceedings of the NASA Conference on Space Telerobotics, Pasadena, CA, January 1989.
Umetami, Y. and Holcomb, L., Co-Chairmen, Proceedings of the i-SAIRAS '90, Kobe, Japan, November 1990.
West, H., et al., "A Method for Estimating the Mass Properties of a Manipulator by Measuring the Reaction Moments at its Base," Proceedings of the 1989 IEEE International Conference on Robotics and Automation, Scottsdale, AZ, 1989.
Xu, Y. and Kanade, T., Space Robotics: Dynamics and Control, Kluwer Academic Publishers, Norwell, MA, 1993.
Hierarchical Prediction Model for Intelligent Communication in Multiple Robotic Systems

Kosuke Sekiyama and Toshio Fukuda
Department of Mechano-Informatics and Systems, School of Engineering, Nagoya University

Abstract: This paper describes a predictive decision making strategy for multiple robotic systems. Although communication plays an important role in a robot system, it tends to be regarded as mere information exchange by data transmission. However, communication is one of the vital intelligent activities in multiple robot systems. In this paper, we propose a framework of "Intelligent Communication" which treats communication as an essential part of decision making, considering interaction with the other agents. It includes knowledge abstraction from discrete events, intention reasoning, and prediction. This paper focuses in particular on the prediction issues and proposes the concept of a Hierarchical Prediction Model for their realization. We discuss the advantage of prediction, and its difficulty in application to robots, through simulation results.
1. Introduction

In recent years, many research works regarding multiple robot systems have been carried out [7], [8]. The field involves several issues, including cooperation and coordination among agents, collective intelligence, and the self-organization that appears in artificial life. The Cellular Robotic System (CEBOT) is one such autonomous distributed robotic system; it is composed of a number of functionally limited and different robotic units called "cells". The CEBOT reconfigures its structure in terms of hardware and software according to the task or environment [1], [2]. Although some attempts have been made to generate intelligent behavior from combinations of simple rules, realizing such systems is still a challenging problem, and it will require years of work to achieve an efficient system. On the other hand,
if we expect truly intelligent behavior from the robots, each individual is required to possess some degree of intelligence. Since communication enables agents (cells) to share all sorts of information in the system, it can be said that every cell potentially possesses the whole knowledge of the system; this feature makes it possible to link cells for coherent functioning and makes the system robust. Therefore, communication is among the vital intelligent activities in multiple robot systems. However, we must note that some problems arise; for example, as the number of cells increases, the communication load becomes extremely high. An efficient protocol and network are required, but we consider that this alone will not give a fundamental solution to the problem. Because the environment where the agents work is full of uncertainty, agents must make decisions in real time even if they cannot obtain sufficient information to cope with the situation. Therefore, we should also consider communication at the decision making level. The paper is organized as follows. In section 2, we discuss the motivation of the study in detail, and the concept of "Intelligent Communication" is proposed; previous work is also mentioned. In section 3, we propose the Hierarchical Prediction Model as a predictive decision making strategy and describe its background. In section 4, some applications of prediction are described: one is collision avoidance and the other concerns the chase. Simulation results illustrate the effectiveness and necessity of prediction. Finally, we conclude with the results and discuss the remaining problems and our future work.
Figure 1 Cellular Robotic System
2. Intelligent Communication

For a multiple robot system to function efficiently, each agent should have sufficient intelligence to cope with the many problems that occur in group behavior, although this depends on the complexity of the task. High level communication
is required in such a situation. Therefore, when we consider communication, it should be treated in the sense of intelligent activities such as reasoning and prediction. One of the main roles of communication is to reduce uncertainty in decision making. Therefore, it is supposed that many agents rely on communication to acquire the information necessary for their decisions. Our environment is full of uncertainty; that is, the amount of information a robot needs to obtain can be large. Consider what happens if an agent requires a huge amount of information that is not available at one time: the agents keep trying communication until they are satisfied, and a "communication explosion" becomes a serious problem. The CEBOT assumes that a large number of robots are engaged in the task, so communication delay causes further delay and the system will not work efficiently. Improving efficiency in terms of data transmission alone will not give a fundamental solution to this problem. We should explore ways to compensate for the lack of information when uncertainty remains and the communication load is high. Being conscious of cooperative behavior, we call this reasoning for the compensation of knowledge "intention reasoning", and it is a key issue for intelligent communication. Standing on the viewpoint that communication should be treated in the context of total intelligent activities, which are closely related to interaction with other agents, we attempt to reduce the communication load in the whole system through intention reasoning based on knowledge, and through a risk estimate applied to the decision as a self-regression. The conceptual architecture for intelligent communication is shown in Figure 2. It consists of three main parts.

1) Inner model: Data from sensor inputs first enter the inner model. This model stores necessary information and organizes rules or strategies. It is also responsible for communication using the knowledge and rules. A particularly important role is to convert correlations of numerical data into symbolic logical relations and abstract rules. The results are reflected in the prediction. The realization of these functions is a challenging problem for our future work.

2) Predictive Decision Making: The actual environment is dynamic. Therefore, intention reasoning should be made considering change over time. Moreover, purely reactive motion cannot follow rapid change. For a flexible system, predictive decision making is essential; in particular, unification with knowledge based reasoning makes a powerful prediction model. Predictive decision making is the main topic of this paper, so further details are described in the next section.

3) Self-Regressive Decision Making: Just as feedback is important in reactive control, self-regression applied to the decision is essential. In a case where the decision made by the cell affects the environment greatly, it must be made carefully; the cell then needs communication to acquire sufficient information and reduce uncertainty. On the other hand, if the result of the decision is trivial to
the environment, the uncertainty can remain high and communication is not really required; the cell should instead weigh the communication load in the system highly. We proposed an inner criterion for executing communication by estimating the risk of the decision [3].
Figure 2 Framework of Intelligent Communication
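As one illustration of such a self-regressive criterion (not the authors' actual formulation, which appears in [3]), the choice of whether to communicate can be sketched as a threshold test; all names and weights below are hypothetical:

```python
def should_communicate(decision_risk, network_load,
                       risk_threshold=0.5, load_weight=0.3):
    """Request information only when the estimated risk of deciding
    without it outweighs the cost of loading the shared network."""
    return decision_risk > risk_threshold + load_weight * network_load
```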
3. Predictive Decision Making

3.1 Merit of prediction and difficulty
Most risk avoidance or navigation systems rely on reactive strategies. However, we humans avoid many risks by prediction. Among conventional decision making strategies, few research works are found that actively adopt prediction. The merits of predictive decision making are the following. 1) It becomes possible to escape from risks which cannot be avoided by reactive motion; even when escape is impossible, the robot can gain time to cope with the problem. 2) Procedures for solving conflict problems require a high communication load; in terms of communication reduction, the role of prediction is valuable. However, despite these merits, the following fact hinders the application of prediction: the behavior of a robot is non-stationary and its model is unknown. For example, if the robot is moving as shown in Figure 3, the prediction based on time series analysis will be the dashed line; however, the robot will actually turn right using its collision avoidance algorithm. As discussed, in considering prediction of robot behavior, we should notice that state oriented rules, which cannot be observed directly, exist and influence the
behavior. Therefore, predictive decision making must be based on both time oriented prediction and state oriented transition rules.
Figure 3 Will the robot collide with the wall?
3.2 Hierarchical Prediction Model

The Hierarchical Prediction Model (HPM) is a challenge to the limitations of conventional time series models for robot behavior prediction. It stands on the assumption that a robot's behavior is determined based on its strategy, its planning results, behavior rules defined in the system, or adaptation to the environment, such as collision avoidance. Our attempt is to unify time series prediction with state oriented transition rules and to enhance the adaptability of robot decision making. The HPM is a multiple layered prediction model: each layer is responsible for a different size of state prediction. The reason multiple layered prediction is introduced is that the result of the prediction will differ depending on the range and depth of consideration. For example, trends cannot be detected by short time series analysis; if data sets are treated on a long scale, we may find transition rules which apply only to large scale state transitions. In terms of robot behavior, the trend on the long scale is considered to reflect the strategies or planning results of the robot, based on which its respective behaviors are determined. On the other hand, local fluctuation of the data on the short range scale is considered to be due to adaptation to the environment, such as collision avoidance with obstacles or other agents. Therefore, each layer predicts from a different viewpoint. In considering robot behavior prediction, it is essential to note what the driving force of the behavior is. We call it intention reasoning to infer what the plan and strategy of the robot are from its behavior; however, this is still a difficult problem. One of the simplest cases is adaptation to the environment. As we described in the previous section with Figure 3, the robot's behavior will be affected by the local surrounding conditions of the
environment. Currently, we integrate the two results of time series prediction and knowledge based reasoning by taking the common predicted region, as shown in Figure 4-(a). The merit of this method is the ease of incorporating state transition rules. For example, if a state has a transition rule, such as a traffic rule, it can generate a state transition vector that can be unified with the time series prediction. Also, if it is known that a certain region is obviously occupied by obstacles, that region can easily be excluded from the possible prediction area. The detailed mathematical formulation and a more advanced treatment of this subject are described in [9]. Figure 4-(b) shows a simple example of this operation based on the entropy maximum principle. This method is applied in simulation 4.2 to exclude the overlapped area between the prediction result and the regions occupied by the obstacles. In the next section, we discuss an application of predictive decision making with a simulation.
Figure 4-(a) Unification of the prediction results
Figure 4-(b) Statistical unification with Entropy Maximum Principle
4. Simulation

4.1 Collision Avoidance with Prediction

Since the assumption of a stationary process will not hold for robot behavior, this method is based on a random walk diffusion process from the
observed location. First, the background is described. Let us consider the stochastic process shown in Figure 5. An object moves by +a or −a at each time step based on the transition probabilities p, q, and r. This is a diffusion process whose equation with respect to time t and position x is expressed as the diffusion equation (1). A fundamental solution of eq. (1) is the well-known normal distribution form (2):
$$\frac{\partial p}{\partial t} = -u\frac{\partial p}{\partial x} + D\frac{\partial^2 p}{\partial x^2} \qquad (1)$$

$$p(x,t) = \frac{1}{\sqrt{4\pi D t}}\, e^{-\frac{(x-ut)^2}{4Dt}} \qquad (2)$$

where

$$u = a(p-q), \qquad D = \frac{a^2(1-r)}{2}, \qquad p+q+r = 1. \qquad (3)$$
Here, the delta function is defined as follows,

$$\delta(x) = \begin{cases} \infty & (x = 0) \\ 0 & (x \neq 0) \end{cases} \qquad (4)$$

where the following are satisfied:

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1, \qquad \delta(x) = \lim_{t \to 0} p(x,t), \qquad \sigma^2 = 2Dt.$$

The region in which the object exists will shift as time goes on, expanding its deviation. This diffusion process is shown in Figure 6.
Figure 5 Stochastic Process
Figure 6 Diffusion Process
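A small Monte Carlo run (illustrative, not from the paper) confirms that this random walk drifts with mean ut and spreads with variance close to 2Dt, as eqs. (2)-(3) state:

```python
import numpy as np

a, p, q, r = 1.0, 0.4, 0.3, 0.3     # step size and transition probabilities
steps, walkers = 1000, 20000
rng = np.random.default_rng(0)
x = rng.choice([a, -a, 0.0], size=(walkers, steps), p=[p, q, r]).sum(axis=1)

u, D = a * (p - q), a**2 * (1 - r) / 2
print(x.mean(), u * steps)          # both about 100
print(x.var(), 2 * D * steps)       # about 690 vs. 700; the diffusion
                                    # approximation drops the small u^2 term
```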
Now, let us consider the situation shown in Figure 7, where cell A and cell B are moving along respective paths surrounded by blocks. In this situation, even if a cell is equipped with sensors to detect objects, the sensors are obstructed by the blocks until one of the cells enters the junction. Therefore, without introducing some traffic rules, a collision may take place. In order to avoid collisions or solve dead-lock problems, many research works have been proposed, such as [4]-[6]. However, most of these methods are based on reactive adaptation or a priori rules, and few attempts are found which use prediction. If a collision is predicted, the cell can avoid not only the risk of collision, but also the complicated procedure needed to cope with the situation. Assume here that information on the position of each cell is available through communication, but that each cell does not know the path of the other cell. Since the behavior model of the other cell is unknown, it is almost impossible to specify its particular position. Therefore, we will try to predict the region in which the cell will exist. Expanding eq. (2) to a two-dimensional distribution, the existence regions of cell A and cell B are described by eqs. (5) and (6):
$$f_A(x,y) = \frac{1}{2\pi\sigma_{Ax}\sigma_{Ay}} \exp\left\{-\frac{1}{2}\left[\left(\frac{x-x_A}{\sigma_{Ax}}\right)^2 + \left(\frac{y-y_A}{\sigma_{Ay}}\right)^2\right]\right\} \qquad (5)$$

$$f_B(x,y) = \frac{1}{2\pi\sigma_{Bx}\sigma_{By}} \exp\left\{-\frac{1}{2}\left[\left(\frac{x-x_B}{\sigma_{Bx}}\right)^2 + \left(\frac{y-y_B}{\sigma_{By}}\right)^2\right]\right\} \qquad (6)$$
Figure 7 Collision avoidance in a situation where sensors are useless
Since these events are mutually independent, the joint probability density function simply becomes

$$F(f_A, f_B, t) = \frac{1}{(2\pi)^2 \sigma_{Ax}\sigma_{Ay}\sigma_{Bx}\sigma_{By}} \exp\left\{-\frac{1}{2}\left[\left(\frac{x-x_A}{\sigma_{Ax}}\right)^2 + \left(\frac{y-y_A}{\sigma_{Ay}}\right)^2 + \left(\frac{x-x_B}{\sigma_{Bx}}\right)^2 + \left(\frac{y-y_B}{\sigma_{By}}\right)^2\right]\right\}. \qquad (7)$$
After calculating eq. (7) and introducing the new deviations and means

$$\frac{1}{\sigma_x^2} = \frac{1}{\sigma_{Ax}^2} + \frac{1}{\sigma_{Bx}^2}, \qquad \mu_x = \sigma_x^2\left(\frac{x_A}{\sigma_{Ax}^2} + \frac{x_B}{\sigma_{Bx}^2}\right)$$

(likewise for y), we get the simplified form

$$F \propto \exp\left\{-\frac{1}{2}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right\} \cdot \exp\left\{-\frac{1}{2}\left[\frac{(x_A-x_B)^2}{\sigma_{Ax}^2+\sigma_{Bx}^2} + \frac{(y_A-y_B)^2}{\sigma_{Ay}^2+\sigma_{By}^2}\right]\right\}. \qquad (8)$$
For a possible existence region S of the cell, the joint existence probability in the region {A | A ⊂ S}, as shown in Figure 8, is expressed as

$$P(t) = P\{(x,y) \in A \mid A \subset S\} = \iint_{A \subset S} F(f_A, f_B, t)\, dA, \qquad (9)$$

where

$$P\{A \subset S\} = \iint \frac{1}{2\pi \sigma_x \sigma_y} \exp\left\{-\frac{1}{2}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right\} dA.$$
Carrying the constant factor of eq. (8) through the integration of eq. (9), we obtain

$$P(t) = \frac{P\{A \subset S\}}{2\pi\sqrt{(\sigma_{Ax}^2+\sigma_{Bx}^2)(\sigma_{Ay}^2+\sigma_{By}^2)}} \exp\left\{-\frac{1}{2}\left[\frac{(x_A-x_B)^2}{\sigma_{Ax}^2+\sigma_{Bx}^2} + \frac{(y_A-y_B)^2}{\sigma_{Ay}^2+\sigma_{By}^2}\right]\right\}. \qquad (10)$$
Eq. (10) denotes the virtual joint existence probability in the region A at time t. Figure 9 shows the fluctuation and shift of the distributions of cell A and cell B for the situation in Figure 7. Since a cell cannot enter the block area, the deviation in the directions toward the blocks is restricted. Figure 10 illustrates the fluctuation of eq. (10) for this simulation. This result shows that if cell A passes through the junction around t = 5, it may collide with cell B. Therefore, cell A can avoid the collision by changing its path or its moving velocity. Besides risk avoidance, this approach can also be applied to the chase or to navigation. In the next section, we consider another approach.
Figure 8 Joint existence Probability
Figure 9 Shifting distribution of the cell
Figure 10 Time varying joint existence probability in region A
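Eq. (10) is cheap to evaluate at each predicted time step; a direct transcription is sketched below, with P{A ⊂ S} supplied by the caller since it depends on the chosen region S (the function and argument names are illustrative):

```python
import numpy as np

def joint_existence_probability(xa, ya, xb, yb, sig_a, sig_b, p_region):
    """Virtual joint existence probability of cells A and B, Eq. (10).
    sig_a, sig_b: (sigma_x, sigma_y) of each cell's predicted distribution;
    p_region: the probability mass P{A in S} of the shared region."""
    vx = sig_a[0]**2 + sig_b[0]**2
    vy = sig_a[1]**2 + sig_b[1]**2
    gap = (xa - xb)**2 / vx + (ya - yb)**2 / vy
    return p_region / (2 * np.pi * np.sqrt(vx * vy)) * np.exp(-0.5 * gap)
```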
4.2 Detour with Prediction for Chase

Let us consider the situation in which cell A is trying to catch up with cell B. Cell A and cell B move at the same constant speed. We executed a simulation to show the effectiveness of predictive decision making. Figure 11 shows the setup of the simulation. Each cell can move in any direction, but cannot enter the
black circle area. The solid line and the dashed line denote the trajectories of cell A and cell B respectively, and the gray dot is the location of the cell. The moving direction is given by a unit vector defined as follows:
$$\vec{u}_{goal} = \overrightarrow{AB} + \frac{2}{1+(r+r_0)^2}\,\widehat{OA} \qquad (11)$$
where r is the distance between the cell and the center of the black area. In the case of cell B, $\overrightarrow{AB}$ is replaced by the vector to the virtual goal of cell B.
Figure 11 Simulation setup
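A direct reading of eq. (11) can be sketched as follows, assuming the repulsion term acts along the unit vector from the obstacle center O toward the cell (this reading, like the names below, is an assumption):

```python
import numpy as np

def steering_direction(pos_a, pos_b, obstacle_center, r0):
    """Unit moving direction per Eq. (11): attraction toward the goal
    (cell B) plus a repulsion term that weakens away from the obstacle."""
    ab = np.asarray(pos_b, float) - np.asarray(pos_a, float)
    oa = np.asarray(pos_a, float) - np.asarray(obstacle_center, float)
    r = np.linalg.norm(oa)
    u = ab + 2.0 / (1.0 + (r + r0)**2) * (oa / r)
    return u / np.linalg.norm(u)
```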
In the case of Figure 12-(a), cell A takes the shortest path to reach cell B. However, cell A was not able to catch up with cell B (Figure 12-(b)). On the other hand, Figure 12-(c) shows the case in which cell A predicted cell B's existence region several steps ahead (calculated based on the distance between the two cells). In this case, the first term of eq. (11), $\overrightarrow{AB}$, is replaced by the vector to the center of the predicted region. The predicted region is expressed basically by eq. (6), but the more general form (12) is adopted:
$$f_B(X) = \frac{1}{2\pi |C_B|^{1/2}} \exp\left(-\frac{1}{2}(X - X_B)^{T} C_B^{-1}(X - X_B)\right) \qquad (12)$$

where $C_B$ is the covariance matrix and $X = (x, y)^T$ denotes the position vector.
Based on the time series of the latest 5 position samples, the velocity and direction are estimated by the least squares method. If the predicted region invades an obstacle region, that area is excluded by the method described in the previous section and a new region is formed. In Figure 12-(c), cell A takes the shortest path, as in the previous case. In Figures 12-(d) and 12-(e), the predicted region suggests that cell A should detour to catch up with cell B. Figure 12-(f) shows that the decision was successful. This result was never obtained with a reactive chase.
Figure 12 Simulation results
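The velocity estimate driving the prediction can be reproduced with an ordinary least-squares line fit over the most recent positions (a sketch; the paper does not list its code):

```python
import numpy as np

def estimate_velocity(positions, dt=1.0):
    """Least-squares velocity from the latest position samples
    (an N x 2 array, N = 5 in the paper), used to extrapolate the
    center of cell B's predicted region."""
    t = np.arange(len(positions)) * dt
    vx = np.polyfit(t, positions[:, 0], 1)[0]   # slope of x(t)
    vy = np.polyfit(t, positions[:, 1], 1)[0]   # slope of y(t)
    return np.array([vx, vy])
```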
5. Conclusions

This paper described the framework for Intelligent Communication, which regards communication as part of the whole intelligent activity in the decision making process. Predictive decision making is an important and effective strategy for enhancing adaptability. For the unique problem of robot behavior prediction, we proposed the concept of the Hierarchical Prediction Model and attempted to synthesize time series prediction and knowledge based reasoning. Also, simulations were made for two cases, collision avoidance and
chase. The results indicate applicability to actual decision making. However, many problems still remain to be solved, and this is only the beginning stage of the research. It is especially important to consider how to adopt the prediction results into the planning scheme; this is closely related to the reasoning structure, and it is our ongoing work.
References
[1] T. Ueyama and T. Fukuda, "Cooperative Search Using Genetic Algorithm Based on Local Information - Path Planning for Structure Configuration of Cellular Robot," Proc. of 1993 IEEE International Conf. on Intelligent Robots and Systems (IROS '93), pp. 1110-1117, 1993.
[2] Y. Kawauchi, M. Inaba, and T. Fukuda, "Self-Organizing Intelligence for Cellular Robotic System (CEBOT) with Genetic Knowledge Production Algorithm," Proc. of 1992 IEEE International Conf. on Robotics and Automation, pp. 811-818, 1992.
[3] T. Fukuda and K. Sekiyama, "Communication Reduction with Risk Estimate for Multiple Robot System," Proc. of IEEE International Conf. on Robotics and Automation, pp. 2864-2869, 1994.
[4] J. Wang, "Deadlock Detection and Resolution in Distributed Robotic Systems," Japan/USA Symposium on Flexible Automation, pp. 673-677, 1992.
[5] K. J. Kyriakopoulos and G. N. Saridis, "Collision Avoidance of Mobile Robots in Non-stationary Environments," Proc. of IEEE International Conf. on Robotics and Automation, pp. 904-909, 1991.
[6] S. Kato, S. Nishiyama and J. Takeno, "Coordinating Mobile Robots by Applying Traffic Rules," Proc. of IEEE/RSJ International Conf. on Intelligent Robots and Systems, pp. 1435-1441, 1992.
[7] J. Ota, T. Arai, and D. Kurabayashi, "Dynamic Grouping in Multiple Mobile Robots System," Proc. of IEEE International Conference on Intelligent Robots and Systems, pp. 1963-1970, 1993.
[8] T. Ishida, "Towards Organizational Problem Solving," Proc. IEEE International Conf. on Robotics and Automation, Vol. 1, pp. 839-845, 1993.
[9] K. Sekiyama and T. Fukuda, "Adaptability Enhancement of Multiple Robots in Dynamical Environment," Proc. of IEEE International Conf. on Robotics and Automation, 1995, to appear.
Proposal for Cooperative Robot "Gunryu" Composed of Autonomous Segments

Tokyo Institute of Technology: Shigeo Hirose, Takaya Shirasu, and Edwardo F. Fukushima

This paper proposes the concept, and explains the characteristics, of a new group robot, "Gunryu," each segment of which has autonomy but at the same time moves and operates cooperatively with the other segments. Next, the mechanisms of a primary model, GR-I, which comprises two prototype segments, are introduced, and the results of operational experiments are shown.
1. Introduction
The authors have been conducting research on snake-like robots, and have proposed the articulated-body mobile robot "Koryu" shown in Figure 1. The articulated-body mobile robot "Koryu" has active joints for up/down motion and swing motion. The coordinated control of these joints and the rotational motion of the active wheels endows the "Koryu" with the following characteristics:
1) It is able to drive over extremely rugged, irregular terrain, or along routes that are narrow and winding relative to the size of the robot body.
2) It is capable of actions allowing the robot to cross over crevices by holding its long, narrow torso rigid.
3) Conversely, it is capable of moving over soft land surfaces such as swamps and sand by loosening the force between the segments of its long, narrow torso.
4) At the same time as the long, narrow torso is functioning for propulsion, it is also capable of manipulative actions such as coiling around and grabbing an object.
Fig. 1: Articulated body mobile robot Koryu II [1]
When the articulated-body mobile robot is realized as an autonomously distributed system in which multiple unitized segment mechanisms are configured and every mechanism has a central pivot, the following characteristics become manifest [1]:
5) There is good transportability to the work site because the segments can be separated from the torso.
6) Mechanical systems that are extremely redundant are also highly reliable; for example, improperly functioning segments may be removed.
7) The torso may be separated into several groups, and each of those groups can function as an autonomously distributed group robot.
Among these characteristics of snake-like robots, this research focuses on characteristic 7) in particular, and we propose a new mobile and manipulative robot system called "Gunryu" (abbreviated GR hereafter), which has mechanisms that differ from conventional systems. As shown in Figure 2, GR is a group robot concept characterized by being composed of autonomous mobile bodies equipped with manipulators, which act not only as operational devices, as usual manipulators do, but also as connecting devices to grab hold of other robot segments.
Fig. 2: Artistic view of a future application of Gunryu

The idea of a robot that carries out its objective functions not as a single unit but as a group is an old one. Several demonstrations have been made in Japan, including the "three-eyed group robots" developed by M. Mori, Professor Emeritus of Tokyo Institute of Technology, which were composed of a group of robots that operate autonomously while mutually communicating with light sensors. The series of research by the authors [1] also motivated the realization of such group robots. Until now, however, the authors could not design a proper robot mechanism with the function of item 7): a connecting mechanism that must be installed on each unit and that can automatically and securely connect under a variety of conceivable environmental conditions could not be made compact and lightweight. The authors were led to believe that a connecting mechanism which could attach and detach automatically but was incapable of performing any other function was the limitation of such group robots, even while being aware of innovations such as CEBOT [3].
However, the authors found that this kind of limitation of the group robot would be diminished if each unit robot were equipped with a manipulator as an indispensable device, and that the manipulator could also be used as a connecting mechanism. In this case, the robots can be regarded as having a connecting mechanism with multiple active degrees of freedom, without the installation of a specialized connecting mechanism. Hence, a group robot system based on this concept is considered highly useful.
2. The Basic Characteristics and Fields of Application of GR
"Gunryu (GR)" is characterized by the cooperative movement of connected autonomous mobile bodies, and this has the following effects: 1) Consider a situation of moving over rough terrain of a sinusoidal shape as in Figure 3. Single robot will consume potential energy HW in every cycle of up and down motion provided that the amplitude of the terrain is H and weight of the robot is W. However, with a connected body, the up and down motion of the center of gravity as a whole is reduced. In the extreme case, when the length of body accords with cycle length of the terrain, it does not go up and down at all. That is, a mechanically connected robot group mutually transfers potential energy, and as a result has high energy efficiency.
Fig. 3: Undulation smoothing effect of GR
Fig. 4: Group effect of GR over a cliff
Fig. 5: Group effect of GR for handling task
2) As in Figure 4, a single unit by itself would fall into the crevice, but if it is part of a connected body, falling can be prevented. The crevice may also be crossed by holding up the front robot.
3) In addition, as the connecting device is a versatile manipulator, GR can accomplish miscellaneous tasks such as transporting broken-down segments and heavy loads, as in Figure 5.
It is assumed that the primary field of application for GR will be planetary exploration. Such a robot will be deployed after landing on the planet and will carry out work such as gathering rocks and producing maps. In this situation, effective use can be made of the characteristic of GR that the segments can move autonomously. On the other hand, if the environment to be traversed has much rough terrain with many large contours and crevices, the high terrain adaptability of the connected GR can be used to move around in this kind of environment. Additionally, the fact that GR is a redundant system with high reliability can be considered an optimum characteristic for a planetary exploration robot.
3. The Stair Climbing Characteristics of GR
Among the several expected advantages of GR, the typical one, the ability to easily climb a slope, will be discussed. Let us consider the slope climbing motion of GR as shown in Figure 6, and assume that $F$ is the force that acts between the segments, $g$ is the gravitational acceleration, $\mu$ is the friction coefficient between segment and ground, $m$ is the mass of one segment, $\theta$ is the actual gradient of the slope, and $\phi$ is the angle of inclination at which the rear segment looks up at the front segment.
Fig. 6: Model of GR-I for kinematic analysis

Then the conditions which each segment should fulfill when climbing the slope are expressed as:

$\mu\,\{mg\cos\theta + F\sin(\theta-\phi)\} + F\cos(\theta-\phi) = mg\sin\theta$  (1)
$\mu\,(mg + F\sin\phi) > F\cos\phi$  (2)
At this time, the segment in front is climbing a slope with an actual gradient $\theta$ while receiving the upward pushing force $F$ from the rear segment; the front segment can thus be interpreted as climbing a more gentle slope. The virtual angle of inclination that the front segment experiences is expressed as

$\theta_1 = \tan^{-1}\mu$  (3)
Meanwhile, the angle of inclination experienced by the rear segment, derived from Figure 7, is

$\theta_2 = \tan^{-1}\dfrac{F\cos\phi}{mg + F\sin\phi}$  (4)
At the stage when the front segment has made it up and the rear segment comes to the slope, the rear segment can climb the slope by receiving the pulling force $F$ from the front segment. By combining these two motions, the virtual slope felt by the two segments can be made gentle, as in Figure 8.
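A minimal numerical example of eqs. (3) and (4) may help; the mass, friction coefficient, force and angle values below are illustrative assumptions of ours, not measured GR-I data:

```python
import numpy as np

# Hypothetical example of eqs. (3) and (4): virtual slope angles felt by the
# front and rear segments.  All numbers are arbitrary illustration values.
m, g = 6.5, 9.81      # mass of one segment [kg], gravity [m/s^2]
mu = 0.6              # friction coefficient between crawler and ground
F = 30.0              # inter-segment force [N]
phi = np.radians(20)  # angle at which the rear segment looks up at the front

theta1 = np.arctan(mu)                                           # eq. (3)
theta2 = np.arctan(F * np.cos(phi) / (m * g + F * np.sin(phi)))  # eq. (4)
print(np.degrees(theta1), np.degrees(theta2))
```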
Fig. 7: Force balance of GR on slope
Fig. 8: A step terrain and virtual slope of GR

4. Composition of Gunryu 1 (GR-I)
In order to clarify the characteristics of the proposed GR, "Gunryu 1 (GR-I)" was made as a prototype to model the primary mechanisms (Figure 9).
Fig. 9: Overview of the constructed model GR-I
The specifications of GR-I are indicated in Table 1. Because the manufacture of the manipulator is not yet complete, GR-I was configured by connecting two mobile units with a passive arm, and was made for the purpose of studying the principal characteristics of connected motion.

Table 1: Specification of GR-I
TOTAL LENGTH       520 mm
TOTAL WIDTH        400 mm
HEIGHT             175 mm
TOTAL HEIGHT       490 mm
WEIGHT             13 kg
SPEED              Max 2.2 km/h
ACTUATOR           DC servo motor 33 W x 2
PROPULSIVE MOTION  Crawler
ELECTRIC SOURCE    AC 100 V
As indicated in Figure 9, the GR-I mobile body is a crawler-type vehicle. The crawler drive motor is built into the interior of the wheel as shown in Figure 11, and as shown in Figure 10 there are no unnecessary protuberances outside of the crawler. This design increases the ability of GR-I to travel across irregular terrain. The right and left crawlers of GR-I are not fixed to the vehicle body and can incline to adapt to the contours of the ground surface. By means of a differential mechanism, they are designed to incline so that the relative angles of the right and left crawlers to the vehicle body are always equal and opposite. Note that if this adaptive motion were instead realized by compliance installed independently around the rotational joints supporting the right and left crawlers, the posture of the vehicle body could not be fixed rigidly, and it would not be a suitable base for the mounted manipulator transmitting large forces to the next segment. A usual differential mechanism is composed of bevel gears as shown in Figure 12: when gear R rotates by an angle $+\theta$ relative to the body, gear L rotates by $-\theta$ relative to the body, so the body is always held at the middle angle between the two crawlers. In order to lighten GR-I, this mechanism was realized by the wire mechanism shown in Figure 13.
Fig. 10: Designed to eliminate unnecessary protuberances around the crawler track
The arm of GR-I, as shown in Figure 14, is made so that the angle of each joint can be measured by a potentiometer during manipulations. The middle joint is connected with a Neidhart rubber spring and has spring characteristics. For this reason, the inner force working between the front and rear vehicles can be measured by measuring the angle of this joint.

Fig. 11: Driving mechanism of GR-I
Fig. 12: Differential mechanism with bevel gear
Fig. 13: Differential mechanism of GR-I with pulley and wire
Fig. 14: Arm mechanism
5. Driving Experiments with GR-I
Several running experiments were performed using the GR-I described in the previous section; one of them is reported here. Figure 15 shows that a single GR-I unit could not climb a three-step staircase: it tumbled over at the final stage. Figure 16 shows that the connected GR-I could easily climb the same stairs by using the cooperative effect specific to GR. Both figures are sequences of photographs covering 20 seconds of action.
Fig. 15: Experiment with single unit
Fig. 16: Experiment with two units
6. Conclusions
In this paper, we have introduced a new group robot concept coined "Gunryu" (GR), which uses manipulators as connecting devices and which has the specific characteristics of a multiple robot system, such as high terrain adaptability and versatility. The mechanism of a primary model, GR-I, which is comprised of two prototype crawler robots and a connecting dummy arm, was then introduced and experimental results were reported. The study of GR has just started, and many interesting research subjects remain to be pursued: its mechanism, sensors, signal communication, and cooperative control can all be developed further in the near future. Grateful acknowledgement is made to Dr. Ogimoto, Aerospace Group, Space System Department of Kawasaki Heavy Industries, Ltd., for his cooperation in realizing this research project.

References
[1] Hirose, S., Biologically Inspired Robots (Snake-like Locomotor and Manipulator), Oxford University Press (1993).
[2] Hirose, S. and Morishima, A., Design and Control of a Mobile Robot With an Articulated Body, International Journal of Robotics Research, Vol. 9, No. 2, April, pp. 99-114 (1990).
[3] Fukuda, T., Ueyama, T., Kawauchi, Y., and Arai, F., Concept of Cellular Robot System (CEBOT) and Basic Strategies for its Realization, Computers Elect. Engng, Vol. 18, No. 1, pp. 11-39 (1992).
Unified Control for Dynamic Cooperative Manipulation

Kazuhiro Kosuge a, Hidehiro Yoshida b, Toshio Fukuda b, Masaru Sakai c, Kiyoshi Kanitani c, Kazuo Hariki c

a Department of Mechanical Engineering, Tohoku University, Aza Aoba, Aramaki, Aoba-ku, Sendai 980, Japan
b Department of Mechano-Informatics and Systems, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-01, Japan
c Robot Development Department, Nachi-Fujikoshi Corporation, Ishigane 20, Toyama 930, Japan
ABSTRACT:
This paper proposes a unified coordinated motion control algorithm for manipulators which is applicable both to the manipulation of an object and to parts-mating tasks. First, we consider the manipulation of an object by impedance-controlled manipulators; the impedance of each arm is designed so that the apparent impedance of the manipulated object is specified. Then we consider the parts-mating problem using the same control algorithm proposed for the manipulation of a single object. We propose a method to design the impedance of each arm so that the relative motion of the two manipulators has the dynamics of the RCC (Remote Center Compliance). Experimental results using two industrial robots illustrate the validity of the proposed control system.

1. INTRODUCTION

Many control algorithms have been proposed for the coordinated motion control of manipulators to realize dexterous manipulation like humans [3]-[16]. Fig. 1 shows typical examples of tasks executed by multiple manipulators. Fig. 1 (a) shows the manipulation of an object; the use of multiple manipulators makes it possible to handle a heavy object which a single manipulator cannot manipulate by itself. Fig. 1 (b) shows an example of a task using a tool; grasping a tool with several manipulators can increase the rigidity of the system because of its closed kinematic structure. Fig. 1 (c) shows an assembly task of two parts, a bolt and a nut; the use of two or more arms in coordination makes dexterous assembly tasks possible, for which the relative motion dynamics between the two parts has to be controlled. Much of the research concerned with multiple manipulators has placed special emphasis on the manipulation of a single object, as in the examples of Fig. 1 (a) and (b). This is partially because the control of the closed kinematic structure attracted much attention from researchers involved in the control of manipulators with an open kinematic structure. In this paper, we propose a control algorithm which is applicable to both the manipulation of an object and parts-mating. First, we consider the manipulation of an object by impedance controlled manipulators; the impedance of each arm is designed so that the apparent
impedance of the manipulated object is specified. Based on the same control algorithm, we consider the parts-mating problem. We propose a method to design the impedance of each arm so that the relative motion of two manipulators has the dynamics of the RCC (Remote Center Compliance). The proposed control algorithm is applied to two industrial robots, each of which has six degrees of freedom. The experimental results illustrate the algorithm.
Fig. 1 Various Manipulations

2. CONTROL PROBLEM OF MULTIPLE MANIPULATORS
When we manipulate one object with multiple manipulators as shown in Fig. 2, we have to consider the following problems:
(1) How to control the pose of the object
(2) How to regulate the internal force applied to the object
(3) How to control the compliant motion of the object in response to the external force applied to it
(4) How to distribute the load among the manipulators
(5) How to share the external force among the manipulators
When two manipulators assemble two parts together, we also have to control the relative dynamic motion between the assembled parts; the dynamics of the RCC is one of the desirable dynamics for parts-mating tasks such as the peg-in-hole problem. In this paper, we consider problems (1) through (5) as well as the relative dynamic motion between the two manipulators.
Fig.2 Manipulation of an Object

3. DESIGN OF IMPEDANCE FOR OBJECT MANIPULATION
We have proposed a coordinated motion control algorithm for robot arms manipulating an object [3]. In that algorithm, the object is supported by impedance-controlled manipulators around the desired trajectories of their end effectors, as shown in Fig. 3. In this section, we reconsider the algorithm and propose a coordinated motion control algorithm for manipulators manipulating an object so that we can control the apparent mechanical impedance of the manipulated object.

Let us consider the control of the apparent mechanical impedance of an object manipulated by multiple manipulators; that is, the problem of realizing a mechanical impedance of the object around its desired compliance center as shown in Fig. 4. We assume that each manipulator supports the object compliantly with its mechanical impedance around a nominal force required for the manipulation of the object, keeping a desired internal force applied to the object. Let the mechanical impedance of the object be expressed by the following equation:

$M\,\Delta\ddot{x} + D\,\Delta\dot{x} + K\,\Delta x = F_{ext}$  (1)

where $\Delta x$ is the deviation of the manipulated object from the desired trajectory of the compliance center of the object, $F_{ext}$ is the external force applied to the object around its desired compliance center, and $M$, $D$, $K$ are $6 \times 6$ positive definite matrices. We assume that each arm grasps the object firmly and that no relative motion exists between the object and each arm. To realize the impedance of the manipulated object, we consider the case in which each arm is impedance controlled around the desired compliance center of the object as shown in Fig. 5.
Fig.3 Compliant Support of an Object

Let the mechanical impedance of the i-th arm around the desired compliance center of the object be expressed by

$M_i\,\Delta\ddot{x}_i + D_i\,\Delta\dot{x}_i + K_i\,\Delta x_i = F_{ei} \qquad (i = 1, 2, \ldots, n)$  (2)
where $F_{ei}$ is the external force applied to the i-th manipulator around the desired compliance center of the object and $\Delta x_i$ is the deviation of the compliance center of the i-th manipulator. From the assumption that there is no relative motion between the end effector of each manipulator and the object, we have

$\Delta x = \Delta x_i \qquad (i = 1, 2, \ldots, n)$  (3)
Let us specify the external force $F_{ei}$ shared by each arm as follows:

$F_{ei} = p_i\,F_{ext} \qquad (i = 1, 2, \ldots, n)$  (4)
where $p_i > 0$. Concerning the external force applied to the object, the following relation holds:

$\sum_{i=1}^{n} F_{ei} = F_{ext}$  (5)

and we have

$\sum_{i=1}^{n} p_i = 1$  (6)
From these equations, we obtain the mechanical impedance of each arm, which realizes the desired mechanical impedance of the manipulated object expressed by eq. (1), as follows:

$M_i = p_i M \qquad (i = 1, 2, \ldots, n)$  (7)
$D_i = p_i D \qquad (i = 1, 2, \ldots, n)$  (8)
$K_i = p_i K \qquad (i = 1, 2, \ldots, n)$  (9)

Note that the external force applied to each arm is calculated using the force applied to each manipulator and the nominal force required for the manipulation of the object, similar to the result shown in [3].
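The following sketch (ours, not the authors' implementation) illustrates eqs. (4)-(9) for a two-arm system; the impedance matrices and the share factors are arbitrary example values:

```python
import numpy as np

# Minimal sketch of the impedance distribution of eqs. (4)-(9): the desired
# 6x6 object impedance M, D, K is split among n arms by load-sharing factors
# p_i with sum(p_i) = 1.  All numbers are example values of ours.
M = np.diag([10.0] * 3 + [1.0] * 3)   # desired object inertia
D = np.diag([200.0] * 6)              # desired object damping
K = np.diag([1000.0] * 6)             # desired object stiffness
p = np.array([1/3, 2/3])              # share factors (as in the experiments)

assert np.isclose(p.sum(), 1.0)       # eq. (6)
arm_impedances = [(pi * M, pi * D, pi * K) for pi in p]   # eqs. (7)-(9)

F_ext = np.array([5.0, 0, 0, 0, 0, 0])
F_arm = [pi * F_ext for pi in p]      # eq. (4): shared external force
```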
4. DESIGN OF IMPEDANCE FOR RELATIVE MOTION DYNAMICS

As shown in Fig. 6, the RCC is a well-known device for assembly tasks of two parts. In general, one part is secured by a fixture while the other part is fixed to the RCC device attached to a manipulator, so that the RCC generates the fine motion required for the assembly task. Instead of using an RCC device and a fixture, we consider the problem of realizing the dynamic motion of the RCC between two parts manipulated by two manipulators. Let the dynamics of the RCC be expressed by the following equation:

$M_{RCC}\,\Delta\ddot{x} + D_{RCC}\,\Delta\dot{x} + K_{RCC}\,\Delta x = F$  (10)

where $M_{RCC}$, $D_{RCC}$, $K_{RCC}$ are $6 \times 6$ positive definite matrices. Suppose that each manipulator has the same structure of impedance around the compliance center of the desired impedance of the manipulated object. Let us represent the impedance of each arm by the following equations.
Fig.4 Desired Object Compliance

$M_1\,\Delta\ddot{x}_1 + D_1\,\Delta\dot{x}_1 + K_1\,\Delta x_1 = F_1$  (11)
$M_2\,\Delta\ddot{x}_2 + D_2\,\Delta\dot{x}_2 + K_2\,\Delta x_2 = F_2$  (12)
When the two parts are put together with these impedances, the following relation holds:

$F_1 = F, \qquad F_2 = -F$  (13)

and the relative compliant motion $\Delta x$ is expressed by
Fig.5 i-th Arm Impedance around Desired Compliance Center of Object

$\Delta x = \Delta x_1 - \Delta x_2$  (14)
From eqs. (11)-(14) we have

$(M_1 - M_2)\Delta\ddot{x}_1 + (D_1 - D_2)\Delta\dot{x}_1 + (K_1 - K_2)\Delta x_1 + M_2\,\Delta\ddot{x} + D_2\,\Delta\dot{x} + K_2\,\Delta x = 2F$  (15)
By selecting the parameters of eqs. (11) and (12) as follows, eq. (15) has the same dynamics as eq. (10):

$M_1 = M_2 = 2M_{RCC}$  (16)
$D_1 = D_2 = 2D_{RCC}$  (17)
$K_1 = K_2 = 2K_{RCC}$  (18)
Fig.6 Characteristic of RCC
Fig.7 Compliance of Each Arm

This result means that the RCC dynamics is realized by the relative compliant motion between the two parts, which are impedance controlled around the desired compliance center fixed to one part. After controlling the dual manipulators using the parameters derived from eqs. (16)-(18), the apparent impedance of the assembled object is as follows:

$M = 2M_i = 4M_{RCC}$  (21)
$D = 2D_i = 4D_{RCC}$  (22)
$K = 2K_i = 4K_{RCC}$  (23)
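A corresponding sketch (again ours, with arbitrary example matrices) of the parameter selection of eqs. (16)-(18) and the resulting apparent impedance of eqs. (21)-(23):

```python
import numpy as np

# Sketch of eqs. (16)-(18) and (21)-(23): each arm's impedance is set to
# twice the desired RCC parameters; the assembled object then shows four
# times the RCC parameters.  The RCC matrices are example values of ours.
M_rcc = np.diag([2.0] * 6)
D_rcc = np.diag([50.0] * 6)
K_rcc = np.diag([400.0] * 6)

M1 = M2 = 2 * M_rcc               # eq. (16)
D1 = D2 = 2 * D_rcc               # eq. (17)
K1 = K2 = 2 * K_rcc               # eq. (18)

# Apparent impedance of the assembled object, eqs. (21)-(23):
M_obj, D_obj, K_obj = 2 * M1, 2 * D1, 2 * K1   # each equals 4 * (RCC value)
```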
5. EXPERIMENTS
We applied the proposed algorithm to an experimental dual-arm system composed of industrial manipulators, each of which has six degrees of freedom (Nachi 7603-APJ). VxWorks was used to implement the control algorithm, with a sampling rate of 800 Hz. Figs. 8 and 9 show the trajectory and force along the x, y, and z axes when the manipulators assembled two parts. Figs. 10-13 show the behavior when an external force is applied to the object manipulated by the two arms in coordination: Figs. 10 and 11 show the trajectory and Euler angles of each arm, and Figs. 12 and 13 show the distribution of the external force when we set the rates of distribution to $p_1 = 1/3$ and $p_2 = 2/3$. We can see that the external force was distributed among the manipulators as specified. Fig. 14 shows the parts-mating experiment by the dual manipulators.
Fig.8 Left Trajectory and Force along x, y, z Axes
Fig.9 Right Trajectory and Force along x, y, z Axes
Fig. 10 Trajectory of Both Arms
Fig. 11 Trajectory of Euler Angles
Fig. 12 External Force of Each Arm
Fig. 13 External Moment of Each Arm
Fig. 14 Parts-mating Experiment

CONCLUSIONS

In this paper, we proposed a unified coordinated motion control algorithm for manipulators based on the impedance control of each arm around a unique point fixed to the manipulated object. The algorithm is applicable to both the manipulation of an object and parts-mating. Experimental results illustrated the proposed control algorithm.

REFERENCES
[1] D. E. Whitney, "Quasi-Static Assembly of Compliantly Supported Rigid Parts," ASME J. DSMC, 104, pp. 65-77, 1982.
[2] N. Hogan, "Impedance Control Part 1-Part 3," Trans. of ASME, Journal of Dynamic Systems, Measurement, and Control, Vol. 107, pp. 1-24, 1985.
[3] M. Koga, K. Kosuge, K. Furuta, K. Nosaki, "Coordinated Motion Control of Robot Arms Based on the Virtual Internal Model," IEEE Trans. on Robotics and Automation, Vol. 8, No. 1, pp. 77-85, 1992.
[4] E. Nakano, S. Ozaki, T. Ishida, I. Kato, "Cooperational Control of the Anthropomorphous Manipulator MELARM," Proc. of 4th International Symposium on Industrial Robots, pp. 251-260, 1974.
[5] S. Hayati, "Hybrid Position/Force Control of Multiple Arm Cooperating Robots," Proc. of 1986 IEEE International Conference on Robotics and Automation, pp. 89-92, 1986.
[6] T. J. Tarn, A. K. Bejczy, X. Y. Yun, "Design of Dynamic Control of Two Coordinated Robots in Motion," Proc. 24th IEEE Conference on Decision and Control, pp. 1761-1765.
[7] Y. Nakamura, K. Nagai, T. Yoshikawa, "Dynamics and Stability in Coordination of Multiple Robotic Mechanisms," International Journal of Robotics Research, Vol. 8, No. 2, pp. 44-61, 1989.
[8] R. K. Kankaanranta, H. N. Koivo, "Dynamics and Simulation of Compliant Motion of a Manipulator," IEEE Journal of Robotics and Automation, Vol. 4, No. 2, pp. 163-173, 1988.
[9] Ping Hsu, "Control of Multi-manipulator Systems (Trajectory Tracking, Load Distribution, Internal Force Control and Decentralized Architecture)," IEEE Int. Conf. on Robotics and Automation, pp. 1234-1239, 1989.
[10] D. Williams, O. Khatib, "Characterization of Internal Forces in Multi-grasp Manipulation," Proc. of ISMCR'92, pp. 731-738, Tsukuba, Japan, 1992.
[11] P. Dauchez, X. Delebarre, Y. Bouffard, and E. Degoulange, "Task Modeling and Force Control for a Two-arm Robot," Proc. IEEE International Conference on Robotics and Automation, pp. 1702-1707, 1991.
[12] Y. F. Zheng and J. Y. S. Luh, "Optimal Load Distribution for Two Industrial Robots Handling a Single Object," Proc. IEEE Int. Conf. on Robotics and Automation, pp. 344-349, 1988.
[13] M. Uchiyama and P. Dauchez, "A Symmetric Hybrid Position/Force Control Scheme for the Coordination of Two Robots," Proc. IEEE Int. Conf. on Robotics and Automation, pp. 350-356, 1988.
[14] J. Wen and K. Kreutz, "Motion and Force Control for Multiple Cooperative Manipulators," Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1246-1251, 1989.
[15] Z. Li, T. Tarn, and A. K. Bejczy, "Dynamic Workspace Analysis of Multiple Cooperating Robot Arms," IEEE Trans. on Robotics and Automation, Vol. 7, No. 5, pp. 589-596, 1991.
[16] Stanley A. Schneider and Robert H. Cannon, Jr., "Object Impedance Control for Cooperative Manipulation: Theory and Experimental Results," IEEE Trans. on Robotics and Automation, Vol. 8, No. 3, pp. 383-394, 1992.
Comparative study of adaptive controllers for a pneumatic driven leg

M. GUIHARD
Laboratoire de Robotique de Paris, 10/12 Avenue de l'Europe, 78140 Velizy, France

In this paper, we make a comparative study between two adaptive control strategies applied to a leg of a quadruped robot. The leg is composed of two rigid links with pneumatic actuators driving rotational joints. In order to enable non-linear control approaches, a model of this system is built. The control design is based on the one hand on MRAS and on the other hand on passive system theory, which ensures the asymptotic stability of the whole. Adaptation laws are proposed, grounded either on a low number of operations, keeping in mind real-time implementation facilities, or on robustness toward disturbances. The control laws are compared using simulation results.
1. INTRODUCTION

Control of legged robots requires good performance when dealing with high-speed trajectories: the platform must be stable and the controller robust enough towards disturbances. Our robot, called RALPHY, is a quadruped whose control architecture is based on a decomposition into four subsets, each made of one leg and a part of the platform. A supervisor manages the control of the platform [1] and the general gait [2]. The low level therefore has to be highly reliable to allow dynamic gaits.
Figure 1. One leg of RALPHY
Each leg has two rotational joints actuated by four-way servovalves. Differential pressure and position/speed measurements are available through sensors. The mathematical dynamic model of one actuated leg is highly non-linear and time-variant, due both to the mechanical coupling effects and to the compressibility of air. Conventional controllers [3][4] are not appropriate because they are not robust and fast enough to allow dynamic gaits [5]. To overcome these limitations, two different approaches have emerged. On the one hand, computed torque methods have been suggested in robotics as an effective way of using the system's non-linear model in the control law. They provide good performance whatever the robot configuration and, within some limits, the desired speed. Moreover, adaptive laws allow parameter variations to be handled [6][7]. This kind of control, however, seems computationally too complex to be implemented in real time on our current system. On the other hand, adaptive controllers which do not include the non-linear mathematical model in the control law have been developed [8][9]. These controllers provide correct results for slow motions and give a good compromise between complexity and accuracy. The aim of this paper is to propose adaptive controllers for one actuated leg using each of the two main approaches previously mentioned. Its layout is as follows. Section 2 presents the dynamic equations of the leg with pneumatic actuators. In section 3, an adaptive computed torque controller is developed. Section 4 is devoted to a decentralized adaptive control law which keeps the number of operations low. Both approaches use a reference model [10] and passive system theory to ensure asymptotic stability. The first is a non-linear adaptive control law while the second is a linear adaptive controller. Simulation results are presented in section 5. Section 6 contains some conclusions and further investigations.
2. ACTUATED LEG MODEL

2.1. Pneumatic Actuator model

Each joint is actuated by an electropneumatic servovalve with double-effect linear jacks. It is made up of a torque motor and of two pneumatic stages. The valve controls the air flow, which is converted into a pressure supply for the two chambers of the cylinder. Figure 2 illustrates the servovalve and its flow stage. The determination of the dynamic actuator model [11][12] is based on the study of the flow stage supplied with fixed pressure $P_a$. Some fundamental assumptions must be made when dealing with pneumatic systems. First, the fluid is considered an ideal gas. Second, potential and kinetic energy within the fluid are neglected. Third, the leakage between the two piston chambers is also neglected. Let us define the following parameters:
- $P_p$, $P_n$: pressures in chambers P and N respectively,
- $\gamma = 1.405$: ratio of specific heats,
- $V_p$, $V_n$: volumes in chambers P and N respectively,
- $r = 286\ \mathrm{J\,kg^{-1}\,K^{-1}}$: ideal gas constant,
- $T$: temperature in Kelvin,
- $S$: cross-section area of the piston,
- $i$: torque motor input current,
- $y$: displacement of the piston.
Figure 2. Servovalve scheme
Figure 3. Flow stage of a servovalve
Let $\dot{m}_{ij}$ denote the mass flow rate in restriction $A_{ij}$ (with $ij \in \{ap, pr, an, nr\}$; refer to figure 3). From Liu and Bobrow [14], the energy conservation and gas state equations lead to the following dynamic model:

$\dfrac{dP_p}{dt} = -\dfrac{\gamma P_p}{V_p}\dfrac{dV_p}{dt} + \dfrac{rT\gamma}{V_p}\,(\dot{m}_{ap} - \dot{m}_{pr})$

$\dfrac{dP_n}{dt} = -\dfrac{\gamma P_n}{V_n}\dfrac{dV_n}{dt} + \dfrac{rT\gamma}{V_n}\,(\dot{m}_{an} - \dot{m}_{nr})$  (1)
Moreover, the relation between the volume flow rates and the speed of the piston is:

$\dfrac{dV_p}{dt} = -\dfrac{dV_n}{dt} = S\dot{y}, \qquad V_p - V_{p_i} = -(V_n - V_{n_i}) = S y$  (2)
The dynamic behavior of the piston and of the external load can be described by the equation:

$m\,\ddot{y} = S(P_p - P_n) - V(y, \dot{y}) - F_r$  (3)

where $V(y, \dot{y})$ is a non-linear viscous and Coulomb friction function, and $m$ is the mass of the cable and of the piston, which is negligible compared with the segment inertia.
The exact actuator model is then defined by equations (1) to (3). The difficulty lies in the expression of the mass flow rate through an aperture. It can be expressed as a function of the upstream pressure $P_{am}$ and downstream pressure $P_{av}$; thermodynamic studies [11] proved that it depends on the ratio $P_{av}/P_{am}$, and two ratings, the subsonic and the supersonic one, have to be distinguished. In addition, it depends on the aperture shape, and we do not have access to all the data required to build an exact model. Some assumptions are therefore made, considering the displacement of the piston due to small slide variations around its central position and the pneumatic system symmetric [14][15]. This allows the linearization of the mass flow rate, which can be considered proportional both to the current $i$ and to the chamber pressure, as follows:

$\Delta\dot{P}_p = \dfrac{\gamma rT}{V_{p_i}}\,(C_i\, i - C_p\,\Delta P_p) - \dfrac{\gamma P_{p_i}}{V_{p_i}}\, S\dot{y}$

$\Delta\dot{P}_n = \dfrac{\gamma rT}{V_{n_i}}\,(-C_i\, i - C_p\,\Delta P_n) + \dfrac{\gamma P_{n_i}}{V_{n_i}}\, S\dot{y}$  (4)

with:

$\Delta P_p = P_p - P_{p_i}, \quad \Delta V_p = V_p - V_{p_i}, \quad \Delta P_n = P_n - P_{n_i}, \quad \Delta V_n = V_n - V_{n_i}$  (5)

where $C_i$ and $C_p$ are the flow-gain coefficients, i.e. the partial derivatives of the mass flow rates with respect to the input current and to the chamber pressure, evaluated at the equilibrium point; the index $i$ denotes the equilibrium conditions. This simplified model is then used to build the control design [17]. Its parameters are time-variant, depending on changing data such as the ambient temperature or pressure.

2.2. One actuated leg model

The dynamic equation of one leg according to figure 1 can be written as follows [13]:
$M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau_r$  (6)

$q$, $\dot{q}$, $\ddot{q}$ denote the $2 \times 1$ vectors of joint position, speed and acceleration, $M(q)$ is the $2 \times 2$ generalized inertia matrix, $G(q)$ is the $2 \times 1$ vector of gravitational forces, and $C(q,\dot{q})\,\dot{q}$ is the $2 \times 1$ vector of centripetal and Coriolis forces. The load $\tau_r$ is the $2 \times 1$ vector of torques applied to the links. Force/torque and joint/displacement are linked by the radius $\ell$ of the pulley as follows:

$\tau_r = \ell\, F_r \qquad \text{and} \qquad y = \ell\, q$  (7)
This allows the link (leg) equations to be connected to the actuators. We can gather all the dynamic equations as follows:

$M(q)\,\ddot{q} + C'(q,\dot{q})\,\dot{q} + G'(q) = S\ell\,(\Delta P_p - \Delta P_n)$

$\Delta\dot{P}_p = \dfrac{\gamma rT}{V_{p_i}}\,(C_i\, i - C_p\,\Delta P_p) - \dfrac{\gamma P_{p_i}}{V_{p_i}}\, S\ell\dot{q}$

$\Delta\dot{P}_n = \dfrac{\gamma rT}{V_{n_i}}\,(-C_i\, i - C_p\,\Delta P_n) + \dfrac{\gamma P_{n_i}}{V_{n_i}}\, S\ell\dot{q}$  (8)

with:

$C'(q,\dot{q}) = C(q,\dot{q}) + \ell\, B(q,\dot{q})$  (9)

$G'(q) = G(q) - S\ell\,(P_{p_i} - P_{n_i})$
We underline that at the initial conditions we must have $G(q) = S\ell\,(P_{p_i} - P_{n_i})$: the initial resistive torque is only due to gravitational effects. The available data at each joint are the position and the speed given by encoders; piezoresistive sensors measure the differential pressure between the chambers $(\Delta P_p - \Delta P_n)$. From this information, we rewrite (8) as a function of the sensor information:

$M(q)\,\ddot{q} + C'(q,\dot{q})\,\dot{q} + G'(q) = \tau$  (10a)
$\dot{\tau} = J\, i - B\,\tau - E\,\dot{q}$  (10b)

with

$\tau = S\ell\, Z, \qquad Z = (\Delta P_p - \Delta P_n)$  (11)

where $Z$ denotes the differential pressure and $J$, $B$, $E$ are $2 \times 2$ diagonal parameter matrices deduced from (8). Equations (10) represent the compact complete dynamic model of one actuated leg used to design the adaptive control laws.
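As a minimal sketch of how the compact model (10) can be simulated, the following Python fragment integrates eqs. (10a)-(10b) with Euler steps; the inertia, friction and actuator matrices below are placeholders of ours, not the identified RALPHY parameters:

```python
import numpy as np

# Sketch of the compact model (10): the link dynamics are driven by a torque
# tau whose evolution follows the linearized actuator equation
# tau_dot = J i - B tau - E q_dot.  All matrices are illustrative placeholders.
J = np.diag([3000.0, 3000.0])   # actuator gain matrix (2x2, diagonal)
B = np.diag([5.0, 5.0])
E = np.diag([40.0, 40.0])

def M(q):            # placeholder inertia matrix of the two-link leg
    return np.diag([0.12, 0.05])

def h(q, qd):        # placeholder Coriolis/centrifugal + gravity torques
    return np.array([0.3 * np.sin(q[0]), 0.1 * np.sin(q[0] + q[1])])

def step(q, qd, tau, i, dt=1e-3):
    """One Euler integration step of eqs. (10a)-(10b)."""
    qdd = np.linalg.solve(M(q), tau - h(q, qd))   # eq. (10a)
    tau_dot = J @ i - B @ tau - E @ qd            # eq. (10b)
    return q + dt * qd, qd + dt * qdd, tau + dt * tau_dot
```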
3. NON-LINEAR ADAPTIVE CONTROL DESIGN

This section is devoted to the design of an appropriate control which does not suffer from the bandwidth limitation of the conventional state feedback controller and which ensures global asymptotic stability in closed loop. The particularity of system (10) is that it is composed of two coupled subsystems, both of which obey the passivity property [6][16]. Moreover, this parametrization preserves the rigid robot properties for the first equation, which has as input the measurable variable $\tau$. The outputs of the first equation (10a) are the link positions and velocities ($q$ and $\dot{q}$); the second equation (10b) has as outputs the joint torques and their derivatives $\tau$ and $\dot{\tau}$. In what follows, and for stability analysis purposes, we use the following properties:
- P1: The inertia matrix $M(q)$ is SPD and its inverse is uniformly bounded.
- P2: With an appropriate definition of $C(q,\dot{q})$, the matrix $A(q,\dot{q})$, defined as follows, is skew symmetric:

$A(q,\dot{q}) = \dot{M}(q) - 2\,C(q,\dot{q})$  (12)

- P3: The dynamic equations (10) are linear in a set of parameters and can be written as:

$\tau = \Phi_1\,\Theta_1, \qquad \dot{\tau} = \Phi_2\,\Theta_2$  (13)

where the $\Phi_i$ denote the information matrices and $\Theta_i$ the associated parameters.
The control objective is that the leg tracks a set of given link positions and velocities with error rejection. The desired link trajectories are noted $q^d(t)$, $\dot{q}^d(t)$ and $\ddot{q}^d(t)$. The tracking errors and the reference velocity are defined by:

$e = q^d - q, \quad \dot{e} = \dot{q}^d - \dot{q}, \quad \dot{q}_r = \dot{q}^d + \Lambda e, \quad e_\tau = \tau^d - \tau, \quad s = \dot{e} + \Lambda e = \dot{q}_r - \dot{q}$  (14)
(15a) (15b)
where 1Vl(q), (~(q,(l), G(q), J, 13 and I~ are the estimates of M(q), C(q,~l), G(q), J, B and E. G and F v are positive diagonal gain matrices. We can write the closed loop equations from (10) and (15): M(q) g+ C'(q,(-l) s - e r - F v s - ~ l ( q , / l , / l r , q r ) O1 6 r + B e r - - G s+ ~2 (i,
rd,q a ) ~)2
represents the parameter error O - O - |
(16a) (16b) (17)
The time-variant parameters are adapted on line by Integral type Parameters Adaptation Algorithms (PAA): 41 - F 1 ~ ? S ,
~)2--F2
( G ' I ~ 2 ) T er
(18)
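The following sketch (ours, not the author's code) shows one control cycle combining eqs. (14), (15a) and the integral PAA (18); the regressor matrices of property P3 are model-specific and are only stubbed here, and the servovalve current of eq. (15b) would be computed from the second parameter estimate in the same way:

```python
import numpy as np

# Sketch of one cycle of the passivity-based adaptive controller.  The
# regressors Phi1 and Phi2 are placeholders that only fix shapes.
n_th1, n_th2 = 4, 4
Phi1 = lambda q, qd, qr_d, qr_dd: np.zeros((2, n_th1))   # placeholder
Phi2 = lambda tau_d, qd:          np.zeros((2, n_th2))   # placeholder

Lam, Fv, G = np.diag([20., 20.]), np.diag([1., 1.]), np.diag([20., 20.])
Gam1, Gam2 = 1e-2 * np.eye(n_th1), 1e-2 * np.eye(n_th2)  # adaptation gains

def control_step(q, qd, tau, q_des, qd_des, qdd_des, th1, th2, dt=3e-3):
    e, e_dot = q_des - q, qd_des - qd
    qr_dot   = qd_des + Lam @ e
    s        = e_dot + Lam @ e                  # s = qr_dot - qd, eq. (14)
    W1       = Phi1(q, qd, qr_dot, qdd_des)
    tau_d    = W1 @ th1 + Fv @ s                # eq. (15a), regressor form
    e_tau    = tau_d - tau
    W2       = Phi2(tau_d, qd)
    th1      = th1 + dt * Gam1 @ W1.T @ s                          # eq. (18)
    th2      = th2 + dt * Gam2 @ np.linalg.solve(G, W2).T @ e_tau  # eq. (18)
    return tau_d, th1, th2
```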
The Equivalent Feedback System represented in figure 4 will be passive if its four blocks are passive [6], [16]. All blocks verify Popov's inequality, as shown in appendix A, so the passivity of each of them can be concluded.
Figure 4. Non-linear closed loop diagram

The stability analysis can be conducted equivalently using Lyapunov stability theory. In the case of known parameters, the closed-loop scheme (Figure 4) is modified by removing the two PAA blocks. For the stability proof, let us consider the following Lyapunov candidate function:
$V = \tfrac{1}{2}\, s^T M s + \tfrac{1}{2}\, e_\tau^T e_\tau + \tfrac{1}{2}\, \tilde{\Theta}_1^T \Gamma_1^{-1} \tilde{\Theta}_1 + \tfrac{1}{2}\, \tilde{\Theta}_2^T \Gamma_2^{-1} \tilde{\Theta}_2 > 0$  (19)

This function depends on the inertia matrix, the reference velocity, the torque error and the varying parameters; it represents a part of the system energy. Its time derivative evaluated along the closed-loop system trajectories is:

$\dot{V} = -s^T F_v\, s - e_\tau^T B\, e_\tau - s^T (G\, e_\tau - e_\tau)$  (20)
Setting $G' = G - I$ and recalling that $F_v$, $B$, $G'$ are positive diagonal matrices, equation (20) can be rewritten as:

$\dot{V} = -\sum_{i=1}^{2}\left(F_{v_i}\, s_i^2 + B_i\, e_{\tau_i}^2 + G'_i\, s_i\, e_{\tau_i}\right)$  (21)

The index $i$ denotes the $i$-th component of the vector or the $i$-th diagonal term of the matrix. Reshaped, equation (21) is equivalent to:

$\dot{V} = -\sum_{i=1}^{2} F_{v_i}\left(s_i + \tfrac{1}{2}\, F_{v_i}^{-1} G'_i\, e_{\tau_i}\right)^2 - \sum_{i=1}^{2}\left(B_i\, e_{\tau_i}^2 - \tfrac{1}{4}\, F_{v_i}^{-1} G_i'^2\, e_{\tau_i}^2\right)$  (22)

A sufficient condition on the feedback gains to get $\dot{V} < 0$ is:

$G_i < 2\sqrt{B_i F_{v_i}} + 1$  (23)
These two developments allow us to conclude that the signals $s$ and $e_\tau$ converge asymptotically to zero and that the parameter estimation errors $\tilde{\Theta}_1$, $\tilde{\Theta}_2$ are bounded. Since $\Lambda > 0$ and $s = \dot{e} + \Lambda e$, the position error converges asymptotically to zero [6], [16]. Global asymptotic stability of the proposed adaptive controller is thus ensured. Another important property concerns the convergence of the estimates to the true parameters, which is not necessary with our controller [17]: the objective of the passive feedback controller is not linearization but the preservation of the passivity of the system.
4. DECENTRALIZED ADAPTIVE CONTROL DESIGN

Non-linear controllers give good results. However, they depend strongly on the correctness of the mathematical model structure used to describe the dynamic behavior. Moreover, these control laws require many operations which, for some applications, cannot reasonably be computed in real time. These are the reasons why a simple adaptive decentralized control seems at this time more appropriate to our problem. Aiming at the compromise of keeping a simple control scheme while taking advantage of the passivity property to ensure asymptotic stability, we have opted for the decentralized adaptive control proposed by Seraji [9]. First, we present the conventional feedback applied to our system and the adjustment of the control gains. Then this control law, with some added signals, is made adaptive, inspired by Seraji's approach.

4.1. Conventional Feedback
State feedback approaches are usually applied to pneumatic systems [4][5]. The system state can be defined by the state vector $(q, \dot{q}, \tau)$. The output $\tau$ is obtained from the measurement $(\Delta P_p - \Delta P_n)$ and the known parameters $S$ and $\ell$. In what follows, we use the same error signals as defined in equation (14). Given the way we presented the model, we propose a state feedback control in two stages: first, the desired position and velocity are used to compute the required supply torque; then a pressure control defines the servovalve input current $i$:

$\tau^d = K_p\, e + K_v\, \dot{e}$
$i = G\, e_\tau$  (24)

$K_p$, $K_v$ and $G$ are $2 \times 2$ positive diagonal gain matrices.
Figure 5. State feedback controller
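A minimal sketch (ours) of the two-stage law (24), including the $\pm 20$ mA servovalve saturation mentioned below; the gains are the tuned values reported in section 5.1.1:

```python
import numpy as np

# Sketch of the two-stage state feedback of eq. (24).
Kp = np.diag([80.0, 80.0])
Kv = np.diag([0.7, 0.7])
G  = np.diag([2.5e-2, 2.5e-2])
I_MAX = 20e-3                 # servovalve saturation [A]

def feedback(q, qd, tau, q_des, qd_des):
    e, e_dot = q_des - q, qd_des - qd
    tau_d = Kp @ e + Kv @ e_dot     # stage 1: required torque
    i = G @ (tau_d - tau)           # stage 2: servovalve current
    return np.clip(i, -I_MAX, I_MAX)
```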
This control is implemented in discrete time using $\delta = (\gamma - 1)/T_e$, the delta operator approximation of the derivative, where $\gamma$ is the forward shift operator and $T_e$ the sampling period; the scheme is drawn in Figure 5. We use an experimental procedure to determine the control gains. First, we take as desired trajectory a slow sinusoid for each joint (period = 10 s) and set the gain $G$ to a very low value ($10^{-4}$) so that no oscillation appears. Then we adjust the gains $K_p$, $K_v$ to minimize the position and speed errors and to limit the current $i$ in the servovalves, which saturate at $i = \pm 20$ mA. Finally, we raise the gain $G$, which has a stabilizing effect on the system. We repeat these last two steps until we obtain the maximum gain $G$ without oscillations and with small tracking errors, and we repeat the procedure for faster desired trajectories to optimize the control gains and bandwidth. Simulation results are presented in section 5.

4.2. Decentralized adaptive control design

The method proposed by Seraji [9] is based on the decomposition of the model into $n$ subsystems, each of them representing the dynamic behaviour of one joint. These subsystems are interconnected by the Coriolis, centrifugal and gravity terms. The controller, said to be decentralized, is then composed of $n$ control laws, each acting on one subsystem. The aim of this section is to apply the method to our system, which includes the actuator dynamics. First, the model is rewritten as follows:

$M_i(q)\,\ddot{q}_i + d_i(q, \dot{q}, \ddot{q}) = \tau_i$
$i_i = J'_i\,\dot{\tau}_i + B'_i\,\tau_i + E'_i\,\dot{q}_i$  (25)

with $J'_i = 1/J_i$, $B'_i = B_i/J_i$, $E'_i = E_i/J_i$. Here $d_i$ represents the gravity, Coriolis and centrifugal torques for the $i$-th joint and also the inertial effects from the other joints. As $J'$, $B'$ and $E'$ are diagonal matrices (there are no coupling effects between the actuators), $J'_i$, $B'_i$ and $E'_i$ are their diagonal terms. The independent joint controllers are built in two stages, one concerning the leg dynamics and the other the actuator dynamics. They are designed to ensure the asymptotic convergence of the tracking position and speed errors to zero. This leads to the following control laws, drawn in figure 6:

$\tau^d = f + K_p\, e + K_v\, \dot{e} + k_1\, \ddot{q}^d$
$i = G\, e_\tau + F\, \dot{e} + k_2\, \dot{q}^d + k_3\, \tau^d + k_4\, \dot{\tau}^d$  (26)
The index $i$ is dropped, and the gain variations coming from the "adaptive laws" block in figure 6 are not represented, for convenience.
Figure 6. Decentralized control diagram

$e$, $\dot{e}$, $e_\tau$ are the tracking errors defined by (14). All the gains $f$, $K_p$, $K_v$, $F$, $G$, $k_1$, $k_2$, $k_3$ and $k_4$ are updated on line according to adaptive laws determined using the Lyapunov method. The definitions of two adaptive errors emerge from the demonstration proposed in appendix B:

$r_1 = L\, e_\tau \qquad \text{and} \qquad r_3 = p_1\, e + p_2\, \dot{e}$  (27)
The adaptive laws appear to be, after simplification, integral-type adaptation algorithms, as for the non-linear controller:

$\dot{f} = \delta\,(p_1 e + p_2 \dot{e}), \quad \dot{K}_p = \beta_1\,(p_1 e + p_2 \dot{e})\, e, \quad \dot{K}_v = \beta_2\,(p_1 e + p_2 \dot{e})\, \dot{e}, \quad \dot{k}_1 = \beta_3\,(p_1 e + p_2 \dot{e})\, \ddot{q}^d$

$\dot{G} = g_2\,(L e_\tau)\, e_\tau, \quad \dot{F} = g_3\,(L e_\tau)\, \dot{e}, \quad \dot{k}_2 = g_1\,(L e_\tau)\, \dot{q}^d, \quad \dot{k}_3 = g_4\,(L e_\tau)\, \tau^d, \quad \dot{k}_4 = g_5\,(L e_\tau)\, \dot{\tau}^d$  (28)
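A compact sketch (ours) of one Euler step of the adaptation laws (27)-(28) for a single joint; the numerical weights are those used in the simulations of section 5.1.3, and $g_5$ is set to zero since no value is reported for it:

```python
# Sketch of the decentralized adaptive law, eqs. (26)-(28), for one joint.
delta, b1, b2, b3 = 10.0, 40.0, 40.0, 2.0      # weighting gains (sec. 5.1.3)
g1, g2, g3, g4, g5 = 0.1, 2.0, 0.8, 0.01, 0.0  # g5: assumed zero
L, p1, p2 = 1.0, 3.0, 0.5                      # adaptive-error coefficients

def adapt_step(gains, e, e_dot, e_tau, tau_d, tau_d_dot, qd_dot, qd_ddot, dt):
    """One Euler step of the integral adaptation laws (28)."""
    f, Kp, Kv, k1, k2, k3, k4, G, F = gains
    r1 = L * e_tau                 # eq. (27)
    r3 = p1 * e + p2 * e_dot
    f  += dt * delta * r3
    Kp += dt * b1 * r3 * e
    Kv += dt * b2 * r3 * e_dot
    k1 += dt * b3 * r3 * qd_ddot
    G  += dt * g2 * r1 * e_tau
    F  += dt * g3 * r1 * e_dot
    k2 += dt * g1 * r1 * qd_dot
    k3 += dt * g4 * r1 * tau_d
    k4 += dt * g5 * r1 * tau_d_dot
    return (f, Kp, Kv, k1, k2, k3, k4, G, F)
```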
Some important remarks must be made concerning the stability proof developed in appendix B. First, it is assumed that the terms $M$ and $d$ vary slowly in comparison with the controller gains; degraded behaviour is then to be expected when dealing with high velocities. Moreover, the stability proof is based on an error model considered undisturbed and with null initial conditions, which implies that the desired trajectories must be taken with zero initial errors on speed and acceleration. The asymptotic stability demonstrated is therefore a local stability, which implies that the desired trajectory must be slow enough to stay in the stable zone. Finally, the control laws, even if they are computationally easy to implement, involve many gains to adjust, and the number of gains increases with the number of joints; in our case, however, the number of gains remains reasonable as we only have two joints. Initial gains are taken to be zero except for $K_p$, $K_v$ and $G$, which are initialized at the values found with the conventional controller presented in section 4.1. This means that at the beginning the adaptive controller behaves as the conventional controller; the tracking errors then decrease owing to the adaptive laws. In spite of all these critical remarks, this controller remains attractive, mainly because it does not include the complex dynamic model while still ensuring asymptotic stability. As for the non-linear adaptive controller presented in section 3, the adaptive laws are based on the observation of the error variations; convergence to the true parameters is hence not required.
5. COMPARISON VIA SIMULATION RESULTS

The performances of these control laws have been studied in simulation. In both cases, the control is computed in discrete time and the system is simulated as a continuous system using the Simnon software. For each kind of control presented in this section, we use the same desired trajectory for comparison purposes, of the form $q^d = (\pi/6)\,(1 - \cos(\omega t))$ with $\omega = 3\ \mathrm{rad\,s^{-1}}$. This trajectory guarantees null velocity at $t = 0$, as advised in section 4.2. The results are presented only for the second joint; we get similar results for the first joint. First, we present the results obtained for position, velocity, tracking errors and the servovalve input current $i$, with the corresponding gains and initial conditions. Then we propose a comparative study of the three controllers.

5.1. Simulation results
5.1.1 Conventional feedback

As first simulation tests, we apply the conventional feedback presented in section 4.1. The sampling period for control is $T_e = 5$ ms. The gains adjusted according to the procedure exposed in that section are: $K_p = 80$, $K_v = 0.7$ and $G = 2.5 \times 10^{-2}$.
Figure 7. State feedback controller: position $q_2$ and $q^d_2$ (rad), position error $e_2$ (rad), velocity $\dot{q}_2$ and $\dot{q}^d_2$ (rad/s), velocity error $\dot{e}_2$ (rad/s), and input current $i_2$ (A) versus time (s)

This controller has the advantage of being the least complex one. However, stability is lost when dealing with high velocities or when the initial conditions do not match the desired ones. Moreover, the procedure used to adjust the gains is quite difficult to carry out because of the oscillations which may appear when high gains are used, and we are restricted to a narrow closed-loop bandwidth.

5.1.2 Non linear passive controller
We then apply the non-linear controller presented in section 3. The sampling period is $T_e = 3$ ms and the gains are adjusted to: $F_v = 1$, $\Lambda = 20$, $G = 20$, $\Gamma_{J_2} = 10000$, $\Gamma_{B_2} = 0.02$, $\Gamma_{E_2} = 0.9$, $\Gamma_{m_2} = 0.1$ and $\Gamma_{I_2} = 10^{-4}$, with the following initial conditions: $\hat{J}_2 = 3000$, $\hat{B}_2 = \hat{E}_2 = \hat{m}_2 = \hat{I}_2 = 0$.
Figure 8. Non linear controller (same quantities as in Figure 7)

The sampling period must be quite low when the non-linear controller is applied: the accuracy demanded, and obtained, is high, and this involves many computations. The complexity can be reduced by a study of the order of magnitude of each parameter and the removal of the least significant ones. In any case, a hardware architecture fast enough to enable on-line computation of this passive control is required. We can notice that when the initial conditions do not match the desired ones, the system quickly recovers the desired trajectory; the same remark holds when a disturbance is added. This is due to the global asymptotic stability property peculiar to the passive non-linear controller.
5.1.3 Decentralized adaptive control

Finally, we present the results obtained with Seraji's approach applied to our system. The sampling period is $T_e = 5$ ms. The starting velocity and position errors are zero, and the initial conditions for the gains are the values obtained with the first controller:
$K_p = 80$, $K_v = 0.7$, $G = 2.5 \times 10^{-2}$, the other gains being set to zero. The adaptive errors are set to $r_1 = e_\tau$ and $r_3 = 3e + 0.5\dot{e}$, and the weighting gains to $\delta = 10$, $\beta_1 = \beta_2 = 40$, $\beta_3 = 2$, $g_1 = 0.1$, $g_2 = 2$, $g_3 = 0.8$, $g_4 = 0.01$.

Figure 9. Decentralized controller (same quantities as in Figure 7)

As for the conventional controller, this controller loses stability at high velocities ($\omega = 5$ rad/s). This is due to the assumption of slowly time-variant parameters $M(q)$ and $d(q, \dot{q}, \ddot{q})$, which is no longer verified in fast motion cases, and to the needed high gains. Another reason is that when the tracking errors are too high, the system goes out of the stable zone (because of the local stability property), and the gain variations are then not sufficient to track the trajectories. We must underline that the weighting gains are difficult to adjust. Stability may also be lost when too high a gain is applied or when the initial conditions are too different from the desired ones. This controller is hence not advisable for fast motions and is not sufficiently robust. These seem to be the most significant limits of the decentralized adaptive controller.
5.2. Comparative study

Some remarks apply to all three controllers presented in this paper and concern the adjustment of the gains. We observe that the gains related to the actuators are lower than those related to the leg dynamics; this can be justified by the different orders of magnitude involved in the two stages. The weightings, related either to gains for the decentralized adaptive controller or to parameters for the non-linear one, are adjusted to make their variations fast enough without too many oscillations. These weightings are smaller when the non-linear controller is applied, which emphasizes the fact that the passive controller is more robust. Another general remark concerns the desired trajectories: no sufficiently exciting property is required for the estimation, as would be the case for identification algorithms. The controllers perform well whatever the desired trajectories, chosen according to the task to achieve, although the control parameters may then be more difficult to adjust and bandwidth limitations may appear.

General comments emerge from the comparison of the curves. As expected, the decentralized adaptive controller behaves at first like the conventional one and then improves its performance by adaptation. After the first period, the position tracking errors of the two adaptive controllers are similar and about four times smaller than for the conventional one. Regarding the speed errors, we note that we get better results with the non-linear controller, although oscillations of larger magnitude are observed during the first period. The applied input current is of the same order of magnitude for the three controllers after many trials.

Let us compare, for our system, the number of operations to be computed on line. For the non-linear controller, it represents about 134 additions or multiplications with 4 sine or cosine functions to compute; for the decentralized controller, only 102 basic operations are required, that is to say about 23% fewer. This ratio increases tremendously with the number of degrees of freedom and of parameters to estimate. The arguments for a choice can be summarized in two main points:
- The passive non-linear adaptive controller has the property of performing well whatever the speed range. However, as it includes the complex non-linear model in the control laws, it requires high computation power to be implemented in real time.
- The decentralized adaptive controller is attractive because of its simple control scheme and hence its ease of implementation. However, many gains are difficult to adjust, and stability is threatened when high velocities are required.
A compromise must therefore be made. Either robustness towards disturbances and a wide velocity range are necessary, and then the non-linear controller is advised, requiring fast computation; or the desired velocity is not too high and the hardware architecture less developed, and the decentralized adaptive controller is more appropriate. In our application, we intend to make our robot move fast, keeping in mind that the stability of the control law is fundamental; non-linear control is then advised. On the other hand, we expect to add one joint located at the "hip", which will increase the non-linear complexity.
Moreover, our hardware architecture is currently simple due to cost concerns. These arguments make us opt for the decentralized controller, which keeps the number of operations reasonable; lower velocities will then have to be accepted. In conclusion, the two controllers, although having different properties, are both interesting depending on the application, and no general choice can be made.
6. CONCLUSION

In this paper, a dynamic model of a pneumatically actuated leg is proposed in order to design adaptive control laws. Two kinds of adaptive control approaches are applied and compared. The stability analyses of the two controllers are grounded on the passivity property of the system. Simulation results reveal the limits of the decentralized adaptive controller, which loses stability at high velocities. On the other hand, the non-linear controller, although some non-significant parameters may be removed, remains computationally complex. A compromise is then necessary to make a choice depending on the hardware architecture, the number of joints and the velocity range. The procedures developed in this paper are under implementation. For future work, the authors intend to investigate force/position control problems applied to pneumatic legged robots.
REFERENCES
1. H. El Gamah, P. F. S. Amaral, J. G. Fontaine and J. Rabit, "Motion Control of an Electropneumatic Driven Legged Robot," Workshop on Motion Control, Perugia, Italy, 1992.
2. C. Villard, P. Gorce, and J. G. Fontaine, "Study of the Dynamic Behavior of Ralphy," IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS'93, Yokohama, Japan, pp. 1765-1770, July 1993.
3. C. Villard, P. Gorce, and J. G. Fontaine, "Legged Robot: How to Solve Quasi-dynamic Situations with a Coordinator Level," International Conference on Advanced Mechatronics, ICAM'93, Tokyo, Japan, pp. 610-614, August 1993.
4. S. Scavarda, "Les asservissements électropneumatiques de position," Ed. Hermès, 1989.
5. J. Pu, P. R. Moore, and R. H. Weston, "Digital Servo Motion Control of Air Motors," Int. Journal Prod. Res., Vol. 29, No. 3, pp. 599-618, 1991.
6. I. D. Landau and R. Horowitz, "Applications of the Passive Systems Approach to the Stability Analysis of Adaptive Controllers for Robot Manipulators," Int. J. of Adaptive Control and Signal Processing, Vol. 3, 1989.
7. J. J. E. Slotine and W. Li, "On the Adaptive Control of Robot Manipulators," The International Journal of Robotics Research, Vol. 6, No. 3, pp. 49-59, 1987.
8. S. Dubowsky and D. T. DesForges, "The Application of Model-Referenced Adaptive Control to Robotic Manipulators," Journal of Dynamic Systems, Measurement, and Control, Vol. 101, pp. 193-200, September 1979.
9. H. Seraji, "Decentralized Adaptive Control of Manipulators: Theory, Simulation, and Experimentation," IEEE Transactions on Robotics and Automation, Vol. 5, No. 2, April 1989.
10. I. D. Landau, "Adaptive Control - The Model Reference Approach," Ed. Marcel Dekker, New York, 1979.
11. J. L. Shearer, "Study of Pneumatic Processes in the Continuous Control of Motion with Compressed Air, I, II," Trans. ASME, pp. 233-249, February 1956.
12. A. Kellal, "Contribution à l'étude des asservissements électropneumatiques. Application à la réalisation d'un asservissement numérique d'un axe de robot," Thèse, INSA Lyon, Juillet 1987.
13. E. Dombre and W. Khalil, "Modélisation et commande des robots," Ed. Hermès, 1988.
14. S. Liu and J. E. Bobrow, "An Analysis of a Pneumatic Servo System and Its Application to a Computer-Controlled Robot," Journal of Dynamic Systems, Measurements, and Control, Vol. 110, pp. 228-235, September 1988.
15. S. J. Lin and A. Akers, "Dynamic Analysis of a Flapper-Nozzle Valve," Journal of Dynamic Systems, Measurements, and Control, Vol. 113, pp. 163-166, March 1991.
16. N. K. M'Sirdi and A. Benallegue, "Adaptive and Robust Controller for Robot Manipulators with Flexible Joints," Proc. of ECC, Grenoble, July 2-5, 1991.
17. N. K. M'Sirdi, M. Guihard, and J. G. Fontaine, "Identification and Control of Pneumatic Driven Robot," IEEE Systems, Man and Cybernetics Conference, pp. 722-727, October 1993.
Appendix A In this section, we demonstrate the stability of the non-linear adaptive controller by the passive system theory. A system which verifies the Popov's inequality between its input u(t) and its output y(t) is passive. Popov's inequality is expressed as follows: j-~, yT(t).u(t).dt > - y 02
with 7' 02 < oc
The main inrerest of such systems lays in the tact that the parallel or the feedback connection of two passive systems remains passive. The Equivalent Feedback System represented in figure 4 will be passive if the tour interconnected blocks are passive, as they are in feedback connections. Remark: in the case of known parameters, the closed loop scheme is modified by removing the two feedback PAA blocks. The dynamic robot links block (M.g+(C'+Fv).S) is passive since it verifies Popov's inequality:
s<','
,-.at = S<','
M<:<-,:>. +
with s T . M ( q ) . g - ( 1 / 2 ) d - ~-
+ Fv :>. ].dt .M(q).s-(1/2).
.1Q(q).s
530 I(l' sT. r.dt =
(1/2)I~1-d-d[sT ~ .M(q).s ].dt- (1/2)~'
[1Vl(q) - 2.C(q,/1)]. dt + sT.Fv .s.dt
From property P2 (equation (12)) and noting that Fv > 0, we can write the following inequality: ~t~'sT. r.dt _> (I/2).sT(0).M(q(0)).s (0)_>-702 To preserve the passivity property, the PAA1, which is in feedback connection with the previous dynamic block, has to verify Popov's inequality: I(t)' ST ( - V 1 0 1 ) dt _>- 7 o2
If some parameters are unknown or vary during task operation, we consider the Integral-type Parameter Adaptation Algorithm (PAA):
$$\dot{\hat\Theta}_1 = -F_1\,W_1^T\,s$$
so that:
$$\int_0^{t_1} -s^T W_1\,\tilde\Theta_1\,dt \;=\; \int_0^{t_1} \tilde\Theta_1^T F_1^{-1}\,\dot{\tilde\Theta}_1\,dt \;=\; \frac{1}{2}\int_0^{t_1} \frac{d}{dt}\Big[\tilde\Theta_1^T F_1^{-1}\,\tilde\Theta_1\Big]\,dt \;\ge\; -\frac{1}{2}\,\tilde\Theta_1^T(0)\,F_1^{-1}\,\tilde\Theta_1(0) \;\ge\; -\gamma_0^2$$
In the same way, we consider the Integral-type PAA for the PAA2 block:
$$\dot{\hat\Theta}_2 = F_2\,\big(G^{-1} W_2\big)^T e_r$$
so that:
$$\int_0^{t_1} e_r^T\,G^{-1} W_2\,\tilde\Theta_2\,dt \;=\; \int_0^{t_1} \tilde\Theta_2^T F_2^{-1}\,\dot{\tilde\Theta}_2\,dt \;=\; \frac{1}{2}\int_0^{t_1} \frac{d}{dt}\Big[\tilde\Theta_2^T F_2^{-1}\,\tilde\Theta_2\Big]\,dt \;\ge\; -\frac{1}{2}\,\tilde\Theta_2^T(0)\,F_2^{-1}\,\tilde\Theta_2(0) \;\ge\; -\gamma_0^2$$
The transfer matrix H(p) = G/(p + B), representing the actuator dynamic equation, is the last block to analyse. This matrix is strictly positive real, since the gain matrix G and the parameter matrix B are diagonal and strictly positive. The passivity of this block is then direct. To sum up, we have demonstrated the passivity of the four blocks represented in figure 4. These blocks are in feedback and parallel connections. From passivity theory [10], we can therefore conclude that the whole system is passive.
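To make the passivity argument above concrete, the following sketch (a numerical illustration of our own, not part of the original analysis) integrates Popov's product for a scalar stand-in of the robot-links block; all parameter values are invented:

```python
import numpy as np

# Scalar stand-in for the robot-links block: m*ds/dt + fv*s = tau.
m, fv, dt = 2.0, 0.5, 1e-3
s = 1.0
popov_integral = 0.0
bound = -0.5 * m * s**2            # plays the role of -gamma_0^2

rng = np.random.default_rng(0)
for _ in range(50_000):
    tau = rng.normal()             # arbitrary bounded input signal
    popov_integral += s * tau * dt # accumulates the integral of s*tau
    s += (tau - fv * s) / m * dt   # explicit-Euler step of the block dynamics

# Popov's inequality: the input/output integral never drops below -gamma_0^2.
print(f"int s*tau dt = {popov_integral:.3f} >= {bound:.3f}")
```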
Appendix B
This appendix is devoted to demonstrating the stability of the decentralized adaptive controller inspired by Seraji's method [9].
Firstly, the closed loop of the system is written from equations (24) and (25):
$$M\,\ddot e + K_v\,\dot e + K_p\,e \;=\; (d_f - f) + (M - k_1)\,\ddot q_d$$
$$J'\,\dot e_r + (A + B')\,e_r \;=\; (J' - k_4)\,\dot r_d + (E' - k_2)\,\dot q_d + (F - E')\,\dot e - (B' - k_3)\,\tau_d$$
This system can be rewritten as a function of the state vector X defined as $X = (e_r,\, e,\, \dot e)^T$. Secondly, an error model is built so as to make the closed-loop system perform as this model. It is composed of two equations: one of second order, to be compared with the dynamic error model, and one of first order, to be compared with the actuator error model:
$$\dot e_{mr} + \eta\, e_{mr} = 0, \qquad \ddot e_m + 2\,\zeta\,\omega\,\dot e_m + \omega^2\, e_m = 0$$
$\eta$, $\zeta$ and $\omega$ are the standard coefficients which determine the damping and the frequency of the system. As for the real system, the state vector associated with the error reference model is $X_m = (e_{mr},\, e_m,\, \dot e_m)^T$. We can then write $\dot X_m = D\,X_m$. This linear system is stable and verifies the Lyapunov equation $L\,D + D^T L = -Q$, where L and Q are symmetric positive-definite matrices. From these two error models, the error vector E is built and the whole system is rewritten as a function of this vector: $E = X_m - X$. Thirdly, a Lyapunov candidate function is taken:
$$\begin{aligned} V = E^T L E \;&+\; Q_0\Big(\frac{f - d_f}{f} - f^*\Big)^2 + Q_1\Big(\frac{K_p}{M} - \omega^2 - K_p^*\Big)^2 + Q_2\Big(\frac{K_v}{M} - 2\,\zeta\,\omega - K_v^*\Big)^2 \\ &+ Q_3\Big(\frac{k_1 - M}{M} - k_1^*\Big)^2 + Q_4\Big(\frac{k_2 - E'}{E'} - k_2^*\Big)^2 + Q_5\Big(\frac{G + B'}{B'} - G^*\Big)^2 \\ &+ Q_6\Big(\frac{F - E'}{E'} - E^*\Big)^2 + Q_7\Big(\frac{k_3 - B'}{B'} - k_3^*\Big)^2 + Q_8\Big(\frac{k_4 - J'}{J'} - k_4^*\Big)^2 \end{aligned}$$
The $Q_i$ are positive scalars and the $k_i^*$ are adaptive laws determined so as to make the derivative of the Lyapunov function negative. When developing this derivative, we use the Lyapunov equation and set the two adaptation errors to $r_1 = L\,e_r + L'\,e + L''\,\dot e$ and $r_3 = L''\,e_r + p_1\,e + p_2\,\dot e$. Cancelling the sign-indefinite terms of the derivative then fixes the adaptive laws:
$$\dot f^* = \frac{r_3}{Q_0}, \quad \dot K_p^* = \frac{r_3\,e}{Q_1}, \quad \dot K_v^* = \frac{r_3\,\dot e}{Q_2}, \quad \dot k_1^* = \frac{r_3\,\ddot q_d}{Q_3}, \quad \dot k_2^* = \frac{r_1\,\dot q_d}{Q_4},$$
$$\dot G^* = \frac{r_1\,e_r}{Q_5}, \quad \dot E^* = \frac{r_1\,\dot e}{Q_6}, \quad \dot k_3^* = \frac{r_1\,\tau_d}{Q_7}, \quad \dot k_4^* = \frac{r_1\,\dot r_d}{Q_8}$$
This leads to:
$$\dot V = -E^T Q E - 2Q_0 r_3^2 - 2Q_1 r_3^2 e^2 - 2Q_2 r_3^2 \dot e^2 - 2Q_3 r_3^2 \ddot q_d^2 - 2Q_4 r_1^2 \dot q_d^2 - 2Q_5 r_1^2 e_r^2 - 2Q_6 r_1^2 \dot e^2 - 2Q_7 r_1^2 \tau_d^2 - 2Q_8 r_1^2 \dot r_d^2 \;\le\; 0$$
The $Q_i$ are positive or null scalars, taken to zero to minimize the number of gains to adjust. In the same way, if we develop the Lyapunov equation and set $L' = L'' = 0$, the adaptation errors become $r_1 = L\,e_r$, called the torque adaptation error, and $r_2 = p_1\,e + p_2\,\dot e$, called the position and velocity adaptation error. The adaptive laws are then of integral type and can be written as follows:
$$\dot{\hat f} = \delta\,(p_1 e + p_2 \dot e), \qquad \dot{\hat K}_p = \beta_1\,(p_1 e + p_2 \dot e)\,e, \qquad \dot{\hat K}_v = \beta_2\,(p_1 e + p_2 \dot e)\,\dot e, \qquad \dot{\hat k}_1 = \beta_3\,(p_1 e + p_2 \dot e)\,\ddot q_d$$
$$\dot{\hat G} = g_2\,(L\,e_r)\,e_r, \qquad \dot{\hat E}' = g_3\,(L\,e_r)\,\dot e, \qquad \dot{\hat k}_2 = g_1\,(L\,e_r)\,\dot q_d, \qquad \dot{\hat k}_3 = g_4\,(L\,e_r)\,\tau_d, \qquad \dot{\hat k}_4 = g_5\,(L\,e_r)\,\dot r_d$$
$\delta$, $\beta_i$ and $g_i$ are strictly positive gains to be adjusted. This finally leads to the decentralized adaptive control law. It is worthwhile to note that some assumptions have been required for this development. In particular, the motion must be slow enough to allow the adaptation laws to track the time variations of the dynamic characteristics. We can therefore conclude only to local stability.
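As a rough numerical illustration of these integral-type laws (a sketch of our own: one 1-DOF joint, with the plant inertia and all gains invented rather than taken from the experiments), the following adapts $K_p$ and $K_v$ from the position and velocity adaptation error $r_2 = p_1 e + p_2 \dot e$:

```python
import numpy as np

# Minimal 1-DOF sketch of the integral-type adaptive laws; every number is
# illustrative. The joint is a double integrator with unknown inertia M_true.
dt = 1e-3
p1, p2 = 4.0, 1.0            # weights in r2 = p1*e + p2*e_dot
beta1, beta2 = 8.0, 2.0      # adaptation gains for Kp and Kv
M_true = 1.5                 # "unknown" plant inertia

Kp, Kv = 20.0, 5.0           # initial, deliberately mistuned feedback gains
q, qd = 0.0, 0.0             # joint position and velocity

for k in range(20_000):
    t = k * dt
    e = np.sin(t) - q        # position tracking error against sin(t)
    ed = np.cos(t) - qd      # velocity tracking error
    r2 = p1 * e + p2 * ed    # position/velocity adaptation error

    Kp += beta1 * r2 * e * dt    # integral-type law for Kp
    Kv += beta2 * r2 * ed * dt   # integral-type law for Kv

    qdd = (Kp * e + Kv * ed) / M_true   # decentralized control torque / inertia
    qd += qdd * dt
    q += qd * dt

print(f"adapted gains: Kp={Kp:.1f}, Kv={Kv:.1f}, final |e|={abs(e):.4f}")
```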
An Efficient Forward Gait Control for a Quadruped Walking Robot
Daniel J. Pack and A. C. Kak
Robot Vision Laboratory, 1285 EE Building, Purdue University, West Lafayette, IN 47907-1285
In this paper, we present a simplified forward gait for a quadruped walking robot. The proposed gait is a straight line, periodic, monotonically forward (SLPMF) gait. We show that for a given support pattern (a support pattern is a polygon generated by connecting the feet positions in contact with the ground) and for a given location of the robot center of gravity, only certain sequences of leg movements will generate the SLPMF gait. We also introduce a method to determine how to preserve the stability of a quadruped robot during the motions called for by an SLPMF gait. Experimental results are presented to support the feasibility of the proposed gait. Another noteworthy aspect of our paper is the discussion of the leg design from a hardware standpoint; the design permits independent joint control.
1. Introduction
The study of walking robots started in the mid-60's when an initial prototype of a quadruped robot was built and tested by the General Electric Corporation [10]. Since then much work in this area by various researchers has resulted in the uncovering of many problems. At one end of the spectrum, these problems deal with low-level but critically important issues such as gait control, force-feedback control for terrain-adapted foot placement, stability, etc. At the other end, we have problems of dynamic control, the incorporation of environment sensing, collision avoidance, goal attainment and others. To cite some of the literature that has appeared to date, Song and Chen [14], Hirose and Umetani [6], Bares and Whittaker [1], and Choi and Song [3] have addressed problems dealing with low-level motion control for leg coordination; Hirose [5] and Waldron et al. [15] have dealt with the problem of overall mechanical design of walking robots. For higher-level issues the reader is referred to Sekiguchi, Nagata, and Asakawa [12], Ooka et al. [11], and Brooks [2]. In our laboratory, we are currently engaged in building a quadruped robot. The main goal of this research effort is to analyze the problems that arise when low-level control strategies are integrated with high-level reasoning and planning modules. As we showed in the context of wheeled mobile robots [7,9], the integration of low- and high-level control is of fundamental importance in order to imbue a robot with "intelligent" behavior. While our research on quadruped robots is driven by the desire to study problems involved with such integration, this paper will deal with just one specific issue that we addressed and resolved during the course of our research: the issue of gait control.
The specific research we have reported in this paper is motivated by our credo that the complexity of a high-level controller can be reduced if the low-level control strategies are as simple as possible but, of course, adequate for the tasks intended for the robot. (The reader may see some parallels between this and the RISC approach to computing.) To simplify the low-level aspects of gait control, we have made certain reasonable assumptions about the motions that a quadruped robot is allowed to execute. Basically, a quadruped
Figure 1. A photo of our quadruped walking robot, ROACH.
robot can either move its body in some direction while all its feet are stationary and in contact with the ground, or the robot may lift one of its legs and move the leg in some direction with or without the body in simultaneous motion. In this paper, we assume that at any instant of time the robot is allowed only one type of motion: the robot may either just move its body while the feet stay stationary; or, keeping its body stationary, the robot may just move one of the legs for a foot placement that is more advantageous for subsequent motions. These assumptions help us devise a robust and computationally inexpensive gait control algorithm that ensures the robot will stay stable during all its motions.
The paper is organized as follows. In section 2, we discuss the mechanical design of the robot. In section 3, we then propose a forward gait that we refer to as a straight line, periodic, monotonically forward walking gait. We analyze the problem of determining the sequence of free legs for such a gait. Section 4 presents experimental results, followed in section 5 by conclusions.
2. Mechanical Design of the Robot
Figure 1 shows a photo of the quadruped robot and its overall configuration. The main body of the robot is an oval-shaped platform populated with a CCD camera, hardware for a video link antenna, and a battery. Attached to the body are four legs in the manner shown. The mechanical design of the legs will be discussed later. In what follows immediately, we will start with a justification for why we chose four for the number of legs. At this time, we are interested in control strategies that are dependent solely on kinematic considerations; our interests are more in the area of how a walking robot might interact with its environment given a goal and a certain characterization of the surface traversed by the robot. The goal and the surface characterization will be provided by the high-level controller using a visual sensor.
Figure 2. Each leg consists of three links. The two joints on the left are revolute joints while the third joint is a linear joint.
To control a walking robot in kinematic mode, it is necessary that the robot have at least four legs. This conclusion is drawn by simply observing that a system needs at least three legs to balance itself on most surfaces and that any extra legs can then be used to move the robot to a new position. As a second reason for the quadruped design: while it is known [8] that robots with more than four legs can move faster, the speed advantage is
obtained at the cost of increased complexity in the coordination and control of the legs. Hardware costs and difficulties in arranging motor drivers, amplifiers, power supplies, etc., are proportional to the number of legs. Robots with more than four legs do possess one advantage, viz., the ability to cope with deadlock situations. A set of foot placements constitutes a deadlock condition if lifting any foot would cause the robot to lose balance. The design of our quadruped robot was governed by a desire for simplicity, both in the hardware and in the low-level motion control strategies. We simply did not want the robot design to become overwhelmed by mechanical, electronic, control, etc., considerations to the extent that no useful research in high-level control would be feasible. We, of course, recognize the fact that an excessively simple robot might severely limit the range of possibilities for studying high-level issues. Taking these competing considerations into account, we have chosen the leg design shown in Fig. 2. This design was inspired by the leg design used in the Planetary Rover [13], in the sense that each joint of our robot is controlled independently. Apart from this similarity, there exist important differences between the two. The first two links of the leg shown in Fig. 2, link 1 and link 2, move horizontally in such a manner that they can be used to position the foot in a two-dimensional Cartesian space within a region whose shape and size are dictated by kinematic considerations. And link 3 is a ball-screw joint which translates the rotations of a motor shaft into linear motions for raising and lowering the foot. A particularly noteworthy feature of link 3 is the manner in which the motor for this link is mounted on a pair of rails. This mounting allows the motor to slide up or down as the lead screw attached to the shaft of the motor turns in the base marked in Fig. 2, for the purpose of lifting or dropping the foot. The electrical motors used for all the links on a leg are of the stepper variety. In comparison with DC servo motors, modern stepper motors permit easier software development as control can be accomplished in an open-loop mode. Of course, using a stepper motor in open-loop control compromises precision due to the lack of feedback. This can be rectified by the inclusion of the encoders shown in Fig. 2. While the robot is still in the development phase, we have chosen to mount all twelve motor drivers, three for each leg, off board. The motor drivers communicate with the host computer through an RS232 serial line. Eventually, all of this hardware, together with a microprocessor-based control board, will be mounted on top of the robot.
3. SLPMF Walking Gait
First a few definitions are in order. A leg will be called a supporting leg if its foot is in contact with the ground and exerts pressure on the ground. A leg that is lifted clear off the ground will be called a free leg. A foot placement will be considered reachable if it does not violate any of the mechanical and kinematic constraints on the kinematic chain of the associated leg. Shown in Fig. 3 are the reachability regions for all four legs. As is perhaps intuitively obvious, the three generic motions of a quadruped walking robot are: 1) the robot moves its body while the feet stay in contact with the ground - there is obviously a limit on the size of such motions; 2) the robot lifts one of the legs and moves the foot forward for a new placement while the body stays stationary; and 3) the robot lifts one of the legs and both the body and the free leg move forward. As discussed
Figure 3. Feasible regions for each leg of the quadruped robot: Region number corresponds to feasible region for a particular leg.
by McGhee and Frank [8], a cyclic gait may be synthesized by combining these generic motions. (We must hasten to add that for primitive forward motions of the creeping kind, as exhibited by toy animal robots for kids, it is not necessary to synthesize a gait using the generic motions we just listed; the primitive kinds of walks can be created by moving the forward two legs only incrementally, so that the c.o.g. of the robot is not disturbed overly, and by just sliding the rear legs forward as necessary.) We will now present the forward periodic walking gait reported by McGhee and Frank [8]. We will refer to this gait as the McGhee and Frank gait in subsequent discussions. In this gait, forward movement of the robot is attained by simultaneously moving the body and one of the legs that is free. For this to occur, all joints of the robot, except for joint 3 on each of the three supporting legs (this joint is connected to the link in touch with the ground), must be controlled. In Fig. 4, the corners of the polygons represent the feet and the polygon itself the support pattern of the robot. The figure shows the successive support patterns involved in the McGhee and Frank gait. Note that the concept of a support pattern, first put forward by Hildebrand [4], is a convenient way to show each step of a gait cycle. In the figure, the specified leg numbers represent the supporting legs, while a missing number between one and four indicates the free leg. The arrow shows the direction of the motion; its length is proportional to the distance traveled by the center of the body. If we set up our robot for the McGhee and Frank gait, we would need 9 actuator commands for each step that starts with a triangular support pattern in Fig. 4, and 8 actuator commands at each step starting with a four-sided polygon; therefore, a total of 68 commands would be needed over a gait cycle. This number of commands follows from the fact that when purely body motions are executed, that is, when a four-sided support pattern moves into another four-sided support pattern, we need coordinated motions at joints 1 and 2 for each of the four legs, requiring a total of 8 such commands. Taking into account that we have four such steps in the gait cycle shown in Fig. 4, we arrive at 32 actuation commands. And, when the robot motions in a gait cycle correspond to motions from a 3-sided to a 4-sided support pattern, we require 9 actuator commands; of these, six
Figure 4. Walking gait sequence proposed by McGhee and Frank: (a) shows an initial support pattern and forward robot body motion; (b) shows leg 3 off the ground and moving to a new location. Note that unlike in our case, as the free foot moves forward, the robot body also executes a forward motion simultaneously; (c) shows leg 3 positioned on the ground and the body moving forward to allow leg 2 to be the next free leg; in (d) leg 2 moves to a new location; (e) shows the intermediate step where leg 2 is now placed on the ground at a new location while, simultaneously, the body moves forward also; in (f) leg 4 is lifted off the ground and placed at a new location; in (g) leg 4 is again on the ground and the body moves forward subsequently; finally, in (h) leg 1 is lifted and placed at a new location which completes a gait cycle. An important point to note for this McGhee and Frank gait: the body moves forward during all steps of the gait.
Figure 5. The quadrants generated by Stability Admitting Lines. The quadrants are labeled Quad 1, Quad 2, Quad 3, and Quad 4. In (a) is shown a situation where all four quadrants exist, while (b) shows a case where only Quad 3 and Quad 4 exist.
will be for joints 1 and 2 of the three supporting legs and three for the coordinated movement of the three links of the free leg. Since we have four such 3-sided to 4-sided transitions in the gait cycle of Fig. 4, 36 actuation commands will be required. Therefore, we conclude that a total of 68 commands would be necessary to execute all the motions in a gait cycle. We now introduce a new gait control strategy that requires fewer actuation commands for the completion of one gait cycle. However, we must first introduce the idea of a Stability Admitting Line (SAL), a concept first promulgated by Hirose [5]. Formally,
Definition: Stability Admitting Lines (SALs). Lines in the ground plane that either connect the supporting feet 1 and 3 or the supporting feet 2 and 4 are called stability admitting lines.
The location of the c.o.g. of a robot with respect to the SALs can determine the next appropriate action the robot needs to take. Fig. 5 shows a typical leg configuration with the SALs labeled. It is clear that the SALs always divide the robot body into at most four different regions, labeled Quad 1, Quad 2, Quad 3, and Quad 4 in Fig. 5(a). As shown in Fig. 5(b), all of these Quad regions may not exist for certain configurations of the feet. For the example shown there, only Quad 3 and Quad 4 exist. We will now make the reasonable assumption that at all times the positions of the four feet are known with respect to the robot body coordinate frame centered at the c.o.g. of the robot. This means that the equations for the SALs can easily be constructed for those feet that are in contact with the ground. Once the SALs are found, the robot can readily determine which quadrant contains its c.o.g. If the c.o.g. is in Quad 1, the robot will not be able to use leg 1 or leg 2 as a free leg for the next motion, since otherwise the robot will lose balance and lurch forward. This problem can be remedied by first moving the robot body backwards while all the feet stay in contact with the ground. Such a motion of the body will shift the c.o.g. back into Quad 3. Subsequently, the robot could plan its forward motion again and use either leg 1 or leg
2 as a free leg. If, on the other hand, the c.o.g. is in Quad 4, it will not be possible to use leg 1 as a free leg for the next motion. Later, we will present strategies for dealing with each such situation.
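The quadrant test itself is a pair of point-versus-line side checks. The following hypothetical sketch (the foot coordinates and function names are ours, not the authors') builds the two SALs from the foot positions in the body frame and reports on which side of each the c.o.g. (the body-frame origin) lies:

```python
import numpy as np

def side_of_line(p, a, b):
    """+1 if point p lies left of the directed line a -> b, -1 if right, 0 if on it."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return int(np.sign(cross))

# Illustrative foot positions in the body frame (x right, y forward), legs 1..4.
feet = {1: (-0.2, 0.3), 2: (0.2, 0.3), 3: (0.2, -0.3), 4: (-0.2, -0.3)}
cog = (0.0, 0.0)

s1 = side_of_line(cog, feet[1], feet[3])   # SAL 1 connects feet 1 and 3
s2 = side_of_line(cog, feet[2], feet[4])   # SAL 2 connects feet 2 and 4

# The sign pair identifies the quadrant (Quad 1..4) that contains the c.o.g.
print(f"side of SAL 1: {s1}, side of SAL 2: {s2}")
```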
Figure 6. Initial configuration of the robot. Shown in (a) is the situation in which the robot c.o.g. lies in Quad 1; in (b) the robot c.o.g. lies in Quad 2.
We will now present our SLPMF gait, which is more efficient than the McGhee and Frank gait in the sense that it requires fewer actuation commands per gait cycle. While the manner in which we select the free leg for forward motions is the same as in [8], there are two important points of departure in how we synthesize a gait. First, the location of the c.o.g. in relation to the SALs plays a critical role in our work. And, second, unlike McGhee and Frank [8], we only allow the body to move forward when all four feet are in contact with the ground. When the free leg executes its motions, the body remains stationary. Although for identical joint speeds this means that a robot with our gait will move slower than a robot with the McGhee and Frank gait, we do not consider that a serious issue. As recent experience with systems with a large number of degrees of freedom has demonstrated, future progress will be determined more by whether we can discover simplified control strategies than by raw overall speed. In any case, it is always possible to use more powerful motors if speed is really an issue and, at the same time, benefit from our simplified gait control. Given all possible sequences of free legs and body movements, we first want to show that not all leg sequences can be used to generate an SLPMF walking gait. There are 24 different possible sequences in which free legs can be ordered; this follows from the number of permutations $P_r^n$ with $n = r = 4$. These sequences are of the form (1,2,3,4), (1,2,4,3), (1,3,4,2), etc., where the numbers identify the legs. Among these 24 sequences, only a subset, combined with the robot body movements, is permissible for an SLPMF walking gait. The number of permissible sequences is a function of the location of the robot c.o.g. As was mentioned earlier, if the c.o.g. is in Quad 1 of Fig. 5, it will not be possible for the robot to lift leg 1, implying that sequences beginning with
number 1 would not be permissible. We now present the proposed gait assuming that the robot c.o.g. lies initially in one of the two quadrants shown in Fig. 6. By symmetry, our discussion can be extended to other cases where the initial c.o.g. location is in Quad 2 or Quad 3. Let's first consider the case of the starting support pattern on the left in Fig. 6(a). Again, as we mentioned before, for this starting configuration the robot will not be able to lift either of the two front legs initially. Thus, only 12 possible sequences remain to be examined. Among these, only two sequences, (3,2,4,1) and (4,1,3,2), can be used, since the others will result in the robot either moving backward before it goes forward or in a non-periodic gait. We will first show the gait corresponding to one of the two permissible sequences, (3,2,4,1). Fig. 7 shows the steps involved in the SLPMF gait generated by the (3,2,4,1) sequence of free legs when the starting configuration is as shown in Fig. 6(a). A circle around a particular foot indicates the leg that will be free for the next step. The pre-computed sequence of free legs and body movements is 3 → 2 → move body forward → 4 → 1 → move body forward. In the first step, leg number 3 is selected as the free leg and moved forward such that the robot c.o.g. resides in Quad 3 or Quad 4 generated by the newly computed SAL 1. Note that only one SAL needs to be updated at a time. In the next step, leg number 2 is moved. At this point, the body needs to move such that the robot c.o.g. lies in Quad 1 or Quad 2. This allows the next free leg to be leg number 4, Fig. 7(d). During the next step, the free leg, leg number 4, needs to move forward such that the robot c.o.g. lies in Quad 2 or Quad 3 generated by the new SALs, Fig. 7(e). Once the next free leg, leg number 1, is moved, the robot body needs to move forward, Fig. 7(f), to restore the original body and leg configuration, which ends a gait cycle. Now consider the case in Fig. 6(b). Of the 24 sequences, those beginning with leg number 2 will not do, again because of stability considerations. Note, however, that the sequences that begin with leg 3 are now permissible even though it seems that lifting this leg would cause the robot to lose balance. The reason is that, for the case of a sequence beginning with leg 3, the robot can move its body first, shifting the c.o.g. forward into Quad 1. Discounting the impermissible sequences, we are left with three: (1,3,2,4), (3,4,1,2), and (4,1,3,2). In Fig. 8, we show the gait for the sequence (1,3,2,4). The gait can be described by the following leg and body movements: 1 → move body forward → 3 → 2 → move body forward → 4.
We now show one example of an impermissible sequence. Consider the free leg sequence (1,3,4,2). We can illustrate this case using Fig. 8. In this figure, the actions up to frame (d) are the same, provided that at the step shown in frame (d) leg 4 is the next free leg selected instead of leg number 2. At this point, the robot body needs to move forward to position its c.o.g. ahead of SAL 1 and into Quad 1; this allows leg 4 to be the next free leg. Observe, however, that once leg 4 is moved, wherever it may be, the next free leg cannot be leg 2, since that would mean the body of the robot would have to move backward. Of course, if we drop the requirement that the gait be periodic or monotonically forward, we can generate a forward gait which is the end result of side walking and moving the robot c.o.g. back and forth with any sequence of body and leg movements. This type of gait is important and in some cases necessary, but it cannot be the optimal straight-line forward gait in which we are interested at present.
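The first pruning step is easy to mechanize. The sketch below (ours) enumerates the 24 orderings and removes those beginning with a front leg, as required when the c.o.g. starts in Quad 1; the further periodicity and monotonicity arguments of the text, which reduce the survivors to (3,2,4,1) and (4,1,3,2), are not encoded here:

```python
from itertools import permutations

# With the c.o.g. in Quad 1 (Fig. 6(a)) neither front leg may be lifted first,
# which prunes the 24 free-leg orderings down to 12 candidates.
candidates = [seq for seq in permutations((1, 2, 3, 4)) if seq[0] not in (1, 2)]
print(len(candidates), candidates)
```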
Figure 7. For the starting support pattern shown in Fig. 6 (a), shown here are the different steps of the SLPMF walking gait. Shown in (a) is the initial robot configuration where leg 3 is about to move to a new location; (b) shows leg 3 at a new position and leg 2 selected for the next motion; (c) shows leg 2 at the new location and body ready to move in anticipation of the next free leg, leg 4, as shown in (d); (e) shows the robot configuration after leg 4 has executed its motion and the selection of leg 1 as the next free leg; (f) shows leg 1 at a new location and the body ready to move forward; (g) shows the robot body at its new location with the robot again in its initial configuration, signifying the completion of the gait cycle.
Figure 8. For the starting support pattern shown in Fig. 6(b), shown here are the different steps of the SLPMF walking gait. Shown in (a) is the initial robot configuration where leg 1 is about to move to a new location; (b) shows leg 1 at a new position and the body ready to move in anticipation of the next free leg, leg 3, as shown in (c); (d) shows the robot configuration after leg 3 has executed its motion and the selection of leg 2 as the next free leg; (e) shows leg 2 at a new location and the body ready to move forward; (f) shows the robot body at its new location; and, finally, (g) shows the robot again in its initial configuration, signifying the completion of the gait cycle.
We will now discuss the issue of foot placement. There are two ways of determining the precise locations for placing the feet during the execution of a gait. One may first establish the gait, as we have already done, and then, using a different set of considerations (such as the maximization of the forward movement during one gait cycle and the minimization of the deviations of the body c.o.g. from a straight-line motion), determine where the feet should be placed. This approach is theoretically attractive for obvious reasons but cannot be pursued at this time due to the nonlinearities involved in the minimization and the dimensionality of the underlying space. An opposite approach - the approach that
Figure 9. Tracks for foot placement in an SLPMF gait. Shown in (a) is the track we have adopted for our robot while (b) shows ideal tracks.
we have adopted because it is simpler - is to assume ab initio that the left and the right feet of the robot will always be placed on two parallel tracks, as shown in Fig. 9(a), ensuring that the robot body will travel along a straight line, and then to determine the extent of each free leg motion for the maximization of forward progress. This raises the question of what distance to choose for the separation between the two tracks. Intuitively, and we believe on theoretical grounds also, it would appear that the ideal placement of the two tracks would be as shown in Fig. 9(b), where the left track passes through the
points where the left legs are anchored to the body, and the same for the right track with respect to the right legs. However, such tracks are not practical because of the hardware that is mounted on the joints between the different sections of each leg; there should be no collisions between this hardware and the main body of the robot. For this reason, the tracks that we have chosen, shown in Fig. 9(a), are offset from those in Fig. 9(b) by an extent that permits the joints to move freely without running into the robot body. With the specification of the tracks, the only remaining factor that decides the placement of the feet is the sequence chosen for the selection of the free legs during a gait. Currently, the sequence that yields the largest distance forward during one gait cycle is chosen by an algorithm that is run off-line. As the reader will recall, the robot executes only two different kinds of motions: either the body moves forward while all the feet stay in contact with the ground, or a free leg moves forward. The previous paragraph addressed the issue of foot placement for the free leg. With regard to the motion of the body, its necessity is determined solely by the location of the c.o.g. with respect to the intersection of the Stability Admitting Lines (SALs). Again, at the risk of repeating ourselves, suppose the robot next wants to declare leg 3 free and move it forward but the c.o.g. is in Quad 3, as depicted in Fig. 8(b); the robot will then have to first move its body forward until the c.o.g. lies in Quad 1. In order to calculate the displacement for the body, the important factor to bear in mind is that this distance should exceed the minimum necessary for the new SALs to allow the next free leg to be lifted without causing instability of the robot. If the robot is allowed unhindered travel forward, a simple solution to this problem is to move the body forward by the maximum extent possible; this extent is a function of the mechanical limits of the robot.
Figure 10. Minimum Distance the body needs to move forward to ensure stability during the motions to be executed by the next free leg is obtained from the geometry of the SALs.
On the other hand, for adaptive gaits (such as when the robot must adjust its forward progress in order to either come to a stop at a particular location, or carry out collision avoidance, or veer or turn to one side or the other, etc.), one may wish to calculate a specific distance for the body motion. The minimum distance, denoted $d_y$, which should be exceeded by the motion of the body, is calculated using the SALs as shown in Fig. 10. Recall that we keep track of all feet positions with respect to the robot body frame. Thus, we know where the feet of leg 1 and leg 3 are with respect to the robot body origin, which means we can compute equations for the lines in Fig. 10 in the robot body coordinate frame. All that is left is finding the intersection point of SAL 2 with the y axis of the robot body coordinate frame. This is the minimal distance the body needs to move for the robot to stay stable during the next move if leg 3 is selected as the next free leg.
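In the body frame this reduces to intersecting one line with the y axis. A small sketch, with invented foot coordinates:

```python
def min_forward_distance(foot_a, foot_b):
    """y coordinate at which the line through foot_a and foot_b crosses x = 0."""
    (xa, ya), (xb, yb) = foot_a, foot_b
    if xa == xb:                  # SAL parallel to the y axis: no crossing
        raise ValueError("SAL never crosses the body y axis")
    t = -xa / (xb - xa)           # line parameter of the x = 0 crossing
    return ya + t * (yb - ya)

# Illustrative feet of legs 2 and 4 in the body frame (meters).
d_y = min_forward_distance((0.18, 0.25), (-0.22, -0.28))
print(f"the body must advance by more than d_y = {d_y:.3f} m")
```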
4. Experimental Results
In this section, we present experimental results of the proposed gait implemented on our robot named ROACH. We will show how the robot, shown in Fig. 1, walks forward using the stability admitting lines. The initial and final locations of the robot center of gravity lie in Quad 1; see Fig. 6(a). The steps performed by the robot shown in Fig. 11 correspond to the gait cycle illustrated in Fig. 7. Frame (a) shows the initial robot configuration, in which the robot center of gravity resides in Quad 1. Frame (b) shows leg 3 being selected and moved forward as the first free leg, taking advantage of the location of the robot c.o.g. The new foot location for the free leg is selected such that the resulting SAL 1 forces the robot c.o.g. to lie in Quad 3. Leg 2 is then selected as the next free leg and moved forward to a new location, as shown in frame (c). Subsequently, the robot moves its body forward, which allows leg 4 to be the next free leg to be moved, frame (d). Frame (e) shows leg 4 after it has positioned its foot at a new location, where the newly generated SAL 2 allows leg 1 to be the next free leg. Frame (f) shows the robot in the process of lowering its leg 1 foot to a new position. Finally, the robot body moves forward to transfer the robot c.o.g. back to Quad 1, completing a gait cycle, frame (g). During each cycle of the gait, the robot can advance up to 8 cm. This value is a function of the parameters of the mechanical design of the robot, parameters such as the length of each link, the robot body shape, the location of each of the joints, etc.
5. Conclusion
In this paper, we introduced a new mechanical leg design for a quadruped walking robot. We proposed a new simplified straight line, periodic, monotonically forward (SLPMF) walking gait for such robots. We showed that these types of gaits can be controlled by using the notion of Stability Admitting Lines. The geometry of these lines, always available to the controller, permits the determination of whether or not the selected free leg can move forward without jeopardizing the stability of the robot. And, if the stability of the robot would be violated, these lines also permit the determination of the extent to which the body should move forward so as to guarantee stability during the next motion.
Figure 11. For the starting support pattern shown in Fig. 6 (a), shown here are the experimental results for a sequence of steps involved in the proposed walking gait.
REFERENCES
1. J. Bares and W. Whittaker, "Walking robot with a circulating gait," In IEEE International Workshop on Intelligent Robots and Systems, 1990.
2. R.A. Brooks, "A robust layered control system for a mobile robot," IEEE Trans. on Robotics and Automation, vol. 2(1), 1986.
3. B.S. Choi and S.M. Song, "Fully automated obstacle-crossing gaits for walking machines," In IEEE Int. Conf. Robotics and Automation, pp. 802-807, 1988.
4. M. Hildebrand, "Symmetrical gaits of horses," Science, vol. 150, pp. 701-708, 1965.
5. S. Hirose, "A study of design and control of a quadruped walking vehicle," Int. J. Robotics Research, vol. 3(2), 1984.
6. S. Hirose and Y. Umetani, "The basic motion regulation system for a quadruped walking vehicle," In Trans. ASME, DET-34, 1980.
7. A. Kosaka and A.C. Kak, "Fast vision-guided mobile robot navigation using model-based reasoning and prediction of uncertainties," In CVGIP: Image Understanding, vol. 52, pp. 271-329, November 1992.
8. R.B. McGhee and A.A. Frank, "On the Stability Properties of Quadruped Creeping Gaits," In Mathematical Biosciences, vol. 3, pp. 331-351, 1968.
9. M. Meng and A.C. Kak, "Mobile Robot Navigation using Neural Networks and Nonmetrical Environment Models," In IEEE Control Systems, pp. 30-39, October 1993.
10. R.S. Mosher, "Test and Evaluation of a Versatile Walking Truck," In Proceedings of the Off-Road Mobility Symposium, pp. 359-378, Washington, D.C., 1968.
11. A. Ooka, et al., "Intelligent Robot System II," In 2nd Robotics Research, 1984, MIT Press.
12. M. Sekiguchi, S. Nagata, and K. Asakawa, "Mobile robot control by a structured hierarchical neural network," Control Systems, vol. 10(3), 1990.
13. R. Simmons and E. Krotkov, "An integrated walking system for the Ambler Planetary Rover," In IEEE Int. Conf. Robotics and Automation, April 1991.
14. S. Song and Y. Chen, "A free gait algorithm for quadrupedal walking machines," J. Terramechanics, vol. 28(1), pp. 33-48, 1991.
15. K. Waldron et al., "Configuration design of the adaptive suspension vehicle," Int. J. Robotics Research, vol. 3(2), 1984.
The Six-Legged TUM Walking Robot
H.-J. Weidemann, F. Pfeiffer
TU Muenchen, Lehrstuhl B für Mechanik, Postfach 202420, D-80333 Muenchen, Germany
Abstract
The paper presents a survey of the six-legged walking robot designed and constructed at the Institute of Mechanics at the Technical University of Munich. The robot was designed in 1993 with special regard for principles found in the natural walking of hexapods, which walk and climb very efficiently over arbitrarily irregular surfaces. The investigated insect and its main features are presented. A description of the mechanical hardware and electronics used in the robot is included.
Fig. 1: The six-legged TUM Walking Robot
1 Introduction
In earlier publications detailed studies on the kinematics and dynamics of six-legged insects (the Walking Stick Insect Carausius morosus) have been presented [7] [8] [9] [11] - [14]. Our research activities in the field of the dynamics of walking insects are based on experimental results from biologists [1] [4] [5] [6]. In order to verify the elaborated theories and to further investigate autonomous locomotion of legged robots, we constructed a six-legged version as shown in Fig. 1. The geometrical layout of the joint axes therefore resembles the configuration found in the Walking Stick Insect. A major problem in
designing walking robots is represented by the weight-to-power ratio of the actuators. In this paper we will present a solution which enables each single leg of the robot to lift about six times its own weight.
Fig. 2: The Walking Stick Insect Carausius morosus
2 The Walking Stick Insect
Leg Geometry
A detailed description of the kinematic aspects of this walking insect is given in [12] [8] [9]. Since its gait movement appears regular and harmonic, this insect is suitable for legged locomotion research. Furthermore, its relatively big size of about 70 mm enables biologists to measure joint angles and surface contact forces during walking. As is typical for insects, the legs consist of three segments. In nature the first joint axis is fixed to the central body with an oblique orientation (Fig. 2 and Fig. 3). This obliqueness has an essential influence on the performance of the legs concerning different types of tasks such as gravity support, forward thrust or tactile sensing tasks. Hind legs, which are mainly used to generate forward thrust, show completely different angles φ, ψ than front legs, which are also used as tactile sensors. The Stick Insect is able to change these parameters according to the terrain or its desired motion by rotating the complete leg apparatus around the δ-axis and thereby changing the leg parameters φ, ψ.
Fig. 3: The kinematics of an insect leg and its corresponding mechanical model
This angle will be changed by the insect when performing different tasks such as climbing on branches. In this approach we investigate constant but different sets φ, ψ for the front, middle and hind legs. It is shown in [7] [13] that this axis orientation has an essential influence on the resulting force/torque distribution during walking. As we found, nature apparently optimized the legs of this insect to be subjected to a minimum of bending load only. The legs of the walking robot in Fig. 1 are mounted to the central body with a set of angles identical to those of the insect when walking over an even, flat surface. The additional joint angle δ thereby remains constant. The leg attachment is designed in a manner such that different sets of angles φ, ψ may be studied.
Leg Control in the Walking Stick Insect
In nature a complex network of motor neurons activates and deactivates the muscles in the moving apparatus of the investigated insect. Motivated by detailed research results of biologists, we approximate the controlling neurons of the first leg joints by a set of three-step controllers. We divide a regular leg cycle into the sections STANCE, PROTRACT, SWING and RETRACT. A special RESWING phase is executed whenever a leg hits an obstacle, in order to place the foot some centimeters behind the obstacle. In this case, an additional PROTRACT phase followed by a SWING phase will be performed. During the STANCE phase the leg is in an active support phase; SWING marks the return movement of the leg to the next ground point. PROTRACT and RETRACT denote the high-acceleration transition areas from status STANCE to SWING or vice versa, respectively. We furthermore demand piecewise constant angular accelerations which are switched at the anterior extreme position (AEP) and the posterior extreme position (PEP). Fig. 4 shows acceleration versus angle and the corresponding phase portrait of the swing movement of the leg plane.
Fig. 4: The Three-Step Controller for the swing movement of the leg plane
Leg Phase Coordination
For the global leg coordination only the AEP and PEP threshold values are modified during walking. There is no central supervision; the coordination of the leg movements is done by control influences exchanged among the legs themselves. Each leg acts as a sender controlling its neighbouring legs by shifting the AEP and PEP thresholds of these legs by small amounts ΔAEP and ΔPEP, respectively. The control influences used in our approach have been measured and isolated by neurobiologists [4], [5], [6]. The control routines driving the presented walking robot at the moment include three of the eight control mechanisms described in full detail in [13]. A ninth influence changes the global walking speed in dependence on the individual leg phases: each leg is able to reduce the global walking speed. The global walking speed is indicated by $\dot\alpha$ during the STANCE phase and will be decreased by a leg when it is touching obstacles or when the static stability is endangered. This, again, is not conducted by a central controller; each individual leg may change this parameter depending on its own and its neighbouring legs' states.
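The following toy simulation (our own sketch, with invented thresholds and rates rather than the measured biological values) illustrates the mechanism: each leg runs its own STANCE/SWING cycle, and a leg entering SWING shifts the PEP thresholds of its neighbours so that they postpone their own lift-off:

```python
N_LEGS = 6
AEP0, PEP0 = 0.3, -0.3          # nominal leg-plane angle thresholds (rad)
D_PEP = 0.05                    # shift a sender applies to its neighbours

alpha = [0.3 * i / N_LEGS - 0.15 for i in range(N_LEGS)]  # staggered phases
state = ["STANCE"] * N_LEGS
pep = [PEP0] * N_LEGS

def neighbours(i):
    return [(i - 1) % N_LEGS, (i + 1) % N_LEGS]

for tick in range(2000):        # one tick stands for one 4 ms control slice
    for i in range(N_LEGS):
        if state[i] == "STANCE":
            alpha[i] -= 0.001   # body moves forward, leg rotates backwards
            if alpha[i] <= pep[i]:          # posterior extreme position reached
                state[i] = "SWING"
                for j in neighbours(i):     # shift the neighbours' PEP thresholds
                    pep[j] = PEP0 - D_PEP
        else:
            alpha[i] += 0.003   # SWING: return movement towards the AEP
            if alpha[i] >= AEP0:
                state[i] = "STANCE"
                pep[i] = PEP0   # restore own threshold after touch-down

print(state)
```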
3 Design of the Walking Robot
Control Design
It is a well-known fact that insects are capable of efficient walking even in very rough terrain or even with high loads. Their controllers show a very high robustness. Recently, some research groups have used controllers for walking machines which are inspired by insect-like, neurobiological control systems [3], [2]. In our approach we use the research data collected by neurobiologists concerning the Walking Stick Insect [1], [4], [5], [6]. We establish a Leg Coordination System which executes the above-mentioned rules, which have actually been detected by measuring nerve signals inside this insect. We added a finite-state controller for each leg (Single Leg Controller, SLC). This single leg controller is able to detect and surpass obstacles by monitoring the bending loads in the outer leg segments. In addition, body height control and the detection/correction of slippage effects are done by the SLC. Finally, a fast on-line path planning algorithm and PID controllers in each joint form the lowest level of our three-level controller arrangement. Figure 5 shows an overview of the control system designed for walking robots.
AEP: Anterior Extreme Position. PEP: Posterior Extreme Position. SLC: Single Leg Controller. LCM: Leg Coordination Module.
Fig. 5: Control Design for a walking robot.
Single Leg Controller
Since in the real world a walking machine will encounter obstacles and/or slippery walking ground, we added special mechanisms to the presented three-step single leg controller. The capability of obstacle avoidance is achieved by means of a special detection mechanism and a different approach to general path planning. The enhanced Single Leg Controller (SLC) is shown in Fig. 6. During the SWING phase the SLC monitors the bending load in the leg segments. Whenever the corresponding strain gauge signal exceeds a certain threshold value, the obstacle avoidance mechanism is activated: a short RESWING phase is executed, followed by a new SWING phase trying to pass the obstacle. The path planning algorithm for the three leg angles α, β, γ thereby differs from standard path planning used in robotics. Usually, end effector trajectories are described by time histories of work space or configuration space coordinates. In our approach we describe the dependency of the outer joint coordinates β, γ in terms of the leg angle coordinate α. We demand
β, γ to obey the functions β = f(α) and γ = g(α). Physically, we thereby describe the geometrical shape of the trajectory but not its velocity and acceleration profile. Since the SLC monitors the α-movement of the leg plane, β and γ are cascade-controlled by evaluating their desired values in dependency of the α-value:
$$\beta = f(\alpha), \quad \gamma = g(\alpha) \qquad\Longrightarrow\qquad \dot\beta = \frac{\partial f}{\partial \alpha}\,\dot\alpha, \quad \dot\gamma = \frac{\partial g}{\partial \alpha}\,\dot\alpha$$
Since these evaluations have to be made online and therefore in real time, we demand the foot trajectories to be composed of segments of simple polynomial functions. The so derived trajectories are in good accordance with foot trajectories observed and measured in walking insects. The path planning algorithm is executed at the beginning of a SWING phase within the 4 ms time slice of a hardware-interrupt-driven control architecture. Similarly, foot slippage is detected by monitoring the β angle and its velocity; if necessary a RESWING phase is executed and the foot is moved to a new, legal STANCE position.
Fig. 6: Enhanced Single Leg Controller
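A sketch of this α-parameterized cascade (with segment boundaries and polynomial coefficients invented for illustration) could look as follows: given the measured α and its rate, the desired outer-joint values and rates fall out of the stored polynomials alone:

```python
import numpy as np

segments = [  # (alpha_start, alpha_end, f for beta, g for gamma)
    (-0.3, 0.0, np.poly1d([1.2, 0.4]), np.poly1d([-0.8, 0.9])),
    (0.0, 0.3, np.poly1d([-1.2, 0.4]), np.poly1d([0.8, 0.9])),
]

def outer_joint_setpoints(alpha, alpha_dot):
    """Cascade targets beta = f(alpha), gamma = g(alpha) and their rates."""
    for a0, a1, f, g in segments:
        if a0 <= alpha <= a1:
            beta_dot = f.deriv()(alpha) * alpha_dot    # d(beta)/dt = f'(alpha)*alpha_dot
            gamma_dot = g.deriv()(alpha) * alpha_dot
            return f(alpha), beta_dot, g(alpha), gamma_dot
    raise ValueError("alpha outside the planned swing range")

print(outer_joint_setpoints(alpha=0.1, alpha_dot=0.5))
```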
Actuator Design
One of the major problems in designing walking machines is represented by the power-to-weight ratio of the actuators. The underlying geometrical design of the joint axes requires high active torques, especially in the β-joints during walking. We therefore developed a special motor-gear combination as depicted in Figure 7. The driven casing of the selected Harmonic Drive gear also functions as the bearing of the second leg segment. The actuators (neodymium-boron DC motors) for the α- and β-joints are mounted inside a lightweight casing which represents the first leg segment. The weight of this leg design is about 2.8 kg. Since the maximum torque output of the β-joints is about 60 Nm, the total robot weight of 23 kg is well managed by six legs. During the STANCE phase one of the middle legs has to lift about 11 kg.
Fig. 7: The design of the powerful β-joint actuator
On-board Electronics
The execution of the control routines is done by six on-board computers. A SIEMENS SAB 80C166 microcontroller serves as the central processing unit in each of the six leg computers. Together with the necessary address decoders and EPROM chips, the complete 80C166 computer fits on a 50×80 mm card. The microcontroller is provided with a 10-channel 10-bit A/D converter and 2 serial channels with a data transfer rate of up to 612 kB/s. Furthermore, 16 CAPCOM output lines, which serve as individual capture-compare units, produce the pulses required for the pulse-width modulation of the output motor currents. This card is back-packed on a standard 100×160 mm 'Euro' card on which the amplifier components for the pulse-width modulation of the three motor currents, together with the remaining logic chips for the preparation of the sensor signals, are mounted. Thus, a completely autonomous computer unit, including power amplifiers which manage up to 10 Ampere / 32 Volts per motor and including sensor electronics, fits on one single 100×160 mm card. Six of these units are mounted on the central body of the robot as depicted in Fig. 8. Each unit is provided with eight sensor signals. The three servomotors per leg are provided with a DC tachometer. The angular position itself is measured by precision potentiometers; together with the A/D conversion, a resolution of 10 bit per 330° rotation angle is achieved. The contact force on the ground is measured indirectly via strain gauges which are applied to the upper ends of the leg segments. Finally, a contact sensor zeroes a binary TTL signal when the leg touches the ground.
Fig. 8: On-board Computers and Signal Flow
The control software is stored in twelve 32 kByte EPROM chips; it is cross-compiled from high-level C language. The communication between the six units is based on a serial 625 kBaud link. A data block consisting of 7 bytes with 9 bits each is passed around ten times a second. It contains all the information which is exchanged between the legs.
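A quick sanity check (our own arithmetic, not from the paper) confirms that this traffic is negligible compared with the link capacity:

```python
# 7 bytes of 9 bits, circulated ten times per second, against a 625 kBaud link.
bits_per_block = 7 * 9
used = bits_per_block * 10                 # 630 bit/s of payload
print(f"{used} bit/s of 625000 baud used ({used / 625_000:.4%})")
```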
4 Summary
The presented six-legged robot has been designed and constructed at the Institute B of Mechanics of the Technical University in Munich. Its main features are an insect-like leg design and a control structure which has no central supervisor but instead six individual leg controllers. The leg controllers communicate with each other; they establish a leg phase coordination by mutually shifting the threshold values of their α-joint angles. The robot will be used for detailed studies of legged locomotion control.
5 References
[1] BÄSSLER, U.: Neural Basis of Elementary Behaviour in Stick Insects, Springer, Berlin, 1983.
[2] BROOKS, R.A.: A robot that walks: Emergent behaviors from a carefully evolved network, Neural Computation, vol. 1, pp. 253-262, 1989.
[3] CHIEL, H.J., BEER, R.D., QUINN, R.D., ESPENSCHIED, K.S.: Robustness of a Distributed Neural Network Controller for Locomotion in a Hexapod Robot, IEEE Transactions on Robotics and Automation, Vol. 8, No. 3, June 1992.
[4] CRUSE, H.: The Function of the Legs in the Free Walking Stick Insect, Carausius morosus, Journal of Comparative Physiology 112, 1976.
[5] CRUSE, H.: What mechanisms coordinate leg movement in walking arthropods?, Trends in Neurosciences 13, 15-21, 1990.
[6] CRUSE, H., DEAN, J., MÜLLER, W., SCHMITZ, J.: The Stick Insect as a Walking Robot, Proc. Fifth Int. Conf. on Adv. Robotics, 'Robots in Unstructured Environment', pp. 936-940, Pisa, Italy, June 19-22, 1991.
[7] ELTZE, J., WEIDEMANN, H.-J., PFEIFFER, F.: Design of Walking Machines using Biological Principles. Proc. of the IFToMM-jc Intl. Symposium on Theory of Machines and Mechanisms, Nagoya, Japan, Sept. 24-26, 1992, Vol. 2, pp. 689-694.
[8] PFEIFFER, F.; WEIDEMANN, H.-J.; DANOWSKI, P.: Dynamics of the Walking Stick Insect, Proc. of the 1990 IEEE International Conference on Robotics and Automation, Cincinnati, Ohio, USA, May 13-18, 1990, Vol. 3, pp. 1458-1463.
[9] PFEIFFER, F.; WEIDEMANN, H.-J.: Walking of a Stick Insect, Theory and Practice, Proceedings of the VIIIth CISM-IFToMM Symposium on Theory and Practice of Robots and Manipulators, July 2-6, 1990, Cracow, Poland.
[10] PFEIFFER, F.; WEIDEMANN, H.-J.; DANOWSKI, P.: Dynamics of the Walking Stick Insect, IEEE Control Systems Magazine, Vol. 11, No. 2, Feb. 1991, pp. 9-13.
[11] WEIDEMANN, H.-J.; PFEIFFER, F.: Dynamics and Control of 6-legged Walking Machines. Proc. of the 9th CISM-IFToMM Intl. Symposium on Theory and Practice of Robots and Manipulators, Udine, Italy, Sept. 1-4, 1992 (to appear).
[12] WEIDEMANN, H.-J.: Dynamik und Regelung von sechsbeinigen Robotern und natürlichen Hexapoden. Dissertation, Technische Universitaet Muenchen, Juli 1993. Published as VDI-Fortschritt-Bericht No. 362, Series 8, VDI-Verlag, Düsseldorf, Germany.
[13] WEIDEMANN, H.-J.; ELTZE, J., PFEIFFER, F.: Leg Design using Biological Principles. Proc. of the 1993 IEEE Conference on Robotics and Automation, May 2-7, 1993, Atlanta, Georgia, USA (to appear).
[14] WEIDEMANN, H.-J.; ELTZE, J., PFEIFFER, F.: A Design Concept for Legged Robots derived from the Walking Stick Insect. Proc. of the 1993 International Conference on Intelligent Robots and Systems, IROS 93, July 26-30, 1993, Yokohama, Japan.
Vision-Based Adaptive and Interactive Behaviors in Mechanical Animals Using the Remote-Brained Approach
Masayuki Inaba, Satoshi Kagami, Tatsuya Ishikawa, Fumio Kanehiro, Koji Takeda, Hirochika Inoue
Department of Mechano-Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, JAPAN
We present a variety of vision-based adaptive and interactive behaviors in mechanical animals. The mechanical animal is a multi-legged robot designed as a remote-brained robot, which does not carry its brain within the body. It leaves the brain in the mother environment and talks with it by radio links. The brain is raised in the mother environment, inherited over generations. The key idea of the remote-brained approach is that of interfacing intelligent software systems with real robot bodies through wireless technology. In this framework the robot system can have a powerful vision system in the brain environment. We have applied this approach toward the formation of vision-based dynamic and intelligent behaviors in mechanical animals such as a doglike robot and an apelike robot. In this paper we introduce the remote-brained approach and describe some remote-brained robots and visual processes for adaptive and interactive behaviors with them.
1. INTRODUCTION
Although vision is one of the most important sensing functions of a robot, it is hard to build a robot with a powerful vision system in its own body because of the size and power requirements of a vision system. If we want to advance research on vision-based robot behaviors requiring dynamic reactions and intelligent reasoning based on experience, the robot body has to be lightweight enough to react quickly and have many DOFs, to enable it to show a variety of intelligent behaviors. In the history of robotics research, many vision-based robot systems have been presented. One of the typical vision-based robots is the hand-eye system [1-3]. The role of vision in hand-eye systems includes object recognition, visual verification of arm motion, and visual monitoring of the environment. Most hand-eye systems have tried to perform intelligent robot behaviors based on visual feedback control. As hardware technology has progressed recently, dynamic behaviors based on vision have also been demonstrated with advanced dynamic control of arms [4]. Another typical vision-based robot is the mobile robot [5-7]. The tasks for a mobile robot with vision are navigation and map building. As a mobile robot with vision can be a tool for computer vision, especially active vision, many approaches to implement vision
Figure 1. Structures for Connecting Perception to Action
systems and techniques have been presented. As for legged robots [8-11], there is only a little research on vision-based behaviors [12]. The difficulties in advancing experimental research on vision-based legged robots are caused by the limitations of the vision hardware. It is hard to keep developing advanced vision software on limited hardware. In dynamic active vision research [13-16], such legged walking robots can also be a good tool to prove the robustness of visual processes. The technology to build legged robots with arms and vision will advance experiments in active and purposive vision [17], with robots working as visual agents observing the real world with intention. In order to solve these problems and advance the study of such active vision-based behaviors, we have adopted a new approach through building remote-brained robots. The body and the brain are connected by wireless links. In order to perform vision-based behaviors, wireless cameras and remote-controlled actuators make the robot body physically separate from the brain environment. As the robot body does not need computers onboard, it becomes easier to build a lightweight body with many DOFs. In this paper, we describe the research framework and our approach for vision-based robotics through building mechanical animals. Several experiments with vision-based interactive and adaptive behaviors are presented.
2. THE REMOTE-BRAINED APPROACH
Robotics is the intelligent connection of perception to action [18]. Several approaches have been presented; we summarize them by grouping them into the three models shown in Figure 1.

(A) Perfect Plan and Feed-Forward Approach. This is a classical approach aiming at an algorithmic and control-oriented theory. When a task and a model of a problem are provided, a robot in this approach tries to generate plans and methods to execute the task. This requires a full description of the model and a clearly defined specification of the task. In order to generalize the capability of the robot, the representation of the model and task is investigated [3]. However, it is hard to build a clear representation which treats the dynamic world.

(B) Attention Selection and Feedback-Loop Approach. This was a typical approach in sensor-based robotics, which locates intelligence in adaptivity to the real world and instructability by a human. A robot in this approach has a behavior to request a goal and solve it with sensory feedback control. A goal is achieved by verification based on sensors and by generating operations that fill the gap between the goal and the real situation. In the verification stage, the robot controls its attention based on the degree of goal achievement. The selected region of attention navigates the next stage of actions. Such attention mechanisms control the depth to which the robot thinks in order to solve the goal.

(C) Action Arbitration and Parallel Connection Approach. This was originated by Brooks, who showed the importance of parallelism in the connection from perception to action [19]. The subsumption architecture consists of processes of layered behaviors and arbitration mechanisms in sensor-based action. In this architecture, as the path from perception to action is connected by parallel flows in time, the system has to arbitrate among actions activated in parallel.

The parallelism in the connection of perception to action (C) is a key for the run-time controller of a robot to perform adaptive behaviors in the face of dynamic reactions from the real environment. The idea of a feedback loop (B) in pursuit of a goal-oriented path through attention control is also required when a human passes a goal to the robot. At higher levels of interaction with a human, a robot requires functionality like (A) to infer human behaviors and reason about the goal the human gives. The remote-brained approach is proposed as a way to investigate robot architectures with such broad capability for future robots. It tries to show a way to integrate them for both adaptive and interactive behaviors with real examples. The connection link between the body and the brain defines the interface between software and hardware. Bodies are designed to suit each research project and task. This enables us to advance by performing research with a variety of real robot systems [20]. The main advantages of the remote-brained approach are:
1. Powerful Brain with Vision. A major advantage of remote-brained robots is that the robot can have a large and heavy brain based on super-parallel computers. Although hardware technology for vision has advanced and produced powerful compact vision systems, the size of the hardware is still large. The wireless connection between the camera and the vision processor has been an important research tool. The remote-brained approach allows us to progress in the study of a variety of experimental issues in vision-based robotics.
2. Lightweight Body. Another advantage of the remote-brained approach is that the robot bodies can be lightweight. This opens up the possibility of working with legged mobile robots. As
Figure 2. Hardware Configuration of the Remote-Brained System
with animals, if a robot has four limbs it can walk. We are focusing on the vision-based adaptive behaviors of four-limbed robots, mechanical animals, experimenting in a field as yet not much studied.
3. Evolving Mother Environment. The brain is raised in the mother environment and inherited over generations. The brain and the mother environment can be shared with newly designed robots. A developer using the environment can concentrate on the functional design of a brain. For robots whose brain is raised in a mother environment, the brain can benefit directly from the mother's 'evolution', meaning that the software gains power easily when the mother is upgraded to a more powerful computer.

In the remote-brained approach the design and the performance of the interface between brain and body are the key. Our current implementation adopts a fully remote-brained approach, which means the body has no computer onboard. Figure 2 shows the hardware configuration of the remote-brained system, which consists of the brain base, the robot body and the brain-body interface. Our current system consists of three blocks of vision subsystems and the motion control system. A block can receive video signals from cameras on robot bodies. Two of the vision subsystems are parallel sets, each consisting of eight vision boards. A body has just a receiver for motion instruction signals and a transmitter for sensor signals. The sensor information is transmitted by a video transmitter. It is possible to transmit other sensor information, such as touch and servo error, through the video transmitter by integrating the signals into a video image [21]. The actuator is a geared module which includes an analog servo circuit and receives a position reference value from the motion receiver. The motion control subsystem can handle up to 104 actuators through 13 wave bands and sends reference values to all the actuators every 20 ms.
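To make the timing concrete, the sketch below illustrates one way such a motion-control subsystem could pack position references and broadcast them on a fixed 20 ms cycle. It is a minimal illustration, not the authors' implementation; only the band count, the eight references per band, and the 20 ms period come from the text, and all names (`send_band`, `control_cycle`) are assumptions.

```python
import time

NUM_BANDS = 13          # radio wave bands available (from the text)
REFS_PER_BAND = 8       # each control signal encodes eight reference values
CYCLE_S = 0.020         # reference values are sent every 20 ms

def send_band(band: int, refs: list[float]) -> None:
    """Hypothetical radio transmitter hook: broadcast one band's references."""
    pass  # hardware-specific; stand-in only

def control_cycle(all_refs: list[float]) -> None:
    """Broadcast up to 13 * 8 = 104 position references in one cycle."""
    assert len(all_refs) <= NUM_BANDS * REFS_PER_BAND
    for band in range(NUM_BANDS):
        chunk = all_refs[band * REFS_PER_BAND:(band + 1) * REFS_PER_BAND]
        if chunk:
            send_band(band, chunk)

def run(refs_source) -> None:
    """Fixed-rate loop: fetch the current references and broadcast them."""
    next_deadline = time.monotonic()
    while True:
        control_cycle(refs_source())
        next_deadline += CYCLE_S
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```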
3. THE TRACKING VISION
Vision for mechanical animals requires powerful functions for dynamic visual processing. Our approach is based on a visual tracking function in order to keep observing the moving scene. We have developed a tracking vision board using a correlation chip [22]. This vision board consists of a transputer augmented with a special LSI chip (MEP [23]: Motion Estimation Processor) which performs local image block matching.
Figure 3. The Function of a Motion Estimation Processor
Figure 3 shows the functions of the MEP chip. The inputs to the MEP are an image as a reference block and an image as a search window. The size of the reference block is up to 16 by 16 pixels. The size of the search window depends on the size of the reference block, usually up to 32 by 32 pixels so that it can include 16 × 16 possible matches. The processor calculates 256 values of the SAD (sum of absolute differences) between the reference block and the 256 blocks in the search window and also finds the best matching block, that is, the one with the minimum SAD value. Our vision system provides the following visual functions.

1. Software Window. In our current implementation, the main processor (a transputer) sends the reference block and the search window images to the MEP from the input frame memory. Thus we can select the pixels for the reference block and the search window from the frame memory in software, which provides a flexible spatial sampling rate for the block.

2. Rotational and Scalable Block Matching. With this configuration, we can implement several kinds of template matching, such as scalable and rotatable matching, in software. One calculation to find the best-matching block takes about 1.5 ms. We can thus perform about 20 block matchings in a video sampling time (33.3 ms).

3. Motion Tracking. When the vision system tracks an image, it applies the block matching process with the reference block stored in the initial frame (time t = t0) and a search window selected from the current frame (t = ti), centered around the last tracking position. This allows the vision system to follow the initial template image using this local searching method.

4. Motion Detection. When the vision system detects flow vectors in an image, it samples both the reference block (t = ti) and the search window images (t = ti+1) continuously. In our system, we can get a thinned 256 × 256 image through the transputer to reduce the influence of image sampling error. As one visual processing unit can calculate 32 flow vectors, an eight-processor system can generate 256 optical flow vectors at video rate.

5. Edge Detection. As another, static image processing function, we employ a line segment detector in a window, implemented by means of the Hough transform. The detector provides an effective boundary verifier and tracker by controlling the search area effectively in direction and distance.
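The MEP's core operation is simple enough to state in a few lines. The sketch below reproduces exhaustive SAD block matching in plain numpy, purely to illustrate what the chip computes in hardware; the array shapes follow the figures quoted above, the function name is mine, and the real device does this in about 1.5 ms rather than in interpreted Python.

```python
import numpy as np

def sad_match(reference: np.ndarray, window: np.ndarray) -> tuple[int, int]:
    """Exhaustive SAD block matching, as the MEP performs in hardware.

    reference: (16, 16) template block.
    window:    (32, 32) search window; offsets 0..15 in each axis give
               the 16 x 16 = 256 candidate positions mentioned in the text.
    Returns the (row, col) offset of the best (minimum-SAD) match.
    """
    ref = reference.astype(np.int32)
    best, best_off = None, (0, 0)
    for r in range(16):
        for c in range(16):
            block = window[r:r + 16, c:c + 16].astype(np.int32)
            sad = int(np.abs(block - ref).sum())  # sum of absolute differences
            if best is None or sad < best:
                best, best_off = sad, (r, c)
    return best_off

# Tracking use: centre the next search window on the offset returned here,
# keeping the reference block taken from the initial frame (t = t0).
```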
Figure 4. Body Structures of the Mechanical Animals
4. THE MECHANICAL ANIMALS
We have developed several mechanical animals. Figure 4 shows the structure of some of them. They exhibit several locomotion styles. Goemon (1) and Tama (2) are four-legged robots. Goemon is designed to interact with a human by performing a variety of gestures such as sitting down, lifting a leg, bowing and so on. Tama has a gripper to catch a ball and a smooth stomach to crawl on flat surfaces. Apelike (3) and Humanlike (4) are two-legged, two-armed robots. Apelike performs a knuckle walk to follow a human using visual feedback. Humanlike can perform bipedal walking, since each leg has four DOFs. One of the important abilities for mechanical animals, especially two-legged ones, is standing up after falling down. Both Apelike and Humanlike can stand up by themselves. The main electric components of the mechanical animals are joint servo actuators, a control signal receiver, a battery for the actuators, a camera, a video transmitter, and a battery for the vision system. There is no computer onboard. Each servo actuator includes a geared motor and an analog servo circuit. The control signal to each servo module is a position reference. The torques of the available servo modules cover 2 kg·cm to 14 kg·cm, with speeds of about 120 deg/s. The control signal transmitted by radio link encodes eight reference values. The mechanical animals in Figure 4 have two receiver modules onboard. In our remote-brained environment, as we can transmit on 13 wave bands simultaneously, all of the mechanical animals can move at the same time. The body structure is designed and simulated in the mother environment. The kinematic model of the body is described in an object-oriented Lisp, Euslisp, which has enabled us to describe the geometric solid model and build a window interface for behavior design.
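The paper does not reproduce its Euslisp body models, so the snippet below is only a schematic re-expression in Python of what such an object-oriented kinematic description might contain: a tree of links with joint limits, from which gesture postures can be generated. All class and field names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    """One rigid body segment with a single rotary joint to its parent."""
    name: str
    joint_min_deg: float
    joint_max_deg: float
    angle_deg: float = 0.0
    children: list["Link"] = field(default_factory=list)

    def set_angle(self, deg: float) -> None:
        # Clamp to the joint's mechanical range, as a servo module would.
        self.angle_deg = min(max(deg, self.joint_min_deg), self.joint_max_deg)

def make_quadruped() -> Link:
    """Toy four-limbed body: a torso with four two-joint legs."""
    torso = Link("torso", 0, 0)
    for leg in ("fl", "fr", "rl", "rr"):
        hip = Link(f"{leg}_hip", -90, 90)
        knee = Link(f"{leg}_knee", 0, 135)
        hip.children.append(knee)
        torso.children.append(hip)
    return torso

# A gesture such as "sit down" is then just a named table of joint angles
# applied to this tree and streamed out as position references.
```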
5. VISION-BASED BEHAVIOR IN MECHANICAL ANIMALS

This research aims to investigate the organization of sensor-based interactive and adaptive behaviors for advanced robotics through building vision-based mechanical animals. A mechanical animal is a robot with a multi-limbed body. The multi-limbed body requires sensor-based control to guide its motion adaptively. The mechanical animal has to communicate with a human to get instructions and rewards through its sensors. The research goals for sensor-based mechanical animals include how to formalize both adaptive behaviors and interactive behaviors and also how to unify them into a uniform architecture. The goal in interactive behaviors includes how to acquire the task from observation based on sensors. Here, the goals for adaptive behaviors are not specified in detail by a human. Figure 5 shows a simplified structure of sensor-based behaviors and instantiated examples of interactive behaviors and adaptive behaviors. In designing the behavior of a mechanical animal, the designer has to define the elements of percepts and actions by describing the initial formation of the Memorized Perceptual Organization (MPO), the Memorized Behavior Organization (MBO) and the Memorized Kinematical Organization (MKO). In the example of interactive behavior, a goal-oriented behavior is individually requested by a human. The robot has to infer the requested task from the presented signal. The
Figure 5. Behavior Structure of Mechanical Animals. The general structure links sensing and performing through percepts and actions, organized by the Memorized Perceptual, Behavior and Kinematical Organizations. (A) Interactive behavior instantiates this as target sign detection, task inference and motion trajectory generation, closed through the human's reaction. (B) Adaptive behavior instantiates it as achievement observation, evaluation and recovery method generation, closed through the environment's reaction.
MPO-MBO-MKO sequence consists of detecting the target sign, inferring the action required by the sign, and planning motions for the action. After performing the task, the robot has to interpret the human's reaction to know whether the relation between the signal and the task was correct or not. In the case of adaptive behaviors, the robot does not receive a specific goal from the human but generates one by evaluating the achievement of a goal. The visual percepts provide the evaluation stage with parameters describing the achievement. If the achievement is not sufficient, a recovery motion emerges. The behavior designer embeds the structure for adaptivity into the initial organization of the system.

5.1. Vision-Based Interactive Behaviors

The mechanical dog, Goemon, can not only walk but also perform several gestures such as standing up, bowing, sitting down, and leaning. The gesture configuration is generated from its body model. Our first trial with this robot is the interactive teaching of gestures. Figure 6 shows some gestures and the scheme for learning a mapping between visual patterns shown by a human and the gestures. As the interface with the human is vision, it is assumed that the robot knows the predefined meta signs, Yes and No, used to indicate whether the requested gesture is right or not. In the experiment, the human shows a gesture sign to request a gesture. When the robot detects a gesture sign, it starts performing a gesture. After the robot performs the gesture corresponding to the presented sign, the human verifies the correspondence. If it is wrong, the human indicates No by waving the sign horizontally. Figure 7 explains another visually guided experiment in interactive behavior. The robot is programmed to follow the target object the human shows. The actions the robot can select from are waiting, standing up, going forward/backward, and turning left/right. One of the actions is selected by observation of the target object. As the prepared actions include primitive locomotion in any direction, the human can navigate the robot anywhere by visual guidance. Figure 8 shows a real experiment of visually guided interaction with a human, as explained in Figure 7. It shows bowing, knuckle walking, standing up on a table and greeting behaviors. The basic actions shown here are preprogrammed. Vision works as the communication medium with humans.

5.2. Vision-Based Adaptive Behaviors

The Humanlike robot can stand up on two legs. As it can change the center of gravity of its body by controlling the ankle angles, it can perform static bipedal walks. During static walking the robot has to control its body balance if the ground is not flat and stable. Figure 9 shows an experiment in balancing behavior. In this experiment, a human tilts the ground board on which the robot is standing. The robot's vision tracks the scene in the front view. It remembers the vertical orientation of an object as the reference for visual tracking and generates several rotated images of the reference image. By tracking the reference object using the rotated images, the vision system can measure the body rotation. In order to keep its balance, the robot feedback-controls its body rotation so as to control the position of the center of gravity. The rotational visual tracker [25] can track the image at video rate.
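As a concrete illustration of this rotational tracking, the sketch below implements the idea with OpenCV as a software stand-in for the MEP hardware: rotate the stored reference template through a set of candidate angles, match each against the current frame, and take the best-matching angle as the estimated body rotation. The angle range, feedback gain, and function names are assumptions, not values from the paper.

```python
import cv2
import numpy as np

def estimate_roll(frame: np.ndarray, template: np.ndarray,
                  angles_deg=range(-15, 16, 3)) -> float:
    """Return the body roll angle (deg) that best explains the current view."""
    h, w = template.shape
    centre = (w / 2, h / 2)
    best_angle, best_score = 0.0, np.inf
    for a in angles_deg:
        rot = cv2.warpAffine(template,
                             cv2.getRotationMatrix2D(centre, a, 1.0), (w, h))
        # Lower TM_SQDIFF score = better match of the rotated template.
        score = cv2.matchTemplate(frame, rot, cv2.TM_SQDIFF).min()
        if score < best_score:
            best_angle, best_score = float(a), score
    return best_angle

def balance_step(frame, template, ankle_deg, gain=0.5):
    """One feedback step: lean the ankles against the measured tilt."""
    return ankle_deg - gain * estimate_roll(frame, template)
```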
Figure 6. Some Examples of Gestures by Goemon and Learning the Mapping from Visual Patterns to Gestures
Figure 7. Guiding Knuckle Walk by Visual Following
Tama is designed to interact with a human through catching a ball: it can chase a ball and grab it. Figure 10 shows a sequence of chasing a ball, holding it with the forearms, and picking it up with the gripper on its head. After it performs the pick-up action, it brings the ball to a human. Vision is the only sensor for this interaction. It detects the target ball by template matching, analyzes the relationship between the body and the target ball, and verifies the picking-up action. Figure 11 shows the images the robot sees while chasing a ball. A human shows a ball and moves it as shown in (1). The vision system memorizes the reference image for tracking the ball; the larger square shows the search window and the smaller one the reference image. Vision can keep tracking the ball by changing the subsampling rate (shown in (2)). The robot holds the ball with its forelegs in (3). The robot can show several adaptive behaviors. When the robot chases a moving ball, it controls the stride of the legs on each side to change its walking speed and direction. In order to keep the ball in the middle of the view, the robot continuously controls the neck orientation. The control of the neck runs in parallel with the control of the walking motion. When target detection fails, a looking-for action is performed: the robot tries to find the target by block matching while moving its head around. Two examples of obstacle avoidance are shown in Figure 12. The vision system tracks the target during walking and measures the distance and direction to the target. If the distance to a static target does not change while walking, some obstacle must be in the way. If the walking direction does not lead to the goal, one of the legs may be disturbed. The avoid and climb actions emerge to give adaptive capability in chasing the target. In case (a), the MPO-MBO-MKO sequence consists of observing the distance history during walking, evaluating whether the distance fails to change, and generating a climbing motion
Figure 8. Visually Guided Knuckle Walks by the Apelike Robot
Figure 9. Keeping Balance Behavior by Humanlike Robot
Figure 10. A Sequence of Ball Catching Action by Tama
as a recovery action. In case (b), the MPO-MBO-MKO sequence consists of observing the orientation history during walking, evaluating whether the direction of the body fails to change, and generating an avoiding motion. The achievement observation in both cases is a process of monitoring a temporal record of visually detected parameters.
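The following sketch makes this achievement-observation idea concrete: keep short histories of the visually measured target distance and body direction, and trigger a climb or avoid recovery when the relevant quantity stops changing during walking. Thresholds, window lengths and names are illustrative assumptions, not the authors' parameters.

```python
from collections import deque

class AchievementMonitor:
    """Monitor temporal records of visual parameters during walking."""

    def __init__(self, horizon: int = 20, eps_dist: float = 0.02,
                 eps_dir: float = 2.0):
        self.dist = deque(maxlen=horizon)      # target distance history (m)
        self.heading = deque(maxlen=horizon)   # body direction history (deg)
        self.eps_dist, self.eps_dir = eps_dist, eps_dir

    def update(self, distance: float, heading: float) -> str | None:
        """Record one visual measurement; return a recovery action if needed."""
        self.dist.append(distance)
        self.heading.append(heading)
        if len(self.dist) < self.dist.maxlen:
            return None  # not enough history yet
        if max(self.dist) - min(self.dist) < self.eps_dist:
            return "climb"   # distance not decreasing: obstacle in the way
        if max(self.heading) - min(self.heading) < self.eps_dir:
            return "avoid"   # body direction stuck: a leg is disturbed
        return None
```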
6. CONCLUDING REMARKS
This paper has presented vision-based behaviors of mechanical animals. The key to building such behaviors is the remote-brained approach. As the experiments have shown, wireless technologies permit robot bodies to move freely. The approach also seems to change the way we conceptualize robotics: in our laboratory it has enabled the development of a new research environment, better suited to robotics and real-world AI. The robots presented here are legged robots. As legged locomotion requires dynamic visual feedback control, vision-based behaviors for legged robots can prove the effectiveness of the vision system and the remote-brained system. Our vision system is based on a high-speed block matching function implemented with a motion estimation LSI. The vision system provides the mechanical bodies with dynamic and adaptive capabilities in interaction with humans. The mechanical dog has shown adaptive behaviors based on distance measurement by tracking. The mechanical ape has shown tracking and memory-based visual functions and their integration in interactive behaviors. Research with mechanical animals will open a new way in vision-based robotics
Figure 11. Example Images in Tracking a Ball
Figure 12. Recovery Actions in Locomotion: (a) climb, (b) avoid. Vision monitors the target distance and the target direction at each step to detect an obstacle.
research because of the variety of possible behaviors created from the flexibility of the body. The remote-brained approach will support learning-based behaviors in vision-based robotics. The next tasks in this research include how to learn from human actions and how to allow the robots to improve their own learned behaviors.

REFERENCES

1. M. Ishii and T. Nagata. Feature extraction of three-dimensional objects and visual processing in a hand-eye system using laser tracker. Pattern Recognition, Vol. 8, pp. 229-237, 1976.
2. H. Inoue and M. Inaba. Hand eye coordination in rope handling. In Michael Brady and Richard Paul, editors, Proceedings of the First International Symposium on Robotics Research (ISRR1), pp. 163-174. MIT Press, 1983.
3. T. Lozano-Perez, et al. Handey: A task-level robot system. Proc. of 4th ISRR; published as Robotics Research 4, pp. 29-36, 1988.
4. M. Buhler and D. E. Koditschek. From stable to chaotic juggling: Theory, simulation, and experiments. Proc. of IEEE International Conference on Robotics and Automation, pp. 1976-1981, 1990.
5. H. P. Moravec. Visual mapping by a robot rover. In Proc. of the 6th International Joint Conference on Artificial Intelligence, pp. 598-600, 1979.
6. J. Iijima, S. Yuta, and Y. Kanayama. Elementary functions of a self-contained robot yamabiko 3.1. In Proc. of 11th International Symposium on Industrial Robots, pp. 211-218, 1981.
7. C. Thorpe, M. Hebert, T. Kanade, and S. A. Shafer. Vision and navigation for the Carnegie-Mellon Navlab. IEEE Trans. PAMI, Vol. 10, No. 3, pp. 362-373, 1988.
8. R. B. McGhee and G. I. Iswandhi. Adaptive locomotion of a multilegged robot over rough terrain. IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-9, No. 4, pp. 176-182, 1979.
9. H. Miura and I. Shimoyama. Dynamical walk of biped locomotion. Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 303-325, 1983.
10. M. H. Raibert, H. B. Brown, Jr., and S. S. Murthy. 3-D balance using 2-D algorithms. Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 279-301, 1983.
11. S. Hirose, M. Nose, H. Kikuchi, and Y. Umetani. Adaptive gait control of a quadruped walking vehicle. Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 253-369, 1983.
12. R. B. McGhee, F. Ozguner, and S. J. Tsai. Rough terrain locomotion by a hexapod robot using a binocular ranging system. Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 228-251, 1984.
13. R. Bajcsy. Active perception. Proc. of IEEE, Vol. 76, No. 8, pp. 996-1005, 1988.
14. D. Ballard. Animate vision. Artificial Intelligence, pp. 57-86, 1991.
15. M. Asada and Y. Shirai. Interpretation of video and range images for a mobile robot with dynamic semantic constraints. In Robotics Research: The Fifth International Symposium, edited by Hirofumi Miura and Suguru Arimoto, pp. 153-160, 1990.
16. A. Blake and A. Yuille, editors. Active Vision. MIT Press, 1992.
17. J. Y. Aloimonos, I. Weiss, and A. Bandyopadhyay. Active vision. International Journal of Computer Vision, Vol. 1, No. 4, pp. 333-356, 1988.
18. M. Brady and R. Paul, editors. Robotics Research: The First International Symposium. The MIT Press, 1984.
19. R. A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pp. 14-23, 1986.
20. M. Inaba. Remote-brained robotics: Interfacing AI with real world behaviors. In Robotics Research: The Sixth International Symposium, pp. 335-344. International Foundation for Robotics Research, 1993.
21. M. Inaba, S. Kagami, K. Sakakki, F. Kanehiro, and H. Inoue. Vision-based multisensor integration in remote-brained robots. In 1994 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 747-754, 1994.
22. H. Inoue, T. Tachikawa, and M. Inaba. Robot vision system with a correlation chip for real-time tracking, optical flow and depth map generation. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation, pp. 1621-1626, 1992.
23. SGS-THOMSON Microelectronics. STI3220 motion estimation processor (tentative data). In Image Processing Data Book, pp. 115-138. SGS-THOMSON Microelectronics, 1990.
24. M. Inaba, T. Kamada, and H. Inoue. Rope handling by mobile hand-eye robots. In Proceedings of the International Conference on Advanced Robotics ICAR '93, pp. 121-126, 1993.
25. M. Inaba, S. Kagami, and H. Inoue. Real time vision-based control in sumo playing robot. In Proceedings of the 1993 JSME International Conference on Advanced Mechatronics, pp. 854-859, 1993.
Intelligent Robots and Systems, V. Graefe (Editor). © 1995 Elsevier Science B.V. All rights reserved.
Autonomous Sonar Navigation in Indoor, Unknown and Unstructured Environments

W. D. Rencken

Siemens AG, Corporate Research and Development, 81730 Munich, Germany, e-mail: Wolfgang.Rencken@zfe.siemens.de

A mobile robot operating autonomously in unknown, unstructured environments has to be able to map its environment while at the same time determining its own position accurately within this environment. This paper presents an approach in which the bootstrapping problem of concurrent localisation and map building is solved by estimating the respective errors introduced by each of the processes and correcting them accordingly. The success of this approach also hinges on the ability to determine which measurement originates from which feature. A heuristic multiple hypothesis data association framework is developed to deal with this problem. The problems encountered in implementing the algorithms on the mobile robot ROAMER are discussed. Real experiments in typical office environments have shown that the robot is able to navigate autonomously in such indoor environments.

1 INTRODUCTION

1.1 The Navigation Problem

A mobile robot that has to operate autonomously in unknown, unstructured environments is faced with a challenging task. Not only are there numerous obstacles to contend with, but they can also move in a highly unpredictable manner. Furthermore, the environment is in no way prepared to suit the robot's operation. Given these constraints, the navigator is faced with two fundamental questions, namely Where am I? and How do I safely get to my goal? To answer these questions, the navigator needs to reliably detect obstacles and features and associate the features with an accurate map of the environment. This paper primarily deals with the first question, also known as the localisation problem. The second aspect of navigation usually falls under the domains of path planning and obstacle avoidance. Several commercial mobile robots exist which navigate with the aid of artificial beacons, such as reflective strips, whose positions are known in advance. These beacons are observed by accurate, reliable sensors such as laser range scanners and cameras. However, in many applications, such as in homes or offices, structuring the robot's environment is undesirable. Therefore the robot should be able to navigate in unknown, unstructured environments without any artificial help. It is conceivable that in some cases rudimentary map information might be available, such as an architect's plan of a building. However, completely autonomous operation requires that the robot is capable of constructing and maintaining its own map in a dynamic environment. Therefore this work concentrates on the case where the robot has no a priori map.
1.2 Robot Task

To answer the first navigation question, Where am I?, the robot's task is defined as:

• building an accurate, large scale map of the environment, and
• using the map, accurately determining its position anywhere in its environment.

1.3 Previous Work

There is an inherent dilemma in the definition of the robot's task. To determine its position in the environment, the robot needs a map. On the other hand, the robot needs to know its position to build up a map. This is similar to the famous question of which came first, the chicken or the egg. To bypass this dilemma, most previous work has tended to focus either on localisation [1] [3] [4] or on map building [2] [11] alone. This implies that the robot is either supplied with an a priori map of its environment or its position is known accurately. The main challenge lies in finding a generic method that is able to deal with the bootstrapping problem of concurrent localisation and map building in general. For a detailed discussion of the few previous approaches to concurrent localisation and map building [5] [6] [9] [10], please refer to [8]. In general the robot has to be able to deal with the shortcomings of real sensors, imperfect data-to-feature association and unknown interdependencies between the robot's position and the features. Furthermore, all this has to be performed stably in real time. The methods depend to some extent on the type of sensors used. For instance, if the sensors do not deliver stereo information, as is the case for conventional ultrasonic sensors, then ways of dealing with the added uncertainty of stereo-from-motion have to be found. If stereo sensors are used, this potential source of uncertainty is eliminated. Furthermore, characteristics such as the data rate and field of view also play an important role, since they affect how fast and how much information the robot receives at any point in time. This implies that the main approach has to be generic, while the implementation will to some extent be driven by the type of sensor used.

1.4 Overview

The mathematical details of this work, together with some simulation results, were presented in a previous paper [8]. Therefore this paper focuses on the specific problems encountered in implementing the algorithms on our mobile robot ROAMER and on the results obtained in real experiments. Nonetheless, the most important principles of the approach are discussed, so that the reader is still able to understand the general framework. First, the two processes of localisation and map building are looked at separately in sections 2 and 3. Section 4 looks at the data association problem, while the main contribution of this work towards solving the problem of concurrently determining the robot's position while building a map of its environment is dealt with in section 5. An implementation of the algorithms on the mobile robot ROAMER is described in section 6 and encouraging results are presented in section 7. Finally, some of the remaining problems are highlighted.
2 LOCALISATION

The localisation process attempts to answer the following question: Given sensor measurements and a map of the environment, what is the robot's position?
Figure 1: The basic localisation algorithm.
To answer the localisation question, the robot needs to have a model of its own kinematics (describing how it moves), also known as the plant model, and a model of its sensors (describing how it sees). The robot's map consists of features that naturally occur in indoor environments such as corners, edges and planes. Given the models, the robot uses an extended Kalman filter to update its position estimate as in Figure 1.
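For readers unfamiliar with the filter, the sketch below shows the generic extended Kalman filter cycle that such a localisation loop follows: predict the pose from odometry via the plant model, then correct it with each validated sonar measurement via the sensor model. The Jacobians and noise matrices are left abstract; this is a generic EKF skeleton, not the paper's exact formulation.

```python
import numpy as np

def ekf_localise(x, P, u, z, f, F, Q, h, H, R):
    """One EKF cycle for robot pose x = [x, y, theta] with covariance P.

    f(x, u): plant model (how the robot moves for odometry input u)
    F:       Jacobian of f at (x, u);  Q: plant noise covariance
    h(x):    sensor model (predicted measurement of a map feature)
    H:       Jacobian of h at x;       R: measurement noise covariance
    """
    # Prediction step: propagate pose and uncertainty through the plant model.
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q

    # Update step: correct with the measurement z of a matched map feature.
    y = z - h(x_pred)                      # innovation
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```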
3 MAP BUILDING

Map building tries to answer the following question: Given the robot's position and sensor measurements, what are the sensors "seeing"?
To answer the map building question, the robot needs to have models of the features that can possibly exist in its environment (in this case planes, edges and corners). The ultrasonic sensor returns a range measurement in the form of a circular arc. Planes lie tangent to these arcs, while corners and edges lie at the intersection of circular arcs [5]. The robot constructs hypotheses about these features and inserts them into its map. However, since ultrasonic sensors are known to deliver spurious measurements, there is a danger of using incorrect feature hypotheses for navigation. Therefore not all map features are suitable for localisation purposes. To reflect this, the map is partitioned into hypothetical, tentative and confirmed features, with only the confirmed features used for navigation. For each feature, metrics such as the uncertainty of its parameters and the plausibility of its existence are determined. Given these models, the robot constructs its map as in Figure 2.

Figure 2: The basic map building algorithm.
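A minimal way to realise this hypothetical, tentative and confirmed partition is a per-feature plausibility counter that is raised by supporting measurements and lowered by contradicting ones, with promotion and demotion thresholds. The sketch below is such a generic lifecycle; the thresholds and names are chosen for illustration only, not taken from the paper.

```python
from enum import Enum

class Status(Enum):
    HYPOTHETICAL = 0
    TENTATIVE = 1
    CONFIRMED = 2   # only these features are used for localisation

class MapFeature:
    """A map feature whose plausibility is filtered over time."""

    def __init__(self, params):
        self.params = params          # e.g. (p_r, p_theta, p_v) for a plane
        self.plausibility = 0
        self.status = Status.HYPOTHETICAL

    def observe(self, supported: bool) -> None:
        self.plausibility += 1 if supported else -1
        # Illustrative promotion/demotion thresholds.
        if self.plausibility >= 10:
            self.status = Status.CONFIRMED
        elif self.plausibility >= 3:
            self.status = Status.TENTATIVE
        elif self.plausibility < 0:
            self.status = Status.HYPOTHETICAL  # likely spurious; may be deleted
```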
4 DATA ASSOCIATION

Data association plays a pivotal role for both the localisation and map building processes. In fact, robustly deciding which measurement comes from which feature turns out to be the stumbling block for most approaches. For each measurement there are a variety of possible interpretations. A new measurement could stem from:

• a known plane feature
• a known corner/edge feature
• an unknown plane feature
• an unknown corner/edge feature
• an unclassified feature
• a false measurement

Ideally each hypothesis should be considered separately and the most likely one selected. For such a multiple hypothesis approach [7], hypothesis trees are constructed, since most hypotheses can only be safely rejected after more than one measurement. This turns out to be computationally very expensive.
Therefore a heuristic approximation of the multiple hypothesis approach was chosen, to reflect on one hand the complexity of the data association problem, while on the other hand ensuring that the computational burden is kept as low as possible. The heuristic multiple hypothesis approach (HMHA) works as follows:

• If the new measurement falls within the validation gate of a known feature, then the only other allowable interpretation is a known/unknown feature of a different class; e.g. if the measurement matches a known plane, then it could also stem from a known/unknown corner/edge, and vice versa. This is a realistic assumption since ultrasound is capable of measuring a thin pipe immediately in front of a wall, and it is almost impossible to distinguish between the two types of features when they are only one or two centimetres apart.
• If the new measurement falls outside the validation gates of the known features, then it could be an unknown feature of any type.
• Unclassified features and false measurements are not explicitly considered as hypotheses. Instead, since there is an inherent filtering of features over time, i.e. the progression from hypothetical to confirmed features, these misclassifications tend to disappear with time.
4.1 Feature Visibility

Feature visibility also plays an important part in deciding whether a measurement originated from a specific feature. The infinite support line of a plane is parametrized by [p_r, p_θ, p_v], where p_r is the normal distance from the origin of the global co-ordinate system to the plane, p_θ is the angle with respect to the x-axis of a perpendicular drawn from the plane to the origin, and p_v, either +1 or -1, indicates which side of the line the visible half-plane lies on [5] (see also Figure 3). Corners and edges are characterised by [p_x, p_y], where p_x and p_y are the Cartesian coordinates of the feature. In terms of their visibility there is a big difference between corners and edges: corners are only visible over a small range of angles, whereas edges are visible over a much larger domain (see also Figure 3). When operating in cluttered environments this visibility is crucial, since the data association is bound to choose the wrong feature if it cannot distinguish between the multitude of point features which abound. Therefore visibility bounds for corners and edges were introduced to make the data association more robust in cluttered environments.
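To make the gating step concrete, the sketch below predicts the sonar range to a plane [p_r, p_θ, p_v] from the sensor position and accepts a measurement only if the normalised innovation passes a chi-square-style threshold. This is a standard validation gate in the spirit of [5], not necessarily the paper's exact form; the threshold value and function names are illustrative assumptions.

```python
import math

def predicted_plane_range(xs: float, ys: float,
                          p_r: float, p_theta: float, p_v: int) -> float:
    """Perpendicular range from sensor (xs, ys) to the plane's support line,
    signed so that p_v selects the visible half-plane."""
    return p_v * (p_r - xs * math.cos(p_theta) - ys * math.sin(p_theta))

def in_validation_gate(z: float, z_hat: float, s_innov: float,
                       gate: float = 4.0) -> bool:
    """Accept measurement z for this feature if the squared, normalised
    innovation (z - z_hat)^2 / S is below the gate threshold."""
    return (z - z_hat) ** 2 / s_innov <= gate

# Usage: only measurements passing the gate (and the feature's visibility
# bounds) are associated with the feature and fed to the estimator.
```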
Figure 3: Feature Visibility
5 CONCURRENT LOCALISATION AND MAP BUILDING

In order to perform concurrent localisation and map building, the underlying bootstrapping problem has to be tackled. The crux of the problem can be formulated as trying to fit an uncertain range measurement into an interval consisting of an uncertain feature position and an uncertain robot position (see Figure 4). However, since nothing is certain, there is no fixed point for the range measurement within the interval. Therefore there are theoretically an infinite number of possibilities for allocating the measurement to the interval. There are, however, some fits that are extremely implausible, as can be seen in Figure 4, and the method should aim to eliminate these. Since there is so much uncertainty involved, the best fit should be the one with the best statistical fit. The position (either robot or feature) with the lower uncertainty can be thought of as being the more reliable. Therefore it should be changed less than the other position during the estimation process. This results in the respective end of the range measurement being drawn more strongly towards the more reliable position estimate than towards the unreliable one, as shown in Figure 5. This alone, however, might not be enough to ensure that the bootstrapping process is successful. For instance, at the start of the robot's operation its position is known with very high accuracy, but no map information is available. As the robot moves, features are built up and placed into the map. If it takes too long to add a confirmed feature to the map, the danger
Figure 4: The robot performs an uncertain range measurement, from its own uncertain position, of a feature with an uncertain position.
Figure 5: Fitting the range measurement to the interval with (a) the robot's position being more certain than the feature's position and (b) the feature's position being more certain than the robot's position. Once the fit has been made, the estimation process is essentially the same as for the localisation and map building processes.
exists that the robot's position uncertainty has become too large. Therefore it is important that a confirmed feature be found as soon as possible. The extent to which this poses a problem depends on the type of sensor used and the way in which the robot moves. If the sensors can deliver stereo information while the robot is stationary, then the chances of finding features suitable for localisation purposes are very high. However, situations can arise where the robot has to move to see something, even when equipped with stereo sensors; therefore the mechanisms for dealing with this problem remain generally valid. The uncertainty threshold for deciding when a tentative feature becomes a confirmed feature is a very important parameter. Choosing it too high means that very uncertain features are used for localisation, and so the whole navigation process is bound to become unstable. Choosing it too low means that by the time a feature is available, the robot's uncertainty has grown beyond the point of return. Therefore the combination of deciding when a feature is suitable for localisation purposes (i.e. has become a confirmed feature) and determining how the range measurement should fit into the interval between the robot and feature positions are the important aspects of this approach.
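In one dimension, the statistical fitting idea reduces to variance-weighted blending: the measurement correction is split between the robot and the feature in inverse proportion to how certain each already is. The toy sketch below illustrates only this intuition; it is not the paper's estimator, and all symbols and values are mine.

```python
def fit_range_1d(x_robot, var_robot, x_feat, var_feat, z, var_z):
    """Distribute the range residual between robot and feature estimates.

    The end of the measurement is drawn more strongly towards the more
    reliable (lower-variance) position, as in Figure 5.
    """
    residual = (x_feat - x_robot) - z          # interval vs measured range
    total = var_robot + var_feat + var_z
    # The less certain estimate absorbs the larger share of the correction.
    x_robot_new = x_robot + residual * var_robot / total
    x_feat_new = x_feat - residual * var_feat / total
    return x_robot_new, x_feat_new

# Example: a very certain robot (var 0.01) and an uncertain feature (var 1.0)
# leave the robot almost untouched and move the feature to close the gap.
print(fit_range_1d(0.0, 0.01, 5.3, 1.0, 5.0, 0.05))
```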
6 IMPLEMENTATION

To verify the algorithms, they were implemented on our mobile robot ROAMER (see Figure 6).
Figure 6: The mobile robot ROAMER.
6.1 Sensors

The robot is equipped with 24 Polaroid ultrasonic sensors arranged in two horizontal layers at heights of 4 and 15 cm respectively. The 24 sensors are subdivided into 3 boards of 8 sensors each, which means the sensors can be fired in groups of three (one sensor from each board). In order to reduce crosstalk, the individual sensors in each group are spaced 120° apart. Odometric information is obtained from the driven wheels every 60 ms.

6.2 Measurements during Motion

In the past it has been customary to stop the robot during the data acquisition phase. There are several possible reasons for doing so. If the robot is stationary while the sensors are measuring, the location of each measurement is known precisely, and the locations of the transmission and reception of the signal are identical. Furthermore, since the data acquisition and processing phases can be lengthy, the danger of the robot running into an obstacle is high, so it is also safer to remain stationary while the sensors are measuring the environment. However, the robot's behaviour tends to be unconvincing if it has to stop every few centimetres to perform its measurements. Therefore it is desirable for the robot to measure whilst in motion. This leads to errors when determining the position of a measurement and the measurement itself. Therefore the estimation of the sensor positions needs to take the robot motion into account.
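One simple way to account for motion is to timestamp each sonar firing and interpolate the robot pose between the 60 ms odometry samples that bracket it. The sketch below shows this interpolation; linear blending of x, y and heading is an illustrative simplification, not necessarily what ROAMER does.

```python
def interpolate_pose(t: float, t0: float, pose0, t1: float, pose1):
    """Estimate the robot pose at sonar firing time t, given odometry poses
    pose = (x, y, theta) sampled at t0 and t1 (t0 <= t <= t1, 60 ms apart).
    """
    a = (t - t0) / (t1 - t0)
    x = (1 - a) * pose0[0] + a * pose1[0]
    y = (1 - a) * pose0[1] + a * pose1[1]
    # Naive angle blend; assumes the heading change per 60 ms step is small.
    th = (1 - a) * pose0[2] + a * pose1[2]
    return (x, y, th)

# The interpolated pose (plus the sensor's mounting offset) gives the
# transmission position; for a moving robot the reception position can be
# estimated the same way at t + time_of_flight.
```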
7 EXPERIMENTAL RESULTS
The robot was driven around a typical office, shown in Figure 7 and Figure 8, for extended periods of time. In no way was the environment structured for the robot's operation. Inside the office were closets, desks and tables; cables, waste paper baskets and bags lay strewn underneath and around the desks and tables. Furthermore, the office is connected to a long corridor. Results for a typical run of about 20 minutes at a speed of about 12 cm/s are shown in Table 1, Table 2 and Figure 9. Since the robot drives continuously during that time, in and out of the office, measuring its true position and orientation tends to be a problem. Therefore a "measurement area" was designated, into which the robot was driven and halted every few minutes. The true position and orientation were then measured by hand and compared to the estimated robot state.
Figure 7: A typical office scenario.
Figure 8: Schematic representation of the test environment

Table 1 shows that the position error tends to remain stable around 5 cm, while the orientation error tends to vary more, though still within satisfactory bounds. The plane features were chosen to reflect the accuracy of the map building process, since it is easy to associate the estimated features with the true features. For corners and edges it is not as simple, since there tend to be numerous such features in cluttered spaces. From Table 2 it can be seen that the plane ranges are fairly accurate, whereas the orientations of the planes tend to vary more. There is a connection between the plane orientation errors and the robot orientation errors: any robot orientation errors which result from the odometry lead to incorrect plane orientations, which in turn lead to incorrect robot orientations when the robot returns to a specific area again. It should be noted that the robot drove over cables lying on the ground during its run, which tends to severely affect its odometry accuracy. Given these facts, the overall performance is very encouraging.

Table 1: Localisation Errors

Time (min:sec)   Pos Error (cm)   Orient Error (°)
5:28             5.8              -7.5
11:57            5.3              -6.2
14:53            5.8              0.1
18:06            4.0              -2.7
20:12            2.5              3.0

Table 2: Final map building errors

id   p_v   Estimated p_r (cm)   Estimated p_θ (°)   Error p_r (cm)   Error p_θ (°)
a    -1    152.16               0.57                2.84             -0.57
b    +1    354.62               -2.87               0.38             2.87
c    +1    132.99               271.72              -2.99            -1.72
d    +1    145.49               6.31                -0.49            -6.31
e    +1    145.22               95.16               1.78             -5.16

Figure 9: Final confirmed feature map

The map at the end of the run is shown in Figure 9. At first, the door to the corridor is closed. Later during the run the door was opened and the robot drove into the corridor. Both the closed and opened doors were mapped correctly. However, the closed door was not removed from the map, since the plausibility of the wall does not decrease significantly when the door is open: the other part of the wall is seen almost as often as the part containing the open door, so the robot is unable to decide whether to remove the feature or not. Ideally it should consider sections of walls and remove those, rather than reason about whole features. The algorithms also tend to erroneously lengthen wall segments, as can be seen for the desk section towards the top. The ultrasonic transducer has a low angular resolution and therefore "sees" a plane feature for a longer period of time when driving past it. Furthermore, it can fuse clutter, such as cables or the feet of persons sitting at a desk, with the desk itself when the clutter is almost collinear with the plane feature and close to it. Such effects can also introduce errors into the navigation process. Further errors are introduced by unclassified features, such as the cylinder in the corridor: to the ultrasonic sensors a portion of the cylinder appeared as a flat wall. This misclassification led to the larger map errors in the corridor itself and would ultimately have led to the robot becoming lost if it had continued driving down the corridor. Despite some of the remaining problems, the results obtained are very encouraging. Not only did the robot operate in a completely unknown and unstructured indoor environment, it also moved continuously over an extended period of time while performing concurrent localisation and map building.
8 CONCLUSIONS AND OUTLOOK

This paper has presented an approach for autonomous navigation in unknown, unstructured environments. The algorithms have been tested on our mobile robot ROAMER in real office environments, and the results showed that the robot navigated stably in its environment. Special attention was paid to overcoming the problems of measuring whilst in motion, so that the robot is able to operate smoothly. The current speed of about 12 cm/s is still too low, limited in part by the lack of computational power: the localisation cycle typically takes about 1.5 s (about 0.7 s effective time, since the rest of the time is needed for obstacle avoidance). It is envisaged to conduct further exhaustive tests in larger environments and at higher speeds. Some of the remaining problems, such as misclassification of features, erroneous fusion of features and the removal of quasi-static features such as doors, need to be addressed. Notwithstanding these remaining problems, this work has made considerable progress towards enabling mobile robots to navigate stably in unknown, unstructured environments.

9 REFERENCES
[1] F. Chevanier and J. L. Crowley. Position estimation for a mobile robot using vision and odometry. Proceedings of the 1992 IEEE Conference on Robotics and Automation, pages 2588-2593, 1992.
[2] G. Gilbreath and H. R. Everett. Path planning and collision avoidance for an indoor security robot. Proceedings SPIE - Int. Soc. Opt. Eng., Mobile Robots III, 1007:19-27, 1989.
[3] A. A. Holenstein, M. A. Müller and E. Badreddin. Mobile robot localisation in a structured environment cluttered with obstacles. Proceedings of the 1992 IEEE Conference on Robotics and Automation, pages 2576-2581, 1992.
[4] L. Kleeman. Optimal estimation of position and heading for mobile robots using ultrasonic beacons. Proceedings of the 1992 IEEE Conference on Robotics and Automation, pages 2582-2587, 1992.
[5] J. J. Leonard and H. F. Durrant-Whyte. Directed Sonar Sensing for Mobile Robot Navigation. Kluwer Academic Publishers, Boston, London, Dordrecht, 1992.
[6] P. Moutarlier and R. Chatila. Stochastic multisensory data fusion for mobile robot location and environment modeling. In 5th Symposium on Robotics Research, 1989.
[7] D. B. Reid. An algorithm for tracking multiple targets. IEEE Transactions on Automatic Control, 6(24), 1979.
[8] W. D. Rencken. Concurrent localisation and map building for mobile robots using ultrasonic sensors. In Proceedings of the 1993 IEEE/RSJ Conference on Intelligent Robots and Systems, pages 2192-2197, 1993.
[9] R. C. Smith and P. Cheeseman. On the representation and estimation of spatial uncertainty. Int. Journal on Robotics Research, 5(4):56-58, 1986.
[10] R. Smith, M. Self and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In I. Cox and G. Wilfong, editors, Autonomous Robot Vehicles. Springer Verlag, 1990.
[11] A. Zelinsky. Mobile robot map making using sonar. J. Robot. Syst., 8(5):557-577, 1991.
Intelligent Robots and Systems, V. Graefe (Editor). © 1995 Elsevier Science B.V. All rights reserved.
Control and Localisation of an Autonomous Mobile Vehicle

M. Adams, N. Tschichold-Gürman, S. Vestli and D. von Flüe

Institute of Robotics, Swiss Federal Institute of Technology, Zürich, Switzerland

Mobile robotics has been labelled as a subject which has promised so much but produced so little. In response to this, we present here various research and development issues which we are addressing in order to realise an autonomous mobile robot. To this end we have implemented a behavioural control system which is capable of recognising scenes within a known environment. The behaviours are defined as sensor-data-driven algorithms capable of dealing with both expected and unexpected situations. An algorithm for localising the mobile robot in the presence of uncertain odometric information is also presented. By tracking predicted features in the environment with sonar and optical range sensors, we show that the position of the vehicle can be accurately estimated whilst simultaneously building a map of the environment.

1. INTRODUCTION

Mobile robotic applications require solutions to many fundamental research problems, and we are currently addressing several of them. We describe here three particular areas: the overall control structure of the mobile system; a trajectory following control algorithm which generates smooth paths for the mobile robot to follow; and algorithms which at present manipulate ultrasonic and optical range data in order to track environmental features (corners, walls or edges) so as to update the position of the mobile robot. In section 2 we present a mobile robot control system which manipulates two dimensional range data (ultrasonic and optical [15]) in order to recognise "situations" for the mobile robot. We define a situation as a recognised scene within an environment, meaning that some kind of preplanned action can be taken. Recognised situations are then used as a basis to provide a certain "behaviour" (a particular algorithm competent in handling the situation) for the mobile robot, under a neural network decision making algorithm. Results using this technique in indoor environments are shown in the form of planned paths of the mobile robot. Planned paths provided by the decision making algorithm are passed to the "trajectory following controller", results of which are shown in section 3. We show that the steering angle and front wheel velocity of the mobile robot can be computed by a non-linear feedback controller. By analysing the motion of the robot under such a non-linear control scheme, smooth trajectories can be achieved. As the mobile robot follows its desired trajectories, it is essential that it can update
its position reliably and accurately. In section 4 we offer a solution to this localisation problem which, at present, manipulates two dimensional ultrasonic range data. The algorithm extracts certain features from an indoor environment (corners, walls) and keeps track of them as the mobile robot moves. We show in this section that positional errors caused by odometric estimates can be compensated for, as the algorithm generates an optimal state estimate of the robot's position after each scan of the environment and further quantifies the probabilistic certainty of this estimate. Finally, in section 5 we present a new approach for continuously tracking a single feature with an optical range finder. We demonstrate that it is possible to economise on the computing power necessary to localise a mobile robot.

2. MOBILE ROBOT CONTROL STRUCTURE
When the operating environment of the robot is known, global planning can be realised using a graph search algorithm [6]. Most such graphs contain only information on specific positions within the environment [9,12,13]: a graph search algorithm, such as A*, selects the nodes to be visited, and the robot traverses them using a navigation algorithm. We propose to use available a priori information about our environment as much as possible, including information which can be delivered by the user. For this purpose we interpret the graph nodes as path segments and save all available information which characterises a path segment in its graph node. This allows us to define the behaviour as well as the reactions of the robot while traversing the segment; the possible algorithms to be applied on a single path segment are thus defined by the user. Operating on such a graph, the global planner generates a list of intermediate "path segments" to be visited to reach a goal state. The local planner takes the next "path segment" from the list and tries to reach it using the information associated with this list element. An overview of this structure, including the local planner, is shown in figure 1.
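The paper does not give its data layout, so the following is only a plausible rendering in Python of the path-segment node described above: start and end coordinates, a standard behaviour, and a reflex behaviour for exceptional cases. All names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PathSegment:
    """One graph node: a traversable segment with its associated know-how."""
    start: tuple[float, float]      # world coordinates, or a sensor-defined
    end: tuple[float, float]        # position (e.g. "scene X detected here")
    behaviour: str                  # e.g. "follow_right_wall", "drive_blind"
    reflex: str                     # e.g. "avoid_obstacle", "wait", "alt_path"

def local_plan(segments: list[PathSegment]) -> None:
    """The local planner consumes segments produced by the global planner."""
    for seg in segments:
        print(f"traverse {seg.start} -> {seg.end} "
              f"using '{seg.behaviour}' (reflex: '{seg.reflex}')")

local_plan([PathSegment((0, 0), (12, 0), "middle_of_corridor", "wait"),
            PathSegment((12, 0), (14, 3), "enter_office", "avoid_obstacle")])
```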
Figure 1. Control structure of the mobile robot: the user interface feeds the global planner, which passes the next subgoal, and the information on how to reach it, to the local planner.
A path segment contains its start and end coordinates, the standard behaviour on the path and the reflex behaviour for exceptional cases (obstacles). The positional information
can be defined in absolute world coordinates. It can also be a sensor-defined position, meaning that some sensor data has to be detected at the specified position (e.g. information from passive sensors (see section 4), or a particular scene to be detected by the optical system). It is also possible to use a combination of these. The standard behaviours are:

• Following the right wall.
• Following the left wall.
• Driving in the middle of the corridor.
• Driving "blind" along a virtual trajectory (useful in larger rooms where little sensor data is available).
• Passing through a door or entering an office.
• Exact positioning.
• Using the elevator.

The reflex behaviours are needed to select an algorithm in case of exceptions such as obstacles in the path. The main reflex behaviours are:

• Trying to avoid an obstacle.
• Not trying to avoid an obstacle (e.g. in a narrow corridor).
• Just waiting until the obstacle disappears (only possible for moving objects).
• Using an alternative path.

The appropriate standard or reflex behaviour is selected by the situation based behaviour selector; this process is shown in figure 2.

2.1. Situation based behaviour selector

The situation based behaviour selector is responsible for scheduling the appropriate controllers as indicated by the information contained in the path segment data packet and the current sensor data. A first prototype of the behaviour selector has been realised with a neural network based decision mechanism known as "RuleNet" [14]. For this purpose, the network was trained with a subset of a laser scanner's range data as input (9 inputs) and the appropriate behaviour as desired output; the information from the global planner about the behaviour and the reflex for the path segments was ignored. After training the network with 659 randomly generated examples, it was tested on 346 newly generated examples. The network was able to recognise all training examples correctly; on the test examples its success rate was 89.6%. The network was therefore able to recognise most of the situations correctly and select the appropriate controller. An analysis of the training examples has shown that distinguishing the situations based only on features stemming from sensor data is very difficult, as only small changes in the input data lead to significant changes in the outputs. This makes a classification
Figure 2. The situation based behaviour selector: sensor information, actual position, goal position and the pre-defined behaviour & reflex enter on the left; the selectable behaviours (drive in the middle of the corridor, follow left/right wall, avoid obstacle on the left/right, wait until the obstacle disappears, enter office, take elevator, call for help) appear on the right.
based on the approximation of the pattern clusters with geometrical primitives very difficult. Therefore preprocessing the sensor data to produce useful qualitative information is necessary to achieve better results. The controllers of the behaviours take the sensor information and the path segment data packet to navigate the mobile robot. Tile sensor information is delivered from the sensor data interpretation module. Separate preprocessing of the sensor data makes the navigation algorithm independent of the robot's sensors. For example a controller for wall following needs the distance to the wall and the orientation of the robot to the wall. This reduced information leads to efficient controller algorithms because the relevant information is reduced to two values and is directly accessible. The preprocessing of the sensor data usually has the effect that similar inputs demand similar navigation steps, so any classification algorithms, including neural networks and fuzzy control algorithms can be used efficiently. Another advantage of the preprocessing is that the algorithm can be used on different robots with different sensors. The only part that must be changed is the preprocessing of the raw sensor data to extract the relevant information for the naviga,tion algorithm. As the controllers are totally independent of each other it is possible to change or optimise a particular behaviour without affecting tile others. It is also easy to add new beha,viour modules to the system. Only the behaviour selector module has to be extended and the user can introduce this new behaviour in the path segments of the global planner. All the existing behaviour modules are not influenced by this extension. On the other hand, if one single module has to handle several beha.viours, optimising one behaviour often has an uncontrolla.ble influence
on other behaviours. The behaviours are implemented as separate processes within the multitasking real-time system XOberon [4], using different control methods such as neural, fuzzy, and classical control techniques. An example run of the mobile robot controlled using the control scheme described above is illustrated in figure 3.
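To make the decoupling concrete, the following sketch shows a wall-following controller that consumes only the two preprocessed values named above (distance to the wall and orientation relative to it). The control law, gains and sign convention are illustrative assumptions; the paper does not specify the controllers' internals.

```python
def follow_wall(distance, orientation, d_ref=0.5, k_d=1.5, k_th=2.0):
    """Steering command from the two preprocessed values.

    distance:    perpendicular distance to the wall in metres
    orientation: robot heading relative to the wall in radians
    A positive command steers away from the wall (assumed convention).
    """
    return k_d * (distance - d_ref) - k_th * orientation

# The same controller runs unchanged on any robot whose sensor data
# interpretation module delivers (distance, orientation); only that
# preprocessing stage is sensor specific.
steering = follow_wall(distance=0.4, orientation=0.05)
```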
Figure 3. Performance of the robot control scheme. S = mobile robot start position, G = goal position.
3. TRAJECTORY FOLLOWING CONTROL
As described above it is also desirable to follow virtual trajectories, for instance when moving across a large open space. We have implemented a behaviour for following virtual trajectories as a non-linear feedback controller. Consider the robot relative to the path segment as in figure 4 below. The coordinates of the new position are provided by the global planner along with the information that this behaviour should be utilised. The virtual trajectory is a straight line from the last coordinate visited (old position) to the next coordinate to be achieved (next position). The goal of the behaviour is to keep the robot on this trajectory. The behaviour is then implemented as a feedback control law according to equations 1 and 2:

λ = −tan⁻¹(k₁ e_y)    (1)

ω = k₂    (2)

where λ is the steering angle, ω is the front wheel velocity, e_y is the lateral offset of the robot from the virtual trajectory (figure 4), and k₁, k₂ are controller parameters.
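A minimal sketch of this behaviour, assuming the reconstructed form of equations 1 and 2 above (a proportional correction of the lateral offset e_y and a constant velocity set point), is given below; the gains and the sign convention are assumptions.

```python
import math

def trajectory_control(pose, old_pos, next_pos, k1=2.0, k2=0.3):
    """Steering angle and wheel velocity for the virtual-trajectory behaviour."""
    x, y, _theta = pose
    (x0, y0), (x1, y1) = old_pos, next_pos
    dx, dy = x1 - x0, y1 - y0
    # signed lateral offset e_y of the robot from the line old -> next
    e_y = ((x - x0) * dy - (y - y0) * dx) / math.hypot(dx, dy)
    steering = -math.atan(k1 * e_y)   # equation (1), as reconstructed
    velocity = k2                     # equation (2): constant set point
    return steering, velocity

# Example: robot 0.2 m off a segment that runs along the x axis.
print(trajectory_control((1.0, 0.2, 0.0), (0.0, 0.0), (3.0, 0.0)))
```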
The steering angle and front wheel velocity are then achieved using two separate single input, single output (SISO) controllers, the performance of which is described in [15].

Figure 4. Mobile robot parameters shown relative to the virtual trajectory.

A block diagram showing the operation of the control system is shown in figure 5. The parameters for a stable controller can be chosen according to [15], and an actual run with the mobile robot utilising such a controller is depicted in figure 6. The two solid lines show the desired trajectory, as output from the situation based behaviour selector. The control system in figure 5 produces the smooth path shown by the dashed line, which brings the mobile robot back onto the track in a smooth curve. Note that the maximum deviation from the desired path in this case is approximately 15 cm.

Figure 6. Performance of the controller implementing equations 1 and 2 when moving onto a new path segment with a 45° rotation relative to the last segment.

4. LOCALISATION OF THE MOBILE ROBOT

As the mobile robot follows its given trajectories, it is essential for accurate navigation that it can update its position at all times. The problems of relying solely upon odometry are well documented [11]: wheel slippage, uneven floor surfaces and inaccurate calibration cause the positional uncertainty to grow indefinitely when using only odometry. Following the work by Leonard [8], we describe here our results of using an approach to mobile robot navigation that unifies the problems of obstacle detection, position estimation and map building in a common multi-target tracking framework. As the mobile robot moves it continuously tracks naturally occurring indoor targets or "beacons". Predicted targets (found from the known environmental map) are tracked in order to update the position of the vehicle. Newly observed targets (those not predicted) are caused by unknown environmental features or obstacles, from which new tracks are initiated, classified and eventually integrated into the map. We have implemented the above technique, at present using real sonar data.
Figure 5. Trajectory controller block diagram showing the inner steering angle and wheel velocity controllers.

We note that a good sensor model is crucial for this work, and in order to predict our expected observations (from the sonar data) we use the sonar model presented by Kuc and Siegel [5].
Figure 7 shows regions of constant depth (RCD's) which have been extracted from 15 sonar scans recorded from each cross (×). The model from Leonard's work suggests that RCD's such as those recorded at the positions marked A in the figure correspond to planar surfaces; RCD's marked B rotate about a point corresponding to a 90° corner; and RCD's such as C, which cannot be matched, correspond to multiple reflections of the ultrasonic wave.
4.1. Position Updating

The position of the mobile robot is estimated under the framework of an extended Kalman filter algorithm [2]. The algorithm requires two models:

1. A plant model, which describes how the vehicle's position changes with discrete time in response to a control input (from the trajectory controller) and a random noise disturbance.

2. A measurement model, which mathematically expresses a sensor measurement in terms of the vehicle's position and the geometry of the beacon being observed.
The aim of the algorithm is to continually produce estimates of the position of the robot (the estimated states of the Kalman filter) and the associated variances (the error covariance matrix of the Kalman filter), based upon the previous estimates, the control input and the sensor observations.
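A compact sketch of one predict-update cycle under these two models is given below, for an (x, y, θ) vehicle state and a range-bearing observation of a known beacon. The noise covariances Q and R and the odometric plant model are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def ekf_step(x, P, u, z, beacon, Q, R):
    """One EKF cycle. x: np.array([x, y, theta]); u: (distance, dtheta)."""
    # --- plant model: odometric prediction from the control input
    d, dth = u
    th = x[2]
    x_pred = x + np.array([d * np.cos(th), d * np.sin(th), dth])
    F = np.array([[1, 0, -d * np.sin(th)],
                  [0, 1,  d * np.cos(th)],
                  [0, 0,  1]])
    P_pred = F @ P @ F.T + Q

    # --- measurement model: predicted range and bearing to the beacon
    dx, dy = beacon[0] - x_pred[0], beacon[1] - x_pred[1]
    r = np.hypot(dx, dy)
    z_pred = np.array([r, np.arctan2(dy, dx) - x_pred[2]])
    H = np.array([[-dx / r,    -dy / r,     0],
                  [ dy / r**2, -dx / r**2, -1]])

    # --- update: innovation weighted by the Kalman gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - z_pred)
    P_new = (np.eye(3) - K @ H) @ P_pred
    return x_new, P_new
```

The error covariance P_new is what the certainty ellipses discussed below are drawn from; when no beacon is observed, only the prediction step runs and P grows.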
Figure 8 shows the true and odometrically measured positions. The stars (*) represent the estimated positions output from the extended Kalman filter algorithm. Centred upon each estimate is a 'certainty ellipse' which is known to contain the actual position of the robot with some constant probability. The ellipses have as minor and major axes the eigenvalues of the Kalman error covariance matrix and are drawn in the directions of the eigenvectors of the error covariance matrix [2]. Note that during the first 5 scans (recorded at true points 1 to 5 in figure 8) the corner beacon was not visible and therefore the Kalman filter estimates correspond to the odometric estimates. Note also that the certainty ellipses grow from points 1 to 5 as our estimates of the vehicle position get worse, since we have no observations. At position number 6 in figure 8, the corner beacon is observed and has been used to estimate the mobile robot's position. Note that the estimated and true positions virtually coincide and the certainty ellipse has shrunk, as we are more confident in our position. Between positions 6 and 11 the beacon is tracked by the algorithm and the certainty ellipses shrink further. The minor axis of each ellipse has an approximate direction along the line joining the sensor and the beacon. It is in this direction that we are most certain of the robot's position. At position 12 in figure 8 the beacon is lost (the robot has traversed the door and the corner is no longer visible). The effect on the positional estimates and certainties can be clearly seen.

Figure 9 shows the same mobile robot run, this time localising from two sensed beacons, namely the wall beacon at E and the wall at F in figure 7. It can be seen that the
algorithm is capable of producing highly accurate positional estimates of the robot while simultaneously building a map of its sensed environment as we become more confident of the nature of our features.

5. MOBILE ROBOT TARGET TRACKING
We note that in section 4 we first take a 360° scan containing 180 sonar range data samples and produce approximately 4 or 5 beacons from this data for use in localisation. We can therefore conclude that, although this algorithm provides a powerful localisation method, it is not very computationally economical (i.e. 180 scan points are required to produce just 4 beacons) and it must be possible to do better. We therefore present in this section a new philosophy which allows an optical range finder to continuously track a beacon within its environment. We will show that it is possible to register a chosen beacon by measuring the amplitude of the returned light signal with an amplitude modulated continuous wave (A.M.C.W.) light detection and ranging (lidar) sensor. A.M.C.W. lidar sensors transmit an amplitude modulated light signal and measure the phase shift between the transmitted and received modulating wave forms. This phase shift is directly proportional to range. A description of exactly how these sensors work is given in [3] and [1]. In order to develop our tracking algorithm we present here the physics involved in measuring range with such a sensor when the collimated optical beam is split between two objects of differing range and/or surface reflectivity.

5.1. Simultaneous Reflection of Signals from Two Surfaces

Consider a transmitted reference signal V₀ cos ωt which is incident upon an edge (figure 10). An area A₁ is illuminated on the closer of the two surfaces, returning a signal V₁ cos(ωt + φ₁), whilst an area A₂ is illuminated on the further surface, yielding a signal V₂ cos(ωt + φ₂). The signal returned to the sensor will actually be the result of many modulated signals Σᵢ Vᵢ cos(ωt + φᵢ), each being emitted from a small area δAᵢ within the infra-red beam cross section. For a small beam cross sectional area the analysis is simplified if we assume that, during the time the beam traverses the edge, φ₁ and φ₂ remain constant and that V₁ and V₂ change only due to changes in A₁ and A₂. Changes in φᵢ as the beam moves across areas A₁ or A₂ individually are therefore assumed to be negligible. Hence the returned signal Y is given by:
Y = V₁ cos(ωt + φ₁) + V₂ cos(ωt + φ₂)    (3)
so that:

Y = [V₁ cos φ₁ + V₂ cos φ₂] cos ωt − [V₁ sin φ₁ + V₂ sin φ₂] sin ωt    (4)
which can be written as a single sinusoid:

Y = V cos φ cos ωt − V sin φ sin ωt = V cos(ωt + φ)    (5)
which is the form that is estimated at the sensor outputs, i.e. V is the output signal strength produced by both targets and φ the resulting phase shift.
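The following sketch numerically checks equations 3-5: two returns of amplitude V₁, V₂ and phase φ₁, φ₂ combine into a single sinusoid whose amplitude and phase are what the sensor actually estimates (the combined amplitude also matches equation 8 below). The numbers are arbitrary illustrations.

```python
import numpy as np

V1, phi1 = 1.0, 0.2        # closer surface
V2, phi2 = 0.6, 1.1        # further surface

# in-phase and quadrature components, as in equation (4)
I = V1 * np.cos(phi1) + V2 * np.cos(phi2)
Q = V1 * np.sin(phi1) + V2 * np.sin(phi2)

V = np.hypot(I, Q)         # combined amplitude (cf. equation 8)
phi = np.arctan2(Q, I)     # resulting phase -> the biased range estimate

# sanity check against a sampled waveform of equation (3)
t = np.linspace(0.0, 2.0 * np.pi, 1000)
y = V1 * np.cos(t + phi1) + V2 * np.cos(t + phi2)
assert np.allclose(y, V * np.cos(t + phi))
```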
Figure 7. Regions of constant depth (RCD's) extracted from 15 sonar range scans. Note the motion of the RCD's at plane surfaces (marked as A) and corners (marked as B).
Figure 8. True, odometric and estimated positions of the mobile robot using a single corner beacon for localisation.
Figure 9. True, odometric and estimated positions of the mobile robot using two planar (wall) beacons for localisation.
Figure 10. The transmitted signal is split into two returned signals of differing phase by an edge. As the beam traverses the edge the illuminated areas and hence returned signal amplitudes will vary with traversal time.
To quantify the range estimate in this case, we need to find V and φ as functions of time as the beam crosses the edge. From equations 4 and 5 we see that:
V cos φ = V₁ cos φ₁ + V₂ cos φ₂    (6)

and:

V sin φ = V₁ sin φ₁ + V₂ sin φ₂    (7)
giving two simultaneous equations in V and φ. Eliminating φ gives:

V² = V₁² + 2V₁V₂ cos(φ₁ − φ₂) + V₂²    (8)
Before we can proceed further we need to determine, as generally as possible, the relationship between each returned voltage, the sensor to target range and the illuminated area. We make the assumption that the emitted power is uniformly distributed over the cross sectional area of the beam. Therefore:

V₁,₂ = K₁,₂ A₁,₂ / F(R₁,₂)    (9)

where the subscripts 1 and 2 refer to the reflected signals 1 and 2, K₁,₂ are constants for each surface which incorporate surface reflectances and beam to target angles of incidence, and F(R₁,₂) represents a function of the sensor to target ranges R₁,₂.
To establish a relationship between K₁ and K₂ in equation 9, we consider the magnitudes of the returned signal strengths when each surface is illuminated independently. We denote these as Vₑ₁ and Vₑ₂. We will call these the end conditions; in general Vₑ₂ = η Vₑ₁, with:

η = (K₂ F(R₁)) / (K₁ F(R₂))    (10)
Hence, by substitution of equations 9 and 10 into equation 8, we get:

V² = (K₁² / F(R₁)²) [A₁² + 2ηA₁A₂ cos(φ₁ − φ₂) + η²A₂²]    (11)
The above theory can be used to detect and register any discontinuous features within an indoor environment. We will show in the following section that the motion of the infra-red beam across an edge is only a particular case to which the above theory can be applied. The estimated phase and amplitude of any single reading can be considered to be the result of the addition of two signals from any two arbitrary 'end' conditions, which do not have to lie on either side of an edge. By generalising this theory to any range and amplitude estimate, we will derive a method for the detection of range and/or reflectance discontinuities.
5.2. Discontinuity Detection

Before proceeding, we make a distinction between what we will refer to as a discontinuity and an edge. We will use the term discontinuity to refer to an abrupt change in the signal amplitude. We label an edge as an abrupt change in the sensor to target range, as we human beings would perceive a true edge. Note that a discontinuity can be the result of an edge, or a change in surface reflectance, or both, as the sensor head rotates. We also now clarify the constraints upon our end conditions. Equation 11 is only valid if the sum of the components of the areas normal to the optical beam which illuminate each end point is constant, i.e.:

A₁ + A₂ = A    (12)
where A is the cross sectional area of the beam. This means that the chosen end conditions must be spatially joined in the plane of the scanning infra-red beam. This is shown in figure 11. We assume that there is a vertical boundary across which there is a possible change in range or surface reflectance. Due to the small optical beam diameter (2 cm in our case), we make the approximation that an 'end condition' is not just restricted to the circular beam cross section, but occupies the arched region marked ABCD in figure 11. This means that equation 12 is valid in equation 11. Within the arched region ABCD the actual values of R₁ and K₁ can change, but the sensed values of R₁ and K₁ will remain constant.

Eliminating A₂ in equation 11 using equation 12, and differentiating V² with respect to the variable A₁, shows that there is always a position between the end points at which V² is stationary with respect to A₁. The second derivative of the square of the signal amplitude with respect to A₁ is given by:

∂²(V²)/∂A₁² = (2K₁² / F(R₁)²) [1 + η² − 2η cos(φ₁ − φ₂)]    (13)

which is independent of A₁.
Figure 11. The relationship between chosen end conditions.
∂²(V²)/∂A₁² is therefore constant as the beam traverses from one end condition to the other, the value of this constant being dependent upon the end conditions only, i.e. φ₁, φ₂, η and K₁. From equation 13 we can find the nature of the stationary value of V² versus A₁. Simple analysis shows that for all values of φ₁, φ₂ and η, ∂²(V²)/∂A₁² is always positive (meaning that V² has a minimum with respect to A₁) and approaches zero as η → 1 and φ₁ → φ₂; i.e. if the end conditions are similar, ∂²(V²)/∂A₁² → 0. This is demonstrated by the experimental results shown in figure 12, where V is plotted against time. We can therefore conclude that the numerical value of ∂²(V²)/∂A₁² across two end conditions gives us an indication of how "different", in terms of either sensor to target range and/or surface reflectance, the end conditions are. ∂²(V²)/∂A₁² will rarely actually be zero in practice, since two end conditions will rarely be identical even when the beam does not pass a discontinuity. It therefore remains for us to determine a value for ∂²(V²)/∂A₁² so as to register a particular discontinuity within our environment.

We also need to note that the reflected infra-red light from a target contains two components, one being a specular component which follows Fresnel's equations and the other a diffuse component which is approximately described by Lambert's cosine law [10]. Because of the coaxial design of the sensor, specular reflections can only be received at very near normal incidence and in practice are only noticed with extremely reflective targets. Therefore, except for these rather rare cases, only the diffuse component need be considered.
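In discrete form, the test suggested above can be sketched as a second difference of the squared amplitude along the scan: it stays near zero over a homogeneous surface and becomes large and positive while the beam crosses a range or reflectance discontinuity. The threshold is an assumed parameter.

```python
import numpy as np

def discontinuity_indices(amplitudes, threshold=0.05):
    """Indices of samples whose squared-amplitude second difference,
    a discrete stand-in for d2(V^2)/dA1^2, exceeds the threshold."""
    v2 = np.asarray(amplitudes, float) ** 2
    d2 = v2[2:] - 2.0 * v2[1:-1] + v2[:-2]
    return np.nonzero(d2 > threshold)[0] + 1   # centre indices of the hits
```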
Figure 12. Graphs of signal amplitude versus time as the infra-red beam traverses different end conditions.

We now consider in more detail the value of ∂²(V²)/∂A₁² when the beam traverses a discontinuity. Consider the right hand diagram in figure 11. R₁ and R₂ are two successive range estimates which we will choose as arbitrary end conditions. According to Lambert's cosine law, the amplitude of the returned signal in each case is given by:

Vₑ₁ = ρ₁ A cos θ / R₁²,    Vₑ₂ = ρ₂ A cos(θ + γ) / R₂²    (14)

where ρ₁,₂ are the surface reflectance constants at points A and B and γ is the angle between the resulting data points. In general, between end points A and B in figure 11, ρ₁ ≠ ρ₂. Hence, from the figure:

η = Vₑ₂ / Vₑ₁ = (ρ₂ R₁² cos(θ + γ)) / (ρ₁ R₂² cos θ) = (ρ₂/ρ₁)(R₁/R₂)³    (15)
Applying the sine rule to triangle OAB in figure 11, and from equations 14 and 15, we define S as:

S ≡ ∂²(V²)/∂A₁² = (2ρ₁² R₂² sin²γ / (R₁⁴ d²)) [1 + η² − 2η cos(K(R₁ − R₂))]    (16)

where K is a constant relating phase shift to actual range, d is the distance AB, and R₁ and R₂ are related by the cosine rule:

d² = R₁² + R₂² − 2R₁R₂ cos γ    (17)
Between equations 16 and 17 it is possible to eliminate d and determine a relationship between S, R₁, R₂, ρ₁ and ρ₂. The value of S therefore registers a particular discontinuity within an environment. We are currently implementing a tracking algorithm so that the sensor's transmitter tracks regions of constant S value within the environment. By monitoring the angular position of the sensor's transmitter and the registered range value, mobile robot localisation is feasible during mobile robot manoeuvres.

6. CONCLUSIONS
We have presented here three of the research issues which we are currently addressing in order to realise an autonomous mobile vehicle. We have shown that a "situation based behaviour selector" under the control of a neural network decision mechanism can provide a useful path planning technique. The mobile robot reacts to its environment with certain behaviours or reflex behaviours depending upon the input sensor data. The result of injecting a desired trajectory into the mobile robot's control system has also been demonstrated. As the mobile robot moves it is capable of updating its position under an extended Kalman filtering algorithm using, at present, sonar range data. The algorithm is capable of finding naturally occurring geometric beacons and tracking them as the mobile robot moves. As new beacons (from obstacles) are found (i.e. not predicted) they cause new tracks to be initiated, classified and eventually integrated into the map. The results show that successful localisation and map building is possible using this algorithm. Finally, we have demonstrated that a continuous, on-line tracking algorithm is feasible, so that an optical range finder can autonomously provide a localisation estimate for a mobile robot as it moves.

REFERENCES
1. M. D. Adams. Optical Range Data Analysis for Stable Target Pursuit in Mobile Robotics. PhD thesis, University of Oxford, U.K., 1992.
2. Y. Bar-Shalom and T. E. Fortmann. Tracking and Data Association. Academic Press, 1988.
3. Paul J. Besl. Surfaces in Range Image Understanding. Springer-Verlag, 1988.
4. D. Diez and G. Schweitzer. Real time systems for mechatronics. In Second Conference on Mechatronics and Robotics, Sept. 1993.
5. R. Kuc and M. W. Siegel. Physically Based Simulation Model for Acoustic Sensor Robot Navigation. IEEE Trans. Pattern Analysis and Machine Intelligence, pages 766-778, 1987.
6. Jean-Claude Latombe. Robot Motion Planning. Kluwer Academic Publishers, 1991.
7. J. Leonard and H. F. Durrant-Whyte. Navigation by Correlating Geometric Sensor Data. Technical Report OUEL 1788/89, Oxford University Robotics Research Group, 1989.
8. J. Leonard and H. F. Durrant-Whyte. Application of Multi-Target Tracking to Sonar Based Mobile Robot Navigation. In International Conference on Decision and Control, 1990.
9. T. Lozano-Pérez and M. A. Wesley. An Algorithm for Planning Collision Free Paths Among Polyhedral Obstacles. Communications of the ACM, volume 22, pages 560-570, 1979.
10. P. Moon. The Scientific Basis of Illuminating Engineering. Dover Publications, New York, U.S.A., 1961.
11. Winston L. Nelson and Ingemar J. Cox. Local Path Control for an Autonomous Vehicle. In Proc. IEEE Int. Conf. Robotics and Automation, pages 1504-1510, 1988.
12. J. S. Singh and M. D. Wagh. Robot path planning using intersecting convex shapes: Analysis and simulation. In Proc. IEEE Int. Conf. Robotics and Automation, volume 3, 1987.
13. O. Takahashi and R. J. Schilling. Motion planning in a plane using generalised Voronoi diagrams. IEEE Trans. on Robotics and Automation, 5(2), 1989.
14. N. Tschichold-Guerman. RuleNet: An artificial neural network model for cooperation with knowledge based systems. Special Issue of the Machine Learning Journal on Symbolic Knowledge and Neural Learning, 1994.
15. S. J. Vestli, N. Tschichold-Guerman, M. D. Adams, and S. Sulzberger. Integration of Path Planning, Sensing and Control in Mobile Robotics. In Proc. IEEE Int. Conf. Robotics and Automation, 1993.
The Robotic Travel Aid "HITOMI"

Hideo Mori, Shinji Kotani and Noriaki Kiyohiro
Dept. of EE and CS, Yamanashi University, Takeda 4, Kofu 400, Japan
The robotic travel aid (RoTA) is a motorized wheelchair equipped with vision, sonar, and tactile sensors, in addition to a map database system. The visually impaired can get orientation, mobility and obstacle information from RoTA, and can inquire about their present location, landmarks and the future part of the route. The concept is implemented on a RoTA called "HITOMI". It can guide the impaired to avoid vehicles along a road with lane marks or along a sidewalk marked with Braille.
1. INTRODUCTION

Various kinds of electronic travel aid (ETA) have been developed for the visually impaired. Among them the Mowat sensor and the Sonic guide have been available for 20 years, but have not yet become widespread. Why are they not commonly used by blind people? Jansson has suggested that it is because most ETAs give information about the environment only a few meters ahead, which can easily be obtained by using a long cane [1]. Another reason, we think, is that the sound by which an ETA gives the user information disturbs his echolocation, while vibration is too poor a channel to inform him about the environment. The guide dog is the best travel aid, but it is difficult to train as many guide dogs as are needed. The number of guide dogs in the world (e.g. 8,000-10,000 in the USA, 4,000 in the United Kingdom, 730 in Japan) shows this difficulty. In Japan, a shortage of training staff and budget (about $25,000 per dog) make it difficult for the guide dog to become widespread. The Mechanical Engineering Laboratory of Japan started project MELDOG to develop a robot guide dog in 1977 [2]. By use of a CCD sensor the robot detected barcode-like landmarks fastened on the road. This research project came to an end after seven years and the robot was never used by the blind.

Recently, vision-guided vehicles have been investigated by many researchers. Some of them follow a lane mark and avoid obstacles. We have proposed a behavior-based locomotion strategy called "sign-pattern-based stereotyped motion" and have applied it to mobile robots named "HARUNOBU" [3]. Recently, car-navigation systems have been produced commercially by the electronics and car industries. Combining these technologies of the mobile robot and the car-navigation system, we have been developing the Robotic Travel Aid (RoTA) since 1990 [4]. In this paper we give an outline of RoTA.
2. RoTA REQUIREMENTS

The RoTA is not a replacement for the guide dog but an advancement of the ETA. The required functions of RoTA are listed as follows.
2.1. Target of RoTA

The required functions of RoTA differ depending on the level of visual disability, the age at which sight was lost, and whether the blind person has an auditory impairment or not. In general, the greater the age of losing sight, the more difficult is the training in echolocation. Furthermore, for the older person it is very difficult to memorize and recall the route from the start to the goal. Our RoTA is designed for those who lost their sight at an advanced age and have difficulty in memorizing the route.
2.2. Size of RoTA

To utilize a video camera as a sensor, RoTA should be large enough to stabilize the video image: the larger the RoTA, the smaller the vibration of a video camera attached to it. A motorized wheelchair was used as the undercarriage of RoTA. It is 1170 (depth) x 700 (width) x 1500 (height) mm in size and 80 kg in weight. The wheelchair is big and heavy enough for the blind person to walk holding the handle bar.
2.3. Sensors of RoTA

RoTA requires information about orientation and mobility. Orientation information, which is required to get to the goal or the sub-goal, is obtained by the use of vision to detect openings for passage or space without any obstacles. Mobility information means what is required by the blind person to control walking continuously and to keep balance while walking. RoTA makes proper use of its sensors to get information on orientation and mobility. Table 1 shows the multiple sensors of RoTA.

Table 1
Multiple sensors of RoTA

Sensor    Range              Object to which it is sensitive
Vision    3-50 m             Elongated feature, free space, vehicle, pedestrian
Sonar     less than 3 m      Wall-like obstacle
Tactile   less than 0.3 m    Depression, stair

Information about orientation can be obtained by sensing the passage ahead within about 5 meters. Only vision can provide such a range of sensing. We believe monocular and monochromatic vision is sufficient to get orientation. Vision is also useful to detect obstacles such as vehicles and pedestrians. However, it cannot detect a wall-like obstacle that has a homogeneous surface. The sonar is useful to detect a wall-like obstacle, but its effective range is limited to 3 meters in an outdoor environment.
2.4. Sign pattern based stereotyped motion We have previously proposed a behavior-based action strategy, stereo typed motion (STM) [6]. It is defined as a fixed action pattern that makes the robot perform a primitive skillful action. We assume that five STMs, "moving along", "moving toward", "moving for sighting", "following a person" and "moving along wall" are enough to perform any locomotion from start to finish. All complex actions such as obstacle avoidance can be defined as a chain of these STMs. Cooperative camera pan/tilt control is included in STM. A pattern or feature of the environment that is utilized to initiate STM or modify STM to fit the environment is called a sign pattern (SP). STM is different from subsumption architecture [11] which does not use SP. STM is a goal-oriented action and can perform a mission or task, but subsumption architecture is a reflective action and cannot perform a mission. The advantages of using SP-based STMs are ( 1) The robot can move even when the information about the future part of the passage is incomplete. For instance, what is round a corner is invisible, the robot can turn the corner by a chain of STMs that includes collision avoidance for an obstacle that might suddenly appear. (2) The reaction time to an SP is very short, as it does not needs motion planning.
2.5. Moving along and moving toward Jansson in his psychological study defined two kinds of perceptual guiding action, "walking along" and "walking toward". Applying them to RoTA we define two STMs; "Moving-Along" and "Moving-Toward." Moving-Along is defined as an STM that consists of two cooperative actions of the undercarriage and video camera systems. It moves along an elongated the SP keeping the distance from SP constant, changing the camera direction to keep the SP in the center of the video image. The elongated SP may include a lane mark of a road, the edge of a sidewalk, fences and a boundary of a campus path. Moving-Toward is also defined as an STM that consists of trhe cooperative action of the two systems to move toward a sub-goal. RoTA has to search for the SP of sub-goal and watch it through its video image. An SP of a sub-goal includes only a zebra-crossing mark at present, but in the future the entrances of buildings and stairs.
2.6. Obstacle avoidance When RoTA moves along a road or sidewalk, most critical obstacles are vehicles and bicycles. RoTA's obstacle avoidance is carried out through four task:: "moving along" in which an obstacle is found, "moving for sighting" finds the right side of the obstacle, "moving along a wall" which passes by the obstacle, and again "moving along", as shown in Figure 1.
2.7. Crossing an intersection One of the problems for the visually impaired in walking outdoors is to cross a road intersection. RoTA can find the intersection, if there is zebra crossing mark on the road by detecting it visually.
2.8. Map-based guidance The digital map system of RoTA gives two kinds of map information, one is for RoTA itself and another for the impaired person. The map information for RoTA is almost the same road information as that for car naviga-
610 tion, and it includes multiple SPs for each road and sidewalk.The multiple SPs are necessary to increase the robustness of the guidance. The map information for the impaired is used to let him know the landmarks of the present location land mark and the route represented by a command list such as "go straight" or "turn right". The map database system of RoTA is made up of two sub-system: the basic system and a portable system. We think the basic system should be managed by the welfare office and be updated every month. The portable system is part of the original one and is created by the impaired person or a volunteer before every journey. We have implemented the two systems on a well-known Japanese commercial digital map system, because that system covers almost all areas of Japan. The two systems are represented by a network of nodes and branches. A branch represents a passage. Its attributes include multiple SPs for RoTA and landmarks for the impaired. A node represents an intersection. When the impaired person does not know where he is or loses his way to the destination, he can interrogate the portable system.
Figure 1. A chain of STMs for obstacle avoidance
2.9. Semi-automatic navigation While an SP is detected visually, the impaired person may follow RoTA which keeps "moving along" or "moving toward". However when the SP disappears, RoTA stops, refers to the map system and makes an inference as to why the SP has disappeared. The inference of RoTA may be "We must reach the end of the SP" or "We will probably meet an obstacle". He will test the environment through his residual senses, including auditory, tactile senses, and understand the inference through his knowledge of environment, traffic, weather and time. RoTA makes a list of possible STMs, he selects one from the list, and RoTA performs it. 2.10. The blind-oriented interface RoTA has to inform the blind person of four kinds of information: mobility information, orientation information, obstacle/intersection information and map-based information, as shown in Figure 2. A command bar with a Braille key is fixed on the rear part of the robot. By holding this bar the user can get the mobility and orientation information.
611 Obstacle/intersection information detected by RoTA is issued as warning and alarm messages through the voice interface. The impaired person lets RoTA know the destination by the Braille key.
Figure 2. Blind-oriented interface
3. I M P L E M E N T A T I O N AND R E S U L T S The above specifications are implemented on a RoTA called HITOM1 ("pupil" in Japanese). Its system configuration and a photograph are shown in Figures 3 and 4. We will briefly describe the main points.
3.1. Undercarriage One of the most serious problems of the mobile RoTA is that it cannot go up or down stairs. The motorized wheelchair cannot go overa step more than 3 cm in height. We made the wheelchair back-to-front, becausethe reardrive wheels are larger in diamiter than the free front wheels and are able to go over higher steps.
3.2. Camera platform The problems of the camera platform are how to reduce the electric power consumption and how to decrease the vibration of the video image during the locomotion. At first we used servo systems for factory automation. They are high in electric power consumption and very large in size. We use commercially available servo systems developed for model aircraft. They are onefifth both in size and in power consumption compared with industrial servo systems. Its direction accuracy is 0.5 ~ in pan and tilt. To decrease vibration, the camera platform is supported on the RoTA by shock absorbing rubber.
612
Figure 4. HITOMI robotic travel aid
Figure 3. System configuration of HITOMI
3.3 Vision system One of the most difficult problem of the vision system is how to reduce electric power consumption while keeping the image processing time short. At present we use one board image-processing system MC 68040, which has an image buffer of FIFO and 4 MB main memory which is accessible through cache memory. It consumes 60 watts. We have developed the following software routines which can process a scene within two or three frames (66 or 99 ms). We apply dynamic vision as proposed by Graefe [7]. 3.4 Road and lane mark detection Understanding of the road and sidewalk image begins with lane-mark detection. To detect the lane-mark as an elongated SP, the monocular video image is binarized by the mode method based on a gray-level histogram. An SP is detected every two fi'ames (66 ms), false data of SP being omitted by Kalman filtering. 3.5. Zebra crossing mark detection When RoTA comes near to a road intersection, it knows this fact by map matching, and inserts a zebra-crossing mark searching process in every ten cycle of the lane mark detection processes [8]. When the mark is found, it is followed until RoTA is 3 meters in front of it. In its searching and following process, the road image is binarized and horizontal and vertical projec-
613 tions are performed on the image, as shown in Figure 5. By analyzing the two projections the mark is identified. Since some obstacles momentarily show almost the same projection as the mark, RoTA sometimes mistakes the obstacle for the mark, but this error can be corrected by checking the projection in the successive frames.
Figure 5. Zebra crossing mark detection by horizontal and vertical projection
3.6. Vehicle detection The vehicle is one of the most troublesome obstacles, whether it be moving or parked. We have proposed a simple useful vehicle detection algorithm, which is based on the fact that the sunny part of the road shines in the sun and sky light, while the underneath part of the vehicle is dark because of the faint environment light through the gap between the vehicle and the ground [9]. The underneath and shaded parts seem to be the same in luminance, but this is an illusion of brightness constancy. Our algorithm is easy to implement by software; right and left edges of the vehicle are also used for vehicle identification. An example of intensity curve in a window which is located in the underside of a vehicle is shown in Figure 6. We tested this method in four traffic scenes of (1) partly shaded road, (2) entirely shaded road, (3) non shaded road, (4) a road in cloudy; more than 97% of vehicles were successfully detected.
614 Table 2 Vehicle detection results Weather
Shadow on the road
Fine Fine Fine Cloudy
False results (%) Under-detection Over-detection
Number
Success rate (%)
Partly Entirely None
294 191 272
92 97 98
7 3.2 2.2
1% 0 0
None
405
98
1.3
0.3
Figure 6. An intensity curve in a window in the underside of the car
3.7. Pedestrian detection Pedestrian detection is very important when RoTA moves along a sidewalk. We have proposed a rhythm model to detect and follow a pedestrian [ 10]. The model is based on the following fact. Rhythm constancy: when a person walks, their volume changes rhythmically in area, width and height. The volume is specified by the mean and standard deviation of walking frequency, Fp and Op.
615 Advantages of the rhythm model are as follows. (l) Total image-processing time is shorter than that of any other model because few non-structured features are used in processing. (2) Rhythm does not vary with for distance change between person and observer. (3) Rhythm is independent of illumination and therefore robust against time and weather changes. (4) Rhythm is easy to detect when a person is wearing clothes. A disadvantage of the rhythm model is that someone with ill intent can deceive the robot by a simple trick. The process of pedestrian detection by the rhythm model is decomposed into four sub-processes: motion detection by frame subtraction and binarization, motion segmentation by headto-feet window, feet tracking, attribute acquisition, and rhythm matching. In motion segmentation, horizontal and vertical projections are performed, as shown in Figure 7, to set up a head-to-feet window for pedestrian detection. Right and left foot windows, which are 1/5 in height and 1/2 the width of the head-to-feet window, are set up for feet tracking in successive images, and the location and area of blobs in the two windows are measured. In rhythm matching, a power spectrum estimate method is operated on the two time series of areas of the fight and left foot windows. When the first components of the power spectrum of the two are matched and are within two standard deviations of the mean rhythm of walking, the objects in the two windows are judged to be the feet of a pedestrian.
Figure 7. Setting of three windows
Figure 8. A subtracted image of a pedestrian and a bicycle
In our experiments, video tapes of pedestrians on the road environment in cloudy conditions were used. Image subtraction is performed for every two fiames (66 ms). Figure 8 shows a subtracted image enclosed by a 1/5 size window. The subtracted image of the window is transformed into the binary image by a threshold operation. The area of a binary image is used for rhythm detection. The power spectrum is obtained by a maximum entropy method from 64 area measurements (66 ms • 64 = 4.2 s). Figure 9 shows the time series of the area and the obtained frequency components. It is found from Figure 9(b) that the rhythm of walking is 1 second in frequency. This method works well when the robot stops and looks at pedestrians at a distance of 7 to 30 meters through the video camera. The success rate was 94 % for one pedestrian, and 95 % for non-pedestrian. Detection errors are mainly caused by setting window incorrectly.
616 60
E5 -
03
E 50 ~ 4 0 -" .~_ ao -~ ..Q "6 20 E]O m 0 0
I':
:v ~,
1
i "
i i
2
/
"". !
3 Time (s)
!~: ..i i ~ ~.!
4
(a) Time series of the pedestrian's blob
5
=o4 (1)
~3 ~2 g_l 0
Power spectrum for blob ~ / o f left feet /[ ~,,_~ Power spectrum for blob
0
1
2 3 4 Frequency (Hz)
5
6
(b) Power spectrum for rhythm of the pedestrian
Figure 9. Rhythm pattern and power spectrum of the pedestrian in Fig. 8
4. CONCLUDING REMARKS

We have been developing the robotic travel aid (RoTA) "HITOMI", which guides the visually impaired in the road environment. HITOMI is a small mobile robot which utilizes a motorized wheelchair as its undercarriage. A vision system is fitted to detect the road, vehicles and pedestrians. A multiple-sonar system is fitted to detect walls and other obstacles which the vision system cannot detect. A portable digital map system is provided to give the undercarriage a sequence of commands by which it follows the route from start to destination. It also gives the vision system detection parameters for the sign patterns and landmarks of the route. The digital map includes the names of intersections and buildings. The user can ask where he is and the map system replies through a synthesized voice. A command bar is attached at the rear of HITOMI. The user stands behind the RoTA and follows it by grasping the command bar. He can sense mobility and orientation through the motion of HITOMI.

HITOMI cannot recognize obstacles as well as a human does; the success rate of vehicle and pedestrian detection is between 92 and 98%. When a visually impaired person walks guided by HITOMI, he might therefore meet with a traffic accident. To avoid this, semi-automatic navigation is applied. When HITOMI senses an environmental change, it infers the cause and tells the user its inference and the next plan of motion by the synthesized voice. The user confirms the inference with his residual senses and permits HITOMI to perform the plan, or makes it wait for his permission. Generally speaking, the impaired person does not want to have to obey completely what the robot commands. Through the active use of his residual senses, his independence in daily life will be promoted.
ACKNOWLEDGMENTS
This work has been partially supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, No. 03650343.

REFERENCES
1. G. Jansson, "Non-visual guidance of walking", in Perception and Control of Self-Motion, R. Warren and A. H. Wertheim, eds, Lawrence Erlbaum Associates, 1990, pp. 507-521.
2. S. Tachi and K. Komoriya, "Guide dog robot", in Robotics Research: The 2nd Int'l Symposium, H. Hanafusa and H. Inoue, eds, The MIT Press, 1984, pp. 333-340.
3. H. Mori, "Guide Dog Robot Harunobu-5 - Stereotyped Motion and Navigation", in Information Processing in Autonomous Mobile Robots, G. Schmidt, ed., Springer-Verlag, 1991, pp. 135-149.
4. H. Mori and M. Sano, "A guide dog robot Harunobu - Following a Person", IROS'91, pp. 397-402.
5. M. Takahasi, Y. Yasumasa, T. Hagiwara and H. Ohshiba, "A CCD Video Delay Line with Charge-Integrating Amplifier", IEEE J. of Solid-State Circuits, Vol. 26, No. 12, 1991, pp. 1915-1919.
6. H. Mori, "A mobile robot strategy", The 5th Int'l Symposium of Robotics Research, 1990, The MIT Press, pp. 162-172.
7. V. Graefe, "Dynamic vision systems for autonomous mobile robots", IROS'89, pp. 12-23.
8. K. Yamaguchi and H. Mori, "Real Time Road Image Processing for Mobile Robot", Int'l Conf. on Advanced Mechatronics, 1993, pp. 360-364.
9. N. M. Charkari and H. Mori, "A new approach for real time moving vehicle detection", IROS'93, pp. 273-278.
10. H. Mori and S. Kotani, "Recent progress in mobile robot HARUNOBU", Intelligent Vehicles '93, 1993, pp. 169-176.
11. R. A. Brooks and A. M. Flynn, "Robot Beings", IROS'89, pp. 2-10.
Vision-based Navigation for an Office Messenger Robot

Takashi Gomi, Koh-ichi Ide, Patrick Maheral
Applied AI Systems, Inc., 340 March Road, Suite 500, KANATA, Ontario, Canada K2K 2E4
Tel: +1 613 592 3030  Fax: +1 613 592 2333  E-mail: 71021.2755@compuserve.com
ABSTRACT
A prototype office messenger robot with a new vision-based navigation/maneuvering system which implements the habitat constraint vision theory proposed at the Massachusetts Institute of Technology (MIT) has been built. The vision system is used for two objectives: to avoid collisions and find safe passage in a dynamically changing environment (maneuvering and short-range navigation); and to identify and follow landmarks to travel through its habitat using an onboard topological map (long-range navigation). The process architecture for the robot follows the principles of the Subsumption Architecture (SA) proposed by R. Brooks of MIT [Brooks, 1986], and the basic control program for the robot consists of a set of behaviors implemented in the Behavior Language (BL). In addition, the bulk of the vision processing behaviors for guiding the robot and detecting landmarks in the office space are implemented on a vision processor board using neural networks. The behavior-based processing of sensor inputs results in extremely efficient software with highly satisfactory operational characteristics. Speech recognition/synthesis capability is added to allow a discourse between humans and the robot. The hardware and software structures of the robot and results from preliminary experiments are described.

1. INTRODUCTION

Pattie Maes of MIT has compactly defined Behavior-Based AI, or "New AI" as it is now called in Europe [Maes, 1992]. Begun roughly in the mid-1980s by a small number of researchers with a strong criticism of what subscribers to the movement now call GOFAI ("Good Old Fashioned AI" [Haugeland, 1985]), the behavior-based AI approach provides answers to some of the difficult questions directed at conventional symbolist approaches. While arguments on its pros and cons are still being debated, the approach offers almost immediate relief to some of the more serious practical problems encountered when developing intelligent systems operable in the real world [Brooks and Flynn, 1989]. For example, much of the agility, flexibility, robustness, and dynamism required in what are now collectively called "service robots" can be implemented with relative ease [Gomi, 1993].

Transportation is an application area where we have focused our efforts for the past two years. Some convincing results demonstrate the effectiveness of simple vehicle maneuvering algorithms based on SA. Although their application is initially limited to less open and somewhat structured and/or controlled settings, the algorithms clearly show
their potential when applied to dynamic and complex transportation applications [Gomi and Volpe, 1993], [Gomi and Laurence, 1993]. Another recent key development in a related field is that one of the first implementations of a behavior-based vision system has been completed and integrated with a mobile robot called 'Polly' [Horswill, 1992, 1993a, b]. The robot not only traveled fully autonomously at a convincing speed through a real office environment, it also demonstrated the effectiveness of its topological map system. The series of experiments run continuously throughout 1993 in an actual office at MIT's AI Lab proved that something rather simple can be put into actual use as long as its control method is right.

Our effort centers on the development of a prototype messenger robot capable of navigating fully autonomously through a corporate office environment and delivering documents and other light materials to workers. The approach chosen is to integrate the SA-based agile maneuvering control of a robot with a vision system built after the habitat constraint paradigm established by Horswill. Some modifications and improvements were necessary for better real-world performance. One improvement is in the way landmarks are detected. Instead of the topological map system based on snapshots of landmarks used in the MIT system, we employ neural networks as sensors of landmarks. The actions taken by the robot in response to the presence of a landmark follow the behavior-based paradigm. Compared to the more deterministic pattern matching in the original MIT implementation, the robustness of landmark detection under varied viewing conditions was improved. A limited-vocabulary voice recognition/synthesis unit is incorporated in the robot to establish and maintain discourse between the robot and humans in the office.

2. SYSTEM DESIGN

2.1 The Overall Process Structure of the Prototype Robot

The prototype messenger robot is built using a mobile robot platform called R3. R3 is a standard research robot built by IS Robotics. It has a set of onboard sensors, processors, and actuators. Although it can be programmed in several ways, the preferred control architecture is one based on SA. The pod-shaped mobile robot is about 27 cm tall, 27 cm in diameter, and moves on two differential wheels and two omnidirectional casters. A gripper (9 cm long) can travel vertically along rails attached to the front of the robot. Our vision system is carried on top of the robot. It is 29 cm tall from the base to the tip of the 2 cameras, which makes the total height of the robot 56 cm. The relatively low profile of the R3 robot is not ideal for implementing a messenger robot, since office workers receiving delivered items must stoop to retrieve them. We chose this platform, however, for the benefit of keeping the control architecture purely behavior-based, and for the ease of programming in the Behavior Language [Brooks, 1990]. Some changes to the original R3 hardware include the addition of the behavior-based vision hardware, a high density people detection sensor unit, and a voice recognition/synthesis unit. The prototype messenger robot consists of a number of subsystems as shown in Figure 2.1.

2.2 Maneuvering and Navigation

The prototype robot has eight active infrared (IR) obstacle detection sensors and five strips of tactile sensors around its body which can act as collision detection sensors. When the vision system overlooks local obstacles, the IR sensors detect the dangerous situation and
behaviors associated with that input take over. Each of the eight IR obstacle detection sensors has an LED transmitter and a receiver. These sensors emit an IR pulse signal and then detect the returned signal reflected by an obstacle. The strength of the returned signal depends on a number of factors other than distance, such as the color of the surface of the object, the angle of reflection, texture, etc.

Figure 2.1 Process structure of the Prototype Office Messenger Robot

The robot does not measure the exact distance, nor is it affected adversely by the lack of Cartesian accuracy. Two behaviors are built using signals returned from 4 of the 8 IR sensors and signals from 2 of the 5 tactile sensor segments (each with 256 identifiable contact points within). Other available sensors can be used in later stages of development to enrich the behavior set of the robot. The rather casual and scant use of these sensors for building a highly effective maneuvering control system has been reported in [Gomi and Laurence, 1993].

The vision system for low level navigation and maneuvering uses the habitat constraint theory devised by Horswill [Horswill, 1992]. The key difference from Horswill's system is the use of a topological map system which consists of behaviors that get activated when an associated neural network detects a landmark in the habitat. While the robot is traveling, most of its local navigation and maneuvering are handled by its vision system.

2.3 Human-Robot Interface
Various levels of protocol exist between humans who inhabit the office and the robot that operates there. These include mutual collision avoidance, detection of a human by the pyro sensor array, voice exchange, detection by the vision system, and other task-achieving protocols. In order to govern some of the above protocols, a behavior designed to carry on a conversation with office staff was developed. CONVERSE gets invoked when a target landmark is identified. It initiates a discourse with the human(s) in the vicinity so that an item of delivery can be handed over. Micro IntroVoice, developed by Voice Connexion, is used for voice recognition/synthesis. The gripper, which can open as wide as 7 cm, is used for holding items to be delivered. The prototype messenger robot with all additional sensors and actuators is shown in Figure 2.2.
3. THE VISION SYSTEM
The basic maneuvering control software based on SA is similar to that used for previous systems [Gomi and Volpe, 1993], [Gomi and Laurence, 1993], and its description is therefore not detailed here. This section describes the vision system and how it is used for maneuvering and navigation. The vision system plays two roles: to identify obstacles and free space for short range navigation and maneuvering; and to detect landmarks in the habitat to aid the robot's navigation based on a topological map system.

3.1 The Vision System Hardware

The hardware architecture of the vision system is shown in Figure 3.1. The vision processing board is Mandex's "Smart Eye I" image processing system, which features Texas Instruments' TMS320C31 Digital Signal Processor (DSP) operating at 33.3 MHz. The board has 2 frame grabbers, each with 512 x 480 pixels, 8 bits deep, and can process up to four camera inputs. There are two RS232 serial ports on the board.

Figure 2.2 The Prototype Messenger Robot

Two Chinon CX-103 single board monochrome solid state chip cameras (324 x 246 pixels) are used to view scenes in front of the robot. The 46 x 44 x 23 mm CCD camera is capable of generating composite video signals at the rate of 60 Hz. The two cameras are each mounted on a Futaba FP-S133 servo motor, which provides approximately 170 degrees of swivel movement. The two-swivel-eye assembly is then supported by a Futaba FP-S148 servo, which controls the pitch motion of the eye assembly. The pitch-controlling servo motor is in turn placed on top of another FP-S148 servo, which gives a 190 degree horizontal (swivel) rotation to the entire assembly. The camera assembly is given large degrees of freedom (4 dof) so that eventually various active vision experiments can be conducted. Presently, only the pitch controlling servo is activated, for scanning the depth of the field of view to detect obstacles in the path. This motion picks up obstacles appearing both close to and far from the robot. Each of the CX-103 camera boards is encased in a small clear plastic enclosure, and the lens unit (approximately 15 mm in diameter and 10 mm in length) protrudes from the center of the camera. The angle of view of the lens is 110 degrees. The camera is currently adjusted to focus on objects located between 1 and 4 meters away. On both sides of the lens are two pyro-electric sensors, each with its own internal amplifier (ELTEC 442), to detect changes in temperature within a 1 meter radius of the eye. The inputs from these pyro sensors will be used to implement aspects of peripheral vision processing, such as the detection of humans approaching the eye, so that the eye can be swivelled in that direction.
Figure 3.1 Vision system hardware architecture

Visual signals generated by the camera are fed to the Mandex board's frame grabber. Signals from selected pixels of the frame grabber are then sent to video memory. Currently, only 64 (horizontal) x 48 (vertical) pixels out of the available 512 x 480 are transferred to the buffer. The resolution is deliberately reduced to achieve increased processing speed. Although the frame grabber is capable of capturing a frame every 30 milliseconds, the communication channel bandwidth between the vision processor and the main onboard processor (Motorola MC68332) of the R3 robot allows an overall processing speed of 20 frames per second. This translates into 2.25 cm between frames when the R3 robot runs at its top speed of 45 cm per second, more than sufficient for the application. The resolution of the vision system can be arbitrarily increased to, say, 128 x 128 (the robot moves 9 cm per frame) or 256 x 256 (35.3 cm per frame). We have tested the control structure on a much faster platform (2 meters per second); the processing speed of the system could easily cope with this speed at the current resolution. The IR obstacle detection sensor can detect objects within 10 to 60 cm depending on surface reflection.
3.2 Local Navigation and Maneuvering using the Behavior-Based Vision System

The view obtained by a camera is divided into three areas as shown in the example in Figure 3.2a: Left, Center, and Right. This rough segmentation of the field of view is adequate to navigate a robot operating at human walking speed in a typical office environment, when combined with behavior-based processing of the signals. Figure 3.2a shows a scene from the camera of the robot superimposed with labelled objects in the actual field of view. Note that the camera has sufficient resolution to identify the details drawn, but they are not represented in the video buffer (see below). Figure 3.2b demonstrates how the vision system uses its free space detection algorithm and develops a depth map and the depth parameter set.
Figure 3.2a Example scene through camera of robot
Figure 3.2b Depth map and parameters controlling steering
object) found in the Left region; yR, the lowest column found in the Right region; and yC, the highest column (the number of pixels to the deepest object) in the Center region of the scene. For every image captured in the video buffer, the image processing algorithm onboard the vision processor performs the steps described below. (1) Edge detection. A pair of standard 3 x 3 pixel masks for edge detection is used to scan the video buffer. Pixels where the mask response is significant are given the value 255 (all bits set) to indicate an edge, while all other pixels are given the value 0. After all 48 x 64 pixels are scanned, the video buffer contains a matrix in which edges are marked by a byte with all bits on and the rest by zero. (2) Depth map development. The video buffer is scanned vertically upward from the bottom of the screen (the 48th row). The depth of the first edge in every column is recorded in the depth array, a 64-byte linear array. The map then contains values which give the distance from the camera to the nearest obstacle across the entire horizontal field of view. (3) Identification of the nearest edge (obstacle) in the Left and Right regions and the farthest point in the Center region. The field of view is divided into three regions: columns 1 to 26 form the Left region, 27 to 38 the Center region, and 39 to 64 the Right region. In the Left and Right regions, the value of the nearest edge is located in the depth array (the minimum height, or the smallest value in the region's segment of the depth array) and designated yL (the depth of the nearest obstacle in the Left region). Similarly, yR is obtained from the depth map. For the Center region, however, the maximum height, or the farthest point in the region, is searched for in the depth map and its value is marked yC. (4) Transmission of the depth parameters. The three parameters yL, yR, and yC are sent to the main processor (MC68332) of the R3 robot through the serial port. The values are used there in a number of behaviors to steer the robot and to control the servo motors in the vision system. Figure 3.3 shows the behavior structure handled by the 68332 processor. These behaviors are written in BL, compiled and assembled into 68332 assembler language, and downloaded to the processor's NVRAM (non-volatile RAM) for execution.
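The per-frame computation of steps (1)-(4) is compact enough to sketch in C. The sketch below is illustrative only: the array names, the edge-significance test, and the no-edge sentinel are our assumptions, and the actual implementation is hand-tuned code running on the TMS320C31.

/* Sketch of the per-frame depth-map computation of steps (2)-(4).
 * edge_buf is assumed to hold the result of step (1): 255 = edge. */
#define ROWS 48
#define COLS 64
#define LEFT_END   26   /* columns 1..26  -> Left region   */
#define CENTER_END 38   /* columns 27..38 -> Center region */

static unsigned char edge_buf[ROWS][COLS]; /* 255 = edge, 0 = no edge  */
static unsigned char depth[COLS];          /* rows up to nearest edge  */

/* Step (2): scan each column upward from the bottom row and record
 * the height of the first (nearest) edge. */
static void build_depth_map(void)
{
    for (int c = 0; c < COLS; c++) {
        depth[c] = ROWS;                  /* no edge: maximally distant */
        for (int r = ROWS - 1; r >= 0; r--) {   /* bottom = row 47      */
            if (edge_buf[r][c] == 255) {
                depth[c] = (unsigned char)(ROWS - 1 - r);
                break;
            }
        }
    }
}

/* Step (3): nearest obstacle in Left/Right, deepest point in Center. */
static void depth_parameters(unsigned char *yL, unsigned char *yC,
                             unsigned char *yR)
{
    *yL = 255; *yR = 255; *yC = 0;
    for (int c = 0; c < LEFT_END; c++)
        if (depth[c] < *yL) *yL = depth[c];
    for (int c = LEFT_END; c < CENTER_END; c++)
        if (depth[c] > *yC) *yC = depth[c];
    for (int c = CENTER_END; c < COLS; c++)
        if (depth[c] < *yR) *yR = depth[c];
    /* Step (4): yL, yC, yR then go over the serial port to the MC68332,
     * where the steering behaviors consume them. */
}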
Figure 3.3 Organization of behaviors for the vision-based maneuver
STEER decides on the type and extent of the action to take this frame. The difference between yL and yR is used to calculate the direction of movement, while the speed of the robot is calculated directly from the value of yC. PAN_TILT controls the pitch angle of the "neck" servo. The purpose of pitching is to effectively widen the vertical viewing angle; the movement is particularly helpful when detecting objects close to the camera. The control, however, is deliberately coarse: only three values are chosen and sent to the pitch control servo. When the value of speed is greater than 36, angle a1 is sent to the servo; between 24 and 35, a2; and below 24, a3. PAN_TILT also passes the tilt-angle to SMOOTH_STEER. SMOOTH_STEER receives angle and speed from STEER, and tilt-angle from PAN_TILT. The value of angle is used to derive the differential between the left and right motor speeds. SMOOTH_STEER compares the reading of speed obtained a frame earlier with the new speed from STEER. If the new speed is larger, the tilt-angle obtained from PAN_TILT (minus a constant, in this case 5) is added; if it is smaller than the previous speed, the tilt-angle (plus the constant 5) is subtracted from speed. This arrangement provides the desired coordination between the speed, the acceleration, and the control of the tilt angles of the cameras. If the speed of the robot falls below a threshold (here set to 5) and stays there for a certain period, the UNWEDGE behavior cuts in and rotates the robot clockwise or counterclockwise for approximately 1.5 seconds, depending on the current value of angle; the robot resumes normal steering after the rotation. If the robot receives a return IR signal on one of the two rear-facing IR sensors (IR5 or IR0), indicating an object behind the robot, the FORWARD behavior slightly accelerates the left or right wheel, respectively, causing the robot to veer away from the object.
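These arbitration rules can be summarized in a short sketch. The thresholds 36 and 24 and the constant 5 are taken from the description above; the tilt angles a1-a3 and all variable names are placeholders, since the real behaviors are written in BL, not C.

/* Sketch of the STEER / PAN_TILT / SMOOTH_STEER rules described above. */
enum { A1 = 30, A2 = 20, A3 = 10 };   /* assumed servo tilt angles */

static int prev_speed;

/* STEER: direction from yL - yR, speed directly from yC. */
static void steer(int yL, int yC, int yR, int *angle, int *speed)
{
    *angle = yL - yR;   /* positive: more free space on the left  */
    *speed = yC;        /* deeper free space ahead -> go faster   */
}

/* PAN_TILT: pick one of three pitch angles from the commanded speed. */
static int pan_tilt(int speed)
{
    if (speed > 36)  return A1;   /* fast: look far ahead   */
    if (speed >= 24) return A2;   /* intermediate           */
    return A3;                    /* slow: look just ahead  */
}

/* SMOOTH_STEER: couple speed changes to the camera tilt angle. */
static int smooth_steer(int speed, int tilt_angle)
{
    int out = speed;
    if (speed > prev_speed)
        out += tilt_angle - 5;  /* accelerating: add (tilt - constant)      */
    else if (speed < prev_speed)
        out -= tilt_angle + 5;  /* decelerating: subtract (tilt + constant) */
    prev_speed = speed;
    return out;
}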
The object behind the robot could be an inanimate object detected when the robot backs up too far, or a human approaching the robot from behind. ESCAPE monitors the bump sensors, B0 through B4, each with 255 sensitive points; for the time being, however, each tactile sensor is treated as a 1-bit contact sensor. The robot makes a left turn, a left veer, a right veer, or a right turn, depending on which sensor contacted an external object.
3.3 Use of Landmarks for Navigation
The route the messenger robot travels is marked by a number of landmarks. Landmarks are ordinary segments of the scenery the robot encounters when moving through the habitat. The robot can stray significantly from the designated route and still return on course, provided the route has a series of easily detectable landmarks. A set of neural networks was trained to identify landmarks in the habitat seen through the robot's cameras. The current system uses one neural network to represent each landmark, and there are now 8 neural networks trained for 8 landmarks. A neural network, after training, becomes a behavior which responds to a certain visual input (in this case, a view of the landmark seen from a direction under varied conditions) and generates an output signal to notify a match (which might be interpreted externally as "I know this place", or "This is my landmark"). An on-board neural network "senses" a scene when an image is captured in the video buffer (512 x 480 pixels). Preprocessing the 512 x 480 pixel image converts it to a 64 x
48 pixel image. The 2 x 2 matrix $\begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}$ is then applied to each pixel to extract the vertical edges in
the image. The number of pixels signifying a vertical edge is counted in each column to make a histogram consisting of 64 data points. These 64 data points are used as the input to the neural network. Each neural network "sensor" has a behavior associated with it. The behavior generates a number of outputs, such as the identification of the previous and the next landmarks when moving along the current route. In order to facilitate the approach to a landmark from various directions, two or more views of the same landmark (e.g., a copying machine and its surroundings) are stored to represent the landmark. The robot traces the topological map and travels to the landmark closest to the destination. Neural network "sensors" are implemented as three-layer backpropagation networks resident in the memory of the vision processor. Each network has 64 elements in its input layer, one hidden layer with 2 processing elements, and an output layer with one element. A network is trained by feeding the 64-point histogram into its input layer while the robot is in front of a landmark. The networks were trained for 100,000 iterations. The navigation system currently tracks landmarks using a doubly linked list. Each node represents a landmark, and the list maintains the topology of the route. A node has pointers to the landmarks on both sides, allowing the robot to keep track of its location, its direction of motion, and its nearest landmarks (current, last, and next). It can also create and maintain a destination for the robot. Although efficient, this is a temporary measure until a more flexible and non-symbolic map system is developed; see Section 6 for discussion of this subject. Figure 3.4 shows a section of the habitat where the robot's navigation and maneuvering capabilities were tested. The robot approaching landmark L1 has two choices for its next
course of action: continue straight towards L7 or turn left towards L2. An actual landmark is a visual surface of objects, and parts of the office around the objects, as seen from the vision system of the approaching robot. Its boundary is not crisp, as the landmarks have been learned by a neural network through a large number of exposures. Each exposure differs from the others in the robot's precise position, the lighting conditions, the cameras' exact vertical and horizontal angles, and the detailed features of the landmark (e.g., a book on a bookshelf might have been removed since the last exposure).
Figure 3.4 Decision on next course
The ability of the recognition system based on neural networks to deal with ambiguity gives robustness to the robot's operation, and this is the reason we opted for a neurally based topological map system as opposed to one based on snapshots of scenes as used in Horswill's system. The approaching robot detects L1. Since the goal landmark (shown in Figure 5.1) lies in the direction beyond L2, the motivational signal generated by the goal flows more strongly from that direction. L7 still sends some signal because it is linked to L5, which in turn is linked to L1 through L2. The behavior associated with L1 (the behavior which uses the neural network that acts as the "L1 sensor") generates a left-turn signal to the robot, thus changing the course of the robot towards L2. As the robot approaches L2, it eventually detects the new landmark and the behavior associated with it gets activated. Here, the choice between left and right turns is resolved in the same way as at L1. Thanks to the ambiguity tolerated by the neural network based place recognition system, the robot's approach to a landmark does not have to be precise: the robot has about 50 cm of leeway on each side before it strays too far off course to detect a landmark. The current set of neural networks begins detecting a landmark at a distance of about 1.7 meters. An effort is underway to increase this distance to 3 to 4 meters so that the lateral margin for recognition is increased; the narrow lateral margin limits the robot's ability to recover from perturbations on the route.
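To make the landmark-sensing pipeline concrete, the sketch below shows the 64-bin vertical-edge histogram feeding a 64-2-1 network. It is a plausible reading of the description above, not the authors' code: the sigmoid activation and the 0.5 match threshold are our assumptions.

/* Sketch of one landmark "sensor": edge histogram -> 64-2-1 network. */
#include <math.h>

#define NIN  64
#define NHID 2

typedef struct {
    float w1[NHID][NIN], b1[NHID];  /* input -> hidden (from training) */
    float w2[NHID], b2;             /* hidden -> output                */
} LandmarkNet;

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* Count vertical-edge pixels per column of the 48x64 edge image. */
static void edge_histogram(const unsigned char img[48][64], float h[NIN])
{
    for (int c = 0; c < NIN; c++) {
        int n = 0;
        for (int r = 0; r < 48; r++)
            if (img[r][c]) n++;
        h[c] = (float)n;
    }
}

/* Forward pass: returns 1 when the network claims "this is my landmark". */
static int landmark_match(const LandmarkNet *net, const float h[NIN])
{
    float out = net->b2;
    for (int j = 0; j < NHID; j++) {
        float a = net->b1[j];
        for (int i = 0; i < NIN; i++)
            a += net->w1[j][i] * h[i];
        out += net->w2[j] * sigmoidf(a);
    }
    return sigmoidf(out) > 0.5f;  /* assumed match threshold */
}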
4. SPEECH RECOGNITION/SYNTHESIS
4.1 Speech Recognition Hardware
Speech recognition and text-to-speech capabilities are provided by the Micro IntroVoice (MIV) from Voice Connexion. All communication between the robot and the MIV is through an RS232 connection. The microphone and speaker are connected to the MIV. Figure 4.1 shows an overview of the hardware and software that make up the speech interaction subsystem.
The MIV has its own microprocessor, RAM, EPROM, and the necessary electronics for stand-alone voice recognition and text-to-speech synthesis. It has a maximum vocabulary size of 1000 words and up to 40 KBytes of response data (for the text-to-speech hardware and serial port). Training occurs off-line; that is, the MIV is removed from the robot and connected to a PC for training and editing of vocabularies.
Figure 4.1 Hardware and software structure of the voice system
To train the voice recognition system, each person is asked to repeat all of the words in the base vocabulary. In this case the base vocabulary contains the following words: "Hello", "yes", "no", "okay", "goodbye", and each of the users' names (e.g., "Patrick", "Koichi", "Darren", etc.). The data from the voice box is then uploaded to the PC and stored on the hard drive. All of the users' data files are combined into one file and divided into a number of sub-vocabularies. The first sub-vocabulary is the "Hello" vocabulary, which contains everyone's "hello". When this sub-vocabulary is active, the pronunciation of the word "Hello" is used for speaker identification. Other sub-vocabularies are used to limit the number and type of possible responses accepted by the MIV. The combined data file is downloaded to the MIV, which can then be reconnected to the robot.
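One way to picture the resulting organization is as named slices of a single combined word table, as in the following sketch. The structure and names are purely illustrative; the actual MIV vocabulary file format belongs to Voice Connexion and is not described in the paper.

/* Illustrative organization of sub-vocabularies over one word table. */
#define MAX_WORDS 1000            /* MIV vocabulary limit */

typedef struct {
    const char *name;             /* e.g. "Hello", "YesNo"           */
    int first_word, n_words;      /* slice of the combined word table */
} SubVocabulary;

/* The "Hello" sub-vocabulary holds one entry per trained user, so the
 * best-matching pronunciation identifies the speaker. The count of 3
 * users reflects the experiments reported in Section 5. */
static const SubVocabulary hello_vocab = { "Hello", 0, 3 };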
4.2 Speech Recognition/Synthesis Software The software is divided into two levels: the interface level and the protocol level. The interface level handles all communications with the MIV and consists of two behaviors, HEAR and TALK. The HEAR behavior reads all data from the serial port and filters command responses out of the recognition data. If the HEAR behavior is told to listen, then recognition responses are passed on to the protocol level. The HEAR behavior notifies the TALK behavior whenever a command response is received. Figure 4.2 shows the software structure of the voice system. The TALK behavior sends the necessary commands to the MIV to produce speech output, switch vocabularies, and enable recognition. It maintains a phrase queue so that the protocol level can send multiple phrases to the interface level without worrying about timing
issues. After a phrase is spoken, the TALK behavior re-enables the speech recognition on the MIV.
Figure 4.2 Software organization of the voice system
The CONVERSE behavior implements most of the delivery protocol, with the remaining portion of the protocol handled by the MIV. This behavior acts like a large state machine with additional parallel processes to handle timeouts and external commands. Essentially, the behavior can be in only one state, and only for a limited time; if the behavior remains in a state too long, it resets, to prevent the behavior from getting stuck indefinitely. The protocol consists of a number of stages which closely follow the "states" in the behavior. These are: Introduction/Identification and Confirmation, Package drop-off, Package pick-up, Recipient Determination and Confirmation, and Goodbye. The Introduction/Identification consists of a simple "Hello". The robot expects the human to respond with a "Hello", which is used to identify the speaker. This is followed by a Confirmation stage in which the robot asks if it has determined the correct identity of the speaker. If the speaker gives a negative response, the robot asks for the speaker's name, and then asks to confirm the identity again. If the speaker's identity has been confirmed, the robot will attempt to drop off the package. During the Package drop-off phase, the robot checks the identity of the speaker against that of the recipient of the package. If the speaker is not the intended recipient, the robot announces that it has a package for the intended recipient and asks if it may leave the package with the speaker, since presumably the robot is at the correct location. If the response is "yes" or "okay", the robot will attempt to give the package to the speaker by asking the speaker to hold the package and say "okay". When the robot hears an "okay" it releases the package. If the robot was able to deliver the package, it then asks if the speaker has anything to send. If so, the robot asks the speaker to say "okay" after placing the package between its grippers. Once the robot is holding the package, it asks for the name of the recipient. Once the name of the recipient is confirmed, the robot announces that it will deliver the package and waits a few seconds to hear a "goodbye" from the speaker. After saying "goodbye", the robot continues on to the next destination.
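The time-limited states lend themselves to a simple sketch. The state names follow the protocol stages described above; the timeout length, the tick interface, and all identifiers are our assumptions, not values given in the paper.

/* Sketch of the CONVERSE state machine with per-state timeouts. */
#include <time.h>

typedef enum {
    IDLE, INTRODUCTION, CONFIRM_ID, DROP_OFF, PICK_UP,
    CONFIRM_RECIPIENT, GOODBYE
} ConverseState;

#define STATE_TIMEOUT 15          /* seconds; assumed value */

static ConverseState state = IDLE;
static time_t entered;

static void enter(ConverseState s) { state = s; entered = time(NULL); }

/* Run periodically, in parallel with recognition events: a state held
 * too long resets the behavior so it cannot get stuck indefinitely. */
static void converse_tick(void)
{
    if (state != IDLE && time(NULL) - entered > STATE_TIMEOUT)
        enter(IDLE);
}

/* Example transition: a confirmed identity moves on to package drop-off. */
static void on_identity_confirmed(void) { enter(DROP_OFF); }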
5. EXPERIMENTS
Experiments were conducted in an actual office space, a section of our own office premises in Ottawa, Canada (Figure 5.1). The habitat consists of work areas for both technical and administrative staff, a reception area, a manager's office, a conference room, a small kitchen, and a small laboratory where light hardware work on robots takes place.
Figure 5.1 The office space
Figure 5.1 also shows one of the early routes of travel for the robot, with landmarks marked 'L'. A typical experiment is the delivery of small items to a target. Figure 5.2 shows the messenger robot approaching one of the landmarks (the fountain on the left). The target landmark (the copying machine) is coming into sight. The recipient of the message is a staff member at the typewriter desk. A few seconds later, a behavior associated with the target landmark gets invoked when the neural network trained on the copier and its surroundings identifies the scene: the robot has reached its destination. A behavior for interacting with humans (CONVERSE) is invoked and prompts the staff member by voice. In Figure 5.3, the staff member picks up the envelope delivered by the robot. Although still limited in capability, the navigation capability of the robot is satisfactory. The neural networks readily identify their designated landmarks despite the rather simple training they received. On average, some 40 slightly differing scenes of a landmark were fed to each neural network during training. Each neural network was trained until the
RMS (Root Mean Square) error was less than 0.1, using the extended-delta-bar-delta strategy. The messenger robot does not attempt to ascertain its precise location beyond knowing which landmark is nearest. By increasing the number of landmarks, the precision of its localization could be improved; however, such precision is not how humans operate in an office, either. Experiments show that the robot handles dynamic interactions with office furniture very well. Recovery from unexpected loss of track needs improvement. The robot's ability to cope with unexpected, often perplexing, and even deliberately annoying encounters with office staff is better than expected.
Figure 5.2 Messenger robot approaches landmark
Figures 5.4 and 5.5 show the distribution of the output of the neural networks for landmarks L1 and L2. The distribution for landmark L1 shows that the neural network detected a high density of vertical lines in the left side of the captured image. The neural network for landmark L2 detected a uniform distribution of vertical lines across the whole captured image. For the speech recognition, three users trained the system on their voices by repeating each of the recognition words three times. The resulting speech recognition patterns were stored in personal files.
Figure 5.3 Messenger robot delivers envelope
The personal voice data files were combined into a single file and uploaded to the MIV. Initial results show that voice interactions between robot and human work well under quiet conditions. However, during some of the experiments, the speech recognition was affected by intermittent background noises. Constant background noises (such as ventilation) are filtered out by the MIV, but sudden noises such as a door closing or a phone ringing are detected as possible speech. These sounds are usually distinct from the trained vocabulary and are ignored by the MIV, but occasionally a misrecognition occurred. These misrecognitions indicate the need for a command word that forces the CONVERSE behavior to "back up" so that the user can correct the error. The system currently occupies 2.95 KBytes of code on the vision processor board, of which about 1 KByte is dedicated to running the neural networks and the rest to the vision-based navigation/maneuvering. Each landmark neural network requires 800 Bytes to store its learned weights. In addition, 8.9 KBytes support 8 navigation/maneuver-related behaviors on R3's 32-bit main processor (MC68332).
Figure 5.4 Distribution of neural network output for L1
Figure 5.5 Distribution of neural network output for L2
The voice data file is approximately 4.5 KBytes, and the CONVERSE behavior is approximately 32 KBytes. The size of the CONVERSE behavior can be attributed to the voice output and vocabulary-switching routines, since each character sent to the serial port requires a call to the serial-write-char and delay functions.
6. DISCUSSION
The prototype system uses no multiplication or division anywhere in its processing, including the vision processing subsystem. Furthermore, all computational processes used could be implemented by those found in biological systems. We intend to simplify the process further by replacing steps with increasingly more biologically or physiologically plausible processes. Through our experience in building several SA robots, we feel strongly that there must be a fundamental revision in the approach of conventional robotics, specifically for mobile robots operating in dynamic real-life environments. The methods perfected during the past few decades for mostly stationary manufacturing robots will not suffice, as it is necessary to cope with operational parameters that are at least an order of magnitude more demanding. In particular, the new breed of robots now called "service robots" has operational requirements and control characteristics drastically different from their more stationary counterparts. For example, only by improved adaptability to the semi-structured and unstructured spaces found in the office environment does an office robot have a chance of survival. A change in approach rooted in a non-Cartesian philosophy will have to be realized. We discovered that by revising the way neural networks are stored on the current prototype, the number of landmarks stored can be increased severalfold, allowing the creation of richer and more flexible routes. This also means an increased number of "facets" per landmark, adding to the robustness of landmark detection. Not only could a robot detect a landmark under more varied conditions, but it could also return to the route after an unexpected or inadvertent loss of track, for example when avoiding an approaching person. This modification is underway. The present vision system does not handle color, although the vision processing board already has color processing capacity. By distinguishing colors, the robot will be able to
deal with changes in the color of carpet, locate an object of a specific color in the habitat, identify a person by the color of his or her clothes, and so on. Color processing will be added by the middle of 1995. There will be a need for more sophisticated ways to locate the recipient of a message. For example, if a recipient happens to be in the office kitchen, the robot needs to know that this is one location where she may be when she is not at her desk. For implementing this type of person-tracking behavior, a topological submap structure like the "scene" concept studied by Schank in his Cognitive Science based memory architecture [Schank, 1985], [Johnson and Robertson, 1981] is being considered. It maintains scenes of locations where a person might be. The submap typically has a topological structure linked to the main topological landmark map at the person's main domicile, such as her desk. The actual design of this submap scheme will depend on the improved landmark storage scheme discussed above. In the present landmark management system, landmarks must be pre-defined. In order to improve the flexibility and efficiency of the subsystem, an autonomous landmark recognition scheme is being studied. This was tried in Brooks' group [Mataric, 1990] using representations based on finite state machines. The new neurally represented subsystem will have the following attributes: (1) A "winner take all" network architecture to automatically assign a neural network to a scene representing a landmark. ART, LVQ, and other network architectures are being tested; a few biologically plausible neural networks with the ability to inhibit their neighbors are also being studied. (2) It will establish a topology without the help of conventional programming techniques such as linked lists or other data structures. Here we are investigating how to develop 'tentacles' to establish signal connections between landmarks. The analogy with the horizontal connections made by stellate cells in the cortical layers of the mammalian brain to link pyramidal cell columns is being studied in detail. (3) The inter-landmark links must be amenable to calculating the cost to the robot of traveling along a particular route. A form of energy calculation will be used. When creating an autonomous mobile robot for a practical application, it would be ideal for the robot to manage its own power supply. The robot must detect when to recharge its battery while it is still able to do so: when the robot is far from the charging station, or the traffic (humans, other robots) is heavy, it may not succeed in recharging itself before it is too late. The charging process will have to be secure and totally automatic, with protection against overcharging and undercharging. Charging must be efficient, so that the time for productive activities can be maximized. We have yet to tackle the recharging issue. A likable personality is necessary for an office worker to survive, and an office robot must likewise have its own good character and personality. We have successfully implemented several artificial emotions on a few types of SA robots with encouraging results [Sicnolfi, 1992], [Gomi and Ulvr, 1993]. The recognition level could be improved by using a larger training set, such as five or seven utterances of each word rather than the three repetitions used in these experiments. Recognition accuracy should not degrade as more users are added, since most of the sub-vocabularies are user-specific.
The only time that the recognition might be affected is during the speaker identification, since the robot must distinguish the user's "Hello" from all other users' "Hello". The confirmation phase of the protocol should not be affected seriously unless many people with the same name use the robot.
The conversation between the robot and the human could be improved by moving more of the protocol processing from the robot to the MIV. This would use more memory, since additional sub-vocabularies would be needed to handle the specific phases of the protocol. For example, the confirmation of the recipient could be handled almost entirely by the MIV, in the same manner it handles the confirmation of the speaker's identification. Expansion of the vocabulary would not require new words to be added; rather, it would need additional sub-vocabularies containing the current words but with mode-specific responses. For example, in one sub-vocabulary (e.g., "Package to Send"), the response to the word "yes" might be "place package in grippers and say ok", followed by a switch to a new sub-vocabulary. In another vocabulary (e.g., "Package For You"), the response would be "hold package and say ok", followed by a switch to the next sub-vocabulary. With this type of extended vocabulary, the MIV could implement most of the delivery protocol for approximately 15 people. The current implementation uses sequential control of behaviors to manage the specific protocol. In future implementations, however, we will consider using Action Selection Dynamics to make the sequences of the speech protocol emergent. By expanding the vocabulary to include locations such as "kitchen" or "conference room", the robot could be given the ability to ask where to find the recipient. The location could then be passed on to the vision/navigation subsystem so that the robot could deliver the package to the recipient regardless of his or her location in the office.
7. CONCLUSIONS
An office messenger robot was created using behavior-based control. The robot is fully autonomous and can operate without guiderails or external navigational guidance. The results from the experiments indicate that the robot performed as an office messenger with sufficient mobility, agility, and flexibility in its movements, and with the ability to handle basic human-machine protocols. A number of expansions are being worked on to improve its performance.
REFERENCES
[Brooks, 1986] Brooks, R.A., "A Robust Layered Control System for a Mobile Robot", IEEE Journal of Robotics and Automation, RA-2, 1986. [Brooks, 1990] Brooks, R.A., "The Behavior Language; User's Guide", MIT AI Memo 1227, April 1990. [Brooks and Flynn, 1989] Brooks, R.A., Flynn, A., "Fast, Cheap and Out Of Control: A Robot Invasion of the Solar System", Journal of the British Interplanetary Society, Vol. 42, pp. 478-485, 1989. [Gomi, 1993] Gomi, T., "Subsumption Robots and Application of Intelligent Robots in Service Industry", Journal of Labor Saving & Automation, February 1993 (in Japanese); Technical Note TN 93-007, Applied AI Systems, Inc. (in English). [Gomi and Laurence, 1993] Gomi, T., Ulvr, J., "Behavior-Based AI Techniques for Vehicle Control", Proc. IEEE Vehicle Navigation and Information Systems Conference (IVHS '93), October 1993, Ottawa, Canada. [Gomi and Ulvr, 1993] Gomi, T., and Ulvr, J., "Artificial Emotions as Emergent Phenomena", Proc. Robot and Human Communication (RO-MAN '93), November 1993, Tokyo, Japan.
[Gomi and Volpe, 1993] Gomi, T., and Volpe, P., "Collision Avoidance Using Behavioral-Based AI Techniques", Proc. Intelligent Vehicles '93 Symposium (IV '93), July 1993, Tokyo, Japan. [Haugeland, 1985] Haugeland, J., "Artificial Intelligence: The Very Idea", The MIT Press, 1985. [Horswill, 1992] Horswill, I., "A Simple, Cheap, and Robust Visual Navigation System", Simulation of Adaptive Behavior (SAB '92), December 1992, Honolulu, Hawaii. [Horswill, 1993a] Horswill, I., "Specialization of Perceptual Processes", Ph.D. thesis, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1993. [Horswill, 1993b] Horswill, I., "Polly: A Vision-Based Artificial Agent", Proc. National Conference on Artificial Intelligence (AAAI '93), July 1993, Washington, D.C. [Johnson and Robertson, 1981] Johnson, P., and Robertson, S., "Magpie: A Goal-Based Model of Conversation", Research Report #206, Department of Computer Science, Yale University, 1981. [Maes, 1992] Maes, P., "Behavior-Based Artificial Intelligence", Proc. Second International Conference on Simulation of Adaptive Behavior (SAB '92), December 1992, Honolulu, Hawaii. [Mataric, 1990] Mataric, M., "A Distributed Model for Mobile Robot Environment-Learning and Navigation", MIT Artificial Intelligence Laboratory Technical Report AI-TR-1228. [Sicnolfi, 1992] Sicnolfi, J., Technical Report AAI-92-18, internal report prepared for client, September 1992. [Schank, 1985] Schank, R., "Dynamic Memory", Cambridge University Press, 1985.
Trajectory Tracking Control for Navigation of the Inverse Pendulum Type Self-Contained Mobile Robot
Yun-Su Ha and Shin'ichi Yuta
Intelligent Robot Laboratory, Institute of Information Sciences and Electronics, University of Tsukuba, Tsukuba 305, Japan
In this paper, we discuss trajectory control for a wheeled inverse pendulum type mobile robot. The robot has two independent driving wheels on the same axle, a gyro type sensor to measure the inclination angular velocity of the body, and rotary encoders to measure wheel rotation. The purpose of this work is to make the robot navigate autonomously in a plane while keeping its own balance. The control algorithm consists of three parts: balance and velocity control, steering control, and straight line tracking control. We designed and implemented a vehicle command system for such a robot controlled by the proposed algorithm. Navigation experiments in a real indoor environment have been performed using our experimental robot "Yamabico Kurara".
1. Introduction
The focus of this work is to make a wheeled inverse pendulum type mobile robot navigate autonomously in a plane while keeping its balance. The robot is assumed to have two independent driving wheels on the same axle, which support and move the robot itself, a vibration type gyro sensor to measure the inclination angular velocity of the body, and rotary encoders to measure wheel rotation. Previous research on wheeled inverse pendulum type robots has been reported. T. Kawamura and K. Yamafuji [1] proposed a posture and driving control algorithm for a similar vehicle. They assumed that the robot has a tactile sensor to detect the angle between the ground and the body. They succeeded at balance control, but the robot could not move freely in a plane because both of its wheels were driven by a single motor. O. Matsumoto, S. Kajita and K. Tani [2] presented an estimation algorithm for posture using an adaptive observer; their algorithm also ignored steering control on a plane. These two papers reported real implementations on robots and experimental results. However, their robots were not autonomous and were connected by wires to external computers. E. Koyanagi et al. [3] proposed a two-dimensional trajectory control algorithm for this type of robot and built an autonomous self-contained robot using the proposed algorithm. However, that algorithm only worked while the robot moved slowly. In this paper, we propose an algorithm for trajectory tracking control of a wheeled inverse pendulum type mobile robot which can operate freely at relatively high speeds in a plane. In section 2, we introduce the features of the wheeled inverse pendulum type self-
contained mobile robot which is discussed in this paper and describe the control scheme. Section 3 is devoted to the description of our modelling method and the algorithm for posture and velocity control of the wheeled inverse pendulum type robot. The algorithms for steering and trajectory tracking control are discussed in sections 4 and 5. We also introduce a vehicle control command system for this type of robot in section 6. The configuration of an experimental set-up of the wheeled inverse pendulum type self-contained mobile robot "Yamabico Kurara" and the results are shown in section 7.
2. The wheeled inverse pendulum type mobile robot and its control scheme
2.1. The wheeled inverse pendulum type mobile robot
The wheeled inverse pendulum type mobile robot which is discussed in this paper has these features: 1. The robot has two driving wheels on the left and right sides of its body and no other supporting wheels. 2. The robot is self-contained in that all functions necessary for control, such as the computer system, sensor system and battery, are installed within its body. 3. The two driving wheels of the robot are driven by independent DC motors. 4. The robot has a gyro sensor to measure the inclination angular velocity of its body.
2.2. The scheme of the control system
Figure 1. Motion and posture of the robot on the x-y plane
The motion of the robot on a plane is represented by the position vector (x, y), the direction angle ψ, the locomotion velocity v and the directional angular velocity ψ̇. The posture of the robot is represented by the inclination angle φ. The configuration is shown in Figure 1. The problem is how to find the torques of the left and right motors which keep the robot's balance and track the given trajectory on the x-y plane. In this paper, we propose an algorithm for trajectory control of such robots, which is divided into three parts. The first part is the balancing and translational velocity control on the x-y plane. The robot is modeled as a one-dimensional wheeled inverse pendulum. The balance and the moving velocity of the robot are handled together as one control problem, because the posture has a close relationship with velocity control. When a rate gyro sensor is used to estimate the inclination angle of the body, accumulated error is a serious problem. We solved this problem by augmenting the forward path of the control system with an integrator, which cancels the effects of accumulated errors. The second part is the steering control of the robot. Mathematically, the robot cannot be modeled as a one-dimensional wheeled inverse pendulum while it is spinning or turning. However, the rotational angular velocity does not seriously affect the balance control if it is limited to small values. Hence, we assume that the steering control can be performed independently of the balance and velocity control without serious effects. The reference velocities of the left and right wheels are calculated from the desired translational velocity and the directional angular velocity of the body. Each wheel is then controlled independently, using the algorithm based on the one-dimensional model, to keep balance and to track the given moving velocity. The third part is straight line tracking control. It is treated as a linear regulator, so that the difference between the reference straight line given by the vehicle command system and the current position and direction angle of the robot goes to zero asymptotically. The algorithm we use for straight line tracking control was adapted from a similar algorithm used for a normal wheeled vehicle with supporting casters [5]. Figure 2 shows the configuration of the trajectory control system for the wheeled inverse pendulum type mobile robot.
3. Posture and velocity control
3.1. Model of the inversion and locomotion
The robot is modelled as a one-dimensional inverse pendulum which rotates about the wheels' axle. Hence, the body's motion on a plane is determined by the inclination and translational motion. Figure 3 shows the model of the robot, where θ and φ are the wheel's rotation angle and the inclination angle of the body, respectively, and β = θ − φ is the wheel's rotation angle relative to the body. Lagrange's equations of motion are used to analyze the dynamics of this model, and are given as equations (1) and (2):
\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{\beta}}\right) - \frac{\partial T}{\partial \beta} + \frac{\partial U}{\partial \beta} + \frac{\partial D}{\partial \dot{\beta}} = Q_{\beta}    (1)

\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{\theta}}\right) - \frac{\partial T}{\partial \theta} + \frac{\partial U}{\partial \theta} + \frac{\partial D}{\partial \dot{\theta}} = Q_{\theta}    (2)
Figure 2. Configuration of the trajectory control system for a wheeled inverse pendulum type mobile robot
Figure 3. The model of the wheeled inverse pendulum
Where T is the kinetic energy, U the potential energy, D the dissipation function, and Q_β and Q_θ are the generalized forces on the β and θ axes, given as:

T = \tfrac{1}{2}M_w(\dot{x}_w^2 + \dot{z}_w^2) + \tfrac{1}{2}I_w\dot{\theta}^2 + \tfrac{1}{2}M_b(\dot{x}_b^2 + \dot{z}_b^2) + \tfrac{1}{2}I_b(\dot{\theta} - \dot{\beta})^2 + \tfrac{1}{2}I_M\dot{\beta}^2

U = M_w g r + M_b g l \cos(\theta - \beta)

D = \tfrac{1}{2}\mu_s\dot{\beta}^2 + \tfrac{1}{2}\mu_g\dot{\theta}^2

Q_\beta = n k_t u, \quad Q_\theta = 0
Assuming that the robot is to move in an up-right posture (φ ≈ 0, φ̇ ≈ 0), the linearized motion equations near the up-right state are given as

(M_b l^2 + I_b + I_M)\ddot{\phi} + (M_b r l - I_M)\ddot{\theta} + \mu_s(\dot{\phi} - \dot{\theta}) - M_b g l \phi = -n k_t u    (3)

(M_b r l + M_b l^2 + I_b)\ddot{\phi} + [(M_b + M_w)r^2 + M_b r l + I_w]\ddot{\theta} + \mu_g\dot{\theta} - M_b g l \phi = 0    (4)
The state equation of the linearized model is obtained as equation (5).
\dot{X} = AX + Bu    (5)

Where

X = (\phi \ \ \dot{\phi} \ \ \dot{\theta})^T, \quad A = \begin{pmatrix} 0 & 1 & 0 \\ a_1 & a_3 & a_5 \\ a_2 & a_4 & a_6 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ b_1 \\ b_2 \end{pmatrix}

\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} (a_{22} - a_{12})M_b g l \\ (a_{11} - a_{21})M_b g l \end{pmatrix}, \quad \begin{pmatrix} a_3 & a_5 \\ a_4 & a_6 \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} -\mu_s a_{22} & \mu_s a_{22} + \mu_g a_{12} \\ \mu_s a_{21} & -(\mu_s a_{21} + \mu_g a_{11}) \end{pmatrix}, \quad \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} -a_{22} n k_t \\ a_{21} n k_t \end{pmatrix}

with

a_{11} = M_b l^2 + I_b + I_M, \quad a_{12} = M_b r l - I_M, \quad a_{21} = M_b r l + M_b l^2 + I_b, \quad a_{22} = (M_b + M_w)r^2 + M_b r l + I_w, \quad \Delta = a_{11}a_{22} - a_{12}a_{21}

The parameters and variables in equation (5) are defined in Table 1, where the values are for our experimental robot Yamabico Kurara.
3.2. The control law for posture and velocity control
The purpose of this section is to derive the control law to move at a given reference velocity while keeping the body balanced. If the reference rotational angular velocity θ̇_ref of the wheel is constant, the steady state vector X_s and the steady control input u_s are derived from equation (5) as equation (6).
Table 1. Parameters and variables (* values are for Yamabico Kurara)

symbol   parameter or variable                                      unit and value*
M_b      Mass of the body                                           9.01 [kg]
M_w      Mass of the wheel                                          0.51 [kg]
I_b      Rotational inertia of the body                             0.228 [kg m^2]
I_w      Rotational inertia of the wheel                            5.1 x 10^-4 [kg m^2]
I_M      Rotational inertia of the motor axis                       3.2 x 10^-6 [kg m^2]
r        Radius of the wheel                                        0.062 [m]
l        Length between the wheel axle and the center of
         gravity of the robot's body                                0.138 [m]
mu_s     Coefficient of friction of the wheel axle, including
         motor and gear                                             5.76 x 10^-3 [N m/(rad/s)]
mu_g     Coefficient of friction between wheel and ground           4.25 x 10^-3 [N m/(rad/s)]
k_t      Torque constant of the motor                               0.0235 [N m/A]
n        Reduction ratio of the gears                               39.5
g        Acceleration due to gravity                                9.8 [m/s^2]
phi      Inclination angle of the body                              [rad]
theta    Wheel's rotation angle                                     [rad]
u        Motor's input current                                      [A]
X_s = \begin{pmatrix} -\dfrac{a_5 b_2 - a_6 b_1}{a_1 b_2 - a_2 b_1}\,\dot{\theta}_{ref} \\ 0 \\ \dot{\theta}_{ref} \end{pmatrix} = g(\dot{\theta}_{ref}), \quad u_s = -\frac{a_1 a_6 - a_2 a_5}{a_1 b_2 - a_2 b_1}\,\dot{\theta}_{ref} = f(\dot{\theta}_{ref})    (6)
Here, defining ΔX and Δu as

\Delta X = X - X_s, \quad \Delta u = u - u_s    (7)
the problem can be treated as the design of a state feedback regulator for the system represented by ΔX and Δu. Generally, modelling errors due to disregarded nonlinear factors, such as backlash in the reduction gears and nonlinear friction, are unavoidable. Accurate identification of the parameters is also not easy. Therefore, countermeasures against modelling errors, parameter variations and disturbances should be considered when designing
controllers. When the inclination angle is estimated by integrating values measured with the gyro sensor, accumulated errors are included. Hence, we augmented the control system to cope with these errors. Defining a new state variable z as

z = \int_0^t (\dot{\theta} - \dot{\theta}_{ref})\,dt    (8)

the augmented system (S) [4] is represented as equations (9) and (10):

\Delta\dot{\bar{X}} = \bar{A}\Delta\bar{X} + \bar{B}\Delta u    (9)

\Delta y = \bar{C}\Delta\bar{X}    (10)

where

\Delta\bar{X} = \begin{pmatrix} \Delta X \\ z \end{pmatrix}, \quad \bar{A} = \begin{pmatrix} A & 0 \\ C & 0 \end{pmatrix}, \quad \bar{B} = \begin{pmatrix} B \\ 0 \end{pmatrix}, \quad \bar{C} = (C \ \ 0), \quad C = (0 \ \ 0 \ \ 1)
The optimal controller which asymptotically stabilizes the feedback system of the augmented system and minimizes the performance index

J = \int_0^\infty \left(\Delta\bar{X}^T \bar{Q}\, \Delta\bar{X} + \Delta u^T \bar{R}\, \Delta u\right) dt    (11)

where \bar{Q} = \bar{Q}^T \geq 0 and \bar{R} = \bar{R}^T > 0, is given by

u = -K_1(X - X_s) - k_2 \int_0^t (\dot{\theta} - \dot{\theta}_{ref})\,dt    (12)
Where K = (K_1 \ k_2) = \bar{R}^{-1}\bar{B}^T\bar{P} and \bar{P} is the solution of the matrix Riccati equation. The posture and velocity control system is thus realized by a linear state feedback and feedforward controller as in equation (12) (Figure 4).
4. Steering control
When the robot spins to change direction on a plane, i.e. ψ̇ ≠ 0, the value of ψ̇ affects the dynamics of the balancing control, and equations (1) and (2) are no longer exact. However, we can assume that the robot's dynamics approximately obeys equations (1) and (2) when ψ̇ is small. We also regard the robot system as consisting of two independent inverse pendulums, one mounted on each wheel, which can be controlled independently. Based on these assumptions, we can apply the Power Wheeled Steering (PWS) method for heading control of the wheeled inverse pendulum type mobile robot. The algorithm to control such a non-holonomic system is divided into two steps: (a) control to make the vehicle track the reference translational velocity v and directional angular velocity ψ̇; (b) determination of the locomotion velocity v and directional angular velocity ψ̇ of the vehicle based on the deviation between the current position and direction and the reference trajectory.
Figure 4. Block diagram of the posture and velocity control system
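The control law of equation (12) reduces to a few lines per sampling period. The following sketch shows one plausible discrete-time form, using the gains and the 5 ms interval reported in Section 7; the floating-point arithmetic and all names are ours for clarity, since the actual implementation runs in fixed-point on the on-board processor.

/* Sketch of equation (12): state feedback plus an integrator on the
 * wheel-velocity error, with the feedforward terms Xs, us of eq. (6). */
static const float K1[3] = { -9.0f, -1.58f, -0.123f }; /* on (X - Xs)     */
static const float k2   = -0.115f;                     /* on the integral */
#define DT 0.005f                                      /* 5 ms sampling   */

static float z;  /* z = integral of (theta_dot - theta_dot_ref), eq. (8) */

/* X = [phi, phi_dot, theta_dot]; returns the motor current command u. */
static float posture_velocity_control(const float X[3], const float Xs[3],
                                      float us, float theta_dot_ref)
{
    z += (X[2] - theta_dot_ref) * DT;
    float u = us;                      /* feedforward steady input */
    for (int i = 0; i < 3; i++)
        u -= K1[i] * (X[i] - Xs[i]);   /* state feedback            */
    u -= k2 * z;                       /* integral action           */
    return u;  /* passed to the feedforward current controller [6]  */
}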
Figure 5. Parameters for a vehicle controlled by the PWS method
Step (a) is discussed in this section, and step (b) is described in the next section. In the case of a power wheeled steering type vehicle (Figure 5), the moving velocity v and directional angular velocity ψ̇ are related to the left and right wheels' rotational angular velocities θ̇_l, θ̇_r as in equation (13):

\begin{pmatrix} v \\ \dot{\psi} \end{pmatrix} = \begin{pmatrix} R_r/2 & R_l/2 \\ R_r/L & -R_l/L \end{pmatrix} \begin{pmatrix} \dot{\theta}_r \\ \dot{\theta}_l \end{pmatrix}    (13)

Where L is the distance between the left and right wheels, and R_l and R_r are the radii of the left and right wheels. The steering and moving velocity control of the robot are realized by controlling the rotational angular velocities of both wheels. The reference rotational angular velocities for the posture and velocity control system (Figure 4) are given as equation (14).
\dot{\theta}_{r,ref} = \frac{1}{R_r}\left(v_{ref} + \frac{L}{2}\dot{\psi}_{ref}\right), \quad \dot{\theta}_{l,ref} = \frac{1}{R_l}\left(v_{ref} - \frac{L}{2}\dot{\psi}_{ref}\right)    (14)
The block diagram of the control system performing step (a) is shown in Figure 6. The inclination angle φ and angular velocity φ̇ are obtained from the gyro sensor and are used in common by the controllers of both wheels.
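Equation (14) is a two-line conversion in code. The sketch below is illustrative: the tread value is an assumption (only the wheel radius appears in Table 1), and equal wheel radii are presumed.

/* Sketch of equation (14): body velocity commands -> per-wheel
 * reference angular velocities for the two posture controllers. */
#define R_WHEEL 0.062f  /* wheel radius r from Table 1 [m]        */
#define TREAD   0.30f   /* distance L between wheels; assumed [m] */

static void pws_references(float v_ref, float psi_dot_ref,
                           float *theta_dot_r_ref, float *theta_dot_l_ref)
{
    *theta_dot_r_ref = (v_ref + 0.5f * TREAD * psi_dot_ref) / R_WHEEL;
    *theta_dot_l_ref = (v_ref - 0.5f * TREAD * psi_dot_ref) / R_WHEEL;
}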
Figure 6. The heading and velocity control system
5. Straight line tracking control To make the robot track along a given trajectory in a plane, the reference value of moving velocity vrcf and direction angular velocity ~rc~,- must be calculated. In this section, we describe that algorithm.
Figure 7. Configuration of straight line tracking control
5.1. Heading control for straight line tracking Let us consider the case where the robot is to track along a straight line on a plane passing through a point (Xo, yo) with angle r from the x axis. The robot's current by dead reckoning or some position and direction are estimated as x(t), y(t) and r other positioning system. At first, we represent the robot position and direction on the - ~1 coordinate, on which the ~ axis denotes the given straight line to track. The robot position and direction is denoted by ~(t), v(t) and a(t). Then, the problem is formulated as "decide the angular velocity ~ ( t ) s o that ~(t) and c~(t) go to zero asymptotically, while v(t) remains constant and ~(t) increases with given velocity". So the problem can be treated as a regulator problem after linearization based oil the assumption that ~7(t) and ~(t) are small 9 The reference directional angular velocity at the next sampling instant should be calculated from negative feedback of ~(t) and a(t), ie:
Cre:(t +
At) -- -kv'rl(t) - k ~ ( t )
(15)
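As a sketch, the regulator of equation (15) is a single expression evaluated once per sampling interval; the gains below are those determined in Section 7, and the function name and units are ours.

/* Sketch of the line-tracking regulator of equation (15).
 * eta = lateral offset from the line, alpha = heading error,
 * both in line coordinates. Returns psi_dot_ref(t + dt). */
static float line_tracking(float eta, float alpha)
{
    const float k_eta = 0.00457f, k_alpha = 1.0f; /* gains from Sec. 7 */
    return -k_eta * eta - k_alpha * alpha;
}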
Figure 8. Reference velocity for the robot
5.2. Translational velocity control for starting and stopping
A sudden change of the reference velocity for starting and stopping may destabilize the robot. Hence, the reference velocity is given in three stages: the acceleration section (t_0 - t_a), the ordinary operating section (t_a - t_d), and the deceleration section (t_d - t_s), as shown in Figure 8. The reference velocities for starting and stopping are calculated as follows: 1. Calculation of the reference velocity in the acceleration section. The reference velocity is given as equation (16).
v_{ref}(t + \Delta t) = \min\{\,v_{ref}(t) + a\,\Delta t,\ v_d\,\}    (16)

where Δt is the sampling interval, a is the given acceleration ratio, and v_d is the desired velocity from the locomotion control command. 2. Calculation of the reference velocity in the deceleration section for stopping. Calculating the reference velocity that makes the robot stop exactly at the goal position is not a simple problem: the starting time of deceleration cannot be predicted exactly because of the effects of response delay or steady-state error in the control system. Letting the velocity and position at any time t be v(t) and x(t), the velocity v(t) and the position X_stop where the robot should stop are given by
v(t) = a\,(t_s - t)    (17)
X_{stop} = x(t) + \int_t^{t_s} a\,(t_s - t)\,dt    (18)
Where t_s is the time when the robot will stop. The velocity v(t) can be rewritten from equations (17) and (18) as
v(t) = \sqrt{2a\,(X_{stop} - x(t))}    (19)
Hence, if the control system had no response delay and no steady-state error, the reference velocity could be calculated as follows.
v_{ref}(t + \Delta t) = \begin{cases} v_{ref}(t) & v(t) < \sqrt{2a\,(X_{stop} - x(t))} \\ v_{ref}(t) - a\,\Delta t & v(t) \geq \sqrt{2a\,(X_{stop} - x(t))} \end{cases}    (20)
In practice, to avoid overrunning the stop position X_stop because of such effects, we propose the formula of equation (21) for the reference velocity for stopping.
v_{ref}(t + \Delta t) = \begin{cases} v_{ref}(t) & v(t) < \sqrt{2a\,[(X_{stop} - \epsilon) - x(t)]} \\ v_{ref}(t) - a\,\Delta t & v(t) \geq \sqrt{2a\,[(X_{stop} - \epsilon) - x(t)]} \end{cases}    (21)

where ε is an appropriately small value.
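Equations (16) and (21) combine into one reference-velocity generator per sampling interval, sketched below under the stated definitions; the function name and argument order are ours.

/* Sketch of the reference-velocity generator of eqs. (16) and (21):
 * ramp up at acceleration a until the desired velocity vd, then ramp
 * down once the braking distance to (X_stop - eps) is reached. The
 * margin eps guards against overrunning the stop point. */
#include <math.h>

static float v_ref_next(float v_ref, float v, float x,
                        float vd, float a, float dt,
                        float x_stop, float eps)
{
    /* Deceleration test of eq. (21). */
    float d = (x_stop - eps) - x;
    if (d <= 0.0f || v >= sqrtf(2.0f * a * d))
        return v_ref - a * dt;           /* decelerate for stopping   */
    /* Acceleration section, eq. (16): saturate at desired velocity. */
    if (v_ref + a * dt < vd)
        return v_ref + a * dt;
    return vd;
}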
6. Locomotion control command system
A command system named "HYS", used to define the reference trajectory and other information for this type of robot, has been designed. It is similar to the Spur [5] command system for autonomous mobile robots proposed by our group. The usual HYS commands denote the coordinate parameters on the two-dimensional plane which define a line or position. Each command is represented as a function in the C language. The important HYS commands are as follows: 1. Straight line command: HYS_line_LC(x_line, y_line, psi_line). To define the straight line along which the robot should track. 2. Stop commands: HYS_stop_LC(x_stop, y_stop, psi_stop) and HYS_stop(). To define the position and direction at which the robot should stop, and to stop immediately (emergency stop). 3. Spin command: HYS_spin_LC(psi_spin). To make the robot spin to the angle psi_spin from the x axis.
4. Position adjustment command: HYS_adj_pos_LC(x_adj, y_adj, psi_adj). To replace the internally estimated position by the given parameters, which are usually determined by detecting the environment with the external sensors.
5. The other commands: HYS_set_vel(v), to set the reference moving velocity of the robot; HYS_set_angv(w), to set the reference directional angular velocity for the spin motion; HYS_get_pos_LC(&x, &y, &psi), to monitor the current position and direction; and HYS_get_vel_angv_pos(&v, &w, &phi), to monitor the current moving velocity, directional angular velocity and posture of the robot. Using these commands, a navigation program can easily control the robot trajectory in real time based on the map information and external sensory data, as the short example below illustrates.
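The following fragment shows how the commands might compose into a navigation program. It is our illustration, not from the paper: the coordinate values, units, and argument types are assumptions, and a header declaring the HYS commands is presumed.

/* Hypothetical use of the HYS command system: cruise along the x axis,
 * stop 2 m ahead, then spin to face the y axis. Only the command names
 * come from the list above. */
#include "hys.h"   /* assumed header declaring the HYS commands */

int main(void)
{
    double x, y, psi;

    HYS_set_vel(0.25);            /* reference velocity: 25 cm/s        */
    HYS_line_LC(0.0, 0.0, 0.0);   /* track the line y = 0, heading +x   */
    HYS_stop_LC(2.0, 0.0, 0.0);   /* stop at (2, 0), facing the +x axis */
    HYS_get_pos_LC(&x, &y, &psi); /* read back the estimated position   */
    HYS_spin_LC(1.5708);          /* spin to 90 degrees from the x axis */
    return 0;
}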
7. Experiments using Yamabico
7.1. Experimental system and determination of parameters
The robot used for the experiments was the wheeled inverse pendulum type self-contained mobile robot Yamabico Kurara, a standard Yamabico robot from which the front and rear casters had been removed. Yamabico is a series of self-contained autonomous mobile robot platforms for experimental research developed by the authors' group. Figure 9 shows the self-contained autonomous inverse pendulum type experimental robot "Yamabico Kurara" controlling itself to keep its own balance. The robot has rotary encoders (resolution: 2000) and a vibration type gyro sensor (TOKIMEC Co. TFG-160) to detect wheel rotation and the body's inclination angular velocity. Both wheels are driven by 10 W DC motors. Ultrasonic sensors to recognize the environment are attached on the front, left and right sides of the body. Using the parameters given in Table 1, the coefficient matrices Ā and B̄ of the augmented system are found to be:
\bar{A} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 38.23 & -0.085 & 0.121 & 0 \\ -51.76 & 0.347 & -0.56 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad \bar{B} = \begin{pmatrix} 0 \\ -3.7 \\ 56.02 \\ 0 \end{pmatrix}
In the posture and velocity control experiment, the weighting matrices of the performance index J were fixed as Q̄ = diag(1, 0.01, 0.01, 4) and R̄ = 300. As a result, the feedback gains of the control system of equation (12) were K_1 = (-9.0, -1.58, -0.123) and k_2 = -0.115, and the poles of the closed-loop system were λ_1, λ_2 = -6.177 ± 0.119j and λ_3, λ_4 = -1.529 ± 0.1412j. The parameters k_η and k_α in equation (15) were determined to be 0.00457 and 1, respectively. An MC68000 microprocessor was used as the controller, and all computation was performed in fixed-point representation. The sampling interval of the control program is 5 milliseconds. The motor input current is controlled using the feedforward current control method [6].
7.2. Experiments
1. Experiments on posture and velocity control. First, a step response experiment was performed to test the performance of the posture and
Figure 9. The inverse pendulum type self-contained mobile robot Yamabico Kurara
velocity control system. One of the results is shown in Figure 10, which shows that the robot can track the reference velocity while keeping its balance. Figure 11 shows the experimental scene on sloped rough ground; the result is shown in Figure 12. In spite of the roughness and slope of the road, the robot tracked the reference velocity while keeping its posture balanced. This result shows that the proposed algorithm is robust enough for the indoor environment. 2. Experiments to verify our assumption in applying the PWS method. We assumed that the steering control can be done independently, without serious effects on the balance of the body, if the rotational angular velocity is limited to small values. We checked the validity of this assumption by examining the posture of the body at corners A and B when the trajectory shown in Figure 13 was given to the robot. We found that the robot could track the given trajectory and maintain a stable posture while turning. The result is shown in Figure 14. 3. Experiments for trajectory tracking and position control. Figure 15 shows the result of a trajectory tracking control experiment in which the desired straight line is given to the robot, moving with velocity 25 cm/s, at points A and B on the x-y plane. Figure 16 shows one of the experimental results using the reference velocity calculated by the algorithm of equation (21) for position control. 4. Experiments of indoor navigation.
Figure 10. Experimental results for posture and velocity control
Figure 11. Experimental scene on sloped rough ground
Figure 12. Experimental results for posture and velocity control on sloped rough ground
Figure 13. The trajectory to test posture control in turning motion (corners A and B, 200 cm segment, with the moving direction indicated)
Figure 14. Experimental results of posture stability in turning motion
We also performed an indoor navigation experiment to test the performance of the total control system and the validity of the command system. The Intelligent Robot Laboratory at the University of Tsukuba was used as the experimental environment. Figure 17 shows the robot moving in this real environment.
8. Conclusion
In this paper, we proposed a control algorithm that enables an inverse pendulum type mobile robot to move in a plane at a desired constant velocity while keeping its posture balanced by itself. The control algorithm was implemented and tested on the experimental robot. Through experiments using the robot, we concluded: 1. In spite of modelling errors, parameter variations, and accumulated errors in the inclination angle, the robot can move robustly at the desired velocity while keeping its balance. 2. It is possible to separate the steering control from the posture and velocity control without constraining the locomotion velocity and directional angular velocity too severely.
Figure 15. Experimental results for trajectory tracking control
Figure 16. Experimental results for position control
Figure 17. Experiment in indoor navigation
3. Sensor-based navigation, while keeping balance, has been realized using the implemented command system.
REFERENCES
1. K. Yamafuji and T. Kawamura, "Postural control of a monoaxial bicycle", Journal of the Robotics Society of Japan, Vol. 7, No. 4, pp. 74-79, 1988 (in Japanese). 2. O. Matsumoto, S. Kajita and K. Tani, "Attitude estimation of the wheeled inverted pendulum using adaptive observer", Proc. of the 9th Academic Conf. of the Robotics Society of Japan, pp. 909-910, 1991 (in Japanese). 3. E. Koyanagi, S. Iida, K. Kimoto and S. Yuta, "A wheeled inverse pendulum type self-contained mobile robot and its two-dimensional trajectory control", Proc. of ISMCR '92, pp. 891-898, 1992. 4. J. Medanic and Z. Uskokovic, "The design of optimal output regulators for linear multivariable systems with constant disturbances", Int. J. Control, Vol. 37, No. 4, pp. 809-830, 1983. 5. S. Iida and S. Yuta, "Vehicle command system and trajectory control for autonomous mobile robot", Proc. of the 1991 IEEE Int. Workshop on Intelligent Robots and Systems, pp. 212-217, 1991. 6. S. Iida and S. Yuta, "Feedforward current control method using two-dimensional table for DC servo motor software servo system", Proc. of the 1988 IEEE Int. Conf. on Industrial Electronics, pp. 466-471, 1988.
An Outdoor All-purpose System Platform for Autonomous Mobile Robots
Takao OKUI and Yoshiaki SHINODA
Fourth Research Center, Technical Research and Development Institute, Japan Defense Agency, 2-9-54 Fuchinobe, Sagamihara, Kanagawa 229, Japan
1. ABSTRACT

Autonomous ground robots for exploring natural environments are being developed in our research center, along with the recognition of configurations and obstacles. The three key technologies for such a robot are the study of the locomotion mechanism, the autonomous system and multiple robot control. This paper will discuss the definition of rough terrain and the establishment of the fundamental preconditions based on the results of the experiments we conducted. It will also propose an autonomous system for robots taking not only configurations into consideration, but weather, lighting and temperature conditions as well.
2. INTRODUCTION
The goal of our work is the development of an autonomous system that does not depend on the controlled conditions of industrial settings, but functions in an unstructured outdoor environment. The difficulties in realizing such an autonomous system are not only that robots are limited by the capacity of the hardware itself (e.g., uncertainty and error), but also that the architecture and the data structure for robots remain unsolved problems. Recently, many architectures for autonomous intelligent robots have been proposed and demonstrated. The Carnegie Mellon Navlab navigates on roads [1]. The Carnegie Mellon Ambler was developed for the autonomous exploration of planetary and lunar surfaces [2-4]. The MIT Mobile Robot group demonstrated complex control
systems, otherwise known as the subsumption architecture, built from augmented finite state machines [5][6]. However, autonomous robots have yet to be completed. Our approach to the autonomous system was to first consider the definition of rough terrain. Rough terrain (Fig.1) consists of a variety of artificial and natural objects, such as trees and ditches, and also changes in configuration, weather, light and so on. We have analyzed and defined the terrain outdoors as shown in Fig.2. It is obviously difficult for many kinds of robots to locomote on such rough terrain autonomously. Therefore, we have established the following preconditions:
Figure 1: Rough Terrain

[Figure 2 defines rough terrain in terms of soil-rock properties (kind of grain, water content, wet density), configuration (ground bearing power, shape), and obstacles (natural obstacles such as weeds, forests and rivers; artificial obstacles such as buildings and walls).]

Figure 2: The Definition of Rough Terrain
1) The robot is as small and light as possible and has high mobility on rough terrain. Therefore, we introduced the concept of giving the robot the ability to change posture. 2) The robot has a map which describes the rough terrain between the start point and the goal point. However, the map is rough and does not have complete information on obstacles, washboards, changes in configurations, and so on. 3) The robot has a database of its postures which partially describes how to traverse rough terrain. 4) A man indicates to the robot the start and goal points, from which the robot will automatically create a global path and will locomote along that path. Using a model (Fig.3), we started developing the autonomous system, which does not depend on a particular locomotion mechanism, under the above preconditions. We have named this robot the Autonomous Mobile All-purpose Platform (AMAP). In this paper, we propose an architecture for an autonomous system for AMAP predicated on the behavior-based concept. This concept is becoming widely adopted. We have extended it and are constructing a behavior-based intelligence hierarchy system. The remainder of this paper is organized as follows: In Section 3 the system concept is described, and the behaviors of robots that locomote on rough terrain are discussed. In Section 4 the system structure is presented. This structure was constructed by decomposing the architecture into task-achieving modules along the flow of the function. Conclusions follow in Section 5.
Figure 3: AMAP
3. SYSTEM CONCEPT
A robot must execute a diverse array of behaviors in order to move on outdoor rough terrain (Fig.4). Through our experience of operating AMAP on a rough terrain model and analyzing the resulting behaviors, we developed a concept for the architecture. As a result, we have classified the behaviors into six tasks to create the autonomous system, as follows: 1) Global path planning Plans the global path by using a map and makes improvements upon this map. 2) Task management Selects from among the following four tasks and controls the sensors for obtaining the information the robot requests. 3) Locomotion on unknown rugged configurations Recognizes the rough terrain in front of the robot and performs motion planning for traversing rugged configurations which are not described in the robot's database. Rugged configurations are defined as topography that the robot cannot traverse in its basic posture but can traverse by changing posture.
Figure 4: The Outdoor Autonomous Locomotion
[Figure 5 depicts the hierarchy as a pyramid, from top to bottom: global path planning, task management, locomotion on unknown rugged configurations, locomotion on known rugged configurations, obstacle avoidance, and locomotion on plain fields.]
Figure 5: Behavior-based Intelligence Hierarchy System

4) Locomotion on known rugged configurations Recognizes configurations in front of the robot and makes motion inferences by selecting from the database, which describes these rugged configurations and the way to traverse them. 5) Obstacle avoidance Detects and avoids rough terrain that the robot cannot locomote on and traverse. 6) Locomotion on plain fields Moves along the global path, conducts real-time control, and detects changes from plain fields to rugged configurations or obstacles. Through analyzing these tasks (Fig.5), we have created a behavior-based intelligence hierarchy system that deals with a diverse group of requested behaviors [7].
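To make the arbitration concrete, here is a minimal Python sketch of one way such a hierarchy can select the active task. The six task names and their priority ordering follow Fig.5; the trigger predicates are invented here purely for illustration and are not part of the paper.

def select_task(perception):
    """Return the highest-priority task whose trigger holds.

    'perception' is a dict of boolean flags; the predicates below are
    hypothetical stand-ins for the recognition modules of Section 4."""
    tasks = [
        ("global path planning", lambda p: p.get("replan_requested", False)),
        ("task management", lambda p: p.get("task_switch", False)),
        ("locomotion on unknown rugged configurations",
         lambda p: p.get("rugged", False) and not p.get("in_database", False)),
        ("locomotion on known rugged configurations",
         lambda p: p.get("rugged", False)),
        ("obstacle avoidance", lambda p: p.get("obstacle", False)),
        ("locomotion on plain fields", lambda p: True),  # default behavior
    ]
    for name, triggered in tasks:
        if triggered(perception):
            return name

print(select_task({"obstacle": True}))                     # obstacle avoidance
print(select_task({"rugged": True, "in_database": True}))  # known rugged
print(select_task({}))                                     # plain fields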
4. SYSTEM STRUCTURE
Before building the system flow on the basis of the previous system concept, we introduced five function stages. The stages are sensing, processing, recognition, judgement and control. The sensing stage consists of the sensors' hardware for obtaining data about the environment around the robot. The processing stage is the extraction of signals from the sensors. The recognition stage is the generation of information for judgement. The judgement stage consists of motion planning to indicate where to go and what to do. The control stage constitutes motion control depending on the mechanism of the robot. In each task and stage we have developed modules by considering the flow of data and distributivity, thereby creating a system flow (Fig.6). Each module is as follows:
Figure 6: The Flow of Autonomous System
1) Location Makes an estimation of the robot's current location and makes any corrections needed to localize the robot's position. 2) Contact force Detects the amount of contact force for soft terrain. 3) Range image Acquires range images. This module is a kind of pixel-level fusion that can be used to increase the information content associated with each pixel in an image formed through a combination of multiple images. We are obtaining these images by using a time-of-flight-type laser range finder (Fig.7) and CCD cameras. 4) Contact Touches configurations and obstacles for the extraction of contours. We made a rubber-actuator tactile sensor with elasticity to react to sudden changes in configuration (Fig.8). 5) Edge extraction Extracts the boundary between plain fields and other areas, and sends the information to Caution [7]. 6) Region segmentation I Segments image data from the CCD cameras by analyzing the texture of the image to detect the areas where the robot cannot locomote. 7) Pattern recognition Detects and recognizes moving objects.
Figure 7: Laser Range F i n d e r
Figure 9: Region Segmentation
Figure 8: Tactile Sensor
Figure 10: Polygon Approximation
8) Region segmentation II Segments range data into plain fields, convexes, concaves and blind spots. We have defined segmentation by height (Fig.9) [8][9]. 9) Polygon approximation Approximates the boundaries of regions that are obtained by Region segmentation I and II to a convex hull (Fig.10) and labels each region as "possible to move" or "impossible to move".
10) Polyhedron approximation Approximates the shape of configurations and obstacles by using inclinations in relationship to a polyhedron [9]. 11) Data fusion Recognizes three-dimensional shapes, moving objects and softness [10]. This module is a kind of feature-level fusion that can be used both to increase the likelihood that a feature, extracted from the information provided by the sensors, is recognized, and to create additional composite features for use by the system. It is also a symbol-level fusion that allows the information from the multiple sensors to be effectively used together at the highest level of abstraction. 12) Adjacent detection Detects changes around the robot by using tactile sensors and supersonic sensors, and produces obstacle fuzzy vectors that express the distance and direction of obstacles. 13) Caution Detects actual changes in plain fields to make the robot stop, and produces a caution fuzzy vector (Fig.11) from the obstacle fuzzy vectors. This module is a kind of signal-level fusion, which refers to a combination of signals from a group of sensors; its objective is to provide a signal that is usually of the same form as the original signals, but of greater quality. 14) Local path planning Plans a subgoal, which describes the position and the direction in which the robot should locomote, in the sensing area of the robot, and guides the robot to avoid obstacles together with the Reaction module. 15) Selection of action patterns Selects from the database of actions (Fig.12) to traverse rugged configurations.
F i g u r e 11: Caution Fuzzy Vectors
F i g u r e 12: D a t a b a s e of Actions
16) Generation of action patterns Generates sequences of postures by using the posture database. 17) Task selection Makes a judgement of the task based on the ability of locomotion and the result of recognition. 18) Map improvement Matches the global and local maps and makes improvements based on the obtained environmental information. 19) Global path planning Generates a global path on the map. 20) Reaction Reacts to avoid rugged configurations and obstacles in a manner similar to a human's unconditioned response, by synthesizing the planned path and the reaction vector (Fig.13). 21) Locomotion control Stabilizes the body of the robot in order to deal with small changes in configuration, and locomotes by obeying speed and direction commands. 22) Posture control Controls changes of posture by obeying the posture command. 23) Sensor selection Selects sensors to obtain the information the robot needs. 24) Sensor control Controls the direction, the visual field and the magnification of the sensors to obtain the necessary information. We have constructed the experimental system in our laboratory in order to study the system flow (Fig.14). All of the devices, which belong to Sun engineering workstations, are connected by Ethernet to conduct parallel distributed processing and to increase the efficiency of programming.
[Figure 13 shows the planned path, the reaction vector and the resulting instruction in the x-y plane of the robot.]

Figure 13: Reaction
[Figure 14 shows the experimental system: external sensors on the body (CCD cameras, laser range finder, tactile sensors, ultrasonic sensors) and internal sensors (G sensor, F.O.G., inclinometer, encoder, motor ammeter) feed CCD image processing, range image processing, tactile data processing and ultrasonic data processing modules, which in turn feed environmental recognition, self-localization (with a robot model) and locomotion control.]
Figure 14: The Experimental System

The signals from the sensors are taken in through the VME-bus and are processed by devices installed with real-time UNIX. Conversely, the signals to the robot are processed by devices installed with real-time UNIX and are sent out.
5. CONCLUSION

In this paper, we defined rough terrain, discussed the autonomous system for robots which locomote on rough terrain, and proposed the behavior-based intelligence hierarchy system, which shows that diverse behavior sets are feasible. We created the Autonomous Mobile All-purpose Platform (AMAP) with the ability to change its posture and equipped AMAP with external sensors, such as the CCD cameras and the laser range finder, and internal sensors. We experimented on several modules, including Location, Range image, Contact, Edge extraction, Region segmentation II and Polyhedron approximation, and obtained some interesting results. We have three objectives for the future. First, we plan to create a process flow for each module by doing more experiments with AMAP. Second, we will study the properties and limitations of the architecture. Finally, we will clarify the true architecture and data structure for autonomous robots.
REFERENCES
1. C. Thorpe (editor), "Vision and Navigation: The Carnegie Mellon Navlab", Kluwer, 1990.
2. R. Simmons and T.M. Mitchell, "A Task Control Architecture for Mobile Robots", Proc. AAAI Spring Symposium, Stanford, California, March 1989.
3. J.E. Bares and W.L. Whittaker, "Walking Robot with a Circulating Gait", Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, pp.809-816, 1990.
4. J. Bares et al., "Ambler - an autonomous rover for planetary exploration", IEEE Computer, Vol.22, No.6, 18-26, 1989.
5. Rodney A. Brooks, "A Robust Layered Control System for a Mobile Robot", IEEE Journal of Robotics and Automation, Vol.RA-2, No.1, 14-23, 1986.
6. Rodney A. Brooks, "A robot that walks; emergent behavior from a carefully evolved network", Proc. of 1989 Int. Conf. on Robotics and Automation, 692-694, 1989.
7. Okui et al., "A study on an Autonomous Ground Vehicle: The system and the off-road detection using CCD", Proc. of 1992 Conf., Japan Society of Mechanical Engineers, Vol.C, No.920-17, 516-518, 1992.
8. Furuta, Okui et al., "A Study on Recognition of Terrain using Laser Range Finder: Discussion on Simple Expression of Terrain and its Perception Method", Proc. of 1992 Conf., Robotics Society of Japan, No.3, 967-968, 1992.
9. Kouzuki, Okui et al., "A Study on Recognition of Terrain using Laser Range Finder: Detection of Planes using 3-D Hough Transformation and a Study on Expression of Terrain", Proc. of 1993 Conf., Robotics Society of Japan, No.2, 783-784, 1993.
10. M.A. Abidi and R.C. Gonzalez (editors), "Data Fusion in Robotics and Machine Intelligence", Academic Press Inc., 1992.
Intelligent Robots and Systems V. Graefe (Editor) © 1995 Elsevier Science B.V. All rights reserved.
Mechanism and Control of Propeller Type Wall-Climbing Robot

A. NISHI and H. MIYAGI
Faculty of Engineering, Miyazaki University, Gakuen-Kibanadai, Miyazaki 889-21, JAPAN Abstract Use of a wall-climbing robot for rescue or fire-fighting has been anticipated for a long time. Four quite different models have been developed in our laboratory. The present model can move on a wall by using the thrust force of a propeller and can fly whenever it is required. Its mechanism and control system are discussed. 1. INTRODUCTION A robot capable of moving on a vertical wall has been looked forward to for a long time. It could be used for rescue and/or fire-fighting in high-rise buildings. Four quite different types of wall-climbing robots have been developed in our laboratory over the last 20 years. The first model, shown in Fig.1, has a large sucker and crawlers as a moving mechanism [1]. Recently, many varieties of this type of model have been developed for wall inspections in Japan [2]-[4]. The second type, shown in Fig.2, is a biped walking model, which has a small sucker on each foot [5]. As this can be applied to almost all irregular wall surfaces, it has a wider applicability than the former. In general, the walking motion is not very quick, so the walking model takes much time to climb to higher places on the wall. Unfortunately, a robot would be required to climb to higher places on buildings in a short time for emergency purposes, such as carrying a rescue tool or fire-fighting in buildings. The third model, shown in Fig.3, aims at these purposes [6]-[10]. It has propellers, and their thrust forces are inclined a little toward the wall to make use of the frictional force between the wheels and the wall surface, as well as to support the robot itself. Sometimes unexpected strong winds occur on the walls of high-rise buildings. In such cases, a control system to compensate for the wind load is important to avoid having the robot fall from the surface. This situation has been previously discussed in detail in Refs.[6] and [7]. Often there are many obstacles at the lower part of buildings, such as trees, eaves, entrances, etc. In such cases, it would be useful if the robot could fly over these obstacles and reach the upper wall surface. Moreover, if the robot fell from a higher place on the wall surface by accident, it would need to make a soft landing to avoid endangering itself or its surroundings. These maneuvers can be done by employing a mechanism and control system enabling it to fly. As the robot has enough thrust force to sustain itself in the air, it can fly or land. This is the fourth model, and its mechanism and control system are discussed in this paper [10]-[13].
Fig.1 Large sucker model. Fig.2 Biped walking model. Fig.3 Propeller type model.

2. MECHANISM AND CONTROL OF THE ROBOT
2.1 Thruster of the wall-climbing robot An example of a conceptual model is shown in Fig.4. A conventional thruster is a combination of an engine and a propeller. Although a small jet engine or rocket motor could be made available for emergency use, they are expensive and noisy compared with the conventional engine and propeller. There are two types of propellers: those with larger diameters, similar to helicopter rotors, and smaller ones, similar to light-plane propellers. The former can produce a larger thrust force for the same engine power. However, the wind load is larger, and it is difficult to design a wall-climbing robot using this type. Therefore, the latter has been used for the model concerned here. 2.2 Mechanism of the flight model To use a robot for carrying rescue tools or for fire-fighting, it is first necessary for the robot to reach and attach itself to the wall surface at the lower part of buildings. However, often there are obstacles such as trees, eaves and entrances at the lower part of the buildings. In such cases, it may be difficult for the robot to attach itself to the wall directly. As the robot concerned here has enough thrust force to lift itself up in the air, it can fly over these obstacles employing a suitable mechanism and control system, while the wind is mild or calm. Almost the same situation will occur at higher parts of the buildings, where there are eaves or steps on the wall surface which are difficult to get over using the conventional moving mechanism. Moreover, there is a danger of the robot falling from the wall surface due to an unexpected strong wind or accident. In such cases, the robot must make a soft landing, or the fall would be very dangerous for the robot itself and its surroundings. These situations can be avoided by designing the robot so that it is able to fly or land. These maneuverings are shown in Fig.5 schematically.
Fig.4 Conceptual model.
Fig.5 Maneuvering of the flight model.
However, its major purpose is not to fly like a helicopter, but to move on a wall surface using its wheels, since moving on a wall is safer. Therefore, it has a long body with many wheels, and its thrust force has a component toward the wall to produce frictional force between the wheels and the wall surface. The conceptual robot is designed for this requirement. Two propellers are installed at the top and bottom of the trapezoidal body. Many light-weight wheels are arranged on the outside of the body, so that two outer planes can touch the wall surface and the robot can get over irregularities on the wall surface, such as window frames, eaves and so on. When the robot touches the wall, the thrust force axis is inclined a little toward the wall, and the frictional force of the wheels is produced automatically.
2.3 Control mechanism of the flight model Three mechanisms have been constructed and examined to produce the control force of the flight model in the air. The first one is inclining the thrust force directly. This method can produce a large force; however, it is not suitable for the flight robot, because of the heavy weight of the mechanism and hence the insensitive response. The second mechanism is directing the propeller slip stream with vanes. This control mechanism is easy to construct and lighter than the former. Therefore, it was supposed that this would be better for the flight model. The slip stream of the main thruster propeller rotates and thus produces a reaction torque on the body. Counter-rotating propellers are used to decrease this torque; however, its magnitude varies as the thrust force is changed for the up and down motion of the robot. Regulating stream vanes are installed in the slip stream of the propeller, and a couple of the vanes are controlled to eliminate the reaction torque. The flight control is accomplished by two control blades, set in the downstream of the regulating vanes and arranged crosswise. Each of them is controlled independently to produce a side force in any direction. A considerable amount of drag force was produced by these three blade rows, and this decreases the thrust force. Moreover, interaction of the forces produced by each row was observed, and complicated control software was required. A laboratory model and a flight model of this type were constructed and tested. The laboratory model could be controlled easily; however, the flight model was difficult to control, since a more complicated mechanism was required to decrease the non-linear effect of the control caused by
the heavy weight of the control mechanism. So, this mechanism was abandoned in favor of the third one. As the body is symmetrical about the thrust axis, a righting force is not produced by the thrust force when the robot is inclined in the air. Therefore, control thrusters are required to produce the force to control the robot in the air. This is the third method and is applied to the present model. Eight DC-motor driven propellers are employed as control thrusters, set at the top and bottom of the robot body, as shown in Fig.4. As the flight model has 5 degrees of freedom in the air apart from the up and down motion, this mechanism is redundant. Each control thruster can be controlled independently; therefore the control is easy compared with the above two methods. For this mechanism, however, a powerful and light-weight battery has to be installed on the robot. The tail rotor of a helicopter model was employed as a thruster; it can produce both forward and backward forces depending on the blade pitch angle. The thrust forces producing the inclinations in the x- and y-directions are given by Fx and Fy respectively, and the rotational motion in the z-direction is given by Fz. Then, the following relations are derived:

Fx = F2 - F4 + (F5 - F7)   (1)
Fy = F1 - F3 + (-F6 + F8)   (2)
Fz = (F1 + F2 + F3 + F4) - (F5 + F6 + F7 + F8)   (3)

where F1-F8 are the forces of the thrusters, shown in Fig.6. The following relations are assumed:

F2 - F4 = F5 - F7   (4)
F1 - F3 = -F6 + F8   (5)
F1 + F2 + F3 + F4 = -(F5 + F6 + F7 + F8)   (6)
Fig.6 Schematic of control thruster. Fig.7 Schematic of laboratory model.
Then, each thrust force can be given in terms of Fx, Fy and Fz as follows:

F1 = (Fy + Fz/2)/4, F2 = (Fx + Fz/2)/4, F3 = (-Fy + Fz/2)/4, F4 = (-Fx + Fz/2)/4,
F5 = (Fx - Fz/2)/4, F6 = (-Fy - Fz/2)/4, F7 = (-Fx - Fz/2)/4, F8 = (Fy - Fz/2)/4   (7)
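As a numeric check of this allocation (the subscripts above are reconstructed from a degraded scan, so the signs should be read with that caveat), a short Python fragment verifies that equation (7) reproduces the commanded Fx, Fy and Fz through equations (1)-(3):

def allocate(Fx, Fy, Fz):
    """Distribute the commanded quantities (Fx, Fy, Fz) over the eight
    control thrusters according to equation (7)."""
    return (( Fy + Fz / 2) / 4,   # F1
            ( Fx + Fz / 2) / 4,   # F2
            (-Fy + Fz / 2) / 4,   # F3
            (-Fx + Fz / 2) / 4,   # F4
            ( Fx - Fz / 2) / 4,   # F5
            (-Fy - Fz / 2) / 4,   # F6
            (-Fx - Fz / 2) / 4,   # F7
            ( Fy - Fz / 2) / 4)   # F8

F1, F2, F3, F4, F5, F6, F7, F8 = allocate(1.0, 2.0, 3.0)
assert abs(F2 - F4 + (F5 - F7) - 1.0) < 1e-12                        # eq.(1)
assert abs(F1 - F3 + (-F6 + F8) - 2.0) < 1e-12                       # eq.(2)
assert abs((F1 + F2 + F3 + F4) - (F5 + F6 + F7 + F8) - 3.0) < 1e-12  # eq.(3)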
3. MECHANISM AND TESTING OF THE LABORATORY MODEL
3.1 Mechanism of the laboratory model Before the construction of a flight model, an investigation was carried out using a laboratory model, shown schematically in Fig.7. It had a control mechanism similar to the conceptual model, but did not have the main thrusters. It was designed to obtain experimental data on the control mechanism easily. The model was supported by a rigid bar at the center of gravity with a universal joint. The inclination angles in the x- and y-directions are sensed by both potentiometers and inclination sensors, and the rotational motion in the z-direction is measured by another potentiometer. The eight thrusters were controlled independently, and the inclination and rotation of the model were controlled both automatically and manually. As each control was independent for small increments in each direction, the equations of motion were simple. The state feedback method was used; angles and angular velocities were taken as the feedback variables in each direction.
3.2 Wireless communication system and its preliminary testing Wireless communication is required to control the flight robot in the air or at higher places on a wall. The radio controller of a model plane was employed to simulate the control system for the laboratory model. The sensor signals from the model were given by wire to the computer. The feedback signals to the model were given by radio signals. Eight channels of the radio controller were used for the servo-motors of the thrusters. The preliminary experiment demonstrated that this method was an effective and easy way to control the model. This can be developed for the two-way communication between the flight model in the air and the computer on the ground.
3.3 Experimental results The moment of inertia of the model was adjusted by adding a corresponding mass to the model in order to compensate for the lack of main thrusters. Then, almost the same amount of control force was required as for the flight model. The equations of motion are given as follows:

Ix d²θx/dt² = Fx L + Tox   (8)
Iy d²θy/dt² = Fy L + Toy   (9)
Iz d²θz/dt² = Fz l + Toz   (10)

where Ix, Iy and Iz are the moments of inertia in the x-, y- and z-directions respectively, L is the distance between the upper and lower control thrusters, and l is the breadth of the model shown in Fig.7. Tox, Toy and Toz are the resultant torques around the x-, y- and z-directions, respectively, produced by the engine and propeller operations.
Fig.8 Responses of the laboratory model: (a) inclination sensor; (b) potentiometer.

The time lag of the inclination sensors was measured and is given as τ = 0.38 s. The optimum feedback coefficients were determined by the pole assignment method and its simulations. Examples of the results are shown in Fig.8 for the different feedback signals from the potentiometers and inclination sensors. The model was initially inclined about -0.13 rad, and after a given start signal it moved to the neutral position automatically. After that, the angles were controlled manually within ±0.13 rad. The response angle showed that a certain amount of overshoot and small oscillations appeared in the inclination sensor signals. Almost the same results were obtained for the y-direction. Satisfactory control could be expected from these results.
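A minimal Python simulation of this state-feedback loop for one axis may help illustrate the design: the rigid-body model of equation (8) is driven by Fx = -k1·theta - k2·theta_dot, with the inertia, lever arm and gains chosen here only for illustration (they are not the paper's values).

# Simulate Ix * theta_dd = Fx * L with state feedback Fx = -k1*theta - k2*theta_d.
Ix, L = 0.5, 0.4          # inertia [kg m^2] and thruster separation [m] (assumed)
k1, k2 = 6.0, 2.0         # assumed gains; closed-loop poles at s = -0.8 ± 2.04j

theta, theta_d = -0.13, 0.0   # initial inclination [rad], as in the experiment
dt = 0.001
for _ in range(int(5.0 / dt)):           # 5 s of simulated time
    Fx = -k1 * theta - k2 * theta_d      # state feedback
    theta_dd = Fx * L / Ix
    theta_d += theta_dd * dt             # explicit Euler integration
    theta += theta_d * dt

print(round(theta, 4))   # close to 0: the model has returned to neutral

In the pole assignment method the gains would instead be solved from the desired pole locations; the hand-picked values above merely give a damped, oscillatory response qualitatively similar to Fig.8.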
4. FLIGHT SIMULATION
4.1 Computer simulation Before the construction of the flight model, a flight simulation was carried out to ascertain the movement of the model in the air. The maneuvering of the flight model is done by controlling the inclination angle of the model. When the model is inclined at a small angle, a side force component is produced by the inclined main thrust force. Then, a side movement occurs with acceleration. The engine revolution speed is adjusted a little to maintain the same height. To move the model back to the initial position, the inclination angle has to be changed to the opposite direction. These controls and motions were simulated by computer, and a result is shown in Fig.9. It shows that the robot moved about 10 m from the origin in 24 s, after which it
moved to the opposite position. The required changes of the inclination angle and velocity for the movement are shown in the same figure. This result was obtained by using the pole assignment method of analysis.
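The maneuver can be sketched with simple kinematics: a small inclination phi of the thrust axis yields a lateral acceleration of roughly g·sin(phi). The following Python fragment, with an assumed tilt profile, reproduces the qualitative behavior of Fig.9 (drift out, then brake by counter-tilt):

import math

g = 9.81

def simulate(angle_profile, dt=0.01):
    """Integrate lateral position under the side force a = g*sin(phi)."""
    x = v = 0.0
    xs = []
    for phi in angle_profile:
        v += g * math.sin(phi) * dt
        x += v * dt
        xs.append(x)
    return xs

# Tilt +0.1 rad for 3 s, -0.1 rad for 3 s, then level: the model drifts
# roughly 9 m sideways and comes to rest (an illustrative profile, not Fig.9's).
profile = [0.1] * 300 + [-0.1] * 300 + [0.0] * 100
xs = simulate(profile)
print(round(xs[-1], 2), "m after", len(profile) * 0.01, "s")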
4.2 Manual control of movement using the laboratory model As the laboratory model is supported by a bar, it cannot move in the side direction. This motion was created by the computer and combined with the inclination of the laboratory model. A combination of step inputs and a few adjustments was given to the model manually, and the inclination angle of the model was obtained from the potentiometer signals with a certain amount of time delay. The distance and velocity of the side movement as a function of the inclination angle were calculated using a computer. These results are shown in Fig.10. They show clearly that the flight model can be controlled easily by manual control.
Fig.9 Computer simulation of the flight robot (distance L, inclination angle θ and velocity V versus time).

Fig.10 Manual control of the laboratory model (manual input, distance L, inclination angle θ and velocity V versus time).
5. FLIGHT MODEL AND ITS TESTING
5.1 Dynamic behavior of the main thruster In order to obtain the dynamic behavior of the main thruster, a test apparatus was constructed. The purposes of this experiment are as follows: (1) to see whether enough thrust force can be produced to sustain the robot itself in the air; (2) to obtain the dynamic thrust force, the side forces in the x- and y-directions, and the resultant torque in the z-direction for a sudden change of engine operation; (3) to ascertain the effect of electrical noise from the ignition plug and other high-voltage parts on the radio communication system. These tests are now in progress.
5.2 Flight model and its testing The flight simulations and the above experiments show that the flight model can be controlled well in the air. The last stage of the development is the construction and testing of the flight model. This is now in preparation.
6. CONCLUSION Three different types of wall-climbing robots have been developed in our laboratory. The first model had a large sucker and could be utilized on flat and/or wide wall surfaces. The second was a biped walking model, which had a small sucker on each foot. This could be applied to almost all irregular wall surfaces. The third model could move on a vertical wall at high speeds. It had propellers, and the thrust force supplied both the force to support the robot itself and the force producing the friction between the wheels and the wall surface that keeps it safe on the wall. When the third model is employed for emergency use, it is first necessary for it to reach and attach itself to the wall surface. Often there are obstacles at the lower parts of the buildings, such as eaves, trees, entrances, etc. Therefore, it would be very useful if the robot could fly over these obstacles and reach the upper wall surface. Moreover, if the robot falls from the wall surface by accident, a soft-landing control has to be employed to avoid danger to the robot and the surroundings. For these purposes a flight model has been considered, and the mechanism and control systems for such maneuvering have been discussed. A laboratory model was constructed and tested. As a result, it was demonstrated that the laboratory model could be controlled easily, and similar control methods can be applied to the flight model. The flight simulation was carried out to ascertain the control mechanism and system. As a result, it was shown that the flight robot can be controlled easily and the control mechanism is suitable for the flight robot. The next step is the construction of the flight model and its testing in the air. This is now in preparation.
REFERENCES
[1] Nishi A., Wakasugi Y. and Watanabe K., Design of a Robot Capable of Moving on a Vertical Wall, Advanced Robotics 1-1, 33-45 (1986).
[2] Nagatsuka K., Vacuum adhering crawler system "VACS" (in Japanese), Robot 53, 127-135 (1989).
[3] Ikeda K., Nozaki T. and Shimada S., Development of wall-climbing robot (in Japanese), Proc. 1st Symp. on Construction Robotics in Japan, 107-116 (1990).
[4] Sato K., Tominaga I. and Fukagawa Y., Development of wall-climbing robot and its application to nuclear power plant (in Japanese), Piping Technology 29, 92-96 (1987).
[5] Nishi A., Biped Walking Robot Capable of Moving on a Vertical Wall, Mechatronics 2-6, 543-554 (1992).
[6] Nishi A. and Miyagi H., A wall-climbing robot using thrust force of propeller (Mechanism and control in a mild wind), JSME Int. J., Series C, 36-3, 361-367 (1993).
[7] Nishi A. and Miyagi H., A wall-climbing robot using thrust force of propeller, 2nd Report, Trans. Jpn. Soc. Mech. Eng. (in Japanese), 58-547C, 163-169 (1992).
[8] Nishi A., A wall-climbing robot for inspection use, Proc. 8th Int. Symp. on Automation and Robotics in Construction, 267-274 (1991).
[9] Nishi A., A wall-climbing robot using propulsive force of propeller, Proc. 5th Int. Conf. on Advanced Robotics, 320-325 (1991).
[10] Nishi A., Control of a wall-climbing robot using propulsive force of propeller, Proc. IEEE/RSJ Int. Workshop on Intelligent Robots and Systems IROS '91, 1561-1567 (1991).
[11] Nishi A. and Miyagi H., Propeller type wall-climbing robot for inspection use, Proc. 10th Int. Symp. on Automation and Robotics in Construction (ISARC), 189-196 (1993).
[12] Nishi A. and Miyagi H., Development of wall-climbing robot, Intelligent Automation and Soft Computing, Vol.2, TSI Press, 327-332 (1994).
[13] Nishi A. and Miyagi H., Mechanism and control of propeller type wall-climbing robot, Proc. IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems, IROS '94, 1724-1729 (1994).
Intelligent Robots and Systems V. Graefe (Editor) © 1995 Elsevier Science B.V. All rights reserved.
Automated Guided Vehicles in Japan --Recent Trends in Advanced Research, Development, and Industrial Applications--

TOSHIHIRO TSUMURA University of Osaka Prefecture Sakai, Osaka, 593 Japan

ABSTRACT: Recent trends in advanced AGVs are reviewed and discussed, with examples. Special concern is given to automated position finding, guidance, multi-vehicle strategies, and communication systems, with the focus on engineering, electromechanical and optical problems. In addition to conventional manufacturing and advanced industrial applications, service vehicles for such uses as floor cleaning, blind aids and off-road operation are described. New electric power supply systems are also presented.
1. INTRODUCTION Recently, in developing fully automated factory manufacturing systems, AGVs (automated guided vehicles), permitting automated guidance and operation, are used as a key means of transportation on roads, construction sites and other locations. Various means of transportation for factory operation have been put to use (see materials [1]-[4]), including the hoist, crane, earth-moving machine, terrain vehicle, railway carrier system, fork lift truck, and automated or manually controlled carrier with tires. In the past 10 years, automated guided wire-type vehicles have mainly been used for transportation in industry, service and clerical work. The world's first AGV was developed in 1963 by Daifuku Ltd. in Osaka. In developing the vehicle, WEB, a U.S. company, transferred AGV technology to Daifuku. Following Daifuku, more than 20 Japanese companies commenced AGV development.
Fig. 1 Numbers of vehicles and systems
Presently, AGVs are being developed by more than 35 companies, of which more than 30 are manufacturing AGVs on a commercial basis. Figs. 1-3 show data on AGVs (source: Japan Industrial Vehicle Assn.) in Japan. Almost all AGVs in Japan are used in factories. Recently, however, new AGV applications have been developed in various fields; these applications include office automation, public facilities, railway terminals, shopping centers, container yards, restaurants, medical care, agriculture, engineering work, and building construction.
Fig. 2 Breakdown of systems according to vehicle type
Fig. 3 Breakdown of systems according to business type
This paper discusses the recent trends in the research, development and commercial use of AGVs. Specifically, it treats automated position finding, guidance, multi-vehicle strategy, and vehicle communication. Another focus is on electromechanical and optical techniques.
2. AUTOMATED GUIDANCE SYSTEM
2.1 Reflecting tape guidance (optical or magnetic tape guidance system) In this system, reflecting tape (white tape, metal tape, or tape painted with glass powder) or magnetic tape is placed on the vehicle route, to detect the vehicle's deviation from the predetermined route. Light emitted from the vehicle's lower part is reflected by the tape; the reflected light is sensed as a signal by the photosensor on the vehicle's lower part. In the case of guidance using magnetic tape, a magnetic sensor must also be installed on the vehicle's lower part. It is better to place tape over the entire route than to place it on only some parts of the route. This system is superior to the electromagnetic cable guidance system in that it permits flexible and easy installation. The tape guidance system is advantageous also because it can be easily recognized by passers-by. In Japan, more than 10 companies recommend the use of the system for manufacturing parts or office machines. A system using a single tape permits satisfactory operation. To improve operation, however, the following types of tape have been developed: a) Two tapes installed in parallel, to enhance system reliability b) Tape on which bar codes or other kinds of marks are printed, to store special data or road information c) Curved tapes, with various curvatures Fig. 4 shows a tape guidance system. 2.2 Laser guidance (laser beam guidance system) The following two laser guidance systems have been devised: a) Straight line guidance system: In this system, the laser beam is fixed. b) Guidance system for freely chosen courses: In this system, using a laser scanner, a two-dimensional mirror controller reflects the laser beam.
Fig. 4 Tape guidance system
Fig. 5 Laser beam controller for curved course
Fig. 5 shows a reflecting mirror controller for guidance on a freely chosen curved course. (The vehicle detects laser-transmitted instructions for guidance.) Presently, straight line guidance systems are commercially available. These systems are superior to electromagnetic or reflecting tape guidance systems, especially when used for carriers in super-clean rooms. Also, the applications of guidance systems for freely chosen courses will soon expand.
2.3 Optical recognition guidance system Various image recognition systems for vehicular or moving targets have been studied and developed. These systems range from simple mark recognition systems to large-scale computerized three-dimensional image recognition systems. In my opinion, the commercialization of image recognition systems requires the fulfillment of three tasks: reducing cost, enhancing reliability, and increasing processing speed. 2.3.1 Corner cube laser-scanning system In this system, corner cubes (a kind of retroreflector) are set above the route (e.g. on the ceiling). A laser scanner scans the laser beam to determine the course. A photo-sensing array is installed on the vehicle's upper part. Data detected by the array are sent to the AGV steerage circuit. Groups of corner cubes, each group containing two or more cubes, are placed on the ceiling along the route. A fan beam-type laser scanner with rotary encoders is installed on the AGV's upper part. Laser beams reflected by a corner cube group are received by the AGV's photo-sensor. Based on the beam angles, a computer mounted on the AGV determines deviation from the course. This guidance system also detects direction deviation from the reference course. Fig. 6 shows the principle of the guidance system (see material [1]). This principle can be applied to blind aids. 2.3.2 Bar code mark recognition system In this system, a series of bar code marks are placed on the route, to provide route information to the vehicle. Video camera systems are installed on the vehicle's front and/or lower part.
[Figure 6 shows corner cubes CCa and CCb set at predetermined positions on the ceiling and a laser beam scanner with sensor on the AGV, with the conditions: θ = 0, the AGV is on the right course; CCa = CCb, the AGV is on a course parallel to the right one.]
Fig. 6 Principle of a corner cube laser-scanning system
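The geometry behind Fig.6 can be illustrated with a small forward model in Python: the bearings of two known ceiling cubes, measured relative to the vehicle axis, separate a pure heading error (which shifts both bearings equally) from a lateral offset (which makes them unequal) - the information the steerage circuit exploits. The coordinates below are assumptions for illustration, and the sign conventions of Fig.6 may differ.

import math

def bearings(x, y, psi, marks):
    """Bearing of each ceiling mark relative to the vehicle axis,
    for a vehicle at (x, y) with heading psi [rad]."""
    return [math.atan2(my - y, mx - x) - psi for mx, my in marks]

marks = [(1.0, 0.0), (2.0, 0.0)]       # cubes above the reference course y = 0
print(bearings(0.0, 0.0, 0.0, marks))  # on course: both bearings are 0
print(bearings(0.0, 0.3, 0.0, marks))  # lateral offset: unequal bearings
print(bearings(0.0, 0.0, 0.1, marks))  # heading error: equal, nonzero bearings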
Optical data at the bar code pattern positions are processed by a computer mounted on the vehicle, to provide positional and guidance data. This type of system is commercially available in Japan. 2.3.3 Spot mark/video camera system In this system, small glass balls are set on the course as spot marks, and protected by hard plastic cases. A video camera system is installed on the vehicle's front, to measure deviation from the course via a CCD receiver. The obtained data are processed by a computer mounted on the vehicle, to provide steerage information to the vehicle. 2.3.4 Roadside cube guidance system (1) Fixed fan laser-type system Fig.7 shows this system, in which many corner cubes are set along the guard rail, arbitrarily or at predetermined positions. Two or three laser fan beams are installed on the vehicle's side. Using a mathematical method, the mounted microprocessor determines the vehicle's position and direction in relation to the guard rail, based on odometric data and information on the laser beam reflected by the cubes. Accuracy in determining vehicle position and direction can be improved by setting corner cubes on both sides of the road. Fig. 8 shows an experimental vehicle. The cube guidance system permits the quick and accurate determination of vehicle position and direction. Data, provided continuously by the system, are used for adjusting the guiding controller in the automatic position determination system.
Fig. 7 Road side corner cube guidance system
Fig.8 Experimental car for roadside corner cube guidance system
(2) Fan beam scan-type system In this system, corner cubes are set on both sides of the road, especially when the road is undulating or uneven. The vehicle is equipped with dead reckoning and laser scanning systems, to determine the present vehicle position and the predetermined vehicle direction. The laser scanning system scans retro-reflectors at predetermined positions to determine the angles of the reflected laser beams. Fan beam laser scanning permits this operation even when the vehicle is on a slope. Fig. 9 shows the position determination hardware design of a vehicle using the fan beam scan-type system. This system will prove useful in aiding blind persons.
L.T.: Laser transceiver (scanning via rotation)
Fig. 9 Position determination hardware design for fan beam scan type vehicle
2.3.5 Edge detection system This system is designed to detect the road edge or guard rail using route information. The system uses one or two CCD cameras mounted on the vehicle. It is useful to mark the course by a white or dashed line. Fig. 10 shows a vehicle with a road edge detection system. 2.4 Laser beacon system [1] 2.4.1 One-beacon system This system uses a laser scanning system installed at a predetermined position. The scanning system has a precision digital encoder, and a transmitter sending angular data to the surface vehicle. The vehicle is equipped with three or more photo-sensors, each receiving the rotating scanning laser beam. Upon reception, the received angular data are sent to a processor.
Fig. 10 Guidance via edge detection system
Fig. 11 Experimental laser beacon system

In geometric design, the one-beacon system is similar to the sonic beacon system. However, the former system, permitting the emission of sharper laser beams, ensures higher accuracy. The one-beacon system is suitable for vehicles on seaside construction sites, ship positioning and guiding systems, and indoor AGVs. Fig. 11 is an illustration of an experimental robot vehicle of simple design [8]. 2.4.2 Two-beacon system for 3-dimensional positioning [9] This system has two laser scanning systems installed at predetermined positions. Each of these systems is equipped with a precision digital encoder. The photo-sensor, mounted on the vehicle, receives the laser beams emitted by the scanning systems, and determines the vehicle position via geometrical analysis. Information on the vertical position element can be obtained by setting the laser for horizontal scanning and installing the photo-sensor in the vertical direction. The two-beacon system is useful in guiding robots for construction work (e.g. those for measuring undulation during golf course construction), which travel over wide areas. Fig. 12 shows the system's principle.

Fig. 12 Two-beacon system for 3-D positioning
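The planar core of the two-beacon computation is a ray intersection; the vertical element is obtained analogously from the vertically arranged sensor. A Python sketch, with beacon positions and angles assumed for illustration:

import math

def intersect(b1, a1, b2, a2):
    """Locate the receiver from two beacons b1, b2 (known (x, y) positions)
    and the scan angles a1, a2 [rad] at which each beam hit it."""
    x1, y1 = b1
    x2, y2 = b2
    d1 = (math.cos(a1), math.sin(a1))
    d2 = (math.cos(a2), math.sin(a2))
    # Solve t1*d1 - t2*d2 = b2 - b1 for t1 (Cramer's rule).
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    rx, ry = x2 - x1, y2 - y1
    t1 = (rx * (-d2[1]) - (-d2[0]) * ry) / det
    return x1 + t1 * d1[0], y1 + t1 * d1[1]

# Beacons at the origin and at (10, 0); a receiver at (4, 3) is seen
# under angles atan2(3, 4) and atan2(3, -6) respectively.
print(intersect((0, 0), math.atan2(3, 4), (10, 0), math.atan2(3, -6)))
# -> approximately (4.0, 3.0)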
2.5 Corner cube system In this system, three or more corner cubes (or reflecting tapes) are set on the course or field. A laser scanner with a precision rotary encoder is mounted on the vehicle. The vehicle determines its position and direction via computer. A variant of the system uses two corner cubes and four laser transmitters/receivers, but no scanners. These systems are called laser navigators (see Fig. 13). A popular application of the automatic guidance system is the outdoor concrete cement finishing system, using automatic map guidance; in this system, the automatically measured vehicle position is checked accurately via laser navigator [10].
In any of the measurement methods explained so far, the guided vehicle uses the differential odometric method, or the dead reckoning system (e.g. an odometer with a gyroscope; it is to be noted that, recently, reasonably priced but highly efficient gyroscopes have become available). In my opinion, the development of transportation infrastructure using laser guidance and corner cube setting methods will prove advantageous, since such infrastructure can be used in heavy rain, snow or fog, or for waste land, grated floors, sloped surfaces, stairways, escalators, or moving roads. These methods will prove effective in guiding not only AGVs, but also caterpillar or walking type vehicles, humans and animals.
Fig. 13 Laser navigator
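The dead reckoning update itself is compact; the following Python fragment integrates odometer speed and gyroscope yaw rate into a pose estimate (the time step and inputs are illustrative):

import math

def dead_reckon(pose, v, omega, dt):
    """Advance (x, y, heading) from wheel-odometry speed v [m/s]
    and gyro yaw rate omega [rad/s] over one time step."""
    x, y, th = pose
    x += v * math.cos(th) * dt
    y += v * math.sin(th) * dt
    th += omega * dt
    return x, y, th

pose = (0.0, 0.0, 0.0)
for _ in range(100):                      # 1 s of straight driving at 1 m/s
    pose = dead_reckon(pose, 1.0, 0.0, 0.01)
for _ in range(157):                      # quarter turn at 1 rad/s
    pose = dead_reckon(pose, 0.0, 1.0, 0.01)
print(tuple(round(p, 2) for p in pose))   # -> (1.0, 0.0, 1.57)

Because odometric and gyro errors accumulate, such estimates are periodically corrected by the external references (tape, beacons, corner cubes) described above.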
Recent discussions on advanced AGVs and their systems have been reported in many newspapers specializing in industry. According to the papers, recent engineering efforts related to AGVs have been focused on the following points: 1) Flexible course setting 2) Dust-free system 3) Low-noise vehicle 4) Energy (battery) problem 5) Autonomous guidance 6) Hybrid mode guidance (combination of autonomous guidance with forced guidance) 7) Vehicle-to-vehicle communication 8) System or traffic control for guiding a large group of independent vehicles 9) Man-machine interface 10) Lower cost, higher performance 11) Development of the present system to expand applications, in such fields as agriculture, engineering work, building construction, and office automation 12) Operation in clean environments (e.g. in semiconductor or LSI plants) 13) Safety problems 14) Automatic guidance of vehicles in a formation 15) Automatic guidance system using a single trajectory (called "dotetsu" in Japan): a) Dotetsu using a mechanically coupled trajectory b) Dotetsu using a trajectory coupled non-mechanically (via radio, optical means, map etc.) 3. MULTI-VEHICLE STRATEGY The guidance and control of more than one vehicle involve additional problems. Recently, Japanese researchers in AGVs or mobile robots have been interested in such subjects as multi-vehicle cooperation, formation control, and collision avoidance. The following points are important in studying these subjects from the engineering or general technical standpoint: 1) Adoption of positive (cooperative) or negative (collision avoidance) strategies 2) Vehicle-to-vehicle or vehicle-to-station communication 3) Relative but accurate position and direction determination Table 1 shows the outline of research fields related to collision avoidance. 4. VEHICULAR COMMUNICATION For mobile communication, radio wave serves as an efficient, economical and conveniently available means. However, radio wave communication in factory, office or outdoor sites causes unexpected, obstructive noise, which adversely influences AGV operation. The recently popular spread spectrum method may also prove detrimental to other AGV systems. Optical communication involves the establishment of a wireless system linking communication partners directly via an optical beam. The important characteristics of optical communication include the following: 1) Absence of electromagnetic interference or noise 2) Communication capacity larger than that of radio 3) Absence of multi-path interference 4) Little leakage of information
Table 1. Research on collision avoidance in Japan
Recently, two optical methods have been proposed in Japan for free communication between moving vehicles. 4.1 Control system for mutual tracking and follow-up This system was developed in a MITI special project from 1984 to 1991. 4.2 Method using laser and corner cube (retroreflector system) This system has basically proved successful. The system requires one target tracking system, and one corner cube permitting modulation via signal. The tracking and follow-up control in the retroreflector system need not be accurate; inaccuracy in the control is compensated by the accurate reflection of the tracking optical beam back to the transmitter via the corner cube. Fig. 14 shows the principle of optical two-way communication using one light beam [11].
[Table 1 classifies the Japanese research by operation type and planning scope. Static operation: heuristic methods (Noborio, 1989; Kondo, 1988), gradient methods (Ishitani, 1983; Okutani, 1983), dynamic programming, potential methods (Ikegami, 1980), fuzzy methods (Takeuchi, 1988) and multipurpose programming (Miyake, 1987), divided into global and local approaches. Dynamic operation: Laplace potential (Akishita, 1991), virtual impedance (Arai, 1989), fuzzy methods (Koyama, 1991; Ishikawa, 1993), multipurpose programming (Saito, 1990) and others (Tsubouchi, 1994; Nagata, 1993).]
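Among the entries in Table 1, the potential methods admit a compact illustration: the goal contributes an attractive gradient and each obstacle a repulsive one, and the vehicle descends the summed field. A minimal Python sketch, with all parameters chosen only for illustration:

import math

def potential_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5, step=0.1):
    """One gradient-descent step on an attractive + repulsive field."""
    gx = k_att * (goal[0] - pos[0])
    gy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d2 = max(dx * dx + dy * dy, 1e-6)
        gx += k_rep * dx / d2            # push away, fading with distance
        gy += k_rep * dy / d2
    n = math.hypot(gx, gy)
    return pos[0] + step * gx / n, pos[1] + step * gy / n

pos, goal = (0.0, 0.05), (5.0, 0.0)      # start slightly off-axis to break symmetry
for _ in range(80):
    pos = potential_step(pos, goal, obstacles=[(2.5, 0.0)])
print(tuple(round(c, 2) for c in pos))   # near the goal, having skirted (2.5, 0)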
5. MOTOR CONTROL
Recently, a new type of AC servo motor has come to be used widely for AGVs.
[Figure 14 shows, on the vehicle side, a computer, an interface, a laser modulator and a laser transmitter/transceiver with a beam splitter, and on the partner side, a corner cube with a laser modulator, an interface, a computer and a respondent device.]
Fig. 14 Principle of optical two-way communication using corner cube
The current AC servo control system requires the following: 1) Low-inertia motor 2) Maintenance-free operation 3) Precise positioning and quick response by virtue of speed pattern matching and higher stall torque. Cost reduction will lead to the further expansion of demand for AC servo motors.

6. AGV ENERGY PROBLEM

Almost all AGVs need powerful batteries as energy sources. Some AGVs for indoor use obtain power via wire cable. Since cable is not suitable for guidance on a complex route, some connectors for releasing and catching power have been designed. Recently, an induction-type power transmission system has been developed for commercial use [12]. When a high-frequency current flows in the power transmission line, an electromotive force is generated through electromagnetic induction in the pickup coil installed on the AGV's bottom, transferring electric power to the AGV. This system is durable, quiet and spark-free. Accordingly, it can be operated at dusty, wet or humid locations.

7. CONCLUSION

Many technical problems concerning AGVs remain to be solved. The AGV system is highly interesting for engineers specializing in mechanics, electrical systems, instrumentation or computers. However, researchers in AGVs invariably face cost problems and resulting constraints. This paper has discussed solutions to many AGV-related problems, including those concerning automatic guidance and positioning; these solutions involve the use of optical methods and systems.

REFERENCES
1. T. Tsumura, 1986 IEEE Robotics and Automation (1986) 1329.
2. T. Tsumura, Jour. Soc. of Instr. and Cont. Eng. (SICE), 30, 1 (1991) 1 (in Japanese).
3. A. Meystel, Autonomous Mobile Robots, World Scientific Publishing (1991).
4. T. Tsumura, Jour. Systems, Control and Information, 37, 6 (1993) 327 (in Japanese).
5. T. Takeda, Proc. '86 IEEE Robotics & Automation (1986) 1349.
6. T. Tsumura, Proc. '94 JAPAN-U.S.A. Symposium on Flexible Automation (1994) 293.
7. H. Mochizuki, Proc. 7th ADVANTY Symp. (1994) 85 (in Japanese).
8. Y. Nakahara, Proc. 3rd Struct. Robot Symp. (1993) 309.
9. T. Ochi, Proc. 7th ADVANTY Symp. (1994) 57 (in Japanese).
10. K. Nishide, Proc. '90 JAPAN-U.S.A. Symp. on Flexible Automation (1990) 531.
11. T. Tsumura, Proc. Intelligent Vehicle '93 (1993) 366.
12. Catalog of DAIFUKU Co. Ltd. (1994).
Almost all AGVs need powcrfitl battcrics as energy sources. Some AGVs for indoor use obtain powcr via wire cable. Since cable is not suitable for guidance on a complex route, some connectors for releasing and catching power have bccn designed. Recently, an induction-type power transmission system has been developed for commercial use [12]. When high-frequency current flows in the power transmission line, electromotive force is generated in the pickup coil installed on the AGV bottom through electromagnetic induction, transferring electric power to AGV. This system is durable, quiet and spark-free. Accordingly, it cart bc operated at dusty, ',vet or humid locations. 7. CONCLUSION Many technical problems concerning AGV remain to be solved. The AGV system is highly interesting for cngincers specializing in mechanics, electrical systems, instrumentation or computer. However, researchors in AGV invariably face cost problems and resulting constraints. This paper has discussed solutions to many AGV-rclatcd problems, including those concerning automatic guidance and positioning; these solutions involve the use of optical methods and systems. REFERENCES 1. T. Tsnmura, 1986 IEEE Robotics and Automation (1986) 1329. 2. T. Tstlmura, Jour. Soc. of Instr. and Cont. Eng. (SICE), 30, 1 (1991) 1 ( in Japanese). 3. A. Mcystcl, Autononaous Mobile Robots, World Scientific Publishing (1991). 4 T. Tsumura, Jour. Systems, Control and Information, 37, 6 (1993) 327 ( in Japanese). 5 T. Takcda, Proc. '86 IEEE Robotics & Automation (1986) 1349. 6 T. Tsumura, Proc. "94 JAPAN-U.S.A. Symposiuna on Flexible Automation (1994) 293. 7 H. Mochizuki, Proc. 7th ADVANTY Syrup. (1994) 85 ( in Japanese). 8 Y. Nakahara, Proc. 3rd Struct. Robot Syrnp. (1993) 309. 9 T. Ocl~i, Proc. 7th ADVANTY Syrup. (1994) 57 ( in Japanese). 10. K. Nishide, Proc. '90 JAPAN-U.S.A. Syrup. on Flexible Automation (1990) 531. 11. T. Tsumura, Proc. Intelligent Vehicle '93 (1993) 366. 12. Catalog of DAIFUKU Co. Ltd. (1994).
This Page Intentionally Left Blank
Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.
691
R o b o t i c s in M e d i c i n e Paolo Dario, Eugenio Guglielmelli and Benedetto Allotta ARTS Lab - Advanced Robotics Technology & Systems Laboratory, Scuola Superiore Sant'Anna via Carducci 40, 56127 Pisa, Italy
This paper reports the current state-of-the-art in medical robotics. Rather than trying to enumerate all the possible applications of robots or robotic technologies to medicine, three general areas of advanced robotics are identified which, based on the current technological background and expertise, could potentially provide significant improvements to the state-of-the-art in medicine. These are: macro robotics, micro robotics and bio-robotics. Macro robotics includes the development of robots, wheelchairs, manipulators for rehabilitation as well as new more powerful tools and techniques for surgery; micro robotics could greatly contribute to the field of minimally invasive surgery as well as to the development of a new generation of miniaturised mechatronic tools for conventional surgery; bio-robotics deals with the problems of modelling and simulating biological systems in order to provide a better understanding of human physiology. According to this classification, a review on the most important past and ongoing research projects in the field is reported. Some commercial products already appeared on the market are also mentioned and a brief analysis of the economical potentialities of robotics in medicine which can be prefigured in the near future is presented. 91.
1. INTRODUCTION
The application of robots - or, more generally, of technologies and know-how derived from robotics research - to medicine has moved rapidly in the last few years from the speculation of a small group of "visionary" scientists to reality. Today robotics can be considered a real opportunity, available to a range of operators in the medical field, as well as to industries that want to explore a market that could quickly become very attractive [1]. The growth of interest in medical applications of robotics has been so rapid recently that it is already difficult to provide a "still" picture of this field. However, we propose a classification that may be helpful as a guideline for discussing the main applications and perspectives of robotics in medicine. This classification is presented in Figure 1.
Rather than trying to enumerate all the applications of robots or robotic technologies to medicine, we have identified three main areas of advanced robotics which, based on the current technological background and expertise, could potentially provide significant improvements to the state-of-the-art in medicine. Robots can find practical application in two main medical fields: surgery and assistance to disabled and elderly persons. Moreover, robotics can also be conceptually associated with biological systems in the area that we can broadly define as bio-robotics. In this area, robotics can be seen as a "metaphor" for biological systems, and robotics research as an important bridge between human and biological sciences, on one side, and artificial intelligence, on the other.
Figure 1. Main applications and perspectives of medical robotics.
However, instead of being just speculative, research on bio-robotics may also lead to a number of practical applications for the substitution or augmentation of organs and/or functions of humans. A distinction that can be helpful for clarifying some basic concepts and discussing applications of medical robotics is the one we propose between macro robotics and micro robotics. In fact, this distinction implies not only different sizes but - most importantly - different intrinsic design features and modes of operation. This is particularly clear in the area of surgery, which most scientists and industrialists
perceive as the most promising field of application for robotics, and where two substantially different approaches can be envisaged: the use of robots as rigid, accurate and autonomous machine tools, on one hand, or the use of robot-like endoscopes, flexible and teleoperated by the surgeon, on the other. Somewhere in the "middle" lies a further very interesting area, which some investigators refer to as "mechatronic tools for surgery". The aim of this branch of medical robotics is to broaden the concept of the robotic device for surgery by taking advantage of methodologies and technologies directly derived from the state-of-the-art in robotics. Considering the valuable progress of the last decades in the fields of micro-mechanics and mechatronics, this approach could potentially lead to a quick and wide diffusion of innovative surgical tools into the market and thus into clinical practice.

Significant potential has also been identified in the field of rehabilitation robotics. For instance, the feasibility of desktop robotic assistants has already been demonstrated, even though under particular conditions [2, 3]. In spite of this, the potential of mobile robotic systems has not been clearly defined yet. In the last decade, the prototypes of robots dedicated to household tasks which have been commercialised did not find favour in the market, mainly due to their high cost combined with poor performance with respect to the average expectations of users. In addition, the market for rehabilitation robotics is far from being a single one, being in fact still strongly dependent on the trends of the mass consumer market. The success of such systems will mainly rely on their modularity and ease of use, which must be considered key factors from the very beginning of system design. In fact, modular components and friendly user interfaces could represent the real link between the mass consumer market and the rehabilitation technology market.

Assistance systems currently available on the market require heavy adaptation of the house by means of special building design and the installation of centralised environmental control systems (for doors, windows, appliances, etc.) and of fixed desktop workstations. These systems are rather expensive, especially if the cost of building special apartments and residences is also taken into account. On the contrary, mobile personal robots represent a highly attractive solution, even in economic terms, as they could significantly help to minimise the required degree of adaptation of the house. This will not only decrease the overall cost of installing the assistance system but will also have functional and cultural implications: the house will be usable by both able-bodied and disabled people, no specialised environment will be necessary apart from the rest of the house, and consequently no problems will arise if the disabled person needs to move to a new apartment.

In this paper, an overview of past and ongoing research projects, as well as of the first commercial products that have already appeared on the market, is given for the areas of surgical and rehabilitation robotics. Finally, basic concepts and potential applications of bio-robotics are discussed with reference to current research activities in the field.
2. ROBOTICS FOR SURGERY
Many different projects in this field have been carried out during the last ten years, and a few of them have already generated industrial systems that are currently under clinical evaluation in hospitals [4, 5, 6]. The numerous applications to surgery can be classified into two main areas: those based on "image guidance" and those aimed at obtaining minimal invasiveness.

2.1 Image-guided surgery
The basic concept behind image-guided surgery is the use of a robot workstation, integrated into the operating theatre, where the relevant parts of the patient's body are fixed by means of suitable fittings. This scenario is easy to implement in orthopaedic surgery, where fixators are commonly used to fix bones, and also in neurosurgery, where the stereotactic helmet, mounted on the patient's head, is quite popular as a means of providing absolute matching between pre-operative and intra-operative reference frames. Vision-based surgery may be viewed as a robotic CAD-CAM system in which diagnostic images (from CT, NMR, US, etc.) are used for off-line planning of the intervention. The robot is used in a CNC machine-tool-like fashion for precise cutting, milling, drilling and similar tasks. A better quality of intervention results from the superior performance of robots, in terms of accuracy and repeatability, with respect to the manual operation of a surgeon. A clear demonstration of the superiority of robot cutting over manual cutting is shown in Figure 2 for the case of bone milling for a hip implant [5].
Figure 2. A comparison between (a) the robot's and (b) the surgeon's performance in bone milling for hip replacement. Reprinted from [5].
Real-time images may also be used during the intervention in combination with diagnostic images and tool position/orientation data in order to provide the surgeon with feedback about the current state of the intervention. It is important to point out that the surgeon supervises the robot system during operation.
Among the obvious differences between an industrial robot application and a surgical one, the need for suitable matching procedures between diagnostic images and off-line intervention planning, on one hand, and real execution, on the other, is still a key issue. As mentioned before, the issue of matching has been addressed and solved in some cases (specifically in the case of bone cutting in orthopaedics), but many problems remain open, due to the fact that most interventions on the human body involve soft tissues, where large deformations may occur. This results in possible discrepancies between pre-operative and intra-operative images. Image-guided surgery includes orthopaedic surgery, spine surgery, neurosurgery, reconstructive/plastic surgery and ORL surgery. A very representative example of the implementation of image-guided robotic surgery is the one proposed by R. H. Taylor et al. [6], which has been implemented in an industrial system (Robodoc, Orthodoc, ISS Inc., Sacramento, CA, USA) currently used in human trials for the automated implant of hip prostheses.
Figure 3. A view of the hip replacement surgery system in the operating theatre. Reprinted from [5].
The main reason which motivated many researchers to explore the use of robotic devices to augment a surgeon's ability to perform geometrically precise tasks is the consideration that the precision of surgical planning often greatly exceeds that of surgical execution. The ultimate goal of this effort is a partnership between men (surgeons) and machines (computers and robots) that seeks to exploit the capabilities of both to perform a task better than either can perform alone [5]. The architecture of the hip replacement surgery system, depicted in Figure 3, consists of a CT-based presurgical planning component, shown in Figure 4, and of a surgical component. The surgical procedure includes manual guiding to the approximate positions of pins
pre-operatively inserted into the bones (which are fixated to the operating bed) and an automatic tactile search for each pin. Then, the robot controller computes the appropriate transformation between CT and robot coordinates and uses this information to machine out the implant cavity. Finally, the pins are removed and the surgery proceeds in the conventional manual fashion.
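The transformation step just described is, in essence, a point-based rigid registration problem. The paper does not give the actual Robodoc algorithm; the following minimal sketch shows one standard way (an SVD-based absolute-orientation method) to recover a rigid transform from three or more corresponding pin positions, with all coordinates invented for illustration.

```python
import numpy as np

def register_rigid(ct_pts, robot_pts):
    """Return (R, t) such that robot_pts ~ R @ ct_pts + t (least squares)."""
    c_ct, c_rb = ct_pts.mean(axis=0), robot_pts.mean(axis=0)
    H = (ct_pts - c_ct).T @ (robot_pts - c_rb)   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_rb - R @ c_ct
    return R, t

# Pin locations measured on the CT and found by the tactile search
# (all values in millimetres, made up for this example):
ct_pins = np.array([[10.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 30.0]])
robot_pins = np.array([[110.0, 5.0, 3.0], [100.0, 25.0, 3.0], [100.0, 5.0, 33.0]])
R, t = register_rigid(ct_pins, robot_pins)

# Any point planned in CT coordinates can now be expressed in robot coordinates:
cavity_ct = np.array([5.0, 5.0, 5.0])
cavity_robot = R @ cavity_ct + t
```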
Figure 4. An example of the pre-operative planning procedure for the hip replacement system. Reprinted from [6].
Safety issues have been given careful consideration. In the "Robodoc" system they include extensive checking [7] and monitoring of cutter force, and the possibility for either the surgeon or the controller to freeze all robot motion or to turn off manipulation and cutter power in response to recognized exceptions.

Techniques which are essentially similar to the one described above, but which have been adapted to different operating tasks and scenarios, have been developed for total knee arthroplasty [8] (see Figure 5), percutaneous discectomy [9], spine surgery [10] (see Figure 6), neurosurgery [11, 12], prostate surgery [13], and eye surgery (by the group of Ian Hunter at McGill University, Canada).
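As a purely illustrative aside, the exception-handling behaviour described above for the Robodoc system can be pictured as a watchdog loop of the following kind; the interface functions and the force threshold are hypothetical stand-ins, not the actual controller API.

```python
import time

FORCE_LIMIT_N = 40.0   # assumed cutter-force threshold, not from the paper

def safety_monitor(read_cutter_force, surgeon_stop, freeze_motion, cut_power):
    """Poll the cutter force; on any recognized exception freeze all robot
    motion, and on a gross anomaly also remove cutter power."""
    while True:
        force = read_cutter_force()
        if surgeon_stop() or force > FORCE_LIMIT_N:
            freeze_motion()                    # hold every robot axis
            if force > 2 * FORCE_LIMIT_N:      # gross anomaly: de-energise
                cut_power()
            return
        time.sleep(0.01)                       # ~100 Hz polling
```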
Figure 5. Example of a knee prosthesis implant: (a) before intervention; (b) after intervention; (c) detail of femur resection obtained by robot cutting. Reprinted from [8].
Similar approaches have been presented by many authors, for example by Finlay and Giorgi [14] for neurosurgery (see Figure 7), and by Stüttem et al. [15] for ORL surgery.
Figure 6. An example (a) of computer-assisted spine surgery for treating a case of scoliosis (b). Reprinted from [10].
Figure 7. A stereotactic helmet equipped with a passive arm for neurosurgery. Reprinted from [14].
Teleoperation, virtual reality environments and advanced man/machine interfaces will probably play a key role in the future of image-guided surgery. Teleoperation can be useful in some cases, such as when a patient must be operated on urgently in a place where no specialised surgeon is available (for example, on a battlefield or in an ambulance), or when for safety reasons (patients with infectious diseases, or long operations under X-ray) it is not appropriate for the surgeon to be within the operating field. Another critical problem which can be solved by means of
teleoperation is the unavoidable increase in the amount of room required for equipment in the neighbourhood of the patient's bed, which might make it necessary for the surgeon to move away and operate remotely. Advanced man-machine interfaces and force-replication devices might also play an important role in the framework of intervention simulation and surgeon training carried out in virtual environments featuring realistic 3D representations of body organs. Some examples of interfaces possessing the sophisticated features required for truly realistic simulations of surgical interventions already exist, like the one developed by Bergamasco et al. [16] at the Scuola Superiore Sant'Anna (Pisa, Italy), and many more will probably appear in the near future, like the one under development at McGill University (Canada) by Hunter et al.
2.2 Minimally invasive surgery
Minimally invasive surgery (MIS), also called "endoscopic surgery", is gaining increasing acceptance as a powerful technique beneficial in terms of the patient's integrity, time of recovery and cost of assistance. At its current stage of development, MIS depends on three prerequisites: the availability of high-quality video endoscopy, the availability of high-precision surgical instruments, and the manual skill of well-trained surgeons [17]. MIS requires accessing the organ to be operated on through a small hole, and the surgeon, although directly responsible for the manipulation of the surgical tool, misses a large part of the information necessary to finely control the end effector. At present MIS is a sort of craftsmanship, in which operating surgeons have to compensate with their skill for the fact that they cannot touch and sense with their fingers for diagnostic purposes, may not have a 3D view of the workspace, have only restricted access to the workspace, and cannot feel the forces/torques and pressures they are exerting at the end-effector tip. A possible scenario for the next generation of MIS consists of a combination of telemanipulation and telerobotics technology, as depicted in Figure 8.
Figure 8. A possible scenario for teleoperated surgery. Reprinted from [17].
Generally speaking, the contribution of robotics to a significant decrease in the level of surgical invasiveness could involve three different fields. The first is laparoscopic surgery, which is based on stiff tools that the surgeon manipulates directly and through which he or she can keep some (even if small) degree of "sensation" of the features of the operating workspace; the second is commonly referred to as endoscopic surgery, which makes use of flexible endoscopes and implies the virtual loss of any type of "sensation" for the surgeon; the third is not linked to any specific type of surgery and consists of an attempt to improve the performance of traditional macro surgical tools by applying mechatronic technologies aimed at decreasing the invasiveness of tool operation.
Figure 9. Example of a catheter tip with increased dexterity. Reprinted from [18].
Major development efforts are needed in such areas of robotics technology as sensor integration, force reflection, miniaturisation of mechanisms and actuators, and control. A very challenging approach to MIS is being pursued in Japan, where the development of teleoperated micro-catheters capable of diagnostic and surgical interventions within brain blood vessels is currently underway. The micro-catheter will possess high dexterity at the tip, like the macro one shown in Figure 9, and all along its length, and will incorporate micro-fabricated tactile, flow and pressure sensors at the tip, along with micronozzles and micropumps for the local injection of drugs and of solutions for dissolving thrombi [18]. A scheme of the system for brain blood vessel diagnosis and surgery is depicted in Figure 10. As outlined by Sato et al. [19] and by Morishita et al. [20] (see also Figure 11), the development of suitable instrumentation for MIS requires considerable research effort, both in the miniaturisation of components and in the adaptation of teleoperation techniques [19].
Figure 10. Scheme of the system for brain blood vessel diagnosis and surgery. Reprinted from [18].
Figure 11. Relations between micro world and human world. Reprinted from [20].
An example of a next-generation micro endoscope for MIS is given in Figure 12.
Figure 12. A next-generation micro endoscope for MIS. Reprinted from [18].
3. ROBOTICS FOR REHABILITATION
The possible role of robotics in the field of rehabilitation has been widely investigated over the last decades. The specific application areas which have already been identified range from assistance to the disabled and the elderly, by means of robotic manipulators, intelligent wheelchairs and dedicated interfaces for household and vocational devices, through the restoration of impaired functions, by means of advanced prostheses, ortheses and functional electrical stimulation (FES), to the development of virtual environments for training and genuine rehabilitative therapies. Whatever the selected approach, one of the key factors for the success of robotic aids is certainly their potential to enable these particular users to exert complete control over their environment through a robotic interface. In rehabilitation robotics, the term "environmental control" refers to a disabled user's capacity to actively interact with his or her external environment [21]. Although all of the sensory and motor functions are necessary for complete environmental control, disabilities based on partial or total loss of upper limb
function are particularly serious, due to the consequent reduction in, or loss of, manipulative function. This kind of disability is the most significant impediment to carrying out common everyday activities (e.g. personal hygiene, job, hobbies): the user receives external stimuli but is then unable to respond to or act on them (by modifying the external environment, for example). When lower limb function is also reduced (or lost), the physical (and psychological) loss of control is profound, and makes a disabled user dependent on others in virtually every respect. Numerous research teams, potential users and manufacturers are already involved in developing techniques for environmental control, and for controlling robots in the context of rehabilitation and assistance for disabled users. At present, one of the commonest uses of rehabilitation technology puts a general-purpose computer at the hub of a multi-purpose 'cockpit', and the users operate all of their (specialised or adapted) products from that cockpit [22]. There are some advantages to this, especially for severely disabled or bed-ridden users, but fewer for users with moderate or age-related disabilities: the user is distanced from the task itself, and the interaction style is based on the computer rather than on the product being used and the task being performed. In such a context of use, and for these users, the computer remains an obtrusively technical device which tends to appear as the unique link between disabled users and their environment.
3.1 Manipulators in rehabilitation
One of the primary objectives of rehabilitation robotics has always been to fully or partly restore the disabled user's manipulative function by placing a robot arm between the user and the environment. Some important factors must be considered in the design of such a peculiar environmental control system: the user's degree of disability (a system must be flexible enough to be adapted to each user's capabilities); modularity (system inputs and outputs must be easy to add or remove according to each user's needs); reliability (a system must not let the user down); and cost (the system must be affordable). According to the state-of-the-art in rehabilitation robotics, three different configurations of robot systems, reflecting the above factors in different ways, have been considered feasible for the assistance of severely and moderately disabled users.

Historically, the first configuration to be investigated was the bench- or table-mounted manipulator included in a completely structured desktop workstation [23, 24, 25, 26, 27, 28]. Various systems based on this approach have been positively evaluated with users [2, 3, 28, 29, 30] and some commercial products have already appeared on the market, such as the DEVAR system (Tolfa Corporation, Palo Alto, CA, USA) and the RAID system (Oxford Intelligent Machines Ltd., Oxford, Great Britain); yet they seem to be particularly suitable for assisting disabled employees in executing vocational tasks at their workplace. In fact, desktop workstations better reflect the type of organisation of space and time which is typical of vocational activities. Furthermore, this approach brings all the previously discussed drawbacks of using a 'cockpit' environmental control system. One of the first prototypes of a desktop workstation, the MASTER
(Manipulator Autonomous at the Service of Tetraplegics for Environment and Rehabilitation) system, was developed in France by CEA (Paris) and is shown in Figure 13.

The wheelchair-mounted arm is particularly suited to users with upper limb disabilities, but its usefulness relies on the user being able to move (or control the movements of) the wheelchair, so that the robot can be taken to the area(s) of activity; the home environment must also be adapted to suit the robot's working height, making it an intrusive solution. This solution, depicted in Figure 14, is becoming very popular, as it allows the disabled or elderly to use the robot arm anywhere, no longer being necessarily tied to fixed, structured locations [31, 32], and is being widely tested [33, 34].
Figure 13. The MASTER desktop workstation (CEA - France).
Figure 14. The wheelchair-mounted MANUS manipulator. Reprinted from [32].
However, some technical problems still have to be solved, mainly concerning the accuracy which can be obtained in grasping operations (the arm is fixed to the wheelchair, which is not properly a rigid structure) and the possibility of equipping the wheelchair with a friendly (and therefore complex) arm controller (the available space and battery energy are limited). Moreover, this solution does not actually address all disabled users, but only those with upper limb disabilities. Consequently, it seems to have limited concrete prospects of becoming economically attractive by inducing mass-market demand.

In both previous cases, a severely disabled or bed-ridden user is not well catered for: the workstation dominates the user's home environment, while the wheelchair-mounted robot is simply not an option. A third solution is to use an autonomous or semi-autonomous mobile vehicle equipped with a manipulator and with additional sensor systems for autonomous or semi-autonomous operation. Its mobility and versatility make it particularly suitable for severely disabled or bed-ridden users, as long as the interface between the user and the robot is easy to use: the user should be able to
instruct the robot in a high-level language, via a bi-directional user interface offering appropriate methods of input and suitable output. This configuration was first used in industrial applications (e.g. the textile industry) and is surely the most sophisticated one, but it is also the most generally applicable, since the idea of having a robot in personal service can be attractive for both able-bodied and disabled users. A first prototype of a mobile robot for rehabilitative applications was developed by S. Tachi et al. at the Japanese MITI laboratories. This system, named MELDOG, was devised only to act as a robotic "guide dog" for blind patients, and thus had no capability for manipulating or carrying objects. Generally speaking, the ideal system should include sub-systems dedicated to vision, fine manipulation, motion, sensory data acquisition, and system control. A sketch of a possible configuration for such a robot [1] is shown in Figure 15. Unfortunately, the cost of this solution is often prohibitive for all but the wealthiest of users, and the usability problems inherent in such sophisticated systems have yet to be satisfactorily solved for non-professional users. Various prototypes of autonomous or teleoperated mobile robots for assisting the disabled in different activities have been implemented, and others are currently under development [35, 36, 37].
Figure 15. Concept of a mobile robotic aid. Reprinted from [1].
As examples, Figures 16 and 17 illustrate two ongoing research projects in this field: the American MOVAR system, mainly devised for vocational use, and the Italian URMAD system, mainly devised for household applications. Both prototypes have been largely implemented. In particular, the URMAD system (the acronym stands for "Mobile Robotic Unit for the Assistance to the Disabled") will be
widely tested with the final end users, i.e. severely disabled patients, before the end of 1994.
Figure 16. The Mobile Vocational Assistant (MOVAR). Reprinted from [1].
One of the main unsolved technical problems is that the more the performance of the robot is enhanced, the more problematic its dimensions (constrained by standard building norms), its operating autonomy and, consequently, its cost become for an effective commercial exploitation.
Figure 17. An overview of the URMAD system.
An attempt [38] to overcome these drawbacks and limitations is being carried out in the framework of the TIDE-MOVAID project by a European team coordinated by
the ARTS Lab of the Scuola Superiore S. Anna. As illustrated in Figure 18, the final objective of this project is to develop a complete system, including a mobile robot, with overall functional performance similar to that of URMAD, while still respecting the limitations imposed by a normal household environment, so as to avoid heavy adaptation of the house.
Figure 18. The MOVAID approach: a mobile base which fits into different activity workstations.

This goal is being pursued by partially distributing resources in the environment instead of concentrating them on the vehicle. As long as the distribution of resources is well balanced and the technologies are properly integrated in the domestic environment, such a realistic solution could represent a good compromise between the current state-of-the-art in advanced robotics and the ideal concept of the autonomous robotic assistant in a modern domestic scenario, thus hopefully favouring rapid commercial exploitation. To this end, the MOVAID project, rather than developing new basic hardware components, will take advantage of the results of previous research projects. As an example, Figure 19 shows the URMAD manipulator, which will also be used for the MOVAID mobile unit. Another important aspect of the MOVAID project is its emphasis on the friendliness of the system's interface and on a non-intrusive integration of technologies in the domestic environment. From this perspective, MOVAID represents a potential opportunity for a direct transfer of advanced technologies into the home environment. This approach, which is closely related to the increasing industrial interest in "domotics" (i.e. the development of a "smart" house accessible to all users, including the disabled and the elderly), could also have interesting implications in the medical field (telemedicine, home assistance vs. clinical assistance for the disabled and the elderly, etc.).
3.2 Robotic systems for hospitals
Mobile robots could become one answer to the current shortage of help in hospitals, as well as one solution to the problem of the ailments (lumbago, low back pain) which affect personnel involved in heavy physical tasks such as lifting a patient and carrying him/her to the toilet, or changing the sheets of a bed.
Figure 19. The URMAD manipulator: an 8-d.o.f. redundant dextrous arm purposely developed for service applications (SM - Scienzia Machinale srl - Pisa, Italy).

An example of the implementation of a hospital transport mobile robot has been presented recently by Transition Research Corporation (TRC Inc., Danbury, CT, USA). The robot, depicted in Figure 20, has been designed and built to address the need for assistance with tasks such as point-to-point delivery [39].
Figure 20. The HelpMate hospital transport mobile robot. Reprinted from [39].
The objective of the hospital transport mobile robot ("HelpMate") is to carry out such tasks as the delivery of off-schedule meal trays, lab and pharmacy supplies, and patient records. The navigation system of HelpMate, unlike many existing delivery systems in industry which operate within a rigid network of wires buried in or attached to the floor ("AGVs"), relies on sensor-based motion planning algorithms that specifically address the issue of navigation in a partially structured environment. The system is also able to handle sensor noise and sensor inaccuracy, errors in position estimation, and moving obstacles (e.g. people); a generic sketch of this style of sensor-based steering is given at the end of this subsection. HelpMates have been installed in several hospitals. In some hospitals HelpMate is in operation 24 hours per day, and the hospitals are reporting an increase in productivity and efficiency. The HelpMate represents a useful and probably industrially valuable solution to some basic needs requiring the transportation of lightweight objects. The complexity of the system is deliberately kept low by eliminating automated manipulation (which is carried out by human operators), by assuming a flat floor in the working environment (the robot uses doors and elevators, the latter controlled by means of elevator control computers activated by an infrared transceiver), and by providing the robot with accurate geometric and topographical information about the hospital hallways, elevator lobbies and elevators.

Further help to nurses could be provided by heavier robots, designed to execute tasks requiring hard muscular work. Japanese laboratories and industries have identified this field as very promising, and have invested substantial effort in the development of fetch-and-carry robots for hospitals, usually hydraulically actuated and featuring a high payload. An interesting example of this type of robot is the patient care robot named "MELKONG", which was developed a few years ago by the Mechanical Engineering Laboratory (MEL) in Japan [40]. The MELKONG was intended to lift, hold and carry an adult patient (weighing up to about 100 kg) or a disabled child. The robot docked to the bed in the hospital room, lifted the patient in its arms from the bed, moved back while still holding him/her, and transferred him/her to the toilet, bathroom or dining room. Usually the robot was controlled by a nurse, but it was expected that at night the patient could also call the robot and control it by means of simple commands given through a joystick. Serious problems related to automatic docking, mobility, manipulation, actuation (by hydraulic actuators), energy supply and man/machine interfaces were addressed and solved. The MELKONG concept has evolved from the early prototype described above to more sophisticated versions incorporating functional and aesthetic improvements, such as the one depicted in Figure 21. A transfer-carrier vehicle based on an evolution of the MELKONG concept is commercially available in Japan from Sanyo Electric Co. Ltd. Furthermore, a simple functional robot aimed at supporting the elderly and the disabled in independent living, in particular for toileting (a function that the elderly wish to perform by themselves if adequate support equipment is available), is being developed in Japan jointly by the National Institute of Bioscience and
Human Technology of AIST, Aprica Kassai Inc. and Hitachi Ltd, in the framework of the National Programme for Welfare State and Apparatus (Towards a New Society).
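The TRC navigation algorithms themselves are not detailed in this paper; the sketch referred to above only illustrates the general flavour of sensor-based steering in a partially structured environment, with an invented sonar layout and invented gains: attraction towards the goal, repulsion from nearby range readings.

```python
import math

def steer(goal_bearing_rad, sonar):
    """sonar: list of (bearing_rad, range_m) readings taken from the robot.
    Returns a commanded heading: towards the goal, away from close obstacles."""
    fx, fy = math.cos(goal_bearing_rad), math.sin(goal_bearing_rad)
    for bearing, rng in sonar:
        if rng < 1.5:                        # only nearby readings repel
            w = (1.5 - rng) / 1.5            # repulsion grows as range shrinks
            fx -= w * math.cos(bearing)
            fy -= w * math.sin(bearing)
    return math.atan2(fy, fx)

# Example: goal straight ahead, a person 0.8 m away at +20 degrees
heading = steer(0.0, [(math.radians(20), 0.8), (math.radians(-90), 3.0)])
# -> roughly -16 degrees: the robot veers slightly away from the person
```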
Figure 21. Recent version of a robot for lifting bedridden patients. Reprinted from [41].
3.3 Intelligent wheelchairs
For a physically disabled person, the main advantage of using a transport mechanism like a wheelchair is that it allows him/her to achieve some degree of personal mobility. In the case of a wheelchair carrying a robot arm, a severely disabled person can use the robot arm anywhere, and not only remotely or in fixed permanent locations. In order to increase the performance of ordinary wheelchairs (and of the robot manipulators possibly mounted on them), a number of approaches have been proposed based on robotic and mechatronic technologies. These approaches comprise attempts to develop autonomous vehicles which can be used to transport a person from one location to another with little or even no outside assistance, as well as attempts to increase the capability of the vehicle to locomote on unprepared surfaces and to overcome obstacles. An example of the first approach is represented by self-navigating wheelchairs, such as the one proposed by Madarasz et al. a few years ago [42]. The vehicle, designed to function inside an office building, is able to plan its own path from its current location to a particular room in the building, and then to travel to that location (a sketch of this kind of route planning is given at the end of this subsection). The system must also function with minimum impact on the building in which it is used; that is, the building cannot be equipped with a guidance mechanism, such as wires embedded in the floor or painted stripes that can be followed. The wheelchair therefore becomes essentially a mobile robot with a high degree of autonomy. In fact, the vehicle is self-contained: all of the sensing and decision making are performed by the on-board equipment. This approach relieves the disabled person of tasks he/she may be unable to carry out, but a system for
supervised control is provided for high-level commands and for other types of operations requiring direct guidance.

In Europe, a recent example of a project aimed at developing a wheelchair with partly autonomous behaviour is the European TIDE-OMNI project. An Italian manufacturer (TGR s.r.l., Ozzano Emilia, Italy) produces a wheelchair (named "Explorer") incorporating both wheels and tracks. This wheelchair, which has also been modified to host the Manus system in the framework of the European SPRINT-IMMEDIATE project, can not only run on regular terrain, but also go up and down stairs with the user on board. This approach represents an evolution towards the possible development of a new generation of vehicles designed to greatly enhance the mobility of the user. A very interesting example of this evolution is the adaptive mobility system proposed recently by Wellman et al. [43] at the University of Pennsylvania, Philadelphia, USA. The design of the system, which is basically a hybrid vehicle incorporating wheels as well as two "arms" that can work both as manipulators and as legs, is based on the assumption that a legged vehicle allows locomotion in environments cluttered with obstacles where wheeled or tracked vehicles cannot be used. A legged vehicle is inherently omnidirectional, provides superior mobility in difficult terrain or soil conditions (sand, clay, gravel, rocks, etc.) and provides active suspension. The legs also give the chair versatility and allow it to be re-configured: when stationary, one of the legs can be used as a manipulator to perform simple tasks such as reaching for objects or pushing open doors.
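Returning to the self-navigating wheelchair discussed at the beginning of this subsection: under the assumption of a stored topological map of the building (which Madarasz's vehicle carries on board), its route planning can be sketched as a shortest-path search over a graph of rooms and corridor junctions. The map, names and distances below are invented for illustration.

```python
import heapq

def plan_route(graph, start, goal):
    """Dijkstra shortest path; graph maps node -> [(neighbour, metres), ...]."""
    frontier, seen = [(0.0, start, [start])], set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, d in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(frontier, (cost + d, nxt, path + [nxt]))
    return None   # goal unreachable from start

office = {"room101": [("hall_A", 4.0)],
          "hall_A":  [("room101", 4.0), ("hall_B", 12.0)],
          "hall_B":  [("hall_A", 12.0), ("room214", 6.0)],
          "room214": [("hall_B", 6.0)]}
print(plan_route(office, "room101", "room214"))
# -> (22.0, ['room101', 'hall_A', 'hall_B', 'room214'])
```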
4. FROM ADVANCED PROSTHESES AND ORTHESES TO F.E.S.
There have been frequent intersections between robotics and limb prosthetic technologies in the past. Many devices, such as artificial legs, hands and arms, evolved in the '60s and '70s both as prostheses for amputees and as possible components of advanced robots. Examples of these devices are the Belgrade hand [44] and the UTAH arm [45]. More recently, the UTAH-MIT dextrous hand [46] was designed as a robotic hand by taking inspiration from the human hand, whereas new prostheses for amputees have been developed by exploiting the latest advances in robotic technology (like the one developed in the framework of the European TIDE-MARCUS project). However, it is quite clear to all of those working in the field of aids for the disabled that, although disabled persons may accept "artificial" devices as assistants, their dream and ultimate goal is to be able to manipulate and walk again. Although this is out of reach for current medical capabilities, a few promising approaches are being pursued which might ultimately bring that dream closer to reality. A first example of a robotic device developed to help patients with impaired walking capabilities to restore their functions is the one developed in Japan and illustrated in Figure 22.
An evolution of the above-mentioned assistive device for "natural" walking is the active orthosis, whose development has been pioneered by Prof. Pierre Rabischong and his team at INSERM, Montpellier, France [47]. This active orthosis, which is shown in Figure 23, was intended and used for clinical research and re-education. A further application of the system was the combination of the re-education apparatus with functional electrical stimulation (F.E.S.) studies. This evolution is the goal of a EUREKA project called CALIES (Computer-Aided Locomotion by Implanted Electro-Stimulation), which probably represents the most important coordinated effort presently carried out in the world for restoring autonomous locomotion to paralysed persons. The project investigates the possibility of implanting stimulating electrodes into lower limb muscles, or even nerves, and of obtaining close-to-natural walking by providing appropriate stimulation patterns by means of an external portable computer.
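To give a concrete, purely illustrative flavour of what "appropriate stimulation patterns" might mean computationally, the sketch below cycles pre-stored activation windows for a few muscle groups over a gait period. The muscles, phases and timing are invented for illustration and carry no clinical validity.

```python
GAIT_CYCLE_S = 1.2
# (muscle, activation start, activation end) as fractions of one gait cycle;
# the values here are hypothetical, not taken from the CALIES project.
PATTERN = [("left_quadriceps",   0.00, 0.40),
           ("left_dorsiflexors", 0.55, 0.95),
           ("right_quadriceps",  0.50, 0.90),
           ("right_dorsiflexors", 0.05, 0.45)]

def active_muscles(t):
    """Muscle groups to stimulate at time t (seconds) within the cycle."""
    phase = (t % GAIT_CYCLE_S) / GAIT_CYCLE_S
    return [m for m, start, end in PATTERN if start <= phase < end]

print(active_muscles(0.3))   # -> ['left_quadriceps', 'right_dorsiflexors']
```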
Figure 22. Robotic device for walking rehabilitation. Reprinted from [48].

A key component for improving the performance of FES-based apparatus is the implanted electrode. Developments in this field might not only make it possible to obtain better systems for computer-assisted manipulation and locomotion, but might eventually even lead to achieving the dream of restoring natural manipulation and locomotion. In fact, some projects aim to develop implantable neural prostheses based on the capability of the peripheral nervous system to regenerate, which could establish a bidirectional electrical continuity between the nervous system and external devices [49, 50]. One of these projects, the European ESPRIT-INTER project [51], investigates an approach based on a combination of silicon microfabrication technology, polymer channel guidance and the release of nerve growth factors, in order to promote nerve regeneration through a pattern of microholes with electrodes, thus making it possible to pick up sensory signals and to selectively stimulate nervous fibers. Other projects are exploring the possibility of by-passing interruptions in the nervous system (even at the central level) with artificial nerve grafts fabricated from conductive
polymer fibers. Some other projects try to obtain nerve regeneration at the spinal cord level by implanting fetal nervous cells. It is the hope of many researchers that this class of "hybrid" devices, which exploit the properties of artificial materials combined with those of biological factors (possibly modified by means of biotechnology), could eventually make it possible to restore the continuity of nervous pathways, and make the dream of paralysed persons moving their own limbs again come true.
Figure 23. Active orthosis: master-slave version. Reprinted from [47].
5. BIO-ROBOTICS
Biological systems are not only the recipients of the services of robots, but also a source of inspiration for components and behaviours of future robot systems. This ill-defined but intriguing and stimulating area of interest for robotics can be called "bio-robotics", and it is currently receiving an increasing amount of attention from many investigators. In general, biological systems are living proof that some complex functions (both sensorimotor and "intellectual") that robotics researchers would like to realize in artificial systems can actually be implemented. Locomotion, manipulation, vision and touch are all functions which living beings execute seemingly without effort, but which have turned out to be extremely difficult to replicate in artificial systems. In the recent past, many different groups have been active in this "borderline" area, where the distinction between "robotics" and "bioengineering" becomes very subtle. A book discussing state-of-the-art results and perspectives in this field has been published recently [52]. Examples of components which are explicitly inspired by their biological counterparts (and which are intended to be the "core" of sensorimotor systems capable of replicating the function of those counterparts) are retina-like CCD sensors [53] and tactile sensors [54]. A photograph of the CCD vision sensor, whose geometry is inspired by that of the human retina (including the high-acuity, fovea-like central part), is shown in Figure 24. A scheme of a fingertip incorporating three different types of sensors, which provide (in combination with appropriate sensorimotor acts) the robot controller with information on object geometry and material features comparable to that available from the human fingertip, is given in Figure 25. A close-up of one of the sensors (a 256-element array sensor), which imitates the space-variant distribution of tactile receptors in the fingertip skin, thus emphasising the role of "attentive behaviour" in active touch, is also shown in the same figure.
Figure 24. The foveal structure of the CCD retina-like sensor. Reprinted from [53].
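The space-variant geometry of such a retina-like sensor can be emulated in software, which may help to clarify the idea. The following minimal sketch resamples a uniform image over a log-polar grid with a small uniform fovea; the actual sensor layout in [53] differs in its details, and the parameters here are invented.

```python
import numpy as np

def retina_sample(img, rings=30, sectors=64, fovea_r=5.0):
    """Average a square grey-level image over a log-polar receptive-field grid."""
    h, w = img.shape
    cy, cx = h / 2.0, w / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    rho = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)
    r_max = min(cy, cx)
    # ring index grows logarithmically with eccentricity outside the fovea
    ring = np.where(
        rho < fovea_r, 0,
        1 + (rings - 2) * np.log(np.maximum(rho, fovea_r) / fovea_r)
            / np.log(r_max / fovea_r)).astype(int)
    sector = ((theta + np.pi) / (2 * np.pi) * sectors).astype(int) % sectors
    out = np.zeros((rings, sectors))
    cnt = np.zeros((rings, sectors))
    valid = (ring < rings) & (rho <= r_max)
    np.add.at(out, (ring[valid], sector[valid]), img[valid])
    np.add.at(cnt, (ring[valid], sector[valid]), 1.0)
    return out / np.maximum(cnt, 1.0)

coarse = retina_sample(np.random.rand(128, 128))   # -> (rings, sectors) array
```

The payoff of this layout is the same as in the biological retina: high acuity where the sensor (or gaze) is pointed, at a small fraction of the data of a uniform grid.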
Figure 25. The ARTS Lab fingertip probe (a) and tactile sensor (b). Reprinted from [54].
Further applications in the field of bio-robotics involve the use of artificial systems as accurate models for investigating the physiology of biological systems. The laboratory which pioneered this approach is the one headed by the late Prof. Ichiro Kato at Waseda University, which has developed robotic devices capable of playing different musical instruments such as the organ, piano, violin and flute. A system developed for investigating the function of mastication in humans is depicted in Figure 26. Prof. Kato himself (ICNR '91) and other investigators (Coiffet [56], Rabischong [57]) have proposed even more intriguing speculations on the relations between the human mind and the robot mind. Based on these hypotheses, the Waseda laboratory is currently investigating an approach to the assistance of the disabled and the elderly which involves not merely the concept of "service robots", but even that of robots as "companions" of humans.
Figure 26. The mastication robot. Reprinted from [55].
6. CONCLUSIONS
The reasons why robots have not gained immediate acceptance in the medical community are in part obvious, and in part more subtle. The obvious reasons are both psychological (robots may be perceived as "competitors" by physicians, and as potentially dangerous, exotic machines by patients) and technical (industrial robots are reliable, but no real expertise exists anywhere on robots working full time in the vicinity of, or even in contact with, humans). The subtle reasons are related to a possible misconception of the very notion of a robot, which should probably be corrected in the interest of the robotics research community. In fact, most users perceive robots either as industrial robot arms or as exotic, anthropomorphic creatures. In the field of advanced robotics, and of medical robotics in particular, the robot leaves the factory floor and gets into physical contact with the human operator (the surgeon, the patient). In some cases the robot will maintain the overall structure of an industrial robot (although new robot arms dedicated to medical applications are being presented), but in most other cases robotic technologies will be embedded in tools which will not possess the traditional robotic "look". This shift should not be seen, in our opinion, as a problem, but rather as a very interesting and attractive opportunity for the robotics research community to extend its reach to a broader area (sometimes referred to as "mechatronics in medicine" or even "bio-mechatronics"). Nevertheless, the concrete experimental and clinical results achieved in the fields of both "macro" and "micro" medical robotics, only partially reported in this paper, together with the economic and social motivations for using these new tools permeated by robotic technologies, could certainly represent the best viaticum for a massive development of this new area of advanced robotics in the near future.
REFERENCES
1. J. F. ENGELBERGER: Robotics in service, Biddles Ltd., Guildford (GB), 1989.
2. J. HAMMEL, K. HALL, D. LEES, L. J. LEIFER, H. F. M. VAN DER LOOS, I. PERKASH, R. CRIGLER: Clinical evaluation of a desktop robotic assistant, Journal of Rehabilitation Research and Development, 2, 3, 1-16, 1989.
3. J. HAMMEL, H. F. M. VAN DER LOOS, I. PERKASH: Evaluation of a vocational robot with a quadriplegic employee, Archives of Phys. Med. and Rehabil., 73, 683-693, 1992.
4. P. CINQUIN, J. DEMONGEOT, J. TROCCAZ, S. LAVALLÉE, G. CHAMPLEBOUX, L. BRUNIE, F. LEITNER, P. SAUTOT, B. MAZIER, A. PÉREZ, M. DJAID, T. FORTIN, M. CHENIN AND C. A. IGOR: Image guided operating robot: methodology, medical applications, results, ITBM (Innovation and Technology in Biology and Medicine) - Special Issue on Robotic Surgery, 13(4):373-393, 1992.
5. H. A. PAUL, B. MITTLESTADT, B. MUSITS, R. H. TAYLOR, P. KAZANZIDES, J. ZUHARS, B. WILLIAMSON AND W. HANSON: Development of a surgical robot for cementless total hip arthroplasty, Clinical Orthopaedics, December 1992, Vol. 285, pp. 57-66, USA.
6. P. FLURY, P. LOPEZ, D. GLAUSER, N. VILLOTTE, C. W. BURCKHARDT: Minerva, a robot dedicated to neurosurgery operations, Proc. of the 23rd ISIR, Barcelona, Spain, 1992, pp. 729-733.
7. R. H. TAYLOR, H. A. PAUL, P. KAZANZIDES, B. MITTLESTADT, W. HANSON, J. ZUHARS, B. WILLIAMSON, B. MUSITS, E. GLASSMAN, W. L. BARGAR: Taming the bull: safety in a precise surgical robot, Proc. of '91 ICAR, International Conference on Advanced Robotics, vol. 1, pp. 865-870, Pisa, June 1991.
8. M. MARCACCI, M. FADDA, P. DARIO, S. MARTELLI, A. VISANI, T. WANG: The total knee arthroplasty project: a demonstration of feasibility of the computer assisted approach, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 59-66.
9. M. FADDA, S. MARTELLI, P. DARIO, M. MARCACCI, S. ZAFFAGNINI, A. VISANI: First steps towards robot-assisted discectomy and arthroscopy, ITBM (Innovation and Technology in Biology and Medicine) - Special Issue on Robotic Surgery, 13(4):394-408, 1992.
10. P. SAUTOT, S. LAVALLÉE, J. TROCCAZ, P. CINQUIN: Computer assisted spine surgery, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 51-58.
11. Y. S. KWOH, J. HOU, E. A. JONCKHEERE, S. HAYATI: A robot with improved absolute positioning accuracy for CT guided stereotactic brain surgery, IEEE Transactions on Biomedical Engineering, Vol. 35, N. 2, February 1988, pp. 153-160.
12. N. VILLOTTE, D. GLAUSER, et al.: Conception of stereotactic instruments for the neurosurgical robot Minerva, Proc. of the 14th Annual Int. Conf. of the IEEE EMBS '92, 1089-1090, Paris, France.
13. B. DAVIES: Soft tissue surgery, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 67-72.
14. C. GIORGI, M. LUZZARA, D. S. CASOLINO AND E. ONGANIA: A computer controlled stereotactic arm: virtual reality in neurosurgical procedures, in Advances in Stereotactic and Functional Neurosurgery, edited by B. Meyerson, Springer Verlag, Vienna, 1993, pp. 75-76.
15. J. STÜTTEM, B. KREMER, B. KORVES, L. KLIMEK, G. KROCKELS, R. MÖSGES: Robotics in ear, nose and throat surgery, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 31-35.
16. M. BERGAMASCO, B. ALLOTTA, L. BOSIO, L. FERRETTI, G. PARRINI, G. M. PRISCO, F. SALSEDO, G. SARTINI: An arm exoskeleton system for teleoperation and virtual environments applications, Proc. of the 1994 IEEE Int. Conference on Robotics and Automation, San Diego, CA, USA, May 8-13, 1994, pp. 1449-1455.
17. E. HOLLER, W. WEBER: System and control concepts for a telemanipulator system to be applied in minimally invasive surgery, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 111-120.
18. H. NARUMIYA: Micromachine technology (intraluminal diagnostic & therapeutic system), Proc. of the 1993 Workshop on Micromachine Technologies and Systems, Tokyo, Japan, October 26-28, 1993, pp. 60-65.
19. T. SATO et al.: Micro handling robot for micromachine assembly, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 138-146.
20. H. MORISHITA, Y. HATAMURA: Development of ultra precise manipulator system for future nanotechnology, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 34-42.
21. L. LEIFER: Rehabilitative robots, Robotics Age, May/June 1981.
22. G. BIRCH: Technology and homes: the Remote Gateway Project, The Neil Squire Foundation Journal, Technical section, 5-8, 1992.
23. H. ROESLER, V. PAESLAK: Medical robot arms, Proc. of 1st CISM-IFToMM Symp. on Theory and Practice of Robots and Manipulators, Udine, Italy, 1974, pp. 158-169.
24. J. GUITTET et al.: The Spartacus telethesis: robot arm control studies, Bulletin of Prosthetics Research, 32, 1979, pp. 69-105.
25. H. FUNAKUBO et al.: Application of robot arm for environmental control system of bed-ridden patients in private house, Proc. of 11th Int. Symp. on Industrial Robots, Tokyo, 1981, pp. 71-78.
26. M. R. HILLMAN: A feasibility study of a robot manipulator for the disabled, Journal of Medical Engineering & Technology, 11, 4, 1987, pp. 160-165.
27. D. LEES et al.: A third-generation desktop robotic assistant for the severely physically disabled, Proc. 11th Annual Conference on Rehabilitation Technology (ICAART '88), Montreal, Canada, 1988.
28. W. SEAMONE, G. SCHMEISSER: Early clinical evaluation of a robot arm/worktable system for spinal-cord-injured persons, Journal of Rehabilitation Research and Development, 22, 1, 1985, pp. 37-56.
29. L. HOLMBERG: Rehabilitation robotics at HADAR, Rehabilitation Robotics Newsletter, Vol. 4, No. 4, Wilmington, DE, Fall 1992.
30. R. CAMMOUN, J. M. DETRICHE, F. LAUTURE, B. LESIGNE: Evaluation of the MASTER 1 robotic system and design of the new version, Proc. IEEE Int. Conf. on Advanced Robotics (ICAR '93), Tokyo, Japan, 1993, pp. 319-322.
31. S. D. PRIOR: A wheelchair mounted telemanipulator arm for use by the disabled, Research Project Report 1988, Middlesex Polytechnic, London.
32. H. KWEE, C. STANGER: The Manus robot arm, Rehabilitation Robotics Newsletter, 5, 2, 1993, pp. 1-2.
33. J. C. ROSIER, J. A. VAN WOERDEN, L. W. VAN DER KOLK, B. J. F. DRIESSEN, H. H. KWEE, J. J. DUIMEL, J. J. SMITS, A. A. TUINHOF DE MOED, G. HONDERD, P. M. BRUYN: Rehabilitation robotics: the MANUS concept, Proc. of '91 ICAR, vol. 1, pp. 893-898, Pisa, June 1991.
34. J. HENNEQUIN, R. G. S. PLATTS: IEE Colloquium Digest, No. 1992/108.
35. Progress Report, Veterans Administration R & D Center, 1988, Palo Alto, CA, USA.
36. M. A. REGALBUTO, T. A. KROUSKOP, J. B. CHEATHAM: Toward a practical mobile robotic aid system for people with severe physical disabilities, Journal of Rehabilitation Research and Development, 29, 1, 1992, pp. 19-26.
37. P. DARIO, E. GUGLIELMELLI, C. MULE', M. DI NATALE: URMAD: an autonomous mobile robot system for the assistance to the disabled, 6th International Conference on Advanced Robotics ('93 ICAR), Tokyo, Japan, November 1-2, 1993, pp. 341-346.
38. P. DARIO, E. GUGLIELMELLI: L'introduzione di nuove tecnologie nell'ambiente domestico, Atti del Convegno "Tecnologie, servizi ed edilizia innovativa per la terza età", SAIEDUE, Bologna, Italy, 1994.
39. J. EVANS et al.: Handling real-world motion planning: a hospital transport robot, IEEE Control Systems Magazine, 1992, pp. 15-19.
40. E. NAKANO et al.: First approach to the development of the patient care robot, Proc. of 11th Int. Symp. on Industrial Robots, Tokyo, 1981, pp. 95-102.
41. S. HASHINO: Aiding robots, Advanced Robotics, 7, 1, 1991, pp. 97-103.
42. R. L. MADARASZ et al.: The design of an autonomous vehicle for the disabled, IEEE J. of Robotics and Automation, Vol. RA-2, No. 3, 1986, pp. 117-126.
43. P. WELLMAN: An adaptive mobility system for the disabled, Proc. of the 1994 Int. Conf. on Robotics and Automation, San Diego, CA, 1994, 3, pp. 2006-2011.
44. G. BEKEY et al.: Control architecture for the Belgrade-USC hand, Proc. of 1988 IEEE Int. Conf. on Robotics and Automation, Philadelphia, PA, USA, 1988.
45. S. C. JACOBSEN et al.: Development of the UTAH artificial arm, IEEE Transactions on Biomedical Engineering, BME-29(4), pp. 249-269.
46. S. C. JACOBSEN et al.: Design of the UTAH/MIT dextrous hand, Proc. of the 1986 IEEE Int. Conference on Robotics and Automation, San Francisco, CA, USA, pp. 1520-1532.
47. J. P. MICALEFF AND P. RABISCHONG: Motorized assistance for the handicapped: control systems, vehicles, walking machine, Proceedings of the First European Workshop on Evaluation of Assistive Devices for Paralyzed Persons, 1984, pp. 125-132, edited by A. Pedotti and R. Andrich, COMAC BME.
48. T. DOHI: Robot technology in medicine, Advanced Robotics, Vol. 7, No. 2, pp. 179-187, 1993.
49. D. J. EDELL: A peripheral nerve information transducer for amputees: long-term multichannel recordings from rabbit peripheral nerves, IEEE Transactions on Biomedical Engineering, Vol. BME-33, No. 2, February 1986, pp. 203-213.
50. G. T. A. KOVACS, J. M. ROSEN: Regeneration-type peripheral nerve interfaces for direct man/machine communication, in Robots and Biological Systems: Towards a New Bionics?, NATO ASI Series, Springer Verlag, Berlin, 1993, Vol. 102, pp. 637-666.
51. M. COCCO et al.: An implantable neural connector incorporating microfabricated components, Proceedings of MicroMechanics Europe, Neuchatel, Switzerland, 1993, pp. 185-188.
52. P. DARIO, G. SANDINI, P. AEBISCHER: Robots and Biological Systems: Towards a New Bionics?, NATO ASI Series, Springer Verlag, Berlin, 1993.
53. P. DARIO, M. DE MICHELI, G. SANDINI, M. TISTARELLI: Retina-like CCD sensor for active vision, NATO ARW on Robots and Biological Systems, Il Ciocco, Tuscany, Italy, 1989, Springer-Verlag.
54. M. RUCCI, P. DARIO: Autonomous learning of tactile-motor coordination in robotics, Proc. of the 1994 IEEE International Conference on Robotics and Automation, San Diego, CA, May 8-13, pp. 3230-3236.
55. H. TAKANOBU, A. TAKANISHI, I. KATO: Design of a mastication robot mechanism using a human skull model, Proc. of '93 IROS, Yokohama, Japan, July 26-30, 1993, pp. 203-208.
56. P. COIFFET: Robot habilis and robot sapiens, or some ideas about robot-man links in robot design, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 169-174.
57. P. RABISCHONG: Is man still the best robot?, Proc. of 1st European Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 157-159.
This Page Intentionally Left Blank
Intelligent Robots and Systems V. Graefe (Editor) © 1995 Elsevier Science B.V. All rights reserved.
Planning, Calibration and Collision-Avoidance for Image-Guided Radiosurgery*
Achim Schweikard*†, Mohan Bodduluri‡, Rhea Tombropoulos*§ and John R. Adler†
* Robotics Laboratory, Department of Computer Science, Stanford, CA 94305
† Department of Neurosurgery, Stanford, CA 94305
‡ Accuray Inc., Santa Clara, CA 95054
§ Section on Medical Informatics, School of Medicine, Stanford, CA 94305
In radiosurgery a moving beam of radiation is used as an ablative surgical instrument to destroy brain tumors. Classical radiosurgical systems rely on rigid skeletal fixation of the anatomic region to be treated. This fixation procedure is very painful for the patient and limits radiosurgical procedures to brain lesions. Furthermore, due to the necessity of rigid fixation, radiosurgical treatment with classical systems cannot be fractionated. A new camera-guided system capable of tracking patient motion during treatment has been built to overcome these problems. The radiation source is moved by a 6 degree-of-freedom robotic arm. In addition to offering a more cost-effective, less invasive, and less painful treatment, the robotic gantry allows for arbitrary spatial motion of the radiation source. Based on this feature we can treat non-spherical lesions with accuracies unachievable with classical radiosurgical systems. The system introduces a new class of radiosurgical procedures, called non-stereotactic, or image-guided radiosurgery. At the heart of these procedures are algorithms for planning both a treatment and the corresponding beam motion, given the geometric description of the tumor shape and its relative location in the particular case.
1. INTRODUCTION
In radiosurgery, brain tumors (and other brain lesions) are destroyed by moving an intense beam of radiation [2,6]. A necrotic dose is delivered to a tumor by cross-firing from multiple directions, to reduce the amount of energy deposited in healthy brain. The radiosurgical treatment consists of several phases. First, a precise 3D map of the anatomical structures in the area of interest (for instance, the brain) is constructed using computed tomography (CT) and magnetic resonance (MR). Next, a motion path for the radiation beam is computed to deliver a dose distribution that the surgeon finds acceptable (taking into account a variety of medical constraints). Finally, a jointed mechanism moves the radiation source according to this path.
*This research is funded in part by the Sheik Enany Fund, the Lorraine Ulshafer Fund, and by the National Library of Medicine (LM-05305 and LM-07033). The authors thank Joe Depp, Martin Murphy and Bo Preising of Accuray Corp. for comments and suggestions.
Figure 1. Robotic radiosurgical system, patient couch, on-line x-ray vision system
A new radiosurgical system has been built to overcome limitations of earlier systems. The new system is based on an on-line x-ray vision system and a 6 degree-of-freedom manipulator moving a compact and light-weight linear accelerator.
2. SYSTEM OVERVIEW
The system consists of the following components (Figure 1):
- Radiation source: A novel compact and light-weight linear accelerator running at high frequency generates a 6 MV photon beam.
- Mechanical gantry: A GM-Fanuc 420 manipulator with 6 degrees-of-freedom moves the linear accelerator during treatment.
- Treatment planning: Geometric planning algorithms are at the heart of the new system. A beam path is computed given the constraints placed by the desired dose distribution. The beam path must observe additional constraints, i.e. the path must be collision-free and must not obstruct the surveillance and on-line vision subsystems.
- Surveillance: An ultrasonic system surveys the position of the robot and the linear accelerator during treatment.
- Static and Dynamic Imaging: CT and MR scanners acquire pre-operative images of the brain. An on-line vision system with two orthogonal x-ray cameras takes images of the patient's skull twice every second and computes the patient's position by correlating the images to precomputed radiographs. Small movements of the head are compensated for by a corresponding motion of the manipulator arm. Larger movements cause the radiation process to be interrupted.
During treatment, the linear accelerator is moved in point-to-point mode through a series of configurations. At each configuration, the beam is activated for a small time interval δt, while the arm is held still.
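The decision logic of this tracking loop can be summarized in a short sketch. This is only an illustration of the behavior described above; the 1 mm threshold and the `Arm`/`Beam` interfaces are hypothetical stand-ins, not the system's actual parameters or API.

```python
import numpy as np

SMALL_MOTION_MM = 1.0  # hypothetical threshold separating "small" from "large" motion

class Arm:
    def translate(self, offset):
        print("compensating arm motion by", offset)

class Beam:
    def interrupt(self):
        print("radiation interrupted; awaiting realignment")

def on_position_update(measured_pos, planned_pos, arm, beam):
    """Called twice per second with the skull position estimated by
    correlating the two orthogonal x-ray images with precomputed
    radiographs. Small offsets are tracked by the manipulator;
    larger offsets interrupt the radiation process."""
    offset = np.asarray(measured_pos, float) - np.asarray(planned_pos, float)
    if np.linalg.norm(offset) <= SMALL_MOTION_MM:
        arm.translate(offset)   # compensate by a corresponding arm motion
    else:
        beam.interrupt()        # movement too large: stop irradiating

on_position_update([0.4, -0.2, 0.1], [0.0, 0.0, 0.0], Arm(), Beam())
```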
Currently, the beam has a circular cross-section whose radius remains constant throughout a given treatment. However, this radius is initially set between 5 mm and 40 mm by mounting a collimator (a lead insert for focusing the beam) to the radiation source. The new system offers the following advantages over earlier radiosurgical and radiotherapeutic systems (see e.g. [7,9,10]):
- The tissue does not have to be fixed in space. In earlier radiosurgical systems this was done with stereotaxic frames. Besides being very painful for the patient, this fixation was only possible for lesions in the head, so the use of radiosurgical methods was limited.
- Treatment can be fractionated, i.e. the total dose can be delivered in a series of 2-30 treatments, where only a small dose is delivered during each treatment. Fractionation has turned out to be highly effective in radiation therapy, but could not be used in radiosurgery due to the necessity of rigidly fixing the tissue in space.
- Based on geometric planning algorithms, the radiation dose can be focused within the lesion with high accuracy, so that healthy tissue can be protected and side effects of radiation can be reduced dramatically. In particular, our algorithms allow for generating dose distributions with prescribed non-spherical isodose surfaces and high homogeneity.
A prototype of the new system has been installed at Stanford Medical Center. The use of the system for clinical investigation was approved in February 1994.
3. GEOMETRIC TREATMENT PLANNING
The objective of treatment planning is to find a motion suitable for treating the particular condition presented by a patient. This planning problem is illustrated in Figure 2. Figure 2-a schematically depicts an axial cross-section of the brain with a circular tumor T. If T is irradiated from only one direction r1, the healthy tissue along the beam absorbs approximately the same dose as the tumor. If, instead, we use two beam directions, r1 and r2, the dose deposited in the tumor is approximately twice the dose in healthy tissue. Using more beam directions can lead to further improvements of the dose distribution, and a very sharp drop-off of the dose in the tissue surrounding the (spherical) tumor region can be achieved. In this type of treatment, the axes of all beams cross a single point in space, called the isocenter. However, for tumors of non-spherical shape this single-isocenter treatment procedure is problematic (Figure 2-b). Treatment planning has to take into account the following constraints:
- The region receiving high dose should match the tumor region. To protect healthy tissue, sharp dose drop-off should be achieved in the healthy tissue surrounding the tumor.
- Certain anatomic structures (e.g. optic nerves, arteries, and brain stem) are very critical and/or radiation sensitive. The dose in such critical regions should be very small.
Figure 2. Cross-firing at a tumor
- Dose homogeneity: for certain lesions, it is essential that the dose inside the lesion be uniform; the dose in healthy non-critical tissue should also be uniform.
Planning a radiosurgical or radiotherapeutic treatment is difficult and time-consuming. In many cases, the constraints characterizing an adequate dose distribution cannot be met simultaneously. Our method for computing a suitable treatment plan proceeds in two steps:
- Beam selection: beams are selected according to the given tumor shape. The selection uses a reachability model of the manipulator workspace.
- Beam weighting: an intensity or weight is assigned to each beam. The weight is given by the duration of beam activation.
4. SELECTING BEAM CONFIGURATIONS
4.1. Spherical Tumors
First assume the tumor is spherical, with center p and radius r. We set the radius of all beams to r. The beam configurations are then selected in the following way. The axes of all beams cross the tumor center p (p is the isocenter). We consider beam directions determined by vertices of a regular (Platonic) polyhedron centered at p. The choice of orientation of this polyhedron is problematic if there are critical healthy structures in the tumor vicinity. Furthermore, there is no such regular polyhedron with more than 20 vertices. A more even distribution is obtained using more directions. The methods in [14] provide point grids on spheres which are adequate for our purposes. Our planning method thus uses a large set of beams (typically n ≥ 300) for a single sphere, and addresses the presence of critical regions not during the beam selection phase, but during the beam weighting phase.
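For illustration, the sketch below generates such a point grid with a Fibonacci spiral, a common construction for spreading n nearly even points over the unit sphere. It is a simple stand-in for the re-tiling method of [14], not the implementation used in the system.

```python
import numpy as np

def sphere_grid(n):
    """n approximately evenly spaced unit vectors (beam directions),
    generated by a Fibonacci spiral (a stand-in for the grids of [14])."""
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n                        # even spacing in height
    theta = 2.0 * np.pi * i * (1.0 + 5.0 ** 0.5) / 2.0   # golden-angle azimuths
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

# For a spherical tumor centered at p, every direction defines one beam
# whose axis passes through the isocenter p; weights are assigned later.
p = np.array([0.0, 0.0, 0.0])
beams = [(p, d) for d in sphere_grid(300)]   # typically n >= 300 (see text)
```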
Figure 3. Isodose surface, beam directions. Region inside surface absorbs 50% or more of maximum dose
4.2. Ellipsoidal Shapes
While the scheme in the previous paragraph gives both very sharp dose drop-off around a sphere-shaped lesion and sufficient homogeneity, generalizations of this scheme for treating non-spherical tumors are not obvious. A first attempt is to use two spheres instead of a single sphere, where each sphere is treated independently. However, it is generally difficult to cover an arbitrary tumor shape with spheres such that the volume of overlap with the tumor environment is small and the entire shape is covered. This is aggravated by the fact that the two spheres should not intersect, since this would yield very inhomogeneous dose distributions, i.e. high dose (so-called hot spots) in the intersection region of the spheres and/or insufficient coverage of the tumor region. Studies on complication rates show that hot spots, even inside the tumor, should be avoided if at all possible. However, a simple generalization of the above single-sphere method is obtained in the following way. Instead of using a single isocenter, we place a series of isocenters evenly spaced along a line segment. Each such isocenter is treated in the same way as a single isocenter. Central to this approach is the fact that each sphere receives very low dose, and many spheres are overlaid. In this way we can avoid hot spots in the intersection of two spheres. The dose distribution resulting from such a motion, as well as isocenter points and beam directions for generating this distribution, are shown in Figure 3. Figure 3 shows a 50% isodose surface, i.e. the region inside the surface receives 50% or more of the maximum dose. There are several problems with this method:
1. If we use the same set of directions for all isocenters thus chosen, many of the beams will be parallel. Accumulated dose fall-off from such beams will cause dose inhomogeneity in the surrounding tissue.
2. Dose inhomogeneity in the tumor may occur. The dose in the tumor center will be much higher than the dose along the tumor boundary, as illustrated in the example below (section 4.3). This inhomogeneity is in large part due to the cylindrical beam shape.
To address the first problem, we choose rotated beam sets for each isocenter. In practice, rotating the entire beam set aimed at one isocenter by a fixed amount is equivalent to choosing each individual beam direction from a finer spherical point grid. Furthermore, instead of using several beams for each isocenter, we can use a larger set of isocenters, evenly spaced on a line segment, and a single beam (from the finer sphere grid) for each such isocenter. Thus we place a single beam through each isocenter point along the line segment. Each beam direction is then chosen (in random order) from a fine grid of points on the unit sphere. The second problem can be addressed in several ways. The following example shows that it is related to the collimator shape.
Figure 4. Sweep-motion: (a) beam has circular cross-section; (b) beam cross-section is rectangular
4.3. Example
Figure 4-a shows the cross-section of a cylindrical beam swept along a line segment l. The beam axis is orthogonal to the plane of the cross-section. The beam cross-section is a circle c. The point p will absorb a higher dose than point p' during this sweep-motion. The dose at these points is proportional to the lengths of the segments s and s' (for a circular cross-section of radius r, a point at perpendicular distance d from l lies inside the beam over a chord of length 2√(r² − d²)), thus causing substantial inhomogeneity. A beam with rectangular cross-section avoids this problem (Figure 4-b).
4.4. General Shapes
The above approach is restricted to the pseudo-cylindrical shapes shown in Figure 3 and shapes obtained from bending such shapes. A direct generalization of this scheme is the following. With an algorithm in [14], we compute a grid of n points on the tumor
surface. Each such point is now treated as an isocenter. As above, a single beam is placed through each isocenter. This scheme addresses both the first and the second of the mentioned problems: accumulation along certain principal directions is avoided by using a fine grid, and accumulation of high dose in the tumor center is avoided by placing isocenters on the tumor surface. To achieve homogeneity, the point grid on the input surface can be expanded or retracted interactively. A path thus generated is refined by computing beam weights.
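A minimal sketch of this selection scheme follows, assuming the surface (or line-segment) isocenters and a fine spherical direction grid (e.g. from the `sphere_grid` sketch above) are already computed. The names and the random-draw details are illustrative, not the system's actual code.

```python
import numpy as np

def select_beams(isocenters, fine_grid, seed=0):
    """One beam per isocenter: isocenters are grid points on the tumor
    surface (or along a line segment, section 4.2); each beam direction
    is drawn in random order from a fine spherical grid so that no
    principal direction accumulates dose."""
    rng = np.random.default_rng(seed)
    assert len(isocenters) <= len(fine_grid)
    order = rng.permutation(len(fine_grid))[: len(isocenters)]
    return [(np.asarray(pt), fine_grid[k]) for pt, k in zip(isocenters, order)]
```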
5. COMPUTING BEAM WEIGHTS
During the second planning step, weights are assigned to each of the previously selected beams. Let C1, ..., Cn denote the cylindrical beams at the n selected configurations. Our method will assign a dose value wi ≥ 0 to every Ci; wi is a factor specifying activation duration at the particular (static) beam configuration. The values w1, ..., wn determine a distribution D, defined as follows: if p is a point and Ci1, ..., Cik are the cylinders containing p, then

D(p) = wi1 + ··· + wik.

(This definition provides a coarse model for photon beam characteristics. Refinements described in [12] allow for a more accurate representation of these characteristics.) We consider two disjoint regions T (for tumor) and H (for healthy tissue), with the following constraints: the dose delivered at each point in T must be larger than some value α, while the dose at each point of H must be below β (β < α). The n cylinders, T, and H define an arrangement of cells in space. Each cell is defined as a maximal connected set not containing any piece of the boundaries of the regions T and H or the cylinders. For each cell we compute a label: a cell σ contained in the cylinders Ci1, ..., Cik has label L = {i1, ..., ik}. The calculation of w1, ..., wn reduces to finding a point in the intersection of two n-dimensional polyhedral sets: if σ is in T and labeled by {i1, ..., ik}, then σ determines the inequality

α ≤ wi1 + ··· + wik.
A cell σ' in H labeled by {j1, ..., jk'} gives

β ≥ wj1 + ··· + wjk'.

The inequalities for all cells in T determine a convex polyhedral set Pα. Similarly, the inequalities derived from the cells in H determine a polyhedron Pβ. If Pα and Pβ intersect, any point (w1, ..., wn) in the intersection gives a dose distribution that satisfies the given constraints. Otherwise, the problem admits no solution. More generally, we can specify several healthy critical or non-critical regions H1, ..., Hq marked by distinct maximal doses β1, ..., βq. We then obtain polyhedra Pα, Pβ1, ..., Pβq. Any point (w1, ..., wn) in the intersection of these polyhedra determines a dose distribution that satisfies the input constraints. The intersection of the polyhedral sets Pα, Pβ1, ..., Pβq is another n-dimensional convex polyhedral set. Extreme points of this set can be computed, allowing us to deal directly with optimization criteria (e.g., in addition to satisfying the input constraints defined by α, β1, ..., βq, minimize the dose delivered to some region Hi).
Figure 5. Regions T, H; minimal and maximal cells σmin resp. σmax
We consider two inequalities deriving from two cells σ and σ' in T. Let L = {i1, ..., ik} and L' = {j1, ..., jk'} be the labels of σ and σ'. If L ⊂ L', then the inequality given by L implies the inequality given by L', since all wi are positive or null. A T-cell is called minimal if its label is not a superset of any other T-label (Figure 5). Similarly, a cell in Hi is called maximal if its label is not a subset of any other label in Hi. Thus, in the above polyhedral intersection test, we need only consider minimal T-cells and maximal H-cells.
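Finding a point in the intersection of these polyhedra is a linear feasibility problem. The sketch below poses it for scipy's `linprog`; this is an illustrative stand-in (the actual implementation, section 8, uses a modified simplex algorithm), and it assumes the minimal T-cell and maximal H-cell labels have already been computed.

```python
import numpy as np
from scipy.optimize import linprog

def find_weights(n, t_labels, h_labels, alpha, beta):
    """Search for w_1..w_n >= 0 with
         sum_{i in L} w_i >= alpha  for every (minimal) T-cell label L,
         sum_{j in L} w_j <= beta   for every (maximal) H-cell label L.
    linprog expects A_ub @ w <= b_ub, so the T-inequalities are negated.
    Returns a feasible weight vector, or None if P_alpha and P_beta
    do not intersect (the constraints admit no solution)."""
    rows, rhs = [], []
    for L in t_labels:                  # alpha <= sum  ->  -sum <= -alpha
        row = np.zeros(n); row[list(L)] = -1.0
        rows.append(row); rhs.append(-alpha)
    for L in h_labels:                  # sum <= beta
        row = np.zeros(n); row[list(L)] = 1.0
        rows.append(row); rhs.append(beta)
    res = linprog(c=np.zeros(n),        # feasibility only: zero objective
                  A_ub=np.vstack(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n)
    return res.x if res.success else None

# Toy example: 3 beams; one T-cell covered by beams {0, 1}, one H-cell in beam {2}.
print(find_weights(3, t_labels=[{0, 1}], h_labels=[{2}], alpha=10.0, beta=2.0))
```

Further regions H1, ..., Hq with bounds β1, ..., βq would contribute rows of the same form, and replacing the zero objective with the dose in some Hi would pick out the optimizing extreme point mentioned above.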
6. KINEMATICS AND COLLISION AVOIDANCE
The beam configurations selected by the planning system must be organized into a sequence defining a collision-free path of the linear accelerator. For safety reasons, we currently do not use optimization methods for computing this sequence. Instead, we use a hard-coded sequence of beam configurations, called nodes. A fixed motion connecting the nodes provides a path template. An essential feature of the beam selection method is the following: beam configurations can be selected in such a way that the deviation between the hard-coded path template and the treatment beams calculated for an individual patient is small and upper-bounded. For all node configurations the beam axis crosses the workspace origin. A node is thus simply given by a tool center point placement. We use tool center points on a sphere of appropriate radius centered at the origin. For each node we compute a local joint space: angle limits within which the arm and the accelerator can be moved without colliding or getting into the line of sight of a camera are associated with each node. To compute treatment beams given the particular tumor shape and location as described in section 3, we connect isocenter points to the tool center points of nodes. Thus each treatment beam is given by an isocenter point and the tool center point of a node. Given reasonable bounds for the tumor size, we can assume that isocenter points are close to the workspace origin. Since the source-target distance for the beam is large, we can ensure that the angular deviation between a hard-coded node and the associated treatment beam configurations remains below a fixed threshold, and within the local joint space of the node (Figure 6).
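The deviation bound follows from elementary geometry: with tool center points on a sphere of radius R about the origin and isocenters within distance r_max of the origin, the angle between a node beam and a treatment beam from the same tool center point is at most arcsin(r_max/R). The sketch below illustrates the check; the numeric values are assumptions for illustration, not the system's actual dimensions.

```python
import numpy as np

def angular_deviation(tool_center, isocenter):
    """Angle between the node beam axis (tool center -> workspace origin)
    and the treatment beam axis (tool center -> isocenter)."""
    v = np.asarray(tool_center, float)
    a = -v                                   # toward the origin
    b = np.asarray(isocenter, float) - v     # toward the isocenter
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))

R, r_max = 800.0, 50.0                       # mm; illustrative values only
tool_center = np.array([R, 0.0, 0.0])
worst = angular_deviation(tool_center, [0.0, r_max, 0.0])
# The deviation stays below arcsin(r_max / R), so each treatment beam
# remains inside the (precomputed) local joint space of its node.
assert worst <= np.arcsin(r_max / R) + 1e-9
```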
Figure 6. Origin O, target region T, tool-center points v1, ..., vn for hard-coded nodes
The placement of a cylinder beam in space is kinematically redundant. The beam
can be translated along and rotated about its axis without changing spatial placement. Thus a cylinder has 4 degrees-of-freedom. The range of beam directions should be as large as possible to allow for homogeneous distributions. A simulation program is used to determine tool center points for nodes covering a large portion of a half-sphere (Figure 7). The beam widens with source-target distance. By placing tool center points for all nodes and treatment points on a sphere, we can ensure that all beams have a nearly constant and equal radius in tissue. Effector orientation is represented by 3 Euler angles such that the last of these angles gives the rotation about the beam axis. For each node an appropriate value for this angle is computed in such a way that arm posture changes along the path can be avoided. For verification, the path in Figure 7-b was executed and a photographic film in a water phantom (a plexiglass reconstruction of a skull, placed at the end of the patient couch in Figure 1) was exposed to the generated radiation. Figure 8 shows the photographic film. During execution of the path in Figure 7-b an equal dose value was delivered at each node, with the beam axis aimed at the workspace origin. The restrictions in the workspace result in a penumbra along the horizontal direction in Figure 8-c and -d. A total of 6000 cGy was delivered at 300 nodes, resulting in a total execution time of 1 hour.
7. CALIBRATION
The positioning accuracy of the mechanical gantry is much lower than its repeatability. The radiation source must remain at a sufficient distance from the tissue to be irradiated. Small inaccuracies in positioning the radiation source are thus amplified and may cause inaccuracies in target localization. The hard-coded motion path was therefore calibrated in the following way. An optical crystal was placed at the workspace origin. A laser beam pointing along the beam central axis was placed in the radiation source. The crystal is activated when touched by the laser beam.
Figure 7. Path template connecting node configurations
Figure 8. Photographic film phantom for spherical distribution (axial cross-sections)
Activation intensity can be read from the crystal, thus giving a measurement of placement accuracy. The calibration process considers each node separately. The end effector is placed at grid points in a small local joint space surrounding the node. At each grid point the laser is activated and a measurement is taken from the crystal. For the given node we can thus find joint angle offsets placing the beam in such a way that the beam axis crosses the origin. The angular deviation between treatment points and hard-coded nodes is small, and we use the offsets thus computed as offsets for the treatment beams as well. A bound of 0.3 mm for inaccuracies was determined experimentally for isocenter points close to the origin.
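A sketch of this per-node search follows. The `read_crystal` callback, the grid span, and the step count are hypothetical; the intent is only to illustrate the grid scan over a small local joint space described above.

```python
import itertools
import numpy as np

def calibrate_node(node_angles, read_crystal, span_deg=0.5, steps=3):
    """Scan a small joint-space grid around one node, firing the laser at
    each grid point and reading the crystal's activation intensity.
    The offset with the strongest reading best centers the beam axis on
    the workspace origin; it is reused for the node's treatment beams."""
    node_angles = np.asarray(node_angles, float)
    deltas = np.linspace(-span_deg, span_deg, steps)
    best_offset, best_reading = None, -np.inf
    # steps**6 poses for a 6-joint arm; kept small here for illustration.
    for offset in itertools.product(deltas, repeat=node_angles.size):
        reading = read_crystal(node_angles + np.asarray(offset))
        if reading > best_reading:
            best_offset, best_reading = np.asarray(offset), reading
    return best_offset
```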
8. PLANNING SYSTEM
A treatment planning system with a 3D graphical user interface based on the above beam selection and weighting methods is incorporated into the new system. It is designed to enable the surgeon to make use of the system's flexibility. The tumor and critical tissues are described as polyhedra. The regions of interest are delineated by polygons in the MR images showing axial, sagittal, or coronal cross-sections of the brain.
Figure 9. Sample case (Stanford Medical Center). (a) Axial CT with tumor delineated in black (b) 3D representation of tumor delineation (c) Treatment beams (d) Isodose surface overlaid on tumor reconstruction
These polygons are thickened between two cross-sections, and a 3D reconstruction of the tumor shape is computed. The radius r of the (cylindrical) beam and the number n of beam configurations are given as input. The beam selection method in section 3, in combination with the above collision avoidance method, is invoked to compute a set of beam directions corresponding to the input tumor shape and location. To compute beam weights, our implementation uses a modified simplex algorithm to find a point within the intersection region of k n-dimensional polyhedra. A test series for the new system was carried out to evaluate the possibilities of generating prespecified dose distributions. In the test series, cases treated during a comparison period with an earlier system for frame-based stereotactic radiosurgery [7] were considered. In each case data for the actual treatment performed were retrieved. Plans for the same cases were then computed with our new planning system. The comparison is based on photographic film phantom studies in water, dose-volume histograms, as well as 2D and 3D dose visualization paradigms. The results of the test series and variants of the above methods are discussed in [12]. An example is shown in Figures 9 and 10. Figure 9-a shows an axial cross-section of the brain for a 69-year-old patient presenting a large non-spherical tumor of the falx cerebri, delineated in the cross-section. Figure 9-b shows the 3D reconstruction of the tumor outline. A regular grid of 600 points was expanded on a surface reconstruction, giving isocenter points.
Figure 10. Film phantom for sample case in Fig. 9, showing the shape of the region receiving high dose. Film slices in the phantom are arranged in the same spatial structure as the tomographic images.
Figure 9-c shows the treatment beams used. Figure 9-d overlays the input tumor shape with the generated 50% isodose surface to illustrate the matching between input and output shapes. For verification, the motion was executed, exposing a photographic film phantom (Figure 10). The phantom shows a close match between the input tumor shape and the actual shape of the region receiving a high radiation dose.
9. CONCLUSION
A prototype of the new system has been installed at Stanford Medical Center, and the use of the system for clinical investigation has been approved. Systems will soon be installed at several major medical research institutions. The methods described divide treatment planning into two steps: (1) selection of beam configurations; (2) computation of dose weights for these configurations. For the latter step, we proposed a general theoretical approach. The evaluation of the planning methods gives very encouraging results. Improved methods could include other collimator geometries and corresponding generalizations of the planning paradigms. The planning method relies on weighted arrangements in 3D space. This approach is not limited to cylinders, and seems appropriate for treatment planning in conventional radiation therapy as well. It is likely that the new systems will allow for radiosurgical treatment of tumors outside the brain, including tumors for which no treatment is currently available. In particular, extensions are desirable for very radiation-resistant tumors and cases in which conventional radiation therapy would cause too much damage to healthy tissue and is not applicable.
REFERENCES
1. Barth, N. H. An Inverse Problem in Radiation Therapy. Intern. J. Radiation Oncology Biol. Phys., 18:425-431, 1990.
2. Betti, O. O., Munari, C., and Rosler, R. Stereotactic Radiosurgery with the Linear Accelerator: Treatment of Arteriovenous Malformations. Neurosurgery, 24(3):311-321, 1989.
3. Brahme, A. Optimization of stationary and moving beam radiation therapy techniques. Radiother. Oncol., 12:127-140, 1988.
4. Cormack, A. A problem in rotation therapy with x-rays. Int. J. Radiat. Oncol. Biol. Phys., 13:623-630, 1987.
5. Halperin, D. and Yap, C. K. Combinatorial Complexity of Translating a Box in Polyhedral 3-Space. 9th Annual ACM Symp. on Computational Geometry, 29-37, 1993.
6. Larsson, B. et al. The High Energy Proton Beam as a Neurosurgical Tool. Nature, 182:1222-1223, 1958.
7. Lutz, W., Winston, K. R., and Maleki, N. A system for stereotactic radiosurgery with a linear accelerator. Int. J. Radiation Oncology Biol. Phys., 14:373-381, 1988.
8. Murtagh, B. A., Saunders, M. A. MINOS 5.4 user's guide. Technical Report SOL 83-20R, Dept. of Operations Research, Stanford University, 1983.
9. Podgorsak, E. B., et al. Dynamic Stereotactic Radiosurgery. Intern. J. Radiation Oncology Biol. Phys., 14:115-126, 1988.
10. Rice, R. K., et al. Measurements of Dose Distributions in Small Beams of 6 MeV X-Rays. Phys. Med. Biol., 32:1087-1099, 1987.
11. Schweikard, A., Adler, J. R., and Latombe, J. C. Motion Planning in Stereotaxic Radiosurgery. Proc. IEEE Int. Conf. Robotics and Automation, Atlanta, GA, May 1993, 1909-1916. (Extended version to appear in IEEE Tr. Robotics and Automation.)
12. Schweikard, A., Tombropoulos, R. Z., Kavraki, L., Adler, J. R., and Latombe, J. C. Treatment Planning for a Radiosurgical System with General Kinematics. Proc. IEEE Int. Conf. Robotics and Automation, San Diego, CA, May 1994, 1720-1727.
13. Troccaz, J. et al. Conformal External Radiotherapy of Prostatic Carcinoma: Requirements and Preliminary Results. Rep. No. 912I, TIMC-IMAG, Faculté de Médecine, Grenoble, 1993.
14. Turk, G. Re-Tiling Polygonal Surfaces. Computer Graphics, 26(3):55-64, 1992.