Autonomous Systems – Self-Organization, Management, and Control
Bernd Mahr · Sheng Huanye Editors
Autonomous Systems – Self-Organization, Management, and Control Proceedings of the 8th International Workshop held at Shanghai Jiao Tong University, Shanghai, China, October 6–7, 2008
Bernd Mahr
Technische Universität Berlin
Berlin, Germany
[email protected]

Sheng Huanye
Shanghai Jiao Tong University
Shanghai, China
[email protected]

ISBN 978-1-4020-8888-9
e-ISBN 978-1-4020-8889-6

Library of Congress Control Number: 2008931287

All Rights Reserved
© 2008 Springer Science + Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed on acid-free paper

springer.com
Committee
Workshop Co-Chairs: Bernd Mahr, Yuxi Fu

Program Committee: Bao-Liang Lu, Liqing Zhang, Bernd Mahr

Organizing Committee: Bao-Liang Lu (Chair), Wolfgang Brandenburg, Min-You Wu, Liqing Zhang, Tianfang Yao, Fang Li

Editorial Office and Layout: Wolfgang Brandenburg

Book Editors:
Bernd Mahr, Technische Universität Berlin, Fakultät IV, Sekretariat FR 6-10, Franklinstr. 28/29, 10587 Berlin, Germany
Sheng Huanye, Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai 200240, China
Preface
The International Workshop on "Autonomous Systems – Self-Organization, Management, and Control" is the eighth in a successful series of workshops that were established by Shanghai Jiao Tong University and Technische Universität Berlin. The goal of these workshops is to bring together researchers from both universities in order to present research results to an international community.

The series of workshops started in 1990 with the International Workshop on Artificial Intelligence and was continued with the International Workshop on "Advanced Software Technology" in 1994. Both workshops were hosted by Shanghai Jiao Tong University. In 1998 the third workshop took place in Berlin. This International Workshop on "Communication Based Systems" was essentially based on results from the Graduiertenkolleg on Communication Based Systems, which was funded by the German Research Society (DFG) from 1991 to 2000. The fourth International Workshop, on "Robotics and its Applications", was held in Shanghai in 2000. The fifth International Workshop, on "The Internet Challenge: Technology and Applications", was hosted by TU Berlin in 2002. The sixth International Workshop, on "Human Interaction with Machines", was hosted by Shanghai Jiao Tong University in 2005. The seventh workshop, "Embedded Systems – Modeling, Technology, and Applications", was held in Berlin in 2006, where for the first time three TUB students received their SJTU master degrees and TUB diplomas after successfully studying at both universities in accordance with the agreement on the dual degree program between SJTU and TUB. Since then, a considerable number of students from both universities have been awarded the two degrees under this program.

The subject of this year's workshop reflects the increasing interest in autonomous systems, which can also be observed in the widespread activities at our two universities, for example in the recent work on the brain-computer interface, on new features of search engines, and in the new research initiative on flying sensors. Autonomy of action has become a broadly desirable property of systems which operate in complex technical, societal and natural environments. Moving or flying robots, for example, need to act on sensor data in real time, semantic services need to adapt to changing requirements of their users, and information systems which are based on huge information sources, like the World Wide Web, need to select information in view of their users' opinions and interests.

To develop such systems and applications, advanced techniques, algorithms, languages, methodologies, and architectures are required, which allow for self-organization, self-management, and self-control. Reporting on recent developments in autonomous systems and on studies of their specification and analysis, this workshop addresses the following topics:

• Techniques of Sensing and Recognition
• Applications of Autonomous Behavior
• Self-Reference and Formal Analysis
• Architecting and Management
• Opinion Mining and User Interest
The continuous support of both universities is gratefully acknowledged. It enabled the ongoing exchange of ideas between researchers of our two universities.

Berlin, May 2008
Bernd Mahr and Sheng Huanye
Contents
Committee ... v

Preface ... vii

Head Pose Perception Based on Invariance Representation ... 1
Wenlu Yang and Liqing Zhang

Geometrical Approaches to Active Learning ... 11
Kamil Adiloglu, Robert Annies, Falk-Florian Henrich, André Paus, and Klaus Obermayer

Detecting Drowsiness in Driving Simulation Based on EEG ... 21
Jia-Wei Fu, Mu Li, and Bao-Liang Lu

Multi-Task BCI for Online Game Control ... 29
Qibin Zhao, Liqing Zhang, and Jie Li

Efficient Biped Pattern Generation Based on Passive Inverted Pendulum Model ... 39
Jian Li and Weidong Chen

A Slung Load Transportation System Based on Small Size Helicopters ... 49
Markus Bernard, Konstantin Kondak, and Günter Hommel

Multi-Source Data Fusion and Management for Virtual Wind Tunnels and Physical Wind Tunnels ... 63
Huijie Hu, Xinhua Lin, and Min-You Wu

Flying Sensors – Swarms in Space ... 71
Stefan Jähnichen, Klaus Brieß, and Rodger Burmeister

∈µ-Logics – Propositional Logics with Self-Reference and Modalities ... 79
Sebastian Bab

Compositionality of Aspect Weaving ... 87
Florian Kammüller and Henry Sudhof

Towards the Application of Process Calculi in the Domain of Peer-to-Peer Algorithms ... 97
Sven Schneider, Johannes Borgström, and Uwe Nestmann

Specification Techniques (Not Only) for Autonomous Systems ... 105
Peter Pepper

Quality Assurance for Concurrent Software – An Actor-Based Approach ... 119
Rodger Burmeister

Enabling Autonomous Self-Optimisation in Service-Oriented Systems ... 127
Hermann Krallmann, Christian Schröpfer, Vladimir Stantchev, and Philipp Offermann

Algorithms for Reconfiguring Self-Stabilizing Publish/Subscribe Systems ... 135
Michael A. Jaeger, Gero Mühl, Matthias Werner, Helge Parzyjegla, and Hans-Ulrich Heiss

Combining Browsing Behaviors and Page Contents for Finding User Interests ... 149
Fang Li, Yihong Li, Yanchen Wu, Kai Zhou, Feng Li, Xingguang Wang, and Benjamin Liu

Patent Classification Using Parallel Min-Max Modular Support Vector Machine ... 157
Zhi-Fei Ye, Bao-Liang Lu, and Cong Hui

A Study of Network Informal Language Using Minimal Supervision Approach ... 169
Xiaokai Zhang and Tianfang Yao

Topic Identification Based on Chinese Domain-Specific Subjective Sentence ... 177
Hang Yin and Tianfang Yao
Head Pose Perception Based on Invariance Representation Wenlu Yang and Liqing Zhang∗
Abstract This paper investigates the head pose estimation problem, which is considered front-end preprocessing for improving multi-view human face recognition. We propose a computational model for perceiving head pose based on a neurophysiologically plausible invariance representation. In order to obtain the invariance representation bases, or facial multi-view bases, a learning algorithm is derived for training the linear representation model. The facial multi-view bases are then used to construct the computational model for head pose perception. A measure for head pose perception is introduced: the winner neuron at the final layer gives the resulting head pose if its connected group at the previous layer contains the most firing neurons. Computer simulation results and comparisons show that the proposed model achieves satisfactory accuracy for head pose estimation of facial multi-view images in the CAS-PEAL face database.
L. Zhang
Lab for Perception Computing, Shanghai Jiao Tong University, China
e-mail: [email protected]

W. Yang
Lab for Perception Computing, Shanghai Jiao Tong University, China, and Department of Electronic Engineering, Shanghai Maritime University, China
e-mail: [email protected]; [email protected]

1 Introduction

Humans possess a remarkable ability to recognize faces regardless of facial geometries, expressions, head poses (or facial views), lighting conditions, distances, and ages. Modeling the functions of face recognition and identification remains a difficult problem in the fields of computer vision, pattern recognition, and neuroscience. In particular, face recognition robust to head pose variation is still difficult in the
complex natural environment. Therefore, head pose estimation is a very useful front-end processing step for multi-view human face recognition. Generally, head pose variation covers the following three free rotation parameters: yaw (rotation around the neck), tilt (rotation up and down), and roll (rotation from left profile to right profile). In this paper, for simplicity, we focus on yawing rotation, which has many important applications.

Previous methods for head pose estimation can be roughly divided into two categories: feature-based approaches [8] and image-based approaches [6, 10]. These existing methods need either elaborate configurations of facial landmarks or projection onto a subspace, and then classify the facial images into groups with different head poses. Unlike the previous methods, our method takes advantage of brain-like perception to model the mechanism of head pose perception. The proposed model is a three-layer network. The first layer receives multi-view facial images. Neurons at the second layer respond to stimuli through their receptive fields, the facial multi-view (or head pose) bases, which can be learned from facial images. The winner neuron at the final layer gives the resulting head pose according to the measure of multi-view perception.

The rest of this paper is organized as follows. First, we propose a computational model for pose perception based on invariance representation and ICA decomposition. Second, computer simulations are provided to show the performance of the proposed model. Finally, conclusions are drawn.
2 The Perception Model for Pose Estimation

In this section, we first introduce a framework for learning the facial multi-view bases. Then we propose a computational model and a corresponding algorithm for head pose perception.

Based on the Redundancy Reduction principle [1] proposed by Barlow, Olshausen and Field [9] presented the Sparse Coding method: only a few neurons respond strongly to a stimulus from the natural environment, whereas the majority respond weakly. They provided experimental results in which the bases learned from natural images are considered as receptive fields, and the corresponding coefficients follow a supergaussian probability distribution. An alternative method is Independent Component Analysis (ICA) [5], which imposes a mutual independence constraint on the responses of the neurons and obtains similar results: the receptive fields of simple cells in primary visual cortex are localized, oriented, and bandpass.

Using ICA decomposition, we propose a framework for spatiotemporal feature extraction, as shown in Fig. 1.

Fig. 1 Framework for spatiotemporal features

The framework is a four-layer network which includes the input layer L1, the sparse representation layer L2, the integrated layer L3, and the final invariance representation layer L4. At the first layer L1, each neuron in the retina receives the gray value of one pixel of the input image $u(\tau)$ at time $\tau$. L2 is a layer for sparse representation; its main function is to represent the input image $u(\tau)$ with features $A(\tau)$ and the corresponding independent components $x(\tau)$. In mathematical terms, $u(\tau) = A(\tau)x(\tau) = W^{-1}(\tau)x(\tau)$, where $W$ is the inverse of $A$. The layer L3 is the integrated layer, at which a neuron at location $r$ averages all activities of the neurons connected to it in layer L2, $z(r) = \mathrm{mean}(x(\tau))$; this is because we impose an invariance constraint on the responses of the neurons at the third layer, where invariance means that the neurons respond with little change because of the mean operation. The final layer L4 is a senior representation layer at which neurons receive the responses of neighboring neurons in layer L3 to extract the transformation invariance. Here, $h(r)$ denotes the neighboring connecting weights and $y(r)$ the responses. In other words, layer L4 achieves a transformation invariance representation.

To derive the learning algorithm of the framework, we use the Kullback-Leibler divergence between the distribution $p(x(\tau); W(\tau))$ and the reference factorized distribution $q(x(\tau))$, resulting in the cost function [2]

$$R(x(\tau), W(\tau)) = -\frac{1}{2}\log\left|\det\left(W(\tau)W^{T}(\tau)\right)\right| - \sum_{r} E\left[\log q_{r}(x(r,\tau))\right]. \quad (1)$$
Applying the Natural Gradient rule to the cost function, the learning algorithm for $W(\tau)$ (with corresponding bases $A(\tau) = W^{-1}(\tau)$) can be described [12] as

$$\Delta W(\tau) = -\eta(t)\,\frac{\partial R}{\partial W(\tau)}\, W^{T}(\tau)W(\tau) = \eta(t)\left[I - \varphi(x(\tau))\,x^{T}(\tau)\right]W(\tau), \quad (2)$$

where $\varphi_{r}(x(r)) = -q'_{r}(x(r))/q_{r}(x(r))$, and $q(x(r))$ is the prior probability distribution over the coefficients of $x(r)$, which is peaked at zero with heavy tails compared to a Gaussian distribution of the same variance, such as the Laplace probability distribution function.

The standard ICA estimation method constrains the components to be uncorrelated. Generally speaking, these components still have higher-order correlations in firing energy. Biologically, this is interpreted as the simultaneous activation of neurons at the time when they receive a stimulus. Therefore, we can use this mechanism to analyze the higher-order correlation of neural responses.
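For concreteness, the update of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the data is assumed centered and whitened, $W$ is kept square, and tanh is used as a generic score function $\varphi$ for a supergaussian prior.

```python
import numpy as np

def natural_gradient_ica(U, n_iter=200, eta=0.01):
    """Minimal natural-gradient ICA sketch of Eq. (2).

    U : (n_pixels, n_samples) data matrix, assumed centered and whitened.
    Returns the demixing matrix W; the bases are A = inv(W).
    """
    n, T = U.shape
    W = np.eye(n)
    for _ in range(n_iter):
        X = W @ U                        # independent components x(tau)
        phi = np.tanh(X)                 # generic supergaussian score function
        # Delta W = eta * (I - E[phi(x) x^T]) W, cf. Eq. (2)
        W += eta * (np.eye(n) - (phi @ X.T) / T) @ W
    return W
```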
Suppose $z(r_1)$ and $z(r_2)$ are the responses of two neurons. If $z(r_1)$ and $z(r_2)$ are in a topographical neighborhood, then

$$\mathrm{cov}(z^{2}(r_1), z^{2}(r_2)) = E\{z^{2}(r_1)\, z^{2}(r_2)\} - E\{z^{2}(r_1)\}E\{z^{2}(r_2)\} \neq 0. \quad (3)$$
Due to this higher-order correlation, the response of each complex cell is described as

$$|y(r_1)| = \left(\sum_{r_2} h(r_1, r_2)\, z^{2}(r_2)\right)^{1/2},$$

where $h(r_1, r_2)$ is the connecting weight between a complex cell $r_1$ and its neighboring simple cell $r_2$. That is, the receptive field of a complex cell consists of those of its neighboring simple cells, and it is bigger than that of the simple cells. Its probability distribution function is given by

$$q(y(r_1)) = \frac{1}{\sqrt{2}\,\sigma}\exp\left(-\frac{\sqrt{2}}{\sigma}\sqrt{\sum_{r_2} h(r_1, r_2)\, z^{2}(r_2)}\right), \quad (4)$$

where $\sigma^{2}$ is the variance of the responses. We obtain

$$\varphi_{r_1}(y(r_1)) = -\frac{q'_{r_1}(y(r_1))}{q_{r_1}(y(r_1))} = \frac{\sqrt{2}\, h(r_1, r_2)\, z(r_2)}{\sigma\,\sqrt{\sum_{r_2} h(r_1, r_2)\, z^{2}(r_2)}}. \quad (5)$$
To this end, the learning algorithm for the topographically self-organized receptive fields of simple cells is described as

$$\Delta W(r, \tau) = \eta(t)\left[I - \varphi(y(r))\, x(r, \tau)^{T}\right]W(r, \tau), \quad (6)$$
where $\varphi(y) = [\varphi(y(1)), \varphi(y(2)), \cdots, \varphi(y(n))]^{T}$. Applying Eq. (6) to the training data, sequences of faces with view angles $\alpha \in \{\pm 45, \pm 30, \pm 15, 0\}$ degrees, yields multi-view bases with the same head poses as the faces. Using these bases as the connecting weights between the first and the second layer, we propose a computational model for multi-view perception, shown in Fig. 2. This simplified perception model consists of three layers. The first layer receives the input pattern, a face with view angle $\alpha \in \{\pm 45, \pm 30, \pm 15, 0\}$ degrees; the head pose estimation model is therefore developed to perceive seven view angles. The middle layer provides the internal neural representation. Neurons at this layer activate sparsely if their receptive fields are very similar to the input patterns. The number of neurons depends on the problem; in this paper, we use seven groups, each containing 20 × 20 neurons. The final layer consists of seven neurons, each of which perceives one specific head pose. After the neurons at the middle layer respond to the stimuli $u_{\alpha_i}$, this layer counts the firing neurons in each group with the same head pose at the previous layer. The resulting head pose is the one represented by the group containing the most firing neurons.
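A sketch of the topographic rule of Eqs. (4)-(6), under the same caveats as the sketch above: the neighborhood weights h and the learning rate are assumed given, and the variance σ of Eq. (5) is absorbed into the learning rate.

```python
import numpy as np

def topographic_ica_step(W, U, H, eta=0.01, eps=1e-8):
    """One update of Eq. (6) with the pooled score of Eqs. (4)-(5).

    W : (n, n) demixing matrix, U : (n, T) input batch,
    H : (n, n) fixed neighborhood weights h(r1, r2).
    """
    X = W @ U                           # simple-cell responses z(r, tau)
    E = H @ (X ** 2) + eps              # pooled energies sum_r2 h z^2, Eq. (4)
    # score function of Eq. (5), with sigma absorbed into eta
    phi = np.sqrt(2.0) * (H.T @ (1.0 / np.sqrt(E))) * X
    W += eta * (np.eye(W.shape[0]) - (phi @ X.T) / U.shape[1]) @ W
    return W
```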
Fig. 2 Head pose perception model. The first layer is the input layer, at which neurons receive stimuli u(τ) with different views. The second layer is the sparse representation layer, at which neurons fire if the similarity between u(τ) and the connected weights A(r, τ) is high. The final layer is the head pose perception layer; whether or not a neuron fires depends on the number of firing neurons connected to it in the second layer. The winner neuron gives the resulting head pose
3 Simulations and Results

In this section, we present experimental results to verify the performance of the proposed model and learning algorithm. First we present the training data used for learning the facial multi-view features; second, the learned multi-view invariance bases; and finally, the results of head pose perception.

The training data is selected from the pose subset, containing images of 1,040 subjects across seven different poses, of the CAS-PEAL-R1 face database [3]. The training data are generated as follows: for each facial image of view τ, detect and crop the face and resize it to 36 × 30 pixels. Each of the seven cropped faces is then reshaped into a 1,080-by-1 column vector, giving a sample u(τ); thus U(τ) ∈ R^(1080×1040) (τ = 1, 2, ..., T).

As described in Section 2, applying the presented learning algorithm to the training data, we obtain the facial multi-view features. A subset of the learned face features, also called 'ICAfaces', is shown in Fig. 3. From the figure, it is easily observed that the head poses of the 'ICAfaces' in each subset are the same.

In order to examine the multi-view invariant responses carefully, it is necessary to define what response invariance is. A multi-view invariant response means that the response of a neuron fluctuates only slightly while a face changes its view within the receptive field of the neuron. For example, we first randomly select from the face database a testing sample U(m, j) composed of seven faces with view angles α ∈ {±45, ±30, ±15, 0} degrees. Then we compute the responses through the seven
Fig. 3 Subsets of multi-view face features with −45, −30, −15, 0, 15, 30, and 45 degrees
Fig. 4 Invariance of responses with view change. The x-coordinate is the index of the neurons, and the y-coordinate is the responses. The top seven lines denote the responses corresponding to the view angles {−45, −30, −15, 0, +15, +30, +45} degrees. The remaining lines denote the dispersion between the upper seven and the second one
view features and plot them in Fig. 4. Here, the upper seven lines are the responses x(m, j, τ) (m = 1, 2, ..., 400; τ = 1, 2, ..., 7; j = 1 for instance), whose head poses of the face bases are the same as those of the stimuli. The remaining seven lines denote the dispersion between the upper seven and the second one. From the figure, the similarity of the responses is very high while a face changes its view within the receptive fields.

As we have seen, the responses of the neurons show an invariant characteristic when an individual's head rotates horizontally from left to right within their receptive fields. What, then, are the properties of the neuronal responses when faces of different views are projected onto the subspace spanned by some view bases? In general, a neuron fires when its response is bigger than a firing threshold. For simplicity of computation, we limit the number of firing neurons to some N; when the seven groups of neurons receive a stimulus, the number of firing neurons is thus at most N. To investigate the generality of the responses, we randomly select three individuals' seven different view faces as stimuli, shown in Fig. 5. Through the different view bases, there are in total forty-nine groups of responses, arranged in a 7 × 7 grid. In the grid, the seven blocks in the first row illustrate the responses of neurons to seven different view faces through the first facial bases A1. Following this pattern, the second row is
Fig. 5 The responses of neurons at the middle layer. Examples of stimuli are in the first row. The distribution of responses of 100, 50, and 10 firing neurons is shown in the second row. See text for more details
through A2, and so on. In Fig. 5, the top row shows seven different view faces of one individual; the first grid shows the responses for N = 100, and the others those for N = 50 and N = 10. It is worth pointing out three important observations from these invariance representations. First, neurons respond strongly to a stimulus if the view of their receptive fields is the same as that of the face bases, as shown by the diagonal blocks in the grid. This is consistent with observations from neurophysiological experiments: a neuron fires strongly when the stimulus has the same feature as its receptive field [4]. Second, the neurons of neighboring views also fire easily, although their responses are not stronger than those of the exact-view neurons. And third, only a few neurons are needed to perceive the head pose, in accord with the main idea of Sparse Coding [9].

We further study the distribution of the responses. Without loss of generality, we randomly select seven view faces of fifty individuals and the group of view −30 degree bases. The resulting responses are projected onto a 3D subspace spanned by three principal components obtained by dimensionality reduction using Principal Component Analysis. The projection is shown in Fig. 6. The figure shows that the responses of the continuous view faces through the same bases form a manifold in the subspace of multi-view bases. This result is similar to the results obtained using LLE [11]. The promising manifold structure further explains the feasibility of head pose perception using the proposed model.

Having shown the view invariance in the feature representation, we apply the model to the perception of head pose, or facial view. The view perception method is described as follows. First, we randomly select 4,000 faces with different view angles α ∈ {±45, ±30, ±15, 0} degrees from the face database. Then we feed every face image to the proposed perception model. We limit the number of firing neurons responding to a stimulus; with a maximum number of firing neurons N, there should be only one group at the second layer with N firing neurons. That is
Fig. 6 Manifold distribution of responses
Fig. 7 Accuracy of head pose perception. The x-coordinate is the number of firing neurons, and the y-coordinate is the perception accuracy. (left): exact head pose perception. (right): head pose perception allowing errors of neighboring poses
the winner. The neuron at the third layer connected to the winner group gives the resulting head pose. The head pose perception results are shown in Fig. 7. The figures show that a few firing neurons are enough to perceive the face view; for example, in the left of Fig. 7, four neurons suffice for high perception accuracy. In other words, the proposed model performs well in that neurons fire sparsely when they receive stimuli of faces in different views. This result is similar to theoretical work [9] suggesting that natural scenes can be efficiently represented by a sparse code in the primary visual cortex.

We also consider head pose perception with a tolerance for neighboring poses. That is, a face of known view is accepted if it is perceived as one of its neighboring views; for example, −15 degrees may be estimated as −30 or 0 degrees. Such errors can arise because the head pose labeled in the database is not exact ground truth, owing to the weak control of the individuals at the time the photographs were taken. The average accuracy with accepted neighboring head poses is 99.69%, higher than that of exact head pose perception.
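The winner-group decision just described can be sketched as follows. The Gaussian firing-threshold details are simplified to a fixed budget of N firing neurons, and all names are illustrative:

```python
import numpy as np

def perceive_pose(u, bases_per_view, N=4):
    """Winner-group head pose decision: the group holding the most of the N
    most strongly firing neurons wins.

    bases_per_view : list of seven (n_neurons, n_pixels) filter matrices,
    one per view angle; u : flattened face image vector.
    """
    responses = np.concatenate([W @ u for W in bases_per_view])
    group = np.repeat(np.arange(len(bases_per_view)),
                      [W.shape[0] for W in bases_per_view])
    firing = np.argsort(np.abs(responses))[-N:]   # indices of firing neurons
    counts = np.bincount(group[firing], minlength=len(bases_per_view))
    return int(counts.argmax())                   # winning view group
```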
Compared to similar works, our method shows some advantages in simplicity, effectiveness, and accuracy. First, testing on the same CAS-PEAL face database, Ma et al. [7] used a combination of an SVM classifier and LGBPH features for pose estimation and reported a best accuracy of 97.14%, whereas the worst average accuracy we obtained is 97.5% using 20 firing neurons, and 98.1% using 4 neurons. Second, in preprocessing the face images, Ma et al. applied geometric normalization, histogram equalization, and a facial mask, and cropped the images to 64 × 64 pixels, while we simply and automatically detect and crop the faces and resize them to 36 × 30 pixels owing to limited computing resources. Our preprocessing is therefore simpler and the results are better.
4 Conclusions We have proposed a novel framework of learning temporal features and a computational model for head pose perception. Computer simulation results show that our proposed model successfully performs the task of head pose perception. Acknowledgments The work was supported by the National High-Tech Research Program of China (Grant No. 2006AA01Z125) and the National Basic Research Program of China (Grant No. 2005CB724301).
References

1. H.B. Barlow. Redundancy reduction revisited. Network-Comp Neural, 12:241–253, 2001.
2. A. Cichocki and L. Zhang. Two-stage blind deconvolution using state-space models. In: Usui S, Omori T (Eds.) The fifth international conference on neural information processing (ICONIP), Kitakyushu, Japan, pp 729–732, 1998.
3. W. Gao, B. Cao, et al. The CAS-PEAL large-scale Chinese face database and evaluation protocols. Technical Report No. JDL-TR-04-FR-001, Joint Research & Development Laboratory, 2004.
4. D. Hubel and T. Wiesel. Receptive fields and functional architecture of monkey striate cortex. J Physiol, 195:215–243, 1968.
5. A. Hyvarinen and P.O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput, 12(7):1705–1720, 2000.
6. S. Li, X. Lv, et al. Learning multi-view face subspaces and facial pose estimation using independent component analysis. IEEE T Image Process, 14(6):705–712, 2005.
7. B. Ma, W. Zhang, et al. Robust head pose estimation using LGBP. In: Tang YY, Wang SP, Lorette G, Yeung DS, Yan H (Eds.) Proceedings of the international conference on pattern recognition, 2:512–515, 2006.
8. S. McKenna and S. Gong. Real time face pose estimation. Real-Time Imaging, 4(5):333–347, 1998.
9. B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
10. W. Peng and J. Qiang. Multi-view and eye detection using discriminant features. Comput Vis Image Und, 105:99–111, 2007.
11. L.K. Saul and S.T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. J Mach Learn Res, 4:119–155, 2003.
12. L. Zhang, A. Cichocki, and S. Amari. Self-adaptive blind source separation based on activation function adaptation. IEEE T Neural Network, 15(2):233–244, 2004.
Geometrical Approaches to Active Learning Kamil Adiloglu, Robert Annies, Falk-Florian Henrich, Andr´e Paus, and Klaus Obermayer
Abstract Learning from examples is a key property of autonomous agents. In our contribution, we focus on a particular class of strategies which are often referred to as "optimal experimental design" or "active learning". Learning machines which employ these strategies request examples that are maximally "informative" for learning a predictor, rather than "passively" scanning their environment. There is a large body of empirical evidence that active learning is more efficient in terms of the required number of examples. Hence, active learning should be preferred whenever training examples are costly to obtain. In our contribution, we report new results for active learning methods which we are currently investigating and which are based on the geometrical concept of a version space. We derive universal hard bounds for the prediction performance using tools from differential geometry, and we also provide practical algorithms based on kernel methods and Monte Carlo techniques. The new techniques are applied in psychoacoustical experiments for sound design.
K. Adiloglu, R. Annies, F.-F. Henrich, A. Paus, and K. Obermayer
Neural Information Processing, Technische Universität Berlin, Germany
e-mail: [email protected]

1 Introduction

Active learning methods seek a solution to inductive learning problems by incorporating the selection of training data into the learning process. In these schemes, the labeling of a data point occurs only after the algorithm has explicitly asked for the corresponding label, and the goal of the "active" data selection is to reach the same accuracy as standard inductive algorithms – but with fewer labeled data points (see [1–4]). In many practical tasks, the acquisition of unlabeled data can be automated, while the actual labeling must often be done by humans and is therefore time consuming and costly. In these cases, active learning methods – which usually trade labeling costs against the computational burden required for optimal data
selection – can be a valuable alternative. Note that performance analysis in active learning is challenging because the i.i.d. assumption on the training data is violated. Theoretical work has shown that active learning strategies can achieve an exponential reduction of the generalization error with high probability (see [2, 5]).

In this contribution, we put the emphasis on a mathematically rigorous derivation of hard upper bounds for the generalization error. This is in contrast to other studies which give bounds in probability (see [2]) or discuss asymptotic behavior (see [6]). After discussing a query filtering approach (kernel billiard) to active learning, we give a mathematical analysis of the generalization error in a spherical subdivision scheme. In this approach, we successively cut a version space – the set of models consistent with the seen training data (see [7]). This leads to hard upper bounds on the generalization error, which decrease exponentially in some cases. While we accommodate non-separable data in the case of the kernel billiard, we restrict the analysis of the generalization error to the linearly separable case. Finally, we consider practical applications to psychoacoustic experiments, a case with high costs for the acquisition of labels. Here, the presentation of examples to human subjects can be shortened using active learning techniques. The trained classifiers are used to construct tools for sound design that can predict the perception of synthesized sounds.
2 Active Learning by Reduction of Version Space

Our approach to active learning is to generalize the best strategy for the game of High-Low. The learner's task in this game is to guess an unknown number on an interval between 1 and 100. The only feedback the teacher gives is whether the student's guess was too high or too low. The best strategy for this game is to always ask for the middle number of the remaining interval (interval halving), and it achieves an exponentially decreasing interval on average.

Let us first consider a linearly separable binary classification problem on the n-dimensional sphere $S^n = \{x \in \mathbb{R}^{n+1} \mid \langle x, x\rangle = 1\}$. Later on, we will show how to extend this approach to noisy, non-linear classification problems in $\mathbb{R}^n$. Machine learning of spherical (or directional) data has become an active subject of research during the last couple of years (see [8]). On $S^n$, we use hemispheres

$$c: S^n \to \mathbb{Z}_2, \qquad c(x) := \begin{cases} +1 & : \langle x, p\rangle \geq 0 \\ -1 & : \langle x, p\rangle < 0 \end{cases}$$

to classify the given data. Here, $p \in S^n$ is the center of the hemisphere, and $\langle \cdot, \cdot\rangle$ denotes the Euclidean scalar product in $\mathbb{R}^{n+1}$. In this setup, the sphere serves a twofold purpose: firstly, it represents the data space; secondly, it functions as the space of classifiers, because each hemisphere classifier $c$ is uniquely determined by its center $p$. We assume that there exists some unknown true classifier $c^{*}$, which also takes the form of a hemisphere.
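As a sketch, the hemisphere classifier is essentially a one-liner; the names are illustrative:

```python
import numpy as np

def hemisphere_classify(X, p):
    """c(x) = sign(<x, p>) for rows of X on S^n; p is the hemisphere center."""
    return np.where(X @ p >= 0.0, 1, -1)
```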
Fig. 1 Lhs: an illustration of an active learning algorithm. While usually the train set {(x_i, y_i)} consists of data points randomly chosen from an input space, it is preferable to use a query algorithm to optimize the selection if the labels y_i given by an oracle are costly. After the training algorithm has searched for the learner's parameters w which minimize the generalization error, the query algorithm selects the next most informative data point, depending on the state w of the learner. Rhs: an illustration of a version space, given by the intersection of three hemispheres corresponding to three labeled data points
If $(x, 1)$ is a labeled data point, it follows that $c^{*}$ is contained in the closed hemisphere around $x$. Thus, given a labeled set $\{(x_1, y_1), \ldots, (x_I, y_I)\}$, $c^{*}$ is contained in the intersection of the corresponding hemispheres. This intersection, which we call the version space $V$, corresponds to the remaining interval in the game of High-Low. Geometrically, $V$ is a convex spherical polytope (see Fig. 1, rhs.). Each labeled data point $(x_i, y_i)$ corresponds to an orthogonal hyperplane $H_i$, which divides the classifier sphere into two hemispheres $C_i^{\pm}$, one of which (depending on the data point's true label $y_i$) contains the true classifier $c^{*}$.

At each iterative step $\ell$, given the labeled training set $T_\ell$, the version space $V_\ell$ is the intersection of the corresponding hemispheres $(H_i)_{i \leq \ell}$. In addition to the labeled data points, we assume a pool of unlabeled data points $P = \{x_i \in S^n \mid \ell < i \leq \ell + k\}$. The hyperplane $H_i$ corresponding to each unlabeled data point $x_i$ divides the version space $V_\ell$ into up to two partitions $\pi_i^{+1}$ and $\pi_i^{-1}$, which contain all classifiers that assign the label $+1$ or $-1$ to $x_i$, respectively. If a hyperplane $H_i$ does not intersect the version space, then the complete version space labels $x_i$ consistently, and a query for the label would be unnecessary.

In general, given a flat probability distribution for the unknown true classifier $c^{*}$, the probability that $y_i = 1$ is $p(y_i = 1) = \|V \cap C_i^{+}\| / \|V\|$. The informativeness of a query, measured as the entropy $H = -\sum_{y_i} p(y_i) \ln p(y_i)$, is maximized if its hyperplane $H_i$ divides the version space into two partitions of equal area.

Since in practice it is not possible to derive the version space area analytically, numerical approaches have to be applied. A simple Monte Carlo integration does not scale well in high dimensional data spaces and for small version space areas. Therefore, we use kernel billiard sampling (see [9, 10]). This algorithm tracks the
path of a point (the billiard ball $b$) on the classifier hypersphere surface inside the version space; the ball is given a direction $v$ and moves along a geodesic. Both the billiard ball position and its direction can be expressed as linear combinations of the training data points $x_i$. In each iteration of the algorithm, the intersection point of the trajectory with the version space boundary is determined, and the direction of the billiard point is reflected at the boundary. For a fixed number of reflections, its path length $l_i^{\pm}$ inside each version space partition $\pi_i^{\pm 1}$ is integrated. Given an ergodic trajectory, the ratio between the two path integrals approximately equals the ratio of the version space areas.

To extend our approach to linearly separable data in $\mathbb{R}^n$, we introduce an additional dimension and embed the data space $D$ as the affine hyperplane $(x_{n+1} = 1)$. By normalization we then project the data points onto the sphere $S^n$. The intersection of each classifier hyperplane $H$ with the embedded data space $D$ yields an affine separating hyperplane in $D$ (see gnomonic projection in Section 3). Nonlinear classification can be achieved, similarly to support vector machines, by replacing the linear classifier by a kernel classifier

$$c(x) = \operatorname{sign}\left(\sum_{i=1}^{\ell} \alpha_i\, k(x_i, x)\right).$$
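A minimal sketch of this kernel decision rule, with a Gaussian kernel as in the experiment reported below; the function names and the choice of gamma are illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_classify(alpha, X_train, X_test, gamma=1.0):
    """c(x) = sign(sum_i alpha_i k(x_i, x)) for each row x of X_test."""
    return np.sign(alpha @ gaussian_kernel(X_train, X_test, gamma))
```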
If we choose a symmetric and positive definite kernel $k$, it is known from the theory of reproducing kernel Hilbert spaces (see [11]) that this classifier can be expressed as a linear classifier in a feature space $F$,

$$h(x) = \operatorname{sign}\left(\langle p, \phi(x)\rangle_F\right),$$

where $\phi(x)$ is a mapping from the data space to the feature space and $p = \sum_i \alpha_i \phi(x_i)$ is a linear combination of the mapped training data points. Since the data point vectors enter the equations of the kernel billiard algorithm only through inner products, these can be replaced by kernel functions. Training errors can be taken into account by allowing soft version space boundaries,

$$y_j \sum_i \alpha_i\, k(x_i, x_j) \geq -\lambda\, y_j\, \alpha_j\, k(x_j, x_j),$$

where $\lambda$ controls the softness. This $\lambda$ enters the kernel matrix as a term on its diagonal.

Figure 2 shows the results of an experiment using the 5-dimensional thyroid benchmark dataset¹ and a Gaussian kernel function. To approximate the generalization error, 100 realizations of the dataset were split into 140 training vectors and 75 test vectors. It can be seen from the figure that with our active learning approach, the learning curve decreases exponentially and the minimal generalization error is reached already after 40 labeled data points. In contrast, with inductive learning, all 140 data points have to be labeled to achieve the minimal generalization error.
¹ http://archive.ics.uci.edu/ml/machine-learning-databases/thyroid-disease/new-thyroid.data
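The area ratios that the billiard estimates can also be approximated, in the linear noise-free case, by naive rejection sampling. The following stand-in picks the pool point that splits the sampled version space most evenly (maximum entropy); it is not the billiard algorithm and it scales poorly in high dimensions, which is exactly why the billiard is used instead:

```python
import numpy as np

def most_informative_query(X_lab, y_lab, X_pool, n_samples=5000, seed=0):
    """Monte Carlo stand-in for the billiard's area estimate (linear case).

    X_lab : (l, d) labeled points on the sphere, y_lab : (l,) labels in
    {+1, -1}, X_pool : (k, d) unlabeled pool. Assumes the version space is
    large enough to be hit by sampling. Returns the best query index.
    """
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((n_samples, X_lab.shape[1]))
    C /= np.linalg.norm(C, axis=1, keepdims=True)         # uniform classifiers
    V = C[np.all((C @ X_lab.T) * y_lab >= 0.0, axis=1)]   # version space sample
    frac = (V @ X_pool.T >= 0.0).mean(axis=0)             # estimated p(y_i = +1)
    return int(np.argmin(np.abs(frac - 0.5)))             # most even split
```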
Fig. 2 Lhs: this illustration of the kernel billiard algorithm shows a part of the surface of the unit sphere. The intersection of four hemispheres, which correspond to four labeled training data points x_1, ..., x_4, forms the version space V. An unlabeled data point x partitions V into two halves. To estimate the ratio of the areas of these partitions, a mass point generates a trajectory t. Rhs: logarithmic plot of the learning curves for inductive and active learning using the UCI thyroid dataset. The dashed line shows an exponential regression fit for the first 35 data points. The dotted line shows the computation time for each iteration
The computational costs are independent of the size of the version space, since they do not depend on the length of the trajectory segments. They are linear in the size of the data pool and quadratic in the size of the training set.
3 Error Bounds for Spherical Subdivision

We now analyze the performance of an active learning algorithm on the n-sphere $S^n$ which is based on the concept of version space as defined in Section 2. Due to computational complexity, it is practically impossible to compute the version space explicitly. One solution is to work with spherical simplices, which are the generalization of triangles to higher dimensions. The strategy is to use an equilateral simplex as initialization and then repeatedly subdivide it along one of its edges of maximal length. For each iteration, we compute a unit normal vector $u \in S^n$ for the plane which cuts the simplex orthogonally to the edge $e$ of maximal length. Depending on the label of $u$, we continue with either part of the simplex. In a final step, one may choose a learned classifier from the reduced version space. A computationally feasible choice is the simplex's center of mass. Alternatively, one could use an SVM to approximate the center of the largest inscribable hypersphere of the simplex (see [12]). In order to make an exact mathematical analysis feasible, we always require the label of the optimal data point $u$ instead of using an approximation. Figure 3 illustrates one iteration of the simplex algorithm on the sphere $S^2$. The worst case generalization error after each iteration can be computed by evaluating the maximum
Fig. 3 The drawing on the left shows one iteration of the simplex algorithm for spherical triangles on the unit sphere $S^2$. The current version space is the spherical triangle (a, b, c). The longest edge (b, c) is about to be cut at its midpoint m. Together with the origin o, the vertex a and the point m define a plane in $\mathbb{R}^3$, one of whose unit normal vectors is $u \in S^2$. Depending on the label of u, the new triangle will be either (a, b, m) or (a, m, c). The diagram on the right shows learning curves on $S^{49} \subset \mathbb{R}^{50}$. The figure shows the average maximal edge length (upper solid line), the average maximal distance from the simplex's center of mass (upper dashed line), and the average approximate generalization errors for the uniform (lower dashed line) and an aspherical (lower solid line) data density as a function of the number of selected training examples. Error bars indicate variances; however, only the approximate generalization error for the aspherical data density shows large fluctuations between simulation runs
spherical distance of the chosen classifier to the vertices of the spherical simplex. To be explicit, the following statement holds:

Proposition 1. If $S$ is the current simplex from the simplex algorithm with vertices $v_1, \ldots, v_{n+1} \in S^n$, and $c \in S$ some classifier, then

$$d_G(c^{*}, c) \leq \max_{i,j} d(v_i, v_j) = \max_{i,j} \arccos\langle v_i, v_j\rangle.$$
This bound is tight and attainable if we allow any element of the version space to be the learned classifier. Moreover, if $c \in S$ denotes the center of mass, then

$$d_G(c^{*}, c) \leq \max_{i} d(c, v_i) = \max_{i} \arccos\langle c, v_i\rangle$$
is a tight and attainable upper bound for the generalization error.

In order to include the classical case of linear classifiers with bias in $\mathbb{R}^n$ into our spherical setup, we use the gnomonic projection

$$\varphi: S^n \supset \{x_{n+1} > 0\} \to \mathbb{R}^n, \qquad (x_1, \ldots, x_{n+1}) \mapsto \frac{(x_1, \ldots, x_n)}{x_{n+1}}$$

to map the data and the classifiers to Euclidean space. Note that our separating great spheres are projected to affine hyperplanes in $\mathbb{R}^n$. Now the derived error bounds apply as long as the data in $\mathbb{R}^n$ is distributed according to the density
$$\omega_x = \frac{1}{\left(1 + \|x\|^2\right)^{\frac{n+1}{2}}}\; dx_1 \wedge \ldots \wedge dx_n. \quad (1)$$
We may adapt the bounds to more general, that is, non-uniform, data densities by using a scaling function $f: S^n \to \mathbb{R}^{+}$ and expressing the new density as $\tilde{\omega} = f\omega$, where $\omega$ is the uniform density. Changing the density forces a change in the shape of geodesics, so we cannot compute them explicitly anymore. As a side-effect, the quality of the solutions obtained by the kernel billiard algorithm degrades as soon as the data is not distributed uniformly on the sphere. For simplex-halving, we obtain the following approximations:
Proposition 2. Denote by $\tilde{d}_G$ the generalization distance induced by the scaled density $\tilde{\omega} = f\omega$, where $f: S^n \to \mathbb{R}^{+}$ is some positive smooth scaling function. If $\sup_{x \in S^n} |1 - f(x)| < \varepsilon$ for some $\varepsilon > 0$, then

$$\tilde{d}_G(c_1, c_2) \leq \frac{(1 + \varepsilon)\,\mathrm{Vol}(S^n)}{\pi\, \widetilde{\mathrm{Vol}}(S^n)}\, d(c_1, c_2),$$

where $d$ is the canonical geodesic distance of $S^n$. In this formula, $\mathrm{Vol}(S^n) := \int_{S^n} \omega$ and $\widetilde{\mathrm{Vol}}(S^n) := \int_{S^n} \tilde{\omega}$ denote the volumina of $S^n$ with respect to $\omega$ and $\tilde{\omega} := f\omega$.

Proposition 3. Let $S$ be the current simplex from the simplex algorithm with vertices $v_1, \ldots, v_{n+1} \in S^n$, and $c \in S \subset S^n$ an arbitrary classifier, not necessarily the center of mass. If $c^{*} \in S^n$ denotes the unknown true classifier, the generalization error of $c$ is bounded by
$$\tilde{d}_G(c, c^{*}) \leq \frac{(1 + \varepsilon)\,\mathrm{Vol}(S^n)}{\pi\, \widetilde{\mathrm{Vol}}(S^n)}\, \max_{i,j} d(v_i, v_j),$$

where $d(v_i, v_j)$ denotes the spherical distance of the vertices. We note that for special data spaces we can analytically deduce an exponential decrease of the generalization error – regardless of the data density (see [13]). The above results can be applied to the performance assessment of active learning algorithms by comparing them to spherical subdivision.
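One iteration of the subdivision can be sketched as below. The construction of the cutting normal u follows Fig. 3, the oracle stands for the label request, and all names are illustrative:

```python
import numpy as np

def halve_simplex(V, oracle):
    """Cut the current spherical simplex along its longest edge.

    V : (n+1, n+1) array whose rows are the vertices v_1..v_{n+1} on S^n;
    oracle(u) returns the true label of the cutting point u in {+1, -1}.
    """
    D = np.arccos(np.clip(V @ V.T, -1.0, 1.0))
    i, j = np.unravel_index(np.argmax(D), D.shape)   # longest edge (v_i, v_j)
    m = V[i] + V[j]
    m /= np.linalg.norm(m)                           # its spherical midpoint
    # the cutting hyperplane through the origin is spanned by m and all
    # vertices except v_i and v_j; u is a unit normal of that hyperplane
    others = [V[k] for k in range(len(V)) if k not in (i, j)]
    B = np.linalg.qr(np.stack([m] + others, axis=1))[0]   # orthonormal span
    u = V[j] - B @ (B.T @ V[j])
    u /= np.linalg.norm(u)
    keep_i = oracle(u) * (V[i] @ u) > 0   # does the v_i side contain c*?
    V = V.copy()
    V[j if keep_i else i] = m             # discard the other half
    return V
```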
4 Application in Psychoacoustic Experiments

The perception of everyday sounds is a focus of psychoacoustical research: firstly, to learn more about sound perception and recognition in natural and urban environments; secondly, to gain knowledge of how to adapt the sound-generating processes of technical devices to support their usability, to design sounds for auditory displays, or to reflect or induce certain emotions in the listener. To achieve this we need to analyze the relation between sound and human recognition. Training predictors is a way to study this relationship.
While direct analysis of the neural processes that effect the recognition of sounds is still not possible, we can collect data from human subjects in psychoacoustic experiments. Subjects listen to sounds and categorize them. This data can be used to model the human categorization, i.e. to estimate the mapping sound → class. As tools for these experiments we need (a) a set of sounds and (b) an adaptive prediction model that can be trained by examples. While it is possible to use either recorded or synthesized sounds, the latter have the advantage that one can generate arbitrarily many of them under similar conditions. Since we are studying realistic environmental sounds, we chose physical modeling as the sound synthesis process. The approach is to model the sound sources physically by sets of equations describing oscillations that arise from collisions of objects or other impacts; the resulting signal is calculated in real-time simulations of the temporal processes. This was done with the sound design framework The Sounding Object (see [14]). In particular, the physical model we used simulates impact sounds with acoustically varying material types: glass, metal, plastic, wood. It is controlled by six independent control parameters.

During the experiments, subjects had to classify the sounds into one of the four classes, one by one. The acquisition of labels takes time, and the subjects cannot label a large amount of sounds in succession, since people tire over time while the quality of the labels decreases. However, a large amount of unlabeled examples is at hand or can be generated. Thus we need to make sure that the necessary amount of labels can be kept small. As a prediction model we therefore used a kernel perceptron that is adapted online and uses version space reduction (see Section 2) to actively query the next most informative sound from a pool of data points; see Fig. 4a.

The psychophysical experiments were done with both active and standard inductive learning for comparison. First, inductive learning was applied, and using the
a A parameter vector x is used to synthesize a new sound by the physical model (synthesizer). After listening, the subject labels it with y ∈ {glass, metal, plastic, wood}. The learning machine updates its prediction model and the querying algorithm suggests a new sound
b Plotted is the number of labels (y-axis) required for training to reach a test error rate of 0.35 with active and inductive learning for each subject (x-axis). The task was to train a material predictor for sounds, given human-labeled data
Fig. 4 Psychoacoustic experiment using active learning
resulting data, an active learning run was simulated, i.e. the order of examples during training was optimized by the query algorithm. The learning progress was measured in terms of a test error calculated from a set of 100 examples, labeled by the subject, which were not available for training. The test error shows how well a trained classifier mirrors the perception of one subject. Figure 4b shows the difference in the number of labels required to reach the same generalization in terms of the test error. To reach a prediction error of 35%, standard inductive learning required on average 2.49 times as many labels as active learning.

These experiments show that active learning can be used in psychoacoustic experiments to train predictors while keeping the amount of necessary labels low.

Acknowledgment We thank our partners of the EU project CLOSED²: the Perception and Sound Design team at IRCAM³ for conducting psychoacoustic experiments and the VIPS lab⁴ for contributing the physical sound model. The project is funded by the Sixth Framework Programme of the EU.
References

1. S. Fine, R. Gilad-Bachrach, and E. Shamir. Learning using query by committee, linear separation and random walks. Theoretical Computer Science, 284:25–51, 2002.
2. Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2–3):133–168, 1997.
3. M. Opper, H. S. Seung, and H. Sompolinsky. Query by committee. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 287–294, Pittsburgh, PA, 1992.
4. S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2:45–66, 2001.
5. M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 65–72, 2006. ACM Press, New York.
6. F. R. Bach. Active learning for misspecified generalized linear models. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 65–72. MIT Press, Cambridge, MA, 2007.
7. T. M. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1982.
8. A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6:1345–1382, 2005.
9. R. Herbrich, T. Graepel, C. Campbell, and C. K. I. Williams. Bayes point machines. Journal of Machine Learning Research, 1(4):245–278, 2001.
10. P. Rujan. Playing billiards in version space. Neural Computation, 9(1):99–122, 1997.
11. G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
12. R. Herbrich. Learning Kernel Classifiers – Theory and Algorithms. Adaptive Computation and Machine Learning. MIT Press, 2002.
13. F.-F. Henrich and K. Obermayer. Active learning by spherical subdivision. Journal of Machine Learning Research, 9:105–130, 2008.
14. D. Rochesso and F. Fontana. The Sounding Object. Mondo Estremo, Firenze, Italy, 2003.
² http://closed.ircam.fr, FP6-NEST-PATH "measuring the impossible" project no 29085.
³ Institut de Recherche et Coordination Acoustique/Musique, Paris, France.
⁴ Vision, Image Processing and Sound lab., University of Verona, Verona, Italy.
Detecting Drowsiness in Driving Simulation Based on EEG Jia-Wei Fu, Mu Li, and Bao-Liang Lu∗
Abstract Although sleep and wake can easily be distinguished using EEG, detecting a drowsy state before the subject falls asleep is of greater importance for avoiding the fatal consequences of accidents behind the steering wheel caused by a low level of vigilance. Starting with the classical problem of distinguishing between wake and sleep, we propose a method based on probabilistic principal component analysis (PPCA) and succeed in detecting drowsiness as distinguished from wake and sleep based on EEG.
J.-W. Fu, M. Li, and B.-L. Lu
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail: {gawafu, mai_lm, bllu}@sjtu.edu.cn

∗ This work was partially supported by the National Natural Science Foundation of China under the grants NSFC 60473040 and NSFC 60773090, the Key Laboratory for Intelligent Computing and Intelligent Systems, Ministry of Education and Microsoft, and the Okawa Foundation Research Grant.

1 Introduction

Vigilance is a term which describes the ability of observers to maintain their focus of attention and to remain alert to stimulation for prolonged periods of time. In our daily lives, and especially in human-machine interaction systems, it is very important for the operator to keep vigilance above a constant level. For example, low vigilance due to drowsiness is known to be a major factor in automobile crashes [1]. Once the subject feels too tired to operate the machine, he should be warned to stop working. During the past few decades, studies have shown that information related to vigilance can be found in the EEG [2]. Compared to other physical and physiological signals, e.g. facial expression and skin potential [3], EEG represents the nature of human brain activity and may have advantages in time resolution and accuracy. Previous results demonstrate some specific changes in the EEG spectrum [4]
while the vigilance state changes, e.g. an increase in slow activity and a decrease in fast activity while the subject falls asleep. However, the usual quantitative approach shows inter-subject variability of the key parameters denoting decreased alertness [7]. In previous works, audio or visual stimuli are used and the subject is required to give some responses [8]; reaction time performance is then the measurement for vigilance. In this mode, supervised learning methods such as artificial neural networks (ANN) can be used to classify different states of vigilance, but the stimuli may introduce some noise. Therefore in [10], the author proposed a semi-supervised learning algorithm which can quickly label a huge amount of data. Here we propose another kind of semi-supervised learning method, based on probabilistic principal component analysis (PPCA), to classify EEG data with little label information.

Our main goal is to recognize states other than clear-headed. Using multichannel EEG signals, we start with a 2-state problem to distinguish the sleep and wake (clear-headed) states. We apply PPCA to EEG data with little label information to extract useful features, and then perfectly classify sleep and wake. With a drowsy state added, we use a one-vs-one strategy to decompose the problem; the final prediction is generated by voting.

This paper is organized as follows. In Section 2, we introduce our experimental setup for the driving simulation. The method used to analyze the EEG signals is presented in Section 3. In Section 4, we give the results of our research on both the 2-state and the 3-state problems. Finally, conclusions are drawn in Section 5.
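The one-vs-one decomposition with voting can be sketched as follows; the model interface is hypothetical:

```python
from itertools import combinations

STATES = ("wake", "drowsy", "sleep")

def one_vs_one_predict(models, x):
    """Majority vote over the three pairwise classifiers.

    models : dict mapping a state pair such as ("wake", "drowsy") to a
    trained binary classifier whose predict(x) returns one of the two states.
    """
    votes = {s: 0 for s in STATES}
    for pair in combinations(STATES, 2):
        votes[models[pair].predict(x)] += 1
    return max(votes, key=votes.get)       # the state with the most votes
```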
2 Experiment Setup

A driving simulation experiment (see Fig. 1) was designed to record EEG data at different vigilance levels. In a sound-proof room, the subject was required to drive with a steering wheel. The simulated scenes were displayed on a 19-inch LCD in front
Fig. 1 Driving simulation environment. Driving scenes were displayed on the LCD screen (3). EEG is captured from the scalp by the electrode cap (1), and the subject's facial expression is recorded by a DV camera (2)
of him. To make the subject feel drowsy, the experiment map consisted only of two long straight roads and two spin turns. After 10 min of practice, the subject drove the car along the road and tried to keep the speed between 40–60 km/h. Meanwhile, two DV cameras were used to record the subject's facial expression and the screen. Since the speed was always changing, once the subject was too tired to control the car, we could observe this by checking the video tape. The experiment lasted one and a half hours. Ten healthy young volunteers, aged 18–28 years (all male), took part in our driving simulation experiment. They were required to abstain from alcohol and caffeine for one day before the experiment. During the experiment, the subject was fitted with a 62-channel electrode cap. All Ag/AgCl electrodes were arranged according to the extended international 10–20 system, with bipolar references placed behind the ears. The contact impedance between the electrodes and the skin was kept below 50 kΩ. The EEG data were recorded at a sampling rate of 1,000 Hz. After the experiment, three time segments (sleep, drowsy and wake) were labeled according to the videos. Usually, at the beginning the subject is clear-headed, and this state may last for several minutes; in the video the subject can be seen driving with eyes open and able to keep the speed. As time passed, the subject fell asleep (closed his eyes and stopped driving); such segments were labeled as sleep. The segment between sleep and wake was labeled as drowsy. In the video, we can observe that the subject blinks rapidly and sometimes fails to keep the speed during the drowsy state.
3 Methods

The data processing procedure is described in Fig. 2.
Fig. 2 Data processing flowchart. Assume the raw data is a channel-by-time (N-by-T) matrix; after filtering by a band-pass filter, PPCA is used to whiten the data into an M-by-T matrix. Then the features f_1 and f_2 for classes C_1 and C_2 are calculated, respectively

3.1 Preprocessing

Since different frequency bands contain specific information, we have to filter the EEG data before processing. Considering possible inter-subject differences, we choose a
finite impulse response (FIR) filter and filter the data into six different frequency bands: α (8–12 Hz), lower α (8–10 Hz), upper α (10–12 Hz), β (19–26 Hz), γ (38–42 Hz) and a broad band (8–30 Hz) [5]. A further benefit of this preprocessing is that the usual artifacts such as EOG and EMG are also removed.
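To make the preprocessing concrete, the band-pass step could look like the following sketch. This is not the authors' code; it assumes SciPy's FIR design routines, a channel-by-time data matrix, and the 1,000 Hz sampling rate reported above.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

# The six frequency bands used in the paper (Hz).
BANDS = {
    "alpha": (8, 12), "lower_alpha": (8, 10), "upper_alpha": (10, 12),
    "beta": (19, 26), "gamma": (38, 42), "broad": (8, 30),
}

def bandpass_eeg(eeg, band, fs=1000.0, numtaps=501):
    """Band-pass filter an N-by-T EEG matrix with a linear-phase FIR filter.

    eeg  : ndarray of shape (n_channels, n_samples)
    band : (low, high) cut-off frequencies in Hz
    fs   : sampling rate in Hz (1,000 Hz in the experiment)
    """
    taps = firwin(numtaps, band, pass_zero=False, fs=fs)
    # filtfilt applies the filter forward and backward (zero phase),
    # along the time axis of the channel-by-time matrix.
    return filtfilt(taps, 1.0, eeg, axis=1)

# Example: extract the broad 8-30 Hz band from a synthetic 62-channel recording.
eeg = np.random.randn(62, 60 * 1000)
broad = bandpass_eeg(eeg, BANDS["broad"])
```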
3.2 Training

The training stage needs three segments of data labeled as wake, drowsy and sleep, respectively. We decompose the problem into three 2-state sub-problems: each time, two of the states are selected to train a model that classifies the chosen pair, so in total three models are generated. For example, when training the model to distinguish wake from drowsy, the filtered data segments of wake and drowsy are first reduced to D dimensions. Then an M-dimensional latent Gaussian random variable is assumed for each class, whose parameters are calculated by a second application of PPCA.
3.3 Main Algorithm

Given an N-channel spatial EEG signal x at time t, which is an N-by-1 vector, we assume x is an N-dimensional random variable whose conditional distribution given the latent variable z is defined as

p(x \mid z) = \mathcal{N}(x \mid Wz + \mu,\ \sigma^2 I),    (1)

where z is a Gaussian random variable with p(z) = \mathcal{N}(z \mid 0, I), W is an N \times D linear transformation matrix, and \mu and \sigma^2 govern the mean and variance of x. The marginal distribution of x is given by

p(x) = \mathcal{N}(x \mid \mu,\ WW^T + \sigma^2 I).    (2)
The parameters of this model are determined using maximum likelihood; the resulting model is called probabilistic principal component analysis (PPCA) [11]. Assume that we have T signals, denoted by X = \{x_t\}_{t=1}^{T}, with mean \bar{x} and covariance S:

\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_t, \qquad S = \frac{1}{T} \sum_{t=1}^{T} (x_t - \bar{x})(x_t - \bar{x})^T.    (3)
The log likelihood function corresponding to (2) is

\ln p(X \mid \mu, W, \sigma^2) = \sum_{t=1}^{T} \ln \mathcal{N}(x_t \mid \mu,\ WW^T + \sigma^2 I).    (4)
By setting the derivatives of (4) with respect to \mu, W and \sigma^2 to zero, respectively, we get
\hat{\mu} = \bar{x}    (5)

\hat{W} = U (L - \sigma^2 I)^{1/2}    (6)

\hat{\sigma}^2 = \frac{1}{N - D} \sum_{i=D+1}^{N} \lambda_i,    (7)
where (\lambda_1, \ldots, \lambda_N) are the eigenvalues of the covariance matrix S in descending order, L is the diagonal matrix \mathrm{diag}(\lambda_1, \ldots, \lambda_D), and U is an N \times D matrix whose columns are the eigenvectors of S corresponding to the eigenvalues in L. Then the sphere transformation P is given by

P(x) = L^{-1/2}\, \hat{W}^{\dagger} (x - \hat{\mu}),    (8)
which spheres x into a zero-mean, unit-covariance D-dimensional Gaussian random variable; here the symbol \dagger denotes the pseudoinverse. Further, assume x consists of two different kinds of EEG signals C_1 and C_2, namely sleep and wake. Let X_i = \{x \mid x \in C_i\}, and let B_i be the corresponding transformed signals, i.e. B_i = P(X_i), for i = 1, 2. Defining the covariance matrix S_i of B_i as in (3), we get S_1 + S_2 = I [6, 9]. Assuming an M-dimensional latent variable and applying PPCA for each class, the marginal distribution of b in B_i is given by

p(b) = \mathcal{N}(b \mid 0, \Sigma_i) \quad \text{with} \quad \Sigma_i = U_i (L_i - \hat{\sigma}_i^2 I) U_i^T + \hat{\sigma}_i^2 I,    (9)
where U_i, L_i and \hat{\sigma}_i^2 are defined similarly to those in Eqs. (5) through (7). In order to classify a newly arriving trial X', which is an N \times T matrix, into class C_1 or C_2, we first transform X' into B' using the transformation P, and then consider the feature f_i:

f_i = \sum_{t=1}^{T} \ln \mathcal{N}(b'_t \mid 0, \Sigma_i) = \ln p(B' \mid C_i)    (10)
    = \ln p(C_i \mid B') + c, \quad \text{for } i = 1, 2,

where c is a constant. We assume the b'_t are conditionally independent given C_i and that the prior probabilities p(C_i) and p(B') are constant; then f_i is the log class posterior probability p(C_i \mid B') plus a constant. The time complexity of our method is linear in the number of samples. The first application of PPCA costs O(N^2 \times D \times T) time, and the second costs O(D^2 \times M \times T). Since the parameters N, D and M are very small compared to the number of samples T, the time complexity is O(T).
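The closed-form solution above translates directly into code. The following is a minimal sketch of Eqs. (3), (5)–(8) and the feature of Eq. (10) — our illustration, not the authors' implementation:

```python
import numpy as np

def fit_ppca(X, D):
    """Maximum-likelihood PPCA fit, Eqs. (3) and (5)-(7).

    X : ndarray (N, T) -- N channels, T samples.
    Returns the mean, the matrix W-hat, sigma^2 and the top-D eigenvalues.
    """
    mu = X.mean(axis=1)                               # Eq. (5)
    S = np.cov(X, bias=True)                          # Eq. (3)
    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1], U[:, ::-1]                    # descending eigenvalues
    sigma2 = lam[D:].mean()                           # Eq. (7)
    W_hat = U[:, :D] * np.sqrt(lam[:D] - sigma2)      # Eq. (6)
    return mu, W_hat, sigma2, lam[:D]

def sphere(X, mu, W_hat, lam_top):
    """Sphere transformation P of Eq. (8)."""
    return np.diag(lam_top ** -0.5) @ np.linalg.pinv(W_hat) @ (X - mu[:, None])

def feature(B, Sigma):
    """Feature f_i of Eq. (10): log-likelihood of whitened samples under one class."""
    D, T = B.shape
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum("dt,de,et->t", B, np.linalg.inv(Sigma), B)
    return -0.5 * (T * (D * np.log(2 * np.pi) + logdet) + quad.sum())
```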
3.4 Voting

In the prediction process, a newly arriving trial is first filtered into the specific band that performed best in the training process. Each of the three trained models gives
a pair of preferences, proportional to the log class posterior probabilities, for the states that trial may belong to, and then gives a prediction result. The state predicted by the most models wins the vote and is the output.
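A sketch of this voting rule, with a hypothetical `models` mapping each class pair to a function returning the feature pair (f_i, f_j) of Eq. (10):

```python
def predict_state(trial, models):
    """One-vs-one voting over the three pairwise models (Sects. 3.2-3.4).

    models : dict mapping a class pair (ci, cj) to a scoring function
             that returns the features (f_i, f_j) for a trial.
    """
    votes = {}
    for (ci, cj), score in models.items():
        f_i, f_j = score(trial)
        winner = ci if f_i > f_j else cj
        votes[winner] = votes.get(winner, 0) + 1
    # The state predicted by the most pairwise models is the output.
    return max(votes, key=votes.get)
```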
4 Results

Five subjects who had shown a tendency to fall asleep during the driving simulation were chosen for further analysis. First, we tried to distinguish the wake state from the sleep state during the driving experiment using EEG. After that, we added one intermediate state between clear-headed and asleep, trying to distinguish the drowsy state from wake, which may be more meaningful.
4.1 Two-State

To simplify the problem, we selected two kinds of data, judged as sleep and wake, each about 10 min long. The data were divided into ten parts for cross-validation: seven for training and the rest for testing. We repeated this procedure ten times, selecting different parts for training, to obtain the average accuracy. In the training stage, we chose the dimension D that captures 90% of the total eigenvalue sum for the first application of PPCA (usually 10), and then a fixed dimension M = 3 for the second. In the test stage, data with a 5 s time window and 2.5 s overlap were used. A trial is classified as wake if the corresponding feature f_1 is larger than f_2, and vice versa. The test accuracies for the five subjects are shown in Table 1. Six commonly used frequency bands are considered: α, lower α, upper α, β, γ and a broad band, corresponding to 8–12, 8–10, 10–12, 19–26, 38–42 and 8–30 Hz, respectively. The best result for each subject is higher than 98% (100% for subject 2 and above 98% for subjects 1, 3, 4, and 5), obtained with different frequency bands. Notice that β, γ and 8–30 Hz give average accuracies across subjects of 97%, 98%, and 97%, respectively. Further, γ and 8–30 Hz are more stable across subjects than the other frequency bands.

Table 1 The test accuracies (%) of five subjects. Data are filtered into the six commonly considered frequency bands

Subject  Wake (min)  Sleep (min)     α     Lα     Uα      β      γ   8–30 Hz
1            13          13        89.6   82.9   93.2   98.2   94.1    98.8
2             6           6        84.0   91.3   85.3  100.0  100.0    96.8
3            11          11        96.9   87.6   95.9   98.4   98.5    96.1
4            13          13        88.1   88.9   86.3   98.1   98.1    95.7
5            10          10        93.8   91.1   96.7   89.1   98.4    96.2
4.2 Three-State

Since the two states of sleep and wake could be classified, we then added a transition state named drowsy. Three classifiers were trained, one between each pair of states. In the testing stage, the three classifiers vote, and the state that wins in at least two classifiers is the output. Again, the six frequency bands were considered. The test accuracies for the same five subjects are shown in Table 2. The accuracy declines considerably when a third state is added. Since the states of sleep and wake can be classified correctly, this means the drowsy state is quite similar to the other two. During the transition from wake to sleep, there may be fluctuations between different states. When the method is applied to the whole data of subject 3, a drowsy stage from 5 to 15 min is recognized, which had been recognized as wake in the 2-state condition. The overlap from 20 to 40 min shows that the subject's sleep was not deep enough (see Fig. 3). After 50 min, a continuous stage of sleep is recognized, which had been labeled as sleep with high probability in the 2-state condition.

Table 2 The test accuracies (%) of five subjects for the 3-state problem. Data are filtered into the six commonly considered frequency bands

Subject  Wake (min)  Drowse (min)  Sleep (min)     α     Lα     Uα      β      γ   8–30 Hz
1             8            8            8         77.8   77.5   73.7   88.5   92.3    82.0
2             6            6            6         87.6   87.3   78.2   78.2   82.6    78.0
3             6            6            6         87.0   79.5   84.6   94.9   95.8    91.6
4             8            8            8         78.5   78.1   75.2   86.0   93.5    82.3
5            10           10           10         78.4   66.1   80.8   80.8   82.7    76.1
Fig. 3 The performance of subject 3 during an 86-min driving simulation experiment. The x-axis is the time in minutes. Nine face video snippets are presented at the top, whose times correspond to the x-positions of the numbers below them. The classification results are given in the WAKE, DROWSE and SLEEP rows; each point represents a 5 s trial
Compared with the results in the 2-state condition, we can clearly extract drowsy states before the subject falls asleep. This will help us warn the driver when he starts to lose alertness.
5 Conclusions and Future Works

In this paper, we extend our previous work and propose a method based on probabilistic principal component analysis (PPCA) to distinguish wake, drowsy and sleep states in a driving simulation experiment. Though it is quite difficult to distinguish drowsy from sleep without overlap, the wake state is clearly different from the other two. Our method can recognize the drowsy state whenever the subject is not clear-headed (drowsy or sleep), which is very useful for preventing automobile accidents. After training with around 20 min of data (6–8 min for each state), our method can be used directly as a real-time classifier to estimate the driver's vigilance state.
References

1. George, C.F., Boudreau, A.C., Smiley, A.: Simulated driving performance in patients with obstructive sleep apnea. American Journal of Respiratory and Critical Care Medicine 154(1), 175–181 (1996)
2. Dement, W., Kleitman, N.: Cyclic variations in EEG during sleep and their relation to eye movements, body motility, and dreaming. Electroencephalography and Clinical Neurophysiology Suppl. 9(4), 673–690 (1957)
3. Hori, T.: Electrodermal and electro-oculographic activity in a hypnagogic state. Psychophysiology 19(6), 668–672 (1982)
4. Makeig, S., Jung, T.P.: Changes in alertness are a principal component of variance in the EEG spectrum. Neuroreport 7(1), 213–216 (1995)
5. Müller-Gerking, J., Pfurtscheller, G., Flyvbjerg, H.: Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology 110(5), 787–798 (1999)
6. Ramoser, H., Müller-Gerking, J., Pfurtscheller, G.: Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Transactions on Rehabilitation Engineering 8(4), 441–446 (2000)
7. Oken, B.S., Salinsky, M.C., Elsas, S.M.: Vigilance, alertness, or sustained attention: physiological basis and measurement. Clinical Neurophysiology 117(9), 1885–1901 (2006)
8. Makeig, S., Jung, T.P., Sejnowski, T.J.: Using feedforward neural networks to monitor alertness from changes in EEG correlation and coherence. Advances in Neural Information Processing Systems 8, 931–937 (1996)
9. Xu, W., Guan, C., Siong, C.E., Ranganatha, S., Thulasidas, M., Wu, J.: High accuracy classification of EEG signal. Proceedings of the 17th International Conference on Pattern Recognition, 391–394 (2004)
10. Shi, L.C., Yu, H., Lu, B.L.: Semi-supervised clustering for vigilance analysis based on EEG. Proceedings of the 20th International Joint Conference on Neural Networks, 1518–1523 (2007)
11. Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. Journal of the Royal Statistical Society 61(3), 611–622 (1999)
Multi-Task BCI for Online Game Control Qibin Zhao, Liqing Zhang, and Jie Li
Abstract In this paper, we develop a new type of brain-computer interface (BCI) which is able to control a computer game by motor imagery electroencephalogram (EEG). We propose a new framework of feature extraction using common spatial frequency patterns (CSFP) for the classification of motor imagery EEG. The aim of our BCI system is to provide on-line "hit rat" game control with short response times and subject-specific adaptation of system parameters. Our BCI system is able to detect three different motor imagery-related brain patterns (imagination of limb movements: left hand, right hand and both feet) from the ongoing brain activity using only five EEG channels. The best hit accuracy of the game with fast response time, attained by subject 2, is about 73%, which demonstrates that our BCI system is able to provide very fast BCI control of even 1 s per command.
1 Introduction

Translating thoughts into actions without acting physically has always been the material of which dreams and fairytales were made. Recent developments in brain-computer interface (BCI) technology, however, open the door to making these dreams come true. BCIs are devices that allow interaction between the brain and artificial devices; they are especially promising for assisting patients with seriously disabled motor functions, such as those who are completely paralyzed by amyotrophic lateral sclerosis [1–7]. The feasibility of this type of communication depends on the extent to which the EEG associated with several mental processes can be reliably recognized by the BCI system. Based on event-related modulations of the µ or β rhythms of the sensorimotor cortices, the Graz BCI system [8] achieved accuracies of over 96% in a ternary classification task with a trial duration of 8 s, analyzed by

Q. Zhao, L. Zhang, and J. Li Lab for Perception Computing, Shanghai Jiao Tong University, China e-mail: [email protected]; [email protected]
evaluation of adaptive autoregressive models (AAR). In [9, 10], the Common Spatial Subspace Decomposition (CSSD) method was proposed for the classification of finger movements, achieving satisfactory performance on BCI competition 2003 data set IV. The Common Spatial Patterns (CSP) approach [11] was introduced for motor imagery pattern discrimination by extracting event-related (de)synchronization (ERD/ERS) [12]. There are also other algorithms for motor imagery EEG that are based on ERD/ERS [13–18]. However, the best discriminative pattern does not exist in the spatial domain alone, but lies in a particular combination of spatial and frequency patterns. In this paper, the common spatial frequency patterns (CSFP) strategy, which allows the simultaneous optimization of spatial and frequency patterns, is proposed for feature extraction, and we develop a new BCI game based on motor imagery EEG.
2 Methods

2.1 Time Frequency Analysis

Time-frequency analysis of event-related potentials of EEG and MEG data has recently attracted much attention. One way of achieving an improved trade-off between temporal resolution and frequency resolution is through the continuous wavelet transform, which varies the window length over frequencies. The continuous wavelet transform (CWT) of the EEG signal x^{c,k}(\tau) is defined as

W^{c,k}(a, t) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x^{c,k}(\tau)\, \psi\!\left(\frac{\tau - t}{a}\right) d\tau,    (1)

where t denotes the time shift, a denotes the scale, \psi is the wavelet function, and W^{c,k}(a, t) represents the CWT of the data segment x^{c,k}(\tau); c indexes the channel and k the EEG trial number. Although many types of wavelets exist, complex Morlet wavelets are appropriate for time-frequency analysis of EEG signals. Complex Morlet wavelets of Gaussian shape in time (standard deviation \sigma_t) are defined as

\psi(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^2 / 2\sigma^2}\, e^{2i\pi f t},    (2)
where \sigma is a parameter defining how many oscillations are included in the analysis, called the width m of the wavelet, given as m = 2\pi\sigma oscillations. Since the scale parameter a is related to the frequency f by a = \omega_0/(2\pi f), we can obtain a simple formulation of \hat{W}^{c,k}(f, t), which denotes the time-frequency coefficient at channel c, frequency f and time t of the k-th trial of the EEG signals given by x^{c,k}(\tau).
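As an illustration of Eqs. (1)–(2), a per-channel time-frequency power map can be computed as below. This is a simplified sketch (not the authors' code): the wavelet is sampled on a finite grid, and the correlation in Eq. (1) reduces to a convolution because the complex Morlet wavelet satisfies ψ(−t) = ψ*(t).

```python
import numpy as np

def morlet_tf_power(x, freqs, fs=256.0, m=7.0):
    """Time-frequency power of one EEG channel via the complex Morlet CWT.

    x     : 1-D signal (one channel of one trial)
    freqs : iterable of analysis frequencies in Hz (e.g. 6..30)
    m     : assumed wavelet width, i.e. the number of oscillations
    Returns an array of shape (len(freqs), len(x)).
    """
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = m / (2.0 * np.pi * f)          # time std. dev. at this frequency
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
        psi = (np.exp(-t ** 2 / (2 * sigma_t ** 2)) * np.exp(2j * np.pi * f * t)
               / (np.sqrt(2 * np.pi) * sigma_t))
        coef = np.convolve(x, psi, mode="same") / fs   # discretized integral
        power[i] = np.abs(coef) ** 2             # squared magnitude, cf. Eq. (3) below
    return power

# Example: 6-30 Hz map of a 2 s segment sampled at 256 Hz.
tf = morlet_tf_power(np.random.randn(512), np.arange(6, 31), fs=256.0)
```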
2.2 Common Spatial Frequency Patterns

We will describe how the frequency pattern is subsequently taken into account. In general, using a wider frequency range from the acquired EEG signals can achieve higher classification accuracy in comparison with a narrower one. A wide frequency range containing all \mu and \beta rhythm components is therefore adopted to include all the important signal spectra for motor imagery classification. In our experiments, the active frequency ranges for all subjects are almost entirely located in the band between 6 and 30 Hz. The frequency band 6–30 Hz is thus adopted in computing the CWT for obtaining the time-frequency distribution of each EEG trial. Let us define

p_k^{c,f} = \left| \hat{W}^{c,k}(f, t) \right|^2,    (3)

where p_k^{c,f} represents the time-varying energy of EEG trial k at channel c and frequency f. The values of p_k^{c,f} over frequency f and time t of trial k form a 2D time-frequency plot. Figure 1 illustrates the mean CWT obtained by directly averaging all the CWT data for each class in the training data set. From the first row of this figure, the \mu rhythm ERD on electrode C4, the contralateral side to the
Fig. 1 Average time-frequency distribution of five channels during left hand (first row), right hand (second row) and feet movement (third row) imagination for 2 s. In the first row, the ERD phenomenon during left hand imagination is obvious in the right area for channels C4 and FC4; the same ERD phenomenon during right hand imagination for channels C3 and FC3 is indicated by the second row. Furthermore, the frequency range of ERD is slightly different for the two different limb movement imaginations
imagined hand, is clearly shown. The \mu rhythm ERD at electrode C3 during right hand movement imagination is also clearly seen in the second row. Furthermore, it is obvious that the frequency range of ERD is in fact not only subject-dependent but also class-dependent: it can differ within 6–30 Hz for each subject in different imagery states. We can then form a time-frequency distribution matrix for each channel and a spatio-temporal distribution matrix for each frequency bin, i.e.,

U_k^c = \left( p_k^{c,f_1}\ \ p_k^{c,f_2}\ \ \ldots\ \ p_k^{c,f_m} \right)^T \quad \text{and} \quad V_k^f = \left( p_k^{c_1,f}\ \ p_k^{c_2,f}\ \ \ldots\ \ p_k^{c_n,f} \right)^T,    (4)
where U_k^c denotes the time-frequency matrix for EEG trial k at channel c, in which the frequency varies from f_1 to f_m, and V_k^f denotes the spatio-temporal matrix for EEG trial k at frequency f, in which the channel varies from c_1 to c_n. In order to reorganize the entire time-frequency distribution of the EEG signal, we stack the U_k^c of all channels or the V_k^f of all frequency bins, i.e.,

Y_k = \begin{pmatrix} U_k^{c_1} \\ U_k^{c_2} \\ \vdots \\ U_k^{c_n} \end{pmatrix} \quad \text{or} \quad Y_k = \begin{pmatrix} V_k^{f_1} \\ V_k^{f_2} \\ \vdots \\ V_k^{f_m} \end{pmatrix},    (5)
where Y_k represents the newly constructed time-frequency decomposition of the EEG recordings for the k-th trial. Using this notation, the two class-covariance matrices are given as

\Sigma_1 = \sum_{k \in s_1} \frac{Y_k Y_k^T}{\mathrm{trace}(Y_k Y_k^T)}, \qquad \Sigma_2 = \sum_{k \in s_2} \frac{Y_k Y_k^T}{\mathrm{trace}(Y_k Y_k^T)},    (6)
where Y_k \in R^{i \times j} denotes the EEG time-frequency matrix of the k-th trial, i is the number of channels times the number of frequency bins (i.e., n \times m), j is the number of samples in each trial, and s_1 and s_2 refer to the two classes of the training data. The optimization criterion of CSFP is to find several spatial and frequency combination patterns described by W. The vector w_l \in R^d (d = n \times m), which refers to the l-th row of W, maximizes the difference in the average band power of the filtered signal while keeping the sum constant:

W \Sigma_1 W^T = D, \qquad W \Sigma_2 W^T = I - D,    (7)
where I is the identity matrix and D is a diagonal matrix. Using this decomposition matrix W, the TFD Y_k of the EEG signals is projected onto W:

Z_k = W Y_k.    (8)
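A decomposition W satisfying Eq. (7) can be obtained, as in standard CSP, from the generalized eigenvalue problem Σ₁w = λ(Σ₁ + Σ₂)w. The following sketch is our illustration (not the authors' code) and assumes each trial is already represented by its stacked time-frequency matrix Y_k of Eq. (5):

```python
import numpy as np
from scipy.linalg import eigh

def csfp_filters(trials_1, trials_2, n_filters=6):
    """Spatial-frequency filters solving Eq. (7) by joint diagonalization.

    trials_* : lists of TFD matrices Y_k, each of shape (n*m, j), per class.
    Returns W (n_filters x n*m); rows are the filters w_l.
    """
    def class_cov(trials):
        # Eq. (6): sum of trace-normalized trial covariances
        return sum(Y @ Y.T / np.trace(Y @ Y.T) for Y in trials)

    S1, S2 = class_cov(trials_1), class_cov(trials_2)
    # Generalized eigenvectors V satisfy V^T S1 V = D and V^T (S1+S2) V = I,
    # hence V^T S2 V = I - D, which is exactly Eq. (7) with W = V^T.
    vals, V = eigh(S1, S1 + S2)
    order = np.argsort(vals)
    # Extremal eigenvalues give the most discriminative variance ratios.
    pick = np.r_[order[: n_filters // 2], order[-(n_filters - n_filters // 2):]]
    return V[:, pick].T

def project(W, Y):
    """Eq. (8): Z_k = W Y_k."""
    return W @ Y
```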
The interpretation of W has two aspects. First, the decomposition matrix W can be divided into several sub-matrices, i.e., W = (\hat{W}^{c_1}, \ldots, \hat{W}^{c_n}) or W = (\hat{W}^{f_1}, \ldots, \hat{W}^{f_m}), such that

Z_k = \sum_{c=c_1}^{c_n} \hat{W}^c U_k^c \quad \text{or} \quad Z_k = \sum_{f=f_1}^{f_m} \hat{W}^f V_k^f,    (9)
where \hat{W}^c represents the frequency combination patterns for each channel c; each row of \hat{W}^c describes a set of frequency combination coefficients for a specific channel c. Based on this, the maximal discriminative direction of the frequency combination patterns is different not only for different channels but also for different classes. Correspondingly, \hat{W}^f represents the spatial combination patterns for each frequency f. Therefore, Z_k denotes the maximal discriminative components for the two classes, obtained by optimizing the spatial and frequency patterns simultaneously. To further explore the implications of this decomposition, we provide an interpretation in terms of the spatial and frequency patterns. Let w_p denote the p-th row of the decomposition matrix W; then the projected signal Z_k^p = w_p Y_k can be expressed as

Z_k^p = \sum_{f=f_1}^{f_m} \sum_{c=c_1}^{c_n} w_p^{c,f}\, Y_k^{c,f},    (10)

where Y_k^{c,f} is the TFD of EEG trial k at channel c and frequency f (i.e., p_k^{c,f}) and w_p^{c,f} is the corresponding scalar spatial-frequency coefficient of the CSFP decomposition. In order to find the maximal discriminative components of the two classes of EEG data, several spatial-frequency patterns w_p are calculated that maximize the variance for one class while the sum of the variances of both classes remains constant. In other words, the projected signal has high variance for one class and low variance for the other.
2.3 Classification

The features used for the classification are obtained by decomposing the TFD of the EEG according to Eq. (7). Specifically, we applied a linear Support Vector Machine (SVM) as the classification model, and 5-fold cross-validation is used to estimate the classification accuracy. To extend CSFP to multi-class problems, we apply the one-versus-the-rest (OVR) strategy to obtain the multi-class CSFP decomposition.
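A hedged sketch of this classification stage using scikit-learn (the paper does not name a library, so this is an assumption). The log-variance of each projected CSFP component, a common CSP-style choice assumed here, serves as the feature vector:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def log_variance_features(W, trials):
    """Log-variance of each CSFP component Z_k = W Y_k, one row per trial."""
    return np.array([np.log(np.var(W @ Y, axis=1)) for Y in trials])

# Synthetic stand-in data, just to make the sketch runnable.
X_feat = np.random.randn(90, 6)      # (n_trials, n_filters) feature matrix
y = np.repeat([0, 1, 2], 30)         # labels for the three imagery tasks

# LinearSVC handles multi-class input with a one-versus-rest scheme by default.
clf = LinearSVC()
print(cross_val_score(clf, X_feat, y, cv=5).mean())   # 5-fold CV accuracy
```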
Fig. 2 Hit rat game. The top-left panel denotes the beginning of each trial; after 1 s, a rat appears from a random hole, as shown in the top-right panel. At the end of the trial, the bottom-left panel shows the case in which the subject produced the correct command and the hammer moved and hit the rat. The bottom-right panel denotes the error case in which the subject hit the wrong target
3 Hit Rat Game

For this new application, at the beginning of each trial the subject was presented with an on-screen image of a hammer and three holes located in the left, right and center areas. After 1 s, a rat appeared from a random hole. The subject's task was to control the hammer to hit the rat by imagined-movement EEG according to the position of the rat (see Fig. 2). Three imagination tasks (i.e., left hand, right hand and feet) were used to move the hammer toward the target (left, right, and center) and to hit the rat. The duration of motor imagery varied from 1 to 5 s. The hitting process corresponding to the online classification result was presented as feedback for 1 s. Therefore, each trial consisted of preparation for 1 s, imagination for 1–5 s, and the hitting action (feedback), which lasted for 1 s. The system has subject-dependent adaptation processing: after each session, the system parameters of the feature extraction and the classifier were updated automatically. From the on-line performance analysis, it is evident that this special BCI game was able to improve the on-line response time to less than 1 s for the detection of each command.
Fig. 3 Online performance with trial lengths of 1–5 s (response times) for three subjects
4 Results

To illustrate the properties of the proposed method, we focus on a classification task of imagining left hand, right hand and feet movements, which serves this purpose best. In this paper, we apply the proposed CSFP algorithm to EEG data recorded from five different subjects. All experiments start with training sessions in which the subjects performed mental motor imagery tasks in response to a visual cue. In this way, brain activity can be obtained during the different motor imagery tasks; each task was carried out for 4 s. Only five channels (i.e., C3, FC3, Cz, FC4, C4) of EEG signals were recorded from the scalp at a sampling rate of 256 Hz by multi-channel EEG amplifiers. In this off-line study, we only take data from the training session into account to evaluate the performance of the algorithm under study. In order to provide continuous EEG classification, segmentation preprocessing has been applied to the original EEG signals: each EEG trial has been segmented into several sub-trials by a sliding-window method with a fixed window length. After preprocessing, we obtain overlapped EEG segments with durations varying from 0.5 to 2 s. In our experiments, we found that 0.5 s is the shortest usable sub-trial duration for the classification of motor imagery EEG. The on-line average error rates for three out of the five subjects are shown in Fig. 3, which shows the performance over the different response times (1–5 s). The best online error rate of 4% was attained by subject 2, with a 5 s response time. This figure shows that the feature extraction method is suitable for online BCI classification. Furthermore, after some sessions, subjects learned to play the BCI game better and better through the two adaptation cycles of man and machine. We also tested the performance with a shorter trial length and obtained a fast online BCI. The best performance with fast response time attained by a subject was about 75%, which
demonstrates that our BCI system is able to provide very fast BCI operation at a response time of only 1 s.
5 Conclusion

We have proposed a spatial-frequency filter optimization algorithm for the single-trial EEG classification problem, which utilizes the CWT in order to extend the CSP algorithm to the time-frequency space. The presented CSFP algorithm successfully improves performance by optimizing the spatial and frequency patterns simultaneously. The advantages of the proposed method were demonstrated by the classification of multi-task motor imagery EEG for the real-time hit rat game. Our extensive experiments show that fast on-line control of the hit rat game is feasible and very promising for implementing a brain-computer interface.

Acknowledgments The work was supported by the National High-Tech Research Program of China (Grant No. 2006AA01Z125) and the National Basic Research Program of China (Grant No. 2005CB724301).
References

1. Birbaumer, N., Ghanayim, N., Hinterberger, T., et al. (1999) A spelling device for the paralysed. Nature 398, 297–298.
2. Pfurtscheller, G., Flotzinger, D., Kalcher, J. (1993) Brain-computer interface: a new communication device for handicapped persons. Journal of Microcomputer Applications 16(3), 293–299.
3. Vaughan, T., Wolpaw, J., Donchin, E. (1996) EEG-based communication: prospects and problems. IEEE Trans. on Neural Systems and Rehabilitation (4), 425–430.
4. Blankertz, B., Curio, G., Müller, K.-R. (2002) Classifying single trial EEG: towards brain computer interfacing. Advances in Neural Information Processing Systems 14, 157–164.
5. Wolpaw, J., Birbaumer, N., et al. (2000) Brain–computer interface technology: a review of the first international meeting. IEEE Transactions on Rehabilitation Engineering 8(2), 164–173.
6. Wolpaw, J.R., Birbaumer, N., et al. (2002) Brain-computer interfaces for communication and control. Clinical Neurophysiology 113, 767–791.
7. Müller, K.-R., Krauledat, M., Dornhege, G., Curio, G., Blankertz, B. (2007) Machine learning and applications for brain-computer interfacing. In: M.J. Smith, G. Salvendy (eds.) Human Interface, Part I, HCII 2007, Springer LNCS 4557, pp. 705–714.
8. Pfurtscheller, G., Müller-Putz, G., Graimann, B., Scherer, R., Schlögl, A., Vidaurre, C., Wriessnegger, S., Neuper, C. (2007) Graz-brain-computer interface: state of research. Toward Brain-Computer Interfacing.
9. Li, Y., Gao, X., Liu, H., Gao, S. (2004) Classification of single-trial electroencephalogram during finger movement. IEEE Transactions on Biomedical Engineering 51(6), 1019–1025.
10. Wang, Y., Zhang, Z., Li, Y., Gao, X., Gao, S., Yang, F. (2004) BCI competition 2003-data set IV: an algorithm based on CSSD and FDA for classifying single-trial EEG. IEEE Transactions on Biomedical Engineering 51(6), 1081–1086.
11. Müller-Gerking, J., Pfurtscheller, G., Flyvbjerg, H. (1999) Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology 110, 787–798.
12. Pfurtscheller, G., Brunner, C., Schlögl, A., Lopes da Silva, F. (2006) Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks. Neuroimage 31(1), 153–159.
13. Smola, A., Gretton, A., Borgwardt, K., Bedo, J. (2007) Supervised feature selection via dependence estimation. Proceedings of the 24th International Conference on Machine Learning, 823–830.
14. Ince, N., Tewfik, A., Arica, S. (2007) Extraction subject-specific motor imagery time-frequency patterns for single trial EEG classification. Computers in Biology and Medicine 37(4), 499–508.
15. Kalkan, H., Ince, F., Tewfik, A., Yardimci, Y., Pearson, T. (2007) Extraction of optimal time-frequency plane features for classification. Signal Processing and Communications Applications (SIU 2007), IEEE 15th, 1–4.
16. Yang, B., Yan, G., Wu, T., Yan, R. (2007) Subject-based feature extraction using fuzzy wavelet packet in brain-computer interfaces. Signal Processing 87(7), 1569–1574.
17. Blankertz, B., Dornhege, G., Krauledat, M., Müller, K.R., Curio, G. (2007) The non-invasive Berlin brain–computer interface: fast acquisition of effective performance in untrained subjects. Neuroimage 37(2), 539–550.
18. Krepki, R., Blankertz, B., Curio, G., Müller, K.R. (2007) The Berlin brain-computer interface (BBCI) – towards a new communication channel for online control in gaming applications. Multimedia Tools and Applications 33(1), 73–90.
Efficient Biped Pattern Generation Based on Passive Inverted Pendulum Model Jian Li and Weidong Chen
Abstract A biped pattern generator based on the Passive Inverted Pendulum Model (PIPM) is presented in this paper. It is determined by pattern parameters such as the walking period and stride, generates an efficient walk by using the inertia of the biped, and makes the biped exhibit an up-and-down motion of the upper body. As a result, a human-like natural walk on a flat surface is obtained. Next, the pattern generation method is extended to adapt to various desired terrains. Finally, walking experiments are conducted with the biped robot SHR-1, and the effectiveness of the proposed pattern generation method is confirmed.
1 Introduction

Biped humanoid robots have become one of the most challenging research topics in intelligent robotics. In the past two decades, great progress has been made, and many biped humanoid robots capable of performing stable dynamic walking have been successfully developed, such as Asimo [1], HRP [2], WABIAN [3], KHR [4], and Johnnie [5]. For most robots that can walk stably, the gait generation and control are based on the ZMP concept [6]. Biped pattern generation methods generally fall into two groups [7]. The first group uses a multiple rigid body model [1, 3, 8]. This model needs precise information about the robot, such as the mass, inertia, and center of mass of each link; it is comparatively precise, but its computational cost is too high for online pattern generation. Moreover, the structures of biped robots are so complex that it is hard to obtain the precise information mentioned above. The other group [2, 4, 5, 7, 9] utilizes limited knowledge such as the total center of mass and total angular momentum. Typically, this group makes use of a 3D inverted pendulum based on the assumption that the whole robot mass is concentrated at the

J. Li, W. Chen Department of Automation, Shanghai Jiao Tong University, China e-mail: {lijia sjtu, wdchen}@sjtu.edu.cn
robot’s COG (Center Of Gravity) and the robot locomotion in the sagittal and lateral planes are supposed independently. These two conventional groups can achieve various stable walking pattern, however, the walking motions of them have obvious difference as that of a human being: A human bending utilizes up-and-down motion of an upper body to walk efficiently, while the conventional humanoids exhibit unnatural walking by keeping the height of waist constant [2–9]. To solve the problem, Kajita et al. [10] proposed the control method using the conserved quantity and realized the planar stable walking. Yu et al. [11, 12] took the trajectory for the knee joint in each leg predetermined as one of the initial walking parameters. Kurazume et al. [13] proposed the Knee Stretch Index and the Knee Torque Index to generate a straight legged walking pattern. Masahiro et al. [14] proposed the method to use the robot dynamics directly by making the point-contact between a robot and the ground. In this paper, we propose how to generate an efficient walking pattern based on Passive Inverted Pendulum Model (PIPM) to tackle this problem. The proposed pattern generation utilizes the dynamics of the PIPM for the sagittal and lateral planes separately, so the biped robot can have a dynamic walking with a reduced ankle power, as depicted in [15]. It exhibits an up-and-down motion of the upper body, which is similar to that in human walk. Because of the reduced power, the biped walking pattern is efficient and has a better stability. The rest of this paper is organized as follows. Section 2 describes how to use the PIPM to generate walking pattern on the flat surface. In Section 3, the pattern generation method is extended to adapting to various desired terrains. In Section 4, we briefly introduce the biped robot SHR-1, shown in Fig. 1, and apply the pattern generation method to conducting the walking experiments. Finally, the conclusions are presented in Section 5.
Fig. 1 Prototype SHR-1
2 Pattern Generation Based on PIPM

2.1 Passive Inverted Pendulum Model

The Passive Inverted Pendulum Model, shown in Fig. 2, is composed of a lumped-mass COG and a massless shaft of length l connecting the COG and the ankle joint of the supporting leg; it is similar to the models in [14] and [16]. To derive the Passive Inverted Pendulum Model (PIPM), the following assumptions are made:

1. The contact between the robot and the surface is a point contact.
2. The robot motions in the sagittal and lateral planes are independent.

The point-contact assumption means that the ankle joint of the stance leg is passive, that is to say, the surface has no effect on the robot dynamics and there is zero torque between the ankle joint and the surface. The second assumption considers that the effect of the lateral motion on the sagittal dynamics can be neglected, since the lateral side-to-side rocking motion is quite small; it allows us to utilize the dynamics of the PIPM for the sagittal and lateral planes separately. The motion of the PIPM in the sagittal and lateral planes is given by

\ddot{\theta} = \frac{g}{l} \sin\theta    (1)

\ddot{\phi} = \frac{g}{l} \sin\phi,    (2)

where \theta and \phi are the pendulum angles in the sagittal and lateral planes, respectively, and g is the acceleration of gravity.
Fig. 2 Passive inverted pendulum model
2.2 Trajectory of COG

Let (x_C, z_C) be the position of the COG, and place the origin of the local coordinate system at the ankle of the supporting foot, as depicted in Fig. 2. We can then derive

x_C = l \sin\theta, \quad z_C = l \cos\theta, \quad \theta = \arcsin\frac{x_C}{l}.    (3)

Supposing that the period necessary for one walking step is T_S, the k-th step lasts from kT_S to (k+1)T_S. According to the periodicity and symmetry of the gait [8], the following constraints exist:

x_C((k+1)T_S) - x_C(kT_S) = S_S, \quad z_C((k+1)T_S) = z_C(kT_S),    (4)

where S_S is the length of the stride. Assuming that the time when the COG passes the normal axis is t_1, the following constraints can be derived:

\theta(kT_S + t_1) = 0, \quad \dot{\theta}(kT_S + t_1) > 0.    (5)

For a general walking gait (the length of the stride does not change during the walking period), the switching of the gait happens at the intermediate point of a stride, which means t_1 = \frac{T_S}{2}. So we can derive the following relations due to the periodicity and symmetry of the gait:

\theta((k+1)T_S) = -\theta(kT_S) = \arcsin\frac{S_S}{l}, \quad \theta\!\left(kT_S + \frac{T_S}{2}\right) = 0.    (6)

Solving Eq. (1), the general solution can be represented as

\theta(t) = C_1 e^{Kt} + C_2 e^{-Kt},    (7)

where K = \sqrt{g/l} and C_1, C_2 are undetermined coefficients. Substituting Eqs. (3), (4), and (6) into Eq. (7), the equation of the pendulum angle can be derived (with t measured from the mid-step crossing, where \theta = 0) as

\theta(t) = \arcsin\!\left(\frac{S_S}{l}\right) \frac{e^{Kt} - e^{-Kt}}{e^{KT_S/2} - e^{-KT_S/2}}.    (8)

Using Eqs. (8) and (3), the trajectory of the COG motion of the biped robot can be obtained in the sagittal plane. The trajectory of the COG in the lateral plane is derived using a similar method. Next, we can specify the trajectory of the swing foot by means of spline functions.
Fig. 3a Conventional walking pattern

Fig. 3b Proposed walking pattern
Once both foot trajectories and the COG trajectory are known, all joint trajectories of the biped robot are determined by the kinematic constraints, and the walking pattern is therefore denoted uniquely.
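As a concrete illustration of Eqs. (3) and (8), the sagittal COG trajectory of one step can be computed as follows. This is a sketch under the stated assumption that t is measured from the mid-step crossing; the gait parameters are those of the flat-walking experiment in Section 4:

```python
import numpy as np

def cog_trajectory(S_s=0.25, l=0.65, T_s=0.8, g=9.81, n=200):
    """Sagittal COG trajectory of one step from Eqs. (3) and (8)."""
    K = np.sqrt(g / l)
    theta_max = np.arcsin(S_s / l)
    t = np.linspace(-T_s / 2, T_s / 2, n)     # mid-step crossing at t = 0
    # Eq. (8): (e^{Kt} - e^{-Kt}) / (e^{K T_s/2} - e^{-K T_s/2}) is a sinh ratio
    theta = theta_max * np.sinh(K * t) / np.sinh(K * T_s / 2)
    x_c = l * np.sin(theta)                   # Eq. (3)
    z_c = l * np.cos(theta)                   # the up-and-down waist motion
    return t, x_c, z_c

t, x_c, z_c = cog_trajectory()
print(z_c.min(), z_c.max())                   # waist height varies over the step
```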
2.3 Results of Pattern Generation

Figure 3 shows a comparison of a conventional walking pattern with a constant waist height of 0.6 m and the proposed walking pattern. Both walking patterns have the same gait parameters: a step length of 0.25 m, a step period of 0.8 s/step and a step height of 0.03 m. The comparison reveals that the proposed walking pattern is more human-like, with an up-and-down motion of the upper body, so the walking pattern is more efficient.
3 Pattern Generation Extended to Any Desired Terrain

For a lumped-mass model, when no external force acts on the biped robot, the ZMP can be calculated as

x_{ZMP} = x_C - \frac{\ddot{x}_C\, z}{\ddot{z} + g}    (9)

y_{ZMP} = y_C - \frac{\ddot{y}_C\, z}{\ddot{z} + g}.    (10)
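Eqs. (9)–(10) can be checked numerically on a sampled COG trajectory; the sketch below (our illustration) approximates the accelerations with finite differences:

```python
import numpy as np

def zmp_x(x_c, z_c, dt, g=9.81):
    """Sagittal ZMP of Eq. (9) for sampled COG data."""
    xdd = np.gradient(np.gradient(x_c, dt), dt)   # numerical second derivative
    zdd = np.gradient(np.gradient(z_c, dt), dt)
    return x_c - xdd * z_c / (zdd + g)

# For the PIPM trajectory of Section 2.2, the result stays close to zero,
# reflecting the passive (zero-torque) ankle.
```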
In the local coordinate system depicted in Fig. 2, the ZMP of the PIPM is kept at (x_{ZMP} \equiv 0, y_{ZMP} \equiv 0) according to the definition of the ZMP in [6]. From Eqs. (9) and (10), we can see that the COG motions in the sagittal and lateral planes cannot be specified arbitrarily. In the sagittal plane, we can define the COG motion as

z_C = f(x_C) + s(t),    (11)
where z_C = f(x_C) depicts the COG motion as a PIPM in the sagittal plane, and s(t) contains the additional motion of the COG in the Z direction besides the motion of the PIPM. So we can derive the following equations:

x_{ZMP} = x_C - \frac{\ddot{x}_C\, f(x_C)}{\ddot{f}(x_C) + g} = 0    (12)

x_{ZMP} = x_C - \frac{\ddot{x}_C\, (f(x_C) + s(t))}{\ddot{f}(x_C) + \ddot{s}(t) + g} = 0.    (13)
From Eqs. (12) and (13), we can derive

\frac{\ddot{s}(t)}{s(t)} = \frac{\ddot{x}_C}{x_C}.    (14)
In Section 2.2, we derived the trajectory of the COG in the sagittal plane. Furthermore, we can derive the constraints on s(t) due to the periodicity and symmetry of the gait as

\dot{s}(kT_S) = \dot{s}((k+1)T_S) = 0.    (15)

According to Eqs. (3), (8), (14), and (15), we can derive the trajectory of the COG in the sagittal plane for any known surface.
4 Experiments

This section shows two selected experiments with the biped robot SHR-1. Both are performed with the pattern generation method proposed above.
4.1 Prototype Biped Robot

The biped robot shown in Fig. 1 has been under development since 2006. Its height is 0.85 m and its weight 32 kg, and there are 6 DOFs in each leg (3 DOFs for the hip, 1 DOF for the knee, and 2 DOFs for the ankle). The hip joint has a cantilever-type structure as in HRP [2]. All joints are actuated by DC motors with harmonic drive reduction gears. The robot is equipped with an inertial sensor in the torso and 3-axis force/torque sensors at the bottom of the feet. The actuators are driven by digital controllers supporting position, velocity, and force feedback, which are connected via CAN communication. With the RTLinux real-time operating system (OS), the robot is controlled at an appropriate control frequency. The overall specification of SHR-1 is given in Table 1.
Table 1 Specification of SHR-1

Weight                    32 kg
Height                    0.85 m
Walking speed             0 ~ 1.8 km/h
Actuator                  Servo motor + Harmonic drive gear + Drive unit
Sensors      Torso        Inertial sensor
             Foot         3-axis force/torque sensor
             Joint        Position sensor
Operating System (OS)     RTLinux
Degrees of Freedom        12 DOFs
Fig. 4 Snapshots of walking experiment on the flat
4.2 Walking Experiments

The pattern generations based on the PIPM for the flat and the known slope terrains are applied to SHR-1. The walking patterns are generated beforehand, and the trajectory data for each joint are used for the real-time control of the robot on both surfaces.
4.2.1 Walking Experiment on the Flat

Figure 4 shows snapshots of the biped walking on a normal concrete flat surface with a velocity of 1.125 km/h. The gait parameters are: step length (S_S) 0.25 m, period of single support (T_S) 0.8 s, and length from the COG to the ankle (l) 0.65 m.
4.2.2 Walking Experiment on the Slope

With a known slope angle, we can determine the trajectory of the COG using the pattern generation method above. The gait parameters are: step length 0.2 m, period of single support 1.6 s, length from the COG to the ankle 0.65 m, and slope angle 3°. Figure 5 shows snapshots of the biped walking on the slope.
Fig. 5 Snapshots of walking experiment on the slope
5 Conclusion

In this paper, we propose the PIPM model, based on which an efficient dynamic walk with an up-and-down motion of the upper body can be generated. The generated walking pattern is more natural than those of the conventional pattern generation methods applied in the most successful biped humanoid robots. Furthermore, the pattern generation method is easy to extend to various desired terrains. The walking experiments conducted with the biped robot SHR-1 on the flat and slope terrains confirm the effectiveness of the proposed pattern generation. In the future, we will make an intensive study of the bridge between active and passive humanoid robots utilizing the PIPM model. In addition, we will develop online free pattern generation technology using the PIPM, with an online compensation control technology, to make the biped adapt to unknown terrain. More effort will also be devoted to the development of the biped prototype SHR-1.

Acknowledgments This work is partly supported by the Central Academe of Shanghai Electric Group Co., Ltd, the National High Technology Research and Development Program of China under grant 2006AA040203, the Natural Science Foundation of China under grant 60475032 and the Program for New Century Excellent Talents in University.
References

1. Hirai K, Hirose M, Haikawa Y and Takenaka T (1998) Development of Honda Humanoid Robot. In: Proc. of IEEE Int. Conf. on Robotics and Automation. Leuven, Belgium, pp 1321–1326
2. Kaneko K, Kanehiro F, Kajita S, Yokoyama K, Akachi K, Kawasaki T, Ota S and Isozumi T (2002) Design of Prototype Humanoid Robotics Platform for HRP. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. EPFL, Lausanne, Switzerland, pp 2431–2436
3. Yamaguchi J, Inoue S, Nishino D and Takanishi A (1998) Development of a Bipedal Humanoid Robot Having Antagonistic Driven Joints and Three DOF Trunk. In: Proc. IEEE Int. Conf. on Intelligent Robots and Systems. Victoria, BC, Canada, pp 96–101
4. Kim JY, Park IW, Lee J, Kim MS, Cho BK and Oh JH (2005) System Design and Dynamic Walking of Humanoid Robot KHR-2. In: Proc. IEEE Int. Conf. on Robotics and Automation. Barcelona, Spain, pp 1443–1448
5. Gienger M, Löffler K and Pfeiffer K (2001) Towards the Design of a Biped Jogging Robot. In: Proc. IEEE Int. Conf. on Robotics and Automation. Seoul, Korea, pp 4140–4145
6. Vukobratović M, Borovac B, Surla D and Stokić D (1990) Biped Locomotion: Dynamics, Stability, Control and Application. Springer, Berlin, pp 414–420
7. Kajita S, Kanehiro F, Kaneko K, Fujiwara K, Harada K, Yokoi K and Hirukawa H (2003) Biped Walking Pattern Generation by Using Preview Control of Zero-Moment Point. In: Proc. IEEE Int. Conf. on Robotics and Automation. Taipei, Taiwan, pp 1620–1626
8. Huang Q, Kajita S, Koyachi N, Kaneko K, Yokoi K, Arai H, Komoriya K and Tanie K (1999) A High Stability, Smooth Walking Pattern for a Biped Robot. In: Proc. IEEE Int. Conf. on Robotics and Automation. Detroit, Michigan, pp 65–71
9. Tang Z and Er MJ (2007) Humanoid 3D Gait Generation Based on Inverted Pendulum Model. In: 22nd IEEE Int. Symposium on Intelligent Control. Singapore, pp 339–344
10. Kajita S, Yamaura T and Kobayashi A (1992) Dynamic Walking Control of a Biped Robot along a Potential Conserving Orbit. IEEE Trans. on Robotics and Automation, vol. 8, no. 4, pp 431–438
11. Ogura Y, Lim HO and Takanishi A (2003) Stretch Walking Pattern Generation for a Biped Humanoid Robot. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. Las Vegas, Nevada, pp 352–357
12. Ogura Y, Shimomura K, Kondo H, Morishima A, Okubo T, Momoki S, Lim HO and Takanishi A (2006) Human-like Walking with Knee Stretched, Heel-contact and Toe-off Motion by a Humanoid Robot. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. Beijing, China, pp 3976–3981
13. Kurazume R, Tanaka S, Yamashita M, Hasegawa T and Yoneda K (2005) Straight Legged Walking of a Biped Robot. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. Fukuoka, Japan, pp 3095–3101
14. Doi M, Hasegawa Y and Fukuda T (2005) Realization of 3-Dimensional Dynamic Walking Based on the Assumption of Point-contact. In: Proc. IEEE Int. Conf. on Robotics and Automation. Barcelona, Spain, pp 4131–4136
15. Yi KY and Zheng YF (1997) Biped Locomotion by Reduced Ankle Power. Autonomous Robots, vol. 4, no. 3, pp 307–314
16. Akinori S, Atobe Y, Kameta K, Tsumaki Y and Nenchev DN (2006) A Walking Pattern Generator around Singularity. In: 6th IEEE-RAS Int. Conf. on Humanoid Robots. Hirosaki, Japan, pp 270–275
A Slung Load Transportation System Based on Small Size Helicopters Markus Bernard, Konstantin Kondak, and Günter Hommel
Abstract The paper surveys a slung load transportation system using small size helicopters. Some typical real-world applications are presented, and the problems arising from the attachment of slung loads to the helicopter fuselage are discussed. An overview of existing solutions to these problems is given, and a new approach developed in our group for slung load transportation using one or multiple coupled small size helicopters is presented. Possible applications for slung load systems composed of one or multiple helicopters are discussed.
1 Introduction to Slung Load Transportation

The development of helicopters for load transportation started in 1950. Since then, helicopters used for load transportation have evolved into their own helicopter class of so-called skycranes or aerial cranes. The main civilian use of aerial cranes is logging, where huge timber logs are transported from the place where they were cut down to places easier to reach by trucks or trains. The timber logs are connected to the fuselage of the helicopter by a single long sling line (so-called longline operation). Other uses for aerial cranes are the transportation of parts of prefabricated houses or building material for chimneys of industrial plants, and the installation of air conditioners, power poles or antennas. Because the pilot has no visual feedback of the motion of the load, the operation of aerial cranes is difficult. The pilot only gets feedback through the forces and torques imposed on the helicopter by the motion of the load. There is a high risk that inexperienced pilots, trying to damp the oscillation of the load, will amplify it instead.

M. Bernard, K. Kondak, and G. Hommel Real-Time Systems and Robotics, Technische Universität Berlin, Germany e-mail: {berni, kondak, hommel}@cs.tu-berlin.de
The possibility of using multiple helicopters can increase the load capacity and will therefore open new application areas for slung load transportation. The construction and maintenance of high-payload-capacity helicopters is extremely difficult and expensive. The deformations of the blades of the main rotor impose the limits on increasing the main rotor diameter and thus the payload of the helicopter. The most powerful helicopter ever built (the Mi-26) has a payload of 25 t. Therefore the idea of using existing helicopters for the transportation of heavy and bulky loads, simply coupling them to reach the required payload, is interesting and important for practical applications. Possible applications are the transportation of fuselage parts for the aircraft industry or of generator parts for power plants. To our knowledge, the first study on load transportation using multiple coupled helicopters was performed by the Sikorsky Aircraft Corporation at the end of the 1960s. In 1970, the transportation of a 20-t load with two CH-54B helicopters was demonstrated. The helicopters were controlled manually. For safety reasons both helicopters were equipped with an emergency load release device. The second demonstration known to the authors was performed in the early 1980s in Scotland by the British army: two Jet Ranger helicopters (payload of each ca. 0.54 t) transported a load with a mass of 1 t. In this experiment, too, both helicopters were controlled manually. These experiments have shown that this kind of operation is possible, but requires very skillful pilots. The problem is that the motion of one helicopter directly affects the other helicopter. Every maneuver needs to be coordinated via audio contact between the pilots, and the effects of every unforeseen maneuver (of one helicopter, e.g. due to local wind gusts) need to be compensated by the other helicopter as well. For longer flight times, coupled load transportation with two manually controlled helicopters is difficult because of the demanding requirements of the task and the rapid fatigue of the pilots. The authors have no knowledge of research work or experiments using more than two coupled helicopters. In our research work, we are developing an automatic control system which allows the use of one or multiple helicopters for load transportation. The number of helicopters should be configurable depending on the helicopters' capabilities and the load to be transported. This paper is organized as follows: In Section 2, a detailed description of the problems related to helicopter control with an attached slung load is given. In Section 3, recent work related to helicopter control with an attached slung load is presented. In Section 4, we describe the main concepts of our approach to controlling one or multiple coupled helicopters for load transportation. In Section 5, the realization of the load transportation system based on three small-size helicopters is presented. Section 6 is devoted to experimental results achieved with the presented load transportation system. In Section 7, possible applications for this system are discussed, and Section 8 concludes the paper.
2 Problem Description

In order to understand the problems arising from the attachment of slung loads to the helicopter fuselage, let us first consider the generation of forces on the helicopter main rotor required for its translational movement. In Fig. 1 the helicopter is shown during hovering and forward flight. The current orientation frame of the helicopter is given by the three vectors f_1, f_2 and f_3. To simplify matters, the rotation of the helicopter around f_3 (the change of the helicopter heading) is neglected; this includes the torque T_3^{MR} around f_3 generated by the rotation of the main rotor, as well as all forces and torques generated by the tail rotor. This simplification can be made without loss of generality, because it was shown in [3, 4] that the control of the helicopter heading can be separated from the control of the remaining orientation and the translation. The main rotor can independently generate a force F_3^{MR} and two torques T_{1,2}^{MR}. The force F_3^{MR} is always directed approximately perpendicular to the rotor disc. To generate forces not vertical to the ground (see F_3^{MR} in Fig. 1b), a proper change of the helicopter orientation becomes necessary, which can be achieved using T_{1,2}^{MR}. Figure 1a shows the helicopter during hovering, where the force required to lift the helicopter, F^{Lift}, is equal to the force generated by the main rotor, F_3^{MR}. During forward flight (see Fig. 1b) the force F_3^{MR} is split into the lifting force F^{Lift} and the force F^{Acc} used for the acceleration of the helicopter. To keep the helicopter at the same height, the magnitude of F^{Lift} has to be preserved and therefore the magnitude of F_3^{MR} has to be increased. To sum up: the coupling between rotation and translation (for a helicopter without slung load) can be expressed in the following relation: "the orientation has an effect on the translation".

In Fig. 2 the attachment of a slung load to a helicopter fuselage is schematized. The rope connecting helicopter and load is attached to the fuselage at point r. The force caused by the load at point r is given by the vector F_r. The vector p_{r\,cm} connects the mounting point r and the center of mass cm of the helicopter. The load angle \theta relative to the fuselage is defined as the angle between the extension of the vector p_{r\,cm} and the rope. Because of constructional limitations it is normally not
Fig. 1 Simple force generation scheme for helicopters
Fig. 2 General scheme of the helicopter and load attachment
Because of constructional limitations it is normally not possible to attach the rope directly at the center of mass cm of the helicopter. For that reason the vector p_{r,cm} connecting r and cm is assumed to be non-zero in the following considerations. During hovering of the helicopter, with the load at rest and no perturbations, the angle θ is zero and no torque is caused by the load. If helicopter and load are in the state depicted in Fig. 2, the load angle θ is not zero and the load causes a torque that will change the helicopter's orientation. A change of the helicopter's orientation leads to acceleration/deceleration of the helicopter (see Fig. 1b). The acceleration/deceleration of the helicopter leads to a change of the load angle θ and to acceleration/deceleration of the load. This again changes F_r and the torques acting on the helicopter. Therefore the following relation holds for a helicopter with an attached slung load: "orientation has an effect on translation and vice versa". Due to this relation it is difficult to stabilize helicopter and load after the load has begun to oscillate. The higher the mass ratio between load and helicopter, the higher the impact of the torques caused by the load. Please note that even for a helicopter motion with constant acceleration, say in the n_1 direction, oscillations of the slung load are possible.
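This last point can be made concrete with a few lines of simulation. The following sketch (rope length, acceleration and time step are invented values) integrates a planar pendulum whose pivot, standing in for the helicopter attachment point, accelerates at a constant rate: the load does not settle at its new equilibrium angle but keeps oscillating around it.

```python
import math

# Planar pendulum with a constantly accelerating pivot -- a minimal
# stand-in for a helicopter dragging a slung load. All numbers invented.
g, L = 9.81, 10.0          # gravity [m/s^2], rope length [m]
a = 2.0                    # constant pivot acceleration along n1 [m/s^2]
theta, omega = 0.0, 0.0    # load angle from vertical [rad], rate [rad/s]
dt = 0.01

for _ in range(3000):      # 30 s of simulated flight
    # In the pivot frame: theta'' = -(g/L) sin(theta) - (a/L) cos(theta)
    alpha = -(g / L) * math.sin(theta) - (a / L) * math.cos(theta)
    omega += alpha * dt
    theta += omega * dt    # semi-implicit Euler step

# Without damping, the load swings around the trailing equilibrium
# angle -atan(a/g) instead of settling there.
print("equilibrium angle [deg]:", math.degrees(-math.atan(a / g)))
print("load angle after 30 s [deg]:", math.degrees(theta))
```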
3 State of the Art

There has been a lot of research on the modeling and control of small and full size helicopters, e.g. [9], but only a few publications exist on slung load transportation. The authors have no knowledge of a fully autonomous system for slung load transportation using one full size helicopter.
To enable inexperienced pilots to operate aerial cranes, iMAR GmbH and the German Aerospace Center (DLR) developed a guidance system called "iSLD-IVC" (iMAR Slung Load Damping based on inertial stabilized vision control). The system consists of sensors to estimate the orientation of helicopter and load, and a controller to calculate the control signals necessary to damp the oscillation of the load. The control loop is not closed directly; instead, the output of the controller is presented to the pilot using an artificial horizon instrument. This way the pilot is guided to damp the oscillation of the load without direct visual feedback of the load itself. For small size UAVs, a group at Aalborg University did research on the modeling of slung load systems based on one helicopter, see [5]. In their approach [6, 7] the system state is estimated using vision, where a camera is used to observe the motion of the load. Recently some details about their control approach were published [8], using feed-forward input shaping and a delayed feedback controller. There are some theoretical publications [1, 2] about the modeling and control of twin-lift full size helicopter systems, using non-linear control based on an inverse dynamic model of the system for the hovering configuration and an H-inf approach. Twin-lift systems are composed of two helicopters carrying one slung load together; in most approaches a spreader bar is used to separate the helicopters. The authors are not aware of any reference on load transportation using more than two small size UAVs.
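Input shaping of the kind mentioned above can be illustrated with the textbook two-impulse zero-vibration shaper; the following generic sketch is not the controller from [8].

```python
import math

def zv_shaper(L, zeta=0.0, g=9.81):
    """Two-impulse zero-vibration (ZV) shaper for a pendulum-like load.

    L -- rope length [m]; zeta -- damping ratio of the swing mode.
    Convolving the position command with the returned impulses ideally
    leaves no residual oscillation of the load.
    """
    wn = math.sqrt(g / L)                        # natural frequency [rad/s]
    wd = wn * math.sqrt(1.0 - zeta ** 2)         # damped frequency
    K = math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta ** 2))
    A1, A2 = 1.0 / (1.0 + K), K / (1.0 + K)      # impulse amplitudes
    return (A1, A2), (0.0, math.pi / wd)         # second impulse at T_d / 2

print(zv_shaper(L=13.0))   # 13 m rope as in the multi-UAV experiment of Sec. 6
```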
4 Concepts

The modeling and control concepts described in this section are based on the non-linear model and control algorithms presented in [3, 4]. In Fig. 3 the general control scheme for slung load transportation using one or more helicopters is presented. The index i identifies the different helicopters which may participate in the load transportation task. For simplicity the index is omitted in the following considerations. The controller is composed of an inner control loop R_ori for the orientation and an outer control loop R_trans for the translation of the helicopter. The presented controller utilizes the non-linear model given in [3, 4].
Fig. 3 General controller scheme (outer translation controller R_trans, inverted translation dynamics F^{-1}_{123} producing F_3^MR and the desired angles q*_{1,2}, inner orientation controller R_ori with torque compensator C, acting on the system: helicopters + load)
For example, the inverted translation dynamics F^{-1}_{123} of the model is used to calculate the force F_3^MR to be generated by the main rotor, as well as the desired angles q*_{1,2}, from the force F. The force F is calculated by the translation controller and can be interpreted as the force necessary to translate the helicopter in the desired way. The exact implementation of R_trans depends on the number of helicopters and will be explained later.

As shown in Section 2, the control of the helicopter orientation is crucial for the control of the whole system. The translation controller R_trans can only work if the utilized orientation controller R_ori is able to set up the desired orientation. The main idea behind the slung load controller is to make the orientation controller R_ori robust against external influences, including those created by the attachment of a slung load to the fuselage, for the case of single as well as multiple UAVs.

The orientation controller R_ori internally uses the inverted model of the rotation dynamics of one helicopter without slung load to calculate the moments T_{1,2} from the desired angular accelerations. Therefore the first idea was to include the slung load in the model of the helicopter; this way the inverted model of the rotation dynamics includes the influence caused by the slung load. This approach works well in simulation if the parameters of the inverted and the simulated model match each other exactly. In case of small parameter variations between the simulated and the inverted model, however, the controller quickly becomes unstable. For real world applications the exact model parameters are never known, and therefore this approach is not feasible.

A successful approach is to use the existing orientation controller described in [3, 4] and extend it with an additional feedback loop with the torque compensator C, as shown in Fig. 3. The force F^R generated by the load is measured by a force sensor. Besides the force, the orientation of the rope relative to the helicopter fuselage is measured. Using this information, it becomes possible to calculate the torques T^R_{1,2} (acting on the helicopter) which are generated by the load or the coupled UAVs. The compensation is realized by subtracting those torques from the torques generated by the controller. Through the use of the force sensor, the feedback loop with the torque compensator becomes independent of the parameters of the remainder of the system, like the weight of the load, the rope length and even the UAV configuration (single or multiple).

For single UAV slung load transportation a new translation controller was designed. This controller is used to achieve the desired translational movement of the UAV and to quickly suppress or damp oscillations of the load. With the UAV orientation considered to be controlled and stable, it becomes possible to use a simplified model for the design of the translation controller: UAV and load were approximated as mass points. Forces can only be applied to the UAV mass point and need to be generated by changing the orientation and adjusting F_3^MR. The controlled orientation, and therefore the generation of forces, can be approximated by a third order system. The force generation model and the mass point pendulum model were combined, and the resulting model was used for the design of a PI state feedback controller.
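As a minimal sketch of the compensation step (names, frame conventions and all numbers are our own, not from [3, 4]): the measured rope force, applied at the attachment point r, is turned into a torque about the center of mass and subtracted from the commanded torques.

```python
import numpy as np

def compensated_torques(T_ctrl_12, F_rope_body, p_cm_r):
    """Torque compensator C: remove the load-induced torques.

    T_ctrl_12   -- torques (T1, T2) commanded by the orientation controller
    F_rope_body -- measured rope force vector in the body frame [N]
    p_cm_r      -- vector from the center of mass cm to the point r [m]
    """
    T_rope = np.cross(p_cm_r, F_rope_body)   # torque exerted via the rope
    return T_ctrl_12 - T_rope[:2]            # heading axis handled separately

# Invented numbers: a 4 kg load whose rope hangs 10 degrees off the
# fuselage vertical, attached 0.2 m below the center of mass.
F = 4.0 * 9.81 * np.array([np.sin(np.radians(10)), 0.0, -np.cos(np.radians(10))])
print(compensated_torques(np.array([0.5, -0.1]), F, np.array([0.0, 0.0, -0.2])))
```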
For multi UAV load transportation it was possible to use the translation controller described in [3, 4], combined with the extended orientation controller described above, which uses the torque compensator.
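Putting the blocks of Fig. 3 together, one control cycle could be organized as below. This is purely structural: the proportional-derivative laws and all gains are placeholders, not the controllers published in [3, 4].

```python
import numpy as np

MASS, G = 13.0, 9.81                      # UAV mass [kg] (from Section 5)
KP_T, KD_T = 1.5, 0.8                     # placeholder translation gains
KP_O, KD_O = 8.0, 2.0                     # placeholder orientation gains

def control_cycle(x_des, x, v, q, q_rate, T_load_12):
    # Outer loop R_trans: force needed to move the UAV as desired,
    # plus gravity compensation.
    F = MASS * (KP_T * (x_des - x) - KD_T * v) + np.array([0.0, 0.0, MASS * G])
    # Inverted translation dynamics F^{-1}_{123}: main rotor thrust and
    # desired roll/pitch angles q*_{1,2} that realize F (small-angle style).
    F3_MR = np.linalg.norm(F)
    q_des = np.array([np.arctan2(-F[1], F[2]), np.arctan2(F[0], F[2])])
    # Inner loop R_ori, extended by the torque compensator: subtract the
    # torques T^R_{1,2} measured through the load transportation device.
    T12 = KP_O * (q_des - q) - KD_O * q_rate - T_load_12
    return F3_MR, T12
```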
5 Realization

In Fig. 4 one of the UAVs used for the slung load transportation experiments is shown during flight. The UAVs are based on commercially available small size helicopters. The helicopters have a rotor diameter of 1.8 m, a main rotor speed of approximately 1,300 RPM, and are powered by a 1.8 kW two-stroke engine. The UAVs can carry about 3 kg of additional payload, whereas the weight of the UAV itself is 13 kg. The components necessary to achieve autonomous flight capabilities are mounted to the helicopters using a frame composed of strut profiles. Through the use of these profiles, the location of hardware components can be altered and new hardware can be installed easily. This allows quick reconfiguration of the UAVs for different applications, easy replacement of defective hardware, and alteration of the position of different components to adjust the UAV's center of gravity. The components necessary for autonomous operation are shown in Fig. 4: a GPS receiver, an IMU, a control computer and a communication link using WLAN. Due to the strong magnetic field of the engine, an additional magnetic field sensor is mounted on the tail. All UAVs are equipped with Load Transportation Devices (LTDs), which are specially designed for the transportation of slung loads using one or more UAVs. The LTD is composed of a two-axis cardan joint with two magnetic encoders, one attached to each axis.
Fig. 4 UAV system during flight – hardware overview
Fig. 5 Load Transportation Device
After the joint a force sensor is mounted, and after the force sensor a release mechanism for the rope is attached, which is composed of a bolt inserted into a tube. The bolt is fixed in the tube by a pin, which can be pulled out by a small motor to release the load. The release mechanism can be used for emergency decoupling of the UAV from the load (or from the other coupled UAVs), but also to release the load after successful transportation. The magnetic encoders make it possible to measure the rope orientation relative to the UAV fuselage. With this information and the measured force in the rope, it becomes possible to calculate the torques imposed on the UAV by the load (and/or the other coupled UAVs), which are needed for the torque compensator described in Section 4.
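The step from the LTD measurements to the compensator input can be sketched as follows; the simple two-rotation joint model is an assumption of ours, since the real kinematics depend on the actual LTD construction.

```python
import numpy as np

def rope_force_body(f_meas, alpha, beta):
    """Rope force vector in the UAV body frame.

    f_meas      -- force magnitude from the force sensor [N]
    alpha, beta -- the two cardan joint angles from the magnetic
                   encoders [rad] (simple two-rotation joint model)
    """
    # Rope direction: straight down (0, 0, -1), rotated by the two angles.
    d = np.array([np.sin(beta) * np.cos(alpha),
                  -np.sin(alpha),
                  -np.cos(alpha) * np.cos(beta)])
    return f_meas * d

# The compensator torques then follow as cross(p_cm_r, rope_force_body(...)),
# as in the sketch of Section 4.
print(rope_force_body(39.2, np.radians(5), np.radians(-3)))
```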
6 Experimental Results

In Fig. 6 two different slung load transportation tasks are shown. The slung load transportation using one helicopter was performed in Utrera (Spain) in April 2008; the experiment had first been conducted by the authors in Berlin in November 2007. The helicopter transported one liter of water in a jerry can. The multi UAV load transportation was performed in Berlin in December 2007: the three UAVs together transported a load of 4 kg. Figure 7 shows the load transportation using a single helicopter. All coordinates are given in a local Newtonian frame N_{1,2,3}. The origin of the frame is defined at the UAV take-off position. The coordinate X_3 represents the height of the UAV.
Fig. 6 Single and multiple UAV slung load transportation
Fig. 7 Slung load transportation and deployment using one UAV
Until the load was deployed, the coordinates X_{1,2} show the position of the load; after the deployment, they represent the position of the UAV itself. The deployment is indicated by a vertical dotted line. The dotted step is the desired position as it is presented to the controller. The colored area shows an internal state of the controller.
This internal state represents the input of the controller, which has been filtered to match the flight dynamics of the UAV. The UAV performed an autonomous take-off using a conventional controller; after the load was lifted (at approximately 5 m), the controller designed for slung load transportation took over. The working height of 15 m was reached in two steps. The load was transported 41 m along n_1, and after the desired position had been reached, the height was decreased until the load was deployed from a height of approximately 2 m. A few seconds before the deployment of the load, the conventional controller was reactivated. The transition between the two controllers was not smooth, and therefore a movement of approximately 0.5 m can be observed just before the deployment. The conventional controller was used to return the UAV to its take-off position and to perform an autonomous landing. The performance of the controller was quite good despite the stormy weather conditions on that day: even with a steady wind of 30 km/h and wind gusts of 40 km/h, the controller was able to stabilize the helicopter and damp upcoming oscillations of the load.

The load transportation using three small size helicopters is shown in Fig. 8. A load of 4 kg was transported and ropes with a length of 13 m were used. The helicopters were arranged as an equilateral triangle on the ground with a side length of 8 m. Figure 8 shows the coordinates of all three helicopters during the flight. During the experiment the coordinates of the helicopters were recorded in different Newtonian frames N_{1,2,3}, which have the same orientation but different origins. This is why there is no static offset between the trajectories of the different helicopters.
Fig. 8 Slung load transportation using multiple UAVs
The steps given to the controller are shown as dotted lines, and the colored areas are internal controller states, showing the input steps after a prefilter has been applied to match them to the translation dynamics of the helicopters. The horizontal dotted line shows the lift-off height of the load. The helicopters performed an autonomous take-off and increased their height to 10 m (with the load still on the ground). Then the height of all three helicopters was increased to 15 m; the load was lifted at approximately 12.4 m. The additional weight of the load was not included in the controller, which leads to a small disturbance in the trajectories along X_3 at the moment the load was lifted from the ground. For the same reason, disturbances in X_3 can be observed during strong accelerations in the X_{1,2} directions. Steps of ±10 m along n_2 and of +10 m along n_1 were performed. The position error achieved during hovering was about ±0.3 m. The helicopters changed their relative positions by only about ±0.3 m during flight, and the triangular formation of the helicopters was preserved. The load showed almost no oscillation during the whole flight.
7 Possible Applications

The presented system for slung load transportation has two purposes: first, it is an experimental platform for testing control strategies for slung load transportation using full size helicopters; second, it is used for the development and testing of new applications for load transportation using small size helicopters. As explained in [3, 4], the system dynamics of small size helicopters are quite different from those of full size helicopters (mostly because of the different mass ratio between helicopter fuselage and main rotor, the much lower main rotor rotation speed, and the motion of the main rotor blades). Therefore not all results achieved for small size helicopters can be transferred directly to a system based on full size helicopters without modifications. Nevertheless, the main concept – the use of a force sensor in the cardan joint attachment of the load to calculate the influence of the remainder of the system on the rotational dynamics and to use it in the feedback loop – as well as the controller design scheme and sensor data processing algorithms can be applied to a system based on full size helicopters. The presented experiments for load transportation with one and three coupled helicopters have shown that an automatic control system can be realized successfully with existing technical means.

In addition, there are some applications where a small size system of coupled UAVs is useful. One example is the transportation of sensor probes which are too heavy for a single small size UAV and/or need to be transported at some distance from the UAVs themselves, to avoid mutual influence between the UAV electronics and the sensor measurement. In the European AWARE project, coupled UAVs are used in a disaster and fire surveillance scenario. One objective is the transportation of tools (radios, fire axes, rope ladders or gas masks) to the top of a building, where they can be used by trapped civilians or fire fighters.
It is of course possible to build bigger UAVs which are capable of carrying such loads alone. But if space is limited, e.g. in a fire truck, bigger UAVs are more difficult to transport and may need to be partially disassembled and transported on several trucks. Good scalability is another unique feature of the coupled system: if one particular load is too heavy for, e.g., three helicopters, more helicopters can be added to the system, up to a certain reasonable extent. The control system for multiple helicopters developed in our group can also be used for the coordinated control of multiple uncoupled helicopters. Based on this functionality, we have been investigating new applications for near field filming.
8 Conclusions

In this paper, a slung load transportation system using autonomous helicopters was described. The problems occurring when a slung load is connected to a helicopter were discussed. An overview of state-of-the-art solutions to this problem was given, followed by a presentation of our concepts for the control of one or multiple helicopters for slung load transportation. Details of the technical realization of the system based on small size helicopters and the experimental results were presented, followed by a discussion of possible applications for slung load transportation using multiple helicopters.

The presented slung load transportation system has a unique feature not shown by any other system: it allows one or more helicopters to participate in joint slung load transportation, is completely scalable within reasonable extents, and requires no modification for the integration of additional helicopters. From the controller design point of view, the novel aspect of our system is the use of the force sensor in the rope to estimate the influence of the remaining part of the system and to use this estimate in the feedback loop of the orientation controller. The presented approach eliminates the need for inverse dynamic equations for estimating the influence coming from the slung load. This makes the controller independent of several system parameters and increases the overall robustness of the system.

To the knowledge of the authors, slung load transportation using three helicopters has never been demonstrated before, neither with full size nor with small size helicopters. Autonomous slung load transportation using one helicopter is also quite a new field; to the knowledge of the authors, there has been only one other successful demonstration, shown by the group at Aalborg University about two months before the demonstration in our group.

In future work, different control approaches for load transportation will be compared in flight experiments in order to find a feasible approach with maximal performance. Further, the work will be continued in order to make the step from first feasibility demonstrations to first applications. We have also been continuing
the work on sensors for environment perception and collision avoidance in order to integrate these functionalities into the current control system.
References

1. H. K. Reynolds and A. A. Rodriguez, "H-inf control of a twin lift helicopter system," in IEEE Int. Conf. on Decision and Control, pp. 2442–2447, 1992.
2. M. Mittal and J. V. R. Prasad, "Three-dimensional modelling and control of a twin-lift helicopter system," Journal of Guidance, Control and Dynamics, 16(1), 86–95, 1993.
3. K. Kondak, M. Bernard, N. Losse, and G. Hommel, "Elaborated modeling and control for autonomous small size helicopters," in ISR/ROBOTIK 2006 Joint Conference on Robotics, 2006.
4. K. Kondak, M. Bernard, N. Meyer, and G. Hommel, "Autonomously flying VTOL-robots: Modeling and control," in IEEE Int. Conf. on Robotics and Automation, pp. 2375–2380, 2007.
5. M. Bisgaard, J. Bendtsen, and A. la Cour-Harbo, "Modelling of generic slung load system," in AIAA Modeling and Simulation Technologies Conference, 2006.
6. M. Bisgaard, J. Bendtsen, A. la Cour-Harbo, and E. N. Johnson, "Vision aided state estimation for helicopter slung load system," in IFAC Symposium on Automatic Control in Aerospace, 2007.
7. M. Bisgaard, A. la Cour-Harbo, and J. Bendtsen, "Full state estimation for helicopter slung load system," in AIAA Guidance, Navigation and Control Conference, 2007.
8. M. Bisgaard, "Modeling, estimation, and control of helicopter slung load system," PhD thesis, Aalborg University, 2008.
9. W. Johnson, Helicopter Theory. Dover Publications, 1980.
Multi-Source Data Fusion and Management for Virtual Wind Tunnels and Physical Wind Tunnels

Huijie Hu, Xinhua Lin, and Min-You Wu
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail: {huhuijie, lin-xh, mwu}@sjtu.edu.cn
Abstract We can make full use of vast multi-source data by adopting flexible methods to integrate and manage them. However, current work does not consider database features when fusing and managing data. The main objective of this paper is to design a specific framework between client and database server to fuse and manage the mass of data coming from both physical and virtual wind tunnel experiments. The system adopts the latest data fusion and database concepts. The user can thus use the physical wind tunnels' results to verify the data worked out by virtual wind tunnels, and utilize the latter to supplement the former. Furthermore, the data of the virtual wind tunnel can replace some practical results which cannot be acquired under real conditions.
1 Introduction

"Wind tunnels are ground test facilities used for simulating different flight conditions encountered by aerospace vehicles [4]". The wind tunnel is the most frequently used and most efficient tool for aerodynamic experiments. Wind tunnel experiments are widely utilized in airplane design, for example for testing the aerodynamic attributes of flat-shape airfoils or verifying the deployment of air inlet shafts. Rapidly improving technology has made the computer a necessary assistant in the engineering process, and Computational Fluid Dynamics (CFD) has become an extremely important component of aircraft manufacturing. Virtual wind tunnels are wind tunnel models based on a virtual environment which use Virtual Reality (VR) technology and integrate CFD, visualization and 3-dimensional interactive functionalities. Thus users can dynamically run demos on the theoretical computation and analysis
tools to simulate the processes encountered under real conditions. Plenty of time and energy is saved by this approach, and the design reliability is also enhanced dramatically.

Data fusion is also called information fusion. "This notion is derived from the fact that understanding of phenomena from a scientific basis, creating an engineering design, or assessment for sound decision making requires the utilization of data from many distinct sources [6]". Traditionally such tasks have focused on collecting data from a large number of sensors, then processing them to different degrees, and finally displaying the results on screen as images or numbers. In electronics, data fusion means gathering information such as light, sound and views and presenting it to users by visual means, while in the automation industry, data fusion refers to the operation of concretizing space-time information with existing rules before working out the expected answers.

We focus on the issue of how to combine data which come from different sources, because neither physical wind tunnels nor virtual wind tunnels achieve truthfulness and integrity simultaneously. Physical wind tunnel results achieve authenticity because the experimental conditions are as close as possible to real situations. Unfortunately, the physical equipment cannot be changed in size on demand, so experimenters can only obtain a few values in some fixed test cases. On the contrary, virtual wind tunnels are able to simulate all scenes pre-set by users, but the scenes are not perfect, because the existence of aircraft noise or aerodynamic friction is ignored. Correct and complete data can be obtained if we fuse the values coming from physical and virtual wind tunnels together.

To address these problems, we design a framework that combines the two wind tunnels' advantages and eliminates their disadvantages. This system improves the authenticity and availability of the original data by applying data fusion technologies, using different kinds of data fusion algorithms to enable verification, supplementation and scaling of the wind tunnel data. In addition, the data fusion framework is integrated with the aircraft design platform's VR environment, and we use high performance computing techniques to support the data analysis. In a word, the innovative system combines multiple up-to-date computing technologies related to wind tunnel data operations into one framework.

The rest of the paper is organized as follows. Following this introduction, Section 2 discusses related work. Section 3 describes the wind tunnel equipment at Shanghai Jiao Tong University (SJTU) as the background of this paper. Section 4 introduces the data fusion and management methods in the system, and we finally end with the conclusion.
2 Related Works

In this section, we introduce existing wind tunnel products and briefly discuss the methods chosen to process the data.
The Wind Tunnel Grid project [2, 3] at the University of Southampton is used to manage and analyze aeroacoustics data. It processes microphone array data at near real-time speed and executes data fusion with CFD. The system uses Windows Workflow Foundation to create workflows for data analysis and applies Microsoft SQL Server to store the data obtained from three different wind tunnels (11' × 8', 7' × 5', and 3' × 2'). NASA Langley Research Center conducts research on the Basic Aerodynamics Research Tunnel (BART) [5]. It combines data which come from dissimilar wind tunnels by using different mathematical algorithms and then presents the final results through visualization technology. Reference [4] describes a neural-network based architecture whose functionality is interpolating wind tunnel test data. It divides the whole architecture into four layers and three modules. The most important module, named Knowledge Extraction Module, implements neural network approaches such as BPN, RBF and GRNN to acquire useful information from those results. The work introduced above mainly concentrates on one or two aspects of data fusion, and the researchers only consider one kind of wind tunnel in their systems.
3 Wind Tunnel Experiments

3.1 Physical Wind Tunnels

The physical wind tunnel experiments mentioned in this article are carried out at the aerospace department of SJTU. The wind tunnel equipment consists of three parts: main body, driver system and verification system. First, the experimenters put the aircraft model into the tunnel and turn on the switch, so that the wind engine produces a strong airflow over the fixed model. Then the different kinds of sensors mounted on the surface of the model begin to gather information about the temperature, humidity and pressure at their deployed places. After one run is finished, all related results are sent to the central computer for the final data processing.
3.2 Virtual Wind Tunnels

The virtual wind tunnel used by SJTU is part of the Air Design Platform (ADP). The ADP software includes three functions: one is for designers to set up airplane models and partition the models into grids; the others integrate various solvers to execute calculations. Solvers of the first kind, named NS3D, Euler and DATCOM, are used to analyze the virtual working conditions encountered by the digital airplane and work out the useful results of the simulation. Solvers of another type,
like Inheritance, undertake the work of optimizing the results gained in the previous step. After a long interactive computation, the appropriate answers are returned to the designers, who can view the best 3-D airplane shape on a wide screen. If the experimenters are not content with the solution, it is easy for them to change some configurations and run the simulation again.

The data collected from the models contain parameters such as Mach number, attack angle, wing area, leading edge, etc. We ought to save them into the database, choose a convenient approach to fetching them, and display the results as images to the client whenever someone wants to use them. An obvious issue for the database is that the data obtained from both physical and virtual wind tunnels are temporal and spatial data. The values change with time and space: they might have one value at some time in a certain place, but change to another number a few seconds later or when the sensors move only a little distance away. Efficient storage techniques are therefore needed when considering how to save the data on the servers.
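One natural way to organize such temporal-spatial records is to key every sample by run, time and sensor position. The record layout below is a hypothetical sketch of ours, not the schema used in the ADP.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TunnelSample:
    run_id: str                      # experiment or simulation run
    t: float                         # sampling time [s]
    x: float                         # sensor position on the model [m]
    y: float
    z: float
    temperature: float               # [K]
    humidity: float                  # [%]
    pressure: float                  # [Pa]

# In-memory stand-in for a database-side composite index on (run, time):
index = defaultdict(list)

def store(s: TunnelSample):
    index[(s.run_id, round(s.t, 2))].append(s)

store(TunnelSample("run-001", 0.01, 0.1, 0.0, 0.02, 293.2, 40.0, 101325.0))
```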
4 Data Fusion and Management

The term "data fusion" is drawn from early military applications. In the beginning, in order to accurately detect the enemy's trace, sensors were deployed over a wide area. As soon as enemies appeared, their positions were sensed immediately by the nodes, and the path the enemy walked through was recorded. The sensors then sent the data to a central computer to execute complicated calculations; the machine fused these multi-source data to various degrees by using different kinds of algorithms. After the appropriate information had been obtained, the whole track was drawn in the form of pictures and shown to technicians. Data fusion has other applications in automation, graphics and electronics, but here we use data fusion to refer to data treatments based on database techniques and wind tunnel experiments.

Generally speaking, in this paper we represent data fusion and management as four distinct applications: data collection, data verification, data supplement and data interpolation. The difficulty of realizing these operations increases with the required depth of understanding of the source data. Each step takes a distinct approach to reach its targets.
4.1 Data Collection

We consider data collection in two different instances. In physical wind tunnel experiments, source data are gathered from sensors which are scattered on the surface of the airplane model. In the past, the collection work was done using simple instruments, and experimenters had to write the values down on paper themselves. Now, an electronic control system and a real-time sampling system participate in the collection process.
Fig. 1 The architecture of the data fusion and management system (physical and virtual wind tunnels at the physical layer; computational nodes and the database connected over the LAN; a server providing data collection, verification, supplement, interpolation and fusion; and a client for configuration and for viewing the results)
The electric equipment drives the sensors to get the expected values from the model and send them to the connected device, whose function is to receive data and save them. The second data source is the virtual wind tunnel, in which experiments are carried out entirely on the computer. First, the user uses the ADP to create a project and chooses the right airfoil type from an airfoil list. Second, the system asks the user to enter some key parameters, such as tip ratio, aspect ratio and span, to simulate a virtual airplane. After the model is built in the system, the airplane designer picks one or two parallel solvers to run on the case set up in the former steps. The optimization work is performed to calculate the most suitable plan in the end; all data produced during the whole project are collected automatically by the software and transmitted into the database, which is connected with a middleware whose job is to coordinate the tasks executed by the different components.

The data drawn from the different sources are stored with distinct treatments. The physical wind tunnel values only contain the parameters useful to the designer and the results gained from the sensors. The virtual wind tunnel, however, records all data that come out during the entire simulation; that is, we should drop some useless information before saving the data into the database. Of course, the algorithm takes the actual conditions into account. Finally, we only select the user-input parameters, the values related to the aerodynamic situation on the model surface, and the ultimate airplane shape information as the data to be saved. We set up a library named "physical lib" to record physical wind tunnel parameters and another, "virtual lib", for virtual wind tunnel data.

The most difficult part of conserving these data is how to store the geometric information. We assume the airplane object saved in the database is always a geometric object. The simplest 2-D geometric objects include points, lines, triangles, and other polygons.
Table 1 Part of the parameters saved

Parameter       Default value (range)
Mach Number     0 (0, 10)
Attack Angle    0 (−90, 90)
NEND            1000 (0, 10000)
REOO            1.1e7 (1E2, 1E10)
CFL             4.0 (0, 10)
RVIS2J          8 (1, 100)
Fig. 2 The data verification process: parameter records from the physical lib and the virtual lib (e.g. RVIS2J, RVIS2K, RVIS4, HM, ISM00, EPSX) are compared field by field, and conflicting entries are flagged
Complex 2-D objects can be obtained through serial union, intersection and set difference operations on the simple objects. Similarly, a complex 3-D airplane can be described by applying those operations to simple 3-D models (ball, cylinder, cube, etc.) [1]. So we decided to use vector data to represent the spatial geometric information of the airplane components.
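The constructive idea, complex shapes built from set operations over simple primitives, can be illustrated by point-membership tests; the class names and the example shape below are ours, not the ADP's data model.

```python
# Minimal constructive-geometry sketch: complex shapes as set operations
# over simple primitives, here evaluated by point membership.
class Ball:
    def __init__(self, cx, cy, cz, r):
        self.c, self.r = (cx, cy, cz), r
    def contains(self, p):
        return sum((a - b) ** 2 for a, b in zip(p, self.c)) <= self.r ** 2

class Union:
    def __init__(self, a, b): self.a, self.b = a, b
    def contains(self, p): return self.a.contains(p) or self.b.contains(p)

class Difference:
    def __init__(self, a, b): self.a, self.b = a, b
    def contains(self, p): return self.a.contains(p) and not self.b.contains(p)

# A crude "body with a cavity": one ball minus a smaller concentric ball.
shape = Difference(Ball(0, 0, 0, 2.0), Ball(0, 0, 0, 1.0))
print(shape.contains((1.5, 0, 0)), shape.contains((0.5, 0, 0)))  # True False
```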
4.2 Data Verification

Data verification is the second level of data fusion. After collecting the data from the sources, we should make full use of them, not just store the information in the database; otherwise, users will never use the data unless they are made aware that such experimental values exist. Generally speaking, the physical data are more credible than the digital ones, because the physical data are collected directly from the real world, close to the real circumstances which the airplanes meet. In the computational environment, by contrast, we overlook facets of the practical world such as friction and noise, which makes everything too ideal. We cannot imitate the full case of the real world, so the answers obtained from the two sources will more or less fail to match each other. When such a divergence is detected, evaluations should be carried out.
If users want to compare the dispersion between the multi-source data, they can query the physical results from the physical lib using the project name they have set; the system will then allocate the parameters into different arrays. Some arrays store the values about the experimental conditions, some hold the direct or indirect data collected by the sensors, and the rest are used to save the ultimate model information. This work is done again when the user fetches the virtual wind tunnel data from the virtual lib.

The first step of data verification is to check whether the two working conditions are consistent. The algorithm compares each element carefully; if any difference is found, the process is cancelled and an error is returned to the user. Otherwise, the algorithm continues to look over the airplane model's technical indicators and the performance results. A success signal displayed on the screen indicates that no dissimilarity has been found; otherwise, an elaborate report is presented to the designer, listing the items in which the virtual wind tunnel data differ from the physical experiment data. The worker can then adjust the precision of the wind tunnel software and get correct numbers next time.
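A sketch of this comparison step, with dictionaries standing in for the database arrays; the function name, field names and tolerance are invented for illustration.

```python
def verify(physical, virtual, condition_keys, tol=1e-6):
    """Compare a physical-lib record against a virtual-lib record."""
    for k in condition_keys:                   # step 1: same test case?
        if physical[k] != virtual[k]:
            raise ValueError(f"working conditions differ on '{k}'")
    report = []                                # step 2: compare the results
    for k in physical:
        if k in condition_keys or k not in virtual:
            continue
        if abs(physical[k] - virtual[k]) > tol:
            report.append((k, physical[k], virtual[k]))
    return report                              # empty list == success

phys = {"mach": 0.3, "attack_angle": 5.0, "RVIS2K": 8.0}
virt = {"mach": 0.3, "attack_angle": 5.0, "RVIS2K": 16.0}
print(verify(phys, virt, condition_keys=("mach", "attack_angle")))
```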
4.3 Data Supplement

Data supplement is based on a profound comprehension of the data. It focuses on finding the associations between different values. In this setting, physical wind tunnel values and virtual wind tunnel data are no longer treated separately; instead, they cooperate with each other to obtain optimal results. That is, we need to fuse these data on a higher layer. The reason we use such an approach here is that physical experiments are often limited by the material conditions of the real world: either the technical capability cannot reach the requirement, or the experimental equipment cannot support the demanded functions. Therefore, we need computer techniques to assist in accomplishing several scenarios. The data supplement method in this framework is applied in such cases.

Consider an aircraft expert who has designed an airplane model, put it in the physical wind tunnel, and received a satisfactory answer. Some days later, he wants to extend the model to a larger one. However, the old physical wind tunnel is not big enough for his new design, it is impossible for him to build another wind tunnel, and the new demo actually needs to be tested. The most feasible approach is to simulate a virtual environment which is close to the real one, draw a 3-D graphic airplane in the computer using the parameters of the new model, and then calculate the final values to replace the practical experiments. By utilizing the data verification technology, it is easy to judge whether the values obtained from the computer simulation are in line with the real conditions. This approach needs a complete understanding of the data, so that the user can set the parameters for the virtual wind tunnel and the virtual airplane; the source only comes from the information gained in the earlier physical experiment. The wind tunnel database integrated in the ADP provides a scheme to evaluate how close the computational environment is to the realistic situation.
4.4 Data Interpolation

Data interpolation is the most difficult part of data fusion and management; it utilizes all the data fusion techniques discussed above and draws on data mining concepts and methods. Data mining is an effective tool for users to extract useful data from a large-scale database and then make important business decisions by using this information. Data interpolation concentrates on applications that need predictions. Suppose that we have several physical wind tunnels built in different sizes, so that we can conduct our experiments under various conditions; the scenario described above seems a good scheme. However, some tests would gain better results if we could put the model in a continuous test scene. That is, if we could predict working conditions that fill in between the discrete practical circumstances, a continuous series of answers could be obtained from the ideal test scene. This idea is so far only a conception raised in real engineering, and we are still in the process of working out an efficient algorithm to solve this problem.
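Since the algorithm itself is still open, the following only illustrates the intent, filling the gaps between discrete physical test conditions with predicted values, using plain 1-D linear interpolation over invented data; it is not the method under development.

```python
import numpy as np

angles_tested = np.array([0.0, 4.0, 8.0, 12.0])    # physical test points [deg]
cl_measured   = np.array([0.10, 0.52, 0.95, 1.20])  # measured coefficients

angles_dense = np.linspace(0.0, 12.0, 25)           # the "continuous" scene
cl_predicted = np.interp(angles_dense, angles_tested, cl_measured)
print(cl_predicted[:5])
```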
5 Conclusion

In this paper, we have introduced a new framework for dealing with the data produced by wind tunnel experiments. Within this framework we have settled the problem of how to fuse the results of physical and virtual wind tunnels, which makes more complete and realistic wind tunnel experiments possible. Moreover, the system will be connected with the ADP's VR environment and with high performance computers; these enable users to view the results directly as 3-D models and to access the data conveniently. Through this framework we make a contribution to designers, who will no longer be limited by the available physical equipment and no longer need to worry about the correctness of the values obtained from virtual wind tunnels. Time and energy can be saved in the airplane design process if the framework is used broadly. We will work on realizing this system in the next step.
References

1. Thomas Connolly, Carolyn Begg. "Database Systems – A Practical Approach to Design, Implementation, and Management." (Third Edition) January 2004.
2. A. Paventhan, Kenji Takeda, Simon J. Cox and Denis A. Nicole. "Federated Database Service for Wind Tunnel Experiment Workflows." Scientific Programming 14 (2006), 173–184.
3. A. Paventhan, Kenji Takeda, Simon J. Cox, and Denis A. Nicole. "Workflows for Wind Tunnel Grid Applications." ICCS 2006, Part III, LNCS 3993, pp. 928–935, 2006.
4. S. Thamarai Selvi, S. Rame, E. Mahendran. "Neural Network Based Interpolation of Wind Tunnel Test Data." DOI 10.1109/ICCIMA.2007.176.
5. Kurt Severance, Paul Brewster, Barry Lazos, Daniel Keefe. "Wind Tunnel Data Fusion and Immersive Visualization: A Case Study." IEEE Visualization 2001, 21–26 October 2001.
6. Lloyd A. Treinish. "Visual Data Fusion for Decision Support Application of Numerical Weather Prediction." IBM T.J. Watson Research Center.
Flying Sensors – Swarms in Space

Stefan Jähnichen, Klaus Brieß, and Rodger Burmeister
Software Engineering/Space Technology, Technische Universität Berlin, Germany
e-mail: [email protected]; [email protected]
Abstract The aim of the Flying Sensors research group is to develop swarm technologies for future, high-performance space-based applications. In a swarm, a large number of autonomous spacecraft cooperate with each other to jointly perform their tasks. Combining them in different formations improves both temporal and spatial sensor coverage and allows the simultaneous combination of different instruments with different perspectives. Swarms are thus valuable for large-scale space and earth observation. Our specific objective is to develop and examine solutions for distributed disaster monitoring, traffic control and atmospheric soundings. Autonomous behaviour of each swarm element and of the swarm as a whole is also a basic prerequisite for future deep-space exploration. Ground-controlled setups are mostly inadequate because the radio signal delay is too long to respond to short-term events. Having a large number of systems arranged in a redundant constellation also improves fail safety and makes services more robust (e.g. in the case of solar bursts) compared with a single spacecraft. To limit the costs of such an installation, we intend to develop a small cluster of lightweight (nano-)satellites based on commercial off-the-shelf components and release them as a secondary payload in low earth orbit.
1 Introduction Swarms are a well-established natural phenomenon but are so far little understood in theory and practice. Their main characteristic is the ability of numerous individuals to respond to unforeseen situations robustly and efficiently. In space, swarm systems are a useful means to explore all sorts of terrestrial phenomena. So far, however, there are no, or only few, theoretical, methodological or technical foundations S. J¨ahnichen, K. Brieß, and R. Burmeister Software Engineering/Space Technology, Technische Universit¨at Berlin, Germany e-mail:
[email protected];
[email protected];
[email protected] B. Mahr, H. Sheng (eds.) Autonomous Systems – Self-Organization, Management, and Control, c Springer Science+Business Media B.V. 2008
71
for the use of swarm systems in various space applications. By swarm, we mean a number of small, autonomous, mobile systems, whose individual composition is simple but which, when acting together as a swarm, exhibit a complex, purposeful, emergent behaviour. Terms such as self-organization, swarm autonomy or swarm intelligence are often used to describe this characteristic. The potentially large number of systems involved will enable future space applications to carry out much more comprehensive, detailed missions with a short repeating (e.g. for earth observation) with greater efficiency and robustness than is possible with today’s monolithic systems or centrally coordinated constellations such as GPS. Redundancy, extensive multi-point measurements and adaptability are important properties connected with the idea of swarms and identifying a new class of systems. The idea of creating a swarm of satellites draws on experience gained at the Berlin Institute of Technology with their specific satellite design and operation know-how. Seven successful satellite missions testify to the expertise available in this area in the Berlin-Brandenburg region. One of these missions is the BIRD satellite, which was launched approximately 6 years ago and today is still orbiting the earth as a sort of space-based fire alarm. For the BIRD mission, engineers succeeded in building the entire IT infrastructure, i.e. the on-board computer plus the complete software for operating and controlling the satellite, from standardized components and ensuring the required fault tolerance (against both internal and external malfunctions) through systematic redundancy. The success of these measures is confirmed by the fact that the satellite is still operating effectively. One important result to emerge from the BIRD satellite mission was the suitability of the newly developed operating system BOSS for use in similar missions. In a satellite swarm, too, it is used as a basis for the operations of each individual swarm member. The major challenges we must master are: (1) building resource-optimized nanoand pico-satellite systems with autonomous formation-flying capacity, (2) creating a small distributed operating system that supports a robust base of swarm concepts, (3) proving the systems added value and general feasibility for different swarm applications, and (4) verifying correct operation of the overall system, including swarm concepts for preflight qualification. This paper presents the basic concepts needed to meet these four challenges.
2 Space Systems

The concept of swarms is based, among other things, on sensor networks. The idea of using such networks is by no means new. It utilizes the fact that, by connecting a large number of sensors and using the totality of their signals, it is possible to improve the quality of the results obtained. The example of using different optical sensors to create 3D profiles also demonstrates that a new quality can be attained when swarms are used in conjunction with satellite technology to explore the earth's surface or distant stars.
2.1 Sensor Networks

Swarm systems can simultaneously capture and evaluate a very large number of measuring points with high temporal resolution from different perspectives. In addition, swarm systems form the basis for extendable and reconfigurable sensor networks offering greater fail safety. Potential space applications based on these characteristics are: satellite-based traffic observation systems, sensor arrays for monitoring climate parameters, disaster monitoring, optical remote sensing, communication networks, deep-space probes, and exploratory missions to distant objects (planets, moons, asteroids) using microprobes or microrobots. So far, there is worldwide no experience available with the use of swarms, in the sense of emergent, autonomous systems, in space applications. That is why it is necessary to develop and test new, suitable communication strategies, algorithms for partitioning and fusing sensor data, ad hoc networks, and techniques for efficient distributed on-board processing for various concrete applications. The last aspect in particular is of special importance here: given the high volume of data and limited transmission options, the data must be preprocessed, prepared and compressed on board.
2.2 Satellite Swarms

The spacecraft used for swarm applications must be small, light and powerful. The current cost of putting a payload into earth orbit is between USD 15,000 and 30,000 per kilogramme flown! Alternative launching techniques and light and compact spacecraft are therefore needed to achieve the long-term goal of putting large numbers of swarm elements into space. Nano- and pico-satellite systems (1–10 kg) meet these requirements perfectly in terms of size and launch weight. Powerful propulsion systems, attitude control systems and sensors that are specifically tailored to systems of this size are not yet available. The energy supply is also limited owing to the comparatively small solar collector plates. In the future, mission profiles will therefore continue to necessitate the use of larger and more powerful satellites in some cases, though for numerous applications the less expensive pico-satellites will suffice. In the development and manufacture of nano- and pico-satellite systems, the focus is on high-precision orbit and attitude control systems, powerful sensor systems, space- and energy-optimizing construction methods and cost-cutting alternative launching techniques. Requirements regarding the miniaturization of all technical components naturally call for a special design of the IT components. Miniaturization of the hardware components is probably the least problem here, but energy consumption and fault tolerance make high demands in terms of hardware redundancy and in terms of the software, which has to be specially designed and built for this purpose. As far as qualification of the overall system is concerned, problems in this area actually begin before the launch and have to be constantly monitored during operation. The bandwidths available for communication necessitate a large
amount of on-board preprocessing. These problems are multiplied in the case of swarms because initially the data and algorithms required for solving specific problems occasionally have to be made available on several swarm members, or at least it is necessary to ensure cooperation and synchronization among swarm members. Little research has so far been done on the software requirements here and no standardized solutions are yet available.
3 Properties of Swarm Systems

The satellite components, in particular their technical devices, hardware and software, must be specifically tailored to the needs of the swarm. Two requirements are of paramount importance here: the autonomy of the swarm, and thus of its members, and the emergent behaviour of the swarm. We address these two requirements in the following sections.
3.1 Autonomy

The current state of the art involves the dedicated programming of each spacecraft with simple, linear control command lists. Conditional branches and context-dependent interpretations are not used because of the need for behaviour verification. Coordinating and administering a large number of spacecraft using dedicated control command lists means that each additional craft increases the administration effort involved and is therefore unsuitable for large swarm systems. Increasing autonomy enables formations to be maintained without direct ground control, data to be analyzed, decisions to be taken self-reliantly and, finally, self-coordinating platforms to be built. Here, it is not only system and swarm behaviour that is controlled autonomously but also interaction with the applications. Both the quality of situational awareness and the response capability of the individual craft are highly dependent on the efficiency of the on-board sensors and actuators. A particular challenge is implementing autonomy when resources are limited. Autonomy must be realized at the level of a so-called swarm operating system. Building on the operationally independent individual satellites, the swarm needs a distributed operating system layer that is responsible for receiving commands and, depending on the current constellation and configuration of the swarm, suitably translating them into swarm behaviour that enables the task to be successfully performed. To realize autonomy in swarm applications, we need powerful sensors, drive and attitude control systems, simple local behaviour strategies and the theoretical foundations required to control the systems.
3.2 Emergent Behaviour

The cooperative interaction of individual elements according to swarm principles enables the performance of complex multi-component systems (e.g. sensor networks) to be improved significantly. Besides better temporal and spatial coverage, a swarm can, thanks to its inherent redundancy, respond much quicker than centralized, monolithic systems or constellations to unforeseen situations such as malfunctions, failures or temporary changes of priority, especially in cases where direct intervention from the ground is not possible. Variable distances between the swarm elements and the context-dependent fusion of sensor data also enable the system to be adapted to new situations and tasks. Swarm elements must autonomously communicate joint goals, mutually coordinate roles and cooperate in the performance of joint tasks. Only then is it possible to globally formulate goals independently of individual spacecraft and reduce administrative overhead to a consistently low level. To meet these requirements, suitable means of expression must be developed to formulate global swarm goals, analysis, inference and communication strategies must be defined, and sensor fusion must be examined in the context of different applications. Here, too, the limitation of resources represents a particular challenge requiring new, innovative solutions.
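A toy illustration of emergence from purely local rules, in the spirit of Reynolds' flocking model (item 10 in the Further Reading below); all weights, radii and counts are invented, and no claim is made that a satellite swarm would use these particular rules.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, (20, 2))     # 20 swarm elements in a plane
vel = rng.uniform(-1, 1, (20, 2))

def step(pos, vel, radius=25.0):
    """One update: each element reacts only to neighbours within 'radius'."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)
        near = (d < radius) & (d > 0)
        if not near.any():
            continue
        cohesion   = pos[near].mean(axis=0) - pos[i]      # move toward group
        alignment  = vel[near].mean(axis=0) - vel[i]      # match velocities
        separation = (pos[i] - pos[near]).sum(axis=0)     # avoid crowding
        new_vel[i] += 0.01 * cohesion + 0.05 * alignment + 0.002 * separation
    return pos + new_vel, new_vel

for _ in range(200):
    pos, vel = step(pos, vel)
print("spread of the emergent formation:", pos.std(axis=0))
```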
4 Applications

The strategic goals pursued with the development of swarm systems are creating innovative applications and using knowledge of the theoretical foundations of distributed systems for this purpose. Examples of application scenarios are: the vertical probing of the atmosphere, disaster monitoring using appropriate early-warning systems, air and sea traffic monitoring, and the exploration of alien stars. The payloads flown include Global Navigation Satellite System (GNSS) receivers or compact high-resolution optics. For traffic monitoring, new space-capable receivers for the Automatic Identification System (AIS) and Automatic Dependent Surveillance-Broadcast (ADS-B) must be developed because of the limited space available. In terms of algorithms, new techniques must be found for data fusion and for the efficient in-orbit processing and partitioning of applications (transferring established applications to distributed systems). The planning and implementation of orbit- and attitude-control aspects for formation flying requires theoretical concepts and models which enable emergent swarm behaviour to be described and synchronized, and action strategies to be derived for each swarm element and implemented in suitable control commands. This involves enabling interaction between the autonomous planning component and the applications as well as interaction with the underlying operating and control system. These aspects can then be implemented with suitable communication protocols in a distributed real-time operating system (FlockOS).
For all applications, it is important to assure the required swarm properties in terms of the applications and flight qualification using suitable simulation, testing and proof methods. This, too, involves developing new information-theoretical foundations as well as systematically capturing basic hardware requirements and mission-specific characteristics and integrating them into appropriate simulation tools. In addition to the swarm-specific aspects such as concurrency or emergence, techniques and tools for formal system description and automated system qualification are needed.
5 Outlook

The idea of swarms is an interesting concept for space applications because it helps complex missions to be conducted inexpensively and enables small countries to participate in such missions, and to share in the benefits of global earth observation, without huge resources being required. New swarm members can be added to swarms with adaptive characteristics at any time. In this case, it is possible to use the competence of the entire swarm for the development and flight costs of a mere pico- or nano-satellite.
∈µ-Logics – Propositional Logics with Self-Reference and Modalities Sebastian Bab
Abstract In this paper we present a special class of propositional logics with means of expression for self-reference, classical connectives, quantification over propositions, and the ability to integrate modalities coming from arbitrary modal logics over Kripke semantics. This class of logics – the so-called ∈µ-logics – was first defined in [1]. Here we show how ∈µ-logics can be used for reproducing certain natural language semantics (especially in the context of (self-)referential sentences) in the formal language of a logic.
1 Introduction The problem of reference, and especially self-reference, is a topic of wide interest in logic. Whenever a logic contains means of expression for self-reference together with classical negation and the possibility to speak about the truth and falsity of sentences, the problem of antinomies appears. As we will see in the next section, there exist several approaches to address this problem in logic. The goal of this paper is to introduce the class of ∈µ-logics, which are free from antinomies and have total truth predicates, but nevertheless offer the possibility to refer to antinomies by non-satisfiable equations.
S. Bab, Formal Models, Logic and Programming, Technische Universität Berlin, Germany e-mail: [email protected]

1.1 Self-Reference and the Liar Paradox One of the most well-known antinomies is the Liar paradox, given by the sentence This sentence is false. Considering the Liar paradox to be true, it results in being
false, and vice versa. The problem of the Liar paradox arises (as with all antinomies) from the fact that it is both self-referential (This sentence) and contains negation (is false, in the sense of not being true). There are several ways to address the problem of antinomies in logic. Firstly, one can limit the possibility to formulate referential sentences in the logic to consistent sentences only. This approach is very strict and severely limits the use of (self-)reference for the reconstruction of natural language semantics. Another way of dealing with inconsistent sentences is to limit the notion of truth and falsity by introducing further truth values like unknown. By introducing the truth value unknown one can assign truth values to a wider range of sentences without running into problems, but still not to all self-referential sentences. The solution here is to introduce further truth values, whereby an infinite hierarchy of partial truth predicates is constructed (this approach, due to Kripke, can be found in [3]). Another way of dealing with antinomies in a logic is to refer to the antinomies via consistent sentences rather than to allow inconsistent sentences to be formulated. As we will see, this approach is used in ∈µ-logics and the related preliminary works for the treatment of antinomies.
1.2 Studies in ∈T- and ∈µ-Logics The concepts of ∈µ-logics continue, encompass, and widely extend two previous works on propositional logics by Sträter and Zeitz. Classical ∈T-logic (see [6]) introduces a theory of truth and propositions and was first defined by Werner Sträter in the context of reconstructing natural language semantics by means of self-referential structures (see [4, 5]). Formulas of classical ∈T-logic are built from classical propositional connectives together with a concept of quantification over propositions, means of expression for propositional equality, and predicates for truth and falsity. Classical ∈T-logic is intensional in the sense that formulas are not only interpreted as true or false, but rather explicitly interpreted as propositions which are available in its models. Sträter showed that classical ∈T-logic is free from antinomies despite its total truth predicates and its ability to model self-referential sentences and impredicative quantification. Classical ∈T-logic was picked up by Zeitz, who generalized its concepts in order to handle the extension of arbitrary logics by the concepts of truth, reference, and quantification over propositions. Zeitz' parameterized ∈T-logic (see [7]) admits propositional constants to be formulas from any other logic. To cover the extension of arbitrary logics in parameterized ∈T-logic, Zeitz studied different forms of abstract logics and introduced the concept of logics in abstract form, in which the semantics of a logic is given by a system of sets of formulas, which can be seen as the theories of the logic. Like Sträter before him, Zeitz showed the existence of special intensional models of parameterized ∈T-logic and showed that parameterized ∈T-logic, like classical ∈T-logic, is free from antinomies.
The concept of ∈µ-logics by Bab (see [1]) generalizes and gives a new interpretation to the concepts of classical and parameterized ∈T-logic. ∈µ-logics form a class of propositional logics which extend arbitrary logics in abstract form. In ∈µ-logics the underlying logic is seen as the object level logic. The ∈µ-logic which extends the object level logic can then be used for meta-level reasoning about the sentences of the underlying logic. Like classical and parameterized ∈T-logic before, any ∈µ-logic contains means of expression for reference and even self-reference, quantification over propositions, and classical connectives. Furthermore, ∈µ-logics allow for the integration of modalities coming from arbitrary modal logics over Kripke semantics. ∈µ-logics have been proven free from antinomies via the existence of special extensional and intensional models. Classical and parameterized ∈T-logic as well as ∈µ-logics are very interesting in their ability to handle referential sentences without running into the problems of antinomies. Bab showed that the class of ∈µ-logics fully contains the classical and parameterized ∈T-logics (see [1]). For this reason, in the following sections we treat only the class of ∈µ-logics when studying the abilities of ∈T- and ∈µ-logics in handling referential sentences.
2 ∈µ-Logics In this section we present the concept of ∈µ-logics as it was defined by Bab in [1]. The class of ∈µ-logics encompasses the concept of parameterized ∈T-logic by Zeitz in [7] and the concept of classical ∈T-logic by Sträter in [6]. The class of ∈µ-logics is generated by two parameters: firstly, a logic L in abstract form which functions as the object level logic that is to be extended in the corresponding ∈µ-logic; secondly, a set of propositional constructors M which can be used as propositional means of expression in the corresponding ∈µ-logic. The ∈µ-logic over L and M is called ∈µ(L, M).
2.1 Basic Concepts and Syntax of ∈µ-Logics The class of ∈µ-logics encompasses a wide range of propositional logics which are similar in some basic concepts: every ∈µ-logic contains a basic set of propositional constructors, extends an underlying object level logic in abstract form, and brings some basic semantic concepts to interpret formulas as propositions. Propositions can be either true or false. A formula is said to be true in the semantics of an ∈µ-logic if and only if the formula is interpreted as a true proposition. We define the concept of logics in abstract form according to Zeitz in [7]¹:
¹ For a more detailed discussion on logics in abstract form see for example [1, 2, 7].
Definition 1 (Logics in abstract form, logical consequence, validity). A logic in abstract form L = (L, B) is given by a set of formulas L and a set B of subsets of L which is called the basis. For a logic in abstract form the following derived concepts are defined:
1. The logical consequence relation ⊨B is defined as: Φ ⊨B ϕ iff Φ ⊆ B implies ϕ ∈ B for all B ∈ B.
2. A formula ϕ is called consistent if there is at least one B ∈ B with ϕ ∈ B.
3. A formula ϕ is called tautological if ϕ ∈ B for all B ∈ B.

The basic set of propositional constructors of every ∈µ-logic contains classical negation and implication as a basis, a constructor for stating propositional equivalence of formulas (which means that the formulas are interpreted as the same proposition), and the all-quantifier for quantifying over all available propositions. As already stated above, every ∈µ-logic can be extended by a set of modalities as means of expression. These means of expression are represented in a so-called integration form (for a detailed discussion and introduction into the following concepts see [1, Chapter 4]):

Definition 2 (Integration form of constructors). An integration form of a set of constructors is given as M = (C, s, K, F) where C is a set of constructors, s : C → N is a function which states the arity of the constructors, K is a class of Kripke structures² as the basis for the semantics of the constructors, and F contains for every c ∈ C a function fc of one of the following forms:
(a) fc : {T, F}^s(c) → {T, F}.
(b) fc : N^s(c)(K) → {T, F}, where N^s(c)(K) := {((W, R), w, V) | (W, R) ∈ K, w ∈ W, V ∈ V^s(c)(W)} and V^s(c)(W) := {V | V : W → {T, F}^s(c)}.
Constructors with functions of the first form are called local, whereas constructors with functions of the second form are called modal.

Based on the above definitions we can now define the syntax of an ∈µ(L, M)-logic:

Definition 3 (Syntax of ∈µ-logic). Let L = (L, B) be a logic in abstract form, M = (C, s, K, F) an integration form of a set of constructors, and X = {xi | i ∈ N} a well-ordered set of variables. The set of formulas Lµ of the ∈µ(L, M)-logic is defined as follows:
1. X ⊆ Lµ and L ⊆ Lµ.
2. ϕ ∈ Lµ implies ¬ϕ ∈ Lµ.
3. ϕ, ψ ∈ Lµ implies (ϕ → ψ), (ϕ ≡ ψ) ∈ Lµ.
4. ϕ ∈ Lµ and x ∈ X implies (∀x.ϕ) ∈ Lµ.
5. For all c ∈ C: ϕ1, ..., ϕs(c) ∈ Lµ implies c(ϕ1, ..., ϕs(c)) ∈ Lµ.

² A Kripke structure is defined in the usual way as K = (W, R), where W is a set of worlds and R ⊆ W × W is an accessibility relation on the worlds of W.
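To make Definition 1 concrete, here is a minimal Python sketch of a logic in abstract form with a finite basis; the three formulas and the example basis are invented for illustration, and the functions mirror items 1–3 of the definition.

# A toy logic in abstract form: formulas are strings, the basis is a list of theories.
basis = [{"p"}, {"p", "q"}, {"p", "q", "r"}]   # each B is a subset of the formula set L

def consequence(Phi, phi):
    # Phi |=_B phi iff every B in the basis with Phi ⊆ B also contains phi
    return all(phi in B for B in basis if Phi <= B)

def consistent(phi):
    return any(phi in B for B in basis)        # phi lies in at least one theory

def tautological(phi):
    return all(phi in B for B in basis)        # phi lies in every theory

print(consequence({"q"}, "p"))                 # True: every theory containing q contains p
print(consistent("r"), tautological("p"))      # True True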
2.2 Semantics of ∈µ-Logics The semantics of ∈µ-logics is given in terms of ∈µ-structures. Every ∈µ-structure consists of a Kripke structure for the interpretation of the modal constructors which are to be integrated, a set of propositions M, a family of subsets Tw ⊆ M for every w ∈ W which state the true propositions in the different worlds, and a meaning function Γ which interprets formulas as propositions depending on the assignment of the variables and the current world. The meaning function is subject to several conditions which guarantee that the interpretation of formulas as propositions complies with the truth functionality of the constructors as well as with intuition. An ∈µ-structure is defined as follows³:

Definition 4 (∈µ-structure). Let ∈µ(L, M) be the ∈µ-logic over L = (L, B) and M = (C, s, K, F). An ∈µ(L, M)-structure is defined as M = (K, M, T, Γ) where:
1. K = (W, R) ∈ K is a Kripke structure.
2. M is a non-empty set of propositions and T = (Tw)w∈W is a family of non-empty sets Tw ⊆ M, which state the true propositions in the worlds of W.
3. Γ : Lµ × W × [X → M] → M is the meaning function which interprets formulas as propositions, where the following conditions hold:
(a) Truth properties: For every w ∈ W and every propositional assignment β : X → M it holds:
(1) Γ(¬ϕ, w, β) ∈ Tw ⇔ Γ(ϕ, w, β) ∉ Tw
(2) Γ((ϕ → ψ), w, β) ∈ Tw ⇔ Γ(ϕ, w, β) ∉ Tw or Γ(ψ, w, β) ∈ Tw
(3) Γ((ϕ ≡ ψ), w, β) ∈ Tw ⇔ Γ(ϕ, w, β) = Γ(ψ, w, β)
(4) Γ((∀x.ϕ), w, β) ∈ Tw ⇔ for all m ∈ M: Γ(ϕ, w, β[x/m]) ∈ Tw
Hereby β[x/m](y) := m if y = x, and β[x/m](y) := β(y) if y ≠ x.
(b) Contextual properties: For every w ∈ W and every β : X → M it holds:
(i) Γ(x, w, β) = β(x) for every x ∈ X.
(ii) For every β1, β2 : X → M with β1|Free(ϕ) = β2|Free(ϕ)⁴ it holds that Γ(ϕ, w, β1) = Γ(ϕ, w, β2).
(iii) For every ϕ, ψ that are alpha-congruent⁵ it holds that Γ(ϕ, w, β) = Γ(ψ, w, β) in all worlds w ∈ W and for every β : X → M.
(iv) For the negation, the implication and every integrated local constructor c ∈ C there are functions ¬ : W × M → M, → : W × M × M → M and c : W × M^s(c) → M such that for all ϕ, ϕ1, ϕ2, ... ∈ Lµ, all w ∈ W and all β : X → M the following holds:
Γ(¬ϕ, w, β) = ¬(w, Γ(ϕ, w, β))
Γ((ϕ1 → ϕ2), w, β) = →(w, Γ(ϕ1, w, β), Γ(ϕ2, w, β))
Γ(c(ϕ1, ..., ϕs(c)), w, β) = c(w, Γ(ϕ1, w, β), ..., Γ(ϕs(c), w, β))
(c) Integration properties:
(i) For all local constructors c ∈ C, all formulas ϕ1, ..., ϕs(c), all w ∈ W and all β : X → M it holds that:
Γ(c(ϕ1, ..., ϕs(c)), w, β) ∈ Tw ⇔ fc(v1, ..., vs(c)) = T,
where for all i: vi := T if Γ(ϕi, w, β) ∈ Tw, and vi := F if Γ(ϕi, w, β) ∉ Tw.
(ii) For all modal constructors c ∈ C, all formulas ϕ1, ..., ϕs(c), all w ∈ W and all β : X → M it holds that:
Γ(c(ϕ1, ..., ϕs(c)), w, β) ∈ Tw ⇔ fc(K, w, V) = T,
where for all w′ ∈ W the function V is defined by V(w′) := (v1, ..., vs(c)),
where for all i: vi := T if Γ(ϕi, w′, β) ∈ Tw′, and vi := F if Γ(ϕi, w′, β) ∉ Tw′.

A formula is said to be true in an ∈µ-structure if and only if the meaning function interprets the formula as a true proposition. Thus the following validity relation can be derived:

Definition 5 (Validity relation). The validity relation |=µ for an ∈µ(L, M)-structure M = (K, M, T, Γ) is defined as: (M, w, β) |=µ ϕ :⇔ Γ(ϕ, w, β) ∈ Tw.

³ For a detailed introduction and discussion of the concepts of ∈µ-structures and the conditions of the meaning function see [1].
⁴ Hereby Free(ϕ) denotes the set of free variables in ϕ in the usual way.
⁵ This means that ϕ and ψ only differ in the names of their bound variables.
2.3 Special ∈µ-Logics As already stated above, the class of ∈µ-logics covers the parameterized ∈T-logic by Zeitz. By integrating predicates for true and false over a special Kripke structure into the ∈µ-logic, the corresponding class of ∈µ-logics over the underlying
object level logics matches the parameterized ∈T-logic (see [1, Section 6.3]). We define the integration form for the predicates for true and false of the parameterized ∈T-logic:

Definition 6 (Integration form for :true and :false). Let M∈T := (C, s, Klocal, F) with Klocal := {({w}, ∅)} be defined as follows:
• C := {:true, :false} with s(:true) = s(:false) = 1.
• F := {f:true, f:false} with f:true, f:false : {T, F} → {T, F} and f:true(T) := T, f:true(F) := F and f:false(T) := F, f:false(F) := T.

The integration of the predicates for true and false into ∈µ-logics is of course not limited to the above Kripke structure, but can be applied to any possible Kripke structure. We can therefore integrate, for example, predicates for true and false together with arbitrary modalities like possible and necessary, or any other modalities from temporal logics, into ∈µ-logics to construct very complex sentences.⁶
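As a small sanity check on Definition 6, the two truth functions can be transcribed directly; this Python sketch (the names f_true and f_false are our own) confirms that :true preserves and :false inverts the truth value of its argument.

T, F = True, False

def f_true(v):           # f:true(T) = T, f:true(F) = F
    return v

def f_false(v):          # f:false(T) = F, f:false(F) = T
    return not v

for v in (T, F):
    assert f_true(v) == v
    assert f_false(f_false(v)) == v    # a double :false cancels out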
3 Self-Referential Sentences in ∈µ-Logics In this section we will show how self-referential sentences can be formulated in ∈µ-logics and how antinomies like the Liar paradox can be referred to in ∈µ-logics.
3.1 Reference and Quantification over Propositions Self-reference arises in two different ways in ∈µ-logics. Firstly, we can use the means of propositional equivalence for formulating self-referential sentences. Consider a formula ϕ ≡ ψ where (for example) ϕ is a sub-formula of ψ. The formula ϕ ≡ ψ states that ϕ and ψ are interpreted as the same proposition. Since ϕ is a sub-formula of ψ, the formula ϕ ≡ ψ is a self-referential sentence on ϕ. Besides using propositional equivalence for building self-referential sentences, we can use the all-quantifier for formulating self-referential sentences as well. Consider a formula ∀x.ϕ. Every formula in ∈µ-logics is interpreted as a proposition by the meaning function, and so is the formula ∀x.ϕ. The all-quantifier quantifies over all available propositions, and thus ∀x.ϕ also talks about its own meaning (the proposition denoted by ∀x.ϕ). Thus ∀x.ϕ is self-referential in the sense that it explicitly talks about the proposition it denotes.
⁶ For detailed information about the representation of modalities in integration forms and their integration into ∈µ-logics see [1].
3.2 Treating Antinomies Antinomies are referred to in ∈µ-logics in the form of non-satisfiable equations. Consider the Liar paradox introduced in Section 1.1. The Liar proposition (i.e. the proposition stated by the Liar paradox) is not a proposition in ∈µ-logics, because ∈µ-logics are free from antinomies. But one can refer to the Liar proposition in the form of the equation x ≡ x:false. This equation has no solution, because there is no assignment to x which fulfills it; any solution would be a proposition equivalent to the Liar proposition. This means that the Liar proposition can be referred to by an unsolvable equation in ∈µ-logics, and other antinomies can be referred to in the same way. ∈µ-logics exploit the fact that antinomies are not directly available as propositions in the semantics, so that no problem with inconsistency arises, while antinomies can still be referred to implicitly.
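The non-existence of a solution can be illustrated by brute force. In a toy two-valued model (invented for illustration; real ∈µ-models interpret formulas as propositions, not raw truth values), no assignment to x satisfies the Liar equation when ≡ demands that both sides coincide:

def false_pred(p):                  # the :false predicate inverts truth values
    return not p

solutions = [p for p in (True, False) if p == false_pred(p)]
print(solutions)                    # [] -- x ≡ x:false has no solution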
4 Conclusion In the previous sections we defined a very rich class of logics with means of expression for reference (including self-reference), propositional equivalence, and quantification. Furthermore, ∈µ-logics can be enriched by modalities coming from arbitrary modal logics over Kripke semantics. Thereby the class of ∈µ-logics defines a wide range of propositional logics which can be used for reproducing certain natural language semantics in the formal language of a logic.
References 1. Bab, S.: ∈µ-Logik – Eine Theorie propositionaler Logiken. Shaker Verlag (2007). Dissertation, Technische Universität Berlin, 2007 2. Bab, S., Mahr, B.: ∈T-Integration of Logics. In: Formal Methods in Software and Systems Modeling, pp. 204–219. Springer (2005) 3. Kripke, S.: Outline of a theory of truth. The Journal of Philosophy 72, pp. 690–716 (1975); reprinted in [Mar84], pages 53–81 4. Mahr, B.: Applications of type theory. In: Proceedings of the International Joint Conference CAAP/FASE on Theory and Practice of Software Development, pp. 343–355. Springer (1993) 5. Mahr, B., Sträter, W., Umbach, C.: Fundamentals of a theory of types and declarations. Forschungsbericht, KIT-Report 82, Technische Universität Berlin (1990) 6. Sträter, W.: ∈T – Eine Logik erster Stufe mit Selbstreferenz und totalem Wahrheitsprädikat. Forschungsbericht, KIT-Report 98 (1992). Dissertation, Technische Universität Berlin 7. Zeitz, P.: Parametrisierte ∈T-Logik: Eine Theorie der Erweiterung abstrakter Logiken um die Konzepte Wahrheit, Referenz und klassische Negation. Logos Verlag Berlin (2000). Dissertation, Technische Universität Berlin, 1999
Compositionality of Aspect Weaving Florian Kammüller and Henry Sudhof
Abstract One approach towards adaptivity is aspect-orientation. Aspects enable the systematic addition of code to existing programs. In order to provide safe and at the same time flexible aspects for such adaptive systems we address the verification of the aspect-oriented language paradigm. This paper first gives an overview of our aspect calculus and summarizes previous results. Then we present a new compositionality lemma that is a prerequisite for so-called run-time weaving. The entire theory and all proofs are carried out in the theorem prover Isabelle/HOL.
F. Kammüller and H. Sudhof, Software Engineering, Technische Universität Berlin, Germany e-mail: {flokam, hsudhof}@cs.tu-berlin.de

1 Introduction An important research subject concerning networks of highly autonomous components in distributed systems is adaptability. Systematic code adaptation is a necessity for the reliability and the large-scale deployment of autonomous component systems. Adaptability requires that new artifacts may be dynamically added as a prerequisite for the deployment or quality-of-service requirements of an instance, but also during the entire life cycle of a software component. Moreover, the realization of partially autonomous components that can, for example, defend themselves automatically against attacks, demands as well that the notion of adaptability be well understood, easily implemented and modelled. By providing a set of aspects, a toolset can be constructed from which autonomous components can select aspects to adapt to current needs. Automated formal analysis with proof assistants provides strong support for the analysis of safety properties of programming languages [6]. Our approach to support the verification of adaptive systems consists of providing a fully formalized basis for aspect-oriented programming in Isabelle/HOL [4]. We construct a core
calculus of objects and aspects with types as an instance of the generic theorem prover Isabelle/HOL. The resulting framework serves to experiment with language features – for example, weaving functionality and pointcut selectors – and properties – for example, type safety and compositionality. At the same time, our results have mathematical precision and are mechanically verified. Moreover, we try to keep the formal model of the aspect calculus as constructive as possible. Thereby, we can extract executable prototypes for evaluators and type checkers from the Isabelle/HOL framework. In this paper we first give, in Section 2, a short introduction to the necessary prerequisites for our work. In Section 3 we present our calculus of aspects, summarizing previous results on confluence and type safety. Section 4 then presents in detail the compositionality theorem for ςAsc, and Section 5 closes with a discussion.
2 Preliminaries Aspects enable the systematic and efficient adaptation of existing programs by adding (weaving) code segments (advice) at user-defined points (pointcuts). For example, a given implementation for secure group communication in a network could be adapted to support only authenticated channels by weaving in an advice that implements user authentication prior to any remote method call. Our DFG-funded project Ascot [4] mechanizing aspect-orientation has produced some important first results [5, 7–9] forming a sound basis for security-critical applications of aspect-orientation. More specifically, we have constructed a full formalization of the ς-calculus in Isabelle/HOL, proved confluence, and extended the base calculus to ςAsc, a calculus for aspects and weaving. Most prominently, we have defined a type system for aspects on ςAsc and proved type safety [8]. The basic idea of our calculus of aspects is similar to the theory of aspects [10], but we start from the Theory of Objects, unlike the former, which is based on the λ-calculus. To model aspects we introduce labels in the base program. These labels represent so-called join-points, i.e. points at which advice might be woven in. Given these labels, we can quite naturally define weaving: advice is given as a function applicable to a labelled term, replacing the original term. So, given an aspect as a pair of a pointcut set L and an advice that shall be applied at all join-points specified by L, weaving can be constructed by function application, as illustrated in the following example (weaving is represented as ⇓):

⟨L, λx. e⟩ ⇓ (v1 + l1⟨v2⟩) −→ v1 + e[v2/x]    (provided l1 ∈ L)

We next introduce the prerequisites for the ςAsc calculus.
Fig. 1 The primitive semantics of the ς-calculus as introduced in [1]
2.1 The ς-Calculus In A Theory of Objects [1], Abadi and Cardelli developed the ς-family of calculi to formally study object-orientation. These calculi are widely accepted as conceptual equivalents of the λ-calculus for objects, since objects can be used directly as a basic construct without having to be simulated through λ-expressions. In the ς-calculi, an object is defined as a set of labelled methods. Each method is a ς-term in its own right and has a parameter self, in which the enclosing object is contained. There are three flavors of primitives from which to build such terms: object definition, method invocation and field update, which are presented in Figure 1. Methods not using the self parameter are considered to be fields.
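As a rough analogy (not part of the formalization), the three primitives can be mimicked in Python by representing an object as a map from labels to methods that take the enclosing object as their self parameter; all names here are invented for illustration.

# object definition: labels mapped to one-parameter methods (the parameter is self)
o = {
    "zero": lambda self: 0,                        # a field: self is unused
    "succ": lambda self: self["zero"](self) + 1,   # a method using self
}

def call(obj, label):                # method invocation o.l: pass the object as self
    return obj[label](obj)

def update(obj, label, method):      # field/method update: o.l := m
    new = dict(obj)
    new[label] = method
    return new

print(call(o, "succ"))                                    # 1
print(call(update(o, "zero", lambda self: 41), "succ"))   # 42: succ sees the update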
2.2 Isabelle/HOL Isabelle [11] is an interactive ML-based theorem prover. It was initially developed by Lawrence Paulson at the University of Cambridge and is today maintained there and at the TU Munich. Unlike many other interactive provers, Isabelle was written to serve as a framework for various logics, so-called object-logics. Today, mostly the object-logic for Higher-Order Logic (HOL) and – on a smaller scale – the one for Zermelo-Fraenkel set theory are in widespread use. Isabelle has a meta-logic serving as a deductive framework for the embedded object-logics. This meta-logic is itself a fragment of HOL, solely consisting of the universal quantifier and the implication. Isabelle features a powerful simplifier and automated proof strategies. For this paper, Isabelle/HOL was used, i.e. Isabelle in its instantiation to HOL. In Isabelle/HOL automatic code generation is possible for constructive parts of a formalization, like datatypes and inductive definitions (see below), but also for constructive proofs. A very generic parser enables application-specific definition of concrete syntax (so-called mixfix syntax), making Isabelle formulae and proofs almost identical to pen-and-paper formalizations. Any Isabelle/HOL-specific syntax that we use throughout the paper will be explained when we first use it.
2.3 De Bruijn Indices One known hard problem [12] in the formalization of language semantics is the representation of binders, e.g. the operator λ in the λ-calculus that binds a variable
x over a term t in which x may occur. More precisely, the actual problem lies in the complexity created by isomorphic terms that differ only in the choice of variable names: α-conversion. De Bruijn indices overcome the problem of concrete variable names, and thus α-conversion, by simply eliminating them. A variable is replaced by a natural number that represents the distance – in terms of nesting depth – of this variable to its binder. Thereby terms contain only numbers, no variables; α-conversion becomes obsolete. This is a considerable advantage, as α-conversion is a difficult problem both from a practical point of view and for mechanical proofs. An example illustrating the use of de Bruijn indices is given by the following simple λ-term.

λx.λy.((λz. x z) y) = Abs (Abs ((Abs ((Var 2) $ (Var 0))) $ (Var 0)))

Note that different variables may be represented by the same number, e.g. z and y are both Var 0. De Bruijn indices relieve one from having to deal with α-conversion: for example both λx.x and its α-equivalent λy.y are represented by Abs (Var 0). The disadvantage of de Bruijn indices is that substitution, normally used for the definition of application, is difficult to construct. A term has to be "lifted", that is, its "variables" have to be increased by one, when it moves into the scope of an abstraction in the process of substitution. We will see these operations when we introduce our ςAsc-calculus in the next section.
3 A Theory of Aspects in Isabelle/HOL 3.1 Terms of the ςAsc Calculus

datatype dB = Var nat
            | Obj (label ⇀ dB) type
            | Call dB label
            | Upd dB label dB
            | Asp Label dB    ("_⟨_⟩")
The constructor Var builds a new term dB from a nat representing the de Bruijn index of the variable. An object is recursively defined by a finite map from label, the predefined type of "field names", to arbitrary terms of type dB. The second argument of type type to the dB-constructor Obj is the object's type. We insert the type with an object in order to render the typing relation unique. The constructors Call and Upd similarly represent field selection and update of an object's field. The final constructor Asp enables the insertion of aspect labels into object terms. These labels are not assigned any semantics until the point where we define weaving in Section 3.5. The annotation behind the constructor in quotation marks defines the mixfix syntax: we can use the notation l⟨t⟩ as abbreviation for Asp l t.
3.2 Lifting and Substitution As de Bruijn indices discard the use of formal parameters, substitution has to be performed by adapting the numbers representing variables when a term is moved between different layers of the nested scopes of abstraction. This movement occurs precisely when a variable has to be substituted by a term containing a free variable inside the scope of an abstraction. Therefore the notion of substitution is chained with the notion of lifting. We declare two operators lift and subst in Isabelle/HOL, using mixfix syntax again to write t[s/n] to express that in a term t the variable represented by n shall be replaced by s. Before defining the semantics of substitution we need to define the lifting of a term. A lifting carries a parameter k representing the cut between free and bound variable numbers in the term that shall be lifted. The operation lift is defined by the following set of primitive recursive equations describing the effect of lifting over the various cases of object terms.

liftVar:  lift (Var i) k     = (if i < k then Var i else Var (i + 1))
liftObj:  lift (Obj f T) k   = Obj (map (λx. lift x (k + 1)) f) T
liftCall: lift (Call a l) k  = Call (lift a k) l
liftUpd:  lift (Upd a l b) k = Upd (lift a k) l (lift b (k + 1))
A variable is only lifted when it is free, i.e. when its representing number is greater than or equal to the "cut" parameter. The "cut" parameter is increased in the recursive call when an abstraction scope is entered. This is the case when the lift function enters a method inside an object, and when a field is updated by a method. Note that we increase only on the right side of an update, because the left side will always be an object seen as a reference, whereas the right side is a method. Based on the definition of lift, substitution can be defined as follows.

subst_Var:  Var i [s/k]     = (if k < i then Var (i - 1) else if i = k then s else Var i)
subst_Obj:  Obj f T [s/k]   = Obj (map (λx. x [(lift s 0)/(k+1)]) f) T
subst_Call: Call a l [s/k]  = Call (a [s/k]) l
subst_Upd:  Upd a l b [s/k] = Upd (a [s/k]) l (b [(lift s 0)/(k+1)])
The idea is that a term s is lifted if it is substituted inside an abstraction scope, i.e. inside an object or at the right side of an update. The lifting is always initiated with "cut" parameter 0, since upon entering a scope the outermost free variable is initially 0. The decrementing in the equation for Var, in cases of free variables greater than the "cut" parameter, is necessary to cancel out the previous effects of lifting.
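For readers less familiar with de Bruijn machinery, the following Python transcription of lift and subst (restricted to the Var, Call and Upd cases; the tuple-based term representation is our own, not the Isabelle one) mirrors the equations above and can be executed directly:

# Terms: ("Var", i) | ("Call", t, label) | ("Upd", t, label, t)
def lift(t, k):
    if t[0] == "Var":
        i = t[1]
        return ("Var", i if i < k else i + 1)       # lift only free variables
    if t[0] == "Call":
        return ("Call", lift(t[1], k), t[2])
    if t[0] == "Upd":                               # right side enters a new scope
        return ("Upd", lift(t[1], k), t[2], lift(t[3], k + 1))

def subst(t, s, k):                                 # t[s/k]
    if t[0] == "Var":
        i = t[1]
        if i == k:
            return s
        return ("Var", i - 1 if i > k else i)       # decrement cancels earlier lifts
    if t[0] == "Call":
        return ("Call", subst(t[1], s, k), t[2])
    if t[0] == "Upd":                               # lift s under the binder on the right
        return ("Upd", subst(t[1], s, k), t[2], subst(t[3], lift(s, 0), k + 1))

print(subst(("Upd", ("Var", 0), "l", ("Var", 1)), ("Var", 5), 0))
# ('Upd', ('Var', 5), 'l', ('Var', 6)) -- the substituted term was lifted on the right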
3.3 Evaluation of Terms Given the Isabelle/HOL definition of substitution for ς-terms, written t[s/n] (substitute s for n in t) using mixfix syntax, we define a small-step operational semantics as a relation →β using an inductive definition.
inductive →β
intros
  beta: l ∈ dom f =⇒ Call (Obj f T) l →β the (f l) [(Obj f T)/0]
  upd:  l ∈ dom f =⇒ Upd (Obj f T) l a →β Obj (f(l → a)) T
  sel:  s →β t =⇒ Call s l →β Call t l
  updL: s →β t =⇒ Upd s l u →β Upd t l u
  updR: s →β t =⇒ Upd u l s →β Upd u l t
  obj:  ⟦ s →β t; l ∈ dom f ⟧ =⇒ Obj (f(l → s)) T →β Obj (f(l → t)) T
  asp:  s →β t =⇒ l⟨s⟩ →β l⟨t⟩
The rules represent quite closely the original semantics of ς. The substitution [(Obj f T)/0] in the rule beta replaces the outermost variable, i.e. the self parameter, in the object's l-th field f l. The operator the selects the α-element of an option datatype when it is defined, i.e. unequal to None. There is no case for labels, because semantics is attached to labels later by weaving.
3.4 Aspects An aspect can simply be defined as a selection of pointcuts and an advice. Since our model is in Higher-Order Logic, where sets are isomorphic to predicates, we can assume that our selection of pointcuts is a set of labels. The advice is a ς-term not enclosed in an object, because an advice is applied to the sub-expression of a ς-program that is marked by a label, returning another ς-term as a result. Hence, in Isabelle/HOL aspects can simply be defined as follows.

datatype aspect = Aspect (Label list) dB    ("⟨_, _⟩")

The first element is the pointcut set L and the second element the advice to be applied at all points matching the pointcut description, i.e. being a member of L. The mixfix syntax on the right-hand side enables an aspect to be written as ⟨L, a⟩.
3.5 Weaving Given a base program in the ς-calculus readily labelled with aspect labels, and given some aspects, the weaving function now only has to step through the term while applying the aspect. We consider this approach to resemble static weaving but, given the functional nature of our calculus, we consider the result to be valid for dynamic approaches as well. Therefore we define a function weave, represented as ⇓, that takes a ς-program and an aspect and returns a ς-program. The second operator weave_option is an auxiliary function that is needed to "map" the weaving function over the finite maps representing objects.

weave        :: [dB, aspect] ⇒ dB                ("⇓")
weave_option :: [dB option, aspect] ⇒ dB option  ("⇓opt")
We define the weaving function for the simple case of applying one aspect to a program. The general case is later derived by repeated application. The definition of the simple case is given below in a mutually recursive definition, defining the semantics of weave and weave_option by simple equations. In the case of weaving an aspect onto a variable Var n, the advice has no effect. The case l⟨t⟩ is the interesting one, because now the ς-term for aspects, Asp, is finally equipped with semantics. If the label is in the pointcut specified by the first component of the aspect, then the aspect matches. Consequently the advice part a of the aspect is applied to the current term t. Otherwise the aspect has no effect. The label is not eliminated during the weaving process, to enable repeated weaving.
The Isabelle/HOL projection set transforms a list (here of labels) into the set of all elements contained in the list. The next two equalities for Call and Upd simply define that the weave process is to be passed through to the corresponding sub-terms. (Call s l) ⇓ A = Call (s ⇓ A) l (Upd s l t) ⇓ A = Upd (s ⇓ A) l (t ⇓ A)
The primitive recursive equations defining the semantics for Obj is now the point where the recursion changes to the auxiliary operator weave option. The auxiliary operator enables the pointwise definition of advice on the fields of the object by lifting the weaving function over the λ to argument position. In the defining equations for weave option (⇓opt ) we see the benefit gained by using the option type: we can explicitly use pattern matching to distinguish the case for unused field labels (None) and actual object fields matching out the field value with Some. (Obj f T) ⇓ A = Obj (λ l. ((f l) ⇓opt A)) T None ⇓opt A = None (Some t) ⇓opt A = Some (t ⇓ A)
4 Compositionality and Run-Time Weaving An important question for aspects and their practical usability is the compositionality of weaving. In aspect parlance compositionality corresponds to the possibility of run-time weaving. Figure 2 illustrates this question for AspectJ graphically: when does this diagram commute? (index sc stands for source, bc for bytecode, p for program, and ptc and adv for pointcut and advice.). In more foundational terms this questions represents that aspect weaving is respected by compilation or evaluation of a program. Alternatively, we can say that aspect weaving is compositional. We have made a major step forward in tackling the compositionality question by proving the following central lemma.
94
F. Kamm¨uller, H. Sudhof
Fig. 2 Do compile-time and run-time weaving commute?
Lemma 1 (Distribution of weave over subst). Let a be a well-formed aspect, i.e. containing just one free variable, and s be a closed subterm (no free variables) of program t. Then weaving distributes over substitution. justoneFV(adv a) ∧ noFV s =⇒ t[s/n] ⇓ a = (t ⇓ a)[(s ⇓ a)/n]
Considering the main rule beta of the operational semantics in Section 3.3, i.e. Call (Obj f) l →β the(f l)[(Obj f)/0] we see the significance for compositionality. This substitution represents function application in the language; weaving is based on function application as well. Hence, the lemma defines that weaving distributes over function application which is the essence of compositionality.
5 Conclusions We have motivated the use of aspects for self-adapting systems and argued that a sound basis is a prerequisite for safely constructing such systems. After an overview of our mechanized calculus ςAsc for aspects, we have addressed compositionality. Compositionality is a very central property for any kind of software system, because only compositionality enables the divide-and-conquer style of construction which is at the basis of most algorithmic solutions. Compositionality is often very hard to get: many properties of interest, like security for example, are not compositional – hence the importance of our small result. We showed a central lemma that clears up the distributivity between function application, i.e. substitution, and weaving, the main drivers of the operational semantics of the ςAsc calculus. In general, we aim at using a mechanized framework like Isabelle/HOL to build a sound core calculus with as many good properties as possible. We want to increase the understanding of the basic principles of aspect-orientation by adding specific constructs stepwise and thereby defining precise borderlines. We have used de Bruijn indices in the formalization of our aspect calculus. As a final thought of this paper we would like to contemplate their significance. De Bruijn indices are often criticized because they differ from the intuitive notion of names. Recent approaches on nominal techniques [13] offer alternatives that are unfortunately not yet sufficiently developed for our application. Moreover, there are other very recent techniques, like the locally-nameless representation, that are also based on de Bruijn indices. With our compositionality result we again illustrate the practicality of de Bruijn indices.
References 1. M. Abadi and L. Cardelli. A Theory of Objects. Springer, New York, 1996. 2. H. P. Barendregt. The Lambda Calculus, its Syntax and Semantics. North-Holland, 1984. 3. L. Henrio and F. Kammüller. A Mechanized Model of the Theory of Objects. 9th IFIP International Conference on Formal Methods for Open Object-Based Distributed Systems, FMOODS'07. LNCS 4468, Springer, 2007. 4. S. Jähnichen and F. Kammüller. Ascot: Formal, Mechanical Foundation of Aspect-Oriented and Collaboration-Based Languages. Web-page at http://swt.cs.tu-berlin.de/~flokam/ascot/index.html. Project with the German Research Foundation (DFG), 2006. 5. F. Kammüller. Exploring New OO-Paradigms in HOL: Aspects and Collaborations. Theorem Proving for Higher Order Logics, TPHOLs'05, Emerging Trends. Technical Report PRG-RR05-02, Oxford University, 2005. 6. F. Kammüller. Interactive Theorem Proving in Software Engineering. Habilitationsschrift (habilitation thesis), Technische Universität Berlin, 2006. 7. F. Kammüller and H. Sudhof. A Mechanized Framework for Aspects in Isabelle/HOL. 2nd ACM SIGPLAN Workshop on Mechanizing Metatheory, 2007. 8. F. Kammüller and H. Sudhof. Composing Safely – A Type System for Aspects. 7th International Symposium on Software Composition, SC'08. Satellite to ETAPS'08. LNCS 4954, Springer, 2008. 9. F. Kammüller and M. Vösgen. Towards Type Safety of Aspect-Oriented Languages. In Foundations of Aspect-Oriented Languages, AOSD'06, 2006. 10. J. Ligatti, D. Walker, and S. Zdancewic. A type-theoretic interpretation of pointcuts and advice. Science of Computer Programming: Special Issue on Foundations of Aspect-Oriented Programming. Springer, 2006. 11. T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL – A Proof Assistant for Higher-Order Logic. LNCS 2283, Springer, 2002. 12. The POPLmark challenge. http://alliance.seas.upenn.edu/~plclub/cgi-bin/poplmark. July 2007. 13. C. Urban et al. Nominal Methods Group. Project funded by the German Research Foundation (DFG) within the Emmy-Noether Programme, 2006.
Towards the Application of Process Calculi in the Domain of Peer-to-Peer Algorithms Sven Schneider, Johannes Borgström, and Uwe Nestmann
Abstract Peer-to-Peer (p2p) algorithms are nowadays standard; their specification and verification, however, are not. Currently, the properties that such algorithms should satisfy are stated informally, and the algorithms themselves are often given as pseudo-code. Because of this, no satisfying methods for modeling, specifying and/or verifying these algorithms have yet been developed. We therefore propose a distributed stochastic process calculus to model such algorithms and to formally state and prove relevant functional and performance properties.
S. Schneider, J. Borgström, and U. Nestmann, Theory of Distributed Systems, Technische Universität Berlin, Germany e-mail: [email protected]; {johannes.borgstroem, uwe.nestmann}@tu-berlin.de

1 Introduction Formal verification, especially of distributed algorithms, is a difficult but important topic. In the area of peer-to-peer systems, it is often even unclear exactly what the desirable properties are. In this paper, we propose a new process calculus to model, specify and verify peer-to-peer (p2p) systems: large-scale distributed systems with an unbounded lifetime, an unbounded and dynamically changing number of nodes, unpredictable message delays, link failures and no shared memory. In the standard setting of non-distributed programs, which do not possess the properties mentioned above, performance analysis is often concerned with, e.g., the number of operations executed by the processor. In the p2p domain we are mainly interested in the minimization of the periods in which no local computation is performed and requests are still pending. These waiting times are mainly influenced by message delays and routing decisions. The tradeoff between routing quality and the number of pointers stored at the locations drives p2p engineers to come up with at most logarithmic costs for the main operations and logarithmic data
structures (both in the size of the network). Less important factors affecting performance, from which we therefore abstract, are message sizes and the duration of local computation. In this paper we focus on the formal modeling of p2p algorithms and the formal description of their properties. An adequate formal model must treat local computation and distribution, as well as the environment and other conditions under which algorithms are to behave properly. Proper behavior is captured by means of qualitative and quantitative observations on structural and computational aspects. For this, we propose a process-calculus-based formalism that provides primitive notions of communication, concurrency, distribution, failure and stochastic behavior. In Section 2, we explain Chord [13], our guiding example, and derive requirements for formalisms from it. In Section 3, we discuss candidate formalisms for p2p systems. In Section 4, we propose a process calculus in which Chord can be modeled. There, we also discuss the representation of relevant properties of p2p systems and applicable verification techniques. We present possible extensions to our work in Section 5.
2 Chord: A Motivating Challenge In this section we introduce the example p2p system Chord. We have chosen Chord because it exhibits all of the mentioned characteristics of p2p systems without being as complex as more optimized systems. We defer the investigation of more optimized solutions until we have proven the applicability of our methods to Chord.
2.1 Purpose of Chord To explain the service of Chord we begin with the definition of hashmaps, a data structure with constant time complexity for insertion, lookup and deletion. Definition 1 (Hashmap). When V is a set containing elements of interest and K ⊂ N is a finite subset of the natural numbers that we call a set of keys, then the function hf : V → K is a hash function and the partial function hm : K ⇀ V is a hashmap. A hash function is good iff the number of values mapped to a key is approximately the same for every key. Formally, we require that max{|hf⁻¹(k)| : k ∈ K} − min{|hf⁻¹(k)| : k ∈ K} is small. For example, the insertion of a value v into a hashmap hm using a hash function hf proceeds as follows: we use the hash function to generate the key k and then we overwrite any previous assignment of the generated key with the value (i.e., hm := λx. if x = hf(v) then v else hm(x)). When it becomes inefficient to send all the data to a central set of file servers, we need to distribute the data among nodes in a network (possibly the internet, which is probably the worst case). In this scenario we create a distributed hashmap.
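The definition and the insertion procedure translate almost literally into Python; the key space and hash function below are invented for illustration.

K = range(16)                       # the finite key space {0, ..., 15}
hf = lambda v: hash(v) % len(K)     # a hash function V -> K (not necessarily good)
hm = {}                             # the partial hashmap K -> V

def insert(v):                      # hm := λx. if x = hf(v) then v else hm(x)
    hm[hf(v)] = v

def lookup(k):
    return hm.get(k)                # None where hm is undefined

insert("some value")
print(lookup(hf("some value")))     # "some value"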
Definition 2 (Distributed Hashmap). If the elements of the set of assignments of a hashmap ({(k, v) | hm(k) = v}) are distributed among nodes in a network, we call the hashmap a distributed hashmap. Such a distribution is good iff the number of elements stored at a node is approximately the same for every node. The time complexities of insertion, lookup and deletion of values from a distributed hashmap depend on the time it takes to locate the node that is responsible for a given key. The service of the Chord algorithm is to find this location, i.e., given a key it returns the IP address and port of the responsible node.
2.2 Implementation of Chord Chord is a structured p2p system: the possible communication partners of a given node are predefined and not random. A ring is generated from a (finite, ordered) space of keys (see Fig. 1). Each participating location ℓ generates a key when it desires to join. The node then becomes responsible for all keys that are in the interval (ℓ.pred.key, ℓ.key], where ℓ.pred is its predecessor in the ring. Every node stores pointers to (a) its successor, (b) its predecessor and (c) the locations responsible for the keys at a distance from itself of 2^0, 2^1, ..., 2^((log₂|K|)−1). The arcs (or chords) spanning the ring to nodes in exponential distance constitute the node's finger table. Routing of lookup requests is the main functionality of Chord and proceeds as follows: if the location ℓ receives a request for the key k then (a) if ℓ is responsible for k, then ℓ returns itself, (b) else if the closest successor of ℓ is responsible, then ℓ returns its closest successor, (c) else ℓ returns the closest predecessor of k. The closest-predecessor function and the closest-successor function are calculated locally from the pointers that are stored at each node.
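The routing rule can be made concrete with a short Python simulation; the identifiers of the live locations and the key-space size below are invented for illustration, and the code follows cases (a)–(c) above: a request either terminates at the responsible node or is forwarded to the closest finger preceding the key.

import bisect

SIZE = 16                                   # key space {0, ..., 15}, as in Fig. 1
nodes = sorted([1, 4, 9, 11, 14])           # hypothetical live locations

def successor(k):                           # first location at or after key k (cyclic)
    i = bisect.bisect_left(nodes, k % SIZE)
    return nodes[i % len(nodes)]

def predecessor(n):
    return nodes[nodes.index(n) - 1]        # wraps around at index 0

def between(x, a, b):                       # x in the ring interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def fingers(n):                             # locations at distances 2^0 .. 2^3
    return {successor(n + 2 ** i) for i in range(SIZE.bit_length() - 1)}

def route(n, k, hops=0):
    if between(k, predecessor(n), n):       # (a) n itself is responsible
        return n, hops
    succ = successor(n + 1)
    if between(k, n, succ):                 # (b) n's closest successor is responsible
        return succ, hops
    cands = [f for f in fingers(n) if between(f, n, k) and f != k]
    nxt = max(cands, key=lambda f: (f - n) % SIZE) if cands else succ
    return route(nxt, k, hops + 1)          # (c) forward towards the key

print(route(1, 8))                          # (9, 1): node 9 is responsible for key 8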
Fig. 1 A Chord ring with K = {0, 1, . . . , 15}. If there is a location for every key and all finger tables are correct, then the dotted lines are links of the finger table of location 9. A request for key 8 at location 9 would then induce messages along the log2 |K| = 4 un-dotted lines. Without finger pointers we would need |K| − 1 = 15 messages for the same query, using only the successor pointers
2.3 Properties of Chord As is common in the formal analysis of algorithms, we distinguish the two categories of functional (qualitative) and performance (quantitative) properties. The well-known properties (agreement, eventual termination) of the traditional problem of consensus fall into the first category. In the context of p2p algorithms, the correctness of lookup results or the statement that "eventually all pointers to other locations stored at the alive locations will be correct" have a merely functional character. The second category addresses issues such as the "quality of service of systems under stress". More precisely, here we are concerned with the probability or speed with which specified configurations are reached from some set of starting configurations. A typical proposition stated for Chord [13] is the following: "In a network where all successor pointers are correct, if every node fails with probability 1/2, then the expected time to execute find_successor is O(log N)." Functional and performance properties that are satisfied by an algorithm in a static environment often turn out to be incorrect when the algorithm is used in the context of unexpected joins and leaves (including the case of failures) of locations [3]. Thus, it is essential to develop a specification that accounts for the complexity induced by these dynamics. Arguably, the deepest analysis of p2p systems like Chord is due to Liben-Nowell, Balakrishnan and Karger in [8]. Yet, since their properties were not stated formally, they could only argue informally for the optimality of certain setups of parameters in their model. Actually, to our knowledge, all proofs of functional and performance-related aspects of Chord [13] and other scalable p2p overlay networks like Pastry [12], Tapestry [14] and Skipnet [5] are rather informal sketches. Summing up, we think it is fair to state that there is no commonly agreed and reasonably complete (formal or informal) specification for p2p algorithms like Chord.
2.4 Requirements for a Formalization of Chord A satisfactory formal model for Chord must treat multiple aspects of the system. Firstly, Chord is defined in terms of the local behavior of its nodes, which perform an imperative algorithm on their dynamically changing internal state. Secondly, the nodes are running concurrently, and communicate using message passing. Thirdly, individual nodes can join or leave the network; crashes and message loss should also be expressible. Finally, in order to formalize the desired properties of the system, one needs some way of assigning probabilities to different possible evolutions of the system. In this context, a common though imprecise assumption on p2p systems is that the lifetime of a node as well as the delay between the joining of two nodes are independent random variables with an exponential distribution. Thus, we have settled on a stochastic model where the various delays are exponentially distributed.
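The exponential-distribution assumption is easy to state operationally: node lifetimes and the delays between consecutive joins are sampled from exponential distributions. A minimal Python sketch (the two rates are invented for illustration):

import random

JOIN_RATE = 0.5                    # expected inter-join delay 1/0.5 = 2 time units
LIFE_RATE = 0.01                   # expected node lifetime 1/0.01 = 100 time units

def simulate_churn(n):
    t, events = 0.0, []
    for _ in range(n):
        t += random.expovariate(JOIN_RATE)     # memoryless delay until the next join
        lifetime = random.expovariate(LIFE_RATE)
        events.append((t, t + lifetime))       # (join time, failure time)
    return events

for join, fail in simulate_churn(3):
    print(f"node joins at {join:6.2f}, fails at {fail:7.2f}")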
3 Formalisms for Stochastic Distributed Systems In our view, the main competitors in this field are the usual suspects: I/O-automata, Petri nets, and process calculi. All of them have been applied for the formal specification and verification of distributed algorithms in stochastic environments to some extent. They all have their own advantages and disadvantages: generality and flexibility; rich enough support of data structures; ease of extensibility; succinctness of the available primitives for the domain of p2p algorithms; ease of modeling; richness of the available meta theory; suitable compositionality; availability of powerful analysis tools; proximity to programming languages ... to name but a few. We prefer to work with process calculi mainly due to their rich meta theory, resulting in powerful compositional analysis tools, and their proximity to programming languages. Furthermore, recent years have shown strong developments concerning the extension of process calculi to incorporate explicit distribution, failure and stochastic phenomena. For example, Francalanza and Hennessy [4, 6] enhanced the π-calculus with locations to explicitly model distribution and failure. Likewise, in [7], a stochastic process calculus using exponential distributions is defined for efficient finite-state analysis using Continuous Time Markov Chain theory. Last, but not least, we have already gained promising experience [2] on the static case of DKS [11], a close relative of Chord. This previous work represents but a starting point, in that we used a static model (i.e., no nodes are able to join or leave the network) without time and without link failures. We proved the correctness of the lookup module, but we were not able to express performance properties. Similarly, Bakhshi and Gurov [1] proved the correctness of a module of Chord that fixes the successor pointers while new nodes are joining the network. They abstracted from process failures, link failures and time, and removed the other modules (such as the lookup module) of the algorithm. These applications suggest that process calculi may be a promising candidate to model p2p systems and to prove their correctness.
4 A Calculus for Peer-to-Peer Systems
4.1 Syntax and Semantics
Due to space limitations, the formal syntax as provided in Fig. 2 is not explained in full detail. The calculus inherits many constructs from the (Distributed) π-calculus [6, 10]. Apart from standard idioms for local and concurrent computation, there are constructs to model distribution and stochastic behavior. Explicit distribution manifests itself in the construct ℓ[P], which denotes the code P executed at location ℓ. The construct ℓ:a⟨v⟩ models the sending of message v to port a on location ℓ. The failure of a remote location ℓ can be detected using the construct susp ℓ.P, where a successful detection leads to the execution of the code P. A stochastic delay of the execution of code P is denoted by the construct (λ)P, where λ is a positive real number.
Fig. 2 Syntax for process terms (active code except for summations), summations (passive code) and systems (composition of current locations)
As usual, the semantics of our process calculus is defined as a labeled transition system (LTS). However, modeling stochastic effects in the presence of networks of failure-prone located processes requires us to build up some notation and book-keeping machinery that helps us track the live locations and the numbers of remaining possible location failures and joins (both may be infinite). As usual for stochastic process calculi, we formally capture the transition system as a function of type C × K × C → N, where C represents configurations consisting of systems S equipped with book-keeping information, K describes the kind of step, and N counts the multiplicities of steps. It is straightforward to model the functionality of Chord in this calculus. The respective code may be consulted at http://www.mtv.tu-berlin.de/fileadmin/a3435/encoding.pdf.
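To give an impression of the shape of this semantics (purely illustrative, with our own ad hoc encoding of configurations), the function type C × K × C → N can be read as a multiset of labeled steps, where the multiplicity records how many distinct derivations produce the same step – which matters when the rates of stochastic steps are accumulated:

from collections import Counter

# A configuration is abstracted to any hashable value; a step kind from K
# distinguishes, e.g., instantaneous tau-steps from delay and failure steps.
steps = Counter()  # maps (conf, kind, conf') to its multiplicity in N

def add_derivation(conf, kind, conf2):
    # Two syntactically different derivations of the same step raise its
    # multiplicity rather than being identified.
    steps[(conf, kind, conf2)] += 1

def effective_rate(conf, conf2, base_rate):
    # For delay steps, Markov-chain analysis uses base rate x multiplicity.
    return base_rate * steps[(conf, "delay", conf2)]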
4.2 Meta Theory
In our calculus we have a compositional, structural and formal way to describe the local computations. These are executed as sequences of instantaneous τ-steps. Arguing about these sequences is complicated because the order of τ-steps is, in general, nondeterministic. We develop some meta theory to simplify the statement of properties as well as the development of proofs.
4.2.1 Static Normalization
As usual, we equip the process language with a structural congruence ≡ ⊆ C × C that identifies configurations up to the rearrangement of terms. Using this structural congruence we can deterministically construct a normal form for every configuration without enabled τ-steps that emerges from the initial Chord configuration. The normal form is determined precisely by the following information for each node:
1. The point in the local process code where progress is delayed until interaction with a remote location occurs.
2. The content of the local memory.
This normal form makes reasoning much simpler because we are then primarily facing a uniform set of configurations – only normal forms – during proofs.
4.2.2 Dynamic Normalization
Since each node in the Chord system performs a deterministic algorithm, the system can be seen as locally deterministic. In our model, this translates into the notion of τ-convergence: the following two theorems are satisfied by all configurations that are reachable from the initial Chord configuration.
Theorem 1. (Local Livelock-Freedom of Chord) When a τ-step is possible, then every sequence of τ-steps starting with that step is finite.
Theorem 2. (Local τ-Confluence of Chord) When a τ-step is possible, then there is a unique configuration (up to ≡) that is reached by executing τ-steps in any order until no further τ-steps are enabled.
Apart from gaining clarity for paper proofs, the above meta theory also helps to shrink the state spaces involved in automatic verification methods.
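Operationally, the two theorems license a very simple normalization procedure. A hedged sketch (tau_successors is a hypothetical interface returning the configurations reachable by one τ-step):

def normalize(conf, tau_successors):
    """Execute tau-steps in an arbitrary order until none is enabled.

    Theorem 1 guarantees that this loop terminates, and Theorem 2 that the
    result is unique up to the structural congruence, so greedily taking
    the first enabled step is as good as any other strategy.
    """
    while True:
        enabled = tau_successors(conf)
        if not enabled:
            return conf  # the (unique) normal form
        conf = enabled[0]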
4.3 On the Formalization of p2p Properties
Using our calculus and meta theory, we now turn to stating properties formally.
4.3.1 Functional Properties
We express functional properties as predicates over the information used to construct the normal form. These predicates may also be translated into test code that is placed at the proper locations. The test code is then executed using the LTS of our calculus (where the Chord algorithm can be halted during the test), so conformance can be proven within the framework of the operational semantics. Yet, as expected, meaningful test code is in general complex, which makes it hard to argue for its intuitive correctness. This problem can be mitigated by decomposing the test code of independent predicates and checking them in isolation. Our framework suffices to succinctly specify every functional aspect we have encountered so far.
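As an example of such a predicate, the property quoted in Section 2.3 – that all successor pointers stored at alive locations are correct – can be phrased directly over the per-node normal-form information; the dictionary layout below is our own abstraction of that information:

def successors_correct(alive):
    """Check that every alive node points to the next alive node on the ring.

    `alive` abstracts the normal-form data as a map from a node's identifier
    to the identifier stored in its successor pointer.
    """
    ring = sorted(alive)
    return all(alive[node] == ring[(i + 1) % len(ring)]
               for i, node in enumerate(ring))

assert successors_correct({1: 5, 5: 9, 9: 1})
assert not successors_correct({1: 9, 5: 9, 9: 1})  # node 1 skips node 5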
4.3.2 Performance Properties
The specification and verification of performance properties is ongoing work. One preliminary observation is that such properties depend on the values of the stochastic parameters (rates) of our calculus, namely the (inter-location) message delay, the delays between joins, and the MTTF (mean time to failure) of locations. We will investigate logics, ideally ones closely matching the features of our process calculus, for their applicability to formalizing trace-based performance properties.
5 Ongoing and Future Work
We will continue to develop a reasonably complete formal specification of Chord. The critical property is the delivery of a proven quality of service under certain assumptions about the behavior of the network. We will attempt to introduce a finite-state representation of the Chord system in order to use tools to analyze or (if adequate) model-check its stochastic behavior. This work amounts to proving that there is only a bounded number of messages in transit at any time. At a later stage, we are considering refining the network model, for instance by taking locality into account in order to prove tighter performance bounds.
References
1. R. Bakhshi and D. Gurov. Verification of peer-to-peer algorithms: A case study. Electr. Notes Theor. Comput. Sci., 181:35–47, 2007.
2. J. Borgström, U. Nestmann, L. Onana Alima, and D. Gurov. Verifying a structured peer-to-peer overlay network: The static case. Global Computing, volume 3267 of LNCS, pages 250–265. Springer, 2004.
3. M. J. Fischer, N. A. Lynch, and M. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, 1985.
4. A. Francalanza and M. Hennessy. A fault tolerance bisimulation proof for consensus (extended abstract). ESOP, volume 4421 of LNCS, pages 395–410. Springer, 2007.
5. N. J. A. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman. SkipNet: A scalable overlay network with practical locality properties. USENIX Symposium on Internet Technologies and Systems, 2003.
6. M. Hennessy. A Distributed Pi-Calculus. Cambridge University Press, Cambridge, 2007.
7. H. Hermanns. Interactive Markov Chains: The Quest for Quantified Quality, volume 2428 of LNCS. Springer, 2002.
8. D. Liben-Nowell, H. Balakrishnan, and D. R. Karger. Analysis of the evolution of peer-to-peer systems. PODC, pages 233–242, 2002.
9. N. Lynch. Distributed Algorithms. Morgan Kaufmann, San Mateo, CA, 1996.
10. R. Milner. Communicating and Mobile Systems: The π-Calculus. Cambridge University Press, 1999.
11. L. Onana Alima, S. El-Ansary, P. Brand, and S. Haridi. DKS (N, k, f): A family of low communication, scalable and fault-tolerant infrastructures for p2p applications. In CCGRID, pages 344–350. IEEE Computer Society, 2003.
12. A. I. T. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. Middleware, volume 2218 of LNCS, pages 329–350. Springer, 2001.
13. I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11(1):17–32, 2003.
14. B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications, 22(1):41–53, 2004.
Specification Techniques (Not Only) for Autonomous Systems Peter Pepper
P. Pepper, Compiler Construction and Programming Languages, Technische Universität Berlin, Germany
e-mail: [email protected]

1 Introduction
There is a general tendency to make systems more and more autonomous, thus relieving human operators from the bulk of everyday routine work. This has led to a series of efforts which have become known under the keywords Autonomic Computing and Self-X, where the "-X" stands for aspects such as -Healing, -Aware, -Optimizing and so forth. A major proponent of this development is IBM with its so-called Architectural blueprint for autonomic computing [3]. Other approaches are Recovery-Oriented Computing (ROC) [7] and the project RoSES (Robust Self-configuring Embedded Systems), where the notion of Graceful Degradation plays an important role. Our interest lies mainly in the application of these concepts in the realm of Embedded Systems, in particular in the Automotive, Aviation and Astronautics domains. Here the challenges are quite different from the scenarios addressed by the above projects. When the idea of Autonomic Computing is applied to web servers, database servers and the like, major effects can be achieved by duplicating resources, i.e. adding new servers, adding more disk storage, increasing the bandwidth, etc. But for Embedded Systems a major characteristic is the lack of resources. Our main challenge is therefore to realize the principles of Autonomic Computing in an environment of scarce resources. In this context the work of NASA is of great interest [6], in particular the framework of the Livingstone project and the test satellite Deep Space One. To date, the efforts around the concepts of Autonomic Computing and Self-X have mainly been carried out on an abstract and highly informal level. There are general and coarse architectural designs together with proof-of-concept implementations. (Some of the literature may even be characterized as "sales talk".) However, when these
[email protected] B. Mahr, H. Sheng (eds.) Autonomous Systems – Self-Organization, Management, and Control, c Springer Science+Business Media B.V. 2008
105
106
P. Pepper
principles are to be applied in safety-critical environments such as the Automotive or Space Travel domains, we need a more thorough treatment, in which the correctness of the concepts can be guaranteed. In our work we try to apply the formal framework of Evolving Specifications (Especs) to the task of Autonomic Computing. The Espec framework has been developed by a number of people at Kestrel Institute, notably Doug Smith and Dusko Pavlovic. Its basic principles and some applications are presented, e.g., in [8–11]. It extends earlier work on the algebraic/categorical specification of software [5], which it still contains as a subframework. Especs add the dimension of stateful behavior and thus lead into the realm of system specifications. This approach stands in the tradition of many approaches in the literature that address issues of system design by utilizing category-theoretic mechanisms (e.g. [2]).
2 Architecture
Before we enter the discussion of possible formalizations, we want to provide a more intuitive introduction to the kinds of tasks to be addressed by our methodology. We therefore first discuss the issue on the more informal level that is predominant in the literature. There is a generic architecture for Autonomic Managers, which can be found in similar designs in various approaches. The best-known design is probably the one advocated by IBM in the so-called Architectural blueprint for autonomic computing [3]. We follow this general concept in the form presented in Fig. 1. The system is continuously monitored, thus determining the Actual Model, that is, an abstract description of the current status of the system under supervision. This model is compared against the Required Model by the Analyzer. In case of a discrepancy the Planner is invoked, which is to identify a Substitute Model that is as close to the Required Model as possible (see Fig. 2).
Fig. 1 Architecture of an Autonomic Manager (the AC Manager comprises a Monitor, an Analyzer, a Planner and an Effector grouped around the system under supervision, connected through the Actual, Required and Substitute Models and a base of repair rules and healing tactics)
Fig. 2 Selection of Substitute Model (the potential Substitute Models are compared for closeness to the Required Model, starting from the Actual Model)
Fig. 3 Possibilities for positioning (a Positioning Device may use a measured position from a Position Sensor or GPS, a calculated inertial position, or a position communicated via Mobile Phone, Car2Car or Hotspot)
The Effector then has to modify the system such that it conforms to the new Substitute Model (which also becomes the new Required Model). There are a number of variations of this scheme. For example, one may retain the originally required model in order to come back to it as soon as the reason for the disturbance has vanished. But we do not pursue these variations further here. For this kind of scenario to become feasible we need means to describe the various models in such a way that
• Differences between models can be identified.
• A notion of "closest matching" model can be introduced; that is, we need some kind of ordering based on the required/available functionality and the quality of service.
• Necessary changes to systems can be effected in order to make them match a required model.
Example: Figure 3 shows a variety of ways in which a car can determine its current position. These possibilities vary in their quality from very accurate (e.g. GPS-based) to coarse (e.g. mobile phone); the quality may deteriorate over time
(e.g. inertial positioning) or it may be uncertain (car2car communication). But all these possibilities share some kind of essential functionality that makes them usable as a Positioning Device. So the task is to specify the models in this hierarchy in such a way that both their essential functionality and the expected quality of service are expressible. (End of example)
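To make the control loop of Fig. 1 concrete, the following sketch renders one Monitor–Analyzer–Planner–Effector round in Python; the monitor, the model objects with their satisfies/quality members, and the effector are hypothetical interfaces of our own, not part of the IBM blueprint:

def autonomic_manager_round(monitor, required, substitutes, effector):
    """One round of the loop of Fig. 1; returns the (possibly new) Required Model."""
    actual = monitor()                  # Monitor: derive the Actual Model
    if actual.satisfies(required):      # Analyzer: compare the two models
        return required                 # no discrepancy, nothing to do
    viable = [m for m in substitutes    # Planner: substitutes the system
              if actual.satisfies(m.prerequisites)]  # can currently realize
    if not viable:
        return required                 # no substitute found; keep trying
    best = max(viable, key=lambda m: m.quality)  # the "closest matching" model
    effector(best)                      # Effector: reconfigure the system
    return best                         # the substitute becomes required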
3 UML-like Models
For a first sketch of our underlying concepts we use a (simplified) UML-like presentation of our models. In the next section we will then refine this coarse approach using more advanced specification means.
Example: For illustration purposes we use the following scenario: we have a built-in navigation system that is linked to a GPS device in order to determine the actual position. The coarsest form of this model is given by the architecture of Fig. 4. The navigation system is connected to a GPS receiver (usually via a MOST bus or a CAN bus). Now assume that the GPS signal is lost. This could happen for several reasons: we might have entered a tunnel, where the signal is temporarily lost, or the GPS device is actually broken, which means that the signal will not come back. This leads – at least temporarily – to the Actual Model depicted in Fig. 5. This situation exhibits a discrepancy between the Actual Model and the Required Model and thus leads to the search for a suitable Substitute Model. Suppose the Autonomic Manager comes up with a model in which the GPS component is replaced by a Position Estimator component, which in turn needs to be linked to Inertial Positioning devices. The resulting new system architecture is depicted in Fig. 6.
Fig. 4 Normal configuration of navigation system (Navigation connected to GPS)
Fig. 5 Broken navigation system (Navigation with the GPS component missing)
Fig. 6 Substitute architecture of navigation system (Navigation connected to a Position Estimator that is fed by Inertial Positioning)
Fig. 7 Alternative substitute architecture of navigation system (Navigation connected to a Position Estimator that is fed by Inertial Positioning, Mobile Phone and Car2Car)
When the calculation of the position based on inertial (and other) information is used over too long a period of time, the accuracy deteriorates. This is the case in particular when the GPS device is completely broken. Then the Position Estimator may need corrective information obtained, e.g., from mobile phones, hotspots or even communication with cars in the vicinity. This leads to the architecture depicted in Fig. 7. However, when the GPS signal comes back (e.g. after leaving the tunnel), the system should return to the original architecture of Fig. 4. (End of example)
This example demonstrates how architectural models can be used within an Autonomic Manager to provide the necessary "healing" information. That is, both the analysis and the planning utilize architectural models. However, on closer inspection the above example also reveals a major deficiency: a great part of the necessary information was given in natural language, that is, by prose surrounding the architectural models. In the following we will therefore study means by which a more accurate description of the models and of the analysis and repair rules can be given.
4 ESPECS
The concept of Especs (Evolving Specifications) is an extension of the traditional realm of Algebraic Specification. Whereas algebraic specifications focus on data types and operations, evolving specifications add the dimension of states and their dynamic changes, that is, the coalgebraic aspects of systems. Technically, Especs combine classical algebraic techniques with automata-like constructs. (Similar concepts are also found in UML, Matlab/Simulink/Stateflow and similar approaches, usually based on some variant of Harel's State Charts.) For a more detailed description of Especs (and related formalisms) we refer to the literature, e.g. [8–11]. We may perceive Especs as an extension of UML-like descriptions. For example, Fig. 8 presents a UML component together with its Espec behaviour description. The behaviour description essentially consists of modes and transitions. This is quite similar to the states and transitions of classical automata theory, but we speak of "modes" here to emphasize that modes represent the activities of a system ("modes of operation").
Fig. 8 The modes and transitions of a trivial navigation system (left: the UML «component» Navigation; right: its Espec behavior specification with the modes Initializing, Guiding and Suspended and the transitions start, suspend and resume)

SPEC Navigation =
  FUN accuracy: (Real|0.0..1.0)
  MODE Initializing = ...
  MODE Guiding = ... AXM accuracy > 0.9 ...
  MODE Suspended = ...
  TRANS Suspend: Guiding → Suspended = (accuracy ≤ 0.9 ...)
END-SPEC

Fig. 9 Textual form of an Espec
Actually, modes represent sets of states. (We also pursue variations in which the modes represent continuous functions, given by differential equations or differential-algebraic equations, similar to the concepts of Hybrid Automata [1].) The above sketch should convey the intuition behind modes and transitions. But in the Espec formalism modes actually represent theories, more precisely, theories induced by the given specification text (i.e. by signatures together with axioms). In other words, a mode stands for the set of all states in which the given axioms hold. For example, Fig. 9 shows an excerpt from the specification Navigation. It states that a navigation system only performs its guiding actions when the accuracy of the position is at least 90%. This is a required property of the mode Guiding, and its violation is also the guard for the transition into the mode Suspended (see below).
Note: This textual information could also be included in the graphical notation of Fig. 8 (very much like in systems such as State Charts, Stateflow or Modelica). But experience shows that large specifications make graphical representations completely overloaded and thus unreadable.
Figure 10 illustrates the general situation. We have modes M0, M1 and M2 with their associated theories. Between the modes we have transitions that are annotated with guards gi and actions ti, denoted in the form gi / ti. Pictorially this is quite similar to systems like State Charts, Stateflow and others. But semantically there is a considerable difference. In our framework the situation
Fig. 10 Modes and transitions
Fig. 11 Refinement of modes and transitions
of Fig. 10 means: when the system is in mode M0 and the guard g1 is true, then the transition to M1 may be taken. The "action" t1 – actually a theory rewriting – must ensure that the theory of M1 holds. Formally, a transition is captured as an interpretation t: M2 → M1, which rewrites the theory of M2 in terms of the theory of M1:

  M2 |= q   =⇒   M1 |= (g ⇒ t(q))
So the construction is quite similar to the predicate transformers of Hoare-style logics (where the assignment statement also rewrites the postcondition in order to obtain the weakest precondition). The central concept in our approach is the notion of Refinement. Refinements are morphisms between specifications. This is depicted in Fig. 11. One usually calls the source specification Spec_a of the refinement the "abstract" model and the target Spec_c of the morphism the "concrete" model. The refinement morphism ϕ consists of two parts:
• For the classical algebraic parts, ϕ is a usual specification morphism. That is, it maps the sorts and operations of Spec_a to the sorts and operations of Spec_c such that the axioms of Spec_a are theorems in Spec_c.
• For the coalgebraic parts, the morphism goes in the opposite direction. That is, it is a diagram morphism that maps the modes and transitions of the concrete
model Spec_c to the modes and transitions of the abstract model Spec_a. This way, each run of the concrete model can be associated with a corresponding run of the abstract model. In other words, the fine-grained concrete runs are refinements of the coarse-grained abstract runs.
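A drastically simplified executable reading of this semantics may help. In the sketch below (ours, not the Espec formalism itself, which is a specification language rather than a programming language), a mode carries its axioms as predicates over states, and a transition g / t may only fire when its guard holds and its action re-establishes the theory of the target mode:

class Mode:
    def __init__(self, name, axioms):
        self.name, self.axioms = name, axioms  # axioms: predicates on states

    def holds(self, state):
        return all(ax(state) for ax in self.axioms)

class Transition:                              # the annotation "g / t"
    def __init__(self, source, target, guard, action):
        self.source, self.target = source, target
        self.guard, self.action = guard, action

    def fire(self, state):
        assert self.guard(state)               # may only be taken if g holds
        new_state = self.action(state)
        assert self.target.holds(new_state)    # t must establish the theory
        return new_state

# Mirroring the excerpt of Fig. 9: Guiding requires accuracy > 0.9.
guiding = Mode("Guiding", [lambda s: s["accuracy"] > 0.9])
suspended = Mode("Suspended", [])
suspend = Transition(guiding, suspended,
                     guard=lambda s: s["accuracy"] <= 0.9,
                     action=lambda s: s)
state = suspend.fire({"accuracy": 0.5})        # the transition is admissible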
5 Using Especs for Autonomic Computing
We will now use the concepts of Especs to describe the Autonomic Computing scenario of Section 3. Due to lack of space, this presentation has to be overly simplified, but it should convey the basic principle. In order to make things more readable we also augment the Espec formalism with some ad hoc notations, which (hopefully) are self-explanatory.
Example: We start by considering the "normal" situation depicted in Fig. 4. We describe it as a configuration consisting of two components, Navigation and Positioning. In our presentation we are only interested in the Positioning; therefore we concentrate on this component (see Fig. 12). The graphical representation in Fig. 12 can only convey some aspects of the necessary information. For a more complete definition we need the full textual specification. Excerpts from this specification are given in Fig. 13. It simply states the minimal requirements: a Positioning needs to provide a position and also an estimate of the accuracy of this position; for simplicity we let this accuracy range from 0.0 ("useless") to 1.0 ("perfect"). Moreover, it has two modes, Off and Locating, with the corresponding transitions. For example, the transition from Off to Locating can only be taken when the starting of the device has been requested (the start event) and a signal is available.
Fig. 12 Modes and transitions of the positioning device (the component Positioning with the modes Off and Locating; the transition start ∧ has-signal / set-pos leads from Off to Locating, and stop ∨ signal-lost leads back)

SPEC Positioning
  OP position: Position
  OP accuracy: (Real|0.0..1.0)
  PRIVATE OP set-pos = ...
  MODE Off = ...
  MODE Locating = ...
  TRANS start: Off → Locating = (start ∧ has-signal / set-pos)
  TRANS stop: Locating → Off = (stop ∨ signal-lost)
END-SPEC

Fig. 13 The Espec for the component Positioning
We leave the details of the auxiliary operation set-pos and the inner workings of the mode Locating open. They essentially have to specify the acquisition and evaluation of the GPS signals from the antenna. We note in passing that the above specification is above all meant to be simple enough for illustration purposes. In reality we would make the operations position and accuracy available only in the mode Locating, since they are meaningless in the mode Off. (End of example)
Figure 3 contains a hierarchy of refinements of this specification, representing different kinds of position-finding technologies. This hierarchy can be captured more accurately in a collection of specifications of subsystems:
• the component GPS (see Fig. 4), which is a one-component subsystem;
• the subsystem consisting of the Estimator together with the InertialPositioning (see Fig. 6);
• the subsystem consisting of the Estimator together with the components InertialPositioning as well as the MobilePhone and the Car2Car (see Fig. 7).
We refrain from writing down these specifications explicitly here. The essential fact in our context is that all three subsystems are refinements of the specification Positioning. This is depicted in the following shortened excerpt from Fig. 3.
Positioning —r→ GPS
Positioning —r→ Estimator + Inertial
Positioning —r→ Estimator + Inertial + Mobile + Car2Car
For the required refinement morphisms to work, each of these subsystems has to provide at least the operations position and accuracy as well as refinements of the modes Off and Locating and their transitions. The above relations are part of the data base on which the Autonomic Manager bases its analysis and planning (see Fig. 1). Since the Required Model postulates a component that meets the specification Positioning, the manager has to choose among the possible refinements. This selection is (in our scenario) guided by the value accuracy, which represents the "quality of service" on which the "closeness of models" is based. That is, we choose the realization with the highest accuracy. In general, the GPS will provide the highest accuracy. However, inside a tunnel or when the device is broken, its accuracy will drop to 0.0, which makes one of the other realizations better.
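In this scenario the "closeness of models" ordering thus collapses to a comparison of a single quality value. A minimal sketch of the resulting planner rule (the device objects with an accuracy attribute are our own abstraction of the three refinements):

def pick_positioning(refinements):
    """Choose, among the refinements of Positioning, the realization that
    currently reports the highest accuracy (0.0 = useless, 1.0 = perfect)."""
    usable = [dev for dev in refinements if dev.accuracy > 0.0]
    return max(usable, key=lambda dev: dev.accuracy, default=None)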
Fig. 14 Refinement of the positioning device (top: the abstract Positioning of Fig. 12 with the modes Off and Locating; bottom: the concrete InertialPositioning with the modes Off, Calculating and Blind; the morphism ϕ maps Off to Off and both Calculating and Blind to Locating; the concrete transitions are guarded by start ∧ has-pos, start ∧ no-pos, has-pos, stop, accuracy = 0.0 and accuracy < Min, and a Δt / decrease-accuracy self-loop on Calculating models the deterioration of the accuracy)
For the InertialPositioning the accuracy goes down as time progresses. Hence, there will be a moment when the other subsystem, with its correctional means, becomes superior.
Example: We consider the refinement morphism from the abstract specification Positioning (see Fig. 12) to a concrete realization based on estimations using inertial information. Figure 14 contains both the abstract and the concrete specification and their morphisms. (But it should be noted that the picture only contains an excerpt of the actual specification; some parts of the guards and actions have not been included.) This is admittedly oversimplified, but it should suffice to illustrate the principal way of proceeding. We refrain from writing down all the details of this specification. Rather, we only comment on the essential features:
• The morphism property of the algebraic part has to ensure that the operations position and accuracy are also available in InertialPositioning (possibly renamed).
• The morphism property of the coalgebraic part requires that all modes of the concrete model InertialPositioning are mapped by ϕ to modes of the abstract model Positioning. We choose to identify Calculating as well as Blind with Locating.
• This induces the corresponding mappings of the transitions. For example, the arrow from Off to Calculating as well as the arrow from Off to Blind are mapped to the arrow from Off to Locating.
• Analogously, the arrows from Calculating to Blind and back from Blind to Calculating are both mapped to the identity arrow of Locating. (Each mode has such an identity arrow to itself, even if we do not draw it explicitly.)
• We also introduce a self-loop for the mode Calculating. We use it to express the fact that the accuracy deteriorates in certain time intervals. (This could, of course, also have been specified as an internal property of the mode.)
• The specifications of the modes Calculating and Blind of the concrete model InertialPositioning both have to be strong enough to entail all the properties of the mode Locating of the abstract model Positioning.
There are also a number of constraints on the refinement of the guards and actions. However, they are slightly more intricate, since we have to distinguish between safety-preserving and liveness-preserving refinements. We do not go into the details of this distinction; rather, we refer to the literature, e.g. [4, 9]. (End of example)
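The coalgebraic half of the morphism ϕ lends itself to a mechanical check. The sketch below encodes the mode map of Fig. 14 as we read it from the example; the transition sets are our reconstruction of the figure, so treat them as illustrative:

# phi maps the concrete modes of InertialPositioning to abstract modes of
# Positioning, as chosen in the example above.
phi = {"Off": "Off", "Calculating": "Locating", "Blind": "Locating"}

abstract = {("Off", "Locating"), ("Locating", "Off"),
            ("Locating", "Locating")}       # including the identity arrow

concrete = {("Off", "Calculating"), ("Off", "Blind"),
            ("Calculating", "Blind"), ("Blind", "Calculating"),
            ("Calculating", "Calculating"),  # the decrease-accuracy self-loop
            ("Calculating", "Off")}

def is_diagram_morphism(phi, concrete, abstract):
    """Every concrete transition must be mapped onto an abstract transition;
    the identity arrows absorb the Calculating/Blind round trips."""
    return all((phi[s], phi[t]) in abstract for s, t in concrete)

assert is_diagram_morphism(phi, concrete, abstract)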
6 Interfaces and Parameterized Especs
An important aspect of modularized designs is the clear description of the interfaces between the components. In UML this is achieved through concepts like ports and connectors. However, as usual, these UML concepts only determine some syntactic aspects and leave most of the content-related aspects open. In the Espec framework this deficiency is – again – overcome by using the notion of refinement morphisms. This is illustrated in Fig. 15 for the interface between the Navigation component and the InertialPositioning component. The approach follows the so-called rely-guarantee paradigm: as long as the environment fulfills the rely conditions, the component guarantees its behaviour. In Fig. 15 the specification Navi-Interface is the parameter of the specification Navigation. It is linked to the body by a morphism p, which (in this example) maps the two modes Initializing and Guiding to the parameter mode Locating, since the activities in these modes require position information. The mode Suspended is mapped to the parameter mode Blind.
Fig. 15 Using parameterized Especs for interface descriptions (the parameter specification Navi-Interface, with the modes Locating and Blind and the transitions on and off, is linked to the body Navigation – modes Initializing, Guiding and Suspended with the transitions start, suspend and resume – by the morphism p, and to the environment InertialPositioning – modes Off, Calculating and Blind – by a refinement morphism r)
This association (when properly axiomatized) expresses the fact that the navigation device relies on proper positioning information. The component InertialPositioning provides – at least in some scenarios – the environment for the component Navigation. Therefore it is linked in the Espec formalism to the parameter specification Navi-Interface through a refinement morphism. This morphism associates the two environment modes Off and Blind with the parameter mode Blind, and the mode Calculating with the mode Locating. The mechanisms of these morphisms then ensure that all the properties on which the Navigation specification relies are indeed met by the environment InertialPositioning (provided that the specifications are correct). We cannot discuss the technicalities of this approach here in detail, but the above sketch should suffice to give a first idea of the method.
7 Conclusion
Like many other ideas, Autonomic Computing is usually presented on a very abstract and thus highly informal level. Even though this provides a good starting point for research and development and may even help practitioners to implement concrete systems, it is by no means sufficient for introducing the technology into areas of safety-critical applications. Here we need means that foster correctness and reliability. In this paper we have sketched the potential of one specific formalism – namely Especs – for providing the necessary framework within which the required quality and rigor of the software development can be achieved. The previous sections have demonstrated that this approach provides the principal expressive power needed in connection with Autonomic Computing for safety-critical and other high-quality systems. However, this paper only provides a sketch of the pertinent requirements and technologies. Many issues remain to be tackled, in particular with respect to the restricted resources that are available in Embedded Systems. For example, the models and their relationships need to be efficiently represented in the knowledge base. The communication between the various components in an Autonomic System needs to be standardized. The strategies for healing corrupted systems need to be formulated; to this end, technologies from the realm of constraint solvers appear to be most promising. Some of these issues have already been addressed by us to some extent in industry projects, but many of the questions are still topics for ongoing research.
Acknowledgments My work with and on Especs is the result of a long-lasting and most satisfying and enjoyable cooperation with Doug Smith and Dusko Pavlovic from Kestrel Institute. The work on Autonomic Computing, in particular in the Automotive Domain, was both sponsored and influenced by our close cooperation with Daimler AG in the framework of the DCAITI (Daimler
Center for Automotive Information Technology Innovations) at Technische Universität Berlin. The concrete work on Autonomic Computing was highly influenced by our former colleagues Andre Metzner and Mathias Hintzmann, the work on Navigation and other Telematics issues by Michael Cebulla and Sandro Rodriguez Garzon.
References
1. R. Alur, C. Courcoubetis, T. A. Henzinger, and P.-H. Ho. Hybrid automata: An algorithmic approach to the specification and verification of hybrid systems. In R. Grossman, A. Nerode, H. Rischel, and A. Ravn, editors, Hybrid Systems, LNCS 736, pages 209–229. Springer, New York, 1993.
2. J. A. Goguen. Categorical foundations for general systems theory. In F. Pichler and R. Trappl, editors, Advances in Cybernetics and Systems Research, pages 121–130. Transcripta Books, London, 1973.
3. IBM, 2006. http://www-03.ibm.com/autonomic/pdfs/AC Blueprint White Paper 4th.pdf.
4. J. Fiadeiro, A. Lopes, and M. Wermelinger. A mathematical semantics for architectural connectors. In FASE-03, LNCS 2793, pages 190–234, 2003.
5. Kestrel Institute. Specware System and Documentation, 2003. http://www.specware.org/.
6. NASA. http://ic.arc.nasa.gov/projects/mba/.
7. D. Patterson and A. Fox. http://roc.cs.berkeley.edu.
8. Dusko Pavlovic, Peter Pepper, and Douglas R. Smith. Colimits for concurrent collectors. In N. Dershowitz, editor, Verification: Theory and Practice: Festschrift for Zohar Manna, pages 568–597. LNCS 2772, 2003.
9. Dusko Pavlovic, Peter Pepper, and Douglas R. Smith. Evolving specification engineering. In Proceedings of AMAST 2008, 2008 (to appear).
10. Dusko Pavlovic and Douglas R. Smith. Composition and refinement of behavioral specifications. In Proceedings of the Sixteenth International Conference on Automated Software Engineering, pages 157–165. IEEE Computer Society Press, 2001.
11. Dusko Pavlovic and Douglas R. Smith. Guarded transitions in evolving specifications. In H. Kirchner and C. Ringeissen, editors, Proceedings of AMAST 2002, volume 2422 of Lecture Notes in Computer Science, pages 411–425. Springer, 2002. ftp://ftp.kestrel.edu/pub/papers/pavlovic/GTES.ps.
Quality Assurance for Concurrent Software – An Actor-Based Approach Rodger Burmeister
Abstract Distributed aspects and parallel architectures are increasingly gaining importance today, but developing high-quality concurrent software is still more of an art than an engineering process and requires a lot of experience. The reasons for this are presented in the first part of this paper, along with common strategies for addressing them. In the second part, we propose using actor models in conjunction with well-established and well-known object-oriented concepts. Putting together these two concepts – object-orientation and actors – remedies several shortcomings of classical mutual-exclusion techniques and supports the development of concurrent and distributed software in a comfortable and scalable way. In the third part of the paper, we give an overview of our current research activities, in which we are developing tools to express and orchestrate concurrent test cases, in particular for testing actors and their behavior. Overall, the combination of consistent constructive and analytical actions, as proposed in this paper, relieves developers of the burden of dealing with concurrency explicitly and provides them with a basis for building high-quality concurrent software.
R. Burmeister, Software Engineering, Technische Universität Berlin, Germany
e-mail: [email protected]

1 Motivation
We live in a parallel world, surrounded by parallel processes at every turn. The clock on the wall ticks, while the radio on the shelf plays music and cars drive down the street. Most of these processes are independent of each other. In computer science, concurrency is an old topic, but one that is becoming quite popular again, as several limitations in hardware and software force us to scale up our systems by using concurrent design elements. Herb Sutter has summarized in [1] some of the reasons why there is no way around concurrency. And a lot of concurrent technology is already
[email protected] B. Mahr, H. Sheng (eds.) Autonomous Systems – Self-Organization, Management, and Control, c Springer Science+Business Media B.V. 2008
119
120
R. Burmeister
in use today: multicore architectures, computational clusters, web services, user interfaces with asynchronous events and the Internet itself. But the problem is that developing sound, high-quality concurrent systems using traditional development tools is not easy, as these tools are mainly designed for sequential programming. The most commonly used imperative object-oriented languages (C++, C# and Java) use threads, shared memory and mutual-exclusion techniques to provide support for concurrent computations. Using these kinds of techniques to express concurrency entails additional, low-probability issues like the following.
Data Racing: When two or more processes access a shared object at the same time, at least one operation being a write operation, the state of that object can be undefined, since the operations are not guaranteed to be atomic. And since the evaluation depends on many imprecisely timed parameters such as memory performance, the scheduling algorithm used or the timing of external events, it is nearly impossible to precisely predict the order of operations. Access to shared objects is therefore synchronized by using mutual exclusion, in which only one process gains access by locking and unlocking a shared resource. Since the developer must decide when to lock and unlock resources, it quite often happens that a synchronization is missing or faulty, e.g. when a resource is not released. As the conditions leading to such faults are very specific and can easily be hidden by other time-consuming operations – e.g. logging – faulty implementations may never show up during testing. There are, however, three strategies to detect or prevent data races: (1) change the software design, (2) use abstract models and formal techniques, and (3) perform static or dynamic code analysis. They all have their pros and cons. The first one prevents data racing and locking completely, but it requires a great deal of experience in designing concurrent software systems, and it is not always possible to find a preventive design. The second one is the most powerful when it comes to proving sound behavior, but it is time-consuming and works only on abstractions of real implementations. The third one works directly on the implementation and is definitely the most pragmatic and least time-consuming one, but unlike the previous ones, it cannot guarantee that the implementation is free of data races. One of the most frequently used algorithms for runtime analysis was presented in [2]. It monitors sets of possible locks for shared data objects and warns if there is an access without mutual exclusion. In contrast to these three strategies, users of pure functional programming languages do not have to deal with data races, because a function can always be evaluated independently by several processes at the same time.
Deadlocks and Livelocks: We speak of a deadlock if two or more interdependent processes are waiting for each other and the system gets stuck. Only rare circumstances lead to such a deadlock, and most often the chain of events that induced the deadlock is not reproducible, which makes deadlocks hard to find and debug. In principle, concurrent systems can be expressed and verified in process algebras like CSP, CCS, FSP or the π-calculus, but checking real-world applications in model checkers like the Labeled Transition System Analyzer or the Concurrency Workbench is very time-consuming or even not feasible. Most developers opt for a deadlock-preventing software design or use algorithms for detecting and resolving
cyclic interdependencies between processes at runtime, like E. W. Dijkstra's Banker's Algorithm. A special case of a deadlock is the so-called livelock. In a livelock, the whole system state oscillates but shows no progress in terms of the application. As potential livelocks can be intended behavior, they are hard to define and detect.
Non-Determinism and Repeatability: The runtime of an operation depends on environmental conditions and on various hard-to-predict perturbations, e.g. the response of a user. The specific runtime can therefore vary between executions. But this also means that the execution order of concurrent processes, and the overall behavior of an application, can change even if the input is the same. Normally, however, such non-deterministic behavior is not desired, as non-specified behavior can potentially show up even if an application has passed all its tests. In [3], Edelstein et al. inject parameterized sleep() and yield() operations into the compiled bytecode to induce different orders of execution while testing a system. This allows them to test different concurrent and potentially risky execution orders in a repeatable way. Full coverage is normally not achievable, though, because there is a huge number of possible execution sequences, which means that deterministic behavior cannot be guaranteed. Nevertheless, their approach is very successful in detecting concurrency faults, and storing the injected parameters also allows them to reproduce a faulty chain of events.
Having listed above the reasons for the difficulties encountered in concurrent programming, we now present an alternative – and scalable – approach to describing and testing concurrent software systems. To meet the needs of today's software developers, a concurrent language must ideally support object-oriented concepts and concurrent test setups. The good news is that the formal foundations for such an approach already exist. One of them is the so-called actor model.
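The data-racing problem described above is easy to provoke. The following self-contained sketch (ours, in Python for brevity) performs an unsynchronized read-modify-write from several threads; on most runs the final value falls short of the expected 400000 and varies between executions, illustrating both the race and the non-determinism:

import threading

counter = 0

def worker():
    global counter
    for _ in range(100_000):
        counter += 1  # read-modify-write: not atomic, races with other threads

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # expected 400000; lost updates typically make it smaller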
2 Actor-Based Object-Oriented Programming
Actors are asynchronously executed, autonomous objects which can be used to describe highly scalable concurrent or distributed systems. As depicted in Fig. 1, an actor is composed of a unique address (ActorID), a mailbox for incoming messages, and behavior definitions. Basically, there are only three things an actor can do:
1. Send messages to other known actors
2. Create new actors
3. Sequentially apply behavior corresponding to the received messages
In a more formal sense [4], an actor represents only partial behavior that is replaced after an evaluation step by its succeeding behavior definition. In this paper, however, we use the word actor as a synonym for an autonomous object that encapsulates its behavior and state. The biggest advantages of this formalism are: (1) it is object-oriented, (2) incoming messages are processed in sequential order, and (3) the state of an actor can be changed only by the actor process itself.
Fig. 1 Actors can send asynchronous messages to each other, which are stored and processed in sequential order (each actor comprises a unique ActorID, a mailbox, the currently processed message, behavior definitions for the message types, and an actor state that may reference other actors)
This means that we do not need locking mechanisms, because there is no such thing as shared data. The use of actors can therefore facilitate the development of concurrent and distributed systems, and it also bypasses some of the problems mentioned above, e.g. data racing. An actor formalism was first presented in 1973 by Carl Hewitt et al. [5]. Over the years, actor models have become a well-established formal foundation for concurrent and distributed systems and an alternative to other process calculi like CSP, FSP or the π-calculus. What makes it easy to use compared with other models is its affinity to what we now call object orientation. This basically includes: messages, stateful objects, encapsulation or privacy, inheritance, polymorphism and generics. All these concepts can easily be expressed using the actor model [6]. It is no wonder, then, that there are already several actor-based programming languages, e.g. Erlang, Scala, Io, ABCL or E. The success of the functional language Erlang [7] in particular has shown that it is possible to build large-scale actor-based software today. Actors can also be expressed in commonly used object-oriented programming languages like Java, Ruby or Smalltalk, by using message serialization and threads for example. But unlike objects, most of these languages do not support actors as a basic primitive. Developers still have to decide how, when and where to use actors. Thus, while it is true that in most formal theories everything is an actor, this is normally¹ not the case with real, common object-oriented programming languages. We therefore suggest applying the actor formalism to existing languages by replacing the essential terms object and message invocation by their concurrent counterparts, actor and message-passing.
¹ Scala, for example, offers a very good actor extension.
We expect that clean, pure object-oriented languages can consistently support the actor formalism in addition to the sequential style of description. The major benefit of such an approach would be that well-known and widely used object-oriented programming languages could – under the hood – provide a scalable programming abstraction for distributed systems without programmers being aware of it. This means that we can pick up developers coming from sequential programming and enable them to scale up their software in a concurrent and distributed way. It does not mean that currently written sequential programs will necessarily scale up without any changes, but at least they should execute as before. The major gap between the purely asynchronous actor formalism and common object-oriented programming languages is that most programming languages today use synchronous, stack-based message invocation, which leads to the problem of asynchronous control flows (also known as inversion of control). For most of us, programming in a purely asynchronous manner feels somewhat bumpy and unnatural, and code written asynchronously is also hard to read, because concerns that normally belong together have to be split up and distributed over several asynchronously called methods. But if we wish to map synchronous object-oriented languages onto an asynchronous concept, we have to find a way to combine them. We therefore suggest providing both concepts. As shown in the following example, asynchronous semantics can optionally – here indicated by the exclamation mark – be bound to method calls without a return value, while method calls with return values stay bound to synchronous semantics:

aCar := carPool pickCar.    // synchronous assignment
carPool ! add: vwBeetle.    // asynchronous operation
As the switch to asynchronous calls can change the message order, and therefore the behavior of a system, a strong confirmation of invariant behavior is required. But in cases where an actor object is referenced by only one other actor object, as often happens in object-attribute hierarchies, the computations of this once-referenced object can be done in parallel without any fear of message reordering. To apply asynchronous semantics to method calls with return values as well, we suggest using continuations or futures. In [8], for instance, ready-to-receive actors are represented using closures, which can be executed within the thread of the caller. Futures, by contrast, are a kind of lazy evaluation or deferred computation. They immediately return a reference to a placeholder for the future result, which allows the caller to continue its work asynchronously. In the background, the placeholder is replaced by the real value as soon as it has been evaluated. Nevertheless, using actors for asynchronous programming means that we have to deal with potential non-determinism and with validating actor networks. The latter is not addressed by common tools so far. In the next section we therefore present some of our ongoing research activities, which in particular examine different strategies for specifying and executing concurrent test cases.
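A minimal thread-based rendering of these suggestions (our sketch, not a proposal for a concrete language extension): each actor processes its mailbox sequentially in a private thread, a send method gives "!"-style fire-and-forget semantics, and an ask method returns a future as the placeholder for a deferred result.

import queue
import threading
from concurrent.futures import Future

class Actor:
    """State is touched only by the mailbox thread, so no locks are needed."""
    def __init__(self):
        self._mailbox = queue.Queue()
        threading.Thread(target=self._process, daemon=True).start()

    def _process(self):
        while True:                        # messages in strict arrival order
            method, args, future = self._mailbox.get()
            result = getattr(self, method)(*args)
            if future is not None:
                future.set_result(result)  # fulfil the placeholder

    def send(self, method, *args):         # asynchronous, like "!"
        self._mailbox.put((method, args, None))

    def ask(self, method, *args):          # returns a future immediately
        future = Future()
        self._mailbox.put((method, args, future))
        return future

class CarPool(Actor):
    def __init__(self):
        self.cars = []                     # set state before the thread starts
        super().__init__()

    def add(self, car):
        self.cars.append(car)

    def pick_car(self):
        return self.cars.pop() if self.cars else None

pool = CarPool()
pool.send("add", "vwBeetle")               # carPool ! add: vwBeetle
print(pool.ask("pick_car").result())       # synchronous result via the future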
3 Testing Actor Networks
Preventing the use of faulty software is the overall goal of quality assurance. One commonly used and well-established tool for validating an implementation and its parts are unit tests. A unit test is a kind of black-box test in which defined input sequences are sent to the component under test. The response is then compared with a test hypothesis. If the observed output differs from the expected result, the test case has failed. But if the output matches the hypothesis, we can expect repeatably correct behavior at least for the specified input sequence. This means that, in the case of sequential evaluation with observable behavior, we know whether a component matches a set of requirements or not. If we use concurrent objects like actors, however, this decision is not so easy. The reason is potential non-deterministic asynchronous behavior: running a test case at two different points in time can produce two different results, as depicted in Fig. 2. Since there can be many interleavings, it is better to define test cases with a small number of asynchronously passed messages or to test only a critical subset of interleavings. In the second case we need – besides fully covered positive and negative results – a third kind of test result that identifies test cases with partially covered interleavings. The value of this third kind is that testing two or more interleavings with the same input sequence can still find more inconsistencies than testing only one. The good thing is that this kind of non-determinism can only occur if asynchronous control flows (inversion of control) are used. Using the also-proposed synchronous approach to evaluate messages, as shown in Fig. 3, implies a fixed message order from the point of view of actor A and therefore deterministic behavior, where only one execution is needed to evaluate a test case. We must assume that directly related actors are deterministic, i.e. independent of others, and sound. Non-deterministic behavior can show up during test runs if this assumption is not fulfilled! But taking the assumption for granted, it is sufficient to check only the potential interleavings of all incoming messages, which is far less than checking all interleavings from a global perspective. Even when making this assumption, however, there may be test setups where the number of possible interleavings is still far too large to test each one. Nevertheless, the proposed strategy fits in well with the character of object-attribute hierarchies and their incremental integration.
Fig. 2 Non-deterministic behavior can lead to different message orders in actor A (after 1. doSomething is processed, B and C send 2. messageX and 2. messageY concurrently, so A's mailbox may hold messageX before messageY or vice versa)
Fig. 3 Using synchronous and asynchronous operations can preserve message order from the perspective of actor A (A sends evaluate asynchronously to B and C, then fetches both results synchronously, so each of B and C sees the mailbox order 1. evaluate, 2. getResult):

doSomething(...) {
  b.evaluate              // 1. evaluate (async)
  c.evaluate              // 2. evaluate (async)
  resultB := b.getResult  // 3. getResult (sync)
  resultC := c.getResult  // 4. getResult (sync)
}
Fig. 4 The test driver should not send getResult to actorA until actorA has finished processing sendUpdate (a timeline with testDriver, actorA and the actorUnderTest: doSomething triggers sendUpdate, then getResult/returnResult; synchronisation is needed, but without touching the implementation of actorA). The problem here is that there is no message from actorA to the testDriver when sendUpdate has finished, and we do not normally wish to adapt the implementation for testing only.
However, dealing with test statements and concurrency coverage is only one aspect of testing object-oriented actor-based software. The other aspect that must be addressed is the specification and execution of concurrent test cases. The problem is that we often need to synchronize a test driver with non-public or indirect events, like the completion of a message. In Fig. 4, we show a timeline diagram describing a situation where we need to synchronize the test driver with a non-public virtual event. To monitor and time this kind of message traffic, we either need support from the runtime engine or some kind of runtime aspect annotation. In practice, concurrent test setups are often scheduled using threads and roughly estimated sleep-delay operations. Precisely timed and orchestrated test setups and exception handling are rarely supported by today's test frameworks. And approaches like Concutest, Thread-Checker, ConTest or TestNG do not directly target the testing of actor networks. Based on the idea of Pugh et al. [9], we use a global timestep semantics for describing and orchestrating concurrent activities in actor networks.
In our semantics, we trigger a timestep once all actors participating in a test case have processed all their messages. Another way to orchestrate behavior that we examine is the use of runtime-injected test stubs, in particular behave-as object relations. These test stubs substitute real actors and use recorded behavior, behavior trees or statecharts to emulate behavior. The advantage of test stubs is that deterministic behavior can be guaranteed at least for the runtime of a test case and that messages can be timed precisely by an external supervisor.
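On top of an actor runtime like the one sketched at the end of Section 2, the timestep trigger can be approximated as follows (assuming, purely for illustration, that each actor exposes its mailbox; a real implementation would need runtime support, as argued above):

import time

def await_timestep(actors, poll=0.001, settle_rounds=3):
    """Block until all participating actors appear to have drained their mailboxes.

    Several consecutive quiet observations are required so that a message
    in flight between two actors is not missed; this is a pragmatic
    approximation of the timestep semantics, not the semantics itself.
    """
    quiet = 0
    while quiet < settle_rounds:
        quiet = quiet + 1 if all(a._mailbox.empty() for a in actors) else 0
        time.sleep(poll)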
4 Summary
In this paper, we have shown why it is so difficult to develop high-quality concurrent software and presented constructive and analytical solutions to improve its overall quality. Our first suggestion is to use a combination of actor models and common object-oriented concepts to build languages that are close to today's object-oriented languages, but with concurrent processes as an atomic feature and without some of the mutual-exclusion-related problems. Our second suggestion is to use concurrent test setups. In this context, we have introduced different ideas for testing actor-based object hierarchies and outlined how concurrent test cases can be specified and executed. In particular, we combine different concepts and ideas to improve the overall quality and scalability of concurrent systems in the long term.
References
1. Sutter H (2005) The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. In: Dr. Dobb's Journal 30(3), March 2005.
2. Savage S, Burrows M, Nelson G, Sobalvarro P, Anderson T (1997) Eraser: A Dynamic Data Race Detector for Multithreaded Programs. In: ACM Transactions on Computer Systems (TOCS) 15(4), 391–411.
3. Edelstein O, Farchi E, Nir Y, Ratsaby G, Ur S (2002) Multithreaded Java Program Test Generation. In: IBM Systems Journal 41(1), 111–125.
4. Agha GA (1985) Actors: A Model of Concurrent Computation in Distributed Systems. Technical Report 844, MIT Artificial Intelligence Laboratory.
5. Hewitt C, Bishop P, Steiger R (1973) A Universal Modular ACTOR Formalism for Artificial Intelligence. In: Proceedings of the International Joint Conference on Artificial Intelligence 1973, 235–245.
6. Yonezawa A (1990) ABCL: An Object-Oriented Concurrent System – Theory, Language, Programming, Implementation and Application. Computer System Series. MIT Press.
7. Armstrong J (2007) Programming Erlang – Software for a Concurrent World. Raleigh, North Carolina.
8. Haller P, Odersky M (2006) Event-Based Programming without Inversion of Control. LNCS, doi: 10.1007/11860990_2.
9. Pugh W, Ayewah W (2007) Unit Testing Concurrent Software. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, 513–516.
Enabling Autonomous Self-Optimisation in Service-Oriented Systems
Hermann Krallmann, Christian Schröpfer, Vladimir Stantchev, and Philipp Offermann
Abstract Service-oriented systems, being based on the principles of service-oriented architecture (SOA), are a current trend in software architecture. With regard to autonomy, this architecture enables new ways to provide self-optimisation. To enable this, requirements on services have to be specified formally in service-level agreements (SLA). Based on these SLA, automatic monitoring can be performed. Monitoring information on current service performance with respect to the requirements can be used to automatically provision services, adapting response time, availability and throughput. In this paper, we present a way to specify SLA and to monitor and automatically provision services. Our solution has proven successful in laboratory experiments. The results show that autonomous self-optimisation is a prerequisite for ubiquitous computing.
1 Introduction
The service-oriented architecture (SOA) is a current approach to designing and implementing information systems. The widespread adoption of SOA has been made possible by generally accepted technical standards such as WSDL (Web Service Description Language) [1], SOAP (formerly Simple Object Access Protocol, now a proper name) [2], UDDI (Universal Description, Discovery and Integration) [3] and WS-BPEL (Web Services Business Process Execution Language) [4]. A possible definition of SOA is: "SOA is a software architecture that builds a topology of interfaces, interface implementations and interface calls. SOA is a relationship of services and service consumers, both software modules large enough to represent a complete business function. So, SOA is about reuse, encapsulation, interfaces, and ultimately, agility" [5].
H. Krallmann, C. Schröpfer, V. Stantchev, and P. Offermann
System Analysis and IT, Technische Universität Berlin, Germany
e-mail: [email protected]
Self-optimisation, being an aspect of autonomous computing, is a principle that can also be realized in an SOA. As a prerequisite for optimization, service levels have to be specified and monitored. Based on the monitoring information, optimisation operations can be performed automatically. In order to provide this functionality, the service levels of the services participating in the service delivery of the computing environment need to be guaranteed. If self-optimization is to work in the background and perform its functions without the user perceiving the system itself – as demanded by the autonomous paradigm – very high requirements are posed with respect to performance and other non-functional properties (NFP). For example, systems have to perform their tasks in real time without noticeable delays. Our solution works in a scenario where a user consumes functionality that a service aggregator provides [6]. The service aggregator orchestrates several services from other providers in order to offer his services to the end user. The service providers provide their services with a certain service level. The user has certain expectations with respect to his quality of experience and perception. Our idea is to formalize, on the one hand, the assumed service level requirements of the users in a transparent manner based on the experience of the service aggregator and, on the other hand, to specify the capabilities of the services. Based on a gap analysis between those two service level definitions, we can provide a setup for run-time performance self-optimization. Thus, we can improve the user experience with respect to performance and availability during runtime to an acceptable level.
2 Overview of the Approach
To achieve self-optimization in an SOA environment, we propose a solution for automated run-time optimization that consists of four components, as depicted in Fig. 1:
Fig. 1 Overview of the approach
• Formalization of run-time-related capabilities and requirements: The service aggregator specifies the requirements regarding the user experience in a formal way based on his assumptions (1). The providers of the individual services to be used assess and specify the services' capabilities in a formal way (2). Both use a pre-defined service level objective structure and pre-defined NFP terms so that a reasoner can compare the two.
• Run-time optimization: During setup and aggregation of the autonomous system, the system aggregator compares the user experience requirements with the system's capabilities, considering the performance capabilities of the individual services (3). Based on the result of this comparison we can estimate whether we need to trigger performance optimization activities (4). The service aggregator can perform a performance optimization by following our proposed ASTAR optimization method. This optimization method uses parallelization across distributed nodes combined with load balancing in order to improve the performance of the services in case they cannot cope with the load given the performance requirements, e.g. caused by a large number of users.
• Monitoring: The actual performance of the services is measured to monitor the optimization success with respect to the performance requirements and to identify further optimization needs (5).
3 Related Work
Much work has been done in the area of QoS-aware web service discovery [7], QoS-aware platforms and middleware for ubiquitous computing [8–10], as well as context-aware services [11]. For autonomous computing in SOA, it is important to achieve the goal of an acceptable QoS with the given services. Rather than selecting other services, we propose to enhance service performance, availability and reliability by optimizing the infrastructure the services run on. Extensive research concerning the enforcement of non-functional properties exists in the field of CORBA (Common Object Request Broker Architecture), particularly in the areas of real-time support [12], replication as an approach to dependability [13], as well as adaptivity and reflection [14]. There are various efforts to address non-functional properties in distributed computing environments that look at different system levels. Under the term performability, originally introduced in 1997 [15], there are works in the area of resource control [16] and on addressing fault tolerance at system level and application level [17]. In order to assure run-time enforcement, we should look at ways to represent and control non-functional properties at the level of a technical service. Here we focus on availability and performance, particularly response time and throughput. One key approach to improving availability involves replicating the service across multiple, wide-area sites [18]. Typically, strong consistency needs to be provided between the replicas in order to provide better availability. The overhead that is introduced to ensure strong consistency has a negative impact on performance.
SLO pattern:   NFP           | Predicate   | Metric (Value, Unit) | Percentage           | Qualifying conditions (QC): if NFP | Predicate | Metric

SLO examples:  Response time | less than   | 100 ms               | in 95 % of the cases | if transaction rate                | less than | 10 transactions per second
               Throughput    | higher than | 1000 kB/s            | in 95 % of the cases | if transaction rate                | less than | 10 transactions per second
               Availability  | higher than | 99.9 %               | –                    | –                                  | –         | –

Fig. 2 Structure of service level objective and examples
4 Formalization of Run-Time-Related Requirements and Capabilities
We propose a structure for service level statements to formalize the requirements and capabilities, as depicted in Fig. 2. Figure 2 also contains sample service level statements about the NFPs response time, throughput and availability. The statement structure can be used for describing the performance-related capabilities of the services and the internal requirements of the service aggregator who is orchestrating the services (on an aggregated level, better reflecting the user experience). These statements are then stored with the service description (service capabilities) and with the process/scenario description (user experience requirements), respectively. In this work, we focus on runtime-related NFPs that are performance relevant. For mapping the user view to technical services, we use an approach for mapping non-functional requirements of composed services to every service in the composition [19].
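The SLO pattern of Fig. 2 maps naturally onto a small data structure. The following sketch is our illustration (in Python, with names of our own choosing), encoding the response-time example from Fig. 2:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Condition:
        nfp: str            # e.g. "transaction rate"
        predicate: str      # e.g. "less than"
        value: float
        unit: str

    @dataclass
    class ServiceLevelObjective:
        nfp: str                    # non-functional property
        predicate: str              # "less than", "higher than", ...
        value: float
        unit: str
        percentage: Optional[float] = None              # e.g. holds in 95 % of the cases
        qualifying_condition: Optional[Condition] = None

    # response-time example from Fig. 2
    response_time_slo = ServiceLevelObjective(
        nfp="response time", predicate="less than", value=100, unit="ms",
        percentage=95.0,
        qualifying_condition=Condition("transaction rate", "less than",
                                       10, "transactions per second"))

The same structure can be filled in both by the service aggregator (requirements) and by the service providers (capabilities), so that a reasoner can compare the two term by term.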
5 Run-Time Optimization and Architectural Translucency
Our run-time optimization focuses on the properties availability, response time, and throughput of a service chain, i.e. serially executed services, and on the improvement of their service levels. This is specifically relevant for a service aggregator that offers services from different service providers to its customers. The idea behind this approach is to parallelize the services at the technical service level. If parallel service nodes perform the service requests, the execution of the individual tasks, and thus of the service chain, achieves higher availability, and many requests can be performed faster, i.e. with lower response time and higher throughput. To evaluate service replication alternatives and their effects on NFPs, we have developed the approach of architectural translucency [20]. Thereby we define three general levels where we can replicate: virtualization, replication at operating system (OS) level, and replication at serviceware level. Furthermore, we can distribute services across multiple hardware nodes. A first analysis cannot exclude any of these options as non-promising a priori. The question is at which layer
Fig. 3 Screenshot of SLA compliance report for NFPs per customer, process and activity
replications should be introduced so that the highest availability and performance under a certain load hypothesis can be assured. To apply this approach, we developed an experimental method called ASTAR with the following steps: Analyze, Set up tests, run Tests, Analyze and Recommend. The method is iterated twice in order to evaluate where to replicate and how/what to replicate there. Each of the iterations covers all steps of the ASTAR method. The first iteration narrows down the locations of possible replication. The second iteration evaluates which possible replications we can introduce at these locations and compares their implications within each location. The monitoring solution is based on Eclipse BIRT (Business Intelligence and Reporting Tools, http://www.eclipse.org/birt/phoenix/). The solution facilitates a rather complex aggregation and reporting over all service calls per service and per customer over a certain period of time. The approach works in three steps.
1. The information of the SLA a customer has subscribed to needs to be stored in a database (as part of the service management platform). This can happen once after design time.
2. During run-time, the actual run-time information needs to be gathered within the run-time engine and stored in a database.
3. SLA monitoring is performed by processing this raw information and comparing it to the SLAs per customer, as sketched below. Processing, aggregations, calculations and comparisons are defined with the BIRT report designer and performed by the BIRT runtime engine (see Fig. 3).
In case of SLA breaches, the provider can react by using our optimization approach. SLA reporting is done ex post at the end of a period to give an overview of how well the SLAs have been complied with.
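The per-SLO check in step 3 can be pictured with a few lines of code. This is not the BIRT-based implementation, only a minimal stand-in showing the ex-post comparison:

    def slo_satisfied(measurements, predicate, threshold, percentage=100.0):
        # Does the required share of service calls meet the objective?
        if predicate == "less than":
            hits = sum(1 for m in measurements if m < threshold)
        else:  # "higher than"
            hits = sum(1 for m in measurements if m > threshold)
        return hits >= (percentage / 100.0) * len(measurements)

    # response times (ms) gathered by the run-time engine for one customer
    calls = [80, 95, 102, 60, 74, 99, 85, 91, 70, 88]
    sla_breached = not slo_satisfied(calls, "less than", 100.0, percentage=95.0)
    # only 9 of 10 calls stay below 100 ms (90 % < 95 %), so the SLA is breached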
6 Case Study
In this case study we focused on virtualization and a distributed setup. Here we cover the steps "set up tests", "run tests", "analyze" and "recommend" of the ASTAR method. The study shows substantial performance improvements and describes which configurations we used to achieve them.
Fig. 4 Distributed architecture of service replication without a central supervising node
The case study evaluates the distribution of service replicas on several virtual machines on one system and on different nodes, and the effects on response time and availability. We compare request processing on one system with processing by a distributed system of web services (multiple virtual machines on one hardware node and multiple hardware nodes within a network). We implemented the options in the same code and object structure. We can define which of the two possible configurations – single node, or search for replica nodes and load distribution – is used by setting a flag. We can configure virtual or real network distribution by altering the deployment of the service replicas. For this case study, we chose a setup with a distributed architecture of service replication without a central supervising node (see Fig. 4). This configuration guarantees the equality of all web services since there is no central supervising node. Therefore, we outfitted all working nodes with external interfaces to be able to communicate with clients. Moreover, they are all able to determine the utilized capacity of the nodes within the network and to distribute workload accordingly. We use active replication [21] with a state machine as the status propagation model. We developed our own implementation (a ReliableWebService class) that takes care of inter-node communication and management of the distributed system; it offers only the required functionality, and we have thereby kept the performance overhead low. The first action is to try to obtain the current node list from an integrated UDDI server (service interface RWS_GetSessionData). Then, the new web service instance propagates itself to all nodes that are already integrated in the system. This allows it to start distributing requests (control queries and load requests) to the other nodes and to receive processing requests from them (service interface RWS_InternalCall). The user enters a character string that the web service MyPictureService then transforms to an image. It then sends the image back to the client. For each configuration, we carried out the test 150 times, with three repetitions per iteration, as sketched below. Per iteration, the number of requests sent to the service simultaneously is incremented by one. Thus, the workload increases linearly. Over the three repetitions per iteration, we compute the average response time. While virtualization on the same hardware node does not bring a performance advantage, an overview of measurements with a distributed setup shows that replication between different systems can be advantageous under higher load hypotheses (# of simultaneous requests > 60). This advantage increases to over 50% with # of requests > 100 and 10 replicas (see Fig. 5). Performance is also more stable with
(Figure 5 plots the average response time in ms against the number of simultaneous requests, 0–100, for two configurations: 10 replicas on different nodes versus one system with no replicas.)
Fig. 5 Changes in average response times with increasing # of requests – advantage of a distributed solution is over 50%
more replicas. Our replicated solution provided – as expected – much higher availability, which is typical for functional replication [21].
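The measurement loop of the case study can be summarized in code; this is our reconstruction of the procedure described above, with call_service standing in for one request to MyPictureService and the parameters chosen for illustration:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_measurements(call_service, max_simultaneous=100, repetitions=3):
        # Per iteration the number of simultaneous requests grows by one
        # (linearly increasing workload); the average response time is
        # computed over three repetitions per iteration.
        averages = []

        def timed_call(_):
            start = time.perf_counter()
            call_service()
            return (time.perf_counter() - start) * 1000.0  # ms

        for n in range(1, max_simultaneous + 1):
            samples = []
            for _ in range(repetitions):
                with ThreadPoolExecutor(max_workers=n) as pool:
                    samples.extend(pool.map(timed_call, range(n)))
            averages.append(sum(samples) / len(samples))
        return averages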
7 Conclusion and Outlook
We presented a four-step approach to formalizing and enforcing NFPs such as response time, throughput and availability in autonomous computing environments. A service aggregator can use this approach to perform the aggregation of services, provide optimized performance, monitor the performance and thus provide self-optimizing systems. The formalization of NFPs allows us to map between user expectations and existing service levels. The presented translucency extension can enforce a service level that meets user expectations by automatically reconfiguring service replicas. The service integrator can easily integrate this extension into existing service-based environments without changing the service itself. The case study demonstrated that we could improve service levels by over 50% under certain load hypotheses. The monitoring solution helps to assess the success of the optimization and to identify further optimization needs.
References
1. W3C: Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language (2007)
2. W3C: SOAP Version 1.2 Part 1: Messaging Framework (Second Edition) (2007)
3. OASIS: UDDI Version 3.0.2 (2004)
4. OASIS: Web Services Business Process Execution Language Version 2.0 (2007)
5. McCoy, D., Natis, Y.: Service-Oriented Architecture: Mainstream Straight Ahead. Gartner Research (2003)
6. Papazoglou, M.P., van den Heuvel, W.-J.: Service oriented architectures: Approaches, technologies and research issues. The VLDB Journal 16 (2007), 389–415
7. Makripoulias, Y., Makris, C., Panagis, Y., Sakkopoulos, E., Adamopoulou, P., Lytras, M., Tsakalidis, A.: Towards ubiquitous computing with quality of web service support. Upgrade, The European Journal for the Informatics Professional VI (2005), 29–34
8. Yoneki, E., Bacon, J.: Object Tracking Using Durative Events. In: Enokido, T. et al. (eds.): EUC Workshops 2005. IFIP International Federation for Information Processing (2005)
9. Hong, S., Han, S., Song, K.: The Extended PARLAY X for an Adaptive Context-Aware Personalized Service in a Ubiquitous Computing Environment. In: Enokido, T. et al. (eds.): EUC Workshops 2005. IFIP International Federation for Information Processing (2005), 288–297
10. Yau, S.S., Wang, Y., Huang, D., In, H.P.: Situation-Aware Contract Specification Language for Middleware for Ubiquitous Computing. The Ninth IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS'03). IEEE (2003)
11. Tokairin, Y., Yamanaka, K., Takahashi, H., Suganuma, T., Shiratori, N.: An Effective QoS Control Scheme for Ubiquitous Services based on Context Information Management. The 9th IEEE International Conference on E-Commerce Technology and The 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services (CEC-EEE 2007). IEEE (2007)
12. Polze, A., Malek, M.: Responsive Computing with CORBA. The First IEEE International Symposium on Object-Oriented Real-Time Distributed Computing. IEEE (1998), 73–79
13. Felber, P., Narasimhan, P.: Reconciling Replication and Transactions for the End-to-End Reliability of CORBA Applications. Confederated International Conferences CoopIS, DOA, and ODBASE 2002. Springer (2002), 737–754
14. David, P.C., Ledoux, T.: An Infrastructure for Adaptable Middleware. Confederated International Conferences CoopIS, DOA, and ODBASE 2002. Springer (2002), 737–754
15. Krishna, C.M., Shin, K.G.: Real-time systems. McGraw-Hill, New York (1997)
16. Shin, K.G., Krishna, C.M., Lee, Y.: Optimal dynamic control of resources in a distributed system. IEEE Transactions on Software Engineering 15 (1989), 1188–1198
17. Haines, J., Lakamraju, V., Koren, I., Krishna, C.M.: Application-level fault tolerance as a complement to system-level fault tolerance. Journal of Supercomputing 16 (2000), 53–68
18. Yu, H., Vahdat, A.: The costs and limits of availability for replicated services. ACM Transactions on Computer Systems 24 (2006), 70–113
19. Milanovic, N., Malek, M.: Current solutions for Web service composition. IEEE Internet Computing 8 (2004), 51–59
20. Stantchev, V., Malek, M.: Architectural translucency in service-oriented architectures. IEE Proceedings – Software 153 (2006), 31–37
21. Bessani, A.N., da Silva Fraga, J., Lung, L.C., Alchieri, E.A.P.: Active Replication in CORBA: Standards, Protocols and Implementation Framework. On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. Springer (2004), 1395–1412
Algorithms for Reconfiguring Self-Stabilizing Publish/Subscribe Systems
Matthias Werner, Helge Parzyjegla, Michael A. Jaeger∗, Gero Mühl, and Hans-Ulrich Heiss
Abstract In our previous work on self-stabilizing content-based routing for publish/subscribe systems, we provided algorithms to realize guaranteed recovery from transient faults. In doing this, we focused on the routing layer and did not explicitly consider the interdependency between the broker overlay topology and the contents of the routing tables. Reconfigurations on the overlay network layer had thus been handled as faults on the routing layer, because there was no coordination between the overlay and the routing layer. In this paper, we present detailed algorithms for incorporating seamless reconfiguration of the broker overlay topology into self-stabilizing content-based publish/subscribe systems. To achieve this, we rely on a coloring mechanism (that coordinates the actions on the overlay layer and the routing layer) and introduce a self-stabilizing broker overlay topology.
1 Introduction
Systems which are self-stabilizing help to reduce the management effort significantly. This is why they have gained increasing interest in recent years in the context of research on self-organizing and self-managing IT systems. The idea that a system can recover from arbitrary transient faults in bounded time without manual intervention is striking, even though the system may not work correctly during recovery [1]. However, it may be necessary to reconfigure a self-stabilizing system at runtime, for example for maintenance reasons. At this point, the feature of autonomous repair can become counterproductive when the reconfiguration is handled by the system as a fault. In this paper, we provide more details on our approach of reconfiguring
M.A. Jaeger, G. Mühl, M. Werner, H. Parzyjegla, and H.-U. Heiss
Communication and Operating Systems Group, Technische Universität Berlin, Germany
e-mail: {michael.jaeger, g muehl, m werner, parzyjegla}@acm.org; [email protected]
self-stabilizing publish/subscribe systems sketched in an earlier publication [8]. Due to space restrictions we do not discuss publish/subscribe systems in general; they are explained in detail in [9]. This paper is structured as follows. In Section 2, we present related work. After discussing the main challenges in our problem domain in Section 3, we present our approach in Section 4. We close with a discussion in Section 5.
2 Related Work
When a system is reconfigured, i.e., it is transformed from one legitimate configuration to another one, it is intuitively expected that certain service guarantees hold. This is in contrast to the situation in which a sudden fault makes it necessary to reconfigure the system. In the following, we show that this expectation is not always fulfilled for self-stabilizing systems, due to subtle difficulties. We start with an overview of different approaches to the reconfiguration of self-stabilizing systems in the literature. The naive approach, which is implicitly taken by many layered self-stabilizing systems that do not consider reconfigurations explicitly, is to handle reconfigurations as faults from which the system recovers eventually. For a publish/subscribe system this could mean that changes in the topology of the broker overlay network result in inconsistencies on the routing layer, because it is closely related to the topology of the broker overlay. Due to the self-stabilizing mechanism used, the routing tables of the brokers stabilize eventually, as described in [10]. This approach does not need any coordination efforts, at the cost of possible message loss or duplicates, because the broker overlay network is reconfigured regardless of the routing tables of the brokers. Ghosh et al. propose a way to limit the side effects of certain faults in a self-stabilizing system by introducing the concept of fault containment [6]. However, the goals of fault containment and self-stabilization are conflicting in general, since adding fault containment to a self-stabilizing protocol may increase the stabilization time [7]. Although fault containment limits the side effects of reconfigurations which may lead to a fault on an upper layer, it does not prevent service interruption, which may manifest in lost messages. This is due to the fact that still no special care is taken of reconfigurations. However, fault containment may reduce the impact of a reconfiguration on the system during convergence. Dolev and Herman [3] propose the class of superstabilizing protocols, which are self-stabilizing and explicitly designed to cope with topology dynamics. Superstabilization builds on the idea that a topology undergoes "typical" changes. A passage predicate is introduced which is weaker than the predicate specifying the legitimate states but still useful. In case a typical topology change event happens, the passage predicate provides certain guarantees until the system is back in a legitimate state. Superstabilization represents an effort which explicitly addresses the issue of reconfiguration. Although very interesting, it does not
provide a general solution to create superstabilizing protocols. Moreover, the case of layered self-stabilizing algorithms is not discussed, thus limiting its usefulness for our problem domain.
3 Challenges
The conventional approaches to reconfiguring a (layered) self-stabilizing system presented above already indicated the challenges that are inherent in reconfiguring the broker overlay topology of a self-stabilizing publish/subscribe system. In the following, we describe them with respect to self-stabilization.
No Service Interruption. During and after reconfiguration, the system must stay in a legitimate state given that no fault occurs. For a self-stabilizing publish/subscribe system this means that the correctness of the publish/subscribe system according to [10] must be maintained, which implies that messages must not be lost and clients must not receive notifications more than once.
Containment of Changes. A reconfiguration must not result in unintended additional reconfigurations that are not desired by the administrator deliberately executing the reconfiguration. For the broker overlay network this means that one elementary reconfiguration (e.g., a link exchange) must not result in an unwanted additional reconfiguration that was not explicitly requested.
Persistence of Changes. Many self-stabilizing algorithms imply a certain structure that is legitimate and towards which the algorithm "pushes" the system. If a reconfiguration does not comply with this structure, it is "repaired" and thereby possibly undone.
4 Coordinated Reconfiguration with Layered Self-Stabilization
In previous work, we presented algorithms that render the content-based routing layer self-stabilizing [10]. Thereby, we assumed that the broker overlay network is either static or self-stabilizing. Although a static broker overlay makes the whole system self-stabilizing by definition, it is not feasible in practice, because reconfigurations of the topology might become necessary due to system dynamics. We already argued that reconfiguring a self-stabilizing publish/subscribe system is different from reconfiguring a regular one. We thus propose in this section a self-stabilizing broker overlay network on which self-stabilizing content-based routing is layered, and introduce a coordination mechanism between the overlay and the routing layer to cope with the challenges described above. Systems which are layered like in our case can be made self-stabilizing by making all layers individually self-stabilizing. This transparent stacking of self-stabilizing layers is a standard technique which is referred to as fair composition [2]. It is easy to combine self-stabilizing algorithms this way to create a new and more
powerful self-stabilizing mechanism as long as no cyclic dependencies exist among their states. Taking this approach, it is sensible to layer self-stabilizing routing in publish/subscribe systems on top of a self-stabilizing broker topology which employs a self-stabilizing tree algorithm like the ones given in the literature (cf. Gärtner's survey [5] for a good overview). However, this approach has its drawbacks, since there is no coordination between the overlay and the routing layer. Our approach in the following is, hence, to realize a self-stabilizing overlay topology which maintains an arbitrary tree structure and to layer self-stabilizing routing on top of it in such a way that reconfigurations of the overlay topology can be processed without service interruption (i.e., message loss or duplication). Two problems have to be tackled: (i) designing a self-stabilizing broker overlay network which does not impose a certain structure on the resulting tree but accepts all correct trees, and (ii) coupling the self-stabilizing mechanisms on the overlay and the routing layer to allow for atomic topology switches without message loss.
4.1 Coloring Scheme
Before we continue with the description of the self-stabilizing broker overlay network, we introduce a coloring scheme which we use to synchronize reconfigurations on the overlay layer with reconfigurations on the routing layer. To achieve this, selected data structures are marked with a color attribute. On the overlay topology layer this concerns the child and parent broker pointers, while on the routing layer the routing entries are affected. Thus, each color may represent a different topology. To make atomic switches between different colors (and, thus, topologies) possible, every broker maintains data structures for three different colors, which can be accessed on the respective layer and are stored in the following variables. The variable c_cur stores the color of the topology that is currently used, c_old stores the color of the topology that has been used last, and c_new stores the color of the topology that will be used when the color changes the next time. The values of these variables are rotated regularly. We require three different colors because the communication and processing delays in the network can lead to a situation where messages are still forwarded although the topology has changed in the meantime. If the value of c_cur becomes the value of c_old, for example, there may still be messages on the network that are colored with c_old. To be able to deliver these messages, the topology for c_old has to be kept sufficiently long. In the following, we assume for better readability that the routing entries are stored in separate routing tables for each color, although a tag on each entry suffices in the implementation. It is the task of the root broker R to regularly recolor all brokers in the tree. To accomplish this, a timeout runs on R. In order to realize self-stabilization, the timeout also runs on every broker B ≠ R. The actions taken on a timeout are described in the procedure onTimeout and will be explained in the following. After resetting the timer (line 2), the root broker rotates its colors and initializes the child
broker pointers C^c_new, the parent broker pointer P^c_new, and the routing table T^c_new (lines 5 and 6). Then, the new color is disseminated in a recolor message REC_msg to all child brokers in the topology with the current color. If a child broker does not acknowledge this message, it will be removed (procedure sendToChildren()).
 1  procedure onTimeout()
 2    resetTimer()
 3    if B = R then  // root broker starts a new coloring period
 4      c_old ← c_cur ; c_cur ← c_new ; c_new ← c_cur + 1 mod 3
 5      C^c_new ← C^c_cur ; P^c_new ← null
 6      T^c_new ← init
 7      m ← new REC_msg(c_new, R)
 8      applyReconfig(R)
 9      R ← ∅
10      sendToChildren(m)
11    else  // every other broker reconnects to the tree
12      joinTree()
13    endif
If another broker that is not the root broker runs into a timeout, this is related to a fault, because the timers are chosen such that no broker except the root broker will ever experience a timeout if no fault happens. The broker hence tries to reconnect to the tree (line 12). Here, the coloring mechanism is used to realize a self-stabilizing overlay topology. Details on this topic are presented in the following section. The variable R carries the reconfigurations to be implemented. They are disseminated together with the recolor message and will be explained in detail later. When a broker B ≠ R receives a recolor message as described in procedure onReceiveRec, it resets its timer, replies with an acknowledge message ACK_msg, rotates its colors (thereby changing the value of its current color c_cur), and forwards the message to its child brokers.
1  procedure onReceiveRec(REC_msg m)
2    if sender = P^c_new then
3      resetTimer()
4      Send ACK_msg to sender
5      c_old ← c_cur ; c_cur ← c_new ; c_new ← m.c
6      C^c_new ← C^c_cur ; P^c_new ← P^c_cur ; T^c_new ← init
7      applyReconfig(m.R)
8      sendToChildren(m)
9    endif
The broker accepts the recolor message only if it has been sent by the broker P^c_new points to (line 2). This test is needed to detect cycles that may result from faults. It might happen that the parent/child pointers of some brokers are perturbed in a way such that a cycle is created in the tree, for example, if the parent pointer of a broker B points to an ancestor broker B′, which also has a child pointer to B. In this case it is not obvious from the local view of the brokers that a cycle exists, and recolor messages would be forwarded in the cycle forever if B accepted recolor messages from B′. Since B does not accept and acknowledge a recolor message
from B′, B′ will eventually remove B from its set of child brokers as described in procedure sendToChildren below. Tree partitions are detected since recolor messages are not received in the partitions which do not contain R. Thus, the brokers in such a partition will eventually run into a timeout. The rest of the procedure is similar to the procedure onTimeout(), except for replying with an acknowledge message to the sender to indicate that the broker is alive (line 4). The receiver of an acknowledge message sets a flag for the sending broker as described in the procedure onReceiveAck:
1  procedure onReceiveAck(ACK_msg m)
2    if sender ∈ C^c_cur then
3      Set flag(sender)
4    endif
This flag is used for a second chance algorithm in the procedure sendToChildren() to remove faulty child broker pointers, as previously discussed for the case of cycles (procedure sendToChildren, lines 3–8).
1  procedure sendToChildren(REC_msg m)
2    foreach B′ ∈ C^c_cur do
3      if flag(B′) is set then
4        Send m to B′
5        Unset flag(B′)
6      else
7        Remove child broker B′
8      endif
9    endfor
4.2 Timeouts on the Broker Overlay Layer
We already discussed the dissemination of recolor messages and introduced parts of the overlay network management, where faulty child broker pointers are detected and discarded. In this section, we further specify the timeouts and present the integration of brokers which were disconnected from the overlay due to a fault. The self-stabilizing mechanism of the broker overlay network is based on timeouts regarding the receipt of recolor messages. On R, a timeout triggers a new recolor message to be sent to all its child brokers. On every other broker, a timeout occurs if it does not receive a correct recolor message in time. As recolor messages are forwarded recursively down the tree, the last leaf broker receives the message at the latest after time h · δmax, where h is the maximum height of the tree (i.e., the diameter d of the graph is at most 2 · h) and δmax (δmin) is the maximum (minimum) delay for processing and sending a message to a child broker. As the tree may degenerate arbitrarily, h can be at most equal to the maximum number of brokers η in the system (which we assume is known and stored in ROM for convenience). Given that the timeout on R occurs every time period ξ, a timeout ξ′ with
ξ′ = ξ + h · (δmax − δmin) is necessary on every broker B distinct from R, which is reset every time a new recolor message is received from its parent broker (procedure onReceiveRec, line 3). The procedure resetTimer() works as follows:
11  procedure resetTimer()
12    if B = R then
13      timer ← ξ
14    else
15      timer ← ξ + h · (δmax − δmin)
16    endif
When B ≠ R runs into a timeout, it took more than ξ′ to receive the next recolor message after the last one. This can only be due to a fault, since forwarding a message from R to B cannot take more than h · δmax. In this case, B tries to rejoin the tree, which is described in the procedure joinTree:
18  procedure joinTree()
19    nxtParent ← R
20    do  // ask recursively for a new parent
21      tryParent ← nxtParent
22      nxtParent ← tryParent.askForPlaceOrPointer()
23    until nxtParent = tryParent
24    Connect to tryParent
There are many ways to find a new parent broker for B, depending on the topology requirements. One is to look down the tree for an arbitrary broker which has less than b child brokers and to use it as the new parent for the requesting broker. This way, the broker is integrated as a leaf into the tree and the degree of the broker topology can be maintained; however, this does not deterministically prevent the degeneration of the tree. An example is given in the procedure askForPlaceOrPointer:
26  procedure askForPlaceOrPointer()
27    if |C^c_new| < b then
28      return B
29    else
30      return random C ∈ C^c_new
31    endif
The broker overlay is in a correct state if the parent and child broker relations between all brokers in the system are consistent for the data structures colored with the values of c_old and c_cur at R (for the value of c_new at R the pointers may be inconsistent due to message propagation delays), and the trees defined by P^c_old and C^c_old, and by P^c_cur and C^c_cur, respectively, are not partitioned (i.e., there is exactly one path from one broker to another). The values of C^c_new and P^c_new are treated differently, as explained in the next section about reconfiguration. Partitions or cycles are detected as explained in the context of procedure onReceiveRec().
4.3 Reconfiguration
The focus so far lay on the overlay network management and how to handle control messages and notification routing. The color attribute has been introduced to synchronize actions on the overlay network layer with the publish/subscribe routing layer. All this was preparatory to incorporating reconfiguration into self-stabilizing publish/subscribe systems, which we present in this section. Whenever a leaf broker wants to join or leave the overlay network or a link has to be replaced by another one, the topology of the broker network changes. When a reconfiguration should be implemented, the intended changes are sent to R, which collects them in the set R and disseminates them in the next recolor message. Every broker that receives a recolor message carrying reconfiguration data that affects it implements the change into its P^c_new and C^c_new pointers (procedure call applyReconfig() on line 7 of procedure onReceiveRec()). The recolor message serves as a synchronizer to prevent race conditions when switching from one topology to another. Recolor messages are routed using C^c_cur of every broker B which receives a recolor message (where the value of c_cur of B equals the value of c_cur of R). Thus, reconfigurations take two recolor messages to become active: one to disseminate the reconfiguration and one to activate it. Figure 1 shows an example reconfiguration scenario, where B2 shall be moved as a child broker from B1 to R. The solid lines depict the parent/child relations for c_cur while the dashed lines depict the parent/child relations for c_new (Figure 1a). White brokers turn gray when they have received the recolor message and rotated their colors. The reconfiguration request is sent to R, which incorporates it into its child broker pointers before sending it with the next recolor message in the set R to B1 over C^c_cur (Figure 1b). On receiving the recolor message, B1 updates its parent/child pointers P^c_new and C^c_new, i.e., R stays the parent of B1 and B1 will have no child brokers anymore (Figure 1c). Then, the recolor message is forwarded to B2, which removes B1 as the next parent broker and sets its P^c_new pointer to R (Figure 1d). The new parent/child pointers become active with the next recolor message disseminated by R.
Fig. 1 Example reconfiguration of the self-stabilizing publish/subscribe broker overlay topology
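The procedure applyReconfig() itself is left implicit in the listings above. A minimal sketch, under the assumption that each reconfiguration entry says "move broker X from its old parent to a new parent" and that pointers are kept per color, might look as follows (Python used only for illustration; the field names are ours):

    def apply_reconfig(broker, reconfigs):
        # Only the c_new-colored pointers are touched, so the change
        # becomes active with the next color switch.
        for move in reconfigs:
            if move["broker"] == broker.ident:
                broker.parent["c_new"] = move["new_parent"]       # X gets a new parent
            if move["new_parent"] == broker.ident:
                broker.children["c_new"].add(move["broker"])      # new parent gains a child
            elif move["broker"] in broker.children["c_new"]:
                broker.children["c_new"].discard(move["broker"])  # old parent loses it

In the scenario of Fig. 1, R would add B2 to its c_new-colored child set, while B1 would drop B2 from its own.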
As mentioned earlier, a change in the topology in general implies a change in the routing tables on the publish/subscribe routing layer. As the routing tables are regularly rebuilt from an initial routing configuration, the reconfiguration of the overlay topology can be incorporated by delaying the switch to the new topology in P^c_new and C^c_new long enough, such that the routing tables have been rebuilt completely when the switch to the new topology is executed. In the following, we describe the self-stabilizing routing algorithm that is layered on top of the self-stabilizing broker topology.
4.4 Integration of Self-Stabilizing Routing
Recolor messages are used on the topology layer to trigger timeouts and to coordinate reconfigurations. To achieve the latter, three topologies are held in the form of colored parent/child pointers. On the routing layer, the color is used for two different purposes: (i) to rebuild the routing tables periodically and (ii) to avoid notification loss and duplicates. It is necessary to periodically rebuild the routing tables as we assume that they can be perturbed arbitrarily. Therefore, we use the leasing mechanism described in [11]: clients regularly refresh their subscriptions and brokers use a second chance algorithm to remove stale entries from their routing tables. To incorporate reconfigurations into this mechanism, we require that control messages (subscriptions and unsubscriptions) are colored with c_new, while notifications are colored with c_cur. Notifications and control messages are then forwarded and applied to the routing tables T^c_cur and T^c_new, respectively. Thereby, we ensure that notifications will be routed over the topology the publishing broker belonged to at publication time. This way, we prevent duplicates, i.e., notifications sent multiple times to the same broker, which could only happen if a notification could be colored with multiple colors and take different paths to the same broker in case of reconfigurations. The second chance algorithm is implemented through rotating the colors and initializing T^c_new with a legal initial routing configuration init (procedure onTimeout, line 6, and procedure onReceiveRec, line 6, respectively). The algorithm has to take care that the routing table T^c_new on every broker in the system is complete when the next recolor message is sent by R. Otherwise, notifications might get lost. Therefore, the refresh period ρ for every subscriber has to be chosen according to the values determined in [11]: ξ > ρ + 2 · h · δmax. A notification is colored with c_cur of the publishing broker. It may happen that a notification encounters a recolor message on its way to R. In this case, the following brokers have already rotated their colors, such that the color of the notification now differs from c_cur of those brokers. To be able to route the notification until it has reached all intended receivers, the brokers also store the data structures colored with c_old, where c_old is the value of c_cur before the last recolor message has been received. New (un)subscriptions are sent out immediately. Since a subscription may also encounter a recolor message on its way to R, control messages are colored with c_new of the issuing broker to avoid that subscriptions are present in T^c_old and later T^c_new
144
M.A. Jaeger et al.
Fig. 2 A new subscription encounters a recolor message on its way to R
but not in T^c_cur. Figure 2 depicts a situation where a new subscription is issued shortly before a new recolor message is received. Before the subscription has reached every relevant broker, it encounters a recolor message which rotates the colors of the brokers. Due to the FIFO property of the communication channels, the color of the subscription equals the value of c_cur of every subsequently reached broker, because it "follows" the recolor message afterwards. At time tr, when the subscription and the recolor message have reached every broker in the system, the new subscription is consistently incorporated into the routing tables of every affected broker. Due to the choice of the refresh period ρ, the new subscription will also be incorporated into the routing table of every broker when the next recolor message is sent by R. If the color of the subscription had been the value of c_cur of B in this example, the subscription would have been incorporated into T^c_old of every broker in the system from time tr until time t0 + 2 · ξ. In this period, the subscription would not be present in T^c_cur of the brokers, such that "old" notifications would be routed to B but not the current ones. Thus, notifications could be lost.
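The interplay of the three colors on the routing layer can be illustrated by a small sketch (ours, not the authors' implementation): notifications are looked up in the table for c_cur, control messages in the table for c_new, and each rotation re-initializes the new table, which realizes the second chance algorithm.

    class ColoredRoutingState:
        def __init__(self, init_table):
            self._init = init_table                 # legal initial routing configuration
            self.c_old, self.c_cur, self.c_new = 0, 1, 2
            self.tables = {c: dict(init_table) for c in (0, 1, 2)}

        def rotate(self, next_color):
            # performed on every recolor message (cf. onReceiveRec, lines 5-6)
            self.c_old, self.c_cur = self.c_cur, self.c_new
            self.c_new = next_color
            self.tables[self.c_new] = dict(self._init)  # stale entries vanish eventually

        def table_for(self, kind):
            # control messages (subscriptions) use T^c_new, notifications T^c_cur
            return self.tables[self.c_new if kind == "control" else self.c_cur]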
4.5 Self-Stabilization
A self-stabilizing system is by definition guaranteed to eventually reach a legitimate state in case no fault occurs for a sufficiently long time period. In this section, we discuss the different states the system can end up in due to faults. The state is defined by the contents of the variables a broker stores in RAM. These comprise the values of its color variables, its child and parent pointers (overlay network layer), and its routing table entries. A corruption of these values may lead to the following faults:
1. On the overlay network layer: network partitioning and cycles
2. On the routing layer: messages not forwarded or not delivered although they should be, and messages forwarded or delivered although they should not be
In addition to that, brokers and links may crash and come up again due to transient faults. Since we assume that broker crashes and link failures are only transient, we can concentrate on the time after the transient faults, when the brokers or links are up again – with a possibly corrupted state. Messages inserted due to faults can be modeled as faults that manifest in the state of the broker or as messages delivered although they have never been published. However, messages inserted due to faults may be forwarded for at most time h · δmax until they have left the system. In the following, we discuss all faults which may happen and how the system stabilizes itself afterwards.
Network Partitioning. If the network becomes partitioned, for example, because the parent and child broker pointers of two brokers are perturbed accordingly, the brokers in the part of the network which does not include R will eventually run into a timeout because they will not receive recolor messages any more. This case is handled in procedure onTimeout, where a regular broker that runs into a timeout tries to rejoin the tree by contacting R.
Cycles. Cycles in the broker overlay topology may result from perturbed parent/child broker pointers, for example, if an ancestor B′ of broker B is at the same time a child of B. In this case, B will send a recolor message to B′, who has already received it. Due to the checks done in procedure onReceiveRec, B′ will not accept the recolor message. Accordingly, it will also not reply with an acknowledge message. Thus, B will eventually remove B′ from its set of child brokers (procedure onReceiveAck and procedure sendToChildren). Cycles in the topology where all parent/child broker pointers are consistent are handled as network partitions, because this case can only happen if the network is partitioned and the root broker of the partition without R additionally has a parent broker.
Perturbed Routing Tables. Regarding the routing tables, we follow the same approach as presented in [10], which relies on a precautionary reset. This guarantees that routing table entries which have been modified or inserted due to a fault will vanish eventually and that routing table entries that have been removed due to a fault will be inserted again.
Colors. The colors of a broker are stored in the variables c_new, c_cur, and c_old and can be perturbed arbitrarily. However, every recolor message rotates all color values and sets the value of c_new to the color stored in the recolor message (cf. procedure onReceiveRec). Thus, the colors are eventually consistent with those of R.
Root Broker. The root broker R plays a central role in this algorithm because it functions as the synchronizer as well as the central contact that is responsible for coordinating reconfigurations. For the whole system to be self-stabilizing, it is therefore necessary to implement R in a self-stabilizing fashion. Although R is already self-stabilizing according to the timeout mechanism, it can be implemented more robustly by using a root group. Such a root group consists of several brokers that take over R's task in a predefined, globally known order in case the previous root broker fails, as discussed in [4].
5 Discussion
Reconfiguring arbitrary self-stabilizing systems with respect to certain guarantees implies some subtle problems and requires further research efforts. We analyzed these problems and concluded that simply layering self-stabilizing algorithms on the topology and the routing layer is not feasible in publish/subscribe systems, because the contents of the routing tables depend on the broker overlay topology. Our approach relies on a combination of self-stabilizing content-based routing as introduced in [10] with a self-stabilizing broker overlay network that accepts arbitrary acyclic broker topologies in case no fault occurs. In order to meet the dependencies between both layers, we introduced a coloring mechanism that coordinates actions on them. We were thus able to prevent message loss during reconfiguration and to guarantee certain message orderings. The coloring mechanism described is not limited to self-stabilizing publish/subscribe systems. It is a general principle which can be used to realize seamless reconfigurations in other layered self-stabilizing systems as well. However, our approach currently requires a dedicated root broker that is responsible for coordinating reconfigurations. We proposed to realize this broker in the form of a cluster to prevent a bottleneck, or to take a modular approach for improved scalability. Both issues require more research and remain open for future work.
References
1. E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11):643–644, 1974.
2. S. Dolev. Self-Stabilization. MIT Press, Mar. 2000.
3. S. Dolev and T. Herman. Superstabilizing protocols for dynamic distributed systems. Chicago Journal of Theoretical Computer Science, 4:1–40, Dec. 1997. Special Issue on Self-Stabilization. http://portal.acm.org/citation.cfm?id=866056&dl=&coll=#
4. S. Dolev and R. I. Kat. HyperTree for self-stabilizing peer-to-peer systems. In Proceedings of the 3rd IEEE International Symposium on Network Computing and Applications (NCA'04), pages 25–32, Washington, DC, USA, 2004. IEEE.
5. F. C. Gärtner. A survey of self-stabilizing spanning-tree construction algorithms. Technical Report 200338, Swiss Federal Institute of Technology (EPFL), School of Computer and Communication Sciences, June 2003.
6. S. Ghosh, A. Gupta, T. Herman, and S. Pemmaraju. Fault-containing self-stabilizing algorithms. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC'96), pages 45–54. ACM Press, New York, NY, USA, May 1996.
7. S. Ghosh and S. Pemmaraju. Trade-offs in fault-containing self-stabilization. In S. Ghosh and T. Herman, editors, Proceedings of the 3rd Workshop on Self-Stabilizing Systems (WSS'97), pages 157–169. Carleton University Press, New York, NY, USA, 1997.
8. M. A. Jaeger, G. Mühl, M. Werner, and H. Parzyjegla. Reconfiguring self-stabilizing publish/subscribe systems. In R. State, S. van Meer, D. O'Sullivan, and T. Pfeifer, editors, Proceedings of the 17th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM'06), volume 4269 of Lecture Notes in Computer Science (LNCS), pages 233–238. Springer, Heidelberg/Berlin, Germany, Oct. 2006.
9. G. Mühl, L. Fiege, and P. R. Pietzuch. Distributed Event-Based Systems. Springer, Berlin, Germany, Aug. 2006.
10. G. Mühl, M. A. Jaeger, K. Herrmann, T. Weis, L. Fiege, and A. Ulbrich. Self-stabilizing publish/subscribe systems: Algorithms and evaluation. In J. C. Cunha and P. D. Medeiros, editors, Proceedings of the 11th International Conference on Parallel Processing (Euro-Par 2005), volume 3648 of Lecture Notes in Computer Science (LNCS), pages 664–674. Springer, Heidelberg/Berlin, Germany, 2005.
11. G. Mühl, A. Ulbrich, K. Herrmann, and T. Weis. Disseminating information to mobile clients using publish/subscribe. IEEE Internet Computing, 8(3), May 2004.
Combining Browsing Behaviors and Page Contents for Finding User Interests
Fang Li, Yihong Li, Yanchen Wu, Kai Zhou, Feng Li, Xingguang Wang, and Benjamin Liu∗
Abstract This paper proposes a system for finding a user's interests based on his browsing behaviors and the contents of his visited pages. An advanced client browser plug-in is implemented to track the user's browsing behaviors and collect information about the web pages that he has viewed. We develop a user-interest model in which user interests can be inferred by clustering and summarizing the viewed page contents. The corresponding interested degree can be calculated based on his browsing behaviors and histories. The calculation of the interested degree is based on a Gaussian process regression model which captures the relationship between a user's browsing behaviors and his interest in a web page. Experiments show that the system can find the user's interests automatically and dynamically.
1 Introduction
The Internet has become an important part of our daily life. Everyone has their own purposes and interests when accessing the Internet. How to find these interests has become one of the recent research goals for personalized search, collaborative filtering and recommendation systems. Research on user interests can be broadly divided into two sides: the client side and the server side. Client-side research focuses on learning a user model based on browsing history or behaviors, such as the time spent on a page, the count of clicking or scrolling, as well as other user activities [2]. Research on the server side focuses more on extracting common interests or community patterns using web or search logs. However, there is no reason to believe that every user may need whatever the
F. Li, Y. Li, Y. Wu, K. Zhou, F. Li, X. Wang, and B. Liu
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail:
[email protected]
∗ Intel Asia-Pacific, Shanghai, China
other users would need. One hundred users could have one hundred needs. Therefore, much recent research [2, 7, 8] has moved to the client side to analyze users' web navigation and interactive behaviors and to explore the use of personalized applications by tailoring the information presented to individual users. Our research also focuses on the client side, combining browsing behaviors and content analysis to generate user interests automatically. We all know that if a user does not like sports, he will probably not view sports pages; if a user spends more time on specific pages, he shows some interest in the content of those pages. Thus, the content of viewed web pages may reflect the interests of a user, and the browsing behaviors are a valuable indicator from which to infer a user's interests. The rest of this paper is organized as follows. Section 2 introduces a plug-in tool to collect user browsing information. Section 3 describes our model and method. Section 4 presents the experiments with our system. Finally, we draw some conclusions.
2 Information Collection

In order to collect information while a user surfs the Internet, we implemented a browser plug-in. It can be divided into two parts: the page data collector (PDC) and the user behavior collector (UBC). The PDC collects information about visited pages and their contents. The UBC tracks browsing activities and collects user browsing behaviors.
2.1 Page Data Collection

The information of a web page can be divided into two categories, page information and content information, given as follows:

Page Information
• The uniform resource locator (URL) of the visited page: it is useful for extracting contents and for evaluating the page's relevance to the next visited page.
• The page capacity: it includes how large the literal content of a page is, how many pictures or images it contains, and how many out-links it has. These factors are strongly related to the time that a user spends on the page.

Content Information
• The title of a page: it describes the main idea of the page.
• The content of a page: it describes the essence of the page, which can reflect a user's interests.
2.2 Web Page Denoising

In order to obtain the content of a page, the Web Page Denoising (WPD) module is implemented to eliminate irrelevant (noisy) information of the page, such as advertisements, copyrights, navigation bars and page scripts. WPD consists of two parts: page segmentation and noise filtering. The page segmentation part parses the structure of a web page and segments the page into several visual blocks. The noise filtering part calculates the importance of these blocks based on their features and then filters out the noisy blocks. The steps of the WPD algorithm are briefly given as follows:

Step 1: parse the structure of a given HTML page into a document object model (DOM) tree.
Step 2: use the vision-based page segmentation algorithm (VIPS) [3] to segment the page into a certain number of visual blocks.
Step 3: represent each visual block as a feature vector based on our block importance model. The spatial features (position and size) and other features (the number of images and out-links) are extracted and used to construct the feature vector.
Step 4: train the block importance model based on a Support Vector Machine with a radial basis function kernel (RBF-SVM) [1].
Step 5: extract the content of the most important visual blocks as the content of the page. The noisy information of unimportant visual blocks is thereby filtered out.
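As a rough illustration of Steps 3-5, the following Python sketch trains a block importance model with scikit-learn's RBF-kernel SVM. This is our illustration, not the authors' code; the block feature values and labels are invented, and the real system derives them from the VIPS segmentation.

```python
from sklearn.svm import SVC

# Each visual block is represented by Step-3 features (invented values):
# [x, y, width, height, num_images, num_outlinks]
blocks = [
    [120, 200, 600, 400, 2, 3],    # large central block: real content
    [0, 0, 980, 60, 1, 25],        # thin banner full of links: noise
    [700, 150, 200, 500, 5, 30],   # sidebar of ads and navigation: noise
    [140, 650, 560, 300, 1, 2],    # article continuation: real content
]
labels = [1, 0, 0, 1]              # 1 = important block, 0 = noisy block

# Step 4: block importance model as an RBF-kernel SVM.
model = SVC(kernel="rbf", gamma="scale")
model.fit(blocks, labels)

# Step 5: keep the content of blocks predicted as important.
important = [b for b in blocks if model.predict([b])[0] == 1]
```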
2.3 User Behavior Collection

Browsing behaviors are valuable indicators from which to infer a user's interests. After the browser plug-in is installed on the client side, our UBC can track the user's activities in the web browser and instantly log the following kinds of browsing behaviors:

• The time that a user spends browsing a page
• The amount of scrolling
• The sequence of URLs that a user clicks during a visit

Some browsing behaviors may not directly show whether the user likes or dislikes a page. For example, a user may have to scroll a page several times to go through it, and a long page may simply cost a user much time. Combining browsing behaviors with page capacity can therefore better reveal user interests.
3 Finding User Interests

3.1 Problem Definition

Let P = {P_1, P_2, ..., P_n} denote the set of pages that a user has browsed. Each page can be represented as an information vector.

Definition 1 (Information Vector). The information vector of a page is defined as a 7-tuple info_i = [size_i, link_i, image_i, scroll_i, time_i, dscroll_i, dtime_i], in which size_i is the literal capacity of the page, link_i is the number of out-links, image_i is the number of images, scroll_i is the amount of user scrolling, and time_i denotes the time spent on the page.

Definition 2 (User-Interest Vector). A user's interest vector is defined as an m-tuple IG = [(G_1, IG_1), ..., (G_m, IG_m)], where m denotes the number of the user's interests, G_i denotes the i-th interest and IG_i denotes the corresponding interested degree of G_i. Each interest is represented as a set of keywords.

The first problem we need to solve is to develop a reasonable user-interest model which can infer a user's interested degree for a viewed web page based on his browsing behaviors. The second problem is to find what an interest is and how to calculate its interested degree. In what follows, Section 3.2 introduces our method for the first problem, and Section 3.3 tackles the second.
3.2 User-Interest Model

According to the definition of an information vector, 5 of the 7 features can be obtained directly from the plug-in, while the last two features, dscroll_i and dtime_i, need to be explained. We observe that a web page is often accessed through a hyperlink on the previous page. For example, a user may go to http://www.sony.com and follow an out-link on that page, http://www.sonystyle.com/digitalmaging/cyber-shot.htm, in order to find the information he is interested in; the previous page helps him find the latter page. In order to account for the time spent on and the scrolling of the previous page, the two features dscroll_i and dtime_i are defined to evaluate the interest gain of the latter page. They are calculated using Eqs. 1 and 2:

$$dscroll_i = Sim_{url}(url_i, url_{i-1}) \cdot IP_{i-1} \cdot [scroll_i - scroll_{i-1}] \qquad (1)$$

$$dtime_i = Sim_{url}(url_i, url_{i-1}) \cdot IP_{i-1} \cdot [time_i - time_{i-1}] \qquad (2)$$

where Sim_url is the URL similarity between two consecutive pages, calculated by Eq. 3:

$$Sim_{url}(url_i, url_j) = \frac{2 \cdot length(common(url_i, url_j))}{length(url_i) + length(url_j)} \qquad (3)$$
where common(url_i, url_j) denotes the common prefix substring of the two URLs and length(url_i) denotes the length of url_i. Based on the information vector, the Gaussian process regression (GPR) model [6] is used to train our user-interest model M. This interest model captures how the information vector of a page is related to the interested degree of the page using Eq. 4:

$$IP_{new} = \sum_{i=1}^{N} \alpha_i \cdot k(info_i, info_{new}) \qquad (4)$$

where N is the number of pages in a given training set, $\alpha = (K + \delta^2 E_N)^{-1}$ and $K = [k(info_i, info_j)]_{N \times N}$. The radial basis function (RBF) is used as the kernel function in Eq. 4 and is calculated as follows:

$$k(info_i, info_j) = \exp\left(-\gamma \, (info_i - info_j)^2\right) \qquad (5)$$
where the hyper-parameter γ denotes the length scale, whose value is set to 1.0. Given a new page P_new and its information vector info_new, the RBF-GPR model can predict a user's interested degree of the page. For simplicity, the degree of a user's interest is defined as the sum of the interested degrees of all pages related to this interest:

$$IG_i = \sum_{P_j \in G_i} IP_j \qquad (6)$$
where G_i denotes a user's interest and P_j ∈ G_i is a single web page.
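The following sketch illustrates Eqs. 3-5 under stated assumptions: sim_url implements the common-prefix similarity of Eq. 3 character-wise, and scikit-learn's GaussianProcessRegressor stands in for the paper's RBF-GPR (sklearn's RBF kernel uses a length-scale parametrization, exp(-d^2 / 2l^2), rather than the paper's γ, and its alpha argument plays the role of the δ^2 noise term). The training data here is synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def sim_url(url_i, url_j):
    """Eq. 3: length of the common URL prefix, Dice-style normalized."""
    common = 0
    for a, b in zip(url_i, url_j):
        if a != b:
            break
        common += 1
    return 2.0 * common / (len(url_i) + len(url_j))

def interest_gain(url, prev_url, ip_prev, delta):
    """Shared form of Eqs. 1-2: similarity-weighted carry-over from page i-1."""
    return sim_url(url, prev_url) * ip_prev * delta

# RBF-GPR user-interest model (Eqs. 4-5) on synthetic data:
X_train = np.random.rand(50, 7)   # fifty 7-dimensional information vectors
y_train = np.random.rand(50)      # user-rated interested degrees
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gpr.fit(X_train, y_train)
ip_new = gpr.predict(np.random.rand(1, 7))[0]   # interested degree of a new page
```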
3.3 Finding User Interests

Page contents are important for finding user interests. Our clustering algorithm first uses the Kaufman approach (KA) [5] for initialization, which initializes the clustering by successively selecting representative page instances until m instances have been found. It then uses the selected m seeds as the initial centroids and finally performs the spherical K-Means algorithm [4] to divide all the pages into m clusters. Based on the clustering results, keywords are extracted to represent user interests, and a summarization method is implemented to provide some detailed information for each interest. A sketch of the clustering step follows.
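This is a minimal sketch of spherical K-Means, assuming term-vector representations of pages and seeds already chosen by the KA-style initialization; the spherical variant simply works with L2-normalized vectors and cosine similarity.

```python
import numpy as np

def spherical_kmeans(X, seeds, iters=20):
    """Spherical K-Means: cosine similarity on L2-normalized page vectors.

    X: (n_pages, n_terms) term vectors; seeds: m initial centroids, e.g. the
    representative pages chosen by the Kaufman-style initialization.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = seeds / np.linalg.norm(seeds, axis=1, keepdims=True)
    for _ in range(iters):
        assign = np.argmax(X @ C.T, axis=1)       # nearest centroid by cosine
        for j in range(C.shape[0]):
            members = X[assign == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)      # renormalize the centroid
    return assign, C

# Synthetic demo: 30 pages over 50 terms, 3 clusters seeded by 3 pages.
X = np.abs(np.random.rand(30, 50))
assign, C = spherical_kmeans(X, seeds=X[:3].copy())
```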
4 Experiments

To evaluate the performance of our system, we conducted two experiments. One experiment evaluated the effectiveness of the user-interest model based on RBF-GPR. The other validated whether users were satisfied with the results found by the system. Thirteen voluntary students
participated in our experiments. Each participant was given three tasks according to the following requirements:

1. Use an Internet Explorer (6.0 or 7.0) browser with our plug-in embedded to surf the web.
2. Assign an interest score in [0, 100] to each visited page.
3. Predefine his interests using some words and phrases.

From the first task, we collected a dataset of about two weeks of information (from September 29, 2007 to October 10, 2007) gathered from the 13 volunteers; the dataset consists of the page data and the browsing behaviors. The total number of visited web pages was about 3,350, covering different topics including politics, culture, economy, science, entertainment and so on. From the second task, we obtained the interested degree of each visited page as rated by each participant. From the third task, we collected the manually predefined interests of the 13 volunteers. These interests are considered the reference results. The average number of predefined interests was 10.5; the minimum was 7 and the maximum was 13.
4.1 Evaluation of User-Interest Model

The dataset was divided into two groups: 65% of the dataset was used for training the model, and the rest was used for testing. There are two experiments.

Evaluate the RBF-GPR model for measuring prediction performance. For comparison purposes, we use the mean square error (MSE) to validate the effectiveness of our proposed user-interest model:

$$MSE(U) = \frac{1}{|D_U|} \sum_{x \in D_U} \left( \hat{f}(info(x)) - f(info(x)) \right)^2 \qquad (7)$$
where D_U is the set of pages visited by user U, x is a page instance in D_U, info(x) denotes its information vector, f(info(x)) denotes the user-predefined interested degree of x, and f̂(info(x)) denotes the model-inferred interested degree of x. Table 1 shows the results.
Table 1 The evaluation results of the RBF-GPR model for measuring prediction performance

User    Pages   Mean Square Error
1       672     0.046
2       213     0.046
3       122     0.045
4       122     9.945
5       144     0.081
6       167     0.018
Ave.    240     0.0485
Table 2 The evaluation results of the RBF-GPR model based on the distribution of a user's interests

User    System-Found Interests Number   MAE      SRCC
1       9                               0.0101   1
2       9                               0.0086   1
3       10                              0.0091   0.952
4       9                               0.0189   0.917
5       9                               0.0075   0.833
6       9                               0.0110   0.917
Ave.    9.17                            0.0109   0.936
Evaluate the RBF-GPR model based on its influence on the distribution of a user's interests. We use the mean absolute error (MAE) to measure the influence of our model on the distribution of the user's interests:

$$MAE(U) = \frac{1}{|G_U|} \sum_{G \in G_U} \left( \hat{d}(G) - d(G) \right)^2 \qquad (8)$$
where G_U is the set of user U's interests found by our system, G is one of the user's interests, d(G) denotes the real interested degree of G obtained from the user-predefined interested degrees of the pages, and d̂(G) denotes the system-evaluated interested degree of G. The Spearman rank correlation coefficient (SRCC) is used to calculate the strength of the relation between the system rating and the human rating:

$$SRCC(rank', rank) = 1 - \frac{6 \times SRDS(rank', rank)}{u \cdot (u^2 - 1)} \qquad (9)$$
where rank' is the system-evaluated ranking of the found interests, rank is the user-given ranking of the found interests, SRDS(rank', rank) denotes the difference between rank' and rank, and u = |rank|. Table 2 indicates that the two kinds of rankings are highly similar; the difference our model introduces between the system-generated distribution and the real distribution of a user's interests can be ignored.
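For readers reproducing the evaluation, here is a small sketch of the metrics; scipy's spearmanr computes the same rank correlation as Eq. 9 directly from the two rating sequences, and the rating values below are invented.

```python
import numpy as np
from scipy.stats import spearmanr

def mse(predicted, rated):
    """Eq. 7 for one user, over the interested degrees of the visited pages."""
    predicted, rated = np.asarray(predicted), np.asarray(rated)
    return float(np.mean((predicted - rated) ** 2))

# SRCC of Eq. 9 on two invented interest-degree sequences:
system = [0.90, 0.70, 0.50, 0.30]
human = [0.85, 0.60, 0.55, 0.20]
rho, _ = spearmanr(system, human)
print(mse(system, human), rho)
```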
4.2 Validation of User Interests

This experiment was conducted to validate whether the users are satisfied with the results. Each user was asked to check each interest generated by our system and assign it a score (0 to 5). The fair-score of the system-found interests, averaged over the 13 participants, is 3.041 and the average best-score is 4.154, which shows that our system can find a user's interests well and that users are generally satisfied with the results. However, the average worst-score of the found interests
is 1.385, which indicates that our system does not always yield good performance. The reason is that the clustering algorithm cannot achieve 100% precision, and noisy page information also degrades the clustering results.
Conclusion

In this paper, we propose a system to investigate the problem of finding user interests. Our system uses the implemented plug-in to collect and track a user's browsing behaviors. The system combines page content analysis and browsing behavior analysis to find and generate the user's interests automatically. Experiments show that our system can infer the interested degrees of visited pages based on the user's browsing behaviors. A summary can compensate for the imprecision of keywords and provide more detailed information about each interest. In the future, we will improve the quality of the clustering algorithm by using more language technologies. We plan to use a hierarchical clustering algorithm to identify the hierarchical structure of user interests.

Acknowledgments We are grateful to Mr. Zhenggang Wang for the implementation of the plug-in. We also thank Mr. Zuoer Lee for analyzing many clustering algorithms. Finally we want to express our sincere thanks to Prof. Bo Yuan for English language correction. This research is supported by Intel China Ltd. and the UDS-SJTU joint research lab for language technologies (http://lt-lab.sjtu.edu.cn).
References

1. A Library for Support Vector Machines (LIBSVM). http://www.csie.ntu.edu.tw/~cjlin/libsvm/
2. Atterer R, Wnuk M, and Schmidt A (2006) Knowing the User's Every Move: User Activity Tracking for Website Usability Evaluation and Implicit Interaction. In: Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), WWW'06, ACM Press, New York, pp 203-212
3. Cai D, Yu SP, Wen JR and Ma WY (2003) VIPS: a Vision-based Page Segmentation Algorithm. Microsoft Technical Report (MSR-TR-2003-79), November 2003
4. Wild SM (2003) Seeding non-negative matrix factorizations with the spherical K-Means clustering. MS Thesis, Department of Applied Mathematics, University of Colorado, April 2003
5. Lozano JA, Peña JM and Larrañaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20: 1027-1040
6. Rasmussen CE and Williams CKI (2006) Gaussian Processes for Machine Learning, MIT Press
7. Weinreich H, Obendorf H, Herder E, and Mayer M (2006) Off the Beaten Tracks: Exploring Three Aspects of Web Navigation. In: Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), WWW'06, ACM Press, New York, pp 133-142
8. White RW and Drucker SM (2007) Investigating Behavioral Variability in Web Search. In: Proceedings of the 16th International Conference on World Wide Web (Alberta, Canada, May 8-12, 2007), WWW'07, ACM Press, New York, pp 21-30
Patent Classification Using Parallel Min-Max Modular Support Vector Machine Zhi-Fei Ye, Bao-Liang Lu∗ , and Cong Hui
Abstract The patent classification problem involves a very large scale dataset, which traditional classifiers cannot solve efficiently. In this work, we introduce an improved parallel Min-Max Modular Support Vector Machine (M³-SVM) to solve the problem. Both theoretical analysis and experimental results show that M³-SVM requires much less training time than the standard SVMlight. The experimental results also show that M³-SVM can achieve a higher F1 measure than SVMlight in prediction. Since the original M³-SVM costs too much time in prediction, we also introduce two pipelined parallel classifier selection algorithms to speed up the prediction process. Results on the patent classification experiments show that these two algorithms are effective and scalable.
1 Introduction

Whenever a new patent application is submitted, previous similar patents must first be retrieved to evaluate the application request. This work crucially relies on proper patent classification. Highly reliable patent classification also serves many other purposes, such as protecting patent-issuing authorities and analyzing research fields. As the number of new patent applications has been rapidly increasing in recent years, manual patent classification is becoming too expensive, and this brings the need for automated patent classification.
Z.-F. Ye, B.-L. Lu, and C. Hui
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail: {zhifei_ye, bllu, huicong}@sjtu.edu.cn
∗ This work was partially supported by the National Natural Science Foundation of China under the grants NSFC 60473040 and NSFC 60773090, the Key Laboratory for Intelligent Computing and Intelligent Systems, Ministry of Education and Microsoft, and the Okawa Foundation Research Grant.
Many researchers have applied machine learning techniques to automated patent classification. Fall et al. [1] tried many basic classifiers, such as KNN, Naïve Bayes, and SVM, on English patents, but they only used the small WIPO-alpha subset, which contains 75,250 samples in all. Larkey [2] built an English patent classification system using KNN, with 1.5 million documents used for training. Because the automated patent classification problem is large scale, multi-labeled, imbalanced and hierarchical, there is still no system which can fully solve it. In this work, we introduce the Min-Max Modular Support Vector Machine to cope with the automated patent classification problem. M³-SVM is an efficient parallel classifier which can handle very large scale problems. Moreover, it naturally adopts prior knowledge to guide its task decomposition and increase the classification accuracy. The rest of this article is organized as follows: first we introduce the problem of international patent classification (IPC); then the framework of M³-SVM and how it is applied to the IPC problem is given in detail; after that, we present the experimental settings and results for 7 years of Japanese patents, followed by comparison and analysis of the results; finally we draw conclusions and discuss some future work.
2 International Patent Classification

The International Patent Classification (IPC) provides a hierarchical system of language-independent symbols for the classification of patents and utility models according to the different areas of technology to which they pertain. IPC has a huge hierarchical taxonomy including 8 sections, about 120 classes, about 630 subclasses, and approximately 69,000 groups from top down. The number of patents increases rapidly each year; taking Japanese patents as an example, there are about 350,000 new patents every year. Each patent includes four text fields: title, abstract, claim and description. Many researchers only use part of these text fields in order to simplify the problem.
3 Min-Max Modular Support Vector Machine

The idea of part-versus-part task decomposition and min-max modular network classifier ensemble was first developed by Professor Lu, using neural networks [4] and Support Vector Machines [5] as base classifiers. In this work, we only consider the Support Vector Machine as the base learner. M³-SVM has three major steps in its learning process:

1. Task decomposition. Given a K-class problem, we first divide it into K(K-1)/2 two-class sub-problems using the one vs. one strategy. For each two-class sub-problem, if
it is still hard to learn, we can further break it down into a series of sub-problems as small as we expect.

2. Parallel sub-problem learning. The sub-problems are independent of each other, so they can be learned in parallel using base classifiers such as SVM.

3. Classifier ensemble. After each sub-problem is learned, we integrate the sub-classifiers using the min-max modular network into a composite classifier for the original problem.
3.1 Task Decomposition

There are some difficulties in learning large scale datasets. First, large scale datasets might be too large to fit in memory. For example, 10 years of Japanese patent data include more than 3 million entries and need more than 20 GB to store the training file. Second, large scale datasets lead to very large QP problems for SVM, which take too much time to optimize. One way to solve large scale problems is to break down the training dataset. Let T denote the training set of a K-class problem:

$$T = \{(X_l, Y_l)\}_{l=1}^{L} \qquad (1)$$

where X_l ∈ R^n is the input feature vector, Y_l ∈ R^K is the desired output, and L is the total number of training samples. We first decompose the multi-class problem into two-class sub-problems using the one vs. one strategy:

$$T = \{T_{i,j} \mid i = 1, 2, \ldots, K-1,\; j = i+1, \ldots, K\}$$
$$T_{i,j} = T_i^+ \cup T_j^-$$
$$T_i^+ = \{(X_l, +1)\}_{l \in C_i} \qquad (2)$$
$$T_j^- = \{(X_l, -1)\}_{l \in C_j}$$

where C_i is the index set of samples that belong to class i. These two-class sub-problems contain all the samples of the corresponding two classes and might still be too large or too imbalanced in sample number. A further decomposition step is carried out if this situation exists: the samples of each class are equally divided into subsets of size d. Let

$$N_i = \left\lceil \frac{|C_i|}{d} \right\rceil \qquad (3)$$

denote the number of subsets of class i after decomposition; then a two-class training set T_{i,j} is broken down into N_i × N_j sub training sets by pair-wise combination of these subsets of size d.
So far, we have created

$$N = \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} N_i \times N_j$$

sub training sets, each of which has d positive samples and d negative samples. It is easy to see that any two of these N sub training sets are independent, so they can be trained in parallel straightforwardly.
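A sketch of the decomposition just described, under the assumption that training samples are grouped by class label; the chunking mirrors Eq. 3 and the pair-wise combination yields the N_i × N_j sub training sets per class pair.

```python
from itertools import product

def decompose(samples_by_class, d):
    """Part-vs-part decomposition: one-vs-one class pairs, then pair-wise
    combination of d-sized chunks (N_i = ceil(|C_i| / d), as in Eq. 3)."""
    def chunks(items):
        return [items[k:k + d] for k in range(0, len(items), d)]

    classes = sorted(samples_by_class)
    subproblems = []
    for a, i in enumerate(classes):
        for j in classes[a + 1:]:
            for pos, neg in product(chunks(samples_by_class[i]),
                                    chunks(samples_by_class[j])):
                subproblems.append((i, j, pos, neg))  # one independent SVM each
    return subproblems

# Toy 3-class problem: class sizes 5, 7, 3 with d = 3 give 2*3 + 2*1 + 3*1 sets.
subs = decompose({0: list(range(5)), 1: list(range(5, 12)),
                  2: list(range(12, 15))}, d=3)
print(len(subs))   # 11
```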
3.2 Parallel Sub-Problem Learning

Let us analyze the time complexity of training all these N sub-problems in comparison with training a K-class standard SVM. As Joachims [3] observes, the time complexity of training a dataset with n samples using SVMlight is O(n^α), where α is roughly 1.2 ~ 1.7. Training a K-class problem using SVMlight with one vs. one task decomposition then needs a total time complexity of

$$O\left( \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} (|C_i| + |C_j|)^{\alpha} \right) \qquad (4)$$
If we instead train the N decomposed sub-problems, the total time complexity is roughly

$$O\left( \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} N_i \times N_j \times (2d)^{\alpha} \right) \qquad (5)$$
Equation (5) indicates that the total training time grows O(d α−2 ) with the subset size d reduces. But the number of sub training sets grows O(d −2 ) as d reduces, any of them could be independently learned. If we have enough processing unit to parallel learn the sub training sets, the total training time could be reduced to O(d α ) theoretically. In summary, the sequential training time of N sub training sets increases O(d α−2 ) as d reduces, and the parallel training time have a lower bound of O(d α ), where α = 1.2 ∼ 1.7 depending on the characteristics of datasets. The traditional SVM has a space complexity of O(Lp + q 2 ), where p is the average sample length, q is the working set size. But overall space complexity of training M3 -SVM is O(L2 d −2 (Lp+q 2 )) , with q 2 " q, but since each sub training set is independent, and could be trained one by one on each processing unit, the temporary space complexity of each unit could be reduced to O(2dp + q 2 ). Since each subset will be trained with all other subsets except those in the same class, the overall data passing between processing unit is O(L3 d −2 ). But if we bind k vs. k continues subsets of the same two class together (as if a kd size sub set), k 2 sub classifiers will be trained with these 2k subsets. This could reduce the overall data passing complexity to O(L3 d −2 k −2 ), with each processing unit’s space requirement increased to O(2kd + q 2 ). By using a suitable k, we could always achieve acceptable complexity both in storing and data transferring.
3.3 Classifier Ensemble

When all of the N sub classifiers are learned, we use the min-max modular network to ensemble those sub classifiers into a solution to the original problem. A detailed description of the min-max modular network is given in [4]; here we only show its equivalent transfer function. Consider the two-class training set T_{i,j}: its N_i × N_j sub classifiers output an N_i × N_j discriminant matrix for a given test sample x:

$$\begin{pmatrix} I_{i,j}^{1,1} & I_{i,j}^{1,2} & \cdots & I_{i,j}^{1,N_j} \\ I_{i,j}^{2,1} & I_{i,j}^{2,2} & \cdots & I_{i,j}^{2,N_j} \\ \vdots & \vdots & & \vdots \\ I_{i,j}^{N_i,1} & I_{i,j}^{N_i,2} & \cdots & I_{i,j}^{N_i,N_j} \end{pmatrix} \qquad (6)$$

Then the discriminant I_{i,j} of T_{i,j} after the min-max modular network ensemble is equivalent to

$$I_{i,j} = \max_{1 \le u \le N_i} \; \min_{1 \le v \le N_j} I_{i,j}^{u,v} \qquad (7)$$
After all the two-class training sets get their discriminants {I_{i,j}}, a majority voting or other classifier ensemble scheme is applied to get the K-class output for sample x.
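Eq. (7) is a one-liner in code; here is a sketch with an invented 2 × 3 discriminant matrix of real-valued sub-classifier outputs.

```python
def minmax_discriminant(matrix):
    """Eq. (7): the max over rows of the min over each row's columns."""
    return max(min(row) for row in matrix)

# Invented 2 x 3 matrix of sub-classifier outputs for one class pair (i, j):
M = [[0.8, -0.2, 0.5],
     [0.9, 0.4, 0.7]]
print(minmax_discriminant(M))   # 0.4 > 0, so x is judged to belong to class i
```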
3.4 Classifier Selection Algorithm

Although the min-max modular network ensemble is easy to understand and operate, its high complexity prevents it from solving large scale problems: to generate the discriminant I_{i,j}, all N_i × N_j sub classifiers have to participate in prediction. But since every discriminant I_{i,j} of a two-class training set T_{i,j} is used as majority-voting input, we only care about its sign rather than its value. In this case, the min-max modular network can be simplified using classifier selection algorithms. Zhao and Lu [6] proposed the symmetric and asymmetric classifier selection algorithms. When only the sign of I_{i,j} is concerned, the N_i × N_j discriminant matrix outputs can be converted to binary "0-1" values, and the min-max modular network becomes equivalent to a logical AND-OR operation:

$$I_{i,j} = \mathop{\mathrm{OR}}_{1 \le u \le N_i} \; \mathop{\mathrm{AND}}_{1 \le v \le N_j} I_{i,j}^{u,v} \qquad (8)$$
i.e., the AND operation is first applied within each row of the matrix, and the OR operation is then applied to the resulting column.
3.4.1 Asymmetric Classifier Selection

In the binary case, it is unnecessary to check all N_i × N_j sub classifiers to obtain I_{i,j}, and the ensemble process is simplified as follows:

• In each row of the discriminant matrix, if a "0" is found, the row's AND result must be "0", so the rest of the row can be discarded.
• If a row of all "1"s is found, the OR output must be "1", so the remaining rows can be discarded.

This effectively reduces the number of sub classifiers needed to predict an input sample x; many classifiers are discarded during the AND and OR operations. We name this Asymmetric Classifier Selection (ACS).
3.4.2 Symmetric Classifier Selection

Note that ACS is equivalent to this asymmetric rule: assign I_{i,j} the value "1" if a full row of "1"s exists in the discriminant matrix; otherwise, assign I_{i,j} the value "0". A variation of this rule is a symmetric rule: assign I_{i,j} "0" if no full row of "1"s exists in the discriminant matrix; assign I_{i,j} "1" if no full column of "0"s exists; otherwise, make a guess. There is a linear-complexity symmetric classifier selection (SCS) algorithm that realizes the symmetric rule:

Symmetric classifier selection algorithm
Set u = 1, v = 1;
1. if I_{i,j}^{u,v} equals "0", u = u + 1; else v = v + 1;
2. if (u > N_i) return 0;
3. if (v > N_j) return 1;
4. repeat from step 1;
The SCS algorithm starts from the first element of the discriminant matrix and walks either down or right depending on the current element's value. This selects the outputs of only O(N_i + N_j - 1) sub classifiers.
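A sketch of the SCS walk, where query(u, v) is assumed to lazily evaluate sub classifier (u, v) so that only the consulted classifiers actually run; ACS can be obtained analogously by short-circuiting each row's AND.

```python
def scs(query, Ni, Nj):
    """SCS walk over the binary discriminant matrix; query(u, v) lazily
    evaluates sub classifier (u, v). Only O(Ni + Nj - 1) are consulted."""
    u, v = 1, 1
    while True:
        if query(u, v) == 0:
            u += 1        # this row's AND is dead: move down
        else:
            v += 1        # row still alive: move right
        if u > Ni:
            return 0      # walked off the bottom: no surviving row
        if v > Nj:
            return 1      # walked off the right edge: a row survived

M = [[1, 1, 0],
     [1, 1, 1]]
print(scs(lambda u, v: M[u - 1][v - 1], 2, 3))   # 1: row 2 is all "1"s
```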
3.5 Pipelined Parallel Classifier Selection

3.5.1 Constructing Pipeline Levels

Although the symmetric and asymmetric classifier selection algorithms effectively reduce the number of classifiers that participate in one prediction, they cannot be
Fig. 1 A 5-level ACS pipeline (left) and a 9-level SCS pipeline (right)
trivially parallelized, because the selection of later classifiers in the matrix depends on the previous classifiers' outputs. If we simply let the later classifiers wait for the previous classifiers' outputs, too much time is wasted on waiting, and the algorithm does not scale in a parallel computing environment. However, we can observe that in the discriminant matrix, for the ACS algorithm the selection of the current classifier only depends on the classifier to its left, and for the SCS algorithm the selection of the current classifier only depends on the classifier to its left or above it. Borrowing the idea of the pipeline structure of a CPU, we can make both ACS and SCS work well in a parallel computing environment. For the ACS algorithm, each column of the discriminant matrix makes one pipeline level; for the SCS algorithm, each line parallel to the anti-diagonal of the discriminant matrix makes one pipeline level. Figure 1 shows an example of a 5 × 5 discriminant matrix with ACS and SCS pipelines. When the pipelines are built, classification proceeds similarly to how a CPU operates. When the first batch of test samples comes, the first pipeline level of sub classifiers predicts it; after that, the second pipeline level takes over the task together with the results of the first level, while the first level keeps processing the second batch of test samples. The pipeline flows like this until the last level is reached and the discriminant I_{i,j} is given.
3.5.2 Time Complexity

Now let us consider the time complexity of the pipelined ACS and SCS algorithms. Suppose the maximum testing time of one sub classifier is t and enough processing units are available; then testing n samples requires a total time of O((n + l) · t), where l is the depth of the pipeline. In the batch testing case, we may have a huge number of testing samples, i.e. n ≫ l, so the average testing time of a sample x is O(t). In the case of online testing, the response time of a sample x is O(l · t) for ACS and O(2l · t) for SCS. In comparison, predicting with a standard two-class classifier on T_{i,j} requires a maximum time of O(t′). With a linear kernel, t′ = t, i.e. as long as there are enough processing units, M³-SVM has a testing lower bound similar to that of standard SVM. With a
non-linear kernel, the testing time depends on the number of support vectors; therefore t′ > t, since t corresponds to sub classifiers, which have fewer support vectors.

3.5.3 Processing Units Requirement

For the ACS algorithm, the number of processing units required is N_i × N_j, while for SCS only N_i + N_j - 1 processing units are enough. It may seem that the ACS algorithm is no better than the min-max rule, because the latter can achieve O(t) response time using N_i × N_j processing units. But generally the number of available processing units is less than N_i × N_j; in this case we can reduce the processing units assigned to the later pipeline levels, because they have less work to do in ACS, and the average predicting time will not increase much, whereas the testing time of the min-max rule increases linearly as the number of processing units decreases. SCS requires fewer processing units, and when there are not enough processing units it is faster than ACS.
4 Patent Classification Experiments

4.1 Dataset

The experimental data was collected from the NTCIR-5 patent dataset [7]. We use 7 years of patents, from 1993 to 1999, comprising 2,399,884 documents; see Table 1 for more details. In the experiments, we use the 1998 and 1999 patents as testing sets, and the 1997, 1996-1997, 1995-1997, 1994-1997, and 1993-1997 patents as five training sets of different scales. Documents were preprocessed into feature vectors of 5,000 dimensions. The 5,000 features are selected using CHI-Average, following the comparative results of [8]. The classification task is at the section layer, and the information of the class layer is used as prior knowledge in task decomposition. The training documents are first divided roughly by year, then further divided by class categories; the resulting large subsets are randomly divided into smaller subsets.
Table 1 NTCIR-5 Japanese patents statistics

Year     1993     1994     1995     1996     1997     1998     1999     Total
Patents  347,327  350,911  336,970  340,506  331,459  341,387  355,933  2,399,884

4.2 Computing Environment

Our experiments have been conducted on a Lenovo cluster composed of three fat nodes and thirty thin nodes. Each fat node has 32 GB RAM and two 3.2 GHz Intel(R)
Xeon(TM) CPUs, while each thin node has 8 GB RAM and two 2.0 GHz Intel(R) Xeon(TM) CPUs, each with four cores. The sequential SVM experiments were conducted on the fat nodes, and the M³-SVM experiments on the thin nodes.
4.3 Comparison with SVMlight

We use SVMlight [9] as the standard sequential SVM for comparison. In Fig. 2 the classification performance is displayed in both Macro-F1 and Micro-F1 measures. We can see that, using year and class information in task decomposition, M³-SVM achieves higher performance than SVMlight, especially when the training set size increases; SVMlight also suffers a slight performance drop on the 5-year training set. Figure 3 displays the training and testing time of SVMlight and M³-SVM. We can see that M³-SVM has much less training time than SVMlight, but the testing time of M³-SVM is several times more than that of SVMlight. In particular, when the size of the training set increases to 5 years, SVMlight has an unchanged testing time, which benefits from the linear kernel we used, but M³-SVM's testing time increases rapidly from 3 years on, because we only used 100 CPUs in testing, and that is not enough. In the case of non-linear kernels, SVMlight's testing time will also increase rapidly with the size of the training set, and M³-SVM will have a relatively lower testing time when enough CPUs are used.
4.4 Scalability

Figure 4 illustrates the scalability of M³-SVM in training and testing time. When enough CPU cores are available, M³-SVM can achieve very efficient training and
Fig. 2 Performance of SVMlight and M³-SVM
Fig. 3 Train and test time of SVMlight and M³-SVM
Fig. 4 The scalability of M³-SVM
testing performance. We find that when 10 times more CPU cores are used, roughly a 5-times speedup is achieved in both training and testing.
4.5 Comparison of ACS and SCS

Figure 5 compares the testing times of the 1-year training model. Both the ACS and SCS algorithms are much more efficient than the M³ network, and SCS achieves a lower predicting time using fewer CPU cores than ACS.
5 Conclusion

We used parallel M³-SVM to solve the large scale patent classification problem. Experiments show that M³-SVM outperforms the standard sequential SVMlight in both
Fig. 5 Comparison of ACS, SCS and the M³ network in testing
F-measure and efficiency. We also reduced M³-SVM's prediction time with two pipelined parallel classifier selection algorithms, ACS and SCS. Our algorithms also scale as the number of available processing units increases. In the future, we are going to investigate classifier pruning methods to further reduce the time cost, as well as new classifier ensemble methods.
References

1. C. J. Fall, A. Törcsvári, K. Benzineb and G. Karetka. Automated categorization in the international patent classification. ACM SIGIR Forum, 37(1): 10-25, 2003.
2. L. S. Larkey. A Patent Search and Classification System. International Conference on Digital Libraries, Berkeley, CA, pp. 179-187, 1999.
3. T. Joachims. Making Large-Scale SVM Learning Practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, Chapter 11, pp. 169-184, 1999.
4. B. L. Lu and M. Ito. Task decomposition and module combination based on class relations: A modular neural network for pattern classification. IEEE Transactions on Neural Networks, 10(5): 1244-1256, 1999.
5. B. L. Lu, K. A. Wang, M. Utiyama, and H. Isahara. A part-versus-part method for massively parallel training of support vector machines. Proc. of IEEE/INNS Int. Joint Conf. on Neural Networks (IJCNN 2004), Budapest, Hungary, July 25-29, pp. 735-740, 2004.
6. H. Zhao and B. L. Lu. Improvement on response performance of min-max modular classifier by symmetric module selection. Proceedings of the Second International Symposium on Neural Networks (ISNN'05), LNCS, vol. 3497: 39-44, Chongqing, China, 2005.
7. M. Iwayama, A. Fujii and N. Kando. Overview of classification subtask at NTCIR-5 patent retrieval task. Proc. of NTCIR-5 Workshop Meeting, 2005.
8. X. Chu, C. Ma, J. Li, B. L. Lu, M. Utiyama and H. Isahara. Large-scale patent classification with min-max modular support vector machines. Accepted by Proc. of International Joint Conference on Neural Networks (IJCNN), Hong Kong, China, 2008.
9. T. Joachims. SVMlight: Support Vector Machine. Software available from http://svmlight.joachims.org.
A Study of Network Informal Language Using Minimal Supervision Approach Xiaokai Zhang and Tianfang Yao
Abstract The subjective text is an important processed object of opinion mining, but subjective texts often contain many informal expressions. The authors of subjective texts have personal expression habits which are not restricted to a formal grammar, so Network Informal Language (NIL) emerges. For example, the formal word "(like)" is usually replaced by the NIL word "(congee)" in chatting tools like OICQ (Open ICQ) and MSN (Microsoft Service Network). Currently, opinion mining tools are less effective at handling these problems. For instance, they regard the NIL word "8 (non-word)" as text noise, although it in fact expresses one's viewpoint, namely the formal word "(ok)". In this paper, we propose an approach that tries to resolve the problems arising from NIL expressions. Because NIL expressions are in general identified manually, we use minimal supervision technology to identify them. According to the two different types of NIL, we adopt different strategies for each. The experimental results show that the performance of the proposed approach is satisfactory; therefore, this approach is reasonable and effective. In the future, we will improve this approach so that it can process more complicated NIL expressions.
1 Introduction

With the development of the Internet and the personal computer, online exchanges have the advantages of convenience and speed, so communication on the Internet is an important data source for data mining and information extraction. Because information in BBS (Bulletin Board System) and MSN (Microsoft Service Network) always comes from personal viewpoints, much of it is subjective.
X. Zhang and T. Yao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail: {xkzhang, yao-tf}@sjtu.edu.cn
In this situation, Network Informal Language (NIL) is widely used in these texts. Traditional opinion mining regards it as text noise. However, some of these NIL words contain abundant useful information which conveys users' opinions, e.g., clients expressing their opinions about a famous product. Therefore, we should formalize subjective texts before further processing.
2 Related Work and NIL Forms

Many researchers in England have shown interest in studying language evolution [1]. Some researchers have also studied the traditional Chinese language in this field [2]. They built a NIL corpus by collecting a huge number of web pages, then recognized whether the words and expressions in a text are informal, and finally replaced the informal expressions with the corresponding formal ones. But there are some defects in their work. First, all their data are based on traditional Chinese words rather than simplified Chinese ones. Second, the web pages they collected are much too numerous (more than 100,000 web pages), and all of the NIL expressions were recognized manually. Obviously, this is too time-consuming. In our study, we propose a minimal supervision approach to build a small corpus and use an effective approach to handle this problem. Most NIL expressions come from user input. In order to convey their opinions and ideas more quickly and conveniently, users adopt homophones, similar-sounding words, or abbreviations to express their viewpoints rather than formal words. In this situation, NIL expressions emerge. Table 1 shows the six different forms of the NIL words and expressions we collected from 4,315 web pages. In addition, because most current input tools have shortcomings and some users' pronunciation is not accurate enough, it is more difficult to resolve this problem.
Table 1 NIL words and expression forms (case insensitive)

Words formation                              Number   Example
Abbreviation of English or Chinese Pinyin    127      "GF" = "(girl friend)"
Homophonic of Chinese words                  36       "(bamboo)" = "(web moderator)"
Transliteration and foreign expression       38       "(noodles)" = "(fans)"
Partial tone in Chinese                      148      "(porridge)" = "(like)"
Numbers                                      39       "94" = "(yes)"
Mixture of the above forms                   54       "3q" = "(thank you)"
3 Data Source and Corpus Establishment

3.1 Source of NIL Expressions

First of all, we downloaded network texts, which contain lots of NIL expressions, from baidu.com by writing a spider program. We downloaded the texts from July to December of the year 2007 and then annotated the NIL expressions in the texts manually. The first step is to build a NIL dictionary in which every NIL word is represented by a two-element tuple (v_i, v_j), where v_i indicates the informal expression and v_j gives the formal word corresponding to v_i. After establishing the NIL corpus, we can recognize whether some words in a subjective text are informal. If we can make sure they are informal, we replace them in the text with the corresponding formal words from the NIL corpus. The next step is to determine whether the words in the text really are informal.
3.2 Two Definitions

3.2.1 Dividing All the Forms into Two Types

Our work is based on instance learning; the advantage of this approach is that it uses minimal learning data to recognize NIL words and expressions. We divide the six different forms in Table 1 into two main types, namely typical NIL expressions and fuzzy NIL expressions, in order to handle the problem more effectively.
3.2.2 Typical NIL Expressions

The expressions of this type have the following characteristic: they essentially never appear in formal texts (e.g., magazines, newspapers or dictionaries). They are created by web users for the purpose of achieving a much faster exchange speed in communication, e.g., the typical NIL words "OMG (oh, my god)" and "4 (for people)".
3.2.3 Fuzzy NIL Expressions

The trait of this type of NIL is that it appears not only in network communication but also in formal texts. In this circumstance, the same expression may have different meanings: formal or informal. E.g., the fuzzy NIL words "(porridge)" and "(mate or even)" represent "(like)" and "(I or me)", respectively.
4 Recognizing the NIL Expressions

4.1 Typical NIL Expressions

We adopt a sequential covering algorithm to produce rules for solving this type of problem. For every typical NIL word, we download the first 50 documents through a search engine like Google and then extract rules for that NIL word. The following is an example sentence including the words "8 (not)" and "94 (be)": "(the only drawback of this car is not good looking in appearance!)". Here we can extract the following rule from the sentence for recognizing whether "94" or "8" is a NIL expression or not:

[any words] + [94/8] + <non-quantifier> + [any words] -> [any words] + [(be)/(not)] + <non-quantifier> + [any words]
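A rough sketch of how such a rule could be applied; the dictionaries are hypothetical placeholders (the elided Chinese forms are represented by gloss-style tokens), and a real implementation would operate on the segmented Chinese sentence.

```python
# Hypothetical lexicons: the Chinese forms are shown as gloss tokens.
NIL_FORMAL = {"94": "<be>", "8": "<not>"}     # typical NIL -> formal word
QUANTIFIERS = {"<piece>", "<day>"}            # after a measure word, "8" is a number

def apply_typical_rules(tokens):
    """Rewrite typical NIL tokens when not followed by a quantifier."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        out.append(NIL_FORMAL[tok]
                   if tok in NIL_FORMAL and nxt not in QUANTIFIERS
                   else tok)
    return out

print(apply_typical_rules(["appearance", "8", "good"]))
# ['appearance', '<not>', 'good']
```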
4.2 Fuzzy NIL Expressions

This type of NIL expression exists in both formal and informal texts, and it also takes various forms. Considering the small coverage of the rule-based method in this case, it is hard to organize a suitable rule set to process such expressions. Therefore, we adopt classification technology in this circumstance.
4.3 Feature Selection

After carrying out some experiments and analyzing their results, we obtained the important features listed below. In most circumstances, these features allow us to identify and separate the fuzzy NIL expressions. First, we determine whether fuzzy NIL expressions appear in a text. If so, we recognize the property of each expression, namely formal or informal, by extracting features and obtaining a classification result. Finally, we can replace the informal expressions with the corresponding formal words from the corpus. The following features are useful in the fuzzy expression classification (a feature-extraction sketch follows the list):

1. Typical NIL expressions
If a sentence contains a typical NIL expression, the candidate is very likely to be a fuzzy NIL word: comments containing typical NIL expressions belong to someone who uses NIL expressions, so the typical NIL expression is an effective feature.

2. Words indicating advice, proposal and sentiment
If such words occur in a sentence, the sentence is likely subjective [3]. Because fuzzy NIL expressions exist in subjective texts, the expression may then be a NIL word.
3. The first and second personal pronouns
NIL expressions come from personal viewpoints; they may express one's perspectives and comments about something. We can conclude that sentences containing NIL expressions are likely to include the commenter's first or second personal pronouns.

4. Informal use of punctuation
The use of NIL expressions is quite random and informal. Informal punctuation may occur in their contexts, such as runs of exclamation and question marks at the end of a sentence. Such informal use of punctuation hardly ever emerges in formal texts like newspapers and magazines.

5. Words and punctuation carrying emotional color
If an exclamation mark appears in a sentence, it indicates the author's surprise or excitement, while a question mark indicates the author's doubt about something. Therefore, this is a feature which should not be ignored.
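As referenced above, here is a sketch of turning the five feature classes into indicator vectors for an SVM classifier. All lexicons are hypothetical stand-ins, and the tiny training set here merely illustrates the shape of the data; the real labels come from the annotated corpus of Section 3.

```python
from sklearn.svm import SVC

# Hypothetical lexicons standing in for the resources described above:
TYPICAL_NIL = {"OMG", "3q", "94"}
ADVICE_WORDS = {"suggest", "should", "great", "terrible"}
PRONOUNS = {"I", "me", "you"}

def fuzzy_nil_features(tokens):
    """Binary indicators for the five feature classes of Section 4.3."""
    text = " ".join(tokens)
    return [
        int(any(t in TYPICAL_NIL for t in tokens)),    # 1. typical NIL present
        int(any(t in ADVICE_WORDS for t in tokens)),   # 2. advice/sentiment word
        int(any(t in PRONOUNS for t in tokens)),       # 3. personal pronoun
        int("!!" in text or "??" in text),             # 4. informal punctuation
        int("!" in text or "?" in text),               # 5. emotional punctuation
    ]

# Toy training data (1 = informal use, 0 = formal use):
train_sentences = [["I", "suggest", "3q", "!!"], ["the", "manual", "says", "."]]
train_labels = [1, 0]
clf = SVC(kernel="rbf").fit(
    [fuzzy_nil_features(s) for s in train_sentences], train_labels)
```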
5 Experiment

5.1 Resources

The total number of sentences in our experiment is 3,746. These sentences come from two datasets: one for typical NIL expressions and another for fuzzy NIL expressions. The former contains 1,842 sentences, and the latter includes sentences having fuzzy NIL expressions, which may also appear in the typical NIL forms. We divide each dataset into two bags: one for training and one for testing. We then compare the performance of different data combinations trained with an SVM [4] (http://www.kernel-machines.org/) for classification.
5.2 Solution for the Sparse Matrix

In the second stage, we combine some of the features to derive three weight features. In order to keep the weight value between 0 and 1, we compute the sentence weight by Eq. (1), where n indicates the number of features in the sentence:

$$s(n) = \frac{1}{1 + e^{-n}} \qquad (1)$$

Given a specific sentence, we can count its fuzzy NIL expressions and then obtain the weight s(n) of the sentence.
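A two-line illustration of Eq. (1):

```python
import math

def sentence_weight(n):
    """Eq. (1): squash a feature count n into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

print(sentence_weight(3))   # ~0.953 for a sentence with 3 matched features
```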
Table 2 Typical NIL expression results

Dataset            Precision   Recall   F-Measure
Typical NIL data   0.871       0.682    0.765

Table 3 Fuzzy NIL expression results

Dataset           Precision   Recall   F-Measure
Fuzzy NIL dataA   0.802       0.843    0.822
Fuzzy NIL dataB   0.813       0.885    0.847
5.3 Results

To evaluate the results, we use the standard evaluation measures of precision, recall and f-measure. Our experiment includes two stages. In the first stage, we recognize the typical NIL expressions; in the second stage, we use the results from the first stage, because they form one of the features for the second. Thus, the results are composed of two parts, displayed in Tables 2 and 3. In Table 3, the result in the first line uses the same data for training and testing, while the result in the second line uses 10-fold cross-validation. From the tables we conclude that the classification performs well: the recall is much higher, and the f-measure is also better.
6 Conclusion

The processing of NIL expressions is a new research direction in NLP. Many factors affect the processing, because NIL expressions are correlated with different areas and periods. Therefore, we should update our NIL dictionary every two or three months and download sentences containing new NIL expressions to keep it up to date. The approach proposed in this paper is useful; we do not need to download too many texts for NIL recognition. In our study, we tackle the NIL issues using a minimal dataset and obtain good results: the f-measure reaches 0.847, which proves that our approach is effective for NIL expression processing.

Acknowledgment This research work is financially supported by the National Science Foundation of China (No. 60773087) and the UDS-SJTU Joint Research Lab for Language Technology. We sincerely thank them for their help.
References

1. Sproat R, Black AW, Chen S, Kumar S, Ostendorf M, Richards C (2001) Normalization of non-standard words. J Computer Speech and Language, 15: 287-333
2. Xia YQ, Wong KF, Gao W (2005) NIL is not nothing: recognition of Chinese network informal language expressions. In: Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing at IJCNLP'05, Jeju Island, Korea, pp 95-102
3. Yao TF, Peng SW (2007) A study of the classification approach for Chinese subjective and objective texts. In: Proceedings of the Third National Conference for Information Retrieval and Content Security, Suzhou, China, pp 117-123
4. Isozaki H, Kazawa H (2002) Efficient support vector classifiers for named entity recognition. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp 1-8
Topic Identification Based on Chinese Domain-Specific Subjective Sentence Hang Yin and Tianfang Yao
Abstract Opinion mining has become one of the most popular research topics and has been applied in many fields, such as review classification and mining product reputation. This paper pursues another aspect of opinion analysis: identifying the topics of opinion sentences. We present the TDCS (Topic iDentification in Chinese Sentence) algorithm to identify topics in Chinese domain-specific subjective sentences. The algorithm contains two parts: first, identifying the domain words in a domain sentence; then, filtering the domain words to get the real topics of the sentence.
1 Introduction

In recent years, there has been a great deal of interest in methods for automatically identifying opinions, emotions, and sentiments in text. Much of this research explores sentiment classification, a text categorization task in which the goal is to classify a document as having positive or negative polarity. Other research efforts analyze opinion expressions at the sentence level or below to recognize opinions, including their polarity and strength. This paper focuses on one aspect of the opinion mining problem: identifying topics in Chinese domain-specific subjective sentences, modeling the problem as <sentence, topic*>.
Note that the asterisk means that the number of topics in a sentence can be 0, 1, 2, .... For example, in the sentence "(Look good, flexible control, and high security!)", the topics include "(look)", "(control)", and "(security)". The goal of this research is to automatically identify explicit domain-specific topics in Chinese subjective sentences like this one. We choose the online expert reviews of Pacific Automobile at http://www.pcauto.com.cn/ as the research domain.
H. Yin, T. Yao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
e-mail: {yinhangfd, yao-tf}@cs.sjtu.edu.cn
Due to the complexity of Chinese domain words and Chinese sentence structures, there are two difficulties in identifying topics in Chinese domain-specific subjective sentences. First, there are lots of domain-specific words in sentences, which are usually compound words. Unlike English sentences, Chinese sentences contain no spaces between words, which greatly increases the difficulty of determining the boundaries of domain words. Second, in real-world applications, Chinese sentences have more flexibility and irregularity. There can be more than one topic and sentiment in one sentence, and the relation between topics and sentiments is often many-to-many, meaning one sentiment can correspond to several topics and one topic can correspond to several sentiments. To overcome the difficulties in processing Chinese sentences, the TDCS algorithm first trains a statistical model to find the domain words in a sentence and then sets up word matrices to compute the word relationships in the sentence. Finally, the algorithm chooses the sentiment-related domain words as the topics of the sentence. Section 2 describes related approaches to identifying opinion topics. Section 3 describes the CRF method for identifying domain words and the features that our system uses. Section 4 introduces the algorithm for selecting topics from domain words. Finally, Section 5 presents our experimental results.
2 Related Work

Some strategies select so-called in-site terms as the appraised topics of specific products [1, 2]. Apparently, these in-site terms are too limited to represent all possible appraised topics. Reference [3] uses a 3-gram association rule mining method to find the topics of subjective sentences. However, even in a Chinese subjective sentence, not all parts of the sentence are subjective, so this method can generate too many redundant features. Reference [4] proposes three heuristics to distinguish NPs and selects their respective topic terms by means of the likelihood test. However, this method generates too many redundant topic terms; besides, topic terms in the Chinese language cannot be expressed with the three heuristics. Reference [5] uses FrameNet knowledge to get the topic of a sentence. FrameNet is an online lexical resource based on frame semantics which aims at documenting the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses. Several frames related to opinions are collected, and specific frame elements are marked as topics. But this method can hardly be used for Chinese sentence processing, because FrameNet provides no knowledge for the Chinese language. Reference [6] uses an IE-based method to find topic terms in English sentences; although this method can identify more complicated domain terms, it can hardly work on a Chinese-language corpus. In summary, it is hard to apply these methods to the topic identification task in Chinese sentences. Chinese sentences do not have spaces between words, so it is a genuinely tough task to find candidate domain words. What is more, Chinese sentences often have more complex structures, and not all the selected candidate domain words are real topics of the sentence. TDCS therefore works in a different way, so that it performs properly on a Chinese-language corpus.
3 Domain Word Identification

In this paper we assume the topics of sentences are all made of domain words, so to identify the topics in sentences we first have to identify the domain words in them. Domain words are usually compound words with great complexity. It is not practical to use a dictionary to solve the problem, because a dictionary always has low coverage. In addition, due to the complexity of the Chinese language, it is also hard to find a rule-based method for locating the boundaries of Chinese domain words, because the words in Chinese are written together without spaces. So we can view the domain-word identification task as a sequential labeling problem. Given an input word sequence x = x_1 x_2 ... x_n, the corresponding output is the label sequence y = y_1 y_2 ... y_n. Here the tags follow the 'BIO' tagging convention: 'B' indicates the beginning of a domain word, 'I' indicates the middle or end of a domain word, and 'O' indicates a token that is not part of any domain word.
3.1 CRF (Conditional Random Field)

We create a linear-chain CRF model [7] to solve this sequential labeling problem. In this model, the conditional probability of a sequence of labels y given a sequence of input words x is

$$p(y|x) = \frac{1}{z(x)} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i) \right) \qquad (1)$$

$$z(x) = \sum_{y \in Y} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i) \right) \qquad (2)$$
where z(x) is a normalization constant for each x, f_k(y_{i-1}, y_i, x, i) is a binary feature indicator function that can express any independent feature located on an edge or node, and λ_k is the weight assigned to each feature function, obtained by training on a dataset. The λ_k parameters of the model are trained to maximize the conditional log-likelihood P(y|x). For inference, given a sentence x in the test data, the tagging sequence y* is given by y* = argmax_y P(y|x).
3.2 Feature and Template

To identify domain words we consider four features: a word feature, a POS-tag feature, a stop-word feature and a domain dictionary feature. The word feature 'W' is the word itself. The POS-tag feature 'P' is obtained from the parser output. We define a stop word as a word that commonly occurs in sentences and cannot be part of a domain word, and we build a stop-word dictionary to generate the stop-word feature 'S'. Finally, we collect a set of simple domain words to obtain the domain dictionary feature 'D'. From these four features we generate three templates for the experiments, shown in Table 1.

Table 1 Feature template

Template   Basic features   Compound features (n = -2, -1, 0, 1)
TMP-1      P, S, D          P_n, S_n, D_n; P_n P_{n+1}, S_n S_{n+1}, D_n D_{n+1}
TMP-2      W, S, D          W_n, S_n, D_n; W_n W_{n+1}, S_n S_{n+1}, D_n D_{n+1}
TMP-3      W, P, S, D       W_n, P_n, S_n, D_n; W_n W_{n+1}, P_n P_{n+1}, D_n D_{n+1}, S_n S_{n+1}
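As an illustration of how such templates expand into concrete CRF features, the following Python sketch builds TMP-3-style features for one token position; the feature-string names, the dictionaries, and the function itself are hypothetical, not the paper's actual implementation:

    def token_features(sent, i, stopwords, domain_dict):
        """TMP-3-style features for position i; sent is a list of (word, POS) pairs."""
        feats = {}
        for n in (-2, -1, 0, 1):
            j = i + n
            if not 0 <= j < len(sent):
                continue
            w, p = sent[j]
            s = "1" if w in stopwords else "0"    # stop-word feature S
            d = "1" if w in domain_dict else "0"  # domain dictionary feature D
            # Basic features W_n, P_n, S_n, D_n
            feats[f"W[{n}]"] = w
            feats[f"P[{n}]"] = p
            feats[f"S[{n}]"] = s
            feats[f"D[{n}]"] = d
            # Compound features W_n W_{n+1}, P_n P_{n+1}, S_n S_{n+1}, D_n D_{n+1}
            if j + 1 < len(sent):
                w2, p2 = sent[j + 1]
                feats[f"W[{n}]W[{n+1}]"] = w + "|" + w2
                feats[f"P[{n}]P[{n+1}]"] = p + "|" + p2
                feats[f"S[{n}]S[{n+1}]"] = s + ("1" if w2 in stopwords else "0")
                feats[f"D[{n}]D[{n+1}]"] = d + ("1" if w2 in domain_dict else "0")
        return feats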
4 Topic Identification

In a subjective text not all sentences are subjective, and even within a subjective sentence not all parts are subjective. So after finding domain words we must filter these candidates to obtain the real topics of the sentence; we regard only sentiment-related domain words as topics. Chinese sentences often have a complex structure: the distance between a sentiment word and its topic word is flexible and can be long, and the relationship between sentiment and topic words is not always one-to-one. Association rule mining or pattern matching therefore cannot find the relationship between sentiment and topic words efficiently, so we introduce a topic filtering algorithm to tackle the problem in Chinese sentences.

The topic filtering algorithm assumes that the sentiment words are all available and uses them to find the topics of a sentence. It first sets up three word relation matrices: a sentiment word and its related topic word usually lie on one branch of the grammar tree, but sometimes they lie on two different branches, and the three matrices cover both cases. Next, the algorithm uses the result of dependency tree analysis to compute the matrix values, which yield the grammatical relationships between sentiment and domain words. Finally, it checks the values of the relation matrices to extract the sentiment-related topics of the sentence. The steps of the algorithm are as follows:

Step 1 POS-tag the sentence. The identified domain words are put into the POS parser's extension word dictionary so that the parser chunks the sentence and assigns POS tags correctly.
The chunked words are put into a WordList in their original order in the sentence; this list forms the columns and rows of the word matrices of the sentence.

Step 2 Perform dependency analysis on the sentence to obtain its grammar tree. Each branch of the grammar tree carries direct relationship information between two words of the sentence. In Steps 1 and 2 we use the LTP parser, which performs well on Chinese, for both POS tagging and dependency analysis.

Step 3 Cut redundant branches of the grammar tree. In our experiments we observe that not all grammar relations contribute to identifying the relation between sentiment and domain words, for example IC (independent clause relation) and VV (verb-verb relation); moreover, such redundant relations increase the complexity of the identification task. We therefore cut the redundant branches of the grammar tree in advance.

Step 4 Set up the word relation matrices. There is often no direct grammatical relationship between a sentiment word and a topic word, but using the information in the grammar tree we can compute their indirect relationship. With this idea we create three word relation matrices to compute the grammatical relation between the words of a sentence. All three matrices have the same format, with columns and rows indexed by the words in WordList, but each has its own meaning and usage:

Definition 1. SingleDistMatrix: records the grammar relation value between any two words of WordList that lie on one grammar branch.

Definition 2. DownDistMatrix: the symmetric counterpart of SingleDistMatrix; it helps to compute the grammar relation value between two words of WordList that lie on two different grammar branches.

Definition 3. DoubleDistMatrix: records the grammar relation value between any two words of WordList that lie on two different grammar branches.

Each matrix entry is initialized to 1.0 if the two words of its column and row form a branch of the grammar tree, and to infinity otherwise.

Step 5 Calculate the values of the word relation matrices. For the single-branch case we calculate the value directly; for the two-branch case we add a parameter \alpha > 1.0 to enlarge the value, because we consider relationships across different branches less reliable. Formulas (3) and (4) give the update rules:

SingleDistMatrix[w_i, w_j] = \min\big( SingleDistMatrix[w_i, w_j],\; SingleDistMatrix[w_i, w_k] + SingleDistMatrix[w_k, w_j] \big)    (3)
DoubleDistMatrix[w_i, w_j] = \min\big( DoubleDistMatrix[w_i, w_j],\; SingleDistMatrix[w_i, w_k] + \alpha \cdot DownDistMatrix[w_k, w_j] \big)    (4)
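A minimal Python sketch of this Step 5 relaxation follows, under the assumption that the caller has already initialized every word pair per Step 4 (1.0 for direct grammar-tree branches, infinity otherwise); the function and variable names are ours, not the paper's:

    import math

    def relax_matrices(words, single, down, double, alpha=4.0):
        """Apply formulas (3) and (4) over all intermediate words.

        `single`, `down` and `double` are dicts keyed by word pairs (wi, wj),
        pre-initialized per Step 4: 1.0 for a direct grammar-tree branch,
        math.inf otherwise.
        """
        for wk in words:            # Floyd-Warshall-style intermediate word
            for wi in words:
                for wj in words:
                    # Formula (3): shortest relation path within one branch
                    single[wi, wj] = min(single[wi, wj],
                                         single[wi, wk] + single[wk, wj])
                    # Formula (4): path crossing into a second branch,
                    # penalized by alpha > 1.0
                    double[wi, wj] = min(double[wi, wj],
                                         single[wi, wk] + alpha * down[wk, wj])
        return single, double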
Step 6 Select the topics of the sentence. We assume that all sentiment words are known in advance and stored in a sentiment word dictionary. The relationship information between any two words of the sentence, including that between sentiment and domain words, is stored in the word matrices. We therefore walk through SingleDistMatrix and DoubleDistMatrix looking for entries whose column is a sentiment word and whose row is a domain word; if such an entry is below a threshold, we mark the domain word as a topic of the sentence.
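A matching sketch of Step 6, again with hypothetical names; it marks a domain word as a topic when either matrix puts it within the threshold distance of some sentiment word:

    def select_topics(sentiment_words, domain_words, single, double, threshold=10.0):
        """Step 6 sketch: a domain word is a topic if some sentiment word is
        within `threshold` relation distance in either matrix."""
        inf = float("inf")
        topics = set()
        for s in sentiment_words:
            for d in domain_words:
                dist = min(single.get((s, d), inf), double.get((s, d), inf))
                if dist < threshold:
                    topics.add(d)
        return topics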
5 Experimental Results

The TDCS algorithm is applied in the car domain. For domain word identification we extract 1,700 sentences from the Expert Review section of Pacific Automobile as experimental data; of these, we manually label 1,500 sentences as the training data set and keep 200 sentences for testing. For topic selection we use the remaining 200 sentences as experimental data. Three measures, recall, precision and F-measure, are used to evaluate the performance of domain word and topic identification; the results are shown in Tables 2 and 3.

For domain word identification we train on the data set with the three templates defined in Section 3.2; the CRF model is built with the Pocket CRF open source toolkit from http://sourceforge.net. For topic selection we set \alpha = 4.0 and run tests with three different threshold values; Table 3 gives the results. For the overall performance of the system we obtain a precision of 83.36, a recall of 77.99 and an F-measure of 80.86.
Table 2 Performance for domain word identification

             TMP-1   TMP-2   TMP-3
Precision    87.08   93.78   95.80
Recall       83.29   92.03   93.07
F-measure    85.12   92.57   94.41
Table 3 Performance for topic selection

             Threshold = 8   Threshold = 10   Threshold = 12
Precision    87.33           87.08            87.02
Recall       81.23           83.10            83.80
F-measure    84.23           85.05            85.38
6 Conclusion

In this paper we presented an approach for topic identification in Chinese subjective sentences. Compared to other work in this field, our approach can deal with the problems posed by more complicated Chinese sentences. The experimental results show that the algorithm is appropriate and effective.

Acknowledgement This research work is financially supported by the National Science Foundation of China (No. 60773087) and the UDS-SJTU Joint Research Lab for Language Technology. In addition, the Information Retrieval Lab of Harbin Institute of Technology provided the Chinese syntactic analyzer, the LTP parser. We sincerely thank them for their help.
References

1. Morinaga S, Yamanishi K, Tateishi K, Fukushima T (2002) Mining product reputations on the web. In: Proceedings of the international conference on knowledge discovery and data mining (KDD-2002), ACM Press, New York, pp 341–349
2. Gamon M, Aue A, Corston-Oliver S, Ringger E (2005) Pulse: mining customer opinions from free text. In: Proceedings of the 6th international symposium on intelligent data analysis, LNCS, Springer, Madrid, pp 121–132
3. Hu MQ, Liu B (2004) Mining opinion features in customer reviews. In: Proceedings of the 19th national conference on artificial intelligence (AAAI-2004), San Jose, pp 755–760
4. Kim SM, Hovy E (2004) Determining the sentiment of opinions. In: Proceedings of the conference on computational linguistics (COLING-2004), Geneva, Switzerland, pp 1367–1373
5. Kim SM, Hovy E (2006) Extracting opinions, opinion holders, and topics expressed in online news media text. In: Proceedings of the workshop on sentiment and subjectivity in text (COLING-ACL 2006 Workshop), Sydney, Australia, pp 1–8
6. Cheng XW (2006) Automatic topic term detection and sentiment classification for opinion mining. Master thesis, University of Saarland, pp 55–59
7. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning, Morgan Kaufmann, San Francisco, pp 282–289