NEURAL NETS WIRN09
Frontiers in Artificial Intelligence and Applications, Volume 204

Published in the subseries Knowledge-Based Intelligent Engineering Systems
Editors: L.C. Jain and R.J. Howlett

Recently published in KBIES:
Vol. 203. M. Džbor, Design Problems, Frames and Innovative Solutions
Vol. 196. F. Masulli, A. Micheli and A. Sperduti (Eds.), Computational Intelligence and Bioengineering – Essays in Memory of Antonina Starita
Vol. 193. B. Apolloni, S. Bassis and M. Marinaro (Eds.), New Directions in Neural Networks – 18th Italian Workshop on Neural Networks: WIRN 2008
Vol. 186. G. Lambert-Torres et al. (Eds.), Advances in Technological Applications of Logical and Intelligent Systems – Selected Papers from the Sixth Congress on Logic Applied to Technology
Vol. 180. M. Virvou and T. Nakamura (Eds.), Knowledge-Based Software Engineering – Proceedings of the Eighth Joint Conference on Knowledge-Based Software Engineering
Vol. 170. J.D. Velásquez and V. Palade, Adaptive Web Sites – A Knowledge Extraction from Web Data Approach
Vol. 149. X.F. Zha and R.J. Howlett (Eds.), Integrated Intelligent Systems for Engineering Design

Recently published in FAIA:
Vol. 202. S. Sandri, M. Sànchez-Marrè and U. Cortés (Eds.), Artificial Intelligence Research and Development – Proceedings of the 12th International Conference of the Catalan Association for Artificial Intelligence
Vol. 201. J.E. Agudo et al. (Eds.), Techniques and Applications for Mobile Commerce – Proceedings of TAMoCo 2009
Vol. 200. V. Dimitrova et al. (Eds.), Artificial Intelligence in Education – Building Learning Systems that Care: From Knowledge Representation to Affective Modelling
Vol. 199. H. Fujita and V. Mařík (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the Eighth SoMeT_09
Vol. 198. R. Ferrario and A. Oltramari (Eds.), Formal Ontologies Meet Industry
Vol. 197. R. Hoekstra, Ontology Representation – Design Patterns and Ontologies that Make Sense
ISSN 0922-6389
Neural Nets WIRN09 Proceedings of the 19th Italian Workshop on Neural Nets, Vietri sul Mare, Salerno, Italy, May 28–30 2009
Edited by
Bruno Apolloni Università degli Studi di Milano, Dipartimento di Scienze dell’Informazione, Via Comelico 39, 20135 Milano, Italy
Simone Bassis Università degli Studi di Milano, Dipartimento di Scienze dell’Informazione, Via Comelico 39, 20135 Milano, Italy
and
Carlo F. Morabito Università di Reggio Calabria, IMET, Loc. Feo di Vito, 89128 Reggio Calabria, Italy
Amsterdam • Berlin • Tokyo • Washington, DC
© 2009 The authors and IOS Press. All rights reserved.
No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-60750-072-8
Library of Congress Control Number: 2009940563

Publisher
IOS Press BV, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands
fax: +31 20 687 0019; e-mail: [email protected]

Distributor in the USA and Canada
IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA
fax: +1 703 323 3668; e-mail: [email protected]
LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS
Preface

Human beings pass away; science continues. This volume collects the contributions to the 19th Italian Workshop of the Italian Society for Neural Networks (SIREN). The conference was held a few days after the death of Prof. Maria Marinaro, who was a founder and a solid leader of the society. This made the conference a sad occasion, but at the same time a more intense one. With neural networks we are exploring thought mechanisms that are at once an efficient computational tool and a representation of the physics of our brain, having the loops of our thoughts as their ultimate product. It is not the duty of our discipline to pronounce on what happens when these loops stop, but it is a fascinating goal to shed light on how these loops run and which tracks they leave. Science continues, and we dedicate these selected papers to Maria.

We have grouped the papers within five themes: "modeling", "signal processing", "economy and complexity", "biological aspects", and "general applications". They come from three regular sessions of the conference plus two specific workshops on "Computational Intelligence for Economics and Finance" and "COST 2102: Cross Modal Analysis of Verbal and Nonverbal Communications", respectively. The editors would like to thank the invited speakers as well as all those who contributed to the success of the workshops with papers of outstanding quality. Finally, special thanks go to the referees for their valuable input.
Contents

Preface  v

Chapter 1. Models

The Discriminating Power of Random Features  3
Stefano Rovetta, Francesco Masulli and Maurizio Filippone

The Influence of Noise on the Dynamics of Random Boolean Network  11
A. Barbieri, M. Villani, R. Serra, S.A. Kauffman and A. Colacci

Toward a Space-Time Mobility Model for Social Communities  19
Bruno Apolloni, Simone Bassis and Lorenzo Valerio

Notes on Cutset Conditioning on Factor Graphs with Cycles  29
Francesco Palmieri

Neural Networks and Metabolic Networks: Fault Tolerance and Robustness Features  39
Vincenzo Conti, Barbara Lanza, Salvatore Vitabile and Filippo Sorbello

Chapter 2. Signal Processing

The COST 2102 Italian Audio and Video Emotional Database  51
Anna Esposito, Maria Teresa Riviello and Giuseppe Di Maio

Face Verification Based on DCT Templates with Pseudo-Random Permutations  62
Marco Grassi and Marcos Faundez-Zanuy

A Real-Time Speech-Interfaced System for Group Conversation Modeling  70
Cesare Rocchi, Emanuele Principi, Simone Cifani, Rudy Rotili, Stefano Squartini and Francesco Piazza

A Partitioned Frequency Block Algorithm for Blind Separation in Reverberant Environments  81
Michele Scarpiniti, Andrea Picaro, Raffaele Parisi and Aurelio Uncini

Transcription of Polyphonic Piano Music by Means of Memory-Based Classification Method  91
Giovanni Costantini, Massimiliano Todisco and Renzo Perfetti

A 3D Neural Model for Video Analysis  101
Lucia Maddalena and Alfredo Petrosino

A Wavelet Based Heuristic to Dimension Neural Networks for Simple Signal Approximation  110
Gabriele Colombini, Davide Sottara, Luca Luccarini and Paola Mello

Support Vector Machines and MLP for Automatic Classification of Seismic Signals at Stromboli Volcano  116
Ferdinando Giacco, Antonietta Maria Esposito, Silvia Scarpetta, Flora Giudicepietro and Maria Marinaro

Chapter 3. Economy and Complexity

Thoughts on the Crisis from a Scientific Perspective  127
Jaime Gil-Aluja

Aggregation of Opinions in Multi Person Multi Attribute Decision Problems with Judgments Inconsistency  136
Silvio Giove and Marco Corazza

Portfolio Management with Minimum Guarantees: Some Modeling and Optimization Issues  146
Diana Barro and Elio Canestrelli

The Treatment of Fuzzy and Specific Information Provided by Experts for Decision Making in the Selection of Workers  154
Jaime Gil-Lafuente

An Intelligent Agent to Support City Policies Decisions  163
Agnese Augello, Giovanni Pilato and Salvatore Gaglio

"Pink Seal" a Certification for Firms' Gender Equity  169
Tindara Addabbo, Gisella Facchinetti, Giovanni Mastroleo and Tiziana Lang

Intensive Computational Forecasting Approach to the Functional Demographic Lee Carter Model  177
Valeria D'Amato, Gabriella Piscopo and Maria Russolillo

Conflicts in the Middle-East. Who Are the Actors? What Are Their Relations? A Fuzzy LOGICal Analysis for IL-LOGICal Conflicts  187
Gianni Ricci, Gisella Facchinetti, Giovanni Mastroleo, Francesco Franci and Vittorio Pagliaro

Chapter 4. Biological Aspects

Comparing Early and Late Data Fusion Methods for Gene Function Prediction  197
Matteo Re and Giorgio Valentini

An Experimental Comparison of Random Projection Ensembles with Linear Kernel SVMs and Bagging and BagBoosting Methods for the Classification of Gene Expression Data  208
Raffaella Folgieri

Changes in Quadratic Phase Coupling of EEG Signals During Wake and Sleep in Two Chronic Insomnia Patients, Before and After Cognitive Behavioral Therapy  217
Stephen Perrig, Pierre Dutoit, Katerina Espa-Cervena, Vladislav Shaposhnyk, Laurent Pelletier, François Berger and Alessandro E.P. Villa

SVM Classification of EEG Signals for Brain Computer Interface  229
G. Costantini, M. Todisco, D. Casali, M. Carota, G. Saggio, L. Bianchi, M. Abbafati and L. Quitadamo

Role of Topology in Complex Neural Networks  234
Luigi Fortuna, Mattia Frasca, Antonio Gallo, Alessandro Spata and Giuseppe Nunnari

Non-Iterative Imaging Method for Electrical Resistance Tomography  241
Flavio Calvano, Guglielmo Rubinacci and Antonello Tamburrino

Role of Temporally Asymmetric Synaptic Plasticity to Memorize Group-Synchronous Patterns of Neural Activity  247
Silvia Scarpetta, Ferdinando Giacco and Maria Marinaro

Algorithms and Topographic Mapping for Epileptic Seizures Recognition and Prediction  261
N. Mammone, F. La Foresta, G. Inuso, F.C. Morabito, U. Aguglia and V. Cianci

Computational Intelligence Methods for Discovering Diagnostic Gene Targets about aGVHD  271
Maurizio Fiasché, Maria Cuzzola, Roberta Fedele, Domenica Princi, Matteo Cacciola, Giuseppe Megali, Pasquale Iacopino and Francesco C. Morabito

Dynamic Modeling of Heart Dipole Vector for the ECG and VCG Generation  281
Fabio La Foresta, Nadia Mammone, Giuseppina Inuso and Francesco Carlo Morabito

Chapter 5. Applications

The TRIPLE Hybrid Cognitive Architecture: Connectionist Aspects  293
Maurice Grinberg and Vladimir Haltakov

Interactive Reader Device for Visually Impaired People  306
Paolo Motto Ros, Eros Pasero, Paolo Del Giudice, Vittorio Dante and Erminio Petetti

On the Relevance of Image Acquisition Resolution for Hand Geometry Identification Based on MLP  314
Miguel A. Ferrera, Joan Fàbregas, Marcos Faundez-Zanuy, Jesús B. Alonso, Carlos Travieso and Amparo Sacristan

Evaluating Soft Computing Techniques for Path Loss Estimation in Urban Environments  323
Filippo Laganà, Matteo Cacciola, Salvatore Calcagno, Domenico De Carlo, Giuseppe Megali, Mario Versaci and Francesco Carlo Morabito

The Department Store Metaphor: Organizing, Presenting and Accessing Cultural Heritage Components in a Complex Framework  332
Umberto Maniscalco, Gianfranco Mascari and Giovanni Pilato

Subject Index  339

Author Index  341
Chapter 1 Models
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-3
The discriminating power of random features

Stefano ROVETTA a,1, Francesco MASULLI a,b and Maurizio FILIPPONE c
a Department of Computer and Information Sciences, University of Genova, Italy
b Temple University, Philadelphia, USA
c Sheffield University, UK

Abstract. Input selection is found as a part of several machine learning tasks, either to improve performance or as the main goal. For instance, gene selection in bioinformatics is an input selection problem. However, as we prove in this paper, the reliability of input selection in the presence of high-dimensional data is affected by a small-sample problem. As a consequence of this effect, even completely random inputs have a chance to be selected as very useful, even if they are not relevant from the point of view of the underlying model. We express the probability of this event as a function of data cardinality and dimensionality, discuss the applicability of this analysis, and compute the probability for some data sets. We also show, as an illustration, some experimental results obtained by applying a specific input selection algorithm, previously presented by the authors, which show how inputs known to be random are consistently selected by the method.
1. Introduction

When we are concerned with data sets characterized by high dimensionality (number of attributes) and low cardinality (number of instances), as in the case of typical document processing and bioinformatics data sets, we stumble upon many data analysis problems related to the curse of dimensionality [3]. A widespread remedy for this situation is input selection, aiming to find small, relevant subsets of attributes whose predictive power is not much worse than that of the full set of inputs. Another application of input selection is finding the most important inputs, with the aim of describing the problem at hand using only relevant variables.

Unfortunately, the two problems of data dimensionality and input relevance are related by a phenomenon that is the subject of this work, and which can again be traced back to the curse of dimensionality itself. It may happen, and we evaluate how often, that typical performance-based input selection methods find very relevant variables even when the data set is random. Our analysis shows that for data sets characterized by high dimensionality and low cardinality there is a certain probability that a random attribute can perform perfect classification on the training set, and an even higher probability that it performs classification with only a few errors.

1 Stefano Rovetta, Department of Computer and Information Sciences, University of Genova, Italy; e-mail: [email protected]
This work does not propose a solution, but only illustrates the problem and discusses possible strategies to alleviate it. In the next section we briefly survey some of the many possible formulations and strategies for the input selection task; for a more in-depth treatment we recommend the survey in [7]. In Section 3 the problem is exemplified and formalized, and Section 4 discusses external cross-validation and the aggregation of relevance scores. Some experimental illustrations are presented and commented on in Section 5. Finally, Section 6 contains a brief discussion.
2. Input selection

The input selection task can be stated in many possible ways. To fix a reference framework, we will refer to the diagnostic task, i.e., variable selection for classification, which is a very common framework in bioinformatics data analysis.

A first consideration concerns the goal that we seek. Relevant variable selection aims at being maximally synthetic and at pointing out which factors are really acting on an empirically described phenomenon, so that a model can be constructed building on these factors. Dimensionality reduction, on the other hand, aims at identifying and eliminating all variables which are redundant, uncorrelated with the output, or noisy, and whose contribution lies more in increasing computational complexity than in providing selectivity. As a consequence, the former task might be tuned for high selectivity, while the latter might yield more comprehensive results for the sake of keeping performance high.

An important aspect is the evaluation of mutual dependencies between variables. If dependence is assumed or known to be low, then we can attribute relevance to an individual variable and proceed by ranking all inputs. Conversely, if this hypothesis does not reasonably hold, then the task is necessarily feature subset selection, whereby groups of variables are evaluated together. Sometimes the hypothesis of mutual independence is made only to cut the exponential growth in computational cost that subset selection implies; yet it is known that, in problems with large numbers of inputs such as gene expression analysis and natural language processing, methods based on this assumption perform surprisingly well (we can cite the well-known example of spam recognition with Naive Bayes filters [2]). In the fields cited as examples this can also be reasonably expected, since even multifactorial problems depend only on a few of the many available variables.

Input selection strategies are invariably heuristic, due to the exponential complexity of an exhaustive approach in such a huge search space. Heuristic criteria include forward selection (progressive addition of good features), backward elimination (progressive deletion of bad features), and feature ranking, along with mixed strategies like "floating selection" (one step forward, then some steps backward). These techniques can also be broadly divided into two categories [4]: filters and wrappers. Filters evaluate the relevance of each input (subset) using the data set alone, while wrappers invoke a learning algorithm to evaluate the quality of each input (subset). Both approaches usually involve searches, often only local, through the space of possible input subsets. Wrappers are usually more computationally demanding, but they can be superior in accuracy when compared with filters. Yet another method type, related to wrappers, is that of embedded methods [7], which are "informed" in that they rely on
evaluations of moves in the search space provided by information such as the gradient of a quality or relevance function; these involve knowledge of internal details of the learning machine, such as the information gain along the path followed so far in a classification tree [10], or the weights of a linear threshold classifier [15,8,11]. In this case feature selection is usually performed jointly with model training.

One last, but certainly important, design decision for any input selection method is the function by which the importance of inputs is evaluated. This function is termed relevance and is strictly related to the other aspects that we have reviewed. A discussion about feature relevance can be found in [9]. In wrapper methods, a performance criterion for the learning machine (e.g., classification error) usually coincides with, or is a part of, the relevance function. Note that inputs which are useful for solving the learning task are not necessarily relevant from the standpoint of the underlying model. A variable may contribute to the operation of a given learning machine even without bearing any useful information. To be convinced of this, we can consider the example of a linear classifier: some variables that are strongly correlated with others (even deterministically [14]), or variables with a constant value (the bias), may help increase the separating power for a given problem.
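As a plain illustration of the filter/wrapper distinction discussed above, the following sketch ranks single features by a filter criterion (correlation with the labels) and by a wrapper criterion (cross-validated accuracy of a one-feature classifier). It is only an illustrative example: the library calls (scikit-learn) and the function names are our own choices, not part of the methods discussed in this paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def filter_ranking(X, y):
    # filter: absolute Pearson correlation of each column with the labels
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]          # best feature first

def wrapper_ranking(X, y, folds=5):
    # wrapper: cross-validated accuracy of a classifier trained on one feature at a time
    scores = [cross_val_score(LogisticRegression(), X[:, [j]], y, cv=folds).mean()
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]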
3. The problem of relevant random features

We assume a training set for a two-class classification problem X = {x1, . . . , xn} containing n d-dimensional patterns xi = (xi1, . . . , xid) in R^d, with a vector of labels y = (y1, . . . , yn), yi ∈ {0, 1}. We suppose that the attributes are real. What happens if all attributes are random numbers picked according to a uniform distribution in the interval [0, 1]?

Suppose, then, that the n × d feature values are uniform random real numbers in [0, 1]. We denote by G the event that a feature f is able to separate the two classes perfectly, matching the labels y, and we want to evaluate the probability P(G) of this event: it tells us how likely a completely random training set is to contain some features which will be deemed very relevant by a wrapper approach. To this end we assume that, when we are considering the discriminative capability of a feature f, we arrange its values in increasing order:

f1 < f2 < . . . < fn

Since we are dealing with real numbers, the "strictly less than" conditions can be used. According to this ordering we arrange the labels yi so as to keep the initial association between samples and labels. This is a common "feature ranking" scheme, modeling the task of gene selection that we are referring to.

Now the probability of the event G can be translated as the probability that n/2 red balls and n/2 black balls are picked in such a way that all the red balls are picked before the black ones. This probability is very easy to compute: it is just one divided by the number of combinations of n/2 objects in n places, without considering the order of picking of the objects. Thus:
6
S. Rovetta et al. / The Discriminating Power of Random Features
P(G) = \binom{n}{n/2}^{-1} = \left( \frac{n!}{(n/2)!\,(n/2)!} \right)^{-1}    (1)
Now, if we are picking d features, we can compute the probability that none of them is able to satisfy the event G. This probability can be computed by considering d independent events \bar{G}, each of probability 1 − P(G):

P(\bar{G})^d = (1 - P(G))^d    (2)

The probability that at least one event G happens is then:

P(G)_d = 1 - (1 - P(G))^d    (3)
If we allow a number m of misclassifications for each class and we call this event G_m, its probability will be:

P(G_m) = \binom{n/2}{m} \binom{n/2}{m} \binom{n}{n/2}^{-1}    (4)

This is easy to verify, since the event G_m can be considered as the combinations of m red balls in n/2 places, which is the number of possible configurations of the misclassified patterns for the first class; for each configuration of these red balls we have the same number of configurations for the black ones.

The analysis presented so far is independent of the specific classifier adopted. When the classifier is fixed, it is possible to use well-known results about the so-called capacity to investigate the probability of being able to separate two balanced classes. As an example, we can cite the ubiquitous case of linear threshold units (including, for instance, the linear Support Vector Machine). In this case deterministic separability, i.e. the guarantee that any training set will be separable, is limited by the Vapnik-Chervonenkis dimension [16] of the learning machine, which equals the number of inputs plus one for a linear threshold unit with bias. This means that, in the presence of large-dimensionality, small-cardinality data sets, the Vapnik-Chervonenkis dimension of the classifier will always be larger than the cardinality of the data set, and the analysis above will always be applicable. If we increase the data set cardinality, we can use the results by Cover [5] on the separability of random patterns to find that the break-even point where we have probability 1/2 of being able to separate a random training set (i.e., the point where our analysis is applicable 50% of the times) corresponds to a cardinality equal to twice the Vapnik-Chervonenkis dimension of the classifier. This analysis can be extended to threshold units which are not linear [5], and also to multi-layer neural networks using such units [14].
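The probabilities in Eqs. (1)-(4) are easy to evaluate numerically. The following sketch (ours, not part of the original paper) computes, for a given cardinality n and dimensionality d, the probability that a single random feature separates the classes perfectly, the probability that at least one of d random features does so, and the probability of separation with m errors per class.

from math import comb

def p_perfect(n):
    # Eq. (1): probability that one random feature perfectly separates n/2 vs n/2 labels
    return 1.0 / comb(n, n // 2)

def p_at_least_one(n, d):
    # Eq. (3): probability that at least one of d independent random features separates perfectly
    return 1.0 - (1.0 - p_perfect(n)) ** d

def p_m_errors(n, m):
    # Eq. (4): probability that one random feature separates with m errors per class
    return comb(n // 2, m) ** 2 / comb(n, n // 2)

for n in (30, 40, 50):                        # the cardinalities used in Section 5
    print(n, p_perfect(n), p_at_least_one(n, d=1000), p_m_errors(n, m=2))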
4. External cross-validation and relevance aggregation

In [1], Ambroise and McLachlan pointed out the so-called selection bias problem affecting wrapper approaches to input selection, which occurs when the learning machine is tested on the same data set used in the first instance to select the inputs it uses, or when the cross-validation of the learning machine is internal to the selection process.
Figure 1. The probability of finding a discriminating random feature without errors (above) and with at most two errors (below).
One viable solution they proposed to this problem is external cross-validation. With M-fold cross-validation of a learning machine R, the training set is divided into M non-overlapping subsets of equal size. The learner is trained on M − 1 of these subsets combined together and then applied to the remaining subset to obtain an estimate of the prediction error.
Table 1. Ranking of the 10 best features obtained after a 10-fold cross-validation procedure.

Rank   Synth30   Synth40   Synth50
1      2         4         4
2      4         2         3
3      702       3         2
4      37        1         1
5      3         407       343
6      774       771       626
7      605       844       262
8      718       523       772
9      793       658       875
10     657       941       376
This process is repeated in turn for each of the M subsets, and the cross-validation error is given by the average of the M estimates of the prediction error thus obtained. It is worth noting that for each of the M subsets the application of the wrapper input selection method potentially leads to a different selected subset of inputs.

The feature selection method that we have used in the experiments involves an external cross-validation procedure, with the application of voting techniques to aggregate the input selection results of the M folds. We can define, e.g., the voted relevance of an input as the sum of its occurrences in the results obtained on the M folds. Although this practice is good from the standpoint of reducing the risk of overfitting, we should observe that the discriminative power of random patterns cannot be reduced in this way. This is because, according to the available data, the feature actually is relevant, although it may not be in a different sample from the same problem. Therefore, cross-validation is not a remedy for this problem.
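A minimal sketch of the external cross-validation scheme with vote aggregation might look as follows. It is our own illustration (using scikit-learn's KFold), where select_inputs stands for any wrapper selector, such as SAIS, returning the indices of the inputs chosen on a fold.

import numpy as np
from collections import Counter
from sklearn.model_selection import KFold

def voted_relevance(X, y, select_inputs, M=10):
    votes = Counter()
    for train_idx, test_idx in KFold(n_splits=M, shuffle=True, random_state=0).split(X):
        # input selection and training are confined to the M-1 training folds;
        # the held-out fold is only used to estimate the prediction error
        selected = select_inputs(X[train_idx], y[train_idx])
        votes.update(selected)
    # voted relevance of an input = number of folds in which it was selected
    return votes.most_common()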
5. Experimental validation

The experimental validation is focused on the SAIS feature selection method proposed in [6]. The SAIS algorithm has been implemented in the R language and statistical environment under the Linux operating system. Briefly, the method weights the inputs using a relevance vector, then trains a classifier and performs a simulated annealing step; this procedure is repeated in a loop until a suitable stopping criterion is met. Inputs are ranked according to their participation in repeated, successful trainings. The method's generalization is controlled by an external cross-validation procedure.

The data sets are built as follows: there are 2 balanced classes, 1000 inputs, and a variable number of patterns (30, 40 and 50, respectively, for data sets synth30, synth40 and synth50). Inputs 1 and 2 are able to discriminate the two classes; inputs 3 and 4 are also able to discriminate the two classes, since they are defined as inputs 1 and 2 changed in sign, respectively; the remaining inputs are random. Classification errors for these data sets are as follows: 2 errors on Synth30, corresponding to 6.7%; 1 error on Synth40, or 2.5%; 1 error on Synth50, or 2%.
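The synthetic data sets can be reproduced along the following lines. This is a sketch under our own assumptions: the paper does not specify the distribution of the two informative inputs, so a noisy class indicator is used here purely for illustration.

import numpy as np

def make_synth(n_patterns, n_inputs=1000, seed=0):
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_patterns // 2)              # two balanced classes
    X = rng.uniform(size=(n_patterns, n_inputs))        # all inputs random by default
    X[:, 0] = y + rng.normal(0.0, 0.1, n_patterns)      # input 1 discriminates the classes
    X[:, 1] = y + rng.normal(0.0, 0.1, n_patterns)      # input 2 discriminates the classes
    X[:, 2] = -X[:, 0]                                  # input 3 = input 1 changed in sign
    X[:, 3] = -X[:, 1]                                  # input 4 = input 2 changed in sign
    return X, y

X30, y30 = make_synth(30)   # synth30; synth40 and synth50 are analogous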
Table 1 shows that, even though the two top-ranked features are always among the first four inputs, which are actually discriminating both on the training set and in the generating model, in the smallest data set (synth30) the third and fourth best features are not among the first four inputs but are selected among the random ones. To find feature 3 we have to go to rank 5, while feature 1 does not appear among the best 10. This is a clear indication that, when cardinality decreases, the danger of obtaining highly discriminating features which are irrelevant in the underlying model is real.
6. Discussion

Input selection is a complex task, partly because it is not well defined (the task itself depends on the specific goal), and partly because of computational complexity. In addition, as we have shown, there are issues arising from sample complexity.

Throughout the paper we have made reference to a specific task setting. Our hypotheses stem from the applicative problem of gene selection: therefore we focus on classification, we use a ranking technique, and we wish to find the biologically relevant genes. Unless we have some additional information about the inputs, i.e., external knowledge, separating inputs which are advantageous for the classification but are not significant in the biological model requires that we tune our methods for the lowest false negative rate, at the risk of accepting a higher false positive rate. However, we have proven that, even for input dimensions which are not uncommon, false positives include not only inputs which are useful by themselves although not biologically relevant, but also inputs which are discriminative only because of small-sample effects, even in the case that they are completely random.

The usual recommendation to use statistical validation techniques like leave-one-out or k-fold cross-validation does not apply to this very peculiar problem, even in the (correct) external cross-validation setting which does not include the variable selection process in the loop. Unfortunately, the only remedies that apply to this case are those which are usually assumed to be of the most expensive type, namely getting more data, or at least more external knowledge (which will necessarily be indirect, since we start from a situation where we assume that the role of individual inputs is not known).

This situation implies that the results of computational gene selection should be monitored in the light of the possibility of the small-sample effect that we have analyzed. It also suggests ascribing particular importance to meta-analysis, which is the instrument used by applied statisticians to make inferences based not on one study only, but on a collection of studies. This instrument is widely used in medicine [13], and has been suggested also in molecular biology [12]. The present work shows that for some problems this may be the only option available.
References

[1] Christophe Ambroise and Geoffrey J. McLachlan. Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences, 99(10):6562–6566, 2002.
[2] Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, and Constantine D. Spyropoulos. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 160–167, New York, NY, USA, 2000. ACM.
[3] R. Bellman. Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.
[4] Avrim Blum and Pat Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2):245–271, 1997.
[5] Thomas Cover. Geometrical and statistical properties of inequalities with application in pattern recognition. IEEE Transactions on Electronic Computers, 14:326–334, 1965.
[6] Maurizio Filippone, Francesco Masulli, and Stefano Rovetta. Unsupervised gene selection and clustering using simulated annealing. In Isabelle Bloch, Alfredo Petrosino, and Andrea Tettamanzi, editors, Fuzzy Logic and Applications, 6th International Workshop, WILF 2005, Crema, Italy, September 15–17, 2005, Revised Selected Papers, Berlin Heidelberg, September 2006. Springer-Verlag.
[7] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
[8] Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
[9] George H. John, Ron Kohavi, and Karl Pfleger. Irrelevant features and the subset selection problem. In Machine Learning, Proceedings of the Eleventh International Conference, Rutgers University, New Brunswick, NJ, USA, pages 121–129. Morgan Kaufmann, July 1994.
[10] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.
[11] Francesco Masulli and Stefano Rovetta. Random Voronoi ensembles for gene selection. Neurocomputing, 55(3-4):721–726, 2003.
[12] Yves Moreau, Stein Aerts, Bart De Moor, Bart De Strooper, and Michal Dabrowski. Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends in Genetics, 19(10):570–577, 2003.
[13] S.-L. T. Norman. Tutorial in biostatistics. Meta-analysis: formulating, evaluating, combining, and reporting. Statistics in Medicine, 18:321–359, 1999.
[14] Sandro Ridella, Stefano Rovetta, and Rodolfo Zunino. Circular back-propagation networks for classification. IEEE Transactions on Neural Networks, 8(1):84–97, January 1997.
[15] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.
[16] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-11
The influence of noise on the dynamics of Random Boolean Network

A. BARBIERI a,1, M. VILLANI a, R. SERRA a, S.A. KAUFFMAN b and A. COLACCI c
a Department of Social, Cognitive and Quantitative Sciences, Modena and Reggio Emilia University, via Allegri 9, 42100 Reggio Emilia, Italy; e-mail: {mvillani, alessia.barbieri.35586, rserra}@unimore.it
b Institute for Biocomplexity and Informatics, University of Calgary, 2500 University Dr. NW, Calgary, AB T2N 1N4, Canada; e-mail: [email protected]
c Mechanisms of Carcinogenesis and Anticarcinogenesis Unit, Environmental Protection Agency Emilia-Romagna Region, Viale Filopanti 20/22, Bologna, Italy; e-mail: [email protected]

Abstract. Since real networks are noisy systems, in this work we investigate the dynamics of a deterministic model of gene networks affected by small random fluctuations. In this case jumps among different attractors are possible, thereby leading to an asymptotic dynamics different from that of the underlying deterministic model. The significance of the jumps among attractors is discussed. A key control parameter of this phenomenon is the size of the network, a fact that could lead to interesting consequences for the theories of natural and artificial evolving systems.

Keywords. Random Boolean Networks, noise, attractors, emergent computation

1 Corresponding Author: Alessia Barbieri, Department of Social, Cognitive and Quantitative Sciences, Modena and Reggio Emilia University, via Allegri 9, 42100 Reggio Emilia, Italy; e-mail: [email protected].
Introduction

Models of neural networks can be roughly divided into two classes, according to their behaviour in the recognition phase: those where there is a directional flow of activation, such as feedforward layered networks [1], and those which are truly dynamical systems, like those proposed by Elman [2] and by Hopfield [3]. Although the former have proven to be very effective in applications, the latter are very interesting from the viewpoint of emergent computation, since their properties are related to their phase portrait, i.e. attractors and basins of attraction. This is particularly clear in the case of the Boolean Hopfield model, whose attractors are fixed points (in the usual case with symmetric synaptic weights), and whose learning algorithms try to shape the attractors in such a way that they coincide with the patterns to be retrieved.

There are of course limitations to the possibility of shaping an emergent property (i.e. an attractor) according to specific goals; on the other hand,
some attractors which the network spontaneously develops, different from the patterns which have been taught, may perform useful non-preprogrammed (emergent) computation [4].

Another well-known Boolean network model is that of Random Boolean Networks [5][6] (briefly, RBNs), which display a much richer dynamics than that of the symmetric Hopfield model (although the learning algorithms proposed so far for RBNs have been less effective [7]). The attractors of finite RBNs are cycles; however, by taking into account the scaling of the cycle period with the system size, and by considering the robustness with respect to small changes in the initial conditions, it has been possible to introduce a notion of ordered vs. disordered attractors, which represents in a sense the analogue (in a finite discrete system) of the distinction between regular and chaotic attractors in continuous systems. Section 1 reviews the main dynamical properties of RBNs. Note also that the dynamics depends upon the choice of the allowed set of Boolean functions: besides linearly separable ones, other interesting sets of functions (canalizing and, more recently, biologically plausible ones) have been analyzed [8].

The inspiring metaphor for RBNs is that of a network of interacting genes, which affect each other's activation. In this context the network attractors have been associated with cellular types. However, biological systems (genes, proteins, neurons, etc.) are easily affected by noise, and so are many artificial systems; therefore the relevance of the attractors has to be checked in the presence of noise. In this paper we present a thorough investigation of these aspects. On the basis of the previous remarks, it has previously been suggested by one of us [9] that the really interesting properties of these networks should not be related to single attractors, but rather to sets of attractors which are easily reachable from one another under the influence of a small amount of random noise. The observable behaviour of the system under the action of random noise should then be described by these "relevant" sets of attractors. In this study we consider only the smallest type of perturbation, i.e. a transient flip: at a given time step the value of a randomly chosen node is changed, and the network is then left free to evolve according to its deterministic dynamics.

In Section 2 we provide the necessary definitions and a precise statement of the problem to be addressed. We then discuss the outcomes of extensive computer experiments, which can be divided into two classes, "small" and "large" networks. For small networks, in this work up to 20 nodes, it has been possible to study the deterministic (i.e. without flips) dynamics for all the initial conditions: in these cases we have complete knowledge of the phase portrait of the network, which provides the basis for investigating the effects of noise. In the case of large networks it is impossible to consider all the 2^N initial conditions, and a limited sampling has been performed. Section 3 presents and comments on the first results of our study, and Section 4 discusses the conclusions that can be drawn from it.

1. Dynamical properties of random Boolean networks

There are excellent reviews of RBNs in the literature [10][5], so here we will only briefly summarize some of their properties. The model was originally developed by Stuart Kauffman as a model of genetic regulatory networks in cells [11]. Besides their biological interest, RBNs have been used to model several different complex phenomena [10].
A RBN consists of N nodes, each of which can assume a Boolean value; each node has kin input connections. In the classical model used here kin is the same for all nodes, and the input nodes are chosen randomly among the other N − 1 nodes; consequently, the outgoing connections follow a Poisson distribution. Every node has a Boolean rule f_i, which determines the node value at time t from the values of its inputs at time t − 1. The Boolean rules are chosen randomly among all the 2^{2^{k_{in}}} possible Boolean functions with k_{in} arguments. The updating of the network is synchronous, time is discrete, and neither the topology nor the logic rules associated to each node change in time (the so-called quenched model). In such a way each state X(t) = [x_1(t), x_2(t), . . . , x_N(t)] (x_i(t) being the activation value of node i at time t) determines univocally the successive state X(t + 1). In this deterministic system the dynamics admits as possible asymptotic states only fixed points or cycles.

It is possible to distinguish three different regimes: ordered, chaotic and critical. Few attractors, scarce sensitivity to initial conditions, robustness to perturbations and a large number of static nodes characterize the ordered regime, whereas a huge number of attractors (exponential in N), high sensitivity to initial conditions and a high fraction of oscillating nodes characterize the chaotic one. In the chaotic regime similar states tend to very different attractors and the network is highly sensitive to perturbations, which can expand until they invade the whole system. The critical regime is the zone of transition between order and chaos, where small changes can propagate without necessarily permeating the whole system.

The dynamical regime of a RBN depends primarily on two parameters: the average connectivity of the network and the magnetization bias p, which is the probability that the Boolean functions f_i have outcome 1. Many works have shown that, if all Boolean rules are accepted, the critical regime holds when kin = 2 and p = 0.5 [12], or, more generally, when k_{in} = [2p(1 − p)]^{-1} [10]. Critical RBNs show an equilibrium between robustness and adaptiveness [13]; for this reason they are considered plausible models for the organization of living systems. Recent results support the view that biological genetic regulatory networks operate close to the critical region [14][15].
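For concreteness, a minimal sketch of the quenched RBN model just described (kin = 2, synchronous update) is given below. The data structures and function names are our own and only illustrate the definitions above; they are not taken from the cited works.

import numpy as np

def make_rbn(N, kin=2, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # each node receives kin inputs chosen among the other N-1 nodes
    inputs = np.array([rng.choice(np.delete(np.arange(N), i), size=kin, replace=False)
                       for i in range(N)])
    # random Boolean functions: each output bit is 1 with probability p (magnetization bias)
    tables = rng.random((N, 2 ** kin)) < p
    return inputs, tables

def step(state, inputs, tables):
    # synchronous update: every node reads its inputs at time t-1
    idx = (state[inputs] * (2 ** np.arange(inputs.shape[1]))).sum(axis=1)
    return tables[np.arange(len(state)), idx]

def find_attractor(state, inputs, tables):
    # iterate the deterministic dynamics until a state repeats: the system is on a cycle
    seen, t = {}, 0
    while tuple(state) not in seen:
        seen[tuple(state)] = t
        state = step(state, inputs, tables)
        t += 1
    return tuple(state), t - seen[tuple(state)]   # a state on the cycle and the cycle period

inputs, tables = make_rbn(N=20)
cycle_state, period = find_attractor(np.random.default_rng(1).random(20) < 0.5, inputs, tables)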
2. The influence of noise

As stressed in the introduction, the presence of noise can induce jumps from one attractor to another. It is therefore appropriate to associate the relevant properties of the network to sets of attractors through which the system can easily hop, rather than to single attractors. For the purpose of the present study we will consider only a very small amount of noise, i.e. the flip of a single node: at time t the state of a node chosen at random is changed; its state at time t + 1 will be determined by the network's own dynamics.

Let A_i (i = 1 . . . M) be the M attractors of a given network (under the action of the deterministic transition functions), and let A be the set of such attractors. Let us now consider a network which, after a finite transient, is in attractor A_i. We say that A_j is directly reachable from A_i if there is (at least) one node such that the flip of that node at time t (when the system is in attractor A_i) has the effect of bringing the network (after a transient) to the attractor A_j. In symbols, we represent the fact that A_j is reachable from A_i (directly or indirectly) with an arrow: either A_i → A_j or A_j ← A_i.
An ergodic set of attractors (ES) of the network is a subset of A composed of attractors which are reachable from any other member of the ES, not necessarily in a single step [9]. Moreover, the ergodic set is such that one or more flips cannot make the system leave it. Formally:

ES ≡ {A_i ∈ A | ∃ A_j ∈ ES, A_j → A_i;  A_i → A_k ⇒ A_k ∈ ES}

Note that there may be more than one (disjoint) ES for the same network: the problem we address in this work is that of finding the ESs of a given network.
3. Experiments and first results

We generate critical RBNs (with p = 0.5 and kin = 2) with different values of N and characterise their dynamics by finding the whole set of their attractors, or a subset of it for large networks. For each attractor found we perturb every node in every phase (one flip at a time) and record the identity of the attractor the perturbation leads to; this information is stored in an adjacency matrix whose rows and columns correspond to the whole set of attractors of the net². For small networks we test all the possible states; in this case every attractor is found. For large networks we explore a fixed number of different initial conditions³. Note that for these latter networks there is a small probability of not finding an attractor after the perturbation (within the search parameters); these occurrences are very few and are not considered in the adjacency matrix.

From the adjacency matrix we can obtain a graphical representation of the attractor landscape. Figure 1 shows an example of such a structure: each vertex represents an attractor of a network, and there is an edge from vertex a to vertex b if there exists at least one node, belonging to one state of the cycle of attractor a, whose flip leads the system to attractor b. Edge labels correspond to the percentages of perturbations of the attractor that lead the system from one attractor to itself or to another one, calculated over the total number of possible perturbations of the attractor (N × attractor period). Note that the percentages associated to the links also represent the probabilities that a flip of an arbitrary state belonging to a cycle leads the system to another state cycle.

Analyzing the diagonal of the adjacency matrices, we found that the larger the network, the higher (on average) the percentage of perturbations that have no influence on the attractor: after a brief transient the dynamics of the network carries the system back to the starting cycle. In other words, the percentage of perturbations responsible for attractor transitions decreases as N increases (see Table 1).
² All the proposed analysis takes into consideration only networks with a number of attractors N_A higher than one. It is possible to include also these nets; nevertheless, the effect is only to smooth the graphs without changing their behavior (it is the effect of introducing constant values in the averages).
³ For networks with N = 100 we explore 5000 different initial conditions, whereas for networks with N = 200 and N = 1000 we explore 10000 different initial conditions.
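The adjacency matrix of attractor transitions described above can be built along the following lines (our own sketch, reusing the find_attractor function from the previous listing; identify_attractor, which maps a cycle state back to the index of the attractor it belongs to, is an assumed helper).

from collections import defaultdict

def transition_counts(attractors, inputs, tables, identify_attractor):
    # attractors: list of cycles, each given as a list of network states (boolean arrays)
    counts = defaultdict(int)                  # counts[(a, b)] = number of flips sending a to b
    for a, cycle in enumerate(attractors):
        for state in cycle:                    # every phase of the cycle...
            for node in range(len(state)):     # ...and every single-node flip
                perturbed = state.copy()
                perturbed[node] = not perturbed[node]
                reached, _ = find_attractor(perturbed, inputs, tables)
                counts[(a, identify_attractor(reached, attractors))] += 1
    return counts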
Figure 1. Graphical representation of the attractor landscape: each cycle is represented by a vertex. There is an edge from vertex a to vertex b if perturbing at least one node in one phase of attractor a leads the system to attractor b. The edge labels correspond to the percentages of perturbations that lead the system from one attractor to itself or to another one.

Table 1. Percentage of perturbations that have no influence on the state cycle. All points are averages over 65 different networks, with the exception of the nets with N = 1000, for which only 20 networks are analysed.

N    10     20     100    200    1000
%    69.6   73.9   85.6   90.1   97.3
In the absence of noise all the networks have the same number of attractors and ESs, each attractor representing a disjoint ES (in fact there are no noisy events able to make the system move). On the other hand, real networks have to deal with noise, and this fact has consequences on the attractor landscape. If all single-bit flips are permitted, all the networks we analysed present only one ES, with very few exceptions having two ESs (one for each set of networks of the same size). Nevertheless, links associated with low probabilities have little chance of being realised during the cell's life; in other words, a transition that does not have enough probability of being activated can be neglected when examining the attractor landscape. For this reason it is possible to analyze the effect on the attractor landscape of introducing a threshold σ that separates low and high probabilities, neglecting the transitions whose probabilities lie below it. Figure 2 shows the behaviour of the ratio η = ESs/N_A between the number of ESs and the total number N_A of attractors of the network, with respect to the variation of the threshold σ, for each set of networks. When η reaches the value of 1 all the attractors are ESs, whereas each net starts with a value that represents only a fraction of the maximum.
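Counting ergodic sets at a given threshold amounts to dropping the low-probability links and taking the terminal strongly connected components of the resulting attractor graph, as in the following sketch (our own illustration; the use of networkx is an assumption, not something employed by the authors).

import networkx as nx

def ergodic_sets(counts, periods, N, sigma=0.0):
    # counts[(a, b)]: number of single flips leading from attractor a to attractor b
    # periods[a]: period of attractor a;  N: number of nodes of the network
    G = nx.DiGraph()
    G.add_nodes_from(range(len(periods)))
    for (a, b), c in counts.items():
        pct = 100.0 * c / (N * periods[a])     # percentage over N * attractor period
        if a != b and pct > sigma:
            G.add_edge(a, b)
    scc = nx.condensation(G)                   # DAG of strongly connected components
    # an ergodic set is a component the system cannot leave (no outgoing edges)
    return [scc.nodes[n]["members"] for n in scc.nodes if scc.out_degree(n) == 0]

# eta = len(ergodic_sets(counts, periods, N, sigma)) / len(periods)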
Figure 2. The fraction η of total possible ESs in the network over the total number of attractors NA , with respect to the variation of the threshold σ: the higher is N , the lower are the values of σ able to separate all the attractors. All points are averages of 65 different networks, with the exception of the nets having N = 1000 for which only 20 networks are analysed.
Now we can make some interesting observations. First, the values of η at σ = 0 are in accordance with the properties of noiseless RBNs: the total number of attractors is a growing function of N, whereas the corresponding networks typically have only one ES; therefore at σ = 0, η is a decreasing function of N. Second, we do not need high σ values in order to obtain more than one ES; moreover, at a constant σ value, the larger a noisy network is, the higher the relative fraction of ESs it is able to express. Consequently, the values of σ needed to reach the maximum value of η are a decreasing function of N (see Figure 3). In other words, nets of bigger size obtain "for free" a higher control on their internal noise.
Figure 3. Minimum threshold values needed to obtain the maximum number of ergodic sets (equivalent to the number N_A of attractors). All points are averages over 65 different networks, with the exception of the nets with N = 1000, for which only 20 networks are analysed.
The effect of the threshold σ is that of separating the unique ES into several disjoint ESs. If we make the analogy ES ↔ pattern of interest, this is a very interesting effect, because it indicates the possibility of creating "specialised" patterns starting from a more "global" and noisy one (see Figure 4). Again, the higher N is, the more easily the nets can exhibit this behaviour, a behaviour that could have interesting consequences for the theories of natural and artificial evolving systems, which typically grow continuously in time.
Figure 4. The attractor landscapes of a network with 200 nodes, for different values of the threshold σ. In the first image all flips are allowed and the network has only one ES. The second image shows the attractor landscape of the same network with σ = 2, whereas the third and fourth images have σ = 3 and σ = 4, respectively.
4. Conclusions

In this work we discuss the influence of noise on the attractor landscape of critical RBNs. The presence of small noisy fluctuations in the model allows the system to jump among different attractors. In the case of a finite lifetime, the less frequent fluctuations can be neglected, allowing the formation of many independent patterns. This behaviour can be seen as the sequential specialization of a few initially undifferentiated patterns, and might lead to interesting models of cell differentiation. In addition, the growth of the net size enhances "for free" the control of this phenomenon, leading to interesting consequences for the theories of natural and artificial evolving systems.
Acknowledgements

This work has been partially supported by the Italian MIUR-FISR project nr. 2982/Ric (Mitica).
References

[1] Rumelhart, D.E., McClelland, J.L.: Parallel Distributed Processing, Vol. 1, Bradford Book, MIT Press, Cambridge Mass, London (1986)
[2] Elman, J.L., Finding structure in time, Cognitive Science 14 (1990) 179-211
[3] Hopfield, J.J., Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79 (1982) 2554-2558
[4] Serra, R., Zanarini, G., Complex Systems and Cognitive Processes, Springer-Verlag (1990)
[5] Kauffman, S.A., The Origins of Order, Oxford University Press (1993)
[6] Bastolla, U., Parisi, G., The modular structure of Kauffman networks, J. Phys. D 115 (1998) 219-233
[7] Patarnello, A., Carnevali, P., Learning networks of neurons with Boolean logic, Europhysics Letters 4(4) (1987) 503-508
[8] Serra, R., Graudenzi, A., Villani, M., Genetic regulatory networks and neural networks, in: B. Apolloni, S. Bassis and M. Marinaro (Eds.), New Directions in Neural Networks, Frontiers in Artificial Intelligence and Applications Vol. 193, IOS Press, Amsterdam, The Netherlands (2008)
[9] Ribeiro, A.S., Kauffman, S.A., Noisy attractors and ergodic sets in models of gene regulatory networks, Journal of Theoretical Biology 247 (2007) 743-755
[10] Aldana, M., Coppersmith, S., Kadanoff, L.P., Boolean dynamics with random couplings, in: E. Kaplan, J.E. Marsden, K.R. Sreenivasan (Eds.), Perspectives and Problems in Nonlinear Science, Springer Applied Mathematical Sciences Series (2003)
[11] Kauffman, S.A., Metabolic stability and epigenesis in randomly constructed genetic nets, Journal of Theoretical Biology 22 (1969) 437-467
[12] Derrida, B., Pomeau, Y., Random networks of automata: a simple annealed approximation, Europhys. Lett. 1 (1986) 45-49
[13] Aldana, M., Balleza, E., Kauffman, S.A., Resendiz, O., Robustness and evolvability in genetic regulatory networks, Journal of Theoretical Biology 245 (2007) 433-448
[14] Villani, M., Serra, R., Graudenzi, A., Kauffman, S.A., Why a simple model of genetic regulatory networks describes the distribution of avalanches in gene expression data, J. Theor. Biol. 249 (2007) 449-460
[15] Ramo, P., Kesseli, J., Yli-Harja, O., Perturbation avalanches and criticality in gene regulatory networks, Journal of Theoretical Biology 242 (2006) 164-170
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-19
Toward a space-time mobility model for social communities a
Bruno APOLLONI a and Simone BASSIS a,1 , and Lorenzo VALERIO a University of Milan, Computer Science Dept., Via Comelico 39/41, 20135 Milan, Italy {apolloni,bassis,valerio}@dsi.unimi.it Abstract. We present a sensitivity study of a wait and chase scheme introduced in a previous work to model the contact times between people belonging to a social community. The membership presupposes that, besides purely occasional encounters, people are motivated to meet other members of the community, while the social character of the latter makes each person met an equivalent target. This calls for a mobility in the family of Lévy jumps alternating a wandering period within a limited environment – waiting phase – with jumping to a new site constituting the target of a chase phase. In this paper we aim to connect specific features of single individual dynamics to the overall evolution of the social community in the true thread of the Palm calculus philosophy. We base this study on a large mobility track dataset expressly collected with this objective. Keywords. Social Communities, Mobility Model, Processes with Memory, Paretolike Distribution Law, Algorithmic Inference
Introduction Social community is a crasis between social networks and virtual communities used to mean a set of persons that are grouped together in a community because they share a common interest – the social core of the community – but need a communication network to cultivate this interest – the structural counterpart. While the traditional vision of the network is in terms of an external communication structure, such as the WEB, a more recent perspective assumes the network to be embedded in the community, i.e. raised up exactly by people who are moving along crossing trajectories. In this way, the physical meeting of people becomes a relevant part of community life, so that its overall behavior is not far from the herd dynamics of many living species, ranging from bird [7] to primates [14]. The main feature of these dynamics is its divergence from pure Brownian motion [8], as a consequence of a symmetry loss due to the intentionality of the individual motion. In this respect we have two research veins: the first focused mainly on spatial aspects, aimed at characterizing the trajectory coordinates [9]; the second recently promoted in the dual fields of immunology and opportunistic networks in order to identify the contact times between community members [17]. 1 Corresponding Author: Simone Bassis, University of Milan, Computer Science Dept., Via Comelico 39/41, 20135 Milan, Italy; e-mail:
[email protected]
Figure 1. Joint traces of two cars (plain and dashed curves respectively) when: (a) both move according to a Brownian motion behavior; (b) the former moves only in one quadrant (absolute value of the Brownian motion components) from a trigger time on; and (c) an oracle rotates this trajectory toward the other car with some approximation (quantified by the radius of a proximity circle).
In view of fusing the two veins, so as to have time and location of encounters as a function of the people dynamics, in a previous paper we introduced the dodgem car model [2]. Assume you are playing with dodgem cars at an amusement park. You drive around until, from time to time, you decide to bang into a given car which is unaware of your intent. For the sake of simplicity, we may assume the trajectory of each car to be a plane Brownian motion before the chase triggering (waiting phase). Thus, with the reference frame in Figure 1(a), indexing with i = 1, 2 the cars whose histories we are following in terms of their position (x_i, y_i), we have²

X_i(t) \sim N_{0,\sqrt{t}}, \qquad Y_i(t) \sim N_{0,\sqrt{t}}    (1)
where N0,√t is a Gaussian variable centered around 0 and with a standard deviation growing with the square root of the elapsed time t, which denotes a car mean speed. Then you, sitting in the first car, decide at time τ to reach and crash into the second car. The questioned variable records the instant T > τ you succeed. The chase effectiveness depends on the capability of orienting your motion in the direction of the target, which corresponds to converting a part of the motion along the cars’ connecting line from symmetric to oriented moves. Mathematically, orientation corresponds to aligning the absolute value of the elementary steps in that direction, so as to work with Chi distributed addends in place of Gaussian ones (see Figure 1(b) and (c)). As a consequence, we obtain the ratio V between chase and waiting time as an F variable with parameters (2, 2) [15] so that its cumulative distribution function (c.d.f.) FV reads FV (v) = 1 −
1 I[0,∞) (v) 1+v
(2)
Generalizing the dependence of the Brownian motion's standard deviation in terms of a generic power α of time t leads to
2 By default, capital letters (such as U, X) will denote random variables and small letters (u, x) their corresponding realizations; the sets to which the realizations belong will be denoted by capital gothic letters (U, X).
Figure 2. Computing the hitting time e.c.c.d.f. from: (a) our mobility model, and (b) a sample of numerically integrated trajectories.
F_V(v) = \left[ 1 - \frac{1}{1+v^{2\alpha}} \right] I_{[0,\infty)}(v)   (3)

where we obtain the expression in (2) by choosing α = 1/2. Now, to obtain the car crash time T we need to arrange a convolution of the above distribution with the trigger time one. We compute this convolution approximately through (see Figure 2(a))

F_T(t) = \left[ 1 - \frac{b+1}{b + \left(1 + \frac{t}{c}\right)^{a}} \right] I_{[0,+\infty)}(t)   (4)
where the parameter b^{1/a} is the approximate abscissa of the plateau end, denoting the divide between the non-intentional and the intentional behavior of the agent, a is proportional to the rate of contacts when agents move according to some purpose, and c is a fine-tuning factor. We will refer to a, b, c as statistical parameters, for short s-parameters. This expression represents a slight variant of the extended family of Pareto distribution laws introduced by Pareto in 1897 [13]. While the original family was tailored to study economical phenomena and extremal events, our form proves more suitable from a statistical point of view in order to fit curves like the one in Figure 2(b), related to intercontact times between agents in social scenarios like the current toy example. The goal of this paper is twofold: 1) to test the ability of (4) to describe the dodgem car process, and 2) to check the fitting of this model with real dynamics within a social scenario such as the one spanned by an opportunistic network. To this end we set up a numerical experiment where a set of dodgem cars, call them agents, are put in motion within a square field, each according to variously tuned wait-and-chase rules, as described in Section 1. Then a rudimentary fitting procedure is run in Section 2 so as to connect the s-parameters in (4) with the dynamics tuning parameters. In Section 3 we use these relations to interpret the mobility tracks collected on our university campus, as an extension of the experiment described in a previous paper [1]. Finally, we draw some conclusions in Section 4.
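For concreteness, the following is a minimal Python sketch of the law (4) and of its inverse (used later as the sampling mechanism); it assumes the form of (4) as reconstructed above, and the parameter values are purely illustrative, not taken from the paper.

```python
import numpy as np

def pareto_like_ccdf(t, a, b, c):
    """Complementary c.d.f. 1 - F_T(t) of the Pareto-like law (4),
    in the form reconstructed above: (b+1)/(b + (1 + t/c)**a) for t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, (b + 1.0) / (b + (1.0 + t / c) ** a), 1.0)

def sample_intercontact(a, b, c, size, seed=None):
    """Draw intercontact times by inverting F_T (the mechanism used later in Section 2)."""
    u = np.random.default_rng(seed).random(size)
    return c * (((b * u + 1.0) / (1.0 - u)) ** (1.0 / a) - 1.0)

# illustrative parameter values
a, b, c = 2.0, 40.0, 500.0
# for this parameterization the plateau of the c.c.d.f. ends around t ~ c*(b**(1/a) - 1)
print("approximate plateau end:", c * (b ** (1.0 / a) - 1.0))
print("c.c.d.f. at t = 10, 100, 1000:", pareto_like_ccdf([10, 100, 1000], a, b, c))
print("five sampled times:", sample_intercontact(a, b, c, 5, seed=0))
```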
1. The artificial tracks

By exploiting the asynchronicity efficiently ruled by the π-calculus paradigm [12], we have numerically implemented our model on a simulated field. Namely, having in mind a
Figure 3. Implementation of the proposed mobility model. (a) Tracks of 10 out of 39 agents followed for 1000 seconds, and (b) intercontact e.c.c.d.f.s for 1.5 × 10^6 step long tracks for all 39 agents. Thick plain curve: median e.c.c.d.f.; thick dashed curve: Pareto-like fitting.
telecommunication experiment we will discuss later on, we consider a field of 200 × 200 square meters with 39 individuals uniformly located inside. Each agent has two mobility modes: a special random waypoint [10] up to the trigger time τ and the mentioned pursuit strategy after it. In the first mode, an agent randomly selects a direction that it follows for a time length θ uniformly drawn in [0, 2000] seconds, with a mean velocity of v = 1.47 meters per second (mean pedestrian velocity). This is simulated by tossing a positive random number less than or equal to 2000 for θ, a uniform value between 0 and 2π for the direction, and a Chi distribution with 2 degrees of freedom scaled by 1.17t, to maintain the mentioned mean velocity, to sample the distance D(t) covered by the agent at time t. At the completion of time θ the agent selects a new random direction, and so on. When the trigger time τ expires it shifts to the second mode: the above Chi step is now coupled with a suitable angle rotation directing the agent toward the chosen target. A match occurs when, for any reason, an agent gets closer than 10 meters to another agent. We remark that we do not need any constraint on the agent location, since the chase features automatically maintain an overall attractor within the original 200 × 200 square. Figure 3 reproduces a run of this model when the trigger time is drawn from a Pareto distribution as well. Free parameters are the α exponent in (3), modulating the agent mean speed versus time in this phase, and the λ parameter of the trigger time distribution, which in the above settings reads

F_T(\tau) = 1 - \left( \frac{k}{\tau} \right)^{\lambda}   (5)
Ancillary parameters, such as k or the chase target distribution, are properly set to match the framework of the experimental campaign which will be described in Section 3. For α = 0.9 and λ = 1.5, part (a) of Figure 3 reports the first 1000 seconds of the 1.5 × 10^6 step long tracks of 10 agents, for the sake of visualization, whereas part (b) synthesizes, for each one of the 39 agents, the intercontact times with the usual empirical complementary cumulative distribution function (e.c.c.d.f.). The times are collected along the entire experiment log, corresponding to a bit less than 18 days of simulated time and roughly 150 seconds of actual simulation. These times reckon the differences between one clash and the subsequent one for an agent versus all others. In the picture we may see two thick curves showing the median of the above e.c.c.d.f.s and its fitting through (4). In fact, thanks to a certain reproducibility of the contact times, also the intercontact times may
be interpolated with our Pareto-like distribution for a suitable choice of the parameters, confirmed by the good similarity between the two curves. The small hump over the sloping trait is an effect of the encounters within the waiting phase, whose intercontact time follows an exponential distribution law not taken into account in the previous algebra. We also mention that a similar behavior has been obtained with different trigger distributions, ranging from uniform to exponential ones.
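As a concrete reference, the following is a minimal Python sketch of the wait-and-chase dynamics just described (it is not the authors' π-calculus implementation): 39 agents in a 200 × 200 m field, a random-waypoint waiting phase with legs of duration θ uniform in [0, 2000] s and mean speed 1.47 m/s, Pareto-distributed trigger times, and a chase phase steering each agent toward a randomly chosen target. The one-second time step, the Pareto scale k = 1, the exact step scaling and the target choice are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, SIDE, V, CONTACT_R, DT = 39, 200.0, 1.47, 10.0, 1.0     # field, speed, contact radius, 1 s step (assumed)

pos = rng.uniform(0, SIDE, size=(N, 2))
heading = rng.uniform(0, 2 * np.pi, size=N)
leg_left = rng.uniform(0, 2000, size=N)                     # residual duration theta of the current leg
trigger = 1.0 * (1.0 - rng.random(N)) ** (-1.0 / 1.5)       # Pareto trigger times (5), k = 1 (assumed), lambda = 1.5
target = (np.arange(N) + rng.integers(1, N, size=N)) % N    # chase target, never the agent itself (assumption)

contact_events = 0
for t in np.arange(0.0, 5000.0, DT):
    # chi(2)-distributed step length (a Rayleigh variable), rescaled so that the mean step is V*DT
    step = rng.rayleigh(scale=V * DT / np.sqrt(np.pi / 2), size=N)
    chasing = t >= trigger
    # waiting phase: renew the random direction when the current leg expires
    leg_left -= DT
    renew = (~chasing) & (leg_left <= 0)
    heading[renew] = rng.uniform(0, 2 * np.pi, size=renew.sum())
    leg_left[renew] = rng.uniform(0, 2000, size=renew.sum())
    # chase phase: orient the step toward the chosen target
    d = pos[target] - pos
    heading[chasing] = np.arctan2(d[chasing, 1], d[chasing, 0])
    pos += step[:, None] * np.c_[np.cos(heading), np.sin(heading)]
    # a match occurs whenever two agents get closer than 10 m
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    contact_events += int(np.triu(dist < CONTACT_R, 1).sum())

print("contact events in the first 5000 s:", contact_events)
```

From a run of this kind one collects, for each agent, the times between successive contacts and builds e.c.c.d.f.s like those of Figure 3(b).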
2. The fitting procedure

We solve two orders of inference problems: i) fitting of the intercontact times t_i through (4), by identifying the s-parameters a, b, c for each agent; ii) regression of the fitting parameters versus the mobility parameters α and λ (for short, m-parameters). We accomplish the first task with a slightly improved variant of the Algorithmic Inference procedure described in a previous paper [2]. Namely, denoting by t_{(i)} the i-th element of the sorted intercontact times and by \tilde{m} the quantity (m+1)/2, we use the well-behaving statistics [4]

s_1 = t_{(\tilde{m})}   (6)

s_2 = \frac{1}{m} \sum_{i=1}^{m} \left( t_i - s_1 \right)   (7)

s_3 = \sum_{i=\tilde{m}}^{m} \log t_{(i)}   (8)

Thanks to the sampling mechanism

t = F_T^{-1}(u) = g_{a,b,c}(u) = c \left[ \left( \frac{b u + 1}{1 - u} \right)^{1/a} - 1 \right]   (9)

relating a realization of the uniform random variable U to one of T, we obtain the master equations [5]

s_1 = g_{a,b,c}(u_{(\tilde{m})})   (10)

s_2 = \frac{1}{m} \sum_{i=1}^{m} \left( g_{a,b,c}(u_i) - g_{a,b,c}(u_{(\tilde{m})}) \right)   (11)

s_3 = \frac{\xi m}{2} \log c + \frac{1}{a} \sum_{i=\tilde{m}}^{m} \log \frac{b u_{(i)} + 1}{1 - u_{(i)}}   (12)
As usual, we solve them in the s-parameters in correspondence with a large set of randomly drawn seeds {u_1, ..., u_m}. In this way we obtain a sample of fitting curves, as in Figure 4, that we statistically interpret as being compatible with the observed data. The free parameter ξ is set to a value slightly greater than 1 in order to compensate for the bias coming both from computing the last statistic only on a part of the observed sample and from the truncation at the last intercontact, as a direct consequence of the finiteness of the simulation time.
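The procedure lends itself to a short numerical sketch. The following Python snippet solves the master equations (10)-(12), as reconstructed above, with a generic least-squares routine for one seed vector at a time and collects the replicas; the choice of solver, the starting point, and the value ξ = 1.05 are assumptions of this sketch, not the authors' settings.

```python
import numpy as np
from scipy.optimize import least_squares

def g(u, a, b, c):
    """Sampling mechanism (9): t = g_{a,b,c}(u)."""
    return c * (((b * u + 1.0) / (1.0 - u)) ** (1.0 / a) - 1.0)

def statistics(t, xi=1.05):
    """Statistics (6)-(8) on the observed intercontact times (xi slightly above 1)."""
    t = np.sort(np.asarray(t, dtype=float))
    m = len(t)
    mt = (m + 1) // 2                          # index of the median element
    return (t[mt - 1], np.mean(t - t[mt - 1]), np.sum(np.log(t[mt - 1:]))), m, mt, xi

def one_replica(stats, m, mt, xi, u, x0=(2.0, 40.0, 500.0)):
    """Solve (10)-(12) for (a, b, c) given one random seed vector u."""
    s1, s2, s3 = stats
    us = np.sort(u)
    def residuals(p):
        a, b, c = p
        r1 = g(us[mt - 1], a, b, c) - s1
        r2 = np.mean(g(u, a, b, c) - g(us[mt - 1], a, b, c)) - s2
        r3 = xi * m / 2 * np.log(c) + np.sum(np.log((b * us[mt - 1:] + 1) / (1 - us[mt - 1:]))) / a - s3
        return [r1, r2, r3]
    return least_squares(residuals, x0, bounds=(1e-6, np.inf)).x

# population of compatible curves: one solution per randomly drawn seed vector
rng = np.random.default_rng(1)
t_obs = g(rng.random(500), 2.0, 40.0, 500.0)     # synthetic "observed" sample
stats, m, mt, xi = statistics(t_obs)
replicas = np.array([one_replica(stats, m, mt, xi, rng.random(m)) for _ in range(50)])
print("median of the replicated (a, b, c):", np.median(replicas, axis=0))
```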
Figure 4. Curves fitting the sample (dark gray curves), obtained by solving the master equations (10)-(12) starting from 100 different random seed vectors, and their 90% confidence region (light gray region). Thick plain curve: median of the confidence region; thick dashed curve: sample e.c.c.d.f.
In the figure we also report the 90% confidence region for these curves, representing a region where we expect a compatible curve to entirely lie with a probability equal to the confidence level of the region. We obtain this region through a standard peeling method [11,3]. Namely, we circularly visit the upper and lower borders of the region made up of the envelope of the curves and erase the extremal ones, i.e. those that at least partially trespass the envelope of the remaining curves. We iterate the procedure until γ% of the original curves survive, where γ is the confidence level. The thick line represents the median curve, obtained with the same procedure by leaving just one single curve to survive. In order to relate s-parameters to m-parameters, first we identify the median as a template of the above curves, then we regress its parameters. Namely, we fit the s-parameters a, b, c of the median curves through a polynomial in the m-parameters. In Figure 5 we see the best fitting we obtain separately on a, b^{1/a} and c. The first graph denotes a complex trend of a with α that we interpret as follows. On the one hand, the comparative inspection of curves like those in Figure 6 shows that an increase of a (a↑) in (4) has the effect of moving the elbow between the non-Pareto and Pareto traits back, as for the turning time, and down, as for the corresponding probability, with the twofold effect of reducing both the distribution time scale (t↓) and the rate of contact times (r↓) falling in the second trait (call them the intentional times). On the other hand, we see that a has a parabolic trend with α, having its top in the surroundings of α ≈ 0.5, a value that calls for the Brownian motion as the basic component of the model.
Figure 5. Relation between s-parameters and their mobility companions α and λ obtained from different simulation experiments where the latter vary on a suitable grid. Surfaces: best fitting curves obtained as a quadratic form in the above variables; points: s- and m-parameters obtained through simulation.
Figure 6. C.c.d.f. plot in Log scale of Pareto-like distribution (4) with a ranging from 2 (black curve) to 3.6 (light gray curve).
Moving far from this value we see a decrease of a that we alternatively relate to the two effects t↓ and r↓. Namely, since α is a mobility speed-up factor, on the left-hand side of the trend we relate the increase of a with α to a decrease of the time scale (t↓). This effect is contrasted by the rarefaction of random encounters when α becomes still higher, since the probability of crossing the same 10-meter transmission range diminishes with velocity, due to the low agent density. In these conditions the intentional contacts (belonging to the Pareto trait) become overwhelming (r↑). We may explain the second graph in a similar way, where we broadly relate the b^{1/a} parameter to the scale of the non-intentional encounter times. In principle, this scale decreases with λ, since the average of the related Pareto does so, and increases with α, because of the mentioned spreading effects of this parameter. However, in this case too we have a saturation effect, so that very small λs equalize the trigger times. As a consequence, the number of (now almost purely intentional) contacts follows a Poisson distribution analogous to the one of the almost purely non-intentional encounters reckoned in the opposite corner. Similarly, we have seen that small values of a in correspondence with small αs may reduce the number of non-intentional encounters (since r↑) contributing to define the scale of the non-Pareto trait. The third parameter, c, looks like a fine-tuning factor indirectly affected by the m-parameters (see picture (c)).
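The peeling construction of the confidence region admits a compact numerical sketch. The following Python function implements the idea described above under the assumption that all curves are evaluated on a common time grid; the discarding rule (drop every curve touching the current envelope at each round) is a plain reading of the text, not the authors' exact routine.

```python
import numpy as np

def peel_confidence_region(curves, gamma=0.90):
    """Peel a bundle of c.c.d.f. curves (rows of `curves`, on a common grid)
    until roughly a fraction `gamma` survives; return the surviving envelope."""
    keep = np.ones(len(curves), dtype=bool)
    target = int(np.ceil(gamma * len(curves)))
    while keep.sum() > target:
        sub = curves[keep]
        upper, lower = sub.max(axis=0), sub.min(axis=0)
        # curves touching the current envelope somewhere are the extremal ones
        extremal = ((sub == upper) | (sub == lower)).any(axis=1)
        if extremal.all():          # cannot peel further without emptying the set
            break
        idx = np.flatnonzero(keep)
        keep[idx[extremal]] = False # may overshoot the target slightly; acceptable for a sketch
    surviving = curves[keep]
    return surviving.min(axis=0), surviving.max(axis=0), keep

# usage sketch: 100 bootstrap-like c.c.d.f. curves on a log-spaced grid
rng = np.random.default_rng(0)
grid = np.logspace(0, 4, 50)
curves = np.exp(-np.outer(rng.uniform(0.8, 1.2, 100), grid / 500.0))
low, high, kept = peel_confidence_region(curves, 0.90)
print(kept.sum(), "curves survive")
```

Pushing the same loop until a single curve survives yields the median curve used as template in the regression step.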
3. The ground truth

Following the first experimental campaign described in [1], we performed a second data collection with the same devices but in greater number. In synthesis, the data are collected through a set of 39 Pocket Trace Recorders (PTRs). Each device is able to send and collect signals for 18 days with a single battery recharge. The exchanged signals are small packets (called beacons) that advertise, every 1.5 seconds, the presence of the devices in a surrounding represented by the PTR antenna coverage, expressly limited to a 10-meter range. After sending its beacon, a PTR enters a sleep mode and wakes up whenever it receives a beacon from the neighborhood witnessing that an encounter did occur between two devices (rather, that one device entered the transmission range of the other). At this instant, the receiver creates a new entry in the local contact-log containing the following items: i) the local ID and the ID of the encountered PTR; ii) the timestamp of the first contact; and iii) the timestamp of the contact closing event. As for the latter, an
Figure 7. (a) Intercontact e.c.c.d.f.s computed on the experimental dataset; (b) same as (a) with the superposition of the 90% confidence region of the median of these curves. Same notation as in Figure 3.
entry in the contact-log is closed when the beaconing from the encountered device has been missing for more than t seconds, with t = 60 seconds in our experiments. The PTRs were distributed to students and administrative/teaching staff within the Computer Science department of the University of Milano. After the 18-day campaign their logs, collected into a single server, were remodulated in order to remove artifacts, by eliminating the idle periods represented by the time intervals where people were expected to be away from the campus. Namely, we contracted to 0 the time intervals between 7 p.m. and 8 a.m. of workdays and the whole weekends. We also clamped to 0 the last 60 seconds of contacts, which are artificially generated by the above beaconing control rule. After this preprocessing, we computed for each PTR a log of both its contacts and its intra- and intercontact times with any other of the PTRs enrolled in the experiment. In this paper we focus exclusively on the latter. The basic inspection tool is the e.c.c.d.f., which we jointly visualize for all devices in LogLog scale in Figure 7(a), and contrast with the confidence region of the median of these curves in picture (b). At first glance this picture indicates a good match between the proposed theoretical model and the on-field behavior. In fact, the two sheaves group different kinds of compatible curves: the former (drawn in part (a) as well) represents samples compatible with the experimental setup, where variations are due to the behavior of the agents – i.e. the people equipped with a PTR; the latter collects distributions that are compatible with the median sample generated by the setup, with variations induced by the randomness of the seeds u_i. The divergence on the right-hand side of the two sheaves is due to their different nature and to the campaign truncation (after 18 days) as well. The role of the confidence region as a shallow container of the related e.c.c.d.f. is shown better in Figure 8, where the right part denotes some drift from the model that we may expect in some of the variety of people involved in the experiment. We will deepen this variety in the following. The above match is made up of three ingredients: 1) the ability of the dodgem car model to capture the main features of the agent mobility; 2) the robustness of expression (4) in synthesizing these dynamics even in the presence of many additional effects drifting the dynamics away from the model; and 3) the adequacy of the Algorithmic Inference statistical tools to draw probabilistic scenarios compatible with the observed data, even in the presence of variations of the main free mobility parameters, which we identify with the λ and α exponents. We depict all of them in Figure 9. The three graphs contrast the regression curves obtained in the previous section (see Figure 5) with the pairs of s- and m-parameters emerging from the inversion of the regression curves as functions of the s-parameters a, b, c of (4).
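The preprocessing pipeline just described reduces, for each device, to turning a table of (start, end) contact records into a sequence of intercontact gaps and an e.c.c.d.f. The following Python sketch illustrates this step; the log format, the column names and the end-to-start definition of the gap are assumptions of the sketch, not the actual PTR data layout.

```python
import numpy as np
import pandas as pd

def intercontact_times(log, device):
    """Intercontact times for one PTR: gaps between the end of one contact and the
    start of the next, pooled over all peers (the device-versus-all-others view
    used in the text).  `log` is assumed to be a DataFrame with columns
    ['local_id', 'peer_id', 't_start', 't_end'] in seconds (hypothetical format)."""
    rows = log[log.local_id == device].sort_values("t_start")
    gaps = rows.t_start.values[1:] - rows.t_end.values[:-1]
    return gaps[gaps > 0]

def eccdf(samples):
    """Empirical complementary c.d.f. evaluated at the sorted sample points."""
    x = np.sort(np.asarray(samples, dtype=float))
    return x, 1.0 - np.arange(1, len(x) + 1) / len(x)

# usage sketch on a tiny synthetic log
log = pd.DataFrame({
    "local_id": [1, 1, 1, 2],
    "peer_id":  [2, 3, 2, 1],
    "t_start":  [0.0, 300.0, 2000.0, 0.0],
    "t_end":    [60.0, 400.0, 2100.0, 60.0],
})
t, surv = eccdf(intercontact_times(log, device=1))
print(list(zip(t, surv)))
```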
Figure 8. Same curves as in Figure 4 computed on the intercontact times collected on two out of 39 agents participating in the experimental campaign.
Namely, having computed s-parameter replicas compatible with the collected experimental datasets through the master equations (10)-(12), as in Figure 8, we look for the corresponding m-parameters α and λ that minimize the relative error between the computed a, b, c and the triple obtained through the regression curves. Omitting for a moment the third graph, we see that the former two denote a notable generalization of the regression curves on the new points, in spite of their location on the border of the region spanned by the training set. The clouds of points refer to the union of sets of 1950 curves (hence the triplets of parameters specifying them) that are compatible with one of the 39 e.c.c.d.f.s. In particular, we marked the points corresponding to the medians of these sets, assumed to be their representatives. In other words, if we summarize the e.c.c.d.f. with this median curve and go back to the m-parameters of the underlying model by back-regressing from the median s-parameters to the mobility ones, we obtain, on the one hand, values in line with the overall trend of both a and b with α and λ, as modeled in the previous sections; on the other hand, these values are compatible with the physics of the people's dynamics and reflect the two dynamics' polarizations (before and after α = 0.5) discussed in the previous section. With c things go worse, but this is to be expected given both the tuning role of this parameter and the indirect influence of the m-parameters on it. We note however that taking its value into account when back-regressing α and λ (through a minimizing procedure) diminishes the spread of these parameters.
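The back-regression step can be sketched as a small optimization problem: given fitted s-parameters and the quadratic regression surfaces of Figure 5, search the (α, λ) pair minimizing the relative error. The surface coefficients, the bounds and the starting point below are placeholders for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

def quadratic_surface(alpha, lam, coeff):
    """Quadratic form in (alpha, lambda) used as regression surface for one s-parameter.
    `coeff` = (c0, c1, c2, c3, c4, c5) are placeholder coefficients."""
    c0, c1, c2, c3, c4, c5 = coeff
    return c0 + c1 * alpha + c2 * lam + c3 * alpha**2 + c4 * lam**2 + c5 * alpha * lam

def back_regress(s_params, surfaces, x0=(0.5, 10.0)):
    """Find (alpha, lambda) minimizing the relative error between the fitted
    s-parameters and the values predicted by the regression surfaces."""
    s_params = np.asarray(s_params, dtype=float)
    def rel_err(x):
        alpha, lam = x
        pred = np.array([quadratic_surface(alpha, lam, c) for c in surfaces])
        return np.sum(((pred - s_params) / s_params) ** 2)
    return minimize(rel_err, x0, bounds=[(0.05, 1.0), (0.1, 40.0)]).x

# usage with made-up surfaces and a made-up fitted triple (a, b**(1/a), c)
surfaces = [(1.0, 2.0, 0.01, -2.0, 0.0, 0.0),
            (5.0, 20.0, -0.3, -15.0, 0.0, 0.0),
            (1.0, 0.5, 0.01, 0.0, 0.0, 0.0)]
print(back_regress([2.2, 12.0, 1.3], surfaces))
```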
Figure 9. S- and m-parameters emerging from the experimental campaign contrasted with regression curves as in Figure 5. Gray points: simulated parameters; white points: parameters inferred as point estimate of experimental campaign e.c.c.d.f.s; and black points: their bootstrapped replicas.
4. Conclusion

Social community mobility models represent a challenge both for their exploitation in many social contexts and for the sophistication of the theoretical tools required by their analysis and identification. In this paper we essentially follow a numerical approach pivoted on a mainly empirical model. However, we had in mind, and tried to relate methods and results to, two very sophisticated theoretical paradigms represented by Lévy flights [16] and the Palm calculus [6]. Actually, the sensitivity study of the mobility parameters α and λ represents a practitioner counterpart, in the time domain, of the analysis done by Lévy in the Fourier domain of the probability density p_N(k) = e^{-N|k|^{\beta}}, where N is the index of the encounters and β plays a role analogous to α in our model. Moreover, the sequencing of our jumps (wait + chase episodes) represents a true numerical implementation of Palm calculus, made up of rules for single jumps plus an overall stationary mechanism activating them. Thus we deal with an ergodic process that we follow along the single trajectories drawn both by the dodgem cars in the case study and by the PTRs in the real case. We numerically show that the Pareto-like distribution is suitable for describing this process. We consider this paper as just the starting point of a research vein aimed at jointly synthesizing both the spatial and the time aspects of these trajectories.

References

[1] B. Apolloni, G. Apolloni, S. Bassis, G. L. Galliani, and G. P. Rossi. Collaboration at the basis of sharing focused information: the opportunistic networks. Collaborative Computational Intel., 2009. In press.
[2] B. Apolloni, S. Bassis, and S. Gaito. Fitting opportunistic networks data with a Pareto distribution. In B. Apolloni, R. J. Howlett, and L. C. Jain, editors, Joint Conference KES 2007 and WIRN 2007, volume 4694 of LNAI, pages 812–820, Vietri sul Mare, Salerno, 2007. Springer-Verlag.
[3] B. Apolloni, S. Bassis, S. Gaito, and D. Malchiodi. Appreciation of medical treatments by learning underlying functions with good confidence. Current Pharmaceutical Design, 13(15):1545–1570, 2007.
[4] B. Apolloni, S. Bassis, D. Malchiodi, and P. Witold. The Puzzle of Granular Computing, volume 138 of Studies in Computational Intelligence. Springer Verlag, 2008.
[5] B. Apolloni, D. Malchiodi, and S. Gaito. Algorithmic Inference in Machine Learning. Advanced Knowledge International, Magill, Adelaide, 2nd edition, 2006.
[6] F. Baccelli and P. Bremaud. Elements of queueing theory. Springer, 2003.
[7] A. M. Edwards et al. Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer. Nature, 449:1044–1049, 2007.
[8] A. Einstein. Investigations on the theory of the Brownian Movement. Dover Publication Ltd, 1956.
[9] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453:779–782, 2008.
[10] D. B. Johnson and D. A. Maltz. Dynamic source routing in ad hoc wireless networks. In T. Imielinski and H. Korth, editors, Mobile Computing, volume 353, pages 153–181. Kluwer Academic Pub., 1996.
[11] R. Y. Liu, J. M. Parelius, and K. Singh. Multivariate analysis by data depth: Descriptive statistics, graphics and inference. The Annals of Statistics, 27:783–858, 1999.
[12] R. Milner. Communicating and Mobile Systems: the π-Calculus. Cambridge University Press, 1999.
[13] V. Pareto. Cours d'Economie Politique. Rouge and Cie, Lausanne and Paris, 1897.
[14] G. Ramos-Fernandez et al.
Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi). Behav. Ecol. Sociobiol., 273:1743–1750, 2004.
[15] V. K. Rohatgi. An Introduction to Probability Theory and Mathematical Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York, 1976.
[16] K. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999.
[17] T. Spyropoulos, A. Jindal, and K. Psounis. An analytical study of fundamental mobility properties for encounter-based protocols. International Journal of Autonomous and Adaptive Communications Systems, 1(1):4–40, 2008.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-29
Notes on Cutset Conditioning on Factor Graphs with Cycles
Francesco PALMIERI a,1
a Dipartimento di Ingegneria dell'Informazione, Seconda Università di Napoli, Italy
Abstract. Inference on factor graphs with loops with the standard forward-backward algorithm can give unpredictable results, as messages can travel indefinitely in the system with no guarantee of convergence. We apply the exact method of cutset conditioning to Factor Graphs with loops, starting from a fully developed three-variable example and providing comments and suggestions for distributed implementations.
Keywords. Bayesian Networks, Artificial Intelligence, Propagation of Belief, MATLAB, Data Fusion
Introduction

Solving probabilistic problems on graphs has recently become a very popular and promising paradigm in many areas, such as communications, signal processing, statistics and artificial intelligence. When we build a model in the form of a graph, the system variables become part of a network with nodes and branches that exchange messages: our objective is to control the system evolution towards equilibrium points that become the solution to our problem. There is a long history related to this kind of approach, also because of the excitement sparked by the similarities with the working of the nervous system (Bayesian reasoning). Furthermore, from an engineering point of view, this distributed approach to computation holds the promise of being able to provide solutions to very complicated inference and learning problems with parallel structures that can be readily implemented in available hardware (e.g. FPGA). Factor Graphs (FG) [2][3][4], when applied to the probabilistic framework, are a different graphical form for describing Bayesian networks [1]. The network of random variables in a factor graph is modeled as the product of functions: the branches represent the variables and the nodes the functions. The FG approach has shown greater engineering appeal compared to the Bayesian network counterpart, because it maps the system directly to a more familiar block diagram. An inference problem in a polytree FG can be easily solved by propagation of messages. The process is guaranteed to converge to the optimal solution in a number of steps that is upper bounded by the graph diameter. The literature on this topic is now quite vast and various authors have tried to provide graphical forms and transformation
1 Corresponding Author: Francesco Palmieri, Dipartimento di Ingegneria dell'Informazione, Seconda Università di Napoli, via Roma 29, 81031 Aversa (CE), Italy; E-mail:
[email protected].
methods to allow agile design of algorithms for inference and learning. See [4] and [5] for lists of references. We have also provided some suggestions for programming the sum-product algorithm in MATLAB using compact matrix algebra [7]. Even though the graphical approach to inference has found many successful applications, one of the problems still facing the research community is how to perform reliable inference in graphs that have loops (loopy graphs) with limited computational complexity. Even when the graph is not a polytree, the Bayesian framework has been extensively used, often simply by running the inference as if there were no loops, using standard forward-backward messages. In such cases it cannot be excluded that the messages may travel in the system indefinitely. However, if the algorithm converges, the resulting distributions can sometimes provide approximate solutions to our quest for inference [12] [13]. Various problems formulated on graphs with cycles, such as turbo codes and LDPC codes, have been successfully approached with the standard forward-backward algorithm. There is no guarantee of convergence, but, if the system settles down, the obtained distributions seem to be correct in the mean and provide reasonable approximations to the true solution [14] [15] [16] [17] [18]. Understanding of the working of the standard algorithm on graphs with cycles is, to our knowledge, still scarce. Some contributions have recently appeared in the literature proposing corrections to the solutions obtained with the standard algorithm. An interesting approach is based on the so-called loop series [9] [10] [11]. Even though the computational complexity may become an issue, alternative techniques to the blind use of the forward-backward approach in graphs with cycles do exist [8]. For example, Monte Carlo methods (such as Gibbs sampling) are classical in the literature and, if designed accurately, they can provide good solutions, even if the presence of cycles makes the reliability of the results often difficult to evaluate. Alternatively, exact methods such as merging variables into cliques, as in the junction-tree algorithm, can be used [6]. Clearly, merging variables into larger sets increases the computational complexity, because joint densities have to be stored in the network nodes. In this work we focus on an exact method, called cutset conditioning, that we believe holds the promise of providing one of the best solutions for graphs with cycles. Cutset conditioning goes back to Pearl's book [1] and consists in removing the loops by "cutting" variables that become the conditioning variables (loop cutset). The solution is obtained by averaging over the various realizations, where each one, being cycle-free, is solved as a polytree. The complexity is, in theory, exponential in the cutset order, but the advantage is that distributed versions of the forward-backward algorithm can be easily implemented and random sampling from the cutset [19] can be used. The approach is also related to particle filtering, a technique that has recently become popular in signal processing [20] [21]. In these notes we first review our compact matrix formulation for Factor Graphs, and then apply cutset conditioning to loopy graphs, working out the details of message propagation. We show how sample messages can be "launched in the system" from the cuts and how they can be handled to compute all the marginals, including those of the conditioning variables.
Figure 1. A factor with N incoming branches
1. Factor graph building blocks

In a factor graph, the various blocks in the system (factors) describe our knowledge of the relations among the system variables via the conditional probabilities of their outputs given their inputs. All blocks can have more than one input, but only one output. Multiple outputs must be mapped to higher dimensional variables and to branch blocks. Therefore each branch is connected only to two factors. Figures 1, 2 and 3 show the complete block set, including terminating branches and sources. When a variable X is constrained (clamped) to a given value X = x_j (Figure 3(a)), the branch is cut and two new variables X_L (left) and X_R (right) are defined, with backward and forward messages set to P_b(X_L) = δ(X_L − x_j) and P_f(X_R) = δ(X_R − x_j). We assume that messages are distributions. In our notation, if variable X takes values in the finite alphabet X = {x_1, x_2, ..., x_n}, f(X) denotes the n-dimensional column vector that contains the function values. More in general, given M variables X_1, X_2, ..., X_M that take values in the product set X_1 × X_2 × ··· × X_M, where X_i = {x_{i1}, x_{i2}, ..., x_{iD_i}}, i = 1, ..., M, f(x_1, ..., x_{i−1}, X_i, x_{i+1}, ..., x_M) denotes a D_i × (D_1 ··· D_{i−1} D_{i+1} ··· D_M) matrix containing the function values, organized with each row corresponding to the same value of X_i. For easy programming, we suggest starting from a factor f(x_1, x_2, ..., x_M) that is a row vector containing the function values for the ordered set X_1 × X_2 × ··· × X_M. The unique index selecting the element corresponding to any value assignment can be obtained by the formula

j = D_2 \cdots D_M (j_1 - 1) + D_3 \cdots D_M (j_2 - 1) + \cdots + D_M (j_{M-1} - 1) + j_M,   (1)

where j_1, j_2, ..., j_M are the indices of the M variables. For example, f(x_1, X_2, x_3, x_4) is the matrix whose rows are obtained by fixing j_2 and letting the other indices vary orderly. This notation allows a very compact matrix formulation of the propagation through the factors. With reference to Figure 1, we have
Figure 2. A branch factor
P_b(X_i) = P(z|x_1, ..., x_{i−1}, X_i, x_{i+1}, ..., x_N) \left( P_b(Z) ⊗ P_f(X_1) ⊗ ··· ⊗ P_f(X_{i−1}) ⊗ P_f(X_{i+1}) ⊗ ··· ⊗ P_f(X_N) \right),   (2)

P_f(Z) = P(Z|x_1, ..., x_N) \left( P_f(X_1) ⊗ ··· ⊗ P_f(X_N) \right),   (3)

with ⊗ denoting the Kronecker product. The final distributions are obtained after normalization to one: P_b(X_i) = P_b(X_i)/|P_b(X_i)|, P_f(Z) = P_f(Z)/|P_f(Z)| (|v| denotes the sum of the elements of v). The expressions simplify for the branch factor of Figure 2 to

P_f(Y_i) = P_f(X) ⊙ P_b(Y_1) ⊙ ··· ⊙ P_b(Y_{i−1}) ⊙ P_b(Y_{i+1}) ⊙ ··· ⊙ P_b(Y_M),   (4)

P_b(X) = P_b(Y_1) ⊙ ··· ⊙ P_b(Y_M),   (5)

where ⊙ denotes the Hadamard (element-by-element) product. The final distributions are again obtained after normalization to one: P_f(Y_i) = P_f(Y_i)/|P_f(Y_i)|, P_b(X) = P_b(X)/|P_b(X)|. The final distribution of our variables is obtained by the product rule using backward and forward messages. For a generic variable X,

P(X) = P_f(X) ⊙ P_b(X);   P(X) = P(X)/|P(X)|.   (6)
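The paper's own programming suggestions target MATLAB [7]; the following is a possible numpy transcription of (1)-(6), offered as a sketch under the assumption that the conditional P(Z|X1,...,XN) is stored as an array of shape (|Z|, |X1|, ..., |XN|). Function and variable names are ours, not the paper's.

```python
import numpy as np
from functools import reduce

def normalize(v):
    """Normalize a message so that its entries sum to one (|v| in the text)."""
    v = np.asarray(v, dtype=float)
    return v / v.sum()

def factor_messages(P, pb_z, pf_x):
    """Messages through a factor P(Z|X1,...,XN) stored with shape (|Z|, |X1|, ..., |XN|):
    eq. (3) for Pf(Z) and eq. (2) for each Pb(Xi), obtained by reshaping the factor
    into the D_i x (product of the other sizes) matrix of the text and multiplying it
    by the Kronecker product of the remaining messages (ordering as in (1))."""
    N = P.ndim - 1
    pf_z = P.reshape(P.shape[0], -1) @ reduce(np.kron, pf_x)
    pb_x = []
    for i in range(1, N + 1):
        M = np.moveaxis(P, i, 0).reshape(P.shape[i], -1)
        others = [pb_z] + [pf_x[j] for j in range(N) if j != i - 1]
        pb_x.append(normalize(M @ reduce(np.kron, others)))
    return normalize(pf_z), pb_x

def branch_messages(pf_x, pb_y):
    """Branch factor of Figure 2: eqs. (4)-(5) as Hadamard products."""
    pb_x = normalize(reduce(np.multiply, pb_y))
    pf_y = [normalize(pf_x * reduce(np.multiply,
                                    [m for j, m in enumerate(pb_y) if j != i],
                                    np.ones_like(np.asarray(pf_x, dtype=float))))
            for i in range(len(pb_y))]
    return pf_y, pb_x

def marginal(pf, pb):
    """Product rule (6)."""
    return normalize(np.asarray(pf) * np.asarray(pb))

# usage sketch: a factor P(Z|X1,X2) with |Z| = 2, |X1| = 2, |X2| = 3 and uniform Pb(Z)
rng = np.random.default_rng(0)
P = rng.random((2, 2, 3)); P /= P.sum(axis=0, keepdims=True)
pf_z, pb_x = factor_messages(P, pb_z=np.ones(2), pf_x=[np.array([0.5, 0.5]), np.array([0.2, 0.3, 0.5])])
print(pf_z, pb_x)
```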
2. Factor Graphs with cycles: an example

Let us start our discussion on factor graphs with cycles with the toy example of Figure 4, where we show three random variables X, Y and Z, both as a Bayesian graph and
Figure 3. Various factors: (a) Clamp, with P_b(X_L) = δ(X_L − x_j) and P_f(X_R) = δ(X_R − x_j); (b) Open branch, with P_b(X) = U(X); (c) Source, with P_f(X) = P_0(X).
as a factor graph in normal form (this is often named Forney-style Factor Graph, FFG [3][4]). The set is fully connected and presents a cycle. The joint density is factorized as P(XYZ) = P(Z|XY)P(Y|X)P(X), or in normal form [3]

P(X_1 X_2 X_3 Y Z) = P(Z|X_2 Y)\,\delta(X_1 X_2 X_3)\,P(Y|X_3)\,P(X_1),   (7)

where δ(X_1 X_2 X_3) is a function that is equal to one if X_1 = X_2 = X_3 and zero otherwise. If we had an instance of Z = z and wanted to obtain the marginal density of X_1, or X_2, or X_3, or Y, conditioned on Z = z, we could not use forward and backward messages, because they would travel indefinitely around the loop. This is explained by the fact that the marginal we want to compute cannot be written as the product of two sums that share only a common term. More specifically, if our objective is, for example, to obtain the marginal for Y,

P(Y, Z = z) = \sum_{X_1 X_2 X_3} P(Z = z|X_2 Y)\,\delta(X_1 X_2 X_3)\,P(Y|X_3)\,P(X_1),   (8)
the sum cannot be split into the product of two sums that depend only on Y (forward and backward messages). The same thing happens for X2 and X3 . For X1 the split is
Figure 4. An example with three variables: (a) Bayesian graph; (b) Factor Graph in normal form (FFG).
Figure 5. The XY Z factor graph in normal form with the cut on the variable X2 .
possible, but the backward message from the set X_2, X_3, Y cannot be obtained with another split because of the loop. The cutset approach consists in conditioning on a number of variables that effectively "open the loops". In our toy example we could condition on X_2, which is equivalent to writing

P(X_1 X_2 X_3 Y Z) = \sum_{x_2 \in \mathcal{X}_2} P(X_1, X_2 = x_2, X_3, Y, Z)\,\delta(X_2 - x_2),   (9)

(the loop would be open also if we cut on X_3, or on Y). Now, in solving an inference problem, each element of the sum can be factorized and solved with the forward-backward approach. We can run the inference |X_2| times and sum the results. Figure 5 shows the result of the split of X_2 into variables X_{2L} and X_{2R}, which corresponds to the factorization

P(X_1, X_2 = x_2, X_3, Y, Z) = P(X_1)\,\delta(X_1 X_{2L} X_3)\,\delta(X_{2L} - x_2)\,P(Y|X_3)\,P(Z|X_{2R} Y)\,\delta(X_{2R} - x_2).   (10)

Given, for instance, evidence on Z = z, all the distributions can be computed using belief propagation as shown in the following.
P(X_2 = x_2, Y, Z = z) = \underbrace{\sum_{X_3}\Bigl[\underbrace{\sum_{X_1}\underbrace{P(X_1)}_{P_f(X_1)}\,\delta(X_1 X_{2L} X_3)\,\underbrace{\delta(X_{2L}-x_2)}_{P_b(X_{2L})}}_{P_f(X_3)}\Bigr]P(Y|X_3)}_{P_f(Y)}\;\underbrace{P(Z=z|X_{2R}Y)\,\underbrace{\delta(X_{2R}-x_2)}_{P_f(X_{2R})}}_{P_b(Y)};

P(X_2 = x_2, X_3, Z = z) = \underbrace{\sum_{X_1}\underbrace{P(X_1)}_{P_f(X_1)}\,\delta(X_1 X_{2L} X_3)\,\underbrace{\delta(X_{2L}-x_2)}_{P_b(X_{2L})}}_{P_f(X_3)}\;\underbrace{\sum_{Y}P(Y|X_3)\,\underbrace{P(Z=z|X_{2R}Y)\,\underbrace{\delta(X_{2R}-x_2)}_{P_f(X_{2R})}}_{P_b(Y)}}_{P_b(X_3)};

P(X_1, X_2 = x_2, Z = z) = \underbrace{P(X_1)}_{P_f(X_1)}\;\underbrace{\sum_{X_3}\delta(X_1 X_{2L} X_3)\,\underbrace{\delta(X_{2L}-x_2)}_{P_b(X_{2L})}\,\underbrace{\sum_{Y}P(Y|X_3)\,\underbrace{P(Z=z|X_{2R}Y)\,\underbrace{\delta(X_{2R}-x_2)}_{P_f(X_{2R})}}_{P_b(Y)}}_{P_b(X_3)}}_{P_b(X_1)}.

Also the marginal for the conditioning variable X_2 can be computed by message passing as follows:

P(X_2 = x_2, Z = z) = \delta(X_{2L}-x_2)\;\underbrace{\sum_{X_1 X_3}\underbrace{P(X_1)}_{P_f(X_1)}\,\delta(X_1 X_{2L} X_3)\,\underbrace{\sum_{Y}P(Y|X_3)\,\underbrace{P(Z=z|X_{2R}Y)\,\underbrace{\delta(X_{2R}-x_2)}_{P_f(X_{2R})}}_{P_b(Y)}}_{P_b(X_3)}}_{P_f(X_{2L})}.

The above expression means that the only valid value for X_2 is the one corresponding to x_2. Similarly, using the other side of the loop, we get the same result by computing

P(X_2 = x_2, Z = z) = \underbrace{\sum_{Y}\underbrace{\sum_{X_3}\underbrace{\sum_{X_1}\underbrace{P(X_1)}_{P_f(X_1)}\,\delta(X_1 X_{2L} X_3)\,\underbrace{\delta(X_{2L}-x_2)}_{P_b(X_{2L})}}_{P_f(X_3)}P(Y|X_3)}_{P_f(Y)}\,P(Z=z|X_{2R}Y)}_{P_b(X_{2R})}\;\delta(X_{2R}-x_2).
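These computations can be checked numerically. The following Python sketch (an independent transcription, not the author's MATLAB code) clamps X_2 to each of its values, runs one forward-backward pass on the resulting tree, and sums the returns, recovering the exact marginals of the toy example by brute force comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nz = 2, 3, 2
PX = rng.random(nx); PX /= PX.sum()                        # P(X)
PYgX = rng.random((ny, nx)); PYgX /= PYgX.sum(0)           # P(Y|X)
PZgXY = rng.random((nz, nx, ny)); PZgXY /= PZgXY.sum(0)    # P(Z|X,Y)
z = 0                                                      # evidence Z = z

# exact reference: P(Y, Z=z) by brute force
ref = np.einsum('x,yx,xy->y', PX, PYgX, PZgXY[z])

# cutset conditioning on X2 (the copy of X entering P(Z|X,Y)):
# for each clamped value x2 the graph is a tree and one forward-backward pass suffices
PY_acc = np.zeros(ny)
PX2_acc = np.zeros(nx)
for x2 in range(nx):
    pf_x3 = PX * (np.arange(nx) == x2)         # Pf(X1)=P(X) combined with Pb(X2L)=delta
    pf_y = PYgX @ pf_x3                        # forward message into Y
    pb_y = PZgXY[z, x2, :]                     # backward message into Y (X2R clamped to x2)
    PY_acc += pf_y * pb_y                      # P(X2=x2, Y, Z=z)
    pb_x3 = PYgX.T @ pb_y                      # backward message into X3
    PX2_acc[x2] = np.sum(PX * (np.arange(nx) == x2) * pb_x3)   # P(X2=x2, Z=z)

print(np.allclose(PY_acc, ref))                # the summed returns match the exact marginal
print("P(Y|Z=z) =", PY_acc / PY_acc.sum())
print("P(X|Z=z) =", PX2_acc / PX2_acc.sum())
```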
Figure 6. The cut on a loop cutset variable X.
Figure 7. The cut on a non loop cutset variable X.
3. Cutsets on general Factor Graphs

Assume that our factor graph is modeling the joint density of a set of variables T = {X_1, X_2, ..., X_N} that take values in the joint alphabet X_1 × ... × X_N. There are loops in the graph, but we can find a subset of size M, C = {X_{i_1}, ..., X_{i_M}} ⊂ T (loop cutset) such that, if the branches in C are cut, the system becomes a polytree. The complete density can be written as

P(T) = P(C, T/C) = \sum_{c \in \mathcal{X}_{i_1} \times ... \times \mathcal{X}_{i_M}} P(c, T/C)\,\delta(C - c).   (11)

The advantage of this representation is that inference on each P(c, T/C) can be solved exactly with forward-backward messages. The algorithm is guaranteed to converge to the exact solution in a number of steps upper bounded by the graph diameter. Each cut-
set element c is equivalent to launching simultaneously in the graph M examples out of X_{i_1}, ..., X_{i_M} and waiting until convergence. The final result is obtained by averaging over all examples. Figure 6 shows a generic loop cutset variable X where P_f(X_R) = δ(X_R − x) and P_b(X_L) = δ(X_L − x), x ∈ X. If the return messages for c are P_b(X_R, c) and P_f(X_L, c), the marginal for X is obtained as

P(X) = \sum_{c \in C} P_f(X_L, c)\,\delta(X - x),   or   P(X) = \sum_{c \in C} P_b(X_R, c)\,\delta(X - x),   (12)

because the returns are the same, being the integration over the same variables (see Figure 6). Note that in general a cut can be made on any variable. If the variable is not a loop cutset variable, as in Figure 7, we have

P(X) = \Bigl(\sum_{c \in C} P_f(X_L, c)\,\delta(X - x)\Bigr) ⊙ \Bigl(\sum_{c \in C} P_b(X_R, c)\,\delta(X - x)\Bigr),   P(X) = P(X)/|P(X)|.   (13)

Equation (13) can then be assumed to be the general formula for combining forward and backward returns for any cut variable. The cutset algorithm holds the potential of being deployed in large-scale applications because it can be easily implemented in distributed hardware simply by sampling the cutset variables randomly, independently and uniformly. All the variables in the system can keep running averages that are guaranteed to converge to the true marginals. Consider that many variables in the system may be only loosely dependent on the cutset, and they may converge much before the cutset variables have spanned their whole dataset. The cutset approach in normal factor graphs is somewhat simpler than in Bayesian nets, where the variable has to be replicated for each child node of a cutset variable [19]. In the factor graph only a cut is necessary and the samples can be launched simultaneously into the system. The complexity is apparently large if the loop cutset size |X_{i_1}| · ... · |X_{i_M}| is large. However, distributed realizations using reconfigurable hardware such as Field Programmable Gate Arrays (FPGA) may render the approach feasible in real-time. Large numbers of fast hard-wired operations should be the core of an inference system with a guaranteed progressive accuracy towards convergence.
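A minimal sketch of the sampled variant, assuming a hypothetical routine solve_tree(c) that runs one forward-backward pass with the cutset clamped to configuration c and returns the unnormalized return message for the variable of interest:

```python
import numpy as np

def sampled_cutset_marginal(cutset_values, solve_tree, n_samples, seed=None):
    """Monte Carlo version of (11)-(12): draw cutset configurations uniformly and
    keep a running average of the (unnormalized) returns.  `solve_tree(c)` is a
    hypothetical user-supplied routine, not part of any library."""
    rng = np.random.default_rng(seed)
    acc = None
    for _ in range(n_samples):
        c = cutset_values[rng.integers(len(cutset_values))]
        r = np.asarray(solve_tree(c), dtype=float)
        acc = r if acc is None else acc + r
    est = acc / n_samples              # running average; the |C| factor cancels on normalization
    return est / est.sum()             # normalized marginal, combined as in (13)
```

With the toy example of Section 2, solve_tree(x2) would simply return the vector P(X_2 = x_2, Y, Z = z) computed inside the loop of the previous sketch.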
References [1] J. Pearl, Probabilistic Reasoning in Intelligent Systems, 2nd ed. San Francisco: Morgan Kaufmann, 1988. [2] Kschischang F.R., B.J. Frey, H.A. Loeliger, "Factor Graphs and the Sum-Product Algorithm," IEEE Trans. on Information Theory, Vol. 47, N. 2, pp. 498-519, February 2001. [3] G. D. Forney, Jr., "Codes on graphs: normal realizations, " IEEE Trans. Information Theory, vol. 47, no. 2, pp. 520-548, 2001. [4] H. A. Loeliger, "An Introduction to Factor Graphs," IEEE Signal Processing Magazine, pp. 28-41, Jan 2004. [5] H.A.Loeliger, "The Factor Graph Approach to Model-Based Signal Processing," Proceedings of the IEEE, Vol. 95, N. 6, pp. 1295-1322, June 2007. [6] M. I. Jordan and T.J. Sejnowski, eds., Graphical Models: Foundations of Neural Computation, MIT Press, 2001.
[7] F. Palmieri, "Notes on Factor Graphs," New Directions in Neural Networks with IOS Press in the KBIES book series, Proceedings of WIRN 2008, Vietri sul mare, June 2008. [8] M. Frean, "Inference in Loopy Graphs," Class notes for Machine Learning-COMP421, pp. 1-5, 2008. [9] M. Chertkov and V. Y. Chernyak, "Loop series for discrete statistical models on graphs," Journal of Statistical Mechanics: Theory and Experiment, P06009, pp. 1-28, 2006. [10] M. Chertkov and V. Y. Chernyak, "Loop calculus in statistical physics and information science", Physical Review, E 73, pp. 065102:1-4, 2006. [11] M. Chertkov, V. Y. Chernyak and R. Teodorescu, "Belief Propagation and Loop Series on Planar Graphs," submitted to Journal of Stastistical Mechanics, April 11, 2008. [12] K. Murphy, Y. Weiss, and M. Jordan, "Loopy-belief Propagation for Approximate Inference: An empirical study", Uncertainty in Artificial Intelligence, UAI99, 1999. [13] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized Belief Propagation." Advances in Neural Information Processing Systems (NIPS), vol. 13, pp. 689-695, December 2000. [14] J. S. Yedidia, W. T. Freeman and Y. Weiss, "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms," IEEE Transactions on Information Theory, Vol. 51, NO. 7, pp. 2282-2312, July 2005. [15] Y. Weiss, "Correctness of Local Probability Propagation in Graphical Models with Loops", Neural Computation, N. 12, pp 1-41, 2000. [16] Y. Weiss and W. T. Freeman, "On the optimality of the max-product belief propagation algorithm in arbitrary graphs", IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 736-744, 2001. [17] J. Dauwels, H. A. Loeliger, P. Merkli, and M. Ostojic, "On structured-summary propagation, LFSR synchronization, and low-complexity trellis decoding," Proc. 41st Allerton Conf. on Communication, Control, and Computing, Monticello, Illinois, October 1-3, 2003. [18] M. J. Wainwright, T. S. Jaakkola and Alan S. Willsky, "Tree-Based Reparameterization Framework for Analysis of Sum-Product and Related Algorithms," IEEE Transactions on Information Theory, Vol. 49, NO. 5, pp. 1120-1145, May 2003. [19] B. Bidyuk and R. Dechter, "Cutset Sampling for Bayesian Networks," Journal of Artificial Intelligence Research, N.28, pp. 1-48, 2007. [20] P. M. Djuric et al., "Particle Filtering," IEEE Signal Processing Magazine, pp. 19-38, September 2003. [21] I. M. Tienda-Luna, D. P. Ruiz and Y. Huang, "Itarative decoding in Factor Graph Representation Using Particle Filtering," Proceedings of IEEE 6th Workshop on Signal Processing Advances in Wireless Communications, pp. 1038-1042, 2005.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-39
Neural Networks and Metabolic Networks: Fault Tolerance and Robustness Features
Vincenzo CONTI a,1, Barbara LANZA a, Salvatore VITABILE b and Filippo SORBELLO a
a Dipartimento di Ingegneria Informatica, Università degli Studi di Palermo, Viale delle Scienze, Ed. 6, 90128 Palermo, ITALY, {conti, sorbello}@unipa.it
b Dipartimento di Biotecnologie Mediche e Medicina Legale, Università degli Studi di Palermo, Via del Vespro, 90127 Palermo, ITALY, [email protected]
Abstract. The main objective of this work is the comparison between metabolic networks and artificial neural networks (ANNs) in terms of their robustness and fault tolerance capabilities. In the context of metabolic networks, errors are random removals of network nodes, while attacks are failures in the network caused intentionally. In the context of neural networks, errors are usually defined as input configurations submitted to the network that are affected by noise, while failures are defined as the removal of some network neurons. This study has shown that ANNs are very robust networks with respect to the presence of noise in the inputs and to the partial removal of some nodes, up to a critical threshold, while metabolic networks are very tolerant to random failures (absence of a critical threshold) but extremely vulnerable to targeted attacks.
Keywords: Complex Networks; Neural Networks; Metabolic Networks; Robustness and Fault Tolerance comparison.
Introduction

In recent years many complex real systems have been described in terms of complex networks: technological networks, such as the Internet and the WWW, biological networks, such as metabolic networks, and even social networks. The goal of the study of complex systems is to deduce the global behaviour of the system through a thorough basic knowledge of its parts. The working of a complex system can be dealt with using complex network theory. In addition to the basic
Corresponding author
components, the interactions between the various elements leading to the global behaviour of the entire system are analyzed and evidenced. The empirical and theoretical results on the study of real systems indicate that complex networks can be divided into two categories [1] based on the probability distribution function P(k), which gives the probability that a node in the network is connected to k other nodes. The first class of networks is characterized by a Gaussian distribution P(k), with a peak centred on the mean value of the degree k; these networks are called exponential. The second class is composed of real networks, such as cellular networks, called scale-free, whose degree distribution function follows a power law. Recently an enormous interest has been shown in the study of tolerance to errors and attacks, both in scale-free and in exponential models. The reason for these studies on network tolerance comes from two main objectives: the design of new networks as systems integrated in their environment, taking into account the possible presence of errors and/or attacks, and the protection of existing networks through the identification of the most critical nodes and the adoption of countermeasures to reduce their criticality [2]. This paper first describes, in statistical terms, the universal quantities that characterize, in general, complex networks. Subsequently, the main points of two very important types of complex networks, Artificial Neural Networks and Metabolic Networks, are briefly presented. Finally, the robustness and fault tolerance of these two kinds of networks are investigated, analyzing the way in which they react when they operate in a non-optimal way.
1. Related Works

The Theory of Complex Networks is a science that has undergone a great development in recent years, due to the presence of interconnected structures in different areas and in various fields of scientific research. In this direction, the studies of Erdös and Rényi, Watts and Strogatz, and Barabási and Albert have led to the notions of, respectively, random networks, small-world networks and scale-free networks [1], which differ in the shape of the degree probability distribution function and/or in the value of the clustering coefficient. The many applications of artificial neural networks (ANN) have made it possible to directly test the realized network prototypes and parameters such as fault tolerance and robustness with respect to failures, as in the work of Sandri [15]. Additional research has attempted to verify the robustness of neural networks in advance, using an updated and revised training algorithm for neural networks, as in the work of Kerlirzin et al. [14]. To assess, instead, the strength of Metabolic Networks, both the results obtained from the general analysis conducted on networks with a scale-free architecture [1][2] and those derived from analyses based on specific structural [10] and functional [9] properties of the metabolic network nodes have been used. All these studies have led to interesting results, which will be summarized in Section 3. In this work, on the basis of results from different research areas, the overall characteristics of two major complex networks, namely Neural Networks and
Metabolic Networks, will be shown, trying to summarize how their behaviours change with the working conditions.
2. Complex Networks

Many complex systems can be represented in terms of networks of elements interacting with each other. In general, the concept of network is a schematization of a complex system, consisting of many entities, called nodes, linked among them through connections. The relationships between network elements are fundamental to define the structure of a complex system, in order to be able to predict its behaviour. The statistical measures used to describe complex networks of various kinds are the degree of connectivity, the path length and the clustering coefficient. The first measure, i.e. the degree of connectivity, denoted k, is one of the parameters that most clearly highlights the characteristics of a complex network: it is defined as the property of a node giving the number of its incident arcs [6]. The degree permits to calculate the distribution function of the connectivity degree P(k), which represents the probability that a node has exactly degree k, and permits to determine the overall behaviour of a network. The second measure is the path length, also denoted as the diameter D of the graph, which provides an estimate of the average separation between any two nodes of the network. Its value is obtained by calculating the average of the minimum paths (i.e. the shortest paths) between all pairs of nodes, taken once, over the total number of paths [3]: a node in any network can be reached from any other node through a few steps. Finally, the third measure is the clustering coefficient, which indicates the tendency of nodes to form agglomerates and is indicated with the letter C [4]. There are two classes of networks: exponential homogeneous networks and scale-free networks. Exponential networks are called homogeneous because their nodes behave in an almost homogeneous way, all having the same influence on the global properties of the network to which they belong. Scale-free networks, instead, show a highly heterogeneous behaviour of their nodes: there are few nodes, called hubs, with many links, as opposed to the majority of nodes with very small degree (see Fig. 1).
Figure 1. Models of complex networks. (A) Graph of an exponential homogeneous network, in which the nodes have approximately the same degree. (B) Graph of a scale-free network, in which only a few nodes, the hubs (red), have many links. Figure taken from [1].
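As a small illustration of the three measures just introduced, the following Python sketch computes them on a random graph with the networkx library; the graph type and its parameters are arbitrary choices for the example.

```python
import networkx as nx

G = nx.erdos_renyi_graph(n=200, p=0.05, seed=1)
# keep the largest connected component so that path lengths are well defined
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()

degrees = [d for _, d in G.degree()]              # degree of connectivity k of each node
mean_k = sum(degrees) / len(degrees)
D = nx.average_shortest_path_length(G)            # average shortest path, the "diameter" D of the text
C = nx.average_clustering(G)                      # clustering coefficient

print(f"<k> = {mean_k:.2f}, D = {D:.2f}, C = {C:.3f}")
```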
The class of exponential networks includes the random networks of Erdös and Rényi and the small-world networks of Watts and Strogatz. In random networks each node has about the same number of links and each pair of nodes in the network has the same probability p of being connected. In a random network of large size, the mean degree ⟨k⟩ is equal to the product of the number N of nodes and the probability p that two nodes are connected, ⟨k⟩ = N·p [5]. The distribution function P(k) is well approximated by the Poisson distribution [3], and the following relationship applies:

P(k) = \frac{\langle k \rangle^{k}}{k!}\, e^{-\langle k \rangle}   (1)

The diameter D of the network tends to small values, and it decreases with the probability p with which pairs of nodes are connected. The clustering coefficient C tends to be small, which implies that in random networks the density of local clusters of subnets is low [6]. The small-world networks of Watts and Strogatz, instead, have a high value of the clustering coefficient C and, like random networks, they are characterized by a low diameter. Watts and Strogatz showed that the variation of a single parameter, for example the probability p of node connection, allows moving from one network type to the other (see Fig. 2). The analysis of several real systems has shown that many networks in nature are characterized, instead, by the following approximate distribution, where the parameter γ represents the power-law coefficient:

P(k) ~ k^{-γ}   (2)
Figure 2. Graphical display of a small-world network in which three clusters of nodes (yellow, green, red) are connected by a few weak links (dashed red line).
If a network is characterized by a power-law distribution function, there is a slow decay of the function P(k) as the degree of the nodes increases. In such a network, called scale-free, there is a large number of nodes whose degree is small, together with a very small number of nodes whose degree can assume very high values: these nodes are called hubs. This network type has been defined by Barabási and Albert [1], and is characterized by a strongly heterogeneous behaviour of the network nodes. Thanks to the presence of hubs, the scale-free model of Albert and Barabási yields a small network diameter (small-world property), since the links through the network hubs allow all nodes to be connected quickly. In nature there are two main complex networks in which nodes (artificial neurons and metabolites) interact among them to generate the global behaviour: Metabolic Networks and Neural Networks. The next two paragraphs give an overview of the main features of artificial neural networks and metabolic networks.
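The difference between the two classes is easy to reproduce numerically. The sketch below (sizes and parameters are arbitrary) builds an Erdös–Rényi graph and a Barabási–Albert graph with networkx and compares their empirical degree distributions, whose behaviour follows (1) and (2) respectively.

```python
import collections
import networkx as nx

def degree_distribution(G):
    """Empirical P(k): fraction of nodes with degree k."""
    counts = collections.Counter(d for _, d in G.degree())
    n = G.number_of_nodes()
    return {k: c / n for k, c in sorted(counts.items())}

n = 2000
ER = nx.gnp_random_graph(n, p=6 / n, seed=0)           # exponential (Poisson-like) class
BA = nx.barabasi_albert_graph(n, m=3, seed=0)          # scale-free class

print("ER P(k), first entries:", list(degree_distribution(ER).items())[:5])
print("ER max degree:", max(d for _, d in ER.degree()))
print("BA max degree:", max(d for _, d in BA.degree()))   # the hubs make this much larger
```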
2.1 Artificial Neural Networks

An Artificial Neural Network (ANN) is a complex network that acts as a powerful analysis tool for the computational simulation of the highly parallel structure of the human brain [8]. According to the configuration of the network links, two main architectural classes can be introduced: feed-forward networks and feedback networks [7]. Feed-forward networks are represented graphically by a graph that contains no feedback loops: neurons receive their inputs only from the neurons of the previous layer and send their outputs only to the layer below. The most complex feed-forward network is the multi-layer one, characterized by the presence of intermediate (hidden) layers between the input and the output. In the context of ANNs, learning can be seen as the problem of updating the network architecture, through a modification of the connection weights, so that the network can effectively perform its specific task. The network learns the weights that its links must have through the repeated presentation of input configurations belonging to a specific class (training set). One of the fundamental problems associated with the use of ANNs is the search for an optimal network, i.e. a structure that is able to achieve high performance together with a satisfactory degree of generalization. Generally, a compromise between the performance of the network and the fulfilment of some constraints on the network must be considered in order to obtain the best neural model capable of achieving a specific task [13]. There are several constraints for the neural model and, among these, the most important are the network robustness, which is the ability to slow the loss of performance when the network is affected by disturbances, and the fault tolerance with respect to the noisy input configurations that may be presented to the network.

2.2 Metabolic Networks

The metabolic network of an organism summarizes the biochemical reactions that enable cell survival. It has some important properties related to the conservation and thermodynamic constraints that must be satisfied at each node [9]. A metabolic network is represented graphically by a graph where the nodes are the chemical compounds involved in cellular reactions (metabolites), while the links represent the biochemical reactions, catalyzed by specific enzymes. Several studies examining the cellular metabolism of different organisms have been conducted to determine the topological structure of metabolic networks. The results of Jeong et al. [10] led to the conclusion that metabolic networks are, from a structural point of view, scale-free networks. A scale-free model excludes the existence of functional modules in the network: in fact, because of the few highly connected hubs in scale-free networks, clusters of highly interconnected nodes are absent. But different results from Kitano [11] and Guimerá et al. [12], on the functional capabilities of metabolic networks, have demonstrated the existence of a modularization in the cellular metabolic network. In particular, they analyzed the metabolic network of Escherichia coli (see Fig. 3), obtained from the analysis of the functional cellular metabolism. Their analysis, on many microorganisms, has revealed that the metabolic networks of all the organisms studied are composed of interconnected functional modules.
To resolve this apparent contradiction between the modular structure and the power-law degree distribution (typical of scale-free models), Ravasz et al. [4] introduced a new network model: the hierarchical model (see Fig. 4). This model defines a network in which modularity is embedded in a scale-free architecture. Therefore, the multi-level structure of cellular metabolism (modular approach) can be well approximated by the hierarchical model. Finally, the analysis of cellular metabolic networks has suggested that they are scale-free networks containing a number of metabolic patterns (functional modules). Studies on the robustness of scale-free networks [1][2] have shown that these networks, given the heterogeneous nature of their nodes, are tolerant to errors but very vulnerable to targeted attacks.
Figure 3. Metabolic network of the Escherichia coli bacterium. The groups of nodes in different colours indicate the functional modules identified in the cellular metabolism (each corresponding to a specific metabolic pathway). The network includes 473 metabolites (nodes) and 574 links. Figure taken from [12].
Figure 4. Hierarchical model of metabolic networks: architecture of scale-free networks with embedded modularity. Figure taken from [4].
3. Robustness and Fault Tolerance Comparison
The need for a study of the robustness of complex systems comes from the observation that physical systems may be subject to phenomena that tend to reduce their integrity. The robustness of complex networks is closely linked to their inherent redundancy: every network offers a variety of different paths
connecting any pair of nodes in the network. The topological structure of a network determines how it responds to malfunctions [18]. To evaluate the behaviour of metabolic networks and neural networks in the presence of faults, it is necessary to distinguish between the meanings that the concepts of error and attack assume in the two cases. In the context of metabolic networks, errors are random removals of network nodes, while attacks are failures caused intentionally. In the context of neural networks, errors are usually defined as input configurations affected by noise, while failures are defined as the removal of some neurons (with a failure probability uniform over all neurons) [1][2][18]. A characteristic often attributed to neural networks is their robustness with respect to failures, and more precisely to the partial destruction of their cells [7]. In fact, since information is distributed across the network, the destruction of some cells causes only a slight degradation of the overall network performance. Moreover, it has been shown that moderate alterations of the input values are absorbed by the network, demonstrating a tolerance to errors: a neural network exhibits a slow loss of accuracy when disturbances affect its parameters [13][14]. The behaviour of metabolic networks in the presence of node malfunctions follows from two considerations: 1. the analysis of complex cellular networks has shown that metabolic networks have important structural properties; in particular, they are scale-free [10][16]; 2. the study of scale-free networks has shown that these networks are tolerant to errors but very vulnerable to attacks, namely the failure of highly connected nodes [1][2]. Metabolic networks, being scale-free, are therefore very tolerant to random failures but at the same time highly vulnerable to attacks on hubs. In scale-free models most nodes have few links; as a result of a random failure, these nodes have the highest probability of being selected, and their removal does not alter the network structure. This indicates the absence of a critical threshold and a high tolerance of the network to random failures, i.e. to the removal of random nodes. Instead, if the most connected nodes, i.e. the hubs, are deleted, the network disintegrates because of a drastic fall in its communication capability. Several approaches can be used to analyze how a network responds to the random removal (error) or targeted removal (attack) of a set of nodes. Albert et al. [1] investigated the robustness of these networks by assessing the variation of the network diameter as a function of the percentage of removed nodes (see Fig. 5). Crucitti et al. [2] instead used the variation of the global efficiency parameter, linked to the presence or absence of links between pairs of nodes. In both cases the same conclusions were reached: scale-free networks are tolerant to the random removal of nodes, but at the same time are very vulnerable to the targeted removal of the central nodes of the network (hubs) [1][2]. The various studies on the tolerance of scale-free networks have revealed that in these networks there is no critical threshold; such a threshold exists instead in exponential
homogeneous networks, for which the network disintegrates if the number of removed nodes exceeds a critical threshold [17].
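The following sketch (again not from the paper, and with arbitrary parameters) reproduces the kind of experiment described above: nodes are removed either at random (errors) or in decreasing order of degree (attacks), and the diameter of the largest surviving component is tracked.

```python
# Illustrative error/attack simulation on a scale-free graph (assumption:
# diameter of the largest connected component as the robustness measure).
import random
import networkx as nx

def diameter_after_removal(G, fraction, targeted=False, seed=0):
    H = G.copy()
    k = int(fraction * H.number_of_nodes())
    if targeted:  # attack: remove the most connected nodes (hubs) first
        victims = [n for n, _ in sorted(H.degree(), key=lambda x: x[1],
                                        reverse=True)[:k]]
    else:         # error: remove nodes uniformly at random
        victims = random.Random(seed).sample(list(H.nodes()), k)
    H.remove_nodes_from(victims)
    giant = max(nx.connected_components(H), key=len)
    return nx.diameter(H.subgraph(giant))

G = nx.barabasi_albert_graph(1000, 2, seed=1)
for f in (0.0, 0.02, 0.05):
    print(f, diameter_after_removal(G, f, targeted=False),
          diameter_after_removal(G, f, targeted=True))
```

Plotting the two diameter curves against the removed fraction reproduces, qualitatively, the behaviour shown in Fig. 5 for scale-free networks.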
Figure 5. Variation of the diameter D of the network as a function of the fraction f of nodes removed, both in exponential (E) and in scale-free (SF) networks. Figure taken from [1].
3.1 Defence Capability of Neural Networks

Character recognition through neural networks can be used as a common example to test the robustness of this approach. Sandri [15] trained an ANN to recognize the 26 letters of the alphabet. After the training phase, the network was able to associate a given input configuration with one of the 26 letters of the alphabet. The training set was composed of 26 "clean" letters, i.e. without noise (in this case, noise means imperfections in the shape of the letters). In the testing phase, letters affected by noise were presented in order to see whether they were properly recognized by the network; in this manner, the tolerance of the network to errors was tested. By observing the network results, a high tolerance to noise (around 85%) was measured. The test of the robustness of the network was then continued by removing connections until a threshold value was reached, beyond which the phenomenon known as breakdown occurred. The threshold value was reached when about 50% of the connections between nodes had been removed (see Fig. 6).
Figure 6. Identification Efficiency (IE) of the neural network as a function of the removal rate of the connections between nodes. As the removal percentage increases, the network gradually loses its ability to identify, breaking down near a critical value (roughly corresponding to the red X).
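As an illustrative aside, the following sketch mimics the connection-removal test described above on a different task: an off-the-shelf classifier trained on the scikit-learn digits data (not the 26-letter set of [15]) is evaluated after zeroing out increasing fractions of its weights.

```python
# Illustrative sketch (assumptions: sklearn digits instead of the 26-letter set
# used in [15], random pruning of weights to mimic connection removal).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(40,), max_iter=600,
                    random_state=0).fit(Xtr, ytr)

rng = np.random.default_rng(0)
for frac in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8):
    pruned = [W.copy() for W in net.coefs_]
    for W in pruned:                      # remove a fraction of the connections
        W[rng.random(W.shape) < frac] = 0.0
    net.coefs_, backup = pruned, net.coefs_
    print(f"removed {frac:.0%} of connections -> accuracy {net.score(Xte, yte):.2f}")
    net.coefs_ = backup                   # restore the original weights
```

The accuracy typically degrades gracefully at first and then collapses, which is the qualitative breakdown behaviour sketched in Figure 6.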
3.2 Defence Capability of Metabolic Networks

In this section, results on the robustness of the metabolic networks of different cellular organisms are shown, to confirm what has been said about the fault tolerance and the vulnerability to attacks of metabolic networks. Jeong et al. [10] examined how the metabolic networks of 43 organisms react in the presence of node malfunctions. First, they established the topological structure of the cellular metabolic networks; the problem was to determine whether the topology of these networks was better described by the homogeneous exponential model or by the non-homogeneous scale-free model. Evaluating the degree distribution P(k) of the 43 organisms, they found that in all the organisms tested the probability that a given node (substrate) is involved in k reactions follows a power-law P(k), typical of the scale-free model. An important consequence of a power-law degree distribution is the presence of a few highly connected nodes (hubs) which bear the primary responsibility for the interconnectivity of the whole network. If these hubs are removed sequentially, the diameter of the network grows very quickly [1], compromising the network stability and disintegrating it into isolated groups of nodes. Scale-free networks have also proven to be very tolerant to the random removal of nodes [1][2]. To test whether these properties really belonged to cellular metabolic networks, they simulated the removal, both random and targeted, of nodes of the metabolic network of Escherichia coli. Following the removal of the most connected nodes, a rapid increase in the diameter of the metabolic network of E. coli was observed, confirming the vulnerability of the network to attacks and the key role that hubs play in its overall interconnectivity. Instead, the random removal of a set of M nodes (see Fig. 7) did not lead to big changes in the connectivity among the surviving nodes, indicating a high tolerance to random errors. A value of M equal to 60 corresponds to a removal rate of about 8% of the nodes of the Escherichia coli metabolic network. Similar results were also obtained for the other 42 organisms, confirming the scale-free character of cellular metabolic networks and, consequently, their high tolerance to errors together with their extreme vulnerability to attacks.
Figure 7. Change in the diameter of the metabolic network of Escherichia coli as a function of the number M of nodes (metabolites) removed. The red triangles show how quickly the diameter of the network grows following the successive removal of the most connected nodes. The green rectangles show that the average distance between the surviving nodes remains broadly unchanged following the removal of random nodes. Figure taken from [10].
4. Conclusions
The various results presented here show that the robustness of a network depends strongly on the nature of the interconnections between its nodes. However, there are also additional parameters to be evaluated (load of the network, functionality of the nodes, ...) which can lead to behaviours different from those one might expect from a purely structural analysis. ANNs have proven to be very robust networks, both with respect to the presence of noise in the inputs and to the partial removal of some nodes, until a critical threshold is reached. Metabolic networks have proved very tolerant to random failures (absence of a critical threshold) but extremely vulnerable to targeted attacks: hubs must be protected to ensure correct network operation. These observations show that knowledge of the mechanisms by which complex networks self-organize and evolve is crucial for network efficiency, security and reliability.
References
[1] Albert, R., Jeong, H., Barabási, A.-L.: "Error and attack tolerance of complex networks", Nature, 406, 378-382, (2000)
[2] Crucitti, P., Latora, V., Marchiori, M., Rapisarda, A.: "Error and attack tolerance of complex networks", Physica A, 340, 388-394, (2004)
[3] Albert, R., Barabási, A.-L.: "Statistical mechanics of complex networks", Review Modern Physics, vol. 74, 47-97, (2002)
[4] Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabási, A.-L.: "Hierarchical organization of modularity in metabolic networks", Science, vol. 297, 1551-1555, (2002)
[5] Przulj, N.: "Graph theory approaches to protein interaction data analysis", Technical Report, 322, Department of Computer Science, Toronto, (2004)
[6] Bobbio, A.: "La struttura delle reti in un mondo interconnesso", Mondo Digitale, 4, 3-18, (2006)
[7] Brescia, M.: "Cervelli Artificiali. Macchine per simulare la mente", Le nuove Tessere, Cuen, (1999)
[8] Gori, M.: "Introduzione alle reti neurali artificiali", Mondo Digitale, 4, 4-20, (2003)
[9] Mahadevan, R., Palsson, B.O.: "Properties of metabolic networks: Structure versus Function", Biophysical Journal: Biophysical Letters, L07-L09, (2005)
[10] Jeong, H., Tombor, B., Albert, R., Oltvai, Z., Barabási, A.-L.: "The large-scale organization of metabolic networks", Nature, 407, 651-654, (2000)
[11] Kitano, H.: "System biology: A brief overview", Science, 295, 1662-1664, (2002)
[12] Guimerá, R., Amaral, L.A.N.: "Functional cartography of complex metabolic network", Nature, 433, 895-900, (2005)
[13] Alippi, C.: "Selecting accurate, robust, and minimal feedforward neural networks", IEEE, vol. 49, 12, 1799-1810, (December 2002)
[14] Kerlirzin, P., Réfrégier, P.: "Theoretical investigation of the robustness of multilayer perceptrons: Analysis of the linear case and extension to nonlinear networks", IEEE, vol. 6, 3, 560-571, (1995)
[15] Sandri, M., Dipartimento di Metodi Quantitativi, Università degli Studi di Brescia, http://www.msandri.it
[16] Barabási, A.-L., Oltvai, Z.N.: "Network Biology: understanding the cell's functional organization", Nat. Rev. Genet., 5, 101-113, (2004)
[17] Pastor-Satorras, R., Vespignani, A.: "Epidemic spreading in scale-free networks", Phys. Rev. Lett., 86, 3200-3203, (2001)
[18] Cecconi, F., Caligiore, D.: "Reti complesse e resistenza ai guasti. La rete daPERtutto", Conferenza GARR_05, Pisa, (maggio 2005)
Chapter 2 Signal Processing
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-51
The COST 2102 Italian Audio and Video Emotional Database

Anna ESPOSITO 1,a,b, Maria Teresa RIVIELLO a,b, Giuseppe DI MAIO c
a Second University of Naples, Department of Psychology, Caserta, Italy
b International Institute for Advanced Scientific Studies (IIASS), Vietri sul Mare, Italy
c Second University of Naples, Department of Mathematics, Caserta, Italy
Abstract. This paper describes the general specifications and the current status of the COST 2102 Italian Audio and Video Emotional Database, collected to support the research effort of the COST Action 2102: "Cross Modal Analysis of Verbal and Nonverbal Communication" (http://cost2102.cs.stir.ac.uk/). Emphasis is placed on stimulus selection procedures, theoretical and practical aspects of stimulus identification, characteristics of the selected stimuli and progress in their assessment and validation. Keywords. Vocal and facial expression of emotion, audio and video recordings, perceptual assessment.
1. Introduction

In a body-to-body interaction, the addressee exploits both the verbal and the nonverbal communication modes to infer the speaker's emotional state. Is such informational content redundant? Is the amount of information conveyed by each communication mode the same, or is it different? How much information about the speaker's emotional state is conveyed by each mode, and is there a preferential communication mode for a given emotional state? In an attempt to answer the above questions, several perceptual experiments need to be conducted in order to evaluate the subjective perception of emotional states in the single channels (either visual or auditory) and in the combined channels (visual and auditory). In specifying the experimental set-up for facing the above theoretical questions, the authors ended up with a collection of audio and video stimuli that is worth describing, since it may prove useful for practical applications related to the automatic classification and recognition of vocal and facial expressions of emotion. The general specification and the characteristics of the collected set of stimuli are reported below.
2. Materials

The collected data are based on extracts from Italian movies whose protagonists were carefully chosen among actors and actresses largely acknowledged by the critics and considered capable of giving very real and careful interpretations.

1 Corresponding Author: Anna Esposito, Second University of Naples, Department of Psychology, and IIASS, Italy, via Pellegrino 19, 84019, Vietri sul Mare, Salerno, Italy; E-mail: [email protected]
The final database consists of audio and video stimuli representing 6 basic emotional states: happiness, sarcasm/irony, fear, anger, surprise, and sadness. Sarcasm/irony was introduced as a substitute for the emotional state of disgust, since after one year of movie analysis only one video clip had been identified for that emotion. For each of the above listed emotional states, 10 stimuli were identified, 5 expressed by an actor and 5 by an actress, for a total of 60 audio and video stimuli. The actors and actresses were different for each of the 5 stimuli, to avoid biases due to their individual ability to portray emotional states. The selected stimuli were short in duration (the average stimulus length was 3.5 s, SD = ±1 s). This was for two reasons: 1) longer stimuli may contain overlapping emotional states and confuse the subject's perception; 2) emotional states by definition cannot last more than a few seconds, after which other emotional states or moods take place in the interaction [15]. Consequently, longer stimuli do not increase the recognition reliability and in some cases can create confusion, making the identification of emotions difficult, since in a 20-second video clip the protagonist may express more than one, and sometimes very complex, emotions. Care was taken in choosing video clips where the protagonist's face and the upper part of the body were clearly visible. Care was also taken in choosing stimuli such that the semantic meaning of the sentences uttered by the protagonists did not clearly express the portrayed emotional state and such that its intensity level was moderate. For example, we avoided including sadness stimuli where the actress/actor was clearly crying, or happiness stimuli where the protagonist was laughing out loud. This was because we wanted the subjects to exploit emotional signs that could be less obvious but that are generally employed in natural, non-extreme emotional interactions. From each complete (audio and video) stimulus we extracted the audio alone and the video alone, coming up with a total of 180 stimuli (60 audio alone, 60 video alone, and 60 audio and video). The emotional labels assigned to the stimuli were given first by two expert judges and then by three naïve judges, independently. The expert judges made their decisions by carefully exploiting emotional information on facial and vocal expressions (such as frame-by-frame analysis of changes in facial muscles, F0 contour, rising and falling of the intonation contour, etc., as reported by several authors in the literature [1, 3-5, 7-8, 11-14, 16-19]) and also by exploiting the contextual situation the protagonist was interpreting. The naïve judges made their decisions after watching the stimuli several times. There were no opinion exchanges between the expert and naïve judges, and the final agreement on the labeling between the two groups was 100%. The stimuli in each set were then randomized and proposed to the subjects participating in the experiments. The collected stimuli, being extracted from movie scenes containing environmental noise, are also useful for testing realistic computer applications.

2.1. Participants

A total of 90 subjects participated in the perceptual experiments: 30 were involved in the evaluation of the audio stimuli, 30 in the evaluation of the video stimuli, and 30
in the evaluation of the combined audio and video stimuli. The assignment of the subjects to the tasks was random. Subjects were required to carefully listen to and/or watch the experimental stimuli via headphones in a quiet room. They were instructed to pay attention to each presentation and to decide, as quickly as possible at the end of the presentation, which emotional state was expressed in it. Responses were recorded on a 60x8 paper matrix form where the rows listed the stimulus numbers and the columns the emotional states of happiness, sarcasm/irony, fear, anger, surprise, and sadness, plus an option for any other emotion (where subjects were free to report an emotional label different from the six listed) and the option neutral, to be used when, according to the subject's feeling, the protagonist did not show any emotional state. Each emotional label given by the participants as an alternative to one of the six listed was included in one of the listed emotional classes only if criteria of synonymy and/or analogy were satisfied; otherwise it was included in the class labeled "any other emotion".

2.2. Specifications

Two procedures have been employed to study emotional states: the coding procedure, which investigates emotional features by eliciting emotional states through actor portrayals, induction, or clandestine recordings of naturally occurring emotions; and the decoding procedure, which evaluates the listeners' ability to recognize emotional states from speech, gestures and gaze expressions. Whatever the exploited procedure, emotions and the related perceptual cues used to infer them have always been investigated, to our knowledge, by considering separately either the vocal or the facial emotional expressions, and there is a debate on which of them could be preferential with respect to the other in conveying emotional information. In particular, some studies maintain that facial expressions are more informative than gestures and vocal expressions [5-6, 10, 12], whereas others suggest that vocal expressions are more faithful than facial expressions in expressing emotional states, since physiological processes such as respiration and muscle tension are naturally influenced by emotional responses [1-2, 16-19]. It should also be noted that vocal expressions unfold along the time dimension, because speech is intrinsically a dynamic process, while a long-established tradition attempts to define the facial expression of emotion in terms of qualitative targets, i.e. static positions capable of being displayed in a still photograph. The still image usually captures the apex of the expression, i.e. the instant at which the indicators of emotion are most marked. However, in daily experience emotional expressions also vary over time. The collected data allow a comparison of dynamic visual information with vocal emotional information. The use of video clips extracted from movies (in our case, Italian movies) allowed us to overcome two critiques generally levelled at perceptual studies of the kind proposed: 1) the stillness of the pictures in the evaluation of the emotional visual information; 2) differently from other emotional databases proposed in the literature, the actors had not been asked by the experimenter to produce an emotional vocal expression; rather, they were acting according to the movie script and presumably their performance had been judged appropriate to the required emotional context by the movie director (supposed to be an expert).
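The agreement percentages reported in Section 3 below could, for instance, be computed from such response forms with a few lines of code. The sketch below is purely illustrative and not part of the original study: the data layout (one 60x8 binary form per subject) and the randomly generated responses are assumptions.

```python
# Illustrative sketch (not from the paper): computing per-stimulus agreement
# percentages from hypothetical 60x8 response forms, one per subject.
import numpy as np

LABELS = ["happiness", "sarcasm/irony", "fear", "anger",
          "surprise", "sadness", "other", "neutral"]

def agreement_percentages(forms, target_labels):
    """forms: array (n_subjects, 60, 8) of 0/1 ticks; target_labels: the 60
    intended labels. Returns the % of subjects agreeing with each label."""
    forms = np.asarray(forms)
    agree = np.zeros(len(target_labels))
    for i, label in enumerate(target_labels):
        agree[i] = 100.0 * forms[:, i, LABELS.index(label)].mean()
    return agree

# Example with random responses from 30 simulated subjects.
rng = np.random.default_rng(0)
fake_forms = np.eye(8)[rng.integers(0, 8, size=(30, 60))]   # one tick per row
targets = ["sadness"] * 10 + ["sarcasm/irony"] * 10 + ["happiness"] * 10 \
          + ["fear"] * 10 + ["anger"] * 10 + ["surprise"] * 10
print(np.round(agreement_percentages(fake_forms, targets), 1))
```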
3. Stimuli Assessment

The following Tables (1, 2, 3, 4, 5, 6) report, for each emotionally labeled stimulus, the percentage of agreement expressed by the subjects participating in the experiments. Figures 1, 2, 3, 4, 5, and 6 provide histograms of the same data for a better visualization. The numbers attached to the stimuli serve as references for reading the histograms and are reported on the x-axis of the figures.

Table 1. Subjects' agreement (in %) on the sadness stimuli under the three experimental conditions.
Stimuli     Audio   Video   Audio and Video
Females
1_F11         77      73        67
2_F12         73      23        57
3_F13         57      43        80
4_F14         80      37        57
5_F15         73      73       100
Males
6_M11         60      60       100
7_M12         83      60       100
8_M13         16      20         3
9_M14         87      47       100
10_M15        63      43        90
Table 2. Subject’s agreement (in %.) of the irony stimuli under the three experimental conditions.
Stimuli Females 1_F21 2_F22 3_F23 4_F24 5_F25 Males 6_M21 7_M22 8_M23 9_M24 10_M25
Audio 77 93 67 83 57 80 87 77 40 93
Irony Video 53 83 47 47 47 43 47 43 13 63
Audio and Video 87 100 33 53 100 70 47 77 53 23
Table 3. Subject’s agreement(in %.) of the happiness stimuli under the three experimental conditions.
Stimuli Females 1_F31 2_F32 3_F33 4_F34 5_F35
Audio 10 37 47 57 13
Happiness Video 93 70 90 50 57
Audio and Video 33 90 73 33 30
55
A. Esposito et al. / The COST 2102 Italian Audio and Video Emotional Database
Males 6_M31 7_M32 8_M33 9_M34 10_M35
27 63 90 67 73
53 43 77 47 27
67 37 97 47 57
Table 4. Subject’s agreement (in %.) of the fear stimuli under the conditions.
Stimuli Females 1_F41 2_F42 3_F43 4_F44 5_F45 Males 6_M41 7_M42 8_M43 9_M44 10_M45
Audio
Fear Video
Audio and Video
87 33 80 40 50
63 53 47 97 53
100 33 37 37 33
47 90 63 63 50
97 60 50 20 47
100 40 0 67 30
Table 5. Subject’s agreement (in %.) of the anger stimuli under the conditions.
Stimuli Females 1_F51 2_F52 3_F53 4_F54 5_F55 Males 6_M52 7_M52 8_M53 9_M54 10_M55
three experimental
Audio
Anger Video
three experimental
Audio and Video
97 47 63 87 90
73 33 57 70 87
87 50 37 100 70
60 83 83 80 80
80 53 73 63 90
0 53 97 70 37
Table 6. Subject’s agreement (in %.) of the surprise stimuli under the three experimental conditions.
Stimuli Females 1_F61 2_F62 3_F63 4_F64 5_F65
Audio 37 53 20 23 40
Surprise Video 47 20 43 3 63
Audio and Video 50 87 97 40 53
56
A. Esposito et al. / The COST 2102 Italian Audio and Video Emotional Database
Males 6_M61 7_M62 8_M63 9_M64 10_M65
47 30 60 50 10
30 63 33 30 40
67 43 50 47 6
Figure 1 Subject’s agreement of the sadness stimuli under the three experimental conditions. The first 5 stimuli were produce by females and the remaining one by males
Figure 2 Subject’s agreement of the irony stimuli under the three experimental conditions
Figure 3 Subject’s agreement of the happiness stimuli under the three experimental conditions
Figure 4 Subject’s agreement of the fear stimuli under the three experimental conditions.
Figure 5 Subject’s agreement of the anger stimuli under the three experimental conditions
Figure 6 Subject’s agreement of the surprise stimuli under the three experimental conditions.
4. Discussions and Conclusions
The data displayed in Figures 1, 2, 3, 4, 5, and 6 show that most of the selected stimuli are informative about the labeled emotional state. In particular, from Table 1 and Figure 1 it is possible to infer that the sadness stimuli numbered 1, 5, 6, and 7 meet the agreement of more than 50% of the subjects under all three experimental conditions. The sad stimuli 2, 3, 4, 9, and 10 get less than 50% agreement under the video alone condition, but are clearly identified as sad under the audio alone and the combined audio and video conditions. In detail, the sad stimuli 1, 2, and 4 (produced by females) are better identified in the audio alone, whereas stimuli 3, 5, 6, 7, and 10 (two produced by females and three by males) are better identified in the combined audio and video condition. For sadness, the video alone condition is less informative than the audio alone, suggesting that dynamics plays a role in the perception of sad visual stimuli. Stimulus 8 seems to be ineffective in conveying sad emotional information: it is confused with sarcasm/irony both in the audio alone and in the combined audio and video conditions, whereas in the video alone condition it has no specific collocation, since it is confused with all the other emotional states (confusion matrices are reported in [9]). Such a stimulus could be excluded from the database if the interest is in providing algorithms able to detect facial and vocal emotional expressions, but it could be of interest in a decoding procedure of emotional states, since it can highlight which vocal and facial features are rejected by humans as informative of sadness. The percentages of agreement related to sarcasm/irony show a trend similar to sadness. Again, the video alone condition meets an agreement below 50% for all stimuli except nos. 2 and 10, whereas both the audio alone and the combined audio and video conditions reach similar percentages of recognition, except for stimuli 3, 7, and 10, which are confused with anger and/or surprise (see [9]) in the combined audio and video condition, and
stimulus 9, which is confused with sadness in the audio alone condition. Again, the visual channel seems to be affected by the dynamics of the stimuli, being less informative than the audio. Again, some stimuli did not meet a high percentage of label agreement but are interesting since they can suggest which vocal and facial emotional features are used to decode a particular emotional state. Happiness appears to be more effectively transmitted by visual features for female stimuli and by vocal features for male stimuli, whereas the combined audio and video condition does not improve the recognition in either case. For happy stimuli portrayed by actresses, the combined audio and video condition shows a significant misrecognition compared to the video alone for stimuli 1, 4, and 5; whereas for happy stimuli portrayed by actors, the audio alone gives higher percentages of correct identification than the combined audio and video for stimuli 7, 8, 9, and 10. These results may suggest that the decoding procedure for happy emotional states is affected by gender and that the happy emotional features used in the decoding process are different for males and females. Furthermore, it appears that combining audio and video brings a loss of the happy emotional information, which requires precise vocal or visual emotional features according to the gender. For fear, the combined audio and video condition received more agreement than the audio alone and video alone conditions only for stimuli 1 and 6 (portrayed by a female and a male respectively), with significant differences in the video alone for stimulus 1 and in the audio alone for stimulus 6. Again, this shows that the emotional information is decoded through a nonlinear procedure that does not account for what could be called "emotional redundancy". It seems instead that adding unclear visual or vocal emotional information to a clear fearful stimulus reduces the amount of features to be exploited and confuses the subject's perception of the emotional state. Audio alone is more informative than video alone for fear stimuli portrayed by males (except for stimulus no. 6), and video alone works better for females (except for stimuli 1 and 3), denoting a gender preference that depends on the exploited channel. Anger is the emotional state that received the highest agreement in all conditions. For the combined audio and video it shows the same trend observed for the fear stimuli: only for stimuli 4 and 8 (portrayed by a female and a male respectively) do the combined channels reach a higher recognition performance than the single ones. However, when the single channels are analyzed in detail, it is clear that there is no gender preference in selecting the channel for decoding this emotional state. Stimuli 2 and 3, both portrayed by females, are those that received the lowest agreement for the assigned label. It is worth noticing that, under the combined audio and video condition, fear stimulus no. 8 and anger stimulus no. 6 received 0% agreement (the fear stimulus no. 8 was confused with sadness or received a label not listed, while the anger stimulus no. 6 was confused in 70% of the cases with fear), while the same stimuli received a high percentage of label agreement in the audio alone and video alone conditions. Surprise is the emotional state most difficult to recognize, since it received the lowest average percentage of label agreement in all conditions.
In addition, there seems to be no preferential channel to read it. It would be worth investigating
whether this result depends on the selected stimuli or on the specific emotion which, being so short in itself, may not be easily recognized when decoded through dynamic stimuli, so that stillness is preferred. The discussed data show an unexpected trend and lay the basis for new interpretations of the amount of perceptual information that is caught in the three different conditions. According to common sense, it should be expected that subjects provide the highest percentage of emotional labeling agreement in the combined audio and video condition. However, the experimental data and the ANOVA analysis provided in [9] did not support this hypothesis. The ANOVA analysis (reported in [9]) showed that, as a macroscopic tendency, there is no significant difference in perceiving emotional states between exploiting the audio alone and the combined audio and video, whereas the subjects' performance is less effective under the video alone condition. In particular, in the video alone condition the percentage of labeling agreement is significantly lower than that obtained in the audio alone and in the combined audio and video conditions, suggesting that dynamic visual emotional information can be less effective for detecting an emotional state than vocal emotional information. In addition, there is no significant difference between the amount of emotional information gathered from the audio alone and from the combined audio and video. The audio channel can be considered preferential, since it conveys the same information as the combined audio and video. In some cases it also seems able to resolve ambiguities produced by combining information derived from both channels: for example, the fear stimulus no. 8 and the anger stimulus no. 6 received 63% and 60% of label agreement, respectively, in the audio alone, but were not recognized in the combined audio and video condition. Even though not in a statistically significant manner, audio beats audio and video for anger, irony and fear, whereas happiness appears to be more effectively transmitted by the video alone than by the audio and by the audio and video together. The stimuli presented are part of a larger database of about 648 Italian stimuli (216 audio, 216 video and 216 audio and video); being the first such data for Italian, they can be of great utility in helping researchers both to develop new algorithms for vocal and facial expression recognition and to carry out cross-cultural comparisons of emotional decoding procedures. Such comparisons will lead to new mathematical models describing human interaction, thus permitting new approaches to the psychology of communication itself, as well as to the content of any interaction, independently of its overt semantic meaning.
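As an illustrative aside, the following sketch shows how a comparison of the three conditions could be run on agreement scores of the kind reported in Section 3. It is not the ANOVA analysis of [9]; the one-way test on the Table 1 values is only an example.

```python
# Illustrative sketch (not the analysis of [9]): one-way ANOVA comparing the
# per-stimulus agreement percentages obtained under the three conditions.
from scipy import stats

# Agreement percentages (%) for the ten sadness stimuli, copied from Table 1.
audio       = [77, 73, 57, 80, 73, 60, 83, 16, 87, 63]
video       = [73, 23, 43, 37, 73, 60, 60, 20, 47, 43]
audio_video = [67, 57, 80, 57, 100, 100, 100, 3, 100, 90]

f_stat, p_value = stats.f_oneway(audio, video, audio_video)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```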
Acknowledgements This work has been partially funded by COST 2102 “Cross Modal Analysis of Verbal and Nonverbal Communication”, http://cost2102.cs.stir.ac.uk/ and by Regione Campania, L.R. N.5 del 28.03.2002, Project ID N. BRC1293, Bando Feb. 2006. Acknowledgements go to Miss Tina Marcella Nappi for her editorial help.
References
[1] R. Banse, K. Scherer: Acoustic profiles in vocal emotion expression. Journal of Personality & Social Psychology 70(3) (1996), 614-636.
[2] K.L. Burns, E.G. Beier: Significance of vocal and visual channels in the decoding of emotional meaning. Journal of Communication 23 (1973), 118-130.
[3] J.T. Cacioppo, G.G. Berntson, J.T. Larsen, K.M. Poehlmann, T.A. Ito: The psychophysiology of emotion. In J.M. Lewis, M. Haviland-Jones (Eds.), Handbook of Emotions, 2nd edition, 173-191, New York: Guilford Press, 2000.
[4] P. Ekman, W.V. Friesen, J.C. Hager: The facial action coding system. Second edition. Salt Lake City: Research Nexus eBook. London: Weidenfeld & Nicolson, 2002.
[5] P. Ekman: Facial expression of emotion: New findings, new questions. Psychological Science 3 (1992), 34-38.
[6] P. Ekman: The argument and evidence about universals in facial expressions of emotion. In H. Wagner, A. Manstead (Eds.), Handbook of Social Psychophysiology, Chichester: Wiley, 143-164, 1989.
[7] P. Ekman, W.V. Friesen: Facial action coding system: A technique for the measurement of facial movement. Palo Alto, Calif.: Consulting Psychologists Press, 1978.
[8] P. Ekman, W.V. Friesen: Manual for the Facial Action Coding System. Palo Alto: Consulting Psychologists Press, 1977.
[9] A. Esposito: The amount of information on emotional states conveyed by the verbal and nonverbal channels: Some perceptual data. In Y. Stilianou et al. (Eds.): Progress in Nonlinear Speech Processing, Lecture Notes in Computer Science 4392, 245-264, Springer-Verlag, 2007.
[10] J. Graham, P.E. Ricci-Bitti, A. Argyle: A cross-cultural study of the communication of emotion by facial and gestural cues. Journal of Human Movement Studies 1 (1975), 68-77.
[11] C.E. Izard, B.P. Ackerman: Motivational, organizational, and regulatory functions of discrete emotions. In J.M. Lewis, M. Haviland-Jones (Eds.), Handbook of Emotions, 2nd edition, 253-264, New York: Guilford Press, 2000.
[12] C.E. Izard: Innate and universal facial expressions: Evidence from developmental and cross-cultural research. Psychological Bulletin 115 (1994), 288-299.
[13] C.E. Izard, L.M. Dougherty, E.A. Hembree: A system for identifying affect expressions by holistic judgments. Unpublished manuscript. Available from Instructional Resource Center, University of Delaware, 1983.
[14] C.E. Izard: The maximally discriminative facial movement coding system (MAX). Unpublished manuscript. Available from Instructional Resource Center, University of Delaware, 1979.
[15] K. Oatley, J.M. Jenkins: Understanding emotions. Oxford, England: Blackwell, 1996.
[16] K.R. Scherer: Vocal communication of emotion: A review of research paradigms. Speech Communication 40 (2003), 227-256.
[17] K.R. Scherer, R. Banse, H.G. Wallbott: Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology 32 (2001), 76-92.
[18] K.R. Scherer, R. Banse, H.G. Wallbott, T. Goldbeck: Vocal cues in emotion encoding and decoding. Motivation and Emotion 15 (1991), 123-148.
[19] K.R. Scherer: Vocal correlates of emotional arousal and affective disturbance. In H. Wagner, A. Manstead (Eds.), Handbook of Social Psychophysiology, New York: Wiley, 165-197, 1989.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-62
Face Verification Based on DCT Templates with Pseudo-Random Permutations

Marco GRASSI a, Marcos FAUNDEZ-ZANUY b,1
a Department of Biomedical, Electronic and Telecommunication Engineering, Università Politecnica delle Marche, Ancona, Italy
b Escola Universitària Politècnica de Mataró (Adscrita a la UPC), Mataró, Spain
Abstract. Biometric template security and privacy are a great concern for biometric systems because, unlike passwords and tokens, compromised biometric templates cannot be revoked and reissued. In this paper we present a protection scheme for a face verification system based on a user-dependent pseudo-random ordering of the DCT template coefficients and on MLP and RBF neural networks for classification. In addition to enhancing privacy, because a hacker can hardly match a fake biometric sample without knowing the pseudo-random ordering of this scheme, the proposed system also increases the biometric recognition performance. Keywords. Biometric recognition, security, privacy, face recognition.
Introduction

Biometric template security is an important issue because, unlike passwords and tokens, compromised biometric templates cannot be revoked and reissued. Thus, there is a strong interest in the possibility of cancelling and replacing a given piece of biometric data when it is compromised. If a biometric is lost once (illegally acquired by a hacker), it is compromised forever; hence this information may need to be protected for a very long time [1]. An ideal biometric template protection scheme should possess the following properties [2]:
1. Diversity: the secure template must not allow cross-matching across databases, thereby ensuring the user's privacy.
2. Revocability: it should be straightforward to revoke a compromised template and reissue a new one based on the same biometric data.
3. Security: it must be computationally hard to obtain the original biometric template from the secure template. This property prevents a hacker from creating a physical spoof of the biometric trait from a stolen template.
4. Performance: the biometric template protection scheme should not degrade the recognition performance (identification or verification rates).
For the next years, we are evolving towards ambient intelligence, pervasive networking or ubiquitous computing, which have special characteristics [1].
1 Corresponding Author: Marcos Faundez-Zanuy, Escola Universitària Politècnica de Mataró, Avda. Puig i Cadafalch 101-111, 08303 Mataró (Barcelona), Spain. E-mail: [email protected]
In addition, there is a gradual erosion of the computational difficulty of the mathematical problems on which cryptology is based, due to developments in computation (progress in electronics and, in the future, in optical and maybe even quantum computing). This increases the vulnerability of biometric systems [3]. Encryption is not a smooth function [2], and a small difference in the values of the feature sets extracted from the raw biometric data leads to a very large difference in the resulting encrypted features. While it is possible to decrypt the template and perform matching between the query and the decrypted template, such an approach is not secure because it leaves the template exposed during every authentication attempt. Thus, standard encryption techniques are not useful for securing biometric templates. The solutions proposed in the literature can be split into two categories [3]:
• Feature transformation.
• Biometric cryptosystems.
We describe these categories in the next sections.
1. Feature Transformation

A transformation function Y = F(X) is applied to the biometric information and only the transformed template is stored in the database. The parameters of the transformation function are typically derived from a random key or password. The same transformation function is applied to the test signal and the transformed query is directly matched against the transformed template. Feature transformation can be divided into salting and non-invertible transforms. In salting, Y = F(X) is invertible: if a hacker knows the key and the transformed template, he can recover the original biometric template, and the security is based on the secrecy of the key or password. This is the only approach that requires a piece of secret information (a key); this is not necessary in the other categories. The second group is based on non-invertible transformation systems. They apply a one-way function to the template, and it is computationally hard to invert a transformed template even if the key (transformation function) is known. The main drawback of this approach is the trade-off between the discriminability and the non-invertibility of the transformation function: transformed features belonging to the same user should have high similarity after transformation, while features from different users should be quite different. While our intuition seems to suggest that it is very easy to design a function that is "easy" to compute but "hard" to invert, so far the best theoretical results only prove that there exist functions that are twice as hard to invert as to compute [4]. It is clear that such functions would be completely useless for practical cryptology [1].
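As a purely illustrative aside (not from the paper), the following sketch contrasts the two kinds of feature transformation just described: a key-seeded permutation as an example of invertible salting, and a dimensionality-reducing projection as an example of a non-invertible transform. Both transforms are arbitrary choices made only for illustration.

```python
# Illustrative contrast (not from the paper): an invertible "salting" transform
# versus a simple non-invertible transform of a toy feature vector.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=8)                        # toy biometric feature vector

# Salting: key-seeded permutation, invertible if (and only if) the key is known.
key = 2024
perm = np.random.default_rng(key).permutation(x.size)
salted = x[perm]
recovered = np.empty_like(salted)
recovered[perm] = salted                      # inverse permutation
print(np.allclose(recovered, x))              # True: invertible with the key

# Non-invertible transform: project onto fewer dimensions; information is lost,
# so many different x map to the same y and x cannot be uniquely recovered.
A = rng.normal(size=(4, 8))
y = A @ x
print(y)
```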
2. Biometric Cryptosystems

In this approach some public information about the biometric template is stored. This public information is called helper data, and for this reason these are also known as helper-data-based methods. While the helper data does not reveal any significant information about the original biometric template, it is needed during matching to extract a cryptographic key from the input signal. Matching is performed indirectly by verifying
the correctness of the extracted key. In order to cope with intra-user variations, error-correction coding techniques are usually employed. Biometric cryptosystems can also be split into two groups, key binding and key generation systems, depending on how the helper data is obtained. When the helper data is obtained by binding a key (that is independent of the biometric features) with the biometric template, the scheme is known as a key-binding biometric cryptosystem. In the second case, the helper data is derived only from the biometric template and the cryptographic key is directly generated from the helper data and the query biometric features; this second approach is known as a key-generation biometric cryptosystem [5]. Authentication in a key-binding biometric cryptosystem is similar, except that the helper data is a function of both the template and the key K. Hybrid techniques combining the previous four approaches are also possible.
3. Pseudo-Random Permutations

In this paper, in order to secure the templates, we use a different random permutation of the template coefficients for each person. Thus, our proposal corresponds to the salting feature transformation described in Section 1. Figure 1 shows the diagram of the proposed approach. The template coefficients are the DCT components of the two-dimensional transform of the face image [6]. The permutation order is different for each person (although more than one permutation per person is possible) and it is given by a key, which must be kept secret. The advantages of this strategy are the following:
• There is an increase in privacy, because it is impossible (computationally very hard) to obtain the face image without knowing the permutation order. If a feature vector template has N coefficients, the number of permutations of these coefficients is equal to N!.
• There is an improvement in the recognition rates, because an impostor does not know the correct permutation order. This is similar to the privacy achieved by CDMA (Code Division Multiple Access), used in some mobile telephone standards to secure the communications: if you do not know the correct order (provided by the pseudo-random frequency-hopping order) you cannot decode the message. Obviously, if the impostor knows the permutation order he can sort his/her feature vector, and then the protection is equal to that of the biometric system without permutation. Anyway, it is as difficult to get the key as in other security systems (VISA number, password, PIN, etc.).
• In contrast to some encryption systems, where the template must be decrypted before comparison, in our approach there is no need to re-order the coefficients: they can be directly compared.
Figure 1. Scheme of the proposed system. In the enrollment phase, the coefficients of the biometric template X_n are permuted according to the secret key K, and the permuted template is stored in the database. In the test phase a noisy version Y_n of the biometric template is acquired, and the user provides his/her key; the permuted template F(V) is matched against the stored template of the claimed identity. If distance(F(V), F(S)) < threshold, the user is accepted; otherwise he/she is rejected.
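The following sketch is a minimal illustration of the scheme of Figure 1, not the authors' implementation: it assumes a grey-level face image, keeps the 10x10 low-frequency block of the 2-D DCT as the template, applies a key-seeded permutation (salting) and, instead of the MLP/RBF classifiers used later in the paper, takes the simple Euclidean distance threshold of the figure. The image sizes, key, noise level and threshold are arbitrary.

```python
# Illustrative sketch (not the authors' implementation): 10x10 DCT template,
# key-seeded permutation (salting) and direct matching in the permuted domain.
import numpy as np
from scipy.fft import dctn

def dct_template(face, n=10):
    """2-D DCT of a grey-level face image; keep the n x n low-frequency block."""
    return dctn(face.astype(float), norm="ortho")[:n, :n].ravel()

def permute(template, key):
    """Salting: reorder the coefficients with a permutation derived from the key."""
    order = np.random.default_rng(key).permutation(template.size)
    return template[order]

def verify(stored_permuted, probe_face, claimed_key, threshold):
    probe = permute(dct_template(probe_face), claimed_key)
    return np.linalg.norm(stored_permuted - probe) < threshold  # accept/reject

# Enrollment and test with a synthetic 64x74 image (stand-in for an AR face).
rng = np.random.default_rng(0)
enrol_face = rng.integers(0, 256, size=(74, 64))
user_key = 123456
database_entry = permute(dct_template(enrol_face), user_key)

noisy_probe = enrol_face + rng.normal(0, 2, size=enrol_face.shape)
print(verify(database_entry, noisy_probe, claimed_key=user_key, threshold=50.0))
```

Because the comparison is carried out directly on the permuted coefficients, nothing has to be "decrypted" at test time; an impostor using the wrong key obtains a template in the wrong order and hence a much larger distance.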
Although this is a very simple approach, it has a main advantage: the experimental results are always better than or equal to those of the baseline biometric system without encryption. This is not the case for a large majority of the existing systems in the literature, where intra-user variability is hard to manage and causes an increase in the False Rejection Rate.
4. Experimental Results for the Face Recognition System

The proposed protection scheme has been applied to an identity verification system based on biometric face recognition [7]. In the system, the DCT has been used for feature extraction and neural networks (both MLP and RBF) have been used for classification [8]. Figure 2 shows some snapshots of one user of this database. The AR database [9], used for the experiments, is a publicly available database (http://cobweb.ecn.purdue.edu/RVL/ARdatabase/ARdatabase.html) of 126 individuals, with 26 images of each, taken in two different sessions two weeks apart, varying the lighting and the facial expression. We have used 10 of the 26 images (5 images as enrollment templates and 5 different ones for testing the system), excluding the overexposed ones and those in which the face was partially occluded by sunglasses or scarves, for 117 of the 126 individuals, the data of the remaining individuals being either incomplete or unavailable. All the images have been cropped and normalized to 64x74 grey-level images.
Figure 2. AR database samples of one person
We have used as verification error the minimum Detection Cost Function (DCF), to deal with the trade-off between the two different kinds of possible errors in verification: missed detections (situations where a genuine user is incorrectly rejected) and false alarms (situations where an impostor is accepted), a trade-off which usually has to be established by adjusting a decision threshold. The DCF is defined by [10]:
DCF = C_Miss · P_Miss · P_Target + C_FalseAlarm · P_FalseAlarm · (1 − P_Target)
where C_Miss and C_FalseAlarm represent respectively the cost of a missed detection and of a false alarm, P_Miss and P_FalseAlarm represent respectively the miss and the false alarm probabilities, and P_Target represents the a priori probability that the target is known. An error-type weighting of 1:1 and an equal probability that the target be known or unknown (C_Miss = C_FalseAlarm = 1; P_Target = 0.5) have been chosen. The DCT (Discrete Cosine Transform) has been used for feature extraction, obtaining one model vector from each training image. A vector dimension of N'xN' = 10x10 = 100 coefficients has been chosen in order to grant at the same time fast computation (the number of coefficients corresponds to the number of input neurons of the neural networks used as classifiers), good recognition performance and security through a very large number of possible permutations (100! is more than 10^157 permutations). Experiments have been carried out to compare the performances with and without a pseudo-random permutation of the DCT coefficients, using MLP and RBF neural networks, as follows.

4.1. Single Multi-Layer Perceptron (MLP)

A three-layer perceptron with a variable number of hidden neurons has been used in the simulations, with a scaled conjugate gradient back-propagation algorithm based on conjugate directions, setting up the following parameters: number of epochs = 15000; input neurons = 100; hidden layer neurons = 10 : 10 : 150; output layer neurons = 117 (one for each person); performance function: regularized mean square error (MSEREG) [11].
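As an illustrative aside (not the authors' code), the following sketch computes the minimum DCF defined above, with C_Miss = C_FalseAlarm = 1 and P_Target = 0.5, from hypothetical genuine and impostor score lists; the score distributions are invented for the example.

```python
# Illustrative sketch (not the authors' code): minimum DCF with CMiss = CFA = 1
# and PTarget = 0.5, swept over candidate decision thresholds.
import numpy as np

def min_dcf(genuine_scores, impostor_scores,
            c_miss=1.0, c_fa=1.0, p_target=0.5):
    scores = np.concatenate([genuine_scores, impostor_scores])
    best = np.inf
    for t in np.unique(scores):                  # candidate thresholds
        p_miss = np.mean(genuine_scores < t)     # genuine users rejected
        p_fa = np.mean(impostor_scores >= t)     # impostors accepted
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, dcf)
    return best

# Hypothetical similarity scores (higher = more likely the claimed identity).
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 500)
impostor = rng.normal(0.0, 1.0, 5000)
print(f"minimum DCF: {min_dcf(genuine, impostor):.3f}")
```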
The results in Figure 3 show that the application of the pseudo-random permutation leads to a significant improvement of the system performance, with the verification error going to 0 when more than 100 neurons are used.
Figure 3. Verification rates using an MLP-NN, as a function of the number of neurons, with 100 DCT coefficients, with and without pseudo-random permutation.
4.2. Radial Basis Function Neural Network

Radial Basis Function neural networks can require more neurons than standard feed-forward back-propagation networks, but they can be trained in a fraction of the time needed by standard feed-forward networks. In the simulations an RBF-NN with Gaussian activation functions has been used, applying the same training methodology as for the MLP neural network and setting up the following parameters:
• RBF neurons: 10 : 10 : 200
• Output layer neurons: 117
• Spread: 1.5 : 0.25 : 4
Also in the case of the RBF-NN (Figure 4), the application of the permuted templates leads to a significant performance improvement when more than 40 neurons are used, completely avoiding verification errors for more than 110 neurons. Interesting results also emerge from a comparison of the training times with and without the pseudo-random permutation of the DCT coefficients (Figure 5). For the RBF-NN the permutation of the coefficients does not lead to training time differences. In the case of the MLP-NN, while for a low number of neurons the use of the coefficients without permutation requires a slightly shorter training time, for more than 80 neurons the permutation of the coefficients leads to a great improvement in terms of computational time.
Figure 4. Verification rates using an RBF-NN, as a function of the number of neurons, with 100 DCT coefficients, with and without pseudo-random permutation.
Figure 5. Training times for the MLP and the RBF (average time over the spread values), as a function of the number of neurons, with 100 DCT coefficients, with and without pseudo-random permutation.
5. Conclusions

In this paper we have presented an authentication system for face identity verification that uses a simple approach for biometric template protection belonging to the salting feature transformation category. The only drawback of this approach is that it requires a secret key: if the key is compromised, the biometric template can be exactly recovered. Nevertheless, cracking the secret key is at least as difficult as obtaining the VISA number, PIN, or password of a classical security system, taking into account that, if the key is kept secret, the number of different permutations equals 100!, which is a very large value for typical face template sizes. In addition to the security enhancement, there is always an improvement in the biometric verification errors, which are dramatically reduced both for the MLP and the RBF neural network, and in the training time for the MLP neural network. This is not the
case for a large majority of the methods proposed in the literature, where privacy implies a degradation of the biometric performance (the FRR is increased) and of the computational performance.
6. Acknowledgements This work has been supported by the Spanish project MEC TEC2006-13141-C03/TCM and COST-2102.
References
[1] B. Preneel, ENCRYPT: the Cryptographic research challenges for the next decade. In Security in Communication Networks, 4th International Conference, SCN 2004, Lecture Notes in Computer Science 3352, C. Blundo and S. Cimato Eds., Springer-Verlag, pp. 1-15, 2005.
[2] A.K. Jain, K. Nandakumar, A. Nagar, Biometric template security. Eurasip Journal on Advances in Signal Processing, Special Issue on Biometrics, pp. 1-20, January 2008.
[3] M. Faundez-Zanuy, Biometric security technology. IEEE Aerospace and Electronic Systems Magazine, 21 (6), pp. 15-26, June 2006.
[4] A.P.L. Hiltgen, Constructions of feebly-one-way families of permutations. Advances in Cryptology, Proc. Auscrypt'92, LNCS 718, J. Seberry and Y. Zheng Eds., Springer-Verlag 2001, pp. 422-434.
[5] P. Tuyls, J. Goseling, Capacity and examples of template-protecting biometric authentication systems. In Biometric Workshop (BioAW 2004), Lecture Notes in Computer Science 3087, pp. 158-170, Prague, 2004.
[6] N.K. Ratha, S. Chikkerur, J.H. Connell, R.M. Bolle, Generating cancellable fingerprint templates. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29 (4), pp. 561-572, April 2007.
[7] A.F. Abate, M. Nappi, D. Riccio, G. Sabatino, 2D and 3D Face Recognition: A Survey. Pattern Recognition Letters, 4, December 2006.
[8] M. Grassi, M. Faundez-Zanuy, Face Recognition with Facial Mask Application and Neural Networks. Proceedings of IWANN 2007, LNCS 4507, Springer-Verlag, pp. 701-708.
[9] A.M. Martinez, Recognizing Imprecisely Localized, Partially Occluded, and Expression Variant Faces from a Single Sample per Class. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (6), pp. 748-763, June 2002.
[10] A. Martin, G. Doddington, T. Kamm, M. Ordowski, M. Przybocki, The DET curve in assessment of detection task performance. European Speech Processing Conference (Eurospeech 1997), Vol. 4, pp. 1895-1898.
[11] M. Faundez-Zanuy, Face Recognition in a Transformed Domain. IEEE Proceedings 37th Annual International Carnahan Conference on Security Technology, 2003, pp. 290-297.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-70
A Real-time Speech-interfaced System for Group Conversation Modeling
Cesare Rocchi, Emanuele Principi, Simone Cifani, Rudy Rotili, Stefano Squartini and Francesco Piazza
3MediaLabs, DIBET, Università Politecnica delle Marche, Via Brecce Bianche 1, 60131 Ancona, Italy
Abstract. In this paper, a speech-interfaced system for fostering group conversations is presented. The system captures conversation keywords and shows visual stimuli on a tabletop display. A stimulus can be a feedback to the current conversation or a cue to discuss new topics. This work describes the overall architecture and details the design choices, with a particular focus on real-time implementation issues. A suitable speech enhancement front-end and a keyword spotter have been integrated on a common software platform for real-time audio processing, namely NU-Tech, resulting in a flexible architecture for real-world applications in group conversation modeling scenarios. These characteristics, together with some experimental results obtained from simulations on recorded speech data, seem to confirm the efficacy of the approach, motivating the development of further features and experimentation in new scenarios.
Keywords. Conversation modeling, tabletop, keyword spotting.
Introduction

Speech recognition technology has reached a level of reliability that allows it to be employed in heterogeneous scenarios, from consumer electronics to industrial applications. In tabletop scenarios, the use of speech has been investigated by only a few contributions: DiMicco and colleagues [1] identified the role of real-time visual feedback on tabletop surfaces. A similar study has been conducted by Sturm et al. [2]. The use of speech recognition in a brainstorming scenario was investigated by Hunter and Maes [3]. In this paper, the application of a speech-based human-machine interface in a group scenario is proposed. The group is supposed to sit around a table, onto which graphical elements like words and pictures are projected. The display of graphical elements is aimed at enriching the experience of the group during the conversation. By capturing conversation keywords via an appropriate speech interface [11], the application can propose stimuli (e.g. floating words and pictures) to foster or support a conversation. In a previous work [4], a Wizard of Oz study was conducted to investigate the impact of displaying visual stimuli related to the conversation on a tabletop surface. Results showed that people recognize visual stimuli as related to the current discussion. People also sometimes refer to images or words during the conversation itself. This encouraged us to develop a system that automatically tracks the topic of the conversation to inform the choice of visual stimuli to be displayed on the tabletop. In [5], the use of a keyword spotter was introduced. This paper focuses on the architecture of the system and its implementation. We illustrate the modules that compose the system and provide a detailed description of each one, along with the related design choices. The design was driven mainly by two criteria: non-invasiveness and effectiveness. The first criterion influenced the choice of the presentation medium, the tabletop, and of a recognition module based on an array of distant-talk microphones. In this way users sit around the table, which is a familiar object, and are not forced to wear microphones to interact with the system. The second criterion, effectiveness, influenced the implementation of the overall system, which has to work in real time to allow quick reactions to changes during the conversation (e.g. pauses, topic shifts, etc.). We provide details about early experimentation and the implementation of the modules, and finally we introduce new scenarios suitable for the application of such a system.

Figure 1. System Architecture.
1. Description of the system

The overall system is designed along three layers: Perception, Interpretation and Presentation (Figure 1). The Perception layer is meant to capture the ongoing situation around the table. The Presence Detector in the Perception layer determines the states of the system. Three states are considered relevant to model:
• Presence: detects whether people are sitting around the table.
• Ordering: detects whether people are talking to the waiter to order beverages.
• Conversation: detects whether people are having a chat.
If people are not sitting around the table, the system can be set into an 'idle' modality, e.g. a sort of screensaver which tries to convince people to stop by. While people are ordering it is important not to distract them, so speech recognition can be turned off. As an alternative, in future versions, ordering could be done by interacting with the table itself. The last state is the most important, for it enables the perception of the content shared by the users during the conversation.

1.1. Interpretation Layer

The Interpretation module is concerned with the detection of the conversation occurring in the group. Since the application is real time, we opted for a shallow semantic representation. Although an ontology would enable richer reasoning on the conversation, the choice of a shallow representation was made for two reasons: (i) quick processing and (ii) easier portability to new domains. The knowledge base exploited to track the conversation is organized as a semantic map, which we devised to structure the content of our application domain. The semantic map is represented in terms of topics. Each topic is associated with a subset of the keywords taken from the grammar and has a relevance value. Relevance is an attribute dynamically updated by the interpretation module to track the topic of the ongoing conversation. At the beginning, all topics have relevance equal to zero. An excerpt of a topic is represented below.

topic id : 001
keywords : key1, key2, ...
relevance: 0

This organization of the knowledge allows inferring relations between topics, which we calculate by intersecting the keywords of each topic. The bigger the set of keywords shared by two topics, the tighter the semantic relation between them. We exploit such information both during the interpretation of the conversation and during the selection of the stimuli to be shown on the tabletop interface. The interpretation module updates the semantic map according to the correctly recognized keywords: if a topic contains a correctly detected keyword, its relevance is increased by one. This mechanism updates the model of the conversation only on its "positive" side, that is, when people are talking about something related to the domain. We are also interested in tracking when the conversation is "out of topic". To enable such functionality we integrated a decrease function, which is run periodically and decreases by 0.1 the relevance of each topic whose relevance is greater than zero. Table 1 represents an example of processing in the interpretation module. The left column represents temporal moments, the middle one contains the actions performed by the system and the third one shows the variation of the relevance value of a topic. At the beginning, t0, the relevance is zero. At t1 'key1' is detected and the relevance of the topic is raised to 1. At t2, t3 and t4 the effect of the decrease function is shown, as the relevance is progressively reduced. Another keyword is detected at t5 and the relevance goes up to 1.7. At t6 the decrease function reduces the relevance value again. This mechanism enables a fine-grained tracking of the conversation's topics, which can be exploited by the presentation layer to select stimuli according to the detected context.
Table 1. An excerpt of the Interpretation process.

Time   Action               Relevance value
t0     start                0
t1     detected key1        1
t2     decrease function    0.9
t3     decrease function    0.8
t4     decrease function    0.7
t5     detected key2        1.7
t6     decrease function    1.6
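A minimal Python sketch of this bookkeeping follows (our illustration, not the authors' code; the topic contents and the update amounts are taken from the description above, everything else is an assumption):

# Topic 001 contains key1 and key2, as in the excerpt and in Table 1.
topics = {
    "001": {"keywords": {"key1", "key2"}, "relevance": 0.0},
}

def on_keyword_detected(keyword):
    # A correctly detected keyword adds 1 to every topic containing it.
    for topic in topics.values():
        if keyword in topic["keywords"]:
            topic["relevance"] += 1.0

def decrease_step():
    # Run periodically: subtract 0.1 from every topic whose relevance is above zero.
    for topic in topics.values():
        if topic["relevance"] > 0.0:
            topic["relevance"] = round(topic["relevance"] - 0.1, 1)

# Reproducing Table 1: key1 at t1, three decrease steps, key2 at t5, one more step.
on_keyword_detected("key1")              # relevance -> 1.0
for _ in range(3):
    decrease_step()                      # -> 0.9, 0.8, 0.7
on_keyword_detected("key2")              # -> 1.7
decrease_step()                          # -> 1.6
print(topics["001"]["relevance"])        # 1.6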
1.2. Presentation Layer

The Presentation module dynamically selects stimuli according to the status of the conversation. The primary resource analyzed by this module is the conversation model. The first check performed is related to the conversational state. If the relevance of at least one topic is greater than a given threshold, the conversation is on topic, that is, people are discussing something related to the domain of the application. If the relevance of every topic is below the threshold, the conversation state is "out of topic", that is, users are talking about something unrelated to the domain. Another resource exploited by the presentation module is a repository of stimuli. By stimulus we mean a graphical object (e.g. word, text, picture) that floats on the tabletop surface. Each stimulus is classified in terms of topics and keywords and can be dynamically selected to support the current conversation or foster the discussion of new topics. An example of stimulus classification is represented below.

stimulus id : stim01
type : image
source : image.jpg
topics : topic01
keywords : key01, key04, key05

stimulus id : stim02
type : word
value : "word sample"
topics : topic02, topic03
keywords : key02, key03, key05

The first stimulus is of type image and is associated with topic01 and three keywords. The second stimulus is a word, associated with two topics and three keywords. The choice of the stimuli to be displayed varies according to the context of the conversation. During some tests with a random selection of stimuli we discovered that some inconvenient situations are to be avoided. In particular, when more than one topic is relevant to the conversation, the surface of the table can be crowded with too many stimuli. During early experiments we therefore decided to limit the number of stimuli shown on the table to six; a sketch of the resulting selection logic is given below.
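The following Python sketch illustrates this selection step (ours, not the authors' code; the threshold value, the data layout and the history-based filtering anticipated in the next paragraph are assumptions):

MAX_ON_TABLE = 6
RELEVANCE_THRESHOLD = 0.5        # hypothetical on-topic threshold

stimuli = [
    {"id": "stim01", "type": "image", "source": "image.jpg",
     "topics": {"topic01"}, "keywords": {"key01", "key04", "key05"}},
    {"id": "stim02", "type": "word", "value": "word sample",
     "topics": {"topic02", "topic03"}, "keywords": {"key02", "key03", "key05"}},
]

def select_stimuli(relevance, history):
    # Topics whose relevance exceeds the threshold are "on topic".
    active = {t for t, r in relevance.items() if r > RELEVANCE_THRESHOLD}
    # Candidate stimuli share at least one active topic and were not shown recently.
    candidates = [s for s in stimuli
                  if s["topics"] & active and s["id"] not in history]
    # At most six stimuli are placed on the table at the same time.
    return candidates[:MAX_ON_TABLE]

# Example: topic01 is currently being discussed, nothing has been shown yet.
print(select_stimuli({"topic01": 1.2, "topic02": 0.0}, history=set()))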
Figure 2. Speech Processing Unit scheme.
Figure 3. Speaker Diarization scheme.
Besides random selection we also implemented a different strategy, which exploits data about the history of the presentation to avoid the repetitive display of the same stimuli. A third resource used by the presentation module is the history of the conversation, that is, the list of the topics already discussed. This allows reasoning on the progression and timing of the conversation. For example, the system can detect that a topic has already been extensively discussed and might propose new stimuli to move the conversation to a new subject.

1.3. Speech Processing Unit

While conversation around the table occurs, speech signals from all the speakers are acquired and go through three processing blocks: the Multichannel Front-End, the Speaker Diarization block, and the Keyword Spotter (Fig. 2). The main purpose of the Multichannel Front-End is noise reduction. Since the proposed application is meant to be used in a museum bar, audio interferences such as babble noise can degrade the performance of the recognizer. As shown in [6], speech recognition performance benefits significantly from the use of multi-microphone noise reduction algorithms as a pre-processing stage. The Speaker Diarization block detects the presence of crosstalk and overlapping speech in the input signals [7]. Such information is used to control the Speech Recognizer: for example, the Speaker Diarization block could stop the recognition when overlapping occurs to avoid possible recognition errors. As shown in Fig. 3, each input frame is classified as "speaker alone" (S), "speaker plus crosstalk" (SC), "crosstalk" (C) or "silence" (SIL). Such classification is performed in two main processing stages:
• a feature extraction stage, in which meaningful information is extracted from the speech signal (e.g. energy, kurtosis, cross-channel correlation);
• a classification stage, in which one eHMM per channel classifies the input frames and constraints are dynamically applied to avoid illegal combinations (e.g. more than one channel classified as "S").

1.4. Keyword spotting

Keyword spotting is exploited instead of a full transcription because the goal is to detect only selected keywords related to the domain. One option for performing keyword spotting is the use of a speech recognizer which integrates a garbage model [8] – a specially trained Hidden Markov Model which captures non-keyword events – in its acoustic model. The speech recognizer is also required to be speaker independent, since it must recognize multiple speakers. The speech recognition engine adopted was VoCon 3200 by Nuance [10], a speaker-independent phoneme-based continuous speech recognizer. The output of the recognizer is the list of N-best spotted keywords sorted by their confidence, a measure which expresses the reliability of the recognition result. It is a normalized score in the range [0, 10000], where 0 indicates very little confidence. In our implementation, only the best scoring keyword is used, provided its confidence exceeds a predefined threshold. Early experiments with the system showed that performance is negatively affected by false acceptances. This led us to choose the threshold for which the false acceptance rate is equal to 0%. Keywords are specified by supplying the recognizer with a grammar in Backus-Naur Form, along the following scheme:
<start>: <...> <keyword> <...>;
<keyword>: word1 | word2 | ... | wordN;
where the symbol <...> denotes the garbage model and the <keyword> rule specifies the keyword list to be recognized. To further improve the performance of keyword spotting, the grammar was augmented with antikeywords, that is, out-of-vocabulary (OOV) words acoustically similar to the keywords to be recognized.

1.5. Real-time implementation

Real-time implementations of the multichannel Front-end and of the keyword spotting stage have been carried out with the NU-Tech framework [9]. NU-Tech allows the developer to concentrate on the algorithm implementation without worrying about the interface with the sound card. The ASIO protocol is also supported to guarantee low latency. The NU-Tech architecture is plug-in based: an algorithm can be implemented in C++ to create a NUTS (NU-Tech Satellite) that can be plugged into the graphical user interface. Inputs and outputs can be defined and connected to the sound card inputs/outputs or to other NUTSs. The use of a commercial ASR motivates us to perform the necessary noise reduction operations by means of a preliminary stage, namely the multichannel Front-end, as shown in Fig. 4. The multichannel Front-end consists of a collection of algorithms for noise reduction purposes joined together by a highly flexible and scalable software architecture, which takes the form of a computer library written in C++.
Table 2. List of available Front-end algorithms. For a better description of the involved algorithms please see [6] and references therein.

Beamforming:               D-GSC, TF-GSC
RTF Estimation:            ATF ratio, RTF Offline, RTF Online
Speech Enhancement Filter: LSA, OM-LSA, Psychoacoustic MMSE, M.channel OM-LSA, M.channel Psychoacoustic MMSE
Noise Estimator:           MCRA, IMCRA, RL-MS
Two basic stages can be recognized in the system, the first hosting a beamformer and the second hosting an adaptive filter; they can be used in a cascade scheme or separately, without restrictions on the type of algorithms. A configuration file allows the user to define the system and specify its parameters. The options currently available are listed in Table 2. The Front-end provides both a Matlab and a NU-Tech interface: the former, together with other scripts, is specifically designed for massive simulations, while the latter allows for real-time processing. The computational load strictly depends on the front-end configuration and parameters, such as sampling frequency and number of channels, and may range, for example, from 3-4% CPU load for a single-channel speech enhancement filter up to 15-20% for a complete 5-channel configuration, assuming a sampling rate of 16 kHz and an Intel Core2 Duo 2.2 GHz processor. Regarding the Speaker Diarization module, the training of the classification algorithm parameters is typically performed offline on a suitable audio database, whereas the classification itself is performed in real time. Feature extraction and classification algorithms are not currently integrated in a NUTS, but future work will address this point. Currently, the algorithms are implemented in Matlab and experiments showed that the classification speed fulfills the real-time constraints. The planned plug-in will take as inputs the speech from each channel, and the detection results will be used to control the keyword spotting module. The NUTS integrating VoCon 3200 has two inputs (Fig. 4): the first input carries the speech audio signal; the second input can be used by an external Voice Activity Detector to control the recognition. In the current implementation, such input is not exploited and the VoCon 3200 internal VAD is used instead. The implemented plug-in processes the speech signal frame by frame. The frame dimension can be specified in the NU-Tech settings page. It is advisable to supply VoCon with frames whose size is a multiple of its internal frame size, which results in a slightly more efficient execution. While feeding the engine with the incoming frames, multiple events can occur: for example, a VAD event (begin of speech, end of speech), a gain request event, or a result event. In the latter case, a result is produced and an output string with an associated confidence value is obtained. The output string is sent via TCP/IP to a remote client implementing the Interpretation layer algorithms. The string is of the form "keyword,confidence", where "keyword" is the recognized keyword and "confidence" is the associated confidence value used to discard or accept the recognition. The VoCon 3200 configuration parameters are exposed in the NU-Tech user interface. Additional parameters specific to the NUTS implementation (e.g. the client IP address) are also configurable through the user interface.
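A small Python sketch of this last step follows (our illustration only; the actual NUTS is written in C++ against the VoCon API, and the threshold value, host and port used here are assumptions):

import socket

CONFIDENCE_THRESHOLD = 6000       # hypothetical value in the 0..10000 range

def forward_best_result(nbest, host="127.0.0.1", port=5000):
    """nbest: list of (keyword, confidence) pairs sorted by decreasing confidence."""
    if not nbest:
        return
    keyword, confidence = nbest[0]            # only the best-scoring keyword is used
    if confidence < CONFIDENCE_THRESHOLD:
        return                                # rejected to avoid false acceptances
    message = "{},{}".format(keyword, confidence)     # "keyword,confidence" string
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode("utf-8"))         # sent to the Interpretation layer

# Example call (requires a listening Interpretation-layer client):
# forward_best_result([("museum", 7200), ("music", 5100)])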
Figure 4. Front-end and VoCon 3200 based keyword spotter in the NU-Tech graphical user interface.
Figure 5. Acquisition setup: every speaker's mouth was about 40 cm from the microphone in front of it.
2. Experiments

Experiments were carried out on speech data acquired at the 3MediaLabs audio laboratories. The acquisition consisted of an informal chat among four persons (three males and one female) arranged around a rectangular table (Fig. 5). To obtain a natural conversation, the four persons were asked to read a newspaper article about politics and then to discuss it together. Keyword spotting was performed on a 5 minute and 20 second long extract of the chat. The acoustic model adopted was the "full 16 kHz in car" model, optimized for car environments,
but with good performance in office environments too [10]. Keyword selection was made according to the topic of the chat. A total of 48 distinct keywords were chosen for recognition, and 5 antikeywords for each of the 48 keywords were added to the grammar. The VoCon engine was tuned via two features: the VAD parameters and the garbage model "sensitivity". The VAD was set up to detect the end of speech, with a 200 ms trailing silence value. The garbage sensitivity parameter determines how easily the garbage model matches the speech signal, and lies in the range [0, 100]. The "optimal" value is strictly related to the grammar and to the content of the utterances. Before the implementation of the system, a Wizard of Oz study had shown that people have a positive reaction to this novel scenario [4]. An especially encouraging aspect was the result, obtained via qualitative interviews, that people sometimes feel that the table is "following" the conversation. An informal analysis of the data collected during the WOZ study led us to define 3-4 correctly recognized keywords per minute as a sufficient input for the Interpretation layer to determine the topic of the conversation and to inform the choice of stimuli. Another research result, based on a partially implemented version of the system, confirms this observation [5]: when the keyword spotter provides 3-4 correct keywords per minute (or no keywords when out of topic), the Interpretation layer correctly detects the topic and the Presentation layer shows stimuli related to the topic currently discussed. As a result, people recognize the stimuli as "connected" to the subject of the conversation. The measured performance of the keyword spotter matches the requirement of the Interpretation layer to correctly detect the topic of the conversation. In fact, the proposed system is able to recognize almost 5 correct keywords per minute, with a false acceptance rate equal to 0% [11].
3. Discussion

There is a growing interest in tabletop surfaces and in innovative uses of such technology. The main focus of the tabletop community is on interaction, that is, the way people manipulate the content displayed on the surface. The goals of this research field are to achieve a natural interaction and to find the appropriate ways to use such technology to solve problems related to group scenarios. For example, in a meeting at the workplace the tabletop can provide feedback to the participants or help to organize the agenda and keep track of the decisions made. Researchers in the field of speech recognition are focused on a similar goal, that is, to improve interaction by enabling human-machine communication through one of the most natural interaction means, speech. One of the goals pursued by the speech community is to enable machines to receive commands and to recognize them in different contexts and situations. Recently the focus of the research has moved to mobile devices, which are quickly spreading worldwide and are used daily in many contexts. In this paper we described our attempt at coupling speech and tabletop technologies in a group scenario, to provide context-aware feedback to the group via the tabletop. Although we focus on a scenario where people are sitting around a table, we also foresee contexts where the group is standing in front of a wall and the feedback is projected directly on it. In both cases the key requirement is that the technology is not invasive, for the scenario has to be as natural as possible. Group is a vague concept. A group can range from two to many individuals, who might know each other. In this scenario we focus on groups whose members know each
other and share some past experience, like friends, families or workgroups. To provide the "best" content according to the current situation, the concept of context also has to be specified. For example, in the context of a meeting at the workplace, the needs of the group can be: to stick to the agenda, to keep track of the decisions made and to balance the contributions of the participants. In other settings, like friends doing shopping or families visiting a museum, we cannot even talk about "needs", though technology can provide some service, like recommending an item to purchase or an artwork to visit. In the field of Computer Supported Cooperative Work (CSCW) the research is focused on the needs of the group, with the goal of optimizing the collaboration between participants. In such a scenario applications can provide feedback to affect group dynamics and support teamwork. There are other group scenarios which are closer to leisure activities, like shopping or visiting a museum, and not as formal as work-related meetings. In such contexts, in spite of the absence of a goal, our system can play a useful role, such as recommending or hinting. For example, two friends in front of a shelf full of shoes might start chatting, e.g. about funky music. The content of the conversation, or simply some keywords, can be used to dynamically provide feedback and point out some shoes "related" to the content of the conversation, e.g. sneakers. In this scenario, the two friends have no need of such a suggestion, and would probably buy a pair of shoes anyway. The system provides only a suggestion, a hint which they might follow as a recommendation. If the topic of the conversation is a pair of shoes, the system might behave like Amazon's recommendation system, which shows items related to the one being browsed.
4. Conclusions

In this paper a real-time speech-interfaced system for group conversation modeling has been presented. The proposed architecture is made of different layers (Perception, Interpretation and Presentation) cooperating to carry out the overall processing, which adheres to the criteria of non-invasiveness and effectiveness, so that people feel they are in a natural context and receive prompt feedback. In particular, the real-time implementation of the system has been described. The NU-Tech platform has been chosen as the integration platform for audio processing, hosting the Front-end algorithms and the commercial ASR engine that accomplishes the keyword spotting duties. The Interpretation module, based on a shallow semantic map, exploits the real-time detection to keep track of the conversation's topic and can be easily ported to new domains, without the need for complex ontologies, which require time to be developed. The Presentation layer allows a flexible selection of new stimuli to foster and support the conversation. Given these characteristics and its real-time capabilities, we consider the system suitable for addressing different applications, especially in tabletop scenarios. Future work is oriented towards the experimentation of this architecture in new scenarios, to assess the portability of the system to new domains and new contexts like wall-projected interfaces.
Acknowledgements The authors would like to thank Nuance Communication, It-Works and FBK-Irst for supporting us in this work.
References
[1] J. M. DiMicco, K. J. Hollenbach, A. Pandolfo, W. Bender, The impact of increased awareness while face-to-face, Human-Computer Interaction 22 (2007), 47–96.
[2] J. Sturm, O. H. Herwijnen, A. Eyck, J. Terken, Influencing social dynamics in meetings through a peripheral display, in Proc. of the 9th Int. Conf. on Multimodal Interfaces (2007), Nagoya, Japan, 263–270.
[3] S. Hunter, P. Maes, WordPlay: A Table-Top Interface for Collaborative Brainstorming and Decision Making, presented at the 3rd IEEE Int. Workshop on Horizontal Interactive Human Computer Systems (2008), Amsterdam, the Netherlands.
[4] C. Rocchi, D. Tomasini, O. Stock, M. Zancanaro, Fostering conversation after the museum visit: a WOZ study for a shared interface, in Proc. of the Working Conference on Advanced Visual Interfaces (2008), Naples, Italy, 335–338.
[5] C. Rocchi, F. Pianesi, E. Principi, O. Stock, D. Tomasini, M. Zancanaro, A Tabletop Display for Affecting Group Conversation: Formative Evaluation in a Museum Scenario, presented at the 3rd IEEE Int. Workshop on Horizontal Interactive Human Computer Systems (2008), Amsterdam, the Netherlands.
[6] S. Cifani, E. Principi, C. Rocchi, S. Squartini, F. Piazza, A Multichannel Noise Reduction Front-End Based on Psychoacoustics for Robust Speech Recognition in Highly Noisy Environments, in Proc. of the 2008 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (2008), Trento, Italy, 172–175.
[7] S. N. Wrigley, G. J. Brown, V. Wan, S. Renals, Speech and Crosstalk Detection in Multichannel Audio, IEEE Trans. on Speech and Audio Processing 13 (2005), 84–91.
[8] A. S. Manos, V. W. Zue, A segment-based wordspotter using phonetic filler models, in Proc. IEEE Int. Conf. ASSP (1997), 899–902.
[9] F. Bettarelli, E. Ciavattini, A. Lattanzi, D. Zallocco, S. Squartini, F. Piazza, NU-Tech: implementing DSP Algorithms in a plug-in based software platform for real time audio applications, in Proc. of the 118th Conv. of the Audio Engineering Society (2005), Barcelona, Spain.
[10] Nuance, VoCon 3200 Speech Recognition Engine. Available: http://www.nuance.com/vocon/3200/
[11] E. Principi, S. Cifani, C. Rocchi, S. Squartini, F. Piazza, Keyword spotting based system for conversation fostering in tabletop scenarios: preliminary evaluation, in Proc. of the 2nd Int. Conf. on Human System Interaction (2009), Catania, Italy, to be published.
[12] G. Leinhardt, K. Knutson, Listening in on Museum Conversations, Altamira Press (2004).
[13] C. Rocchi, O. Stock, M. Zancanaro, M. Kruppa, A. Krüger, The museum visit: generating seamless personalized presentations on multiple devices, in Proc. of the 9th Int. Conf. on Intelligent User Interfaces (2004), Funchal, Portugal.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-81
A Partitioned Frequency Block Algorithm for Blind Separation in Reverberant Environments
Michele SCARPINITI 1, Andrea PICARO, Raffaele PARISI and Aurelio UNCINI
Infocom Department, "Sapienza" University of Rome
Abstract. In this paper a blind source separation algorithm for reverberant environments is presented. Algorithms working in such adverse environments are usually characterized by a huge computational cost. In order to reduce the computational complexity of this kind of algorithm, a partitioned frequency-domain approach is proposed. Several experimental results are shown to demonstrate the effectiveness of the proposed method.
Keywords. Blind Source Separation, Reverberant environment, Frequency domain algorithms, Partitioned block algorithms.
Introduction

Blind Source Separation (BSS) has attracted great interest in the last fifteen years and a huge number of works have been published [1,2]. While there are many studies on the linear and instantaneous case, far fewer works address the reverberant case. To achieve BSS of convolutive mixtures, several methods have been proposed [3,2]. Some approaches consider the unmixing systems as FIR filters and estimate those filters [4]; other approaches transform the problem into the frequency domain, in order to solve an instantaneous BSS problem for every frequency simultaneously [5,6]. There are a few applications of BSS to mixed speech signals in realistic acoustical environments [7], but the separation performance is still not good enough [8]. It has also been pointed out that some problems occur in environments with a lot of reverberation [9,8]. If the environment is quite adverse, i.e. the reverberation time is long, the demixing filters can be very long. This leads to a huge computational complexity for the demixing algorithm [9,8]. A solution to this fundamental limitation can be found by partitioning the demixing filter into an optimal number of sub-filters, in order to split the convolution sum into a number of shorter convolution sums [10]. The idea of this paper is to develop a novel algorithm for source separation in adverse environments. In particular, a partitioned frequency block algorithm [11] is integrated in
1 Corresponding Author: Infocom Department, "Sapienza" University of Rome, via Eudossiana 18, 00184 Roma; e-mail: [email protected].
the frequency domain in order to reduce the latency between input and output and the overall computational time. The paper is organized as follows: Section 1 introduces the BSS problem in the convolutive environment. Section 2 describes the Partitioned Block Frequency Adaptive algorithm that is at the core of the proposed algorithm, described in detail in Section 3. Section 4 shows some experimental tests, while Section 5 concludes the work.
1. Blind Source Separation for Convolutive Mixtures

Let us consider a set of N unknown and independent sources s(n) = [s_1(n), ..., s_N(n)]^T, such that the components s_i(n) are zero-mean and mutually independent. Signals received by an array of M sensors are denoted by x(n) = [x_1(n), ..., x_M(n)]^T and are called mixtures. For simplicity we consider the case N = M. The convolutive model introduces the following relation between the i-th mixed signal and the original source signals:

\[ x_i(n) = \sum_{j=1}^{N} \sum_{k=0}^{K-1} a_{ij}(k)\, s_j(n-k), \qquad i = 1, \dots, M \tag{1} \]
The mixed signal is a linear mixture of filtered versions of the source signals, and a_{ij}(k) represents the k-th mixing filter coefficient. The task is to estimate the independent components from the observations, without resorting to a priori knowledge about the mixing system, obtaining an estimate u(n) of the original source vector s(n):

\[ u_i(n) = \sum_{j=1}^{M} \sum_{l=0}^{L-1} w_{ij}(l)\, x_j(n-l), \qquad i = 1, \dots, N \tag{2} \]
where w_{ij}(l) denotes the l-th demixing filter coefficient. When a mixing environment is quite complex, the filters of the ICA network may require thousands of taps to appropriately invert the mixing. In such cases, time-domain methods have a large computational load to compute the convolutions of long filters and are expensive for updating the filter coefficients. The methods can be implemented in the frequency domain using the FFT in order to decrease the computational load, because the convolution operation in the time domain can be performed by element-wise multiplication in the frequency domain. Note that the convolutive mixtures can be expressed as

\[ \mathbf{x}(f,k) = \mathbf{A}(f)\, \mathbf{s}(f,k), \qquad \forall f \tag{3} \]

where x(f,k) and s(f,k) are the frequency components of the mixtures and of the independent sources at frequency f, respectively, and A(f) denotes the matrix containing the frequency transforms of the mixing filters at frequency f. From (3), it is clear that convolutive mixtures can be represented by a set of instantaneous mixtures in the frequency domain. Thus, the independent components can be recovered by applying ICA for instantaneous mixtures at each frequency bin and then transforming the results back to the time domain:

\[ \mathbf{u}(f,k) = \mathbf{W}(f)\, \mathbf{x}(f,k), \qquad \forall f \tag{4} \]
where W(f) denotes the demixing matrix in the frequency domain. Note that s(f,k), x(f,k) and u(f,k) are vectors of complex elements. In order to solve the BSS problem in the convolutive environment, Smaragdis in [6] applied to each frequency bin one of the best performing algorithms for BSS, introduced by Bell & Sejnowski in [3] and known as INFOMAX. This algorithm performs separation by maximizing the joint entropy of the output of a nonlinear network and leads to a simple adaptation rule for the demixing matrix. Adopting Amari's natural gradient [12,13] we have

\[ \Delta \mathbf{W}(f) = \left( \mathbf{I} - 2\,\mathbf{y}\mathbf{u}^{H} \right) \mathbf{W}(f), \tag{5} \]

where y = f(u) and f(·) is a complex nonlinear function called the activation function (AF), essential for the information maximization [3]. One of the main issues in designing such a network is the choice of the complex AF [14]. Smaragdis in [6] proposed y = f(u) = tanh(Re(u)) + j tanh(Im(u)), known as the splitting AF [15].
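For concreteness, a minimal Python/NumPy sketch of this per-bin update follows (our illustration, not the original code; the frame loop and the learning rate value are assumptions):

import numpy as np

def infomax_bin_update(W, X, mu=1e-3):
    """One pass of rule (5) for a single frequency bin.
    W: (N, N) complex demixing matrix of the bin.
    X: (N, K) complex mixture frames of the same bin."""
    N, K = X.shape
    I = np.eye(N)
    for k in range(K):
        x = X[:, k:k + 1]                               # (N, 1) column vector
        u = W @ x                                       # separated outputs
        y = np.tanh(u.real) + 1j * np.tanh(u.imag)      # splitting activation function
        W = W + mu * (I - 2 * (y @ u.conj().T)) @ W     # natural-gradient step, eq. (5)
    return W

# Example: 2 sources, 100 frames of one frequency bin.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100)) + 1j * rng.standard_normal((2, 100))
W = np.eye(2, dtype=complex)
W = infomax_bin_update(W, X)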
2. The Partitioned Frequency Block Algorithm

The fundamental limitation of the convolutive BSS problem is the computational complexity required to handle the long impulse responses that describe a reverberant environment. We propose the use of a particular type of demixing architecture that is able to reduce the computational cost by adopting a block processing of the signals. Unfortunately the computational cost increases with the block length: an efficient implementation can be derived by dividing (partitioning) the convolution sum into a number of smaller sums [11]. This kind of algorithm is known in the literature as Partitioned Frequency Block Adaptive Filtering (PFBAF) and was originally introduced in [10]. In particular we propose to modify the standard ICA algorithms by adopting this efficient solution for the convolution calculation. Let us assume that L = P · M, where P and M are integers (the length L of the filter can be divided into P partitions of length M), and note that the convolution sum may be written as

\[ u(n) = \sum_{l=0}^{P-1} u_l(n), \tag{6} \]

where

\[ u_l(n) = \sum_{i=0}^{M-1} w(i + lM)\, x(n - lM - i). \tag{7} \]
To develop a frequency-domain implementation of these convolutions, we choose an input block length L_i = M and divide the input data into blocks of 2M samples, such that the last M samples of the k-th block are the same as the first M samples of the (k+1)-th block. Then the convolution sum in (7) can be evaluated by the circular convolution of these data blocks with the appropriate weight vectors, padded with M zeros. Let us denote by x_l^F(k) and w_l^F(k) the FFTs of the time vectors x(n) and w(n) respectively; then the l-th partition of the output vector u(n) can be evaluated as
\[ u_l(n) = \text{the last } M \text{ elements of } \mathrm{IFFT}\!\left[ \mathbf{w}_l^F(k) \otimes \mathbf{x}_l^F(k) \right], \tag{8} \]

where ⊗ denotes element-by-element multiplication, k is the block index and l is the partition index. Summing all the single convolutions of the partitions and noting that x_l^F(k) = x_0^F(k-l), we can write the output as

\[ u(n) = \text{the last } M \text{ elements of } \mathrm{IFFT}\!\left[ \sum_{l=0}^{P-1} \mathbf{w}_l^F(k) \otimes \mathbf{x}_0^F(k-l) \right]. \tag{9} \]
The block diagram of the partitioned algorithm is depicted in Figure 1, where the delays z^{-1} are in units of the block size and the thick lines represent frequency-domain vectors.
Figure 1. Scheme of the PFBAF algorithm
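As an illustration of eqs. (6)-(9), the following Python/NumPy sketch (ours; the function name and the test sizes are arbitrary) computes the partitioned frequency-domain convolution and checks it against a direct time-domain convolution:

import numpy as np

def pfbaf_convolve(x, w, P):
    """Partitioned frequency-domain convolution of x with a length-L filter w, L = P*M."""
    L = len(w)
    M = L // P                                           # partition / hop length
    W = np.array([np.fft.fft(np.r_[w[l*M:(l+1)*M], np.zeros(M)]) for l in range(P)])
    xp = np.r_[np.zeros(M), x]                           # first block starts with M zeros
    n_blocks = (len(xp) - M) // M
    X_hist = np.zeros((P, 2*M), dtype=complex)           # FFTs of the last P input blocks
    y = []
    for k in range(n_blocks):
        X_hist = np.roll(X_hist, 1, axis=0)              # shift the block history
        X_hist[0] = np.fft.fft(xp[k*M:k*M + 2*M])        # current 2M-sample block
        U = (W * X_hist).sum(axis=0)                     # sum over the P partitions, eq. (9)
        y.append(np.fft.ifft(U).real[M:])                # keep the last M output samples
    return np.concatenate(y)

# Consistency check with hypothetical sizes L = 16, P = 4.
rng = np.random.default_rng(0)
w = rng.standard_normal(16)
x = rng.standard_normal(64)
assert np.allclose(pfbaf_convolve(x, w, P=4), np.convolve(x, w)[:64])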
2.1. Frequency domain BSS with partitioned algorithm

A first idea is to evaluate the convolutions in the forward phase of the Smaragdis algorithm with the previous algorithm, in order to reduce the high computational cost. This version of the algorithm is named Partitioned-Smaragdis, or simply P-Smaragdis. During the backward phase it is not possible to evaluate the learning term ΔW(f) for every block, because a signal block of the length of the estimated filter is needed. Applying the partitioned block algorithm to the forward phase, if the filter has L taps, the algorithm can process a shorter block of length M = L/P. In order to perform the learning, P blocks must be collected; it is then possible to calculate an FFT of length L = PM on the vector obtained by concatenating the P blocks produced by the forward phase. The data path of the algorithm is depicted in Figure 2. In this figure the orange block is the coefficient vector in the time domain: there are P blocks of length M; the
Figure 2. Scheme of the modified Smaragdis algorithm with partitioning in the forward phase
thick lines represent frequency-domain vectors while the thin ones denote time-domain vectors. The black path is the forward phase: every new input block of length M is padded with M zeros and an FFT with 2M points is performed. The output is buffered in P blocks of length M in order to obtain a vector of L = PM samples, which is then transformed into the frequency domain through an FFT with 2L points. It is necessary to adapt only half of the coefficients and then apply the conjugate symmetry constraint. The main advantage of this algorithm is the possibility of evaluating the convolution sums in the forward phase in a small amount of time, obtaining results similar to the classical ones. A first disadvantage is the large number of FFT computations and, at the same time, the increase in the amount of memory used for the buffers. In addition the convergence is not fast, because a complete learning step is performed only every P acquired blocks.
3. PF-BSS Implementation

A more efficient realization of this kind of algorithm is its complete implementation in the frequency domain. In fact, we can generalize the previous algorithm to the case of P > 1 partitions by adopting eq. (9). So for every frequency bin \bar{f}, we have

\[ \mathbf{u}^{\bar f}(k) = \sum_{l=0}^{P-1} \mathbf{W}_l^{\bar f}(k)\, \mathbf{x}^{\bar f}(k-l), \tag{10} \]

where the index k denotes the k-th block and l the l-th partition. From (10) it is evident that the relationship between the P input blocks and the P output blocks at the \bar{f}-th frequency bin has a convolution structure. In this way the learning can be done for each frequency and for each partition, without the evaluation of the FFTs. We can adapt each of the W_l^{\bar f}(k) coefficients directly
in the frequency domain. The main difference from the classical Smaragdis algorithm is that each coefficient of an output block depends on P coefficients extracted from P input blocks, convolved with P coefficients extracted from the P filter partitions. Let us denote by y^{\bar f}(k) the vector collecting the AF outputs at the \bar{f}-th frequency bin of the k-th block; then, following the INFOMAX principle [3], we derive the learning rules by maximizing the joint entropy of the network output:

\[ \max_{\mathbf{W}_l^{\bar f}(k)} H\!\left(\mathbf{y}^{\bar f}(k)\right) = \max_{\mathbf{W}_l^{\bar f}(k)} \left\{ -E\!\left[ \ln p\!\left(\mathbf{y}^{\bar f}(k)\right) \right] \right\}, \tag{11} \]

where p(y^{\bar f}(k)) is the probability density function (pdf) of the output at the \bar{f}-th frequency bin. This leads to the following learning rules, the first for the first partition (l = 0) and the second for the remaining partitions (l ≥ 1):

\[ \mathbf{W}_0^{\bar f}(k+1) = \mathbf{W}_0^{\bar f}(k) + \mu \left( \mathbf{I} - 2\,\mathbf{y}^{\bar f}(k)\,\mathbf{u}^{\bar f}(k)^{H} \right) \mathbf{W}_0^{\bar f}(k), \qquad l = 0, \tag{12} \]

\[ \mathbf{W}_l^{\bar f}(k+1) = \mathbf{W}_l^{\bar f}(k) - 2\mu\, \mathbf{y}^{\bar f}(k)\,\mathbf{u}^{\bar f}(k-l)^{H}\, \mathbf{W}_l^{\bar f}(k), \qquad l = 1, 2, \dots, P-1. \tag{13} \]

In (13) the terms u^{\bar f}(k-l) are the output coefficients of the previous blocks for the \bar{f}-th frequency bin. Furthermore, the gradient constraints [16,11] are used to obtain the weights in (13). In this way the algorithm can adapt all the coefficients for every input block. We name this algorithm Partitioned Frequency BSS (PF-BSS). The data path is shown in Figure 3, where the thick lines represent frequency-domain vectors while the thin ones denote time-domain vectors. The forward phase is equal to
Figure 3. Scheme of the proposed PF-BSS algorithm
that of Figure 1. The major advantage of this solution is that no output buffers are needed and, in addition, the learning scheme is carried out completely in the frequency domain. With an appropriate choice of the parameter P and a multi-thread architecture, the algorithm of Figure 3 is able to evaluate the output in real time, regardless of the length of the filter.
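A compact Python/NumPy sketch of one PF-BSS learning step for a single frequency bin follows (our illustration of rules (10), (12) and (13); the bookkeeping of past input and output blocks and the learning rate are assumptions):

import numpy as np

def splitting_af(u):
    return np.tanh(u.real) + 1j * np.tanh(u.imag)

def pfbss_bin_step(W, X_hist, U_hist, mu=1e-3):
    """One learning step of PF-BSS for a single frequency bin.
    W:      (P, N, N) demixing matrices W_l of the bin.
    X_hist: (P, N)    input blocks x(k), x(k-1), ..., x(k-P+1) of the bin.
    U_hist: (P-1, N)  previous output blocks u(k-1), ..., u(k-P+1)."""
    P, N, _ = W.shape
    u = sum(W[l] @ X_hist[l] for l in range(P))                           # eq. (10)
    y = splitting_af(u)
    I = np.eye(N)
    W[0] = W[0] + mu * (I - 2 * np.outer(y, u.conj())) @ W[0]             # eq. (12)
    for l in range(1, P):
        W[l] = W[l] - 2 * mu * np.outer(y, U_hist[l - 1].conj()) @ W[l]   # eq. (13)
    return W, u

# Example with N = 2 sources and P = 4 partitions for one bin.
rng = np.random.default_rng(0)
W = np.stack([np.eye(2, dtype=complex) for _ in range(4)])
X_hist = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
U_hist = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
W, u = pfbss_bin_step(W, X_hist, U_hist)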
4. Results

We tested our architecture in a number of real environments characterized by a reverberation time T60 in the range 100-400 ms. The impulse responses of the environments were simulated with Roomsim, a Matlab simulation of shoebox room acoustics for use in teaching and research, available from http://media.paisley.ac.uk/~campbell/Roomsim/. In order to provide a quantitative evaluation of the output separation, different performance indexes are available in the literature. In this paper the signal-to-interference ratio (SIR) S_j of the j-th source was adopted [17]:

\[ S_j = 10 \log \left[ \frac{ E\!\left\{ |u|^2_{\sigma(j),\,j} \right\} }{ E\!\left\{ \sum_{k \neq j} |u|^2_{\sigma(j),\,k} \right\} } \right]. \tag{14} \]

In (14), u_{i,j} is the i-th output signal when only the j-th input signal is present, while σ(j) is the output channel corresponding to the j-th input. In this paper we propose an
Figure 4. Environment of the experimental test (sources: red bullets; sensors: black bullets).
experimental test in a room of dimensions 5 × 4 × 3.5 m, shown in Figure 4. We used a male and a female speech signal sampled at 8 kHz, and the filters used for the convolution have 1024 taps. Several experiments were performed with a learning rate μ = 10^{-3} and P = 2, 4, 8, respectively. Figure 5 shows the impulse responses of the demixing filters in the case P = 8, while Figure 6 shows the SIR of this experimental test compared with the classical Smaragdis algorithm. Figure 6 shows that the SIRs of the proposed approach are perfectly comparable with the classical ones, but in this case the latency
Figure 5. Impulse responses of the demixing filters in the case P = 8
between input and output is reduced by a factor P. Moreover, the figure shows that the solution converges to the same SIR value independently of the number of partitions P, while the Smaragdis algorithm converges only with long filters and does not converge with a filter length of 128, which is the same as the block length used by PF-BSS with P = 8. There is another interesting result to note. Araki et al. in [9] show the need for a compromise in the choice of the window length: the window should be long enough to cover the reverberation time but, on the other hand, a long window tends to capture only the non-stationarity of the signal and makes it very difficult to estimate the statistical properties useful for separation, due to the "sum effect" of the FFT, which produces more Gaussian samples as the number of frequency bins increases. The PF-BSS algorithm overcomes this problem: it is able to decouple the choice of the filter length from the choice of the window length applied to the input signal.

4.1. Computational complexity

Let N be the number of signals and L = P · M the length of the convolution filter, where P and M are the number and the length of the partitions, respectively. Then it is possible to evaluate the computational complexity as the sum of three terms: the first term represents the computational cost of the FFTs, the second one is the complexity of the output evaluation, while the third one represents the complexity of the learning phase. Table 1 summarizes the complexity of the Smaragdis algorithm, of P-Smaragdis and of the proposed PF-BSS. From Table 1 it is clear that the PF-BSS algorithm has the lowest computational complexity which, in addition to the lowest latency between input and output, makes it a suitable algorithm for real-time purposes.
Figure 6. Comparison of Smaragdis and the proposed PF-BSS algorithms with different block size and number of partitions
Table 1. Evaluation of the computational complexity.

Algorithm     FFTs                                        Output evaluation    Learning
Smaragdis     (3N + 2N^2) L log2 L                        L N^2                L (N^2 + N^3) / 2
P-Smaragdis   L [(N^2 + 2N) log2 M + (2N^2 + N) log2 L]   L N^2 P              L (N^2 + N^3)
PF-BSS        3 N L log2 M                                L N^2 P              L (P N^2 + N^3)
5. Conclusions

This paper introduced an improved version of the frequency-domain blind source separation algorithm proposed by Smaragdis in [6]. The enhancement consists in the adoption of the Partitioned Frequency Block Adaptive algorithm for the evaluation of the convolution sums. This approach is well known in the adaptive filtering framework [11]. Moreover, a novel algorithm named PF-BSS was proposed. This algorithm performs the learning phase completely in the frequency domain, improving the convergence speed and reducing the latency between input and output. Several tests have been performed to verify the effectiveness of the proposed approach and demonstrate that it can solve the problem in a smaller amount of time. The quality of the separation has been evaluated in terms of the SIR index, which is widely used in the literature.
Acknowledgements This work has been partially supported by the Faculty grant “Separazione di sorgenti vocali in ambienti riverberanti” (under the grant number C26F08W2F2), of the University of Roma “La Sapienza”, Italy.
References
[1] S. Choi, A. Cichocki, H.-M. Park, and S.-Y. Lee, "Blind source separation and independent component analysis: A review," Neural Information Processing - Letters and Reviews, vol. 6, no. 1, pp. 1–57, January 2005.
[2] S. Haykin, Ed., Unsupervised Adaptive Filtering, Volume 1: Blind Signal Separation. Wiley, 2000.
[3] A. J. Bell and T. J. Sejnowski, "An information-maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129–1159, 1995.
[4] T.-W. Lee, M. Girolami, A. J. Bell, and T. J. Sejnowski, "A unifying information-theoretic framework for independent component analysis," Computers & Mathematics with Applications, vol. 39, no. 11, pp. 1–21, June 2000.
[5] S. Ikeda and N. Murata, "A method of ICA in time-frequency domain," in Proc. Workshop Indep. Compon. Anal. Signal Sep., 1999, pp. 365–370.
[6] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21–34, 1998.
[7] T. Lee, A. J. Bell, and R. Orglmeister, "Blind source separation of real world signals," Neural Networks, vol. 4, pp. 2129–2134, 1997.
[8] M. Z. Ikram and D. R. Morgan, "Exploring permutation inconsistency in blind separation of speech signals in a reverberant environment," in Proc. ICASSP2000, 2000, pp. 1041–1044.
[9] S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, "The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 2, pp. 109–116, 2003.
[10] P. C. W. Sommen, "Partitioned frequency domain adaptive filters," in Proc. Twenty-Third Asilomar Conference on Signals, Systems and Computers, vol. 2, Oct. 30–Nov. 1, 1989, pp. 677–681.
[11] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications. Wiley, 1998.
[12] S. Amari, A. Cichocki, and H. Howard, "A new learning algorithm for blind signal separation," Advances in Neural Information Processing Systems, vol. 8, pp. 757–763, 1996.
[13] S.-I. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, pp. 251–276, 1998.
[14] M. Scarpiniti, D. Vigliano, R. Parisi, and A. Uncini, "Generalized splitting functions for blind separation of complex signals," Neurocomputing, vol. 71, no. 10-12, pp. 2245–2270, June 2008.
[15] N. Benvenuto and F. Piazza, "On the complex backpropagation algorithm," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 967–969, April 1992.
[16] B. Farhang-Boroujeny, "Analysis and efficient implementation of partitioned block LMS adaptive filters," IEEE Transactions on Signal Processing, vol. 44, no. 11, pp. 2865–2868, 1996.
[17] D. Shobben, K. Torkkola, and P. Smaragdis, "Evaluation of blind signal separation methods," in Proc. of ICA and BSS, Aussois, France, January 11-15, 1999, pp. 239–244.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-91
Transcription of polyphonic piano music by means of memory-based classification method
Giovanni Costantini (a, b), Massimiliano Todisco (a), Renzo Perfetti (c)
(a) Department of Electronic Engineering, University of Rome "Tor Vergata"
(b) Institute of Acoustics "O. M. Corbino", Rome
(c) Department of Electronic and Information Engineering, University of Perugia
Abstract. Music transcription consists in transforming the musical content of audio data into a symbolic representation. The objective of this study is to investigate a transcription system for polyphonic piano. The proposed method focuses on temporal musical structures, note events and their main characteristics: the attack instant and the pitch. Onset detection exploits a time-frequency representation of the audio signal. Note classification is based on constant Q transform (CQT) and support vector machines (SVMs). Finally, to validate our method, we present a collection of experiments using a wide number of musical pieces of heterogeneous styles. Keywords. Onset detection, music transcription, classification, constant Q transform, support vector machines.
Introduction

Music transcription can be considered one of the most demanding activities performed by our brain; not many people are able to easily transcribe a musical score starting from audio listening, since the success of this operation depends on musical abilities, as well as on knowledge of the mechanisms of sound production, of musical theory and styles, and finally on musical experience and listening practice. In fact, it is necessary to distinguish two cases in which the behavior of automatic transcription systems differs: monophonic music, where notes are played one by one, and polyphonic music, where two or more notes can be played simultaneously. Currently, automatic transcription of monophonic music is treated in the time domain by means of zero-crossing or auto-correlation techniques and in the frequency domain by means of the Discrete Fourier Transform (DFT) or the cepstrum. With these techniques, an excellent accuracy level has been achieved [1, 2]. Attempts at automatic transcription of polyphonic music have been much less successful; indeed, the harmonic components of notes that occur simultaneously in polyphonic music significantly obfuscate automated transcription. The first algorithms were developed by Moorer [3] and by Piszczalski and Galler [4]. Moorer (1975) used comb filters and autocorrelation in order to perform the transcription of very restricted duets.
Among the most important works in this research field are the Ryynanen and Klapuri transcription system [5] and the Sonic project [6] developed by Marolt; the latter, in particular, makes use of a classification-based approach to transcription built on neural networks. Our work deals with the problem of extracting the musical content, i.e. a symbolic representation of the musical notes (commonly called a musical score), from audio data of polyphonic piano music. In this paper, an algorithm and a model for the automatic transcription of piano music are presented. The proposed solution is based on an onset detection algorithm that uses the Short-Time Fourier Transform (STFT) and on a classification-based algorithm that identifies the note pitches. In particular, we propose a supervised classification method that infers the correct note labels based only on training with tagged examples. This method performs polyphonic transcription via a system of Support Vector Machine (SVM) classifiers that have been trained on spectral features obtained by means of the well-known Constant-Q Transform (CQT). The paper is organized as follows: in the following section our onset detection algorithm is described; in the second section, the spectral features are formulated; the third section is devoted to the description of the classification methods; in the final section, we present the results of a series of experiments involving polyphonic piano music. Some comments conclude the paper.
1. Onset Detection

The aim of note onset detection is to find the starting time of each musical note. Several different methods have been proposed for performing onset detection [7, 8]. Our method is based on the STFT and, notwithstanding its simplicity, it gives better or equal performance compared to other methods [7, 8]. Let us consider a discrete time-domain signal s(n), whose STFT is given by

\[ S_k(m) = \sum_{n=mh}^{mh+N-1} w(n - mh)\, s(n)\, e^{-j \frac{2\pi}{N} k (n - mh)} \tag{1} \]
where N is the window size, h is the hop size, m ∈ {0, 1, 2, ..., M} is the hop number, k = 0, 1, ..., N-1 is the frequency bin index, w(n) is a finite-length sliding Hanning window and n is the summation variable. We obtain a time-frequency representation of the audio signal by means of spectral frames represented by the magnitude spectrum |S_k(m)|. The set of all the |S_k(m)| can be packed as columns into a non-negative L×M matrix S, where M is the total number of computed spectra and L = N/2 + 1 is the number of their frequencies. Afterwards, the rows of S are summed, giving the following onset detection function based on the first-order difference:
\[ f_{\mathrm{onset}}(m) = \frac{d f(m)}{d m} \tag{2} \]

where

\[ f(m) = \sum_{l=1}^{L} S(l, m). \tag{3} \]
Therefore, the peaks of the function f_onset can be assumed to represent the times of note onsets. After peak picking, a threshold T is used to suppress spurious peaks; its value is obtained through a validation process, as explained in the next sections. To demonstrate the performance of our onset detection method, let us show an example from real polyphonic piano music: Mozart's KV 333 Sonata in B-flat Major, Movement 3, sampled at 8 kHz and quantized with 16 bits. We consider the second and third bars at a metronome tempo of 120, shown in Figure 1. We use an STFT with N = 512, an N-point Hanning window and a hop size h = 256, corresponding to a 32 millisecond hop between successive frames. The spectrogram is shown in Figure 2.
Figure 1. Musical score of Mozart's KV 333 Sonata in B-flat Major.
Figure 2. The spectrogram of Mozart's KV 333 Sonata in B-flat Major.
Figure 3. Normalized sum of the elements of each column of the spectrogram.
Figure 4. Onset detection function for the example in Figure 1.
Summing the elements of each column of the spectrogram in Figure 2 we obtain the curve in Figure 3 and, after computing the first-order difference, the onset detection function in Figure 4. The onset time resolution is 32 ms. A statistical evaluation of the onset detection method will be presented in the next sections.
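As an illustration of this procedure, the following sketch (not the authors' code) computes the onset detection function of Eqs. (2)-(3) and picks the peaks above a threshold T; the parameter values are the ones quoted in the text, while the normalization and peak-picking details are assumptions.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def detect_onsets(s, sr=8000, N=512, h=256, T=0.01):
    # L x M magnitude spectrogram (L = N/2 + 1 frequency bins, Hanning window)
    _, _, S = stft(s, fs=sr, window="hann", nperseg=N, noverlap=N - h)
    S = np.abs(S)
    f = S.sum(axis=0)                     # Eq. (3): sum over the L rows of each column
    f = f / (f.max() + 1e-12)             # normalization, as in Figure 3
    f_onset = np.diff(f, prepend=f[0])    # Eq. (2): first-order difference
    peaks, _ = find_peaks(f_onset, height=T)
    return peaks * h / sr                 # onset times in seconds (32 ms resolution)
```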
2. The Constant-Q Transform and the Spectral Features

A frequency analysis is performed on the notes played by the piano, in order to detect the signal harmonics. Using the Fast Fourier Transform (FFT), the frequency resolution may not be sufficient. In fact, an FFT with 512 temporal samples x[n] on a sound recorded at the usual sampling rate (SR) of 44100 Hz yields a resolution of about 86.1 Hz between two FFT samples. This is not sufficient for low-frequency notes, where the distance between two adjacent semitones is about 8 Hz (C3, 131 Hz and C#3, 139 Hz). The frequency resolution improves if a higher number of temporal samples is used (with 8192 samples the resolution is about 5.4 Hz), but that requires larger temporal windows for a fixed SR. In this case, the analysis of the instantaneous spectral information of the musical signal degrades. To solve this problem, a Constant-Q Transform (CQT) [9] is used to detect the fundamental frequency of the note. Then, the upper harmonics can be located easily, as they lie at frequencies that are nearly multiples of the fundamental frequency. The Constant-Q Transform is similar to the Discrete Fourier Transform (DFT) but with a main difference: it has a logarithmic frequency scale, since a variable-width window is used. It is better suited to musical notes, which are spaced on a logarithmic scale. The logarithmic frequency scale provides a constant frequency-to-resolution ratio for every bin
Q=
fk 1 = 1/ b f k +1 f k 2 1
(4)
where b is the number of bins per octave and k is the frequency bin. If b=12, and by choosing a suitable reference frequency, then k is equal to the MIDI note number (as in the equal-tempered 12-tone-per-octave scale). There is an efficient version of the CQT based on the FFT, as shown in [10]. In our work, the processing phase starts in correspondence with a note onset. Notice that two or more notes belong to the same onset if they are played within 32 ms of each other. First, the attack time of the note is discarded (in the case of the piano, the longest attack time is equal to about 32 ms). Then, after a Hanning windowing, a single CQT of the following 64 ms of the audio note event is calculated. Figure 5 shows the complete process.
Figure 5. Spectral features extraction.
All the audio files have a sampling rate of 8 kHz. We used b=48, which means 4 CQT-bins per semitone, starting from note C0 (~32 Hz) up to note B6 (~3951 Hz). The output of the processing phase is a matrix with 336 columns, corresponding to the CQT-bins, and a number of rows equal to the total number of note events in the
MIDI file. The values of the frequency bins are also logarithmically rescaled into a range from 0 to 1. We note that melodic and harmonic structures depend on the composition method adopted by the composer; this means that every musical note is highly correlated to the preceding and following notes in the composition. In our system, we take account of this assumption; consequently, the feature vector is composed of 420 elements: the 336 CQT-bins of the considered note event and 84 elements, one per MIDI note number, describing the previous note event, with the following convention: if a note was detected, the value is 1; if not, the value is -1. Figure 6 shows the feature vector.
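A minimal sketch of how such a 420-element feature vector could be assembled is given below. It is not the authors' implementation: the constant-Q magnitude spectrum `cqt_mag` (336 bins, b = 48 from C0) is assumed to be already computed, the 32.70 Hz reference for C0 is an assumption, and the log-plus-min-max rescaling is only one way to realize the logarithmic rescaling to [0, 1] described above.

```python
import numpy as np

B = 48                                    # bins per octave
Q = 1.0 / (2.0 ** (1.0 / B) - 1.0)        # Eq. (4): constant frequency-to-resolution ratio
F_MIN = 32.70                             # C0 in Hz (assumed reference frequency)
cqt_freqs = F_MIN * 2.0 ** (np.arange(336) / B)   # center frequency of each CQT bin

def feature_vector(cqt_mag, prev_notes):
    # cqt_mag: 336-bin CQT magnitudes of the 64 ms after the attack
    # prev_notes: 84-element boolean vector of notes detected at the previous onset
    spec = np.log1p(np.asarray(cqt_mag, dtype=float))    # logarithmic rescaling
    spec = (spec - spec.min()) / (np.ptp(spec) + 1e-12)  # map to [0, 1]
    prev = np.where(prev_notes, 1.0, -1.0)               # +1 detected, -1 not detected
    return np.concatenate([spec, prev])                  # 336 + 84 = 420 features
```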
3. Multi-Class SVM Classifiers Based on CQT Spectral Features

An SVM identifies the optimal separating hyperplane (OSH) that maximizes the margin of separation between linearly separable points of two classes. The data points which lie closest to the OSH are called support vectors. It can be shown that the solution with maximum margin corresponds to the best generalization ability [11]. Linearly non-separable data points in input space can be mapped into a higher dimensional (possibly infinite dimensional) feature space through a nonlinear mapping function, so that the images of the data points become almost linearly separable. The discriminant function of an SVM has the following expression:
f(x) = Σ_i α_i y_i K(x_i, x) + b     (5)
where x_i is a support vector, K(x_i, x) is the kernel function representing the inner product between x_i and x in feature space, and the coefficients α_i and b are obtained by solving a quadratic optimization problem in dual form [11]. Usually a soft-margin formulation is adopted, in which a certain amount of noise is tolerated in the training data. To this end, a user-defined constant C > 0 is introduced, which controls the trade-off between the maximization of the margin and the minimization of classification errors on the training set [11]. The SVMs were implemented using the software SVMlight developed by Joachims [12]. A radial basis function (RBF) kernel was used:

K(x_i, x_j) = exp(−γ ||x_i − x_j||²),  γ > 0     (6)
where γ describes the width of the Gaussian function. For SVMs with an RBF kernel two parameters, C and γ, need to be determined. To this end we looked for the best parameter values in a specific range using a grid-search on a validation set. More details will be given in the following section. In this context, the one-versus-all (OVA) approach has been used. The OVA method constructs N SVMs, N being the number of classes. The ith SVM is trained using all the samples in the ith class with a positive class label and all the remaining samples with a negative class label. Our transcription system uses 84 OVA SVM note classifiers whose input is the 420-element feature vector described in Section 2. The presence of a note in a given audio event is detected when the discriminant function of the corresponding SVM classifier is
positive. Figure 7 shows a schematic view of the complete automatic transcription process.
Figure 6. Feature vector for the notes E4 flat + E5 flat: normalized magnitude spectrum and notes detected in the previous event (C4 + C5).
Figure 7. Scheme of the complete automatic transcription process.
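For illustration, the one-versus-all scheme of Section 3 could be realized as in the sketch below, which uses scikit-learn's SVC as a stand-in for the SVMlight package actually used; X is the matrix of 420-element feature vectors, Y a binary (events × 84 notes) label matrix, and the C and gamma values are placeholders to be selected by grid search on the validation set.

```python
import numpy as np
from sklearn.svm import SVC

def train_ova_svms(X, Y, C=10.0, gamma=0.01):
    # assumes every note occurs at least once in the training labels
    models = []
    for note in range(Y.shape[1]):
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        clf.fit(X, Y[:, note])                 # ith note vs. all the rest
        models.append(clf)
    return models

def transcribe_event(models, x):
    # a note is considered present when the discriminant function is positive
    scores = np.array([m.decision_function(x.reshape(1, -1))[0] for m in models])
    return scores > 0.0
```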
4. Audio Data and Experimental Results

In this section, we report the simulation results of our transcription system and compare them with some existing methods. The MIDI data used in the experiments were collected from the Classical Piano MIDI Page, http://www.piano-midi.de/ [13]. A list of pieces can be found in [13] (p. 8, Table 5). The 124-piece dataset was randomly split into 87 training, 24 testing, and 13 validation pieces. The first minute of each song in the dataset was selected for the experiments, which provided a total of 87 minutes of training audio, 24 minutes of testing audio, and 13 minutes of audio for parameter tuning (validation set). This amounted to 22680, 6142, and 3406 note onsets in the training, testing, and validation sets, respectively. First, we performed a statistical evaluation of the performance of the onset detection method. The results are summarized by three statistics: Precision, Recall and F-measure. When running the onset detection algorithm we experimented with the threshold value used for peak picking; an onset is considered correct if it is detected within 32 milliseconds of the ground-truth onset. The results reported here were obtained using the threshold value 0.01, chosen to maximize the F-measure on the 13 pieces of the validation dataset. Table I quantifies the performance of the method on the test set (including 6142 onsets). After detection of the note onsets, we trained the SVMs on the 87 pieces of the training set and tested the system on the 24 pieces of the test set. In addition, to compare the accuracy of our system with a system with no memory, a second trial was performed on the same data set using only the 336 CQT-bins. The results are outlined in Table II.
Table I

  Precision    96.9%
  Recall       95.7%
  F-measure    96.3%

Table II

              System With Memory     System Without Memory
              (420 CQT-bins)         (336 CQT-bins)
  F-measure   85.3%                  82.7%
  Etot        20.7%                  22.0%
  Esubs       10.6%                  11.5%
  Emiss       10.1%                  10.3%
  Efa         0.02%                  0.02%
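The Table I statistics can be reproduced from lists of detected and ground-truth onset times with a straightforward matching routine such as the following sketch; the 32 ms tolerance is the one stated above, while the one-to-one matching strategy is an assumption.

```python
def onset_scores(detected, reference, tol=0.032):
    # detected, reference: onset times in seconds
    unmatched = list(reference)
    tp = 0
    for d in detected:
        hits = [r for r in unmatched if abs(r - d) <= tol]
        if hits:
            unmatched.remove(min(hits, key=lambda r: abs(r - d)))  # use each reference once
            tp += 1
    precision = tp / max(len(detected), 1)
    recall = tp / max(len(reference), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_measure
```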
In addition to the F-measure metric, a different metric was used to evaluate the accuracy of our transcription system: the transcription error score defined by the National Institute of Standards and Technology (NIST) for the evaluation of "who spoke when" in recorded meetings [14]. Specifically, the total error score is given by

Etot = Σ_{t=1}^{T} ( max(Nref(t), Nsys(t)) − Ncorr(t) ) / Σ_{t=1}^{T} Nref(t)     (7)
where T is the total number of time frames, Nsys(t) is the number of notes detected by the system in frame t, Nref(t) is the number of ground-truth notes in the musical piece, and Ncorr(t) is the number of notes correctly detected by the system. The transcription error score is the sum of three components: the first is the substitution error; the second and third are the miss error and the false alarm error.
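As a sketch, the frame-level scores reported in Table II can be computed from per-frame note counts as below; the decomposition of the total error into substitution, miss and false-alarm components follows the formulation of [13, 14] and should be read as an assumption about the exact definitions used here.

```python
import numpy as np

def transcription_errors(n_ref, n_sys, n_corr):
    # n_ref, n_sys, n_corr: per-frame counts of ground-truth, detected and correct notes
    n_ref, n_sys, n_corr = (np.asarray(a, dtype=float) for a in (n_ref, n_sys, n_corr))
    denom = n_ref.sum()
    e_tot = (np.maximum(n_ref, n_sys) - n_corr).sum() / denom       # Eq. (7)
    e_subs = (np.minimum(n_ref, n_sys) - n_corr).sum() / denom      # substitution error
    e_miss = np.maximum(n_ref - n_sys, 0.0).sum() / denom           # miss error
    e_fa = np.maximum(n_sys - n_ref, 0.0).sum() / denom             # false alarm error
    return e_tot, e_subs, e_miss, e_fa                              # e_tot = e_subs + e_miss + e_fa
```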
5. Conclusion

In this study, we have discussed a polyphonic piano transcription system based on the characterization of note events. We focus our attention on temporal musical structures to detect notes. It has been shown that the proposed onset detection is helpful in the determination of note attacks with modest computational cost and good accuracy. It has been found that the choice of the CQT for spectral analysis plays a pivotal role in the performance of the transcription system. We compared two systems based on different feature vectors of 336 and 420 elements, representing the system without memory and the system with memory, respectively. The musical note recognition system used 84 OVA binary classifiers based on SVMs. A large number of musical pieces of heterogeneous styles was used to validate and test our transcription system.
References [1] J. C. Brown, “Musical fundamental frequency tracking using a pattern recognition method”, Journal of the Acoustical Society of America, vol. 92, no. 3, 1992. [2] J. C. Brown and B. Zhang, “Musical frequency tracking using the methods of conventional and narrowed autocorrelation”, Journal of the Acoustical Society of America, vol. 89, no. 5, 1991. [3] Moorer, “On the Transcription of Musical Sound by Computer”. Computer Music Journal, Vol. 1, No. 4, Nov. 1977. [4] M. Piszczalski and B. Galler, “Automatic Music Transcription”, Computer Music Journal, Vol. 1, No. 4, Nov. 1977. [5] M. Ryynanen and A. Klapuri, “Polyphonic music transcription using note event modeling,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA ’05), New Paltz, NY, USA, October 2005. [6] M.Marolt, “A connectionist approach to automatic transcription of polyphonic piano music,” IEEE Transactions on Multimedia, vol. 6, no. 3, 2004.
[7] W.C. Lee, C.C. J. Kuo, “Musical onset detection based on adaptive linear prediction”, IEEE International Conference on Multimedia and Expo, ICME 2006, Toronto, Canada, pp. 957-960, 2006. [8] G.P. Nava, H. Tanaka, I. Ide, “A convolutional-kernel based approach for note onset detection in pianosolo audio signals”, Int. Symp. Musical Acoust. ISMA 2004, Nara, Japan, pp. 289-292, 2004. [9] J. C. Brown, “Calculation of a constant Q spectral transform”, Journal of the Acoustical Society of America, vol. 89, no. 1, pp. 425–434, 1991. [10] J. C. Brown and M. S. Puckette, “An efficient algorithm for the calculation of a constant Q transform,” Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2698–2701, 1992. [11] J. Shawe-Taylor, N. Cristianini, An Introduction to Support Vector Machines, Cambridge University Press (2000). [12] T. Joachims, Making large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT-Press, 1999. [13] G. Poliner and D. Ellis “A Discriminative Model for Polyphonic Piano Transcription,” EURASIP Journal of Advances in Signal Processing, vol. 2007, Article ID 48317, pp. 1-9, 2007. [14] National Institute of Standards Technology, Spring 2004 (RT-04S) rich transcription meeting recognition evaluation plan, 2004. http://nist.gov/speech/tests/rt/rt2004/spring/.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-101
A 3D Neural Model for Video Analysis Lucia MADDALENA a , Alfredo PETROSINO b a ICAR - National Research Council, Italy b DSA - University of Naples Parthenope, Italy Abstract. We propose a 3D self organizing neural model for modeling both the background and the foreground in video, helping in distinguishing between moving and stopped objects in the scene. Our aim is to detect foreground objects in digital image sequences taken from stationary cameras and to distinguish them into moving and stopped objects by a model based approach. We show through experimental results that a good discrimination can be achieved for color video sequences that represent typical situations critical for vehicles stopped in no parking areas. Keywords. moving object detection, background subtraction, background modeling, foreground modeling, stopped object, self organization, neural network
Introduction

Recently, automated video surveillance using video analysis and understanding technology has become an important research topic in the area of computer vision. Within video understanding technology for surveillance use, stopped object detection is known to be a significant and difficult research problem. Stopped object detection in an image sequence consists in detecting temporally static image regions indicating objects that do not constitute the original background but were brought into the scene at a subsequent time, such as abandoned and removed items, or illegally parked vehicles. Great interest in the stopped object detection problem has been generated by the PETS workshops held in 2006 [3] and in 2007 [4], where one of the main aims has been the detection of left luggage, that is, luggage that has been abandoned by its owner, in movies taken from multiple cameras. Another example of strong interest in the considered problem is given by the i-LIDS bag and vehicle detection challenge proposed at the AVSS 2007 Conference [11], where the attention has been focused on abandoned bag and parked vehicle events, properly defined. A broad classification of existing approaches to the detection of stopped objects can be given as tracking-based and non tracking-based approaches. In tracking-based approaches the stopped object detection is obtained on the basis of the analysis of object trajectories through an application dependent event detection phase. These include most of the papers in [3,4]. Non tracking-based approaches include pixel- and region-based approaches aiming at classifying pixels/objects without the aid of tracking modules and include [1,5,6,9,10]. Our
approach to the problem is non tracking-based. The problem is tackled as stopped foreground subtraction, which, in analogy with the background subtraction approach, consists in maintaining an up-to-date model of the stopped foreground and in discriminating moving objects as those that deviate from such a model. Both background subtraction and stopped foreground subtraction share the common issue of constructing and maintaining an image model that adapts to scene changes and can capture the most persisting features of the image sequence, i.e. the background and the stationary foreground, respectively. For this modeling problem we adopt visual attention mechanisms that help in detecting features that keep the user's attention, based on a 3D self-organizing neural network. The paper is organized as follows. In Section 1 we describe the 3D neural model that can be used for both background and foreground modeling for moving object and stopped foreground detection. In Section 2 we describe a model-based pixelwise procedure allowing to discriminate foreground pixels into stopped and moving pixels, which is completely independent of the adopted background and foreground models. Section 3 presents results obtained with the implementation of the proposed stopped foreground detection procedure adopting the 3D neural model, while Section 4 includes concluding remarks.
1. The 3D Neural Model

Relying on recent research concerning moving object detection [7,8], we present a self-organizing neural network, organized as a 3-D grid of neurons, suitable for background and foreground modeling. Each neuron computes a function of the weighted linear combination of incoming inputs, with weights resembling the neural network learning, and can therefore be represented by a weight vector obtained collecting the weights related to incoming links. An incoming pattern is mapped to the neuron whose set of weight vectors is most similar to the pattern, and the weight vectors in a neighborhood of such node are updated. Specifically, for each pixel pt = It(x) we build a neuronal map consisting of L weight vectors c^l(pt), l = 1, . . . , L. Each weight vector c^l(pt) is represented in the HSV colour space, that allows to specify colours in a way that is close to human experience of colours, and is initialized to the HSV components of the corresponding pixel of the first sequence frame I0(x). The complete set of weight vectors for all pixels of an image It with N rows and M columns is organized as a 3D neuronal map B̃ with N rows, M columns, and L layers. An example of such a neuronal map for background modeling is given in Fig. 1, which shows that for each pixel pt = It(x) (red dot in Fig. 1-(a)) we consider a weight vector B̃t(x) = (c^1(pt), c^2(pt), . . . , c^L(pt)), where each c^l(pt) (red dot in Fig. 1-(b)) belongs to one of the L layers.
By subtracting the current image from the background model B̃, each pixel pt of the t-th sequence frame It is compared to the current pixel weight vectors to determine if there exists a weight vector that matches it. The best matching weight vector is used as the pixel's encoding approximation, and therefore pt is detected as foreground if no acceptable matching weight vector exists; otherwise it is classified as background.
Figure 1. An example image (a) and the background modeling neuronal map with L = 4 layers (b).
Matching for the incoming pixel pt = It(x) is performed by looking for a weight vector c^b(pt) in the set B̃t(x) = (c^1(pt), . . . , c^L(pt)) of the current pixel weight vectors satisfying:

d(c^b(pt), pt) = min_{l=1,...,L} d(c^l(pt), pt) ≤ ε     (1)
where the metric d(·) and the threshold ε are suitably chosen as in [7]. The best matching weight vector c^b(pt), belonging to layer b, and all other weight vectors in an n × n neighborhood N_pt of c^b(pt) in the b-th layer of the background model B̃ are updated according to a selective weighted running average:

B̃^b_t(x) = (1 − αt(x)) B̃^b_{t−1}(x) + αt(x) It(x),  ∀x ∈ N_pt     (2)
Learning factor αt(x), depending on scene variability, can be chosen as in [7] in such a way that if the best match c^b(pt) satisfying Eq. (1) is not found, the background model B̃ remains unchanged. Such selectivity allows to adapt the background model to scene modifications without introducing the contribution of pixels not belonging to the background scene. Our choice of the learning factor αt(x) here takes the form

αt(x) = F1(pt) F2(pt) α(x) w(x),  ∀x ∈ N_pt     (3)
where w(x) are Gaussian weights in the neighborhood N_pt, α(x) represents the learning factor, depending on scene variability, and functions F1 and F2, as later specified, take into account uncertainty in the background model and spatial coherency. Uncertainty in the background model derives from the need to choose a suitable threshold ε in Eq. (1). Therefore function F1 is chosen as a saturating linear function given by

F1(pt) = 1 − d(c^b(pt), pt)/ε   if d(c^b(pt), pt) ≤ ε
F1(pt) = 0                      otherwise
Function F1(pt), whose values are normalized in [0, 1], is such that the closer the incoming sample pt is to the background model C(pt) = (c^1(pt), c^2(pt), . . . , c^L(pt)), the larger the corresponding value F1(pt). Therefore, incorporating F1(pt) in Eq. (3) ensures that the closer the incoming sample pt is to the background model, the more it contributes to the background model update, thus further reinforcing the corresponding weight vectors. Spatial coherence is introduced in order to enhance robustness against false detections. Let p = I(x) be the generic pixel of image I, and let N_p be a spatial square neighborhood of pixel p ∈ I. We consider the set Ω_p of pixels belonging to N_p that have a best match in their background model according to Eq. (1), i.e.

Ω_p = {q ∈ N_p : d(c^b(q), q) ≤ ε}

In analogy with [2], the Neighborhood Coherence Factor is defined as:

NCF(p) = |Ω_p| / |N_p|
where | · | refers to the set cardinality. This factor gives a relative measure of the number of pixels belonging to the spatial neighborhood N_p of a given pixel p that are well represented by the background model B̃. If NCF(p) > 0.5, most of the pixels in such a spatial neighborhood are well represented by the background model, and this should imply that pixel p is also well represented by the background model. Moreover, the greater NCF(p) is, the greater the majority of pixels in N_p that are well represented by the background model, and the better pixel p can be considered as represented by the background model. Therefore the function F2(pt) in Eq. (3) is chosen as

F2(pt) = 2 · NCF(pt) − 1   if NCF(pt) ≥ 0.5
F2(pt) = 0                 otherwise

Incorporating F2(pt) in Eq. (3) ensures that the greater the number of pixels belonging to the spatial neighborhood N_pt of a given pixel pt that are well represented by the background model B̃, the more pixel pt contributes to the background model update. This also means that exploiting the spatial coherence of scene objects as compared to the scene background guarantees more robustness against false outlier detections. The described model B̃t has been adopted for both the background model BGt and the foreground model FGt described in the following section for the classification of stopped and moving pixels.
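To make the model concrete, the per-pixel sketch below applies the matching of Eq. (1) and a simplified form of the selective update of Eqs. (2)-(3); the Euclidean distance in HSV space, the numeric values of eps and alpha, and the omission of the spatial neighborhood and of F2 are simplifying assumptions, not the authors' exact choices.

```python
import numpy as np

def match_and_update(model, pixel, eps=0.1, alpha=0.05):
    # model: (L, 3) array of HSV weight vectors for one pixel; pixel: (3,) HSV value
    d = np.linalg.norm(model - pixel, axis=1)       # distance to each of the L layers
    b = int(np.argmin(d))                           # best matching layer, Eq. (1)
    if d[b] > eps:
        return model, True                          # no acceptable match: foreground
    f1 = 1.0 - d[b] / eps                           # uncertainty factor F1
    a = alpha * f1                                  # simplified learning factor, Eq. (3)
    model[b] = (1.0 - a) * model[b] + a * pixel     # selective running average, Eq. (2)
    return model, False                             # matched: background
```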
2. Stopped Foreground Detection

In this section we propose a model-based approach to the classification of foreground pixels into stopped and moving pixels. A foreground pixel is classified as stopped if it holds the same color features for several consecutive frames; otherwise it is classified as moving.
Assuming we have a model BGt of the image sequence background, we compute a function E(x) of color feature occurrences for pixel It(x) as follows:

E(x) = min(τs, E(x) + 1)   if It(x) ∉ BGt ∧ It(x) ∈ FGt
E(x) = max(0, E(x) − 1)    if It(x) ∉ BGt ∧ It(x) ∉ FGt          (4)
E(x) = max(0, E(x) − k)    if It(x) ∈ BGt

where the model FGt of the sequence foreground is iteratively built and updated using the image pixels It(x) for which E(x) > 0. Every time pixel It(x) belongs to the foreground model (It(x) ∈ FGt), E(x) is incremented, while it is decremented if it does not belong to the foreground model. The maximum value τs for E(x) corresponds to the stationarity threshold, i.e. the minimum number of consecutive frames after which a pixel assuming constant color features is classified as stopped. The value for τs is chosen depending on the desired responsiveness of the system. On the contrary, if pixel It(x) is detected as belonging to the background (It(x) ∈ BGt), E(x) is decreased by k. The decay constant k determines how fast E(x) should decrease, i.e. how fast the system should recognize that a stopped pixel has moved again. To set the alarm flag off immediately after the removal of the stopped object, the value of the decay should be large, possibly equal to τs. Pixels It(x) for which E(x) reaches the stationarity threshold value τs are classified as stopped. In order to keep memory of objects that have stopped and to distinguish them from moving objects possibly passing in front of them in subsequent frames, pixels for which E(x) = τs are moved to a stopped object model STt, setting STt(x) = FG(x), while the foreground model and the E function are reinitialized (FG(x) = 0 and E(x) = 0). Therefore, pixels for which STt(x) ≠ 0 belong to stopped objects, while pixels for which 0 < FGt(x) < τs belong to moving objects, and STt ∩ FGt = ∅. The described procedure is completely independent of the model adopted for the scene background and foreground. The results presented in the following section have been obtained adopting for the background and the foreground the 3D neural model presented in Section 1.
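A direct transcription of Eq. (4) into a small Python sketch is shown below; `in_bg` and `in_fg` indicate whether the current pixel value is matched by the background model BGt and by the foreground model FGt, and the default values of tau_s and k follow the discussion above.

```python
def update_stationarity(E, in_bg, in_fg, tau_s=1500, k=1500):
    if in_bg:
        E = max(0, E - k)         # background again: fast decay (k large, up to tau_s)
    elif in_fg:
        E = min(tau_s, E + 1)     # same colour features as the foreground model: count up
    else:
        E = max(0, E - 1)         # foreground, but different colour features: count down
    stopped = (E >= tau_s)        # when the threshold is reached, move the pixel to ST_t
    return E, stopped
```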
3. Experimental Results

Experimental results for the detection of stopped objects using the proposed approach have been produced for several image sequences. Here we describe three different sequences, namely PV-easy, PV-medium, and PV-hard, belonging to the publicly available i-LIDS 2007 dataset (ftp://motinas.elec.qmul.ac.uk/pub/iLids). Such scenes represent typical situations critical for detecting vehicles in no parking areas, where the street under control is more or less crowded with cars, depending on the hour of the day the scene refers to. For all the scenes the main difficulty is represented by the strong illumination variations, due to clouds frequently covering and uncovering the sun. For the purpose of the AVSS 2007 contest [11], the no parking area is defined as the main street borders, and the stationarity threshold is defined as τS = 1500. This means that an object is considered irregularly
Figure 2. Detection of stopped objects in sequence PV-easy. The van first stops in frame 2712. The first stationary object is detected in frame 4119; further stationary pixels are detected later, as shown in frames 4200 and 4875. The van is detected as a stationary object until frame 4976, and no more stopped objects are detected until frame 5290 (end of the sequence).
parked if it stops in the no parking area for more than 60 seconds (scenes are captured at 25 fps). Results obtained for sequence PV-easy are reported in Fig. 2. Since an empty initial background is not available for this scene, we artificially inserted 30 empty scene frames at the beginning of the sequence (starting from frame 251) in order to avoid bootstrapping problems for background modeling. As soon as the white van stops (starting from frame 2712), the function E(x) described in Section 2 starts incrementing for pixels belonging to the van; such pixels are inserted into the foreground model FGt and used for the model update. After approximately τS = 1500 frames, E(x) reaches the stationarity threshold τS, thus signaling the first stopped object (frame 4119). From this moment until the end of the stopped vehicle event, the stopped object model allows to distinguish moving objects from the stopped object. When the van leaves again (from frame 4875), the part of the scene uncovered by the van is again recognized as belonging to the background model BGt, and previously stopped pixels are deleted from the stopped object model. It should be stressed that illumination conditions changed quite a bit between the stopping and the leaving of the van. This results in an uncovered background very different from the background that was stored before the van stopped. Our background model, however, could recognize it again as background since it includes a mechanism for distinguishing shadows and incorporating them into the background model (not described here for space constraints). Moreover, it should be clarified that we do not identify the whole white van, but only the part of it belonging to the no parking area, since we restrict our attention only to the street including the no parking area (masking out the remaining part of the scene). Results obtained for sequence PV-medium are reported in Fig. 3. Here the empty scene available at the beginning of the sequence (starting from frame 469) allows to train a quite faithful background model. As soon as the dark car stops (starting from frame 700), E(x) starts incrementing for pixels belonging to the
Figure 3. Detection of stopped objects in sequence PV-medium. The car first stops in frame 700. The first stationary object is detected in frame 2197; further stationary pixels are detected later, even if the stopped object is occluded by foreground pixels (e.g. in frame 2720, where the white car covers the stopped car). The car is detected as stopped until frame 2779, and no more stopped objects are detected until frame 3748 (end of the sequence).
car; such pixels are inserted into the foreground model FGt and used for the model update. The role of the FGt model becomes clear here, since many cars pass in front of the stopped car and, without a comparison with such a model, could erroneously be taken as part of the stopped car. After approximately τS = 1500 frames, E(x) reaches the stationarity threshold τS, thus signaling the first stopped object (frame 2197). From this moment until the end of the stopped car event, the stopped object model allows to distinguish moving objects passing in front of the stopped car, as can be seen in frame 2720. When the car leaves again (starting with frame 2675, where the car turns on its direction indicator), the part of the scene uncovered by the car is again recognized as belonging to the background model BGt, and previously stopped pixels are deleted from the stopped object model. Finally, results obtained for sequence PV-hard are reported in Fig. 4. As soon as the white car stops (starting from frame 787), E(x) starts incrementing for pixels belonging to the car; such pixels are inserted into the foreground model FGt and used for the model update. After approximately τS = 1500 frames, E(x) reaches the stationarity threshold τS, thus signaling the first stopped object (frame 3294). When the car leaves again (from frame 3840), the part of the scene uncovered by the car is again recognized as belonging to the background model BGt, and previously stopped pixels are deleted from the stopped object model. A comparison of stopped object event start and end times computed with our approach with those provided by the ground truth is reported in Table 1 for all the considered sequences. Since our approach to stopped object detection is pixel-based and no region-based post-processing is performed to identify objects, our stopped object event starts as soon as a single pixel is detected as stopped, and ends as soon as no more stopped pixels are detected, thus leading to small errors in the detected event start and end times.
Figure 4. Detection of stopped objects in sequence PV-hard. The car first stops in frame 787. The first stationary object is detected in frame 3294; further stationary pixels are detected later, even though the illumination conditions vary strongly (see the different illumination in frames 787 and 3825). The car is detected as stopped until frame 3840, and no more stopped objects are detected until frame 4360 (end of the sequence).
Table 1. Comparison of stopped object event start and end times computed with our approach with those provided by the ground truth for the considered sequences.

  Sequence name   Event        Ground truth   Proposed 3D model   Error (secs)
  PV-easy         Start time   02:48          02:45               3
  "               End time     03:15          03:19               4
  PV-medium       Start time   01:28          01:20               0
  "               End time     01:47          01:51               4
  PV-hard         Start time   02:12          02:12               0
  "               End time     02:33          02:34               1
4. Conclusions

The paper reports our approach to background and foreground modeling for moving object and stopped foreground detection. The approach is based on visual attention mechanisms to solve motion detection tasks, using a 3D self-organizing neural network, without prior knowledge of the pattern classes. The aim is to obtain the objects that keep the user's attention in accordance with a set of predefined features, by learning the trajectories and features of moving and stopped objects in a self-organizing manner. Stopped foreground subtraction is achieved by maintaining an up-to-date model of the background and of the foreground and by discriminating moving objects as those that deviate from both such models. We show that adopting the 3D neural model allows to construct a system able to detect motion and segment foreground objects into moving or stopped objects, even when they appear superimposed.
References [1] Collins, R.T., Lipton, A.J., Kanade, T., Fujiyoshi, H., Duggins, D., Tsin, Y., Tolliver, D., Enomoto, N., Hasegawa, O., Burt, P., Wixson, L. A System for Video Surveillance and Monitoring. The Robotics Institute, Carnegie Mellon University, Tech. Rep. CMU-RITR-00-12 (2000). [2] Ding, J., Ma, R., Chen, S. A Scale-Based Connected Coherence Tree Algorithm for Image Segmentation. IEEE Transactions on Image Processing, 17-2, 204–216 (2008) [3] Ferryman, J.M. (Ed.) Proceedings of the 9th IEEE International Workshop on PETS. New York, June 18 (2006) [4] Ferryman, J.M. (Ed.) Proceedings of the 10th IEEE International Workshop on PETS. Rio de Janeiro, Brazil, October 14 (2007) [5] Herrero-Jaraba, E., Orrite-Urunuela, C., Senar, J. Detected Motion Classification with a Double-Background and a Neighborhood-based Difference. Pattern Recognition Letters 24, 2079–2092 (2003) [6] Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.S. Real-time ForegroundBackground Segmentation using Codebook Model. Real-Time Imaging 11, 172–185 (2005) [7] Maddalena, L., Petrosino, A. A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications. IEEE Transactions on Image Processing, 17 (7), 1168–1177 (2008) [8] Maddalena, L., Petrosino, A., Ferone A. Object Motion Detection and Tracking by an Artificial Intelligence Approach. International Journal of Pattern Recognition and Artificial Intelligence, 22, (5), pp. 915-928 (2008) [9] Patwardhan, K. A., Sapiro, G., Morellas, V. Robust Foreground Detection in Video Using Pixel Layers. IEEE Transactions on PAMI 30, No. 4 (2008) [10] Porikli, F., Ivanov, Y., Haga, T. Robust Abandoned Object Detection Using Dual Foregrounds. EURASIP Journal on Advances in Signal Processing (2008) [11] Proceedings of 2007 IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2007). IEEE Computer Society (2007)
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-110
A wavelet based heuristic to dimension Neural Networks for simple signal approximation Gabriele COLOMBINI1*, Davide SOTTARA1, Luca LUCCARINI2, Paola MELLO1 1
DEIS, Faculty of Engineering, University of Bologna Viale Risorgimento 2, 40100 Bologna (BO) Italy 2 ENEA - ACS PROT IDR - Water Resource Management Section Via Martiri di Monte Sole 4, 40129 Bologna (BO) Italy
Abstract Before training a feed forward neural network, one needs to define its architecture. Even in simple feed-forward networks, the number of neurons of the hidden layer is a fundamental parameter, but it is not generally possible to compute its optimal value a priori. It is good practice to start from an initial number of neurons, then build, train and test several different networks with a similar hidden layer size, but this can be excessively expensive when the data to be learned are simple, while some real-time constraints have to be satisfied. This paper shows a heuristic method for dimensioning and initializing a network under such assumptions. The method has been tested on a project for waste water treatment monitoring.
Introduction

Waste water treatment plants need to be monitored continuously in order to prevent faults and optimize operational costs. A possible approach to reaching this goal is to constantly monitor some indicator signals such as pH, redox potential (ORP) and dissolved oxygen concentration (DO) in order to identify the advancement state of the treatment process [1]. In order to dynamically analyze these values we use neural networks to approximate the sampled signals. The network is a single-input single-output feedforward network with one hidden layer and thus has a 1-H-1 structure, with H an optimal value depending on the signal to be modeled. The activation function of the hidden neurons is sigmoidal (hyperbolic tangent) while that of the output neuron is linear. Thus, the network output is the sum of a number of sigmoidal components equal to the number of hidden neurons. Every hidden neuron is described by a triple of parameters {A, w, b} – the hidden-to-output synapse weight, the input-to-hidden synapse weight and the bias. A represents the amplitude of the sigmoid, w its extension, and b defines, together with w, the center of the sigmoid [2]. In particular, it is not the output of the network itself, but its parameters (weights and biases) which can be analyzed to obtain the relevant characteristics of the signal. Monitoring, however, must be performed on-line, in real time, to make the analysis
* Corresponding Author: Colombini Gabriele, DEIS, Faculty of Engineering, University of Bologna, Viale Risorgimento 2, 40100 Bologna (BO), Italy. E-mail: [email protected]
effective. Because of this, the analysis must require a limited amount of time. On the other hand, the commonly accepted procedure for neural network dimensioning and training requires testing several networks of different sizes before choosing the best one [2], which usually requires an excessive computational effort. Thus, we have developed an alternative method to estimate an optimal size for the hidden layer and to create a single neural network each time.
1. Wavelet Decomposition

The technique we developed aims at finding a reasonable estimate of the number of hidden neurons H0, depending on a resolution threshold: given this value, a convenient network can be created and trained. In particular, we estimate H0 using a heuristic based on a wavelet decomposition [3], [4]. In general, given a continuous signal s(t) and a mother wavelet ψ, with a and b a scaling and a shifting parameter respectively, the continuous wavelet transform (CWT) C(a, b) is defined by Eq. (1):

C(a, b) = (1/√a) ∫ s(t) ψ*((t − b)/a) dt     (1)
Since the original signal is actually a discrete time series x[1..T], we use the discrete wavelet transform (DWT), which yields two series of coefficients. The high frequency coefficients, called detail coefficients, are obtained using a high pass filter. The low frequency coefficients, also known as approximation coefficients, are obtained by filtering with a low pass filter. It is then possible to iteratively apply the transform to the low frequency component, obtaining a signal described at a progressively coarser level of granularity. We have used the Haar wavelet [2]; thus at each iteration the Ti samples are replaced by Ti/2 approximations and Ti/2 details, defined as in Eq. (2) and Eq. (3). Figure 1 shows the transformations from the original signal (on the top) at the different approximation levels.

x_{i+1}[j] = (x_i[2j−1] + x_i[2j]) / √2     (2)

d_{i+1}[j] = (x_i[2j−1] − x_i[2j]) / √2     (3)

Figure 1: Wavelet decomposition
Thus, at the i-th iteration the signal is described using a number of approximation coefficients equal to T/2^i, as if it had been sampled with a period 2^i times longer.
To choose the level at which to interrupt the decomposition process, we use the energy measure defined by Eq. (4). This function measures the residual information at the i-th level:

E_i = Σ_j x_i[j]² / Σ_n x[n]²     (4)
where x[1..T] are the components of the original signal and x_i[1..T_i] is the approximation at the i-th level. When the energy value falls below a threshold E_min, the decomposition is interrupted: in our case we set E_min = 0.99.
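The iterated Haar filtering with the energy-based stopping rule can be sketched as follows; the signal length is assumed to be a power of two, and the squared-sum form of Eq. (4) is the one adopted above.

```python
import numpy as np

def haar_approximation(x, e_min=0.99):
    x = np.asarray(x, dtype=float)
    total = np.sum(x ** 2)                            # energy of the original signal
    a, level = x.copy(), 0
    while len(a) > 1:
        approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # Eq. (2)
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # Eq. (3), stored by a full DWT
        if np.sum(approx ** 2) / total < e_min:       # Eq. (4): residual information
            break                                     # stop before dropping below E_min
        a, level = approx, level + 1
    return a, level                                   # coarse approximation, reached level
```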
2. Multilevel wavelet decomposition
The final values (t_j, x_i[j]) can be considered the endpoints of T/2^i − 1 consecutive linear evolutions, interpreting the wavelet filtering as a piecewise approximation process. All the tracts, however, have the same temporal extension (t_j − t_{j−1}): in some cases (for instance, signals with different types of trends, such as the one shown in Figure 2), this can be a serious limitation. Figure 2 also shows the result obtained by applying the usual decomposition: the original signal is linearly approximated by a set of trends uniformly distributed in time. This is not optimal: it is immediate to see that the first, constant, part of the signal could have been approximated at a lower level of granularity (i.e. using fewer points) because of its lack of trends. On the contrary, the second, highly oscillatory, part should have been defined using more points to achieve a better degree of precision. The problem is that the approximation is global and thus the estimated degree of precision is a compromise between the optimal values for different parts of the signal. To overcome this implicit limitation, a further improvement of the standard wavelet decomposition has been designed, to achieve a multilevel decomposition. In the original method, the iterations stop at the same level for all points. Using a multilevel wavelet decomposition, the filter is applied locally. The algorithm starts from the deepest level of the decomposition, with the original signal being shrunk into one root point, and then tries to progressively expand it back until a suitable approximation has been found. The iterations of the Haar filtering, in fact, can be mapped onto a binary tree, the decomposition tree, having a number of levels equal to L = ⌈log2(T)⌉ + 1. Nodes at level (L−n) store information on the points after n applications of the filter; notice that its leaves represent the original signal, while the root is the approximation of the full signal. Each node x stores the approximation a_x and the detail d_x, which can be computed iteratively starting from the leaves according to Eq. (2) and (3) respectively. By convention, the approximation of a leaf node coincides with the point value itself. Given the tree, the algorithm starts from the root and proceeds as follows: at each step, a node is expanded into the two nodes it was generated by, selecting the node with the maximum gain, i.e. the one whose expansion leads to the maximum increment in information degree.

Figure 2: standard wavelet decomposition
To compute the gain, first of all we calculate the information amount of the root node according to Eq. (5):

E_r = a_r² / (a_r² + D_r)     (5)

where

D_r = Σ_i Σ_j d_i[j]²     (6)
Thus, E_r depends on the root-level approximation a_r and on the details d_i[j] of all the other nodes at the different levels. E_r determines the initial conditions, so that the theoretical gain for each other node x can be computed iteratively. The gain depends on the approximation energy E_a and the detail energy E_d, whose initial values are set according to Eq. (8) and Eq. (9), while Eq. (7) and Eq. (10) define the energy gain of a candidate node, as detailed below.
Given a candidate node x, the energy gain is given by equation (7), where a_x and d_x are respectively the approximation and the detail of node x, A(x) is the sum of the approximations of the leaves in the underlying sub-tree, x_l and x_r are the children of node x, and the leaf indicator function returns 1 if x is a leaf and 0 otherwise. Considering all the unexpanded nodes as candidates, we greedily choose the one with the highest gain and expand it into its two children; we then update E_a and E_d with the E_a* and E_d* of the winner node, respectively, and we restart the algorithm from Eq. (7). This step iterates until the cumulative information amount, defined as the cumulative gain G of all the expanded nodes, reaches a certain threshold: the current leaves of the tree yield the desired approximation. As an example, the result obtained using the multilevel wavelet decomposition on the signal in Figure 2 is shown in Figure 3. We can see how the number of neurons used to approximate the signal is much smaller in the first, constant part than in the second, oscillatory one.

Figure 3: adaptive wavelet decomposition
3. Network Initialization

The points obtained from the decomposition define a piecewise linear approximation of the original signal: this allows us to use the number of tracts as a good estimate of the number of neurons H which will have to be used in the hidden layer of a network approximating the data. Ideally, each linear tract, a segment, is mapped onto a sigmoid, which can itself be approximated by a bounded linear function. Given this correspondence, the decomposition not only allows us to estimate H and build the network, but also to initialize it conveniently.
Neural network parameters are usually initialized randomly and then modified iteratively during the training phase. As a consequence, the number of epochs, and therefore the time, needed for the network to reach the required target performance depends directly on the specific initialization. The performance is defined in terms of the MSE on the training data themselves. (Notice that the network will not be required to generalize to new inputs, inside or outside the domain, so there is no point in leaving out some data for test and validation.) However, the linear tracts obtained using the multilevel wavelet decomposition can also be used to initialize the network parameters. Each neuron is placed on one of the [(x1, y1); (x2, y2)] segments obtained by the decomposition: there is a one-to-one correspondence between the extremes of a tract and the parameters of the sigmoid associated to that neuron, as defined in Eqs. (11), (12) and (13).
The model we obtain with this technique is particularly close to the original signal. This causes the search for an optimal solution to start from a near-optimal condition, with the benefit that a lower number of training epochs is potentially necessary to reach the goal.
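Since Eqs. (11)-(13) are not reproduced here, the sketch below shows one natural segment-to-sigmoid mapping in the spirit of this section (an assumption, not the authors' exact formulas): each tract [(x1, y1); (x2, y2)] becomes a tanh unit centred on the tract, with amplitude (y2 − y1)/2 and a slope matching the tract at its centre.

```python
import numpy as np

def init_hidden_layer(segments):
    # segments: list of ((x1, y1), (x2, y2)) tracts from the multilevel decomposition
    A, w, b = [], [], []
    for (x1, y1), (x2, y2) in segments:
        amp = (y2 - y1) / 2.0                 # hidden-to-output weight A (half the step height)
        slope = 2.0 / (x2 - x1)               # input-to-hidden weight w (sets the extension)
        A.append(amp)
        w.append(slope)
        b.append(-slope * (x1 + x2) / 2.0)    # bias b places the sigmoid centre on the tract
    return np.array(A), np.array(w), np.array(b)

def network_output(x, A, w, b, bias=0.0):
    # 1-H-1 network: linear output unit summing H hyperbolic-tangent hidden units
    return A @ np.tanh(np.outer(w, x) + b[:, None]) + bias
```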
4. Test

The multilevel decomposition was tested on a set of signals representing some chemical indicators (pH, redox potential ORP, dissolved oxygen concentration DO) characterizing the biological depuration process of effluent waters. Given each time series, we aimed to develop a neural network capable of modeling the original signal. In figure 4 we show an example of one of these signals. In figure 5 we show the linear approximation obtained via the decomposition. In figure 6 we show the model obtained after training the associated neural network.
Figure 4: Signal analyzed
Figure 5: Adaptive wavelet Figure 6: Neural net output decomposition
In figure 7 we show another test signal, while in figures 8 and 9 we show the results. As we can see, the number of neurons used is sufficient to correctly model the original signal; indeed, all signal dynamics are detected.
Figure 7: Signal analyzed
Figure 8: Adaptive wavelet decomposition
Figure 9: Neural net output
To assess the benefits of this initialization strategy, we applied the following procedure to a data set of approximately 3500 different time series:
• We built a network according to our initialization method, computing the number ENR of epochs required to have the standard gradient descent training algorithm converge to a threshold MSE such that the mean relative approximation error is below 1%.
• For each time series, we also created 10 randomly initialized networks, using the same value for H0, and computed the average number of epochs needed to have their training converge, ER.
The histogram in Figure 10 shows how many times a given difference between ER and ENR has been obtained: in most cases (approximately 72%), the initialization improved the training performance. Notice also that differences greater than 10 have been collapsed into the corresponding bins.

Figure 10: Training performance comparison
5. Conclusion

The proposed algorithm heuristically estimates the number of neurons required in the hidden layer of a feedforward neural network which has to correctly model a signal, reproducing all its fundamental changes in trend. The advantage is particularly in terms of time: with our methodology, it is necessary to build and train just one network, unlike the classical method which leads to N different networks. Moreover, during the modeling phase we tailor the initialization of the neurons to the input signal, thus reducing the number of epochs needed for the network to converge.
References [1] Ma Y., Peng Y., Yuan Z., Wang S., Wu X. Feasibility of controlling nitrification in predenitrification plants using DO, pH and ORP sensors. Water Science & Technology, Vol. 53, No. 4-5, pp. 235-243, IWA Publishing, 2006. [2] Haykin, S. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999. [3] Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, 1999. [4] Donald B. Percival & Andrew T. Walden: Wavelet Methods for Time Series Analysis. Cambridge University Press, 2000.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-116
Support Vector Machines and MLP for automatic classification of seismic signals at Stromboli volcano Ferdinando GIACCO a,1 , Antonietta Maria ESPOSITO b , Silvia SCARPETTA a,c,d , Flora GIUDICEPIETRO b and Maria MARINARO a,c,d a Department of Physics, University of Salerno, Italy b Istituto Nazionale di Geofisica e Vulcanologia (Osservatorio Vesuviano), Napoli, Italy c INFN and INFM CNISM, Salerno, Italy d Institute for Advanced Scientific Studies, Vietri sul Mare, Italy Abstract. We applied and compared two supervised pattern recognition techniques, namely the Multilayer Perceptron (MLP) and the Support Vector Machine (SVM), to classify seismic signals recorded on Stromboli volcano. The available data are first preprocessed in order to obtain a compact representation of the raw seismic signals: we extract spectral and temporal information, so that each input vector is made up of 71 components describing the early portion of the signal. We implemented two classification strategies to discriminate three different seismic events: landslide, explosion-quake, and volcanic microtremor signals. The first method is a two-layer MLP network, with a Cross-Entropy error function and a logistic activation function for the output units. The second method is a Support Vector Machine, whose multi-class setting is accomplished through a 1vsAll architecture with a Gaussian kernel. The experiments show that although the MLP produces very good results, the SVM accuracy is always higher, both in terms of best performance, 99.5%, and average performance, 98.8%, obtained with different sampling permutations of the training and test sets. Keywords. Seismic signals discrimination, Linear Predictive Coding, Neural Networks, Support Vector Machine, Multilayer Perceptron.
Introduction

Automatic discrimination among seismic events is a critical issue for the continuous monitoring of seismogenic zones and active volcanic areas. This is the case of the Stromboli island (southern Italy), where the seismic activity is intense and the analysis of the data should be very fast in order to communicate, as soon as possible, the significance of the recorded information to civil defense authorities. The available data are provided through a broadband seismic network, installed during the crisis of December 2002, to monitor the evolution of the volcanic processes [1,2]. Since its installation, the network has recorded many thousands of explosion-quake and
1 Corresponding Author: Ferdinando Giacco, Department of Physics, University of Salerno, Via S. Allende, 84081 Baronissi (SA), Italy; E-mail: [email protected].
landslide signals. The detection of landslide seismic signals and their discrimination from the other transient signals was one of the most useful tools for monitoring the stability and the activity of the northwest flank of the volcano. In recent years, several methods have been proposed for detecting and discriminating among different seismic signals, based on spectral analysis [3,11], cross-correlation techniques [5,6] and neural networks [7,8,9,4,10]. In this paper we report on two different supervised approaches for discrimination among explosion-quake, landslide and microtremor signals, which characterize the Strombolian activity. The first method is based on one of the most widely used neural networks, the Multilayer Perceptron (MLP), while the second is the Support Vector Machine algorithm (SVM) [13]. Support Vector Machines, originally developed for two-class discrimination problems, have since been extended to multi-class settings [15], and nowadays multi-class SVM architectures like 1vs1 and 1vsAll are widely used in different fields [16,17,18,20], including recent applications to seismic signal recognition [19]. The remainder of the paper is organized as follows: Section I describes the data and the preprocessing techniques used to represent them in a meaningful and compressed form; Section II describes the classification techniques, namely the Multilayer Perceptron (A) and the Support Vector Machine (B); lastly, in Section III, the conclusions on the experimental results are reported.
1. Seismic Data and preprocessing Stromboli is a volcanic island in the Mediterranean Sea located north of eastern Sicily. Stromboli exhibits continuous eruptive activity generally involving the vents at the top of the cone. This activity consists of individual explosions emitting gasses and pyroclastic fragments typically six to seven times per hour. Seismic signals recorded on Stromboli are characterized by microtremor and explosion-quakes, usually associated with Strombolian explosions. This typical Strombolian activity sometimes stops during sporadic effusive episodes characterized by lava flows. The most recent effusive phases occurred in 1930, 1974, 1985, in December 2002 and the last one in February 2007. The December 2002 effusive phase began with a large landslide on the “Sciara del Fuoco”, a depression on the northwest flank of the volcano that generated a tsunami with maximum wave height of about 10 m. After this episode the northwest flank became unstable and as many as 50 landslide signals per day were recorded by the seismic monitoring network operated by the Istituto Nazionale di Geofisica e Vulcanologia (INGV) [12]. The broadband network operated by the INGV for the seismic monitoring of Stromboli volcano has operated since January 2003. It consists of 13 digital stations equipped with three-component broadband Guralp CMG-40 seismometers, with frequency response of 60 sec (see Fig. 1). The data are acquired by digital recorders, with a sampling rate of 50 samples/sec, and are continuously transmitted via Internet to the recording center in Naples at the Vesuvius Observatory (INGV). A more detailed description of the seismic network and data-acquisition system can also be found at www.ov.ingv.it/stromboli.html [1]. Since its installation, the network has recorded as transient signals some hundreds of thousands of explosion-quakes and thousands of landslides, in addition to continuous volcanic microtremor signals. The explosion-quakes are
Figure 1. Map showing the current network geometry of 13 digital broadband stations deployed on Stromboli Island.
characterized by a signal exhibiting no distinct seismic phases and having a frequency range of 1–10 Hz. Landslide signals are higher in frequency than the explosion-quakes and their typical waveform has an emergent onset. The microtremor is a continuous signal having frequencies between 1 and 3 Hz. The network has also recorded local, regional and teleseismic events. The data set includes 1159 records from the three components of five seismic stations: STR1, STRA, STR8, STR5, STRB (see Fig. 1). It is made up of 430 explosion-quakes, 267 landslides, and 462 microtremor signals. For each event, a record of 23 sec is taken, at 50-Hz sampling frequency. The arrival-time picking of the explosion-quake and landslide signals has been performed by the analysts, using data windows having about 3 sec of pre-event signal. We used 5/8 of the available data as the training set (724 samples), while the remaining 3/8 provides the testing set (435 samples). The preprocessing stage is performed using the Linear Predictive Coding (LPC) technique [21], frequently used in the speech recognition field to extract compact spectral information. LPC tries to predict a signal sample by means of a linear combination of previous signal samples, that is:

s*(n) = c₁ s(n − 1) + c₂ s(n − 2) + … + c_p s(n − p)    (1)
where s(n) is the signal sample at time n, s*(n) is its prediction, and p is the model order (the number of prediction coefficients). The estimate of the prediction coefficients c_i, for i = 1, …, p, is obtained by an optimization procedure that minimizes the error between the real signal at time n and its LPC estimate. The number of prediction coefficients p is problem dependent and must be determined via a trade-off between preserving the information content and optimizing the compactness of the representation. Here, we choose a 256-point window and p = 6 LPC coefficients. Increasing p does not improve the information content significantly, but decreases the compactness of the representation markedly. Therefore we extract six coefficients from each of the eight Hanning windows (5 sec long) into which we divided the signal, each window overlapping the previous one by 2.5 sec. Because LPC provides frequency information [21], we have also added time-domain information. We use the function f_m, computed as the difference, properly normalized, between the maximum and minimum signal amplitudes within a 1-sec window W_m:

f_m = [(max s_i − min s_i) · N] / [Σ_{n=1}^{N} (max s_i − min s_i over W_n)],   i ∈ W_m,  m = 1, …, N    (2)
Thus, for an N = 23 sec signal, we obtain a 23-element time-features vector. Therefore, the dataset is composed of 1159 signals, each of which is encoded with a 71-feature vector (6 × 8 = 48 frequency features + 23 time features). The use of both spectral and temporal features more closely approximates the waveform characteristics considered by seismologists when visually classifying the signals. In the following, we design two supervised techniques to build an automatic classifier and we train them to distinguish between landslides, explosion-quakes, and microtremor.
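To make the encoding concrete, here is a minimal sketch (ours, in Python/NumPy, not the authors' code) that follows the settings stated above: 50 samples/sec, 256-point Hanning windows hopped by 2.5 sec, p = 6 LPC coefficients per window estimated with the standard autocorrelation (Yule-Walker) method, and the normalized max-min time feature of Eqn. 2 on 1-sec windows. All function names are ours.

```python
import numpy as np

FS = 50   # sampling rate (samples/sec), as reported in the text
P = 6     # LPC model order used in the paper

def lpc_coefficients(frame, p=P):
    """Estimate p LPC coefficients with the autocorrelation (Yule-Walker) method."""
    frame = frame - frame.mean()
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz matrix
    return np.linalg.solve(R, r[1:])          # coefficients c_1, ..., c_p of Eqn. 1

def frequency_features(signal, fs=FS, win_len=256, hop_sec=2.5, p=P):
    """p LPC coefficients for each Hanning window (8 windows for a 23-sec record)."""
    hop = int(hop_sec * fs)
    window = np.hanning(win_len)
    feats = [lpc_coefficients(signal[s:s + win_len] * window, p)
             for s in range(0, len(signal) - win_len + 1, hop)]
    return np.concatenate(feats)

def time_features(signal, fs=FS):
    """Normalized max-min amplitude difference within each 1-sec window (Eqn. 2)."""
    n = len(signal) // fs
    ranges = np.array([signal[m * fs:(m + 1) * fs].max() -
                       signal[m * fs:(m + 1) * fs].min() for m in range(n)])
    return ranges * n / ranges.sum()

def encode_record(signal):
    """71-element feature vector: 48 LPC (frequency) features + 23 time features."""
    return np.concatenate([frequency_features(signal), time_features(signal)])

record = np.random.randn(23 * FS)     # synthetic 23-sec record standing in for a real one
print(encode_record(record).shape)    # (71,)
```

Stacking the 1159 vectors produced in this way gives the feature matrix that feeds the classifiers of Section 2.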
2. Classification techniques

2.1. Multi-layer Perceptron

The multilayer perceptron (MLP) trained with the back-propagation learning algorithm ([19]) is one of the most widely used neural networks. Two kinds of information processing are performed in a multilayer perceptron. The first one is the forward propagation of the input provided by the environment through the network, from the input units to the output units. The other one is the learning algorithm, which consists of the back-propagation of the errors through the network, from the output units to the input units, together with the weight and bias updates. The purpose of back-propagation is to adjust the internal state (weights and biases) of the multilayer perceptron so as to produce the desired output for the specified input. In our experiments we built a two-layer MLP network for the three-class discrimination problem [25]. Weight optimization is carried out during the training procedure through minimisation of the Cross-Entropy Error Function [22] using the Quasi-Newton algorithm [22]. The network output activation function is the logistic, while the hyperbolic tangent is used for the hidden units. Moreover, when logistic output units and the cross-entropy error function are used together, the network output represents the
Table 1. Error matrix corresponding to the best MLP performance, obtained with a network architecture made up of 5 hidden units and 110 training cycles. The overall accuracy is 98.39%.

Classes             Landslide   Explosion-quake   Microtremor
Landslide               97             0               4
Explosion-quake          0           167               1
Microtremor              2             0             164
Figure 2. SVM optimal solution in a two dimensional space for a non-linearly separable classification problem. The distance between the optimal hyperplane and the nearest datum is called margin, while the data corresponding to the filled circles and the filled rectangle are support vectors. The slack variables ξi and ξj are here introduced (see Eqn. 3 ) to allow the violation of constraints for some training samples.
probability of an input vector belonging to one of the investigated classes. The number of hidden units and training cycles has been chosen empirically, by trial and error. Lastly, to verify the generalization ability of the network, after the training step we test the MLP on a subset (the testing set) not used to train the network. To assess the system robustness we test the network several times, randomly changing the weight initialization and the permutation of the data. In this way the network performance is the average of the percentages of correct classification obtained over the tests. Table 1 shows the error matrix corresponding to the best classification performance, obtained with an MLP architecture made up of 5 hidden units and 110 training cycles. The best overall accuracy is 98.39%, while the average value taken over 10 different permutations of training and test set is 97.2%.

2.2. Support Vector Machines

Support Vector Machines (SVMs) have become a popular method in pattern classification for their ability to cope with small training sets and high-dimensional data [13,14,15].
The goal of the SVM algorithm is to find the separating decision function with the maximum margin, in order to maximize the generalization ability when a new sample is presented. This can be formulated as a Lagrangian minimization problem with inequality constraints on data separation. If the training data are linearly separable, all the samples lie outside the margin, and the data lying on the margins are called support vectors. In our study we used an SVM formulation assuming that the data are not linearly separable (see Fig. 2). In this case, we allow the violation of some constraints by introducing non-negative slack variables ξ_i ≥ 0 into the Lagrangian problem. Namely, the Lagrangian Q (for M training samples) is given by

Q(w, b, ξ) = (1/2) ‖w‖² + C Σ_{i=1}^{M} ξ_i    (3)
where w is an m-dimensional vector which locates the optimal hyperplane, b is a bias term and C is a parameter determining the weight of the slack variables ξ_i. The inequality constraints are then given by

y_i (wᵀx_i + b) ≥ 1 − ξ_i   for i = 1, …, M    (4)
where x_i are the training samples and y_i the associated labels (i.e. ±1 in a binary setting). The solution of the SVM problem is then achieved by introducing the Lagrange multipliers α_1, …, α_M and solving the related "dual problem", given by

Q(α) = Σ_{i=1}^{M} α_i − (1/2) Σ_{i,j=1}^{M} α_i α_j y_i y_j x_iᵀ x_j    (5)
subject to the constraints

Σ_{i=1}^{M} y_i α_i = 0,   0 ≤ α_i ≤ C   for i = 1, …, M.    (6)
One of the advantages of the SVM algorithm is that the solution is unique and depends only on the support vectors. Furthermore, to enhance linear separability, the original input space is mapped into a high-dimensional dot-product space called the feature space. The advantage of using kernels is that we need not treat the high-dimensional feature space explicitly: in solving Eqn. 5 we simply use K(x_i, x_j) instead of x_iᵀ x_j. The most commonly used kernel functions in the literature are polynomials (of different degrees) and gaussians. In our experiments we tried both kernel choices, finding that the best performance is achieved with a gaussian kernel, namely

K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²),    (7)

where γ is an additional parameter, determined manually. Concerning the extension of the originally binary SVMs to multi-class settings, there has been considerable research recently [23,24]. Two main architectures were originally proposed for an l-class problem [15]:
• One versus All (1vsAll): l binary classifiers are applied, one for each class versus all the others. Each sample is assigned to the class with the maximum output.
• One versus One (1vs1): l(l − 1)/2 binary classifiers are applied, one for each pair of classes. Each sample is assigned to the class getting the highest number of votes, a vote for a given class being a classifier assigning the pattern to that class.

In the current case, we focus on the 1vsAll approach, building 3 different SVMs, each of which separates one specific class from all the others. We tried several values of the parameter C and used the gaussian kernel reported in Eqn. 7. The best classification has an overall accuracy of 99.54% (see Table 2), while the average value computed on 10 different permutations of training and test set is 98.76%.

Table 2. Error matrix corresponding to the best 1vsAll SVM performance with gaussian kernel. The overall accuracy is 99.54%.

Classes             Landslide   Explosion-quake   Microtremor
Landslide               91             0               1
Explosion-quake          0           169               0
Microtremor              1             0             173
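As a hedged illustration of how the two classifiers compared above could be configured, the following sketch uses scikit-learn (an assumption on our part: the paper does not state which software was used). The architectural choices follow the text (a single hidden layer of 5 tanh units trained with a quasi-Newton solver, gaussian-kernel SVMs combined in a 1vsAll scheme, and a 5/8 vs 3/8 split), while the feature matrix and the values of γ and C are placeholders. Note also that scikit-learn's multi-class MLP uses a softmax output rather than the per-class logistic outputs described in Section 2.1.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split

# X: 1159 x 71 feature matrix (48 LPC + 23 time features), y: class labels;
# replaced here by random placeholders with the same shape as the real data set
rng = np.random.default_rng(0)
X = rng.normal(size=(1159, 71))
y = rng.integers(0, 3, size=1159)      # 0 = landslide, 1 = explosion-quake, 2 = microtremor

# 5/8 of the data for training, 3/8 for testing, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=5/8, random_state=0)

# two-layer MLP: 5 tanh hidden units, cross-entropy loss, quasi-Newton (L-BFGS) training
mlp = MLPClassifier(hidden_layer_sizes=(5,), activation='tanh',
                    solver='lbfgs', max_iter=500)
mlp.fit(X_tr, y_tr)
print('MLP accuracy:', mlp.score(X_te, y_te))

# 1vsAll SVM with gaussian (RBF) kernel; gamma and C are placeholder values
svm_1vsall = OneVsRestClassifier(SVC(kernel='rbf', gamma=0.1, C=10.0))
svm_1vsall.fit(X_tr, y_tr)
print('1vsAll SVM accuracy:', svm_1vsall.score(X_te, y_te))
```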
3. Conclusions

Two supervised strategies have been implemented to discriminate among three different seismic events: landslides, explosion-quakes and microtremor. Looking at the results, we can state that the discrimination performance is very good for both the MLP and SVM algorithms. The MLP best performance has a percentage of correct classification of 98.4%, while the average value obtained on several training and test samplings is 97.2%. However, the 1vsAll SVM with gaussian kernel always achieves higher accuracy, both in terms of best (99.5%) and average (98.8%) performance. We also remark that the extracted features, used as a parametric and compressed representation of the seismic signals, give robust information on their nature. This can also be argued by looking at the SVM results, where the solution depends only on the support vectors, acting as the relevant part of the training set. Indeed, different permutations of the training and test sets provide different results, meaning that many support vectors are present within the data; that is, the data representation is well suited to the required classification task.
References

[1] W. De Cesare, M. Orazi, R. Peluso, G. Scarpato, A. Caputo, L. D'Auria, F. Giudicepietro, M. Martini, C. Buonocunto, M. Capello, A. M. Esposito (2009) - The broadband seismic network of Stromboli volcano (Italy), Seismological Research Letters. In press.
[2] M. Martini, F. Giudicepietro, L. D'Auria, A. M. Esposito, T. Caputo, R. Curciotti, W. De Cesare, M. Orazi, G. Scarpato, A. Caputo, R. Peluso, P. Ricciolino, A. Linde, S. Sacks (2008) - Seismological monitoring of the February 2007 effusive eruption of the Stromboli volcano, Annals of Geophysics, Vol. 50, N. 6, December 2007, pp. 775-788.
[3] Hartse, H. E., W. S. Phillips, M. C. Fehler, and L. S. House (1995). Single-station spectral discrimination using coda waves, Bull. Seism. Soc. Am. 85, 1464-1474.
[4] Del Pezzo, E., A. Esposito, F. Giudicepietro, M. Marinaro, M. Martini, and S. Scarpetta (2003). Discrimination of earthquakes and underwater explosions using neural networks, Bull. Seism. Soc. Am. 93, no. 1, 215-223.
[5] Joswig, M. (1990). Pattern recognition for earthquake detection, Bull. Seism. Soc. Am. 80, 170-186.
[6] Rowe, C. A., C. H. Thurber, and R. A. White (2004). Dome growth behavior at Soufriere Hills volcano, Montserrat, revealed by relocation of volcanic event swarms, 1995-1996, J. Volc. Geotherm. Res. 134, 199-221.
[7] Dowla, F. U. (1995). Neural networks in seismic discrimination, in Monitoring a Comprehensive Test Ban Treaty, E. S. Husebye and A. M. Dainty (Editors), NATO ASI, Series E, Vol. 303, Kluwer, Dordrecht, The Netherlands, 777-789.
[8] Wang, J., and T. Teng (1995). Artificial neural network based seismic detector, Bull. Seism. Soc. Am. 85, 308-319.
[9] Tiira, T. (1999). Detecting teleseismic events using artificial neural networks, Comp. Geosci. 25, 929-939.
[10] Esposito, M., F. Giudicepietro, L. D'Auria, S. Scarpetta, M. G. Martini, M. Coltelli, and M. Marinaro (2008). Unsupervised Neural Analysis of Very-Long-Period Events at Stromboli Volcano Using the Self-Organizing Maps, Bull. Seism. Soc. Am., Vol. 98, No. 5, pp. 2449-2459.
[11] Gitterman, Y., V. Pinky, and A. Shapira (1999). Spectral discrimination analysis of Eurasian nuclear tests and earthquakes recorded by the Israel seismic network and the NORESS array, Phys. Earth. Planet. Interiors 113, 111-129.
[12] Martini, M., B. Chouet, L. D'Auria, F. Giudicepietro, and P. Dawson (2004). The seismic source stability of the Very Long Period signals of the Stromboli volcano, in I General Assembly Abstracts, EGU, Nice, 25-30 April 2004.
[13] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory, Springer.
[14] Webb, A. R. (2002). Statistical Pattern Recognition, John Wiley and Sons.
[15] Schölkopf, B. and A. J. Smola (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press.
[16] Melgani, F. and L. Bruzzone (2004). Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. on Geoscience and Remote Sensing, vol. 42, pp. 1778-1790.
[17] Foody, G. F. and Ajay Mathur (2004). A relative evaluation of multiclass image classification by support vector machines, IEEE Trans. on Geoscience and Remote Sensing, vol. 42, pp. 1335-1343.
[18] Hsu, C. W. and C. J. Lin (2002). A comparison of methods for multiclass support vector machines, IEEE Trans. on Neural Networks, vol. 13, pp. 415-425.
[19] Masotti, M., S. Falsaperla, H. Langer, S. Spampinato, and R. Campanini (2006). Application of Support Vector Machine to the classification of volcanic tremor at Etna, Italy, Geophys. Res. Lett., 33, L20304, doi:10.1029/2006GL027441.
[20] Kahsay, L., F. Schwenker and G. Palm (2005). Comparison of multiclass SVM decomposition schemes for visual object recognition, LNCS, Springer, vol. 3663, pp. 334-341.
[21] Makhoul, J. (1975). Linear prediction: a tutorial review, Proc. IEEE 63, 561-580.
[22] Bishop, C. (1995). Neural Networks for Pattern Recognition, Oxford University Press, New York, 500 pp.
[23] F. Giacco, S. Scarpetta, L. Pugliese, M. Marinaro and C. Thiel. Application of Self Organizing Maps to multi-resolution and multi-spectral remote sensed images, in "New Directions in Neural Networks", Proceedings of the 18th Italian Workshop on Neural Networks (WIRN 2008), IOS Press (Netherlands), pp. 245-253.
[24] C. Thiel, F. Giacco, F. Schwenker and G. Palm. Comparison of neural classification algorithms applied to land cover mapping, in "New Directions in Neural Networks", Proceedings of the 18th Italian Workshop on Neural Networks (WIRN 2008), IOS Press (Netherlands), pp. 254-263.
[25] A. M. Esposito, F. Giudicepietro, S. Scarpetta, L. D'Auria, M. Marinaro, M. Martini (2006) - Automatic discrimination among landslide, explosion-quake and microtremor seismic signals at Stromboli volcano using Neural Networks, Bull. Seismol. Soc. Am. (BSSA) Vol. 96, No. 4A, pp. 1230-1240, August 2006, doi: 10.1785/0120050097.
Chapter 3 Economy and Complexity
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-127
Thoughts on the crisis from a scientific perspective

Jaime GIL-ALUJA a,1
a President of the "Real Academia de Ciencias Económicas y Financieras", SPAIN
By way of an emotional memorial The news of the sudden death of Massimo Salzano came, to the circle of those of us who enjoyed his friendship, as a serious blow which has left a feeling of incredulity and sense of loss on realising how difficult it will be to recompose the paths of the research laid down by him with so much enthusiasm and wisdom. His disciples above all else, but also his companions, will feel his absence, loss of human warmth and advice which he always gave with a generosity without limits. In this feeling of sadness, it is comforting to feel that the influence left by Professor Salzano will not disappear and that those who received his teachings will continue the work that today has not been concluded due to his premature passing. The first proof of this is the posthumous homage which is to be offered during the WIRN 2009 Conference. Our desire has been to adhere to this recognition to Massimo Salzano by contributing a few modest thoughts, with the hope that these will be an emotional memory of the conversations held during lengthy meetings, at which the future of scientific activity in the field of economics and management occupied a large part of our scientific concerns. It corresponds to the generations that follow on to take over and work as we have attempted, in order to attain, from the vantage point of science, a world of comprehension, solidarity and social justice.
1. Economic and financial aspects of current realities

Those of us who by the passing of years have crossed the threshold of maturity feel that we are invested with sufficient experience to be able to proclaim the important advance which has taken place during the last decade in the process of the creation of a society with progress without precedent throughout history. The achievements that have been attained are the result of a whole conglomerate of effort carried out by the most diverse classes, with the common denominator of freedom in stability. The development of research in the different spheres of knowledge, in our opinion, constitutes one of the fundamental axes around which the progress attained has revolved. This progress can be seen today as menaced by the ghost of the crisis, which is hitting all the underprivileged particularly hard. How has this

1 Jaime GIL-ALUJA: Real Academia de Ciencias Económicas y Financieras, Vía Laietana, 32; 08003 Barcelona, Spain. E-mail: [email protected].
situation been arrived at, and what are the paths that economic research can propose to shorten the duration and depth of the consequences of the crisis? These are the aspects that concern actors in the economy the most. In order to help clarify these questions, certain brief considerations may be useful concerning the fundamental changes that have taken place over recent decades in the economic structures of the Western countries of Europe, with the transfer of industrial activity to other, developing countries, giving priority to what could be called "splitting the economy in a tertiary nature". We feel it is premature to present conclusions on the relation between the productive structure on the one hand, and the intensity of the economic crisis on the other. As a prior step it would be necessary to delve deeper into the set of relations of direct and induced incidence. Only afterwards would it be effective to draw up the necessary industrial policy, not only at the level of each country, but also, and above all else, at a European level. To enter into a process such as this also requires an analysis broken down by sectors, by dimensions and by type of business, given the technological heterogeneity between one sector and another. It cannot be forgotten that the level of technical progress and innovation in industry is translated into a continuous increase in the productivity of the sector in question. On the other hand, the diversity of the industrial presence in European countries with a similar level of income and consumer structure should be highlighted. Let us remember that, for example in Germany, the importance of the manufacturing industry is quantitatively in the order of double that of Britain, France or America. The conclusion reached is evident and revealing: the countries with a high industrial presence show a much more favourable commercial balance relative to those countries which have become de-industrialised faster, whatever the dimension of the market and the degree of sectorial specialisation. These reflections, chosen from among several others that affect the sphere of industrial activity, reflect the situation that precedes the economic crisis, of which it is not possible to establish precisely what its depth and duration will be. At present there are no indications as to how the different countries will come out of the crisis, but the direction of the debates that are taking place makes us think that the "industrial problem" will occupy a prominent place, both in academic discussions and in the governmental policies of all countries with a high degree of development. We believe that "industrial policy" will be talked about again. But this return, nevertheless, must not be used for protectionist ends. To the unaware spectator these reflections may appear to be marginal when we find ourselves in the presence of a very serious financial crisis at a global level. Nothing could be further from the truth. To speak in these circumstances of structural problems of a lengthy course is the best way to prepare a better future. If this challenge had been met in previous years, certain disasters would have been avoided. It is very true to say that the financial crisis has become added to, and has overlapped, the economic crisis. The interrelation between the two is so obvious that it needs no explanation.
The detonator of the perverse accumulative process cannot be explained by a single cause, but on the other hand what has become more visible is the incorrect use of the national financial system, which with sophisticated and imaginative “architecture” has corrupted the interconnection network of highly globalised financial institutions. Contagion has been very fast and generalised. The reality is that we now find ourselves with the fact that the ghost of the crisis is to be found in every corner of our planet, presenting itself disguised in the most varied
clothes: economic, financial, political, social and moral. But also from ghostly apparitions the light of an opportunity can arise, the opportunity of change that must be a systematic change. In fact, we have been living in a system with an American base, which in the immediate past represented an opportunity for progress in freedom and prosperity without precedent. This cloud of prosperity has given rise to the appearance of that phenomena, which precede the sinking of every structure with human roots, be this a country, a financial system, a business or a family, that is superficiality. Superficiality of the financial system has on this occasion resulted as dangerous as a field sited on quicksand. This has ended up by devouring all that functioned on it in an artificial manner. Thus what has occurred, in a slow but inexorable manner, is a rupture between the economic-financial model and the real economy. As a consequence of this the inevitable result has been the loss of confidence of citizens of all countries. To regain this confidence will require an important teaching effort and effort in dialogue. But this alone will not be sufficient. And this is so since what is in play is the design of a new cultural model. This cultural model must include from new values that are capable of substituting those that are outdated and today non-existent, right up to security and control of any possible excesses in the exercise of freedom. From here the importance of the setting up of a framework within which freedoms move. The drawing up of this cultural model must, also, respond effectively not only to the economic and financial consequences of globalisation but also take into account the process of shifting towards a society that is all the time becoming more complex and full of uncertainties2. All of us, who are immersed in research tasks in the field of social sciences, cannot be oblivious of the study of the economic-financial problems that arise in society. It is our duty to contribute at each interval in time, those solutions that are susceptible of resolving or mitigating the maladjustments that prevent the development of the capacities of all citizens within the framework of justice and freedom. And in this sense, today we find ourselves at one of the most enthralling moments in time through which human knowledge has passed. Those who have the mission of taking decisions in any spheres: political, social, economic or in business cannot be oblivious to the difficult realities they are faced with. What is required then is a profound reflection on how to approach the solution to the problems that are placing obstacles in the way of the common objective of lessening the pain of those who are suffering the fluctuations of economic and financial disruptions. It is true to say that, increasingly, socio-economic techniques are available to face up to the recessive phases of economic cycles, but it is also true to say that the confusion of events is deepening, in such a way, that future perspectives are overloaded with a high degree of uncertainty. We are living in a world in which events take place with unusual speed, they crowd in on us, they get mixed up and disappear very fast. Mutability constructs one of the most outstanding characteristics of our time. When a success is attained, triumph is short lived, it is rapidly forgotten. 
When a blunder is made, when we are surrounded by failure, suffice it to wait for the calendar to bring on another event, which will allow us to forget the previous one. Efforts made during economic bonanza were highly satisfactory, above all in the sense of adjusting speculative economic knowledge to the realities on which concepts, 2
These comments were extracted from the paper given by the author at the Solemn Academic Act of the "Real Academia de Ciencias Económicas y Financieras" of Spain, at Bilbao on the 5th of February 2009.
methods and techniques created by it are to be used. But the implacable force of events obliges economic research to seek out new paths. It becomes increasingly obvious how difficult it is to use techniques arising from the past for the solution of problems arising from new realities. This is because the realities requiring adjustment are not located in an ideal world, but in this world that surrounds us.
2. A glance at the scientific activity of the past

Why then is this disruption between theory and reality repeatedly manifest? Perhaps we can find the answer in the routine of economic research of copying the manner in which physicists observed the universe. The hope was that in this way those signals would be found by means of which social facts and phenomena could be represented. In this way, little by little, economic science became impregnated with the mechanism that is typical of physics, the brilliant trajectory of which warranted the highest admiration, from the moment that Thales of Miletus (624 BC to 546 BC) raised his eyes to heaven and conceived the fundamental questions on the functioning of the cosmos. The lees of the mechanist culture, deposited over so many centuries in the formation of the scientific edifice, could not go unnoticed in the construction of economic science. The phenomena of economics were studied by considering economic systems as "large Meccanos", thinking, like the physicists, that differential equations could show the supposedly regular behaviour of the agents acting within them. If the universe followed known laws, why then should economic systems not do so? Physical models which function like a clock, therefore acceptance of economic systems that function like a clock. Mechanist physical models, then acceptance of an economy of mechanist systems. Economic science, then, is supported, from the outset, on the mechanics of movement, which describes processes of a reversible nature, where the direction of time plays no part whatsoever and in which there is no place for uncertainty. The consideration of temporal reversibility in economic science has been one of the most permanent obstacles for the development of new paths of knowledge, above all when we delve into the different ways researchers have conceived time from a formal perspective and the perception held of it by economic agents in their real activity. In fact, the person faced with the task of taking a decision normally associates reality with the current time: the past is no longer, and the future has not yet arrived. It would appear that thought is displaced in such a way that the uncertainty of tomorrow is converted into the passing reality of today, which, in its turn, passes into the certainty of the past. But this vital perception clashes head on with the rationality with which economic science assumes the concept of time. For it, there exists a "temporal landscape" in which all the events of the past, the present and the future are to be found: Time does not move, what moves are objects in time. Time does not pass. It simply is. The flow of time is unreal, what is real is time. For mechanism, which is typical of economy, a clock measures intervals between events, it does not measure the speed with which one passes from one event to another. Thus it is accepted that the past, present and future are equally real: eternity is present in all its infinite dimension. We enjoy reading the work that contains the continued correspondence between Michele Besso and Albert Einstein: Before the insistent questions of the former: What is time? What is irreversibility? the latter replies that "irreversibility is an illusion". On the
death of Besso, Einstein writes a letter [1] to his sister and son that contains the following words: "Michele has preceded me a little in abandoning this strange world. This lacks all importance. For us, believing physicists, this separation between past, present and future has no more value than an illusion, however persistent this may be". Notwithstanding this categorical statement, it is difficult to accept a nature without time. Carl Rubino [2] reminds us that Homer, in the "Iliad", locates Achilles in a position of seeking something permanent and unchanging, which is only attained at the cost of the humanity of the individual: he has to lose his life in order to attain this higher plane. A bitter lesson that Achilles learns all too late. The work rests, therefore, on the problem of time. As a counterpoint, in "The Odyssey" [3], Odysseus can choose, and his fortune is to have the capacity of opting between eternal youth and immortality (forever remaining the lover of Calypso) or the return to humanity, that is to say, to old age and death. He decides for time and human destiny, rejecting eternity and the destiny of the gods. Must economic science choose between the atemporal conception, which presupposes human alienation, and the acceptance of time, which appears to contravene scientific rationality? There is a feeling of a profound incompatibility between "classical reason", with its atemporal vision, and "our own existence", seasoned by time. It is difficult to reject the validity of the concepts of past and future, even while sustaining the non-existence of the "flow of time". Existing in economy is a multitude of irreversible phenomena. We could even say that they are in the majority. In a limit situation, the existence could be accepted of an asymmetry of objects in time, although not an asymmetry of time. It is true to say that atemporality constitutes a solid base on which to found the concept of stability of equilibrium, a fundamental element of economic science. But this does not exclude the initial difficulty of combining the realities of our convulsed society with "orthodox doctrine". An attempt to attain this came hand in hand with Ilya Prigogine (1917-2003) when he differentiated between structures of equilibrium and dissipative structures [4]. A structure of equilibrium does not require exterior flow for maintaining it, and is therefore prohibited from all entropy-generating activity, given that without any external contributions which maintain dissipation it will disappear and the system attains a state of equilibrium. Therefore, only when instability does not exist are mechanist laws totally complied with. Could the financial situation which we are passing through be a revealing example? It is curious to see how this original contribution of Prigogine draws us closer to the marvellous adventure which commenced nearly 150 years ago with the publication in 1859 of the fundamental work "The Origin of Species". In fact Darwin combines two elements, fluctuation and irreversibility, when he sustains that the fluctuations in biological species, thanks to the selection of the medium, give rise to irreversible biological evolution. From the association between fluctuations (which are similar to the idea of chance, we would say uncertainty) and irreversibility, what takes place is a self-organisation of systems of growing complexity.
In the field of economy, evolution in social, economic and management institutions can, broadly speaking, be conceived as a pseudo-genetic renovation, which takes place in the bodies of the States and other public institutions as well as in businesses. This pseudo-genetic renovation gives rise to successive generations of economic systems and businesses, and makes each one of them structurally unrepeatable. This, then, is a temporally irreversible process, which breaks the mechanist schemes of the classical and neoclassical studies of the economy, overloaded as they are with atemporality.
We are conscious of the difficulties of incorporating evolutionary and irreversible schemes into the daily routine of teachers and researchers. The problem is not a new one. The break that the evolutionary idea signifies in economics, relative to the mechanist one, is creating the same doubts and the same rejections that Darwinism signified in its day as an alternative to what is today known as creationism in biology. Illustrating what we have just stated is the dispute which came into being, just a year after the publication of Darwin's work, between Bishop Wilberforce and Thomas Huxley (grandfather of Aldous Huxley). It is said that in one of his public interventions Wilberforce suddenly blurted out to Huxley the presumably hurtful question of whether, if he descended from a monkey, this was through his father or his mother. Huxley replied that he would prefer to descend from a monkey rather than from a bishop. The use of temporal reversibility and mechanics in economy has given us as a result a determinism in which the notions of liberalism and freedom are words that have been divested of all meaning when we have attempted to seek answers to the essential questions of economic reasoning. Paul Valéry states that "the sense of the word determinism possesses the same degree of vagueness as that of freedom" [5]. We should in this respect remember the reflection of Karl Popper [6] when he points out, on the one hand, that "every event is caused by an event, in such a way that all events could be foreseen or explained…" But also, on the other hand, he adds that "common sense attributes to healthy and adult people the capacity to choose freely between several paths…". This type of interior contradiction constitutes a major problem that William James [7] called the "dilemma of determinism", which, on transferring it to the economy, makes us aware that what is in play is nothing more nor less than our relation with society. In fact, has society been written, or is it in permanent construction? If for a large number of physicists, Einstein among them, the problem of determinism and also of time has been resolved, for philosophers it continues to be a question mark. Thus Henri Bergson [8] states that "time postpones or, better said, is a postponement". Therefore it must be elaboration. Will it not be, then, the vehicle for creation and election? Does the existence of time not prove, then, that there is indetermination in things? In this way, for Bergson, realism and indeterminism walk hand in hand. Also Karl Popper considers that "the determinism of Laplace – confirmed as it appears to be by the determinism of physical theories and his brilliant success – is the most solid and serious obstacle in the way of an explanation and an apology of human freedom, creativity and responsibility" [9]. The fact that the determinist idea has been present in western thought from pre-Socratic times is causing deeply felt tension when attempting to give an impulse to objective knowledge and, simultaneously, promote the humanist ideal of freedom. Science would fall into a contradiction if it were to opt for a determinist concept while we find ourselves involved in the task of developing a free society. One cannot identify science and certitude on the one hand, with ignorance and possibility on the other.
3. The new paths for knowledge of complex realities This confirms to us that research activity is at a crossroads in which what is in play is the future of science. On the one hand what we will have is the geometric conception of knowledge, and on the other the Darwinian conception. On the one side, the sublime
and well-known reiterative songs, which are renewed in their forms. The dream of reducing the functioning of the world to the predictability of a Meccano. On the other hand, the emptiness of the unknown. The attraction of adventure. The invitation to jump forward guided only by the hope of opening up new horizons. The response to the calling of Ludwig Boltzmann, Bertrand Russell, Lukasiewicz, Zadeh, Lorenz, Prigogine and Kaufmann. The rejection of the yoke of pre-destination and the proclamation of the freedom of decision. On our wanderings through the spheres of economic research we have dedicated an academic life to fighting determinism and pre-destination, aiding in the construction of theoretical and technical elements that are carriers of freedom. We have had the great fortune to receive the teachings of some of the great creators of innovating ideas. We recall in our youth the teachings of François Perroux, clamouring against the transfer of mechanist models to the economic sphere. Later, in the mid 60's, it was Lotfi Zadeh who with the concept of fuzzy sets opened up the doors so that Arnold Kaufmann could develop and initially expand not only certain innovating techniques but a new way of channelling thought, which is versatile, modular and qualifying. Essential for transgressing the essences of economic determinism were the lessons received from Ilya Prigogine, who in 1977 was awarded the Nobel Prize for Chemistry for his contributions to non-equilibrium thermodynamics, particularly with the theory of irreversible processes. On the occasion of the International SIGEF Congress in Buenos Aires [10], we attempted to set up the Epicurean position in the new coordinates arising from the findings of Zadeh [11], by enunciating the "principle of gradual simultaneity" (all propositions can be at one and the same time true and false, on the condition of assigning them a degree of truth and a degree of falseness). Before and afterwards, a good number of scientists have laid, stone upon stone, the foundations of what can be a new building of knowledge. But still required is a large dose of imagination in order to break the links that tie us with the past, placing in their place "non linear" differential equations that carry a large descriptive arsenal of uncertain situations. On the other hand, it can be seen that the forms that we find in our world, both in the sphere of physical objects and in the economic one, normally have no similarity with the traditional geometric figures of mechanist mathematics. And this is so in spite of the reiterated demands to the contrary stemming from the statement, in 1610, by Galileo Galilei, in the sense that "mathematics is the language of nature". The truth is that the geometry of nature is difficult to represent by means of the usual forms of Euclid or by differential calculation. And many times the same occurs with the physical and mental objects of economy. Their limited order converts them into "chaotic" objects. Benoit Mandelbrot in his work The Fractal Geometry of Nature [12] states that clouds are not spheres, mountains are not circles and the bark of a tree is not smooth. With this idea a new mathematics developed that is capable of describing and studying the irregular shapes of natural objects. He coined a name, fractals3, to describe these new geometric forms. Fractals, as occurs with chaos, are based on the structure of irregularity. In both, geometric imagination acquires a fundamental importance.
Now then, if in fractals geometry dominates, in chaos this is subordinated to dynamics. It could be said that fractals contribute a new language that is susceptible of describing the form of chaos.
3 The Latin adjective fractus can be translated as interrupted, irregular.
The possibilities of representing irregular economic phenomena in a geometric manner open the doors to the use of fractals in the sphere of social sciences. Could the concern for the fluctuations in the stock market not be a stimulus for the study of this new geometry of nature by economists and specialists in management? Three fundamental axes make up the search for a new way of thought in economic science: uncertainty in the face of certainty, irregularity in the face of the laws of nature, and complexity in the face of linearity. Uncertainty, irregularity and complexity would appear, then, to be the principal challenges that the changeable realities of our day are placing before social and economic research. It is necessary to delve into the depth of each one of the levels of knowledge in order to attempt to find, in each one of them, the keys that allow us to open the doors to an efficient treatment of uncertainty. Over the "principle of the excluded middle" we have given preference to "the principle of gradual simultaneity". "Boolean logic" has been generalised by a whole range of "multivalent logics". The "mathematics of certainty and chance" has been complemented by the "numerical and non-numerical mathematics of uncertainty". The "models and algorithms" used now for over 50 years have become relegated or transformed in the light of the "theory of fuzzy sub-sets". The study of economic irregularities is one of the pending subjects both for research and for teaching. Nevertheless, interesting works attest to the concern of students of this field. Attempts have been made to describe the optimisation of share portfolios by seeking a solution by means of fractal geometry. Projects have appeared for estimating the more volatile values of economic systems by using tools that are capable of representing irregular structures. However, much is lacking in order to arrive at a coherent body on which to support works that include formal solidity and practical efficiency. It is said that the famous and beautiful phrase of Edward Norton Lorenz (1918-2008), "can the beating of the wings of a butterfly in Brazil cause a storm in Texas?", is the most intuitive representation of chaos. Without trying to belittle the extraordinary interest of his works, we feel that what we could call the theory of complexity goes much further and includes other aspects than those considered by Lorenz. At the beginning of the 90's of the XX century, as a result of the drawing up of a work with Arnold Kaufmann [13], we were able to test the importance of those systems in which, starting out from precise data and a determinist system, results are arrived at which do not follow any recognisable pattern. The behaviour appeared to be chaotic. A little later we attempted to go deeper into the possibilities of incorporating the theory of chaos into the treatment of certain economic problems. The results were made public with the publication of a book [14] and two articles [15] [16], which were widely distributed. We felt that we had opened a new door to future research. Today, with the perspective of time, we are aware of the lengthy path still to be covered. I would like to feel that my last words sounded like a song of hope. For this, we will resort to the words of Einstein when he says that "creativity is born from anguish just as the day is born from the night. It is in crisis when inventiveness, discoveries and great strategies are born. He who overcomes the crisis surpasses himself without being surpassed.
He who attributes to the crisis his failures and penury, violates his own talent and has more respect for the problems than for their solutions”. Science must play an important role in the rules that in the future govern international relations. We are very confident in future contributions made within the heart of the new fields that have opened in research activities. These contributions must be the ones to expand the
light of science and, at the same time, strengthen the solidarity and well-being of all citizens. Only in this way will we reach the desired sustainable social progress.
References

[1] Einstein-Besso, Correspondence 1903-1955. Edition, prologue and notes by Pierre Speziali. Hermann, Paris 1979, page 88. Translated into Spanish by Tusquets Editores, S.A., Barcelona, 1994, pages 454-455.
[2] C. Rubino, Winged Chariots and Black Holes: Some reflections on Science and Literature. Manuscript quoted by Ilya Prigogine in a paper given at the Jawaharlal Nehru University, New Delhi, December 18, 1982, under the title of "Only an Illusion".
[3] J.P. Vernant, Le refus d'Ulysse, Le temps de la réflexion III, 1982.
[4] I. Prigogine, La fin des certitudes. Version in Spanish, Publ. Taurus, Buenos Aires 1997, pp. 11-12.
[5] P. Valéry, Cahiers, I. Bibliothèque de la Pléiade, Publ. Gallimard, Paris 1973, pages 531-651.
[6] K. Popper, L'univers irrésolu. Plaidoyer pour l'indéterminisme. Publ. Hermann, Paris 1984, page XV.
[7] W. James, The Dilemma of Determinism, in The Will to Believe. Publ. Dover, New York, 1956.
[8] H. Bergson, Le possible et le réel, in: Oeuvres. Presses Universitaires de France, Paris 1970, page 1333.
[9] K. Popper, L'univers irrésolu. Plaidoyer pour l'indéterminisme. Publ. Hermann, Paris 1984, page 2.
[10] J. Gil Aluja, Lances y desventuras del nuevo paradigma de la teoría de la decisión. Proceedings of the III Congress of the International Society of Management and Fuzzy Economics, Buenos Aires, November 10-13, 1996 (not numbered).
[11] L. Zadeh, Fuzzy Sets. Information and Control, vol. 8, June 1965, pages 338-353.
[12] There is in existence a version in Spanish, translated from the third French edition titled "Les objets fractals. Forme, hasard et dimension", published in 1993 by Tusquets Editores, S.A. under the title "Los objetos fractales", a version to which we contributed.
[13] A. Kaufmann and J. Gil Aluja, Nuevas técnicas para la dirección estratégica. Publ. University of Barcelona, 1991.
[14] J. Gil Aluja and N.H. Theodorescu, An Introduction to Chaos Theory and Application. UNIL-HEC, Lausanne, 1994.
[15] J. Gil Aluja and N.H. Theodorescu, Phénomènes économiques chaotiques de croissance: modèles flous, in Trends in Fuzzy Systems and Signals. AMSE Press, Tassin (France), 1992.
[16] N.H. Theodorescu, J. Gil Aluja and A.M. Gil Lafuente, Periodicity and Chaos in Economic Fuzzy Forecasting. Kyushu Institute of Technology, Iizuka, 1992, pp. 85-92; reproduced in: An Introduction to Fuzzy Systems, LEAO-LAMI, Lausanne 1994, and in: An Introduction to Chaos Theory and Applications, University of Lausanne, 1994.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-136
Aggregation of opinions in Multi Person Multi Attribute decision problems with judgments inconsistency

Silvio GIOVE a,1 and Marco CORAZZA a,b
a Department of Applied Mathematics, Ca' Foscari University Venice – Sestiere Dorsoduro 3825/E, 30123 Venice, Italy
b Advanced School of Economics of Venice, Faculty of Economics, Ca' Foscari University Venice – Sestiere Cannaregio 873, 30121 Venice, Italy
Abstract. In this paper we consider a Multi Person Multi Attribute decision problem in which a finite number of alternatives has to be scored on the basis of a finite number of criteria, using different Stakeholders’ judgments. In particular, first we propose a new individual preferences aggregation method which takes possible Stakeholder’s inconsistencies into account, measured by a new consistency measure we call μ–consistency, then we focus the attention on the aggregation process of the Stakeholders’ preference structure about the criteria, preference structure which is represented by non additive measures defined over the space of the criteria. Keywords. Group decision theory, multi criteria analysis, consensus management, non additive measures, Choquet integral.
Introduction

In this paper we consider a Multi Person Multi Attribute decision problem in which a finite number of alternatives has to be scored on the basis of a finite number of criteria, using different Stakeholders' judgments. Let us suppose that the Stakeholders' preferences can be represented by suitable non additive measures, so that interactions among the criteria can be considered, overcoming the limitations of the traditional, simpler weighted average approach ([7]). Here we propose a new individual preferences aggregation method which takes possible Stakeholder's inconsistencies into account, measured by a new consistency measure we call μ–consistency. Each individual Stakeholder's judgment will be more or less weighted in the aggregation process on the basis of its own consistency. Unlike other contributions in Multi Person Decision Theory ([3], [6], [8]), here we do not address the consensus management problem, nor do we discuss the elicitation of the opinions for the different alternatives, i.e. we suppose that the criteria values are exogenously assigned. Conversely, here we focus the attention on the aggregation process. To this purpose we analyze the individual behavior of each Stakeholder for what concerns the preference structure about the criteria, preference structure which is represented by non additive measures defined
1 Corresponding author.
over the space of the criteria. In fact, a wide range of aggregation functions can be formalized by such an approach. The remainder of this paper is organized as follows. In Section 1 we introduce the problem; in Section 2 we briefly describe non additive measures and the Choquet integral for Multi Attribute decision problems. In Section 3 we present the μ–consistency index and the aggregation algorithm to be used. Finally, in Section 4 we propose a numerical example, followed by some conclusions in Section 5.
1. The Multi Person Multi Attribute decision problem

Let us suppose that M alternatives need to be scored on the basis of N criteria, each of them normalized to the common [0, 1] scale, with the usual meaning: 0 indicates that the criterion is completely unsatisfied, 1 indicates that the criterion is completely satisfied. Without loss of generality, we suppose that each criterion is a benefit, i.e. a higher value is preferred to a lower one. For each alternative the score is computed by aggregating the judgments of a focus team formed by K Stakeholders E1, E2, ..., EK. Let us suppose that the decision structure is represented by a set of non additive measures defined on the space of the criteria 2. The Stakeholders' evaluations are aggregated using the simple weighted average (WA) approach, computing the weighted average of the individual non additive measures. The main purpose of this paper consists in the determination of the Stakeholders' weights to be used in the aggregation process, by means of our definition of the μ–consistency index. As the K Stakeholders are supposed to be rational, the K measures need to be monotone. But, due to a usually unavoidable source of uncertainty, some Stakeholder can violate some of the monotonicity conditions, as is sometimes verified when the measures are extracted by means of a suitable questionnaire ([4]). A similar phenomenon is observed in the AHP methodology when cardinal transitivity is not verified for all the pairwise comparisons ([17]). To distinguish it from AHP consistency, we call the non additive measure based consistency μ–consistency. If a Stakeholder exhibits a more inconsistent behavior than a second one, her/his weight will be decreased. Using those weights, the individual non additive measures are aggregated, obtaining a global measure, and the score of each alternative is computed by applying the Choquet integral to its criteria vector. In this way, the Stakeholder is involved in the decision process only for what concerns the determination of the relative importance of coalitions of criteria. In order to evaluate the importance weight of each Stakeholder, we introduce the μ–consistency index, using the inner and outer measures associated to every (non monotonic) non additive measure.
2. Non additive measures and the Choquet integral for Multi Attribute decision problem

An aggregation function (AF) is a monotonic continuous function F(x): [0,1]^N → [0,1] which satisfies the border conditions F(0, 0, …, 0) = 0 and F(1, 1, …, 1) = 1. AFs include the WA operators, the quasi-median operators, the OWA operators ([19]) and many other ones. Here we consider an AF based on the Choquet integral with non additive (fuzzy) measures ([11], [12], [13], [14], [18]), a general and widely used approach which, depending on the measure values, includes as particular cases many operators such as the WA, the OWA, the min and the max, the median and others. Since a numerical value is assigned to every possible subset of the criteria, Choquet integral-based operators can model synergic and redundant relationships among the criteria. Let P(C) be the power set of C, where C is the set of criteria, with N = |C|.

Definition 1. A non additive measure is a set function μ(A), A ∈ P(C), with the monotonicity property T ⊆ S ⇒ μ(T) ≤ μ(S), ∀ T, S ⊆ C, and with the boundary conditions μ(∅) = 0, μ(C) = 1. ■

2 Only a few applications of non additive measures to Group Decision Theory have been proposed in the past ([1], [9]).
Probability measures and belief functions are particular cases of non necessarily additive measures. Notice that 2^N − 2 values need to be assigned 3, given the two boundary conditions, being card{P(C)} = 2^N. The measure μ(S), S ⊆ C, gives the relative importance of the coalition S. Through the Möbius transform ([13]), every non additive measure μ(S) is mapped to α(S) (notice that if α(S) < 0 it is redundant). If the coalition S is synergic, the weight of the coalition S is greater than the sum of the weights of the elements of its partition. The contrary holds if the coalition is redundant, even if the measure of S cannot be lower than the minimum measure of the elements of the coalition, otherwise the monotonicity condition would be violated 4. If the measure is additive, as in the WA, the sum of the weights of the elements of every coalition equals the weight of the coalition itself. In particular, μ(S) = 0 implies that the importance of the subset S is null. Consider the following example with only two criteria: μ(1) = 0.1 and μ(2) = 0.2. As μ(1,2) = μ(C) necessarily equals 1, the measure is synergic and models a conjunction behavior. In fact, full satisfaction is obtained only if both criteria are satisfied: the complete satisfaction of only one of them cannot compensate the null satisfaction of the other one. As a counterpart, consider the following example: μ(1) = 1, μ(2) = 1 and μ(1,2) = μ(C) = 1, in which the disjunctive operator is realized. In this case, it suffices that only one of the two criteria is satisfied to assure complete satisfaction. Conversely, if μ(1) = 0.4 and μ(2) = 0.6, the measure is additive, since μ(1,2) = 1 = μ(1) + μ(2). Again, for N criteria, if all the subsets with equal cardinality have the same value of the measure, the OWA operator is obtained ([19]).
3 This can sometimes be a critical problem, for which reduced order models have been introduced in the past. Such models are measures for which interactions are possible only for couples of criteria, thus strongly reducing the effort required to elicit the $2^N - 2$ values (for instance, for a second order model $N(N-1)/2 - 2$ values suffice ([12])).
4 Even if quite rare, non monotonic measures can be defined if the monotonicity requirement is dropped, permitting the representation of conflicting relationships, such as the exclusive disjunction ([1], [5], [15], [16]). If the measure values can also be negative, they are named signed measures ([1]). The Choquet integral can be extended also to non monotonic measures, and to interval ([2], [10]) or fuzzy data ([20]).
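As an illustration of Definition 1 and of the two-criteria examples above (our sketch, not part of the paper), a measure can be stored as a Python dictionary keyed by coalitions and checked mechanically for the boundary conditions and monotonicity:

```python
from itertools import combinations

def subsets(criteria):
    """All subsets of the given criteria, as frozensets."""
    items = list(criteria)
    for r in range(len(items) + 1):
        for comb in combinations(items, r):
            yield frozenset(comb)

def is_monotone_measure(mu, criteria):
    """Check Definition 1: boundary conditions and monotonicity
    T <= S  =>  mu(T) <= mu(S)."""
    C = frozenset(criteria)
    if mu[frozenset()] != 0 or mu[C] != 1:
        return False
    subs = list(subsets(C))
    return all(mu[T] <= mu[S] for T in subs for S in subs if T <= S)

# The synergic (conjunctive) two-criteria example: mu(1)=0.1, mu(2)=0.2, mu(C)=1
mu = {frozenset(): 0.0,
      frozenset({1}): 0.1,
      frozenset({2}): 0.2,
      frozenset({1, 2}): 1.0}
print(is_monotone_measure(mu, {1, 2}))  # True
```

The same dictionary representation of a measure is reused in the sketches further below.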
Every alternative is represented by a vector of normalized criteria $X = [x_1, x_2, \dots, x_N]$. Let the vector $Y = [x_{(1)}, x_{(2)}, \dots, x_{(N)}]$ be such that $x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$, $x_{(i)} \in X \; \forall i$, and let $A_{(i)} = \{(i), \dots, (N)\}$, with $A_{(N+1)} = \emptyset$. The Choquet integral of X with respect to the measure μ(·) is given as follows in [13]:

$$\int X \cdot d\mu = \sum_{i=1}^{N} \left(x_{(i)} - x_{(i-1)}\right) \mu\!\left(A_{(i)}\right),$$

with $x_{(0)} = 0$. As an example, consider the following aggregation with two criteria. Let the measure values be μ(1) = 0.3 and μ(2) = 0.5, and let the alternative vector be X = [0.9, 0.5]. Then the previous formula gives

$$\int X \cdot d\mu = 0.5 \cdot 1 + (0.9 - 0.5) \cdot 0.3 = 0.62.$$

Observe that the aggregated value is closer to the lowest value of the vector X than to the highest one. In fact, the measure is more conjunctive than disjunctive. The following result can be easily demonstrated.
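A compact transcription of the formula above (an illustrative sketch, not the authors' code); on the two-criteria example it returns 0.62:

```python
def choquet(x, mu, criteria):
    """Discrete Choquet integral of the criteria vector x (dict
    criterion -> value in [0,1]) w.r.t. the set function mu
    (dict frozenset -> value), following the sorting formula above."""
    order = sorted(criteria, key=lambda c: x[c])   # x_(1) <= ... <= x_(N)
    total, prev = 0.0, 0.0
    for i, c in enumerate(order):
        A_i = frozenset(order[i:])                 # A_(i) = {(i), ..., (N)}
        total += (x[c] - prev) * mu[A_i]
        prev = x[c]
    return total

mu = {frozenset({1}): 0.3, frozenset({2}): 0.5, frozenset({1, 2}): 1.0}
x = {1: 0.9, 2: 0.5}
print(choquet(x, mu, [1, 2]))   # 0.5*1 + (0.9-0.5)*0.3 = 0.62
```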
Proposition 1. A convex linear combination of non additive monotone measures is a non additive monotone measure too. ■

Definition 2. A non monotonic (non additive) measure is a bounded set function μ(A) defined on the power set P(C). ■
3. The Multi Person Multi Attribute problem: preferences and their aggregation through the μ–consistency

In the considered Multi Person Multi Criteria decision problem, some alternatives have to be ranked on the basis of a finite number of criteria and of the preference structures of K Stakeholders. The preference structure of each Stakeholder is represented by a non additive measure on the coalitions of criteria, so that K measures are given. The score of the j–th alternative will be computed through the Choquet integral applied to its criteria vector, using an "average" measure obtained by weighting the individual Stakeholders' measures. Those weights depend on the μ–consistency index, a measure of individual coherence, calculated on the basis of possible monotonicity violations. A 0–consistency Stakeholder will receive a null weight, i.e. she/he will be excluded from the decisional process. The contrary holds if her/his consistency is complete. Afterwards, the measures of the Stakeholders will be aggregated, and a group measure will be obtained to be used for scoring the alternatives with the Choquet integral. Therefore, the Multi Person Multi Attribute problem is characterized by:
• {A_1, A_2, …, A_M}: a set of M alternatives, each of them defined by a vector of criteria;
• {C_1, C_2, …, C_N}: a set of N criteria whose values are collected, for the j–th alternative, in the vector X(j) = {x_1(j), x_2(j), …, x_N(j)}, j = 1, …, M;
• {E_1, E_2, …, E_K}: a set of K Stakeholders, each of them characterized by a non additive measure {μ_k(A)}, k = 1, …, K, A ⊆ C.
Using this information, an importance weight for each Stakeholder will be calculated on the basis of her/his μ–consistency index. Using these weights, a group measure will be constructed, and subsequently applied to score the alternatives. The following Definition introduces the inner and the outer measures.
Definition 3. Given a non monotonic (non additive) measure {μ(A), A ⊆ C}, its inner in(A) and outer out(A) measures are defined as follows ([10]):

$$in(A) = \min_{B \subseteq C}\{\mu(B) : B \supseteq A\}, \qquad out(A) = \max_{B \subseteq C}\{\mu(B) : B \subseteq A\}. \;\blacksquare$$

It is easy to check that the inner (outer) measure is the upper (lower) bound of all the monotone measures dominated by (dominating) μ(A). Moreover, $in(A) \le \mu(A) \le out(A)$, while $in(A) = \mu(A) = out(A)$ if and only if the measure μ(A) is monotone.
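Definition 3 translates directly into code; the following sketch (ours, under the assumption that the measure is given on the whole power set) computes both bounds:

```python
def inner_outer(mu):
    """Inner and outer measures of a (possibly non monotonic) set function mu,
    given as a dict frozenset -> value defined on the whole power set:
        in(A)  = min{ mu(B) : B superset of A },
        out(A) = max{ mu(B) : B subset of A }."""
    coalitions = list(mu)
    inner = {A: min(mu[B] for B in coalitions if B >= A) for A in coalitions}
    outer = {A: max(mu[B] for B in coalitions if B <= A) for A in coalitions}
    # For a monotone measure inner == mu == outer on every coalition;
    # any gap between the two signals a monotonicity violation.
    return inner, outer
```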
The μ–consistency related to $\mu^k(A) \in [0,1]$, i.e. the measure given by the k–th Stakeholder, depends on the (non negative) difference between these two measures, $\Delta^k(A) = out^k(A) - in^k(A)$. The greater this difference, the less consistent the measure, and consequently the lower the relative Stakeholder's importance in the group. Given that the spread is related to the uncertainty about the coalition judgment, the maximum entropy principle suggests, as a natural choice, assigning to each Stakeholder a measure which is the average of the two, i.e.

$$\mu^k_{Ave}(A) = \frac{1}{2}\left(in^k(A) + out^k(A)\right).$$

Averaging all the contributions, the following inconsistency index can be computed:

$$Spr^k = \frac{1}{2^N} \sum_{A \subseteq C} \Delta^k(A),$$

which varies between 0 and 1 (the worst case).
Proposition 2. The inconsistency index varies between 0 (the best case) and 1 (the worst case), and it equals 1 if and only if the measure is anti-monotone.5 ■

PROOF. From its definition, the index cannot assume negative values. If the measure is monotone then it equals 0, since $in(A) = out(A)$ and $Spr = 0$. Conversely, if the measure is anti-monotone, its inner measure equals 0 for each coalition of criteria, while its outer measure equals 1 for each coalition of criteria, i.e. $in(A) = 0$ and $out(A) = 1 \; \forall A \subseteq C$, thus $\Delta(A) = out(A) - in(A) = 1 \; \forall A \subseteq C$. This implies that

$$Spr^k = \frac{1}{2^N} \sum_{A \subseteq C} \Delta(A) = \frac{1}{2^N} \sum_{i=1}^{2^N} 1 = 1.$$

In all the other cases, that is when the measure is neither monotone nor anti-monotone, the inequality chain $in(A) \le \mu(A) \le out(A)$ holds for all $A \subseteq C$, and for some $A \subseteq C$ the inequality is strict. Then $0 \le \Delta(A) \le 1 \; \forall A \subseteq C$ and $0 < \Delta(A) < 1$ for some $A \subseteq C$. It follows that

$$Spr^k = \frac{1}{2^N} \sum_{A \subseteq C} \Delta(A) < \frac{1}{2^N} \sum_{i=1}^{2^N} 1 = 1. \quad \blacksquare$$

5 By anti-monotone measure we mean a measure such that μ(A) ≥ μ(B) ∀ A ⊆ B.
As a consequence, the importance weight of the k–th Stakeholder can be obtained as follows. Supposing that not all the Stakeholders are completely inconsistent, i.e. $Spr^k \neq 1$ for some k, the weight of each Stakeholder decreases with her/his inconsistency. In the simplest way, the weight reduction depends linearly on the inconsistency, i.e. the updated (not normalized) value of the k–th weight is $u^k = 1 - Spr^k$. Afterwards, the weights need to be normalized, obtaining the final value

$$w^k = \frac{u^k}{\sum_{l=1}^{K} u^l}.$$

The amount of inconsistency, interpreted as the uncertainty of the group, is given by averaging the inconsistencies of all the participants, i.e.

$$Spr^{ALL} = \frac{1}{K} \sum_{k=1}^{K} Spr^k.$$
If $Spr^{ALL}$ exceeds a prefixed threshold ε, a revision is required, since too much uncertainty exists among (likely almost) all the participants. Nevertheless, the index $Spr^k$ may not be suitable for practical purposes. In fact, it reaches relatively high values only if many monotonicity violations appear; in particular, it equals 1 if and only if the measure is totally anti-monotone, which is not a realistic case. Conversely, it can be the case that a few, but large, monotonicity violations should cause the involved Stakeholder to be excluded from the group consideration. To this purpose, we consider the maximum of the spread as a measure of the monotonicity violation,6 meaning that (at least) one great spread is worse than (possibly many) small spreads. Thus we suggest the following computation of the μ–consistency index:

$$\widehat{Spr}^k = \max\{\Delta^k(A),\; A \subseteq C\}.$$

Subsequently, the importance weights and the total uncertainty are modified as follows:
6 Actually, a measure of the monotonicity violation is not straightforward, and it can probably be evaluated only subjectively. This item will be carefully analyzed in subsequent research.
$$u^k = 1 - \widehat{Spr}^k, \qquad w^k = \frac{u^k}{\sum_{l=1}^{K} u^l}, \qquad \widehat{Spr}^{ALL} = \frac{1}{K} \sum_{k=1}^{K} \widehat{Spr}^k.$$

Using these weights, a weighted measure is easily computed (see Proposition 2) if $\widehat{Spr}^{ALL}$ is less than the prefixed threshold:

$$\mu^{ALL}(A) = \sum_{k=1}^{K} w^k \mu^k_{Ave}(A), \quad \forall A \subseteq C.$$
Finally, one applies the Choquet integral of the vector X(j) with respect to the measure $\mu^{ALL}(A)$ to obtain the score of the j–th alternative. As usually done, the alternatives can then be compared on the basis of the obtained scores. The related algorithm is depicted below.

AGGREGATION ALGORITHM
- Assign the measures $\mu^k(A)$, $\forall A \subseteq C$ and $\forall k = 1, \dots, K$, and the threshold ε > 0;
- $\forall k$ compute the inner and the outer measures, obtaining the values $\Delta^k(A) = out^k(A) - in^k(A)$, $A \subseteq C$, and $\mu^k_{Ave}(A) = \frac{1}{2}\left(in^k(A) + out^k(A)\right)$;
- $\forall k$ compute $\widehat{Spr}^k = \max\{\Delta^k(A), A \subseteq C\}$;
- IF $\widehat{Spr}^{ALL} = \frac{1}{K}\sum_{k=1}^{K} \widehat{Spr}^k \le \varepsilon$ THEN
  - $\forall k$ compute $u^k = 1 - \widehat{Spr}^k$ and $w^k = u^k / \sum_{l=1}^{K} u^l$;
  - compute $\mu^{ALL}(A) = \sum_{k=1}^{K} w^k \mu^k_{Ave}(A)$, $\forall A \subseteq C$;
  - compute the score of the j–th alternative using the Choquet integral $\int X(j) \cdot d\mu^{ALL} = \sum_{i=1}^{N} \left(x(j)_{(i)} - x(j)_{(i-1)}\right) \mu^{ALL}(A_{(i)})$, $\forall j = 1, \dots, M$;
- ELSE a judgment revision is required, as too much inconsistency is detected.
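The procedure can be transcribed almost line by line. The sketch below is our illustration, not the authors' implementation; it reuses the inner_outer helper given above, assumes measures normalized to [0, 1] (the numerical example of Section 4 works on a [0, 10] scale instead), and returns None when a judgment revision is required:

```python
def aggregate_measures(measures, eps):
    """Aggregate K individual (possibly inconsistent) measures into a group
    measure, following the steps of the AGGREGATION ALGORITHM above.
    measures: list of dicts frozenset -> value in [0, 1], all defined on the
    same power set (with mu(empty) = 0 and mu(C) = 1)."""
    coalitions = list(measures[0])
    spr_hat, mu_ave = [], []
    for mu in measures:
        inner, outer = inner_outer(mu)                     # from the sketch above
        delta = {A: outer[A] - inner[A] for A in coalitions}
        spr_hat.append(max(delta.values()))                # max-based index
        mu_ave.append({A: 0.5 * (inner[A] + outer[A]) for A in coalitions})
    if sum(spr_hat) / len(spr_hat) > eps:                  # group inconsistency too high
        return None
    u = [1.0 - s for s in spr_hat]                         # u^k = 1 - Spr^k
    w = [uk / sum(u) for uk in u]                          # normalized weights
    return {A: sum(wk * m[A] for wk, m in zip(w, mu_ave)) for A in coalitions}
```

Scoring an alternative then amounts to passing the returned group measure to the Choquet-integral sketch of Section 2.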
4. A numerical example

For a better comprehension of the proposed algorithm, in this Section we provide a numerical example with four Stakeholders and three criteria which shows how to obtain the aggregated measure. For each of the four Stakeholders, Table 1 reports the weight of each coalition of criteria, expressed in the [0, 10] scale for better readability (of course, the following default border conditions hold: μ(∅) = 0 and μ(C) = μ({C1, C2, C3}) = 1). As a consequence, the formula for the (not normalized) weights $u^k = 1 - \widehat{Spr}^k$ is naturally modified as $u^k = 10 - \widehat{Spr}^k$.

Table 1. Weights of each coalition of criteria, expressed in the [0, 10] scale.
        {C1}   {C2}   {C3}   {C1,C2}   {C1,C3}   {C2,C3}
E1        3      5      5       4         2         3
E2        6      5      2       2         8         7
E3        1      1      1       2         3         1
E4        5      2      3       7         8         5
For instance, the value 8 in the second row and fifth column means that the second Stakeholder E2 assigns to the coalition formed by the first and the third criterion a weight equal to 8. We can also write the evaluation of the first Stakeholder as μ1 = {3, 5, 5, 4, 2, 3}. It is easy to check that the third and the fourth Stakeholders are μ–consistent, the third being conjunctive and the fourth additive. Conversely, the first two Stakeholders exhibit some monotonicity violations. The first Stakeholder E1 violates the monotonicity in the coalition {C1, C3}, since its value, 2, is lower than the maximum between the value of {C1}, 3, and the value of {C3}, 5, and the same holds for the coalition {C2, C3}. The second Stakeholder E2 violates the monotonicity in the coalition {C1, C2}. Table 2 reports the inner and the outer measures for both E1 and E2 together with the differences between these measures. Outer and inner values are highlighted in italic bold if different from the original value of the measure.

Table 2. Inner and outer measures for E1 and E2, together with the differences between these measures.
              {C1}   {C2}   {C3}   {C1,C2}   {C1,C3}   {C2,C3}
E1   Outer      3      5      5       5         5         5
     Inner      2      2      2       4         2         3
     Δ1         1      3      3       1         3         2
E2   Outer      6      5      2       6         8         7
     Inner      2      2      2       2         8         7
     Δ2         4      3      0       4         0         0
Being $\widehat{Spr}^3 = \widehat{Spr}^4 = 0$, with ε = 2.5 we obtain7

$$\widehat{Spr}^1 = \max\{1, 3, 3, 1, 3, 2\} = 3, \qquad \widehat{Spr}^2 = \max\{4, 3, 0, 4, 0, 0\} = 4,$$
$$\widehat{Spr}^{ALL} = \frac{1}{4}\sum_{k=1}^{4} \widehat{Spr}^k = \frac{1}{4}(3 + 4 + 0 + 0) = 1.75.$$

7 The value ε = 2.5 corresponds to ε = 0.25 in the [0, 1] scale.
Given that $\widehat{Spr}^{ALL} < \varepsilon = 2.5$, no revision of the judgments is required, i.e. the average inconsistency level is acceptable, and

$$\mu^1_{Ave} = \tfrac{1}{2}\{5, 7, 7, 9, 7, 8\} = \{2.5, 3.5, 3.5, 4.5, 3.5, 4\},$$
$$\mu^2_{Ave} = \tfrac{1}{2}\{8, 7, 4, 8, 16, 14\} = \{4, 3.5, 2, 4, 8, 7\}, \qquad \mu^3_{Ave} = \mu^3, \qquad \mu^4_{Ave} = \mu^4.$$

Thus,

$$u^1 = 10 - \widehat{Spr}^1 = 10 - 3 = 7, \qquad u^2 = 10 - \widehat{Spr}^2 = 10 - 4 = 6, \qquad u^3 = u^4 = 10.$$
Afterwards,

$$w^1 = \frac{7}{7 + 6 + 10 + 10} = \frac{7}{33} = 0.22, \qquad w^2 = \frac{6}{33} = 0.18, \qquad w^3 = w^4 = \frac{10}{33} = 0.30.$$
Finally, after a straightforward computation, the values of the aggregated measure are

$$\mu^{ALL} = \sum_{k=1}^{K} w^k \mu^k_{Ave} = 0.22\,\mu^1_{Ave} + 0.18\,\mu^2_{Ave} + 0.30\,\mu^3_{Ave} + 0.30\,\mu^4_{Ave},$$
which is reported in Table 3.

Table 3. Value of the aggregated measure for each coalition.

        {C1}   {C2}   {C3}   {C1,C2}   {C1,C3}   {C2,C3}
μ^ALL   2.35   2.30   2.90    4.40      5.50      3.90

In the final step, this aggregated measure can be used to aggregate the criteria of each alternative using the Choquet integral.
5. Conclusion and future work

In this paper we proposed an aggregation method for individual opinions about coalitions of criteria in a Multi Person Multi Criteria decision problem, based on a new inconsistency index associated to each member of the group. This index is related to the violation of the monotonicity property, a rationality requirement which needs to be satisfied in the participants' judgments. The greater the inconsistency, the lower the relative importance of the corresponding member. Thus the individual opinions are averaged using weights derived from this inconsistency index, and all the available alternatives are subsequently scored using a convex combination of all the individual measures.
As a future development, we intend both to analyze the consensus among the participants and, at the same time, to introduce an interval scoring of the alternatives on the basis of the dispersion of the aggregated group measure.
6. References

[1] M. Cardin and S. Giove, Aggregation functions with non-monotonic measures, Fuzzy Economic Review 13 (2008), 3-15.
[2] M. Ceberio and F. Modave, An interval-valued, 2-additive Choquet integral for multicriteria decision making, Proceedings of 10th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, Perugia, 2004.
[3] F.J. Cabrerizo, S. Alonso, I.J. Pérez and E. Herrera-Viedma, On consensus measures in fuzzy group decision making. In: V. Torra and Y. Narukawa (Eds.), Modeling Decisions for Artificial Intelligence, Springer, Berlin, 2008, 86-97 [Lecture Notes in Computer Science series].
[4] O. Despic and S.P. Simonovic, Aggregation operators for decision making in water resources, Fuzzy Sets and Systems 115 (2000), 11-33.
[5] A. De Waegenaere and P.P. Wakker, Non monotonic Choquet integrals, Journal of Mathematical Economics 36 (2001), 45-60.
[6] E. Ephrati and J.S. Rosenschein, Deriving Consensus in Multiagent Systems, Artificial Intelligence 27 (1996), 126-132.
[7] J. Fodor and M. Roubens, Fuzzy Preference Modelling and Multicriteria Decision Support, Kluwer, Boston, 1994.
[8] J. Kacprzyk, H. Nurmi and M. Fedrizzi (Eds.), Consensus under Fuzziness, Kluwer, Boston, 1997.
[9] S. Giove, Fuzzy measures and the Choquet integral for Group Multicriteria Decision Making. In: R. Tagliaferri and M. Marinaro (Eds.), Neural Nets, Springer, Berlin, 2001, 77-84 [Perspectives in Neural Computing series].
[10] S. Giove, Non Additive Measures and Choquet integral with uncertain coefficients, Ratio Sociologica. Journal of Social Sciences: Theory and Applications [submitted].
[11] M. Grabisch, J.L. Marichal and M. Roubens, Equivalent representations of a set function with applications to game theory and multicriteria decision making, Mathematics of Operations Research 25 (2000), 157-178.
[12] M. Grabisch, K-order additive discrete fuzzy measures and their representation, Fuzzy Sets and Systems 92 (1997), 167-189.
[13] M. Grabisch and M. Roubens, Application of the Choquet integral in multicriteria decision making. In: M. Grabisch, T. Murofushi and M. Sugeno (Eds.), Fuzzy Measures and Integrals – Theory and Applications, Physica Verlag, Berlin, 2001, 415-434.
[14] P. Meyer and M. Roubens, On the use of the Choquet integral with fuzzy numbers in multiple criteria decision support, Fuzzy Sets and Systems 157 (2006), 927-938.
[15] T. Murofushi, M. Sugeno and D. Machida, Non-monotonic fuzzy measures and the Choquet integral, Fuzzy Sets and Systems 64 (1994), 73-86.
[16] Y. Rébillé, Sequentially continuous non-monotonic Choquet integrals, Fuzzy Sets and Systems 153 (2005), 79-94.
[17] T.L. Saaty, Decision Making for Leaders: The Analytical Hierarchy Process for Decisions in a Complex World, University of Pittsburgh Press, Pittsburgh, 1988.
[18] Z. Wang and G.J. Klir, Fuzzy Measure Theory, Plenum Press, New York, 1992.
[19] R.R. Yager, Applications and extensions of OWA aggregation, International Journal of Man-Machine Studies 37 (1992), 103-132.
[20] R. Yang, Z. Wang, P.-A. Heng and K.-S. Leung, Fuzzy numbers and fuzzification of the Choquet integral, Fuzzy Sets and Systems 153 (2005), 95-113.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-146
Portfolio management with minimum guarantees: some modeling and optimization issues

Diana BARRO a and Elio CANESTRELLI a,1
a Dept. of Applied Mathematics, Ca' Foscari University, Venice

Abstract. In this contribution we consider a dynamic portfolio optimization problem where the manager has to deal with the presence of minimum guarantee requirements on the performance of the portfolio. We briefly discuss different possibilities for the formulation of the problem and present a quite general formulation which includes transaction costs, cardinality constraints and buy-in thresholds. The presence of realistic and operational constraints introduces binary and integer variables, greatly increasing the complexity of the problem.

Keywords. Minimum guarantee, dynamic portfolio management, scenario.
Introduction

Interest in financial products with minimum guarantee features has increased in response to a period of financial market instability and low interest rate levels. Different categories of investors are looking for risk-return profiles which are capable of combining potential for upward capture with more effective mechanisms for controlling the risk of downward movements. A possible way of achieving these goals is to introduce low barriers to shape the risk-return profile. According to the chosen level, these barriers can be expressed as minimum return requirements or as a maximum amount of loss allowed. There are many different types of products and policies which offer risk-return profiles with minimum guarantee features. In this paper we are interested in optimal portfolio strategies which could be adopted by the manager of a fund linked with the issuance of these products. Policies which give a minimum guaranteed return usually also provide a certain share of the return of the risky part of the portfolio invested in the equity market. Limiting the portfolio allocation only to bonds and liquidity instruments does not allow upside potential to be captured. The main objective is thus to combine a guaranteed return, i.e. a low profile of risk, with the possibility of achieving at least a part of the higher returns which could be granted by the equity market, at the cost of a high exposure to the risk of not meeting the minimum return requirement.

Different contributions in the literature have tackled the problem of optimal portfolio choices in the presence of a minimum guarantee. For example, we refer to [10] and [11] for a discussion of these problems in a continuous time framework. Another interesting issue is related to the pricing problem for products including minimum guarantee features, see, for example, [2][3]. We consider the problem of formulating and solving an optimal allocation problem including a minimum guarantee requirement and a participation in the returns generated by the risky portfolio. These goals can be achieved either by considering them as constraints or by including them in the objective function. We discuss the problem in a dynamic optimization framework in the presence of frictions in the market. We formulate the problem as a multistage stochastic programming problem in arborescent formulation. The presence of elements such as binary variables and roundlot constraints does not allow us to rely on efficient decomposition techniques already used for other dynamic optimization problems, see, for example, [4]. Thus, to tackle the resulting problem we are interested in the use of artificial intelligence approaches. Different contributions in the literature have pointed out that these techniques can be efficiently applied to solve a variety of portfolio optimization problems, see, for example, [20][22][25].

1 Corresponding author: [email protected]
1. Minimum guarantee requirements

In this contribution we focus on a dynamic portfolio management problem in which we consider an investor who is interested in maximizing his wealth while controlling, at the same time, the downside risk. Different approaches have been proposed in the literature, both in discrete and in continuous time, using appropriate measures of downside risk or proper constraints on the level of shortfall allowed. In this contribution we want to focus on models where the maximization of wealth is coupled with the presence of a minimum guaranteed return for the portfolio. The control of risk is set by fixing a lower bound on the level of wealth which can be reached by the portfolio, rather than by controlling the variability of the distribution of portfolio returns. This approach seems to be interesting and appealing in different financial contexts where the investors are risk averse and may require guarantees to be convinced to enter the investment. We can find many examples of this type of risk control profile in financial products like pension funds, life insurance contracts and unit linked policies. We are not interested in the pricing of these products; we are rather interested in the point of view of a manager who has to manage a fund linked with these products, and thus our goal is to study the formulation and the solution of a dynamic optimization problem.

Different strategies can be implemented to build the desired risk/return payoff. In particular, we can include direct hedging using derivatives, or indirect hedging of the long positions using short-selling. Other strategies can be built through a re-balancing of the portfolio, changing the proportions invested in a risky fund and in the riskless component, but without an optimization of the composition of the risky fund. A simple example in a static framework is the stop loss strategy, while in a dynamic context we can build strategies such as the constant proportion portfolio insurance. In this contribution we are more interested in analyzing a dynamic optimization problem in which we deal with the requirement of a minimum guarantee, and we are interested in devising an effective way of introducing it in the formulation of the problem, analyzing the effect of different choices on the usable solution approaches. In the following we consider two different ways of including the minimum guarantee requirements in the formulation of the problem. Firstly, the condition can be included in the objective function as the goal, or one of the goals, of the optimization problem. Secondly, it can be handled through the introduction of one or more constraints in the optimization problem. In the latter case we can consider both hard constraints and probabilistic constraints, which allow the goal to be reached within a specified level of probability. In the following we consider the first approach and include the goal of minimizing the shortfall with respect to the minimum guarantee in the objective function.

We consider a portfolio made of three asset classes, i.e. equities, bonds and cash. In this contribution we do not consider hedging strategies, neither direct hedging, including derivatives in the portfolio, nor indirect ones through short selling. The presence of these elements could help in the achievement of the desired goals, but their inclusion in the portfolio requires a careful modeling of the operational constraints linked with these positions, such as borrowing limits and the adequacy of margins, which need to be monitored. We consider a dynamic optimization problem in discrete time with a finite horizon and a scenario framework for the description of the uncertainty included in the problem. In the following $q_{i\,k_t}$, $i = 1, \dots, n_1$, denotes the position in the i-th stock and $b_{j\,k_t}$, $j = 1, \dots, n_2$, denotes the position in the j-th bond, while $c_{k_t}$ denotes the amount of cash. We denote with $r_{k_t} = (r_{1\,k_t}, \dots, r_{n\,k_t})$ the vector of returns of the risky assets for the period $[t-1, t]$ in node $k_t$, and with $r_{c\,k_t}$ the return on the liquidity component in node $k_t$. In order to account for transaction costs and the liquidity component in the portfolio, we introduce two vectors of variables $a_{k_t} = (a_{1\,k_t}, \dots, a_{n\,k_t})$ and $v_{k_t} = (v_{1\,k_t}, \dots, v_{n\,k_t})$ denoting the value of each asset purchased and sold at time t in node $k_t$, while we denote with $\kappa^+$ and $\kappa^-$ the proportional transaction costs for purchases and sales.

We denote with $\psi_{k_t}(y_{k_t}, z_t)$ a generic distance measure between the value of the portfolio and the value of the minimum guarantee in node $k_t$ at time t. We propose the absolute downside deviation as a measure of distance between the managed portfolio $y_{k_t}$ and the minimum guarantee benchmark $z_t$:

$$\psi_{k_t}(y_{k_t}, z_t) = [y_{k_t} - z_t]^- = -\min[y_{k_t} - z_t, 0] = \gamma^-_{k_t}.$$ (1)

If we consider the minimization of the mean absolute downside deviation for the terminal period, the objective function becomes

$$\min \; \frac{1}{S} \sum_{s=1}^{S} \pi_s \gamma^-_s,$$ (2)

where s denotes a scenario, i.e. a path connecting the origin of the tree to a leaf node, and $\pi_s$ denotes the probability associated to the path. This choice results in a linear optimization problem which can be solved quite efficiently even if the number of scenarios included is high. Other choices are possible for the minimum guarantee goal; in particular, for example, we can minimize the maximum downside deviation at the end of the horizon. To this aim we define a non negative variable $\theta$ as the upper bound of the absolute deviations, i.e. $\theta \ge |y_s - z_T|$; this problem, too, can be transformed into a linear optimization problem, see [23], obtaining

$$\min \theta \qquad \text{s.t.} \quad \theta + y_s \ge z_T, \quad s = 1, \dots, S.$$ (3)
The minimum return goal in both the forms described above can be combined with a measure of performance such as the maximization of the wealth in each period, in such a way that a first part of the objective function is related to the maximization of the value of the wealth in each period (an expected utility from wealth can be introduced to take into account risk averse behavior), while a second part controls the downside risk, minimizing the mean absolute downside deviation or the maximum downside deviation from the guaranteed return portfolio. Alternatively, we can set a minimum guaranteed return in each period. This formulation would allow us to introduce a path dependent goal in which we can lock in part of the return of the portfolio, in such a way that the level of the minimum guarantee can be increased during the management period in case of positive returns in the portfolio, as often happens, for example, for some financial products with the introduction of lock-in conditions. The minimum guarantee can be assumed constant over the entire planning horizon or it can follow a deterministic dynamics, i.e. it is not scenario dependent. A second approach for dealing with the minimum guarantee requirements is based on the introduction of some constraints in the formulation of the optimization problem to force the level of wealth or the return of the portfolio to satisfy the required lower bound. The constraints can be of different types; in particular, we can consider both hard constraints and probabilistic constraints. In the first case we can introduce constraints on the level of wealth or on the return of the portfolio in the form of lower bounds, while in the second we can require that the goal is reached with a specified level of confidence.
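As a small numerical illustration of the two terminal-period downside measures just discussed (our sketch, with made-up scenario values, not data from the paper):

```python
# Mean and maximum absolute downside deviation of terminal wealth w.r.t.
# the guarantee z_T over S scenarios (illustrative numbers only).
y_T = [103.0, 98.5, 110.2, 95.0]        # terminal portfolio values, one per scenario
pi  = [0.25, 0.25, 0.25, 0.25]          # scenario probabilities
z_T = 100.0                             # terminal value of the guarantee
S   = len(y_T)

gamma = [max(z_T - y, 0.0) for y in y_T]                 # gamma_s^- = [y_s - z_T]^-
mean_downside = sum(p * g for p, g in zip(pi, gamma)) / S  # objective (2)
max_downside  = max(gamma)                               # value bounded by theta in (3)
print(mean_downside, max_downside)      # 0.40625 5.0
```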
2. Formulation of the problem

A crucial aspect in the formulation of these problems is the definition of a set of operational constraints, which are particularly relevant in the case of direct or indirect hedging using derivatives or short-selling, since they are related to the presence of margins and borrowing limits. Nevertheless, they are also particularly interesting in the management of funds with minimum guarantee requirements, since very often such funds are subject to operational constraints. In particular, we are interested in focusing on the constraints which increase the computational complexity of the problem, i.e. discrete restrictions which result in the introduction of binary or integer variables. Different classes of constraints have been considered in the literature, mainly in static portfolio problems, see, for example, [17] and references therein. Following [17] we consider constraints to explicitly limit the number of stocks which can be included in the portfolio, i.e. cardinality constraints; buy-in thresholds, which define the minimum level below which the asset is not included; and roundlot constraints, which are defined in terms of the discrete numbers of assets taken as the basic unit of investment.

Given a set of binary variables $\delta_{i\,t} \in \{0, 1\}$, $i = 1, \dots, N$, such that $\delta_{i\,t}$ is equal to zero if asset i is not included in the portfolio at time t, the cardinality constraints can be expressed as

$$\sum_{i=1}^{N} \delta_{i\,t} = N^{max}_t,$$

where $N^{max}$ represents the number of assets which can be included in the portfolio. The presence of cardinality constraints is linked with buy-in thresholds. Using the binary variables introduced above, jointly with lower and upper bounds $l_{i\,t}$ and $u_{i\,t}$ for asset i at time t, we can express the existence of buy-in thresholds as

$$l_{i\,t}\, \delta_{i\,t} \le x_{i\,t} \le u_{i\,t}\, \delta_{i\,t},$$

where $x_{i\,t}$ represents the fraction of portfolio wealth invested in the generic asset i at time t, defined as $q_{i\,k_t}/y_{k_t}$ for equities and $b_{i\,k_t}/y_{k_t}$ for bonds. In the following we present the formulation of a multistage stochastic programming problem, in its arborescent formulation, with the introduction of constraints on the presence of a minimum guaranteed return for the portfolio in each period of the investment horizon and of the operational constraints.
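Before turning to the multistage model, the two discrete restrictions just described can be illustrated on a single-period toy problem. The sketch below uses the PuLP modelling library purely as an example (the authors do not prescribe a solver), with made-up expected returns, bounds and cardinality limit:

```python
import pulp

N, N_max = 5, 3                               # assets and cardinality limit (assumed)
mu = [0.04, 0.06, 0.05, 0.07, 0.03]           # illustrative expected returns
l, u = [0.05] * N, [0.60] * N                 # buy-in thresholds and caps

prob = pulp.LpProblem("cardinality_buyin_toy", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(N)]      # portfolio weights
d = [pulp.LpVariable(f"d{i}", cat="Binary") for i in range(N)]    # inclusion flags

prob += pulp.lpSum(mu[i] * x[i] for i in range(N))                # expected return
prob += pulp.lpSum(x) == 1                                        # fully invested
prob += pulp.lpSum(d) == N_max                                    # cardinality constraint
for i in range(N):
    prob += x[i] >= l[i] * d[i]                                   # buy-in threshold
    prob += x[i] <= u[i] * d[i]                                   # x_i = 0 if excluded

prob.solve()
print([pulp.value(xi) for xi in x])
```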
$$\min_{q_{k_t},\, b_{k_t},\, c_{k_t}} \; \sum_{t=1}^{T} \left[ \alpha_t \sum_{k_t = K_{t-1}+1}^{K_t} \psi(y_{k_t}) \; - \; \beta_t \sum_{k_t = K_{t-1}+1}^{K_t} \varphi(y_{k_t}) \right]$$ (4)

subject to

$$y_{k_t} = c_{k_t} + \sum_{i=1}^{n_1} q_{i\,k_t} + \sum_{j=1}^{n_2} b_{j\,k_t}$$ (5)

$$q_{i\,k_t} = (1 + r_{i\,k_t}) \left[ q_{i\,f(k_t)} + a_{i\,f(k_t)} - v_{i\,f(k_t)} \right] \quad i = 1, \dots, n_1$$ (6)

$$b_{j\,k_t} = (1 + r_{j\,k_t}) \left[ b_{j\,f(k_t)} + a_{j\,f(k_t)} - v_{j\,f(k_t)} \right] \quad j = 1, \dots, n_2$$ (7)

$$c_{k_t} = (1 + r_{c\,k_t}) \Big[ c_{f(k_t)} - \sum_{i=1}^{n_1} (1+\kappa^+)\, a_{i\,f(k_t)} + \sum_{i=1}^{n_1} (1-\kappa^-)\, v_{i\,f(k_t)} - \sum_{j=1}^{n_2} (1+\kappa^+)\, a_{j\,f(k_t)} + \sum_{j=1}^{n_2} (1-\kappa^-)\, v_{j\,f(k_t)} + \sum_{j=1}^{n_2} g_{k_t}\, b_{j\,f(k_t)} \Big]$$ (8)

$$a_{i\,k_t} \ge 0, \quad v_{i\,k_t} \ge 0 \quad i = 1, \dots, n_1$$ (9)

$$a_{j\,k_t} \ge 0, \quad v_{j\,k_t} \ge 0 \quad j = 1, \dots, n_2$$ (10)

$$q_{i\,k_t} \ge 0 \quad i = 1, \dots, n_1$$ (11)

$$b_{j\,k_t} \ge 0 \quad j = 1, \dots, n_2$$ (12)

$$c_{k_t} \ge 0$$ (13)

$$q_{i\,0} = \bar{q}_i \quad i = 1, \dots, n_1$$ (14)

$$b_{j\,0} = \bar{b}_j \quad j = 1, \dots, n_2$$ (15)

$$c_0 = \bar{c}$$ (16)

$$y_{k_t} \ge z_t$$ (17)

$$\delta_{i\,k_t} \in \{0, 1\} \quad i = 1, \dots, n$$ (18)

$$\sum_{i=1}^{n} \delta_{i\,k_t} = N^{max}$$ (19)

$$l_{i\,k_t}\, \delta_{i\,k_t} \le x_{i\,k_t} \le u_{i\,k_t}\, \delta_{i\,k_t}$$ (20)

for $k_t = K_{t-1}+1, \dots, K_t$ and $t = 1, \dots, T$, where $\psi(y_{k_t})$ and $\varphi(y_{k_t})$ denote two functions of the level of wealth in the portfolio accounting for the risk and the return of the portfolio, respectively. The choice of these functions characterizes the resulting optimization problem; in particular, the manager may be interested in maximizing the expected return of the portfolio or the expected utility of wealth and at the same time in minimizing the downside deviations or the variance of the portfolio. A proper choice of these functions may lead to a linear multistage stochastic programming problem. Equations (5)-(8) represent the portfolio composition in node $k_t$ and the dynamics of the amounts of stocks, bonds and cash in the portfolio moving from the ancestor node $f(k_t)$, at time $t-1$, to the descendant nodes $k_t$, at time $t$, with $K_0 = 0$. With $g_{k_t}$ in equation (8) we denote the inflows from the bonds in the portfolio. Following [12] we assume that there is an annual guaranteed rate of return denoted with ρ. If the initial wealth is $W_0 = \sum_{i=1}^{n+1} x_{i0}$, then the value of the guarantee at the end of the planning horizon is $W_T = W_0 (1+\rho)^T$. At each intermediate date the value of the guarantee is given by $z_t = e^{-\delta(t, T-t)(T-t)}\, W_0 (1+\rho)^T$, where $e^{-\delta(t, T-t)(T-t)}$ is a discounting factor, i.e. the price at time t of a zero coupon bond which pays 1 at the terminal time T. If we omit the operational constraints we obtain a (linear) multistage stochastic programming problem which can be efficiently solved using decomposition techniques, see for example [4] and references therein.
The presence of discrete constraints greatly increases the computational complexity of the resulting problem; see, for example, [17] for a discussion of the effects of the introduction of these constraints in a static portfolio selection problem. Different computational approaches based on heuristic algorithms (like genetic algorithms, neural networks, simulated annealing, etc.) have been proposed to deal with this kind of problem. In particular, several papers report successful results from the application of artificial intelligence tools to tackle complex financial problems, see for example [20][22][19]. We think this promising approach could help in tackling the family of problems we are interested in. In particular, we are interested in comparing the portfolio strategies obtained using different objective functions. A first issue we must deal with is related to the choice of how these techniques can be applied to solve multistage stochastic optimization problems. In particular, we suggest applying them directly to solve the deterministic equivalent problem, which is a large scale optimization problem.
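As a purely illustrative skeleton of this idea (not the authors' method), a heuristic such as simulated annealing can be applied to the deterministic equivalent by encoding all node-wise decisions in a single vector and penalizing constraint violations; binary variables would require a discrete neighbourhood instead of the Gaussian move used here:

```python
import math, random

def simulated_annealing(objective, penalty, x0, steps=10_000, t0=1.0):
    """Minimize objective(x) + penalty(x) over a real decision vector x that
    encodes all node-wise portfolio decisions of the deterministic equivalent
    (illustrative skeleton only)."""
    x = list(x0)
    cur = objective(x) + penalty(x)
    best, best_x = cur, list(x)
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9                    # cooling schedule
        cand = [xi + random.gauss(0.0, 0.05) for xi in x]    # local move
        val = objective(cand) + penalty(cand)
        if val < cur or random.random() < math.exp((cur - val) / t):
            x, cur = cand, val
            if val < best:
                best, best_x = val, list(cand)
    return best_x, best

# e.g. a toy unconstrained run:
# simulated_annealing(lambda x: sum(xi * xi for xi in x), lambda x: 0.0, [1.0] * 5)
```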
3. Concluding remarks

In this contribution we analyze the issue of managing a portfolio with the goal of building a risk/return profile characterized by the presence of a minimum guarantee. We focus on the different modeling choices which are available and on the complexity of the resulting optimization problems. In this work we do not consider hedging strategies obtained using derivatives or short selling; this topic is left for future research, as is a comparison of the effectiveness of the different strategies in reaching the goal of a minimum guarantee return portfolio, comparing them also from the point of view of the costs associated with the dynamic management of the portfolio.
References

[1] Alexander, G.J., Baptista, A.M.: Portfolio selection with a drawdown constraint. Journal of Banking and Finance 30 (2006) 3171-3189
[2] Bacinello, A.R.: Fair pricing of life insurance participating policies with a minimum interest rate guarantee. Astin Bulletin 31(2) (2001) 275-297
[3] Bacinello, A.R.: Fair valuation of a guaranteed life insurance participating contract embedding a surrender option. The Journal of Risk and Insurance 70(3) (2003) 461-487
[4] Barro, D., Canestrelli, E.: Dynamic portfolio optimization: Time decomposition using the Maximum principle with a scenario approach. European Journal of Operational Research 163 (2005) 217-229
[5] Basak, S.: A general equilibrium of portfolio insurance. Review of Financial Studies 8 (1995) 1059-1090
[6] Beasley, J.E., Meade, N., Chang, T.-J.: An evolutionary heuristic for the index tracking problem. European Journal of Operational Research 148 (2003) 621-643
[7] Boyle, P., Tian, W.: Optimal portfolio with constraints. Mathematical Finance 17 (2007) 319-343
[8] Consiglio, A., Cocco, F., Zenios, S.A.: The Prometeia model for managing insurance policies with guarantees. In: Zenios, S.A. and Ziemba, W.T. (Eds.) Handbook of asset and liability management vol. 2, 663-705, North-Holland (2007)
[9] Consiglio, A., Saunders, D., Zenios, S.A.: Asset and liability management for insurance products with minimum guarantees: The UK case. Journal of Banking and Finance 30 (2006) 645-667
[10] Deelstra, G., Grasselli, M., Koehl, P.-F.: Optimal investment strategies in the presence of a minimum guarantee. Insurance: Mathematics and Economics 33 (2003) 189-207
[11] Deelstra, G., Grasselli, M., Koehl, P.-F.: Optimal design of the guarantee for defined contribution funds. Journal of Economic Dynamics and Control 28 (2004) 2239-2260
[12] Dempster, M.A.H., Germano, M., Medova, E.A., Rietbergen, M.I., Sandrini, F., Scrowston, M.: Designing minimum guaranteed return funds. Research Papers in Management Studies, University of Cambridge, WP 17/2004 (2004)
[13] El Karoui, N., Jeanblanc, M., Lacoste, V.: Optimal portfolio management with American capital guarantee. Journal of Economic Dynamics and Control 29 (2005) 449-468
[14] Gaivoronski, A., Krylov, S., van der Vijst, N.: Optimal portfolio selection and dynamic benchmark tracking. European Journal of Operational Research 163, 115-131
[15] Hochreiter, R., Pflug, G., Paulsen, V.: Design and management of Unit-linked Life insurance Contracts with guarantees. In: Zenios, S.A. and Ziemba, W.T. (Eds.) Handbook of asset and liability management vol. 2, 627-662, North-Holland (2007)
[16] Jensen, B.A., Sorensen, C.: Paying for minimal interest rate guarantees: Who should compensate whom? European Financial Management 7 (2001) 183-211
[17] Jobst, N.J., Horniman, M.D., Lucas, C.A., Mitra, G.: Computational aspects of alternative portfolio selection models in the presence of discrete asset choice constraints. Quantitative Finance 1 (2001) 489-501
[18] Konno, H., Yamazaki, H.: Mean absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science 37 (1991) 519-531
[19] Lemke, F., Mueller, J.-A.: Self-Organizing Data Mining For A Portfolio Trading System. Computational Intelligence in Finance May/June (1997) 12-26
[20] Chi-Ming Lin, Jih-Jeng Huang, Mitsuo Gen, Gwo-Hshiung Tzeng: Recurrent neural network for dynamic portfolio selection. Applied Mathematics and Computation 175 (2006) 1139-1146
[21] Muermann, A., Mitchell, O.S., Volkman, J.M.: Regret, portfolio choice, and guarantees in defined contribution schemes. Insurance: Mathematics and Economics 39 (2006) 219-229
[22] Kyong Joo Oh, Tae Yoon Kim, SungKy Min: Using genetic algorithm to support portfolio optimization for index fund management. Expert Systems with Applications 28 (2005) 371-379
[23] Rudolf, M., Wolter, H.-J., Zimmermann, H.: A linear model for tracking error minimization. Journal of Banking and Finance 23 (1999) 85-103
[24] Sorensen, C.: Dynamic asset allocation and fixed income management. Journal of Financial and Quantitative Analysis 34 (1999) 513-531
[25] Lean Yu, Shouyang Wang, Kin Keung Lai: Neural network-based mean-variance-skewness model for portfolio selection. Computers & Operations Research 35 (2008) 34-46
[26] Ziemba, W.T.: The Russel-Yasuda Kasai, InnoALM and related models for pensions, insurance companies and high net worth individuals. In: Zenios, S.A. and Ziemba, W.T. (Eds.) Handbook of asset and liability management vol. 2, 861-962, North-Holland (2007)
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-154
The treatment of fuzzy and specific information provided by experts for decision making in the selection of workers

Jaime GIL-LAFUENTE a,1
a Universitat de Barcelona, SPAIN

Abstract. Up to now, our research, based on the theory of fuzzy subsets, has led to an alternative implementation of a series of techniques arising from multivalent logics [1] for decision making in the management of human resources, using known algorithms (like the Hamming [2], Euclidean [3] and Tran and Duckstein [4] distances, the adequation coefficient [5] or the weighted mean hemimetric for fuzzy numbers [6]) and even creating new instruments (such as the "maximum and minimum level index" [7] or the "discarding by overcoming rate-distance index" [8]) which allow us to adapt in an even more reliable manner to an always complex and unstable reality. In many cases, in order to avoid uncontrollable errors, the expert was advised, if in doubt, to work with confidence intervals instead of crisp numbers, which allowed the uncertainty to be limited to a greater degree and calculations to be carried out that avoided errors in estimates or valuations. The results were satisfactory and, besides, it was possible to approach a new question: if experts could carry out their valuations in an even more meticulous way (by providing several possibilities between a minimum and a maximum), valuing, for example, by means of fuzzy numbers [9], would it be feasible to operate in order to find, in the end, weighted Hamming distances? The answer is yes. Many manuals provide us with the techniques to, on the one hand, calculate distances and, on the other hand, operate with fuzzy numbers. But to find distances between fuzzy sub-sets of a degree α (for example) was a challenge which we have allowed ourselves to tackle.

Keywords. Convex weighting, Decision making, Distance, Fuzzy number, Uncertainty.
Introduction

In the current situation we are living through, heavily influenced by the economic and financial crisis, all companies must be cautious about decision-making, since any error can trigger their collapse. Many companies are forced to dismiss employees to reduce their staff, and need to keep versatile workers who are able to adapt and fulfill, if necessary, different jobs and roles. Taking these facts into account, we decided to work with a series of techniques which, at least, allowed us to reduce the uncertainty of certain decisions that, often taken very lightly, have meant extraordinary economic losses for companies. In an attempt to provide a solution to such an obvious problem, we have presented several works, based on the aggregation of expert opinions, which allowed us to determine whether workers are "polyvalent" or would optimize their performance in specific positions. The aim is, to sum up, an optimum assignment of workers. In each case, we recommended that the experts evaluate each quality, characteristic or peculiarity to be taken into account by means of the hendecagonal system. If in doubt, uncertainty should be limited between a minimum and a maximum in order to form confidence intervals. But the experts, having sufficient information and knowing how to use it, were not only to come up with a confidence interval, but might also have to specify, for each value of the hendecagonal system found between the lower and the upper limits, the possibility that each one of the inserted valuations was the most suitable. In this paper we discuss not only evaluation by means of real numbers or confidence intervals, but also a third option: fuzzy numbers.

1 Jaime GIL-LAFUENTE: Universidad de Barcelona, Av. Diagonal, 690; 08034-Barcelona, Spain. Email: [email protected].
1. The arithmetic of the fuzzy number

As we all know, the fuzzy number can be defined as a fuzzy sub-set that possesses three specific characteristics [9]:
1. The referential must belong to the realm of rational numbers.
2. The membership function must be normal.
3. The membership function must be convex.
Obviously, we shouldn’t dwell on each one of the factors that define a fuzzy number in this paper. Nevertheless, we do consider that it is necessary to underline the fact that it is an element which provides more information at any given moment between the levels that limit uncertainty and, therefore, has a great operative use. Let us remember that, with fuzzy numbers, we can practically do all the operations which we normally do with crisp numbers. Consequently, if we are considering finding the Hamming Distance between two fuzzy sub-sets in which some of the elements that form the function characteristic of membership is a fuzzy number, we have to stop and find out how to do subtraction when crisp numbers, confidence intervals and fuzzy numbers intervene. The next step would be facing the challenge of finding the absolute values of it result obtained before. Finally we have to find out which of the fuzzy numbers obtained is the lowest.
2. The most complete fuzzy valuation carried out by experts

Up to this point, when we needed to optimize our decision making to hire a worker, the process was based on the fact that it was enough if the experts evaluated each one of the qualities, characteristics and peculiarities which describe the profile of each candidate by crisp numbers or, if in doubt, by confidence intervals. However, the prestigious experts on whom we are going to rely for this purpose may have sufficient information not only to limit the uncertainty by means of confidence intervals, but also to assign to every intermediary value, found between the lower and upper extremes, a valuation that indicates the (subjective) possibility that they consider this situation could occur. In other words: possibilities (values between 0 and 1) will be assigned that each of the intermediary values between the two extremes may occur (members of the hendecagonal [10] system). In this case, the membership function could be given by means of:

• A single number, when the expert or experts are absolutely sure about the evaluation of a specific characteristic, quality or peculiarity.
• A confidence interval, if the expert or experts are not completely sure about the evaluation that should be granted to a characteristic, quality or peculiarity. In this case, the value will be expressed by means of a minimum and a maximum, outside of which, according to the expert, there is no possibility that the evaluation is correct.
• A fuzzy number, if the experts have enough information to consider that the valuation given to a characteristic, quality or peculiarity can be included between a minimum value and a maximum value, but, unlike the previous case, this "interval" will have, for each one of the values which it covers, more possibility (near to the maximum presumption: 1) or less (further away from the maximum value) that the valuation granted will be complied with.

We are faced then with the fact that, in order to do each and every one of the calculations that will permit us to take decisions, we have to operate not only with crisp numbers or confidence intervals, but also with fuzzy numbers, elements that have a greater operative complexity. The challenge of finding a solution involves, among others, the sum of a confidence interval with a fuzzy number, finding the absolute value of a confidence interval (if it allows us to operate in finding a distance), and finding, in the same way, the manner to work with the absolute value of a fuzzy number, in order likewise to find distances.
3. Adapting mathematics to common sense. The special case of distance between a crisp number and a fuzzy confidence interval

We know that the arithmetic between confidence intervals has, for some time, allowed us to achieve satisfactory results. Nevertheless, when we do a subtraction in which the subtrahend confidence interval is partially or totally greater than the minuend, we face an incongruence: when looking for the absolute value (a logical operation if we are dealing with the Hamming distance), the negative values would become positive. Consequently the distance, instead of being minimum (or 0), would become greater than it really is.
A simple and easily understandable example for this case could be the following. For characteristic Cn, the experts evaluate the employee considered ideal (I) and the result is .5. If one of the candidates has been granted by the experts, for this same characteristic, values included between [.3, .6], we can see that, in the calculation process for the Hamming distance, the result of this partial operation would be |[.5, .5] (-) [.3, .6]| = |[-.1, .2]| = [.1, .2], a solution which is totally erroneous if the hypothetical value included in the confidence interval [.3, .6] were .5. It is clear that the result .5 (-) .5 = 0 would not be found between the minimum .1 and the maximum .2 which make up the confidence interval that is the result of the operation. At this juncture it is worth considering that "when the result of the subtraction is a confidence interval in which the lower extreme is negative, we must consider, on taking its absolute value for the calculation of a distance, that the minimum distance (lower extreme) will always be equivalent to zero".
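The rule just stated is easy to encode; the following sketch (ours, using the numbers of the example above) subtracts two confidence intervals and clips the lower extreme of the absolute value at zero:

```python
def sub_interval(a, b):
    """(-) between confidence intervals: [a1, a2] (-) [b1, b2] = [a1 - b2, a2 - b1]."""
    return (a[0] - b[1], a[1] - b[0])

def abs_for_distance(iv):
    """Absolute value used in the Hamming distance: if the lower extreme is
    negative, the minimum distance is set to zero, as argued in the text."""
    lo, hi = iv
    if lo < 0:
        return (0.0, max(abs(lo), abs(hi)))
    return (lo, hi)

diff = sub_interval((0.5, 0.5), (0.3, 0.6))   # -> (-0.1, 0.2), up to float rounding
print(abs_for_distance(diff))                  # -> (0.0, 0.2)
```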
4. A new approach to adding fuzzy numbers with confidence intervals

The next difficulty that arises for us, once we have found all the partial distances for each one of the characteristics, qualities or peculiarities, is to do all the additions. It is more than likely that in many cases we will have to add crisp numbers, fuzzy numbers and confidence intervals. There would be no problem whatsoever in doing the addition, taking into account that it is possible to convert any crisp number or any confidence interval [11] into a fuzzy number. A simple example will allow us to appreciate this simplicity:
• A crisp number into a fuzzy number. Ex.:
• A confidence interval into a fuzzy number. Ex.:

Nevertheless, we have allowed ourselves to consider that, for the second case, the membership values of the referentials included between .7 and .9 should not be the maximum presumption, since the information contained between the extremes is nil, and therefore the uncertainty is maximum. For this reason we are in a position to contribute, we feel in an opportune manner, but above all because it is closer to reality, by assigning to each of the values of the referential included between the lower extreme and the upper extreme a confidence interval that shows with greater exactitude the total uncertainty of each one of them: [0, 1]. Continuing with our example, in order to convert this confidence interval into a fuzzy number, what we consider most adequate is to do so as follows:

Ex.:
5. Making the final decision

There is no doubt whatsoever that each one of the steps we take in order to find the Hamming distance will slowly but surely convert all the values into fuzzy numbers. Therefore, the final result is a series of distances expressed as fuzzy numbers, and on many occasions it will be particularly difficult to determine which is the greater or the lesser. To determine the greater or lesser fuzzy number, the "supremum" (the fuzzy number that includes those under comparison) must be calculated. Once it is found, the next step is to compute the relative Hamming distances between this number and each one of the distances arrived at before. Let us assume that we have two fuzzy numbers X and Y, and that we need to know which of them is the greater.

The relative Hamming distance between the "supremum" (S) and X is δ(X, S) = (0 + 0 + .4)/3 = .13, while between (S) and Y it is δ(Y, S) = .5/3 = .16. So we can state that the fuzzy number X is the lower one.
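The comparison rule can be sketched as follows. The membership vectors x_m and y_m below are hypothetical (the paper's figures for X and Y are not reproduced here); they are merely chosen so that the two distances come out as .13 and .16, matching the totals quoted above:

```python
def supremum(*fuzzy_numbers):
    """Pointwise maximum of the memberships: the fuzzy number that 'includes'
    all those under comparison."""
    return [max(vals) for vals in zip(*fuzzy_numbers)]

def relative_hamming(a, b):
    """Relative Hamming distance: mean absolute difference of memberships."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

x_m = [1.0, 0.8, 0.2]      # hypothetical memberships of X on three support points
y_m = [0.5, 0.8, 0.6]      # hypothetical memberships of Y
s   = supremum(x_m, y_m)   # [1.0, 0.8, 0.6]
print(relative_hamming(x_m, s), relative_hamming(y_m, s))   # ~0.13  ~0.17
# the fuzzy number closer to the supremum is declared the lower one, as in the text
```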
6. Application of the proposed scheme

The human resources manager describes the ideal employee for a determined position, taking into account a series of characteristics, qualities and peculiarities of a physical, technical, mental and medical nature:

Likewise, and after the pertinent trials, all the experts evaluate each one of the characteristics, qualities and peculiarities of the first candidate selected for occupying this position in the team; the description will be represented by fuzzy sub-set A:

In this case, for characteristics C1 and C3 the experts decide to assign a confidence interval, while for C4 and C5 the specialists have not only limited the uncertainty, but have also structured it: they decide the level of possibility that each one of the values situated between the extremes occurs, forming a fuzzy number. The description of the other candidates is likewise carried out by means of the following fuzzy sub-sets:
Making use of this information, we must find which one of these candidates is closer to the "employee considered as ideal" by means of the relative Hamming distance2 with convex weighting3. The experts mark, according to their subjective viewpoint, the weight of each characteristic [12]: w1 = .8; w2 = .2; w3 = .4; w4 = .4; w5 = .2. By computing $v_z = \frac{w_z}{\sum_{i=1}^{n} w_i}$, z = 1, 2, …, 5, we get the index of weights: v1 = .4; v2 = .1; v3 = .2; v4 = .2; v5 = .1. The Hamming distance between candidate A and the worker considered as ideal I is:
In spite of the fact that the results we have found up to this point indicate that a confidence interval can become a fuzzy number by assigning the maximum membership value (which is 1) to the values located between the extremes, we have considered, just as we mentioned in the previous section, a more uncertain result, but also one much closer to reality: those values will be assigned a confidence interval of maximum uncertainty: [0, 1].
2 As has been commented before, we have decided to simplify this example to concentrate only on the steps towards the resolution of the problem we have considered. For this reason, we have decided to use the relative Hamming distance and not other indices which, although possibly more exact, are far more complex.
3 We think that this option is the most appropriate due to its simplicity and adaptability to multivalent logics.
and we know that:

which will allow us to operate. This allows us to find the absolute Hamming distance between the ideal worker I and candidate A, following the same steps for B and C:

These distances can be seen, perhaps with greater clarity, when expressed graphically:
Figures 1-2. Distances between I and A, B and C, seen from both sides.

After finding the three Hamming distances that separate each one of the candidate workers from the hypothetical perfect employee, we must decide which of them is less distant from the ideal. For that, we must find the three "supremum" values. To find the distance, we must consider, for feasibility, that in every fuzzy sub-set, [0, a] = a.
So:
The next step is to calculate the Hamming distance between the upper limit and each of the existing distances: δ(S, π(I, A)) = 1.7/24 = .0708; δ(S, π(I, B)) = 9.5/24 = .3958; δ(S, π(I, C)) = 11.4/24 = .475.

Thus we can conclude that candidate C is the closest to the worker considered as ideal for occupying a certain position in the organisation (its distance from the "supremum" is the greatest), while candidate B would be the second option. Finally, A would occupy the last place.
7. Conclusion

The schemes which have been used for the resolution of problems such as the selection of the workers best suited for a specific position (confidence intervals and even fuzzy numbers) can be applied to other situations in which the available information exceeds the precision of crisp numbers and uncertainty must be dealt with. Now then, the arithmetic of uncertainty obliges us, on certain occasions, to establish some criteria which we understand can be controversial and even, in certain cases, rejected. We feel, nevertheless, that this kind of investigation should not be stopped, in order to advance towards the generalization of the schemes that have been available up to now. The case we are concerned with is a good example of this. Obviously, our work, which includes several numerical examples, only intends to be a first proposal for discussion. Perhaps we ourselves are not totally satisfied, although we feel hopeful that out of this new and fruitful consequences may arise.
References [1] L.A. Zadeh, Fuzzy sets, Information and Control, 8 (3):338–353, 1965. [2] R.W. Hamming, Error detecting and error correcting codes. Bell System Technical Journal, 29 (2), 1950.
[3] P.E. Danielsson, Euclidean distance mapping, Computer Graphics and Image Processing, 14 (1980), 227-248. [4] L. Tran & L. Duckstein, Comparison of fuzzy numbers using a fuzzy distance measure, Fuzzy Sets and Systems, 130, (2002), 331-341. [5] J. Gil-Aluja. La gestión interactiva de los recursos humanos en la incertidumbre. Publ. Ceura, Madrid, 1996. [6] J. Rojas-Mora & J. Gil-Lafuente, The signing of a professional athlete: Reducing uncertainty with a weighted mean hemimetric for Φ-fuzzy subsets. Proceedings of ICEIS'09 (2009). [7] J. Gil-Lafuente, El ‘Índice del Máximo y Mínimo Nivel’ en la Optimización del Fichaje de un Deportista, X AEDEM International Conference Reggio Calabria, Italy (2001), 439-443. [8] J. Gil-Lafuente, Nuevo instrumento de selección: el índice de Descartes por superación-distancia, Cuadernos del Cimbaje, 5 (2005), 43–60. [9] A. Kaufmann & J. Gil-Aluja, Introducción a la teoría de la incertidumbre en la gestión de empresas, Publ. Milladoiro, Vigo, 2002. [10] Greek “endeka” of eleven, and the suffix “ada”, together. A “endecada” is a set of eleven elements. In our case, we used 11 equally spaced values between 0 and 1 for the ratings assigned [11] A. Kaufmann & J. Gil-Aluja, Las matemáticas del azar y de la incertidumbre. Elementos básicos para su aplicación en economía, Publ. Centro de estudios Ramón Areces. Madrid, 1990. [12] J. Gil-Lafuente, Il Suceso nella Gestione Sportiva. Algoritmi per l’Eccellenza, Publ. Falzea, Reggio Calabria, Italy, 2003.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-163
An Intelligent Agent to Support City Policies Decisions
Agnese Augello a,1, Giovanni Pilato b and Salvatore Gaglio a,b
a DINFO - Dipartimento di Ingegneria Informatica – Università degli Studi di Palermo
b ICAR - Italian National Research Council, sede di Palermo
Abstract. In recent years there has been a growing interest in computational intelligence techniques applied to economics, providing support for financial decisions. In this paper we propose an intelligent decision support system aimed at suggesting the best management strategies for a game-based model of a virtual city. Two knowledge representation areas characterize the intelligent agent. The first is a “deterministic” area, which deals with descriptions and deterministic events. The second is a “decisional” area, which deals with decisions taken under conditions of uncertainty. The agent is capable of reasoning in order to prospect the future evolutions of particular choices taken by the user. The interaction is conducted through a natural language interface built as an Alice-based conversational agent.
Keywords. DSS, Chatbot, Bayesian Decision Networks, ACE
Introduction
In recent years there has been a growing interest in computational intelligence (CI) techniques in economic fields. Evolutionary algorithms, neural networks, fuzzy systems, and Bayesian networks are the most widely used CI techniques in economics and finance [2][4][5][6]. Decision Support Systems (DSS) are a useful application of these methodologies in economic fields. In the Italian context, the interest in this kind of system is growing in particular for regional issues management: territorial management [8][9], forecasting of environmental effects [12], sustainable development [10], control of water-plant networks [11], market forecasting [13]. An attempt at using expert systems technology has been made by the “Provincia autonoma di Trento” [7], with the development of a Bayesian network aimed at the evaluation of the economic conditions of families for access to welfare services. Wrong financial choices, taken without a scientific basis, often determine harmful market consequences. Predictive services, based on adequate economic theories to understand and forecast the reactions due to the choices of investors [2], should be developed. Recent research in the field of software agents in economics (ACE) describes and studies this complex domain in order to capture the effects deriving from the interaction between different agents.
1 Agnese Augello, DINFO - Dipartimento di Ingegneria Informatica – Università degli Studi di Palermo, Viale delle Scienze, Edificio 6 - 90128, Palermo, Italy, [email protected]
The union between experimental economics and ACE disciplines allows the development of validation tests of economic theories through the study of human behavior in a highly controlled environment, substituting human agents with software agents [3]. In order to accomplish this task, it is important to consider heterogeneous software agents, characterizing each of them with its own cognitive capabilities, a personality and a different level of risk attitude [3]. A useful way to study the effects of economic policies is to recreate scenarios of the real world in simulation games. Many simulation games exist in which the player's task is to take strategic decisions with the aim of reaching specific goals. The Global Economics Game [14] is an example of an educational game in which the task is to define the fiscal, economic and trading policies of a nation. The goal is to promote economic growth without causing excessive pollution, keeping employment high and inflation low. Other kinds of strategic games deal with the tasks of administering the transport systems of a nation, effectively managing economic resources [16][17], guaranteeing security in a town area, controlling market prices and deciding in which sectors to invest the available economic resources [15]. The SimCity series includes a set of famous simulation games in which the player acts as the mayor of a town [18][19]. Two different versions belong to the series. The first one, named “SimCity”, is mostly oriented towards economic/management issues, the variation of taxes, the management of garbage, and the interaction with nearby towns. The second one, named “SimCity Societies”, is oriented towards the construction of cities based on specific social values. These values derive from the choices taken by the player during the construction of the buildings of the town, which affect the behavior of the citizens. As an example, cities based on productivity will be characterized by a high level of pollution, cities based on authority will be characterized by a strong control over citizens, recalling Orwell's “Big Brother” model, and so on. The goal of the system described in this paper is to provide automatic decision support in a model of a town inspired by the “SimCity” and “SimCity Societies” games. An intelligent agent, which interacts in natural language with the user, obtains information about the current state of the city and gives, as a consequence, the DSS suggestions about the best strategies to apply.
1. An agent architecture for city policies support
The goal of the proposed system is to provide support for taking political and economic decisions. The system is suited to giving suggestions on how to administer a city as well as possible. To this aim, we exploit simulation games oriented towards the creation and the management of a town. The decision support system is composed of an intelligent agent, which is capable of obtaining information on the current status of the city through natural language interaction with the user. It gives hints on the best strategies to use. In particular, the player receives support for decisions about the specific fields in which to invest the economic resources of the city, the taxes to impose, and so on, trying to control parameters of interest such as the level of pollution, the crime index and the well-being of citizens. We have hypothesized the possible actions of the mayor/player, such as the financial management of the city.
We have identified macro-areas of investment destination: road conditions, environment, economy and development, general and support services, justice and security, education, culture and spare time, welfare, etc. The agent is suited to reflecting the player's personality; according to this, it tries to satisfy the goals of the game while respecting, as much as possible, the user's preferences. The agent should be able to reason in order to evaluate the benefits and the potential risks resulting from the adoption of different strategies and, as a consequence, to estimate the effects on the variables of interest of the analyzed domain. We have created a simplified model for the management of a virtual city, including the main features of the analyzed “SimCity-class” games. We have identified the main variables of the domain, their possible states and their mutual dependencies.
1.1. The Architecture
The core of the system is an intelligent agent, which interacts with the user exploiting two knowledge areas: a “deterministic area” and a “decisional area”. The deterministic area is aimed at the description of the domain using an ontological model. This allows the structure of the town to be understood and deterministic information about it to be inserted. It defines the rules, the relations and the concepts of the domain, the taxonomy, and the main properties to take into consideration. Besides, it provides a deterministic inferential engine. The “decisional area” helps the user in taking the best decisions under conditions of uncertainty. It takes into account the user's preferences and the evidence on the random variables, analyzing the effects of potential actions. A Bayesian Decision Network (BDN) forms the core of the decisional area. This choice is particularly useful when we have partial information about facts. The network is capable of making probabilistic inferences, aimed at estimating the states resulting from the application of a particular action or at determining the possible actions needed to reach a particular goal. Inside the network we have defined decision nodes and utility nodes, which represent the different possible choices and their associated attractiveness. Actions are selected by evaluating the evolution of the network for each possible configuration of the decision node and, as a consequence, for the different investment policies. A first schematization of the Bayesian decision network has been built with GeNIe, an environment for building decision-theoretic models developed at the University of Pittsburgh [20]. The integration of both models allows the exploitation of the ontology features and also the representation of uncertainty, which cannot be modeled using the rules defined in the ontology.
Figure 1 A first schematization of the Bayesian Decision Network
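The selection mechanism can be illustrated with a minimal sketch in Python; the states, probabilities and utilities below are entirely hypothetical and do not come from the GeNIe model of Figure 1. For each configuration of the decision node, the expected utility over a single chance variable is computed and the best action is returned.

```python
# Minimal sketch of decision selection by expected utility, with hypothetical numbers.
# Chance variable: pollution level after one period, conditioned on the chosen investment.

# P(pollution state | investment choice) -- hypothetical conditional probabilities
p_pollution = {
    "invest_environment": {"low": 0.7, "medium": 0.2, "high": 0.1},
    "invest_economy":     {"low": 0.2, "medium": 0.4, "high": 0.4},
    "no_investment":      {"low": 0.3, "medium": 0.4, "high": 0.3},
}

# Utility of each resulting pollution state (hypothetical utility node values)
utility = {"low": 100.0, "medium": 40.0, "high": -50.0}

def expected_utility(action):
    return sum(p * utility[state] for state, p in p_pollution[action].items())

best = max(p_pollution, key=expected_utility)
for action in p_pollution:
    print(f"{action}: EU = {expected_utility(action):.1f}")
print("suggested action:", best)
```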
The chatbot interface can be implemented exploiting the Alice technology. This kind of architecture is capable of interacting with people by using rules described in its own knowledge base through its embedded AIML (Artificial Intelligence Mark-up Language). Ad hoc AIML rules allow the chatbot to exploit its own deterministic and decisional areas. As a consequence, the conversational agent is capable of interacting with users through the evaluation of complex probabilistic scenarios on the domain. Figure 2 shows the system architecture, while Figure 3 shows a conceptual schema of the system settings and the user-agent interaction.
Figure 2 System Architecture
1.2. User Agent Interface
The agent can interact with the user by means of a conversational agent based on the ALICE technology. The rules described in the chatbot knowledge base are oriented to the accomplishment of the following tasks: the exploitation of the decisional area, the exploitation of the deterministic area, and natural language interaction with the user. The rules are coded in question-answer modules, named categories, described in the AIML language. User questions are compared by the chatbot engine with the question component of the categories, described by an AIML tag called pattern. Every time a matching rule is satisfied, the chatbot answers the user with the answer associated with the matched category, which is described by an AIML tag called template. The querying of the deterministic and decisional areas is triggered by the presence of ad-hoc tags written into the template. Hence, the chatbot is provided with probabilistic reasoning and decisional capabilities. As an example, the chatbot can set and get the probabilities of the Bayesian decision network, update the beliefs of the network, and show the user the most suitable decision strategy in order to accomplish the desired targets. Other tags enable the deterministic reasoning of the chatbot, inferring complex relations starting from known facts and properties represented inside the ontology. The functionalities enabled by this interaction are shown in Figure 3.
Figure 3: A conceptual schema of the work
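The actual categories are written in AIML; as a rough illustration only, the following Python sketch mimics the pattern/template mechanism and shows how an ad-hoc tag embedded in a template could trigger the decisional area. The patterns, the tag name query_bdn and the query function are invented for this example and are not part of the ALICE engine or of the authors' knowledge base.

```python
import re

# Hypothetical categories: (pattern, template). A template may embed an ad-hoc
# tag such as <query_bdn target="..."/> that triggers the decisional area.
categories = [
    (r"WHAT SHOULD I DO ABOUT (.*)",
     'Let me check the decision network. <query_bdn target="{0}"/>'),
    (r"HELLO.*", "Hello, Mayor. Ask me about pollution, crime or taxes."),
]

def query_bdn(target):
    # Placeholder for belief updating / strategy selection on the Bayesian decision network.
    return f"for '{target}', the network currently suggests increasing environmental spending"

def answer(user_input):
    text = user_input.upper().strip("?!. ")
    for pattern, template in categories:
        m = re.fullmatch(pattern, text)
        if m:
            reply = template.format(*(g.lower() for g in m.groups()))
            # Expand the ad-hoc tag by calling the decisional area.
            return re.sub(r'<query_bdn target="(.*?)"/>',
                          lambda t: query_bdn(t.group(1)), reply)
    return "I did not understand, could you rephrase?"

print(answer("What should I do about pollution?"))
```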
2. Conclusion and Future Works
The use of a smart conversational agent that suggests the best strategies to adopt in order to manage a virtual city, also in situations of uncertainty, is a good test-bench for using knowledge representation and reasoning in real-life economics and management. Future developments of the architecture presented in this paper will regard the definition of a more detailed model, a better analysis of different strategies, and the creation of disparate agents, each one suited to a specific problem and capable of interacting with the other agents in order to cooperate in attaining the goal.
References [1] Dawid, H.; La Poutre, H.; Xin Yao, "Computational intelligence in economic games and policy design [Research Frontier]," Computational Intelligence Magazine, IEEE , vol.3, no.4, pp.22-26, Nov. 2008 [2] Olsen, R., "Computational Finance as a Driver of Economics [Developmental tools]," Computational Intelligence Magazine, IEEE , vol.3, no.4, pp.35-38, Nov. 2008 [3] Shu-Heng Chen, "Software-Agent Designs in Economics: An Interdisciplinary [Research Frontier]," Computational Intelligence Magazine, IEEE , vol.3, no.4, pp.18-22, Nov. 2008 [4] Aaron J. Rice; John R. McDonnell; Andy Spydell; Stewart Stremler, "A Player for Tactical Air Strike Games Using Evolutionary Computation," Computational Intelligence and Games, 2006 IEEE Symposium on , vol., no., pp.83-89, May 2006 [5] Chen, S. 2007. Editorial: Computationally intelligent agents in economics and finance. Inf. Sci. 177, 5 (Mar. 2007), 1153-1168. DOI= http://dx.doi.org/10.1016/j.ins.2006.08.001 [6] C.-C. Tseng, Influence diagram for investment portfolio selection, in: Proceedings of 7th Joint Conference on Information Sciences, 2003, pp. 1171–1174. [7] Wolfgang J. Irler. Elementi formali di un modello matematico per la valutazione della condizione economica. http://econpapers.repec.org/paper/trnutwpde/9302.htm [8] SuSap Project: http://www.ersaf.lombardia.it/default.aspx?pgru=2&psez=38 [9] SFIDA Project :http://www.sfida-life.it/pdf/Articolo_INPUT_2005.pdf [10] Giorgio Cevenini. Progetto O.S.A. http://www.mtisd06.unior.it/collegamenti/MTISD%202006/Abstracts/49c_Cevenini.pdf [11] HyNet Project: http://www.ehssrl.it/software_ssd.htm [12] Sedemed Project: https://sedemed.medocc.org [13] Sesamo Project: http://sesamo.ricercadisistema.it/ [14] Global Economics Game: http://www.worldgameofeconomics.com/ [15] The 10 political games everyone should play. http://www.guardian.co.uk/technology/gamesblog/2006/oct/26/tenseriousgam [16] Openttd http://www.openttd.org/en/ [17] Simutrans http://www.simutrans.com/ [18] SimSity: http://simcitysocieties.ea.com/index.php [19] O. Devisch. Should Planners Start Playing Computer Games? Arguments from SimCity and Second Life. Planning Theory & Practice, Volume 9, Issue 2 June 2008 , pages 209 – 226 [20] GeNIe environment for decision-theoretic model. http://genie.sis.pitt.edu/
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-169
“Pink Seal” a certification for Firms’ Gender Equity
Tindara ADDABBO a, Gisella FACCHINETTI a, Giovanni MASTROLEO b, Tiziana LANG c
a University of Modena and Reggio Emilia
b University of Calabria
c Isfol, Roma
Abstract. In this paper we present the results of a research project promoted by the Italian Ministry of Labour devoted to assigning a formal certification to Italian private firms that respect Equal Opportunities (EO) between men and women. The research project has been implemented in the framework of the 2007 European Year of Equal Opportunities for All and has been co-financed by the European Commission. The Italian Ministry issued a public call to choose a sample of private firms. The sample was formed by: 14 joint-stock companies, 2 Ltds, 3 cooperatives Ltd, 10 social cooperatives and 5 other forms of enterprise. The group of researchers involved is made up of sociologists, mathematicians and economists. The project started in June 2007, when the experts set up the self-assessment questionnaire. The test was carried out from November 2007 until February 2008. The selected firms filled in the self-assessment questionnaires and the sociologists conducted in-depth interviews with the relevant union representatives. A Fuzzy Expert System (FES) is used. The reason for using a FES is connected not only to the multidimensional nature of EO and to the need to provide a synthetic indicator of a firm's EO without losing its complexity, but also to the composite group of experts involved. The presence of sociologists, economists and trade union members, who are not used to mathematical language, led us to propose a more user-friendly instrument such as a FES. Keywords. Fuzzy logic, Fuzzy expert system, gender equity, equal opportunities.
Introduction
The purpose of the present paper is to illustrate an experimental test carried out in Italy with the aim of certifying the gender equality approach within a sample of Italian firms. The project “Bollino Rosa” (pink seal) has been implemented in the framework of the 2007 European Year of Equal Opportunities for All and has been co-financed by the European Commission. The Italian labour market shows remarkable gender inequalities notwithstanding an advanced labour market regulation in terms of Equal Opportunities. Italian women still experience many inequalities at their workplaces, such as wage differences, lower career paths, higher percentages of fixed-term and short-term contracts, etc. As a matter of fact, despite a constant growth in the employment rate of women in the past years (which reached 46.6 per cent in 2008), the objective set out in the Lisbon Strategy (a 60% women's employment rate by 2010) still seems far from being attained. Inequalities at the workplace are reinforced by and interact with the unequal distribution of unpaid work at home (care and domestic work).
Italian women bear a higher share of unpaid work than their partners, and when they enter the labour market this produces an unequal share of total (paid and unpaid) work [2], [19]. Though the Italian situation in terms of gender equity and access to paid labour is particularly weak, gender segregation in employment is a feature that characterizes the EU27 on average, as shown by the results of Burchell, Fagan, O'Brien and Smith [9] based on the European Foundation for the Improvement of Living and Working Conditions 2005 Survey. In order to improve women's employment rates and to reduce the existing gender gaps in the workplaces, the Ministry of Labour promoted the project “Bollino Rosa” together with Isfol (Istituto per la Formazione Professionale dei Lavoratori) and a group of experts in sociology, mathematics and economics. The project started in June 2007, when the experts set up the self-assessment questionnaire. The test was carried out from November 2007 until February 2008. The selected firms filled in the self-assessment questionnaires and the sociologists conducted in-depth interviews with the relevant union representatives. The questionnaires were returned to the Ministry and underwent the fuzzy analysis described hereafter. The firms reacted very positively to the test and identified the strength of the trial in its capacity to impose a new kind of analysis of organizational contexts from a gender-friendly point of view. On the other hand, the complexity of the questionnaire was pointed out as very critical, especially by very small firms and by social cooperatives. In the future it could be useful to foresee different questionnaires according to the company typology and size, though maintaining the same structure for comparability amongst firms. The future plan is to make several questionnaires available online, which the firms may fill in autonomously to monitor their position with respect to the certification level. The use of a FES lets the firm understand which are the fields in which its performance is below the sufficient level; by going back to the initial inputs, the firm may change its behaviour to obtain better results. Section 1 presents the FES and the variables used. Section 2 comments on the results of the application of the system to the firms involved in the project, while Section 3 contains concluding remarks and proposals for the extension of the project.
1. A fuzzy expert system for evaluation
1.1. Why a fuzzy expert system
To face this complex problem and to reach an aggregated value for the certification level, we propose a Fuzzy Expert System (FES), which utilizes fuzzy sets and fuzzy logic to overcome some of the problems that occur when the data provided by the user are vague or incomplete. This is not the natural framework of a FES (engineering problems are in fact more typical for FES), but recently economic and management researchers have found interesting applications for this instrument [4], [5], [6], [7], [12], [26]. In a multidisciplinary research project like the one we present, the power of a FES lies in its ability to describe a particular phenomenon or process linguistically, and then to represent that description with a small number of very flexible rules. In a FES, the knowledge is contained both in its rules and in its fuzzy sets, which hold a general description of the properties of the phenomenon under consideration. A FES provides all possible solutions whose truth is above a certain threshold, and the user or the application program can then choose the appropriate solution depending on the particular situation. This fact adds flexibility to the system and makes it more powerful.
A FES uses fuzzy data, fuzzy rules and fuzzy inference, in addition to the standard ones implemented in ordinary Expert Systems. The following are the main phases of a FES design ([25], [28]):
1) identification of the problem and choice of the type of FES which best suits the problem requirements. A modular system can be designed, consisting of several fuzzy modules linked together. A modular approach may greatly simplify the design of the whole system, dramatically reducing its complexity and making it more comprehensible. This approach is particularly useful when the research is in a multidisciplinary field, like here, and the experts cannot easily be involved in a strict mathematical approach;
2) definition of input and output variables, their linguistic attributes (fuzzy values) and their membership functions (fuzzification of input and output);
3) definition of the set of heuristic fuzzy rules (IF-THEN rules);
4) choice of the fuzzy inference method (selection of aggregation operators for precondition and conclusion);
5) translation of the fuzzy output into a crisp value (defuzzification methods);
6) test of the fuzzy system prototype, drawing of the goal function between input and output fuzzy variables, change of membership functions and fuzzy rules if necessary, tuning of the fuzzy system, validation of results.
1.2. System structure
The experts decided that five “dimensions” describe the evaluation of a firm in a gender perspective: 1) Equity in the Firm and 2) Equal Opportunities, connected in the macro-indicator Gender Equity, and 3) Work-life balance, 4) Human Resources Management and 5) Safety, connected in another macro-indicator, Gender Sustainability (see Figure 1).
Figure 1. The five dimensions.
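To make phases 2) to 5) concrete, the following minimal Python sketch evaluates a single hypothetical module with two inputs in [0, 1]. The membership shapes, the rules and the output singletons are invented for illustration only; they are not the rule blocks actually designed by the experts.

```python
# A minimal one-module sketch of FES phases 2)-5): fuzzification with triangular
# membership functions, IF-THEN rules, max-min inference and a weighted-average
# defuzzification. All shapes, rules and values are illustrative assumptions.

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def evaluate_module(equity, work_life_balance):
    # 2) fuzzification of the two crisp inputs in [0, 1]
    eq = {"low": tri(equity, -0.01, 0.0, 0.5), "high": tri(equity, 0.5, 1.0, 1.01)}
    wlb = {"low": tri(work_life_balance, -0.01, 0.0, 0.5),
           "high": tri(work_life_balance, 0.5, 1.0, 1.01)}

    # 3) heuristic IF-THEN rules; 4) min for AND, max to aggregate rules with the same output
    out = {
        "insufficient": min(eq["low"], wlb["low"]),
        "sufficient":   max(min(eq["high"], wlb["low"]), min(eq["low"], wlb["high"])),
        "good":         min(eq["high"], wlb["high"]),
    }

    # 5) defuzzification: weighted average of the output singletons
    singleton = {"insufficient": 0.2, "sufficient": 0.6, "good": 0.9}
    num = sum(out[k] * singleton[k] for k in out)
    den = sum(out.values())
    return num / den if den > 0 else 0.0

print(round(evaluate_module(equity=0.7, work_life_balance=0.4), 3))
```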
In this paper we present more details on the dimension “Work-life balance” (WLB). The current distribution of time by gender in Italy is highly unbalanced towards a greater involvement of women in unpaid work activities and family responsibilities [19], [16], [17], [23]. For this reason we believe that a work-life balance environment can have a positive effect on women's employment.
The effects can be those of improving women's entry into the firm (if women anticipate that the firm environment allows them to reach a better work-life balance and makes their high unpaid family work load compatible with paid work activities) and of making women stay in paid work activities in phases of their life cycle when unpaid family work is more intensive [10-11], [17-18], [20]. A more continuous work profile in the firm can improve women's job-specific human capital and improve career perspectives [3], [8]. Women's higher sensitiveness to work-life balance problems can be checked directly by ad hoc surveys on households' socioeconomic conditions in which work-life balance difficulties are investigated explicitly [1]. The importance of paid working time and of its distribution in the presence of work-life balance problems has been detected also on the basis of ISTAT surveys on new mothers [21] and on the basis of the Isfol survey, which also shows different suggestions of changes to achieve a better work-life balance in relation to different care activity needs.
Table 1. Input variables.
CourtesyBus: Playroom or courtesy bus for employees' children to after-school activities
EffectOfPT: Part-time does not reduce the possibility to be in an apical position
FamilyLeaves: Parental leaves promoted by the firm
FlexEntExit: Entry or exit time flexibility
HoursBank: Hours bank
IncAccCareer: Work-life balance policies introduced to increase female participation in firm's employment
Kindergarten: Firm's crèches, agreements with child care services, vouchers for the care of children, the elderly or people needing assistance
ParentLeave: Percentage of workers taking parental leaves
PowerChTime: Employee's voice in the decision of changes in paid working time or schedule
RevPartTime: Reversible part-time
WEEvening: Women employed in evening work
WEHolidays: Women employed during public holidays
WENight: Women employed in night work
WorkTimeFle: Weekly schedule flexibility
WTCompat: Working time compatible with school time
Figure 2. Work-life balance dimension.
A firm's status in terms of work-life balance is made up of: working time management, available services and firm policies to improve work-life balance, and the evaluation of women's presence in unsocial hours. The experts have given a higher mark to the presence of these policies if they have been agreed upon with the employees' unions or, if unions were not present in the firm, with the workers in a systematic and continuous process of agreement. A positive effect has been given to the presence of flexible entry and exit times, as well as to the presence of a time bank and the possibility for the employee to change her working time schedule (i.e. the working time is not unilaterally decided by the firm). All these considerations are present in the rule blocks. Part-time work can improve work-life balance, and its choice (amongst voluntary part-timers or those women who would like to work part-time [22]) is often explicitly related to care problems [11]. In Italy, the higher amount of unpaid family work done by women, the relatively low availability of public care services and their bad synchronization with full-time working schedules make part-time work a positive factor improving work-life balance for women. One must notice, however, that the positive effect of voluntary part-time on women's work-life balance may bear a cost in terms of gender equity. This can occur since part-time work is often related to more difficult access to apical positions and to career progression, the latter being worsened if it is not possible to revert the part-time choice. Given the current distribution of unpaid domestic work, we give a lower mark, inside the work-life balance dimension, to a higher presence of women in job positions characterized by unsocial working hours. Indeed, unsocial working time, in terms of length or of night and evening shifts, has been found to be associated with a worse work-life balance [13]. The following is the design the experts fixed for the variable “Women employed in night work”. This percentage, together with the other two (evening hours and public holiday hours), produces a negative effect on the aggregate variable “unsocial hours” if it is near the value 0.1.
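As an illustration only, the shape of such an input can be sketched as a simple ramp; the breakpoints below are assumptions based solely on the statement that shares approaching 0.1 penalize the aggregate "unsocial hours" variable.

```python
# Illustrative membership for the share of women employed in night work.
# The breakpoints are assumptions: the share is increasingly "penalizing" as it
# approaches 0.1, mirroring the negative effect on the "unsocial hours" aggregate.

def penalizing_night_work(share):
    """Degree in [0, 1] to which the night-work share penalizes work-life balance."""
    if share <= 0.02:
        return 0.0
    if share >= 0.10:
        return 1.0
    return (share - 0.02) / (0.10 - 0.02)   # linear ramp between the two breakpoints

for s in (0.0, 0.05, 0.12):
    print(s, round(penalizing_night_work(s), 2))
```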
The “work-life balance policies” dimension is obtained from a set of variables that measure the firm's situation in terms of family-friendly policies (not included in time management) and that also show the link with other firm policies and with the area where the firm is located. The variables are: the firm promotes the presence of a crèche (inside the firm), or agreements with external services or vouchers to use care services (this can be achieved, for instance, by opening the internal crèche to children of families living in the area or by promoting agreements with already existing services that can improve the supply of care services in the area); the firm supplies transport services to game rooms or other after-school services, reducing the transport costs connected to the use of the services; the firm reduces the difficulty of synchronizing working time and children's school time (for instance by means of flexible entry and exit times at work); the firm promotes the use of parental leaves for the employees' caring needs (given the current distribution of care work inside the family, this can reduce women's difficulties
in making working and caring activities compatible in certain phases of the life cycle). If the firm promotes the take-up of parental leaves by men, it positively contributes to the division of unpaid work inside the family; the firm promotes the use of part-time work if it is not going to prevent part-timers' careers in the firm [18]; the firm improves women's participation in the firm.
2. Results
In this project, the Ministry of Labour, through a public call, selected 34 private firms of various sizes belonging to different productive sectors. The sample was formed by: 14 joint-stock companies, 2 Ltds, 3 cooperatives Ltd, 10 social cooperatives and 5 other forms of enterprise. According to the number of employees, those firms can be classified as follows: 1 micro-enterprise (<10 employees), 12 small and medium enterprises (>10 and <250 employees), and 21 big enterprises, of which 4 employ more than 30,000 employees. 21 firms have a turnover of at most 50 million euro (SMEs), while 5 companies exceed a turnover of 1 billion euro per year. Concerning the economic sectors, 17 firms belong to the service industry (research and training, welfare services, company services), 5 are utility companies (telecommunications and public transport), 3 belong to the pharmaceutical industry, 3 to the retail trade, 1 is an energy producing company and 1 is a bank. The last 4 firms of the sample belong to other sectors such as edutainment, food production and airport management. In the next table we present some results, looking not only at the final evaluation “Certification” but also at the intermediate variables involved in the WLB dimension and in the macro-index Gender Equity. Looking at the results, we may see that firm 006 has a final evaluation of 0,47, which is obviously insufficient to reach the certification, but the dimension that produces this result is connected with Gender Equity. This firm can increase its level by giving great care to the dimensions Equity in the Firm and Equal Opportunities and, going backwards, to the initial inputs involved there. Firm 005 has 0,5 and 0,44 in the two macro-indicators. This means that it has to do a wide work of revision of all its gender policy. The advantage of this modular system is the possibility to recognize which are the weaker areas and then to intervene where it is necessary in order to reach at least a sufficient evaluation.
Table 2. Results.
          Working Time  Unsocial Hours  WLB Policies  WLB      Gender Sustainability  Gender Equity  Certification
firm_001  0,6           0,25            0,4           0,5      0,48378                0,16666        0,3
firm_002  0,6           1               0,4           0,625    0,5                    0,16572        0,31022
firm_003  0,6           1               0             0,5      0,5                    0,25           0,35
firm_004  0,8           0,46292         0,2           0,5      0,44052                0,46118        0,43398
firm_005  0,4           1               0,2           0,5      0,5                    0,448          0,4688
firm_006  0,8           1               0,8           0,875    0,71428                0,20686        0,47412
firm_007  0,8           1               0,4           0,75     0,64286                0,29792        0,47516
firm_008  0,4           1               0,4           0,5      0,60846                0,39108        0,4981
firm_009  0,6           0,75            0,6           0,625    0,71428                0,25           0,5
firm_010  0,8           0,64162         0,6           0,69582  0,64954                0,56998        0,62004
firm_011  0,8           0,75            0,6           0,75     0,71428                0,5343         0,62058
firm_012  0,8           0,96874         0,4           0,73436  0,64284                0,71758        0,67812
3. Conclusion
The process of firm evaluation in terms of gender certification pursued by the project “Bollino Rosa”, promoted by the Italian Ministry of Labour in 2007, requires collecting and analysing indicators on different dimensions of gender equity. To provide a synthetic indicator of the firm's situation without losing the complexity of the different dimensions of gender equity, we have modelled and applied a Fuzzy Expert System. To our knowledge, this is the first attempt at using a Fuzzy Expert System to assess firms with respect to gender equity and gender sustainability, though the broader issue of the gender perspective in the evaluation of the quality of work using a fuzzy expert system has been considered in [5]. Other experiences carried out in European countries imply self-evaluation (like the Total Equity Prize in Germany) or evaluation on the basis of the policies enacted by the firms in terms of equal opportunities, of their human resources management and of the presence of work-life balance policies (like the Label Egalité in France); however, they do not use a system of evaluation structured as the one used in this application. In the future a structured questionnaire will be submitted to the public sector, and a calibration of the questionnaire should follow its experimental phase. With respect to other systems of evaluation, the proposed method entails an analytical model of evaluation that provides numerical indicators useful for the firm to assess its position as far as gender equity and sustainability are concerned, allowing a backward process to detect the causes of poor grades in terms of gender certification. Another advantage of the system is the transparency of the evaluation, reached through a clear statement of the input variables and of the rules used by the experts. Sometimes transparency is not an advantage, especially in situations in which politics is involved, but we think that a clear and transparent method is better than others in which the choices are expressed in “obscure” words in order to hide real choices from the people involved. Public institutions and governments can use the results of this model in terms of gender certification of firms to take policy decisions. They can also infer, by comparative analyses on different areas, how different policies and agreements carried out also at the local level may interact with the gender assessment of firms in that area. Policies considering firms' gender certification in gender equity and sustainability can also be used as an indicator for the assessment of firms when public institutions or firms contract out part of their production.
References [1] Addabbo, T. (ed.) (2005) Genitorialità, lavoro e qualità della vita: una conciliazione possibile? Milano, Angeli. [2] Addabbo, T.: Unpaid work by gender in Italy,Ch.2 (2003) in‘Unpaid work and the economy’, A.Picchio (ed.) Routledge,London and New York (2003) [3] Addabbo, T., Borghi, V. and Favaro, D. (2006) “Differenze di genere nell’accesso a posizioni apicali. Risultati di una ricerca sul campo”, in Simonazzi Annamaria (a cura di) (2006) Questioni di genere, questioni di politica. Trasformazioni economiche e sociali in una prospettiva di genere, Roma, Carocci. [4] Addabbo, T. - Di Tommaso, M.L. -Facchinetti, G.(2004): To what extent fuzzy set theory and structural equation modelling can measure functionings? An application to child well being. Materiali di Discussione del Dipartimento di Economia Politica n.468, Modena (2004) [5] Addabbo T. – Facchinetti G. - Mastroleo G. (2005). "A fuzzy expert system to measure functionings. An application to child wellbeing". In K. Saeed, R. Mosdorf, J. Pejas, P. Hilmola, Z. Sosnowsky (eds) Proceeding of Image Analysis, Computer Graphics, Security Systems and Artificial Intelligence Applications. Bialystok Vol. 1, 29-42.
[6] Addabbo T. – Facchinetti G - Mastroleo G. (2007). “Capability and functiongs: A Fuzzy Way to measure interaction between Father and Child” in Biometrics, Computer Security Systems and Artificial Intelligence Applications. Khalid Saeed, Jerzy Pejas, Romuald Mosdof Editors 185-195, Springer, ISBN 978-0-387-36232-8 (Print) 978-0-387-36503-9 (Online). [7] Addabbo T. – Facchinetti G - Mastroleo G. - Solinas G. (2006). “A fuzzy way to Measure Quality of Work in a multidimensional view” Plenary lecture in the ACS Conference Poland 0ctober 2006. Jerzy Pejas, Imed El Fray, Khalid Saeed Editors Vol I, 13-24. ISBN 83-87362-75[8] Addabbo, T.-Neri, M. - Riccò, R. (2005) Ricerca/intervento a supporto dell’innovazione di approccio e di strumentazione nella gestione delle risorse umane in un’ottica di valorizzazione della diversità di genere, Rapporto di ricerca, RER n.0075-2003-DGR n.1168 23/06/2003CESVIP, Regione Emilia Romagna, collana argomenti n.2. [9] Burchell, B., Fagan, C., O’Brien, C. and Smith, M. (2007) Working conditions in the European Union: The gender perspective, Dublin, European Foundation for the Improvement of living and working conditions. [10] Cardinali, V. (2006a) (ed) Maternità, lavoro, discriminazioni, Isfol, Ufficio Nazionale consigliera di parità, Rubbettino. [11] Cardinali, V. (2006b) ‘Le flessibilità del mercato del lavoro in ottica di genere’, Parte I, Capitolo II, in Cardinali, V. (2006a) (a cura di) Maternità, lavoro, discriminazioni, Isfol, Ufficio Nazionale consigliera di parità, Rubbettino. [12] Facchinetti G. – Franci F.- Mastroleo -G. Pagliaro V.-Ricci G. (2007) Illogica di un conflitto. La logica fuzzy applicata al conflitto tra Israele e Libano, Eurilink editor ISBN 978-88-95151-04-5 [13] Fagan, C. and Burchell, B. (2002) Gender, jobs and working conditions in the European Union, http://www.eurofound.eu.int/publications, Luxembourg, Office for Official Publications of the European Communities. [14] Isfol (2007) Rapporto 2007, Rubbettino. [15] Istat (2004) Rapporto annuale. La situazione del paese nel 2003, Roma, Istituto Poligrafico e Zecca dello Stato. [16] Istat (2006a) ‘Tempi di lavoro e valorizzazione delle competenze’ Capitolo 4 in Istat (2006) Rapporto annuale. La situazione del paese nel 2005, Roma, Istituto Poligrafico e Zecca dello Stato. [17] Istat (2006b) Diventare padri in Italia, Roma, ISTAT. [18] Lee, S., McCann, D. and Messenger, J.C. (2007) Working time around the world. Trends in working hours, laws and policies in a global comparative perspective, London and New York, Routledge. [19] Picchio, A. (ed.) (2003) Unpaid work and the economy, London and New York: Routledge. [20] OECD (2007) Babies and bosses. Reconciling work and family life. A synthesis of findings for OECD countries, Paris, OECD. [21] Prati, S., Lo Conte, M. and Talucci, V. (2003) ‘Le strategie di conciliazione e le reti formali e informali di sostegno alle famiglie con figli piccoli’ in CNEL and ISTAT (2003) Maternità e partecipazione delle donne al mercato del lavoro: tra vincoli e strategie di conciliazione, Roma, 2 dicembre 2003. [22] Rustichelli, E. (2003) ‘Modi e tempi di lavoro: analisi delle tipologie contrattuali’ Capitolo 3, in Battistoni, L. (ed) (2003) I numeri delle donne, Quaderni Spinn, n.4, Ministero del lavoro e delle Politiche Sociali. [23] Sabbadini, L.L. (2005a) Conciliazione dei tempi di vita e denatalità, Istat e Ministero per le Pari Opportunità. [24] Bassanini, C. 
(2008) ‘Modelli e buone pratiche per la certificazione di genere nelle imprese: un confronto internazionale’, in Príncipe G. et al.,(2008). Strumenti per certificare e promuovere la parità di genere in azienda, ISFOL, 2008, Ch.10. [25] Piegat A. (2001), Fuzzy modelling and control. Springer-Verlag, Heidelberg-New York. [26] Shun-Hsien L. (2005) Expert System Methodologies and application- a decade review from 1995 to 2004. Expert Systems with application. Vol 28 Isuue 1 93-103. [27] Wang L., (1992), Fuzzy systems are universal approximators Proc. Of Int. Conf. On Fuzzy Engineering, 471-496, 1992. [28] Von Altrock C. (1997), Fuzzy Logic and NeuroFuzzy applications in Business and Finance, PrenticeHall Inc.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-177
Intensive Computational Forecasting Approach to the Functional Demographic Lee Carter Model
Valeria D'AMATO a, Gabriella PISCOPO b, Maria RUSSOLILLO a
a Department of Economics and Statistics, University of Salerno, Campus Fisciano 84084 (Salerno), Italy; e-mail: [email protected], [email protected]
b Department of Mathematics and Statistics, University of Napoli Federico II, complesso Monte S. Angelo, 80126 Napoli, Italy; e-mail: [email protected]
Abstract. Several approaches have been developed for forecasting mortality using stochastic models. In particular, the Lee Carter model (1992) has become widely used and there have been various extensions and modifications proposed to attain a broader interpretation and to capture the main features of the dynamics of the mortality intensity. Hyndman and Ullah (2005) introduce a particular version of the Lee Carter methodology, the so-called Functional Demographic Model (FDM), the most accurate approach as regards some mortality data, particularly for longer forecast horizons where the benefit of a damped trend forecast is greater. The objective of the paper is to single out the most suitable model between the basic Lee Carter and the FDM for the Italian mortality data. A comparative assessment is made. Moreover, we provide information on the uncertainty affecting the forecasted quantities by using a bootstrap technique. The empirical results are presented using a range of graphical analyses. Keywords. Lee Carter model, functional demographic model, forecasting, smoothing
1. Introduction
Mortality has shown a gradual decline over time. To give an idea of this evolution, Figure 1 shows the general drop in the Italian male mortality rates during the period 1950-2006. Improvements in mortality are not uniform across ages and years: first of all, reductions in the mortality rate are stronger for ages between 0 and 10. As is clear, there is an increasing variance at higher ages, especially around x=100. The dynamic behaviour of the underlying curves is complicated to describe because it does not outline a specific pattern. For example, there was a higher growth rate of deaths at age 20 in 2000 than in the other years plotted. Moreover, there are some outlier data, such as the number of deaths at age 100 in 1950. In the case of older ages, the high variability can be due to small exposures to risk, and this is a common problem when estimating mortality rates for groups aged 90 and over.
Smoothing techniques have been implemented to avoid this shortage of data, because the heavy variance at younger and older ages influences the fitting of mortality models. On the other hand, methods taking into account the different patterns among ages are necessary in order to estimate cohort effects. Recently, many authors have proposed approaches to mortality forecasting based on smoothing procedures (Delwarde et al., 2007; Currie, 2004). Hyndman and Ullah (2005) add to this literature with an extension of the Lee Carter model, based on the combination of smoothing techniques and forecasting robust to outliers. In this paper, we explore the potential of the Functional Demographic Model (henceforth FDM) by Hyndman and Ullah to capture these features of the Italian death rates. Initially, we compare it to the traditional Lee Carter (LC) method applied to the Italian male data; the results show the better goodness of fit of the FDM methodology. In order to verify whether this difference is only due to the preliminary smoothing procedure in the FDM, we fit the Lee Carter model on smoothed Italian male data (we call this procedure LCS). We compare the FDM with the LCS and verify that the former fits the Italian male data better than the latter; finally, we provide confidence intervals for the forecasted quantities derived by using simulation techniques. The paper is organized as follows: in Section 2, we describe the Lee Carter model and the Functional Demographic Model; Section 3 shows the details of the simulative approach for obtaining confidence intervals for the Lee Carter family of models; in Section 4 a comparison between the basic Lee Carter and the FDM is performed on the Italian male mortality data. Concluding remarks are offered in Section 5.
Figure 1. Italian Male Death Rates
2. The Lee Carter Model and the Functional Demographic Model
The Lee-Carter methodology is a milestone in the actuarial literature of mortality projections. The model describes the log of the observed mortality rate for age x and year t, $m_{x,t}$, as the sum of an age-specific component $\alpha_x$, which is independent of time, and another component that is the product of a time-varying parameter $k_t$, reflecting the general level of mortality, and an age-specific component $\beta_x$, which represents how mortality at each age varies when the general level of mortality changes:

$\ln m_{x,t} = \alpha_x + \beta_x k_t + \varepsilon_{x,t}$   (1.1)
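Under the usual identifiability constraints (the $\beta_x$ summing to one and the $k_t$ summing to zero), the parameters of (1.1) are commonly estimated through a singular value decomposition of the centred log-rates. The following sketch illustrates that standard procedure on synthetic data; it is not the fitting code used for the results reported below.

```python
import numpy as np

# Sketch of Lee-Carter estimation via SVD on synthetic log death rates (ages x years).
# This is an illustration of model (1.1), not the code used for the paper's results.

rng = np.random.default_rng(0)
n_ages, n_years = 101, 57                      # ages 0-100, years 1950-2006
true_alpha = np.linspace(-8.0, -1.0, n_ages)   # synthetic age profile
true_beta = np.full(n_ages, 1.0 / n_ages)
true_k = np.linspace(30.0, -30.0, n_years)     # declining mortality trend
log_m = (true_alpha[:, None] + np.outer(true_beta, true_k)
         + 0.02 * rng.standard_normal((n_ages, n_years)))

alpha = log_m.mean(axis=1)                     # alpha_x: average log rate over time
centred = log_m - alpha[:, None]
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

beta = U[:, 0] / U[:, 0].sum()                 # normalise so that sum(beta_x) = 1
k = s[0] * Vt[0, :] * U[:, 0].sum()            # rescale k_t accordingly
k = k - k.mean()                               # enforce sum(k_t) = 0 (adjusting alpha in practice)

explained = s[0] ** 2 / (s ** 2).sum()
print(f"variation explained by the first component: {explained:.1%}")
```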
The Lee Carter model (1992) has become widely used and there have been various extensions and modifications proposed to attain a broader interpretation and to capture the main features of the dynamics of the mortality intensity. Hyndman and Ullah (2005) propose a methodology to forecast age-specific mortality rates, based on the combination of functional data analysis, nonparametric smoothing and robust statistics. They use the functional data paradigm (Ramsay and Silverman, 1997), which leads to the application of nonparametric smoothing in order to reduce the randomness in the observed data. The model is summarized in the following. Let $y_t(x)$ be the log of the observed mortality rate for age x and year t, $f_t(x)$ the underlying smooth function, and $\{(x_i, y_t(x_i))\}$, $t = 1,\ldots,n$, $i = 1,\ldots,p$, the functional time series, where

$y_t(x_i) = f_t(x_i) + \sigma_t(x_i)\,\varepsilon_{t,i}$   (1.2)

where $\varepsilon_{t,i}$ is an iid standard normal random variable and $\sigma_t(x_i)$ allows the amount of noise to vary with x.
The smoothing is carried out with a nonparametric method, based on weighted penalized regression splines with a monotonic constraint, which is reasonable for mortality data and allows the noise to be reduced at high ages. Then, the fitted curves are decomposed via a basis function expansion:

$f_t(x) = \mu(x) + \sum_{k=1}^{K} \beta_{t,k}\,\phi_k(x) + e_t(x)$   (1.3)

where $\mu(x)$ is a measure of location of $f_t(x)$, $\{\phi_k(x)\}$ is a set of orthonormal basis functions and $e_t(x) \sim N(0, v(x))$. The error term $e_t(x)$, given by the difference between the smoothed curves and the fitted curves from the model, is the modelling error.
In order to forecast $y_t(x)$, univariate time series models are fitted to each of the coefficients $\{\beta_{t,k}\}$, $k = 1,\ldots,K$. Using them, the coefficients $\{\beta_{t,k}\}$, $k = 1,\ldots,K$, are forecasted for $t = n+1,\ldots,n+h$. The forecasted coefficients are then used to obtain $f_t(x)$ as in formula (1.3), and the $y_t(x)$ are projected from (1.2). The estimated variances of the error terms in (1.3) and (1.2) are used to compute prediction intervals for the forecasts.
3. Bootstrapping the model for mortality projections
First of all, the rationale of our research is to determine which of the aforementioned models leads to the best goodness of fit to the main features of the Italian data, especially in order to avoid localized age-induced anomalies. Furthermore, we obtain mortality projections on the basis of the best selected model. The effects of uncertainty coming from projections are investigated by confidence intervals (CI's). To this purpose, the nonlinear nature of the quantities of interest makes an analytical approach intractable and therefore a simulation technique is used: the bootstrap procedure appears more reliable, as in Renshaw and Haberman (2008). The bootstrap sampling can be carried out parametrically when an estimate $\hat{F}_{par}$ of the population $F$, derived from a parametric model, is used, following the notation of Efron and Tibshirani (1993). Considering the underpinning Lee Carter structure for the bootstrap, the estimate of the population $F$ is $\hat{F}_{LC}$. Instead of sampling with replacement from the dataset, $B$ samples of size $n$ are obtained by sampling from $\hat{F}_{LC}$:

$\hat{F}_{LC} \rightarrow x^* = (x_1^*, x_2^*, \ldots, x_n^*)$.

We assume that the samples are independent realizations of a random variable $X$. In other words, on the basis of the generated bootstrap samples $j$, $j = 1, 2, \ldots, B$, the $\alpha_x$'s, $\beta_x$'s and $k_t$'s are estimated and the $k_t$'s are then projected as in Hyndman and Ullah (2005). In particular, for each bootstrap sample $j$ the bootstrapped parameter $t^j$ is obtained at the $j$-th resampling. We can write the bootstrap estimator as:

$\hat{\theta}_{BOOT} = \frac{1}{B} \sum_{j=1}^{B} t^j$   (1.4)

where $B$ is the number of resamplings and $t^j$ is the estimate at the $j$-th resampling stage. The variance of the bootstrap estimator is the following:

$Var(\hat{\theta}_{BOOT}) = \frac{1}{B-1} \sum_{j=1}^{B} \left( t^j - \hat{\theta}_{BOOT} \right)^2$   (1.5)

Finally, we can compute the measure of interest.
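The scheme of (1.4)-(1.5) can be sketched as follows for a generic statistic; both the death rates and the crude life-expectancy statistic below are synthetic placeholders, and the lognormal perturbation merely stands in for resampling from the fitted Lee Carter / FDM structure.

```python
import numpy as np

# Sketch of the bootstrap estimator (1.4) and its variance (1.5) for a generic
# statistic; here a crude period life expectancy computed from synthetic death rates.

rng = np.random.default_rng(1)
ages = np.arange(0, 101)
m = 0.0002 * np.exp(0.09 * ages)               # synthetic Gompertz-like death rates

def life_expectancy(mx):
    px = np.exp(-mx)                           # survival probabilities
    lx = np.concatenate(([1.0], np.cumprod(px)))
    return lx[:-1].sum()                       # crude e0 (complete years survived)

B = 10_000
boot = np.empty(B)
for j in range(B):
    # parametric resampling: perturb the rates with lognormal noise (placeholder for
    # sampling from the fitted mortality model)
    m_star = m * rng.lognormal(mean=0.0, sigma=0.05, size=m.size)
    boot[j] = life_expectancy(m_star)

theta_boot = boot.mean()                       # (1.4)
var_boot = boot.var(ddof=1)                    # (1.5)
ci = np.percentile(boot, [2.5, 97.5])
print(f"e0 = {theta_boot:.2f}, bootstrap variance = {var_boot:.5f}, 95% CI = {ci.round(2)}")
```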
4. Numerical Applications
We have run the application by considering the annual Italian male mortality rates from 1950 to 2006 for single years of age (the data are downloaded from the Human Mortality Database). The first step of the application consists in fitting the basic Lee Carter model and the FDM version to the data under consideration; Figures 2 and 3 show the fitted parameters.
Figure 2. The parameter estimates of basic Lee Carter model on Italian male mortality data
Figure 3. Basis Functions of FDM and associated coefficients for Italian male population.
As explained in Hyndman et al. (2005), the basis functions model different movements in mortality rates across the ages. In particular, Basis function 1 mainly models mortality changes for children, while Basis function 2 gives information about the differences between ages 30 and 60. The other functions, especially the last one, are more complex and model differences between all the cohorts. The basis functions explain 91.8%, 3.9%, 1.6% and 0.4% of the variation, respectively. By contrast, the percentage of variation explained by the Lee Carter model is 91.6%. A good fit is achieved when the residuals are independent and identically distributed. We have verified these conditions using contour maps and through the following error measures: mean error (ME), mean square error (MSE), mean percentage error (MPE), mean absolute percentage error (MAPE), integrated error (IE), integrated square error (ISE), integrated percentage error (IPE) and integrated absolute percentage error (IAPE).
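For reference, the average-across-ages measures can be computed as below; the observed and fitted arrays are placeholders standing in for the log mortality rates.

```python
import numpy as np

# Average-across-ages error measures for fitted vs. observed (log) mortality rates.
# The two arrays below are placeholders with shape (ages, years).

observed = np.random.default_rng(2).uniform(-8.0, -1.0, size=(101, 57))
fitted = observed + 0.05 * np.random.default_rng(3).standard_normal((101, 57))

err = fitted - observed
me = err.mean()                                   # mean error
mse = (err ** 2).mean()                           # mean square error
mpe = (err / observed).mean()                     # mean percentage error
mape = np.abs(err / observed).mean()              # mean absolute percentage error
print(f"ME={me:.5f}  MSE={mse:.5f}  MPE={mpe:.5f}  MAPE={mape:.5f}")
```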
Table 1. Lee Carter model
Average across ages:  ME = 0.00786   MSE = 0.01947   MPE = -0.01290   MAPE = 0.03697
Average across years: IE = 0.78248   ISE = 1.88518   IPE = -1.14400   IAPE = 3.48655
Table 2. FDM model
ME 0.00001 IE 0.00101
AVERAGE ACROSS AGES MSE MPE 0.00367 -0.01037 AVERAGE ACROSS YEARS ISE IPE 0.30953 -0.85270
MAPE 0.02764 IAPE 2.48752
By comparing the traditional LC model to the FDM, we can notice that the goodness of fit is better for the FDM (see the percentage of variance explained and the MSE). We continue our application in order to verify whether the better fit of the FDM depends on the smoothing of the data involved in the model. For this reason, we smooth the data using a monotonic p-spline and then apply the Lee Carter method to the smoothed data; we call this procedure LCS. From a first analysis, we can see that the percentage of variation explained by applying the LCS model is 93.4%. In particular, we can notice that the percentage of variance explained increases when we shift from LC to LCS; this is not due to a greater capacity of the model to describe the data, but to a transformation of the same data into less variable data. Moreover, the MSE of the LCS is greater than the MSE of the FDM. From this analysis we can conclude that the improvement in the fit when shifting from LC to FDM is not only due to smoothing. The FDM explains the movements in mortality better through the basis functions. In fact, the FDM is a generalization of the LCS: the LCS can be obtained from the FDM if we set k=1 and consider raw mortality rates rather than smoothed functional data; a random walk with drift (used for forecasting with the LC) is also a special case of an exponential smoothing state space model (used for forecasting with the FDM). Thus, the differences between LC and FDM are: the observations are equally weighted in the LC, while in the FDM the weight of outliers is lower; in the LC there is only the first basis function, which mainly explains the change in death rates at younger ages, while in the FDM there are other functions that explain the movements at older ages too. For these reasons, the fit improves when we shift from LC to FDM, and not only because the FDM operates on smoothed data. The forecasts also appear improved; to show this, we fit the FDM and the LCS on the data from 1950 to 1975, then we project the mortality rates from 1976 to 2006 according to the fitted models and compare the projections with the observed rates, as in Figures 4 and 5. In particular, in the following we can observe the difference between the observed rates and their projections.
Figure 4. FDM Forecast errors
Figure 5. LCS Forecast errors
If we compare Figure 4 to Figure 5, we can see that the forecast errors are lower for the FDM than for the LCS. Moreover, the errors committed for younger ages are quite similar, because in both models there is the first basis function. Instead, for ages between 20 and 100 the forecast errors are lower with the FDM than with the LCS. The final results refer to the life expectancy and a measure of its uncertainty obtained by the bootstrap applied to the FDM model. According to the results in Table 3, it seems reasonable to consider the projections obtained through the model under consideration reliable. In fact, a very low bootstrap procedure variance, as in (1.5), has been achieved. Furthermore, we have calculated the confidence interval delimited by the 2.5% and 97.5% quantiles. It corresponds to the interval around the parameter value in (1.4) within which the mean obtained from the re-samples falls with the corresponding probability.
Table 3. The Confidence Intervals - Bootstrap on FDM (10000 Bootstrap Replications)
Bootstrap variance = 0.00864061   Quantile 2.5% = 4.723007e-05   Quantile 97.5% = 0.2183068
5. Concluding Remarks
The paper focuses on a comparative assessment between the original LC model and a variant of the basic methodology, the so-called FDM, for providing accurate mortality forecasting as regards the Italian survival phenomenon. While it is essential to guard against drawing general conclusions on the basis of individual cases, the analysis furnishes a useful insight into the comparative performance of the different approaches under consideration. The empirical findings suggest that the FDM framework is readily adapted to deal with more complex forecasting problems, including forecasting of the mortality dynamics related to extreme ages. From the viewpoint of insurance companies, this model feature is desirable because of their exposure to the variability of the mortality trend at old ages, in particular as regards post-retirement annuity-type products. The analysis is conducted on Italian male mortality rates ranging from 1950 to 2006, with age classified by individual year (0-100). The implementation of the proposed methodology is facilitated by using the R package demography, available from the FDM authors. Although the LC is still used as a point of reference (e.g. Renshaw and Haberman, 2003a, 2003b, 2003c), the better performance of the FDM model is noted. The study suggests that the FDM forecast accuracy is arguably connected to the model structure, combining functional data analysis, nonparametric smoothing and robust statistics. In particular, the decomposition of the fitted curve via basis functions represents the main advantage, since they capture the variability of the mortality trend by separating out the effects of several orthogonal components.
References
[1] Currie I.D., Durban M. and Eilers P.H.C., Smoothing and forecasting mortality rates, Statistical Modelling, 4 (2004), 279-298.
[2] Currie I.D., Smoothing overparameterized regression models, 23rd International Workshop on Statistical Modelling (2008).
[3] Delwarde A., Denuit M., Eilers P., Smoothing the Lee-Carter and Poisson log-bilinear models for mortality forecasting, Statistical Modelling, 7, No. 1 (2007), 29-48.
[4] Efron B., Tibshirani R.J., An introduction to the bootstrap, Chapman & Hall, New York & London (1993).
[5] Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on [date]).
[6] Hyndman R.J., Ullah S., Robust forecasting of mortality and fertility rates: a functional data approach, Computational Statistics & Data Analysis, 51 (2007), 4942-4956.
[7] Lee R.D., Carter L.R., Modelling and Forecasting U.S. Mortality, Journal of the American Statistical Association, 87 (1992), 659-671.
[8] Ramsay J.O., Silverman B.W., Functional data analysis, Springer-Verlag, New York (1997).
[9] Renshaw A.E., Haberman S., Lee-Carter mortality forecasting: a parallel generalised linear modelling approach for England and Wales mortality projections, Applied Statistics 52 (2003a), 119-137.
[10] Renshaw A.E., Haberman S., On the forecasting of mortality reduction factors, Insurance: Mathematics and Economics 32 (2003b), 379-401.
[11] Renshaw A.E., Haberman S., Lee-Carter mortality forecasting with age specific enhancement, Insurance: Mathematics and Economics 33 (2003c), 255-272.
[12] Renshaw A.E., Haberman S., On simulation-based approaches to risk measurement in mortality with specific reference to Poisson Lee-Carter modelling, Insurance: Mathematics and Economics, 42 (2008), 797-816.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-187
Conflicts in the Middle-East. Who are the actors? What are their relations? A fuzzy LOGICal analysis for IL-LOGICal Conflicts
Gianni RICCI 1a, Gisella FACCHINETTI a, Giovanni MASTROLEO b, Francesco FRANCI c, Vittorio PAGLIARO d
a University of Modena and Reggio Emilia, b University of Calabria, c Interproductions, Roma, d University of Teramo
Abstract. Will mathematics be useful in overcoming the crisis in the Middle-East area? Classical decision methods are based only on numbers and do not include qualitative aspects which can add important and significant meaning to the subject. Furthermore, mathematics is often reviled and considered unpleasant or difficult to comprehend, so social researchers prefer to have nothing to do with it. The recent development of Fuzzy Logic has opened the possibility of new applications in situations in which both numbers and adjectives are important and have to be treated simultaneously and on an equal standing; a mathematical theory in which the traditional concept of true or false can be enriched by true and false. A Fuzzy Cognitive Map of the conflicts in the Middle-East has been realized and has been used to evaluate the failure of diplomatic and political efforts with the consequent onset of war and conflict. The paper is the result of an uncommon but joint effort between mathematicians, social researchers, sociologists and information experts.
Key words: Conflict, Middle East, Fuzzy Logic.
Introduction
Reality is not only complicated but also complex, and so it is important to have a correct vision of the complete scenario. In situations of war and social conflict, new tools and strategies are needed, without which human society in its interethnic and global context is unintelligible. We have analyzed the war between Israel and Lebanon and the general conflicts in the Middle-East area without oversimplifying the roles of the actors or
I am grateful to Massimo Salzano for what he did in converting a classical optimizer, as I was, into a realistic fuzzy decision maker, as I now am.
transforming the qualitative factors into numbers, in order to avoid loss of information and meaning. During the Cold War period some researchers modeled the dynamics of conflicts in an Optimal Control or Game Theory framework [1], [2]. This paper contains one of the very few attempts in the literature to describe a conflict adopting a Fuzzy Logic approach [1],[6]. The approach appears to be useful in representing the interaction between the actors, and it proves to be very flexible in removing a player (who has become unimportant in the scenario) or adding a new significant player. The political scenario in the Middle-East (wars included) will never have a winner and a loser because it is not a zero-sum game; all parties stand both to win and to lose because the situation is fuzzy.
1. The socio-political situation
Since the end of the Second World War, the Middle East has been a very critical geopolitical region, not only for the populations living there but also for the consequences at a more global level, involving the States supporting the Palestinian or Israeli factions. In 2002 the authors designed a fuzzy cognitive map (FCM) for the conflict between Palestinians and Israelis [5-6]. From the analysis of the map it was evident that the situation was bound to deteriorate. An FCM can capture the dynamics of the problem just as a classical model based on a system of differential equations can, but in the case of an FCM updating is easier to implement and does not require restarting the estimation of the model from the beginning. So when, in 2006, the conflict between Israel and Lebanon broke out, we observed that the background was more or less the same, even in the presence of a growth in geopolitical complexity, but with a node (representing Hezbollah) that had increased in importance and influence over the other actors. The new research started by analyzing those States with Islamic overtones. We considered Sunni and Shiite Islamic States and their distinction into radical and moderate leanings. The whole Islamic world has been knotted (like a pearl necklace) to actors that can influence and condition the application of diplomacy or - on the contrary - the use of arms. In analyzing the State of Israel we studied the positions of the different parties with respect to the crisis in South Lebanon and the weight of the Sephardic, Ashkenazi and Eastern Hebrew communities, as well as the influence of Jewish lobbies in Western countries (in particular the USA and Western Europe). Other countries such as Russia and China entered the scenario claiming a powerful role comparable with that held by traditional players such as the UN and the EU. France also increased its role because of its historical leaning towards pro-Arab policies, often averse to the pro-Israel policy of the US. The crisis between the Shiites of Hezbollah and Israel in South Lebanon also involved Iran and Syria, the first for religious reasons (against the Western world), the second for secular strategic goals, and so these two countries entered the map with a significant weight. The team of experts modified the map correspondingly and introduced as an output variable an index measuring the political option for the conflict: a variable ranging in [0, 1] which is complementary to the military option (a political option closer to the value 1 signifies a lower probability of a military option and vice versa). International observers gave no chance to a complete political solution but, at the same time, nobody resigned themselves to a military option (peace or war); diplomacy was
aimed towards a compromise solution (peace and war). As an output, the system gave an encouraging value for the political solution (0.61), which is a correct representation, as demonstrated by the real facts: the truce signed by the two parties, agreeing to a cease-fire.
1.1. A fuzzy expert system for political option evaluation
To face this complex problem and to reach an aggregated value of the political option level, we propose a translation of the FCM into an operative instrument, a Fuzzy Expert System (FES), which uses fuzzy sets and fuzzy logic to overcome some of the problems that occur when the data provided by the user are vague or incomplete. In a multidisciplinary research like the one we present, the power of a FES lies in its ability to describe a particular phenomenon or process linguistically, and then to represent that description with a limited number of flexible rules. In a FES, the knowledge is contained both in its rules and in fuzzy sets, which hold a general description of the properties of the process under consideration. A FES provides all possible solutions whose truth is above a certain threshold, and the user or the application program can then choose the appropriate solution depending on the particular situation. This fact adds flexibility to the system and makes it powerful. A FES uses fuzzy data, fuzzy rules and fuzzy inference, in addition to the standard ones implemented in ordinary Expert Systems. The main phases of a FES design are the following ([7], [10]):
1) identification of the problem and choice of the type of FES which best suits the problem requirements. A modular system can be designed, consisting of several fuzzy modules linked together. A modular approach may greatly simplify the design of the whole system, dramatically reducing its complexity and making it more comprehensible. This approach is particularly useful when the research is in a multidisciplinary field, like here, and the experts cannot be involved in a strict mathematical approach;
2) definition of input and output variables, their linguistic attributes (fuzzy values) and their membership functions (fuzzification of input and output);
3) definition of the set of heuristic fuzzy rules (IF-THEN rules);
4) choice of the fuzzy inference method (selection of aggregation operators for precondition and conclusion);
5) translation of the fuzzy output into a crisp value (defuzzification methods);
6) test of the fuzzy system prototype, drawing the goal function between input and output fuzzy variables, changing membership functions and fuzzy rules if necessary, tuning the fuzzy system; validation of results.
The FES we propose is developed with the FuzzyTECH software by Inform.

1.2. Input, intermediate and output variables
The socio-political field experts decided the actors (inputs) involved in the scenario we describe. The range of the inputs is the interval [0, 1] and they are described by the linguistic attributes Low, Medium and High. The values that the experts assign are the standardized attendance levels of every actor. We present one of them: Likud (var 10).
Figure 1. Input variable layout: var10.

Table 1. Input variables.
Label    Variable Name
var01    China
var02    France
var03    UN
var04    Russia
var05    UE
var06    USA
var07    Kadima
var08    Labour Party
var09    UTG (ultra orthodox)
var10    Likud
var11    Shas
var12    Hezbollah (Shiite)
var13    Iran (pro-Shiite)
var14    Syria
var15    Christian (Maronite)
var16    Druze
var17    Islamists_(Sunni)
var18    Shiite_moderate
var19    Fatah
var20    Hamas
The 12 intermediate variables are the results of the aggregation of several variables. Their ranges are given by the interval [0, 1] and they are described by 3 or more linguistic terms.

Table 2. Intermediate variables.
Label             Variable Name
Ashenazi          Ashkenazi
Hamas             Hamas
Hezbollah         Hezbollah
IntScenario       International Scenario
IsraelInside      Israel inside organisation
IsraelOutside     Israel outside organisation
IsraelPolSol      Israel in favour to political solution
LebanonInside     Lebanon inside organization
LebanonOutside    Lebanon outside organization
LebanonPolSol     Lebanon in favour to political solution
Palestine         Palestine
Sephardite        Sephardite
The output variable represents the political option for the solution of the crisis. Its range is the interval [0, 1] and it is described by 11 linguistic terms. If the defuzzified value of the output variable is close to 1 then a political solution to the conflict is possible; on the contrary, if it is close to 0 then military action is the more likely outcome. The rule block we present is connected with the Likud and Shas actors. The experts decided that if Shas (var 11) has a low level of attendance and Likud has a low or medium level, the variable Sephardite responds at a medium level. On the contrary, if Likud has a low level of attendance and Shas has a medium or high level, the aggregate variable responds with a low level of attendance.
Figure 2. Rule block for the intermediate variable Sephardite.
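The actual system was built in FuzzyTECH, but a minimal sketch of how a rule block of this kind could be evaluated is shown below. The triangular Low/Medium/High terms on the [0, 1] attendance scale and the min/max (Mamdani-style) operators are assumptions for illustration, not the parameters or operators used by the authors.

```python
def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical Low/Medium/High terms on the [0, 1] attendance scale
terms = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

def mu(x, term):
    return tri(x, *terms[term])

def sephardite_support(likud, shas):
    """Min/max evaluation of the two rules described in the text."""
    # IF Shas is low AND Likud is (low OR medium) THEN Sephardite is medium
    r_medium = min(mu(shas, "low"), max(mu(likud, "low"), mu(likud, "medium")))
    # IF Likud is low AND Shas is (medium OR high) THEN Sephardite is low
    r_low = min(mu(likud, "low"), max(mu(shas, "medium"), mu(shas, "high")))
    return {"low": r_low, "medium": r_medium}

print(sephardite_support(likud=0.4, shas=0.4))
```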
Figure 3. Fuzzy expert system (August 2006).
2. The simulations
The starting situation (base scenario) refers to the scenario at August 26, 2006. The Middle East expert gave the input values and the output came out with the value 0.61. This value indicates a propensity for both Lebanon and Israel to look for a political solution to the conflict scenario, which is perfectly compatible with the progression of events in the following week. As we do not have a historical series of data to test the system, we decided to perform a sensitivity analysis on the inputs to check the capability of the system to correctly evaluate the situation.
Table 3. The starting situation (base scenario)
Many different scenarios were considered, one of which, regarding the US, is illustrated below. We considered the case in which George Bush had lost the mid-term elections in September 2006. As a consequence, US policy would have changed, with a consequent decrease in the US presence in the Middle East. The effects of this scenario would have been:
• a significant disengagement of US interests in the whole area;
• a significant increase of the influence of the UN, UE and France in the area;
• a significant decrease of Israel's military propensity.
Some input values would alter the base scenario: USA from 0.6 to 0.3; UN from 0.6 to 0.8; UE from 0.7 to 0.8; France from 0.8 to 0.9; Kadima from 0.6 to 0.7; Labour from 0.4 to 0.7; Likud from 0.4 to 0.6; Shas from 0.4 to 0.3; UTG from 0.4 to 0.3. What we expected was an increased propensity towards a political solution for the crisis between Lebanon and Israel and for the Middle East crisis in general. The output of the system is 0.688, which is a significant increase! In [5] one can find the sensitivity analysis for the relevant actors in that scenario.
3. The current situation
More recently (April 2009) we modified the FES to include changes that occurred both in Lebanon and in Israel, as summarized below. In Lebanon, the Lebanese Government and Hezbollah are not willing to find an agreement.
In Israel the right wing is in government, and so the level of tension in Lebanon and in the Gaza strip has changed. The positions of Syria and Iran are against the Lebanese Government and they are supporting Hezbollah. The role of the Arab League and Saudi Arabia has increased considerably, as has the importance of Omar Suleiman, the most important Egyptian mediator. The position of the Palestinian Authority against Israel and Hamas has become stronger. The values (with respect to the base scenario) assigned to the Lebanese groups, to Hezbollah, Syria, Lebanon and Israel consequently decrease, while the inputs related to Saudi Arabia, the Arab League, Egypt, the UE and France once again increase. The value for the US is substantially the same because, even if the election of Barack Obama has given an important impulse towards a peaceful solution, one cannot ignore the fact that he was also elected with the votes of the US Jewish community. The value for the output variable is 0.64, a little smaller than the result obtained in 2006. At this point a question - or rather - a political need arises! We have to find a third way towards solving the Middle East affair, because the two contending parties will not be able to solve the conflicts politically.
Figure 4. The Fuzzy expert system (April 2009).
4. Conclusion
In this paper we have tried to focus attention on the conflict in the Middle-East area and to identify the actors and their relations. We have already said that these maps are a simplification of the complexity of the situation, but they serve as an example of the possibility and usefulness of a fuzzy approach to socio-political reality. Is this a panacea that can resolve all complex problems? Obviously it is not, because complexity does not admit panaceas or definitive solutions; it is only a scientific approach which tries to decode complexity using instruments that are unsettled, dynamic, vague and fuzzy, yet effective. The domain experts have been very interested in the approach, and especially in the results. They consider that these new methods may offer a significant opportunity to fix the "actors" on the scene and to understand their connections, both in terms of the influence they exert on each other and of the strength they use. The final evaluation may give a useful idea of the level of destabilization and may offer a way of monitoring the complex problem of the Middle East area. Obviously, not only can this design be enlarged, but it is only one of the possible applications in the security and terrorism field; many other complex political situations in the world could be treated in the same way.
References
[1] Al-Mutairi M.S., Hipel K.W., Kamel M.S., Fuzzy Preferences in Conflicts, Journal of Systems Science and Systems Engineering, 17, Springer, Heidelberg, 2008.
[2] Bojadziev G., Bojadziev M., Fuzzy Logic for Business, Finance and Management, World Scientific Publishing Co., Singapore, 1997.
[3] Dimitrov V., Use of Fuzzy Logic when Dealing with Social Complexity, School of Social Ecology, University of Western Sydney, 1998.
[4] Facchinetti G., Franci F., Mastroleo G., Pagliaro V., Ricci G., From a Logic Map to a Fuzzy Expert System for measuring the Destabilization in the Middle East, in: Soldek J., Drobiazgiewicz L. (Eds.), Advanced Computer Systems (vol. 1), ISBN 83-87362-47-6, Szczecin, Poland, (2002), 27-36.
[5] Facchinetti G., Franci F., Mastroleo G., Pagliaro V., Ricci G., Illogica di un conflitto. La logica fuzzy applicata alla crisi tra Israele e Libano, Eurilink editori, Roma, 2007.
[6] Gleditsch N.P., Furlong K., Hegre H., Lacina B.A., Owen T., Conflicts over Shared Rivers: Resource Wars or Fuzzy Boundaries?, Political Geography 25(4), 361-382, 2006.
[7] Piegat A., Fuzzy modelling and control, Springer-Verlag, Heidelberg-New York, 2001.
[8] Ricci G., Armaments Race: a differential game approach, Quartalshefte, vol. 4, Wien, (1987), 89-95.
[9] Richardson L.F., Arms and Insecurity, the Boxwood Press, Chicago, 1968.
[10] Von Altrock C., Fuzzy Logic and Neuro Fuzzy applications in Business and Finance, Prentice-Hall Inc., 1997.
Chapter 4 Biological Aspects
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-197
Comparing early and late data fusion methods for gene function prediction
Matteo RE a,1 and Giorgio VALENTINI a,2
a DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Italy
Abstract. High-throughput biotechnologies are playing an increasingly important role in biomolecular research. Their ability to provide genome-wide views of the molecular mechanisms occurring in living cells could play a crucial role in the elucidation of biomolecular processes at system level, but the datasets produced using these techniques are often high-dimensional and very noisy, making their analysis challenging because of the need to extract relevant information from a sea of noise. Gene function prediction is a central problem in modern bioinformatics, and recent works pointed out that gene function prediction performances can be improved by integrating heterogeneous biomolecular data sources. In this contribution we compare the performances achievable in gene function prediction by early and late data fusion methods. Given that, among the available late fusion methods, ensemble systems have not to date been extensively investigated, all the late fusion experiments were performed using multiple classifier systems. Experimental results show that late fusion of heterogeneous datasets realized by means of ensemble systems outperformed both early fusion approaches and base learners trained on single types of biomolecular data.
Keywords. Weighted averaging; decision templates; Naive Bayes combiner; vector space integration; early fusion; late fusion; decision fusion; data integration; gene function prediction.
1. Introduction
Recent advances in high-throughput biotechnologies have resulted, in the last years, in an ever increasing number of biomolecular datasets available in the public domain, offering unprecedented opportunities for investigation. In order to effectively exploit this information for the elucidation of biomolecular processes at system level, a key problem is the integration of heterogeneous biomolecular data. Functional classification of unannotated genes, and the improvement of the existing gene functional annotation catalogs, is a central problem in modern functional genomics and bioinformatics, and several works recently pointed out that the integration of heterogeneous biomolecular data plays a central role in improving the accuracy of gene function prediction [1].
1 Corresponding Author: Matteo Re; E-mail: [email protected].
2 Corresponding Author: Giorgio Valentini; E-mail: [email protected].
A first approach proposed in the literature consists in modeling interactions between gene products using graphs and functional linkage networks [2]: integration is exploited through a "conjunctive method", i.e. by including exactly the edges that can be confirmed in each source graph [3], or by applying a probabilistic evidence integration scheme based on graphical models [4]. Approaches alternative to structured probabilistic methods can be roughly classified according to the moment at which the integration of heterogeneous data occurs. In early integration methods the integration is performed at feature level, as in the case of the direct "vector-space integration" (VSI), in which different vectorial data are concatenated [5] and then used to train a final classifier. Kernel methods, by exploiting the closure property with respect to the sum, represent another valuable research direction for the integration of biomolecular data [6]. All these methods suffer from limitations and drawbacks, due to their limited scalability to multiple data sources (e.g. kernel integration methods based on semidefinite programming [6]), to their limited modularity when new data sources are added (e.g. vector-space integration methods), or when data are available with different data type representations (e.g. functional linkage networks and vector-space integration). A new possible approach is based on ensemble methods, but as observed by Noble and Ben-Hur, not much work has been done to apply classifier integration methods to protein function prediction [1]. In late fusion methods, as in the case of ensemble systems, a single learner is trained for each available data source and the base learners' outputs are then converted into a common form, resulting in an intermediate feature space in which a suitable rule can be applied to make a final decision. To our knowledge, only a few works devoted to the characterization of the performances achievable by data-fusion-based gene function prediction realized by means of ensemble systems have been proposed, such as the "late integration" of kernels trained on different sources of data [7], or the Naive-Bayes integration of the outputs of SVMs in the context of the hierarchical classification of genes [8]. In this contribution we compare the effectiveness of an early fusion method (direct vector space integration) and several late integration approaches: the classical weighted integration (using two different weighting schemes), Decision Templates [9] and the Naive Bayes combiner, in order to provide an overview of the capabilities of multiple classifier systems in the integration of heterogeneous biomolecular data sources for the prediction of gene functions.
2. Biomolecular data integration with early and late integration methods
2.1. Data integration through early fusion: vector-space integration
The simplest form of heterogeneous data integration is to concatenate the features collected for each gene in all the available datasets into a fixed-length vector and then feed the resulting collection of vectors into a classification algorithm [7]. Vector-space integration (VSI) is suitable for data integration independently of the structure of the involved datasets and has the advantage of simplicity. VSI suffers from biases due to the different lengths of the concatenated vectors and is not able to incorporate much domain knowledge, since each type of data is treated identically [1]. In our experiments we normalized the data with respect to the mean and standard deviation, separately for each data set.
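A minimal sketch of this VSI step (per-dataset standardization followed by gene-wise concatenation) is given below; the array names and sizes are illustrative only, not the paper's data.

```python
import numpy as np

def vector_space_integration(datasets):
    """Early fusion: z-score each dataset separately, then concatenate the
    feature vectors gene-wise into a single fixed-length vector.
    `datasets` is a list of (n_genes x n_features) arrays indexed by the
    same genes in the same order."""
    blocks = []
    for X in datasets:
        mu, sd = X.mean(axis=0), X.std(axis=0)
        sd[sd == 0] = 1.0                      # avoid division by zero
        blocks.append((X - mu) / sd)
    return np.hstack(blocks)

# Toy example with two hypothetical data sources for 5 genes
X_ppi = np.random.rand(5, 8)
X_expr = np.random.rand(5, 3)
X_vsi = vector_space_integration([X_ppi, X_expr])   # shape (5, 11)
```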
2.2. Reasons for combining biomolecular data through late fusion using ensembles
Apart from the general statistical, representational and computational reasons for combining multiple classifier systems [10], there are several reasons to apply ensemble methods in the specific context of genomic data fusion for gene function prediction. First, continuous advances in high-throughput biotechnologies provide new types of data, as well as updates of existing biomolecular data available for gene prediction. In this context, ensemble methods are well-suited to embed new types of data or to update existing ones by training only the base learners devoted to the newly added or updated data, without retraining the entire ensemble. Moreover, most ensemble methods scale well with the number of available data sources, and problems that characterize other data fusion approaches are thus avoided. Using vectorial data from different sources, there is no bias in the integration of large and small or sparse and dense vectors. More generally, diverse types of data (e.g. sequences, vectors, graphs) can be easily integrated, because with ensemble methods the integration is performed at decision level. Data fusion of heterogeneous biomolecular data sources can be effectively realized by means of ensemble systems composed of base learners trained on different datasets, whose outputs are then combined to compute the consensus decision.

2.3. Simple late fusion: the weighted average
In the context of gene function classification, we need an estimate of the reliability of the prediction [8]. To this end, we use SVMs with probabilistic outputs, obtained by applying a sigmoid fitting to their outputs [11]. Thus a trained base classifier computes a function $d_j : X \rightarrow [0, 1]$ that estimates the probability that a given example $x \in X$ belongs to a specific class $\omega_j$. An ensemble combines the outputs of $n$ base learners, each trained on a different type of biomolecular data, using a suitable combining function $g$ to compute the overall probability $\mu_j$ for a given class $\omega_j$:

$$\mu_j(x) = g\big(d_{1,j}(x), \ldots, d_{n,j}(x)\big) \qquad (1)$$

A simple way to integrate different biomolecular data sources is represented by the weighted linear combination rule:

$$\mu_j(x) = \sum_{t=1}^{n} w_t \, d_{t,j}(x) \qquad (2)$$
The weights are usually computed using an estimate of the overall accuracy of the base learners, but for gene function prediction, where the functional classes are largely unbalanced (positive examples are far fewer than negative ones), we choose the F-measure (the harmonic mean of precision and recall). We consider two different ways to compute the weights:

$$w_t^{l} = \frac{F_t}{\sum_{t=1}^{n} F_t} \qquad\qquad w_t^{\log} \propto \log\frac{F_t}{1 - F_t} \qquad (3)$$

The $w_t^{l}$ weights are obtained by a linear combination of the F-measures, and $w_t^{\log}$ by a logarithmic transformation. Independently of the choice of the weights, the decision $D_j(x)$ of the ensemble about the class $\omega_j$ is taken using the estimated probability $\mu_j$ (eq. 2):

$$D_j(x) = \begin{cases} 1, & \text{if } \mu_j(x) > 0.5 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
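A minimal sketch of this weighted-average combiner is given below. The base-learner probabilities and F-measures are hypothetical, and the normalization of the logarithmic weights is an assumption (eq. (3) only states proportionality).

```python
import numpy as np

def weighted_average_ensemble(probs, f_measures, scheme="linear"):
    """Late fusion by weighted averaging of base-learner probabilities (eq. 2)
    with the linear or logarithmic weights of eq. (3).
    `probs` has shape (n_learners, n_examples); `f_measures` has one F per learner."""
    F = np.asarray(f_measures, dtype=float)
    if scheme == "linear":
        w = F / F.sum()
    else:
        w = np.log(F / (1.0 - F))           # logarithmic weighting
        w = w / w.sum()                     # normalization: an assumption here
    mu = w @ probs                          # combined class probability
    return (mu > 0.5).astype(int), mu       # decision rule of eq. (4)

# Hypothetical outputs of three base SVMs on four test genes
p = np.array([[0.9, 0.2, 0.6, 0.4],
              [0.8, 0.1, 0.4, 0.7],
              [0.6, 0.3, 0.7, 0.2]])
decisions, support = weighted_average_ensemble(p, f_measures=[0.45, 0.50, 0.30])
```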
where output 1 corresponds to positive predictions for $\omega_j$ and 0 to negative ones.

2.4. Late integration accounting for systematic errors in base learner outputs: the Decision Templates combiner
Certain types of biomolecular data can be informative for some functional classes, but uninformative for others. Hence it would be helpful to take into account whether certain types are informative or not, depending on the class to be predicted. To this end Decision Templates [9] can represent a valuable approach. The main idea behind decision templates consists in comparing a "prototypical answer" of the ensemble for the examples of a given class (the template) to the current answer of the ensemble for a specific example whose class needs to be predicted (the decision profile). More precisely, the decision profile $DP(x)$ for an instance $x$ is a matrix composed of the elements $d_{t,j} \in [0,1]$ representing the support given by the $t$-th classifier to class $\omega_j$. Decision templates $DT_j$ are the averaged decision profiles obtained from $X_j$, the set of training instances belonging to the class $\omega_j$:

$$DT_j = \frac{1}{|X_j|} \sum_{x \in X_j} DP(x) \qquad (5)$$

Given a test instance we first compute its decision profile and then we calculate the similarity $S$ between $DP(x)$ and the decision template $DT_j$ for each class $\omega_j$, from a set of $c$ classes. As similarity measure the Euclidean distance is usually applied:

$$S_j(x) = 1 - \frac{1}{n \times c} \sum_{t=1}^{n} \sum_{k=1}^{c} \big[DT_j(t,k) - d_{t,k}(x)\big]^2 \qquad (6)$$

The final decision of the ensemble is taken by assigning a test instance to the class with the largest similarity:

$$D(x) = \arg\max_j S_j(x) \qquad (7)$$
In our experimental setting we consider dichotomic problems, because a gene may belong or not to a given functional class, thus obtaining two-column decision template matrices. It is easy to see that with dichotomic problems the similarity $S_1$ (eq. 6) for the positive class and the similarity $S_2$ for the negative class become:

$$S_1(x) = 1 - \frac{1}{n} \sum_{t=1}^{n} \big[DT_1(t,1) - d_{t,1}(x)\big]^2 \qquad (8)$$

$$S_2(x) = 1 - \frac{1}{n} \sum_{t=1}^{n} \big[DT_2(t,1) - d_{t,1}(x)\big]^2 \qquad (9)$$

where $DT_1$ is the decision template for the positive class and $DT_2$ for the negative one. The final decision of the ensemble for a given functional class is:

$$D(x) = \arg\max_{j \in \{1,2\}} \big(S_1(x), S_2(x)\big) \qquad (10)$$
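The following sketch illustrates the dichotomic Decision Templates combiner of eqs. (5) and (8)-(10). All numerical values are hypothetical; this is not the implementation used in the experiments.

```python
import numpy as np

def fit_decision_templates(train_outputs, labels):
    """Decision templates for a dichotomic task: average the base-learner
    outputs over the positive and negative training examples (eq. 5).
    `train_outputs` has shape (n_examples, n_learners)."""
    dt_pos = train_outputs[labels == 1].mean(axis=0)
    dt_neg = train_outputs[labels == 0].mean(axis=0)
    return dt_pos, dt_neg

def dt_predict(outputs, dt_pos, dt_neg):
    """Similarities of eqs. (8)-(9) and decision of eq. (10) for one example."""
    n = len(outputs)
    s_pos = 1.0 - np.sum((dt_pos - outputs) ** 2) / n
    s_neg = 1.0 - np.sum((dt_neg - outputs) ** 2) / n
    return 1 if s_pos >= s_neg else 0

# Hypothetical probabilistic outputs of 3 base learners on 6 training genes
train = np.array([[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.8],
                  [0.2, 0.1, 0.3], [0.3, 0.2, 0.1], [0.1, 0.3, 0.2]])
y = np.array([1, 1, 1, 0, 0, 0])
pos_t, neg_t = fit_decision_templates(train, y)
print(dt_predict(np.array([0.8, 0.7, 0.9]), pos_t, neg_t))   # -> 1
```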
2.5. Independence of classifiers given the class labels: the Naive Bayes combiner
In a recently published work, Guan and colleagues investigated the performances achievable by the Naive-Bayes integration of the outputs of SVMs in the context of the hierarchical classification of genes [8]. Although the problems investigated in [8] and in our experiments are essentially different, as we do not make use of the structural information contained in gene functional catalogues, we tested the Naive Bayes combination of the component classifier outputs, estimating the class-conditional supports given the observed vector $s$ of categorized component classifier outputs, as proposed by Titterington and colleagues [20]:

$$P(s \,|\, \omega_k) \propto \prod_{i=1}^{T} \left[ \frac{cm^i_{\omega_k, s_i} + 1/c}{N_k + 1} \right]^{B} \qquad (11)$$
where the probability of the observed vector $s$, formed by the outputs of the component classifiers, given the class $\omega_k$, is calculated using the data contained in the confusion matrices produced for each classifier during training. In this equation $cm^i_{\omega_k, s_i}$ represents the number of training instances of class $k$ predicted to belong to class $s_i$ by the $i$-th classifier, $T$ the number of base learners, $N_k$ the number of training samples of true class $k$, and $c$ the number of classes in the learning task at hand. The calculated supports are normalized and the largest one labels the test instance. According to [20], the $B$ term has been set to 1.
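A small sketch of the Naive Bayes combiner of eq. (11) follows; the confusion matrices and class counts below are toy values chosen only to make the example runnable.

```python
import numpy as np

def naive_bayes_combiner(conf_matrices, class_counts, crisp_outputs, B=1.0):
    """Class-conditional supports of eq. (11). `conf_matrices[i][k, s]` is the
    number of training examples of class k labelled s by classifier i;
    `crisp_outputs` is the vector s of categorized outputs for the test example."""
    c = conf_matrices[0].shape[0]                 # number of classes
    supports = np.ones(c)
    for k in range(c):
        for cm, s_i in zip(conf_matrices, crisp_outputs):
            supports[k] *= ((cm[k, s_i] + 1.0 / c) / (class_counts[k] + 1.0)) ** B
    return supports / supports.sum()              # normalized supports

# Toy example: 2 classes, 2 hypothetical base classifiers
cm1 = np.array([[40, 10], [5, 45]])
cm2 = np.array([[35, 15], [10, 40]])
post = naive_bayes_combiner([cm1, cm2], class_counts=[50, 50], crisp_outputs=[0, 0])
print(post.argmax())   # index of the predicted class
```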
3. Experimental setup
Today, many eukaryotic model organisms are routinely used in biomolecular experiments, ranging from the simplest budding yeast (Saccharomyces cerevisiae) to nematodes and higher eukaryotes like mammals. Despite the great attention dedicated to complex organisms in the last few years, the lower costs of experiments performed using simple model organisms, and biological features like their lower generation time with respect to higher eukaryotes, have resulted in significant differences in the types and amount of public data associated with each model system. We thus decided to perform our experiments using data collected from S. cerevisiae, because of the great amount of biomolecular data available for this species. We used protein-protein interaction data collected from BioGRID [12],
Table 1. Datasets

Code     Dataset                 examples  features  description
Dppi1    PPI - STRING            2338      2559      protein-protein interaction data from [13]
Dppi2    PPI - BioGRID           4531      5367      protein-protein interaction data from the BioGRID database [12]
Dpfam1   Protein domain log-E    3529      5724      Pfam protein domains with log E-values computed by the HMMER software toolkit
Dpfam2   Protein domain binary   3529      4950      protein domains obtained from the Pfam database [14]
Dexpr    Gene expression         4532      250       merged data of the Spellman and Gasch experiments [15] [16]
Dseq     Pairwise similarity     3527      6349      Smith and Waterman log-E values between all pairs of yeast sequences
a database of protein and genetic interactions, and from STRING [13], a collection of protein functional interactions inferred from heterogeneous data sources comprising, among others, experimental data and information found in the literature. Moreover, we considered homology relationship data, using pairwise Smith-Waterman log E-values between all pairs of yeast protein sequences. We also included protein domain data available from Pfam [14]. We considered the presence/absence of a particular protein domain in the proteins encoded by the genes comprised in the dataset, and the E-value assigned to each gene product by a collection of profile-HMMs, each of which is trained on a specific domain family. The E-values have been computed with the HMMER software toolkit (http://hmmer.janelia.org). Finally, we included in our experiments a dataset obtained by the integration of the microarray hybridization experiments published in [15] [16]. The main characteristics of the data sets used in the experiments are summarized in Tab. 1. We considered the yeast genes common to all data sets (about 1900), and we associated them to functional classes using the functional annotations of the Functional Catalogue (FunCat) database (version 2.1) [17]. In order to reduce the number of classification tasks required by the experimental setting, we considered only the first level of the hierarchy of FunCat classes. In other words, we selected the roots of the trees of the FunCat forest (that is, the most general and wide functional classes of the overall taxonomy). We also removed from the list of target functional classes all those represented by less than 20 genes. This corresponds to restricting our classifications to only 15 FunCat classes (Tab. 2). Each dataset was split into a training set and a test set (composed, respectively, of 70% and 30% of the available samples). We performed a 3-fold stratified cross-validation on the training data for model selection: we computed the F-measure across folds, while varying the parameters of the gaussian kernels (both σ and the C regularization term). To evaluate the performance on the separate test set we computed both the F-measure and the AUC (Area Under the ROC Curve). This choice is motivated by the large unbalance between positive and negative examples that characterizes gene function prediction problems: indeed, on average only a small subset of the available genes is annotated to each functional class. We compared the performances of single gaussian SVMs trained on each data set with those obtained with the vector-space integration (VSI) technique (using a linear SVM as classifier), and with the ensembles described in Sect. 2.3, 2.4 and 2.5.
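A generic sketch of the model-selection step just described (3-fold stratified cross-validation with the F-measure as criterion over the gaussian-kernel width and the C term) is shown below. It uses scikit-learn as a stand-in for the authors' actual tooling, which is not specified here, and random toy data instead of the yeast datasets.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical data: rows are genes, y is membership in one FunCat class
X = np.random.rand(200, 50)
y = (np.random.rand(200) < 0.1).astype(int)     # strongly unbalanced, as in the paper

# 3-fold stratified CV, F-measure as model-selection criterion,
# grid over the gaussian-kernel width (gamma ~ 1/sigma^2) and C
grid = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"gamma": [1e-3, 1e-2, 1e-1], "C": [1, 10, 100]},
    scoring="f1",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
grid.fit(X, y)
print(grid.best_params_)
```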
Table 2. FunCat classes

Code  Description
01    Metabolism
02    Energy
10    Cell cycle and DNA processing
11    Transcription
12    Protein synthesis
14    Protein fate
16    Protein with binding function or cofactor requirement
18    Regulation of metabolism and protein function
20    Cellular transport and transport routes
30    Cellular communication / Signal transduction mechanism
32    Cell rescue, defense and virulence
34    Interaction with the environment
40    Cell fate
42    Biogenesis of cellular components
43    Cell type differentiation
Table 3. Ensembles of learning machines: averages across the performed learning tasks of the F-measure, precision, recall and AUC (Area Under the Curve) computed on the test sets.

Metric   Elin     Elog     Edt      ENBayes  VSI      Davg     Dppi2
F        0.4347   0.4111   0.5302   0.5174   0.3213   0.3544   0.4818
rec      0.3304   0.2974   0.4446   0.6467   0.2260   0.2859   0.3970
prec     0.8179   0.8443   0.7034   0.5328   0.6530   0.5823   0.6157
AUC      0.8642   0.8653   0.8613   0.7933   0.7238   0.7265   0.8170
4. Results
Tab. 3 summarizes the main results obtained in the experiments. The table shows the average F-measure, recall, precision and AUC across the 15 selected FunCat classes, obtained through the evaluation of the test sets (each constituted by 570 genes). The first three columns refer respectively to the weighted linear, logarithmic linear and decision template ensembles; VSI stands for vector space integration (Sect. 3), Davg represents the averaged results of the single SVMs across the six datasets, and Dppi2 represents the single SVM that achieved the best performance, i.e. the one trained using protein-protein interaction data collected from BioGrid (Tab. 1). Tab. 4 shows the same results obtained by each single SVM trained on a specific biomolecular data set.

Table 4. Single SVMs: averages across the performed learning tasks of the F-measure, precision, recall and AUC (Area Under the Curve) computed on the test sets. Each SVM is identified by the same name of the data set used for its training (Tab. 1).
Metric   Dppi1    Dppi2    Dpfam1   Dpfam2   Dexpr    Dseq
F        0.3655   0.4818   0.2363   0.3391   0.2098   0.4493
rec      0.2716   0.3970   0.1457   0.2417   0.1571   0.5019
prec     0.6157   0.6785   0.7154   0.6752   0.3922   0.4162
AUC      0.7501   0.8170   0.6952   0.6995   0.6507   0.7469
Looking at the values presented in Tab. 3, we see that, on average, data integration through late fusion methods provides better results than single SVMs and VSI, independently of the applied combination rule.
Table 5. Results of the non-parametric test based on Mann-Whitney statistics to compare AUCs between ensembles, VSI and single SVMs. Each entry represents wins-ties-losses between the corresponding row and column. Top: comparison between late fusion methods and VSI; Bottom: comparison of late fusion approaches and VSI with single SVMs.

       VSI     Elog    Elin    Edt
Elog   13-2-0
Elin   13-2-0  0-14-1
Edt    13-2-0  1-13-1  1-11-3
ENB    9-6-0   0-2-13  0-2-13  0-2-13

       Dppi1   Dppi2   Dpfam1  Dpfam2  Dexpr   Dseq
Elin   11-4-0  4-11-0  15-0-0  14-1-0  15-0-0  13-2-0
Elog   11-4-0  4-11-0  15-0-0  14-1-0  15-0-0  13-2-0
Edt    11-4-0  4-11-0  15-0-0  14-1-0  15-0-0  13-2-0
ENB    5-10-0  2-11-2  9-6-0   8-7-0   12-3-0  7-8-0
VSI    1-11-3  0-8-7   2-11-2  1-14-0  4-11-0  0-12-3
In particular, Decision Templates achieved the best average F-measure, while the average AUC is larger for the majority of the ensemble methods with respect to single SVMs and VSI. Among the late fusion approaches, according to the collected AUCs, the worst performing method is the Naive Bayes combiner, albeit its performances are still, on average, higher than the ones reported for VSI and the single classifiers. The precision of the late fusion methods is relatively high: this is of paramount importance to drive the biological validation of "in silico" predicted functional classes: considering the high costs of biological experiments, we need to obtain a high precision (and possibly recall) to be sure that positive predictions are actually true with the largest confidence.
To understand whether the differences between AUC scores in the 15 dichotomic tasks are significant, we applied a non-parametric test based on the Mann-Whitney statistic [18], using a recently proposed software implementation [19]. Tab. 5 shows that at the 0.01 significance level in most cases there are no differences between the AUC scores of the weighted-average-based ensembles and the Decision Template combiner. A different behavior is observed for the Naive Bayes combiner, whose performances are comparable to those obtained by the other late fusion approaches in only 2 of the 15 classification tasks and worse in the remaining 13. The observed differences in AUCs are statistically significant when we compare the three best performing late fusion methods with VSI, independently of the combination method (Tab. 5, top). The performances obtained by the Naive Bayes combiner and VSI are not different (at the 0.01 significance level) in 6 out of 15 functional classification tests. It is worth noting that, among the tested late fusion approaches, Elin, Elog and Edt undergo no losses when compared with single SVMs (Tab. 5, bottom): we can safely choose any late fusion method (but not the Naive Bayes combiner) to obtain equal or better results than any of the single SVMs. On the contrary, in many cases the early fusion method VSI and the late fusion method ENB show worse results than single SVMs. Nevertheless, we can observe that a single SVM trained with Ppi-2 data achieves good results (11 ties with ensembles and an average AUC of 0.81 w.r.t. 0.86 of the ensembles, Tab. 3 and 5), showing that large protein-protein interaction data sets alone provide information sufficient to correctly predict several FunCat classes.

Figure 1. Comparison of the F-measures achieved in gene prediction: Davg stands for the average across SVM single learners, Dppi2 for the best single SVM, Elin, Elog, Edt, ENB for the weighted linear, logarithmic, decision template and Naive Bayes ensembles, and VSI for vector space integration.

F-measure performances are summarized in Fig. 1: all ensemble methods outperform, on average, the single SVMs. Nevertheless the best single SVM (Dppi2) outperforms the weighted linear and logarithmic ensembles for some functional classes, but decision templates are in most cases better than the best single SVM, and significantly better than VSI.
5. Conclusions
In this work we compared the performance of early and late data fusion methods on yeast gene functional classification problems. Our experiments demonstrated the potential benefits introduced by the use of simple late fusion approaches based on multiple classifier systems for the integration of multiple sources of data in gene functional classification problems. The majority of late fusion methods were able to outperform the averaged performances of the base learners in all the gene function prediction tasks, achieving the best results in terms of AUC, and Decision Templates showed the best average F-measure across the 15 functional classes. Among the tested late fusion approaches, the only one unable to perform equally to or better than the early fusion approach and the single learners trained on single datasets was the Naive Bayes combiner, clearly indicating that not all late fusion methods are guaranteed to outperform early fusion approaches in gene function prediction, and that further investigations are required in order to find the best strategy for data-fusion-based gene function prediction. The results presented in this contribution, obtained with relatively simple combining methods, show the effectiveness of late fusion methods in the integration of heterogeneous biomolecular data sources for gene function prediction. Moreover, we think that the application and development of more refined ensemble methods, exploiting the modularity and scalability that characterize the ensemble approach, represent a promising research line for gene function prediction using heterogeneous sources of complex biomolecular data.
Acknowledgments The authors would like to gratefully acknowledge partial support by the PASCAL2 Network of Excellence under EC grant no. 216886. This publication only reflects the authors’ views.
References
[1] W. Noble, A. Ben-Hur, Integrating information for protein function prediction, in: T. Lengauer (Ed.), Bioinformatics - From Genomes to Therapies, Vol. 3, Wiley-VCH, 2007, pp. 1297–1314.
[2] U. Karaoz, et al., Whole-genome annotation by using evidence integration in functional-linkage networks, Proc. Natl Acad. Sci. USA 101 (2004) 2888–2893.
[3] E. Marcotte, M. Pellegrini, M. Thompson, T. Yeates, D. Eisenberg, A combined algorithm for genome-wide prediction of protein function, Nature 402 (1999) 83–86.
[4] O. Troyanskaya, et al., A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae), Proc. Natl Acad. Sci. USA 100 (2003) 8348–8353.
[5] M. desJardins, P. Karp, M. Krummenacker, T. Lee, C. Ouzounis, Prediction of enzyme classification from protein sequence without the use of sequence similarity, in: Proc. of the 5th ISMB, AAAI Press, 1997, pp. 92–99.
[6] G. Lanckriet, T. De Bie, N. Cristianini, M. Jordan, W. Noble, A statistical framework for genomic data fusion, Bioinformatics 20 (2004) 2626–2635.
[7] P. Pavlidis, J. Weston, J. Cai, W. Noble, Learning gene functional classification from multiple data, J. Comput. Biol. 9 (2002) 401–411.
[8] Y. Guan, C. Myers, D. Hess, Z. Barutcuoglu, A. Caudy, O. Troyanskaya, Predicting gene function in a hierarchical context with an ensemble of classifiers, Genome Biology 9 (2008) S2.
[9] L. Kuncheva, J. Bezdek, R. Duin, Decision templates for multiple classifier fusion: an experimental comparison, Pattern Recognition 34 (2) (2001) 299–314.
[10] T. Dietterich, Ensemble methods in machine learning, in: J. Kittler, F. Roli (Eds.), Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy, Vol. 1857 of Lecture Notes in Computer Science, Springer-Verlag, 2000, pp. 1–15.
[11] H. Lin, C. Lin, R. Weng, A note on Platt's probabilistic outputs for support vector machines, Machine Learning 68 (2007) 267–276.
[12] C. Stark, B. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, M. Tyers, BioGRID: a general repository for interaction datasets, Nucleic Acids Res. 34 (2006) D535–D539.
[13] C. von Mering, et al., STRING: a database of predicted functional associations between proteins, Nucleic Acids Research 31 (2003) 258–261.
[14] R. Finn, J. Tate, J. Mistry, P. Coggill, J. Sammut, H. Hotz, G. Ceric, K. Forslund, S. Eddy, E. Sonnhammer, A. Bateman, The Pfam protein families database, Nucleic Acids Research 36 (2008) D281–D288.
[15] P. Gasch, et al., Genomic expression programs in the response of yeast cells to environmental changes, Mol. Biol. Cell 11 (2000) 4241–4257.
[16] P. Spellman, et al., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol. Biol. Cell 9 (1998) 3273–3297.
[17] A. Ruepp, A. Zollner, D. Maier, K. Albermann, J. Hani, M. Mokrejs, I. Tetko, U. Guldener, G. Mannhaupt, M. Munsterkotter, H. Mewes, The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes, Nucleic Acids Research 32 (18) (2004) 5539–5545.
[18] E. Delong, D. Delong, D. Clarke-Pearson, Comparing the areas under two or more correlated Receiver Operating Characteristic Curves: a non parametric approach, Biometrics 44 (3) (1988) 837–845.
[19] I. Vergara, T. Norambuena, E. Ferrada, A. Slater, F. Melo, StAR: a simple tool for the statistical comparison of ROC curves, BMC Bioinformatics 9 (265) (2008).
[20] D. Titterington, G. Murray, D. Spiegelhalter, A. Skene, J. Habbema, G. Gelpke, Comparison of discriminant techniques applied to a complex data set of head injured patients, Journal of the Royal Statistical Society 144 (2) (1981).
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-208
An experimental comparison of Random Projection ensembles with linear kernel SVMs and Bagging and BagBoosting methods for the classification of gene expression data
Raffaella Folgieri 1
Dip. di Informatica e Comunicazione, Università Statale di Milano, Italy
Abstract. In this work we experimentally analyze ensemble algorithms based on Random Subspace and Random Plus-Minus-One Projection, comparing them to the results obtained in the literature by the application of Bagging and BagBoosting on the same data sets used in our experiments: Colon and Leukemia. We concentrate on the application of random projection (Badoiu et al., 2006) ensembles of SVMs, with the aim of improving the accuracy of classification, both through SVMs, which represent the state-of-the-art in gene expression data analysis (Vapnik, 1998) (Pomeroy et al., 2002), and through ensemble methods, used in our work to enhance classification accuracy and capability. Ensemble methods, in fact, train multiple classifiers and combine them to reduce the generalization error of the multi-classifier system. To make possible the comparison of our results with those obtained in the literature by the application of Bagging and BagBoosting, in this work we concentrate on SVMs with linear kernel.
Keywords: Random Subspace, Random Projection, Randomized Maps, ensemble, Bagging, BagBoosting
Introduction
In class prediction with microarray data, experiments derived from the real world are often related to the analysis of complex and high-dimensional data sets, characterized by low a-priori knowledge on the data and usually by a 'small number' of classified examples of 'large dimension'. These high-dimensional problems are often difficult to solve because of their complexity, i.e. the cost of an optimal solving algorithm increases exponentially (or at least superpolynomially) with the dimension. It has been proven, in fact, that the problem is NP-hard, implying that it is not possible to find a polynomial time algorithm to solve it.
1 Corresponding Author: Raffaella Folgieri, Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy; E-mail: [email protected].
This typical problem in gene expression data gives rise to the so-called 'curse of dimensionality', a term introduced by Bellman in 1961 [4]. Machine learning algorithms, and particularly SVMs [2; 3; 5; 6], represent the state-of-the-art in gene expression data analysis. Other methods have also been used, such as Bagging and Boosting [7], feature selection or extraction methods (see Golub [8]), or the feature subsampling proposed by T. K. Ho [9]. In other works we compared the results obtained by the application of single SVMs with those obtained by means of two randomized techniques, Random Projection [1] and Random Subspace ensembles of SVMs with linear, gaussian and polynomial kernels, with the aim of improving the accuracy of classification. We also used ensemble methods to enhance classification accuracy and capability. The main idea on which ensemble methods are based is, in fact, to train multiple classifiers and to combine them, in order to reduce the generalization error of the multi-classifier system. Our results showed that linear and low-degree polynomial kernels give better results compared both with single SVMs and with random techniques using high-degree polynomial and gaussian kernels. A theoretical justification of this approach is related to the Johnson-Lindenstrauss lemma about distance-preserving random projections. This lemma forms a theoretical basis for low-cost, topology-preserving feature extraction. In this context we showed a theoretical result related to supervised learning in the case of polynomial kernels: with high probability, the kernel applied to the compressed data is ε-close to the optimal solution if we project into a space of dimension

$$d' = O\!\left(\alpha^2 \cdot \frac{\lg N}{\varepsilon^2}\right),$$

where α is the degree of the polynomial kernel.
This fact allowed us to conclude that, for algorithms using certain characteristics of the data, such as distances or polynomial kernels, random projections act as an injection of noise into the outputs of the algorithms. Therefore, in these cases, random projections are suitable for applying ensemble methods. We can consider the linear kernel as a particular case of the polynomial kernel, so we have the possibility of comparing the results obtained by the application of Random Subspace and Random Projection ensembles of SVMs with the results obtained in the literature by Dettling and Bühlmann, using the Boosting and BagBoosting methods on the Leukemia and Colon data [7].
1. An overview of the applied methods
The main idea at the basis of our methods can be summarized as follows: apply ensemble methods, where the hypotheses are produced by applying a learning algorithm to data perturbed by a random projection. In this way we hope to obtain feature selection with an algorithm of low computational cost, reducing the dimension of the data according to a well stated theory. In our work, to improve the accuracy of the results, we performed the RS or the PMO projection within an ensemble method. Suppose we want to solve a classification problem and that, for a random projection P,

$$\mathrm{Prob}\{A(D) = A(P(D))\} > 1/2.$$

To improve the confidence, we have to:
- repeat the projection several times, independently;
- return the result by majority vote.
In this way the probability of error decreases. In random projection methods, we first construct a set of classifiers by applying a learning algorithm to randomly projected data; then a weighted vote of their predictions gives the classification of the considered points (Fig. 1). In this work, to compare our results with those obtained in the literature, we will use linear-kernel SVMs as learning algorithms and, as randomized maps:
- Plus-Minus-One (PMO) random projections, represented by matrices $P = \frac{1}{\sqrt{d'}}\,(r_{ij})$, where the $r_{ij}$ are uniformly chosen in {-1, 1}, such that $\mathrm{Prob}(r_{ij}=1) = \mathrm{Prob}(r_{ij}=-1) = 1/2$. In this case the JL lemma holds with c ≈ 4 (where c is a suitable constant);
- Random Subspace (RS) projections, represented by matrices $T = \frac{1}{\sqrt{d'}}\,(r_{ij})$, where the $r_{ij}$ are uniformly chosen with entries in {0, 1}, with exactly one '1' per row and at most one '1' per column. It is important to observe that even if RS subspaces can be quickly computed, they do not satisfy the JL lemma.
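A minimal sketch of a PMO random projection follows. It is illustrative only: the data are random toy values and the 1/√d' scaling used here is the conventional Johnson-Lindenstrauss choice, assumed rather than taken verbatim from the original definition.

```python
import numpy as np

def pmo_projection(X, d_prime, rng=None):
    """Plus-Minus-One random projection: map the d-dimensional rows of X
    onto d' dimensions with a random {-1, +1} matrix."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    R = rng.choice([-1.0, 1.0], size=(d, d_prime))   # P(r_ij = +1) = P(r_ij = -1) = 1/2
    return X @ R / np.sqrt(d_prime)                  # JL-style scaling (assumption)

# Toy usage: project 20 random 2000-dimensional points onto 256 dimensions
X = np.random.rand(20, 2000)
X_proj = pmo_projection(X, d_prime=256, rng=0)
```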
Figure 1: Proposed ensemble method.
As we have discussed, ensemble methods based on Random Subspace allow the reduction of the dimensionality d, with the possibility of obtaining a significant reduction of the generalization error (in the case of classification problems). To complete the overview, we briefly recall Bagging and Boosting, which are popular methods we will use in our experiments as comparison elements. There are various forms of bagging [10; 11; 12]; here we limit the description to the general algorithm. The bagging algorithm creates a classifier using a combination of base classifiers. However, instead of iteratively reweighing the instances before each call of the base learner (as boosting does), it creates a replicate dataset of the same size as the original. It does this by sampling from the original dataset with replacement. It then calls the base learner on the replicate to get a classifier Ct. After doing this for the set number of iterations, it creates the overall classifier, C*, by combining the base classifiers with a majority vote. For a given instance x:
    loop over base classifiers Ct:
        loop over classes k:
            Vk = Vk + 1 if Ct(x) = k
    C* = k such that Vk is the maximum.
The Boosting method was first proposed by Freund and Schapire [13]. Boosting is a meta-algorithm for improving the accuracy of a weak learner while performing supervised machine learning. A weak learner is a machine learning algorithm that classifies data with accuracy greater than that of chance. Boosting runs the weak learner iteratively on the training data, rearranging the probability distribution of the given examples so that subsequent iterations of the weak learner focus on the ones that have not been accurately learnt yet. The algorithm then combines the hypotheses generated at each iteration and uses them to construct a classifier that has greater accuracy than the weak learner. The idea behind bootstrapping is that if the sample is a good approximation of the population, the sampling distribution of interest may be estimated by generating a large number of new samples (called resamples) from the original sample. Put another way, bootstrapping treats the sample as if it were the population. The resampling is done using a random number generator. Bootstrapping is therefore a Monte Carlo (i.e. numerical) technique, as opposed to an analytic technique. The basic algorithm is a weak one. It varies the probability distribution on the examples, increasing the probability on the misclassified examples. The data are re-sampled in an adaptive way, so that the weights are increased for those cases that are often misclassified. The predictors are then aggregated by weighted voting. From this description we can deduce that Bagging is a special case of boosting, where the re-sampling probabilities are uniform at every step and the perturbed predictors are given equal weight in the voting process. Some works in which the authors used bagging and boosting show results on the same data sets we use for our research, for example Dettling and Bühlmann for the Leukemia and Colon data [7], so we will compare our results with those obtained by bagging and boosting methods.
2. Experimental setup The experiments have been performed on the same data used in literature by Dettling and Bühlman: (a) The Colon adenocarcinoma data set, composed of 2000 genes and 62 samples: 40 colon tumor samples and 22 normal colon tissue samples (Alon et al.,1999). (b) The Leukemia data set, that treats the problem of recognizing two variants of leukemia by analyzing the expression level of 7129 different genes. The data set consists of 72 samples, with 47 cases of Acute Lymphoblastic Leukemia (ALL) and 25 cases of Acute Myeloid Leukemia (AML), split into a training set of 38 tissues and a test set of 34 tissues. All the data sets have been treated following the same indication reported in the respective works in literature.
212
R. Folgieri / An Experimental Comparison of Random Projection Ensembles
Concerning the implementation, we developed new C++ classes and applications for random subspace ensembles extending the NEURObjects2 library, using the SVM-light applications by Joachim. The procedures have been developed in Perl, in Linux O.S. environment. We specialised the learning algorithm L using linear Support Vector Machines (SVMs). We fixed 50 as the number I of base learners and chose as dimension of subspace every number n = 2k with 1≤ k < ⎡ log2d ⎤ where d is the dimension of the data. More
⎛d ⎞
precisely, we drew 50random subspaces from the available ⎜⎜ ⎟⎟ ones, and we used ⎝n⎠ them to project the original d-dimensional input data into the obtained 50 ndimensional subspaces; the resulting samples have been used to train the 50 base SVMs that belong to the ensemble. On the selected data set we performed the methods listed below: - Random Subspace (RS) projection ensemble of SVMs - Random Projection Plus-Minus-One (RP-PMO) ensemble of SVMs
3. Compared results Diettling and Bühlman. in their works [7] applied Boosting and Bagging methods to Leukemia and Colon data sets. As seen in previous paragraphs, boosting is a class prediction method developed in the machine learning framework, particularly useful in high-dimensional prediction problems. It consists in producing a classification from a sequential ensemble of base learners, fitted with an adaptively reweighed version of the data set. In the specific experiments conducted by Diettling and Bühlman, they used a particular combination called BagBoosting because it uses bagging as a module for the boosting algorithm applied to the microarray considered data. In this approach, for each boosting iteration, the technique does not rely just on a single base learner, but aggregates the output from several ones, generated from bootstrap samples, each obtained performing a replacement from the reweighed training data. Even if there are some differences in experiments set up, we will compare the results obtained by Diettling and Bühlman on Leukemia and Colon data set with those obtained with Random Projection ensemble (both from Random Subspace and from Random PMO Projection), considering comparable the results on the basis of the following considerations: - BagBoosting incorporates a multivariate feature selection, so the results don’t depend strongly on preliminary data filtering; - the test error reported by Diettling and B¨uhlman show the outcome with 200 genes and we will compare it to values obtained with a similar subspace dimension, that is 256. Moreover, the splitting of the original data sets into learning and test sets has been done in both our experiments and in Diettling and Bühlman ones in the same way, that is as in [14]. For the Random Subspace projection ensemble the compared results are
2 The extended version of the NEURObject http://www.disi.unige.it/person/ValentiniG/Neurobjects/.
library
is
freely
downloadable
from
R. Folgieri / An Experimental Comparison of Random Projection Ensembles
213
reported in table 1 for the subspace dimension 256, and for the best results obtained from Random Subspace ensemble.
Colon data set Leukemia data set
Boosting 0.1286 0.0567
BagBoosting 0.1610 0.0408
RS ensemble with linear kernel Best results Subspace dim.256 0.1270 0.1270 0.0822 0.0254
Table 1: Colon and Leukemia data set: Boosting and BagBoosting (on 200 selected genes) test error compared with Random Subspace ensemble with linear kernels for best results and for the subspace Dimension 256.
Notwithstanding the differences in the two experimental environments, it is evident by the results that in general Random Subspace ensemble outperform both Boosting and BagBoosting algorithm. This fact is well underlined if we consider the best results obtained with Random Subspace ensemble, but is quite true also considering the results obtained with the Subspace Dimension 256, comparable to the 200 genes selected by Diettling and Bühlman. Particularly, in the case of Leukemia data set, we obtained quite similar results, while for the Colon data set the Random Subspace ensemble perform always better than Boosting and BagBoosting methods. We can state that also Random PMO Projection ensemble gives better results on Colon and Leukemia data set. In fact, as shown in table 2 Random PMO Projection ensemble with linear kernel outperforms also Random Subspace Projection ensemble and, consequently, both BagBoosting and the ‘simple’ Boosting results. It is known that Boosting tends to overfit on gene expression data during training. BagBoosting can inherit the same effect since it is based on Boosting. Hence, both algorithms may not well generalize and classification errors could be large. It could be the explanation of why an SVM ensemble can outperform them, though an SVM may be also prone to overfitting on very high dimensional data. Data sets Colon Leukemia
Boosting 0.1286 0.0567
BagBoosting 0.1610 0.0408
RS ens. linear 0.1270 0.0254
RP (PMO) ens. linear 0.1186 0.0254
Table 2: Colon and Leukemia data set: Boosting and BagBoosting (on 200 selected genes) test error compared with the best results obtained with Random Subspace ensemble and Random PMO Projection with linear kernel.
4. Conclusions The comparison between the results obtained by the application of Random Subspace and Random PMO Projection ensemble on Leukemia and Colon data sets and results in literature obtained with Boosting and BagBoosting [7] methods, confirms the effectiveness of Random Projection ensemble. In fact,we obtained similar (Leukemia data set) or better (Colon data set) results.
214
R. Folgieri / An Experimental Comparison of Random Projection Ensembles
Considering the differences among our experimental setup and the gene selection performed by Diettling and Bühlman (200 genes), we compared these results from literature both to results obtained with Random Subspace ensemble with subspace dimension 256, and to the best results by Random Subspace ensemble with higher subspace dimension. With random subspace ensembles of linear SVMs, we obtained the minimum of the test error using 1024-dimensional subspaces, but also with 256 to 1024-dimensional subspaces results (Fig. 2 a) are equal or better than results obtained by the application of Bagging and BagBoosting. Interestingly enough, sensitivity is very high if very low dimensional subspaces are applied, but at the expenses of the specificity (Fig.2 b). Indeed using 2 or 4dimensional subspaces the base SVMs learn nothing, predicting that all samples are malignant, without any distinction between normal and cancerous tissues. The ensembles start to learn when 8 random genes are selected, and if we apply at least 16 gene-subspaces we achieve a reasonable specificity at the expense of a low decrement of the sensitivity (Fig 2 b).
(a)
(b)
(c)
(d)
Figure 2: SVM random subspace ensembles results on the colon data set (5-fold cross validation). (a) Test and training error with respect to the dimension of the subspace (b) Sensitivity, specificity and precision (c) Test error curve with standard deviation values (d) Training error curve with standard deviation values .
Fig. 3 (a) shows that both the base learner training and test error decrease monotonically with the subspace dimension. Similar consideration are valid for the
R. Folgieri / An Experimental Comparison of Random Projection Ensembles
215
Leukemia data sets, for which Random Subspace Ensemble achieve similar general results.
(a)
(b)
Figure 3: Colon data set: (a) Average training and test error of the base learners (component predictors) with respect to the subspace dimension (b) Test error of the 1024 dimensional SVM random subspace ensemble with respect to the number of the base learners on the 5 folds.
Concluding, on the basis of our experiments, we can affirm that the comparison between results obtained on Leukemia and Colon data sets and those at disposition in literature on the same data, showed that Random Subspace and Random PMO Projection ensemble outperform both Boosting and BagBoosting methods, even considering the differences among the experiments. The experiments also highlight that the information carried out by many genes is highly correlated. These results can also suggest than many genes are not correlated with the discrimination of the functional classes. As expected, the aggregation of more base learners, that is the ensemble methods, enhance the results, improving the accuracy of the Random Subspace and Random PMO Projection methods. The method could be surely applied with a large confidence probability to clinical and diagnostic problems and it could be also applied to other research fields affected by the problem of data sets characterized by high dimensions and few certain knowledge. This is the case of fraud detection, food classification, data from spectrometry and biomolecular analysis problems.
References [1] Badoiu, M.,Demaine E., Hajiaghayi M., Indyk P. Low-Dimensional Embedding with Extra Information. Discrete Computational Geometry. 36(4):609- 632,2006. [2] Vapnik, V. N. Statistical Learning Theory.Wiley, New York,1998 [3] Pomeroy, S. et al. Gene Expression-Based Classi_cation and Outcome Prediction of Central Nervous System Embryonal Tumors.Nature. 415:136-142,2002. [4] Bellman, R.: Adaptive Control Processes: a Guided Tour. Princeton University Press. New Jersey. 1961 [5] Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning 46 (2002) 389–422 [6] Brown, M. et al.: Knowledge-base analysis of microarray gene expression data by using Support Vector Machines. PNAS 97 (2000) 262–267
216
[7] [8] [9] [10] [11] [12] [13] [14]
R. Folgieri / An Experimental Comparison of Random Projection Ensembles
Dettling, M., Buhlmann, P.: Boosting for tumor classification with gene expression data. Bioinformatics 19 (2003) 1061–1069 Golub, T., et al.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286 (1999) 531–537 Ho, T.: The Random Subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832–844 Breiman, L.: Bagging predictors. Machine Learning 24 (1996) 123–140 Friedman, J. and Hall, P.: On Bagging and Nonlinear Estimation. Statistics Department, University of Stanford, CA, Tech. Rep. Tech. Report (2000) Skurichina, M., Duin, R.: Bagging, boosting and the Random Subspace method for linear classifiers. Pattern Analysis and Applications 5 (2002) 121–135 Freund, Y., Schapire, R.,E.: A decision-theoretic generalization of online learnings and an application to boosting. Journal of computer and system sciences 55 (1997) 119–139 Dudoit, S., Fridlyand, J., Speed, T.: Comparison of discrimination methods for the classification of tumors using gene expression data. JASA 97 (2002) 77–87
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-217
217
Changes in quadratic phase coupling of EEG signals during wake and sleep in two chronic insomnia patients, before and after cognitive behavioral therapy Stephen PERRIG a,1 and Pierre DUTOIT b and Katerina ESPA-CERVENA a and Vladislav SHAPOSHNYK b and Laurent PELLETIER b and François BERGER b and Alessandro E.P. VILLA a,b a Sleep Laboratory, Neuropsychiatry Service Belle-Idée, Hôpitaux Universitaires de Genève, Switzerland b Neuroheuristic Research Group, Grenoble Institute of Neuroscience, Université Joseph Fourier, Grenoble, France Abstract. Quantitative EEG studies of primary insomnia (PI) suggest that increased high frequency and reduction in slow frequency EEG activity could be associated with cortical “hyperarousal” and sleep homeostasis dysregulation. This preliminary study is the first to apply higher order EEG analysis in chronic PI patients. We analyzed phase coupling in two patients against two control subjects. We defined an index of resonant frequency (IRF) and show that both patients were characterized by high IRF values that suggest an increase in local cortical information processing before treatment. We show that cognitive behavioral therapy for insomnia (CBT-T) is able to reverse EEG phase coupling towards control values in as little as eight sessions. After treatment the patients were characterized by lower index values, thus suggesting recovery of information processing over wide-spread cortical areas. Keywords. Insomnia, cognitive behavioral therapy, cortico-cortical resonances, bispectrum, bicoherence
Introduction Insomnia is an ubiquitous major health disorder characterized by difficulties initiating or maintaining sleep, waking up too early or by poor quality sleep [1]. Insomnia occurs in about a third of the adult population affecting both genders at all ages and observed in all countries, cultures, and races. It is associated with daytime impairment (fatigue, attention-concentration-memory impairment and mood disturbances) [2]. One third of insomniacs (i.e., about 10% of the overall adult population) suffer on a chronic basis (i.e., for more than a month). A large number of those chronic patients (i.e., 2-3% of the overall adult population) have “primary” psychophysiological insomnia (PI) not associated 1 Corresponding
Author: S.P., Sleep Laboratory, Neuropsychiatry Service Belle-Idée, Hôpitaux Universitaires de Genève, Switzerland; E-mail: [email protected] .
218
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
with comorbid medical (e.g., pain, cancer, respiratory disease) or psychiatric conditions (e.g., anxiety, depression) [30]. Acute insomnia may be treated pharmacologically with GABAergic drugs [24,33]. For chronic insomnia, cognitive behavioural therapy (CBT-I) is applied with subjective and polysomnographic improvement of sleep [10,18]. Furthermore, two studies reported an increase of slow wave activity and changes in sleep EEG power densities analysis after CBT-I [15,8]. The pathophysiology of PI is not yet understood but these results suggest that nonpharmacological treatments can induce significant changes in the EEG and that these changes might be associated to the dynamics of neural circuits that are involved in the etiology of insomnia. The hypothesis of a “hyperarousal” mechanism either physiologic, psychologic or cognitive has been suggested [26] on the basis of physiological markers of increased arousal such as elevated cortisol/ACTH in the evening hours, increased heart rate and modification of heart rate variability. This hypothesis is supported by electrophysiological markers of insomnia that include increased high EEG frequencies in the peri-sleep onset period, during non rapid eye movement (NREM) and rapid eye movement (REM) sleep [12,16,17,25], reduction of slow wave sleep [13,14] and decreased of power spectra of delta (0.5–3.75 Hz) and theta (3.75–6.75 Hz) bands [17,25]. However, no consistent firm correlation can be established between quantitative EEG findings and subjective complaints of insomniac patients. These EEG studies used linear second order spectral analysis (i.e., Fast Fourier Transform, Power Spectrum Densities, Coherency) despite the fact that the neurophysiological processes underlying EEG are determined by highly non-linear dynamic systems [23]. Third order polyspectral analysis [5] is a non-linear method of signal processing that quantifies the degree of phase coupling and was applied to EEG by pioneers as early as the 1970s [9]. This method is computationally intensive and due to the lack of computing resources until recent years it remained somehow forgotten. Only few studies used this technique to analyze sleep EEG in animals [21,27], human neonates [34] and epileptic patients [7]. A proprietary bispectral index (BIS) calculated by a commercial device (BIS Technology, Aspect Medical Systems Inc., Norwood, MA, USA) decreased progressively as sleep became deeper in human studies [29,3]. The BIS is a combination of weighted parameters (burst-suppression ratio, beta ratio, relative bispectral power in the 40–47 Hz frequency range) developed by anaesthesiologists to monitor sedation. Despite a correlation with the stage of sleep the BIS could not reliably indicate conventionally determined sleep stages and patients could arouse with low BIS values [19]. The current study extends our previous EEG analyses after CBT-I [8] with polyspectra analysis [32]. The main goal is to test the hypothesis that in PI patients there is also persistence of EEG quadratic phase coupling between cortical areas during wake, drowsiness and sleep compared to normal subjects. We postulate that this coupling in high frequency would correspond to cortical “hyperarousal” and that it might be reversed by CBT-I.
1. Methods 1.1. Patients data We studied two patients suffering of psychophysiologic insomnia.
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
219
Case 1: a 33 yrs old male known for sleep problem since his infancy. He goes to bed at 11h00 P M and in the absence of medication (Zolpidem) he needs 3 hours to initiate sleep. He did not suffer respiratory disorder neither periodic leg movement (P LM ) during sleep. He wakes up at 7h30 AM with a sensation of non restorative night. A first polysomnography (PSG) was done one month after his last medication. This recording session is labeled hereafter as before treatment condition. The patient slept 5h26 min (Total Sleep Time: T ST ) with a sleep efficacy SE=66%. The hypnogram showed a maintenance insomnia. The sleep latency (SL) was 9 min and the wakefulness after sleep onset (W ASO) was 152 min. The index of apnea/hypopnea (IAH) was 6/h. The cognitive behavioural therapy for insomnia (CBT-I) consisted of 8 sessions distributed over 2 months. A PSG was performed after treatment in order to quantify the therapeutical effects, if any. The patient reported better nights and indeed the subjective sleep quality improved according to questionnaires and sleep diary. The PSG after treatment reported SL = 10 min, T ST = 7h10 with a sleep efficiency of 87% and W ASO decreased to 67 min. Case 2 : a 20 yrs old female without any medical or psychiatric condition who reported chronic insomnia occurring during the past 3 years without episodes of periodic leg movements. Her usual bed time is 00h30 AM , She complaints of daytime fatigue and concentration problems with subjective sleep latency close to 3 hours, and total sleep time between 3 and 4 hours. The PSG before treatment was characterized by SL = 57 min, T ST = 8h15 min, W ASO = 45 min, SE=83%. The index of apnea/hypopnea was normal (IAH = 0.6/h). After 8 sessions of CBT-I over 2 months, the after treatment PSG showed improvements in sleep latency (SL = 8 min), total sleep time (T ST = 9h10 min) with a sleep efficiency of 89% and W ASO = 58 min. Controls : polysomnography was performed in 2 healthy volunteers, a 26 yrs old female and a 43 yrs old male. None of the subjects suffered from medical or psychiatric disorders or were taking any medications. 1.2. EEG data acquisition The polysomnography (PSG) was performed using 7 scalp silver-silver chloride EEG electrodes (F 3–F 4, C3–C4, Cz, O1–O2) referred to linked mastoid electrodes. Impedance was kept below 5 kΩ. In addition, two electrodes were placed above and below the external canthi for EOG, and two electrodes on the chin for EMG. The EEG recording was performed with a commercial device (Brainlab, OSG, Belgium) during the period 10 P M –8 AM . During wakefulness, at the beginning of the recording period, EEG epochs of interest were recorded during eyes open (EO) and eyes closed (EC) conditions. The sleep onset was defined by the first desappearance of the alpha rhythm on all derivations and the appearance of a diffuse theta rhythm visible for more than 2 sec. The drowsiness period that occurred just before sleep onset is defined post-hoc at the time of the off-line analysis and corresponds to the pre-theta (PRE-THETA) recording condition. Changes in EEG spectral analysis in PI is most pro-eminent during the first ultradian sleep cycle. Therefore two minutes of EEG without artefacts were collected for the recording conditions: wake (EO, EC), drowsiness (PRE-THETA) and first sleep cycle (NREM2, NREM3, REM). Notice that EO and sleep-N3 recording conditions were discarded from the analysis performed in this paper because of too large variance between Subjects. 
These conditions will be considered at the end of the study that will comprise at least twenty Subjects (10 patients and 10 controls).
220
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
1.3. Data analysis This section is aimed to briefly explain the analytical specification of quadratic phase coupling starting from the basic of signal analysis in the frequency domain. Signal analysis functions are subdivided into classes derived from their relationship with the statistical moments and cumulant series. Correlation, power spectrum density (PSD) and coherence are second order cumulant analyses. Let us consider a case study where an analog signal from a single channel, e.g. an EEG signal x(t), is recorded during
N (a(t) + b(t) + a(t)b(t)) where N epochs of equal duration, such that x(t) = a(t) = cos(2πf1 t + ωa ), b(t) = cos(2πf2 t + ωb ) and f1 , f2 represent two frequencies of periodic processes and ωa ,ωb are phases randomly changed, i.e. uniformly distributed in [0, 2π], for each epoch. Notice the non-linear interaction is represented by the term a(t)b(t) (Fig. 1). The spectral representation of this signal X(f ) is obtained by the
N x(t)e−it2πf . The power spectrum is Pxx (f ) = |X(f )|2 Fourier transform X(f ) = and its shape will show peaks corresponding to frequencies f1 , f2 and f3 = f1 + f2 (Fig. 1). In the current study we have analyzed only a reduced number of EEG derivations and power spectrum densities (PSDs) for each subject were cumulated and averaged across derivations. Power spectrum analysis can be applied only to stationary signals because it assumes that the signal is composed into a linear combination of mutually uncorrelated frequency components [9]. In order to keep the phase relationship between the signal components and detect if some of them are non linearly coupled it is necessary to compute third order cumulant statistics [5,9,20]. Third order cumulant analyses include the bicorrelation, the bispectrum and the bicoherence. They keep the phase relationship between the signal components and thus can detect if some of them are non linearly coupled. Firstly, we N X(f1 ) X(f2 ) X ∗ (f1 +f2 ) where compute the bispectrum Bxxx , defined by Bxxx = ∗ X (f1 + f2 ) is the conjugate of X(f1 + f2 ). Bxxx will be near 0 in case of independence [5], and for the peaks in the bispectrum we estimate the value of the interaction by the bicoherence Cxxx , defined by Cxxx = |Bxxx |2 /Pxx (f1 ) Pxx (f2 ) Pxx (f1 + f2 ). Let us assume a case study x(t) (Fig. 1). characterized by an interaction, represented by the term a(t)b(t), between f1 and f2 so that a significant value of the bicoherence is observed for bifrequencies (f1 , f2 ). We tested the hypothesis that the bispectrum was equal to zero [9,6] at the 99% confidence limit to detect the significant interactions at couples of frequencies f1 and f2 . Phase-coupled frequencies f3 = f1 + f2 were determined for corresponding significant bispectral analysis at couples of frequencies f1 and f2 . Frequency f3 defined the “frequency of resonance”. In order to avoid biased sampling (i.e., due to differences in gain of amplitude set for the recording from different subjects) only bispectrum peaks corresponding to bicoherence peaks greater or equal to 0.6 were considered here in each EEG sample.
2. Results Figures 2-5 show the bipolar traces of the EEG recordings and the corresponding Power Spectrum Densities (PSD). Notice that in the eyes closed condition (Fig. 2) of one control subject (SP ) it is possible to observe focal posterior alpha rhythm (8 Hz) and fast rhythms (> 20 Hz) on EEG traces. This pattern might be associated to higher attention
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
221
Figure 1. Outline of the unresolving result provided by power spectrum analysis when the recorded signals depend on phase relations more than in frequency relations. (a) Assume that two signals, y(t) and z(t), were recorded simultaneously from separate channels. Each signal was recorded for N epochs of equal time. (b) Calculation of the Fourier transform of each signal. (c) Calculation of the power spectrum for each signal. See text for more details.
paid by the control subject during the calibration process. Conversely, for patient DS the PSD shows a higher 8 Hz peak and a decreased basal line for fast rhythms thus revealing a secondary peak in power near 18 Hz. Notice that PSDs of controls were similar for all recording periods and that both patients were characterized by a relative enhancement of the peak near 8 Hz, as shown by DS recordings in Figures 2-5. The PSD analysis of Pre-Theta, NREM and REM did not reveal other noticeable results. The bispectral analysis was performed for all channels separatedly and the values of phase-coupled frequencies (i.e., the frequencies of resonance f3 ) were determined. The third cumulant analysis obtained from all bipolar recordings, from both single- and cross-channel analyses, were grouped together. Phase-coupled frequencies in the range 1 to 50 Hz for all recording periods in controls and patients are plotted as histograms in
222
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
Figure 2. Recording condition Eyes Closed for one control Subject (SP ) and for one patient (DS) before and after treatment (CBT-I) is illustrated in the upper, mid and lower panels, respectively. The left panels show the bipolar recordings for derivations F 3–A2, F 4–A1, C3–A2, C4–A1, Cz–A1, O1–A2 and O2–A1. Calibration ticks correspond to 2 seconds epochs for all subjects (an overall 30 s interval is shown in the figure) and to 34.1 μV , 52.7 μV and 67.0 μV , for Subject SP and Subject DS pre- and post-treatment, respectively. The right panels show the corresponding cumulated Power Spectrum Density (PSD) for all bipolar derivations but Cz–A1. Subject SP has a few blinking artifacts visible on frontal leads, a more posterior alpha rhythm and fast rhythms visible in the traces. Patient DS has a more diffuse alpha rhythm.
Figure 6. A general observation is that for all subjects and during all recording conditions the majority of phase-coupled frequencies are lying in the range 13–33 Hz. In addition, the histograms of Figure 6 show that for patients before CBT-I treatment the relative count of phase-coupled frequencies in the range 33–48 Hz was larger than the count of phase-coupling in the low frequency range, up to 13 Hz. Notice that these frequency values refer to the phase-coupled frequencies f3 that are the sum of two frequencies (f1 and f2 ) and should not be confounded with the usual frequency bands of the EEG power spectra. In order to characterize quantitatively the shift of f3 towards the extremes of the frequency range we defined two indexes. Let us label LF the relative frequency of peaks in the ]1 − 13] Hz band and HF the relative frequency of peaks in the ]33 − 48] Hz band. The index of resonant frequencies IRF is defined in the range 0–100 as follows: 1 HF −LF IRF = 2 × 100 + HF +LF × 100 . This means a value of IRF close to 100 corresponds to a shift of f3 towards higher frequencies and value of IRF close to 0 corresponds to a shift of f3 towards lower frequencies. A value of IRF close to 50 corresponds to a system characterized by as much phase-coupling in the LF as in the HF band, irrespective of the total count. The second index called RF R, the raw frequency ratio, is the ratio between the relative count of phase-coupled frequencies in the 1–13 Hz range and the relative count of phase-coupled frequencies in the 33–48 Hz range. This
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
223
Figure 3. Recording condition pre Theta for a control Subject (SP ) and for a patient (DS) before and after treatment is illustrated following the same labels as in Fig.2.
Figure 4. Recording condition Non REM Sleep for a control Subject (SP ) and for a patient (DS) before and after treatment is illustrated following the same labels as in Fig.2. K-complex and sleep spindle are visible on the traces.
224
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
Figure 5. Recording condition REM Sleep for a control Subject (SP ) and for a patient (DS) before and after treatment is illustrated following the same labels as in Fig.2. Fast rhythms are visible for SP and some electrocardiogram artifacts are visible for patient DS after treatment.
means a large value of RF R corresponds to a shift of phase-coupling towards higher frequencies and a low value of RF R corresponds to a shift towards lower frequencies. Table 1 shows the relative count of phase-coupling in the frequency bands of interest and the values of indexes IRF and RF R. The general pattern was an increase of high frequency coupling in the group of patients before treatment. The main effect of treatment was to reduce high-frequency coupling and shift phase-coupling towards low frequencies, somehow with a significant increase of low frequency coupling compared to the controls. Because of the limited sample of our current study the variance of the results is high. We tested the effect of the treatment by comparing the results before treatment and after treatment against the control groups using Chi − squaretest, 2p < 0.05. In the LF range we observe in Table 1 that the patients before treatment show fewer phase-coupling than controls during all recording periods, but REM sleep. It is interesting to notice that during the PRE-THETA intervals the phase-coupling of patients was not quantitatively modified by the treatment. Conversely, the CBT-I treatment significantly increased the phase-coupling in the LF band during all other intervals, either re-establishing a level close to the controls or even beyond that level, as observed during REM. The mid-range (13−33 Hz) phase-coupling was not affected by the CBT-I treatment. In the HF range the treatment significantly reduced the phase-coupling by a shift towards lower frequencies during all recording periods, but REM. Another view of the results is illustrated by Figure 7 based on a radial representation of the values of the index of resonant frequencies IRF during all recording periods. In this graphics each group is represented by a polygon. The patient group before CBT-I treatment appears clearly on the external border of the diagram, thus emphasizing the relative increase of high-frequency phase-coupling in this pathology. The empty circles
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
225
Figure 6. Relative distribution of the frequencies of resonance for control and patient groups before and after CBT-I treatment. Bin size corresponds to 1 Hz intervals. The dotted lines delineate the limits of LF and HF bands.
and the thick line correspond to the control group. The treatment appears to be very effective in shifting the phase-coupling towards lower frequencies, as shown by the curve of post-treatment patients closer to the center of the diagram.
3. Discussion This study reports, for the first time to our knowledge, the analysis of functional electrophysiology by bispectral techniques before and after cognitive behavioral therapy (CBTI, 8 sessions over 2 months) in patients suffering chronic primary insomnia (PI) [10,18]. This high order spectral analysis allows to determine the frequency range of quadratic phase coupling (resonant frequency) across cortical areas [31,32]. A coupling that occurs at high frequencies may be interpreted as a sign of focal cortical interactions. Conversely, a coupling at low frequencies suggests an increased cross-areal involvement in neural processing. We show that raw frequency ratio and the index of resonant frequencies indicate the prevalence of coupling at higher frequency range in insomniac patients compared to controls. We showed that CBT-I provoked a shift of the indexes towards low frequencies of resonance at all brain states. The treatment failed to show a significant effect
226
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
Table 1. Percentage of phase-coupled frequencies in each frequency bands of interest for the the control group and for the group of patients before and after CBT-I treatment. IRF: index of resonant frequencies. RFR: raw frequency ratio. Significance levels: (ns) not significant, (*) 5%, (**) 1%. Subject Group
Percentage of phase-coupled frequencies LF: ] 1-13]Hz ]13-33]Hz HF: ]33-48]Hz
Indexes IRF
RFR
Eyes Closed Control Patient before after CBT-I Pre-Theta Control
12 2 (*) 8 (ns)
14 21 (ns)
54 91 (*)
1.17 10.50
88 (ns)
4 (*)
33 (ns)
0.50
75
13
52
1.08
Patient before
3 (*)
87 (ns)
10 (ns)
77 (ns)
3.33
after CBT-I
3 (*)
96 (ns)
1 (**)
25 (*)
0.33
NREM Control Patient before
57 27 (*)
30 60 (*)
13 13 (ns)
19 33 (*)
0.23 0.48
42 (ns)
57 (*)
1 (**)
2 (*)
0.02
4 4 (ns) 19 (*)
90 85 (ns) 79 (ns)
56 75 (ns) 10 (*)
1.25 3.00 0.11
after CBT-I
12
74 77 (ns)
REM Control Patient before after CBT-I
5 12 (ns) 2 (ns)
(Table 1) only in the high frequency range (]33-48] Hz) during REM sleep and at low frequencies (] 1-13] Hz) immediately before sleep onset (PRE-THETA recording condition). It is also worth reporting that the only condition that let appear a difference of resonant frequencies in the intermediate range (]13-33] Hz) was during sleep-N2 (NREM) irrespective of the treatment. This last result suggests that despite an overall shift of resonant frequencies towards recovery, focal cortical interactions tended to persist in patients during NREM sleep periods. This is in agreement with the finding of decreased regional cerebral blood flow during non-REM sleep reported in the subcortical, limbic/arousal systems and in the anterior cingulate and medial prefrontal areas of patients compared to normal controls [22]. Definite conclusions cannot be drawn because of the preliminary nature of this report based only on two patients’ data. However, the current study brings new original results to support the therapeutic value of CBT-I which appears to modify cortical neural processing in an objective way. CBT-I has an effect by changing dysfunctional thoughts and attitude about poor sleep, changing conditioned arousal, bringing the patient to modify his behavior towards his sleep difficulties. The prevalence of high frequencies of resonance in chronic PI patients may be associated to the prevalence of multiple sites of focal cortical interactions, which supports the “hyperarousal” hypothesis in insomniac patients [4,26]. Future clinical studies are necessary to confirm this hypothesis which can also be investigated by means of models and neural network simulations. The question whether robots need to sleep [11] has been raised. What is the meaning of “hyperarousal” in an artificial brain? We have set-up a routine analytical procedure that will allow us to investigate cross-areal frequency coupling and compare human data with the chipograms obtained from simulated neural networks [28].
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
227
Figure 7. Radial scatterplot of the index of resonant frequencies IRF during all recording periods. Notice that the the CBT-I treatment shifts the curve beyond the curve of the controls.
Acknowledgments The authors thank the technical contribution by D. Grasset, E. Claudel, F. Espa, B. Adjivon and the discussions with Dr. V. Ibanez and Dr. H. Merica. The authors ackowledge the partial support by the European Union FP6 grant #034632 (PERPLEXUS).
References [1] A. A. S. M. International classification of sleep disorders: diagnostic and coding manual. Technical report, Westchester, IL: American Academy of Sleep Medicine, 2005. [2] S. Ancoli-Israel and T. Roth. Characteristics of insomnia in the United States: results of the 1991 national sleep foundation survey. Sleep, 22:S347–353, 1999. [3] F. Benini, M. Trapanotto, S. Sartori, A. Capretta, D. Gobber, C. Boniver, and F. Zacchello. Analysis of the bispectral index during natural sleep in children. Anesth. Analg., 101:641–644, 2005. [4] M. H. Bonnet and D. L. Arand. Hyperarousal and insomnia. Sleep Med. Rev., 1:97–108, 1997. [5] D. R. Brillinger. An introduction to polyspectra. Ann. Math. Stat., 36:1351–1374, 1965. [6] D. R. Brillinger and R. A. Irizarry. An investigation of the second- and higher-prder spectra of music. Signal Processing, 65:161–179, 1998. [7] T. H. Bullock, J. Z. Achimowicz, R. B. Duckrow, S. S. Spencer, and V. J. Iragui-Modaz. Bicoherence of intracranial EEG in sleep, wakefulness and seizures. Electroenc. Clin. Neurophys., 103:661–678, 1997. [8] K. Cervena, Y. Dauvilliers, F. Espa, J. Touchon, M. Matousek, M. Billard, and A. Besset. Effect of cognitive behavioural therapy for insomnia on sleep architecture and sleep EEG power spectra in psychophysiological insomnia. J. Sleep Res., 13:385–393, 2004.
228
S. Perrig et al. / Changes in Quadratic Phase Coupling of EEG Signals
[9] G. Dumermuth, P. J. Huber, B. Kleiner, and T. Gasser. Analysis of the interrelations between frequency bands of the EEG by means of the bispectrum. a preliminary study. Electroencephalogr. Clin. Neurophysiol., 31:137–148, 1971. [10] J. D. Edinger and M. K. Means. Cognitive–behavioral therapy for primary insomnia. Clinical Psychology Review, 25:539–558, 2005. [11] J. D. Fouks, S. Besnard, L. Signac, J. C. Meurice, J. P. Neau, and J. Paquereau. Do robots need to sleep? Neurophysiologie Clinique, 34:59–70, 2004. [12] R. Freedman. EEG power in sleep-onset insomnia. Electroenc. Clin. Neurophys., 63:408–413, 1986. [13] J. M. Gaillard. Is insomnia a disease of slow wave sleep? Eur. Neurol., 14:473–483, 1976. [14] J. M. Gaillard. Chronic primary insomnia: possible physio-pathological involvement of slow wave deficiency. Sleep, 1:133–147, 1978. [15] G. D. Jacobs, H. Benson, and R. Friedman. Home-based central nervous system assessment of a multifactor behavioral intervention for chronic sleep-onset insomnia. Behavior. Ther., 24:159–174, 1993. [16] C. H. Lamarche and R. D. Ogilvie. Electrophysiological changes during the sleep onset period of psychophysiological insomniacs, psychiatric insomnias, and normal sleepers. Sleep, 20:724–733, 1997. [17] H. Merica, R. Blois, and J. M. Gaillard. Spectral characteristics of sleep EEG in chronic insomnia. Eur. J. Neurosci., 10:1826–1834, 1998. [18] C. M. Morin, A. Vallières, B. Guay, H. Ivers, J. Savard, C. Mérette, C. Bastien, and L. Baillargeon. Cognitive behavioral therapy, singly and combined with medication, for persistent insomnia. JAMA, 301:2005–2015, 2009. [19] D. Nieuwenhuijs, E. L. Coleman, N. J. Douglas, G. B. Drummond, and A. Dahan. Bispectral index values and spectral edge frequency at different stages of physiologic sleep. Anesth. Analg., 94:125–129, 2002. [20] C. L. Nikias and M. R. Raghuveer. Bispectrum estimation: a digital signal processing framework. Proc. IEEE, 75:869–891, 1987. [21] C. L. Ning and J. D. Bronzino. Bispectral analysis of the EEG during various vigilance states. IEEE Trans. Biomed. Eng., 36:497–499, 1989. [22] E. A. Nofzinger, D. J. Buysse, A. Germain, J. C. Price, J. M. Miewald, and D. J. Kupfer. Functional neuroimaging evidence for hyperarousal in insomnia. Am. J. Psychiatry, 161:2126–2129, 2004. [23] P. L. Nunez and R. Srinivasan. Electric Fields of the Brain. Oxford University Press, New York, NY, USA, 2006. [24] S. Passarella and M.-T. Duong. Diagnosis and treatment of insomnia. American Journal of HealthSystem Pharmacy, 65:927–934, 2008. [25] M. L. Perlis, M. T. Smith, H. Orff, P. Andrews, and D. E. Giles. Beta/gamma activity in patients with insomnia and in good sleeper controls. Sleep, 24:110–117, 2001. [26] P. W. Perlis ML, Smith MT. Etiology and pathophysiology of insomnia. In M. Kryger, T. Roth, and W. C. Dement, editors, Principles and Practice of Sleep Medicine, pages 726–737. Elsevier Saunders, Philadelphia, PA, USA, 2005. [27] K. Schmidt, M. Kott, T. Müller, H. Schubert, and M. Schwab. Developmental changes in the complexity of the electrocortical activity in foetal sheep. J. Physiol., 94:435–443, 2000. [28] V. Shaposhnyk, P. Dutoit, V. Contreras-Lámus, S. Perrig, and A. E. P. Villa. A framework for simulation and analysis of dynamically organized distributed neural networks. LNCS, in press, 2009. [29] J. W. Sleigh, J. Andrzejowski, A. Steyn-Ross, and M. Steyn-Ross. The bispectral index: a measure of depth of sleep? Anesth. Analg., 88:659–661, 1999. [30] D. Taylor, L. J. Mallory, K. L. Lichstein, H. H. 
Durrence, B. W. Riedel, and A. J. Bush. Comorbidity of chronic insomnia with medical problems. Sleep, 30:213–218, 2007. [31] A. E. P. Villa, I. V. Tetko, P. Dutoit, Y. De Ribaupierre, and F. De Ribaupierre. Corticofugal modulation of functional connectivity within the auditory thalamus of rat. J. Neurosci. Meth., 86:161–178, 1999. [32] A. E. P. Villa, I. V. Tetko, P. Dutoit, and G. Vantini. Non-linear cortico-cortical interactions modulated by cholinergic afferences from the rat basal forebrain. BioSystems, 58:219–228, 2000. [33] J. W. Winkelman, O. M. Buxton, J. E. Jensen, K. L. Benson, O. S. P., W. Wang, and P. F. Renshaw. Reduced brain GABA in primary insomnia: preliminary data from 4T proton magnetic resonance spectroscopy (1H-MRS). Sleep, 31:1499–1506, 2008. [34] H. Witte, P. Putsche, M. Eiselt, K. Hoffmann, B. Schack, M. Arnold, and H. Jäger. Analysis of the interrelations between a low-frequency and a high-frequency signal component in human neonatal EEG during quiet sleep. Neurosci. Lett., 236:175–179, 1997.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-229
229
SVM Classification of EEG Signals for Brain Computer Interface G. Costantinia , M. Todiscoa, D. Casalia, M. Carotaa, G. Saggioa, L. Bianchib, M. Abbafatib, L. Quitadamob a Dipartimento di Ingegneria Elettronica Università di Roma “Tor Vergata” Rome, Italy b Dipartimento di Neuroscienze Università di Roma “Tor Vergata” Rome, Italy
Abstract. In this paper, a brain/computer interface is proposed. The aim of this work is the recognition of the will of a human being, without the need of detecting the movement of any muscle. Disabled people could take, of course, most important advantages from this kind of sensor system, but it could also be useful in many other situations where arms and legs could not be used or a brain-computer interface is required to give commands. In order to achieve the above results, a prerequisite has been that of developing a system capable of recognizing and classifying four kind of tasks: thinking to move the right hand, thinking to move the left hand, performing a simple mathematical operation, and thinking to a carol. The data set exploited in the training and test phase of the system has been acquired by means of 61 electrodes and it is formed by time series subsequently transformed to the frequency domain, in order to obtain the power spectrum. For every electrode we have 128 frequency channels. The classification algorithm that we used is the Support Vector Machine (SVM). Keywords. SVM, EEG, classification
Introduction Brain electrical activity can be easily observed by simply placing a set of wet electrodes on the surface of the head. Every kind of task or thought the human being can perform causes electrical activities in different parts of his/her brain; therefore, the recognition of this activity could be considered as a desirable machine learning application. The task is not so trivial, for many reasons: first, we cannot know the state of all neurons in the brain, but just a mean value of it in some zones of the outer part of the brain. Second, the electrical activity is not limited to a single zone, depending on the task a person is performing: in most cases, it involves the whole brain. The difference among different tasks is mainly in the way electrical waves move from one zone to another. A third problem is that there is always a lot of electrical activity in the brain, also when we are thinking or doing “nothing”. This kind of activity, including breathing and all involuntary movements, is always present and can eventually mask
230
G. Costantini et al. / SVM Classification of EEG Signals for Brain Computer Interface
the task we intend to monitor. All of this activities represent for us a “noise” that is often bigger than the “signal” we need to detect. Because of these reasons, the main challenge we are going to face is the classification of the dataset that we collected from the electrodes. A reasonable classification method for this kind of data relies on artificial neural networks. In this work, we used Support Vector Machines: a tool that is very similar to a neural network, but with the advantage that it can support datasets with a huge number of components; therefore there’s no need of a reduction of the feature space. Moreover, it has a training algorithm that is much better than the “back-propagation” rule usually used in neural networks. The paper is organized as follow: in the first section we will describe the sensor system, in the second section we will describe the preprocessing that is applied to the acquired data. In third section a description of the classifier is given. In fourth section a description of experimental tests, together with results is shown. Finally we will give a conclusion with some comments and possible improvements for future works.
1. Sensor System As shown in Fig 1, the recognition system includes three blocks, one for each phase of the signal processing: data acquisition, preprocessing, and classification. The sensor system is composed by 61 electrodes that are placed at the surface of the head of the subject, according to a standard disposition used in these kind of applications [1,2], as shown in Fig 2. EEG
Preprocessing
Classification Figure 1. Block-diagram of the sensor system.
Electrodes are connected to the computer by means of fiber optics, in order to provide necessary electrical insulation that guarantee the subject by any risk of electrical shock. Signals are sampled at a sampling rate of 256 Hz. A picture of the sensor system is shown in Fig. 3.
G. Costantini et al. / SVM Classification of EEG Signals for Brain Computer Interface
231
Figure 2. Position and names of the electrodes.
2. Preprocessing Different types of brain activity are often related to the frequency of the waves that we can find in EEG signals, such as alpha waves (8-12 Hz), beta waves (12-19 Hz), gamma waves (around 40 Hz), delta waves (1-4 Hz) that are associated to weakness, sleep, REM, and other kind of brain states [3-5]. For that reason, we suppose that analyzing the dataset in the frequency domain could be useful for our purpose. For every task, we calculated the FFT in three windows of 256 samples. Of course we considered only the first half of every FFT window, because the second half is symmetric and doesn’t give any further information. Channels from 1 to 127 represent frequencies from 1 to 127 Hz. Channel 0 represents frequency 0, and it is omitted. So, we have a total number of 381 (127 for every one of the three windows) data points for every electrode. The feature vector that we want to classify, is composed by 381*61 data points. Finally, as there is a great variance in the ranges of the values, we perform a normalization, so that all the values involved we have a range form 0 to 1.
232
G. Costantini et al. / SVM Classification of EEG Signals for Brain Computer Interface
Figure 3. Picture of the sensor system
3. Classifier In the past few years, SVMs aroused the interest of many researchers being an attractive alternative to multi-layer feed-forward neural networks for data classification and regression or PCA [6,7]. The basic formulation of SVM learning for classification consists in the minimum norm solution of a set of linear inequality constraints. So, it seems useful to exploit the relation between these two paradigms in order to take advantage of some peculiar properties of SVMs: the “optimal” margin of separation, the robustness of the solution, the availability of efficient computational tools. In fact, the SVM learning problem has no non-global solutions and can be solved by standard routines for quadratic programming (QP); in the case of a large amount of data, some fast solvers for SVMs are available, e.g. SVMlight [8].
4. Experiments and Results In the experiment involved a set of 5 subjects for two days. Every day a subject performed two sessions. During a session, the subject was asked to perform 200 tasks randomly selected among the following: thinking to move the right hand, thinking to move the left hand, performing a simple mathematical operation and thinking to a carol. Every task lasted three seconds, hence the entire session was 20 minutes long. Our objective was to operate discriminations between every couple of task: left hand vs. right hand, mathematical operation vs. carol, right hand vs. mathematical operation, left hand vs. carol, right hand vs. carol and left hand vs. mathematical operation. Hence we prepared 6 kinds of datasets, one for every possible combination of the four tasks. The whole datasets was divided in training sets (75% of the datasets) and test sets (25% of the datasets). Accuracy results for training set was always 100%, accuracy results for test set are shown in Table I. For every subject (denoted with a number, for privacy reason), we performed a mean value of results for 4 different sessions, considered separately, we didn’t mix data from different subjects or different sessions, because they would be too much different. In the table we reported the mean value of the accuracy on the test set, for each subject and every couple of task. In the last line of Table I, we reported mean value for all the subjects.
G. Costantini et al. / SVM Classification of EEG Signals for Brain Computer Interface
233
5. Conclusion A brain/computer interface is presented, which is able to discriminate among different kind of mental tasks that a subject is performing. It is based on a SVM classifier, which is trained by the power spectrum of the EEG signals coming from 61 electrodes set in the surface of the head. An experimental test showed quite good results in case of discriminating between the thought of a carol and hand movements, or mathematical operation and hand movements, while results have been very poor in case of discriminating between movement of right hand and left hand. There is a little prevalence of electrical activity in the opposite side of the brain, but this prevalence is not enough to be successfully exploited by the SVM. A greater difference could probably reside in the activation time of the involved areas. Future development will include an analysis in the time domain, in addition to the data considered in this paper. Moreover, we found an high difference in performance according to different subjects: for example, as you can see in the table, with subject 1 we obtained more than 80% in many tasks, while with subject 4 we was just a little over 50%. Table 1. Accuracy for every task couple. Subject
Left/Right
Math/Carol
Right/Math
Left/Carol
Right/Carol
Left/Math
1
58%
69%
88%
91%
83%
79%
2
46%
54%
57%
77%
62%
66%
3
55%
58%
77%
73%
71%
77%
4
48%
57%
54%
62%
53%
67%
5
54%
79%
60%
66%
65%
61%
average
52.2%
63.4%
67.2%
73.8%
66.8%
68%
References [1] Sharbrough F, Chatrian G-E, Lesser RP, Lüders H, Nuwer M, Picton TW (1991): ”American Electroencephalographic Society Guidelines for Standard Electrode Position Nomenclature”. J. Clin. Neurophysiol 8: 200-2. [2] Benjamin Blankertz, Guido Dornhege, Matthias Krauledat, Klaus-Robert Müller, Volker Kunzmann, Florian Losch, Gabriel Curio “The Berlin Brain-Computer Interface: EEG-based communication without subject training” Transactions On Neural Systems And Rehabilitation Engineering, Vol. XX, No. Y, 2006 [3] Brazier, M. A. B. The Electrical Activity of the Nervous System, Pitman, London, 1970. [4] Ward LM, Doesburg SM, Kitajo K, MacLean SE, Roggeveen AB. “Neural synchrony in stochastic resonance, attention, and consciousness”. Can J Exp Psychol. 2006 Dec;60(4):319-26. [5] Walker, Chambers Dictionary of Science and Technology, Chambers Harrap Publishers, 2nd ed., p312, P.M.B (1999). [6] Jolliffe I.T, Principal Component Analysis, 2nd ed., Springer, NY, 2002 [7] C.J.C. Burges, “A tutorial on support vector machines for pattern recognition “, in Data Mining and Knowledge Discovery 2, Kluwer, 1998, pp.121-167. [8] T. Joachims, “Making large scale SVM learning practical”, Advances in Kernel Methods-Support Vector Learning, B. Scholkopf, C.J.C. Burges and A.J. Smola Eds., MIT Press, Cambridge, MA, 1999, pp. 169-184 (http://svmlight.joachims.org/).
234
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-234
Role of Topology in Complex Neural Networks Luigi FORTUNA, Mattia FRASCA, Antonio GALLO, Alessandro SPATA and Giuseppe NUNNARI Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Viale A. Doria 6, 95123 Catania Abstract. The aim of this study is to investigate the role played by topology in complex networks of neurons working in noisy environment. Due to the presence of noise in real neurons environment a systematic study of how different topologies work in presence of noise is of great interest. To this purpose a software simulator was developed and numerical simulations of different topologies of populations of FHN neurons have been carried out. This work in an on-going project aimed at better understanding complex neuronal dynamics. Future efforts will be devoted to perform a systematic study of the dynamical effects of topology on synchronization. Keywords. Complex network, neuron, topology, noise
Introduction Patterns of electric signals are processed by the brain and constitute all real life sensations like sounds, images, taste, etc. Brain signals that convey information are stereotyped in the entire nervous system and propagate in high-noisy environments. A key principle of the brain function is that the information is not characterized by the form of the signal but by the pathways the signal travels in the nervous system. Aim of this paper is to investigate the behavior of neuron populations in presence of noise. The idea that noise usually degrades the performance of a system has been overcome by the concept of stochastic resonance [1, 2]: in nonlinear dynamical systems noise can have benefic effects. Stochastic resonance was originally introduced to explain the almost periodic recurrences of the Earth’s ice ages. Now the concept of stochastic resonance has been extended to account for all the cases in which the presence of noise enhances the degree of order of a system or improve its performance. Examples are the motion of an overdamped Brownian particle in a bistable potential with weak periodic modulation, nonlinear circuits, lasers, biological systems. Precedent studies on stochastic resonance in summing networks of neurons modelled by FitzHugh-Nagumo (FHN) equations have demonstrated that the sensitivity of neuron populations to under-threshold inputs enhances with the number of neurons involved in the computation [3].In this paper, concepts related to the study of complex systems have been adopted to improve information transmission in neuron complex networks in presence of noise. Several network topologies have been investigated focusing on positive effects of connections in networks of nonlinear FHN units affected by noise. Structures, like chains or fully connected graphs [4], have been simulated by connecting FHN neurons excited by weak input signals. The ratios between the input signals and the detected ones have been evaluated to point out the stochastic resonance features of extended neuron populations versus the topology configuration
L. Fortuna et al. / Role of Topology in Complex Neural Networks
235
Mathematical model Real neurons are characterized by a complex dynamics regulated by the effects of several ion channel flows through the neuron membrane. The behavior of the membrane potential is high-dimensional and its description requires accurate mathematical models. For instance, the Hodgkin-Huxley [5] (HH) takes into account two main channels (of the ions Na+ and K+) and a further variable modelling all the leakage currents and is described by up to 15 differential equations. A simple model derived by the HH model and capturing the salient features of the neuron dynamics is the well-known FitzHugh-Nagumo neuron. The neuron populations investigated in this work are therefore modelled by FHN equations. The equations include two inputs, a noise term (t) and a “signal” input s(t), as follows:
(1)
In eq. (1) the variable vi(t) is a fast variable mimicking the membrane potential, wi(t) is a slow variable modelling the recovery dynamics and A is a constant current bias. The behavior of the FHN neuron in absence of noise can be characterized by a silent state, that in terms of dynamical systems theory is a stable equilibrium point: the system is excitable, when the input signal is chosen over the threshold characteristic of the system a spike is emitted. The FHN neuron shows characteristic signatures of stochastic resonant behavior due to the presence of this threshold-dependent dynamics. It can be clearly identified a noise value optimal in the sense that he system responds to a sub-threshold stimulus with a coherent emission of spikes. This phenomenon allows detecting signals under the threshold and is therefore very important in neurophysiological sensory systems. Coherence between input and related firing has been quantified by an index taking into account that the information in neural systems is coded in spikes [6]. Therefore, ideally at each input peak should correspond a spike or a spike train. By counting the number of true spikes Ptrue (i.e. spikes in correspondence of an input peak) and false spikes Pfalse (i.e. spikes occurring without any input excitation) the following index is defined:
(2)
where Ptot is the total number of input peaks. The weights c1 and c2 take into account the density of true spikes originated by the same input peak and the density of false spikes, respectively. It should be noticed that the index is 1 when the spikes and input peaks are coherent or perfectly in anti-phase.
236
L. Fortuna et al. / Role of Topology in Complex Neural Networks
Neurons in presence of noise In this section several network topologies of FH neurons are analyzed in presence of noise. In the case in which the neurons are not coupled [3] a low sensitivity of large summing networks with respect to the noise level emerges. This phenomenon is referred to as stochastic resonance without tuning. However, the hypothesis of uncoupled neurons is far from the biological case of neuron assemblies, where coupling is the natural solution to enhance information exchange and robustness within the network. For this reason we investigate the emergence of stochastic resonance without tuning in the case of regular networks. Architectures of globally coupled neurons show different behaviours according to the value of the coupling strength. Performances observed by using a low coupling value J=0.01 are reported in Fig. 1. By increasing the value of the coupling to J=0.03 as shown in Fig. 2, the index assumes higher values in spite of a reduction of the range of noise leading to stochastic resonance. This phenomenon is due to the fact that the coupling strength reflects the amount of information exchanged among neurons that leads to collective behavior. As more information is exchanged, the coherence of the response to the stimulus is enhanced, thus leading to higher values of the index. On the other hand, stochastic resonance without tuning is a prerogative of the presence of many sources of noise acting on independent neurons. Thus, the attitude to synchronization, that is higher for increasing values of the coupling, has the effect of enhancing the performance of the system and its sensitivity to the noise. In fact, the minimum noise level detectable is higher in the case of J=0.03.
Figure 1. Value of the index C for a population of globally coupled FHN neurons (J=0.01)
L. Fortuna et al. / Role of Topology in Complex Neural Networks
237
Figure 2. Value of the index C for a population of globally coupled FHN neurons (J=0.03)
Further simulations have been carried out by keeping constant the value of the coupling strength and investigating the effect of increasing the neighborhood radius starting from the case of nearest neighbor coupling (i.e. r=1). In Fig. 3 three cases are reported: r=1, r=3 and r=5. The performance increases according to increasing values of the radius r. It has been verified that starting from a given value of r the performance does not significantly increase.
Figure 3. Value of the index C for a population of locally coupled FHN neurons with r=1, r=3 and r=5.
Regular architectures, so far investigated, represent an abstraction of real neuron assemblies, but allow focusing on the dynamical features of the population in a simple connection pattern. On the opposite side, random networks are characterized by asymmetric structures and capture in an idealized way the features of many real systems. These networks have a short average path length which derives from the fact that, starting from each node, any other node of the network can be reached in a small number of links [4]. Many real networks (as for instance social networks) have a short average path length, but at the same time show a high clustering degree due to the presence of both short-range and long-range links. In order to model these systems Strogatz and Watts
238
L. Fortuna et al. / Role of Topology in Complex Neural Networks
introduced the concept of small-world networks that successfully captures, for instance, the essential features of the neuronal system of the C. Elegans [4]. Small-world networks can be built starting from a network of locally coupled neurons and replacing some links with new random ones with probability p. By increasing the probability p the architectures of the neuron population is tuned between the two extremes, regular and random networks. Small-world networks are characterized by low values of the probability p. We considered small-world networks as random networks and furthermore we remove the hypothesis that all the neurons are identical. In fact, in recent works [7] it has been demonstrated that parametric and structural dissymmetries may enhance the synchronization properties of spatially extended networks. For example, it has been shown how an array of slightly different pendula (or Josephson Junctions) organizes itself, while a disordered behavior is observed in an array of identical elements. Moreover, dissymmetries obtained by using deterministic processes have been compared with random ones. These deterministic dissymmetries are obtained starting from a chaotic process and exploiting the noise-like spreadband spectrum of a chaotic signal. A chaotic state variable is sampled to generate a sequence of unpredictable values. Such a sequence can be used to introduce dissymmetry in a system in two main ways: parameter and structural dissymmetries. In a large number of cases the introduction of a deterministic fluctuation in the parameters of the units constituting the complex system has lead to an improvement of the self-organizing properties. On the other side deterministic sequences may also be used to generate the structure of connections in a small-world topology. Both cases have been investigated in FHN populations. The results are summarized in Fig. 4, where are also reported the cases of uncoupled neurons (i.e. the best case as regards the range of suitable noise levels) and random connections (i.e. the best case as regards the maximum level of stimulus response coherence). The introduction of structural dissymmetries has several advantages: with respect to the case of random connections, it increases the range of suitable noise and mantains a high value of the maximum level of C. But the improvement given by the introduction of an unhomogeneity in the parameters of each single neuron is even greater. When non identical FHN neurons arranged in a small-world network (regulated by a deterministic structural dissymmetry) are taken into account, the network behaves optimally both with respect to the maximum level of C and to the range of suitable noise. In particular, the results have been obtained by considering A as the parameter accounting for the unhomogeneity of the neuron population and letting it variable in the range of [0.01,0.01], while the chaotic sequence has been generated by using the peak-to-peak dynamics of the variable x of a Double Scroll Chua's Attractor. From the analysis of networks of noisy neurons, we can conclude that systems behave in a very different way. For example, when uncoupled neurons are considered, the network exhibits stochastic resonance without tuning. The same behavior occurs in network of locally coupled neurons only if the coupling strength is low. On the other hand, increasing the coupling of regular topologies enhances the performance of the network and decreases the minimum value of noise leading to stochastic resonance. 
Moreover, the overall range of suitable noise level to observe stochastic resonance is reduced. Neuron populations with random connections are those characterized by the highest values of the stimulus response coherence. The simulation results lead to the conclusion that connections improve the performance of the system by increasing the
L. Fortuna et al. / Role of Topology in Complex Neural Networks
239
information exchange among the neurons, but need a tuning of the noise level. When non identical FHN neurons arranged in a small-world network (regulated by a deterministic structural dissymmetry) are taken into account, the network behaves optimally with respect to both the aspects of the stimulus response.
Figure 4. Comparison between different architectures of FHN populations
Conclusions In this paper networks of noisy neurons have been analyzed from the viewpoint of stochastic resonance. Several topologies from regularly connected to random connected networks have been studied. These systems behave in a very different way. For example, when uncoupled neurons are considered, the network exhibits stochastic resonance without tuning. The same behavior occurs in network of locally coupled neurons only if the coupling strength is low. On the other hand, increasing the coupling of regular topologies enhances the performance of the network and decreases the minimum value of noise leading to stochastic resonance. Moreover, the overall range of suitable noise level to observe stochastic resonance is reduced. Neuron populations with random connections are those characterized by the highest values of the stimulus response coherence. The simulation results lead to the conclusion that connections improve the performance of the system by increasing the information exchange among the neurons, but need a tuning of the noise level. This would suggest that to obtain the best performance, a trade-off between these two aspects of the stimulus response in neurons (stimulus response coherence and noise sensitivity) is mandatory. However, when nonidentical FHN neurons arranged in a small-world network (regulated by a deterministic structural dissymmetry) are taken into account, the network behaves optimally with respect to both aspects of the stimulus response. This configuration thus can be considered the best architecture of FHN neurons in a noisy environment.
References [1] R. Benzi, A. Sutera, A. Vulpiani. The mechanism of stochastic resonance. In J. Phys. A, 14, pp. L453, 1981.
240
L. Fortuna et al. / Role of Topology in Complex Neural Networks
[2] L. Gammaitoni, P. Hnggi, P. Jung, F. Marchesoni. Stochastic resonance. In , 70, 23, 1998. [3] J. J. Collins, C. C. Chow, T. T. Imhoff. Stochastic resonance without tuning. In Nature, 376, pp. 236-238, 1995 [4] S. H. Strogatz. Exploring complex network. In Nature, 410, 2001 [5] M. A. L. Hodgkin, A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. In J. Physiol. Lond, 117, pp. 500-544, 1952 [6] M. La Rosa, M. I. Rabinovich, R. Huerta, H. D. I. Abarbanel, L. Fortuna. Slow regularization through chaotic oscillation transfer in an unidirectional chain of Hindmarsh-Rose models. In Physical Letter A, 266, pp. 88-93, 2000 [7] {Bucolo M., Caponetto R., Fortuna L., Frasca M., Rizzo A., Does chaos works better than noise?, CAS Magazine, 3 (2002), 4-19.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-241
241
Non-Iterative Imaging Method for Electrical Resistance Tomography Flavio CALVANOa, Guglielmo RUBINACCIa and Antonello TAMBURRINO b,1 a Ass. EURATOM/ENEA/CREATE, DIEL, Università di Napoli Federico II, Italy b Ass. EURATOM/ENEA/CREATE, DAEIMI, Università di Cassino, Italy
Abstract. Electrical Resistance Tomography (ERT) is a body of methods and techniques aimed to reconstruct the spatial distribution of the resistivity of a material starting from the knowledge of boundary measurements such as, for instance, the Neumann-to-Dirichlet map. This inverse problem is ill-posed and nonlinear and, therefore, its solution require a considerable computational effort. In this paper we discuss a fast non-iterative reconstruction method for locating inclusions in an otherwise homogeneous material. This method, potentially, is a candidate for near real-time applications. Keywords. Electrical resistance tomography, non iterative imaging methods, inverse problems, non-destructive testing.
Introduction This paper is focused on the Electrical Resistance Tomography (ERT) to detect inclusions in conducting materials by non-iterative methods. Non-iterative methods [16] have attracted a lot of interest because they provide a test for evaluating if a point (or a subregion) of the conducting domain is part or not of the anomaly, regardless other points (or subregions). The test is usually very cheap from the computational viewpoint. On the other hand iterative methods, representing the most common approaches to inverse problems, update iteratively the current estimate of the spatial distribution of the resistivity. At each step of the algorithm at least one forward problem has to be solved thus the computational cost is an issue and, moreover, the solution can be trapped in false solutions. In ERT a major role is played by the Neumann-to-Dirichlet map that is the operator ȁ mapping the boundary (applied) currents f into the boundary (measured) voltages u w: , i.e. / : f o u w: where: 1 ° K u ® 1 ° ¯K wu / wX
0 f on w:
: is the conducting domain under investigation and Q is the outward normal on w:.
1
Corresponding Author.
(1)
242
F. Calvano et al. / Non-Iterative Imaging Method for Electrical Resistance Tomography
It is well known (see [7]±[11]) that the inverse problem of reconstructing the resistivity Șfrom the knowledge of ȁ has an unique solution when the resistivity satisfies some assumptions. In particular, here we face the reconstruction of piecewise smooth resistivities. In addition, we assume that the resistivity of the background, as well as of each inclusion, is constant. The union of all inclusion is the set B:. In this paper we describe a fast and non-iterative methods together with numerical examples showing the related performances. The method, introduced in [1] and [2], has been originally developed for two phase materials. In this paper, we extended it to treat anomalies having three or more phases.
1. A non-iterative method: Monotonicity Imaging Method The Monotonicity Imaging Method (MIM) has been developed by the authors in [1] and [2]. The method relies on the idea that inclusions with higher electrical resistivity increase the Neumann-to-Dirichlet map in some sense. In particular, it is possible to demonstrate the following monotonicity for the Neumann-to-Dirichlet map [12]:
B1 B2 / 2 /1 t 0
(2)
where t means positive semi-definite, ȁk is the Neumann-to-Dirichlet map corresponding to an anomaly occupying the region Bk and the electrical resistivity in Bk is Ka, higher than the resistivity of the host material K0. The inversion algorithm (see [1, 2]) is based on:
/ / test t 0 false : test B
(3)
that follows directly from (2), where ȍtest is the support of a generic test anomaly (electrical resistivity equal to Ka in ȍtest and K0 in ȍ\ ȍtest) used to reconstruct the unknown anomaly B. The imaging method checks (8) for different trial subsets ȍtest, for instance those obtained by partitioning the domain : in non-overlapped subregions (see figure 1).
Figure 1. Subdivision of the domain :in elementary subdomains :k
F. Calvano et al. / Non-Iterative Imaging Method for Electrical Resistance Tomography
243
Then, the reconstruction is the union of the subsets that, through test (3), result to be included in B. Test (3) can be performed by evaluating the sign of the eigenvalues of ȁ-ȁtest. In order to avoid false detections due to the presence of noise that can alter the eigenvalues closer to zero, we associate to the test anomaly in the k-th subdomain ȍk the following quantity (sign index):
¦O
sk
k, j
/ Ok , j
(4)
j
where Ȝk,j are the eigenvalues of ȁ-ȁk and ȁk is the Neumann-to-Dirichlet map related to the test domain ȍk. The peaks in the spatial map given by 1/(1-sk) reveals the presence of the anomalies. The presence of anomalies having two or more values of resistivity poses some problems. Let us consider the case of two phases anomalies where the resistivity can be Ka and Kb in different regions of the domain :. The overall problem (considering the background also) consists in the imaging of a three phase material (resistivities Ka, Kb and K0). Therefore, it is rather natural to consider two different families of test anomalies. If, for instance, Ka>Kb>K0, then a first family of anomalies assuming values Kb in any single ȍk can be used to find the regions where there is an anomaly, either of type a or b, and a second family of anomalies assuming values Ka in any single ȍk can be used to find the regions where there is an anomaly of type a. On the contrary, when Ka>K0>Kb the above mentioned two families do not provide satisfactory results if applied in the straightforward manner. To get appropriate results, we found empirically that the matrix to be processed is not ȁ-ȁtest but, rather, ȁ-Fȁtest where c is a positive constant found experimentally. Preliminary results prove the effectiveness of the method for three phase materials where the resistivity of the anomalies may assume values higher or lower than the background resistivity.
2. Numerical examples In the following sections we show two examples from biomedical applications. The data are synthetic and, to avoid the inverse-crime, we added random noise. At the discrete level, i.e. when the Neumann-to-Dirichlet map is represented through a matrix, the noise model is as follows:
~ /i j
/ij Ni j [
(5)
~ where / is the noisy Neumann-to-Dirichlet map, / is the noiseless Neumann-toDirichlet map, N is a normalized noisy having entries uniformly distributed in (-G, +G) and [
max i , j / / 0 i, j .
244
F. Calvano et al. / Non-Iterative Imaging Method for Electrical Resistance Tomography
2.1. First numerical example The first example refers to the identification of a muscle in a fat background with resistivities of 4.386:/m and 2.2422:/m, respectively. The shape of the inclusion is shown in the figure 2, whereas the related reconstructions are shown in figure 3.
Figure 2. A rectangular inclusion aspect ratio of 1:2 in a disk.
Figure 3. Reconstructions obtained with the monotonicity method. Right: noise level G=0.001. Left: noise level G=0.01. The reconstructions are shown together with the elementary subdomains.
The shape represents the muscle in an extremely schematic and simplified manner. It is possible to notice that the reconstructions obtained with this method are satisfactory. 2.2. Second example The second example (see figure 4) concerns the reconstruction of the heart and two lungs with resistivities 2:/m and 0.83:/m respectively, in a diastolic configuration, with a background resistivities equal to 1:/m. It is worth noting that the three objects present resistivities either greater or smaller than those of the background. As discussed in Section 1, this has a major impact on MIM, originally developed for the imaging of two phases materials. In the following we describe the details of the approach used to apply MIM to this configuration. We stress that these are preliminary results and some theoretical issues have to be clarified. In any case, to treat a three phases problem (1:/m, 2:/m and 0.83:/m) we have introduced two families of test anomalies. A first family is for detecting the presence of the heart and is made by anomalies where the resistivity is equal to that of the background apart from one elementary subdomain where the values is that of the heart (2:/m). The second family differs from the first one in the value of the resistivity in the elementary subdomain, equal to that of the lungs (0.83:/m). Finally, the sign index (4) is computed on ȁ-Fȁtest where c is a constant found empirically. Numerical results are shown in figure 5.
F. Calvano et al. / Non-Iterative Imaging Method for Electrical Resistance Tomography
245
Figure 4. Two rectangular inclusions (lungs) and a square inclusion (heart) in a disk.
Figure 5. Top: reconstructions with 0.1% noise. Bottom: reconstructions with 1% noise. The identification of the lungs (left) and the heart (right) is carried out in two separate steps.
3. Conclusions A non-iterative imaging method for locating inclusions in a conductor by means of Electrical Resistance Tomography has been presented and extended from two phase problems to three phase problems. The main advantages of this imaging method are that no iterations are required and its related low computational cost. Numerical examples prove the effectiveness of the method as well as the effectiveness of its extension to three phase materials where the anomalies may assume a resistivity values either higher or lower than those of the background. This non-iterative method is a candidate for near real-time applications.
246
F. Calvano et al. / Non-Iterative Imaging Method for Electrical Resistance Tomography
References [1] A.Tamburrino and G. Rubinacci, "A new non-iterative inversion method for electrical impedance thomography", Inverse Problems, pp. 1809±1829, 2002. [2] $ 7DPEXUULQR * 5XELQDFFL ³)DVW PHWKRGV IRU TXDQWLWDWLYH HGG\ FXUUHQW WRPRJUDSK\ RI FRQGXFWLYH PDWHULDOV´IEEE Trans. on Magnetics, vol. 42, no. 8, pp. 2017-2028, August 2006 [3] M. Hanke and M. Bruhl, "Numerical implementation of two non iterative methods for locating inclusions by impedance thomography", Inverse Problems, pp. 1029±1042, 2000. [4] A Kirsch, ³Characterization of the shape of the scattering obstacle using the spectral data of the far field operator´ Inverse Problems, v. 14, pp. 1489±512, 1998. [5] M. Hanke and M. Bruhl, "Recent progress in electrical impedance tomography", Inverse Problems, pp. S65±S90, 2003. [6] A. J. Devaney, "Super-resolution processing of multi-static data using time reversal and MUSIC", J. Acoust. Soc. Am, 2004. [7] R. Kohn and M. Vogelius, "Determining conductivity by boundary measurements", Commun. Pure Appl.Math., pp. 289±98, 1984. [8] J. Sylvester and G. Uhlmann "Global uniqueness theorem for an inverse boundary value problem", Ann. Math., pp. 153±69, 1987. [9] A. Nachman, "Reconstruction from boundary measurements", Ann. Math., pp. 531±76, 1988. [10] A. Nachman, "Global uniqueness for a two-dimensional inverse boundary value problem", Ann. Math., pp. 71±96, 1995. [11] V. Isakov, "Uniqueness and stability in multi-dimensional inverse problems", Inverse Problems, pp. 579±621, 1993. [12] G. Gisser, D. Isaacson and J. C. Newell, "Electric current computed tomography and eigenvalues", SIAM J. of applied Mathematics, pp. 1623-1634, 1990.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-247
247
Role of temporally asymmetric synaptic plasticity to memorize group-synchronous patterns of neural activity Silvia SCARPETTA a,1 , Ferdinando GIACCO a , Maria MARINARO a , a Dipartimento di Fisica ”E R Caianiello”, Universita di Salerno, CNISM, INFM and INFN Gruppo coll. Salerno, Italia Abstract. Many experimental results have generated renewed appreciation that precise temporal synchronization, and synchronized oscillatory activity in distributed groups of neurons, may play a fundamental role in perception, memory and sensory computation, especially to encode relationship and increase saliency. Here we investigate how precise temporal synchronization of groups of neurons can be memorized as attractors of the network dynamics. Multiple patterns, each corresponding to different groups of synchronized oscillatory activity, are encoded using a temporally asymmetric learning rule inspired to the spike-timingdependent plasticity recently observed in cortical area. In this paper we compare the results previously obtained for phase-locked oscillation in the random phases hypothesis, to the case of patterns with synchronous subgroups of neurons, each pattern having neurons with only Q = 4 possible values of the phase. Keywords. Synchrony, spatio-temporal patterns, associative memory, cortical dynamics
Introduction Recent advances in brain research have generated renewed awareness and appreciation that the brain operates as a complex nonlinear dynamic system, and synchronous oscillations may play a crucial role in information processing, such as feature grouping and saliency enhancing [1,2,3]. Precise temporal synchronization, that exploits the precise temporal relations between the discharges of neurons, may be an effective strategy to encode information [1]. Many results led to the conjecture that synchronized and phase locked oscillatory neural activity play a fundamental role in perception, memory, and sensory computation [4,5]. Recent experimental findings further underlined the importance of synchrony and precise temporal relationship of the dynamics by showing that long term 1 Corresponding Author: Silvia Scarpetta, Dipartimento di Fisica ”E R Caianiello”, Universita di Salerno, CNISM, INFM and INFN Gruppo coll. Salerno, Italia; E-mail: e-mail: [email protected].
248
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
changes in synaptic strengths depend on the precise relative timing of pre- and post-synaptic firing [6,9,8,7,10]. Many oscillatory models has been proposed for many areas of the brain, from coupled oscillators in a phase-only representation (see, among others, [11, 12,13,14,15] and references therein), to reduced models of coupled Excitatory and Inhibitory (E-I) neurons [16,20,18,22], and many experimental features have been accounted by these models. In previous papers [17,18], we showed how a temporally asymmetric learning rule introduced in [20,19,21], inspired on the STDP results, is able to encode multiple phase-locked oscillatory patterns as attractors of the network dynamics, such that these spatiotemporal oscillatory patterns are replayed spontaneously, depending from initial conditions. However this work, as many other oscillatory models, assume that phases of the oscillatory population are chosen randomly in [0, 2π). In [17] we analyzed the case of patterns in which each neuron had a phase extracted randomly in [0, 2π), that is the analogous of the random pattern hypothesis of the Hopfield model. In this case, the overlap among patterns is small in the thermodynamic limit (i.e. when number of neurons is large with respect to the number of encoded patterns), the dynamics can be studied analytically, and the learning rule is able to encode the patterns as memory states. In this paper, we study the case of group-synchronous oscillations, in which there are synchronous subpopulations of neurons, i.e. patterns in which phases have only Q = 4 possible values, φ = 2πq/Q with q = 0, 1, 2, 3. Therefore there are four groups of synchronous neurons, and each group is phase-shifted with respect with other groups. The pattern is defined specifying which neurons belong to each of the four subpopulations. We show that, even without the random-phases hypothesis, the dynamics can be studied analytically, and, under some conditions, the learning rule is able to encode the patterns as memory states. However in this case of synchronous groups, we find that the relationship between the frequency of recall and the learning window is different from the one that we had when all phases of the patterns are randomly chosen in [0, 2π). Morevoer some learning window shapes, which allowed the encodeing and recall of patterns in the random phases case, are not suitable in the case of group-synchronus patterns, and a certain degree of temporal asymmetry in the learning window shape is needed to encode and recall properly the group-synchronous patterns.
1. The model The model is based on the one introduced in [20,17]. We review its main ingredients here. The dynamic evolution of unit xi is ⎛ d xi = −xi + F ⎝ dt
j
⎞ Jij xj ⎠ ,
(1)
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
249
where the transfer function F (h) denotes the nonlinear input-output relationship of neurons, and Jij is the connection after the learning procedure. In the following analysis we will consider the transfer function F (h) to be the Heaviside function F (h) = 1 if h > 0, F (h) = 0 otherwise, while in the numerical simulations we will use a continuous function F (h) = 1 + tanh(βh) with large β. As usual in many models [20], learning is assumed to occur in a first phase in which neural plasticity is turned on, the network is clamped to external teaching signals and the learning rule is applied. After learning, the dynamics of the network of N coupled nonlinear units is analyzed, looking for the analytical conditions under which the learned patterns are dynamical attractors of the dynamics. The proposed learning procedure is described in sec (1.1) Clearly, spontaneous activity dynamics of the coupled nonlinear system is determined by the shape of the coupling matrix Jij . We also investigate here the relation between the time scale of encoded pattern, the time scale of retrieval replay and the shape of the learning window. After encoding a finite number of periodic spatiotemporal patterns during the learning mode, we are able to derive the order parameter dynamics of the networks in the recall mode. If the learning rule proposed in the following is used, with a proper shape of the learning window, then the network is able to memorize the encoded patterns and replay them selectively. We consider P periodic patterns (μ = 1, ..., P , j = 1, ..., N ) Pjμ (t) =
cos(ωμ t − φμj ) + 1 2
(2)
characterized by unitary amplitudes, frequency ωμ , and phase φμj on unit j. We can describe such an oscillation pattern using the equivalent notation Pjμ (t) =
1 1 + ξjμ e−iωμ t + c.c. , 4
(3)
where c.c. denotes the complex conjugate, and we denote with ξiμ = exp(iφμi )
(4)
the pattern of phase shifts of the encoded oscillatory pattern μ. Previously [17,18] we have studied the case in which the phases φμj were chosen randomly in [0, 2π) (so called random phases case). Here we study the case in which encoded patterns have phases with only Q = 4 possible values, namely φμj = qjμ
2π Q
(5)
where qjμ is an integer number chosen randomly in {0, 1, 2, 3}. This correspond to synchronous activity in four distributed groups of neurons, and each pattern have a different composition of the four synchronous groups. In Fig. 1a,b, we show the phases φμi of P = 2 patterns encoded in a network of N = 40 neurons. When the learning rule proposed in the following is used to encode these partially-synchronous patterns, as defined in Eq. (2-5), into the connections Jij ,
250
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
the network memorizes these patterns as attractors of the dynamics. Numerical simulations show indeed that, when P N , initial condition on the state of the network can force the network to replay one or the other of the encoded patterns, in the sense that the phases of the patterns are retrieved: looking at spontaneous dynamics of the network we see that a pattern is retrieved in the sense that the indices of the neurons which participate to each of the 4 synchronous group are preserved. If the initial state of the network have enough overlap with pattern μ, then the phase relationship of the spontaneous activity of neurons preserve the phase relationship of pattern μ. In fig 1c it is shown the spontaneous dynamics of the network, when the P = 2 patterns shown in fig.1ab have been memorized into the connectivity matrix Ji j. After learning, we carry out the numerical integration of the dynamics (1), under the initial condition xi (0) = Piμ (0) + ηi (where ηi is a white noise extracted randomly between −0.3 and 0.3). Figure 1c shows the phase relationship of spontaneous dynamics of the network (after the initial transient), in the case in which initial conditions have a large overlap (similarity) with pattern μ = 1, i.e. under initial conditions xi (0) = Pi1 (0) + ηi . We see that initial condition forces the network to retrieve the pattern μ = 1. The behavior of xi (t) in this numerical simulation is shown in Fig, 1(d) (only 8 out of the N = 4096 neurons are shown). Note that while the wave forms of xi in Fig. 1(d) are different from the simple sinusoidal Piμ (t), the phase shifts of encoded pattern in fig. 1a are well reproduced in the activity pattern 1c. In this sense, the initial condition xi (0) near Piμ (0) leads to retrieval (i.e. replay) of pattern P μ , since the activity preserves the encoded phase relationship among units. The same set of synchronous units of the encoded pattern is observed during replay, but the replay occurs at a different oscillation frequency. Depending from the shape of learning window A(t) used to learn Jij , the replay can be faster or slower than the encoded frequency. To understand analytically this behavior, and the relation among the two frequencies, we define, as in [18,17], the overlap of the network output xi (t) with the pattern μ as the scalar product mμ =
1 μ ξi xi (t). N i
(6)
The order parameters mμ are complex-valued, and exhibit periodic orbits in the complex plain. The amplitude of oscillation of the order parameters is highest when the pattern is perfectly replayed, and tends to zero for not replayed patterns. Replay frequency ω ¯ takes a different value from encoded pattern frequency ωμ , since replay occurs at a time scale different from encoded patterns. In contrast with the case in which phases are randomly distributed in [0, 2π), studied in [17], here the module |mμ | of the order parameter is not constant in time, but have oscillations around a mean value, with a period four times smaller then the period of the real and imaginary part of mμ (see fig 2).
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
251
1.1. Spike-Timing-Dependent Plasticity and the proposed temporal asymmetric learning rule Plasticity of synaptic connections is regarded as a cellular basis for the developmental and learning-related changes in the central nervous system. In neocortical and hippocampal pyramidal cells, it has been found [6,9,8,7,10] that the synaptic strength increases (long-term potentiation (LTP)) or decreases (long-term depression (LTD)), whether the presynaptic spike precedes or follows the postsynaptic one by few milliseconds, with a degree of change that depends from the delay between pre and post-synaptic activity via a learning window that is temporally asymmetric. This form of synaptic plasticity was also employed in a model for auditory localization by relative timing of spikes from two ears [24,25]. The computational role and functional implications of this type of plasticity, also called Spike Timing Dependent Plasticity (STDP), have been explored recently from many points of view (see for example [27,25,26,30,28,29] and reference therein). STDP has also been hypotized to play a role in the hippocampus phase precession phenomenon [32,33,34], even thou other explanations has also been proposed for this phenomena (see [35] and references therein). Here we analyze the behavior of the simple model of Eq. (1), when we use the asymmetric time-dependent learning rule proposed in [21], and investigated in [20,17,18], inspired to the experimental findings cited above. Briefly, the learning rule can be formulated as follows: 1 δJij (t) = NT
*T
*∞ dt
0
dτ xi (t + τ )A(τ )xj (t)
(7)
−∞
for synaptic weight Jij , where xj (t) and xi (t) are the pre- and post-synaptic activities, and we have explicitly added the conventional normalization factor 1/N for convenience in doing the mean field calculations. Activity-dependent modification of synaptic strengths due to the proposed STDP-based asymmetric learning rule in Eq. (7), is sensitive to correlations between pre- and postsynaptic firing over timescales given by the range of the window A(τ ), that typically is tens of ms. The kernel A(τ ) is the measure of the strength of synaptic change when there is a time delay τ between pre- and postsynaptic activity. Writing Eq. (7), implicitly we have assumed that the effects of separate spike pairs due to STDP sum linearly. However note that many non-linear effects have been observed [36,37,29], therefore our model holds only in those case where linear summation is a good approximation. Note that Eq. (7) reduces to the conventional Hebbian one (used, e.g., in [31]), +T δCij ∝ dt xi (t)xj (t), when A(τ ) ∝ δ(τ ). However, to model the experimental 0
results of STDP such as [7,8], the kernel A(τ ) should be an asymmetric function of τ , mainly positive (LTP) for τ > 0 and mainly negative (LTD) for τ < 0. The shape of A(τ ) strongly affect Jij the dynamics of the networks, as discussed in the following.
252
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
Let’s consider we want to encode P patterns given by Eq. (2-5). So we present an input Pjμ (t) to our network and we apply our learning rule (7), In the brain, cholinergic modulation can affect the strengths of long-range connections; these are apparently reduced strongly during learning [23]. In our model we therefore make the simplifying assumption that connections Jij are ineffective during learning (while they are plastic and learn their new value), and the dynamics is driven by the input Pjμ (t), so that xj (t) ∝ Pjμ (t). If there is a balance between total inhibition and excitation such that *∞ ˜ dτ A(τ ) = A(0) =0
(8)
−∞
then we obtain μ ˜ μ ) ξ μ ξ μ∗ ] = |aμ |cos(φμ − φμ + ψ μ ) = Re[A(ω Jij i j i j
(9)
where φμi = qiμ 2π/Q and qiμ ∈ {0, 1, 2, 3}, and ˜ μ ) = aμ ≡ A(ω
*
∞
−∞
dτ A(τ )e−iωμ τ = |aμ |eiψ
μ
(10)
˜ μ ) can be thought of as an is the Fourier Transform of the learning window. A(ω effective learning rate at the frequency ωμ . So, for examples all neurons i and j that are synchronous in the patterns that is encoded (qiμ = qjμ ), have a synaptic ˜ μ )), while the neurons which belong to two connection Jij ∝ |aμ |cos(ψ) = Re(A(ω different shifted sub-populations has a synaptic connection Jij ∈ |aμ |cos(ψ + q2π Q ), where q = 0 is the shift. As shown in the following, the parameter ψ μ in Eq. (10), related to the ratio ˜ μ ), will set frequency of the internal between the real and the imaginary part of A(ω mode of the system, i.e. the replay frequency. In order to learn multiple patterns ξ μ , μ = 1, 2, ..., P , we set Jij as the sum of contributions from individual patterns like Eqs. (7). Applying (9) to P patterns, we get the following learning rule: Jij =
P μ=1
μ Jij =
P P 1 ˜ μ )ξ μ ξ μ∗ = 1 Re A(ω |aμ |cos(φμi − φμj + ψ μ ) i j N μ=1 N μ=1
(11) If there not a perfect balance between inhibition and excitation, then Jij = where
P 1 ˜ μ )ξ μ ξ μ∗ + bP/N Re A(ω i j N μ=1
(12)
253
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
*∞ b=2
A(t)dt
(13)
−∞
The term b is zero, and the results (11, 9) hold, only when there is balance between inhibition and excitation in learning rule as in Eq. (8), or also in the case of patterns such as Pjμ (t) = cos(ωμ t − φμj )) = 1/2ξjμ e−iωμ t + c.c., with a balance between positive and negative values. We will study the case of balance b = 0, i.e. eq. (8,11), where patterns are given by eq. (2-5). 1.2. Order parameter Dynamics Let’s consider the network in Eq. (1) whose connections Jij encode P partially synchronous patterns given by Eq. (2-5), such that after the learning stage, the connection are given by Eq. (11). Starting from Eq. (1), and from the definition of order parameter in Eq. (6), mμ =
j
ξjμ xj (t) =
we can rewrite the local field hi = hi =
ν
j
μ
e(iqj 2π/Q) xj (t)
(14)
j
Jij xj in the form
˜ ν )ξiν mν∗ . Re A(ω
(15)
and the following equation, where Q = 4, for the order parameters mμ , holds Q 1 P d μ μ m = −m + ( ) ... dt Q 1 q =1 Q q P =1
e(iq
μ
2π/Q)
F
˜ ν )e(iq Re A(ω
ν
2π/Q)
mν∗
(16)
ν
μ
N Here we use the fact that the sum j=1 e(iqj 2π/Q) becomes a sum over the 4
Q μ possible values of q, qμ =1 e(iq 2π/Q) , since qiμ was extracted randomly and the number of neurons N tends to infinity in the thermodynamic limit. Note that these differential equations are different from the case investigated in [17], where the hypothesis of random phases distribution ψ ∈ [0, 2π) was used to investigate the dynamics. Here, we will investigate the case in which the number of neurons is much larger then the number of encoded patterns, √ the crosstalk
N >> P , such that term is negligible, i.e. overlap 1/N μ=ν i ξiμ ξiν is order O(1/ N ) and tends to zero when N tends to infinity, with P finite. Let’s make the ansatz that one of the encoded pattern is retrieved, let’s say μ = 1, and therefore
254
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
a: phases of pattern μ = 1
c: phases during recall of pattern 1
b: phases of pattern μ = 2
d: neurons activity xi (t) during recall 1
Figure 1. In this numerical simulation of the network in eq. (1), two oscillatory patterns P μ (μ = 1,2), each formed of 4 groups of synchronous neurons (Q=4), and same frequency, have been memorized in a network of N=4096 units according to the proposed learning rule (11), with |aμ | = 1 and ψ μ = 1.1π/4 for both patterns. The phase relationship of the two patterns are shown in subplots (a) and (b) (only 40 of the 4096 units are shown). Phases φμ i of the 40 neurons in pattern μ = 1 are shown in (a), neurons are ordered according with their phases in pattern 1, and the same 40 neurons used in (a) are shown (b) where phases of pattern μ = 2 is plotted. The numerical integration of equation (1) are computed with the transfer function F given by F (h) = (1 + tanh(βh))/2 with β = 90, and initial condition set to xi (0) = Piμ (0) + η with μ = 1. This induces the retrieval (replay) of pattern μ = 1. Indeed, the phase relationship of neurons during spontaneous dynamics, shown in (c), is the same phase relationship of pattern μ = 1. (d) The behaviour of xi (t) is plotted as a function of time only for selected 8 neurons (i = 1,2, 11,12,21,22,31,32).
m1 = m mμ = 0 f or μ = 1
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
255
a: order parameter mμ , μ = 1 during recall 1 b: order parameter mμ , μ = 2 during recall 1
Figure 2. The time evolution of order parameter mμ , defined in eq. (6), with μ = 1 (a) and μ = 2 (b) , is plotted as a function of time, during the numerical simulations of the network shown in fig 1cd (retrieval of pattern μ = 1). The pink line is the module |mμ |, while the red and blue dotted lines are the real and immaginary part of mμ , i.e. |mμ |cos(θμ ) and |mμ |sin(θμ ) respectively. The Real and the Immaginary part of mμ oscillate in time, and the module |mμ | also oscillate but with 4-times lower period. In contrast with the results of [0, 2π)-phases case studied in [17], here the module is not constant in time. Order parameter mμ with μ = 2 is much lower then mμ with μ = 1, since initial condition set the retrieval of pattern μ = 1, and therefore simularity with pattern μ = 2 is negligible in the limit of large network (here N=4096).
Using the polar notation m = |m|iθ , we get: d |m| = −|m| + (1/4)[F (X) − F (−X)]cos(θ) + (1/4)[F (Y ) − F (−Y )]sin(θ) dt d |m| θ = (1/4)[F (X) − F (−X)]sin(θ) + (1/4)[F (Y ) − F (−Y )]cos(θ) dt (17) where X = |a||m|cos(θ−ψ) Y = |a||m|sin(θ−ψ). Notably, the dynamical behavior is substantially different depending from the shape of the activation function F (h). We will investigate analytically the case of F (h) to be the Heaviside function, F (h) = 1 0
if
h > 0,
otherwise.
(18)
On the contrary, numerically we will investigate, the case in which F (h) is the continuous function F (h) = (1 + tanh(βh))/2 for different values of β. Notably, when F (h) is the Heaviside step function, the terms F (X) − F (−X) and F (Y ) − F (−Y ) are 0 or 1 depending from the value of θ − ψ, and therefore they are constant in each quadrant of θ − ψ; for example in the first quadrant, i.e. 0 < θ − ψ < π/2, it’s F (X) − F (−X) = 1. Exploiting this property, we can rewrite the
256
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
a
b
Figure 3. The picture shows analitical solutions of eqs.(22,21). We show in subplot a and b, respectively, the module of the order parameter |m| and the output frequency, evaluated as the inverse of the period T, where T is the time that θ in eq (21) takes to make an entire cycle across the 4 quadrants. Order parameter module and frequency are shown as a function of ψ. As expected, some values of ψ (|ψ| < π/4) are not allowed.
coupled nonlinear order parameter equations Eq. (17) in a form that is exactly solvable in each quadrant: d d |m| = −|m| − M1 f (M2 θ) dt dθ d |m| θ = M1 f (M2 θ) dt
(19) (20)
where f (θ) = 1/4(cos(θ) − sin(θ)), and M1 = 1 in the first and second quadrant of (θ − ψ), M1 = −1 in the third and fourth quadrant, while M2 = 1 in the first and third quadrant, and M2 = −1 in the second and fourth quadrant. Solving the Eq. (19,20), in each quadrant of θ − ψ, we get θ = atan(bn et + cn ) − M2 π/4
(21)
√ |m| = M1
2 e−t 4 bn cos(θ + M2 π/4)
(22)
where we remember that M1 , M2 are ±1 depending from the quadrant of θ − ψ, and bn , cn are arbitrary constant that have a different arbitrary value in each quadrant.
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
257
Figure 4. The output frequency, evaluated as the inverse of the period T of the module of the orP μ der parameter mμ = N i=1 ξi xi (t), during retrieval of pattern μ = 1, measured (a)numerically, from the numerical integration of eq(1), when 1 pattern was encoded, with F given by eq. (23) for different values of β = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 (blue stars), the larger stars correspond to β = 100; (b)analitically, from evaluation of the period of |mμ | in eq. (21,22) (red dots) where F (h) is the Heaviside function. The output frequency ω ¯ is plotted as a function of ψ. The black line is shown only for comparisons, and it’s the analitical result ω ¯ = tan(ψ) that holds only in the case of random phases ψ ∈ [0, 2π] studied in [17]. We see that there is good agreement between numerical simulations and analitical predictions, indeed for β large (big stars are β = 100) the numerical results (with F (h) = (1 + tanh(βh))/2) converge to the analitical results (evaluated when F(h) is the Heaviside function).
Equations (21,22) should be solved in each quadrant, the 8 arbitrary constants ( bn , cn , two for each quadrant) are determined when we impose that solutions |m|, θ are continuous functions in the separating points between one quadrant and the other, and initial conditions.
2. Results and Conclusions Let’s investigate order parameter dynamics, for different values of ψ. Analitical solutions in eq. (21,22) are shown in fig 3, where we plot, as a function of ψ, the module |m| and the frequency ω ¯ = 1/T where T is the time interval that θ takes to do a entire cycle of the 4 quadrants. Clearly the shape of learning window A(t) influence the value of ψ (see eq. (10)). We safely consider ψ ∈ [−π/2, π/2] since original equations for ψ and ψ + π are the same. Looking at order parameter solutions (21,22), in each quadrant, we see that not all values of ψ ∈ [−π/2, π/2] are allowed to get oscillating solutions |m|iθ in
258
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
which θ pass from one quadrant to the other when time passes. Indeed values of ψ in the interval [−π/4, π/4] makes solutions (21,22) to hold only in a empty interval of time. This means that not all learning windows A(τ ) are allowed, but ˜ μ ) = aeiψ have ψ satisfying: only those such that A(ω π/4 < ψ < π/2
− π/2 < ψ < −π/4,
and so only learning windows A(t) that are enough asymmetric in time. Indeed a learning window A(t) symmetric in time reversal would give ψ = 0, due to ˜ ImA(ω) = 0, for all ω. Therefore condition on ψ in eq. (23) need a learning window enough asymmetric in time. The need of such condition (23) on ψ is related to the choice of the Heaviside activation function. From solutions (21,22) we can estimate the period of the |m| and the period of the real and imaginary part of m = |m|eiθ as a function of ψ. We find that the period of real and imaginary part is 4-times the period of the module. Note that in the case of distributed phases in [o, 2π) studied previously the module |m|, after a short transient, goes to a constant, and the real and imaginary part oscillate 1 while here, after a small transient, around zero with a period T¯ = ω1¯ = tan(ψ) it oscillate with small amplitude around a non-zero values (with a period that is 1/4 of the period of the real part of m, as in fig 2), and the real and imaginary part of m oscillate around zero with a frequency different from tan(ψ), as shown in figure 4. The frequency of oscillation of the real part of the order parameter m, estimated from the Eq. (21,22) is shown in fig. 4 and compared with results of numerical simulations of eq.(1). Numerical simulation of eq. (1) with connection Jij given by eq. (11), and F (h) =
1 + tanh(βh) 2
(23)
tends to the analytical solutions eq.(21,22) when β is enough large (such as β = 100). Clearly there are difference when β is small (such as β = 30 ) since the analytical solution has been derived when activation function of the units is the step Heaviside function, and not in the case of eq.(23). The figure 4 shows the frequency of oscillation during replay (i.e. during retrieval), measured from the numerical solutions of eq. (1) with F given by eq. (23) for different values of β = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 (blue stars), and the results are compared with the frequency of oscillation during replay that comes analytically from eq. (21,22) (red dots) where F (h) is the Heaviside function. The black line is shown only for comparisons, and it’s the analytical result ω ¯ = tan(ψ) that we get in the case of random phases ψ ∈ [0, 2π] studied in [17]. In conclusion we have studied the ability of the learning rule (11) with asymmetric temporal learning window, to encode multiple group-synchronous patterns, in such a manner that selectively the network can retrieval one of the encoded patterns, depending from the initial conditions of the network. We can study analitically the order parameters when the activation function F (h) is the Heaviside function.
S. Scarpetta et al. / Role of Temporally Asymmetric Synaptic Plasticity
259
Clearly, the magniture of the order parameter |m| indicate how good the encoded pattern is retrieved, i.e. the similarity between the encoded phase relationship and the spontaneous activity of the network, when the net is initialized properly (close to the encoded phase relationship). From fig.4ab we see that not all values of ψ are able to encode and retrieve properly the patterns, since there are values (−π/4 < ψ < π/4) not allowed (where spontaneous activity is not oscillating, and the net get stacked in a fixed point), and values (ψ π/2) for which the order parameter |m| is too low, therefore the network activity is not enought similar to the pattern that we try to retrieve. Since ψ, as defined in eq. (10), is the argument of the Fourier Transform of the learning window A(t), evaluated at the encoding frequency, from fig. 4, we can conclude that in order to encode group-syncrnous patterns with the proposed learning rule, we need a shape A(t) with a certain degree of asymmetry, such that ψ is in the proper interval. The situation here is different from the case of random phases in [0, 2π) [17,18], since in that case a small asymmetry (small values of ψ) was good to get encoding and retrieval, even when the Heaviside function was used, while here with group-synchronous patterns, a larger asymmetry is needed (|ψ| > π/4) to avoid that the net falls in a static fixed point (activity not oscillating in time). Finally, we note that in this paper we have studied analitically the existence of the retrieval solutions, and the stability of such solutions has been checked only with numerical simulations, in the future the stability of the solutions will be investigated analitically, analogously to the work in [18], in order to get the analitical conditons on stability of multiple patterns, with attention to the case in which the encoded patterns have not all the same frequency.
References
[1] Singer, Neuronal Synchrony: A Versatile Code for the Definition of Relations, Neuron, 1999; Fries, Nikolic, Singer, Trends in Neurosciences 30(7) (2007) 309-316.
[2] Fries, A mechanism for cognitive dynamics: neuronal communication through neuronal coherence, Trends in Cognitive Sciences, 2005.
[3] G. Buzsaki, A. Draguhn, Science 304 (2004) 1926-1929.
[4] A. Gelperin, Olfactory Computations and Network Oscillation (Mini-Review), The Journal of Neuroscience 26(6) (2006) 1663-1668.
[5] Magee J.C. and Johnston D. (1997). A synaptically controlled associative signal for Hebbian plasticity in hippocampal neurons. Science 275, 209-212.
[6] Markram H., Lubke J., Frotscher M., Sakmann B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213.
[7] Bi G.Q. and Poo M.M. (1998). Precise spike timing determines the direction and extent of synaptic modifications in cultured hippocampal neurons. J. Neurosci. 18, 10464-10472.
[8] Bi G.Q. and Poo M.M. (2001). Annual Review of Neuroscience 24, 139-166.
[9] Debanne D., Gahwiler B.H. and Thompson S.M. (1998). Long-term synaptic plasticity between pairs of individual CA3 pyramidal cells in rat hippocampal slice cultures. J. Physiol. 507, 237-247.
[10] Feldman D.E. (2000). Timing-based LTP and LTD and vertical inputs to layer II/III pyramidal cells in rat barrel cortex. Neuron 27, 45-56.
[11] M.G. Kuzmina, E.A. Manykin, I.I. Surina, Lecture Notes in Computer Science 930 (1995) 246-251.
[12] Hoppensteadt F.C. and Izhikevich E.M. (1997). Weakly Connected Neural Networks. Springer-Verlag, New York.
[13] Izhikevich E.M. (1999). IEEE Transactions on Neural Networks 10, 508-526.
[14] Z. Yu, Y. Yamaguchi, Biol. Cybern. (2004); Y. Yamaguchi, J. Integrative Neuroscience 2(2) (2004) 143-157; Y. Yamaguchi, Biol. Cybern. 89 (2003) 19.
[15] Borisyuk R., Hoppensteadt F. (1999). Biol. Cybern. 81, 359-371.
[16] C. van Vreeswijk, D. Hansel, Neural Computation 13 (2001) 959-992.
[17] M. Yoshioka, S. Scarpetta, M. Marinaro, Phys. Rev. E 75 (2007) 051917.
[18] S. Scarpetta, M. Yoshioka, M. Marinaro, Encoding and replay of dynamic attractors with multiple frequencies, Lecture Notes in Computer Science 5286 (2008) 38-61.
[19] M. Marinaro, S. Scarpetta, Neurocomputing 58-60C (2004) 279-284.
[20] S. Scarpetta, L. Zhaoping, J. Hertz, Neural Computation 14(10) (2002) 2371-2396.
[21] S. Scarpetta, L. Zhaoping, J. Hertz, NIPS 2000, Vol. 13, T. Leen, T. Dietterich, V. Tresp (eds), MIT Press (2001).
[22] S. Scarpetta, L. Zhaoping, J. Hertz (2002), in Scaling and Disordered Systems, p. 292, Eds. F. Family, M. Daoud, H. Herrmann and E. Stanley, World Scientific Publishing.
[23] Hasselmo M.E. (1993). Acetylcholine and learning in a cortical associative memory, Neural Computation 5, 32-44.
[24] Hasselmo M.E. (1999). Neuromodulation: acetylcholine and memory consolidation, Trends in Cognitive Sciences 3(9), 351.
[25] Gerstner W., Kempter R., van Hemmen J.L., Wagner H., Nature 383, 1996.
[26] Kempter R., Gerstner W. and van Hemmen J.L. (1999). Hebbian learning and spiking neurons, Physical Review E 59(5), 4498-4514.
[27] Song S., Miller K.D., Abbott L.F. (2000). Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat. Neurosci. 3(9), 919-926.
[28] Rao R.P., Sejnowski T.J. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Comput. 13(10), 2221-2237.
[29] P.J. Drew and L.F. Abbott, Extending the effects of spike-timing-dependent plasticity to behavioral timescales, PNAS 2006.
[30] Wittenberg et al., J. Neurosci. 26, 6610 (2006).
[31] H. Abarbanel, R. Huerta, M.I. Rabinovich (2002). PNAS 99(15), 10132-10137.
[32] Li Z. and Hertz J. (2000). Odor recognition and segmentation by a model olfactory bulb and cortex, Network: Computation in Neural Systems 11, 83-102.
[33] S. Scarpetta, Z. Li, M. Marinaro, Hippocampus, 2004.
[34] Mehta et al., Neuron 25, 707, 2000.
[35] Florian R.V., Muresan R.C., Phase precession and recession with STDP and anti-STDP, Lecture Notes in Computer Science 4131 (2006) 718-727.
[36] K. Thurley, C. Leibold, A. Gundlfinger, D. Schmitz, R. Kempter, Phase precession through synaptic facilitation, Neural Computation 20(5) (2008) 1285-1324.
[37] Sjostrom P.J., Turrigiano G., Nelson S.B. (2001). Neuron 32, 1149-1164; Froemke R.C., Dan Y. (2002). Nature 416, 433-438.
[38] Zoltán Nádasdy, Hajime Hirase, András Czurkó, Jozsef Csicsvari, and György Buzsáki, Replay and time compression of recurring spike sequences in the hippocampus, The Journal of Neuroscience 19(21) (1999) 9497-9507.
[39] A. Lee, M. Wilson, Sequential experience in the hippocampus during slow wave sleep, Neuron, 2002.
[40] D.R. Euston, M. Tatsuno, B.L. McNaughton, Fast-forward playback of recent memory sequences in prefrontal cortex during sleep, Science, 2007.
[41] D.J. Foster, M.A. Wilson, Reverse replay of behavioural sequences in hippocampal place cells during the awake state, Nature, 2006.
Algorithms and topographic mapping for epileptic seizures recognition and prediction
N. MAMMONE a,1, F. LA FORESTA a, G. INUSO a, F.C. MORABITO a, U. AGUGLIA b, V. CIANCI b
a DIMET - "Mediterranea" University of Reggio Calabria, ITALY
b Epilepsy Regional Center, Università Magna Graecia di Catanzaro, Presidio Riuniti Reggio Calabria, ITALY
Abstract. Epileptic seizures seem to result from an abnormal synchronization of different areas of the brain, as if a kind of recruitment occurred from a critical area towards other areas of the brain, until the brain can no longer bear the extent of this recruitment and it triggers the seizure in order to reset this abnormal condition. In order to catch these recruitment phenomena, a technique based on entropy is introduced to study the synchronization of the electric activity of neuronal sources in the brain. This technique was tested over 25 EEG datasets from patients affected by absence seizures as well as on 40 EEG datasets from healthy subjects. The results show that an abnormal coupling among the electrodes that will be involved in seizure development can be hypothesized before the seizure itself; in particular, the frontal/temporal area appears steadily associated with an underlying high synchrony in absence seizure patients.
Keywords. Entropy, Electroencephalography, Epilepsy, Brain mapping.
1. Introduction
Epilepsy is one of the most common neurological disorders (it affects about 1% of the world's population). Two-thirds of patients can benefit from antiepileptic drugs and another 8% could benefit from surgery. However, the therapy causes side effects and surgery is not always resolving. No sufficient treatment is currently available for the remaining 25% of patients. The most disabling aspects of the disease lie in the sudden, unforeseen way in which the seizures arise, leading to a high risk of serious injury and a severe feeling of helplessness that has a strong impact on the everyday life of the patient. It is clear that a method capable of explaining and forecasting the occurrence of seizures could significantly improve the therapeutic possibilities, as well as the quality of life of epileptic patients. Epileptic seizures have been considered sudden and unpredictable events for centuries. A seizure occurs when a massive group of neurons in the cerebral cortex begins to discharge in a highly organized rhythmic pattern and then, for mostly unknown reasons, it develops according to some poorly described dynamics.
1 Corresponding Author: Nadia Mammone, DIMET, University of Reggio Calabria, Reggio Calabria, Italy; E-mail: [email protected].
Nowadays, there is an increasing amount of evidence that seizures might be predictable. In fact, as proved by the results reported by different research groups working on epilepsy, seizures appear not to be completely random and unpredictable events. Thus it is reasonable to wonder when, where and why these epileptogenic processes start up in the brain and how they result in a seizure. The goal of the scientific community is to predict and control epilepsy, but if we aim to control epilepsy we must understand it first: in fact, discovering the epileptogenic processes would throw a new light on this neurological disease. The cutting-edge view of epileptic seizures proposed in this paper would dramatically upset the standard approaches to epilepsy: seizures would no longer be considered the central point of the diagnostic analysis, but the entire "epileptogenic process" would be explored. The research in this field, in fact, has been focused only on epileptic seizures so far. In our opinion, as long as we focus on the seizure onset and then try to explain what happened before in a retrospective way, we will not be able to discover the epileptogenic processes and therefore to fully understand and control epilepsy, because seizures are only a partial aspect of a more general problem. Epileptic seizures seem to result from an abnormal synchronization of different areas of the brain, as if a kind of recruitment occurred from a critical area towards other areas of the brain (not necessarily the focus) until the brain can no longer bear the extent of this recruitment and it triggers the seizure in order to reset this abnormal condition. If this hypothesis is true, we are not allowed to consider the onset zone the sole triggering factor: the seizure appears to be triggered by a network phenomenon that can involve areas apparently not involved in seizure generation at a standard EEG visual inspection. Seizures had been considered unpredictable and sudden events until a few years ago. The scientific community began being interested in epileptic seizure prediction during the '70s: some results in the literature showed that seizures were likely to be a stage of a more general epileptogenic process rather than an unpredictable and sudden event. Therefore a new hypothesis was proposed: the evolution of brain dynamics towards seizures was assumed to follow the transition: inter-ictal → pre-ictal → ictal → post-ictal state. This emerging hypothesis is still under analysis and many studies have been carried out, most of them on intracranial electroencephalographic recordings (IEEG). The processes that start up in the brain and lead to seizure are nowadays mostly unknown: investigating these dynamics from the very beginning, that is minutes and even hours before seizure onset, may throw a new light on epilepsy and upset the standard diagnosis and treatment protocols. Many researchers have tried to estimate and localize epileptic sources in the brain, mainly analysing the ictal stage in the EEG [1], [2], [3]. If the aim is to detect and follow the epileptogenic processes, in other words to find patterns of epileptic source activation, a long-time continuous analysis (from both the spatial and the temporal point of view) of this output, in search of any information about the brain system, might help.
In order to understand the development of epileptic seizures, we should understand how this abnormal order affects the EEG over the cortex, over time. Thus the point is: how can we measure this order? Thanks to its features, entropy might be the answer, thus here we propose to introduce it to investigate the spatio-temporal distribution of order over the cortex. EEG brain topography is a technique that gives a picture of brain activity over the cortex. It consists of plotting the EEG in 2-D maps by color coding EEG features, most commonly the EEG power. EEG topography has been widely used as a tool for investigating the activity of epileptic brains, but just for analyzing sparse images and not for reconstructing a global picture
of the brain behaviour over time [1], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Here we propose to carry out a long-time continuous entropy topography in search of patterns of EEG behaviour. Once entropy is mapped, a spatio-temporal SOM-based clustering is carried out in order to put in the same cluster the electrodes that share similar entropy levels. The paper is organized as follows: Section 2 will describe the generation of electroencephalographic potentials from neuronal sources and the genesis of epileptic seizures, Section 3 will introduce entropy topography and spatial clustering and Section 4 will report the results.
2. The Electroencephalography and the neuronal sources
A single EEG electrode provides estimates of synaptic action averaged over tissue masses containing between roughly 100 million and 1 billion neurons. The space averaging of brain potentials resulting from extracranial recording is a data reduction process forced by current spreading in the head volume conductor. The connection between surface and depth events is thus intimately dependent on the physics of electric field behaviour in biological tissue. The synaptic inputs to a neuron are of two types: those that produce excitatory postsynaptic potentials (EPSP) across the membrane of the target neuron, thereby making it easier for the target neuron to fire an action potential, and the inhibitory postsynaptic potentials (IPSP), which act in the opposite manner on the output neuron. The cortex is believed to be the structure that generates most of the electric potential measured on the scalp. The synaptic action fields can be defined as the numbers of active excitatory and inhibitory synapses per unit volume of tissue at any given time. A seizure occurs when a massive group of neurons in the cerebral cortex begins to discharge in a highly organized rhythmic pattern (increased entrainment). According to one theory, seizures are caused by an imbalance between excitatory and inhibitory neurotransmitters. Electric and magnetic fields provide large-scale, short-time measures of the modulations of synaptic and action potential fields around their background levels. Dynamic brain behaviour is conjectured by many neuroscientists to result from the interaction of neurons and assemblies of neurons that form at multiple spatial scales: part of the dynamic behaviour at macroscopic scales may be measured by scalp EEG electrodes. The amplitude of the scalp potential depends strongly on the characteristic size of the underlying correlated source patches (amount of source synchronization). Quantitative methods to recognize spatio-temporal EEG patterns have been studied only sporadically. The technique proposed in this paper aims to describe such spatio-temporal patterns of EEG dynamics. Entropy will be proposed to measure the entrainment of neuronal sources, its topography will give a view of the spatial distribution of the entrainment and spatial clustering will quantify the mutual interactions among neurons.
3. Methodology
3.1. Entropy to measure "order" in the brain
Since we were interested in monitoring the degree of order of different areas of the brain, in this paper we proposed to map Renyi's entropy of the EEG [13] and we compared it
with the mapping of the power of the EEG. Entropy can be interpreted as a measure of order and randomness. Given a signal, we can think of its amplitude as a random variable X:

H_{R\alpha}(X) = \frac{1}{1-\alpha} \log \sum_i P^{\alpha}(X = a_i)    (1)

For a continuous random variable X, whose probability density function (pdf) is f_X(x), Renyi's entropy is defined as:

H_{R\alpha}(x) = \frac{1}{1-\alpha} \log \int_{-\infty}^{+\infty} f_X^{\alpha}(x)\, dx    (2)
where the order α is a free parameter (α > 0 and α ≠ 1) and P represents the probability. In the case of a variable with small randomness ("high order"), a few probabilities will be close to one (P(x = a_i) → 1) and most of the probabilities will be close to zero (P(x = a_i) → 0): the overall contribution to the entropy will be low, because the argument of the logarithm tends to 1. In the case of a variable with large randomness ("low order"), the probabilities will be uniformly distributed (P(x = a_i) → 1/N, ∀i = 1, ..., N) and the entropy will be high, because the argument of the logarithm tends to N^{1-α}, therefore H_{Rα}(x) → log N. The order α was set at 2 and the pdf was estimated using kernel estimators [13]. Entropy was estimated, for each channel, within 1-sec non-overlapping windows. We arranged the entropy time series of each channel as a row of a matrix (from now on named matrix X) whose t-th column represented the set of the entropy values associated with the n electrodes at a certain window t. The EEG power was estimated in the same way. For each window, we plotted a 2D map for EEG entropy and a 2D map for EEG power exploiting a function from the toolbox EEGLAB [14]. For each electrode, the corresponding value was plotted, encoding the values according to a continuous color scale going from blue (low values) to red (high values). The range of the scale was fixed, so that we could detect the overall variation over time as well as the local-in-time entropy spatial distribution. The colour of the spatial points lying between the electrodes is calculated by interpolation, thus a smooth gradation of colors is achieved. Every figure was captured as a frame of a movie and each frame was associated with the corresponding time, so that it was possible to keep the time information and therefore to identify the critical stages while reviewing the movie itself. During the visualization we followed the topographic trends of the most active areas in terms of lowest and highest entropy levels; in other words, we were in search of the electrodes that had been associated with low-entropy or high-entropy values for the longest time, and we called these electrodes the "most active".
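As an illustration of this estimation step, the following minimal Python sketch computes the order-2 Renyi entropy of each channel over 1-second windows from a kernel density estimate of the amplitude distribution. It is only a sketch of the procedure described above: the kernel type, bandwidth and grid size are assumptions of this example, since the paper relies on the kernel estimators of [13] without reporting those details.

```python
import numpy as np
from scipy.stats import gaussian_kde

def renyi_entropy_order2(x, n_grid=256):
    """Order-2 Renyi entropy of the amplitude distribution of a 1-D signal,
    estimated with a Gaussian kernel density estimate (illustrative settings)."""
    kde = gaussian_kde(x)
    grid = np.linspace(x.min(), x.max(), n_grid)
    pdf = kde(grid)
    pdf /= np.trapz(pdf, grid)              # renormalize the estimated pdf on the grid
    integral = np.trapz(pdf**2, grid)       # integral of f^2(x), i.e. alpha = 2 in eq. (2)
    return -np.log(integral)                # 1/(1 - alpha) * log(integral) with alpha = 2

def entropy_matrix(eeg, fs):
    """Build matrix X: one row per channel, one column per 1-s non-overlapping window."""
    n_ch, n_samp = eeg.shape
    win = int(fs)
    n_win = n_samp // win
    X = np.empty((n_ch, n_win))
    for c in range(n_ch):
        for w in range(n_win):
            X[c, w] = renyi_entropy_order2(eeg[c, w * win:(w + 1) * win])
    return X
```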
3.2. Entropy topography clustering
In order to quantify the information that we could appreciate by visually reviewing the movie, and in order to automatize the analysis of the evolution of the entropy spatial distribution, a spatio-temporal model was necessary. One neurocomputational approach to time-series analysis, self-organizing maps (SOM), is designed so that certain relationships between features in the input vectors' components are reflected in the topology of the resultant clusters' weight vectors obtained through unsupervised training using Kohonen-based learning [15]. Matrix X was partitioned into 1-min non-overlapping windows. Our aim was to subdivide the head, window by window, into a certain number of areas so that the electrodes sharing similar entropy levels could be clustered together, with particular attention to the areas associated with the extreme entropy levels: the highest-entropy region would reflect underlying highly random activity, whereas the lowest-entropy region would reflect low-randomness activity. The areas associated with intermediate entropy would account for "neutral" areas. We decided to subdivide the head into four regions: a high-entropy area, a low-entropy area and two intermediate-entropy areas. In other words, we needed the rows (channels) of our matrix X to be clustered into four clusters, window by window. Therefore, we designed a SOM with 4 processing elements and we implemented it in Matlab. A SOM was trained over each one of the windows, coming up with a spatio-temporal clustering of the channels based on their entropy levels.
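The sketch below illustrates the idea with a tiny one-dimensional Kohonen map written in plain Python/NumPy, trained on one 1-min block of matrix X so that each channel is assigned to one of four units. It is not the Matlab implementation used by the authors; the learning-rate and neighbourhood schedules are illustrative assumptions.

```python
import numpy as np

def som_1d(data, n_units=4, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Tiny 1-D Kohonen map: 'data' has one row per channel (its entropy values
    inside one 1-min window); returns the winning unit (0..n_units-1) per channel."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), n_units)].astype(float)      # initialize weights from samples
    for t in range(n_iter):
        lr = lr0 * np.exp(-t / n_iter)                          # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)                    # shrinking neighbourhood
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.linalg.norm(w - x, axis=1))          # best-matching unit
        dist = np.abs(np.arange(n_units) - bmu)                 # lattice distance to the BMU
        h = np.exp(-dist**2 / (2 * sigma**2))[:, None]          # neighbourhood function
        w += lr * h * (x - w)
    return np.array([int(np.argmin(np.linalg.norm(w - x, axis=1))) for x in data])

# Example: labels = som_1d(X[:, t0:t0 + 60])  ->  four clusters (LEC / NEC1 / NEC2 / HEC)
```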
Figure 1. The international 10-20 system seen from (A) left and (B) above the head. A = Ear lobe, C = central, Pg = nasopharyngeal, P = parietal, F = frontal, Fp = frontal polar, O = occipital.
In order to quantify the visual review of the movies, as described in Section 3.2, we clustered the electrodes into a low-entropy cluster (LEC), a high-entropy cluster (HEC) and two neutral clusters (NEC1 and NEC2), passing the EEG entropy matrix X through the SOM. Window by window, we identified which one of the four processing elements was associated with the LEC, HEC and NEC clusters, so that we could come up with a homogeneous spatio-temporal clustering of the electrodes according to their entropy levels. Once we had come up with the clustering, we estimated how often an electrode belonged to each one of the clusters. In other words, we estimated how often an electrode was active in terms of either high or low entropy: we quantified how often (expressed as a percentage of time with respect to the overall length of the EEG recording) each electrode belonged to the low-activity cluster, to the high-activity cluster or to the neutral clusters. This way we could define a "frequency of membership" of an electrode to each cluster. In order to give a spatial view at a glance of this quantification and to visually associate the relative frequency of membership with the location of the corresponding electrodes, the relative frequency of membership to the clusters, with respect to the overall length of the EEG recording, was represented as a map: the electrodes were represented with black dots whereas the membership was coded with a coloration going from dark blue, for a high relative frequency of membership to cluster LEC (low-Entropy cluster),
to dark red, for a high relative frequency of membership to cluster HEC (high-Entropy cluster). Yellow and green coloration means membership to cluster NEC.
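A sketch of this quantification step follows (hypothetical helper, not the authors' code): given the SOM label of every electrode in every 1-min window, it returns the percentage of windows each electrode spends in each cluster. Identifying which unit plays the role of LEC or HEC is assumed to be done afterwards, for instance by ranking the units by their mean entropy.

```python
import numpy as np

def membership_frequency(labels_per_window, n_clusters=4):
    """labels_per_window: array (n_windows, n_channels) of SOM unit indices.
    Returns an (n_channels, n_clusters) array of percentages of windows
    spent by each electrode in each cluster."""
    labels = np.asarray(labels_per_window)
    n_win, n_ch = labels.shape
    freq = np.zeros((n_ch, n_clusters))
    for k in range(n_clusters):
        freq[:, k] = 100.0 * (labels == k).sum(axis=0) / n_win
    return freq      # e.g. freq[ch, lec_unit] is the % of time electrode ch is in the LEC
```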
Figure 2. The relative frequency of membership of the electrodes to entropy clusters throughout the entire EEG recording. Each map is associated to a patient affected by absence seizures: the electrodes are represented with black dots whereas the membership to the clusters is coded with a coloration going from dark blue, for a high relative frequency of membership to cluster LEC (low-Entropy cluster), to dark red, for a high relative frequency of membership to cluster HEC (high-Entropy cluster). Yellow and green coloration means membership to cluster NEC (neutral-Entropy Cluster).
Figure 3. The relative frequency of membership of the electrodes to entropy clusters throughout the entire EEG recording. Each map is associated to a healthy subject: the electrodes are represented with black dots whereas the membership to the clusters is coded with a coloration going from dark blue, for a high relative frequency of membership to cluster LEC (low-Entropy cluster), to dark red, for a high relative frequency of membership to cluster HEC (high-Entropy cluster). Yellow and green coloration means membership to cluster NEC (neutral-Entropy Cluster).
Figure 4. The pie chart represents the average membership degree of each group of electrodes (frontal, temporal, central, parietal, occipital) to the low-entropy cluster. The membership degree was averaged, for each group of electrodes, over the patients (a) and over the healthy subjects (b). The frontal-temporal area appears associated with low-entropy levels more in the absence seizure patients than in the healthy subjects.
4. Results
4.1. Data description
The analyzed dataset consists of 25 EEG recordings with 18 to 20 channels (Figure 1) from patients affected by absence seizures and 40 recordings from normal subjects. The EEG was high-pass filtered at 0.5Hz and low-pass filtered at 70Hz and the sampling rate was set at 256Hz. In absence seizures, the person may appear to be staring into space with or without jerking or twitching movements of the eye muscles. These periods last for seconds, or even tens of seconds. Those experiencing absence seizures sometimes move from one location to another without any purpose. Neurologists think that, during the ictal stages, the critical area in the brain of patients affected by absence seizures is the frontal one.
4.2. Electrodes clustering
Each one of these recordings was analysed as described in Section 3. Entropy was first estimated for each channel, window by window, and then the spatial clustering was carried out as detailed in Section 3.2. The results of the clustering are plotted in Figure 2 for the patients affected by absence seizures and in Figure 3 for the healthy subjects. Looking at Figure 2 we can see that low entropy was concentrated in the frontal/temporal areas of the brain in most cases, except for patients A5 and B3. Sometimes we can see an involvement of the occipital area together with the frontal/temporal areas (patients A2, B5 and D3). Healthy subjects had rather random entropy distributions, as we can see looking at Figure 3. We can point out a very important result: the electrodes belonging to the frontal/temporal area (which is critical for absence patients) turned out to be clustered together throughout the EEG recording and not only during the ictal stage; this means that an abnormal coupling among the electrodes that will be involved in seizure development can be hypothesized
before the seizure itself. Moreover, the electrodes turned out to be clustered together in the low-entropy (low randomness) cluster: this means that the frontal/temporal area is steadily associated with an underlying high synchrony. Healthy subjects show a quite random entropy distribution: there is no specific area of the brain involved with low entropy. In order to summarize these results, we averaged the membership of each electrode to each cluster over the subjects, thus achieving an average frequency of membership of the electrodes to the low-entropy cluster for the group of absence patients and an average frequency of membership of the electrodes to the low-entropy cluster for the group of healthy subjects. Then we grouped the electrodes according to the classic structure of the brain: frontal lobe, temporal lobes, parietal lobes, occipital lobe, central region. We represented the average frequency of membership of each group of electrodes to the low-entropy cluster (with respect to the overall time duration of the EEG recording) with a pie chart: in absence patients (Figure 4a) the low-entropy cluster was dominated by the frontal-temporal electrodes for 76% of the time, whereas for the healthy subjects group (Figure 4b) the low-entropy cluster was dominated by the frontal-temporal electrodes only 55% of the time, which means a more random behaviour.
5. Conclusions
In this paper, the issue of synchronization in the electric activity of neuronal sources in the epileptic brain was addressed. A new electroencephalography (EEG) mapping, based on entropy, together with a SOM-based spatial clustering, was introduced to study this abnormal behaviour of the neuronal sources. Renyi's entropy was proposed to measure the randomness/synchrony of the brain. A set of 25 EEG recordings from patients affected by absence seizures was analyzed together with a set of 40 EEGs recorded from healthy subjects. The electrodes belonging to the frontal/temporal area (which seems to be critical for absence patients) turned out to be clustered together throughout the EEG recording and not only during the ictal stage: this means that an abnormal coupling among the electrodes that will be involved in seizure development can be hypothesized before the seizure itself. Moreover, the electrodes were clustered together in the low-entropy (low randomness/low order) cluster during 76% of the overall EEG recording: this means that the frontal/temporal area appears steadily associated with an underlying high synchrony. Healthy subjects showed a quite random entropy distribution instead: there is no specific area of the brain involved with low entropy. In the future, patterns of entropy activation will be investigated.
References
[1] Im, C.H., Jung, H.K., Jung, K.Y., Lee, S.Y.: Reconstruction of continuous and focalized brain functional source images from electroencephalography. IEEE Trans. on Magnetics 43(4) (2007) 1709–1712
[2] Im, C.H., Lee, C., An, K.O., Jung, H.K., Jung, K.Y., Lee, S.Y.: Precise estimation of brain electrical sources using anatomically constrained area source (ACAS) localization. IEEE Trans. on Magnetics 43(4) (2007) 1713–1716
[3] Knutsson, E., Hellstrand, E., Schneider, S., Striebel, W.: Multichannel magnetoencephalography for localization of epileptogenic activity in intractable epilepsies. IEEE Trans. on Magnetics 29(6) (1993) 3321–3324
[4] Sackellares, J., Iasemidis, L., Gilmore, R., Roper, S.: Clinical application of computed EEG topography. In Duffy, F.H., ed.: Topographic Mapping of Brain Electrical Activity. Boston, Butterworths (1986)
[5] Nuwer, M.R.: Quantitative EEGs. Journal of Clinical Neurophysiology 5 (1988) 1–86
[6] Babiloni, C., Binetti, G., Cassetta, E., Cerboneschi, D., Dal Forno, G., Del Percio, C., Ferreri, F., Ferri, R., Lanuzza, B., Miniussi, C., Moretti, D.V., Nobili, F., Pascual-Marqui, R.D., Rodriguez, G., Romani, G., Salinari, S., Tecchio, F., Vitali, P., Zanetti, O., Zappasodi, F., Rossini, P.M.: Mapping distributed sources of cortical rhythms in mild Alzheimer's disease. A multicentric EEG study. NeuroImage 22 (2004) 57–67
[7] Miyagi, Y., Morioka, T., Fukui, K., Kawamura, T., Hashiguchi, K., Yoshida, F., Shono, T., Sasaki, T.: Spatio-temporal analysis by voltage topography of ictal electroencephalogram on MR surface anatomy scan for the localization of epileptogenic areas. Minim. Invasive Neurosurg. 48(2) (1988) 97–100
[8] Ebersole, J.S.: Defining epileptic foci: past, present, future. Journal of Clinical Neurophysiology 14 (1997) 470–483
[9] Scherg, M.: From EEG source localization to source imaging. Acta Neurol. Scand. 152 (1994) 29–30
[10] Tekgul, H., Bourgeois, B., Gauvreau, K., Bergin, A.: Electroencephalography in neonatal seizures: comparison of a reduced and a full 10/20 montage. Pediatr. Neurol. 32(3) (2005) 155–161
[11] Nayak, D., Valentin, A., Alarcon, G., Garcia Seoane, J.J., Brunnhuber, F., Juler, J., Polkey, C.E., Binnie, C.D.: Characteristics of scalp electrical fields associated with deep medial temporal epileptiform discharges. Journal of Clinical Neurophysiology 115(6) (2004) 1423–1435
[12] Skrandies, W., Dralle, D.: Topography of spectral EEG and late VEP components in patients with benign rolandic epilepsy of childhood. J. Neural Transm. 111(2) (2004) 223–230
[13] Hild II, K.E., Erdogmus, D., Principe, J.C.: On-line minimum mutual information method for time-varying blind source separation. In: 3rd International Conference on Independent Component Analysis and Blind Signal Separation (2001) 126–131
[14] Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134 (2004) 9–21
[15] Kohonen, T.: Self-Organizing Maps. Series in Information Sciences Vol. 30. Springer, Heidelberg (1995)
Computational Intelligence Methods for Discovering Diagnostic Gene Targets about aGVHD
Maurizio Fiasché a,1, Maria Cuzzola b, Roberta Fedele b, Domenica Princi b, Matteo Cacciola a, Giuseppe Megali a, Pasquale Iacopino b and Francesco C. Morabito a
a DIMET, University "Mediterranea" of Reggio Calabria
b Transplant Regional Center of Stem Cells and Cellular Therapy "A. Neri", Reggio Calabria
Abstract. This is an application paper that applies classical statistical methods and standard computational intelligence techniques to identify gene diagnostic targets for the accurate diagnosis of a medical problem: acute graft-versus-host disease (aGVHD). This is the major complication after allogeneic haematopoietic stem cell transplantation (HSCT), an immunomediated disorder that is driven by allospecific lymphocytes, since aGVHD does not occur after autologous stem-cell transplantation. In this paper we analyzed the gene-expression profiles of 47 genes associated with allo-reactivity in 59 patients submitted to HSCT. We applied 2 feature selection algorithms combined with a classifier to detect aGVHD at the onset of clinical signs. This is a preliminary study and a continuation of our work, which tackles both computational and biological evidence for the involvement of a limited number of genes in the diagnosis of aGVHD. Directions for further studies are outlined.
Keywords. Feature Selection, Gene targets, GEP, GVHD, SNR, ANN
Introduction
With the completion of the first draft of the human genome, the task is now to be able to process this vast amount of ever-growing dynamic information and to create intelligent systems for detection, prediction and knowledge discovery about human pathology and disease. Genes are complex chemical structures and they cause dynamic transformations of one substance into another during the whole life of an individual, as well as the life of the human population over many generations. When genes are in action, the dynamics of the processes in which a single gene is involved are very complex, as this gene interacts with many other genes and mediators, and is influenced by many environmental factors. The whole process of DNA transcription and gene translation is continuous and it evolves over time. The genes in an individual may mutate, change their code slightly, and may therefore express differently at a later time. So, genes may change, mutate, and evolve in the lifetime of a living organism.
1 Corresponding Author: Maurizio Fiasché, DIMET, University "Mediterranea" of Reggio Calabria, Via Graziella, Feo di Vito, 89100 Reggio Calabria, Italy; E-mail: [email protected]
Modeling these events, learning about them and extracting knowledge is a major goal for bioinformatics [1,2]. Bioinformatics is concerned with the application of the methods of information sciences for the analysis, modeling and knowledge discovery of biological phenomena such as genetic processes [1,2]. The potential applications of microarray technology are numerous and include identifying markers for classification, diagnosis, disease outcome prediction, target identification and therapeutic responsiveness. However, the choice of which candidate genes to study can be daunting, considering the many thousands of known genes. There are several approaches to this dilemma. Diagnosis and prediction of the biological state/disease is likely to be more accurate when analyzing the gene expression profiles (GEPs) of specific molecular clusters obtained by macroarray analysis. Based on the profile, it is possible to set up a diagnostic test, so that a sample can be taken from a patient, the data related to the sample processed, and a profile related to the sample obtained [2]. This profile can be matched against existing gene profiles and, based on similarity, a diagnosis of the disease, or the risk of developing it in the future, can be confirmed with a certain probability. We apply this approach to detect acute graft-versus-host disease (aGVHD) in allogeneic hematopoietic stem cell transplantation (HSCT), a curative therapy for several malignant and non-malignant disorders [3]. Acute GVHD remains the major complication and the principal cause of mortality and morbidity following HSCT [4,5]. At present, the diagnosis of aGVHD is merely based on clinical criteria and may be confirmed by biopsy of one of the 3 target organs (skin, gastrointestinal tract, or liver) [6]. Based on clinical observation and without tissue biopsy, a clear diagnosis of GVHD syndromes can be difficult. For adverse transplant reactions, a diagnostic tool of similar diagnostic precision is currently lacking. In fact, there is no definitive diagnostic blood test for aGVHD, although many blood proteins have been described as potential biomarkers in small studies [8,9]. A recent report indicates a preliminary molecular signature of aGVHD in allogeneic HSCT patients [10]. In the current project, our primary objective was to validate a novel and non-invasive method to confirm the diagnosis of aGVHD in HSCT patients at the onset of clinical symptoms. For this purpose, a database has been built by pre-processing experimental measures, and features were selected to enable a good class separation without using all the features and without facing the "curse of dimensionality" problem, i.e., an excessive number of training inputs that increases the system complexity without remarkable advantages in terms of prediction performance. This problem can be considered as a typical inverse problem of pattern classification starting from an experimental database. The proposed approach, useful to detect the presence of aGVHD, exploits Principal Component Analysis (PCA) combined with signal-to-noise ratio (SNR) filtering and a Correlation-based Feature Selection (CFS) algorithm to select the most important features (genes) for the diagnosis. We have therefore used a suitable Artificial Neural Network (ANN), particularly a Multi-Layer Perceptron (MLP) with back-propagation, demonstrating that a combined use of classification and feature selection approaches makes it possible to select relevant genes with high confidence.
This paper discusses both computational and biological evidence to confirm the early assessment of aGVHD based on selected genetic diagnostic markers. The organization of the rest of the paper is as follows: in Section 1 the analyzed data and the dimensionality reduction techniques used to reduce the number of variables are described; in Section 2 the neural network architecture is given; finally, in Sections 3 and 4 the results of the diagnostic method are discussed and conclusions are drawn, with some possible future applications.
1. Methodology
Feature selection is the process of choosing the most appropriate features (variables) when creating a computational model [11]. Feature evaluation is the process of establishing how relevant to the problem at hand the features used in the model are. There are different groups of methods for feature selection:
• Filtering methods: the features are "filtered", selected and ranked in advance, before a model is created (e.g. a classification model). Traditional filtering methods are: correlation, t-test, and signal-to-noise ratio.
• Wrapping methods: features are selected on the basis of how well the created model performs using these features.
In this paper we consider two approaches to feature subset selection, which differ in how they evaluate feature subsets. The first is a combination of a feature extraction technique based on the variance of the data (PCA) and a classical filter method (SNR); the second is CFS. As a filter approach, CFS was proposed by Hall [12]. The rationale behind this algorithm is that "a good feature subset is one that contains features highly correlated with the class, yet uncorrelated with each other". It will be shown later in this paper that combining CFS with a suitable ANN provides a good classification accuracy for the diagnosis of aGVHD.
1.1. Experimental Data
Fifty-nine HSCT patients were enrolled in our study between March 2006 and July 2008 at the Transplants Regional Center of Stem Cells and Cellular Therapy "A. Neri", Reggio Calabria, Italy, during a governmental research program, the Integrated Oncological Program of the Italian Ministry of Health, with the title "Predictive and prognostic value for graft versus host disease of chimerism and gene expression". Because experimental design plays a crucial role in a successful biomarker search, the first step in our design was to choose the most informative specimens and achieve adequate matching between positive cases aGVHD (YES) and negative controls aGVHD (NO) to avoid bias. This goal is best achieved through a database containing high-quality samples linked to quality-controlled clinical information. Patients with clinical signs of aGVHD (YES) were selected, and in more than 95% of them aGVHD was confirmed by biopsy, including those with grade I. We used 26 samples from aGVHD (YES) patients that were taken at the time of diagnosis and we selected 33 samples from patients that did not experience aGVHD (NO). All together the YES/NO patient groups comprised a validation set. Total RNA was extracted from whole peripheral blood samples using an RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions. Reverse transcription of the purified RNA was performed using Superscript III Reverse Transcriptase (Invitrogen). A multigene expression assay to test the occurrence of aGVHD was carried out with TaqMan® Low Density Array Fluidic cards (LDA-macroarray card) based on the Applied Biosystems 7900HT comparative ddCT method, according to the manufacturer's instructions. Expression of each gene was measured in triplicate and then normalized to the reference gene 18S mRNA, which was included in the macroarray card. For the design of the macroarray card, we selected 47 candidate genes from the published literature, genomic databases and pathway analysis. The 47 candidate genes are involved in the immune network and in inflammation pathogenesis.
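As a brief illustration of the comparative ddCT normalization mentioned above, the sketch below computes a relative expression value for one gene from mean Ct values, normalized to the 18S reference and to a calibrator sample. The function name and the numbers in the usage comment are hypothetical, and this is only the standard 2^(-ΔΔCt) calculation, not the authors' processing pipeline.

```python
def relative_expression(ct_gene, ct_18s, ct_gene_cal, ct_18s_cal):
    """Comparative ddCT: fold change of a gene in a sample vs. a calibrator,
    both normalized to the 18S reference gene (inputs are mean triplicate Ct)."""
    d_ct_sample = ct_gene - ct_18s          # delta-Ct of the sample
    d_ct_cal = ct_gene_cal - ct_18s_cal     # delta-Ct of the calibrator
    dd_ct = d_ct_sample - d_ct_cal          # delta-delta-Ct
    return 2.0 ** (-dd_ct)                  # the reference value corresponds to 1

# e.g. relative_expression(26.1, 12.3, 27.4, 12.5)  ->  fold change vs. the calibrator
```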
1.2. Dimensionality reduction approach
1.2.1. Feature extraction with PCA
In statistics, PCA [13] is a technique that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components (PCs). PCA can be used for dimensionality reduction in a data set while retaining those characteristics of the data set that contribute most to its variance, by keeping lower-order PCs and ignoring higher-order ones. Such low-order components often contain the "most important" aspects of the data, considering, moreover, that the data set has been preliminarily de-whitened. PCA has the distinction of being the optimal linear transformation for keeping the subspace that has the largest variance. In this study we have applied PCA to reduce the number of variables, but also to select the variables with the largest weight for each PC [14]. For the aGVHD(Yes) output we found 2 PCs with a cumulative variance of 64% (Table 1) and a third PC much smaller than the second, so, observing also the associated scree plot (Figure 1), we can take the first 2 PCs and select the heaviest variables. For the aGVHD(No) output we found 2 PCs with a cumulative variance of 57% (Table 2) and a second PC much smaller than the first, so, observing also the associated scree plot (Figure 2), we can take the first 2 PCs and select the heaviest variables. For this analysis we have applied the first heuristic method and, in particular, the third heuristic method (the graphical rule). Moreover, we have chosen, for each PC, the variables that have a weight > 0.89. These variables composed a subset of 23 genes present in the first two principal components of aGVHD(Yes) and aGVHD(No). The list of extracted genes includes mediators of T helper 1 and T helper 2 (Th2) cell responses, CD8-positive cytotoxic cell activators, homing and vascular mediators and inflammation factors; they are shown in Table 3. The variables of the first two PCs of aGVHD(No) are included in the subset of the first two principal components of aGVHD(Yes). Our hypothesis is that this presence strengthens the importance of these genes for training an intelligent system. Some genes of aGVHD(Yes) are not in the aGVHD(No) group, but they have a very important weight in the first PC of aGVHD(Yes), so we have also included them in the group of 23 genes. It is also important that the remaining genes have weights in the 2 PCs below 0.75.
TABLE 1. The first two components for aGVHD(Yes): eigenvalues, percentage of variance and cumulative variance calculated with SPSS®.
Initial Eigenvalues
Component | Total  | % of variance | Cumulative %
1         | 16.125 | 34.309        | 34.309
2         | 14.124 | 30.050        | 64.359
TABLE 2. The first two components for the aGVHD(No) output: eigenvalues, percentage of variance and cumulative variance calculated with SPSS®.
Initial Eigenvalues
Component | Total  | % of variance | Cumulative %
1         | 19.02  | 40.468        | 40.468
2         | 7.775  | 16.542        | 57.01
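A minimal sketch of this extraction step is given below, assuming a samples-by-genes expression matrix for one class; it uses scikit-learn's PCA and converts the components into SPSS-style loadings before applying the 0.89 threshold. The standardization step and the loading convention are assumptions of this example, since the paper only reports that the analysis was performed with SPSS.

```python
import numpy as np
from sklearn.decomposition import PCA

def heavy_variables(expr, n_pcs=2, weight_threshold=0.89):
    """expr: samples x genes matrix for one class (e.g. aGVHD(Yes)).
    Returns the indices of genes whose absolute loading on one of the
    first n_pcs principal components exceeds the threshold."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)                  # standardize each gene
    pca = PCA(n_components=n_pcs).fit(z)
    print("explained variance (%):", 100 * pca.explained_variance_ratio_)
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)   # SPSS-style loadings
    return np.where(np.abs(loadings).max(axis=1) > weight_threshold)[0]
```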
Figure 1. Scree plot for aGVHD(Yes), calculated with SPSS®. It is possible to see the breakpoint at the second PC.
Figure 2. Scree plot for aGVHD(No), calculated with SPSS®. It is possible to see the breakpoint at the second PC.
1.2.2. SNR filtering
The SNR method evaluates how important a variable is to discriminate samples belonging to different classes [1,2]. For a two-class problem, the SNR ranking coefficient of a variable x is calculated as the absolute difference between the mean value M_{1x} of the variable for class 1 and the mean M_{2x} of this variable for class 2, divided by the sum of the respective standard deviations:

SNR_x = \frac{|M_{1x} - M_{2x}|}{Std_{1x} + Std_{2x}}    (1)

Now we want to measure the SNR value of these 23 genes, so we apply the SNR filtering to all 47 genes. The result is shown in Figure 3, where the genes selected as top discriminating features for the two classes aGVHD(YES) and aGVHD(No) are the 23 genes identified with the PCA technique. This confirms, as expected, that GEP is a method with low noise. The ranking of the 23 genes is shown in Figure 3. It is now necessary to confirm (or not) this analysis by making a comparison with an adequate feature selection algorithm for selecting a robust subset of genes.
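A short sketch of this filtering step follows, assuming a samples-by-genes expression matrix and a binary label vector; the variable names are illustrative.

```python
import numpy as np

def snr_ranking(expr, labels):
    """SNR of eq. (1) for every gene (column of expr):
    |mean_class1 - mean_class2| / (std_class1 + std_class2)."""
    labels = np.asarray(labels).astype(bool)
    x1, x2 = expr[labels], expr[~labels]                     # aGVHD(YES) / aGVHD(NO) samples
    snr = np.abs(x1.mean(axis=0) - x2.mean(axis=0)) / (x1.std(axis=0) + x2.std(axis=0))
    order = np.argsort(snr)[::-1]                            # most discriminating genes first
    return snr, order
```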
1.3. Feature Subset Selection
Feature selection is a technique used in machine learning for selecting a subset of relevant features in order to build robust learning models. The assumption here is that not all the genes measured by a macroarray method are related to aGVHD classification. Some genes are irrelevant and some are redundant from the machine learning point of view [15]. It is well known that the inclusion of irrelevant and redundant information may harm the performance of some machine learning algorithms. Feature subset selection can be seen as a search through the space of feature subsets. CFS evaluates a subset of features by considering the individual detection ability of each feature along with the degree of redundancy between them:

CFS_S = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}    (2)
where:
• CFS_S is the score of a feature subset S containing k features,
• \bar{r}_{cf} is the average feature-to-class correlation (f ∈ S),
• \bar{r}_{ff} is the average feature-to-feature correlation.
Figure 3. This figure shows the ranking of the 23 genes selected with the SNR filter, ordered by SNR value. The numbers on the ordinate axis are the column numbers in the database; the association is: 2(C3), 16(EGR2), 47(VEGF), 34(IRF1), 26(IL12A), 45(SLPI), 41(PIAS1), 25(IL10), 46(STAT6), 39(NFKB2), 32(IL4), 9(CD52), 3(CASP1), 18(FOS), 43(SELP), 44(SER3SER4), 33(IL6), 8(CCR4), 15(EGR1), 35(IRF7), 40(NOS2A), 42(PTN), 38(MMP9).
The distinction between normal filter algorithms and CFS is that, while normal filters provide scores for each feature independently, CFS presents a heuristic "merit" of a feature subset and reports the best subset it finds. To select the genes with CFS, we have to: a) choose a search algorithm; b) perform the search, keeping track of the best subset encountered according to CFS_S; c) output the best subset encountered. The search algorithm we used was best-first with forward selection, which starts with the empty set of genes. The search for the best subset is based on the training data only, and it is very important to choose the correct training data set (e.g. patients without alterations due to parallel treatments) so that the technique finds the best performing subset. Once the best subset has been determined, and a classifier has been built from the training data (reduced to the best features found), the performance of that classifier is evaluated on the test data. The 13 genes selected here are included in the set of 23 genes selected using PCA/SNR and are reported in Table 3. A leave-one-out cross-validation procedure was performed to investigate the robustness of the feature selection procedures: in 29 runs, the subset of 13 genes was selected 28 times (96%) by CFS. Now it is possible to use a classifier to assess the advantages and disadvantages of the feature selection methods.
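The following sketch illustrates the CFS merit of eq. (2) together with a simple greedy forward search; it is a simplified stand-in for the best-first search actually used, and the choice of symmetric absolute Pearson correlations as the correlation measure is an assumption of this example.

```python
import numpy as np

def cfs_merit(corr, subset, target):
    """Merit of eq. (2): k*r_cf / sqrt(k + k(k-1)*r_ff), from a precomputed
    absolute correlation matrix whose last row/column is the class."""
    k = len(subset)
    r_cf = corr[subset, target].mean()
    r_ff = 0.0 if k == 1 else corr[np.ix_(subset, subset)][np.triu_indices(k, 1)].mean()
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_cfs(expr, y, max_features=15):
    """Greedy forward selection: start from the empty set and add the gene
    that most improves the merit, stopping when no gene improves it."""
    corr = np.abs(np.corrcoef(np.column_stack([expr, y]), rowvar=False))
    n_genes = expr.shape[1]
    target = n_genes                      # index of the class column inside corr
    subset, best = [], -np.inf
    while len(subset) < max_features:
        scores = [(cfs_merit(corr, subset + [g], target), g)
                  for g in range(n_genes) if g not in subset]
        merit, g = max(scores)
        if merit <= best:
            break
        subset.append(g)
        best = merit
    return subset, best
```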
Table 3. The 23 genes selected with the PCA/SNR method, their official full names and immune functions. The 13 genes selected by CFS are marked with *.

Gene Name | Official full name | Immune function
BCL2A1*   | BCL2-related protein A1 | Anti- and pro-apoptotic regulator
CASP1*    | Caspase 1, apoptosis-related cysteine peptidase | Central role in the execution phase of cell apoptosis
CCL7*     | Chemokine (C-C motif) ligand 7 | Substrate of matrix metalloproteinase 2
CD83*     | CD83 molecule | Dendritic cell regulation
CXCL10*   | Chemokine (C-X-C motif) ligand 10 | Pleiotropic effects, including stimulation of monocytes, natural killer and T-cell migration, and modulation of adhesion molecule expression
EGR2*     | Early growth response 2 | Transcription factor with three tandem C2H2-type zinc fingers
FAS*      | TNF receptor superfamily, member 6 | Central role in the physiological regulation of programmed cell death
ICOS*     | Inducible T-cell co-stimulator | Plays an important role in cell-cell signaling, immune responses, and regulation of cell proliferation
IL4*      | Interleukin 4 | Immune regulation
IL10*     | Interleukin 10 | Immune regulation
SELP*     | Selectin P | Correlation with endothelial cells
SLPI*     | Stomatin (EPB72)-like 1 | Elemental activities such as catalysis
STAT6*    | Transducer and activator of transcription 6, interleukin-4 induced | Regulation of IL4-mediated biological responses
C3        | Complement factor 3 | Complement
CCR4      | Chemokine (C-C motif) receptor 4 | Protein inhibitor of activated STAT, 1
CD52      | CD52 molecule | CD52 molecule
EGR1      | Early growth response 1 | Transcriptional regulator
FOS       | v-fos FBJ murine osteosarcoma viral oncogene homolog | Oncogene
IL12A     | Interleukin 12 sub-unit A | Immune regulation
IRF1      | Interferon regulatory factor 1 | Immune regulation
IL6       | Interleukin 6 | Immune regulation
IRF7      | Interferon regulatory factor 7 | Immune regulation
MMP9      | Matrix metallopeptidase 9 | Tissue remodeling
NFKB2     | Nuclear factor of kappa | Inflammatory events
NOS2A     | Nitric oxide synthase 2A | Reactive free radical
PIAS1     | Protein inhibitor of activated STAT, 1 | Nuclear receptor transcriptional coregulator
PTN       | Pleiotrophin | Heparin binding growth factor 8
SELP      | P-selectin | Leukocyte-endothelial cell adhesion molecule
SER3SER4  | Serpin peptidase inhibitor | Inflammatory events
VEGF      | Vascular endothelial growth factor A | Inflammation, endothelial damage
2. NEURAL NETWORK MODEL FOR EARLY DIAGNOSIS USING THE SELECTED GENE DIAGNOSTIC MARKERS
Artificial neural networks (ANNs) are commonly known as biologically inspired, highly sophisticated analytical techniques, capable of modeling extremely complex non-linear functions. Formally defined, ANNs are analytic techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain, capable of predicting new observations (on specific variables) from other
observations (on the same or other variables) after executing a process of so-called learning from existing data [16]. Here we want to make a comparison between the evaluations of the two models built with the two different feature selection methods. So, we have used a popular ANN architecture, the MLP with back-propagation (a supervised learning algorithm). The MLP is known to be a robust function approximator for prediction/classification problems. The training data set had 29 patient samples (13 aGVHD(Yes) and 16 aGVHD(No)). The test data set consisted of 30 patient samples (13 aGVHD(Yes) and 17 aGVHD(No)). After the test step, the final results were obtained as reported in Figure 5. The ANN's outputs were:
• 0, if the aGVHD diagnosis was Yes;
• 1, if the aGVHD diagnosis was No.
The ANN-based system was trained with an adaptive learning rate over a period of 400 epochs. We used the MLP once with the 23 genes as input, and once more with the 13 genes as input of the net. The ANN, according to a consequence of Kolmogorov's theorem [17], has a hidden layer with 47 neurons in the first case (for the 23 genes) and with 27 neurons in the second case (for the 13 genes); the activation functions are tan-sigmoid between the input and hidden layer, and pure linear between the hidden and output layer (Figure 4). After the training phase, the ANN has been tested; the final results are shown in Section 3.
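For readers who want to reproduce a comparable pipeline, the sketch below trains and evaluates a one-hidden-layer MLP with scikit-learn; it only approximates the Matlab network described above (tanh hidden units, 400 training iterations), and the function name, the leave-one-out evaluation and the solver defaults are assumptions of this example rather than the authors' exact setup.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def evaluate_subset(X_train, y_train, X_test, y_test, n_hidden=27):
    """Train an MLP on the selected genes and report leave-one-out accuracy
    on the training set plus accuracy on the held-out test set."""
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation='tanh',
                        max_iter=400, random_state=0)
    loo_acc = cross_val_score(clf, X_train, y_train, cv=LeaveOneOut()).mean()
    clf.fit(X_train, y_train)
    return loo_acc, clf.score(X_test, y_test)
```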
Figure 4. Structure of the implemented ANN.
3. RESULTS
In these ANNs, the genes selected by the feature selection methods were the inputs and the evaluation of the syndrome was the output. We have explored different kinds of ANN [18] and compared them to improve the results; our experimental runs also confirmed the notion that, for this type of classification problem, the MLP performs better than other ANN architectures such as radial basis function (RBF) networks, recurrent neural networks (RNN), and self-organizing maps (SOM). The final results were good and tell us that it is possible to diagnose aGVHD using a restricted number of variables. With the subset obtained from the CFS method, only 1 case escaped our classification model (Figure 5), which achieves 96% accuracy in a leave-one-out cross-validation on the training set and 97% on the test data set (Table 4). We obtained the same results on the testing set with the PCA/SNR method, but using more variables (23). This gives evidence about the reproducibility and the reliability of our analysis. In patients with aGVHD (YES), the expression levels of the immune gene pattern showed a different behaviour: BCL2A1, CASP1, CCL7, CD83 were over-expressed with respect to the reference normal value (assumed to be = 1). For these genes it is very important to establish the cut-off expression value correlating with the event.
Table 4. Experimental results of a CFS and a PCA/SNR method combined with an ANN classifier.
Method       | Training set | Test set
CFS-ANN      | 28(29)       | 29(30)
PCA/SNR-ANN  | -            | 29(30)
In contrast, CXCL10, EGR2, FAS, ICOS, IL-4, IL-10, SELP, SLP1, STAT6 were always down-regulated during aGVHD and before pharmacological treatment. Endothelial mediator transcripts did not show an explicit behaviour. When the clinical manifestation resolved, the expression level of all significant genes was strongly increased. In the aGVHD (NO) group, the transcriptome expression showed a very high value for all genes.
Figure 5. Observed classes of patients and results obtained by ANN.
4. CONCLUSION AND FUTURE WORK
We examined the immune transcripts to study the applicability of gene expression profiling (macroarray) as a single assay in the early diagnosis of aGVHD. Our interest was to select a low number of molecular biomarkers from an initial gene panel and to exploit them to validate a fast, easy and non-invasive diagnostic tool. The proposed method provides a good overall accuracy in confirming aGVHD development in the HSCT setting, as histology demonstrated. From the biological point of view, our results were highly reliable: according to previous reports, mediators of Th2/Th1 cell responses were involved [19, 20]. Acute and chronic GVHD are clinical entities that affect mucous membranes too. Although the epithelial compartment is the one most affected by the immune injury during GVHD, endothelial cells are also involved in the disease process. The biological data extracted by the computational method suggested that endothelium-derived molecules (such as VEGFA, eNOsA, MMP9) could be useful as biomarkers for early GVHD. This result focuses on pathological signs of immune-mediated vascular injury, suggesting that the endothelium is a target tissue of GVHD in humans, as previously reported [21]. Altogether, our results strongly outline the importance and utility of a non-invasive tool for aGVHD diagnosis based on GEP. We believe that, to take full advantage of GEP performance, it is very important to know the transcript levels of immune effector cells at early time points post-engraftment. In conclusion, in current practice tissue biopsies are performed to confirm this diagnosis, and our molecular tool may obviate the need for an invasive procedure. Indeed, with this study it is possible to assert that our computational intelligence approach confirms
an early aGVHD diagnosis with 97% accuracy in the test data set of the HSCT population at the first clinical diagnosis. Moreover, it is necessary to extend the system as a personalized model to capture the peculiarities of individual patients; an optimization method [22,23] and a comparison among different subset selection algorithms are also important to improve the performance of this model. The authors are currently working in this direction.
References
[1] N. Kasabov, Evolving Connectionist Systems: The Knowledge Engineering Approach, 2nd ed., Springer, London, 2007.
[2] N. Kasabov, I.A. Sidorov, D.S. Dimitrov, Computational Intelligence, Bioinformatics and Computational Biology: A Brief Overview of Methods, Problems and Perspectives, J. Comp. and Theor. Nanosc. 2(4) (2005), 473-491.
[3] F.R. Appelbaum, Haematopoietic cell transplantation as immunotherapy, Nature 411 (2001), 385–389.
[4] D. Weisdorf, Graft vs. Host disease: pathology, prophylaxis and therapy: GVHD overview, Best Pr. & Res. Cl. Haematology 21(2) (2008), 99-100.
[5] P. Lewalle, R. Rouas, and P. Martiat, Allogeneic hematopoietic stem cell transplantation for malignant disease: How to prevent graft-versus-host disease without jeopardizing the graft-versus-tumor effect?, Drug Discovery Today: Therapeutic Strategies | Immunological disorders and autoimmunity 3(1) (2006).
[6] J.L. Ferrara, Advances in the clinical management of GVHD, Best Pr. & Res. Cl. Haematology 21(4) (2008), 677-682.
[7] D. Przepiorka, D. Weisdorf, and P. Martin, Consensus Conference on acute GVHD grading, Bone Marrow Transplantation 15 (1995), 825-828.
[8] S. Paczesny, J.E. Levine, T.M. Braun, and J.L. Ferrara, Plasma biomarkers in Graft-versus-Host Disease: a new era?, Biology of Blood and Marrow Transplantation 15 (2009), 33-38.
[9] S. Paczesny, I.K. Oleg, and M. Thomas, A biomarker panel for acute graft-versus-host disease, Blood 113 (2009), 273-278.
[10] M.P. Buzzeo, J. Yang, G. Casella, and V. Reddy, A preliminary gene expression profile of acute graft-versus-host disease, Cell Transplantation 17(5) (2008), 489-494.
[11] N. Pal, Connectionist approaches for feature analysis, in: Neuro-Fuzzy Techniques for Intelligent Information Systems, Physica-Verlag, Heidelberg (1999), 147-168.
[12] M.A. Hall, Correlation-based feature selection for machine learning, Ph.D. Thesis, Department of Computer Science, University of Waikato (1999).
[13] P.J.A. Shaw, Multivariate Statistics for the Environmental Sciences, Hodder-Arnold, 2003.
[14] S. Ozawa, K. Matsumoto, S. Pang and N. Kasabov, Incremental Principal Component Analysis Based on Adaptive Accumulation Ratio, LNCS, Springer, 2004.
[15] Y. Wang, I.V. Tetko, M.A. Hall, E. Frank, A. Facius, K.F.X. Mayer and H.W. Mewes, Gene selection from microarray data for cancer classification - a machine learning approach, Computational Biology and Chemistry 29(1) (2005), 37-46.
[16] C. Bishop, Neural Networks for Pattern Recognition, Calderon-Press, Oxford, 1995.
[17] V. Kurkova, Kolmogorov's theorem and multilayer neural networks, Neural Networks 5 (1992), 501–506.
[18] D.B. Fogel, An information criterion for optimal neural network selection, IEEE Transactions on Neural Networks (1991), 490-497.
[19] J.E. Foley, J. Mariotti, K. Ryan, M. Eckhaus, and D.H. Fowler, The cell therapy of established acute graft-versus-host disease requires IL-4 and IL-10 and is abrogated by IL-2 or host-type antigen-presenting cells, Biology of Blood and Marrow Transplantation 14 (2008), 959-972.
[20] X.-Z. Yu, Y. Liang, R.I. Nurieva, F. Guo, C. Anasetti, and C. Dong, Opposing effects of ICOS on graft-versus-host disease mediated by CD4 and CD8 T cells, The Journal of Immunology 176 (2006), 7394–7401.
[21] B.C. Biedermann, Vascular endothelium and graft-versus-host disease, Best Practice & Research Clinical Haematology 21(2) (2008), 129–138.
[22] Y. Hu, Q. Song and N. Kasabov, Personalized Modeling based Gene Selection for Microarray Data Analysis, in: The 15th Int. Conf. on Neuro-Information Processing, ICONIP, Auckland, New Zealand, Nov. 2008, Springer LNCS vol. 5506/5507, 2009.
[23] N. Kasabov, Global, local and personalised modelling and profile discovery in Bioinformatics: An integrated approach, Pattern Recognition Letters 28(6) (2007), 673-685.
Dynamic Modeling of Heart Dipole Vector for the ECG and VCG Generation
Fabio LA FORESTA 1, Nadia MAMMONE, Giuseppina INUSO, and Francesco Carlo MORABITO
DIMET - "Mediterranea" University of Reggio Calabria, via Graziella Feo di Vito, I-89122 Reggio Calabria, ITALY
Abstract. The electrocardiogram (ECG) is the major diagnostic instrument for the analysis of cardiac electrophysiology, for two simple reasons: it is not invasive and it is a source of accurate information about heart functionality. For these reasons, in recent years the ECG has attracted the interest of many scientists, who have developed algorithms and models to investigate cardiac disorders. The aim of this paper is to introduce a novel dynamic model to simulate pathologic ECGs. We discuss a generalization of a well known model for the generation of normal ECG signals and we show that it can be extended to simulate the effects of some cardiac diseases on the ECG. We also represent the 3D vector trajectory of the cardiac cycle by reconstructing the heart dipole vector (HDV) from the Frank lead system. Finally, we propose to generate the complete 12-lead ECG system by projection of the HDV. The results show that this is a powerful tool for pathologic ECG generation; future research will be devoted to setting up an extensive synthetic ECG database, which could open the door to new theories about the genesis of the ECG as well as new models of heart functionality. Keywords. ECG, heart dipole model, heart diseases, VCG.
Introduction
Since its discovery by Waller (1889) and Einthoven (1908), the electrocardiogram (ECG) has been widely used in cardiological diagnosis, e.g. of arrhythmia, atrio-ventricular (A-V) fibrillation, A-V block, myocardial ischemia and other disorders of heart activation. The ECG is the recording of the heart's electrical activity collected at the body surface. It is generated by the bioelectric field produced by the contractions of the cardiac muscle; in particular, the bioelectric potential is generated by the depolarization and repolarization processes of the atrial and ventricular muscles. In fact, during atrial and ventricular contractions the sodium and potassium ion concentrations are not balanced and the heart is not electrically neutral. In other words, the heart can be considered equivalent to an electric dipole and can be described by a vector whose amplitude and orientation change during the depolarization and repolarization of the heart muscle cells: this is the theoretical basis of the modern ECG [1]-[4].
1 Corresponding Author: Fabio La Foresta, DIMET - Mediterranea University of Reggio Calabria, via Graziella Feo di Vito, I-89122 Reggio Calabria, ITALY; E-mail: [email protected].
Therefore, the ECG is the major diagnostic tool in cardiac electrophysiology. It is extensively used for two reasons: first because it is not invasive, and secondly because ECG recordings are sensitive to even tiny abnormalities of the cardiac cycle, i.e. accurate information about heart functionality can be inferred by ECG inspection. In the last years, several algorithms were developed for ECG analysis and interpretation (a wide description of algorithms and applications can be found in [5], pp. 135-195, 291-362). Researchers have focused their studies on two topics: one is concerned with the morphology of the waves and complexes that characterize a complete cardiac cycle; the other consists in seeking patterns of ECG variability over time. The availability of ECG databases and the implementation of heart models have been the basis of the discoveries and improvements in the diagnosis of cardiac diseases. In particular, modeling the heart is a useful theoretical tool for the investigation and interpretation of cardiac physiology and pathology [6], [7]. The main reason that makes the availability of heart models essential, and that drives researchers to implement cardiac models, is the need to understand how the heart activity affects the ECG, combined with the impossibility of influencing the human heart in order to study the consequent effect on the ECG. Evidently this problem can be partially bypassed through the analysis of ECG database recordings [8]. On the other hand, heart modeling would have the clear advantage of allowing a mathematical characterization of the complete cardiac cycle and the simulation of fatal cardiac diseases. Furthermore, the generation of synthetic ECGs would dramatically improve the testing of signal processing algorithms. In particular, a synthetic 12-lead ECG database can be obtained by reconstructing the Heart Dipole Vector (HDV) and by using the Dower transformation [9]. The development of models might also be useful to improve the solution of inverse problems [10] and to help in fetal ECG evaluation [7], [11]. The aim of this paper is to introduce a dynamical model to simulate pathologic ECGs, because such a model has not been introduced yet. The proposed model is based on the generalization of a well known model, developed by McSharry et al. [6], for generating a single-channel ECG signal. After a brief description of the HDV generation and its relationship with the 12-lead ECG system, we will discuss in detail the proposed dynamic model, named Modified Heart Dipole Model (MHDM), which extends the model described in [6] to synthesize ECG signals related to the most common heart disorders. We will also show how the MHDM allows depicting the 3D HDV representation, namely the vectorcardiogram (VCG), by its reconstruction from the Frank lead system. Finally, we will present some results that point out the potential employment of the proposed model to develop a synthetic database of pathologic ECGs.
1. Heart Dipole Vector and 12-Lead ECG System
According to the single dipole model of the heart [3], the cardiac electrical activity can be simulated by a time-varying rotating vector d(t), which is mathematically represented in Cartesian coordinates as follows:

\mathbf{d}(t) = d_x(t)\,\hat{\mathbf{i}}_x + d_y(t)\,\hat{\mathbf{i}}_y + d_z(t)\,\hat{\mathbf{i}}_z    (1)
where \hat{\mathbf{i}}_x, \hat{\mathbf{i}}_y, \hat{\mathbf{i}}_z are the unit vectors of the three body axes depicted in Fig. 1. Cardiologists usually refer the heart activity to the three orthogonal planes: the sagittal plane (x-z plane), the frontal plane (y-z plane) and the transverse plane (x-y plane). As mentioned in the previous section, the representation of (1) in the 3D body space is known as the VCG. Figure 2 shows a theoretical VCG loop during a normal cardiac cycle; this representation is employed only to collect additional information about the loop area and the direction of the HDV. The ECG is usually performed by nine electrodes that collect twelve leads (for more details see [1] pp. 277-290 and [5] pp. 1-25). The 12-lead ECG system is a redundant way to represent the HDV and it is useful to investigate heart disorders because it provides many HDV projections. In fact, any ECG signal recorded at the body surface can be considered a linear projection of the HDV (1) onto the direction of the recording electrode axis, v = (v_x, v_y, v_z):

ECG(t) = v_x \cdot d_x(t) + v_y \cdot d_y(t) + v_z \cdot d_z(t)    (2)
In other words, the ECG signals recorded at the body surface are potential differences between two different points. In summary, in agreement with (1) and (2), a complete description of the 3D HDV can be obtained using only three linearly independent ECG signals, i.e. the VCG. However, since the single dipole model of the heart is not a perfect representation of the cardiac activity, cardiologists usually utilize the 12-lead ECG system to study the cardiac activity, because it provides redundant information that can be used to improve the diagnosis of heart diseases. In particular, the 12-lead ECG system records twelve linearly dependent signals: (i) the three Einthoven leads I, II, III, (ii) the three Wilson leads aVR, aVL, aVF, and (iii) the six precordial leads V1-V6. Equation (2) can be extended to provide a mathematical description of the complete 12-lead ECG system:

\mathbf{ECG}(t)_{12\text{-lead}} = \mathbf{H} \cdot \mathbf{V} \cdot \mathbf{d}(t) + \mathbf{N}(t)    (3)
where ECG(t)_{12-lead} is a vector of the ECG channels recorded from the 12 leads, d(t) = [d_x(t), d_y(t), d_z(t)]^T contains the three components of the HDV, H is a 12×3 matrix that includes the body volume conductor model ([1], pp. 139-142), V is a 3×3 matrix corresponding to the projection of the HDV onto the directions of the recording electrodes, and N(t) is the noise in each of the 12 ECG channels at time instant t. The generation of pathologic ECGs is based on the mathematical model of the 12-lead ECG system given by (3); the basic point is the HDV estimation. In the next section, we present a dynamic model whose parameters can be associated with some cardiac pathologies.
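As an illustration of Eqs. (2)-(3), the following minimal Python sketch projects a given HDV trajectory onto a set of lead directions. It is our own illustration, not the authors' code: the projection matrix used here is a placeholder, since the actual H·V coefficients come from the volume conductor model and the Dower transformation [9], which are not reproduced in the paper.

    import numpy as np

    def ecg_from_hdv(d, HV, noise_std=0.0, rng=None):
        """Project a heart dipole vector trajectory onto ECG leads (Eqs. (2)-(3)).

        d         : array of shape (3, T); rows are d_x(t), d_y(t), d_z(t)
        HV        : array of shape (n_leads, 3); combined H*V projection matrix
        noise_std : standard deviation of the additive noise N(t)
        """
        rng = np.random.default_rng() if rng is None else rng
        ecg = HV @ d                                  # one row per lead
        if noise_std > 0.0:
            ecg += noise_std * rng.standard_normal(ecg.shape)
        return ecg

    # Placeholder 12x3 projection matrix (NOT the Dower coefficients): each row
    # is a hypothetical lead direction v = (v_x, v_y, v_z) in the body frame of Fig. 1.
    HV_demo = np.tile(np.array([[0.6, -0.2, 0.1]]), (12, 1))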
2. Modified Heart Dipole Model
In this section, we introduce the Modified Heart Dipole Model (MHDM) to simulate pathologic ECG signals. The proposed model is inspired by the single-channel ECG dynamic model presented in [6]. After a brief mention of the ECG morphology, we
Figure 1. The three body axes (according to [1]). The heart activity can be referred to the three orthogonal planes: sagittal plane (x-z plane), frontal plane (y-z plane) and transverse plane (x-y plane).
Figure 2. A theoretical VCG loop during normal cardiac cycle (arrows indicate the direction of rotation).
will detail how we enhanced the model described in [6] for generating pathologic ECG signals. Subsequently, pathological ECGs will be generated and the HDV will be reconstructed by estimating three orthogonal ECG signals.
2.1. ECG Morphology
Figure 3 shows a normal ECG signal and its characteristic waves; according to (2), the signal can be assimilated to the projection of the HDV onto the frontal plane. The normal ECG signal includes various deflections: the P-wave represents atrial depolarization, the ventricular depolarization causes the QRS complex, and repolarization is responsible for the T-wave (for more details see [6], section II). It is well known that heart diseases distort the PQRST complexes and, vice versa, specific changes in ECG signals reveal a pathological activity of the cardiac muscle. In other words, the morphological analysis of the ECG is the simplest way to investigate the majority of cardiac disorders.
2.2. Single Channel Modeling for Pathologic ECG
Single-channel ECG modeling is based on the idea that the PQRST complex can be obtained as a succession of Gaussians having different characteristics. McSharry et al. [6] have shown that the ECG signal can be simulated by solving three coupled ordinary differential equations.
Figure 3. The ECG signal synthesized by the MHDM: morphology of the PQRST complex.

Table 1. Parameters of the MHDM

 i          P        Q        R        S        T
 time (s)   -0.250   -0.025   0        0.025    0.250
 θi (rad)   -π/3     -π/12    0        π/12     π/2
 ai         1.25     -5.00    30.00    -8.00    1.00
 bi         0.25     0.10     0.10     0.10     0.50
If s(t) is the ECG signal, it can be synthesized by:

\dot{s}(t) = -\sum_{i \in \{P,Q,R,S,T\}} \alpha_i \, \Delta\theta_i \, \exp\!\left( -\frac{\Delta\theta_i^2}{2 b_i^2} \right) - s(t), \qquad \Delta\theta_i = |\theta - \theta_i|_{2\pi}    (4)
where | · |_{2π} denotes the modulo-2π operation, (α_i, b_i, θ_i) are empirical parameters whose values are summarized in Table 1, and θ is an angular dynamic parameter that influences the ECG deflections. McSharry et al. proposed to calculate the θ parameter by coupling (4) with:

\begin{cases} \dot{\chi} = \alpha\chi - \omega\gamma \\ \dot{\gamma} = \alpha\gamma + \omega\chi \end{cases}    (5)
where \alpha = 1 - \sqrt{\chi^2 + \gamma^2} and ω is related to the heart rate. The θ parameter is obtained from (5) as \theta = \operatorname{atan2}(\gamma, \chi). In other words, θ can be dynamically estimated by moving around the χ-γ limit trajectory (i.e. the unit-radius cycle) with angular velocity ω. The model (4) could also be used to simulate heart diseases by changing α_i, b_i, and θ_i; however, the model would then become more complicated and less flexible. The proposed model is instead based on a generalization of the χ-γ limit trajectory obtained by modifying (5). This way the limit trajectory can be dynamically adapted and the PQRST complex can be controlled beat by beat. Because the aim is to simulate pathologic ECG signals, a fundamental requirement is that we can control the effects of the limit trajectory generalization by means of a few parameters. Thus, we propose to generalize the limit cycle to an elliptic trajectory by introducing a linear transformation and a rotation:

\begin{bmatrix} \hat{\chi} \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{bmatrix} \cdot \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} \cdot \begin{bmatrix} \chi \\ \gamma \end{bmatrix}    (6)
The θ parameter is now obtained as \theta = \operatorname{atan2}(\hat{\gamma}, \hat{\chi}); notice that for k_1 = k_2 = 1 and φ = 0, (6) is equivalent to (5). This choice has two advantages: first, the computational complexity of the model is almost unchanged, and secondly, we can modulate the distortion of the PQRST complex through the parameters k_1, k_2 and φ. In Fig. 4, we show an example of the ECG morphology alteration: the continuous line is related to a normal ECG and is simulated with k_1 = k_2 = 1 and φ = 0, while the dotted and dashed lines (related to pathologic ECG signals) are obtained with k_1 = 1, k_2 = 0.7, φ = 0 and with k_1 = 1, k_2 = 0.7, φ = π/2 (or k_1 = 0.7, k_2 = 1, φ = 0), respectively. Notice that changes in the QRS-complex length are typical of bradycardia and tachycardia, whereas the prolongation of the PR-interval is typical of A-V block (for more details about ECG diagnosis see [1] pp. 320-335).
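The single-channel part of the MHDM, Eqs. (4)-(6), lends itself to a compact numerical sketch. The following Python fragment is our own illustrative implementation under simple assumptions (explicit Euler integration, fixed ω, parameters taken from Table 1); it is not the authors' code.

    import numpy as np

    # Table 1 parameters of the MHDM (P, Q, R, S, T)
    THETA_I = np.array([-np.pi/3, -np.pi/12, 0.0, np.pi/12, np.pi/2])
    A_I     = np.array([1.25, -5.0, 30.0, -8.0, 1.0])
    B_I     = np.array([0.25, 0.10, 0.10, 0.10, 0.50])

    def mhdm_single_channel(k1=1.0, k2=1.0, phi=0.0, omega=2*np.pi,
                            dt=1e-3, n_steps=10000):
        """Euler integration of Eqs. (4)-(6): elliptic limit cycle plus Gaussian PQRST kernel."""
        chi, gamma, s = 1.0, 0.0, 0.0
        out = np.empty(n_steps)
        c, si = np.cos(phi), np.sin(phi)
        for n in range(n_steps):
            # Eq. (5): radial attraction to the unit circle, rotation at angular velocity omega
            alpha = 1.0 - np.sqrt(chi**2 + gamma**2)
            dchi   = alpha*chi   - omega*gamma
            dgamma = alpha*gamma + omega*chi
            chi, gamma = chi + dt*dchi, gamma + dt*dgamma
            # Eq. (6): linear scaling (k1, k2) and rotation by phi of the limit cycle
            chi_h   =  c*k1*chi + si*k2*gamma
            gamma_h = -si*k1*chi + c*k2*gamma
            theta = np.arctan2(gamma_h, chi_h)
            # Eq. (4): sum of Gaussian deflections centered on the PQRST angles
            dtheta = np.mod(theta - THETA_I + np.pi, 2*np.pi) - np.pi
            ds = -np.sum(A_I * dtheta * np.exp(-dtheta**2 / (2*B_I**2))) - s
            s += dt*ds
            out[n] = s
        return out

With omega = 2π rad/s the limit cycle is traversed once per second, i.e. roughly the 60 BPM rate used in the simulations of Section 3.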
Figure 4. The synthesized ECG signals: continuous line is related to normal ECG (k1= k2=1, ϕ=0), dotted (k1=1, k2=0.7, ϕ=0) and dashed (k1=1, k2=0.7, ϕ=π/2 or k1=0.7, k2=1, ϕ=0) lines are related to pathological ECG.
2.3. Heart Dipole Vector Modeling
The HDV can be modeled by using (4) to generate its Cartesian components in the 3D body space; in other words, the HDV can be synthesized by three orthogonal ECG signals. Such a condition is fulfilled by the Frank lead system, which is the conventional method for recording the VCG (for more details see [1] pp. 296-299). Sameni et al. [7] have proposed to reconstruct the Cartesian components of d(t) by adapting the parameters of (4) to the Frank leads. Thus, the HDV can be modeled by:

\begin{cases}
\dot{d}_x(t) = -\sum_{i\in\{P,Q,R,S,T\}} \alpha_i^x\, \Delta\theta_i^x \exp\!\left(-\dfrac{(\Delta\theta_i^x)^2}{2(b_i^x)^2}\right) - d_x(t) \\[4pt]
\dot{d}_y(t) = -\sum_{i\in\{P,Q,R,S,T\}} \alpha_i^y\, \Delta\theta_i^y \exp\!\left(-\dfrac{(\Delta\theta_i^y)^2}{2(b_i^y)^2}\right) - d_y(t) \\[4pt]
\dot{d}_z(t) = -\sum_{i\in\{P,Q,R,S,T\}} \alpha_i^z\, \Delta\theta_i^z \exp\!\left(-\dfrac{(\Delta\theta_i^z)^2}{2(b_i^z)^2}\right) - d_z(t)
\end{cases}    (7)

where \{\alpha_i^x, \alpha_i^y, \alpha_i^z\}, \{b_i^x, b_i^y, b_i^z\}, \{\theta_i^x, \theta_i^y, \theta_i^z\} are the empirical parameters that have been estimated from the best fit between the MHDM parameters and the Frank lead signals (an accurate investigation of these parameters can be found in [7]). Equations (7) can be used to simulate a pathologic HDV by computing the θ parameter through (6). In Fig. 5, we show an example of a pathologic VCG loop related to the "nodal rhythm" disease (k_1 = 0.5, k_2 = 1, φ = π/6): a marked T-loop is easily visible (arrows indicate the direction of rotation).
Figure 5. A pathologic VCG loop related to the "nodal rhythm" disease (k1 = 0.5, k2 = 1, φ = π/6): a marked T-loop is easily visible (arrows indicate the direction of rotation).
3. Results
In this section, we evaluate the performance of the MHDM through some simulations. As shown in the previous section, we simulate the Cartesian components of the HDV by coupling (7) and (6); the parameters k_1, k_2 and φ are used to modify the normal HDV cycle. Once the HDV is known, we can depict the VCG and we can obtain the 12-lead ECG.

Table 2. Some Synthetic Diseases by MHDM

 Model parameters                PQRST complex alteration             Correlated disease
 k1 = 1,   k2 = 1,   φ = 0       -                                    normal sinus rhythm
 k1 = 0.3, k2 = 1,   φ = 0       undulation without QRS-complexes     ventricular fibrillation
 k1 = 0.5, k2 = 0.1, φ = π/4     rapid ventricular rate, normal QRS   atrial fibrillation
 k1 = 0.2, k2 = 1,   φ = 2π/3    isoelectric interval TP              atrial flutter
 k1 = 0.5, k2 = 1,   φ = π/6     marked T wave                        nodal rhythm
 k1 = 1,   k2 = 0.3, φ = 0       abnormal QRS                         premature ventricle contraction
 k1 = 1,   k2 = 0.1, φ = π/8     P wave not associated                A-V block
 k1 = 0.5, k2 = 0.1, φ = π/4     P wave absent, QRS-complex absent    asystole
Figure 6. The projection of the HDV on the frontal plane. The figure shows 10 seconds of the lead I for some cardiac diseases; according to Table II: (a) normal sinus rhythm (k1= k2=1, ϕ=0), (b) nodal rhythm (k1=0.5, k2=1, ϕ= π/6), (c) ventricular fibrillation (k1=0.3, k2=1, ϕ= 0) and (d) asystole (k1=0.5, k2=0.1, ϕ= π/4).
However, because we are interested in evaluating the MHDM performance in simulating cardiac diseases, we focused our attention only on the projection of the HDV on the frontal plane; this way we show only the first of the twelve ECG leads, i.e. lead I. This is justified because the PQRST complex is well visible in the frontal plane. Table 2 summarizes the parameter settings that generate PQRST complex alterations referable to cardiac diseases. Figure 6 shows some projections of the HDV in the frontal plane (lead I) obtained by varying the MHDM parameters according to Table 2. The pulsation ω has been fixed to obtain a normal heart rate of about 60 BPM, and no noise has been added in order to allow a better evaluation of the ECG morphology.
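For orientation, the following snippet reuses the hypothetical mhdm_single_channel() sketch given after Eq. (6) to generate lead-I-like traces for the four parameter settings quoted in the Figure 6 caption; it is again an illustration of ours, not the authors' simulation code.

    import numpy as np  # mhdm_single_channel() is the sketch given after Eq. (6)

    cases = {
        "normal sinus rhythm":      (1.0, 1.0, 0.0),
        "nodal rhythm":             (0.5, 1.0, np.pi / 6),
        "ventricular fibrillation": (0.3, 1.0, 0.0),
        "asystole":                 (0.5, 0.1, np.pi / 4),
    }
    # omega = 2*pi rad/s corresponds to roughly 60 BPM, as in the simulations above
    traces = {name: mhdm_single_channel(k1, k2, phi, omega=2 * np.pi)
              for name, (k1, k2, phi) in cases.items()}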
4. Conclusions
A dynamical model for the simulation of pathologic ECGs has been introduced. The aim of the model is to generalize the single-channel ECG model described in [6] and to extend it to the 3D reconstruction of the HDV. The obtained results emphasize the capability of the proposed model and represent a basis for developing a synthetic database of pathologic ECGs. Future developments will be oriented to the study of the model parameters with respect to the ECG morphology alteration, in order to characterize a greater number of heart diseases.
References
[1] J.A. Malmivuo and R. Plonsey, Eds., Bioelectromagnetism, Principles and Applications of Bioelectric and Biomagnetic Fields, Oxford University Press, New York, 1995, 119-146, 277-289, 320-335.
[2] O. Dössel, Inverse problem of electro- and magnetocardiography: review and recent progress, International Journal of Bioelectromagnetism 2 (2) (2000).
[3] A. Van Oosterom, Beyond the dipole; modeling the genesis of the electrocardiogram, 100 Years Einthoven (2002), 7-15.
[4] D.B. Geselowitz, On the theory of the electrocardiogram, Proceedings of the IEEE 77 (6) (1989), 857-876.
[5] G.D. Clifford, F. Azuaje, P. McSharry, Advanced Methods and Tools for ECG Data Analysis, Artech House Publishers, London, 2006.
[6] P.E. McSharry, G.D. Clifford, L. Tarassenko, and L.A. Smith, A dynamical model for generating synthetic electrocardiogram signals, IEEE Transactions on Biomedical Engineering 50 (3) (2003), 289-294.
[7] R. Sameni, G.D. Clifford, C. Jutten, and M.B. Shamsollahi, Multichannel ECG and Noise Modeling: Application to Maternal and Fetal ECG Signals, EURASIP Journal on Advances in Signal Processing, article ID 43407, 2007.
[8] The PTB Diagnostic ECG Database, http://www.physionet.org/physiobank/database/ptbdb/.
[9] G.E. Dower, H.B. Machado, and J.A. Osborne, On deriving the electrocardiogram from vectorcardiographic leads, Clinical Cardiology 3 (2) (1980), 87-95.
[10] F. La Foresta, M. Cacciola, N. Mammone, F.C. Morabito, and M. Versaci, Inverse Problem Solution to Evaluate the Bioelectric Field of Fetal Heart Muscle: Remarks on Electrodes Placement, International Journal of Applied Electromagnetics and Mechanics 26 (2007), 266-271.
[11] F. La Foresta, N. Mammone, F.C. Morabito, Bioelectric Activity Evaluation of Fetal Muscle by Mother ECG Processing, Proceedings of The Twelfth Biennial IEEE Conference on Electromagnetic Field Computation, 132, 2006.
Chapter 5 Applications
The TRIPLE Hybrid Cognitive Architecture: Connectionist Aspects
Maurice GRINBERG a,1 and Vladimir HALTAKOV a,b
a Central and East European Center for Cognitive Science, New Bulgarian University, Bulgaria
b Sofia University "St. Kliment Okhridsky", Bulgaria
Abstract. This paper introduces a recently proposed hybrid cognitive model, called TRIPLE, focusing on its connectionist aspects. These are demonstrated on a series of schematic and realistic examples of analogy-based retrieval from memory. The model integrates three modules which run in parallel: a serial reasoning engine, a connectionist engine and an emotion engine. The serial reasoning engine deals with the current task, processes perceptual input, plans and performs actions, and synchronizes the activity of the other two engines; its special feature is formal reasoning. This paper focuses on the connectionist engine, the formalism behind it, and how it compares to existing approaches. The connectionist engine can be considered as a standalone model of retrieval from memory and mapping to the current task. It is based on a series of distributed re-representations of semantic-net-like symbolic localist representations of knowledge. The proposed mechanisms for building distributed representations can reflect the taxonomic, relational, semantic or other structure of the long term memory. The representations obtained are made dynamic by a complementary activation spreading process. The mapping-to-task mechanisms of the connectionist module are based on a dynamic similarity evaluation between the task and the memory content and on the simultaneous selection of the best mappings by a constraint satisfaction network. Keywords. Cognitive modeling, dynamic similarity assessment, retrieval and mapping
Introduction
The model TRIPLE, introduced recently [1], has been designed as a cognitive model for Embodied Conversational Agents (ECA) (e.g. see [2]). As a consequence, on the one hand TRIPLE has several high level mechanisms needed for a cognitive architecture and, on the other hand, its implementation is maximally optimized to achieve real-time functioning. Thus, TRIPLE can account for a wide range of cognitive processes and achieves the computational performance and scale needed for realistic applications. Virtual environments (like the Internet and its future developments) become more and more complex and rich, and even, in some respects, comparable to real environments. As is well known, a human level of performance is difficult to achieve, especially regarding perception, communication, and context sensitivity. At the same time, although far
1
Corresponding Author: Central and East European Center for Cognitive Science, New Bulgarian University, Montevideo str. 21, 1618 Sofia, Bulgaria; E-mail: [email protected].
from human-like performance, cognitive architectures can come closer to the way humans behave in complex environments (see e.g. [3]) in comparison with traditional AI approaches. Thus, they seem to be good candidates to be used as 'minds' for ECAs living in virtual environments [2]. However, acceptable performance in real-world settings imperatively poses the question of scalability for cognitive models. This is an important constraint and a powerful motivation for model improvements in efficiency which, at the same time, preserve cognitive plausibility and flexibility. The main advantages of the cognitive perspective on ECAs, related to context sensitivity and adaptation, come from the use of connectionist and dynamic system theory approaches. Therefore, in this paper we focus on the connectionist aspects of TRIPLE and more specifically on the so-called Similarity Assessment Engine (SAE), which in our opinion is the main novelty in the architecture. It is responsible for the retrieval from memory and the mapping to the task at hand of episodic knowledge, for the determination of the working memory (WM) content, and for the focus of attention. In order to put the content of this paper in the context of the whole TRIPLE architecture, the latter is presented briefly in Section 1. In Section 2, we present the main principles and the formalism of the SAE, and in Section 3 we give a series of examples illustrating its capabilities.
1. The TRIPLE Cognitive Architecture
The TRIPLE cognitive model has three interconnected parts that function in parallel. The so-called Reasoning Engine (RE) coordinates and synchronizes the activities of the model and relates the agent to its environment and to the tools it can use in this environment, such as communicating with a user, accessing and retrieving knowledge from ontologies, etc. [1]. The RE is also responsible for instance learning, i.e. the storing of useful episodes in Long Term Memory (LTM) after evaluation. Part of the RE is the Inference Engine (IE), which operates on a limited amount of active relevant knowledge (the most active part of the WM of the agent). Its main role is to augment parts of WM with inferred knowledge and to perform consistency checks. The second module of TRIPLE is the so-called Similarity Assessment Engine (SAE). It is designed to be a connectionist engine, based on fast matrix operations, and is supposed to run all the time as an independent parallel process. The main mechanism is activation spreading in combination with similarity or correspondence assessment mechanisms which allow the retrieval of knowledge relevant to the task at hand. The communication of the SAE with the RE is based on events related to the level of confidence for a match between the task and the WM content. The information retrieved can correspond to the input and goals of the ECA at different levels of abstraction. It will be considered in detail in Section 2. The third important part of the architecture is the Emotion Engine (EE), which is based on the FAtiMA emotional agent architecture [4]. FAtiMA generates emotions from a subjective appraisal of events and is based on the OCC cognitive theory of emotions [5]. The EE, similarly to the SAE, is supposed to run in parallel and influence various parameters of the model like the volume of WM, the speed of processing, etc. (see [6] for a simple exploration of the role of emotions in analogy making). The availability of the emotional engine allows for higher believability and usability based on the emotional expressions and gestures corresponding to the current emotional state of the agent. The
importance of emotions for ECAs and the interaction of users with them have been pointed out, for instance, in [7]. The TRIPLE model is connected to the DUAL/AMBR architecture [8,9], from which it inherits some important mechanisms. TRIPLE, similarly to DUAL/AMBR, makes use of spreading of activation as a method for the retrieval from memory of the most relevant episodic or general knowledge. The mapping of the retrieved knowledge to the task at hand and to the current input to the system is based on similarity and analogy in both models [10-12]. However, the underlying mechanisms are essentially different. In DUAL/AMBR the 'duality' is achieved at the level of each micro-agent, while in TRIPLE it is achieved by two systems which run in parallel and communicate on an event-driven basis. An important additional difference is that TRIPLE uses a fully fledged reasoning part in the standard AI sense, which is not available in DUAL/AMBR. The inference and entailment capabilities are integrated with the spreading of activation and with the evaluation of retrieval and action planning. Only the most active part of WM, corresponding to the focus of attention of the system, is subject to augmentation based on inference and to other symbolic processing like evaluation, transfer, and action. The Amine platform [13] has similar augmentation mechanisms, which are based on purely symbolic manipulation and are not conditioned by any attention mechanism of the system (see the discussion of 'elicitation' and 'elaboration' in [13]). However, the main difference of TRIPLE compared to other cognitive architectures resides in the SAE. The SAE tries to implement a dynamic connectionist type of processing based on distributed re-representations of symbolic and localist knowledge representations. It allows the model to use existing and newly created ontologies (and the associated fast formal reasoning, see e.g. [14]) together with fast matrix-based connectionist processing. The RE integrates the results of the operation of the SAE and EE on the one hand and the communication flow with the 'Body' and the 'Environment' on the other. It works with the sensory-motor part of TRIPLE by receiving the information from the sensors (user utterances, information about results from actions, etc.) and sending action commands to the 'Environment'. The main interactions with the 'Body' and 'Environment' are the same as the ones reported in [11]. The task set of elements (instances of concepts and relations describing the task and the goal) is the source of activation for the SAE. As explained above, the SAE runs in parallel (together with the EE) and continuously has information about possible candidates for retrieval and consideration by the RE. The latter evaluates the level of similarity and, based on that, establishes candidate correspondences between the elements of the task and LTM. When the correspondences between the task and LTM are established, the RE verifies and evaluates them and eventually rejects or confirms them. Based on the existing correspondences, parts of past episodes are transferred as possible candidates for task completion. These transferred memory elements are evaluated in their turn for consistency with the task until eventually an action transfer is chosen and the appropriate action is executed. We assume that the RE must work serially because it deals with high level reasoning tasks. It actually deals with the most active part of WM, which can be considered to be the focus of attention of the system. If the task is considered completed (e.g.
a question is answered) the whole episode with the task and its completion is stored in LTM as an experience episode for future
use. Any new general knowledge acquired during a task completion based on inference or knowledge retrieval from an external source is also stored in LTM.
2. The Similarity Assessment Engine (SAE)
The SAE can be considered as a standalone connectionist model which receives from the RE a task as input, presented in symbolic form, namely as instances of concepts existing in LTM. It returns similarity evaluations (or relevance probabilities) between the task elements and some elements of LTM, typically episodic knowledge. In the case of analogy making such a task could be to map a target to the appropriate base. Generally, it activates relevant knowledge which can be further processed by the RE. The main underlying mechanism is standard activation spreading, which can take place in a variety of networks (depending on the type of connections) involving the elements in LTM. The most commonly used is spreading of activation over the connections in the semantic tree ('is-a', 'has', 'instance-of', etc.) [15,16]. Spreading of activation can also be done using the semantic relatedness of the memory elements extracted from texts, for instance by Latent Semantic Analysis (LSA) [17-19]. The second important aspect of the SAE is the set of mechanisms for deriving distributed representations based on the taxonomic, relational and semantic structure of LTM. The goal of this re-representation is to substitute the symbols in LTM with vectors which contain connectionist-like information about various aspects (or features) of these elements. For instance, such vectors can give the relation of an element to other elements in LTM (e.g. defined by 'is-a' and 'instance-of' links), the participation and role of this element in relations and actions (e.g. by counting the number of times an element is the first argument in the relations in LTM), and the correlation coefficient of this element with all the other elements in LTM, obtained by LSA analysis of texts and memory episodes relevant to the task at hand. Similar methods have been used in the concept-mapping and case-based reasoning literature (e.g. [20]). By performing these two procedures, we are left with a set of weight matrices giving the connections between LTM members from different and in some cases complementary perspectives (e.g. taxonomic, associative, semantic, etc.) and a set of related distributed representations for each LTM element. In our opinion, these two ingredients of the SAE are more realistic from a connectionist point of view than a semantic-tree-like LTM. Our approach is an attempt at a reverse reconstruction of a set of more realistic, complementary distributed representations from symbolic localist representations (in opposition to other approaches of decomposition to canonical representations given in [21] or [22]), and their relation to a pure symbolic representation can be traced back. Ultimately, in this approach, the 'complete' re-representation would be a superposition of all possible (relevant) distributed representations which could be built based on the localist symbolic representations. The activation spreading mechanism dynamically relates these two components of the SAE (the set of weight matrices and the distributed representations of LTM elements). It gives the relevance of each LTM element to a task (given that in TRIPLE the task elements are the sources of activation) and, at the same time, the relevance of each component in the distributed representations of this element (measured by their activation). It is obvious that this relevance is highly dynamic (it depends on the activation patterns at a specific moment in time) and context sensitive, depending on the
weights of the distributed representation sets. Thus, even the similarity based on a single distributed representation set changes over time due to the change of the components' activations.
2.1. General Approach
The main principles behind the SAE are the following:
• Creation of a network of connections (weight matrices) based on some principle like association, taxonomy, participation in relations and actions, semantic relatedness, etc., and spreading of activation over these networks (in this way there could be different types of activation depending on the building principles of the networks);
• Similarity assessment based on distributed representations obtained from the symbolic representation of knowledge (e.g. the matrix of weights of the taxonomical relations for a concept or instance of a concept, or the weights to relations and actions in which they participate).
The idea behind these two principles can be related to a kind of reverse engineering approach. It starts from the symbolic encoding of knowledge in a semantic network, in episodes, or even as implicitly present in texts in a domain. The task is then to build distributed representations out of the structured symbolic representation. For instance, a taxonomy of the type 'kitchen-chair' → 'sub-class-of' → 'chair' → 'sub-class-of' → 'furniture' → etc., can be seen as a distributed representation of 'kitchen-chair' over the members of the hierarchy 'chair', 'furniture', etc., represented by giving weights to the connections of 'kitchen-chair' with any of the higher-lying concepts (e.g. by a distance function). Similarly, 'chair' can be represented in terms of the relations and actions in which it participates, like 'being ON the floor', 'being seated ON', 'being BEHIND a table', etc., which gives a distributed representation of objects and instances of objects over relations and actions. As far as the whole knowledge of the cognitive architecture is represented in the semantic net (or ontology), any distributed representation of knowledge elements obtained by the method outlined above will be in terms of elements of the same semantic network. In this way, the knowledge of the system can be represented as a multi-dimensional space, spanned by all elements in LTM (plus the instances in episodic memory), and, using the various types of connections among the elements, various representations of subspaces of the full knowledge space can be built in terms of other or the same sub-spaces. The static picture described above is made dynamic by the activation spreading mechanism. Effectively, only the active elements in LTM, which characterize the state of the cognitive system at a specific moment in time, should contribute to the distributed representation. Thus, this approach to building distributed representations is highly dynamic and context sensitive. For instance, 'kitchen-chair' and 'kitchen-table' can be highly dissimilar if only the lowest nodes in the taxonomy are activated, e.g. 'chair' and 'table', respectively. But when 'furniture' gets active they will have something in common and will become more similar as activation spreads higher in the taxonomy. So, in assessing similarity (or more generally correspondence or relevance) only the active nodes matter, and the correspondence measure varies with time, reflecting the context of the situation as given by the activation patterns.
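To make the 'kitchen-chair' example concrete, here is a minimal Python sketch of our own (not code from the paper) that turns a toy taxonomy into rows of a taxonomy-based representation matrix, with weights decaying with the hierarchical distance; the concept names and the 1/distance weighting are illustrative assumptions.

    import numpy as np

    CONCEPTS = ["kitchen-chair", "kitchen-table", "chair", "table", "furniture"]
    IDX = {c: i for i, c in enumerate(CONCEPTS)}

    # 'sub-class-of' chains: ancestors listed from closest to most general.
    TAXONOMY = {
        "kitchen-chair": ["chair", "furniture"],
        "kitchen-table": ["table", "furniture"],
        "chair": ["furniture"],
        "table": ["furniture"],
    }

    # Taxonomy-based representation: row = concept, column = higher-lying concept,
    # weight = a simple distance function (1/number of links to follow).
    X_ts = np.zeros((len(CONCEPTS), len(CONCEPTS)))
    for child, ancestors in TAXONOMY.items():
        for dist, anc in enumerate(ancestors, start=1):
            X_ts[IDX[child], IDX[anc]] = 1.0 / dist

    # With only 'chair'/'table' active, 'kitchen-chair' and 'kitchen-table' share
    # no active component; once 'furniture' becomes active they acquire one.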
The construction of such distributed representations is related to the task considered. In a simple case-based reasoning framework, the use of taxonomy-based distributed representations could be sufficient for the retrieval of the right base case. In analogy-making research, it is widely accepted that the relational structure is essential (e.g. [23-25]), and in our approach it can be accounted for (at least partially) by distributed representations based on relations and actions. In the latter, objects are described in terms of the relations and actions in which they participate, analogously to co-occurrence-in-text based methods of semantic similarity like LSA [17], and of the role they play in these relations or actions. This representation automatically gives a distributed representation for the relations and actions in terms of the objects participating in them.
2.2. Implementation
The implementation of the SAE in TRIPLE follows the principles laid down above. In the present implementation there is only one activation spreading mechanism (using the connections in a semantic-net-type LTM) and two distributed representations. The first distributed representation is based on taxonomic relations of the type 'instance-of' and 'sub-class-of'. The second takes into account the participation and role played in relations and actions. The aim of the present implementation is to explore the capabilities of TRIPLE in analogy: the taxonomy-based distributed representation accounts for superficial and class-belonging similarities, while the relation- and action-based one accounts for relational structure similarities. As described above, retrieval from memory is based on the level of correspondence of the task elements to knowledge in LTM. The measure of correspondence chosen is the similarity of task and LTM elements measured on the basis of the two distributed representations by a normalized scalar product. The whole process involves three sub-processes running in parallel: activation spreading, correspondence assessment and constraint satisfaction. Activation spreading takes place following the equation:

a^t = f_a\!\left( \mathbf{W}_{LTM+T}\, a^{t-1} \right)    (1)
where a is a vector corresponding to the activation, represented in the space of all nodes of LTM and the task nodes; f_a is a standard activation spreading function which keeps the activity in the range [0,1]; W_{LTM+T} is a weight matrix whose elements correspond to the weights of the links between the nodes in LTM (chosen equal to 1); and t is the current iteration. The distributed representations of any memory or task element are presented in matrices, called similarity matrices (denoted X). The elements to be compared are represented as rows in X, and the normalized scalar product of rows gives a correspondence or similarity measure for any two elements, weighted by the current activation:

S_{ij}^t = a_i^t\, a_j^t \sum_k X_{ik} X_{jk}\, a_k^t    (2)
where a_i^t and a_j^t are the activities of the elements being compared at time t, and a_k^t are the activities of the elements which form the 'basis' of the distributed representation of elements i and j. The matrix elements X_{ik} can be the weights of the connections between the corresponding elements or can be evaluated using a distance function (see the discussion below). The inclusion of activation in the similarity evaluation (see Eq. (2)) makes it dynamic and dependent only on the most active elements in LTM. The general similarity matrix is a superposition of particular similarity matrices:

S^t = \sum_s c_s\, S_s^t    (3)

where \sum_s c_s = 1.
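A compact Python sketch of Eqs. (1)-(3) might look as follows. It is our own illustration: the clipping used for f_a and the pre-normalization of the rows of X are assumptions rather than details given in the paper.

    import numpy as np

    def spread_activation(a_prev, W):
        """Eq. (1): one activation spreading step; f_a is assumed here to be a
        simple clipping function that keeps activities in [0, 1]."""
        return np.clip(W @ a_prev, 0.0, 1.0)

    def weighted_similarity(X, a):
        """Eq. (2): S_ij = a_i * a_j * sum_k X_ik X_jk a_k.
        X is an (N, N) representation matrix over the LTM + task nodes (rows are
        assumed pre-normalized); a is the activation vector of length N."""
        S = (X * a) @ X.T          # sum over k of X_ik a_k X_jk
        return np.outer(a, a) * S

    def mixed_similarity(S_list, c_list):
        """Eq. (3): superposition of the particular similarity matrices."""
        assert abs(sum(c_list) - 1.0) < 1e-9
        return sum(c * S for c, S in zip(c_list, S_list))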
In the present implementation we make use of two similarity matrices. The first is based on taxonomic relations like the 'instance-of' and 'is-a' relations, and the second on co-participation in relations and actions, taking the role into account. Under these restrictions, Eq. (3) reads:

S^t = (1 - c_s)\, S_{ts}^t + c_s\, S_{rs}^t    (4)
where S^t is the resultant similarity matrix and S_{ts}^t and S_{rs}^t are the taxonomy-based and the relation-based similarity matrices, respectively. The index t denotes time dependence, as the similarity matrices depend on time through the time dependence of the activation patterns and through their definition in Eq. (2). The parameter c_s determines the mixture of the two similarity matrices. As discussed in the examples in the next section, if the taxonomy-based similarity matrix dominates, the retrieved base is more superficially similar to the target, and when the relation-based matrix dominates, the base is more structurally similar to the target. The value of c_s may depend on some priming of the system, or on its goal in a particular setting. Once the similarities have been updated (at each time step), they become external input into a Constraint Satisfaction Network (CSN) based on the IAC model of McClelland and Rumelhart [26]. Our approach and parameterization are similar to the well-known uses of CSN methods in analogy research [9,25]. However, our implementation differs in the 'activation' used in the CSN. In our case, it is the similarity (or relevance probability) that plays the role of activation (see [27] for a similar approach in ontology mapping research). Thus, the CSN operates in a sub-network where similarity is considered to be a special kind of activity standing for the probability of a mapping between two elements, characterized by a so-called correspondence hypothesis. Interaction and competition take place among such correspondence hypotheses, based on the constraint to have only one mapping between a target element and a base element and on the requirement to have consistent mappings between connected elements (i.e. elements which are connected should themselves correspond to connected elements). In this way, each correspondence hypothesis supports any other non-competing correspondence hypothesis. In other
words, the correspondence hypotheses for connected elements will support each other if they do not both map to the same element (e.g. 'chair'→'table' and 'on'→'in-front-of', coming from episodes in which 'the chair is on the floor' and 'the table is in-front-of the window'). The latter is implemented by a connectivity matrix for LTM which contains weights inversely proportional to the distance between two elements, excluding taxonomy relations like 'instance-of' and 'sub-class'. Formally, the implementation of this mechanism is done as follows. A matrix H is initialized with dimensions equal to those of the similarity matrix between the task and LTM elements. It is then updated after each cycle of activation spreading and similarity calculation by the formula:

H_{ij}^t = d\, H_{ij}^{t-1} - c_{inhib}\, H_{ij}^{t,inhib} \left( H_{ij}^{t-1} - H_{min} \right) + \left( c_{excit}\, H_{ij}^{t,excit} + S_{ij}^t \right) \left( H_{max} - H_{ij}^{t-1} \right)    (5)
where i refers to a target element and j to an LTM base element. Every cell (i,j) of the matrix H represents the relevance probability for a correspondence hypothesis (or similarity) between the elements i and j. The elements of H remain in the interval [-1,1] [26]. The terms H_{ij,inhib} and H_{ij,excit} are the inhibitory and the excitatory contributions to H_{ij}, respectively. The quantity d is a decay parameter. The parameters c_{inhib} and c_{excit} control the contributions of H_{ij,inhib} and H_{ij,excit}, respectively. The quantities H_{ij,inhib} and H_{ij,excit} are calculated using the following expressions:

H_{ij,inhib} = \sum_{t \neq i} H_{tj} + \sum_{m \neq j} H_{im}    (6)

H^{excit} = C_T^{\mathsf{T}}\, H\, C_{LTM}    (7)

where

C_{ij} = \begin{cases} (1/n_{ij})^m, & i \neq j \\ 0, & i = j \end{cases}    (8)
where the indexes t and m refer to 'target' and 'memory', and C_T and C_{LTM} are the connectivity matrices of the target and LTM elements. The quantities C_{jm} and C_{it} account for the closeness, in terms of number of connections, of any memory element m to the memory element j, and of any target element t to the target element i. The connectivity weights are calculated using Eq. (8), where n_{kl} is the distance between the two nodes k and l in terms of the number of connections to be followed in order to reach l from k, so that the weight is 1 if there is one direct connection between them, 1/2 for two connections, etc., and m defines the metric.
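The following Python sketch is one possible reading of Eqs. (5)-(8); the signs of the decay, inhibition and excitation terms and the clipping to [H_min, H_max] follow the standard IAC scheme [26] and should be treated as our assumptions rather than the authors' exact implementation.

    import numpy as np

    def connectivity(n_hops, m=1):
        """Eq. (8): C_ij = (1/n_ij)^m for i != j, 0 on the diagonal.
        n_hops[i, j] is the graph distance (number of connections) between nodes."""
        with np.errstate(divide="ignore"):
            C = (1.0 / np.asarray(n_hops, dtype=float)) ** m
        np.fill_diagonal(C, 0.0)
        C[~np.isfinite(C)] = 0.0        # unreachable pairs contribute nothing
        return C

    def csn_step(H, S, C_target, C_ltm, d=0.9, c_inhib=0.15, c_excit=0.03,
                 h_min=-1.0, h_max=1.0):
        """One update of the correspondence-hypothesis matrix H, Eqs. (5)-(7)."""
        # Eq. (6): competition from hypotheses sharing the same LTM element (column)
        # or the same target element (row)
        H_inhib = (H.sum(axis=0, keepdims=True) - H) + (H.sum(axis=1, keepdims=True) - H)
        # Eq. (7): support from hypotheses of connected target / connected LTM elements
        H_excit = C_target.T @ H @ C_ltm
        H_new = (d * H
                 - c_inhib * H_inhib * (H - h_min)
                 + (c_excit * H_excit + S) * (h_max - H))
        return np.clip(H_new, h_min, h_max)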
3. Exploration of the Model Capabilities
In this section several examples of simulations with the SAE are presented. They are chosen with the special purpose of exploring the capabilities of the model in simple situations, and they are more of a proof of concept than realistic applications. However,
we consider them as necessary first steps in the systematic exploration of the model's potential and limitations. The goal is to probe different retrieval and mapping scenarios based on a change of three parameters: c_inhib and c_excit, and c_s (see Eqs. (5) and (4), respectively). The variation of c_inhib and c_excit primarily controls the CSN dynamics, but c_excit is also expected to play a role in the structural alignment process via the connectivity matrices, which contain structural information. The influence of the excitatory connections also tends to keep related knowledge elements together and suppresses blending. However, the parameter c_s is expected to play the most important role in structural correspondence. The larger this parameter is, the more structural correspondence is expected to be observed in retrieval and mapping. In Example 1, we demonstrate how the change in the parameter c_s (see Eq. (4)) changes the type of mapping while preserving the structural alignment. In Example 2, we demonstrate the performance on a more realistic example of retrieval of information from a music ontology and show how different values of c_s can lead to different types of retrieval.
3.1. Example 1
In this example (see Figure 1), we have only one base episode in LTM, with the following structure: '(table on floor)' and '(book on table)'. The target is similar in structure: '(note on book)' and '(letter on note)'. The CSN parameters used are c_inhib = 0.15 and c_excit = 0.03.
Figure 1. Base episode (a) and target episode (b) in Example 1. The arrows show the order of the arguments of the corresponding relations.
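As a usage illustration, the toy scenario of Example 1 could be driven by the earlier hypothetical sketches (spread_activation, weighted_similarity, csn_step) roughly as follows; the clamping of the task sources, the iteration count and the knowledge-base arrays are placeholders of ours, not details taken from the paper.

    import numpy as np

    def run_mapping(W, X_ts, X_rs, C_target, C_ltm, task_idx, ltm_idx,
                    cs=0.9, c_inhib=0.15, c_excit=0.03, n_iter=200):
        """Iterate activation spreading, Eq. (4) mixing and CSN updates, then read
        off the winning base element for every target element."""
        n = W.shape[0]
        a = np.zeros(n)
        a[task_idx] = 1.0                  # the task elements are the activation sources
        H = np.zeros((len(task_idx), len(ltm_idx)))
        for _ in range(n_iter):
            a = spread_activation(a, W)
            a[task_idx] = 1.0              # keep the sources clamped (assumption)
            S_full = (1 - cs) * weighted_similarity(X_ts, a) + cs * weighted_similarity(X_rs, a)
            S = S_full[np.ix_(task_idx, ltm_idx)]
            H = csn_step(H, S, C_target, C_ltm, c_inhib=c_inhib, c_excit=c_excit)
        return {i: ltm_idx[j] for i, j in zip(task_idx, H.argmax(axis=1))}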
As expected (see Figure 2a), when we start using a large relational component in the similarity matrix S – cs = 0.9, the model maps the base to the target using the relational information involved and the connectivity matrices of the base and the target and gives the mappings: book to letter, table to note and floor to book. Note that ‘book’ from the target is not mapped to the ‘book’ of the base. This is due to the different roles
they have in the corresponding 'on' relations: in the base the book is on the table, while in the target something is on the book (the note). With the second parameterization (c_s = 0.1), as seen from Figure 2b, the mapping changes dramatically. Due to the predominance of the taxonomic part in S resulting from the choice of the mixing parameter, the 'book' from the base becomes similar to the 'book' in the target and they are mapped together, while the remaining elements follow this change: the mappings are reversed and 'letter' is mapped to 'floor' and 'note' to 'table'. Remarkably, the structural correspondence is preserved in this simple case, due again to the connectivity matrices.
Figure 2. Mapping between base and target for two values of c_s: (a) alignment (c_s = 0.9) and (b) 'cross' mapping (c_s = 0.1).
3.2. Example 2
In this example a real ontology is used in order to demonstrate the model capabilities in more realistic situations. The memory of the system is loaded from a subset of a gossip ontology, which contains facts extracted from Wikipedia. It contains information from the musical domain: music artists and groups with their prizes and records, as well as personal information like religion, children, parents, etc. The first target represents an artist who has a religion and a sibling. Since in the given subset of the gossip ontology only Johnny Cash has a sibling (Tommy Cash), the target artist is mapped to him. All mappings are shown in Table 1.

Table 1. Retrieving information from a musical domain ontology

 Target                                                Base
 Artist-Target-1                                       Artist-2 (Johnny Cash)
 (Artist-Target-1 - hasSibling - Sibling-Target-1)     (Artist-2 - hasSibling - Person-12)
 (Artist-Target-1 - hasReligion - Religion-Target-1)   (Artist-2 - hasReligion - ReligionStatus-85)
 Sibling-Target-1                                      Person-12 (Tommy Cash)
 Religion-Target-1                                     ReligionStatus-85 (Born-Again Christian)
In the example above, the target can be directly mapped to a specific matching part in memory. A more complex example is when there are two alternatives for mappings
because of some differences between the information in the target and in memory. The target for the next example contains a music group which has an Academy Award for Best Original Song (stored as Prize-20 in the ontology) and plays pop music. In the ontology the group Queen has such an award (only for the experiment), but the music style it plays is Pop-Rock, which does not exactly correspond to the style in the target. On the other hand, Madonna also has the same Academy Award and plays exactly the same style, pop. However, Madonna is a music artist, not a pop group. If we set the c_s parameter of the system to 0.05, making it give more emphasis to the taxonomical similarity between the objects, the music group from the target is mapped to Queen (see Table 2). However, if the c_s parameter is set to 0.95, the model considers more the structural correspondence between the objects. Since music artists and music groups are similar in structure (they both have songs, records, prizes, etc.), the music group from the target is mapped to Madonna, who also plays the same music style as asked for in the target (Table 3).

Table 2. Retrieving information from a musical domain ontology – c_s = 0.05

 Target                                         Base
 MusicGroup-Target-1                            Group-11 (Queen)
 (MusicGroup-Target-1 - hasPrize - Prize-20)    (Group-11 - hasPrize - Prize-20)
 (MusicGroup-Target-1 - hasGenre - Pop)         (Group-11 - hasGenre - Pop-Rock)
Table 3. Retrieving information from a musical domain ontology – c_s = 0.95

 Target                                         Base
 MusicGroup-Target-1                            Artist-1 (Madonna)
 (MusicGroup-Target-1 - hasGenre - Pop)         (Artist-1 - hasGenre - Pop)
 (MusicGroup-Target-1 - hasPrize - Prize-20)    (Artist-1 - hasPrize - Prize-20)
4. Discussion and Conclusion
In this paper the connectionist aspects of the TRIPLE model were presented, namely the SAE module. The SAE relies on multifaceted re-representations of symbolic knowledge and builds two complementary structures: connections between knowledge nodes and distributed representations of knowledge nodes. The connection matrices can be based on relations or associations between the knowledge elements or can be derived on the basis of semantic relatedness (e.g. by LSA or related methods). The similarity matrices can be constructed again by using LSA, or by using the connections among the elements of LTM, associations based on learning, etc. In this paper, these mechanisms were illustrated by using the taxonomic connections and statistics about the involvement of objects in relations and their role in them. Both similarity matrices were modulated by the current activation, spread over all connections in LTM. Thus, the similarity between the task (target) and LTM episodes is dynamic and evolves during mapping and retrieval. We assume that the
elaborate semantic nets and ontologies are the expression of knowledge which is acquired in a connectionist, distributed way. So, this is an attempt to make a reverse analysis and derive distributed representations from the resulting localist symbolic representations. In this way, a rich distributed representation is obtained which contains the links of any element with the others in some way. The various possible distributed representations – taxonomic, relational, semantic-proximity based, or other – complement each other and give the model its power. They represent different layers of analysis of the task and can give a variety of types of retrieved and mapped knowledge, ranging from superficially similar to the task up to distantly analogous, depending on the context and the state of the model. At the same time this functionality is obtained just by spreading of activation and dynamic 'similarity' assessment (typically scalar products weighted by the activation of the LTM elements). The latter makes the model computationally effective and amenable to efficient computational implementation (using standard linear algebra packages), which potentially allows its scalability to be checked in more realistic examples. While the full capabilities and limitations of this approach need to be carefully explored and compared to existing models, in this paper we demonstrated in a series of simulations the considerable potential of the model in retrieval and analogy making. In these simple examples, the manipulation of three parameters of the model – the mixture of taxonomic and relational similarity measures, and the parameters of the constraint satisfaction network – allowed us to demonstrate that the model can account for various phenomena in analogy retrieval and mapping. It should be stressed that this model is just a component of the TRIPLE cognitive architecture and its interaction with the reasoning and emotion engines remains to be investigated. In any case, we expect that the combination of the three engines will result in a substantial improvement, which is supported by the preliminary simulations performed so far. Elaboration and discussion of such examples will be presented in future publications.
5. Acknowledgments
This work was partially supported by the project RASCALLI (FP6, EC). We would also like to acknowledge the fruitful discussions with Dr. H. Laftchieff and S. Kostadinov.
References
[1] M. Grinberg, S. Kostadinov, The Triple Model: Combining Cutting-Edge Web Technologies with a Cognitive Model in an ECA, In: Proc. International Workshop on Agent Based Computing V (ABC'08), Wisla, Poland, 2008.
[2] B. Krenn, C. Schollum, The RASCALLI Platform for a Flexible and Distributed Development of Virtual Systems Augmented with Cognition, In: Proc. of the International Conference on Cognitive Systems (CogSys 2008), University of Karlsruhe, Karlsruhe, Germany, April 2-4, 2008.
[3] R. Sun (Ed.), Cognition and Multi-agent Interaction: From Cognitive Modeling to Social Simulation, Cambridge University Press, 2006.
[4] J. Dias, A. Paiva, Feeling and Reasoning: A Computational Model for Emotional Characters, In: EPIA, Springer, 2005.
[5] A. Ortony, G. Clore, A. Collins, The Cognitive Structure of Emotions, Cambridge University Press, UK, 1988.
M. Grinberg and V. Haltakov / The TRIPLE Hybrid Cognitive Architecture: Connectionist Aspects 305
[6] I. Vankov, K. Kiryazov, M. Grinberg, Impact of emotions on an analogy-making robot, In: Proc. of CogSci 2008, Washington DC, July 22-26, 2008. [7] C. Becker, S. Kopp, I. Wachsmuth, Why emotions should be integrated into conversational agents. In Conversational Informatics: An Engineering Approach. Chichester: John Wiley & Sons, 49-68, 2007. [8] B. Kokinov, A hybrid model of reasoning by analogy. In: Advances in connectionist and neural computation theory: Vol. 2. Analogical connections. Norwood, NJ: Ablex, 1994. [9] B. Kokinov, A. Petrov, Integration of Memory and Reasoning in Analogy-Making: The AMBR Model. In: Gentner, D., Holyoak, K., Kokinov, B. (eds.) The Analogical Mind: Perspectives from Cognitive Science, Cambridge, MA: MIT Press, 2001. [10] K. Kiryazov, G. Petkov, M. Grinberg, B. Kokinov, C. Balkenius, The Interplay of Analogy-Making with Active Vision and Motor Control in Anticipatory Robots, In: Anticipatory Behavior in Adaptive Learning Systems: From Brains to Individual and Social Behavior, LNAI 4520, Springer, 2007. [11] S. Kostadinov, M. Grinberg, The Embodiment of a DUAL/AMBR Based Cognitive Model in the RASCALLI Multi-Agent Platform, In: Proc. of IVA08, LNCS 5208, 356-363, 2008. [12] S. Kostadinov, G. Petkov, M. Grinberg, Embodied conversational agent based on the DUAL cognitive architecture, In: Proc. of WEBIST 2008, 2008. [13] A. Kabbaj, Development of Intelligent Systems and Multi-Agents Systems with Amine Platform, In: Proc. of the 14th International Conference on Conceptual Structures, ICCS 2006, 286-299, 2006. [14] A. Kiryakov, D. Ognyanoff, D. Manov, OWLIM – A Pragmatic Semantic Repository for OWL, In: Proc. Information Systems Engineering, WISE 2005 Workshops 3807, 182-192, 2005. [15] A. M. Collins, E. F. Loftus, A spreading-activation theory of semantic processing. Psychological Review 82 (1975), 407-428. [16] J. R. Anderson, A spreading activation theory of memory, Journal of Verbal Learning and Verbal Behavior 22 (1983), 261-295. [17] T. K. Landauer, S. Dumais, A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychological Bulletin 104 (1997), 211-240. [18] W. Kintsch, V. L. Patel, K. A. Ericsson, The Role of Long-term Working Memory in Text Comprehension, Psychologia 42 (1999), 186-198. [19] I. Licata, A Dynamical Model for Information Retrieval and Emergence of Scale-Free Clusters in a Long Term Memory Network, arXiv:0801.0887v1 [nlin.AO] (2008). [20] D. Leake, A. Maguitman, A. Cañas, Assessing Conceptual Similarity to Support Concept Mapping. Mapping, In: Proc. of the Fifteenth FLAIRS, AAAI Press, 168-172, 2002. [21] D. Gentner, M. Rattermann, K. Forbus, The roles of similarity in transfer: Separating retrievability from inferential soundness, Cognitive Psychology 25 (1993), 524-575. [22] C. Eliasmith, P. Thagard, Integrating Structure and Meaning: A Distributed Model of Analogical Mapping, Cognitive Science 25 (2001), 245-286. [23] K. Forbus, D. Gentner, K. Law, MAC/FAC: A model of similarity-based retrieval, Cognitive Science 19 (1994), 141-205. [24] D. Gentner, Structure-mapping: A theoretical framework for analogy, Cognitive Science 7 (1983), 155170. [25] K. J. Holyoak, P. Thagard, Analogical mapping by constraint satisfaction, Cognitive Science 13 (1989), 295-355. [26] J. L. McClelland, D. E. Rumelhart, An interactive activation model of context effects in letter perception: Part 1. An account of basic findings, Psychological Review 88 (1981), 375-407. 
[27] Ming Mao, Yefei Peng, M. Spring, Neural Network based Constraint Satisfaction in Ontology Mapping, In: Proc. of AAAI 2008, 1207-1212, 2008.
Interactive Reader Device for Visually Impaired People
Paolo MOTTO ROS a,1, Eros PASERO a, Paolo DEL GIUDICE b, Vittorio DANTE b and Erminio PETETTI b
a Dipartimento di Elettronica, Politecnico di Torino — INFN, sezione di Torino
b Istituto Superiore di Sanità di Roma — INFN, sezione di Roma
Abstract. In this feasibility study the design and development of an interactive reader device for the blind is examined. The device is designed around a character recognition engine based on artificial neural networks and processes a continuous stream of images, acquired by custom hardware, while the user moves the device over the text he is trying to read. The system therefore has to guide the user in following the text flow in a suitable way. This makes the proposed solution interactive: it is a complementary sensory aid that gives information about the text it reads (by means of a tactile display and a speech synthesis engine), while at the same time the user can select the operating mode and the actual device configuration.
Keywords. Artificial neural networks, image processing, haptic interfaces, real-time systems, optical character recognition
Introduction
One of the big problems for blind people is access to printed text information. Nowadays more and more information is stored and managed through computers, but in everyday life people cannot always rely upon such devices, e.g. when consulting restaurant menus or, more importantly, the advice sheets supplied with medicines. These are basic examples, but they give an idea that, though modern technologies are becoming more and more present in our life, there is still a gap between what they are able to do and what we (or, in this case, visually impaired people) need help with. The final aim of this project is precisely to fill this gap. The idea is to study, design and develop a device that, in a natural way, helps the user to read printed text. This research is only a part of a wider project, named Haptic (from the Greek word apto, which refers to everything dealing with the sense of touch), with the aim of studying and designing a set of haptic aids for blind people in everyday life. It is funded by INFN (Istituto Nazionale Fisica Nucleare) and developed by ISS (Istituto Superiore di Sanità) in Rome and the Neuronica Laboratory of the Dipartimento di Elettronica of Politecnico di Torino. There are already some solutions (explored in Section 1), but none of them seems to have all the desired features. The goal is to design and develop a device enabling the user to read printed text in a flexible and autonomous way, without having to follow a precise sequence of predefined operations (as happens with other solutions), and leaving the user free to explore the information source (e.g. newspapers, magazines, menus, to name a few) under his complete control.
Figure 1. Two existing solutions: (a) detailed view of a Braille bar; (b) the Optacon device.
This study has led to the development of a working prototype (described in Section 2), which has been used to validate our ideas by organizing tests with real users (see Section 3). Finally, it should be remembered that this is mainly a feasibility study, and further studies on this device are still underway (see Section 4).
1 Corresponding Author: Paolo Motto Ros, Dipartimento di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy; E-mail: [email protected].
1. Existing solutions
One of the most widely used aids for blind people is the Braille bar (also called Braille terminal or Braille display), which is a PC peripheral composed of a series of cells, each one able to reproduce a character by means (usually) of piezoelectric actuators. These products are of course useful, but they are not stand-alone devices and they can only handle already stored information. This could be overcome with the use of a scanner and OCR software, but again it is not a comfortable solution in many cases, since it is not an immediate way of reading. Speaking of Braille, we also have to mention Braille printers: they perform the same task as traditional printers, but the output is different. Nevertheless it is not at all a “real-time” reading approach, and there will be almost the same problems seen before in reaching the desired information. The main alternative to the Braille system is speech synthesis: it is a more immediate and natural way of communication, but again it is only a means of delivering information, like the Braille terminal. It has been widely used in software called “screen readers”, which are able to read the screen content to the user. This overcomes some drawbacks of the Braille displays, which are inherently more suitable for text-only systems. The first device designed to explore a paper in a natural way, under the complete control of the user, with the purpose of enabling him to read text, was the Optacon (OPtical to TActile CONverter). It was developed (and then commercialized by Telesensory System Inc.) in 1969 at Stanford University by Linvill and Bliss. Before such a device there were many studies and trials aimed at making the reading task easier for visually impaired people, as reported in [1]. The very first electronic reading aid was the Optophone, developed in 1912 by Dr. E. Fournier d’Albe, which was able to reproduce an audible signal according to the source image seen through a scanning slit. The results were not as promising as expected, so, when the Optacon appeared, it was considered a big
improvement, since it enabled the user to read printed text in a useful manner, with only a little training (in comparison to the alternatives, of course). The basic idea of the Optacon was to equip the user with a little scanner for exploring the paper, and the graphical information was reproduced on a little tactile array of actuators. From the image processing standpoint, it performed an image binarization with a prefixed threshold (this parameter could be tuned by the user), so characters were not translated into Braille ones but simply represented point by point. That choice is what Linvill and Bliss called direct translation, in contrast with recognition (see [2]). These choices were made with the aim of designing a portable device with enough autonomy. The device features were settled in a study [3] aimed to verify the quality of the tactile stimulations and to discover the best resolution of both the tactile and photocell arrays in order to enable the user to read text. The major advantage of the Optacon was that it made the user able to understand not only the text but also the graphics in an active way, since it is the user who decides which part of the paper should be examined. On the other hand, it required many hours of training, and it could not be used continuously for a long period because of fingertip stress. In comparison, a scanner and a PC equipped with OCR software are more usable for reading a long text. Actually this device is no longer produced, but it seems that a lot of people still use it. This should lead to the conclusion that it was an excellent device, carefully designed around the user needs, but the technology available at that time was not sufficient to achieve such an ambitious goal. Recently (at the beginning of 2008), Kurzweil and the National Federation of the Blind released a new device called “kReader Mobile” (when we started the Haptic project this device had been neither released nor announced, so it has to be considered a concurrent device rather than a predecessor, unlike the Optacon). It is a commercial mobile phone, equipped with a camera, on top of which they developed specific software able to recognize text in the images. Reading the available documentation [4], we can see that the user first has to take a snapshot of the entire paper, then the software tries to recognize all the text or tells the user to adjust the shot in order to take a better new snapshot. It is not a “real-time” way of reading, since the image processing and the text recognition are still performed off-line. According to this modus operandi, the software needs a certain amount of time (difficult to estimate), making the user wait for the first results, communicated by means of the speech synthesis. It may be that multiple attempts are needed before getting the right image: the user is blind and it is difficult to frame the entire subject in the proper way (e.g. minimizing perspective issues). Anyway, it is still a helpful tool that addresses the problem with a different approach than the one discussed in this study.
2. Proposed solution
The main effort has been the software development and the integration of the tactile transducer into the first prototype [5]. The idea (inspired by the Optacon) has been to develop a tool that, in this case, simulates a finger extension, with the user holding his fingertip steadily on the tactile transducer while moving the device around the paper. The software would then move the Braille information under the fingertip according to the position of the characters in the image acquired by the integrated camera. The
percept of the relative movement between the tactile feedback and the fingertip should have been the same as the one felt by the user reading a Braille book.
Figure 2. Views of Haptic prototypes: (a) the first prototype used only for development purposes; (b) prototype with integrated tactile display; (c) detailed view of the integrated camera; (d) detailed view of the tactile display.
Figure 2a gives an idea of the use of the proposed device, although it is only a first prototype, built with an old PC mouse and a small webcam inside; it was used only for software development purposes. The first real prototype was a device equipped with a USB camera on the bottom side and the tactile transducer on the top side (see Figures 2b, 2c and 2d). It was designed with the same ideas as the Optacon in mind, but with the valuable addition of the text recognition feature, preferring the recognition approach to the direct translation one (see [2]). At that time there was no hardware capable of supporting such an approach, but nowadays this seems feasible, and thus the user can directly read the Braille character. This was the desired big improvement for those reading aids [3], together with the capability of helping the user follow the text flow (avoiding the need for a mechanical tracking aid) and possibly of spelling the recognized characters. With all the previously studied solutions [1], the main limit was considered to be the way of encoding the information in a manner acceptable for the user; using Braille, or speech synthesis, has the advantage of reducing this gap. It is the device that goes towards the user’s needs and experience, and not the opposite, where people have to adapt themselves to the machines. Compared to Braille terminals or printers, the proposed solution should have the advantage of being an all-in-one tool, meaning that it does not require any external accessory; also having the (optional) speech synthesis should enable all visually impaired people to use this aid, even those who do not understand Braille. Using a commonly available camera allowed us to have a first software prototype in a shorter time and a good input image quality. In this way the actual field of view (about 17 mm × 13 mm) has been increased with respect to the Optacon one (about 6 mm × 3 mm), enabling the software to view about 3 lines (10 pt) at the same time. The only issue was how to find the right height from the surface and how to light up the paper in order not to have dark images. Clearly the system is still not able to read wide portions of text,
but thanks to the recognition approach, the software can “track” the user’s movements, providing useful information such as the vertical alignment with respect to the current line of text, the switching between lines, and the reaching of the end/beginning of a line. The tactile transducer followed the Optacon idea of giving the user a continuous feeling, so the pins of the array are placed at a distance of about 1 mm from each other and arranged in an 8 × 8 matrix. This should be sufficient to stimulate the whole fingertip and also to correctly reproduce Braille symbols. The technology used is based on piezoelectric bars able to raise the required pins. The choice was not to use vibrating elements, in order to avoid stressing the fingertip. Both the camera and the tactile display were connected to a PC, and a specific software application was developed. Speech synthesis was considered as an alternative to Braille. Since it was not possible to deliver all the needed information to the user through a single channel, i.e. both the text recognition results and the movement tracking cues, the proposed solution was to use the tactile display only for the text and the speech synthesis for everything else.
Figure 3. Haptic software data flow diagram
From an architectural standpoint, the system can be divided into three parts: the first one acquires images from the paper the user is going to read, the second one processes the data and feeds the third unit, which translates the resulting information into audio and tactile form. The internal data flow is clearly more complex, and its diagram can be seen in Figure 3. First of all the input data is pre-processed, in order to clean the bitmap and overcome issues like uneven illumination across the image. Then some useful higher-level data are computed, and these are used to detect the alignment of the device along the text lines, in order to give useful advice to the user. This information controls almost all the successive processing stages, for example determining which character image will be analyzed. The recognition subsystem is based on Artificial Neural Networks (ANNs), a detail which needs to be pointed out, since it plays an important role in this design. We use an ensemble of ANNs, one for each symbol to be recognized, and a winner-takes-all approach. The feature extraction method, the network
topology and the training procedure have been carefully studied in order to obtain reliable results (further information can be found in [6]). The actual implementation only works on one character image at a time, for run-time performance reasons. Nevertheless all the results are collected and then processed to form a string, something like a buffer of the previous recognitions. This is useful both when the user wants to backtrack and when entire words are identified and possibly corrected. All the computed data, both about the text and about the device position, are conveyed to the user through a user feedback subsystem. Its role is to abstract the specific underlying way of communicating information to the user by presenting a uniform and common interface. As said before, in this prototype there are two of them: one for the tactile display and one for the speech synthesis. Everything outlined above has to be done in real time, so it is necessary to use lightweight algorithms and small data structures as much as possible. For these reasons, computations are made only when necessary (e.g. when the current frame is very different from the previous one) and each step tries to reduce and simplify the size of the data to be processed (giving higher-level information). The software also has to be flexible enough to deal with different types of media (paper, ink and so on used by magazines are quite different from those found in newspapers), so an adaptive system has been developed, with multiple configurations selectable by the user.
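As an illustration of the winner-takes-all ensemble and of the recognition buffer just described, the following Python sketch is our own simplification (the per-symbol networks, feature size and symbol set are placeholders, not the Haptic implementation):

```python
import numpy as np

class WinnerTakesAllRecognizer:
    """One scoring network per symbol; the symbol whose network responds most
    strongly wins, and every decision is appended to a recognition buffer."""
    def __init__(self, scorers):
        self.scorers = scorers          # dict: symbol -> callable(features) -> score
        self.buffer = []                # history used for backtracking / word correction

    def recognize(self, features):
        scores = {sym: net(features) for sym, net in self.scorers.items()}
        best = max(scores, key=scores.get)
        self.buffer.append(best)
        return best, scores[best]

# toy usage: linear models standing in for the trained per-symbol ANNs
rng = np.random.default_rng(0)
weights = {s: rng.normal(size=16) for s in "abc"}
scorers = {s: (lambda x, w=w: float(np.tanh(w @ x))) for s, w in weights.items()}
reader = WinnerTakesAllRecognizer(scorers)
print(reader.recognize(rng.normal(size=16)), "".join(reader.buffer))
```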
3. Results
In this paper we have seen how the problem of enabling the blind to read printed text has been addressed in the past and in the Haptic project, by designing a novel interactive reader device. The most promising aid has been the Optacon (nowadays no longer available), which was a portable device with a little scanner and a tactile display. Its main drawback was the lack of a proper character recognition functionality. On the opposite side there is the newly designed Kurzweil kReader, but it requires a correct snapshot of the whole paper to be taken before processing the image. Both of them have pros and cons, and we have seen how the freedom of use and the text recognition abilities have been combined in a single device. The proposed solution is still only a prototype, not usable on a daily basis, but complete enough to be tested by the end users, in order to verify the ideas and its real usefulness. Thus the so-called “Haptic Workshop” (Turin, 12th June 2006) was organized, to which end users coming from different contexts, together with a neuropsychiatrist and a typhlopedagogist, were invited. The users only received some basic advice, and then they were free to test the device. This was also important to understand how they would use it, since some assumptions were made while designing the software and they needed to be confirmed. All the opinions were summarized and divided into two categories: hardware and software. Of course most of them could be seen as criticisms, but it is better to consider them the answer to the question “Where should we go?”.
3.1. Hardware
The first impression about the device was surely enthusiastic, both for the new approach to the problem of reading printed text and for being potentially very useful in daily
use, because of its flexibility and for being designed around the user’s needs. Of course such a goal is very ambitious and it is almost impossible to fulfil it at the first attempt, so the purpose of this workshop was precisely to expose the weaknesses of this first prototype, especially in typical daily use, where the behaviour and the needs vary a lot from user to user. Starting from these considerations, the most requested characteristic concerned ergonomics: the device was simply too heavy and big to be moved around the paper in a comfortable way. Regarding the tactile display, the major flaw was the inability to represent more than one character at a time. The ideal solution would be to have the opportunity to show an entire word at a time. This would make the system more similar to a Braille bar, but this solution would considerably increase the weight and the dimensions of the device (with the current technology). Besides the tactile solution, we also considered the speech synthesis option. In our opinion, it was a second choice, only used because we did not find a suitable tactile way of delivering to the user the information about the position and the alignment inside the text flow. Facing real users, we have seen that it is much more useful than expected, since they reported that no more than 10–15% of the entire blind population is able to read Braille. It is noteworthy that using only the speech synthesis, and not a tactile display, has the advantage of allowing a more compact and ergonomic design of the aid. Despite the above considerations, it was also remarked that a tactile display would not be useless. Indeed, when reading a challenging text, for example, the user has to pay more attention to the content, and in these cases Braille would be more effective (also sighted people in such situations usually prefer to read the text themselves instead of listening to someone else). In the end, the tactile display should still be considered, but as an option.
3.2. Software
The proposed software solution, still in a development stage, proved to be quite effective in recognizing the text, but not yet very usable in a daily context. The first requested enhancement was clearly the word-oriented features, i.e. the possibility of reading an entire word at once instead of letter by letter. The software was able to partially record the character recognitions, in order to facilitate going backward and re-reading some parts, but it has to be improved and completed with a syntactic analysis. Another area of improvement was found in the tracking algorithm, which is in charge of detecting the user’s movements, properly activating the recognition process and informing the user about the alignment in the text flow. As explained previously, for computational performance reasons it was preferred to have light data structures and functions, but in this case it seems that the problem was over-simplified. This could also be due to an imperfect understanding of the user behaviour, or, considering also the hardware issues discussed above, to the difficulty of moving the device in a smooth way.
4. Conclusion and future perspectives
The aim of the workshop described in the previous section was to validate the ideas and to find out how to improve our proposed solution, and in this regard it has been very useful. It has been stated that the solution is very interesting and that the approach to the problem is
the right one. The results were encouraging enough to start a whole new project, named Stiper (STImolazione PERcettiva — Perceptive Stimulation), again funded by INFN and in collaboration with ISS (Istituto Superiore di Sanità). Stiper2, which is still underway, is the natural continuation of the former and also involves Università di Roma 2. Both follow the same research line traced by the previous Haptic project. The Haptic prototype was the result of a feasibility study, and the next step is to design a light and small device which does not require an external PC to perform the computations. For this reason, the main goal of the Stiper project is to design and develop a second generation reader device, with portability and autonomy as the key features.
References
[1] P. W. Nye, “Reading aids for blind people — A survey of progress with the technological and human problems,” Medical and Biological Engineering and Computing, vol. 2, no. 3, Jul. 1964.
[2] J. Linvill and J. Bliss, “A direct translation reading aid for the blind,” Proceedings of the IEEE, vol. 54, no. 1, pp. 40–51, Jan. 1966.
[3] J. Bliss, “A relatively high-resolution reading aid for the blind,” IEEE Transactions on Man-Machine Systems, vol. 10, no. 1, pp. 1–9, March 1969.
[4] kReader Mobile User Guide, K-NFB Reading Technology Inc., 2008.
[5] P. Motto Ros, “Lettore ottico per non vedenti,” Master’s thesis, Politecnico di Torino, 2005.
[6] P. Motto Ros and E. Pasero, “Artificial neural networks for real time reader devices,” in International Joint Conference on Neural Networks (IJCNN 2007), Aug. 2007, pp. 2442–2447.
On the Relevance of Image Acquisition Resolution for Hand Geometry Identification Based on MLP
Miguel A. FERRER a, Joan FÀBREGAS b, Marcos FAUNDEZ-ZANUY b,1, Jesús B. ALONSO a, Carlos TRAVIESO a, Amparo SACRISTAN b
a Universidad de Las Palmas de Gran Canaria, Departamento de Señales y Comunicaciones, Centro Tecnológico para la Innovación en Comunicaciones, Spain
b Escola Universitària Politècnica de Mataró, Spain
Abstract. The effect of changing the image resolution on a biometric system based on hand geometry is analyzed in this paper. The image resolution is progressively reduced from an initial 120 dpi resolution down to 24 dpi. The robustness of the examined system is analyzed with two databases and two identifiers. The first database acquires the images of the hand from underneath, whereas the second database acquires the images from above the hand. The first classifier identifies with a multiclass support vector machine, whereas the second classifier identifies with a neural network with error-correcting output codes. The four experiments show that an image resolution of 72 dpi offers a good trade-off between performance and image resolution for the 15 geometric features used.
Keywords. Hand geometry, resolution, biometrics, neural networks
Introduction
Although hands have not attracted as many researchers as face and speech, they represent an important part of communication between humans. In this paper we perform some experiments on the relevance of the hand image acquisition resolution in a biometric hand-recognition application based on neural networks. Numerous applications for personal identification exist and more are emerging daily. Examples of personal identification applications include immigration and border control, physical access control, etc. As a result, the area of biometrics will continue to be an area of interest for many researchers [1]. It is generally accepted that ideally a biometric should satisfy the four criteria of universality, uniqueness, permanence, and collectability [2]. The choice of biometric identifiers has a major impact on the performance of the system. Some of the major biometric identifiers in use today are fingerprint [1, pp. 43-64], hand geometry [3], iris [1, pp. 103-121], and face [1, pp. 65-86]. While systems based on fingerprint and eye features have, at least to date, achieved the best matching performance, the human hand provides the source for a number of physiological biometric features.
The idea of using hand features as a means of personal identification is not new. This approach was proposed as early as the 1970s [4]. The features are extracted from hand geometry, hand contour, hand palm, hand pressure profile, hand vein, etc. Hand geometry systems use an optical camera to capture two orthogonal two-dimensional images of the palm and sides of the hand, offering a balance of reliability and relative ease of use. They typically collect more than 90 dimensional measurements, including finger width, height, length, distances between joints, and so on [5]. Although the basic shape and size of an individual’s hand remain relatively stable, the shape and size of our hands are not highly distinctive. In laboratory experiments the hand image is taken from underneath, but such readers are affected by dirty hands (as fingerprint sensors can be) or dusty environments. Therefore the hand readers commonly used for access control to facilities, time clocks, or controlled areas usually take the image from above the hand [6]. The main drawback of these systems is the large size of current hand geometry readers, which restricts their use in widespread applications such as those requiring small user interfaces (e.g., home computer user, keyboard). Designers try to reduce the size by using mirrors, in order to reduce the distance between the camera and the hand while keeping the focal distance. Another way of reducing the size of the scheme is to place the rest of the electronic devices (microprocessor, hard disk, power supplies, and so on) as close together as possible. A new research line tries to acquire the hand image contactless, but it is not yet a mature research line and it still has some problems with the illumination and with obtaining projection-invariant measures. Packing the electronics so tightly, however, raises the enclosure temperature, increasing the problems of heat dissipation. A way of alleviating this drawback is to reduce the microprocessor speed without increasing the response time of the scheme, which is equivalent to reducing the computational burden of the identification system. An easy way of reducing the computational load of the system is to decrease the hand image resolution. This paper is dedicated to studying the impact of the hand image resolution on the final performance of the hand geometry biometric device. In order to draw general conclusions, we will present experiments with different databases and classifiers.
1 Corresponding Author: Marcos Faundez-Zanuy, Escola Universitària Politècnica de Mataró, Avda. Puig i Cadafalch 101-111, 08303 Mataró (Barcelona), Spain. E-mail: [email protected]
1. Hand databases
In order to obtain more general results about the robustness against the image resolution, the biometric scheme has been evaluated with two different databases. The first one takes the images of the hand from underneath and the second one takes the images from above the hand.
1.1. Underhand database
This database consists of 10 different acquisitions of 85 people. The images have been acquired from underneath the right hand of each user. We have used a conventional document scanner, where the user can place the hand palm freely over the scanning surface; we do not use pegs, templates or any other method annoying for the users to capture their hands. Most of the users are students of the University of Las Palmas de Gran Canaria within a selective age range from 23 to 30 years old. So, far from
making the task simple, we have selected users with approximately similar hand characteristics in order to see how the algorithms perform in the worst case. In the database, 54% of the users are male. The main drawback of this system for real applications is that enrolling each hand takes about 20 seconds. The images have been acquired in a university office with a typical desk scanner using 8 bits per pixel (256 gray levels) and a resolution of 150 dpi (1403 by 1021 pixels). To facilitate later computation, every scanned image has been scaled down by 20%. The first 50 people of this database correspond to the free GPDSHand database [7]. Fig. 1 shows an example of an image acquired with this scheme.
Figure 1. Underhand image acquired.
1.2. Overhand database
The second database has been acquired taking into account operational conditions. It consists of 10 different acquisitions of 85 people. The acquisition sensor is a commercial web cam (costing about 10€) with an infrared filter. The image size is 640 by 480 pixels with 8-bit grayscale. The lighting system used is an array of infrared LEDs in the 850 nm band. The hand is placed on a white surface with a low reflection coefficient, and the surface contains several pegs in order to help the users place their hands. A mirror is used to acquire a 3D view of the hand, both to measure the width of the hand and to make the system less vulnerable to fraudulent attacks. The database has been collected in different bars and pubs whose usual customers come from different environments: fishermen, basketball players, clerks, and so on. So we have hands with a variety of problems. This acquisition system can be used for real-time applications because the hand images are acquired instantly. Fig. 2 shows an example of an image acquired with this infrared system.
Figure 2. Overhand image acquired
2. Geometric Parameters
Once the image has been acquired, the 8-bit greyscale image is converted to a monochrome image (1 bit per pixel) by applying a threshold:
MonochromeImage = GrayScaleImage > threshold
where the threshold is calculated by means of the Otsu algorithm. As the contrast between the hand and the background is quite high, this step is not a problem. The contour of the hand is obtained from the monochrome image by following the edge. To locate the finger ends and the valleys between the fingers, the Cartesian coordinates of the hand contour are converted to polar coordinates (radius and angle), taking the centre of the hand base as the origin of coordinates. The peaks of the radius function locate the finger ends, and the minima of the radius indicate the valleys between fingers. This procedure is summarized in Fig. 3 (a minimal code sketch is given after Fig. 3).
Figure 3. Hand contour and radius. The finger ends and the valleys between fingers correspond to the maxima and minima of the radius.
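The procedure just described can be sketched as follows. This is our own illustration (not the authors’ code): OpenCV and SciPy are used for convenience, the hand-base origin is roughly approximated by the bottom centre of the bounding box, and the peak-detection spacing is arbitrary.

```python
import cv2
import numpy as np
from scipy.signal import find_peaks

def hand_landmarks(gray):
    """Otsu binarization, external contour, polar radius profile: the peaks of
    the radius locate the finger ends, its minima the valleys between fingers."""
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(float)

    x, y, w, h = cv2.boundingRect(contour.astype(np.int32))
    origin = np.array([x + w / 2.0, y + h])          # rough centre of the hand base (assumption)

    radius = np.linalg.norm(contour - origin, axis=1)
    spacing = max(len(radius) // 20, 1)
    tips, _ = find_peaks(radius, distance=spacing)       # candidate finger ends
    valleys, _ = find_peaks(-radius, distance=spacing)   # candidate valleys between fingers
    return contour[tips], contour[valleys]

# hypothetical usage:
# tips, valleys = hand_landmarks(cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE))
```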
The exterior base of the index and little fingers is calculated by drawing a line from the valley between the index and middle fingers to the valley between the ring and little fingers. Then 20° are added to the slope of the above-mentioned line, and the line is extended until it crosses the contour. The exterior base of the thumb is worked out as the intersection of the contour with the line going from the valley between the middle and ring fingers to the valley between the index finger and the thumb. Fig. 4 shows an example of these points being detected.
Figure 4. Finger ends, finger valleys and exterior bases of the thumb, index and little fingers located.
Once the ends and bases of all the fingers have been located, it is possible to measure the height and width of each finger. The height is calculated as the distance between the finger end and the geometric centre of the right and left sides of the finger base. The finger width is the distance between the two sides of the finger base. The third measure that characterizes each finger is the width of the finger at 70% of the height. So there are a total of 15 measures (3 measures per finger) that characterize the hand geometry. Figure 5 shows an example with the underhand and overhand databases. The parameterization time depends on the image resolution: with an image resolution of 120 dpi (the maximum considered in this paper), a PC with a Pentium IV at 2.8 GHz takes less than 1 second per hand to calculate the geometric measures.
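A minimal sketch of the three measures per finger follows. It is our own illustration: the height and base width are the distances defined above, while the contour-slicing estimate of the width at 70% of the height is only one possible way of approximating that measure.

```python
import numpy as np

def finger_measures(tip, base_left, base_right, contour=None):
    """Height (finger end to the centre of the base), base width, and an
    estimate of the width at 70% of the height."""
    tip, bl, br = (np.asarray(p, float) for p in (tip, base_left, base_right))
    base_centre = (bl + br) / 2.0
    height = np.linalg.norm(tip - base_centre)
    base_width = np.linalg.norm(br - bl)

    width_70 = None
    if contour is not None:
        axis = (tip - base_centre) / height              # unit vector along the finger
        rel = np.asarray(contour, float) - base_centre
        along = rel @ axis                               # position along the finger axis
        across = rel @ np.array([-axis[1], axis[0]])     # position across the finger
        band = np.abs(along - 0.7 * height) < 0.03 * height
        if band.any():
            width_70 = float(across[band].max() - across[band].min())
    return height, base_width, width_70

# 5 fingers x 3 measures gives the 15-dimensional feature vector used below
```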
Figure 5. Measures that characterize the hand geometry with the underhand and overhand databases.
3. Classifiers
In order to identify, from the geometrical parameters, whom a hand belongs to, two different classifiers have been tried: a support vector machine and a neural network.
3.1. Support Vector Machine (SVM)
A Support Vector Machine (SVM) [8] is a very widespread classifier because it generally provides better generalization performance when the amount of data is small. Roughly, the principle of SVMs relies on a linear separation in the feature space onto which the data have previously been mapped, in order to take into account the possible non-linearities of the problem. In order to achieve a good level of generalization performance, we maximize the distance (margin) between the separating hyperplane and the data. Within the Structural Risk Minimization principle, Vapnik proved that maximizing the margin means in fact minimizing the VC-dimension of the linear separator model, which has been shown to be a good way to reduce the generalization risk. To generalize the linear case one can project the input space into a higher-dimensional space in the hope of a better training class separation. In the case of SVMs this is achieved by using the so-called kernel trick. In essence, it replaces the inner product used to calculate the distance between the input and the separating hyperplane with a kernel function K. Amongst the commonly used kernel functions are the polynomial kernel and the RBF kernel. The software used to train and test this identification problem has been SVM multiclass [9]. The gamma variable is optimized experimentally.
3.2. Neural Networks (NN)
The geometrical parameters of the hands have also been classified with a Multi-Layer Perceptron (MLP) [10]. It has been trained as follows: when the input data belong to a genuine person, the output (target of the NN) is fixed to 1; when the input belongs to an impostor, the output is fixed to –1. The MLP used has 80 neurons in the hidden layer; it has been trained with a gradient descent algorithm with momentum and a weight/bias learning function. The MLP has been trained for 10000 epochs using regularization. We also apply a multi-start algorithm and we provide the mean and standard deviation of the results for 50 different random initializations. The input signal has been fitted to a [–1, 1] range in each component. An improvement to the MLP training method described above, proposed in [11][12], is called error-correcting output coding (ECOC). In this approach, each class i is assigned an m-bit binary string, ci, called a codeword. The strings are chosen so that the Hamming distance between each pair of strings is guaranteed to be at least
& dmin. During training on example x , the m output units of a 3-layer network are c & clamped to the appropriate binary string f x . During classification, the new example & x is assigned to the class i whose codeword ci is closest (in Hamming distance) to the m-element vector of output activations. The advantage of this approach is that it can recover from any t d min 1 / 2 errors in learning the individual output units. Errorcorrecting codes act as ideal distributed representations. In [11,12] some improvements using this strategy were obtained when dealing with some classification problems, such as vowel, letter, soybean, etc., classification. In this paper, we apply this same approach for biometric recognition based on handgeometry measurements [13]. If we look the output codes (targets) learnt by the neural network when the input & pattern x k user, we can observe that just the output number k is activated, and the number of outputs is equal to the number of users. This approach will be named oneper-class, and will be a 15×80×85 MLP. An alternative is the use of natural binary coding, which provides a reduced number of outputs, because we can represent 85 different alternatives just using 7 bit. Thus, this approach will be a 15×80×7 MLP. Another approach is to assign to each user a different code. These codes can be selected from the first 85 BCH 2 (15,7) codes, BCH (31,11), etc. In fact, for instance, BCH (15,7) yields up to 27 128 output codes. However, we just need 85, because this is the number of users. It is interesting to observe that: x BCH (15,7) code provides a minimum distance of 5 bits between different codes, while one-per-class approach just provides a minimum distance of 2, and natural binary coding a minimum distance of one. x BCH (15,7) provides a more balanced amount of ones and zeros, while in oneper-class approach almost all the outputs will be inhibitory. A good error-correcting output code for a k-class problem should satisfy two properties [11]: x Row separation: each codeword should be well-separated in Hamming distance from each of the other codewords. x Column separation: each bit-position function fi should be uncorrelated from the functions to be learnt for the other bit positions f j , j z i .
Error-correcting codes only succeed if the errors made in the individual bit positions are relatively uncorrelated, so that the number of simultaneous errors in many bit positions is small.
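The Hamming-distance decoding described above can be sketched as follows (our own illustration: the four 7-bit codewords below are arbitrary stand-ins with minimum distance 4, not actual BCH codewords, and outputs are thresholded at 0 because the targets are coded as -1/+1):

```python
import numpy as np

def hamming_decode(outputs, codewords):
    """Assign the m-element vector of output activations to the class whose
    codeword is closest in Hamming distance."""
    bits = (np.asarray(outputs) > 0).astype(int)   # m-bit string from the m outputs
    dists = (codewords != bits).sum(axis=1)        # Hamming distance to every codeword
    return int(np.argmin(dists)), int(dists.min())

codewords = np.array([[0, 0, 0, 0, 0, 0, 0],
                      [1, 1, 1, 0, 0, 0, 1],
                      [0, 1, 1, 1, 1, 0, 0],
                      [1, 0, 0, 1, 1, 1, 0]])
outputs = [0.9, -0.8, -0.7, 0.6, 0.8, -0.7, -0.5]  # sixth unit has the wrong sign for class 3
print(hamming_decode(outputs, codewords))          # still decodes to class 3, at distance 1
```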
4. Experimental Results
This section presents the robustness of the described hand-geometry biometric systems when the image resolution is progressively reduced, for both databases. First we introduce the resolutions at which we have tested both systems.
2 An (n,k) BCH (Bose–Chaudhuri–Hocquenghem) code is a binary block code that maps k message bits onto n-bit codewords. It is used for error detection and correction when messages are transmitted in a digital communication system [14].
4.1. Reduction of the Image Resolution
The underhand database has been scanned at 150 dpi and reduced to 80%, so its initial resolution is 120 dpi. The images have then been reduced, by means of nearest-neighbour interpolation, to 96 dpi, 72 dpi, 48 dpi and 24 dpi. Results of the underhand database with these 5 resolutions and the 2 classifiers will be given.
The overhand database has been acquired with a webcam, which gives images of 640 by 480 pixels. In order to know the equivalent resolution of the images in the overhand database with respect to the images of the underhand database, the average area of the hands in each database has been calculated. The resolution ratio of the images is obtained as

ratio = √(A_overhand / A_underhand)

where A_underhand is the average area of the hands in the underhand database and A_overhand is the average area of the hands in the overhand database. The experimental ratio obtained is 0.6, so the equivalent resolution of the overhand database is equal to the resolution of the underhand database multiplied by 0.6, that is 72 dpi. The overhand database has then been reduced to 48 dpi and 24 dpi, so results of this database will be given for these 3 resolutions and the SVM classifier.
4.2. Computational Considerations
Decreasing the image resolution has the advantage of reducing the computational requirements of the microprocessor during the parameterization stage. The subsequent verification step is not influenced by the image resolution because the verifier works only with the parameters. Nevertheless, the computational load of the parameterization step amounts to 85% of the computational requirements of the whole scheme. Table 1 shows the parameterization time as a function of the image resolution, assuming that the parameterization time for 120 dpi is equal to 1.
Table 1. Parameterization time as a function of the image resolution
Image resolution    Parameterization time
120                 1
96                  0.73
72                  0.52
48                  0.23
24                  0.17
4.3. Results with the SVM Multiclass Classifier
To train the multiclass SVM, the databases have been divided into two datasets: the train dataset and the test dataset. The train dataset contains four randomly selected hand images of each person; the test dataset contains the six remaining hands. The SVM has
been trained with the train dataset (including the adjustment of the gamma variable) and has been tested with the test dataset. This procedure has been repeated ten times in order to obtain an averaged recognition rate. The averaged recognition rates and their standard deviations for each database and each image resolution can be seen in Table 2. The results displayed in Table 2 show that the input image resolution can be reduced down to 72 dpi without loss of performance. A light reduction of the recognition rate is obtained if we reduce it down to 48 dpi. This performance reduction looks reasonable against a reduction of the computational burden of up to 40%. (A code sketch of this evaluation protocol is given after Table 2.)
Table 2. Averaged recognition rate and standard deviation of the biometric scheme based on the multiclass SVM
Image resolution    Underhand database              Overhand database
                    average      deviation          average      deviation
120                 99.28 %      0.42               –            –
96                  99.67 %      0.20               –            –
72                  99.85 %      0.42               95.92 %      1.06
48                  98.52 %      0.36               94.85 %      1.4
24                  96.45 %      0.88               91.29 %      1.82
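The evaluation protocol of this subsection can be sketched as follows. This is only an illustration: it uses scikit-learn's SVC (a one-against-one multiclass SVM) instead of the SVM multiclass software [9] actually employed, and it assumes that the 15-dimensional feature matrix X and the user labels y are already available.

```python
import numpy as np
from sklearn.svm import SVC

def repeated_identification_rate(X, y, runs=10, train_per_user=4, seed=0):
    """Average identification rate over repeated random splits: four hands per
    user for training, the remaining ones for testing."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(runs):
        train_idx, test_idx = [], []
        for user in np.unique(y):
            idx = rng.permutation(np.where(y == user)[0])
            train_idx.extend(idx[:train_per_user])
            test_idx.extend(idx[train_per_user:])
        clf = SVC(kernel="rbf", gamma="scale").fit(X[train_idx], y[train_idx])
        rates.append((clf.predict(X[test_idx]) == y[test_idx]).mean())
    return float(np.mean(rates)), float(np.std(rates))

# hypothetical usage with X of shape (85 * 10, 15) and y holding the 85 user labels:
# mean_rate, std_rate = repeated_identification_rate(X, y)
```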
4.4. Results with the MLP-ECOC Classifier
To train the MLP, the database has been divided into two datasets: the train dataset and the test dataset. The train dataset contains four randomly selected hand images of each person; the test dataset contains the six remaining hands. The MLP-ECOC has been trained with the train dataset and tested with the test dataset. The MLP-ECOC used contains 15 inputs and 80 hidden neurons, both layers with hyperbolic tangent sigmoid nonlinear transfer functions. The computation of the Mean Square Error (MSE) between the obtained output and each of the codewords provides a distance measure (the Mean Absolute Difference has also been tried, but the MSE provides slightly better results).
Table 3. Averaged recognition rate and standard deviation of the biometric scheme based on MLP-ECOC
Image resolution    Underhand database
                    average      deviation
120                 98.12 %      0.55
96                  97.83 %      0.66
72                  97.26 %      0.63
48                  96.03 %      1.01
24                  85.38 %      1.99
A BCH (63,7) ECOC code has been selected after trying (15,7), (31,11), (63,7), and (127,8). The average recognition rate and standard deviation reported in Table 3 were obtained after 50 random initializations and 10000 training epochs. The results of Table 3
confirm the conclusions drawn from Table 2: the image resolution cannot be reduced below 48 dpi without a severe loss of performance.
5. Conclusions
This paper has tackled the problem of the robustness of a biometric scheme based on hand geometry against the hand image resolution. The conclusions obtained after checking two biometric schemes, based on different classifiers, with two uncorrelated databases are the following:
• The hand image resolution can be reduced down to 72 dpi without appreciable loss of recognition rate.
• In this case, and compared with the 120 dpi resolution, we save 40% of the computational load.
• A light reduction of the recognition rate is obtained if we reduce it down to 48 dpi.
• The performance of the biometric scheme is severely reduced if the image resolution is reduced below 48 dpi.
Acknowledgements. This work has been supported by the Spanish project MEC TEC2006-13141-C03/TCM and COST 2102.
References
[1] A.K. Jain, R. Bolle, S. Pankanti, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, 2001.
[2] M. Faundez-Zanuy, Biometric security technology, IEEE Aerospace and Electronic Systems Magazine 21 (6), (2006) 15-26.
[3] R. Sanchez-Reillo, C. Sanchez-Avila, A. Gonzalez-Marcos, Biometric identification through hand geometry measurements, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10), (2000) 1168-1171.
[4] R. H. Ernst, Hand ID System, U.S. Patent No. 3576537 (1971).
[5] S. González, C.M. Travieso, J.B. Alonso, M.A. Ferrer, Automatic biometric identification system by hand geometry, Proceedings of the 37th IEEE International Carnahan Conference on Security Technology, (2003) 39-41.
[6] D.L. Woodward, Exploiting finger surface as a biometric identifier, Thesis of the Graduate Program in Computer Science and Engineering, University of Notre Dame, Indiana, U.S.A., 2004.
[7] C.M. Travieso, J.B. Alonso, S. David, M.A. Ferrer, Optimization of a biometric system identification by hand geometry, Cherbourg, France, 581-586, 19-22 September 2004.
[8] K.R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12 (2), (2001) 181-201.
[9] K. Crammer, Y. Singer, On the algorithmic implementation of multi-class SVMs, Journal of Machine Learning Research (2001).
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice Hall (1999).
[11] T.G. Dietterich, G. Bakiri, Error-correcting output codes: A general method for improving multiclass inductive learning programs, Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim (1991).
[12] T. Dietterich, Do hidden units implement error-correcting codes?, Technical report (1991).
[13] M.A. Ferrer, M. Faundez-Zanuy, C.M. Travieso, J. Fabregas, J.B. Alonso, Evaluation of supervised versus non-supervised databases for hand geometry identification, 40th IEEE International Carnahan Conference on Security Technology, Lexington, Kentucky, 180-185, October 16-19, 2006.
Evaluating Soft Computing Techniques for Path Loss Estimation in Urban Environments
Filippo LAGANÀ, Matteo CACCIOLA 1, Salvatore CALCAGNO, Domenico DE CARLO, Giuseppe MEGALI, Mario VERSACI and Francesco Carlo MORABITO
University Mediterranea of Reggio Calabria, DIMET, Via Graziella, Feo di Vito, 89100 Reggio Calabria, Italy
Abstract. Many studies have been carried out by scientists so far, and we now have many propagation models of electromagnetic waves for various kinds of building structures. The positions of buildings along the streets of urban areas, or the corridors in office buildings, can be thought of as waveguides for electromagnetic waves. For each kind of building structure, different mathematical models have been proposed and good approximations have been obtained by successful studies. In this context, the estimation of path loss in an urban environment is presented. In particular, an urban street of Reggio Calabria, Italy, has been considered. In order to estimate the path loss, we first exploited the most widely applied numerical methods for generating training and testing data, and subsequently we evaluated the performance of suitable Support Vector Machines in approximating the path loss values. Precisely, the numerical methods used are the Okumura-Hata model and the Ray-Tracing method, carried out with the Wireless InSite software. The obtained results show that Support Vector Regression Machines (SVRMs) provide more accurate predictions of path loss in urban areas than Neural Networks. The final results point to the possible use of Support Vector Machines in this kind of application, with interesting prospects given their lower computational cost compared to classical numerical methods.
Keywords. Path loss prediction, evaluating forecasts, Support Vector Machines, Artificial Neural Networks, urban environment
Introduction
The estimation of electromagnetic wave (EM-wave) path loss in urban environments is necessary for local wireless systems and for the cell or micro-cell design of modern mobile communication networks. An accurate estimation of path loss in urban areas is difficult to obtain because of the dispersion caused by reflection and blocking due to vehicles, pedestrians, and other objects on the road. Therefore, some researchers have measured radio waves and statistically modelled their results [1,2,3,4,5,6,7,8,9,10,11].
Among numerous propagation models, the Okumura-Hata [12,13] model is the most significant one, providing the foundation of today’s mobile communications services. The model is based on extensive experimental data and statistical analyses which enable us to compute the received signal level in a given propagation medium. Many commercially available computer-aided prediction models depend on the propagation environment. For example, the standard Okumura-Hata model generally provides a good approximation in urban and suburban environments. This model is also useful for micro-cellular services where antenna heights are generally lower than building heights, thus simulating a so-called “urban canyon” environment. However, measurements at all frequencies in the EM-wave band take a lot of time and increase the cost. In addition, the path loss depends on the uncertainty of the traffic on the road, which does not ease the modelling of the path loss with conventional mathematical models. In this context, our aim is to propose a computational intelligence model to estimate the unknown path loss of an urban street of Reggio Calabria, Italy, with comparable accuracy and greater computational convenience. The paper first presents a theoretical background and then a comparison between the Okumura-Hata and Ray-Tracing [14,15,16,17] models by using Support Vector Machines (SVMs) [18,19]. Subsequently we present a comparison between the implemented SVRM-based approach and an Artificial Neural Network (ANN), and finally we draw our conclusions.
1 Corresponding Author: University Mediterranea of Reggio Calabria, DIMET, Via Graziella Feo di Vito, I-89100 Reggio Calabria, Italy; E-mail: [email protected].
1. Theoretical Background of Classical Path Loss Models
In this section, we introduce the most important and widely used classical approaches for estimating the path loss in urban environments. They are the so-called Okumura-Hata and Ray-Tracing models. Whilst the former has been implemented using Matlab, we exploited the already cited Wireless InSite package for implementing suitable cases of study.
1.1. Okumura-Hata model
The Okumura-Hata model is based on experimental data collected from various urban environments having approximately 15% high-rise buildings. The path loss formula of the model is given by

L[dB] = 69.55 + 26.16 log(f) − 13.82 log(hB) − a(hM) + (44.9 − 6.55 log(hB)) log(d)    (1)

where L[dB] is the path loss in [dB]; f is the frequency in [MHz]; d is the distance between the base station and the mobile in [km]; hB is the effective height of the base station in meters; a(hM) = (1.1 log(f) − 0.7) hM − (1.56 log(f) − 0.8); and hM is the mobile antenna’s height. Equation (1) may be expressed conveniently as

L[dB] = L0[dB] + (44.9 − 6.55 log(hB)) log(d)    (2)

or more conveniently as

L[dB] = L0[dB] + 10 γ log(d)    (3)
where

L0[dB] = 69.55 + 26.16 log(f) − 13.82 log(hB) − a(hM)    (4)

γ = (44.9 − 6.55 log(hB)) / 10    (5)
From Equation (3), we also notice that the Okumura-Hata model exhibits linear path loss characteristics as a function of distance, where the attenuation slope is \gamma and the intercept is L_0 (see Figure 1). Since L_0 is an arbitrary constant, we write

L_{[dB]} \propto 10\gamma\log(d)    (6)

and, in the linear scale,

L \propto \frac{1}{d^{\gamma}}, \quad \text{with } \gamma = 3.5 \text{ to } 4    (7)

Figure 1. Okumura-Hata model: Path loss [dB] vs. Distance [Km] at three different frequencies (812, 877 and 886 [MHz]).
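As an illustration of Equations (1)-(5), the short sketch below evaluates the Okumura-Hata path loss. The authors implemented the model in Matlab; this is an independent Python rendering, and the antenna heights used in the demo call are arbitrary assumptions, not values taken from the paper.

import numpy as np

def okumura_hata_path_loss(f_mhz, h_b, h_m, d_km):
    """Okumura-Hata urban path loss, Eq. (1): f in MHz, heights in m, d in km."""
    a_hm = (1.1 * np.log10(f_mhz) - 0.7) * h_m - (1.56 * np.log10(f_mhz) - 0.8)
    l0 = 69.55 + 26.16 * np.log10(f_mhz) - 13.82 * np.log10(h_b) - a_hm   # Eq. (4)
    gamma = (44.9 - 6.55 * np.log10(h_b)) / 10.0                          # Eq. (5)
    return l0 + 10.0 * gamma * np.log10(d_km)                             # Eq. (3)

# Demo roughly in the setting of Figure 1: 877 MHz, distances from 15 to 50 km
# (h_b = 50 m and h_m = 1.5 m are illustrative assumptions).
d = np.linspace(15, 50, 8)
print(okumura_hata_path_loss(877.0, h_b=50.0, h_m=1.5, d_km=d))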
1.2. Ray-Tracing model

In order to calculate the path loss value, a model of an urban street of Reggio Calabria, Italy, has been exploited. This model is obtained by using the Wireless InSite software, based on the Ray-Tracing method. In physics, Ray-Tracing is a method for calculating the path of waves or particles through a system with regions of varying propagation velocity, absorption characteristics, and reflecting surfaces. Under these circumstances, wavefronts may bend, change direction, or reflect off surfaces, complicating the analysis. Ray-Tracing solves the problem by repeatedly advancing idealized narrow beams, called rays, through the medium by discrete amounts. Simple problems can be analyzed by propagating a few rays using simple mathematics. More detailed analyses can be performed
by using a software package that considers the propagation of many rays. In this context we have built a geometrical representation of the street (exemplified in Fig. 2) and we have set the physical parameters shown in Table 1.
Figure 2. Geometrical representation of the urban environment: a road of Reggio Calabria, Italy (panels (a) and (b)).
Table 1. Parameters imposed in the Ray-Tracing model (within the Wireless InSite software)

EM-waveform
  Carrier frequency: 900 [MHz]
  Effective bandwidth: 1 [MHz]
  Phase: 0°
Antenna
  Gain: 15.9984 [dBi]
  Receiver threshold: -250 [dBm]
Structure
  Buildings material: Concrete (εr = 15 [dimensionless], σ = 0.015 [S/m])
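Wireless InSite is a commercial package whose internals are not reproduced here. Purely to illustrate the ray-advancing idea described above, the toy sketch below steps an idealized 2-D ray through a street canyon bounded by two reflecting walls; the geometry, step size and launch direction are arbitrary assumptions with no relation to the scene of Figure 2.

import numpy as np

def trace_ray(origin, direction, street_width, length, step=0.5, max_steps=2000):
    """Advance a 2-D ray in small steps inside an idealized street canyon
    (two parallel walls at y = 0 and y = street_width), reflecting specularly."""
    p = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    path, bounces = [p.copy()], 0
    for _ in range(max_steps):
        p = p + step * d
        if p[1] <= 0.0 or p[1] >= street_width:   # hit a wall: flip the y-component
            d[1] = -d[1]
            p[1] = np.clip(p[1], 0.0, street_width)
            bounces += 1
        path.append(p.copy())
        if p[0] >= length:                        # reached the end of the street
            break
    return np.array(path), bounces

path, n_refl = trace_ray(origin=(0.0, 2.0), direction=(1.0, 0.3),
                         street_width=20.0, length=200.0)
print(len(path), "steps,", n_refl, "reflections")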
2. Evaluating SVRM performances in path loss estimation

For the Okumura-Hata model, an application exploiting the one-against-one SVRM has been proposed. Firstly, the Okumura-Hata method has been implemented in Matlab, and a set of data has been numerically generated. The implemented Okumura-Hata model has the following inputs:
• frequency;
• height of the radio-base station (from 30 to 200 [m]);
• height of the transmitting antenna (from 1 to 10 [m]);
• distance of the receiving antenna from the radio-base station.
Its output is the path loss (calculated between 162.563 and 181.969 dB). After data collection, the database has been divided into two subsets, the former used for training the SVRM, the latter exploited to test its performances in estimating the path loss. In this algorithm, the SVRM inputs are:
• frequency;
• distance of the receiving antenna from the radio-base station.

For the Ray-Tracing model, a similar procedure has been carried out, this time by exploiting the Wireless InSite software. Whilst, as usual, the considered output is the path loss (calculated between 1.051 and 250 dB), the inputs are:
• frequency;
• antenna gain;
• distance from the radio-base station (1 [Km] with a step of 100 [m]);
• linear density of habitations;
• mean height of habitations;
• length of the street;
• rotation angle of the transmitting antenna;
• height of the transmitting antenna.
Linear (K(x_i, x_j) = ⟨x_i, x_j⟩), polynomial, and Radial Basis Function (RBF) kernels have been considered in order to train and test the SVRMs, with a consequent analysis of the best performances obtained by a convenient variation of the training parameters. Moreover:
• the polynomial kernel (K(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d) has been evaluated by varying the degree d of the polynomial between 2 and 5;
• the RBF kernel (K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))) has been evaluated by varying the value of σ over the integers between 1 and 10.

Performances have been evaluated according to the following statistical indexes: Root Mean Squared Error (RMSE), Root Relative Squared Error (RRSE), Willmott's Index of Agreement (WIA) and Mean Absolute Error (MAE) [20]. Table 2 summarizes the quantitative results of the SVRMs in both cases, i.e. the Okumura-Hata and the Ray-Tracing models, with the best kernel, selected by trading off the performances of the different kernels against the computational costs. Fig. 5 describes the post-processing of the trained network response with a linear regression. The SVRM providing the best results is the linear SVRM trained and tested with the data numerically generated by the Okumura-Hata technique. It has been compared with the performances of an ANN trained and tested on the same sets previously used for the SVRMs. In the next section we propose this further comparison.

Figure 3. Performances of SVRM for different kernels with Okumura-Hata model.

3. SVRM vs. ANN Approach

A comparison between the implemented SVRM and a Bayesian-rule based ANN improves the validity of the obtained results. We set the number of neurons of the single hidden layer of the implemented ANNs according to Kurková's theorem [21] (see Table 2 for details). The ANNs have been trained and tested on the same databases as the SVRMs, by using tan-sigmoid (between the input and hidden layer) and pure linear (between the hidden and output layer) activation functions. We used a Back-Propagation (BP) algorithm with adaptive learning rate. The evaluation of the classification ability and the stopping criteria are based on the minimization of the Mean Squared Error (MSE) and Relative MAE (RMAE) learning performance indices, defined in Equations (8) and (9) respectively, as a consequence of the application of the BP algorithm:

MSE = \frac{1}{l}\sum_{i=1}^{l}\left(d_i - \hat{d}_i\right)^2    (8)

RMAE = \frac{1}{l}\sum_{i=1}^{l}\frac{\left|d_i - \hat{d}_i\right|}{d_i}    (9)
where d_i and \hat{d}_i are the i-th actual and estimated values, respectively. Moreover, because the implemented ANN returned some unused output codes, its performances have been obtained after applying a complicated system of thresholds, which slows down the elapsed computational time. On the contrary, the predictive SVRM-based model, having an approximately null response time, can be exploited in a real-time framework. Fig. 6 shows the evaluation of the ANNs versus the learning epochs, while Fig. 7 draws the final results for a comparison with the SVRM approach.

Table 2. Best ANN performances for the two considered numerical models

                     R       Train epochs   Hidden neurons
ANN Okumura-Hata     1       450            4
ANN Ray Tracing      0.626   100            25
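To make the training and evaluation pipeline of Sections 2 and 3 concrete, the sketch below generates a small synthetic dataset from the Okumura-Hata formula, fits support vector regressors with linear, polynomial and RBF kernels (scikit-learn's SVR is used here as a stand-in for the authors' SVRM implementation), and reports MAE and RMSE on a held-out test set. Dataset sizes, antenna heights and hyper-parameters are arbitrary assumptions, so the numbers will not reproduce Table 2 or Figure 3.

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)

# Synthetic Okumura-Hata data: inputs are frequency [MHz] and distance [km].
f = rng.uniform(800, 900, 600)
d = rng.uniform(15, 50, 600)
h_b, h_m = 50.0, 1.5                                   # fixed antenna heights (assumption)
a_hm = (1.1 * np.log10(f) - 0.7) * h_m - (1.56 * np.log10(f) - 0.8)
L = (69.55 + 26.16 * np.log10(f) - 13.82 * np.log10(h_b) - a_hm
     + (44.9 - 6.55 * np.log10(h_b)) * np.log10(d))    # path loss [dB], Eq. (1)

X = np.column_stack([f, d])
X_tr, X_te, y_tr, y_te = train_test_split(X, L, test_size=0.3, random_state=0)

for name, model in [("linear", SVR(kernel="linear", C=10.0)),
                    ("poly d=3", SVR(kernel="poly", degree=3, C=10.0)),
                    ("rbf sigma=5", SVR(kernel="rbf", gamma=1.0 / (2 * 5.0 ** 2), C=10.0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name:12s}  MAE={mae:.3f} dB  RMSE={rmse:.3f} dB")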
Figure 4. Performances of SVRM for different kernels with Okumura-Hata model.
Figure 5. Scatter plots showing the performances of the SVRMs. R is the regression R-value; R = 1 means perfect correlation.
4. Conclusions

In this paper, path loss prediction with computational intelligence methods has been discussed. In particular, a soft computing approach, based on the use of SVRMs, has been exploited in order to implement a model for predicting the path loss in a particular urban environment.
Figure 6. Evaluation of ANNs for the different exploited learning epochs: (a) Ray-Tracing method, (b) Okumura-Hata model. Each panel plots the error parameters (MAE, RMSE, RRSE) and the WIA against the learning epochs (50-500).
Figure 7. Scatter results with ANNs: (a) Ray-Tracing method (R = 0.56898), (b) Okumura-Hata model (R = 1). Simulated values are plotted against actual values, together with the best linear fit and the Out = Target line.
Data from different methods have been used and compared in order to obtain the best performance. Moreover, the input dimension has been varied according to the particular method. We remark that the selected SVRM system shows very appreciable results, better than the ANN. Moreover, the SVRM built on the Okumura-Hata numerical data was found to be effective despite the small number of input parameters. Conversely, the SVRM based on the Ray-Tracing numerical data, having more inputs, provides lower performances. However, the proposed method provides a good overall accuracy in estimating the path loss in an urban environment, as our experimentation demonstrates. At the same time, the procedure should be validated on different backgrounds or by considering, for instance, the possibility of adding further input parameters (about the transmitting or receiving antenna). Anyway, considering the presented results, the outcomes obtained from the SVRM trained
on the Ray-Tracing model are more reliable than the results obtained by the ANN. The related regression R-value is not so far from an acceptable value, and thus the model could be improved by validating the cross-influence of the inputs. This study is under way. All things considered, we can affirm that SVRMs appear to be a valid alternative tool for studying the path loss in urban environments.
References
[1] R. Prasad, Overview of wireless personal communications: Microwave perspectives, IEEE Commun. Mag. 35(4) (1997), 104-108.
[2] A.J. Rustako Jr., N. Amitary, G.J. Owens, and R.S. Roman, Radio propagation at microwave frequency for line of sight microcellular mobile and personal communications, IEEE Transactions on Vehicular Technology 40(1) (1991), 203-210.
[3] H. Masui, K. Takahashi, S. Takahashi, K. Kage, and T. Kobayashi, Difference of path loss characteristics due to mobile antenna heights in microwave urban propagation, IEICE Trans. Fundamentals E82-A(7) (1999), 1144-1149.
[4] Y. Oda, K. Tsunekawa, and M. Hata, Advance LOS path loss model in microcellular communications, IEEE Transactions on Vehicular Technology 49(6) (2000), 2121-2125.
[5] K. Taira, S. Sekizawa, G. Wu, H. Harada, and Y. Hase, Propagation loss characteristics for microcellular mobile communications in microwave band, in: Proc. of the 5th IEEE ICUPC (1996), Cambridge, MA, pp. 842-846.
[6] T. Taga, T. Furuno, and K. Suwa, Channel modeling for 2-GHz-band urban line-of-sight street microcells, IEEE Transactions on Vehicular Technology 48(1) (1999), 262-272.
[7] A. Yamaguchi, K. Suwa, and R. Kawasaki, Received signal level characteristics for radio channel up to 30 MHz bandwidth in line-of-sight microcells, IEICE Trans. Commun. E80-B (1997), 386-388.
[8] E. Green and M. Hata, Microcellular propagation measurements in an urban environment, in: Proc. PIMRC (1991), pp. 324-328.
[9] H. Masui, M. Ishi, S. Takahashi, H. Shimizu, T. Kobayashi, and M. Akaike, Microwave propagation characteristics in an urban quasi line-of-sight environment under traffic conditions, IEICE Trans. Commun. E84-B (2001), 1431-1439.
[10] L.B. Milstein, D.L. Schilling, R.L. Pickholtz, V. Erceg, M. Kullback, E.G. Kanterrakis, D.S. Fishman, W.H. Biederman, and D.C. Salerno, On the feasibility of a CDMA overlay for personal communications network services, IEEE J. Select. Areas Commun. 10 (1992), 655-668.
[11] H. Masui, T. Kobayashi, and M. Akaike, Microwave path-loss modeling in urban line-of-sight environments, IEEE J. Select. Areas Commun. 20(6) (2002), 1151-1155.
[12] Y. Okumura, E. Ohmori, T. Kawano, and K. Fukuda, Field strength and its variability in VHF and UHF land-mobile service, Rev. Elec. Comm. Lab. 16(9-10) (1968), 825-873.
[13] M. Hata, Empirical formula for propagation loss in land mobile radio services, IEEE Trans. Veh. Tech. 29(3) (1980), 317-325.
[14] A. Glassner, An Introduction to Ray Tracing, Academic Press, New York, NY, USA, 1989.
[15] P. Shirley and K.R. Morley, Realistic Ray Tracing, 2nd edition, A.K. Peters, New Jersey, USA, 2001.
[16] H. Wann Jensen, Realistic Image Synthesis Using Photon Mapping, A.K. Peters, New Jersey, 2001.
[17] M. Pharr and G. Humphreys, Physically Based Rendering: From Theory to Implementation, Morgan Kaufmann, New York, NY, USA, 1989.
[18] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20 (1995), 273-297.
[19] A.J. Smola, Regression estimation with support vector learning machines, Master's thesis (1996), München, Germany.
[20] M. Cacciola, G. Megali, F.C. Morabito, An optimized Support Vector Machine based approach for non-destructive bumps characterization in metallic plates, in: S. Wiak, A. Krawczyk, I. Dolezel (Eds.), Intelligent Computer Techniques in Applied Electromagnetics, Studies in Computational Intelligence Series 119 (2008), 131-139.
[21] V. Kurková, Kolmogorov's theorem and multilayer neural networks, Neural Networks 5 (1992), 501-506.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-072-8-332
The Department Store Metaphor: Organizing, Presenting and Accessing Cultural Heritage Components in a Complex Framework

Umberto MANISCALCO 1, Gianfranco MASCARI 2 and Giovanni PILATO 1

1 ICAR-CNR, Viale delle Scienze, 90128 Palermo, Italy, {maniscalco, pilato}@pa.icar.cnr.it
2 IAC-CNR, Viale del Policlinico 137, 00161 Rome, Italy, [email protected]
Abstract. We present a part of a wider research activity carried out within the CNR Interdepartmental Project named "Cultura e Territorio". The project considers the interaction between culture and territory; this interaction determines co-evolving processes with which social, economic and knowledge phenomena are associated. Considering this very complex context, we develop information management tools aiming to organize, present and access the CNR's research components in the field of cultural heritage fruition, monitoring and conservation in an adaptive and dynamic way. The department store metaphor for organizing, presenting and accessing research components in a complex framework can be usefully adopted as a guideline to define component characteristics and clustering, as well as adaptive and dynamic presentation and accessing strategies.
Introduction

The National Research Council (CNR) is a public organization; its duty is to carry out, promote, spread, transfer and improve research activities in the main sectors of knowledge growth and of its applications for the scientific, technological, economic and social development of the Country. To this end, the activities of the organization are divided into macro areas of interdisciplinary scientific and technological research, concerning several sectors: biotechnology, medicine, materials, environment and land, information and communications, advanced systems of production, judicial and socio-economic sciences, classical studies and arts1. In particular, CNR develops a number of research activities directly in the field of cultural heritage, or in other disciplinary sectors that can have a fall-out on the cultural heritage field.
1 Extracted from official CNR web pages.
Therefore, the CNR uses many research products that are applied directly or indirectly in the cultural heritage field. Nevertheless, the management of these products can be very complex, as they have extremely different forms. A distinction among the CNR research products can be made based on their type. In fact, they can be algorithms, scientific publications, methodologies, technologies, tools, applications or just "know-how" to be transferred. A further difficulty in managing these products is their documentation. They are developed in various fields of research, thus their descriptions are expressed in a technical language and are not homogeneous. This language does not allow people outside the field to understand either their details or what they do. To clarify the above: often the documentation of these products consists only of some scientific papers in which what the products do is hidden among many other technical details. This makes the use of the products hard to understand. So we manage many different products, almost always described in a way that is useless for a potential client. Talking about clients, who are supposed to be the final users of the products, the scenario looks equally complex. Clients too can be of different types, with various kinds of education and cultural background. In fact, clients can be scientists working either in the field where the products have been developed or in other fields, or private companies or public corporations that may or may not have domain experts. Therefore, on one side we have a complex range of products that is not structured and not easily accessible, so we have a problem of organization. On the other side there are our potential clients, with different knowledge requests depending on different kinds of demand, so we have a problem in presenting and accessing the products [5], [6]. In the next section we describe the department store metaphor for organizing, presenting and accessing cultural heritage components in a complex framework. We then describe the use of the chatbot, a particular kind of conversational agent, as a useful tool for interactively presenting and accessing cultural heritage components. Future work and conclusions close the paper.
1. The Department Store Metaphor

In this section we introduce the department store metaphor, aiming to face the organization, the presentation and the access to cultural heritage components in a complex framework. We said that a difficulty in managing these products is their documentation, because they are often described in a verbose technical language. In everyday life, we choose and buy several products whose functioning is totally unknown to us. That is possible because we have access to an understandable description of these products, in which all the complexity is hidden behind a few distinctive characteristics of the product. For example, we choose among different kinds of computers by selecting chip type, supported operating system, hard disk capacity, gigabytes of RAM on board, and so
on. Of course, we do that while ignoring how the computer works or how its components are built. All hi-tech products (and, in general, all products) have a package reporting their main functions and distinctive characteristics, clearly expressed in a language understandable to everyone. So, as a first step, if we want to manage the research products of the cultural heritage field, we must face the problem of their description in relation to a taxonomy. Indeed, we need at least two different kinds of taxonomy. The first one defines the typology of the product, and the second one defines the field of application of the product. Thus, we have a taxonomy in which a product is classified as an algorithm, an application, a tool, an instrument and so on, and another taxonomy in which a product is classified as a product for monuments, for museums, for paintings and so on. Of course, these two taxonomies are related to each other, so that we can have a product classified as an algorithm for paintings, an application for museums and so on. Moreover, an ontology can be designed to define the relationships between the products; in fact many products have relations with other products. For example, to monitor a monument in a non-invasive way, soft sensors can be used, and they are a product; to establish the optimal points in which to install the sensors, or to establish the optimal number of sensors to use, we have at our disposal two different algorithms, which are two other products. The ontology, in this or in a similar case, suggests to the client the use of a set of complementary products providing a full solution. Once the taxonomies are established, we have our department store organized in sectors, in which the clients can search the products depending on the field of application or on the typology. Moreover, the ontology dynamically clusters the products around the focused one, using the relationships among them. Now we should face the problem of the description. As said, clients can be of different typologies, with different knowledge requests depending on different kinds of demand. The department store metaphor can help us solve this problem. As said, each product in a store has a package reporting its main functions and its distinctive characteristics, clearly expressed in a language understandable to everyone. We do the same for our products; in other words, we need multi-layer content that describes each product. In the first layer, the description should be expressed in a very easy language and it should report only the functionality of the product: what it does and in which case or scenario it is applicable. In the second layer the product should be described in a more complete way. The description in this layer should report how the product can be used, the results that we can obtain by its use, an example of application and other information of this kind. In the third layer the product can be described in all its complexity, using a technical language, reporting the methodologies or technologies employed in its development and all the technical characteristics. Doing so, we satisfy the demand of different typologies of clients, ensuring suitable access to the information about the product depending on the cultural background and education of the client. Following the department store metaphor, we can now introduce a very useful tool. When we have difficulty in understanding the functionality of a product in a
department store, or when we wish to compare two or more products, we can talk with an employee who is often able to satisfy our demand. The presence of the staff employed in a department store is very important because they have an interactive relationship with the client, and they can clear up doubts or give more information about the product using a language understandable to the client. To transpose the staff employed in a department store into our metaphor, we can use a chatterbot (or chatbot), that is, a type of conversational agent: a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods. In our case the chatbot has an important role because, by its use, we can obtain a dynamic presentation of and access to the product; in fact, the chatbot answers the questions of the client, filling the information gap among the three levels of content previously described. In other words, by the use of the chatbot we can better satisfy the demand of different typologies of clients, ensuring suitable access to the information about the product depending on the cultural background and education of the client.
Figure 1. The client can access one of the three description levels, or use the chatbot to satisfy his or her demand.
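As a minimal, purely illustrative sketch of this organization, a catalogue entry can carry its typology, its field of application, its three description levels and its ontology links to complementary products. All product names and entries below are invented for the example.

# Hypothetical catalogue entries following the two taxonomies, the ontology
# links and the three-level descriptions discussed above.
catalogue = {
    "soft-sensor-suite": {
        "type": "tool",                      # taxonomy 1: typology
        "field": "monument",                 # taxonomy 2: field of application
        "descriptions": {
            1: "Monitors a monument in a non-invasive way.",
            2: "How to deploy the sensors, expected outputs, a worked example.",
            3: "Full technical description: methodology, algorithms, specifications.",
        },
        "related": ["sensor-placement-algorithm", "sensor-count-algorithm"],
    },
    "sensor-placement-algorithm": {
        "type": "algorithm",
        "field": "monument",
        "descriptions": {1: "Finds the optimal points in which to install the sensors."},
        "related": ["soft-sensor-suite"],
    },
}

def browse(field=None, type_=None):
    """Department-store style browsing: filter by sector (field and/or typology)."""
    return [name for name, item in catalogue.items()
            if (field is None or item["field"] == field)
            and (type_ is None or item["type"] == type_)]

print(browse(field="monument", type_="algorithm"))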
2. The Chat-bot Architecture

In order to simulate the department store employee, we propose to apply an intuitive chat-bot architecture [4]. This kind of chat-bot is characterized by two typical behaviors. The first one is rational, and is given by the utilization of the taxonomies, the ontology and the standard AIML KB. The second one is associative, and it is realized through the use of a data-driven semantic space in which each of the three-level item descriptions, the AIML categories, the taxonomy entries, the ontology concepts and the user queries are mapped as vectors. This implicitly defines a "sub-symbolic" relationship net between any two items. This sub-symbolic space, which has the same psychological basis claimed by LSA [2], is exploited by the conversational agent during the dialogue with the user through ad hoc defined AIML tags. As a result, the "employee avatar" interacts with the user exploiting its standard KB but, moreover, it attempts to retrieve semantic relations between the products stored in the conceptual space which are of interest for the user but could not easily be reached by means of the traditional rule-based approach.
2.1. Rational Interaction

The rational interaction capability of the employee avatar is given by the standard AIML KB, enhanced with primitives capable of exploring the taxonomies and the ontology. The chat-bot is based on the ALICE technology [1]. As a matter of fact, the KB of an ALICE chat-bot is composed of question-answer modules, named categories and written in a language called AIML (Artificial Intelligence Mark-up Language). A pattern-matching algorithm is the engine of the interaction. The chat-bot engine compares the user questions with the patterns in its KB. Every time a match is found, the chat-bot answers the user with the template corresponding to the matched pattern. The ALICE KB can be constantly increased by the botmaster by means of a "targeting" mechanism. The targeting procedure consists in the analysis of the conversation files in order to detect those user questions which had an incomplete matching with the AIML patterns.

2.2. Associative behavior

The associative area is obtained by mapping the three-level descriptions of the products, the taxonomy items, and the ontology concepts as vectors into a data-driven conceptual space built by means of the Latent Semantic Analysis (LSA) methodology [2]. The space is obtained through the statistical analysis of word co-occurrences in a corpus of texts built using ad hoc documents about the domain specific to the research tools. The employee avatar attempts to "guess" semantic relations between the items available in the "department store" and the user requests. This process is accomplished by evaluating geometric distances between the vectors representing them. The construction of the conceptual space is based on the collection of a set, as large as possible, of N documents dealing with the topics treated by the items. Let M be the number of unique words occurring in this set of documents. Let A = {a_ij} be an M×N matrix whose (i,j)-th entry is the square root of the sample probability of finding the i-th word of the vocabulary in the j-th paragraph. According to the Truncated Singular Value Decomposition theorem [2], the matrix A is approximated with the matrix A_k obtained by computing the product A_k = U_k Σ_k V_k^T, where U_k is an M×k matrix, V_k is an N×k matrix and Σ_k is a k×k diagonal matrix, whose elements are called the singular values of A. This procedure creates a k-dimensional space in which to map words and documents. In particular, words are mapped as the rows of U_k and documents are mapped as the columns of V_k^T.
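A rough sketch of how such a conceptual space can be built is given below; it follows the square-root weighting and the truncated SVD described above, but uses a tiny invented corpus and NumPy's SVD rather than the authors' actual pipeline, so the corpus, vocabulary and k are placeholders.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the N domain documents.
docs = [
    "soft sensors monitor a monument in a non invasive way",
    "an algorithm selects the optimal placement of the sensors",
    "image analysis tools support the conservation of paintings",
    "a chatbot presents research products to museum visitors",
]

# Term-document counts, then A_ij = sqrt of the sample probability of word i in document j.
counts = CountVectorizer().fit_transform(docs).T.toarray().astype(float)  # M x N
A = np.sqrt(counts / counts.sum(axis=0, keepdims=True))

# Truncated SVD: A ~ U_k S_k V_k^T; words are rows of U_k, documents columns of V_k^T.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
word_vectors = U_k          # one k-dimensional vector per vocabulary word
doc_vectors = Vt_k.T        # one k-dimensional vector per document
print(word_vectors.shape, doc_vectors.shape)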
3. The Querying process

After the creation of the semantic space, each item's three-level description is encoded as a k-dimensional point in the conceptual space, using the normalized vector sum of the words composing it. As a result, each description is identified by a set of vectors (at least three) according to the required levels of detail. If o_i and o_j are two objects mapped into the conceptual space, the geometric similarity measure, defined as
sim(o_i, o_j) = \begin{cases} \cos^2(o_i, o_j) & \text{if } \cos(o_i, o_j) > 0 \\ 0 & \text{otherwise} \end{cases}

sets a semantic, weighted, sub-symbolic relationship between them. Given a vector q, associated to the query of the user, the set of vectors d_i, representing the descriptions of the items mapped into the conceptual space which are sub-symbolically conceptually related to the user query, is given by

CR = \{ d_i : sim(q, d_i) \geq T \}

where T is an experimentally fixed threshold (0 ≤ T ≤ 1). This technique also allows retrieving the most appropriate description level according to its sub-symbolic similarity with the user query. Moreover, it is also possible to manually select, among the retrieved descriptions, those that match the request for a particular level of detail. The chat-bot can present these features through new specific AIML tags introduced for this interaction. This way it is possible to retrieve tools that can interest the users both through a traditional, rule-based approach and through a sub-symbolic semantic layer built by exploiting a data-driven conceptual space.
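The retrieval step can be sketched as follows; the vectors and the threshold T below are invented placeholders standing in for points of the actual conceptual space.

import numpy as np

def sim(a, b):
    """Squared cosine similarity if the cosine is positive, 0 otherwise."""
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return c ** 2 if c > 0 else 0.0

def conceptually_related(query_vec, description_vectors, T=0.5):
    """CR = { d_i : sim(q, d_i) >= T } over the mapped item descriptions."""
    scores = {name: sim(query_vec, d) for name, d in description_vectors.items()}
    return {name: s for name, s in scores.items() if s >= T}

# Invented 3-D vectors standing in for points of the k-dimensional conceptual space.
descriptions = {"soft-sensor level 1": np.array([0.9, 0.1, 0.0]),
                "soft-sensor level 3": np.array([0.6, 0.7, 0.2]),
                "chatbot level 1":     np.array([-0.2, 0.9, 0.3])}
query = np.array([1.0, 0.2, 0.0])
print(conceptually_related(query, descriptions, T=0.5))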
4. Future works and Conclusions
We have presented a position paper about a department store metaphor, aimed at organizing, presenting and accessing research components in a complex framework. The idea is to organize products into a set of taxonomies and also map the descriptions of the products into a data-driven conceptual space. A department store employee avatar, built as a conversational agent that realizes the interaction interface for the users, can exploit these knowledge representation structures. Future work will regard the realization of a system prototype, in order to test the effectiveness of the proposed approach.
References
[1] Alice: http://www.alicebot.org
[2] Landauer, T.K., Foltz, P.W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, pp. 259-284.
[3] Agostaro, F., Augello, A., Pilato, G., Vassallo, G., Gaglio, S.: A Conversational Agent Based on a Conceptual Interpretation of a Data Driven Semantic Space. In Lecture Notes in Artificial Intelligence, Springer-Verlag GmbH, vol. 3673/2005, pp. 381-392.
[4] G. Pilato, A. Augello, G. Vassallo, S. Gaglio, "Sub-Symbolic Semantic Layer in Cyc for Intuitive Chat-Bots", Proc. of the International Conference on Semantic Computing (ICSC 2007), 17-19 Sept. 2007, pp. 121-128, Irvine, California, USA.
[5] G.F. Mascari, M. Mautone, L. Moltedo, P. Salonia, "Landscapes, Heritage and Culture", Journal of Cultural Heritage, Vol. 10, No. 1, January-March 2009, pp. 22-29.
[6] P. Ciarlini, J.-F. Mascari, L. Moltedo, "Towards adaptive search of cultural heritage information in multimatch", workshop EDUCA, Berlin, 2007.
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved.
Subject Index ACE 163 algorithmic inference 19 ANN 271 artificial intelligence 29 artificial neural network(s) 306, 323 associative memory 247 attractors 11 audio and video recordings 51 background modeling 101 background subtraction 101 BagBoosting 208 Bagging 208 Bayesian decision networks 163 Bayesian networks 29 bicoherence 217 biometric recognition 62 biometrics 314 bispectrum 217 blind source separation 81 brain mapping 261 chatbot 163 Choquet integral 136 classification 91, 229 cognitive behavioral therapy 217 cognitive modeling 293 complex network(s) 39, 234 conflict 187 consensus management 136 constant Q transform 91 conversation modeling 70 convex weighting 154 cortical dynamics 247 cortico-cortical resonances 217 data fusion 29 data integration 197 decision fusion 197 decision making 154 decision templates 197 distance 154 DSS 163 dynamic portfolio management 146 dynamic similarity assessment 293 early fusion 197
ECG 281 electrical resistance tomography 241 electroencephalography (EEG) 229, 261 emergent computation 11 ensemble 208 entropy 261 epilepsy 261 equal opportunities 169 evaluating forecasts 323 face recognition 62 feature selection 271 forecasting 177 foreground modeling 101 frequency domain algorithms 81 functional demographic model 177 fuzzy expert system 169 fuzzy logic 169, 187 fuzzy number 154 gender equity 169 gene function prediction 197 gene targets 271 GEP 271 group decision theory 136 GVHD 271 hand-geometry 314 haptic interfaces 306 heart dipole model 281 heart diseases 281 image processing 306 insomnia 217 inverse problems 241 keyword spotting 70 late fusion 197 Lee Carter model 177 linear predictive coding 116 MATLAB 29 metabolic networks 39 Middle East 187 minimum guarantee 146 mobility model 19 moving object detection 101 multi criteria analysis 136
multilayer perceptron 116 music transcription 91 Naive Bayes combiner 197 neural network(s) 39, 101, 116, 314 neuron 234 noise 11, 234 non additive measures 136 non iterative imaging methods 241 non-destructive testing 241 onset detection 91 optical character recognition 306 pareto-like distribution law 19 partitioned block algorithms 81 path loss prediction 323 perceptual assessment 51 privacy 62 processes with memory 19 propagation of belief 29 random Boolean networks 11 random projection 208 random subspace 208 randomized maps 208 real-time systems 306 resolution 314 retrieval and mapping 293
reverberant environment 81 robustness and fault tolerance comparison 39 scenario 146 security 62 seismic signals discrimination 116 self organization 101 smoothing 177 SNR 271 social communities 19 spatio-temporal patterns 247 stopped object 101 support vector machine(s) (SVM) 91, 116, 229, 323 synchrony 247 tabletop 70 topology 234 uncertainty 154 urban environment 323 VCG 281 vector space integration 197 vocal and facial expression of emotion 51 weighted averaging 197
Neural Nets WIRN09 B. Apolloni et al. (Eds.) IOS Press, 2009 © 2009 The authors and IOS Press. All rights reserved.
Author Index Abbafati, M. Addabbo, T. Aguglia, U. Alonso, J.B. Apolloni, B. Augello, A. Barbieri, A. Barro, D. Bassis, S. Berger, F. Bianchi, L. Cacciola, M. Calcagno, S. Calvano, F. Canestrelli, E. Carota, M. Casali, D. Cianci, V. Cifani, S. Colacci, A. Colombini, G. Conti, V. Corazza, M. Costantini, G. Cuzzola, M. D’Amato, V. Dante, V. De Carlo, D. Del Giudice, P. Di Maio, G. Dutoit, P. Espa-Cervena, K. Esposito, A. Esposito, A.M. Fàbregas, J. Facchinetti, G. Faundez-Zanuy, M. Fedele, R. Ferrera, M.A. Fiasché, M. Filippone, M. Folgieri, R. Fortuna, L.
229 169 261 314 19 163 11 146 19 217 229 271, 323 323 241 146 229 229 261 70 11 110 39 136 91, 229 271 177 306 323 306 51 217 217 51 116 314 169, 187 62, 314 271 314 271 3 208 234
Franci, F. Frasca, M. Gaglio, S. Gallo, A. Giacco, F. Gil-Aluja, J. Gil-Lafuente, J. Giove, S. Giudicepietro, F. Grassi, M. Grinberg, M. Haltakov, V. Iacopino, P. Inuso, G. Kauffman, S.A. La Foresta, F. Laganà, F. Lang, T. Lanza, B. Luccarini, L. Maddalena, L. Mammone, N. Maniscalco, U. Marinaro, M. Mascari, G. Mastroleo, G. Masulli, F. Megali, G. Mello, P. Morabito, F.C. Motto Ros, P. Nunnari, G. Pagliaro, V. Palmieri, F. Parisi, R. Pasero, E. Pelletier, L. Perfetti, R. Perrig, S. Petetti, E. Petrosino, A. Piazza, F. Picaro, A.
187 234 163 234 116, 247 127 154 136 116 62 293 293 271 261, 281 11 261, 281 323 169 39 110 101 261, 281 332 116, 247 332 169, 187 3 271, 323 110 261, 271, 281, 323 306 234 187 29 81 306 217 91 217 306 101 70 81
Pilato, G. Piscopo, G. Princi, D. Principi, E. Quitadamo, L. Re, M. Ricci, G. Riviello, M.T. Rocchi, C. Rotili, R. Rovetta, S. Rubinacci, G. Russolillo, M. Sacristan, A. Saggio, G. Scarpetta, S. Scarpiniti, M.
163, 332 177 271 70 229 197 187 51 70 70 3 241 177 314 229 116, 247 81
Serra, R. Shaposhnyk, V. Sorbello, F. Sottara, D. Spata, A. Squartini, S. Tamburrino, A. Todisco, M. Travieso, C. Uncini, A. Valentini, G. Valerio, L. Versaci, M. Villa, A.E.P. Villani, M. Vitabile, S.
11 217 39 110 234 70 241 91, 229 314 81 197 19 323 217 11 39