Processes of Emergence of Systems and Systemic Properties − Towards a General Theory of Emergence
Processes of Emergence of Systems and Systemic Properties − Towards a General Theory of Emergence
Proceedings of the International Conference, Castel Ivano, Italy, 18–20 October 2007
editors
Gianfranco Minati Italian Systems Society, Italy
Mario Abram Italian Systems Society, Italy
Eliano Pessa University of Pavia, Italy
World Scientific • NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
PROCESSES OF EMERGENCE OF SYSTEMS AND SYSTEMIC PROPERTIES Towards a General Theory of Emergence Copyright © 2009 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-279-346-1 ISBN-10 981-279-346-1
Printed in Singapore.
The proceedings of the fourth national conference of the Italian Systems Society (AIRS) are dedicated to the memory of Evelyne Andreewsky, who passed away in December 2007. Several members of AIRS had the honour of being her colleagues and friends.
Evelyne Andreewsky was born in Paris. She earned an engineering degree in Electronics from E.S.E., Paris, and a “Docteur ès Sciences” degree (PhD) in Computer Science (Neurolinguistic Modelling) from Pierre & Marie Curie University, Paris VI. She was Senior Researcher at the French National Research Institute I.N.S.E.R.M. She switched from a straight computer-science career (as research engineer, head of public information-processing labs, consultant for government policies, UNESCO expert...) to pure research, seeking to develop new multidisciplinary systemic approaches to cognition and language (over 150 papers in international scientific journals, books and book chapters, plus work as book editor and guest editor of journals).
She was founder and honorary president of the Systems Science European Union (UES). She was actively involved in the boards of scientific societies, namely AFSCET (the French Systems Science Society) and MCX (the European Program for Modelling Complexity). She belonged to the editorial boards of scientific journals, namely Cybernetics and Human Knowing and Res-Systemica. She organized or co-organized a number of national and international congresses, symposia and summer schools. She was elected (1999) to Honorary Fellowship of the World Organisation of General Systems and Cybernetics (WOSC), founded by Professor John Rose, and was invited to give courses and lectures in various countries. We will never forget her and her dedication to systems science. Thank you, Evelyne.
PREFACE
The title of this fourth national conference of the Italian Systems Society (AIRS), Processes of emergence of systems and systemic properties − Towards a general theory of emergence, was chosen to emphasize the importance of processes of emergence within Systemics. The study of this topic has a longstanding tradition within AIRS. Indeed, this conference can be considered a continuation of the 2002 conference, Emergence in Complex Cognitive, Social and Biological Systems, and of the 2004 conference, Systemics of Emergence: Research and Applications. In the preface to the 2004 proceedings the editors wrote: “Emergence is not intended as a process taking place in the domain of any discipline, but as ‘trans-disciplinary modeling’ meaningful for any discipline. We are now facing the process by which General System Theory is more and more becoming a Theory of Emergence, seeking suitable models and formalizations of its fundamental bases. Correspondingly, we need to envisage and prepare for the establishment of a Second Systemics − a Systemics of Emergence…”.

We had intense discussions in the periodic meetings of AIRS, focused on the large and increasing number of contributions about emergence available in the scientific literature. In this regard we remark that AIRS members were, and still are, involved in research projects in several disciplinary fields, with the experience of applying the view of emergence outlined above, for instance, in Architecture, Artificial Intelligence, Biology, Cognitive Science, Computer Science, Economics, Education, Engineering, Medicine, Physics, Psychology, and Social Sciences. As a consequence of this intense activity we felt an increasing need to better specify the principles to be adopted when dealing with this evolving, interdisciplinary study of emergence. With this point of view in mind, which could be considered a generalization of other instances historically at the basis of the birth of different systems societies around the world (e.g., Cybernetics, General System Theory, Living Systems Theory, Systems Dynamics, Systems Engineering, Systems Theory, etc.), in October 2006 the Italian Systems Society approved a Manifesto, available at our web site www.AIRS.it. It relates to our vision of the current situation and role of world-wide systems societies, as well as of the problems and perspectives of Systemics. In the Manifesto we outlined some fundamental aspects of our identity, such as the necessary role for Systemics of disciplinary knowledge, as well as of inter- and trans-disciplinary knowledge, the meaning of generalization, the need for rigorousness and the non-ideological valence of reductionism.
We quote the concluding statements of the Manifesto: “The purpose of systems societies should be to identify and, where possible, produce contributions to Systemics taking place in disciplinary and multidisciplinary research, making them general and producing proposals for structuring and generalizing disciplinary results. Examples of theoretical aspects of such an effort are those related to the establishment of a General Theory of Emergence, a Theory of Generalization, logical-philosophical models related to Systemics and the issue of Variety in different disciplinary contexts.”

The general theory of emergence we envisage is not a unique, case-independent and scale-independent approach having general disciplinary validity. Instead we have in mind different, dynamical and interacting levels of description within a constructivist view able to model processes of emergence, in order not to reduce all of them to a single description, but to introduce multi-modeling and modeling hierarchies as a general approach to be used in principle. A related approach has been introduced in the literature with the DYnamic uSAge of Models (DYSAM) and logical openness, i.e., meta-level modelling (models of models). We make reference to a constructivist science, as dealing with the constructive role of the observer in processes of emergence. The latter is related to his/her cognitive model, which allows the recognition of acquired systemic properties; this occurs when the hierarchical processes generating these properties cannot be modeled by using traditional causal approaches. In other words, according to a constructivist view, on one side the observer looks for what is conceivable by using the assumed cognitive model and, on the other side, he/she can introduce methodologies allowing the production of incongruences, unexpected results and inconsistencies. The latter process calls for a new cognitive model, generating paradigm shifts and new theoretical approaches, as in the case of abduction as introduced by Peirce. All this is endowed with a deep, general, cultural meaning when the focus is on scientific aspects where it is possible to test, compare, validate and formulate new explanatory theories. Moreover, we believe that the subject of emergence is a sort of accumulation point of increasing, mutually related conceptual links to disciplinary open questions, such as the ones mentioned in the topics of the conference.
The study of processes of emergence implies the need to model and distinguish, in different disciplinary contexts, the establishment of structures, systems and systemic properties. Examples of processes of emergence of systems are given by the establishment of entities which the observer constructivistically detects as possessing properties different from those of the component parts, such as in the case of collective behaviors giving rise to ferromagnetism, superconductivity and superfluidity, and to social systems such as markets and industrial districts. It must be noted that in a constructivist view the whole is not constituted by parts; rather, the observer identifies parts by using a model in the attempt to explain the whole (observer and designer coincide only for artificial systems). Different partitionings correspond to different, mutually equivalent or irreducible, models. Systems do not only possess properties, but are also able, in their turn, to make new ones emerge. Examples of the emergence of systemic properties in systems (i.e., complex systems) are given by the establishment of properties such as cognitive abilities in natural and artificial systems, collective learning abilities in social systems such as flocks, swarms, markets and firms, and functionalities in networks of computers (e.g., in the Internet). Evolutionary processes establish properties in living systems. The models of these processes introduced so far are based on theories of phase transitions, bifurcations, dissipative structures, and Multiple Systems (Collective Beings). On the one hand, the ability to identify these processes allows effectiveness without confusing processes of a different nature that have in common only the macroscopic and generic establishment of systems. This concerns a number of disciplinary contexts such as Physics, Cognitive Science, Biology, Artificial Intelligence and Economics. On the other hand, the attempt to build a General Theory of Emergence corresponds to von Bertalanffy’s project for a General System Theory. The conference thus focused upon these issues from theoretical, experimental, applicative, epistemological and philosophical points of view.

We take this opportunity to mention an important, even if not explicit, outcome of the conference. The scientific committee and we, the editors, had the duty and benefit of this outcome, and now we have the pleasure of sharing it with the readers. As is well known, the scientific and cultural level of scientific journals and edited books is assumed to be assured by good refereeing by the editorial board and the scientific committee.
The task is supposed to be quite “easy” when dealing with topics having general acceptance in academic and research contexts, robust methodologies, and a consolidated literature. Consistency is assumed to be assured, in short, by the complete state of the art, and consequently grounded in the application of well-described approaches, consistent reasoning, supporting examples and validation procedures, so as to reach coherent conclusions. Traditionally, the systemic community (the one we criticize in the Manifesto) has always tolerated low ‘grades’ in those areas, as balanced by the need to break well-defined disciplinary barriers and approaches and to encourage a focus on new aspects not regulated by classic rules of acceptance. The purpose was to avoid the risk of suffocating ideas able to generate interesting cultural processes despite their imprecise formulation, or even an interesting inconsistency. This was the age when being inter- and trans-disciplinary was a challenge (actually, it still is in several universities). As emphasized in our Manifesto, disciplinary scientific research has needed to become more and more interdisciplinary, independently of the roles, efforts and recommendations of systems societies. The challenge for the systemic movement is, in our view, to convert this need into a theoretical result stemming from a General Theory of Emergence intended as a Theory of Change. The challenge is not only at the theoretical level, but also at the educational level (e.g., in which university department should such research be carried out?). At the same time, we have today an enormous amount of knowledge available, and we have to face the temptation to explain-all-with-previous-knowledge (as in Science). In this context we may lack approaches suitable for recognizing and establishing new paradigms, in principle inhomogeneous with the old ones. At the same time we lack ways to assure quality levels (e.g., “What if Simplicio had had computers available?”). One consequence of the unavailability of a General Theory of Emergence as a Theory of Change is the unavailability of a robust methodology for evaluating contributions having this mission. The attempt to evaluate each contribution as a disciplinary contribution may imply a lack of appreciation for its innovative, inter- and trans-disciplinary systemic meaning. The problem relates to the production of scientific knowledge and to educational systems having to deal with an enormous amount of available knowledge while often using old approaches, methodologies and technologies. How can we recognize that a wrong but intelligent idea may be more important than a right, not-so-intelligent idea expected to be accepted because of its homogeneity with established knowledge?
Is the systems community, by virtue of its historical attention and mission related to inter- and trans-disciplinarity, able to face this challenge in general, i.e., to propose innovative approaches and methodologies able to guarantee, test and validate inter- and trans-disciplinary consistency and robustness? We will try to contribute, on the basis of our experience and research activity, to the introduction of such proposals and methodologies. The Italian Systems Society is trying to play a significant role in this process.

The conference was articulated in different sessions, able to capture both the theoretical aspects of emergence introduced above and the applicative ones:
1. Emergence in Architecture.
2. Processes of emergence in Economics and Management.
3. Emergence.
4. Emergence in social systems.
5. Emergence in Artificial Intelligence.
6. Emergence in Medicine.
7. Models and systems.
8. Theoretical problems of Systemics.
9. Cognitive Science.

We conclude by emphasizing that we are aware of how much the scientific community focuses on the available knowledge, a very understandable attitude. At the same time, we cherish the dream of inter-related forms of knowledge, one represented and modelled into the other, in which meanings have simultaneous multi-significance, contributing to generate hierarchies that allow us to deal with the meaning of human existence. With this dream in mind we use the bricks of science to contribute to the emergence of a new multidimensional knowledge.

Gianfranco Minati, AIRS president
Eliano Pessa, Co-Editor
Mario Abram, Co-Editor
PROGRAM COMMITTEE
G. Minati (chairman) – Italian Systems Society
E. Pessa (co-chairman) – University of Pavia
L. Biggiero – LUISS University, Rome
G. Bruno – University of Rome “La Sapienza”
V. Coda – “Bocconi” University, Milan
S. Della Torre – Polytechnic University of Milan
V. Di Battista – Polytechnic University of Milan
S. Di Gregorio – University of Calabria
I. Licata – Institute for Basic Research, Florida, USA
M.P. Penna – University of Cagliari
R. Serra – University of Modena and Reggio Emilia
G. Tascini – University of Ancona
G. Vitiello – University of Salerno
CONTRIBUTING AUTHORS
Abram M.R., Alberti M., Allievi P., Arecchi F.T., Argentero P., Arlati E., Avolio M.V., Battistelli A., Bednar P.M., Bich L., Biggiero L., Bonfiglio N., Bouchard V., Bruno G., Buttiglieri F., Canziani A., Carletti T., Cirina L., Colacci A., Collen A., D’Ambrosio D., Damiani C., David S., Del Giudice E., Dell’Olivo B., Della Torre S., Di Battista V., Di Caprio U., Di Gregorio S., Ferretti M.S., Filisetti A., Giallocosta G., Giunti M., Graudenzi A., Gregory R.L., Guberman S., Ingrami P., Lella L., Licata I., Lupiano V., Magliocca L.A., Marconi P.L., Massa Finoli G., Minati G., Mocci S., Montesanto A., Mura M., Odoardi C., Paoli F., Penna M.P., Percivalle S., Pessa E., Picci P., Pietrocini E., Pinna B., Poli I., Puliti P., Ramazzotti P., Ricciuti A., Rocchi C., Rollo D., Rongo R., Sechi C., Serra R., Setti I., Sevi E., Sforna M., Spataro W., Stara V., Tascini G., Terenzi G., Trotta A., Villani M., Vitiello G.
CONTENTS
Dedication
v
Preface
vii
Program Committee
xiii
Contributing Authors
xv
Contents
xvii
Opening Lecture
Coherence, Complexity and Creativity Fortunato Tito Arecchi
3
Emergence in Architecture
Environment and Architecture – A Paradigm Shift Valerio Di Battista
37
Emergence of Architectural Phenomena in the Human Habitation of Space Arne Collen
51
Questions of Method on Interoperability in Architecture Ezio Arlati, Giorgio Giallocosta
67
Comprehensive Plans for a Culture-Driven Local Development: Emergence as a Tool for Understanding Social Impacts of Projects on Built Cultural Heritage Stefano Della Torre, Andrea Canziani
79
Systemics and Architecture: Current Theoretical Issues Giorgio Giallocosta
91
Processes of Emergence in Economics and Management
Modeling the 360° Innovating Firm as a Multiple System or Collective Being Véronique Bouchard
103
The COD Model: Simulating Workgroup Performance Lucio Biggiero, Enrico Sevi
113
Importance of the Infradisciplinary Areas in the Systemic Approach Towards New Company Organisational Models: the Building Industry Giorgio Giallocosta
135
Systemic Openness of the Economy and Normative Analysis Paolo Ramazzotti
149
Motivational Antecedents of Individual Innovation Patrizia Picci, Adalgisa Battistelli
163
An E-Usability View of the Web: A Systemic Method for User Interfaces Vera Stara, Maria Pietronilla Penna, Guido Tascini
181
Emergence
Evolutionary Computation and Emergent Modeling of Natural Phenomena R. Rongo, W. Spataro, D. D’Ambrosio, M.V. Avolio, V. Lupiano, S. Di Gregorio
195
A New Model for the Organizational Knowledge Life Cycle Luigi Lella, Ignazio Licata
215
On Generalization: Constructing a General Concept from a Single Example Shelia Guberman
229
General Theory of Emergence Beyond Systemic Generalization Gianfranco Minati
241
Uncertainty, Coherence, Emergence Giordano Bruno
257
Emergence and Gravitational Conjectures Paolo Allievi, Alberto Trotta
265
Emergence in Social Systems
Inducing Systems Thinking in Consumer Societies Gianfranco Minati, Larry A. Magliocca
283
Contextual Analysis. A Multiperspective Inquiry into Emergence of Complex Socio-Cultural Systems Peter M. Bednar
299
Job Satisfaction and Organizational Commitment: Affective Commitment Predictors in a Group of Professionals Maria Santa Ferretti
313
Organizational Climate Assessment: A Systemic Perspective Piergiorgio Argentero, Ilaria Setti
331
Environment and Urban Tourism: An Emergent System in Rhetorical Place Identity Definitions Marina Mura
347
Emergence in Artificial Intelligence
Different Approaches to Semantics in Knowledge Representation S. David, A. Montesanto, C. Rocchi
365
Bidimensional Turing Machines as Galilean Models of Human Computation Marco Giunti
383
A Neural Model of Face Recognition: A Comprehensive Approach Vera Stara, Anna Montesanto, Paolo Puliti, Guido Tascini, Cristina Sechi
407
Anticipatory Cognitive Systems: A Theoretical Model Graziano Terenzi
425
Decision Making Models within Incomplete Information Games Natale Bonfiglio, Simone Percivalle, Eliano Pessa
441
Emergence in Medicine
Burnout and Job Engagement in Emergency and Intensive Care Nurses Piergiorgio Argentero, Bianca Dell’Olivo
455
The “Implicit” Ethics of a Systemic Approach to the Medical Praxis Alberto Ricciuti
473
Post Traumatic Stress Disorder in Emergency Workers: Risk Factors and Treatment Piergiorgio Argentero, Bianca Dell’Olivo, Ilaria Setti
487
State Variability and Psychopathological Attractors. The Behavioural Complexity as Discriminating Factor between the Pathology and Normality Profiles Pier Luigi Marconi
503
Models and Systems
Decomposition of Systems and Complexity Mario R. Abram
533
How Many Stars Are There in Heaven? The Results of a Study of the Universe in the Light of Stability Theory Umberto Di Caprio
545
Description of a Complex System through Recursive Functions Guido Massa Finoli
561
Issues on Critical Infrastructures Mario R. Abram, Marino Sforna
571
Theoretical Problems of Systemics
Downward Causation and Relatedness in Emergent Systems: Epistemological Remarks Leonardo Bich
591
Towards a General Theory of Change Eliano Pessa
603
Acquired Emergent Properties Gianfranco Minati
625
The Growth of Populations of Protocells Roberto Serra, Timoteo Carletti, Irene Poli, Alessandro Filisetti
641
Investigating Cell Criticality R. Serra, M. Villani, C. Damiani, A. Graudenzi, P. Ingrami, A. Colacci
649
Relativistic Stability. Part 1 - Relation Between Special Relativity and Stability Theory in the Two-Body Problem Umberto Di Caprio
659
Relativistic Stability. Part 2 - A Study of Black-Holes and of the Schwarzschild Radius Umberto Di Caprio
673
The Formation of Coherent Domains in the Process of Symmetry Breaking Phase Transitions Emilio Del Giudice, Giuseppe Vitiello
685
Cognitive Science
Organizations as Cognitive Systems. Is Knowledge an Emergent Property of Information Networks? Lucio Biggiero
697
Communication, Silence and Miscommunication Maria Pietronilla Penna, Sandro Mocci, Cristina Sechi
713
Music: Creativity and Structure Transitions Emanuela Pietrocini
723
The Emergence of Figural Effects in the Watercolor Illusion Baingio Pinna, Maria Pietronilla Penna
745
Continuities and Discontinuities in Motion Perception Baingio Pinna, Richard L. Gregory
765
Mother and Infant Talk about Mental States: Systemic Emergence of Psychological Lexicon and Theory of Mind Understanding D. Rollo, F. Buttiglieri
777
Conflict in Relationships and Perceived Support in Innovative Work Behavior Adalgisa Battistelli, Patrizia Picci, Carlo Odoardi
787
Role Variables vs. Contextual Variables in the Theory of Didactic Systems Monica Alberti, Lucia Cirina, Francesco Paoli
803
OPENING LECTURE
COHERENCE, COMPLEXITY AND CREATIVITY
FORTUNATO TITO ARECCHI
Università di Firenze and INOA, Largo E. Fermi, 6 - 50125 Firenze, Italy
E-mail: [email protected]

We review the ideas and experiments that established the onset of laser coherence beyond a suitable threshold. That threshold is the first of a chain of bifurcations in a nonlinear dynamics, leading eventually to deterministic chaos in lasers. In particular, the so-called HC behavior has striking analogies with the electrical activity of neurons. Based on these considerations, we develop a dynamical model of neuron synchronization leading to coherent global perceptions. Synchronization implies a transitory control of neuron chaos. Depending on the time duration of this control, a cognitive agent has different amounts of awareness. Combining this with a stream of external inputs, one can point at an optimal use of internal resources, which is called cognitive creativity. What is the relation among the three concepts in the title? While coherence is associated with long-range correlations, complexity arises whenever an array of coupled dynamical systems displays multiple paths of coherence. Creativity corresponds to a free selection of a coherence path within a complex nest. As sketched above, it seems dynamically related to chaos control.
Keywords: heteroclinic chaos, homoclinic chaos, quantum uncertainty, feature binding, conscious perception.
1. Introduction − Summary of the presentation

Up to 1960, in order to have a coherent source of light it was necessary to filter the output of a noisy ordinary lamp. Instead, the laser realizes the dream of shining a vacuum state of the electromagnetic field with a classical antenna, thus inducing a coherent state, which is a translated version of the vacuum state, with a minimum quantum uncertainty. As a matter of fact, the laser reaches its coherent state through a threshold transition, starting from a regular incoherent source. Accurate photon statistics measurements proved the coherence quality of the laser as well as the threshold transition phenomena, both in stationary and transient situations. The threshold is the first of a chain of dynamical bifurcations; in the 1980s the successive bifurcations leading to deterministic chaos were explored.
Furthermore, the coexistence of many laser modes in a cavity with a high Fresnel number gives rise to a complex situation, where the modes behave in a nested way, due to their mutual couplings, displaying a pattern of giant intensity peaks whose statistics is by no means Gaussian, as it would be in speckles. Among the chaotic scenarios, the so-called HC (heteroclinic chaos), consisting of trains of equal spikes with erratic inter-spike separation, was explored in CO2 and in diode lasers with feedback. It looks like the best implementation of a time code. Indeed, networks of coupled HC systems may reach a state of collective synchronization lasting for a finite time, in the presence of a suitable external input. This opens powerful analogies with the feature binding phenomenon characterizing neuron organization in a perceptual task. The dynamics of a single neuron is suitably modeled by an HC laser; thence, the collective dynamics of a network of coupled neurons can be realized in terms of arrays of coupled HC lasers [5,23]. Thus, synchronization of an array of coupled chaotic lasers is a promising tool for a physics of cognition.

Exploration of a complex situation would require a very large amount of time, in order to classify all possible coherences, i.e., long-range correlations. In cognitive tasks facing a complex scenario, our strategy consists in converging to a decision within a finite short time. Any conscious perception (we define as conscious that which elicits a decision) requires 200 ms, whereas the loss of information in the chaotic spike train of a single neuron takes a few ms. The interaction of a bottom-up signal (external stimulus) with a top-down modification of the control parameters (induced by the semantic memory) leads to a collective synchronization lasting 200 ms: this is the indicator of a conscious perception. The operation is a control of chaos, and it has an optimality: if it lasts less than 200 ms, no decision emerges; if it lasts much longer, there is no room for sequential cognitive tasks. We call creativity this optimal control of neuronal chaos. It amounts to selecting one among the large number of possible coherences all present in a complex situation. The selected coherence is the meaning of the object under study.

2. Coherence

2.1. Classical notion of coherence

Before the laser, in order to have a coherent source of light it was necessary to filter the output of a noisy ordinary lamp. Fig. 1 illustrates the classical notion of coherence, with reference to the Young interferometer. A light source with aperture Δx illuminates a screen with two holes A and B (which we can move to positions A′ and B′).
Figure 1. Young interferometer: a light source of aperture Δx illuminates a screen with two holes in it. Under suitable conditions, the phase interference between the fields leaking through the two holes gives rise to interference fringes, as the point-like detector is moved on a plane transverse to the propagation direction.
We take the source as made of the superposition of independent plane waves, without mutual phase relations. A single plane wave is called a mode, since it is a solution of the wave equation within the cavity containing the source. Each mode, passing through Δx, is diffraction-broadened into a cone of aperture θ = λ/Δx. On the other side of the screen, the light from A and B is collected on a detector, whose electrical current is proportional to the impinging light power, that is, to the square modulus of the field. The field is the sum of the two fields E1 and E2 from the two holes. The squared modulus must be averaged over the observation time, usually much longer than the optical period; we call ⟨|E1 + E2|²⟩ this average. The result is the sum of the two separate intensities I1 = ⟨|E1|²⟩ and I2 = ⟨|E2|²⟩, plus the crossed phase terms ⟨E1* E2 + E2* E1⟩. These last terms increase or reduce I1 + I2, depending on the relative phase; hence interference fringes are observed as we move the detector on a plane transverse to the light propagation, thus changing the path lengths of the two fields. Fringe production requires that the phase difference be maintained during the time of the average ⟨…⟩; this occurs only if the two fields leaking through the two holes belong to the same mode, that is, if the observation angle, given by the distance AB divided by the separation r between screen and detector, is smaller than the diffraction angle θ = λ/Δx. If instead it is larger, as occurs e.g. when the holes are in positions A′, B′, then the detector receives contributions from distinct modes, whose phases fluctuate over a time much shorter than the averaging time.
Hence, the phased terms are washed out and no fringes appear. We call coherence area the area SAB on the screen containing pairs of points A, B such that the collection angle is at most equal to the diffraction angle. SAB subtends a solid angle given by

SAB = λ² r² / (Δx)²   (1)
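As a quick numerical check of Eq. (1), here is a minimal sketch of ours (not part of the original lecture); the source parameters are assumed values for a visible thermal source:

```python
# Coherence-area estimate from Eq. (1); the parameter values are assumed.
wavelength = 500e-9   # lambda: visible light, m
delta_x = 1e-3        # source aperture, m
r = 1.0               # source-to-screen separation, m

theta = wavelength / delta_x              # diffraction angle, rad
S_AB = (wavelength * r / delta_x) ** 2    # coherence area, m^2 (Eq. 1)
print(f"theta = {theta:.1e} rad, S_AB = {S_AB * 1e6:.2f} mm^2")
```

For these values the coherence area is about 0.25 mm², consistent with the sub-milliradian diffraction angle.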
The averaged product of two fields at positions 1 = A and 2 = B is called the first-order correlation function and is denoted as

G(1)(1,2) = ⟨E1* E2⟩   (2)
In particular, for 1 = 2, G(1)(1,1) = ⟨E1* E1⟩ is the local intensity at point 1. Points 1 and 2 correspond to holes A and B of the Young interferometer; their separation is space-like if the detector is at approximately the same distance from the holes. Of course, fringes imply path differences comparable with the wavelength, but anyway much shorter than the coherence time

τ = 1/Δω   (3)
of a narrowband (quasi-monochromatic) light source; indeed, if the linewidth is much smaller than the optical frequency, Δω << ω, then the coherence time is much longer than the optical period T, that is, τ >> T. In the case of the Michelson interferometer, 1 and 2 are the two mirror positions, which are separated time-like. Fringe disappearance in this case means that the time separation between the two mirrors has become larger than the coherence time.

2.2. Quantum notion of coherence

The laser realizes the dream of shining the current of a classical antenna into the vacuum state of an electromagnetic field mode, thus inducing a coherent state as a translated version of the vacuum state, with a minimum quantum uncertainty (Fig. 2). We know from Maxwell’s equations that a single field mode obeys harmonic oscillator (HO) dynamics. The quantum HO has discrete energy states equally separated by ħω, starting from a ground (or vacuum) state with energy ħω/2. Each energy state is denoted by the number (0,1,2,…,n,…) of energy quanta ħω above the ground state. In a coordinate q representation, any state with a fixed n is delocalised, that is, its wavefunction is spread inside the region confined by the parabolic potential (see e.g. the dashed wavefunction for n = 5).
Figure 2. Quantum harmonic oscillator in energy-coordinate diagram. Discrete levels correspond to photon number states. A coherent state is a translated version of the ground state; its photon number is not sharply defined but is spread with a Poisson distribution.
Calling p = mv the HO momentum, the n-th state has an uncertainty in the joint coordinate-momentum measurement increasing as

Δq Δp = ħ(n + 1/2).   (4)

The vacuum state, with n = 0, has the minimum uncertainty

Δq Δp = ħ/2.   (4′)
If we now consider a version of the vacuum state translated by α (where α is proportional to q), this is a quantum state still with minimum uncertainty, but with an average photon number equal to the square modulus |α|² (in the example reported in the figure we chose |α|² = 5). It is called a coherent state. It oscillates at the optical frequency in the q interval allowed for by the confining potential. It maintains the instantaneous localization, at variance with a number state. The coherent state pays for this coordinate localization by a Poisson spread of the photon number around its average |α|². The quantum field vacuum state shifted by a classical current had been introduced in 1938 by Bloch and Nordsieck; in 1963 R. Glauber showed that these states have maximal coherence, and that a laser emits such a type of light, since the collective light emission in the laser process can be assimilated to the radiation of a classical current.
While fringe production is just a test of the modal composition of the light field, the Hanbury Brown and Twiss interferometer (HBT) probes the statistical spread of the field amplitude. HBT was introduced in 1956 as a tool for stellar observation (Fig. 3), in place of the Michelson (M) stellar interferometer. M is based on summing on a detector the fields from two distant mirrors, in order to resolve the angular breadth of a star (that is, its diameter, or the different directions of the two components of a binary). The more distant the mirrors, the higher the resolution. However, the light beams deflected by the two mirrors undergo strong dephasing in the horizontal propagation, and this destroys the fringes. In HBT, the two mirrors are replaced by two detectors, whose output currents feed a correlator; now the horizontal path is within a cable, hence not affected by further dephasing. The working principle is intensity correlation (rather than field correlation), which for Gaussian statistics (as expected from thermal sources such as stars) yields the product of the two intensities plus the square modulus of the field correlation as provided by a standard interferometer, that is,

G(2)(1,2) = ⟨|E1|² |E2|²⟩ = I1 I2 + |G(1)|²   (5)
Instead, Glauber had proved that for a coherent state all the higher-order correlation functions factor as products of the lowest one, that is,

G(n)(1,2,…,n) = G(1)(1) G(1)(2) ⋯ G(1)(n)   (6)

In particular, for n = 2, we have

G(2)(1,2) = G(1)(1) G(1)(2).   (6′)
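The operational difference between Eqs. (5) and (6′) can be illustrated with a minimal Monte Carlo sketch (our addition, a classical-field simulation rather than the actual photodetection experiment): the normalized zero-delay intensity correlation is 2 for a Gaussian field and 1 for a coherent one.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Gaussian (thermal) field: complex amplitude with Gaussian quadratures.
E_gauss = rng.normal(size=N) + 1j * rng.normal(size=N)
I_gauss = np.abs(E_gauss) ** 2

# Coherent field: constant classical amplitude, hence constant intensity.
I_coh = np.full(N, 1.0)

def g2(I):
    """Normalized zero-delay intensity correlation <I^2> / <I>^2."""
    return np.mean(I ** 2) / np.mean(I) ** 2

print(g2(I_gauss))  # ~2: the HBT extra term of Eq. (5) survives
print(g2(I_coh))    # 1: the factorization of Eq. (6')
```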
G(1) is just the intensity; thus a coherent state should yield only the first term of HBT, without the correlation between the two distant detectors. In 1966 the comparison between a laser and a Gaussian field was measured time-wise rather than space-wise, as shown in Fig. 4. The laser light displays no HBT correlation, as one expects from a coherent state. The extra term in the Gaussian case doubles the zero-time value. As we increase the time separation between the two “instantaneous” intensity measurements (by instantaneous we mean that the integration time is much shorter than the characteristic coherence time of the Gaussian fields), the extra HBT term decays and eventually disappears. We have scaled the time axis so that the HBT curves for different coherence times coincide.
Figure 3. Left: the Michelson stellar interferometer M; it consists of two mirrors which collect different angular views of a stellar object and reflect the light to a single photon detector through long horizontal paths (10 to 100 meters), where the light phase is affected by ground-level variations of the refractive index (wavy trajectories). Right: the Hanbury Brown and Twiss (HBT) interferometer; the mirrors are replaced by detectors, and the current signals travel in cables toward an electronic correlator, which performs the product of the two instantaneous field intensities E1* E1 and E2* E2 and averages it over a long time [32].
Coherence times are assigned through the velocity of a random scatterer, as explained in the next subsection. Notice that Fig. 4 reports coherence times of the order of 1 ms. In the electronic HBT correlator, this means storing two short-time intensity measurements (each lasting for example 50 ns) and then comparing them electronically. If we tried to measure such a coherence time with a Michelson interferometer, we would need a mirror separation of the order of 300 km!

2.3. Photon statistics (PS)

As a matter of fact, the laser reaches its coherent state through a threshold transition, starting from a regular incoherent source. Accurate photon statistics measurements proved the coherence quality of the laser as well as the threshold transition phenomena, both in stationary and transient situations. We have seen in Fig. 2 that a coherent state yields a Poisson spread in the photon number, that is, a photon number statistics given by
p(n) = (⟨n⟩ⁿ / n!) e^(−⟨n⟩)   (7)

where ⟨n⟩ = |α|² is the average photon number.
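To make the contrast of Fig. 5 concrete, a short sketch of ours compares the Poisson law of Eq. (7) with the Bose-Einstein law of single-mode Gaussian (thermal) light; ⟨n⟩ = 5 is chosen to match the coherent-state example of Fig. 2:

```python
from math import exp, factorial

n_avg = 5.0  # average photon number <n>, illustrative value

def p_laser(n):
    """Poisson photon statistics of a coherent state, Eq. (7)."""
    return n_avg ** n * exp(-n_avg) / factorial(n)

def p_gaussian(n):
    """Bose-Einstein statistics of single-mode Gaussian (thermal) light."""
    return n_avg ** n / (1.0 + n_avg) ** (n + 1)

for n in range(11):
    print(f"{n:2d}  P_L = {p_laser(n):.4f}  P_G = {p_gaussian(n):.4f}")
```

The Poisson distribution is peaked near ⟨n⟩, while the thermal distribution is monotonically decreasing, as in curves L and G of Fig. 5.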
Figure 4. Laboratory measurement of HBT for Gaussian light sources with different coherence times; for each case, the first order correlations between the signals sampled at different times decay with the respective coherence time, and asymptotically only the product of the average intensities (scaled to 1) remains. The laser light displays no HBT, as one expects from a coherent state. [14]
This provides high-order moments, whereas HBT for equal space-time positions 1 = 2 would yield just the first and the second moments. Thus PS is statistically far more accurate; however, it is confined to within a coherence area and a coherence time. If we now couple the coherent state to an environment, we have a spread of coherent states given by a distribution P(α). The corresponding PS is a weighted sum of Poisson distributions with different average values ⟨n⟩ = |α|². In Fig. 5 we report the statistical distributions of photocounts versus the count number. If the detector has high efficiency, they well approximate the photon statistics of the observed light source. A few words on how to build a Gaussian light source: a natural way would be to take a black-body source, since at thermal equilibrium P(α) is Gaussian. However, its average photon number would be given by Planck’s formula as
⟨n⟩ = 1/(exp(ħω/kT) − 1).   (8)
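Plugging numbers into Eq. (8) confirms the estimate made in the next sentence (a sketch of ours, with the assumed values ħω = 2 eV and T = 3000 K):

```python
from math import exp

k_B = 8.617e-5     # Boltzmann constant, eV/K
hbar_omega = 2.0   # visible-photon energy, eV
T = 3000.0         # assumed blackbody temperature, K

n_avg = 1.0 / (exp(hbar_omega / (k_B * T)) - 1.0)
print(n_avg)       # ~4e-4, i.e. <n> << 1 as stated in the text
```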
For visible light ħω ≈ 2 eV, and at current blackbody temperatures (remember that 10⁴ K ≈ 1 eV) we would have ⟨n⟩ << 1. In order to produce a much larger ⟨n⟩, we shine a laser on a random scatterer. It consists of a ground glass disc, with scattering centers smaller than a wavelength positioned at random distances from each other. As the disc rotates with a tangential velocity v at the site of the laser spot, it produces random speckles which lose coherence over a time inversely proportional to v.
Figure 5. Statistical distributions of photocounts versus the count number. If the detector has high efficiency, they well approximate the PS of the observed light. L = laser light; G = Gaussian light; S = superposition of the two fields L and G onto the same mode. The observation time T of each sample is much shorter than the coherence time of the Gaussian source; thus each measurement is “instantaneous”. [2,8]
The coherence time may be of the order of 1 ms, as shown in Fig. 4, whereas the collection time of photons on the photocathode is of a few nanoseconds, thus practically instantaneous.
2.4. The laser threshold

Thus far L has been a laser well above threshold, satisfying Glauber’s requirement that no reaction from the field be felt by the emitting atoms. We now explore how PS provides information on the threshold phenomena. Let us simplify an atom as a two-level quantum system resonantly coupled to an optical field E. If the atom is in the ground state (Fig. 6, upper part), it absorbs light and contributes a polarization P negative in the field, P = −χE. The corresponding interaction energy −P·E is a convex parabola (upper right), and the corresponding equilibrium probability for the field has a maximum at the minimum of the parabola; in fact, it is a Gaussian, as expected for a blackbody field. As the atom is excited to the upper state, it will emit a photon and the polarization changes sign. The corresponding parabola is concave and the field has no minimum in which to be confined; it would escape (left or right) toward high E values, and no finite-area probability could be defined. However, as E reaches high values, a new fact occurs. The photon density in the cavity containing the atom increases, and there is a finite probability that the atom reabsorbs a photon, going again to the excited state, and then re-emits a photon.
Figure 6. A two-level atom resonantly coupled to an optical field E: energy curves and corresponding probabilities of the field E, for polarization P = −αE (atom in the ground state), P = αE (atom in the excited state) and P = αE − βE³ (with saturation, giving minima at ±E0).
The overall process, implying 3 photons, is less efficient than the independent photon emission from 3 excited atoms; thus it amounts to a negative correction to the polarization, cubic in E. The corresponding energy is quartic in E; it has 2 minima, symmetric around the origin. The field probability has 2 peaks at those minima. If the peaks are narrow, they can be assimilated to delta-like distributions at +E0 and −E0. The photon production is not sensitive to the phase difference; thus we have a unique Poisson PS around ⟨n⟩ = |E0|². As one increases the pump or reduces the cavity losses, a threshold is crossed where the stimulated-emission gain compensates the losses. In Fig. 7, case 1 is below threshold, with a confining parabola and a corresponding Gaussian field probability; case 3 is well above threshold, with two narrow peaks of field probability. In between, case 2 (threshold: gain = losses) has zero linear polarization and hence a flat energy bottom, yielding a broad probability distribution. This means two things: 1. a large value of the fluctuations around the average; 2. a slow dynamics within the flat potential well, with correspondingly long correlation times and a narrow linewidth.
Figure 7. Energy curves (1, 2, 3) and corresponding probabilities of the field E in three situations: 1 − below threshold; 2 − on threshold; 3 − above threshold.
The laser threshold is like the critical point of a phase transition, displaying both (i) critical fluctuations and (ii) critical slowing down. A formal approach is given in the Appendix. It summarizes the theory, developed independently by the two groups of H. Haken and M. Lax, showing that the Landau classical theory of phase transitions in thermal-equilibrium systems can be extended to non-equilibrium systems such as the laser. In Fig. 8 we report the behavior of the second moment of the PS, suitably scaled so that it is 1 for a Gaussian field and 0 for a coherent one, versus the gain-to-loss ratio expressed as the ratio between M1 (first-order moment of the PS, that is, the laser intensity) and its threshold value M10. The experimental points fit Haken’s theoretical line well. In Fig. 9 we plot the linewidth of the laser fluctuations versus the intensity ratio to threshold I/I0.
Figure 8. H2 is the second moment of a laser PS, suitably scaled so that it is 1 for a Gaussian field and 0 for a coherent one, versus the ratio between M1 (first order moment of PS, that is, laser intensity) and its threshold value M10. Points are experimental; solid line from Haken’s theory. [20]
Figure 9. Linewidth of the laser fluctuations versus the intensity ratio to threshold I/I0. The experimental points fit the full solution λeff, whereas the linear-approximation linewidth λ01 is wrong around threshold. The horizontal axis is also graded by the so-called pump parameter a, the difference gain − losses; a = 0 is the threshold. The upper horizontal axis gives the average intracavity photon number for each a; for a = 0, it is about 1700 for the laser used. [17]
The experiment fits the full solution λeff of the time-dependent nonlinear Fokker-Planck equation (see Appendix). A linear approximation yields a linewidth λ01 which is wrong around threshold.
Figure 10. Transient laser PS. As a fast intra-cavity shutter is switched from closed to open, the losses switch from a high to a low value. [11,3]
2.5. The transient laser

If we insert a fast shutter in a laser cavity where the atoms are highly pumped, then as the shutter is switched from closed to open the losses switch from a high value (above the gain) to a low value (below the gain). With reference to Fig. 7, the system makes a sudden jump from case 1 to case 3. The transient in a He-Ne laser takes about 10 µs. If we probe the PS over short observation times (50 ns) at different delays from the switch time (Fig. 10), we obtain different PS (normalized to equal area), ranging from black-body statistics at the beginning (case a) to Poisson statistics at the end (case f). In order to collect the ensemble of experimental points of each PS, once a delay has been set and a single sample collected, the shutter is switched off and the measurement cycle is repeated for that same delay. If we extract the first and second moments and plot the average photon number ⟨n⟩ and the variance ⟨Δn²⟩ = ⟨n²⟩ − ⟨n⟩² versus observation time (Fig. 11), we see that while the average increases monotonically, the variance undergoes a large intermediate peak. The explanation is that the initial field fluctuations are linearly amplified at first, before being limited by the cubic gain saturation. Such a transient large fluctuation, observed in 1967 for a laser, was later observed in all second-order phase transitions.
Figure 11. Transient laser statistics: the average photon number increases monotonically, whereas the variance undergoes a large intermediate peak. The explanation is that the initial field fluctuations are linearly amplified at first, before being limited by the cubic gain saturation. [11]
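The moment extraction just described amounts to a few lines of code (our illustration; counts is a hypothetical array of photocount samples, one per repetition of the switch-on cycle at a fixed delay):

```python
import numpy as np

def transient_moments(counts):
    """Mean photon number and variance at one fixed delay from switch-on.

    counts: 1-D array of photocounts, one entry per measurement cycle
    (hypothetical data standing in for the 50 ns samples of Fig. 10).
    """
    n_mean = counts.mean()
    n_var = np.mean(counts ** 2) - n_mean ** 2  # <n^2> - <n>^2
    return n_mean, n_var
```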
3. Deterministic chaos and complexity

3.1. Deterministic chaos

We know from Newtonian dynamics that, for assigned forces, the trajectory emerging from a given initial condition (the star in Fig. 12) is unique. The coordinates of the initial point are in general real numbers truncated to a finite number of digits; thus the initial condition is in fact spread over a small patch. Points of the patch converge toward the calculated trajectory or diverge from it, depending on whether the transverse stability landscape is like a valley (left) or the ridge of a hill (right). Poincaré proved that, from 3 coupled dynamical degrees of freedom onward, the latter situation is generic. Nowadays we call it deterministic chaos. It is a sensitive dependence on initial conditions and implies a loss of information about the initial preparation. The loss rate K is called the Kolmogorov entropy. We can adjust its value by adding extra variables which change the slope of the downhill fall without practically perturbing the longitudinal trajectory (control of chaos).

In the case of the laser, the threshold is the first of a chain of dynamical bifurcations. Starting in 1982, the successive bifurcations of a laser leading to deterministic chaos were explored. Among the chaotic scenarios, the so-called HC (heteroclinic chaos), consisting of trains of equal spikes with erratic inter-spike separation, was explored in CO2 and in diode lasers with feedback (Figs. 13, 14). Fig. 13 shows the experimental setup and displays the 3 coupled equations.
Figure 12. Deterministic chaos: the trajectory emerging from a precise initial condition (the star) is unique. However, in general the initial condition is spread over a small patch. Points of the patch converge toward the calculated trajectory or diverge from it, depending on whether the transverse stability landscape is a valley (left, regular motion) or a hill (right, chaotic motion with information loss).
The first two are the standard rate equations for the intensity x, coupled to the population inversion y via the Einstein constant G; k0 and γ are the damping rates for x and y, respectively, and p0 is the pump rate. Two equations do not give chaos, and in fact a generic laboratory laser is not chaotic. We add a third equation as follows. The detected output intensity provides a voltage z which drives an intracavity loss modulator (see the added z term in the first equation). In the feedback loop, R and b0 act as control parameters. The third damping rate β is of the same order as the other two. The dynamics (Fig. 14) consists of trains of almost equal intensity spikes, separated by erratic inter-spike intervals (ISI). In b) we zoom in on two successive spikes, to show their repeatability. By a threshold we may cut the small chaotic fluctuations and observe a spiking of regular shape; however, chaos results in the variable ISI. In c) we build a 3-D phase space by an embedding technique. Each point reports the intensity sampled at time t and after two short delays τ and 2τ. The orbit closes at each turn (after a variable time depending on the local ISI). The figure is built over many spikes. The part of the orbit with a single line is the superposition of the large spikes; the small chaotic tangle corresponds to the small non-repetitive pulses.
Skeleton of the 3D model:

ẋ = −k0 x (1 − k1 sin² z) + G x y
ẏ = −2 G x y − γ y + p0
ż = β (−z − b0 − R x)

where x is the laser intensity, y the population inversion and z the feedback signal.
Figure 13. CO2 laser with feedback: experimental setup and the 3 coupled equations. CO2 is the gas where an electric discharge provides molecular population inversion at the laser frequency; EOM is an electro-optic loss modulator driven by the voltage z of the feedback loop; R is the gain of the feedback amplifier and B0 a d.c. bias voltage [13].
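A minimal numerical sketch of the three-variable skeleton model (our addition; the parameter values are illustrative assumptions, not the experimental ones, so the resulting ISI statistics will differ from Fig. 14):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (assumed, not fitted to the CO2 experiment).
k0, k1, G, gamma, p0 = 30.0, 0.5, 1.0, 1.0, 100.0
beta, b0, R = 10.0, 0.2, 0.5

def laser_with_feedback(t, u):
    """Skeleton 3D model: intensity x, inversion y, feedback voltage z."""
    x, y, z = u
    dx = -k0 * x * (1.0 - k1 * np.sin(z) ** 2) + G * x * y
    dy = -2.0 * G * x * y - gamma * y + p0
    dz = beta * (-z - b0 - R * x)
    return [dx, dy, dz]

t_eval = np.linspace(0.0, 50.0, 50_000)  # uniform sampling grid
sol = solve_ivp(laser_with_feedback, (0.0, 50.0), [1.0, 1.0, 0.0],
                t_eval=t_eval, max_step=1e-2)
x_series = sol.y[0]  # intensity record: spike trains with irregular ISIs
```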
Figure 14. a) Trains of almost equal intensity spikes, separated by erratic inter-spike intervals (ISI). b) Zoom on two successive spikes, showing their repeatability. c) 3-D phase space built by an embedding technique. Each point reports the intensity sampled at time t and after two short delays τ and 2τ. The figure is built over many spikes. The part of the orbit with a single line is the superposition of the large spikes; the small chaotic tangle corresponds to the small non-repetitive pulses [7].
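The embedding construction of Fig. 14c is equally compact (our sketch; x_series is the intensity record from the integration above, and the delay of 25 samples is an arbitrary choice):

```python
import numpy as np

def delay_embed(x_series, tau=25):
    """3-D delay embedding: points (x(t), x(t + tau), x(t + 2*tau))."""
    x = np.asarray(x_series)
    return np.column_stack([x[:-2 * tau], x[tau:-tau], x[2 * tau:]])

# The embedded orbit traces the reinjection loop of the large spikes and
# the chaotic tangle near the saddle focus S (compare Fig. 14c).
```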
The experimental phase space (Fig. 14c) suggests that the chaos is due to a saddle-focus instability S, to which the system returns after each loop. The trajectory approaches S through a stable branch, or manifold (Fig. 15), and escapes along an unstable one. We call α the contraction rate and γ ± iω the complex expansion rate. If α < γ (Shilnikov condition) [40], this local relation at S provides global chaos, which we call HC (homoclinic chaos) since the return time to S is affected by the uncertainty in the expanding region.
Figure 15. HC (homoclinic chaos) consisting of the return to saddle focus S. The trajectory approaches S through a stable branch, and escapes away through an unstable one.
Around S the system displays a high susceptibility χ = response/stimulus. Away from S, the system is insensitive to external perturbations and displays a repeatable loop. Time-wise, large spikes P of equal shape repeat at chaotic inter-spike intervals (ISI). In fact, the feedback laser has a second instability, a saddle node corresponding to zero intensity (see Fig. 14); in such a case HC stands rather for heteroclinic chaos. Due to the high susceptibility, a small perturbation applied around S strongly affects the ISI; we exploit this fact to synchronize the HC laser to an external signal. If the driving frequency is close to the natural one (associated with the average ⟨ISI⟩), we have a 1:1 locking. Fig. 16 shows the laser synchronization to a small forcing signal. In the feedback amplifier we introduce a periodic input which is a small percentage of the feedback signal. A forcing frequency close to 2π/⟨ISI⟩ induces a 1:1 locking; at lower frequencies we have 1:2 and 1:3 locking, and at higher frequencies we have 2:1 etc. locking regimes. This looks like the best implementation of a time code: indeed, networks of coupled HC systems may reach a state of collective synchronization lasting for a finite time, in the presence of a suitable external input. This opens powerful analogies with the feature binding phenomenon characterizing neuron organization in a perceptual task (Sec. 4).
Figure 16. Laser synchronization to a small forcing signal of frequency close to (a), smaller than (b and c), or larger than (d) the natural HC frequency 2π/⟨ISI⟩ [7].
3.2. Complexity of a multimode light oscillator

3.2.1. a) Longitudinal case

Thus far we have referred to laser cavities designed to house one or a few transverse modes. In fact, if L is the mirror separation and d the mirror diameter, the number of diffraction angles λ/d that can be seen by the aperture angle d/L is given by the Fresnel number

F = d²/(λL)
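The gas-laser example in the next sentence can be checked directly (a trivial sketch of ours, using the values quoted in the text):

```python
L_cav, lam, d = 1.0, 1e-6, 1e-3  # cavity length, wavelength, diameter (m)
F = d ** 2 / (lam * L_cav)
print(F)  # 1.0: the cavity hosts a single transverse mode
```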
For example, in a gas laser L = 1 m, λ ≈ 10⁻⁶ m and d ≈ 10⁻³ m, so that F ≈ 1 and the laser cavity hosts a single transverse mode. If however the gain line is much larger than the longitudinal mode separation, Δν_gain >> c/2L, then many longitudinal modes can be simultaneously above threshold. In such a case the nonlinear mode-mode coupling, due to the medium interaction, gives an overall high-dimensional dynamical system which may undergo chaos. This explains the random spiking behavior of long lasers.
Figure 17. Photorefractive oscillator, with the photorefractive effect enhanced by an LCLV (Liquid Crystal Light Valve). Experimental setup: A is an aperture fixing the Fresnel number of the cavity; z = 0 corresponds to the plane of the LCLV; z1, z2, z3 are the three different observation planes. Below: 2-dimensional complex field, with lines of zero real part (solid) and lines of zero imaginary part (dashed). At the intersection points the field amplitude is zero and its phase is not defined, so that the circulation of the phase gradient around these points is non-zero (either ±2π), yielding phase singularities. [15,4,10]
The regular spiking in time associated with mode locking is an example of mutual phase synchronization, akin to the regular spiking reported in Fig. 16.

3.2.2. b) Transverse case

When a photorefractive crystal is inserted in a cavity, the crystal is provided with high optical gain by a pump laser beam. As the gain overcomes the cavity losses, we have a coherent light oscillator. Due to the narrow linewidth of the crystal, a single longitudinal mode is excited; however, by an optical adjustment we can have large Fresnel numbers, and hence many transverse modes. We carried out a research line starting in 1990 [15,16; for a review see 4]. Recently we returned to this oscillator, but with a giant photorefractive effect provided by the coupling of a photorefractive slice to a liquid crystal [10,6,12] (Fig. 17). The inset in this figure shows how phase singularities appear in a 2D wave field. A phase gradient circulation of ±2π is called a topological charge of ±1, respectively. A photodetector responds to the modulus square of the field amplitude. To have phase information, we superpose a plane wave on the 2D pattern, obtaining the results illustrated in Fig. 18.
Figure 18. Left: a phase singularity is visualized by superposing an auxiliary coaxial plane wave on the optical pattern of the photorefractive oscillator; reconstruction of the instantaneous phase surface: perspective and equi-phase plots. Right: if the auxiliary beam is tilted, we obtain interference fringes, interrupted at each phase singularity (± correspond to ±2π circulation, respectively). The digitized fringe plots correspond to: upper plot (Fresnel number about 3), 6 defects of equal topological charge against 1 of opposite charge; lower plot (Fresnel number close to 10), almost 100 singularities with balanced opposite charges, besides a small residual unbalance [16].
Referring to the inset of Fig. 17, when both intersections of the two zero lines are within the boundary, we expect a balance of opposite topological charges. However, for small Fresnel numbers, it is likely that only one intersection is confined within the boundary; this corresponds to an unbalance, as shown in Fig. 18, upper right. This scaling with the Fresnel number is purely geometric and does not imply dynamics; the statistics of zero-field occurrences can be predicted on purely geometric considerations, as done for random speckles. If instead we look at the high intensity peaks in between the zeros, the high fields in a nonlinear medium give a strong mode-mode coupling which goes beyond speckles. This should show up in the statistical occurrence of very large peaks. In order to test this, we collect space-time frames as shown in Fig. 19, with the help of the CCD + grabber setup shown in Fig. 17. We do not yet have a definite 2D comparison with speckles. However, a 1D experiment in an optical fiber has produced giant optical spikes with non-Gaussian statistics [43].
Figure 19. Photorefractive oscillator: spatiotemporal profile extracted from the z2 movie [10].
The authors draw an analogy with the so-called "rogue" waves in the ocean, which represent a serious hazard to boats: satellite inspection has shown that they are more frequent than expected on a purely linear basis. We consider the anomalous statistics of giant spikes as a case of complexity, because the mutual coupling in a nonlinear medium makes the number of possible configurations increase exponentially, rather than polynomially, with the Fresnel number. The rest of the paper explores this question: how does a cognitive agent in a complex situation decide for a specific case before having scanned all possible cases, that is, how do we "cheat" complexity?
4. Physics of cognition – Creativity

4.1. Perception and control of chaos

Synchronization of a chain of chaotic lasers provides a promising model for a physics of cognition. Exploration of a complex situation would require a very large amount of time. In cognitive tasks facing a complex scenario, our strategy consists in converging to a decision within a finite short time. Various experiments [36,38] prove that a decision is taken after 200 ms of exposure to a sensory stimulus. Thus, any conscious perception (we define conscious as that eliciting a decision) requires about 200 ms, whereas the loss of information in a chaotic train of neural spikes takes a few ms.
Figure 20. Feature binding: the lady and the cat are respectively represented by the mosaic of empty and filled circles, each one representing the receptive field of a neuron group in the visual cortex. Within each circle the processing refers to a specific detail (e.g. contour orientation). The relations between details are coded by the temporal correlation among neurons, as shown by the same sequences of electrical pulses for two filled circles or two empty circles. Neurons referring to the same individual (e.g. the cat) have synchronous discharges, whereas their spikes are uncorrelated with those referring to another individual (the lady) [42].
Let us consider the visual system; the role of elementary feature detectors has been extensively studied [34]. By now we know that some neurons are specialized in detecting exclusively vertical or horizontal bars, or a specific luminance contrast, etc. However a problem arises: how do elementary detectors contribute to a holistic (Gestalt) perception? A hint is provided by [42]. Suppose we are exposed to a visual field containing two separate objects. Both objects are made of the same visual elements: horizontal and vertical contour bars, different degrees of luminance, etc. What, then, are the neural correlates of the identification of the two objects? We have one million fibers connecting the retina to the visual cortex. Each fiber results from the merging of approximately 100 retinal detectors (rods and cones) and as a result it has its own receptive field. Each receptive field isolates a specific detail of an object (e.g. a vertical bar). We thus split an image into a mosaic of adjacent receptive fields. Now the "feature binding" hypothesis consists of assuming that all the cortical neurons whose receptive fields are pointing to a specific object synchronize the corresponding spikes, and as a consequence the visual cortex organizes into separate neuron groups oscillating on two distinct spike trains for the two objects.
Figure 21. ART = Adaptive Resonance Theory. Role of bottom-up stimuli from the early visual stages and top-down signals due to expectations formulated by the semantic memory. The focal attention assures the matching (resonance) between the two streams [27].
Direct experimental evidence of this synchronization is obtained by inserting microelectrodes sensing single neurons into the cortical tissue of animals (Fig. 20) [42]. An array of weakly coupled HC systems represents the simplest model for a physical realization of feature binding. The array can achieve a collective synchronized state lasting for a finite time (corresponding to the physiological 200 ms!) if there is a sparse (non-global) coupling, if the input (bottom-up) is applied to just a few neurons, and if the inter-neuron coupling is suitably adjusted (top-down control of chaos) [5,23]. Fig. 21 shows the scheme of ART [27]. The interaction of a bottom-up signal (external stimulus) with a top-down change of the control parameters (induced by the semantic memory) leads to a collective synchronization lasting 200 ms: this is the indicator of a conscious perception. The operation is a control of chaos, and it has an optimality: if the synchronization lasts less than 200 ms, no decision emerges; on the contrary, if it lasts much longer, there is no room for sequential cognitive tasks (Fig. 22). The addition of extra degrees of freedom implies a change of code, thus it can be seen as a new level of description of the same physical system.
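A minimal numerical sketch can illustrate the synchronization scenario. The script below does not implement the HC neuron model of the text; it substitutes two standard Rössler oscillators, diffusively coupled through their x variables (all parameter and coupling values are illustrative). With zero or weak coupling the two chaotic trajectories stay uncorrelated; above a critical coupling they lock together, the analogue of the synchronized neuron groups invoked by the feature-binding hypothesis:

```python
# Two diffusively x-coupled Rossler oscillators: a stand-in (not the HC
# neuron model) to show chaotic synchronization past a coupling threshold.
import numpy as np

A, B, C = 0.2, 0.2, 5.7  # standard chaotic Rossler parameters

def field(v, eps):
    """Vector field of the two coupled oscillators, stacked in one 6-vector."""
    x1, y1, z1, x2, y2, z2 = v
    return np.array([
        -y1 - z1 + eps * (x2 - x1), x1 + A * y1, B + z1 * (x1 - C),
        -y2 - z2 + eps * (x1 - x2), x2 + A * y2, B + z2 * (x2 - C),
    ])

def rk4_step(v, dt, eps):
    k1 = field(v, eps)
    k2 = field(v + 0.5 * dt * k1, eps)
    k3 = field(v + 0.5 * dt * k2, eps)
    k4 = field(v + dt * k3, eps)
    return v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def sync_error(eps, T=1500.0, dt=0.02):
    """Mean |x1 - x2| over the second half of the run (transient discarded)."""
    v = np.random.default_rng(1).normal(scale=0.5, size=6)
    steps = int(T / dt)
    errs = []
    for k in range(steps):
        v = rk4_step(v, dt, eps)
        if k > steps // 2:
            errs.append(abs(v[0] - v[3]))
    return np.mean(errs)

for eps in (0.0, 0.05, 0.2):
    print(f"coupling {eps:.2f}: mean |x1 - x2| = {sync_error(eps):.3f}")
```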
Figure 22. Chaos is controlled by adding extra dynamic variables, which change the transverse instability without affecting the longitudinal trajectory. In the perceptual case, the most suitable top-down signals are those which provide a synchronized neuron array with an information lifetime sufficient to activate successive decisional areas (e.g. 200 ms), whereas the single HC neuron has a chaotic lifetime of 2 ms. If our attentional-emotional system is excessively cautious, it provides a top-down correction which may stabilize the transverse instability for ever, but then the perceptual area is blocked to further perceptions.
4.2. From perception to cognition – Creativity

We distinguish two types of cognitive task. In type I, we work within a prefixed framework and readjust the hypotheses at each new cognitive session, by a Bayes strategy. The Bayes theorem [21] consists of the relation:
P(h | data) = P(data | h) P(h) / P(data)    (9)
That is: the probability P(h | data) of a hypothesis h, conditioned by the observed data (this is the meaning of the bar |) and called the a-posteriori probability of h, is the product of the probability P(data | h) that the data are generated by the hypothesis h, times the a-priori probability P(h) of that hypothesis (we assume to have a package of convenient hypotheses with different probabilities), divided by the probability P(data) of the effectively occurred data. As shown in Fig. 23, starting from an initial observation and formulating a large number of different hypotheses, the one supported by the experiment suggests the most appropriate dynamical explanation. Going a step forward and repeating the Bayes procedure amounts to climbing a probability mountain along a steepest gradient line.
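The recursive use of relation (9) is easy to demonstrate numerically. In the sketch below (all numbers are illustrative), a posterior over a small set of competing hypotheses is updated datum by datum, the posterior of one round serving as the prior of the next; the probability mass progressively climbs onto the hypothesis that best explains the data:

```python
# Sequential Bayesian updating: the posterior of one datum becomes the
# prior for the next. All numerical values are illustrative.
import numpy as np

hypotheses = np.array([0.2, 0.5, 0.8])       # each h: assumed success probability
posterior = np.array([1 / 3, 1 / 3, 1 / 3])  # flat a-priori P(h)

rng = np.random.default_rng(42)
data = rng.random(50) < 0.8                  # the world actually behaves like h = 0.8

for datum in data:
    likelihood = np.where(datum, hypotheses, 1 - hypotheses)  # P(datum | h)
    posterior = likelihood * posterior       # numerator of relation (9)
    posterior /= posterior.sum()             # division by P(data)

for h, p in zip(hypotheses, posterior):
    print(f"P(h = {h:.1f} | data) = {p:.4f}")  # mass climbs onto h = 0.8
```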
Figure 23. Successive applications of the Bayes theorem to the experiments, from an initial to a final condition; fitness corresponds to the probability mountains, climbed from a-priori to a-posteriori probability. The procedure is an ascent of the probability mountain through a steepest gradient line. Each point of the line carries an information related to the local probability by the Shannon formula. Notice that Darwinian evolution by mutation and successive selection of the best-fit mutant is a sequential implementation of the Bayes theorem [19,18].
On the other hand, a complex problem is characterized by a probability landscape with many peaks (Fig. 24). Jumping from one probability hill to another is not Bayesian; I call it type II cognition. A deterministic computer cannot do it. In human cognition, type II is driven by hints suggested by the context (semiosis) yet not included in the model. A type II task is a creative act because it goes beyond the given frame: it implies a change of code, at variance with type I, which operates within a fixed code. The ascent to a single peak can be automatized in a steepest gradient program; once the peak has been reached, the program stops, since any further step would be a downfall. Even a non-deterministic computer cannot perform the jumps of type II, since it intrinsically lacks semiotic abilities; in order to do that, the computer must be assisted by a human operator. We call "meaning" the multi-peak landscape and "semantic complexity" the number of peaks. However, this is a fuzzy concept, which varies as our comprehension evolves (Fig. 25). Let us discuss in detail the difference between the type I cognitive task, which implies changing hypothesis h within a model, that is, climbing a single mountain, and the type II cognitive task, which implies changing model, that is, jumping over to another mountain.
Figure 24. Semantic complexity (meaning vs. information, complexity vs. complication). A complex system is one with a many-peak probability landscape. The ascent to a single peak can be automatized by a steepest gradient program; a Bayes strategy without semiosis stops there. On the contrary, to record the other peaks, and thus continue the Bayes strategy elsewhere, is a creative act, implying a holistic comprehension of the surrounding world (semiosis). We call "meaning" the multi-peak landscape and "semantic complexity" the number of peaks. It has been guessed that semiosis is the property that discriminates living beings from Turing machines [39]; here we show that a non-algorithmic procedure, that is, a non-Bayesian jump from one model to another, is what we have called creativity. Is semiosis equivalent to creativity? [19,18].
We formalize a model as a set of dynamical variables xᵢ (i = 1, 2, …, N), N being the number of degrees of freedom, with the equations of motion

ẋᵢ = Fᵢ(x₁, …, x_N; μ₁, …, μ_M)    (10)
where the Fᵢ are the force laws and the M numbers μ represent the control parameters. The set {F, x, μ} is the model. Changing hypotheses within a model means varying the control parameters, as we do when exploring the transition from regular to chaotic motion in some model dynamics. Instead, changing code, or model, means selecting a different set {G, y, ν} of equations of motion, degrees of freedom and control parameters, as follows:
ẏᵢ = Gᵢ(y₁, …, y_R; ν₁, …, ν_L)    (11)
where R and L are different from N and M, respectively. The set {G, y, ν} is the new model. While changing hypotheses within a model is an a-semiotic procedure that can be automatized in a computerized expert system, changing model implies catching the meaning of the observed world, and this requires what has been called embodied cognition [46].
Figure 25. C-K diagram (C = computational complexity; K = information loss rate in chaotic motion): comparison between the procedure of a computer and of a semiotic cognitive agent (say, a scientist). The computer operates within a single code, and C increases with K. A scientist explores how, by adding different degrees of freedom, one can reduce the high K of the single-code description (re-coding = creativity). This is equivalent to the control operation of Fig. 22; it corresponds to a new model with reduced C and K. An example is offered by the transition from a molecular-dynamics to a thermodynamic description of a many-body system; other examples are listed in Table 1. The BACON program [41] could retrieve automatically Kepler's laws from astronomical data just because the solar system, approximated by Newton two-body interactions, is chaos-free.
Embodied cognition has been developed over thousands of generations of evolutionary adaptation, and we are so far unable to formalize it as an algorithm. This no-go statement seems to be violated by a class of complex systems which has been dealt with successfully by recursive algorithms. Let us consider a space lattice of spins, with couplings that can be ferro- or anti-ferromagnetic in a disordered but frozen way (a spin glass at zero temperature, with quenched disorder). It will be impossible to find a unique ground state. For instance, having three spins A, B and C on a triangular lattice, if all the couplings are ferromagnetic, then the ground state will consist of parallel spins; but if instead one (and only one) of the mutual couplings is anti-ferromagnetic, then there will be no spin orientation compatible with all the couplings (try A-up, B-up, C-up: it does not work; then try to reverse a single spin, but that does not work either). This model has a cognitive flavor, since a brain region can be modeled as a lattice of coupled neurons with couplings either excitatory or inhibitory, thus resembling a spin glass [33,1,45]. We have a large number of possible ground states, all including some frustration.
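The triangular frustration argument can be verified by brute force. The sketch below (couplings chosen for illustration) enumerates all 2³ = 8 configurations of the three spins and counts the satisfied bonds; with a single anti-ferromagnetic bond, no configuration ever satisfies all three:

```python
# Brute-force check of frustration on a spin triangle. Bond values are
# illustrative: +1 ferromagnetic (wants parallel spins), -1 anti-ferromagnetic.
from itertools import product

bonds = {(0, 1): +1, (1, 2): +1, (2, 0): -1}  # one AF bond -> frustration

best = 0
for spins in product((-1, +1), repeat=3):
    # bond (i, j) with coupling J is satisfied when J * s_i * s_j > 0
    satisfied = sum(1 for (i, j), J in bonds.items() if J * spins[i] * spins[j] > 0)
    best = max(best, satisfied)
    print(spins, "satisfied bonds:", satisfied)

print("best over all 8 configurations:", best)  # 2, never 3: frustrated
```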
Table 1. From complication to complexity: four cases of creativity.

1. Electricity, magnetism, optics → Electromagnetic equations (Maxwell)
2. Mendeleev table → Quantum atom (Bohr, Pauli)
3. Zoo of 200 elementary particles → Quarks (M. Gell-Mann)
4. Scaling laws in phase transitions → Renormalization group (K. Wilson)
Trying to classify all possible configurations is a task whose computational difficulty (either program length or execution time) diverges exponentially with the size of the system. Sequentially related changes of code have been successfully introduced to arrive at finite-time solutions [37,44]. Can we say that the mentioned solutions realize the reductionistic dream of finding a suitable computer program that not only climbs the single probability peak, but is also able to choose the highest peak? If so, the optimization problem would correspond to understanding the meaning of the object under scrutiny. We should realize, however, that spin glasses are frozen objects, given once and for ever. A clever search for symmetries has produced a spin glass theory [37] that, like the Renormalization Group (RG) for critical phenomena [47], discovers a recursive procedure for changing codes in an optimized way. Even though the problem has a large number of potential minima, and hence of probability peaks, a suitable insight into the topology of the abstract space embedding the dynamical system has led to an optimized trajectory across the peaks. In other words, the correlated clusters can be ordered in a hierarchical way and a formalism analogous to RG applied. It must be stressed that this has been possible because the system under scrutiny has a structure assigned once and for ever. In everyday tasks, we face a system embedded in an environment, which induces a-priori unpredictable changes in the course of time. This rules out the nice symmetries of hierarchical approaches, and rather requires an adaptive approach. Furthermore, a real-life context-sensitive system has to be understood within a reasonably short time, in order to take vital decisions about it.
We find again a role for control of chaos in cognitive strategies, whenever we go beyond the limit of a Bayes strategy. We call creativity this optimal control of neuronal chaos. Four cases of creative science are listed in Table 1. Furthermore, Fig. 25 sketches the reduction of complexity and chaos which results from a creative scientific step.

Appendix. Haken theory of laser threshold [28,29,30,34]

We summarize in Table 2 the Langevin equation for a field E, ruled by a dynamics f(E) corresponding to the atomic polarization and perturbed by a noise ξ. The noise has zero average and a delta-like correlation function with amplitude D given by the spontaneous emission of the N₂ atoms in the upper state. The time-dependent probability P(E, t) for E obeys a Fokker-Planck equation. In the stationary limit of zero time derivative, the Fokker-Planck equation is easily solved and gives a negative exponential in V(E), which is the potential of the force f(E). Below laser threshold, f(E) is linear, V quadratic and P(E) Gaussian. Above threshold, f has a cubic correction, V is quartic and P(E) displays two peaks at the minima of the quartic potential.
Table 2.

Langevin equation:       Ė = f(E) + ξ
Noise correlation:       ⟨ξ(0)ξ(t)⟩ = 2Dδ(t),  D = γ_spont N₂
Fokker-Planck equation:  ∂P/∂t = −∂[f(E)P]/∂E + D ∂²P/∂E²
Stationary solution:     P(E) ≈ e^(−V(E)/D),  V(E) = −∫f(E) dE
Force laws:              f(E) = −αE (below threshold);  f(E) = +αE − β|E|²E (above threshold)
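A minimal Euler-Maruyama integration of the Langevin equation of Table 2 (treating E as a real variable for simplicity; all parameter values are illustrative) reproduces the qualitative change of the stationary distribution across threshold: a single Gaussian peak at E = 0 below threshold, two peaks at the minima of the quartic potential above it:

```python
# Euler-Maruyama integration of E' = f(E) + xi, <xi(0) xi(t)> = 2 D delta(t),
# treating E as real. Sampling after a transient approximates the stationary
# P(E) ~ exp(-V(E)/D) of Table 2. All parameter values are illustrative.
import numpy as np

def stationary_samples(f, D=0.1, dt=0.01, steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    E, out = 0.0, []
    for k in range(steps):
        E += f(E) * dt + np.sqrt(2 * D * dt) * rng.normal()
        if k > steps // 4:              # discard the initial transient
            out.append(E)
    return np.array(out)

alpha, beta = 1.0, 1.0
cases = {
    "below threshold": lambda E: -alpha * E,               # linear force, Gaussian P(E)
    "above threshold": lambda E: alpha * E - beta * E**3,  # cubic correction, bimodal P(E)
}
for label, f in cases.items():
    s = stationary_samples(f)
    hist, edges = np.histogram(s, bins=41, range=(-2.0, 2.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peaks = centers[hist >= 0.8 * hist.max()]
    print(f"{label}: P(E) is maximal near E = {np.round(peaks, 2)}")
```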
References

Papers of which I am author or co-author can be found on my home page: www.inoa.it/home/arecchi (List of publications – Research papers in Physics).
1. D.J. Amit, H. Gutfreund, H. Sompolinski, Phys. Rev. A 32, 1007 (1985).
2. F.T. Arecchi, Phys. Rev. Lett. 15, 912 (1965).
3. F.T. Arecchi, in Proc. E. Fermi School 1967 on Quantum Optics, Ed. R.J. Glauber (Academic Press, New York, 1969), pp. 57-110.
4. F.T. Arecchi, in Nonlinear Dynamics and Spatial Complexity in Optical Systems (Institute of Physics Publishing, Bristol, 1993), pp. 65-113.
5. F.T. Arecchi, Physica A 338, 218-237 (2004).
6. F.T. Arecchi, in La Fisica nella Scuola, Quaderno 18 (Epistemologia e Didattica della Fisica), Bollettino della Assoc. Insegn. Fisica 40(1), 22-50 (2007).
7. F.T. Arecchi, E. Allaria, A. Di Garbo, R. Meucci, Phys. Rev. Lett. 86, 791 (2001).
8. F.T. Arecchi, A. Berné, P. Burlamacchi, Phys. Rev. Lett. 16, 32 (1966).
9. F.T. Arecchi, S. Boccaletti, P.L. Ramazza, S. Residori, Phys. Rev. Lett. 70, 2277 (1993).
10. F.T. Arecchi, U. Bortolozzo, A. Montina, J.P. Huignard, S. Residori, Phys. Rev. Lett. 99, 023901 (2007).
11. F.T. Arecchi, V. Degiorgio, B. Querzola, Phys. Rev. Lett. 19, 1168 (1967).
12. F.T. Arecchi, V. Fano, in Hermeneutica 2007, Annuario di filosofia e teologia (Morcelliana, Brescia, 2007), pp. 151-174.
13. F.T. Arecchi, W. Gadomski, R. Meucci, Phys. Rev. A 34, 1617 (1986).
14. F.T. Arecchi, E. Gatti, A. Sona, Phys. Lett. 20, 27 (1966).
15. F.T. Arecchi, G. Giacomelli, P.L. Ramazza, S. Residori, Phys. Rev. Lett. 65, 2531-2534 (1990).
16. F.T. Arecchi, G. Giacomelli, P.L. Ramazza, S. Residori, Phys. Rev. Lett. 67, 3749 (1991).
17. F.T. Arecchi, M. Giglio, A. Sona, Phys. Lett. 25A, 341 (1967).
18. F.T. Arecchi, R. Meucci, F. Salvadori, K. Al Naimee, S. Brugioni, B.K. Goswami, S. Boccaletti, Phil. Trans. R. Soc. A, doi:10.198/rsta, 2104 (2007).
19. F.T. Arecchi, A. Montina, U. Bortolozzo, S. Residori, J.P. Huignard, Phys. Rev. A 76, 033826 (2007).
20. F.T. Arecchi, G.P. Rodari, A. Sona, Phys. Lett. 25A, 59 (1967).
21. T. Bayes, Phil. Trans. Royal Soc. 53, 370-418 (1763).
22. G.J. Chaitin, Algorithmic Information Theory (Cambridge University Press, 1987).
23. M. Ciszak, A. Montina, F.T. Arecchi, arXiv:nlin.CD/0709.1108v1 (2007).
24. R.J. Glauber, Phys. Rev. 130, 2529 (1963).
25. R.J. Glauber, Phys. Rev. 131, 2766 (1963).
26. R.J. Glauber, in Quantum Optics and Electronics, Ed. C. DeWitt et al. (Gordon and Breach, New York, 1965).
27. S. Grossberg, The American Scientist 83, 439 (1995).
28. H. Haken, Zeits. Phys. 181, 96-124 (1964); 182, 346-359 (1964).
29. H. Haken, Phys. Rev. Lett. 13, 329 (1964).
30. H. Haken, Laser Theory (Springer, Berlin, 1984).
31. H. Haken, H. Risken, W. Weidlich, Zeits. Phys. 204, 223 (1967); 206, 355 (1967).
32. R. Hanbury Brown, R.Q. Twiss, Nature 177, 27 (1956).
33. J.J. Hopfield, Proc. Nat. Acad. Sci. USA 79, 2554 (1982).
34. D.H. Hubel, Eye, Brain and Vision, Scientific American Library No. 22 (W.H. Freeman, New York, 1995).
35. M. Lax, Phys. Rev. 145, 110-129 (1966).
36. B. Libet, E.W. Wright, B. Feinstein, D.K. Pearl, Brain 102, 193 (1979).
37. M. Mezard, G. Parisi, M.A. Virasoro, Spin Glass Theory and Beyond (World Scientific, Singapore, 1987).
38. E. Rodriguez, N. George, J.P. Lachaux, J. Martinerie, B. Renault, F. Varela, Nature 397, 340-343 (1999).
39. T.A. Sebeok, Semiotica 134(1/4), 61-78 (2001).
40. L.P. Shilnikov, Dokl. Akad. Nauk SSSR 160, 558 (1965); A. Shilnikov, L. Shilnikov, D. Turaev, Int. J. Bif. and Chaos 14, 2143 (2004).
41. H.A. Simon, Cognitive Science 4, 33-46 (1980).
42. W. Singer, C.M. Gray, Annu. Rev. Neurosci. 18, 555 (1995).
43. D.R. Solli, C. Ropers, P. Koonath, B. Jalali, Nature 450, 1054 (2007).
44. S. Solomon, in Ann. Rev. of Comp. Physics II (World Scientific, 1995), pp. 243-294.
45. G. Toulouse, S. Dehaene, J.P. Changeux, Proc. Nat. Acad. Sci. USA 83, 1695 (1986).
46. F. Varela, E. Thompson, E. Rosch, The Embodied Mind (MIT Press, Cambridge, MA, 1991).
47. K.G. Wilson, Rev. Mod. Phys. 47, 773 (1975).
EMERGENCE IN ARCHITECTURE
ENVIRONMENT AND ARCHITECTURE – A PARADIGM SHIFT
VALERIO DI BATTISTA
Politecnico di Milano
Dipartimento Building Environment Science and Technology – BEST

The interaction of human cultures and the built environment allows a wide range of interpretations and has been studied inside the domain of many disciplines. This paper discusses three interpretations descending from a systemic approach to the question:
- architecture as an "emergence" of the settlement system;
- place (and space) as an "accumulator" of time and a "flux" of systems;
- landscape as one representation/description of the human settlement.
Architecture emerges as a new physical conformation or layout, or as a change in a specific site, arising from actions and representations of political, religious, economical or social powers, being shaped at all times by the material culture belonging to a specific time and place in the course of human evolution. Any inhabited space becomes over time a place as well as a landscape, i.e. a representation of the settlement and a relationship between setting and people. Therefore, any place owns a landscape which, in turn, is a system of physical systems; it could be defined as a system of sites that builds up its own structure stemming from the orographical features and the geometry of land surfaces that set out the basic characters of its space.

Keywords: Architectural Design, Architecture, Built Environment, Landscape.
1. Introduction

A number of studies, both international (Morin, 1977 [19]; Diamond, 1997 [6]) and national (Bocchi and Ceruti, 2004 [1]; La Cecla, 1988, 1993 [14,15]), have recently highlighted a new and wider understanding of human cultures and their interaction with their settlements and the built environment. A part of the Milanese School of Architecture has been interested in these questions for a long time: I would like to recall, among others, Guido Nardi's work on dwelling (Nardi, 1986 [21]) and some of our own considerations about the settlement system and the "continuous project", with its double nature – both intentional and unintentional (Di Battista, 1988, 2006 [7,9]). This framework allows a range of interpretations:
• architecture as an "emergence" of the settlement system;
• place (and space) as an "accumulator" of time and a "flux" of systems;
• landscape as one representation/description of the human settlement.
2. Architecture (be it "high" or not) as an "emergence" of the settlement system
If we define architecture as "the set of human artefacts and signs that establish and denote mankind's settlement system" (Di Battista, 2006 [10]), we agree that architecture always represents the settlement that generates it, under all circumstances and regardless of any artistic intention. Architecture emerges as a new physical conformation or layout, or as a change in a specific site, arising from actions and representations of political, religious, economical or social powers, being shaped at all times by the material culture belonging to a specific time and place in the course of human evolution. As these actions constantly signal our way of "belonging to a place", they consequently promote cultures of site and dwelling that denote each dimension of the settlements: from the large scale of the landscape and the city to the small scale of homes and workplaces. These cultures of different settlements involve both human history and everyday life. The "settlement culture" (that is, the culture belonging to a specific settlement) reveals itself by means of its own techniques and artefacts – terracings, buildings, service networks, canals… – and their peculiar features, related to religion, rites, symbols and style. Artefacts and techniques derive from a social and economic environment and highlight psychological and cultural peculiarities of the settled population. Therefore, our artefacts shape and mark places for a long time; moreover, they come from the past, continuously reflecting changes occurring in the settlement and in the built environment. All this means that architecture often outlives its generating system, becoming a heritage to the following ones, thus acting as memory – an identity condition linking people and places to their past systems. This peculiarity, signalling both continuity and inertia of the built environment, derives from the many factors that shape the relation between people and places over time.

3. The variable of time and the built environment

Whenever we observe a system of objects – the landscape we are facing – it represents both what has been conserved and what has been transformed; it displays geometric shapes, dimensions, materials and colors in their relationships, and presents a great variety of information about the conditions and means by which every item has been produced and used, at any time. Every description always takes note only of the state of what has been conserved, because the information
about what has been transformed has been irretrievably lost. But even what we perceive as "conservation" is actually the result of transformation; only a very keen anamnesis and a historical and documental reconstruction can recognise the size and distance in time of past transformations; every backward enquiry subjects what we observe to our scientific and cultural models and, paradoxically, the more recent and keener the enquiry, the more questionable it is. Moreover, no "case history" will ever be able to describe each and every interaction between the built environment we observe today and the settlement system it once belonged to. Every possible assumption about past events is always an interpretation biased by today's cultural models and their leading values. This means that memory acquires and processes materials in order to describe a past that always – in different ways – gets to us through our current reality; it unavoidably produces a project – be it intentional or unintentional – that regards the future.

4. The bonds and values of time

Our built environment is the solid outcome of the different lifetimes of all the various things that today represent the system. They represent the "state of reality", but also refer to the settlements that produced and selected them in time. Therefore, the built environment is the resultant of the many settlements that came one after the other in the same place, the resultant of the un-realized imagination and the enduring conditions; and it is the summation of all the actions – conservation and transformation, addition and subtraction – that have been performed over time in the same place we now observe. It means that today a place is the way it is (be it anyhow) just because in it a number of things happened, built up and dissolved in a number of times. Every place is the resultant of a huge quantity of things and times:

N things · N lives · N events · N times = place N

This mound where things and human lives heap together, this summation of times, of every thing and of every human being that ever was in this place, is what we can read today in our landscapes. This huge amount of past lives we perceive, even if confusedly, may be the reason why we are so spellbound by historical and archaeological finds. Maybe we perceive more keenly our own brief existence, the continuous change of our landscapes, when we face those places where the past and the mound of time become more evident. Actually, today every open space is the background of an ever-changing setting of movable things; this transient scene repeats itself with equivalent components, depriving the background of any meaning. This may be the reason
why, in our culture, some monuments and places retain acknowledged values and sometimes become "sacred" in a strange way, being "consumed" by tourism in a sort of due ritual. The hugeness of the past that belongs to every place cannot be perceived anywhere and anytime; it can be lost when there are not – or there are no more – traces; in these cases, the links between a place and its times wear out in the speed of actions that happen without leaving any mark.

5. Architecture and society

No memory can be recalled when every trace of time past has disappeared, but no trace can reach across time if nobody calls it back by inquiry. What is the filter that defines the time of things? No project, no purpose of duration, no painstaking production can guarantee permanence. Only the strict bond between observed system and observing system gives body and meaning to the time of things in a given place. Our built environments, our settlements, are the references – which are in turn observing and observed – of the meanings connecting places and time. Therefore space receives time: it has received it in the past, it sees it flow in the present, it longs for it and fears it in the future. In the present, the different speeds of change in settlements (for instance, economic values change much faster than social ones) meet the existence cycles of existent buildings; this raises two major issues:
• the difference of speed of different settlements in different places of the world;
• the virtual availability of all places and landscapes of the earth.
This relativization seems to lessen values; indeed, it might offer a new meaning both to "different" conditions and to the material constitution and duration of the places where we live, even the more ordinary ones. In this new relationship with "virtuality" we always find a condition of "dwelling" claiming a perceptible, physical relationship between us and the space – very variable in character and dimension – that receives our existence, our time, our observations, our decisions, our actions. How do the various existences of places and their specific things meet the occurrences of the human beings that produce, use, conserve, transform or destroy those places?
To understand what happens in our built environments and dwelling places, we could imagine what happens in one of our familiar micro-landscapes, such as our bedroom and the things it contains. We could consider the reasons – more or less profound – that organize its space, its fittings, its use, the way we enter it, its outlook and so on. We could also consider the meaning of the different things that characterize that place where we live day after day, discovering and giving way to emotions and rationality, needs and moods, functions and symbols: all of these things being more or less inextricable. Now, let's try and move these reasons and actions and emotions to the wider landscape of social places. Let's consider the number of subjects acting and of things present in our settlement; let's multiply the spurs and the hindrances for every subject and every thing. Finally, let's imagine how many actions (of conservation, transformation, change of use etc.) could affect every single point and every link in the system. If we imagine all this, we will realize that the configuration and the global working of the system is casual; but nevertheless the organization of that space, the meanings of that place – of that built environment – are real. They can exist in reality only as an emergence (a good or a bad one, it does not matter) of the settlement system that inhabits that same place.

6. Built environment and landscape

Any inhabited space becomes over time a place (a touchstone both for dwelling and identity) as well as a landscape, i.e. a representation of the settlement and a relationship between setting and people. Therefore, any place owns a landscape which, in turn, is a system of physical systems; it could be defined as a system of sites that builds up its own structure stemming from the orographical features and the geometry of land surfaces that set out the basic characters of its space. It is a multiple space that links every place to all its neighbours and it is characterized by human signs: the agricultural use of land, the regulation of land and water, all the artefacts and devices produced by the settled population over time. Thus every place builds up its own landscape, describing its own track record by means of a complex system of diverse signs and meanings. Every landscape displays a dwelling; it changes its dimensions (it can widen up to a whole region, or shrink to a single room) according to the people it hosts and their needs (identity, symbol, intentions of change, use, image…) and their idea of dwelling. This landscape is made of signs that remind us of past decisions, projects, actions; it gets its meaning, as a description of material culture, from everything
that has been done and conceived in it up to our age. And as soon as this space becomes a settlement – and therefore is observed, described, acted in – it becomes not only a big "accumulator" of permanencies and past energies, but also a big "condenser" of all the relations that happen and develop in that place and nowhere else. This local peculiarity of relations depends in part upon geography and climate, in part upon the biological environment (plants, animals), in part upon the characters of local human settlements. At the time t₀ of the observation, the place displays the whole range of its current interactions, and that is its identity. The landscape narrates this identity, that is, the whole system of those interactions occurring at the given time – between forms of energy, matter, people, information, behaviors – in that place. Every inhabited place represents, at the time t₀ of the observation, the emergence of its settlement system; therefore, as it allows for an infinite number of descriptions, both diachronic and synchronic, it also happens to be – all the time – the "describer" of our settlement system. Every local emergence of a human settlement represents (with regard to the time of observation) both the condition of state t₀ of the whole system and the becoming (in the interval t₀, t₁, …, tₙ) of its conditions, as the systems within and without it change continuously. Therefore, a place is the dynamic emergence of an open system, all the more complex and variable the more interactive it is with other systems (social, economic, political…). Observing a place during a (variable) length of time allows us to describe not only its permanence and change – with entities appearing and disappearing – but also its existence flow. This idea – the existence flow of a place, or of a settlement – gives a new meaning to the architectural project in the built environment.

7. The existence flow of a settlement system

Every system of relations between human beings and their settlement shapes and gives meaning to its built environment in specific and different ways, according to the different geographic conditions and cultures. We agree that the built environment represents the balance, gained over time, between those environmental conditions and the cultural models belonging to that specific civilization. Some recent studies in environmental anthropology have investigated the feedback from the built environment to social behavior, and it would be useful to consider the cognitive functions that the "built environment", in a broader sense, could represent.
Anyway, this balance (environment – culture), within the same space, displays a variation in conditions that can be considered as a true flow of elements (and existence) inside the place itself. Resorting to the coincidence "inhabited place/settlement system", we can describe the space of a place (location and basic physical conditions) as a permanence feature, the unchanging touchstone of all the succeeding systems and their existence cycles. Therefore, could we actually investigate one dynamic settlement system, proceeding in the same place along a continuous existence flow, from its remote foundation to our present time? Or should we analyze by discontinuous methods this same flow as it changes over time and articulates in space? It depends upon the meaning and purpose of our description level. Every place is an evidence of the whole of mankind's history; our history, in turn, changes according to places. The whole flow of events deposits artefacts and signs in places: some of them remain for a long time, some get transformed, some disappear. Generally speaking, natural features such as mountains, hills, plains and rivers change very slowly, while their anthropic environment (signs, meanings, resources) changes quickly. The duration of artefacts depends upon changes in values (use values, financial values, cultural values etc.), and many settlements may follow one another in the same place over time. It is the presence and the variation of the values belonging to the artefacts that establishes their duration over time. On the other side, a built environment crumbles to ruins when people flee from it: in this case, it still retains materials and information, slowly decaying. Radical changes in the built environment, otherwise, happen when changes in the settlement system establish new needs and requirements. As a settlement changes its structures (social, economic, cultural ones) by imponderable times and causes, so does the built environment – but in a much slower way – and it could probably be investigated as an existence flow. In this flow relevant factors can be observed. Permanent and changing elements rely upon different resources and energies, and refer to different social values. Usually, the "useful" elements are conserved; when such elements embody other meanings (such as religious, symbolic, political ones) that are recognized and shared by a large part of the population, their value increases. Sometimes, elements remain because they become irrelevant or their disposal or replacement is too expensive. Duration, therefore, depends upon the complex weaving over time of the needs and the values belonging to a specific settlement
system, which uses this woven fabric to filter the features (firmitas, utilitas, venustas…) of every artefact and establish its destiny. Landscapes, as systems of signs with different lifespans, have two conditions. On one side, in the flowing of events, symbolic values (both religious and civil ones) have a stronger lasting power than use and economic ones, which are more responsive to the changes in the system. On the other side, what we call "a monument" – that is, an important episode in architecture – is the result (often an emergence) of a specific convergence of willpower, resources, many concurrent operators and other favorable conditions. This convergence alone enables the construction of great buildings; only commitment and sharing allow an artefact to survive and last over time. It is the same today: only if a system has a strong potential will it be able to achieve and realize major works; only shared values will guarantee long duration of artefacts.

8. Characters of the urban micro-scale

The multiple scale of the settlement system allows for many different description levels. There are "personal" spaces, belonging to each individual, and the systems where they relate to one another; these spaces include dwellings and interchange places, thus defining an "urban micro-scale" that can be easily investigated. Such a scale displays itself as a compound sign, a self-representation of the current cultural model which is peculiar to every settlement system; it has different specific features (geographical, historical etc.) characteristic of its meanings, roles and identities inside the wider settlement system. The structure of images and patterns (Lynch, 1960, 1981 [16,17]; Hillier and Hanson, 1984 [12]), the urban texture and geometric features, the characters of materials – such as texture, grain and color – their looking fresh or ancient, indicate such things as cultural models, the care devoted to public space by the population, and their public image and self-representation. Urban micro-space always represents an open system, a plastic configuration of meanings, where different flows of values, energy, information, resources, needs and performances disclose themselves as variations in the relationship between long-lasting and short-lived symbols and signs, which together constitute the landscapes where we all live. Such an open system allows for different levels of description; it requires a recognition, an interpretation of its changes and some partial adjustment and tuning.
9. Inhabited places: use, sign, meanings

It would be necessary to investigate the complex interactions that link some features of the cultural model of a population in the place of its settlement (history and material culture, current uses and customs), the way inhabited places are used, the configuration of the ensuing signs (Muratori, 1959 [20]; Caniggia, 1981 [2]), and the general meaning of the place itself. The ways a space is used, and the conditions and needs of this use, generate a flow of actions; hence the system acquires a casual configuration which is continuously – though unnoticeably – changing. This continuous change produces the emergence of configurations, systems of signs, which possess a strong internal coherence. Just think of the characteristics of architecture in great cities, corresponding to the periods of great wealth in their history. This emergence of things and built signs, with the mutual relations among one another and with their geographic environment, is peculiar to all places, but it appeals in different ways to our emotions and actions. The appeal of a place depends upon the different mix of values – linked to use, information, aesthetics, society etc. – that the observer attaches to the place itself; this mix depends upon the observer's own cultural and cognitive model, as well as his/her needs and conditions (Maturana and Varela, 1980 [18]). In this conceptual framework, every built environment brings forward a set of values which are shared in different ways by the many observing systems inside and outside it. In their turn, such values interfere with the self-regulation of the different flows that run through the environment: flows of activities, personal relationships, interpretations, emotions, personal feelings that continuously interact between humans and things. This generates a circular flow between actions and values, where the agreement connecting different parties is more or less strong and wide, and in variable ways regulates and affects flows of meaning as well as of activity and values.

10. Project and design

In the open system of the built environment and in the continuous flow of human settlements that inhabit places, there are many reasons, emotions, needs, all of which are constantly operating everywhere in order to transform, preserve, infill, promote or remove things. These intentional actions, every day, change and/or confirm the different levels of our landscape and built environment. This flow records the continuous variation of the complex connexions between people and places. This flow represents and produces the implicit project that all
built environments carry out to update the uses, values, conditions and meanings of their places. This project is implicit because it is self-generated by the random summation of many different and distinct needs and intentions, continuously carried out by undefined and changing subjects. It gets carried through in a totally unpredictable way – as regards goals, time, conditions and outcomes. It is this project anyway – by chaotic summations which are nevertheless continuous over time – that transforms and/or preserves all built environments. No single project, either modern or contemporary, has ever been and will ever be so powerful as to direct the physical effects and the meanings brought about by the implicit project. Nevertheless, this very awareness might rouse a new project culture, a more satisfactory public policy, a better ethic in social and economic behaviour. A multiplicity of factors affects this project, resulting, in turn, in positive and negative outcomes (Manhattan or the Brazilian favelas?). What can be the role of intentional projects – and design – in the context of the implicit project, so little known and manageable as it is? How could we connect the relations between time and place deriving from our own interpretation of human settlements to an implicit project that does not seem to even notice them? Moreover, how could today's practice of architectural design – as we know it – cope with such complex and elusive interrelations, at the various scales of space and time? What signs, what meanings do we need today to build more consistent and shared relationships in our built environments? Is it possible to envisage some objectives and some design method to improve the values of the built environment? How can we select, in an accurate way, what we should conserve, transform, renew, dispose of? How can we improve something that we know so little of? We need a project of complex processes to organize knowledge and decisions, to find effective answers to many questions, to carry out positive interactions with the flow of actions running through our settlements.

11. Self-regulation, consent, project

The issue of consent and shared agreement about the shape and meaning of the human settlement is quite complex and subtle: it deals with power and control, with democracy, with participation. Decisions, choice and agreement cherish each other and correspond to the cultural and consumption models of the population. Consent, through the mediation of local sets of rules, turns into customs and icons of the local practices in building and rehabilitation activities. This usually
degenerates into the squalor of suburban housing; but it also makes clear that every human place bears the mark of its own architecture, through a sort of homeostatic self-regulation (Ciribini, 1984, 1992 [3,4]). Such self-regulation modifies meanings by means of little signs, while upgrading signs by adding new meanings. The continuous variation of these two factors in our environment is the result of a systemic interaction of a collective kind: it could not be otherwise. Will it be possible to improve our capacity to describe such a continuous, minute action we all exert upon every dimension of the built environment? Will it be possible to use this description to modify and consciously steer the flow of actions toward a different behavior? Which role can the single intention/action play and, in particular, what could be the role of the project, as a mock description of alternative intentional actions within the collective unintentional action? How does the cultural mediation between project and commonplace culture operate? How can it be transferred into the collective culture – that is, into culture's average, historically shared condition?

12. Research

A small settlement can represent, better than an urban portion, a very good place to investigate such complex relations; a good place to understand the role, conditions, chances and limits of the process of architecture-making (Di Battista, 2004 [8]). In a small settlement, the flows and variations of needs, intentions and actions seem clearer; their project comes alive as a collective description that signifies and semantically modifies itself according to values and models which are mainly shared by the whole community. Here the implicit project (Di Battista, 1988 [7]; De Matteis, 1995 [5]) clearly displays itself as a summation of intentions and actions that acknowledges and signifies needs and signs settled over time; in doing this, it preserves some of these signs – giving them new meanings – while discarding others, in a continuous recording of the variations that locally occur in the cultural, social and economic system. This continuous updating links the existent (the memory of time past) to the perspective of possible futures. Moreover, it also links the guess upon possible change in real life and in dreams (that is, in hope) with the unavoidable entropy of the physical system. In this sense, the collective project that knows-acts-signifies the complex environment represents its negentropy (Ciribini, 1984, 1992 [3,4]). It would be advisable for such a project to acquire a better ethical consciousness. Thus, inside the collective project, the individual project would become the local
action confirming and feeding into the whole; or else, it could aim to be the local emergence, finally taking the lead of the collective model. The signs drawn from this continuous, circular osmosis (of models, intentions, predictions, actions, signs and meanings) over and over reorganize, by local and global actions, the existing frame of the built environment. This osmosis never abruptly upsets the prevalence of present signs and meanings: it makes very slow changes, which can be fully perceived only over a time span longer than a human lifetime. This self-regulation allows the physical system to survive and change, slowly adjusting it to the continuous change of cultural and dwelling models; it preserves the places' identity while updating their meanings. When the implicit project lumps diverging intentions and actions together, the whole meaning becomes confused and hostile. Today, many economical and social relationships tend to "de-spatialize" themselves; many organizations and structures displace and spread their activities, and individuals and groups tend to take up unstable relations and may also inhabit multiple contexts. Architecture seems to have met a critical point in the shattering of one of its main reasons: the capability to represent the relationship between the local physical system and the self-acknowledgement of the social system settled in it. Actually, this de-spatialization is one of the possible identities that individuals, groups and social/economical systems are adopting, and this could be the reason why many places are becoming more and more individualized/socialized. Therefore, the problems of globalization and of social and identity multiplicity cause such an uncertain and fragmentary forecast that they urge the need and the quest for places that can balance such upsetting; that is why places with memory and identity are so strongly sought for. "Landscape" can be one of the significant centers for this re-balancing action. Landscape is perhaps the most powerful collective and individual representation of the many models we use to describe ourselves – the philosophical and religious as well as the consumerist and productive or the ethical and symbolic ones. It is also the more direct portrayal of many of our desires and fears, both material and ideal. Landscape and architecture are not mere pictures, nor do they embody only aesthetical and construction capabilities; they are meaningful representations of the time and space emergence of the continuous flow of actions; they self-represent the settlement system in space (Norberg-Schulz, 1963, 1979 [22,23]) and time, and the deepest existential and symbolic relationships of mankind (Heidegger, 1951 [11]; Jung, 1967 [13]):
they are so rich and complex that we still find it very hard to describe and even imagine them.

References
1. G. Bocchi and M. Ceruti, Educazione e globalizzazione (Cortina, Milano, 2004).
2. G. Caniggia, Strutture dello spazio antropico (Alinea, Firenze, 1981).
3. G. Ciribini, Tecnologia e progetto (Celid, Torino, 1984).
4. G. Ciribini, Ed., Tecnologie della costruzione (NIS, Roma, 1992).
5. G. De Matteis, Progetto implicito (Franco Angeli, Milano, 1995).
6. J. Diamond, Guns, Germs and Steel. The Fates of Human Societies (1997); (it. ed.: Armi, acciaio e malattie (Einaudi, Torino, 1998)).
7. V. Di Battista, Recuperare, 36 (Peg, Milano, 1988).
8. V. Di Battista, in Teoria Generale dei Sistemi, Sistemica, Emergenza: un'introduzione, G. Minati (Polimetrica, Monza, 2004), (Prefazione).
9. V. Di Battista, Ambiente Costruito (Alinea, Firenze, 2006).
10. V. Di Battista, in Architettura e approccio sistemico, V. Di Battista, G. Giallocosta, G. Minati (Polimetrica, Monza, 2006), (Introduzione).
11. M. Heidegger, Costruire, abitare, pensare (1951), in Saggi e discorsi, Ed. G. Vattimo (Mursia, Milano, 1976).
12. B. Hillier, J. Hanson, The Social Logic of Space (Cambridge University Press, 1984).
13. C.G. Jung, Man and His Symbols (1967); (it. ed.: L'uomo e i suoi simboli (Longanesi, Milano, 1980)).
14. F. La Cecla, Perdersi. L'uomo senza ambiente (Laterza, Bari-Roma, 1988).
15. F. La Cecla, Mente locale. Per un'antropologia dell'abitare (Elèuthera, Milano, 1993).
16. K. Lynch, The Image of the City; (it. ed.: L'immagine della città (Marsilio, Padova, 1960)).
17. K. Lynch, Good City Form (1981); (it. ed.: Progettare la città: la qualità della forma urbana (Etaslibri, Milano, 1990)).
18. H.R. Maturana, F.J. Varela, Autopoiesis and Cognition (1980); (it. ed.: Autopoiesi e cognizione. La realizzazione del vivente (Marsilio, Padova, 1985)).
19. E. Morin, La Méthode (1977); (it. ed.: Il metodo (Raffaello Cortina, Milano, 2001)).
20. S. Muratori, Studi per una operante storia urbana di Venezia (Istituto Poligrafico dello Stato, Roma, 1959).
21. G. Nardi, Le nuove radici antiche (Franco Angeli, Milano, 1986).
22. C. Norberg-Schulz, Intentions in Architecture (1963); (it. ed.: Intenzioni in architettura (Lerici, Milano)).
23. C. Norberg-Schulz, Genius Loci (Electa, Milano, 1979).
EMERGENCE OF ARCHITECTURAL PHENOMENA IN THE HUMAN HABITATION OF SPACE

ARNE COLLEN
Saybrook Graduate School and Research Center
747 Front Street, San Francisco, CA 94111 USA
E-Mail:
[email protected]

Considering the impact on human beings and human activities of architectural decisions in the design of space for human habitation, this chapter discusses the increasingly evident and necessary confluence in contemporary times of many disciplines and human-oriented sciences, with architecture being the meeting ground to know emergent phenomena of human habitation. As both a general rubric and a specific phenomenon, architectural emergence is the chosen focus of discussion, and other phenomena are related to it. Attention is given to the phenomena of architectural induction, emergence, and convergence as having strategic and explanatory value in understanding tensions between two competing mentalities: the globally domineering nature-for-humans attitude, in opposition to the lesser practiced humans-for-nature attitude.

Keywords: architecture, convergence, design, emergence, human sciences, induction, systemics, transdisciplinarity.
1. Introduction

What brought me to the subject of this chapter is my long-time interest in the occupancy and psychology of space. My approach to the subject is transdisciplinary and systemic, in that I think that in contemporary times we have to converge many fields of study and understand their interrelations to know the subject. What I find particularly interesting and relevant are the reciprocal influences between one dynamic body of disciplines associated with architecture, art, design, and engineering the construction of human dwellings on the one side, and another body of disciplines associated with psychological and philosophical thought, human creativity and productivity, and well-being on the other side. Decades of research interest have transpired regarding the reciprocal influences between the two bodies of disciplines, but many would argue that the apparent marriage of architecture and psychology (to take one illustrative connection), through such a lens as environmental psychology [1] applied to architectural designs since the middle of the twentieth century, may have ended
in divorce, judging by the appearance of our human settlements of the early twenty-first century. From my reading of designers, architects and engineers, whose jobs are to design and construct the spaces we inhabit, in recent decades the development of our cities and the living spaces constituting them has become subject to the same homogenizing and globalizing forces shaping our consumer products and human services. But for a minority of exceptions, overwhelmingly, the design and construction of human habitats have accompanied industrialization, the standardization of the processes and products of production, and the blatant exploitation and disregard of the natural order and fabric of the physical world. From our architectural decisions, and the subsequent actions following them to organize and construct our living spaces, we have today the accumulation of their physical, psychological, and social effects. Our intentions to live, collaborate, and perform in all kinds of human organizations do matter. We are subject to, and at the effect of, the spaces we occupy. This chapter discusses the relevance of trans-disciplinary and systemic approaches that may inform architectural decisions, and three architectural phenomena that accompany the dwellings we occupy.

2. Two Attitudes

What we do to our surroundings and each other in the form of architectural decisions has lasting effects. If we believe our surroundings are there only to serve us, to fulfill our needs to live, communicate, work, and breed, we have what may be termed the nature-for-humans attitude. Following this mentality, we freely exploit and redesign the natural world to suit ourselves. This attitude is rampant, and we see the results everywhere on the planet today. The opposite mentality is the minority view. Adopting this critical interpolation of consciousness, if we believe we are here to serve our surroundings in a sustainable fashion while fulfilling our needs, we have the humans-for-nature attitude. It is a pragmatic attitude in which every action takes into conscious regard the consequences of that action on the environment. Unfortunately, only a small proportion of humankind appears to manifest this mentality at this time in human history. We may increasingly question the dominant attitude, such that we may justifiably ask: What are we doing in the design and construction of our habitats to evidence that the humans-for-nature attitude underlies all that we do? Architectural phenomena and decision-making are foci to explore tensions between the two attitudes.
3. Human Activity Systems and Organized Spaces

I have been involved with systems research and sociocybernetics for three decades [2]. I have been particularly interested in what we may call human activity systems [3]. A group of persons forms this kind of system when we take the interactions among those persons to be its most important defining quality. The interactions constitute the activity of the system. The system is not very visible much of the time, existing only in our imagination. However, when the people meet in person, or communicate by means of technology for example, the system is activated; it comes alive. It is the communications among the persons that make the system visible. In sum, this is what we mean by a human activity system. It is common that we are members of many human activity systems simultaneously and during our lives. The structures and places associated with human activity systems bring the subject matter of architecture to my research interest, because architecture, I believe, has a tremendous omnipresent influence on human activity systems. Typically today, we are separated from the natural environments that were common for most of humanity several generations ago. Most of us live our lives in cities. We live and work in contained and well-defined spaces. Considering the longevity of human history, the change from agrarian and nomadic non-city ways of life to the industrialized, consumer-oriented and modernized enclosed spaces of contemporary life has come fast. But an alternative way to think about it is to reflect upon the question: In what ways is the architecture of the life of a human being different today than two hundred years ago? This question is important, in that the architectural decisions of the past, as manifested in the dwellings we inhabit today, have, I submit, a profound influence on living, thinking, producing, and self-fulfillment. The idea of organized spaces need not be confined to physical housing as we know it. Dwellings, such as schools, offices, and homes, and the physical meeting places within them, such as countertops, dining tables, and workstations, are but nodes of vast and complex networks of persons spanning the globe, made possible via our electronic media technology. Thus, we have various levels of complexity for human activity open to us when considering what organized spaces entail, namely both real and virtual spaces. In fact, such devices as the mobile phone have profoundly altered our idea of what an organized space is. The interface between real and virtual space means that wherever we are in the physical world, there is increasingly present the potentiality of an invasive influential addition (radios, intercoms, cell phones, television and computer screens). These virtual avenues complicate our understanding of our inhabitation
of that physical space, because activation of a medium can at any time compete with, as well as complement, our activity in that place. Being paged or phoned may, respectively, distract from or facilitate current events. The interface has become so important to communication that virtual devices are now included in the architectural decisions to design and construct human habitats, for example in the placement of recreation and media rooms, and electrical wiring. As a result, various technological media devices are evidence of extensions of human activity systems into virtual realms not immediately visible to us with the physical presence of a group of persons at the same time in the same physical location.

4. Architecture Designs and Organized Space

One form of expression of the design of space is architecture. To make a decision that organizes space is an essential element that creates architecture. To impose architecture in space is to organize the space for human habitation. Various organizations of space constitute architectural designs. This activity of ordering space, whether by design of the architect or the inhabitant, can lead to a range of consequences for human activity, from extreme control by others on the one hand to personal expression, happiness, and ornate displays on the other [4,5]. Beyond the basics of the perceptual-cognitive relations involved in constituting design, the art and innovation in architecture tend to embroider and enhance its minimalism. However, contemporary approaches tend to challenge this view as too limiting, as evidenced for example when inhabitants of modernist architecture remodel and innovate to make their dwellings their own. Such secondary effects illustrate that we cannot take sufficiently into account the emergent consequences of imposing a given architectural design on human beings. Defining architecture, from Vitruvius to the present day, and keeping it relevant to human settlements are challenges informatively described in terms of urbanizing concentrations of humanity as complex systems [6]. Further, a provocative journey through the development of architecture revisions the aesthetic of architecture and the primacy of beauty in contemporary terms of the pursuit of happiness that we can experience and manifest in the design and inhabitation of constructed spaces [7].

5. Architectural Induction and Experiencing Space

It is a non-controversial fact, an existential given, that the space a living being inhabits has a profound influence on that living being. Where the biologist may
point to primary examples of this fact by means of the phototropic and hydrotropic propensities in life forms, the anthropologist may cite the prevalence and placement of certain raw materials, infusing the artifacts of festivals, ceremonies and other cultural events, that are distinguishing markers among peoples. Interacting with the constituent make-up of a living being, the environment is a determinant reality of that being. Arranging homes about a meeting place, limiting the heights of every urban dwelling, and defining room sizes and their configuration to constitute the set of spaces of a dwelling are examples of architectural decisions. Architecture shapes and organizes the environment for human beings; de facto, architecture is a key environmental force. As a human being, my principal point of reference for existence is my being. To survive, I think in this way and relate to all other persons, things, and places from my personal point of view, my vantage point. Thus, cognition, perception, psychology, and phenomenology are particularly relevant for me to explain, understand, create, design, construct, and change the spaces in which I live, work, and relate with other human beings. At every moment, induction has much to do with my experiencing of the space I inhabit. What sights, sounds, smells, touches and tastes make my space of a place? The objects I perceive and my cognizance of their configuration about me constitute my ongoing experience. My experience is amplified because of my movement through space, which also means through time. My interactions with the objects are specific relations and my space a general relation, all of which are inductions. But those aspects of my experiencing of the space that may be attributed to decisions determining the overall design and organization of the space may be termed architectural induction. By means of perception, cognition and action, we experience space in chiefly four ways: 1) in a fixed body position, we sense what is; 2) we sense what is, while the body is in motion; 3) we interact with persons and objects that are what is; and 4) we integrate senses and actions of what is from multiple separate body positions. This internal frame of experiencing is an artificial articulation of course, because we are doing all four simultaneously most of the time. What becomes experience of a given space is determined in part by the internal frame and in part by the architecture of the space we occupy. The architecture induces and the frame influences. From the resultant confluence, experience emerges.
Figure 1. Framing reconnects separated spaces.
6. Framing and Architectural Phenomena

Framing is a natural, inherent perceptual-cognitive process of being human (Fig. 1). To line out an area of space is to frame. It is to make separations in the space, to break the space into parts. What is included and excluded in the frame is an act of profound importance, having major consequences regarding architectural induction and emergence. One excellent example of framing in architectural design is making the window. The window is an elementary frame, depicted as a square, rectangle, triangle, circle, oval, or other such intended opening in what is otherwise a pure division of space. Let us consider the square window. What does each square window of a building, seen from a given vantage point, communicate? What is its inducement? When a square is made as a window, doorway, recess, or projection, what is induced? Consider some possible relations, not as facts, but only hypotheses: The open square is separation, openness, or possibility; the double square is solidity, stability, or strength; the black-and-white or colored square is separation; the square with crossbars is confinement, imprisonment, or
control; the square of squares is separateness, security, or safety; and the square in a circle is fluctuation, alternation, tension, or creativity. Consistent with a phenomenology of experiencing space, the examples above illustrate the relevance of the experience of the beholder and occupier of the space, regarding the induction of the frame, in this case the square (like the window frame), and the consequent emergent elements of experience.
7. Arena of Inquiry Influences Architecture

Inquiry is often discussed in terms of paradigm. We may also realize it is another example of framing. Philosophically, an arena of inquiry (paradigm) comes with an epistemology (knowing), ontology (being), axiology (set of values), and methodology (means of conducting inquiry). We want to know the space. There is knowledge of the place. We can experience the space by being in it, and that is not the same as knowing about it. What we see, hear, touch, smell, and taste while in the place naturally spawns meanings, that is, interpretations of what we feel and think about the place. We bring to the place prior experiences that can influence and bias the framing. There are many ways we may value the place or not. And there are ways to explore, alter, and work the place into what we want or need it to be. But there usually are important elements to respect, preserve, and honor in the place. Finally, there are means to redesign and reconstruct its spaces. An arena of inquiry is comprised of the basic assumptions and ideas that define the decisions and practices leading to the architecture. As an arena, it influences the work and process of the inquirer, in this case, the architect who designs, the builder who constructs, and the human beings who occupy the space. When the architect adopts and works within one arena (paradigm), it is a way (frame) of thinking that influences and guides, but also limits, thinking. But it is necessary in order for the discipline to exist. For the disciplined inquirer, in this case the architect, the frame (paradigm, arena) provides the rules, conceptual relations, principles, and accepted practices to make the architectural decisions required to compose and present the organization of space for human habitation. The paradigm scheme that I find informative is close to one published recently [8]. There, paradigms are described for studying the effects of organized space, and I add a fifth (Systemic) for a more inclusive application to architecture. In brief, working within the Functional paradigm, we would be
preoccupied with whether the architecture is useful, efficient, and organizes space as intended. Does it work? To design within the Interpretive paradigm, we emphasize how people feel in the space, how they experience it. Is it reflective and enlightening? In the Emancipatory paradigm, we organize space to empower or subdue, liberate or imprison. Does the architecture free or control its occupants? To work in the Postmodern paradigm means to replicate and mimic the diversity and creativity of the human beings who are to occupy the space. We would have a major interest in whether the architecture is heuristic and pluralistic, or delimiting and homogenizing. Finally, working within the Systemic paradigm, we would look for ways to combine, balance, configure, and complement the best features of the other paradigms when applied to a particular space. The broadest paradigm would be multi-methodological rather than restricted to one paradigmatic frame. The Systemic paradigm would be most akin to trans-disciplinary architecture, discussed later in this chapter. Given the variety of dwellings we see in our cities today, I find meaningful the following relations between paradigm and the kind of organized space: Functional affiliates with the factory to make a consumer product, Interpretive with the socializing place of a restaurant, Emancipatory with the health spa to promote human healing, Postmodern with the communal park square to support the social diversity of the community, and Systemic with combinations of the above. To illustrate this progression, take the application of school architecture. During the industrialization of the European and North American continents, our public school systems arose for the populace as places to house our children while parents worked in factories. It is often argued that education then was more about control and socialization than learning and personal development. The design and construction of schools served those former ends. Of course, these outdated functionalistic ideas cannot serve our present conditions and needs, even though the idea of containment in a space called school appears of enduring prevalence still. The architecture of schools has advanced considerably, exploring the design and construction of more open environments [9,10], in fact to the extreme of considering the community the learning laboratory that once was the classroom. Learning is continuous, life-long, and increasingly augmented by the Internet. Places of learning are confined no longer to metaphors of the one-room schoolhouse, the bricks-and-mortar campus, and local geography. To decide the inclusion and placement of a rectangular or oval window in a wall is a prime element and architectural decision. The decision is not divorced from the frames we bring to the act, but on the contrary, partly induced by them. To have familiarity with the arenas of inquiry in advance, I contend, invites more informed choices and a higher level of awareness in making the architectural
decisions required to design, construct, and alter human habitats to fulfill the range of human interests represented in the arenas.
8. Architectural Emergence

The complexity of framing described in the two previous sections becomes even more profound when we take into consideration that the relations among the elements of the space we perceive change continuously, and that multiple paradigms apply. Note that the relations enrich and compound experience, for example, when we smell the changing odors walking through a garden (the passage of the body through space), and when, sitting, we see shadows moving on a wall through the day and feel rising and falling temperatures over days (occupying the same place through time). We are both instruments and recipients of change.
As we move through spaces, the body moves in a constant state of essential incompletion. A determinate point of view necessarily gives way to an indeterminate flow of perspectives. The spectacle of spatial flow is continuously alive . . . It creates an exhilaration, which nourishes the emergence of tentative meanings from the inside. Perception and cognition balance the volumetrics of architectural spaces with the understanding of time itself. An ecstatic architecture of the immeasurable emerges. It is precisely at the level of spatial perception that the most architectural meanings come to the fore [11].

As every point of view gives way to the spatial flow of experience, an architecture emerges (Fig. 2). It is inherent in the existent manifest experience of the space occupied. It is a resultant architectural induction. There will likely be an architecture associated with the place one occupies, whether an office, town square, restaurant, or home. But we can also state that the idea of architecture is emergent from the personal experience of the place. That emergent phenomenon from the person is a valid phenomenon. Furthermore, it is justifiably legitimate to name the architecture of one's experience and communicate it to others. This personal reference point and name for the experience are to be distinguished from the name of the architecture likely associated with the person and design used to construct and organize the space prior to human occupancy. The personal architecture has the greatest relevance. From a phenomenological point of view, the totality of organized space experienced personally constitutes the experiential manifestations of consciousness. When lights, sounds, odors, and objects pervade a space, the space, as we experience it, is as much about what is there as what is not.
Figure 2. Multiple paradigms apply in organizing the spaces of this Parisian indoor emporium for the intended architectural induction to promote emergent behaviors expected in a haven of consumerism.
The following are illustrative paired qualities of experience that may become descriptors of our experience of a particular place: empty-full, present-absent, visible-invisible, loud-quiet, black/white-colored, soft-hard, hot-cold, and strong-weak. They represent dimensions of experience, along which we use language to label and communicate experience to others. What is the sight, sound, smell, touch and taste of the space of the place? But descriptors need not be restricted to the sensorial. More complex constructions occupy our experience of space. Are the materials synthetic and artificial, or natural? What and who occupies the space? What interactions among the occupants of the space add to our experience of the place? Our perceptions and cognitions of sounds, lines, shapes, colors, odors and contacts become forces of influence. One may read, reap, interpret, and make meanings: the essential structures and contents of
consciousness of the place. But of great relevance is the relational nature of the space to our perceptions of the space and meaning attributions that constitute the experience we reflect upon, report, and discuss with others. The particular qualities that describe our experience in the most rudimentary and essential respects are emergent phenomena constituting the experience. They are examples of emergence. Regarding those aspects that stem from decisions determining the overall design and organization of a given space, we may use the phrase architectural emergence to refer to them. The phenomena of induction and emergence are complementary processes, like the two sides of the same coin. They are evocations of our existence in context. Which one to highlight is a matter of emphasis. We may focus on the inductive nature of experiencing space. The impact of the place is described in terms of induction. What flows from the habitat to the occupant, so to speak? What is the induction? Alternatively, we may focus on the emergent qualities of our experience of the place. When in the place, what comes forth to become the foreground of consciousness? What is emergent? Generally speaking, we may refer to the two phenomena as the architectural induction and architectural emergence of the organized space, respectively, when we can know the key architectural decisions involved to design and organize the space associated with the induction and emergence. To illustrate succinctly, placement of a stone arch at the entrance/exit joining two spaces (rooms, courts, passages) has an induction/emergence different from that of a heavy horizontal beam.
9. Systemics of Architecture, Emergence, and Attitude

Put people together in a place. Organize the space by means of architecture, via the architect, the occupants, or both. After some time, their interactions will likely induce a human activity system. In other words, a social system of some kind emerges, a human activity system defined not simply by the collective beings per se, but more definitively by their interactions. The nature and qualities of the interactions make the system what it is. But it is important to include in our thinking: the architecture of the space is part of the system. It induces to influence human interaction, thereby participating in the emergence of properties that come to characterize the system. Given the many interactive relations of the people with the environment and each other, the concepts and principles applied to describe the designing and organizing of the space for the human beings who occupy it may be termed the systemics of its architecture, that is, those systemic concepts and principles applied to and active in that context. A toy simulation of this relation between spatial organization and emergent interaction is sketched below.
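The sketch is a purely hypothetical construction of mine, in Python, and not a model from this chapter: agents placed in an organized space interact with a probability that decays with distance, so rearranging the space changes how much, and among whom, interaction emerges. All names, coordinates, and parameters are invented for the illustration.

    # Toy illustration (hypothetical): the spatial arrangement conditions
    # the interactions whose pattern makes a human activity system "visible".
    import math, random

    random.seed(1)  # deterministic toy run

    def interactions(positions, reach=3.0, trials=200):
        """Count pairwise interactions; nearer pairs interact more often."""
        counts = {}
        people = list(positions)
        for _ in range(trials):
            a, b = random.sample(people, 2)
            d = math.dist(positions[a], positions[b])
            if random.random() < math.exp(-d / reach):  # distance-decayed chance
                pair = tuple(sorted((a, b)))
                counts[pair] = counts.get(pair, 0) + 1
        return counts

    # Two hypothetical arrangements of the same four people.
    open_plan = {"ann": (0, 0), "bob": (1, 0), "eve": (2, 1), "joe": (1, 2)}
    cellular  = {"ann": (0, 0), "bob": (9, 0), "eve": (0, 9), "joe": (9, 9)}

    print(sum(interactions(open_plan).values()))  # many interactions emerge
    print(sum(interactions(cellular).values()))   # far fewer: the layout induces less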
Figure 3. The office building skyscraper dominates the cityscape.
To illustrate, we may imagine a particular dimension of our experience of place (hot-cold, small-large, still-windy). If we select one element too extremely and focus on it, the whole may go out of balance with the other elements. In other words, a strong force or energy from one element can go so far as to obliterate the presence of others in the space. One element may overshadow the others, as one large tree blocks the sunlight that would nourish the other trees. We witness this spectacle entering a city square or the living room of a home, to immediately notice a towering building or a large stone floor-to-ceiling fireplace, respectively, with all other entities occupying the space organized around it. The size and intensity of the dominating entity (Fig. 3) tend to command and hold the attention, and to block out or mask other entities. Whether the space is being organized in genesis, such as the design, plan, and construction of a new building, or the built space is altered, such as remodeling the home, there are architectural decisions being made. The elements that dominate the space, the emergent qualities, may become particular inducements known to and characteristic of that architecture. The kiva (a half-egg-shaped, oven-like fireplace), for example, has acquired this distinguishing status in the homes of Santa Fe, New Mexico. As to the systemic nature of architecture, we may wonder what overriding principle influences our thinking to make the architectural decisions by which the prominent qualities emerge. Is ideal architecture balance? Once we have knowledge of the emergent elements of a given architecture, is the task to find the balance of the most favorable inducements for human habitation? Similarly, we may ask: Is ideal architecture the integration of those elements known to promote
well-being? Of particular relevance is that the emergence of any element to dominate the experience of the occupants of the place may lead further to concerns of human betterment at one extreme and human detriment at the other. Which attitude (nature-for-humans or humans-for-nature) do the hallmark elements of an architecture support? What hallmarks a "green", ecologically sustainable architecture? The thesis developed in this chapter is that the spatial organization we impose through architectural decisions is an inducement in the emergence of the human social systems inhabiting the space. It merits testing, to seek evidence for and against it, and to see whether it might be applied in constructive ways for human betterment. Given current concerns over survivability, it would also support shifts in consciousness from the presently dominant to the advisedly sustainable attitude. Our understanding of this relation seems both obvious and critical to the best of what architecture has to contribute. It should be generally known what inducements favor sustainability, well-being, productivity, and peaceful cohabitation. There is a powerful feedforward loop prominent in the systemics of architecture in its integral relation with design and technology [2]. Civilization progresses by accretion through novelty, diversity, and necessity [12]. We benefit from the discoveries and achievements of those who precede us. Through our immediate activities of design and construction involving feedback loops, we learn what works and what does not. The process is very pragmatic, requiring invention, innovation, and refinement; practical application; and extensive repetition by trial and error until efficacious action becomes reliable and sustainable. Thereby, we come up to the challenge of what is needed to solve the problems of our day. In the case of architecture, the performance, maintenance and endurance of the spaces we design and occupy come under our scrutiny. Ideally, our evaluations should lead over subsequent generations to dwellings increasingly superior in their construction [13], and to our healthy living and experience of them [7,14]. As applied to the systemics of architecture, the myriad feedback loops of human activity systems, coupled with the more macro feedforward loop linking generations, are at the heart of second-order systemics [15]. It is from the latter that architectures should emerge to apply to the present challenges we face.
10. Emergence of Trans-disciplinary Architecture

One implication from the design, organization, and construction of the spaces we inhabit is that the emergent qualities bring preeminent importance to the trans-
disciplinary nature of architecture. It follows naturally from the systemics of architecture applied to a given space, because making an architectural decision increasingly has become a more complex endeavor. Some areas to consider are cultural elements; recognition of the unique qualities of indigenous materials; imaginative perspectives; knowing physical, physiological, psychological, social, and economic effects of the architecture on living beings; familiarity with current environmental conditions and fauna; knowing the perceiver’s angle of vision; the history of the place; and preconceptions of the inhabitants. All of these areas have a potential for inclusion in a particular architectural decision. Bringing a set of them together to define in part a given architecture recommends consultation with a range of experts, disciplines, and knowledge domains beyond the principal training and experience of the architect. Thus, to ensure successful completion of a project, the situation commands a systemic approach to organizing the space involved. A confluence of disciplines becomes important to consider and likely necessary, in order to design both conscientiously and consciously with the humans-for-nature attitude. This means a trans-disciplinary approach to making architectural decisions. This chapter has considered architectural phenomena and some aspects of architectural decision-making that would recommend organizing space for human habitation based on systemic and trans-disciplinary approaches. But articulation of the aspects often merely introduces key elements comprising the experience of those who made the original architectural decisions, and later those who occupy the place. From the relations among elements, specifically those that stem from various fields of study and disciplines of human experience and inquiry, we may see trans-disciplinarity emerge. Although matters of economics, physical design, perceptual cognitive relations, and engineering of structure are critical to applications of architecture, there are also psychological, socio-cultural, historical, and contextual influences to be included. For a particular place of human habitation, too much weight given to one aspect may have adverse consequences on the other aspects specifically and the entire space generally. Again, we must question the main principles driving the architectural process, such as balance or integration, mentioned earlier in this chapter.
11. Summary and Conclusion

Our experience of space influences our state of being, relationships with others, home and work life, and connectedness to context. The name induction is given to label this phenomenon. Induction is a mediating construct to suggest critical relations between architectures and human activities. The importance of the
consequence of induction is termed emergence, another phenomenon, defined as a quality, feature or characteristic of human interaction with the environment and others that is associated with, and intentionally attributed to, its inductive influences. Once the influences are known, their intentional confluence in making architectural decisions is termed convergence. When applied to developing human habitats, architectural induction, emergence, and convergence may become advantageous in promoting mutually beneficial humans-for-nature relations. The three architectural phenomena can have strategic and explanatory value in detecting and understanding these consequences, respectively. The presumption is that our heightened awareness of these phenomena, and of the framing we apply to decision-making, may better enable us to perceive acutely the influences of organized space on our well-being, human relations and activities; to evidence the multiple systems of which we are part; and to design more efficacious spaces for human beings and human activities. This chapter has been written with systemic and trans-disciplinary importance being given to the imposition of architecture in a place. Sensitivity to the phenomena of induction, emergence, and convergence is imperative. Well worth studying are the architectural decisions having relations to architectural designs and their consequential evocations. If we are to become more appreciative of and caring for our environments, and thereby have a quality of life, it is paramount that we understand and apply these relations as wisely as possible.
References

1. D. Lowenthal, J. of Environmental Psychology 7, 337 (1987).
2. A. Collen, Systemic Change Through Praxis and Inquiry (Transaction Publishers, New Brunswick, New Jersey, 2004).
3. P. Checkland, Systems Thinking, Systems Practice (Wiley, New York, 1981).
4. L. Fairweather and S. McConville, Prison Architecture (Architectural Press, New York, 2000).
5. C. Day, Spirit and Place (Architectural Press, New York, 2002).
6. V. Di Battista, Towards a systemic approach to architecture, in Ref. 15, p. 391.
7. A. de Botton, The Architecture of Happiness (Pantheon, New York, 2006).
8. M. Mobach, Systems Research and Behavioral Science 24, 69 (2007).
9. M. Dudek, Architecture of Schools: The New Learning Environments (Architectural Press, New York, 2000).
10. A. Ford, Designing the Sustainable School (Images Publishing Group, Victoria, Australia, 2007).
11. S. Holl, Parallax (Architectural Press, New York, 2000), p. 13.
12. G. Basalla, The Evolution of Technology (Cambridge, New York, 1988).
13. A. Stamps, Psychology and the Aesthetics of the Built Environment (Springer, New York, 2000).
14. J. Hendrix, Architecture and Psychoanalysis: Peter Eisenman and Jacques Lacan (Peter Lang, New York, 2006).
15. G. Minati, Towards a second systemics, in Systemics of Emergence: Research and Applications, Eds. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), p. 667.
QUESTIONS OF METHOD ON INTEROPERABILITY IN ARCHITECTURE

EZIO ARLATI (1), GIORGIO GIALLOCOSTA (2)
(1) Building Environment Sciences and Technology, Politecnico di Milano
Via Bonardi, 15 - 20133 Milan, Italy
E-mail: [email protected]
(2) Dipartimento di Progettazione e Costruzione dell'Architettura, Università di Genova
Stradone S. Agostino, 37 - 16123 Genoa, Italy
E-mail: [email protected]

Interoperability in architecture illustrates contemporary instances of innovation. It aims, through the standardization of instruments and procedures (and especially through shared languages of/in IT tools and applications), at the optimization of interactions amongst agents and of the work done. It requires, within a consistently non-reductionist systemic approach: (1) interactions and activities of conscious government in/amongst its fundamental component parts (politics, technical aspects, semantics); (2) the development of shared languages and protocols, to verify technical, poietic, etc., innovations which do not destroy accumulative effects and peculiarities (axiological, fruitional, etc.).

Keywords: systemics, architecture, innovation, sharing, interoperability.
1. Introduction

"Some might be filled with wonder watching a flock of birds, but such wonder derives from the impossibility of understanding their means of communication: wonder comes from a lack of comprehension; one cannot understand because the communication codes are unknown or, if one prefers, because there is a lack of interoperability between their language and ours" (Marescotti, in [1, p. 53], author's translation). In a similar way, in architecture, different languages and/or ineffectiveness between codes of communication in the transmission of data, information, etc., and in the operational instructions themselves, lead to interpretative difficulties: the latter often leading, at the very least, to inefficiencies and diseconomies in technical and management processes. In this way, interoperability in architecture aims at optimizing interactions amongst agents (as well as the work done), using shared common standards for processing/transmitting documents, information, etc.
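As a toy illustration of what a shared code of communication buys, the following Python sketch reconciles the vocabularies of two applications that name the same building entities differently. The term pairs and record layout are invented for the example; real semantic interoperability rests on standardized dictionaries and thesauri, not ad hoc tables like this one.

    # Sketch: a toy "dictionary/thesaurus" mapping each tool's local terms
    # onto a shared standard vocabulary, so records become mutually legible.
    THESAURUS = {  # local term -> shared standard term (invented examples)
        "tool_A": {"muro": "Wall", "condotto": "Duct"},
        "tool_B": {"partition": "Wall", "air_duct": "Duct"},
    }

    def to_shared(tool: str, record: dict) -> dict:
        """Translate a tool-specific record into the shared vocabulary."""
        mapping = THESAURUS[tool]
        rest = {k: v for k, v in record.items() if k != "type"}
        return {"type": mapping[record["type"]], **rest}

    a = to_shared("tool_A", {"type": "muro", "id": "w1"})
    b = to_shared("tool_B", {"type": "partition", "id": "w2"})
    print(a["type"] == b["type"])  # True: both now interpretable as "Wall"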
Interoperability, however, if consistently intended in a non-reductionist sense [4, pp. 84-86], and [1, p. 23], should "(...) be developed in three modalities: technical interoperability, of which one can clearly state that although IT techniques and tools (hardware and software) present no problems, problems do exist and lie in the ability to plan adequate cognitive and thus cultural models; semantic interoperability, which takes us back to interdisciplinarity and the construction of dictionaries and thesauri; political interoperability, reflected in law, in the value and various aspects of the law, and in data property and certification. On these questions, standards are studied and developed (...) through the activities of agencies such as ISO (International Organization for Standardization) and OGC (Open GIS Consortium) ..." (Marescotti, in [1, pp. 56-57], author's translation). In the same manner, the sharing of standards and protocols (which constitute the instrumental apparatus for interoperability), when used coherently, without detracting from the languages themselves, axiological particularities, etc., ensures:
• the development of cognitive and cultural models which are acquired (in the sense of improved relative synergies), or also tendencies towards new ones (more adequate and effective interactions);
• the validation of the former models as an interdisciplinary resource [4, pp. 49-51].
Only in this sense can interoperability in architecture find real meaning in an advanced systemic approach: this poses the fundamental questions of method for its use in a non-reductionist key.

2. From the industrialization of building to interoperability in architecture

From a rationalist point of view [3, pp. 30-40], the industrialization of building seals a relationship between architecture and industry in the sense of a possible mass production (and in terms of product standardization and the interchangeability of products and components). This is the result of those instances of innovation, typical of post-artisan choices and logic, pursued through approaches and operational praxis inspired by mass production (prefabricated parts and the industrialization of on-site castings) which, however, carry important implications for the design and management stages. This leads, especially with the use of construction techniques with high levels of prefabrication and/or industrialization of on-site castings, to standardizations which are not always
compatible with project poiesis, nor with more consolidated aspects of construction culture or living culture.
Over the past few decades, however, new needs and awareness regarding populating processes have emerged; often, moreover, such new needs and awareness transform obsolete connotations of progress and development into evident dichotomies: from growth and experience of localized hyper-population, to the containment of urban sprawl, the renovation of the pre-existent, and the sustainability of building activities. New assumptions develop regarding the architecture-industry relationship, deployed mainly:
• according to greater compatibility with the structural peculiarities of the former (indications of the singular nature of buildings, the limited opportunities for product standardization, etc.);
• consistently with the most important developments in the latter (series of moderate quantity, analogical series, etc.).
With such an evolution of the scenario, the traditional connotations of the industrialization of building are modified. They thus appear less and less characterized by the rigid assumptions of mass production [3, pp. 41-64]. More recent connotations of the industrialization of building, therefore, tend to follow the objectives of the standardization of instruments and procedures, minimizing mass-production aspects: and with the latter, any implications of possible offsets in the design stage and in many of the operational and management aspects of architecture. Amongst these objectives, technical interoperability leads, as mentioned, to a need for optimized integration of agents (and their respective activities), through shared languages, currently developed by specifications (called IFC Standards - Industry Foundation Classes) for the production/application of interoperable software [a]. Clearly, the use of the latter:
• represents an optimized scenario of operational integration, supporting programming, design, and the life-cycle management of building organisms, plant and infrastructure networks, etc.;
• becomes an important component of an instrumental apparatus for a systemic approach towards architecture, identifying, moreover, shared integrations, and is coherent with effectively non-reductionist approaches to its various aspects (functional, technological, energetic, structural, etc.);
• develops, above all, shared and optimum opportunities, even though only operationally, for the management and control of some of those factors (design, technical-operational, etc.) inducing processes of emergence in architecture [4, pp. 37-42, 98-99].
Thus the dichotomy (which still exists) between the unpredictability of emergence and the need for the prefiguration of architecture can, in some respects and to some extent, be reduced, also through the use of shared simulation and modeling for the preliminary management and control of possible interactions amongst those factors: design, operational, etc. More generally (and reiterating some previously mentioned aspects), interoperability in architecture, together with other instruments:
• overcomes previous connotations of predominant product standardization in building industrialization, endorsing late-industrial assumptions of predominant procedure standardization;
• requires/foresees mature connotations of a systemic approach to architecture, replacing traditional structural reductionism.
In the same way, in the development of interoperability, the risks explicit in the previous praxis of building industrialization are still present, although in different forms and ways. As mentioned above, in fact, emphasizing product standardization often implies behavior which potentially ratifies operational approaches and the consequent outcomes of building projects; in a similar way, the standardization of procedures and practices, using decision support systems whose manner of use often wipes out cultural peculiarities, for carrying out and managing architectural activities, in the safeguarding of memory, etc., may lead to:
• the possible removal, with those peculiarities, of the premises regarding the multiple languages, axiologies, distinctive traits, etc., of the various architectural contexts;
• unacceptable breaks in cultural expression and accumulation, and ratification and reductionism at such a level [b].
In this sense technical interoperability, when correctly understood, together validates:
• languages and shared operational contributions (and the efficiency of the work done),
• cultural premises and peculiarities regarding the many and varied architectural contexts, especially regarding the effectiveness and flexibility of the work done (related to project poiesis, axiological acquisitions, model outcomes, etc.),
and requires the removal of any technicist drift in the set-up and use of standards, protocols, etc. In the same way, as mentioned above, it also requires conscious government of, and interaction between, its modalities (according to Marescotti) of political interoperability and especially semantic interoperability: where one considers, in the validation and the more advanced development of those cultural premises and peculiarities, the roles and contributions ascribable to interdisciplinarity and the construction of dictionaries and thesauri (Marescotti, in [1, pp. 56-57]). Within this framework, interoperability, through consistent interaction between its modalities (technical, semantic and political), as mentioned above, combines advanced definitions from a systemic approach and prepares the ground for a non-reductionist development of architecture. Or rather: when accepted in a rigorously systemic manner, it acts at the operational levels of architecture while conditioning cultural settings and developments (and in this sense one is dealing with the validation of virtuous tendencies).

[a] The IFC standard, standardized internationally through ISO PAS - Publicly Available Standard 16739/2005, is open source software, and thus freely available to competent and expert users, and is run by the IAI - International Alliance for Interoperability (an international institution comprising researchers, public sector managers, industrial organizations, academics and university teachers). IAI-IFC develops applications of BIM (Building Information Model) concepts; it allows the representation, in an explicit, shared and thus interoperable way, of objects and their spatial-functional interrelationships. The BIM model consists of a unified information system whose component parts are explicitly marked in order of represented entities, geometric and typological correlations, and assigned characteristics: operation is regulated as a function of the tasks and responsibilities given to the various subjects holding given capabilities and decisional roles; each subject, while authorized to operate only on their own activities, can visualize the whole set of transactions in progress in the model. In this way the integration of the various decisions can benefit from the simultaneous checks regarding potential, undesired conflicts and/or interference, allowing adequate solutions to be found rapidly.
[b] Here, it should be stressed, more generally, that there is a risk of technological determinism, typical of sophisticated IT structures when not suitably controlled, especially regarding the man-machine interface: clearly, this risk also tends towards an uncontrolled process of technological proxy.
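To make the unified-model idea of note [a] concrete, here is a minimal sketch in Python. It is an illustration only: the class names, fields, and permission scheme are invented for the example and are not the IFC schema or any IAI-IFC API. It shows entities explicitly marked by identifier and type, per-actor authorization over each subject's own activities, and a transaction log visible to all.

    # Sketch of a shared building model: explicitly typed entities, actor
    # permissions, and a globally visible log of transactions in progress.
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class Entity:
        guid: str                           # stable identifier shared by all tools
        entity_type: str                    # e.g. "Wall", "Duct" (toy type names)
        attributes: Dict[str, object] = field(default_factory=dict)
        contained_in: Optional[str] = None  # spatial-functional containment relation

    @dataclass
    class Transaction:
        actor: str                 # the subject responsible for the change
        guid: str                  # entity acted upon
        change: Dict[str, object]  # proposed attribute updates

    class Model:
        """One unified information system: a store of marked entities plus
        a log of all transactions, visible to every participating subject."""
        def __init__(self, permissions: Dict[str, List[str]]):
            self.entities: Dict[str, Entity] = {}
            self.log: List[Transaction] = []
            self.permissions = permissions  # actor -> entity types it may edit

        def add(self, e: Entity) -> None:
            self.entities[e.guid] = e

        def propose(self, t: Transaction) -> None:
            etype = self.entities[t.guid].entity_type
            # each subject operates only on its own activities...
            if etype not in self.permissions.get(t.actor, []):
                raise PermissionError(f"{t.actor} may not modify {etype}")
            self.entities[t.guid].attributes.update(t.change)
            self.log.append(t)  # ...but can visualize all transactions in progress

    # Usage: a structural engineer and a plant engineer share one model.
    m = Model(permissions={"structure": ["Wall"], "plant": ["Duct"]})
    m.add(Entity("w1", "Wall", {"thickness_mm": 300}, contained_in="room1"))
    m.add(Entity("d1", "Duct", {"diameter_mm": 200}, contained_in="room1"))
    m.propose(Transaction("plant", "d1", {"diameter_mm": 250}))
    print([(t.actor, t.guid, t.change) for t in m.log])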
3. Methodologies of technical interoperability

3.1. Cultural origins of methodological equipment and contextual motivations for evolutionary developments

The methodological apparatus for technical interoperability in the processes of design, production and management of architecture has so far essentially been limited by a conservative approach: an approach whose main aim, as for most software companies, was rapid market success for IT applications. This bears clear witness to the immature nature of supply for the building industry (especially in Italy), compared with other sectors which are particularly competitive in global markets (regarding the optimization of the efficacy/efficiency of the work done), such as electronics, avionics, and high-precision engineering. It also bears witness to, and is an expression of, the separate and fragmented nature of the multifarious markets for building products, which preside over and choose, mistakenly rooted in a given geographical site, the main points of their specificity: so far these markets have been able to condition the various actors in building activities and the nature of the initiatives, on the basis of effective requirements for the validation of local cultures, acquired values, etc., but also in the face of an unmotivated unwillingness to develop procedures for the integration and sharing of technological and operational resources. It also follows from this (amongst the various doubts surrounding a persistent identity/mass-production dualism) that the existing systems of representation and processing on IT platforms are difficult to integrate, being aimed more at confining the need for cooperation between the various actors involved within the boundaries of specific product series [c]. But it is precisely this scenario which constitutes the object of a progressive change, typical of the current state of the processes of production and management of architecture, and mainly due to the rise of two decisive factors:
• the multiplication of the know-how needed to satisfy a set of requirements of increasing breadth and complexity, stimulated by the need to optimize the use of resources at continually higher levels of quality;
• the development of regulatory aspects, aimed at providing guarantees in terms of security and the certainty of quality through the whole of the building production line, faced with the increased attention being paid to economic aspects and social strategies for the creation of the constructed environment (whose sustainability crisis, as mentioned above, is now apparent).
Within this context of rapid transformation, the progressive modification of the building initiative requires the production line of the city and of its buildings to undergo a global re-thinking of its meanings:
• in relation to its economic and social points of reference,
• faced with a system of requisites which is no longer completely part of the traditional knowledge of the architectural disciplines.
Thus there is the need for the predisposition of a renewed thought scenario, before any other condition, capable of representing the wealth of interdependencies and interactions amongst the decisive factors in the design of architecture. In this way emerge the reasons for accepting technical, or better technological (given its meaning of cultural renewal), interoperability, with its procedural and instrumental corollaries, as an environment of operational resources for reaching the objectives of sharing between knowledge and know-how. The cognitive approach is the fundamental directing criterion for the adoption of the method of technological interoperability; its operational horizon is, in fact, that of a melting pot in which one can:
• contaminate with insight the demands from the various universes of reference (traditional disciplines and their decision-making instruments, so far separate),
• remodel the nature of the relationships and interactions amongst the factors determining the configuration of processes and products.
In this sense, the contribution of expert knowledge is fundamental: that is, the acquired knowledge of those with experience of the past effects and consequences of the behavior in operation of building objects faced with given design assumptions.

[c] Nevertheless (as previously mentioned) one may often have, due to other aspects (and as effects of mistaken ideas about innovation transfer and operational approaches towards interdisciplinarity): (1) the mere transference of procedures and equipment from other sectors towards the building industry, (2) instead of suitable translations into it (and thus coherent with its effective specificities) of practices and outcomes reached elsewhere.

3.2. Technologies for modeling data-flows

Techniques for virtual representations (the latter expressed through reference to an explicit expert knowledge-base and to a declared asset of
requisites and qualitative objectivesd, allow one the render coherent the evaluation of a design model even during its initial conception, through the support of a powerful software simulation environment. Further: the availability of advanced processing and simulation equipment is leading to the progressive loss of the aspects (still significant) of an exclusive and separate nature, allowing one to observe and cooperate in the development of a project down to its most minute details. These aspects thus offer the operators shared opportunities of representations, modeling and checks on the flow of transactions of meanings and values during the development of the design solutions, simulating its concrete realization as well as the later stages of use and management. In this way in architecture, as mentioned, the unpredictability of emergence can, to a certain extent, be directed following positive propensities of the component parts and interactions which produce it: parts and interactions, naturally, which can effectively be managed through forecast probabilistic hypotheses. Thus, in the force field between political, and especially semantic and technical interoperability (Marescotti, in [1, pp. 56-57], the structure of the cognitive instances which circumstantiate the design of architecture can take shape. For this, the powerful instrumental support is still that virtual model which configures the cooperating objects in the formation of the project, identifying the main interactions (semantic, functional, materials, technicalconstructional, maintenance, etc.). The essential intrinsic novelty in any real project, and its natural, innovative content, lies in the order taken on by the set of significant factors in a given context, in a given cultural and productive point in time, and aiming at a specific scenario of present and future requirements: from this point one proceeds, through successive translations of knowledge in their respective appropriate languages and through progressive transformations of identity, to the construction of a virtual model of solutions to the programme of requisites to be satisfied. Experimental control of the outcome of the design stage (traditionally very late in the life-cycle, and extremely expensive in terms of time and work due to comparisons which have to be made with alternative solutions) so far, because of the continuing insufficient development of design support technologies, has limited experimental activities on alternative designs mainly to within the sphere of the expert knowledge of individual designers and executors: d
One of the fundamental aspects, presiding over the introduction of technological interoperability into the architectural project, consists precisely in maintaining the possibility that the motivation/necessity network be rendered explicit, described, feasible and controllable by a community sharing the whole ensemble of aims and interests (or at least significant parts of it).
Questions of Method on Interoperability in Architecture
75
expert knowledge, moreover, predominantly implicit (the tradition lacking a system of representation of design values beyond formal-linguistic, constructive or structural critique), and thus not sufficient (at least in terms of efficiency) when substituting interoperable although fundamental resources, as previously mentioned, to ensure governance. Properly, the binomial governance - technical interoperability, the former intended as an impeding factor (amongst others) of technological determinisme, allows, however, optimum (and efficient) experimental activities. Interoperability offers the various specialist involved in the design the possibility of actually seeing the same identical model through their own specific system of software instruments, in which they have imported, read and processed the original relational database, without the need for re-processing, re-coding, etc., and avoiding approximate interpretationsf. It supports the design stage throughout its development, from its birth, to the definition of the model (in all its spatial and technological components), through the executive and operational definition, to the updating of the data on the edifice as built, down to the management and maintenance stages. It is then possible to proceed with the processing of the component parts (structures, plant, etc.), properly integrated and with optimum control of mutual interferences and overlap, updating the original model of set with the grafting of new components, and also processing families of comparable and superimposable models (thus providing possible design stage alternatives about which choices to make). The project stage thus described generates original potentialities for reaching the prefixed qualitative objectives. It does, in fact, allow the implementation of a praxis of experimenting with design solutions during their actual formulation, and optimizing the acquisition of gained experience by grafting it onto new projects with much greater rapidity with respect to traditional methods. e f
(e) See note b.
(f) The BIM modeling environment assumes the presence of a family of specialist software instruments, conceived to cooperate in the definition of the various sets of characteristic values defining the project; each is an expression of specific domains of expert knowledge deriving from the diverse disciplinary traditions, all of them operating on the basis of the ability to recognize and process the objects defined in the relational database, and together sharing: (1) the object-oriented concept, (2) 3-dimensional vectorial representation in Euclidean space, (3) the qualitative and parametric characteristics defined in the attributes. A further fundamental assumption is the existence of a conventional code for the description, interpretation and recognition of objects in their effective nature (in the sense of represented entities) as defined in the relational database. See note a.
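To make the shared-model mechanism described above concrete, the following minimal sketch (our illustration, not part of the experimentation reported here) reads one IFC file with the open-source ifcopenshell Python library and extracts two discipline-specific views of the same objects; the file name and the selection criteria are hypothetical, while the entity names (IfcWall, IfcSpace, ...) belong to the published IAI-IFC schema.

```python
# Illustrative sketch only: two disciplinary "views" of one shared IFC model.
import ifcopenshell

model = ifcopenshell.open("project.ifc")  # the shared relational database

# A structural specialist's view: load-bearing elements only.
structural_view = [e for e in model.by_type("IfcBuildingElement")
                   if e.is_a("IfcWall") or e.is_a("IfcColumn") or e.is_a("IfcBeam")]

# A facility manager's view: spaces with their identifying attributes.
for space in model.by_type("IfcSpace"):
    print(space.GlobalId, space.Name, space.LongName)
```

Both views read the same objects and attributes without re-processing or re-coding, which is precisely the point made above: each specialist instrument interprets the common database through the shared conventional code.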
The multitude of objects implemented in the relational database (each with its own attributes), moreover, with its inherent contributions of expert knowledge which can be made explicit ad hoc for different project situations (cultural, contextual, poietic, etc.), ensures significant increases in the possible variations in options and applications. Naturally, however, these do not exhaust the number of theoretically possible alternatives.
3.3. Paradigms of experimental application
Currently, interoperable technologies for architectural design (based on the IAI-IFC standards) require experimental application in pilot projects or similar, so as to draw into the experiment the know-how of the various agents who contribute to design and implementation decisions, and thus to verify the effectiveness of the modeling (g). To evaluate this effectiveness, and with it the ability to offer performance advantages to both agents and end-users, the results must be compared with various paradigms. Mainly, the possibilities of:
• increasing the levels of government of a project,
• proceeding to its most important interfaces,
• progressively understanding its main qualitative and economic aspects,
• modeling a project,
• sub-dividing it into its main phases,
• acquiring experience,
• facilitating successive monitoring of behavior during the life-cycle.
The increased possibilities of governing a project, facilitated by the IAI-IFC standard, clearly define prototypes optimized for positive correspondence with end-user requirements and with the expected quality levels defined by regulation: this means, especially, control of the efficacy of the experimental procedures on the basis of their achievable advantages (cost-benefit). Feasibility, and the checking of the important interfaces with ranges of further models on the building and micro-urban scales, allow control over the suitability for habitation and services required by the end-users, and/or over the opportunity of implementing innovative aspects. The possibility of progressively understanding, during project modeling, its main qualitative and economic aspects is fundamentally inscribed in the checks made retrospectively with respect to the useful life-cycle of the end-product; in
(g) Applied experiences of this type are already under way in the BEST Department at the Politecnico di Milano.
this sense the design stages are considered strategic in terms of logical and temporal priorities, also as a function of their interoperability with successive stages (quality and economies in construction, maintenance, etc.). The effectiveness of modeling a project also concerns the quality and rapidity of interaction with evaluations of cost and of the technical-operational process (calculations, estimates, etc.). Articulation into its main phases also concerns the identification of building suppliers' products to be integrated into the construction, management and maintenance cycles (libraries of interoperable products). The acquisition of experience (implemented, coded and shared within the relational database) is also functional to its suitable re-use in future activities. The monitoring of the behavior of the manufactured work during its life-cycle can also be facilitated by the implementation, in a single relational database, of the results of the initial simulations and of those carried out during successive operations.
4. Conclusions
It becomes particularly clear, partly reiterating some of the above aspects, how the theoretical-methodological approaches to technical interoperability, and the outcomes of the experience of it gained so far, suggest for architecture the need for, and the opportunity of, developing its cultural statutes and the sense of its formal and symbolic languages, which are still often reduced, confined within self-referentiality, conserving supposedly purifying objectives and a presumed intangibility of their relevance. The innovative opportunities offered, and especially the reasons inferable from the current historical context (social, economic, environmental sustainability, etc.), can no longer be inscribed within single disciplinary outlooks, or the breaking up of disciplines, or separated expertise: those opportunities and reasons, on the contrary, require the fundamental integration of disciplines and expertise, an optimized osmosis of specialist rigor, expert knowledge, etc., capable of shared syntheses at the highest levels of generality (and not vagueness), and prospectively a harbinger of positive outcomes. In this sense, interoperability in architecture, as seen above, acquires decidedly systemic connotations. In fact, it promotes forms of collective, shared, multidisciplinary and positively irreverent knowledge; it is aware of synergy and does not obliterate, on principle, the specificities and peculiarities of contributions (expert knowledge). On the contrary, it implements the sharing of a cognitive model amongst the agents.
Such an innovative tendency is even more striking when one considers (and to a much greater extent with respect to other productive sectors):
• the traditional (still physiological) heterogeneities of the operators in building activities,
• the frequency of conflict amongst them,
• the lack of willingness towards common codes of communication amongst the various skills,
• often, the diversification of interests (also due to the existence of persistent disciplinary barriers),
• etc.
Moreover, the main objective of past experience in the industrialization of building, namely the development of the building sector in a (precisely) industrial sense, can now also be pursued, although with different strategies, and at least in terms of adopting ideally structured (and shared) operational practices, through the contribution of interoperable resources. The latter, however, still require further experimentation and elaboration, especially, as far as is relevant here (and as observed above), regarding developments in a non-reductionist sense. Technical interoperability, in this case, certainly requires further development and improvement, for example in the optimum rearrangement of cognitive and cultural models (Marescotti, in [1, p. 56]), in greater flexibility of protocols and instrumentation, etc. But above all, and of fundamental importance, confirming once again what has been said above, is coherent interaction amongst the technical, political and semantic components of interoperability in architecture, which will render it completely suitable for:
• encouraging advanced and shared developments,
• favoring positive tendencies of components and interactions which lead to emergence,
• producing efficient practices of diversity management.
And not only in principle.
References
1. V. Di Battista, G. Giallocosta, G. Minati, Eds., Architettura e Approccio Sistemico (Polimetrica, Monza, 2006).
2. C.M. Eastman, Building Product Models: Computer Environments Supporting Design and Construction (CRC Press, Boca Raton, Florida, 1999).
3. G. Giallocosta, Riflessioni sull'innovazione (Alinea, Florence, 2004).
4. G. Minati, Teoria Generale dei Sistemi, Sistemica, Emergenza: un'introduzione (Polimetrica, Monza, 2004).
COMPREHENSIVE PLANS FOR A CULTURE-DRIVEN LOCAL DEVELOPMENT: EMERGENCE AS A TOOL FOR UNDERSTANDING SOCIAL IMPACTS OF PROJECTS ON BUILT CULTURAL HERITAGE
STEFANO DELLA TORRE, ANDREA CANZIANI
Building Environment Science & Technology Department, Polytechnic of Milan
Via Bonardi 3, 20133 Milano, Italy
E-mail: [email protected], [email protected]
Cultural Heritage is comprehensible within an integrated vision, involving the economic, cultural and ethical values typical of non-renewable resources. It is an open system that does not correspond just to monuments, but is made up of the complex interactions of a built environment. The systemic relationships between cultural goods (object, building, landscape) and their environmental context have to be considered as being of the same importance as the systemic relations established with stakeholders/observers. A first partial answer to the systemic nature of Cultural Heritage has been the creation of "networks" of cultural institutions, which have subsequently evolved into "cultural systems" and have recently been followed by "cultural districts". The Cultural District model puts forward a precise application for the theory of emergence. But its systemic nature also presents some problematic identifications. For Cultural Heritage the point is no longer limited to "direct" actions: we must consider stakeholders/observers, feedback circuits, and the emergence of the activation of social/cultural/human capital, more than that linked to the architectural design process. (a)
Keywords: local development, relation object-user, Heritage, network of relationships.
1. Cultural Heritage: between nature and history
1.1. Cultural "things" or cultural "heritage"
A new vision of the role of Nature and of Cultural Heritage may be found in the well-aware transdisciplinary vision of man and of his works within ecosystems and the environment which arose during the second half of the last century. It is enough to remember the first report of the Club of Rome, The Limits to Growth (1972), or the UNESCO Convention concerning the Protection of the World
(a) While the ideas expressed in the present paper derive from common analyses and reflections, the writing of the first part should be attributed to Stefano Della Torre (1.1, 1.2) and the second to Andrea Canziani (1.3, 2, 3).
Cultural and Natural Heritage (1972) and the ICOMOS Amsterdam Declaration (1975). The idea that Cultural Heritage is formed by "all the goods having a reference to civilisation history", and that a Cultural Good is every "material evidence having value of civilisation" [1], was affirmed internationally in 1954 by The Hague Convention [2] and in Italy by the statements of the Commission for the Protection and Enhancement of the Historical, Archaeological, Artistic, and Natural Heritage (Commissione di indagine per la tutela e la valorizzazione del patrimonio storico, archeologico, artistico e del paesaggio), commonly known as the Franceschini Commission after its chairman, established in 1964. This overtakes the aesthetic conception on which earlier laws had been based, in favour of a wider idea of cultural value: a value that includes every piece of tangible and intangible evidence and is not limited to aesthetic or historic excellence. It is the expression of "a concept of culture that, against the imbalance caused by fast economic and social modifications [...], assumes a broad anthropological sense and puts Nature in touch with History. Therefore artistic and historic heritage is considered as a 'concrete entity of site and landscape, of survival and work' uninterruptedly placed over the territory, and therefore not open to being considered separately from the natural environment, in fact coinciding at last with Ecology itself" [3, pp. 30-31]. The relevance of giving an open definition and bypassing a merely visual approach is due to the fact that persisting in a preservation of "things of great value" – handled one by one as a consequence of the aesthetic exceptionality that caused their condition of being under preservation laws – means persisting in the vision of Cultural Heritage as belonging to its own separate category, where the value is expressed by a "declaration of interest" that cannot and shall not go beyond the merits of that particular "thing". A first consequence of this state of things is to separate the goods from their territorial and social context: the very context that produced them. A second consequence is to divide a world where cultural value is absolutely prevailing, so that any restoration cost seems admissible, from another world where cultural value is instead absolutely negligible. A third consequence is the division between the object and its history: the "work of art" is considered merely as the materialization of a pure artistic value, forgetting its documentary evidence. In this way, preservation based on single value declarations singles out – i.e. divides, isolates – each object to be preserved, which is thereby excluded from the evolution of its active context – social and economic. Only a symbolic and aesthetic value is attributed to the goods, and they are used, at most, as tourist attractions. It is
unnecessary to underline the coherence between this preservation model and the idea of restoration una tantum or, better still, once and for all. And it is unnecessary to stress its distance from systemic visions and sustainable strategies. This terminological change should have implied a change in general perspective [4]: both a shift from a passive, static preservation to a proactive one, based on prevention and maintenance, and a different attention to the types and modes of enhancement and management activities, the latter directed to support an open and real common fruition of the cultural value embodied in the good. Nevertheless, this has only partially happened.
1.2. A new idea of culture: heritage as open system
Cultural goods are indeed an open system that does not correspond simply to monuments. Attention should move beyond single elements, considering their inclusive interactions and "recognizing in the general environment the more important informative document" [3, p. 31]. The idea that attention should be given not to single goods or to their sum [3, pp. 44 ff.; 6, pp. 78 ff.] – the idea of the catalogue as a taxonomic enumeration that should exhaust the global knowledge of a system – but to their interactions opens onto wider systemic visions. Cultural Heritage is only comprehensible within a global and integrated vision, involving the economic, cultural and ethical values typical of non-renewable resources. Therefore neither restoration as a sum of isolated interventions, nor the separation of protection processes from their territorial context, makes any sense [7; 15, p. 13 ff.]. This is the basis of a conservation defined as a coherent, coordinated and planned activity. From this perspective the value does not consist in the tangible object, but in its social function, "seen as an intellectual development factor for a community and as a historical element which defines the identity of local communities" [4,8]. Nowadays we know beyond doubt that Cultural Heritage is a source of exchange, innovation and creativity [9]. Therefore speaking of enhancement means referring to choices and acts that allow the potentialities of a cultural good to be used to create, mainly, a social advantage. Of course, even if enhancement has to do with the possibilities of use/fruition by an ever larger public, in both a synchronic and a diachronic sense [10], it cannot set protection aside. An object that becomes part of heritage must preserve the values that make of it a historical evidence, a symbol of cultural identification for a community. These values consist in its
authenticity, which is always tangible, related to its material body. The evidence value consists in the fabric, because memory depends on the signs of passing time on the fabric. Without preserving all the richness of the signs in their authenticity, the evidence itself is missed, and there is no emergence process of the value that originates from the meeting between the object and people's histories. The systemic relationships between cultural goods and their environmental context have to be considered as being of the same importance as the systemic relations established with individuals and society; and the latter perhaps exercise an even deeper influence. This means that we have to give substance to enhancement not only at the level of actions on each object/building/landscape (allowing fruition, minimizing carelessness, providing support services and rights management [11], ...), but also by working on the territorial context (transport system, accommodation facilities, ...), on environmental quality [12] and on the improvement of the social context (comprehension and recognition, involvement of users in care, planned conservation processes, ...). From this viewpoint the integration between different territorial elements is crucial to reach what we might call "environmental quality". This is one of the main strategic aims for the protection of cultural goods, not seen as separate entities whose aging has to be stopped, but as systemic elements whose co-evolutionary potentialities we have to care for [13,17]. Cultural Heritage accessibility also involves the issues of mass democratic society and cultural citizenship rights [14; 3, pp. 89, 255], of social inclusion and cultural democracy [16]. The relationship between use/enhancement and socio-historic values [17, pp. 10-13], with its accent on the user/observer, outlines an idea of heritage that can have both a direct influence on development, as innovation/education, and of course an indirect influence on the economic system. What may look like an inversion between direct and indirect benefits is only apparent: indeed, if development is knowledge, economic benefits are the main positive externalities of knowledge (b) [18]. What conserving cultural goods means is wonderfully expressed by Andrea Emiliani when he writes: "It was no longer possible to imagine that a painting did not belong to a certain church, and that that church, in its turn, was not an integral part of a certain city, of a certain landscape, of a certain economy and of a certain society. It was no longer possible to overlook the fact that, however
(b) "All that is spent during many years in opening the means of higher education to the masses would be well paid for if it called out one more Newton or Darwin, Shakespeare or Beethoven." (A. Marshall, 1920 [5, IV.VI.26]).
many the interventions of consolidation and restoration might be, the overall result would have opened a new phase of precariousness and risk for historical buildings which had indeed been put back in order, but which were destined to suffer the consequences of the ever more rapid and unstoppable weakening of their territorial contexts. Anyone with even a remote experience of such matters had by then to know very well that restoring a painting in a mountain church, situated in an area afflicted by depopulation, had no more than an interlocutory significance, or at most one of mere physical conservation: a landslide would later strike that church; deforestation might facilitate that landslide; the socio-economic fragility of that district would accelerate the deforestation... and what use would it then have been to have restored that painting, to have removed it from its cultural context, to have, in the end, subtracted it from its surviving social function, and finally, by its very absence, to have aggravated the conditions of that area? And if Raphael can survive even in the undoubtedly rarefied atmosphere of the museum, for a painting of lesser historical or qualitative interest life in its original context is everything, or almost. To remove it from there means carelessly bringing about that fearsome phenomenon of 'déracinement' which is the most dangerous attack that could ever be organized against a cultural object" [19]. Considering the whole territory as a common good means that its control mechanisms have to deal with heritage conservation and evolution. That means having to deal with participatory management, with scenarios of shared design, and with comparison with studies from other disciplines, such as politics, economics, and the social or biological sciences. These are the frameworks of governance, of systems like cultural districts, of a transdisciplinary approach.
1.3. From integrated cultural systems to cultural districts
The first partial answer to the systemic nature of Cultural Heritage has been the creation of "networks" of cultural institutions – such as museums or libraries – which have subsequently evolved into "cultural systems" (c) [20]. The "cultural system" refers to ideas of programming and management rationalization, expressing a deeper aim of generalization. The addition of an adjective like "integrated" expresses awareness of the importance of connections with the territory and of the diversification of resources. But planned and controlled actions still
(c) "Many of the considerations accompanying the initiatives of 'bringing museums into a system', however, such as those concerning the application of museum standards, tend to consider the system an organizational figure capable of making up for the supposedly insufficient dimensions of local museums, allowing them to adopt behaviours similar to those of larger ones." (Maggi and Dondona, 2006 [15, p. 6 ff.]).
remain central, as if dealing with a predictable mechanism of networks and transmissions. The integrated cultural systems have recently been followed by "cultural districts". As Trimarchi said, "Looking carefully, the axiom was easy. Italian industrial production [...] has developed in this complex unit, with a renaissance taste, in which the features that for big industry could be defects (informality, the importance of family relations, the role of decoration, etc.) are instead irreplaceable virtuous elements" [21]. The district model at last seems to be the answer to all the enhancement problems of a Cultural Heritage that so often comes out as Italy's basic economic resource [22]. Starting from the success of many cultural districts within urban environments, a set of models has been developed for new scenarios of culture-driven development, trying to deal also with situations very different from urban ones. And so, while the model is still under the analysts' lens, quite a few prototypes are beginning to be applied: usually combinations of cultural institutions, or development initiatives collected under some cultural symbol, whose establishment as a system should produce profitable spin-offs, never quite well defined at the moment but sure and wide-ranging in the future [23]. But bringing together districts, production, development and culture is not easy. A cultural district has been defined, from a theoretical standpoint, as the product of two key factors: "the presence of external agglomeration economies and the awareness of the idiosyncratic nature of culture [which is peculiar to a given place or community and to a specific time]. When these two factors join within a dynamic and creative economic environment, the conditions for having a potential cultural district are satisfied. Adding efficient institutions is the political factor that can transform a potential district into a real outcome" [24]. Attention is concentrated mainly on the creation of a value chain and on the main role played by the organization of so-called Marshallian capital: that "industrial atmosphere" of continuous and repeated transactions which causes information to circulate. There are actually several connotations of the cultural district [24,25,26,32]. The term recalls an industrial/economic matrix, and therefore the idea of incomes generated by Cultural Heritage, or even of a commercialization of culture. But a more detailed analysis makes clear that such a connotation has been discarded, and that the expression is used because of its relationship with local community participation, with responsiveness to government incentives, and with the capability of such a system to produce and spread innovative cultural issues and external
economies connected with innovation [23]. The idea of a district stresses the added value of concentration and localization, but also the emergence of these processes. The cultural district idea is linked to an inclusive vision that can re-discuss and understand, on the one hand, the role of Cultural Heritage within the development economies of a single territory and, on the other hand, "the deep change in the role of culture within contemporary society and its present intellectual and emotional metabolisms" [27]. It is possible to recognize in people's mental space the main infrastructure at which programming and planning have to aim. The quality of each action has the possibility of improving the cultural capital, i.e. the local community's capability [28]. From the viewpoint of conservation studies, where we investigate the cultural mechanisms and practical procedures that form the basis of architectural heritage preservation, this model is particularly interesting. The systemic links among heritage, territory and society represent the cutting edge of preservation. Moreover, the accent that the district model puts on user participation is in tune with the most up-to-date visions of the shift from government to governance, and with conservation as an activity made up of studies, capability and prevention.
2. Emergences between cultural districts and architectural heritage
The cultural district model puts forward a precise application for the theory of emergence. Starting from the systemic nature of Cultural Heritage, we observe that preservation and enhancement procedures present complex interactions between elements, and that the properties of heritage as a whole are not deducible from single objects. We need strategies going further than the search for a local perfection. These same strategies need to be self-controlled and adaptive [29] in order to answer to the evolutionary nature of Cultural Heritage. Moreover, we have to consider that what is heritage and what is not is absolutely observer-dependent, with a crucial role played by the observers' awareness of the process. Within this framework the Dynamic Usage of Models (DYSAM) [30] is particularly suitable, having to deal with different stakeholders' expectations and with the modelling of social systems [31]. But the systemic nature of a cultural district, and of every single object, building or landscape, also presents some problematic identifications. A district is clearly an emerging system because of the nature of the interactions and behaviour of its elements; but how can interactions act on a common property? And to which set do these elements belong? It is not obvious that a church
and a vineyard belong to the same set. A mere geographical link is too weak, and the district link would be self-referential. The only real link is due to the sense of belonging to Cultural Heritage, and it requires a conscious act of acknowledgment. Since a system does not behave as a machine, where each component plays a role and can be replaced without the need to act on the others, any change influences the whole system. No intervention on the system is possible by just acting on its elements, and the characteristics of a system can be influenced only by more complex interventions on the interactions of components over time. How does that reflect on the conservation of each single object when we deal with Cultural Heritage? "A building is considered a dynamic system, where technological interventions inspired by the concept of machine and repair, based on reductionist simplifications, are absolutely inadequate. The task is to identify the level of description where those emergence processes that maintain the materiality of the structure occur, and to support them" [30, p. 92]. But what is the communication between a single building and the territorial complex? The possibility of working on a single item without prejudice to the system may indeed be a useful property for system conservation. But for Cultural Heritage we must take into account the impossibility of replicating the material support, and the need for specific processes that require studies and a non-trivial – non-standardized – knowledge. It is therefore evident that the point is no longer limited to "direct" actions on the good: we have to consider stakeholders – the observers – and feedback circuits. According to Crutchfield [33] there are three possible types of emergence: the intuitive definition (something new appears), pattern formation (an observer identifies organization), and intrinsic emergence (the system itself capitalizes on the patterns that appear). If only the last one is a correct and complete specification of systemic properties, a few more doubts arise about its applicability to districts. How might we speak of systemic properties not reducible to single elements' properties, but emerging from their interactions? Is it possible to speak of unpredictable behaviours that lead to the need for a new cognitive model? At first glance, economic interactions between, for example, the agricultural and construction sectors do not seem to be unpredictable, and the same might hold for interactions between built objects and quality of life. We should more properly speak of "non-trivial consequences" of a model. However, these consequences are predictable by analyzing the model, and this leads us back to the second type of emergence. Architecture is something that comes from a design process, whose basis is the prediction of use, deriving from forecasts and intentions. If an emerging behaviour is something that was not in the designer's aims, then there are no such
behaviours for architectural objects. In architecture a real emergence is possible only when we recognize in a building values that were not foreseen in the design or construction process. That is exactly what usually happens to something recognized as cultural heritage. From this viewpoint, the emergence linked with the activation of social/cultural/human capital is much more interesting than that of the architectural design process – which still remains unclear [34].
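As a purely illustrative reading of the Dynamic Usage of Models mentioned above – understood here, under our own simplifying assumption, as keeping several non-equivalent models of the same system in use simultaneously and weighting them by the context of observation – a sketch might look as follows; all names, models and weights are invented for the example.

```python
# Ensemble-style sketch of a DYSAM-like usage of models (illustrative only).

def dysam_evaluate(models, state, context):
    """Combine non-equivalent models instead of committing to a single one.

    models: dict name -> (model_fn, relevance_fn); relevance_fn returns a
    weight in [0, 1] expressing how pertinent the model is to the context.
    """
    weighted = [(model(state), relevance(context))
                for model, relevance in models.values()]
    total = sum(w for _, w in weighted)
    return sum(v * w for v, w in weighted) / total if total else None

# Two toy stakeholder models of the same heritage site.
models = {
    "tourism":   (lambda s: 0.7 * s["accessibility"],
                  lambda c: 1.0 if c == "enhancement" else 0.2),
    "community": (lambda s: 0.9 * s["authenticity"],
                  lambda c: 1.0),
}
print(dysam_evaluate(models, {"accessibility": 0.8, "authenticity": 0.6}, "conservation"))
```

The structural point is only that no model is discarded a priori: the weight of each changes with the observer's context, which is what makes such an approach suitable for dealing with differing stakeholders' expectations.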
3. From events to processes: integration of preservation and enhancement processes with other territorial structures
The increase of human capital is recognized as a fundamental basis for development, and culture can precisely steer growth processes that might otherwise become a threat to heritage. The reference is to the basic role of local community participation in the cultural district, the ready response to government incentives, the ability to produce and spread innovative external economies connected with innovation: that is to say, the emergent character of the district. The evolution of culture has always consisted in the capability to interact, building a knowledge network. Let us recall here the ideas of "acting/living local and thinking global" [35], the opposite of predominant projects where the role of intelligence is restricted to immediate and circumscribed problems – thinking microscopically – while consumption is global. That is not only a matter of levels of description. It is the case of the peripheral thinking at the basis of the Internet: "With the Internet, decisions were made to allow the control and intelligence functions to reside largely with users at the 'edges' of the network, rather than in the core of the network itself" [36,37]. Conservation and enhancement processes could act as a catalyst for quality, innovation, creativity and research also in the peripheral zones of territory and society. Users, and especially the local community, need to be involved, using a contextual approach that, through narration and participation, leads to new learning and awareness [18]. That is the recognition, or even the construction, of the identity connected to Cultural Heritage. Fixed identities must be re-thought in the present time, when everyone lives a multiplicity of belongings, less and less linked to territorial definitions. We need to develop the idea of cultural value not as a guardian of tradition, but as something emerging from the meeting between heritage elements and people's internal space [38]. Within this framework the meeting is not just a single one, but a history in which each point of contact is a new deposit of value, renewed by each event. It is the construction of a
dynamic identity, built not just on consolidated values but on global and hybrid relationships. From this standpoint cultural diversity is seen as "necessary for humankind as biodiversity is for nature. In this sense, it is the common heritage of humanity and should be recognized and affirmed for the benefit of present and future generations" [39]. That is the reference for sharing values and for giving the right weight to cultural performance [40]. Within this frame of reference the built heritage has a basic catalyst role because of its easily recognizable importance, its use, its high visibility. But the classical loop – investments, growth, profitability, investment – encounters difficulties when dealing with public goods, characterized by dense interconnections between environmental complexity and stakeholders. When investments in Cultural Heritage conservation give rise to new knowledge and education, a new loop is established: the heritage is better understood, identity is enriched or reformulated, there is a new awareness of the importance of taking care, and the premises for participatory conservation are in place.
References
1. "Report of the Franceschini Commission on the Protection and Use of Historical, Archaeological, Artistic and Natural Heritage", Rivista trimestrale di diritto pubblico, 16, 119-244 (1966).
2. UNESCO, Convention for the Protection of Cultural Property in the Event of Armed Conflict with Regulations for the Execution of the Convention (The Hague, 14 May 1954).
3. M. Montella, Musei e beni culturali. Verso un modello di governance (Mondadori Electa, Milano, 2003).
4. G. Pitruzzella, Aedon, 1, 2.6 (2000).
5. A. Marshall, Principles of Economics (Macmillan and Co, London, 1920).
6. S. Settis, Italia S.p.A. L'assalto al patrimonio culturale (Einaudi, Torino, 2002).
7. P. Petraroia, "Alle origini della conservazione programmata: gli scritti di Giovanni Urbani", TeMa, 3 (Milano, 2001).
8. C. Fontana, in L'intervento sul costruito. Problemi e orientamenti, Ed. E. Ginelli (Franco Angeli, Milano, 2002), p. 15 ff.
9. UNESCO, Universal Declaration on Cultural Diversity (Paris, 2001); S. Settis, "Le pietre dell'identità", Il Sole 24 Ore, 13 November 2005, p. 29.
10. G. Pastori, Aedon, 3, 1.6-8 (2004).
11. G. Guerzoni, S. Stabile, I diritti dei musei. La valorizzazione dei beni culturali nella prospettiva del rights management (Etas, Milano, 2003).
12. A. Cicerchia, Il bellissimo vecchio. Argomenti per una geografia del patrimonio culturale (Franco Angeli, Milano, 2002).
13. S. Della Torre, in Ripensare alla manutenzione. Ricerche, progettazione, materiali, tecniche per la cura del costruito, Ed. G. Biscontin, G. Driussi (Venezia, 1999); S. Della Torre, G. Minati, Il Progetto sostenibile, 2 (2004); S. Della Torre, Arkos, 15 (2006).
14. M. Maggi, Ed., Museo e cittadinanza. Condividere il patrimonio culturale per promuovere la partecipazione e la formazione civica, Quaderni Ires, 108 (Torino, 2005).
15. M. Maggi, C.A. Dondona, Macchine culturali. Reti e sistemi nell'organizzazione dei musei (Ires, Torino, 2006).
16. L. Fusco Girard, P. Nijkamp, Eds., Energia, bellezza, partecipazione: la sfida della sensibilità. Valutazioni integrate tra conservazione e sviluppo (Franco Angeli, Milano, 2004); Economia della Cultura, 14(4) (Il Mulino, Bologna, 2004).
17. L. Fusco Girard, Risorse architettoniche e culturali: valutazioni e strategie di conservazione (Franco Angeli, Milano, 1987).
18. D. Schürch, Nomadismo cognitivo (Franco Angeli, Milano, 2006).
19. A. Emiliani, Dal museo al territorio (Alfa Editoriale, Bologna, 1974), pp. 207-208.
20. L. Zanetti, "Sistemi locali e investimenti culturali", Aedon, 2 (2003).
21. M. Trimarchi, Economia della cultura, 15(2) (Il Mulino, Bologna, 2005), p. 137.
22. "L'arte, 'petrolio d'Italia'", in S. Settis, Italia S.p.A. (2002), p. 30 ff.
23. P.L. Sacco, S. Pedrini, Il Risparmio, 51(3) (2003).
24. W. Santagata, Economia della cultura, 15(2) (Il Mulino, Bologna, 2005), p. 141.
25. P.L. Sacco, G. Tavano Blessi, Global & Local Economic Review, 8(1) (Pescara, 2005).
26. P.A. Valentino, Le trame del territorio. Politiche di sviluppo dei sistemi territoriali e distretti culturali (Sperling & Kupfer, Milano, 2003); P.A. Valentino, "Strategie innovative per uno sviluppo economico locale fondato sui beni culturali", in La storia al futuro: beni culturali, specializzazione del territorio e nuova occupazione, Ed. P.A. Valentino, A. Musacchio, F. Perego (Associazione Civita, Giunti, Firenze, 1999), p. 3 ff.
27. M. Trimarchi, Economia della cultura, 15(2) (Il Mulino, Bologna, 2005), p. 138.
28. A. Sen, Rationality and Freedom (Harvard Belknap Press, 2002).
29. S. Guberman, G. Minati, Dialogue about Systems (Polimetrica, Milano, 2007).
30. G. Minati, Teoria Generale dei Sistemi. Sistemica. Emergenza: un'introduzione; progettare e processi emergenti: frattura o connubio per l'architettura? (Polimetrica, Milano, 2004).
31. G. Becattini, in Il caleidoscopio dello sviluppo locale. Trasformazioni economiche nell'Italia contemporanea, Ed. G. Becattini, M. Bellandi, G. Dei Ottati, F. Sforzi (Rosenberg & Sellier, Torino, 2001).
32. A. Canziani, Beni culturali e governance: il modello dei distretti culturali, Ph.D. dissertation (Politecnico di Milano, Milano, 2007).
33. J.P. Crutchfield, in Physica D, special issue on the Proceedings of the Oji International Seminar "Complex Systems: from Complex Dynamics to Artificial Reality", 5-9 April 1993, Numazu, Japan (1994).
34. V. Di Battista, G. Giallocosta, G. Minati, Architettura e approccio sistemico (Polimetrica, Milano, 2006).
35. L. Sartorio, Vivere in nicchia, pensare globale (Bollati Boringhieri, Torino, 2005).
36. V. Cerf, U.S. Senate Committee on Commerce, Science, and Transportation, Hearing on "Network Neutrality" (7 February 2006).
37. F. Carlini, "Io ragiono solo in gruppo", Il manifesto, 25 July 2004.
38. U. Morelli, Ed., Management delle istituzioni dell'arte e della cultura. Formazione, organizzazione e relazioni con le comunità di fruitori (Guerrini, Milano, 2002).
39. UNESCO, Universal Declaration on Cultural Diversity (Paris, 2001).
40. A. Canziani, M. Scaltritti, Il Progetto sostenibile (2008, in press).
SYSTEMIC AND ARCHITECTURE: CURRENT THEORETICAL ISSUES
GIORGIO GIALLOCOSTA
Dipartimento di Progettazione e Costruzione dell'Architettura, Università di Genova
Stradone S. Agostino 37, 16123 Genoa, Italy
E-mail: [email protected]
Systemics approaches towards architecture, traditionally within a structuralist framework (especially within a technological environment), may evolve in a non-reductionist way through:
- non-reductive consideration of the role of human requirements in the definition of inhabited spaces;
- acceptance of the use-perception dialogical relationship, and more generally of the art-science nexus, as being characteristic of architecture.
Likewise, there are theoretical issues in the development of systemics, particularly within the discipline of architecture, including:
- the role of the observer, in the constructivist sense and within the objections raised by scientific realism;
- the unpredictability of emergence, with its related limits (of purely ontological significance).
Keywords: systemics, architecture, emergence, observer.
1. Introduction
A great amount of experience with the systemic approach towards architecture distinguishes studies and applications in the various disciplinary environments which operate within that context. Sometimes accepting the more important developments of systemics, in other cases reiterating the classical concepts of Systems Theory, such experience, however, does not appear to be significantly projected toward the objectives of disciplinary recomposition. In effect there remains, especially in Italy, the anti-historical counterposition of scientific culture against artistic culture, which still characterises all the relationships between the diverse disciplinary aspects in architecture, and any project of an interdisciplinary nature. For example, in Italian Faculties of Architecture, within the different environments (operational, professional, etc.), there are clear separations between project approaches and project cultures (requirements-based, poietic, morphogenetic, etc.).
The architectural project, on the other hand, when oriented towards allowing the optimum use and management of its many implications (social-technical, economic, perceptive, etc.), requires suitable interdisciplinary elaborations/applications, to be governed, given the mutual interactions and emergent effects, through transdisciplinarity (Minati, 2004 [12, pp. 37-42 and 49-52]). Similarly, the importance of infradisciplinary research should not be underestimated (regarding the epistemological foundations and methodologies which are fundamental in obtaining specialist rigour). The related problems are not only of an operative or applicational nature, but also concern (with these) the need/opportunity to put architecture to the test in its multiplicity of components (compositive, technological, social, economic, etc.) and in the interactions produced. These issues, identified here above all as being of a theoretical nature, lead to just as many problems regarding architecture and the systemic approach. For the former, it is useful to work with conceptually shared definitions of architecture, inferring from these suitable directions/motivations for scenarios of a systemic nature. For the latter, one needs to recognize, within the developments of systemics itself, those problems of major importance for architecture.
2. Possible shared definitions of Architecture
Numerous definitions of architecture can be found in recent contributions, and in the historiography of the sector. For example, amongst those ascribable to various Masters (whose citations are given by Di Battista, in Di Battista et al., Eds., 2006 [5, p. 39]):
• Architecture "(...) can be seen as the twin of agriculture; since hunger, against which men dedicated themselves to agriculture, is coupled to the need for shelter, from which architecture was born ..." (Milizia, author's translation);
• "(...) to build, for the architect, is to employ materials according to their qualities and their own nature, with the preconceived idea of satisfying a need by the simplest and most solid means ..." (Viollet-le-Duc);
• "architecture is the masterly, correct and magnificent play of volumes brought together under the sun (...)", and also, "(...) the Parthenon is a selected product applied to a standard. Architecture acts upon standards. Standards are a fact of logic, analysis, scrupulous study, and derive from a well-
defined problem. Experimentation fixes, in a definitive manner, the standard ..." (Le Corbusier, author's translation).
Amongst these, a definition by William Morris in 1881 describes architecture as the moulding and altering to human needs of the very face of the earth itself, except in the outermost desert (Morris, 1947, cit. in Benevolo, 1992 [2, p. 2]). One can agree with Di Battista in considering this definition as a compendium of many of the preceding and later ones; "(...) it takes architecture back to the concept of inhabited environment (...) where human activities have changed, utilised, controlled natural situations to historically establish the built-up environment (...) Architecture could therefore be defined as a set of devices and signs (not ephemeral – author's note) of man which establish and indicate his system of settlement (...) Architecture is applied to a system of settlement as a system of systems: ecosphere (geological, biological, climatic, etc.) and anthroposphere (agricultural/urban, social, economic). Within these systems there are simultaneous actions of:
• observed systems (physical, economic, social, convergent with/in the settlement system, according to the schematisations of Di Battista – author's note), which present different structures and a multiplicity of exchange interactions;
• observing systems, as subjects or collectives with multiple identities and values, but also as cognitive models (philosophical, religious, scientific, etc.) which explain and offer multiple intentions and judgement criteria" (Di Battista, in Di Battista et al., 2006 [5, pp. 40-41], author's translation).
Morris's definition (Morris, 1947 [13]), especially in its most explicit sense of architecture as a built-up environment (founding and denoting settlement systems) for satisfying human needs, hence provides unitary and (at least tendentially) shared conceptions where, for example, the sphere of human needs is considered in a non-reductive sense: not only material needs, but also those of axiology, representation, poiesis, amongst others. Another definition (most agreeable, and useful here for our declared purposes) is that of Benjamin who, in his most famous essay (Benjamin, 1936 [3]), stresses the particular nature of architecture as a work of art from which one benefits in two ways, through use and perception: here we also see that dialogical, and highly interactive, relationship between artistic and scientific
culture (a). The aim of overcoming the art-science dualism was one of the declared objectives of the birth of the Faculties of Architecture, as a disciplinary synthesis of the traditional contributions of the Fine Arts Academies (Accademie di Belle Arti) and the Engineering schools. One must, therefore, develop that relationship in a non-reductionist way (Minati, 2004 [12, pp. 84-86], and Minati, in Di Battista et al., 2006 [5, p. 23]), precisely as a synthesis of the twofold fruitional modes of architecture (Benjamin, 1936 [3]) and of the effective indivisibility of its diverse aspects: not only representation, or communication, or use, etc., but dynamic interactions involving the multi-dimensional complex of those modifications and alterations produced (and which produce themselves) in the moulding and altering to human needs of the very face of the earth itself (Morris, 1947 [13]). The dialogical recovery of that relationship (art-science), moreover, becomes a necessary requirement above all when faced with the objectives of optimised management of the multiple complex relationships which distinguish contemporary processes in the production of architecture: transformation and conservation, innovation and recollection, sustainability and technological exaltation, etc. In current practices (and reiterating some of what has been said above) there is, however, a dichotomy which can be schematically attributed to:
• on the one hand, activities which are especially regulatory in the formal intentions of the architectural design,
• on the other, the emphasis upon technical virtuosities which stress the saviour role of modern technology (Di Battista, in Di Battista et al., 2006 [5, p. 38]).
Nor can one ignore, particularly in the classical approach of applying systemics to architecture (and emphasised especially in technological environments), the existence of an essentially structuralist conception, precisely in the sense described by Mounin (Mounin, 1972 [14], cit. in Sassoli, in Giorello, 2006 [8, p. 216]) (b): in this conception, for example, the leg of a table would be characterized,
(a) The art-science nexus has been well clarified, moreover, since ancient times. It is referred to, for example, and with its own conceptual specificities, in Plato's Philebus, Vitruvius' De Architectura, the Augustinian interpretation of architecture as a science based upon the laws of geometry, Le Corbusier's Le Modulor, etc. (Ungers, in Centi and Lotti, 1999 [4, pp. 85-93]).
(b) When a structure (or, for others, a system) can essentially be considered as a construction, in the current sense of the word. In this formulation, analysing a structure means identifying the parts which effectively define the construction being considered (Mounin, 1972 [14], cit. in Sassoli, in Giorello, 2006 [8, p. 216]): and such a definition exists because "(...) a choice is made in the arrangement of the various parts. And the main criterion for that choice is the function which they have ..." (Mounin, 1972 [14], cit. in Sassoli, in Giorello, 2006 [8, p. 216], author's translation).
in an analogous manner to the constituent parts of the current concept of building system, "(...) neither by the form nor by the substance, because I could indifferently put an iron or a wooden leg (...) the functions of this part in relation to the structure will remain (in fact – author's note) invariant (...) Lévi-Strauss (...) defined the method (...) used in his anthropological research in a way clearly inspired by structural linguistics (defining the phenomenon studied as a relationship between real or virtual terms, building the framework of possible permutations amongst them, considering the latter as the general object of an analysis which only at this level can be made, and representing the empirical phenomenon as a possible combination amongst others, whose total system must first of all recognize itself – author's note) ..." (Sassoli, in Giorello, 2006 [8, pp. 216-217], author's translation). In 1964, Lévi-Strauss identified as virtual terms some empirical categories (raw and cooked, fresh and putrid, etc.), showing how, once the possible modes of interaction had been established, such elements can "(...) function as conceptual instruments to make emergent certain abstract notions and concatenate them into propositions ..." (Lévi-Strauss, 1964 [11], cit. in Sassoli, in Giorello, 2006 [8, p. 217], author's translation) which "(...) explain the structures which regulate the production of myths. In that work, Lévi-Strauss tried to show how 'if even myths (i.e., apparently the most free and capricious human cultural elaboration) obey a given logic, then it will be proven that the whole mental universe of man is subjected [...] to given norms'. But if all the cultural production of man (...) can be traced back to unconscious structures possessing an internal logic independent of the subject, then structuralist 'philosophy' will result in a radical anti-humanism. We can interpret in this way the famous statement by Foucault, according to which the structuralist formulation decrees the 'death of man' (Foucault, 1966 [7] – author's note) ..." (Sassoli, in Giorello, 2006 [8, pp. 217-218], author's translation). It thus becomes clear how the declared radical anti-humanism of the structuralist approach leads to approaches towards systemics which would obliterate (apparently?) one of its most important current paradigms: the role of the observer as an integral part, a generator of processes (Minati, in Di Battista et al., 2006 [5, p. 154]). But the systemic concept, as applied classically to architecture by the Italian technological schools, does contemplate an observer in the role of user; the latter, in fact, with its own systems of requisites, functions as a referent for the requisites-performance approach, despite an attribution of its role in a reductionist sense (as it fundamentally expresses the requisites
deriving from the use value of the end-product) and lacking significant correlations with other agents (who are carriers of their own interests, culture, etc.) (c). More generally, therefore, although definitions of architecture which tend to be shared can trigger mature trial scenarios of a systemic nature (Di Battista et al., 2006 [5]), within an interdisciplinary, transdisciplinary (and infradisciplinary) framework, unsolved problems still exist. Amongst others: which observer (or better, which system of observers) is best for activities which put architecture to the test in its multiplicity of components and interactions? This is still an open question even in the most recent developments of systemics. Similarly, there is the problem of the unpredictability of emergent phenomena (especially in the sense of intrinsic emergence) (d), when faced with the objectives/requisites, typical of the architectural disciplines, of prefiguring new arrangements and scenarios.
3. Specific problems in systemics and implications for Architecture
The role of the observer, "(...) an integral part of the process being studied, combines with constructivism (...) This states that reality cannot effectively be considered as objective, independently from the observer detecting it, as it is the observer itself which creates, constructs, invents that which is identified as reality (...) Essentially one passes from the strategy of trying to discover what something is really like to how it is best to think of it" (Minati, in Di Battista et al., 2006 [5, p. 21], author's translation). Moreover, the connection between the role of the observer and the Copenhagen interpretation of quantum theory is well known from 20th century scientific research; amongst others, Heinz Pagels refers to it explicitly (as far as is of interest here), where one can consider as senseless the objective existence of an electron at a certain point in space independently from its concrete observation: thus reality is, at least in part, created by the observer (Pagels, 1982 [15], cit. in Gribbin, 1998 [9]).
(c) In a similar way, the structuralist formulation (Mounin, 1972 [14]), at least in Mounin's idea (which, in postulating a choice in the arrangement of the various parts, identifies the criteria through their functions, and the latter will, in some way, have to be awaited), would seem in any case to assume implicitly the existence of a referent with such a waiting function, even though in a tautologically reductionist sense. Di Battista, moreover, develops a proposal for the evolution of the classical requisites-performance approach, which integrates use values with cultural, economic and other values (Di Battista, in Di Battista et al., 2006 [5, pp. 85-90]).
(d) Where not only can the establishment of a given behavior (even though compatible with the cognitive model adopted) not be foreseen, but its establishment gives rise to profound changes in the structure of the system, requiring a new modelling process (Pessa, 1998 [16], cit. in Minati, 2004 [12, p. 40]).
Clearly, there are also strong reservations regarding the role of the observer in the terms described above. Prigogine, for example, although dealing with more general problems (including unstable systems and the notions of irreversibility, probability, etc.), states that: "(...) the need for introducing an 'observer' (much more significant in quantum mechanics than in classical mechanics – author's note) necessarily leads to having to tackle some difficulties. Is there an 'unobserved' nature different from 'observed' nature? (...) Effectively, in the universe we observe equilibrium situations, such as, for example, the famous background radiation at 3 K, evidence of the beginning of the universe. But the idea that this radiation is the result of measurements is absurd: who, in fact, could have or should have measured it? There should, therefore, be an intrinsic mechanism in quantum mechanics leading to the statistical aspects observed" (Prigogine, 2003 [17, p. 61], author's translation). Moreover, the role of the observer would lead to the presence "(...) of a subjective element, the main cause of the lack of satisfaction which Einstein had always expressed regarding quantum mechanics" (Prigogine, 2003 [17, p. 74], author's translation). Thus, the presence of a subjective element brings with it risks of anthropocentrism. Nor can one avoid similar implications of constructivism for architecture (in which, tautologically, all anthropic processes exist); if the role of the observer, in fact, especially regarding decision-making and managerial activities, entails taking responsibility for safeguarding common interests, it becomes clear from other points of view that dichotomic attitudes can arise from this responsibility, and above all can be theoretically justified through considerations of subjective presences. But such problems regarding the observer in current developments of systemics also allude to the dichotomy between scientific realism and anti-realist approaches, whose developments (especially regarding the logical-linguistic aspects of science) are efficaciously discussed by Di Francesco (e). Particularly meaningful in that examination (Di Francesco, in Giorello, 2006 [8, pp. 127-137]), which moreover explains the positions of the later Putnam (converted to anti-realism, in the form of a so-called internal realism), of Wiggins, and of Hacking (f), is the strategy suggested by the latter regarding the dual dimension of
e Roughly, scientific realism (as opposed to anti-realist approaches) contemplates a reality without conceptual schemes, languages, etc. (Di Francesco, in Giorello, 2006 [8, p. 127]).
f More precisely: Hacking, 1983 [10]; Putnam, 1981 [18]; Putnam, 1987 [19]; Wiggins, 1980 [20]. Roughly speaking, the internal realism of the “second” Putnam contemplates, for example, that to “(...) ask oneself: of which objects does the world consist? only makes sense within a given theory or description” (Putnam, 1981 [18], cit. in Di Francesco, in Giorello, 2006 [8, p. 133]).
Naturally, in architecture, there are still critical implications regarding the role of the observer (and in further aspects, beyond those described here). If it could be accepted with regard to the interaction between observed systems and observer systems already mentioned (Di Battista, in Di Battista et al., 2006 [5, pp. 40-41]), then for the formulation of the latter, non-reductionist evidence of the multiple (and often different) interests, values, culture, etc., characteristic of the agents in construction projects, is also necessary. Nor is this a question of predominantly computational importance (how to formalise the observer systems); in fact, it also involves, especially within the framework of managing and governing predominantly social interests/values/etc., defining and following systemic, organised and self-organised collectivities (Minati, in Di Battista et al., 2006 [5, p. 21]), avoiding problems such as:
• from shared leadership to unacceptable dirigisme,
• from self-organisation to spontaneity.
Then again: who observes the observer systems? Further, more detailed considerations, amongst (and beyond) the problems and hypotheses mentioned so far, are therefore necessary. The role of the observer, moreover, is considered to be of fundamental importance also for the detection of emergent properties (Minati, in Di Battista et al., 2006 [5, pp. 21-22]). But the unpredictability of the latter (as mentioned above) leads to further problems in systemic approaches to architecture, effectively persisting in every outcome of anthropic processes: including those ex-post arrangements of the moulding and altering to human needs of the very face of the earth itself (Morris, 1947 [13]). This unpredictability, however, can to a certain extent be resolved dialogically with respect to the requisites of the prefiguration of scenarios (typical of architectural disciplines):
• taking those scenarios to be consistently probabilistic (or as systems of probabilities);
• optimising them through validation of the positive propensities of and amongst their component parts (and minimising negative potentialities), also through suitable formalisations and ex-ante simulations.
What is more, it is not unusual in architecture to resort to probabilistic formalizations. On the basis of Bayes’ theorem, for example, when a number of experimental data (b1, b2, ..., bn) are available and can be formulated in an appropriate manner, suitable networks can be formalized to support, amongst other things, the evaluation of technical risks (intended, in building too, as connected to reaching the quality and performance planned in the project), where, for example, the experimental evidence represents symptoms of the outbreak of a pathology (or hypothesis a)g. In the Bayesian approach, however, there remains the problem of the definition of the a priori probability, even though its aspects of subjective variability, which would lead to corresponding variabilities in the attribution of a posteriori probabilitiesh, can be reduced, according to some, through suitable inductive principles (besides those usually considered in the calculation of probabilities). Emphasis should therefore be placed, in general, upon those theoretical issues regarding the dichotomy between the unpredictability of emergence and the necessity for ex-ante prefigurations in architecture. The systemic approach in this sense, given the state of current cognitive models of observer systems (Di Battista, in Di Battista et al., 2006 [5, p. 40]), can simply direct us towards an appropriate reduction of that dichotomy. But even this question takes on a purely ontological importance.
g For the Bayesian approach to risk management in the building industry see, for example, Giretti and Minnucci, in Argiolas, Ed., 2004 [1, pp. 71-102].
h As is well known, Bayes’ theorem (expressed here in its simplest form) allows calculation of the (a posteriori) probability p(a|b) of a hypothesis a on the basis of its (a priori) probability p(a) and the experimental evidence b:

p(a|b) = p(b|a) p(a) / p(b)

The a priori probability “(...) can be interpreted as the degree of credibility which a given individual assigns to a proposition a in the case where no empirical evidence is possessed (...) Whereas p(a|b), which denotes the epistemic probability assigned to a in the light of b, is said to be the relative probability of a with respect to b. In the case where a is a hypothesis and b describes the available experimental evidence, p(a|b) is the a posteriori probability ...” (Festa, in Giorello, 2006 [8, pp. 297-298], author’s translation). Besides mentioning (even problematically) some of the inductive principles for minimising subjective variability, Festa recalls the subjectivist conception (de Finetti et al.), according to which, notwithstanding the subjective variability in the choice of the a priori probability, “(...) as the experimental evidence (...) available to the scientists grows, the disagreement (amongst the latter regarding the different evaluations of the a posteriori probabilities - author’s note) tends to decrease ...” (Festa, in Giorello, 2006 [8, p. 305], author’s translation).
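The following minimal Python sketch makes the updating of note h concrete for the building-pathology reading of note g; the prior and the likelihoods are invented for illustration, and only the formula itself comes from the text.

# Hypothetical figures: prior probability of the pathology (hypothesis a)
# and probabilities of observing a symptom b with and without the pathology.
p_a = 0.10
p_b_given_a = 0.80
p_b_given_not_a = 0.15

# Total probability of the evidence b.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: p(a|b) = p(b|a) p(a) / p(b).
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.372: the symptom raises the a priori estimate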
References
1. C. Argiolas, Ed., Dalla Risk Analysis al Fault Tolerant Design and Management (LITHOSgrafiche, Cagliari, 2004).
2. L. Benevolo, Storia dell’architettura moderna, 1960 (Laterza, Bari, 1992).
3. W. Benjamin, L’opera d’arte nell’epoca della sua riproducibilità tecnica, 1936 (Einaudi, Turin, 2000).
4. L. Centi, G. Lotti, Eds., Le schegge di Vitruvio (Edicom, Monfalcone, 1999).
5. V. Di Battista, G. Giallocosta, G. Minati, Eds., Architettura e Approccio Sistemico (Polimetrica, Monza, 2006).
6. H. von Foerster, Sistemi che osservano (Astrolabio, Rome, 1987).
7. M. Foucault, Les mots et les choses (Gallimard, Paris, 1966).
8. G. Giorello, Introduzione alla filosofia della scienza, 1994 (Bompiani, Milan, 2006).
9. J. Gribbin, Q is for Quantum (Phoenix Press, London, 1998).
10. I. Hacking, Representing and Intervening (Cambridge University Press, Cambridge, 1983).
11. C. Lévi-Strauss, Le cru et le cuit (Plon, Paris, 1964).
12. G. Minati, Teoria Generale dei Sistemi, Sistemica, Emergenza: un’introduzione (Polimetrica, Monza, 2004).
13. W. Morris, On Art and Socialism (London, 1947).
14. G. Mounin, Clef pour la linguistique (Seghers, Paris, 1972).
15. H. Pagels, The Cosmic Code (Simon and Schuster, New York, 1982).
16. E. Pessa, in Proceedings of the First Italian Systems Conference, Ed. G. Minati (Apogeo, Milan, 1998).
17. I. Prigogine, Le leggi del caos, 1993 (Laterza, Bari, 2003).
18. H. Putnam, Reason, Truth and History (Cambridge University Press, Cambridge, 1981).
19. H. Putnam, The Many Faces of Realism (Open Court, La Salle, 1987).
20. D. Wiggins, Sameness and Substance (Blackwell, Oxford, 1980).
PROCESSES OF EMERGENCE IN ECONOMICS AND MANAGEMENT
MODELING THE 360° INNOVATING FIRM AS A MULTIPLE SYSTEM OR COLLECTIVE BEING

VÉRONIQUE BOUCHARD
EM LYON, Strategy and Organization Dpt.
23 av. Guy de Collongue, 69132 Ecully Cedex, France
E-mail: [email protected]

Confronted with fast-changing technologies and markets and with increasing competitive pressures, firms are now required to innovate fast and continuously. In order to do so, several firms superpose an intrapreneurial layer (IL) on their formal organization (FO). The two systems are in complex relations: the IL is embedded in the FO, sharing human, financial and technical components, but strongly diverges from it when it comes to the representation, structure, values and behavior of the shared components. Furthermore, the two systems simultaneously cooperate and compete. In the long run, the organizational dynamics usually play out to the detriment of the intrapreneurial layer, which remains marginal or regresses after an initial period of boom. The concepts of Multiple Systems and Collective Beings, proposed by Minati and Pessa, can help students of the firm adopt a different viewpoint on this issue. These concepts can help them move away from a rigid, Manichean view of the two systems’ respective functions and roles towards a more fluid and elaborate vision of their relations, allowing for greater flexibility and coherence.

Keywords: innovation, organization, intrapreneurship, models, multiple systems, collective beings.
1. Introduction

Confronted with fast-changing technologies and markets and with increasing competitive pressures, business firms are now required to innovate fast and continuously [1,2,3]. Conventional innovation processes led by R&D and Marketing departments are not sufficient to meet these requirements. In fact, conventional innovation processes tend to be rigid, slow and focused on technology and product development, whereas what firms need now is flexible, rapid and broad-scope innovation, encompassing all the key elements of their offer, management and organization [4,2,5,3]. Firms have to improve and transform the way they produce, manage client relations, ensure quality, configure their value chain, manage employees, develop competencies, generate revenues, etc. They have to innovate on all fronts and become “360° innovating
firms”. To this end, more nimble innovation processes are required and, above all, innovation must take place in every department and division of the firm. The 360° innovating firm has to rely on the creativity, talent, energy and informal networks of its employees. In the 360° innovating firm, employees must be able to autonomously identify opportunities and re-combine the resources and competences that are spread throughout the various departments and sites of the firm in order to seize these opportunities. Sales and service persons, in frequent contact with clients, can identify emerging needs and business opportunities; computer experts can grasp the value creation potential of new IT developments; experts in manufacturing and logistics can propose new solutions to concrete problems; finance experts can help assess costs and benefits; idle machines can be used to produce prototypes; foreign subsidiaries can come up with low-cost solutions; etc.

2. Intrapreneurs and the 360° innovating firm

Opportunities and inventions that are identified and developed outside the conventional innovation track cannot succeed without a champion, someone who strongly believes in the project and is personally committed to its success. 360° innovation relies, therefore, on the emergence of internal entrepreneurs or “intrapreneurs” from the pool of employees [6,7,8,9,10]. Internal entrepreneurs or “intrapreneurs” are employees who identify internal or external value creation opportunities and seize them, relying first and foremost on their own talent, motivation and network. Intrapreneurs can take advantage of the financial and technical resources as well as the wide array of expertise and competencies the firm possesses. However, the life of intrapreneurs is far from easy: they often combine the difficulties faced by entrepreneurs (understanding the market, improving the offer, creating a sound economic model, managing a team, making the first sale, etc.) with the difficulties that arise when one pursues an original project within a rigid and risk-averse environment.

3. The intrapreneurial process as a superposed organizational layer

In their quest for 360° innovation, a number of firms try to encourage the emergence of intrapreneurs. To do so, they set up structures, systems and procedures whose goal is to encourage, identify, support and select intrapreneurial initiatives [11,12,13].
Figure 1. Two interacting systems, the formal organization (FO) and the intrapreneurial layer (IL).
By doing so, firms de facto superpose a different and potentially conflicting organizational layer (the intrapreneurial process) over the formal organization [14,15,12,11,16,17]. The two layers can be seen as two systems interacting in a complex way (see Figure 1).

3.1. Two different but highly interdependent systems

The formal organization (FO) performs well-defined tasks using well-identified procedures, people and resources, while the intrapreneurial layer (IL) assembles people and resources located anywhere in the organization (even outside the organization) on an ad hoc basis, relying extensively on informal networks (see Table 1).

Table 1. Two very different systems.

The formal organization (FO) | The intrapreneurial layer (IL)
Well-defined set of elements and interactions | Fuzzy, constantly evolving set of elements
Relatively stable over time | Temporary
Planned (top-down) | Emergent (bottom-up)
A priori resources and legitimacy | Resources and legitimacy are acquired on the way

The two systems, however, have numerous contact points, since most people and resources involved in the IL “belong” to the FO. Most of the time, the intrapreneur herself is a member of the FO, where she continues to fulfill her tasks, at least in the initial phases of her project. The relations between the two systems are complex:
1. The formal organization enables the emergence and unfolding of the intrapreneurial process by 1) granting autonomy to the internal entrepreneur, 2) providing most of the resources he uses, and 3) giving legitimacy to his project. In other words, system IL is embedded in system FO, on which it depends for its survival and success.
2. However, system FO is also dependent on system IL. In fact, the intrapreneurial layer allows the formal organization to 1) overcome some of its structural limitations and 2) reach its objective of fast 360° innovation.
3. The intrapreneurial layer often competes for resources and visibility with some parts of the formal organization and often enters into conflict with it (IL competes with a subsystem of FO).
4. Finally, the intrapreneur, and more generally all those who contribute significantly to the IL, can be rejected or envied by members of the formal organization because their values, work styles and status are different (the culture – norms, values and behaviors – of system IL and that of system FO are conflicting).

3.2. Managing the intrapreneurial process

The single intrapreneurial initiative is managed – primarily – by the intrapreneur himself. However, the intrapreneurial process as an organizational dynamic, a sustained flow of intrapreneurial initiatives, has to be managed by the top management of the firm. Let us review what goals these actors pursue, the levers they control and some of their main strategic options.

3.2.1. Top management

Top managers pursue several objectives, among them:
• Multiply the number of intrapreneurial initiatives;
• Improve their success rate;
• Contain the risks and costs;
• Leave the formal organization (FO) “undisturbed”;
• Provide concrete examples of the desired behavior to members of the formal organization.
Some of their most significant control variables are:
• The level and type of support granted to internal entrepreneurs;
• The conditions under which support is granted;
• The definition of desired/undesired, licit/illicit intrapreneurial initiatives;
• The creation of formal links between the two layers.
Their strategic options can be positioned along three continua:
• Granting high autonomy to employees vs. granting moderate autonomy to employees;
• Providing strong formal support to intrapreneurs vs. providing minimal formal support to intrapreneurs;
• Relying essentially on informal links between the IL and the FO vs. relying on both informal and formal linkages.

3.2.2. Intrapreneurs

Internal entrepreneurs generally seek to maximize their chances of success by:
• Securing access to needed resources and competencies;
• Minimizing conflict with the formal organization;
• Getting the support of members of the leading coalition.
Some of their most significant control variables are:
• The level of strategic alignment of their project;
• Their level of self-sufficiency/autonomy vis-à-vis the formal organization (FO);
• The degree of visibility of their project.
Here again, their strategic options can be positioned along various continua:
• Pursuing a strategically aligned project vs. pursuing a not so strategically aligned project;
• Being highly self-sufficient vs. trying to get formal help and support early on;
• Keeping the visibility of the project low vs. giving the project high visibility.

4. A recurrent and troubling problem

There is abundant empirical evidence that, over time, systems dynamics play strongly against the Intrapreneurial Layer, which remains marginal or shrinks after an initial period of boom [12,18,11,16,13,17].
In spite of the declarations and measures taken by top management to encourage intrapreneurial initiatives, many intrapreneurs face so many difficulties that they abandon their project. Some fail for reasons that can be attributed to the weakness of their project or their lack of skills, but many fail because of the insurmountable organizational or political obstacles they face. And without a small but growing number of visible successes, the intrapreneurial dynamic soon comes to a halt. Some recurrent problems faced by intrapreneurs:
• Parts of the formal organization actively or passively oppose the intrapreneur (including the intrapreneur’s boss);
• Excessive workload, no team, no help;
• The intrapreneur cannot obtain the needed financial resources;
• The intrapreneur cannot secure solid and lasting top management support;
• The intrapreneur is isolated and does not benefit from the advice of mentors or fellow intrapreneurs;
• The intrapreneur is not able to simultaneously face external (market) and internal (political) challenges.
A critical issue for firms interested in promoting 360° innovation, therefore, is to realize that such a negative dynamic is at play and to find ways to counteract it. If we understand better the complex interactions between the two systems (FO and IL) and their main agents (top management, intrapreneurs, other managers), we might be able to find ways to reduce the pressures experienced by intrapreneurs, thus favoring innovation and the creative re-deployment of resources within the firm. New concepts in system modeling such as multiple systems (MS) and collective beings (CB) could help us in this endeavor.

5. New concepts in system modeling: multiple systems (MS) and collective beings (CB)
We propose to apply the concepts of Multiple Systems (MS) and Collective Beings (CB), developed by Minati and Pessa [20], to the FO-IL systems dynamics depicted above:
• A MS is a set of systems established by the same elements interacting in different ways, i.e., having multiple simultaneous or dynamic roles. Examples of MS include networked interacting computer systems performing cooperative tasks, as well as the Internet, where different systems play different roles in continuously new, emerging usages.
• A CB is a particular MS, established by agents possessing the same (natural or artificial) cognitive system. Passengers on a bus and queues are examples of CB established dynamically by agents without considering multiple belonging. Workplaces, families and consumers are examples of CB established by agents simultaneously and considering their multiple belonging.
These new concepts can help us reframe the challenges faced by the “360° innovating firm”, which could be approached as a problem of increasing the degrees of freedom of the various systems simultaneously involved in innovation, i.e., increasing the number of representations simultaneously available to the various agents. For instance, we may design the Intrapreneurial Layer not only in opposition to the Formal Organization, but also considering the possibility of:
• redefining problems by distinguishing between conventionally established differences and intentionally established differences between the two systems, for the purpose of systems modeling;
• distinguishing between subsystems and systems of the multiple system;
• not only establishing a distinction between functional relations and emergent relations, but also mixing and managing the two.
The proposed approach can help us move away from a rigid, Manichean view of the systems’ respective functionalities and roles towards a more fluid and elaborate vision of their relations, allowing for greater flexibility and coherence when tackling the organizational and managerial issues facing the 360° innovating firm. Let us illustrate these new concepts by applying them to the system “firm” in its productive function. Aspects such as production, organization, cost-effectiveness, reliability and availability can be viewed:
• as different properties of the firm viewed as a single system or as a set of subsystems, or
• as different elements of the MS “firm”, constituted by different systems established by the same elements interacting in different ways.
In the second eventuality, production will be considered as an autonomous system possessing its own independent representation and dynamics, and not only as a property of the system “firm”, itself dependent on organization. In the same way, quality is an autonomous system and not only an effect of production, and so on. The different dimensions are not only viewed as functionally related aspects of the system or of different subsystems, but also as different
combinations of the same elements (e.g., human resources, machines, energy, rules and facilities) forming different systems (e.g., production, organization and quality). What difference does it make? In this case, we may act on a given system of the MS not only in a functional way, but also via the complex web of interactions that emerge from its elements’ multiple belonging. From a functional standpoint, increasing production may reduce quality, and cost-effectiveness may affect organization. In an MS perspective, quality is not an effect of production, but an autonomous property of elements also involved in production. Quality, in this case, will derive from design rather than from production procedures. It becomes possible to consider properties laterally rather than functionally. Properties such as quality, reliability and cost-effectiveness are not properties of a single system, but properties of the whole. In the same way, human resources will be considered as agents able to pursue multiple roles in producing, organizing, communicating, marketing, developing new ideas, controlling quality and so on. In the “360° innovating firm”, no agent has a single specific role but rather multiple, dynamic, context-dependent roles.

6. Applicability of DYSAM

The Dynamic Usage of Models (DYSAM) has been introduced in Minati and Brahms [19] and Minati and Pessa [20] to deal with dynamic entities such as MS and CB. The dynamic aspect of DYSAM relates to the dynamic multiple belonging of components rather than to the dynamic aspect related to change over time. DYSAM is based on simultaneously or dynamically modeling an MS or CB by using different, non-equivalent models depending on the context. For instance, a workplace may be modeled in a functional way by considering the processing of input and the production of output; as a sociological unit by only considering interactions among human agents; as a source of waste, pollution and energy consumption; and as a source of information used for innovation. The concept of DYSAM applies when considering a property in the different systems of a MS or CB. Moreover, in this case, the models must take into account the fact that the different systems are composed of the same elements. In this way, dealing with quality in a system affects the other aspects not only in a functional way, but also because the same elements are involved in both. Depending on effectiveness, a firm may be modeled as a system of subsystems and/or as an MS or CB. For instance, the profitability of a firm cannot be modeled by using a single optimization function or a linear composition of different optimization functions, but rather by using a population (i.e., a system)
of optimization functions continuously and dynamically established by considering context-sensitive parameters. DYSAM allows considering different, non-equivalent models, such as the ones related to profitability, reliability, availability, flexibility and innovation, as autonomous systems of a MS.

7. Conclusion

Firms are nowadays required 1) to maximize return on assets, which implies strict performance control and efficient use of resources, and 2) to innovate on all fronts (360° innovation), which implies local autonomy, trial and error and patient money. In order to face these simultaneous and apparently contradictory requirements, several firms superpose an intrapreneurial layer on their formal organization. While the formal organization (FO) performs well-defined tasks using well-identified procedures, people and resources, the intrapreneurial layer (IL) assembles people and resources located anywhere in the organization (even outside the organization) on an ad hoc basis, relying extensively on informal networks to develop innovative projects. The two systems are in complex relations: if the IL is, to a large extent, embedded in the FO, sharing its human, financial and technical components, it also strongly diverges from it when it comes to the representation, structure, values and behavior of some shared components. Furthermore, the two systems simultaneously cooperate and compete, and frequently enter into conflict. In the long run, one observes that the organizational dynamic set in motion usually ends to the detriment of intrapreneurial processes, which remain marginal or regress after an initial period of boom. The concepts of Multiple Systems and Collective Beings, proposed by Minati and Pessa, can help students of the firm adopt another viewpoint on the issues just described and tackle them differently. These concepts can help them move away from a rigid, Manichean view of the two systems’ respective functionalities and roles towards a more fluid and elaborate vision of their relations, allowing for greater flexibility and coherence when tackling the organizational and managerial issues facing the 360° innovating firm. The application of these concepts, together with the related DYSAM techniques, could help students of the firm come to terms with the multiple contradictions that arise from the mandatory adoption of multiple, non-additive roles by the managers of 360° innovating firms.

Acknowledgments

I wish to express my gratitude to Professor Gianfranco Minati for his help and feedback on the paper.
References
1. P. Drucker, Innovation and Entrepreneurship (Harper Business, 1993).
2. G. Hamel, Harvard Business Review 77(5), 70-85 (1999).
3. J.P. Andrew, H.L. Sirkin, and J. Butman, Payback: Reaping the Rewards of Innovation (Harvard Business School Press, Cambridge, 2007).
4. P.S. Adler, A. Mandelbaum et al., Harvard Business Review, March-April, 134-152 (1996).
5. R.M. Kanter, Executive Excellence 17(8), 10-11 (2000).
6. G. Pinchot III, Intrapreneuring: why you don’t have to leave the corporation to become an entrepreneur (Harper and Row, New York, 1985).
7. R.A. Burgelman, Administrative Science Quarterly 28(2), 223-244 (1983).
8. D. Dougherty, C. Hardy, Academy of Management Journal 39(5), 1120-1153 (1996).
9. A.L. Frohman, Organizational Dynamics 25(3), 39-53 (1997).
10. P.G. Greene, C.G. Brush and M.M. Hart, Entrepreneurship Theory and Practice 23(3), 103-122 (1999).
11. Z. Block, I.C. Macmillan, Corporate venturing: creating new businesses within the firm (Harvard Business School Press, Boston, 1993).
12. R.M. Kanter, J. North et al., Journal of Business Venturing 5(6), 415-430 (1990).
13. V. Bouchard, Cahiers de la recherche EM LYON, N. 2002-08 (2002).
14. N. Fast, The rise and fall of corporate new venture divisions (UMI Research Press, Ann Arbor, 1978).
15. R.A. Burgelman, L.R. Sayles, Inside corporate innovation: strategy, structure and managerial skills (Free Press, New York, 1986).
16. P. Gompers, J. Lerner, in R.K. Morck, Ed., Concentrated Corporate Ownership (University of Chicago Press, Chicago, 2000).
17. V. Bouchard, Cahiers de la recherche EM LYON, N. 2001-12 (2001).
18. R.M. Kanter, L. Richardson, J. North and E. Morgan, Journal of Business Venturing 6(1), 63-82 (1991).
19. G. Minati, S. Brahms, in Emergence in Complex, Cognitive, Social and Biological Systems, G. Minati and E. Pessa, Eds. (Kluwer, New York, 2002), pp. 41-52.
20. G. Minati, E. Pessa, Collective Beings (Springer, New York, 2006).
THE COD MODEL: SIMULATING WORKGROUP PERFORMANCE
LUCIO BIGGIERO (1), ENRICO SEVI (2)
(1) University of L’Aquila, Piazza del Santuario 19, Roio Poggio, 67040, Italy
E-mail: [email protected], [email protected]
(2) LIUC University of Castellanza and University of L’Aquila, Piazza del Santuario 19, Roio Poggio, 67040, Italy
E-mail: [email protected]

Though the question of the determinants of workgroup performance is one of the most central in organization science, precise theoretical frameworks and formal demonstrations are still missing. In order to fill this gap, the COD agent-based simulation model is presented here and used to study the effects of task interdependence and bounded rationality on workgroup performance. The first relevant finding is an algorithmic demonstration of the ordering of interdependencies in terms of complexity, showing that the parallel mode is the simplest, followed by the sequential and then by the reciprocal. This result is far from new in organization science, but what is remarkable is that it now has the strength of an algorithmic demonstration instead of being based on the authoritativeness of some scholar or on some episodic empirical finding. The second important result is that the progressive introduction of realistic limits to agents’ rationality dramatically reduces workgroup performance and leads to a rather interesting result: when agents’ rationality is severely bounded, simple norms work better than complex norms. The third main finding is that when the complexity of interdependence is high, the appropriate coordination mechanism is agents’ direct and active collaboration, which means teamwork.

Keywords: agent-based models, bounded rationality, law of requisite variety, task interdependence, workgroup performance.
1. Introduction

By means of the COD (Computational Organization Design) simulation model, our main goal is to study the effects of the fundamental modes of connection and bounded rationality on workgroup performance. We are therefore at a very micro-level of analysis of a theory of interdependence and coordination. Technological interdependence is one of five types of interdependence [1], the others being the behavioral, informational, economic, and juridical. Technological interdependence coincides with task (or component) interdependence when referred to the micro-level of small sets of technologically separable elementary activities. Task interdependence is determined by several factors, which occur at network, dyad, and node levels [1]. One of the most important factors is precisely the mode of connection, that is, the parallel, sequential or reciprocal ways in which tasks and/or agents’ interactions can take place. Two (or more) tasks can be connected by means of one (or more) of these three modes of connection (Fig. 1):
(1) parallel connection, when tasks are connected only through their inputs and/or outputs;
(2) sequential connection, when the output of one task is the input of the following one;
(3) reciprocal connection, when the output of a task is the input of the other, and vice versa.

Figure 1. Modes of connection.

This categorization coincides with the one that, in various forms and languages, has been proposed by systems science, which calls these modes systemic coupling [2,3]. It is worth recalling that they exhaust every type of coupling and that, as underlined by cybernetics, only the reciprocal mode refers to cybernetic systems, because only in that case is there a feedback. Indeed, in systems science the reciprocal connection is usually called structural coupling, while in organization science it is called reciprocal [4,5]. According to Ashby [6], the elementary and formally rigorous definition of organization is the existence of a functional relationship between two elements. Since some links are more complex than others, the degree of complexity resides in the form and degree of constraint that connections establish between elements. In fact, in the parallel connection systems are almost independent (Fig. 1), because they are linked only through the sharing of resources (inputs) and/or through their contribution to the same output. These are very weak constraints indeed. The strength of the constraint increases moving to the sequential connection, because the following system depends on the output of the preceding one. It is not just a “temporal” sequence, but rather a sequence implied by the fact that the following operation acts on the output of the previous one. Thus, according to
the definition of complexity in terms of the degree of constraint, the sequential connection is more complex than the parallel one. Finally, the reciprocal connection has the highest degree of constraint because it operates in both directions: system B depends on the input coming from A’s output, and vice versa. Here we see a double constraint, and thus the reciprocal is the most complex connection. Moreover, the double constraint makes a radical difference, because it constitutes the essence of feedback, and therefore the essence of the cybernetic quality. Lacking the feedback relationship, parallel and sequential connections are not cybernetic interdependencies. This reasoning leads one to argue that the ranking of the three basic types of interdependencies in ascending order of complexity is the following: parallel, sequential, reciprocal. In this way, Thompson’s [4] and Mintzberg’s [5] arguments are supported and clarified by cybernetics. Biggiero and Sevi [7] formalize these concepts and link them to the organization and cybernetics literature. Moreover, they analyze the issue of time ordering, which expands the number of fundamental modes of connection from three to seven. However, notwithstanding these developments and a vast literature, no operative and conclusive demonstration has been supplied, either through empirical data or algorithmically. The first aim of the COD model, therefore, is precisely to provide one in virtual reality. In section three it is shown that, in order to achieve a satisfying performance, a workgroup executing tasks characterized by sequential or, even more, reciprocal connections should employ progressively more complex coordination mechanisms. Indeed, without them performance is very low in a regime of parallel connection, and near zero in a regime of sequential or
reciprocal connection. Moreover, in section four limits to agents’ computational capacity are introduced, and it is shown that they sharply decrease group performance. Finally, it is shown that when such limits are severe, weaker coordination mechanisms perform better.

2. The COD model architecture and the methodology of experiments

2.1. The general structure

The COD modela has a hierarchical structure with two “objects” at the top: TaskCreator and WorkGroup. The former generates modules, while the latter manages agents’ behavior. With a frequency chosen by the model user, TaskCreator sets the quantity and quality of modules to be executed. Thus, by simulating the external environment, it operates as an input system for the group of workers. By creating modules with more or fewer tasks, it also defines structural complexity (Fig. 2).

a The program running the model is available from the website of Knownetlab, the research center where the authors work: www.knownetlab.it. Running the program requires the platform and language LSD (Laboratory on Simulation Development), available at www.business.auc.dk/lsd. Some indications on its concrete handling are provided with the program. In any case, the authors are available to support its use and to answer questions concerning the topics addressed here.

Figure 2. The model structure.

In this simple version of the COD model we suppose that both the structure of modules and agents’ behavior do not change. A module is constituted by tasks, which are made up of components. In each interval, that is, in each step of the simulation, one component per task is examined and possibly executed. Here we make the simplifying assumption that each single module is characterized by only one mode of connection, and that between modules, whatever their inner interdependence, there is only parallel interdependence. Thus, the mode of connection refers to the relationships between the tasks of a single module. Finally, it is assumed that the components of each single task, regardless of their number, are always connected by a sequential interdependence. This configures a workgroup producing independent modules, which are not given all at once at the beginning, but instead supplied progressively at each interval according to a given frequency. Apparently, this is a situation rather different from that usually described in the literature on technology, modularity or production management. There, a few complex products and their parts are planned and available as a stock at the beginning. Therefore, in the language of the COD model, many modules are connected in various ways to build a complex output. Conversely, here, many (relatively simple) products (modules)
are supplied alongside the simulation. Rather than a car or an electric appliance, the situation simulated by the COD model more closely resembles a small library workgroup. Books are modules, and cataloguing, indexing, and placing on the right shelf are tasks, each with its own components, that is, elementary operations like checking, writing, etc. A task can be in one of the following states:
• not-executable, because – in the case of reciprocal connection – it is waiting for a feedback, or because – in the case of sequential connection – its preceding task has not been executed;
• executable, because there is parallel connection or – in the case of sequential connection – the preceding task has been executed or – in the case of reciprocal connection – the feedback is ready.

2.2. Agents’ behavior

At any interval each agent can do one of three things (Tab. 1): searching, that is, looking for a component (of a task) to work on; working on the component in which she is currently engaged; or being inactive, because she cannot do either of the previous two things.

Table 1. Agents’ behaviors.

Behavior   | Description
Search     | Looking for, and engaging in, a component (of a module’s task)
Execution  | Working on that component
Inactivity | Being locked into a component

In this basic version of the model we suppose that all agents are motivated to work and that they all have the same competencies. The latter assumption will be implicitly modified by the introduction of norms, which force agents to follow a specific behavior. These norms can be interpreted as coordination mechanisms, and have been set up so as to improve agents’ behavior and thus increase group performance. Searching is performed by an agent who is not already engaged, and is therefore looking for a (component of a task of a) module. Such searching consists in checking all components of all tasks of all modules existing in that specific interval in TaskCreator, and then randomly choosing one of them. If she finds a component, she engages in it in that same interval, and in the following interval she works it out. If she doesn’t find any free component, she waits for the next step to start a new search. Hence, searching and engaging activity takes at
least one interval. Of course, only one agent can engage in the same component. The agent executing a certain component will finalize the corresponding task by moving to the next component until the whole task is completely executed. In this case, while moving from one component to the next within the same task, no intervals are spent on searching. Once the last component of the task is ended, the agent is free to search for a new component in a new task of the same module or of another module. The third possible behavior is inactivity. It means that, temporarily or definitively, the agent cannot work out the component she has chosen and in which she has been engaged. This situation occurs in one of three cases: (i) she doesn’t find any free task to be engaged in; (ii) she engages in a component of a sequential task whose preceding task has not been executed; (iii) she chooses a component of a task which is connected in reciprocal interdependence with other tasks, whose feedback is missing or delayed. In other words, she needs the collaboration of other agents, who at the moment (or definitively) are not available. It is supposed that in each step an agent can operate (work out) no more than one component, while two or all three agents can work on the same module.
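The cycle just described can be condensed in the following self-contained Python sketch; the data structures are our own simplification for illustration, not the LSD implementation of the COD model.

import random

def make_module(n_tasks=3, n_components=3):
    # Each task: components done so far, total components, engagement flag.
    return [{"done": 0, "total": n_components, "engaged": False}
            for _ in range(n_tasks)]

def executable(task, module, mode):
    i = module.index(task)
    if mode == "parallel":
        return True
    if mode == "sequential":   # all preceding tasks must be finished
        return all(t["done"] == t["total"] for t in module[:i])
    if mode == "reciprocal":   # a task may not run ahead of the others
        return all(task["done"] <= t["done"] for t in module if t is not task)

def agent_step(engaged_task, module, mode):
    """One interval for one agent; returns the task she is engaged in, or None."""
    if engaged_task is None:   # search: pick a free, unfinished task at random
        free = [t for t in module
                if not t["engaged"] and t["done"] < t["total"]]
        if free:
            chosen = random.choice(free)
            chosen["engaged"] = True       # engaging takes the whole interval
            return chosen
        return None                        # nothing found: search again next step
    if executable(engaged_task, module, mode):   # execution: one component
        engaged_task["done"] += 1
        if engaged_task["done"] == engaged_task["total"]:
            engaged_task["engaged"] = False
            return None                    # task finished: free to search again
        return engaged_task
    return engaged_task                    # inactivity: the task is locked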
2.3. The formalization of the modes of connection

Let us consider two tasks X and Y, and the following formalisms:
• x_t and y_t represent the state at time t of X and Y respectively; they indicate the state of advancement of the work on the two tasks;
• α and β represent the number of components constituting tasks X and Y respectively; in our model each component has a length of 1 step;
• p_x and p_y indicate the number of components ended within a task; a task is considered ended when its last component has been completed: in our case task X is executed when p_x = α, and task Y when p_y = β;
• C_{a,t} indicates the contribution that agent a provides at time t; once engaged in a task, at each step an agent increases the degree of advancement of the task and reduces the number of remaining components. In general, the value of C is not a parameter, because it depends on the characteristics of an agent in each specific step. However, in this basic version of the model, C is assumed to be a stationary value equal to 1.

2.3.1. Parallel connection

This mode of connection is characterised by an indirect connection through the common dependence (complementary or competitive) on the same inputs or through the contribution to the same module’s output. Tasks are interdependent because they are organised (interested) to achieve (contribute to) the same output. Agents engage in a specific task and execute it without needing any input or feedback from other tasks. Once engaged in a task, the agent proceeds until its end. In formal terms:
for task X:   if p_x < α then x_t = x_{t-1} + C_{a,t}, else x_t = x_{t-1};
for task Y:   if p_y < β then y_t = y_{t-1} + C_{a,t}, else y_t = y_{t-1}.

At each step the state of advancement increases by the value corresponding to the contribution supplied by the agent who works on that task. Once all the components are completed, that is, when p_x = α for task X and p_y = β for task Y, the state of advancement stops at a stationary value. The tasks’ indirect dependence on inputs is represented by the fact that once agent a_i is engaged in X she cannot engage in Y. The indirect dependence on outputs is also clear, because the two tasks contribute to the outcome of the whole module (x_t + y_t). Let us suppose that tasks X and Y are made up of three and four components respectively (α = 3 and β = 4), and that agent a_1 engages in the former task at time t_1 while agent a_2 engages in the latter at time t_3. Each agent employs one step in searching for the task before starting to work in the next step. Agent a_1 starts the execution of task X at time t_2 and, ending one component in each step, completes the task after 3 steps at time t_4, when the number of completed components reaches the number of components of the task (p_x ≥ α = 3). In the same way, task Y is ended after four steps at time t_7, when p_y ≥ β = 4. The whole module is considered completed at the later time (t_7), with a state of advancement equal to the sum of the states of the two tasks (x_7 + y_7 = α + β).

2.3.2. Sequential connection

This mode is characterised by the fact that the output of a system – a task, in the present model – enters as input into the following system. This is a direct, asymmetric dependence relationship. In formal terms:
for task X:   if p_x < α then x_t = x_{t-1} + C_{a,t}, else x_t = x_{t-1};
for task Y:   if p_y < β and p_x ≥ α then y_t = y_{t-1} + x_{t-1} + C_{a,t}, else y_t = y_{t-1}.

As in the parallel connection, there is also an indirect interdependence related to resource sharing, because if agent a_i is engaged in X she cannot work in Y. Task Y depends entirely on X, either because it takes into account the state of X in the previous step (y_t = y_{t-1} + x_{t-1} + C_{a,t}) or because, if all components of X have not been executed (p_x < α), then Y cannot start (y_t = y_{t-1}). The workflow crosses both tasks sequentially, and the final output is obtained only with the completion of task Y. The asymmetry of the relationship is clear: while task X acts autonomously, task Y depends on (adapts to) X’s behaviour. Let us suppose that tasks X and Y are each made up of three components (α = 3 and β = 3), and that agent a_2 engages in task Y at time t_1 while agent a_1 engages in X at time t_3. Since the start of task Y needs the output of X, at time t_2 agent a_2 will not be able to start working on Y. In fact, because a_1 engages in the former task at time t_3, execution of task X starts only at time t_4 and, as in the parallel case, it is ended after three steps at time t_6 (when p_x ≥ α = 3). Only from the next time t_7 can agent a_2 start the execution of Y, which is completed after three steps at time t_9 (when p_y ≥ β = 3). The whole module is considered executed at the end of the latter task at time t_9, with a state of advancement of works (y_t) equal to 6.

2.3.3. Reciprocal connection

The reciprocal interdependence is characterised by a situation like the sequential connection plus at least one feedback from the latter to the former taskb. The output of a task enters as input into the other, and vice versa. Therefore this connection can be truly considered a kind of interdependence, because the dependency relationship acts in both directions and employs the output of the
connected systemc. The greater complexity of this connection with respect to the previous ones is mirrored in the formalisation too. The formalisation of the two tasks is symmetric:
for task X:   if (p_x = 0 AND p_y = 0) OR (p_x ≤ p_y AND p_x < α) then x_t = x_{t-1} + y_{t-1} + C_{a,t}, else x_t = x_{t-1};
for task Y:   if (p_x = 0 AND p_y = 0) OR (p_y ≤ p_x AND p_y < β) then y_t = y_{t-1} + x_{t-1} + C_{a,t}, else y_t = y_{t-1}.
The function of dependence is formalised in the same way as in the sequential connection, but now it occurs in both tasks. The execution of a task must take into account what happened in the other (x_{t-1} + y_{t-1}). The tasks must exchange their outputs at the end of each component, so that the workflow crosses both tasks over time. For instance, if task X has worked out component 1, in order to execute component 2 it needs to get from task Y the output of its component 1. The work on a task cannot take place until at least the same number of components has been executed in the other. In the formalisation this is represented by the conditions p_x ≤ p_y and p_y ≤ p_x. Thus, tasks exchange as many feedbacks as the number of their components. To illustrate the model, let us suppose a module made up of two tasks, both constituted by three components (α = 3 and β = 3). Let us suppose that agent a_1 engages in the second task at time t_1, and that in the next step t_2 she works out the first component, while agent a_3, still at time t_2, engages in the first task. At the next time t_3, in order to proceed with the second component (y_t = y_{t-1} + x_{t-1} + C_{a,t}), task Y needs to work out the output of the first component of task X. The execution of the second component of task Y cannot start until a number of components at least equal to that of task Y has been worked out on task X; in formal terms, when p_y ≤ p_x. In this way the second task is temporarily locked, and agent a_1 can do nothing but remain inactive at time t_3 and try to work again at the next time. At time t_3 agent a_3 executes the first component of the first task and, by giving its output to the second task, allows the execution of the second component by agent a_1 at time t_4. Recall that a module is considered completed only when all its tasks have been worked out.

b A double feedback, from the former to the latter component, can also be hypothesised (and is indeed rather common). The question concerns which component the final outcome comes out of. This double or single loop becomes rather important and complex when considering more than two systems and when dealing with learning processes.
c Actually, both conditions must hold.
It is important to underline that, as even a careful look at the formalisation shows, the reciprocal interdependence is much more sensitive to the risk of inactivity and/or delay. Tasks are really interdependent both in the exchanged output and in the time at which the transfer happens. Though in principle a module characterised by reciprocal connection can be worked out by a single agent moving between tasks, this implies delays for searching and engaging. Therefore, supposing modules of the same length, the best performance occurs when all tasks are taken up at the same moment, each by an agent, because simultaneity eliminates delays.
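The three update rules can be condensed in a runnable Python sketch. This is an illustrative re-implementation, not the authors’ LSD code; in particular, for the sequential case it adopts the reading of y_t = y_{t-1} + x_{t-1} + C_{a,t} in which a task receives the partner’s output once, when the transfer is due, which reproduces the worked examples of the text (e.g. y = 6 for the sequential module with α = β = 3).

C = 1  # stationary per-step contribution, as assumed in the basic model

def run_parallel(alpha, beta):
    """Two agents work X and Y independently; returns (module output, steps)."""
    x = y = px = py = t = 0
    while px < alpha or py < beta:
        t += 1
        if px < alpha:
            x, px = x + C, px + 1
        if py < beta:
            y, py = y + C, py + 1
    return x + y, t

def run_sequential(alpha, beta):
    """Y starts only when X is finished, and consumes X's output once."""
    x = y = px = py = t = 0
    while py < beta:
        t += 1
        if px < alpha:
            x, px = x + C, px + 1
        else:
            if py == 0:
                y += x               # transfer of X's output to Y
            y, py = y + C, py + 1
    return y, t

def run_reciprocal(alpha, beta):
    """Lockstep: no task may run ahead; only component counts are tracked."""
    px = py = t = 0
    while px < alpha or py < beta:
        t += 1
        step_x = px < alpha and px <= py   # X waits for Y's feedback
        step_y = py < beta and py <= px    # Y waits for X's feedback
        if step_x:
            px += 1
        if step_y:
            py += 1
    return t

print(run_parallel(3, 4))    # (7, 4): working steps only, engagements omitted
print(run_sequential(3, 3))  # (6, 6): y ends at 6, as in the worked example
print(run_reciprocal(3, 3))  # 3: simultaneity eliminates delays

Note how the reciprocal case is the only one whose progress condition couples the two tasks in both directions: this is the double constraint, and hence the feedback, that makes it the most complex mode.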
2.4. Norms and coordination mechanisms

Norms and coordination mechanisms would not be necessary if pure chance were sufficient to assure a satisfying, even if not maximum, group performance. However, as we will discuss in the next section, our experiments with the model show that, without some elementary norm, performance is unsatisfying in a regime of parallel connections and nearly zero for the other two types of connection. We have hypothesized six norms (Tab. 2) and their corresponding coordination mechanisms [5]. The Cooperation Norm guarantees that every agent is willing to cooperate: nobody neglects their job or voluntarily defeats their colleagues’ purposes. The Finalizing Norm drives agents to complete the task in which they are engaged by moving from the current to the next component. As we have shown in paragraphs 2.2 and 2.3, agents can be engaged in non-executable tasks. The Anti-inactivity Norm works out this inactivity by prescribing that agents leave the locked task and search for another one. Since this norm resolves the situation of inactivity but doesn’t prevent it, we introduce the Anti-trap (Outplacement) Norm, which prevents agents from choosing locked components. In sequential connection this norm forbids agents to pick tasks whose preceding tasks have not yet been executed, while in reciprocal connection it drives agents to avoid tasks that are waiting for feedback. The Focusing Norm prescribes that agents give precedence to incomplete tasks by choosing tasks of modules in progress. More complex is the Collaboration Norm, since it recommends that agents give priority to tasks currently being worked on, that is, incomplete tasks on which other agents are currently engaged. The first five norms are forms of weak planning focused on tasks, because agents are told how to search for and cope with tasks, overlooking other agents. However, they are weak forms of planning, because they don’t specialize agents on a specific kind of task or module, and neither are they directed by a
Table 2. Norms and coordination mechanisms.

Type of norm | Description | Corresponding coordination mechanism
1. Cooperation Norm | Every agent does work (nobody defeats, free-rides, defects or loafs). | Planning agents’ behavior
2. Finalizing Norm | Once a task is started, agents must end it, moving from the current to the next component. | Planning agents’ behavior
3. Anti-inactivity Norm (1+2+3) | Agents forced into inactivity because they are engaged in a locked task leave it immediately and move on to search for another task. | Planning agents’ behavior
4. Anti-trap Norm (1+2+3+4) | Agents avoid engaging in locked tasks. In sequential connection they avoid tasks following tasks not yet executed, while in reciprocal connection they avoid tasks that are waiting for feedback. | Planning agents’ behavior
5. Focusing Norm (1+2+3+4+5) | Agents give priority to choosing tasks of modules in progress. | Planning agents’ behavior
6. Collaboration Norm (1+2+3+4+5+6) | Agents give priority to choosing tasks of modules being worked on by other agents. | Favoring reciprocal adaptation
supervisor. In fact, the corresponding configuration of the workgroup is not hierarchical: it is a group of peers who do not directly coordinate with one another. The sixth norm is qualitatively different, because it addresses precisely agents’ collaboration, and thus it configures the group as a team. These norms have been applied in a cumulative way, increasing complexity at each level. Some of them can be seen as a sub-set of the previous one. Higher complexity means that each norm implies more constraints than the previous one. This way of measuring norm complexity matches that used to measure the complexity of the modes of connection. These constraints limit agents’ behavior by directing their efforts in a more effective and efficient way. By constraining behaviors, many wrong choices are prevented and group performance is thus increased. The issue of the role played by norms and coordination mechanisms pertains to the theory of coordination and not to the theory of interdependence: the latter deals with concrete or abstract objects, like systems, tasks, activities, etc., while the former deals with and refers to agents. Tasks (systems) are connected in a certain way, and that way is defined by the technology. Agents are coordinated in a certain way, and that way is defined by the norms that somebody (or the agents themselves) sets up and applies. The rationale for the need for norms and coordination mechanisms is more complex and cannot be extensively discussed here. We can simply say that without them group performance is unsatisfying or simply null. As we will argue in the next section, the need for progressively more complex norms can be taken as a
demonstration that some types of connection are more complex than others. Moreover, norm complexity can be measured in the same way as the complexity of the modes of connection, that is, in terms of the degree of constraint they put on agents’ behavior. The more restrictive norms are, that is, the more they limit agents’ choices, the more complex they are. Constraints limit agents’ behavior by directing their efforts in a more effective and efficient way, so that wrong choices are prevented and (consequently) performance is increased. Notice that the COD model does not deal with the issue of how norms are set up, emerge or eventually change. Moreover, with respect to Mintzberg’s categorization [5] of coordination mechanisms, managers’ supervision is not considered here. Finally, components are supposed to be totally standardized; tasks possibly differ only in the number of their components, and modules only in the modes of connection among their tasks.
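As an illustration, the cumulative norms of Table 2 can be read as successive filters applied to the set of candidate tasks seen during search. The following sketch is our own toy rendering (the task fields are booleans assumed to be precomputed by the simulator), not the LSD code:

def candidate_tasks(tasks, norm_level):
    """Tasks an agent may engage in under norms 1..norm_level (cumulative).
    Norms 1-3 (cooperation, finalizing, anti-inactivity) constrain behavior
    elsewhere: agents always work, finish engaged tasks, leave locked ones."""
    pool = [t for t in tasks if not t["engaged"] and t["done"] < t["total"]]
    if norm_level >= 4:  # Anti-trap: never pick a currently locked task
        pool = [t for t in pool if not t["locked"]]
    if norm_level >= 5:  # Focusing: prefer tasks of modules in progress
        focused = [t for t in pool if t["module_in_progress"]]
        pool = focused or pool
    if norm_level >= 6:  # Collaboration: prefer modules other agents work on
        teamed = [t for t in pool if t["module_worked_by_others"]]
        pool = teamed or pool
    return pool

Note that the higher the norm level, the smaller (more constrained) the candidate set, which mirrors the measurement of norm complexity as degree of constraint.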
2.5. The methodology and working of the model

Our model analyzes the effects of task interdependence by showing how, in order to reach a satisfying performance, more complex connections require more complex norms. Group size is fixed at 3 agents, whose performance is measured by the following two indexes:
• effectiveness: the number of executed modules divided by the maximum number of executable modules. This index varies between 0 and 1;
• efficiency: the number of working steps divided by the maximum number of steps that can be employed in working. This index refers to the degree of use of inputs, here the agents' time actually employed in working divided by the maximum number of steps the group can employ in working, thus excluding the steps spent on engagements.
Two aspects should be underlined: (i) these indexes are normalized on group size and structural complexity, so that effectiveness and efficiency are independent of them; (ii) maximum efficiency does not necessarily correspond to maximum effectiveness, because only through adequate coordination can agents' efforts be directed to the right tasks and resource use be minimized. Experiments are conducted on specialized groups, that is, groups executing modules characterized by only one of the three modes of connection. Thus, the performance of workgroups specialized on parallel modules (henceforward labeled P), sequential modules (S) and reciprocal modules (R) is analyzed separately. We ran 10 simulations per experiment, using different seeds for the random generator, to prevent its possible influence on the results.
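The two indexes can be written compactly; below is a minimal sketch (in Python), where the treatment of engagement steps in the denominator is our reading of the definition above:

def effectiveness(executed_modules: int, max_executable_modules: int) -> float:
    # Share of the executable modules that the group actually completed (0..1).
    return executed_modules / max_executable_modules

def efficiency(working_steps: int, total_agent_steps: int, engaging_steps: int) -> float:
    # Share of the group's usable time actually spent working. Steps spent on
    # engagements are excluded from the denominator, per the definition above;
    # this exact normalization is our assumption.
    return working_steps / (total_agent_steps - engaging_steps)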
Data record the mean of the performance indexes over each series of 10 simulations. Each simulation lasts 900 intervals, and the module creation frequency is fixed at 0.50 and kept constant during the experiments, so that 450 modules are generated in each simulation. Each module is composed of three tasks and each task of three components; that is, each task needs three working steps and one engaging step. Given these environment parameters, a group of three members has a production capacity of 675 tasks, corresponding to 225 modules: when the group performs at its maximum, it completes 225 modules, its maximum productive capacity. According to a satisficing, and hence non-maximizing, approach to the social sciences [8,9,10], it is important to distinguish two types of group performance: the conditions under which the maximum performance is reached, and those under which a satisfying one is reached. In the former case maximum effectiveness and efficiency are achieved, while in the latter a satisfying performance can be enough. The rationale is that groups whose members work efficiently (that is, who neither waste time searching in vain nor remain trapped in blocked components) and whose effectiveness is acceptable can be judged positively (d). Our small library workgroup is therefore supposed, at its maximum, to complete the storing of 225 books (modules) per year, that is, 75 books per worker, which means 9 working days per book. At first sight this is not a hard goal to reach if the books are simple and if this were the librarians' only activity. Saying that books are simple means, as we will show with our simulation results, that the tasks (cataloguing, indexing, and placing) can be executed independently, that is, in a parallel regime. In other words, each worker could independently work on one of the tasks related to the same book. The situation would change slightly if task connections were sequential, and dramatically if they were reciprocal. The difficulty would further increase if agents' ability to search among incompletely stored books were limited. In both cases, and of course far more when these conditions occur simultaneously, it would be really difficult to reach, if not the maximum, at least a satisfying performance without employing a set of complex coordination mechanisms. This is one of the things we discuss in the next section with the results of our experiments.
d. Though it is not treated in this basic version of the model, a crucial point is that each norm has its own cost, and that more complex norms are more costly. Adding more, and/or more complex (costly), norms may increase effectiveness up to the maximum, but it should be checked in each specific case whether the advantages of maximum effectiveness compensate the disadvantages of managing more numerous and possibly more complex norms.
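As a quick check of the production-capacity figures in Sec. 2.5, the arithmetic (ours, mirroring the text's parameters) is:

# Environment parameters from the text (Sec. 2.5).
intervals          = 900    # simulation length, in steps
creation_frequency = 0.5    # modules created per step
agents             = 3
tasks_per_module   = 3
steps_per_task     = 3 + 1  # three working steps plus one engaging step

created_modules = int(intervals * creation_frequency)   # 450 modules arrive
agent_steps     = agents * intervals                    # 2700 agent-steps available
task_capacity   = agent_steps // steps_per_task         # at most 675 tasks
module_capacity = task_capacity // tasks_per_module     # at most 225 modules

assert (created_modules, task_capacity, module_capacity) == (450, 675, 225)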
3. The effects of task interdependence

We base our ordering of the modes of connection in terms of complexity on the following argument: a mode is more complex than another if, ceteris paribus, a workgroup operating under it requires more numerous and more complex coordination mechanisms in order to reach the same performance. This is an indirect demonstration, based on computational experiments. Although this demonstration yields a coherent ordering of the modes of connection, it differs from the one used above (see the introduction) and in other works [5], where complexity is defined in terms of the degree of connection constraint. Workgroups facing parallel interdependence are not complex and do not need special devices to be effective. Conversely, under sequential or reciprocal interdependence complexity grows, and consequently coordination becomes more complex too. In spite of the model's simplifications, our analysis confirms the main suggestions coming from the consolidated literature on this subject. The radical difference is that such statements are now based not on "ipse dixit", that is, on the reputation of some scholar, but on an algorithmic demonstration. Incidentally, in our case the "scholars" were perfectly right in their main arguments. The results of our simulation model (Tab. 3) show that the Cooperation Norm alone is unable to help groups achieve adequate performance. Effectiveness is low whatever the mode of connection, while efficiency reaches a satisfactory level only under the parallel connection. The Finalizing Norm guarantees an almost satisfying performance only to the group working on tasks connected by parallel interdependence, while in the other two cases agents become locked into tasks that cannot be executed: in the sequential regime too many agents engage in tasks that follow tasks not yet completed, and under reciprocal interdependence they wait too long for feedback from other tasks. In most simulations almost all agents become locked in the early steps, so that the group soon enters an irreversible paralysis. The Anti-inactivity Norm prescribes that agents locked into a task leave it immediately (during the same interval in which they engage in it) and search for another task. Hence, this norm remedies inactivity but does not prevent it, because it intervenes on the effects and not on the causes of inactivity. It leaves the performance of the group working in the parallel regime unchanged, because there is no inactivity to remedy, and it slightly improves the group working in the sequential mode.
Table 3. The effects of task interdependence: main results from the simulation model.

Norm (cumulative)                      Effectiveness          Efficiency
                                       P     S     R          P     S     R
1. Cooperation Norm                    0.16  0.03  0.01       0.53  0.19  0.23
2. Finalizing Norm (1+2)               0.66  0     0          1.00  0.01  0
3. Anti-inactivity Norm (1+2+3)        0.66  0.54  0.16       1.00  0.76  0.58
4. Anti-trap Norm (1+2+3+4)            0.66  0.65  0.29       1.00  1.00  0.74
5. Focusing Norm (1+2+3+4+5)           1.00  1.00  0.79       1.00  1.00  0.79
6. Collaboration Norm (1+2+3+4+5+6)    1.00  1.00  1.00       1.00  1.00  1.00
The performance of the reciprocal group remains definitely unsatisfactory: agents consume a lot of time in searching. In order to improve performance substantially, another norm becomes necessary. The Anti-trap Norm prevents agents from choosing locked tasks; its application requires that agents know the right sequence of execution of each task. While the group working on reciprocal tasks improves only to a still unsatisfactory level, the group facing sequential tasks reaches the same outcomes as the parallel regime. Through the Focusing Norm, which prescribes that agents give priority to incomplete modules, a sharp increase in performance is realized: it brings both the parallel and the sequential group to the maximum. Once agents are focused on the same modules, their efficiency pushes effectiveness. Even the group working with reciprocal tasks benefits substantially from this norm, but it does not yet reach the maximum performance. To this aim the (final) Collaboration Norm is necessary, which forces agents to choose first the modules currently being worked on, that is, incomplete modules on which other agents are engaged.
This norm is more restrictive than the previous one because, in order to get priority, it is not enough that a module is incomplete: other agents must currently be working on it. With this norm added, all three types of interdependence reach the maximum performance. Notice that this norm is qualitatively different from, and more complex than, the previous ones: it establishes coordination between agents, while the others intervene on the relationships between agents and tasks.
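The experimental design behind Table 3 can be summarized as a loop over regimes and cumulative norm sets; the sketch below is our reconstruction, since the COD model's code is not published with the paper, and the `simulate` callable and its return object are placeholders for it.

import statistics

def run_experiments(simulate, regimes=("P", "S", "R"),
                    norm_levels=range(1, 7), n_runs=10, intervals=900):
    # For each (regime, cumulative norm set) cell of Table 3, run 10
    # simulations with different seeds and average the two indexes.
    results = {}
    for regime in regimes:
        for level in norm_levels:
            runs = [simulate(regime, level, intervals, seed=s) for s in range(n_runs)]
            results[(regime, level)] = (
                statistics.mean(r.effectiveness for r in runs),
                statistics.mean(r.efficiency for r in runs),
            )
    return results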
4. The effects of bounded rationality

Our model is truly non-neoclassical because: (a) agents are rule followers [8,9] and not utility maximizers; (b) agents' rationality is bounded [10-14]. There is currently a lively debate about how to operationalize bounded rationality so as to give it a sound and effective scientific status. Among the many ways in which this could be done, and the many facets it presents, in our model we chose one of the simplest: agents' computational capacity. The idea is that agents cannot look at and evaluate all executable modules, because each would have to be checked and evaluated in order to decide which one is best to execute, and in which component of its tasks. Fig. 3 shows that the challenge to agents' rationality sharply increases over time: at half of the group's working life there are at least 112 incomplete modules circulating, to be evaluated by each agent. The problem is generated, in particular, by the progressive proliferation of incomplete modules and tasks. Let us consider the best situation for efficiency and effectiveness: a group working in the regime of parallel connections, where agents are coordinated by the Collaboration Norm, that is, the most effective coordination mechanism. Further, let us suppose agents have no computational limits in searching, checking, and evaluating modules and tasks (Fig. 3). Even in this most favorable case, after only the first 20% of the group's working life, that is, after 180 intervals, around 45 incomplete modules circulate (Fig. 3). Under the best conditions (the easiest regime and the most effective norms), after 180 intervals each agent would have to be able to evaluate, in a single interval, 45 books, that is, 135 tasks (405 components). The size of the decision space becomes too large very soon. The tables and shelves of our small library workgroup soon fill up with incompletely stored books, and the degree of disorder grows accordingly. Every day it becomes much harder to evaluate all the incomplete modules in order to choose the right one. Even under the best conditions, the goal that at first sight appeared so easy to achieve becomes unreachable.
Figure 3. Complete and incomplete modules circulating in a group working in the parallel regime, coordinated by the Collaboration Norm, with unboundedly rational agents. NMod: number of created modules; NModCompl: number of executed modules; NModToCompl: number of incomplete modules.
If the yearly working time of an average American worker is supposed to be 1,800 hours, then in our 900-step simulations each interval corresponds to 2 working hours. The problem of agents' computational capacity can then be set up in the following way: how many modules can be looked at and evaluated (analyzed) in one interval (2 hours)? This problem is crucial, because group effectiveness and efficiency depend essentially on this ability. An agent, in fact, has to "open" and check all incomplete modules in order to choose the task to be worked on. Recall that each module is made of 3 tasks, each consisting of 3 components. In the end, an agent chooses a specific component of a specific task of a specific module, taking into account the specific interdependence regime and the norms, if any, ruling the group. Even assuming that the librarians' storing work is effectively supported by computer programs, it can reasonably be supposed that in a standard 2-hour interval a highly competent (strongly rational or efficient) agent can check 40 modules (equal to 120 tasks, or 360 components), while a poorly competent or poorly motivated one can check just 2. The results in Tab. 4 show that if the group is in the parallel or sequential regime and is coordinated through the most effective norm, then achieving a satisfying performance requires a computational capacity of at least 20 modules per agent. Consequently, only very high rationality joined with the best coordination mechanism enables a group dealing with complex tasks to achieve a satisfying performance. If the regime is reciprocal, twice that capacity is required: this regime needs librarians with double the competence required by the other two regimes.
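Operationally, this version of bounded rationality is just a hard cap on search; a minimal sketch (ours, not the authors' code; the paper only fixes the cap values, so the random sampling among circulating modules is our assumption):

import random

def modules_examined(incomplete_modules, capacity, rng=None):
    # Per 2-hour interval an agent can inspect at most `capacity` of the
    # circulating incomplete modules (2, 5, 10, 20, 40 or 80 in Table 4).
    rng = rng or random.Random(0)
    pool = list(incomplete_modules)
    if len(pool) <= capacity:
        return pool
    return rng.sample(pool, capacity)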
Table 4. Agents' computational capacity effects on group performance. Group 1: coordination through the Anti-trap Norm. Group 2: coordination through the Collaboration Norm.

Group 1
Computational capacity     Effectiveness          Efficiency
(modules per agent)        P     S     R          P     S     R
 2                         0.66  0.66  0.28       1.00  1.00  0.74
 5                         0.67  0.66  0.28       1.00  1.00  0.74
10                         0.65  0.64  0.28       1.00  1.00  0.74
20                         0.65  0.65  0.29       1.00  1.00  0.74
40                         0.66  0.66  0.30       1.00  1.00  0.74
80                         0.66  0.65  0.29       1.00  1.00  0.74

Group 2
Computational capacity     Effectiveness          Efficiency
(modules per agent)        P     S     R          P     S     R
 2                         0.37  0.43  0.17       0.37  0.44  0.32
 5                         0.52  0.58  0.28       0.52  0.59  0.41
10                         0.63  0.69  0.40       0.63  0.70  0.51
20                         0.75  0.79  0.58       0.75  0.80  0.64
40                         0.85  0.88  0.75       0.85  0.89  0.78
80                         0.93  0.95  0.89       0.94  0.95  0.90
If the group is coordinated by a less effective norm, then in the reciprocal regime performance never exceeds 30%, even supposing a computational capacity of 80 modules per agent. This means that, in the presence of complex task interdependence, there is no way to reach a satisfying performance unless agents collaborate at their best. Tab. 4 also shows another interesting result: at the lowest degrees of computational capacity, simpler coordination is more effective. In fact, when the connection is parallel and computational capacity is less than 20 modules, Group 1 performs better than Group 2. Similarly, when the mode of connection is sequential or reciprocal, Group 2 performs better than Group 1 only once computational capacity reaches at least 10 modules. This is because, once rationality is really bounded, goal-seeking behavior is bounded too. The Focusing and Collaboration Norms tell agents to search for two specific categories of modules: those in progress and those under execution by other agents. The problem, however, is that if computational capacity is under a certain threshold, the time consumed in searching for those specific categories becomes high: indeed, so high that it cancels the advantages of being more goal-seeking.
Figure 4. Effective combinations of bounded rationality and coordination complexity.
The effectiveness of goal-seeking behavior is more than compensated by the ineffectiveness of spending a lot of time in searching. In other words, by setting less specific goals, that is, by allowing a wider range of choices, less complex norms reduce the search effort and increase the effectiveness of less rational agents. This explanation is confirmed by the analysis of efficiency, which is in fact inversely dependent on the time spent in searching. When the mode of connection is parallel or sequential, whatever the agents' computational capacity, the efficiency of Group 1 is much higher than that of Group 2. Similarly, when tasks are connected in a reciprocal way, Group 2 scores a higher efficiency only if agents have a high computational capacity. Figure 4 summarizes the effective combinations of bounded rationality and coordination complexity. If the workflow arriving at the workgroup from the external environment were better regulated than a flow of 0.5 modules per step, then performance would be, ceteris paribus, much higher, and in particular there would be fewer incomplete modules. In other words, in order to reach satisfying performances, workgroups would need less rationality or less complex coordination norms. At the extreme of perfect regulation, the number of modules arriving from the external environment would coincide with the number completed, and goal achievement would require neither high rationality (just one module per agent per interval) nor all the norms. On the other hand, it is likely (though likewise left to the future research agenda) that a workflow more uncertain than a constant rate of module creation would require, ceteris paribus, more rationality or more complex coordination mechanisms. Such an increase of complexity associated with an unstable workflow could be more than compensated by introducing agents' learning in one or more of three forms:
(i) better searching ability after completing tasks; (ii) higher productivity when working repeatedly on the same task; (iii) greater ability to collaborate as the number of successful collaborations grows over time. Actually, in this basic version of the model agents do not learn, and therefore the corresponding forms of nonlinearity do not take place.
5. Conclusions

Our simulation model tells us that groups working on complex interdependencies can reach an acceptable performance only by means of complex norms. Reciprocal interdependence can be managed satisfactorily only through the Focusing Norm, and reaches the maximum only through the Collaboration Norm, which actually includes the five simpler norms. Sequential interdependence can be satisfactorily managed by applying the Anti-trap Norm, which includes the three previous norms, and parallel interdependence already with the Finalizing Norm. These results also have a normative side: it is redundant to employ complex norms to coordinate groups working on tasks connected by simple interdependencies. Further, and quite surprisingly, when agents' rationality is severely bounded, the Collaboration Norm becomes not simply redundant but actually disadvantageous. In other words, coordination between agents does not work well when agents' computational capacity is very low: well-focused task-based coordination performs better. Of course these results, and especially their normative side, should be taken with prudence, because our model is still extremely simple: the introduction of knowledge exchange, competencies, personal conflicts, learning processes, and task specificity could change them significantly. However, we have so far obtained four relevant findings: (1) an algorithmic demonstration of the ordering of interdependencies in terms of complexity; (2) an operationalization of bounded rationality in terms of computational capacity; (3) an algorithmic analysis of the effects of bounded rationality on workgroup performance, which also takes task interdependence into account as a moderating factor; and (4) an explanation of why and under what circumstances teamwork is a superior form of organization. This last result confirms suggestions proposed, but left theoretically and empirically unproven, in organization science. This version of the COD model is simple because it supposes that agents have the same competencies and motivations to work, that they do not learn, that they do not make mistakes, that there are no behavioral issues (personal conflicts, leadership problems, etc.), and that there are no differences between tasks. Moreover, there are no externalities, nor other forms of nonlinear phenomena.
However, despite its simplicity, this model is very helpful, both because it is the ground on which to build more complex and realistic models and because it already shows many interesting effects. Moreover, by the inner logic of simulation models, in order to explain the results coming from rich (complex) models it is necessary to know the behavior of the variables in simple (controllable) models.
References
1. L. Biggiero, Towards a new theory of interdependence, (2008), www.ssrn.com.
2. F. Varela, Principles of Biological Autonomy (Elsevier, NY, 1984).
3. C.W. Churchman, The Systems Approach (Dell, NY, 1979).
4. J.D. Thompson, Organizations in Action (McGraw-Hill, New York, 1967).
5. H. Mintzberg, The Structuring of Organizations (Prentice-Hall, NJ, 1979).
6. W.R. Ashby, in Principles of Self-Organization, Eds. H. von Foerster and G.W. Zopf (Pergamon Press, New York, 1962), p. 255. (Reprinted in Modern Systems Research for the Behavioral Scientist, Ed. W. Buckley (Aldine, Chicago, 1968).)
7. L. Biggiero and E. Sevi, Modes of connection and time ordering: definitions and formalisation of the fundamental types, (2008), www.ssrn.com.
8. J.G. March, A Primer on Decision Making: How Decisions Happen (The Free Press, NY, 1994).
9. J.G. March, in Organizational Decision Making, Ed. Z. Shapira (Cambridge UP, Cambridge, 1997), p. 9.
10. J.G. March and H.A. Simon, Organizations (Blackwell, Cambridge, 1958).
11. J. Conlisk, Journal of Economic Literature 34, 669 (1996).
12. B.D. Jones, Annual Review of Political Science 2, 297 (1999).
13. D. Kahneman, American Economic Review 93, 1449 (2003).
14. H.A. Simon, Organization Science 2, 125 (1991).
IMPORTANCE OF THE INFRADISCIPLINARY AREAS IN THE SYSTEMIC APPROACH TOWARDS NEW COMPANY ORGANISATIONAL MODELS: THE BUILDING INDUSTRY

GIORGIO GIALLOCOSTA
Dipartimento di Progettazione e Costruzione dell'Architettura, Università di Genova, Stradone S. Agostino 37, 16123 Genoa, Italy
E-mail: [email protected]
Infradisciplinary applications, alongside interdisciplinary and transdisciplinary ones, form part of the definition of new company organizational models, in particular of networked-companies. Their systemic connotations characterize such companies as collective beings, especially as regards the optimization of interactions between agents and context-specific interference. Networked-companies in the building industry (chosen here to illustrate the infradisciplinary value of the systemic approach to company organizational models) require, owing to their nature and the particularities of their context, certain specifications: behavioral micro-rules of an informal nature, suitable governance of the sector, etc. Their nature and particular context thus determine, especially in the systemic view, the need not only for an interdisciplinary and transdisciplinary approach, but also for an infradisciplinary one.

Keywords: systemics, infradisciplinarity, building, company, organization.
1. Introduction

The Discorso preliminare of Diderot and d'Alembert's Enciclopedia states: "(...) there is not a single academic who would not willingly place the theme of his own study at the centre of all the sciences, in a way similar to primitive human beings who placed themselves at the centre of the world, convinced that the world had been made for them" (Diderot and d'Alembert, 1772 [3], cit. in Dioguardi, 2005 [5, p. 45], author's translation). Even today, within various disciplines, this tendency persists:
• sometimes with each academic emphasising those features of his/her own area of interest that carry assumed generalist evidence;
• sometimes insisting tout court upon the particular collocation of their own expertise;
• in other cases claiming the irreducibility of that science to more general laws of interpretation and conduction of phenomena, or stressing assumed structural peculiarities, etc.
From the same Discorso also emerges the role assigned to philosophy in the "(...) encyclopaedic order of our knowledge ..." (Diderot and d'Alembert, 1772 [3], cit. in Dioguardi, 2005 [5, p. 44], author's translation). This encyclopaedic order, in fact, "(...) consists of collecting knowledge within the smallest possible space and putting the philosopher, so to speak, over and above this vast labyrinth, at quite an elevated observation point, from which he can completely embrace all the main arts and sciences ..." (Diderot and d'Alembert, 1772 [3], cit. in Dioguardi, 2005 [5, p. 44], author's translation). This approach leads to the fragmentation of knowledge into various disciplines, and such disciplinary fragmentation remains common practice in many areas of study and research. Nor can one usually be completely free of mistaken interpretations of:
• generalism, where the aim is to recompose (and/or construct general theories) but with unacceptable simplifications;
• specialism, whenever means and ends inspired by scientific rigour in the interpretation and management of peculiarities (and in the related operational activities) lead to artificial sectorialisms.
In this sense systemics, especially through interdisciplinary and transdisciplinary processes (thus producing interactions, correspondences and theories at higher levels of generalisation), also leads to recomposition amongst the various disciplines. And it does so not by replacing but by integrating the specialistic knowledge of the latter: for this reason, and to avoid mistaken assumptions about centrality in all the sciences (Diderot and d'Alembert, 1772 [3], cit. in Dioguardi, 2005 [5, p. 45]), infradisciplinarity is associated with interdisciplinarity and transdisciplinarity. It is well known that:
• interdisciplinarity occurs when problems and approaches of one discipline are used in another;
• transdisciplinarity occurs when systemic properties are discussed and studied in a general way, as properties of models and representations (without reference to cases in specific disciplines).
Infradisciplinarity has, in its turn, already been defined epistemologically as, fundamentally, a set of prerequisites necessary for any scientific activity and as a method of investigation regarding intrinsic aspects of disciplines (Lorenzen, 1974 [11, pp. 133-146]); here it is taken, above all, as the set of resources and assumptions activating and validating specialistic rigour. It is thus important that it be considered in research activities, in order to avoid genericism.
An example of the risks of mistaken generalism, and of insufficient attention to infradisciplinary aspects (even though not strictly ascribable to the scientific disciplines), is the case of company organisational systems applied to the construction sector. The latter shows, in fact, significant and substantial peculiarities, as will be seen below: peculiarities which, moreover, are expressed to the same extent in the nature of construction companies (especially Italian ones), leading to their particular behavior as collective beings (a) and sometimes, as will be seen, to completely unexpected operational effectiveness. Current theories of company organisational models, converging above all towards the concept of the networked-company (Dioguardi, 2007 [6]), provide a general reference framework for applications and elaborations across widely differing sectors. In this way the latter (through the use of models, behavioral rules, etc.) can manage and coordinate innovational leanings and the multiple phenomenologies inherent in various scenarios: local industrial estates, virtual industrial estates, etc. (b). This does not, however, exclude any specificity of such phenomenologies.
a. The concept of collective being expresses, above all, situations in which the system which emerges from the interactions amongst the component parts may show behaviour very different from that of its individual components: so different, in fact, as to require the dynamic use of multiple systems (interacting and emerging from the same components). This concept, when applied to the reality of companies, allows the description of new models, and thus novel possibilities of intervening in processes involving the ability to decide, store information, learn, act intelligently, etc. (Minati and Pessa, 2006 [12, pp. 64, 70-75, 89-113, 365-368]). Collective beings also refers to collective behaviour emerging from that of autonomous agents which share a similar cognitive model, or at least a set of common behavioural micro-rules (Minati and Pessa, 2006 [12, pp. 110-111]).
b. Virtual can mean potential. In the thinking of St. Thomas Aquinas and other scholastics:
- an effect is formally contained within its cause if the nature of the former is present in the latter;
- an effect is virtually present within its cause if, while not containing the nature of the former, it may produce it (Minati and Pessa, 2006 [12, p. 362]).
The concept of virtual company usually refers to an electronic entity, put together by selecting and combining organisational resources of various companies (Minati and Pessa, 2006 [12, pp. 365-368]). This concept also expresses an opportunity for active cooperation amongst several companies, often having the same target. In this sense, the constitution of a virtual company implies the development of a suitable network of relationships and interactions amongst those companies, developed on the basis of customer requirements (Minati and Pessa, 2006 [12, p. 366]). More generally, the concept of virtual district simultaneously comprises the meanings of:
- potential (and specific) organisational development, where the constituent members appear to have significant proximity only from an IT point of view (Dioguardi, 2005 [5, p. 127]);
- quasi-stability (Garaventa et al., 2000 [8, p. 90]).
In this sense, therefore, when coherently adopted within a systemic view, new company organization theories avoid the risks of any pretension to centrality of individual areas of application, thereby reducing the possible effects of self-referentiality.

2. Systemic connotations of networked-companies

Although prior to his later contributions, Dioguardi defines the networked-company as "(...) a series of laboratories (...) expressed as functional areas which overall provide a network of internal operational nodes. Amongst these (...) economic transactions develop, almost leading to an internal quasi-market. The company is also open to external cooperation from other companies, through transactions with suppliers, and these (...) produce a network of supplier companies which are nevertheless independent and able to activate transactions in a real external market which remains, however, an expression of the supplier network of the general company (...) The company is thus structured in a reticular way allowing the coexistence of a hierarchical order together with the efficiency of the market within an organisational harmony like that in Goethe's web of thought: "The web of thought, I'd have you know / Is like a weaver's masterpiece: / The restless shuttles never cease, / The yarn invisibly runs to and fro, / A single treadle governs many a thread, / And at a stroke a thousand strands are wed" (Goethe, 1975 [10, p. 94]; Italian citation: "In realtà, la fabbrica dei pensieri / va come un telaio: / pigi il pedale, mille fili si agitano / le spole volano di qua e di là, / i fili corrono invisibili, / un colpo lega mille maglie", cit. in Dioguardi, 1995 [4, p. 171], author's note) ... This company model, however, entails centrifugal freedom of movement capable of disaggregating the component companies (but also involving critical elements for the individual supplier companies - author's note) (c). It is thus necessary (given the risks of activating such centrifugal autonomies - author's note) to search for elements of aggregation and homogeneity. And these can be found precisely within the concepts of culture and quality, intended both as internal organizational requirements and as external manifestations capable of expressing a competitive nature" (Dioguardi, 1995 [4, p. 171], author's translation). Especially in Italy, in the building sector (in the more advanced cases), models of networked-companies, or of the general company, can be defined through connections "(...) at three fundamental levels:
c. See, for example, particularly regarding the building sector, Giallocosta and Maccolini, in Campagnac, 1992 [2, pp. 131-133].
• the general company itself, which essentially takes on the role of managing and orchestrating (...);
• the multinodal company (the operational nodes of the general company), responsible for managing production, finance and plant and machinery;
• the macrocompany, consisting of the specialist external companies (...) involved, by the general company, through the multinodal company, in production and supplier activities, by means of quasi-stable relationships ..." (Garaventa et al., 2000 [8, p. 90], author's translation).
More recent conceptual developments, ascribable to modern aspects of company networks (d), could lead traditional districts to develop towards innovative forms having their own technological identity, able to compete in global markets. For the genesis and optimal development of these innovative structures, the themes of governance and of the need for associationism amongst companies are very important; particularly the latter, for which: "(...) associationism amongst companies should be promoted, with the objective of a more qualified presence in the markets, and thus one should promote the formation of networks (...) comprising not only small and medium-sized companies in similar markets, but also companies in other districts having analogous or complementary characteristics, interested in presenting themselves in an adequate manner to large-scale markets ..." (Dioguardi, 2007 [6, pp. 143-144], author's translation). Clearly, the theme of governance becomes central for company networks (and networked-companies), especially where:
• their formation occurs through spontaneous processes (company networks);
• criticality occurs (or risks occurring) during their useful life-cycles, but also owing to more general requirements regarding the definition and realisation of goals (mission) and related management issues; in this sense, governance is ensured by the existence of a visible hand (a coherent substitute for the invisible one of the market evoked by Adam Smith), deriving from the professional competence of managers and from suitable regulatory strategies for the sector: thus acting as an observer in the systemic sense, and an active one, being an integral part of the processes occurring (Minati and Pessa, 2006 [12, pp. 50-55]).
d. Such company networks "(...) lead to novel aspects of economic analysis which form units at a third level, in addition to individuals representing first-level and companies second-level elements" (Dioguardi, 2007 [6, p. 138], author's translation).
Further systemic connotations of networked-companies lie in the maximization and, at least in the most advanced cases, the optimization of the interactions amongst component companies (agents); these, moreover, are typical of a company collective being, which expresses the ability to learn, accumulate know-how, follow a strategy, possess style, leadership, etc.: it follows that it possesses intelligence (or better, collective intelligence) in terms of, for example, the ability to make choices on the basis of information, accumulated know-how, elaborated strategies, etc., also when faced with peculiarities of context (Minati and Pessa, 2006 [12, pp. 110-134, 372-374]) (e). The explicit role played by such contextual peculiarities already alludes to the significant effects thus produced (but also, as will be seen, to the synergies which arise) regarding the companies' ability to make choices, and thus to assume suitable behavior, follow suitable strategies, etc.: the set of contextual factors is thus considered as an agent (in a dialogical sense with rules and general aspects) in the optimal development of behaviors, company strategies, etc., coherently with innovative theories; the latter in fact acquire, precisely because of this (and of the infradisciplinary aspects it carries), continuous refinement and specification. In resolving the make-or-buy dichotomy, prevalently by way of orientation towards productive decentralisation (whilst keeping strategic activities internal, so avoiding a drift towards hollow corporations), the networked-company also stresses its own behavior as an open system: precisely, at least as regards its productive performance, in the sense of being able continually to decide amongst various levels of openness or closure with respect to its own context (Minati and Pessa, 2006 [12, pp. 91-124]). The nature of the context, and the decisions regarding the ways in which the company relates to it, also induce within the company the possibility of adaptive flexibility (minimal where tendencies toward systemic closure prevail). Such strategies then evolve towards possible forms of dynamic flexibility, such that the company, provided it is suitably prepared in this sense (Tangerini, in Nicoletti, 1994 [13, pp. 387-392]), not only receives market input but modifies and develops it (Garaventa et al., 2000 [8, pp. 125-137]):
e. The concept of intelligence in company collective beings, coherently with a simplistic definition of the former, may be taken to correspond, for example, to the ability to find the right answers to questions, assuming (or considering) that the right answers are not so much the true ones as the more useful ones (or rather, those that work). In this sense one can attribute intelligence to collective beings: the intelligence of flocks, swarms, companies, etc. is manifest in the specificity of their collective behavior, where only collectively (as opposed to the inability of the individual members) are they capable of solving problems (Minati and Pessa, 2006 [12, pp. 116-125]).
for example, by anticipating unexpressed requirements, satisfying latent needs, etc., while evidently incurring unacceptable risks of manipulating the processes of demand formation, developing induced needs, etc., risks whose inhibition, including through governance and shared ethical codes, is necessary (Minati and Pessa, 2006 [12, pp. 336-346]). Thus, there are mutual company-context influences and synergic modifications between them, following non-linear and recursive processes (f). Above all, the existence of such interactions, their attributes, and the implicit nature of their connotations (closely related to the character and peculiarities of the company and of the other actors involved, and to the context in which they occur) require the use of infradisciplinary applications and improvements for:
• an effective interpretation of such emergent phenomena (Minati and Pessa, 2006 [12, pp. 98-110]);
• the most efficient management possible of the latter.

3. Networked-company in the Building Industry

Specific aspects of the building sector (illustrating in particular its distinctive character compared to other areas of industrial activity, and its direct interference with company organizational models) can be summarized, amongst others, by (Garaventa et al., 2000 [8, pp. 27-40] and Sinopoli, 1997 [16, pp. 46-65]):
• relationships with particularities of contexts (environmental, terrain, etc.) (g);
• technical and operational activities always carried out in different places;
• the unique nature of each building;
• a tendency towards high construction costs;
• maintenance, and often increase, of the economic value of the end products over the years;
• fractionated leadership in the management of single initiatives;
• the existence of temporary multi-organisations during the management of activities (as in other events and areas of activity, such as theatre, football or rugby matches, etc.).
f. Non-linear processes, typical of complex systems (distinguished, moreover, by exchanges with the external environment), show behaviors which cannot be formulated in terms of a linear function f(x) such that f(x+y) = f(x) + f(y) and f(a·x) = a·f(x). A formulation of recursive processes, typical of autopoietic organisations (which produce themselves), can be given by means of a program p expressed in terms of itself, so that its execution leads to the application of the same algorithm to the output of the previous stage. A recursive program recalls itself, generating a sequence of calls which ends on reaching a given condition, the terminating condition.
g. "Due to the fact of being a building, which occupies a significant portion of land over very long periods of time (...) the product of the construction process has to face up to the problem (...) of relating to the characteristics of its own context: the physical ones (climate, exposure, geology, meteorology), the environmental and historical ones (...) The relationship with its context ensures that the product of the building process adds to its economic role a series of cultural and symbolic meanings, and that the agents involved in this process have to come to terms with a discipline (which industrialists almost never have to face) which deals precisely with these specific meanings, that is, architecture ..." (Sinopoli, 1997 [16, p. 48], author's translation).
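As a concrete illustration of the recursive formulation in note f (the example is ours, not the author's), a program expressed in terms of itself, with a terminating condition:

def factorial(n: int) -> int:
    # Terminating condition: without it, the chain of self-calls never ends.
    if n <= 1:
        return 1
    # The program recalls itself, applying the same algorithm
    # to the output of the previous stage.
    return n * factorial(n - 1)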
The significant impact of the building industry upon regulations at the urbanistic and territorial planning level, and upon the multiple particularities of the various contexts, tends to place it in a special position within the macroeconomic scenario: there emerges, for example, the need for extremely detailed institutional regulations (Garaventa et al., 2000 [8, pp. 37-41]), with related multi-disciplinary interests and values (social, economic, cultural, etc.). In the building sector, technical and operational activities are always carried out in a discontinuous manner and in different places (building sites), and are burdened with significant risks (the unforeseeable nature of climatic and environmental factors, etc.), supplying unique end-products: thus excluding rigorous programming of operational activities, any meaningful production standardization, etc. The tendency towards high costs of the finished product, and the relatively long periods necessary for design and production, often lead to burdensome outlays of economic resources by construction companies, with consequent heavy negative cash-flow which usually persists over the whole period of the contract. The maintenance (and often the increase) over time of the economic value of the end-products also explains the lack of interest of those involved in this sector (compared to other sectors of economic activity) in aspects of productivity, technical innovation, etc. (h): emphasis is placed, in fact, upon factors ascribable to rent income rather than to industrial profit (or the optimization of productive activities).
h. The building industry, in fact, "(...) is 'a strange world of suspicious people (...) who often reject (...) innovations which upset (...) behaviour and (...) habits' (Sinopoli, 1992 [15, p. 12], author's note and translation) ... The mistrust is (...) deeply rooted, since this community has accumulated millennia of experience, producing great works using almost all types of material available in nature (...) This 'strange world', thus, takes from experience its criteria for evaluating possible innovations, accepting external stimuli only through a long and not very clear process of 'metabolism' through which novel proposals are compared with the order of a system dominated by the persistence of intrinsic conditions, such as (...) the specific nature of the product and of the ways of producing it ..." (Maccolini, in AA. VV., 1996 [1], author's translation).
Unlike other industrial activities, where in almost all cases a single agent (the managing director) guarantees through her/his own staff the management and control of the various sub-processes (analysis of demand and of the market, product design, construction, marketing, etc.), in the building industry the functions of leadership are fractionated and assigned to a number of agents (designer, builder, etc.); the latter, moreover, are only formally coordinated by the client (who usually possesses no significant professional competence). Even when some of those agents take on a central role in the productive processes (often the construction company, especially in Italy, notwithstanding initiatives taken to follow European Directives; Garaventa et al., 2000 [8, pp. 74-76, 120]), this situation often leads to conflict, legal wrangling, etc., heightened, moreover, by the existence of temporary multi-organizations. The latter (Sinopoli, 1997 [16, pp. 60-65]):
• are formed for the period necessary to manage a given activity (clearly excluding the possibility of accumulating any common experience, as it is unlikely to be usable on successive occasions);
• are composed of organizations (or agents, such as designers, companies, suppliers of materials and components, etc.) which, although each is independent, make decisions which depend upon (or interact with) those taken by the others.
Situations thus emerge, endogenous to the building sector, which lead, well beyond the peculiarities ascribable to various social, local or other aspects, to structural dissimilarities between processes and production in the building sector and those in other sectors of industrial activity. In common with other so-called non-Fordist sectors (not in the historical sense), the building sector certainly possesses original modes of production and accumulation (Garaventa et al., 2000 [8, p. 28]). Symptomatic of this, for example, especially during its phases of greatest development, are its significant capital earnings but also the low productivity of the work done (which, as is well known, contributes much less to profit formation than in other industrial sectors). The character and the specific nature of the building sector thus become the aspects and questions to face in order to develop networked-companies in this industry. Above all, governance as a consequence of public policies, besides, naturally, the visible hand of a management possessing a business culture suitable for the building industry, will ensure sufficient compatibility with:
• the significant implications for urbanistic and territorial layouts;
• the processes of the formation and satisfaction of demand;
• harmonic developments (Giallocosta and Maccolini, in Campagnac, 1992 [2, pp. 131-133]) of supply and of the markets (general companies with responsibility for production and orchestration, specialist companies, independent companies active in maintenance, micro-retraining, renovation, etc.).
Naturally, the importance of the existence of business ethical codes is also clear, especially:
• in dynamic flexibility strategies;
• in make-or-buy optimizations which damage neither the competitiveness nor the independence of the supplier network (Giallocosta and Maccolini, in Campagnac, 1992 [2, pp. 131-133]).
More generally, the requirements of governance seem to take on the aspects of an active observer (recalled above in a precisely systemic sense), but with leading roles, nature and attributes appropriate to the peculiarities of the sector. The latter also concern, as mentioned above, particular methods relating to models of operational organization, planning activities, standardization (only as a tendency, and of limited reliability), etc. Above all, informal procedures become particularly important within the organizational structures of the sector. This phenomenon is of particular importance in the Italian situation: the significant "(...) presence of informal procedures, which often spill over (...) into aspects which are, at least, problematic within a normal framework of productive efficiency (...), does not, however, hinder the realization of buildings of a good qualitative level, especially for small and medium-sized buildings without particularly complex plant (...) One often observes, for instance, the high quality of the finishing of buildings in Italy (work in which precisely those informal procedures are most widespread - author's note), with respect to the situation in France or in Britain" (Garaventa et al., 2000 [8, p. 93], author's translation). In this country, the "(...) companies work mainly by using techniques and rules of the trade learnt at the individual level (...) This is personal individual knowledge, rather than know-how or operational procedures developed by the company (...) The lack of any formalized rules of the trade leads to processes of professional training during the work itself (...) which produces a certain homogeneity of the agents at all levels (...) Thus the development of responsibility of the operatives, (...) their common hands-on training, end up generating a significant understanding amongst the various operatives. This allows a good product from a largely non-formalized context (...)
The operatives seem to have a unity of intention (...) which (...) renders possible the realization of buildings of acceptable quality" (Garaventa and Pirovano, 1994 [7], cit. in Garaventa et al., 2000 [8, p. 94], author's translation). In this sense, that particular behavior as a collective being derives from the sharing of a cognitive model taken up through training activities which are often not formalized (especially in Italy) but which can establish common behavioral micro-rules (Minati and Pessa, 2006 [12, pp. 110-111]):
• notwithstanding discontinuous and heterogeneous experience;
• provided that customary forms of intervention exist (i).
Thus, in networked-companies in the building industry and, more generally, in the multiple forms of business aggregation found there, the dual organizational order, formal and informal, typical of socio-technical systems develops:
• where the informal order translates the set of unwritten rules through the formal one, originating from the distinct personalities of the operators and thus being decisive in reaching successful outcomes or in determining unsuccessful ones (Dioguardi, 2005 [5, pp. 87-89]);
• with particular emphasis upon the peculiarity and importance of the informal organization, given the analogous peculiarities of the sector.
Here, therefore, amongst other factors distinguishing the sector from other areas of economic activity, any coherent development of networked-companies requires:
• validation and optimization of the work done, even as an effect of informal procedures (insofar as they can be ascribed to compatible types of intervention);
• maximization of operational flexibility (for the work done and/or differentiated activities).
i. "The unresolved problem of the Italian building sector is that unity of intention under these conditions is only possible for traditional working practices and for relatively simple buildings. In large and complex buildings (...) the operatives, especially at the operational level on the building site, lose the overall view of the job (...) and, with that, the unity of intention (...) In an analogous manner, the technical quality required for innovative building work cannot be reached using the unwritten rules of the trade (...) On the other hand, the efforts being made to formalize processes and technical rules have the effect of destroying the context which generates artisan know-how and the unity of intention: (...) the Italian building industry finds itself in a difficult and contradictory situation ..." (Garaventa and Pirovano, 1994 [7], cit. in Garaventa et al., 2000 [8, p. 94], author's translation).
Similarly, the diseconomies which still burden the processes of producing buildings, and which to different extents penalize the operators involved (negative cash-flows for the companies, high costs and prices for customers and users/clients, etc.), demand innovation capable of significantly reducing these phenomena, notwithstanding the emphasis still placed upon rent earnings. In this sense, and also for questions of a more general nature (Giallocosta and Maccolini, in Campagnac, 1992 [2, pp. 131-133]), networked-company models above all, as procedures validating productive decentralization in the building industry, require regulatory activities regarding:
• appropriate limits and well-defined suitability of such procedures;
• policies containing the costs of intermediaries, often exorbitant and inherent in such activities.
Clearly, there is also a need for more innovative tendencies which:
• optimise the quality-cost ratios of the work done;
• put in place shared rules and formal procedures, especially for complex and technologically advanced activities (where competition is ensured through the interchangeability of know-how and operators).
For the latter, naturally, there are aspects common to other industrial sectors; but, for the reasons outlined above, they acquire particular importance in the building industry and thus require appropriate governance.

4. Conclusions

Networked-company models are emblematic of the infradisciplinary aspects of the systemic approach. Within these models one can, in fact, verify the effectiveness of the most recent theories of business organization, typical of the late-industrial era. At the same time, however, given the current specific aspects of the building sector, there are other peculiarities ascribable to:
• the networked-company in the building industry;
• its consistent developments.
Thus, the infradisciplinary contributions to systemics (as adjuvant activities which process phenomena having multiple components within a system) do not lead to reductionism (j), as long as there are no mistaken assumptions of centrality (Diderot and d'Alembert, 1772 [3], cit. in Dioguardi, 2005 [5, p. 45]). Moreover, within the terms mentioned above, the harmonious deployment of transdisciplinarity, interdisciplinarity and infradisciplinarity becomes essential. As observed above, the networked-company in the building sector provides an exemplary case of this.

j. Reductionism is intended here above all as the unmanageability of emergent phenomena caused by unsuitable convictions about the exhaustive nature of praxis and of cognitive models centered only upon specific details and particularities.

References
1. AA. VV., Nuove strategie per nuovi scenari (Bema, Milan, 1996).
2. E. Campagnac, Ed., Les grands groupes de la construction: de nouveaux acteurs urbains? (L'Harmattan, Paris, 1992).
3. D. Diderot, J.B. d'Alembert, Enciclopedia o dizionario ragionato delle scienze, delle arti e dei mestieri, 1772 (Laterza, Bari, 1968).
4. G. Dioguardi, L'impresa nella società di Terzo millennio (Laterza, Bari, 1995).
5. G. Dioguardi, I sistemi organizzativi (Mondadori, Milan, 2005).
6. G. Dioguardi, Le imprese rete (Bollati Boringhieri, Turin, 2007).
7. S. Garaventa, A. Pirovano, L'Europa dei progettisti e dei costruttori (Masson, Milan, 1994).
8. S. Garaventa, G. Giallocosta, M. Scanu, G. Syben, C. du Tertre, Organizzazione e flessibilità dell'impresa edile (Alinea, Florence, 2000).
9. P. Gianfaldoni, B. Guilhon, P. Trinquet, La firme-réseau dans le BTP (Plan Construction et Architecture, Paris, 1997).
10. J.W. Goethe, Faust (Penguin Classics, Middlesex, 1975).
11. P. Lorenzen, Ed., Konstruktive Wissenschaftstheorie (Suhrkamp, Frankfurt, 1974).
12. G. Minati, E. Pessa, Collective Beings (Springer, New York, 2006).
13. B. Nicoletti, Ed., Management per l'edilizia (Dei, Rome, 1994).
14. R. Pietroforte, E. De Angelis, F. Polverino, Eds., Construction in the XXI Century: Local and Global Challenges (Edizioni Scientifiche Italiane, Naples, 2006).
15. N. Sinopoli, L'innovazione tecnica nelle costruzioni, in Sinopie, 6 (1992).
16. N. Sinopoli, La tecnologia invisibile (Angeli, Milan, 1997).
SYSTEMIC OPENNESS OF THE ECONOMY AND NORMATIVE ANALYSIS
PAOLO RAMAZZOTTI
Dipartimento di Istituzioni Economiche e Finanziarie, Università di Macerata
via Crescimbeni 20, Macerata, Italy
E-mail: [email protected]

The paper discusses economic analysis as a normative – as opposed to positive – science. Contrary to conventional economics, it argues that the economy does not consist of markets alone and that both the economy and markets are open systems. The organization of markets and other economic activities therefore depends on the interaction between the economy and the rest of society. Which configuration holds in practice is a matter of public policy. In this perspective, public policy is an intrinsic part of economic analysis, not something that follows once the economy has been investigated. The paper also argues that markets have a rationale of their own. As a consequence, public policy must define – or co-determine – the appropriate economic configuration not only by acting upon the institutional setup of markets but also by identifying those sections of the economy that should be coordinated by markets and those that should resort to other economic institutions.

Keywords: openness of the economy, markets as open systems, public policy.
1. Introduction

This paper discusses economic analysis as a normative science. Contrary to conventional economics, it argues that since the economy does not consist of markets alone, and since both markets and the economy as a whole are open systems, the organization of markets and other economic activities depends on the interaction between the economy and the society they are a part of. Which configuration holds in practice is a matter of public policy. In this perspective, public policy is an intrinsic part of economic analysis, not something that follows once the economy has been investigated. The paper also argues that markets have a rationale of their own. As a consequence, public policy must define – or co-determine – an appropriate economic configuration not only by acting upon the institutional setup of markets but also by identifying those sections of the economy that have to be coordinated by markets and those that have to resort to other economic institutions.
The paper is arranged as follows. The next section argues that, even in a very stylised model of the market, some political decisions are necessary concerning factor endowments and, in more general terms, property rights. This implies that, depending on which decision is actually taken, a whole range of market configurations is possible. Section 3 argues that the choice of configuration depends on the relation between the economy and the way it is perceived and understood by people. To this end, the section focuses on the characteristics of knowledge. It stresses its irreducibility to a consistent system and how this feature may affect how people assess the economy. More specifically, the multiple facets of knowledge reinforce the possibility of a significant variety of economic setups. Section 4 contends that how the economy is organized is ultimately a matter of public action. This implies that economics cannot be viewed other than as a normative science. Economic inquiries that neglect the role played by the policy maker either rely on a semi-closed system view of the economy or implicitly assume that the only economy to be taken into account is the status quo. Section 5 provides the conclusions.

2. Capitalist markets as open systems

The conventional notion of a self-regulating market can be traced back to conventional economic theory. Walrasian general equilibrium is based on the assumption that when such “exogenous” variables as technology and preferences are given and known, and when resources are assigned to economic agents, a properly functioning price system provides all the information that is required in order to allocate those resources. Since technology, preferences and endowments are believed to be independent of how the market functions, the market itself can be viewed as a semi-closed system^a: although it may be altered by exogenous shocks, it is a self-regulating system^b. A (Walrasian) market is one where prices are determined by preferences, endowments and technology alone.
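A minimal formal sketch may help fix ideas (the notation is ours, not the paper's). In a pure-exchange version of the model, let household h hold the endowment vector ω^h and demand x^h(p, p·ω^h) at prices p; an equilibrium price vector p* is one at which aggregate excess demand vanishes:

\[
z(p^{*}) \;=\; \sum_{h=1}^{H}\Big[\,x^{h}\big(p^{*},\; p^{*}\!\cdot \omega^{h}\big)-\omega^{h}\Big] \;=\; 0 .
\]

Since z depends on the entire assignment of endowments {ω^h}, reassigning them – that is, reassigning property rights – generally selects a different p*: even in this stylised setting, the equilibrium price set is parametrized by a prior, non-market decision, which is precisely the point developed below.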
Prices, however, reflect the assignment of property rights, which simply means that someone is assigned the right to use something independently of the possibly negative consequences that this use may have on third parties: if individual A eats her apple, individual B will not be able to eat it. The rule whereby A rather than B has a right to eat the apple – even if B is starving and A is not – is anything but natural. It is determined according to some explicit or implicit decision. The assignment of the right – the related decision – is a political, surely not an economic, issue (Schmid 1987 [21]; Bromley 1989 [3]; Samuels, Schmid 1997 [20]; Medema, Samuels 2002 [15])^c.

The implication of the above is twofold. First, even at the highly abstract level of a Walrasian economy, the market has a political dimension, which obviously contrasts with its claimed independence from other societal instances^d. Second, depending on how property rights are assigned, a range of possible sets of relative prices is possible. In order for the price allocation mechanism to work, a decision has to be made concerning what the interests to be defended are, i.e. what the social priorities are^e.

Individuals make their economic choices under path-dependent circumstances that are associated with political factors. These circumstances lead to a price set which is only one out of many possible ones. It is the price set that reflects past and present social priorities as they emerge from the existing system of power. A different system of power would not constrain a given market: subject to the profit constraint, it would simply make the market function according to different priorities.

Insofar as someone is entitled to charge a price for something, someone else is obliged to pay if she wants that something. Different price sets may be viewed, therefore, as leading to the different possible payoffs of a zero-sum game. There are instances where some sets of payoffs may be deemed superior to others, however. In terms of per capita income, for instance, some distributions may be preferable to others in that they favor a higher rate of income growth. Thus, economic policy – including the assignment of property rights – need not merely reflect the balance of power among conflicting interests. It may also reflect a choice related to some notion of social welfare. The problem is how this social welfare should be defined, i.e. what metric ought to be used to assess social efficiency and the performance of the economy. The above considerations on the variety of price sets suggest that it is rather inappropriate to assess economic outcomes in terms of a price-based indicator.

^a See Auyang (1988) [1] for a definition of semi-closed system.
^b This view has been criticized on a number of accounts by a great many authors (see, for instance, Boulding 1968 [2], Georgescu-Roegen 1976 [7], Kapp 1976 [9]; see also Dow 1996 [6]). It is nonetheless appropriate to reassess it in order to appreciate its relation with the Polanyian themes that are discussed below.
^c Efficiency – i.e. finding the best way to achieve some goal such as allocation or growth – is not distinct from that political decision. Reducing slack, for instance, involves a decision over the right that a worker has to take her time when she carries out a task.
^d Markets are characterized by other institutions, e.g. those that affect the conduct of individuals and organizations. We shall not deal with these here.
^e “The issue is not government versus not government, but the interests to which government is to lend its protection and the change of interests to which it gives protection” (Medema, Samuels 2002 [15, p. 153]).
Any trade-off would reflect the specific set of relative prices it is based on. So, in general terms, before choosing, one would have to choose preliminarily which set of relative prices is appropriate. Physical output may be just as misleading as a price-valued output: decisions concerning what to produce and how to produce it are based on relative prices, thus on those same circumstances that undermine the uniqueness and objectivity of price-based indicators.

The information that a given market provides is based on the political decisions that underlie its institutions. Priorities based only on that information would be biased by the status quo. In other terms, any attempt to change a given market according to the priorities set out by that same market would be self-referential. The choice of the priorities that the economy should pursue, therefore, requires some value judgement. A benchmark that transcends those priorities is necessary.

Independently of how a specific market is arranged, however, market transactions are unlikely to occur if they do not meet the requirement of profitability. Despite differences in how a capitalist market is arranged, it is always based on the profit motive. The name of the “game” is profit. Market institutions should be distinguished in this regard. On the one hand, the profit goal is a key institutional feature of any capitalist market. On the other, this feature needs to be qualified by a range of other institutions, which we have discussed above. These institutions are not given once and for all but depend on explicit or implicit choices as to what priorities or interests should prevail.

The profit game may either be expanded to the point that it encompasses all of the economy or be restricted. The same political decisions that assign property rights may choose not to assign them, i.e. they may choose that some good should not be treated as a commodity: this is the case when all members of a community are entitled to medical assistance, which is eventually paid for through taxes. In such a case, medical assistance is not a commodity; it is an entitlement, i.e. a right that derives from being a member of a community. Political priorities underlie not only how the market works but also its boundaries. From this perspective, reliance on a profit-centered benchmark would imply the subsumption of society under the market rather than the other way round.

Summing up, institutions determine property rights and entitlements, so they involve a value judgement concerning justice. From this point of view, the real obstacles to change would seem to be related to the political establishment – e.g. whether representative democracy works properly or not. The section that follows will focus on how a benchmark that transcends profit may emerge. This
will provide some insights on whether it is actually possible to separate the economic domain from the political one.

3. Knowledge as an open system

The previous section pointed out that the choice of the priorities that the economy must pursue transcends the market. It has to do with the relation between society and the economy, as well as with the role of the market within the economy. It therefore has to do with what society values. Contrary to conventional economic theory, individuals are not able to process perfectly all the information required, nor is that information generally available. This means that, whether they have to assess a good they wish to consume or the general performance of the economy, they must avail themselves of some assessment criterion. This requires knowledge.

The definition of knowledge is definitely controversial^f. Drawing on Loasby (1991, 1999, 2005) [10-12], I refer to knowledge as a set of connections – a pattern of relationships – among concepts that is required to make sense of (sections of) reality^g. Since nobody can take everything into account at the same time, it is part of the learning process to select what is supposed to be relevant, i.e. to trace boundaries between what needs further inquiry and what has to be discarded. How to do this depends on the goals and the aspiration level of the learning actor (Simon 1976 [25]). An aspiration level reflects individual idiosyncrasies as well as the cultural environment of the learning actor, i.e. the range of shared beliefs, interpretative frameworks and learning procedures that other actors in that environment accept. It ultimately is a value judgement concerning relevance.

Although everything is connected to everything else, so that one might conceive of a unique learning environment, in practice actors must adapt to their limited cognitive abilities by learning within specific sub-environments^h: family, school, religious congregation, workplace, trade union, etc.
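Loasby's definition lends itself to a data-structure reading: knowledge as a graph of connections, into which a report must be linked before it counts as information. A toy sketch under that reading (our illustration, not the author's; the concepts and links are invented):

```python
# Knowledge as a set of connections (after Loasby): a report becomes
# information only if it can be linked to a concept already in the pattern.
knowledge = {
    ("price", "endowment"),
    ("endowment", "property right"),
    ("property right", "political decision"),
}

def is_information(report: str, links_to: str) -> bool:
    """A report is informative only if it hooks onto a known concept."""
    known = {concept for edge in knowledge for concept in edge}
    return links_to in known

print(is_information("tax reform announced", links_to="property right"))  # True
print(is_information("tax reform announced", links_to="phlogiston"))      # False
```

The same report is information for one actor and noise for another, depending on the pattern each already holds, which is the point of the passage.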
Specific knowledge depends on the specific problems that arise in each environment and, possibly, in those that are contiguous to it^i. How those problems are framed – i.e. how problem-solving activities are carried out – depends on the requirements and the priorities that arise within those environments: how you describe a brain depends on whether you are a butcher, an anatomopathologist, etc. (Delorme 1997, 2001 [4,5]).

Market-related learning is constrained by profit, in that it would be useless for a businessman – in his capacity as a businessman – to learn something that does not generate an economic gain. Obviously, he may wish to read Shakespeare independently of business considerations – possibly to make sense of life – but, in so doing, he will be pursuing a different type of knowledge, which is unrelated to profit and presumably unconstrained other than by his background knowledge and by the characteristics of the learning process itself^j: it could be associated with what Veblen referred to as idle curiosity.

In his attempt to make sense of life, an actor may distinguish preferences, which are associated with egoistic goals^k, from commitments, which are associated with non-egoistic goals – be they those of another individual, of another group or of an entire community^l – or simply with ethical rules. What is important about this distinction is that there may be no common denominator between the two domains. As long as preferences and commitments do not interfere with each other, there may be no problem. Indeed, they may produce positive feedbacks, as may be the case when actors rely on non-egoistic rules in order to find a solution to the Prisoner's Dilemma (see the sketch below). When the two domains do interfere, the actor may face a conflict which is much like a moral dilemma^m. An example might be an individual who carries out a specific economic activity – e.g. the production of armaments – that clashes with her ethical values – the non-acceptance of military conflicts as the solution to international or domestic disputes.

^f The variety of approaches to the topic emerges in a recent “Symposium on Information and Knowledge in Economics” in the April 2005 issue of the Econ Journal Watch. A discussion of knowledge and public policy is in Rooney et al. (2003) [19].
^g “A specific report can provide information only if it can be connected to something else, and it is unlikely to provide much information unless this ‘something else’ is a pattern of relationships—how some things fit together. Such patterns constitute what I call knowledge. Knowledge is a set of connections; information is a single element which becomes information only if it can be linked into such a set.” (Loasby 2005 [12, p. 57]).
^h These environments are the subsystems of what Simon (1981) [26] referred to as a semi-decomposable system.
^i The importance of contiguity is stressed by Nooteboom (1999) [16].
^j See the distinction that M. Polanyi (1962) [18] provides of different learning processes and of how they can be more or less restricted by the bounds that characterize them.
^k Preferences may also include sympathy, which occurs when A's well-being depends on B's well-being. See Sen (1982) [23].
^l “Non-egoistic reasons for choosing an action may be based on ‘the possibility of altruism’. They can also be based on specific loyalties or perceived obligations, related to, say, family ties, class relations, caste solidarity, communal demands, religious values, or political commitment.” (Sen 1986 [24, p. 344]).
^m The typical example of a moral dilemma is when Agamemnon was forced to choose between losing his army and losing his daughter. A similar concept in psychology is cognitive dissonance, which arises when an individual is unable to cope with information that is inconsistent with her strongly held views.
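Returning to the Prisoner's Dilemma mentioned above, the contrast between preferences and commitments can be made concrete with a minimal numerical sketch (the payoff numbers are illustrative, not from the paper):

```python
# One-shot Prisoner's Dilemma: C = cooperate, D = defect.
# Entries are (row player's payoff, column player's payoff); numbers illustrative.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def egoistic_reply(opponent: str) -> str:
    """Preference-driven choice: maximize own payoff against a given action."""
    return max(("C", "D"), key=lambda a: PAYOFFS[(a, opponent)][0])

# Purely egoistic players defect whatever the other does ...
assert egoistic_reply("C") == "D" and egoistic_reply("D") == "D"
print("egoistic outcome:   ", PAYOFFS[("D", "D")])   # (1, 1)

# ... whereas a shared non-egoistic rule ("always cooperate") leaves both better off.
print("commitment outcome: ", PAYOFFS[("C", "C")])   # (3, 3)
```

The point is only that a commitment, precisely because it is not filtered through the payoff comparison, can sustain the (3, 3) outcome that egoistic best replies rule out.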
Owing to bounded rationality, knowledge may well involve coexisting yet potentially inconsistent views of reality. In a capitalist economy, the profit motive provides the rationale for most economic transactions. It therefore underlies views of how to conduct business and of economic activity in general. These views may turn out to be inconsistent with what actors view as appropriate from other perspectives, e.g. ethical, religious, etc.^n. Preferences and values associated with potentially inconsistent domains may coexist for quite a long time without interfering with each other. Consequently, each domain is likely to lead to the emergence of domain-specific institutions. Markets may therefore coexist with institutions that transcend them: clubs, churches, political parties, etc.

Institutional setups are not merely instrumental to the solution of specific problems. Once they are established, they easily become a part of the reality that actors take for granted: they are internalized. This cognitive dimension of institutions (Zucker 1991 [29]; Scott 1995 [22]) suggests that it may not be easy to conceive of their dismantling, or to envisage an alternative.

One implication of the above discussion is that there is no single “game” being played: quite a few sets of rules may coexist, interact and sometimes clash. The institutional setup that underlies the market may or may not be consistent with ethical values, as well as with the institutions that are associated with those values. Thus, sweatshops may be consistent with profit and its related institutions – e.g. firms, stock markets, etc. – but may be deemed unacceptable from a range of ethical perspectives and inconsistent with the rules of their related – e.g. religious, human rights and political – institutions. From a policy perspective, this suggests that making sense of, and somehow dealing with, these inconsistencies should be a priority. Thus, although complexity requires that we devise simulation models that take account of multiple interactions and non-linearities (Louie, Carley 2007 [13]; Law, Kelton 1999 [14]), so as to appreciate the dynamics of a system, the key point of the paper is that these should not be viewed as ever more sophisticated deterministic models (a sketch follows below). Inconsistencies involve choices and degrees of freedom within the models. They stress that the distinction between positive and normative economics is generally misleading.

^n Ethics exists in business as well as in other domains of life. It is part of the market-related institutional setup. However, when I refer to ethics, in what follows, I refer to values that are independent of market-related activities.
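By way of illustration only – this is not the modelling framework of Louie and Carley [13] or Law and Kelton [14], and every number is arbitrary – here is a sketch of the kind of interaction-rich, non-deterministic model the text gestures at:

```python
import random

random.seed(1)
N, STEPS = 50, 20

# Each agent holds two potentially inconsistent evaluations of the economy.
agents = [{"profit": random.random(), "ethics": random.random()} for _ in range(N)]

for _ in range(STEPS):
    for a in agents:
        peer = random.choice(agents)            # multiple interactions
        w = (1.0 - a["profit"]) ** 2            # nonlinear social influence
        a["profit"] += w * (peer["profit"] - a["profit"])
        # Unresolved degree of freedom: when the two views clash, the agent
        # chooses (here: at random) which one to revise.
        if abs(a["profit"] - a["ethics"]) > 0.5:
            mean = (a["profit"] + a["ethics"]) / 2
            a[random.choice(["profit", "ethics"])] = mean

print("average profit-view:", sum(a["profit"] for a in agents) / N)
```

Rerunning with a different seed changes the trajectory: the choice rule, and not just the parameters, shapes the outcome, which is the normative point being made.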
A second implication is that knowledge about one's reality may consist of a set of autonomous sub-systems, but the boundaries between these sub-systems are never given once and for all. They may be reassessed. So, although actors may accept a distinction between the economic and other domains, thereby adapting to, and situating themselves within, a given economic and societal setup, they may also evaluate those very setups and desire to change them. This involves confronting the profit motive with other values. More specifically, it involves confronting the alternative ways that society can resort to in order to materially reproduce itself^o.

The conclusion the above discussion leads to is that, although economic performance is usually assessed in terms of how profit-centered transactions within markets provide for the allocation of resources, for the rate of growth, or for accumulation, this type of assessment may be misleading for two reasons. First, societal values that clash with market-related values may eventually undermine social cohesion, thereby disrupting economic, as well as social, relations. Second, insofar as the economy is a sub-system of society, its performance should not be expected to prevail a priori over society's overall performance, however assessed. At the very least, the two should be on the same footing. More generally, one would expect the economy's performance to be assessed in terms of society's value system rather than in terms of its own criteria. Taking account of societal dimensions such as justice and care^p, however, may be problematic, as the next section will argue.

4. Systemic openness and public policy

Section 2 argued that political choices determine rights, which – together with other institutions – determine the structure of prices and the composition and amount of output and investment. The resulting institutional structure acts upon the choice sets of economic actors. This is the context where actors learn about the economy and learn to interact in compliance with the constraints that the extant market provides. As the discussion of knowledge in the previous section argued, however, learning actors generally transcend the economy and pursue a knowledge that is independent of market constraints. The interaction between the societal value system that this knowledge leads to and the economy allows for a great variety of potential economic and societal setups. Which one occurs in practice depends on the values that eventually prevail, either explicitly or implicitly.

^o Polanyi (1957) [17] specifically refers to contracted exchange, redistribution and reciprocity as the three available alternatives.
^p Van Staveren (2001) [28] links the value domains of care and justice to giving – which corresponds to Polanyi's notion of reciprocity – and distribution.
The above discussion on how markets are structured stressed that the very assignment of property rights affects distribution. It is therefore to be expected that different stakeholders within the economy will try to defend their vested interests or shift the balance of economic power to their advantage. Any economic analysis that acknowledges the openness of a market economy must take into account how different interests may affect the overall performance of the economy. The assumption that the economy should not be interfered with is tantamount to implicitly accepting the balance of economic power that is determined by the status quo. A more active policy generally changes such a balance.

An appropriate policy procedure would require the explicit formulation of choices. Any policy reasonably has to decide what weight it must assign to each type of economic activity, thus what boundaries there should be between the market, the welfare state and a broadly defined non-profit sector (including families). A strictly interrelated task is to define the characteristics of these economic sectors, thus how they are expected to interact with each other.

It is not enough, however, to argue in favor of principles such as redistribution and reciprocity as if they were alternatives to the market. Depending on a variety of circumstances, they may be either alternative or complementary. The relation between market and non-market values and priorities may vary. In some instances, it may be a positive one. Thus, a rise in employment may be functional to a rise in output, quite independently of any value judgement in favor of full employment per se, and redistribution and reciprocity may be functional to the profitability of the market independently of any reference to social justice or care. Post Keynesian economics, for instance, has stressed the relation between distribution and growth, especially emphasizing that a more balanced distribution of income generally has a positive effect on the level of income itself; welfare provisions such as schooling and public health typically create positive externalities that favor economic growth; as for reciprocity, while charities prevent the effects of the market economy from occurring in their most dramatic form, they also help the market in that they prevent social unrest, which would undermine economic activity.

On the other hand, distribution may also affect profits – thus investment decisions and growth – negatively, as Kalecki (1943) [8] pointed out. Restrictions on polluting industries may reduce profitability. Under some circumstances – which mainstream economic thought tends to view as permanent – public expenditure
may displace private investment. Any help to the poor may be claimed to provide a disincentive to work, as the defenders of workfare policies contend. The three forms of integration and their related institutions may, therefore, be mutually consistent or inconsistent.

What is more, inconsistency may exert its negative consequences even when it does not occur. In a capitalist market economy, beliefs affect economic decisions, especially investment, in a significant way. So it may suffice for business just to believe that accumulation is precluded for that expectation to fulfil itself. Uncertainty may cause economic disruption quite independently of action by business to defend its vested interests. Change in how markets are structured may affect those perceptions, leading to reactions that range from uncertainty to cognitive dissonance. The implication is that, although the priorities underlying the inception of change may be generally accepted, the process may lead to the perception of unexpected and unwanted institutional inconsistencies. This is a critical issue in the light of the question: who is to choose? Owing to complexity, actors may change their mind in the process. The bandwagon effects of uncertainty may reinforce these effects.

Policy must co-evolve with the parties involved. It must induce institutional change, but unless it allows actors to change their perception of the economy as institutions change, it may determine two types of negative reactions. First, change may not be perceived, so that actors continue behaving as if nothing had happened: for instance, provisions in favor of the weaker sections of society (e.g. consulting rooms provided by the welfare state) may remain under-used, to the advantage of the market for private, costly and often inappropriate services (e.g. abortions). Second, opposition to what is perceived as institutional disruption might lead to reactions that recall Luddite opposition to technological change.

While a utility maximizer who learns only in order to achieve her goal would not be concerned about the general effects of the policy being carried out – she would merely focus on (micro) adaptation – a learning actor who can abstract from specific circumstances of time and place may judge policy at the macro level and either support it or oppose it. Thus, a bi-directional relation must hold between social actors and policy makers. The former must be aware of what change is occurring and how it may impinge on their lives. The latter must achieve change through consent, which means both that they must prevent actors from perceiving change exclusively in terms of social disruption and that they must be aware of which changes matter most in the eyes of the social actors.
Along with the bi-directional relation between social actors and policy makers, social actors must interact with each other. Precisely because they may wish to change their economic and societal environment, inconsistencies may arise among the metrics adopted by each one. In order to overcome these inconsistencies, actors must be able to carry out appropriate search processes, that is, to learn – a policy implication from Section 3. In doing so, they must also interact with others in order to achieve a generally shared view of what the appropriate metric should be.

5. Concluding remarks

Systemic openness characterizes all markets: they could never work properly if they were not embedded in a broader (institutional) environment. Markets and institutions are interrelated. Not all institutions, however, are functional to the market, because some arise within extra-economic domains and may well be inconsistent with the institutions underlying the market, as well as with the profit motive that characterizes modern capitalism. The issue society has to deal with, therefore, is how to avoid institutional inconsistency. This involves choosing what relation must exist between the market, the economy and other societal institutions.

The above choice requires a view of how things are and of how they ought to be: it requires knowledge of the reality people are a part of. Knowledge, however, is also an open system: people cannot separate, once and for all, their economic lives from their ethical lives. At the same time, they cannot keep everything together, because they are boundedly rational. They cannot have all-encompassing and consistent views of what is appropriate. Quite to the contrary, inconsistencies may occur within individuals as well as among them.

A priori there is no reason to believe that economic constraints, which are not technically neutral but ultimately depend on discretionary decisions, should prevail over other societal requirements. Similarly, there is no reason to believe that the status quo is preferable to other situations. Economic analysis must, therefore, investigate how to direct the economy towards society's ends. It must deal with normative issues. If public policy is concerned with the overall quality of life of the members of society, it must allow them to overcome the inconsistencies discussed above. It must therefore take into account that, along with the values underlying the functioning of the market, a range of different values exists, and only members of society can judge what the priorities are. But, in order to choose, these members must be provided with the preliminary requirements for free choice.
The issue is not figuring out what people want but giving rise to a process that will lead people to learn how and what to choose. The process of change that such a policy determines may well lead actors to perceive new inconsistencies. Actors who initially favor one type of change may eventually change their views about what the priorities are. This is consistent with the assumption that actors are not substantively rational and that they learn as their environment evolves. It implies that the choice of priorities is an ongoing process that requires interactive learning and dialogue between policy makers and the actors involved, as well as among the latter.

The general conclusion this discussion leads to is that democracy matters for normative economics. Democracy may be a means for an ongoing learning process by social actors – one that eventually leads to appropriate choices – or it may be a mere counting of votes. Similarly, when institutional inconsistencies prevent governments from choosing, the solution may consist in allowing society to deal with those inconsistencies – at the risk of some social instability – or in restricting the action of minorities and dissenters. The type of action that governments take eventually affects the subsequent ability of society to actually choose the relation between its general values and economic ones, as well as between the status quo and other alternatives.

References
1. S.Y. Auyang, Foundations of Complex-System Theories (Cambridge University Press, Cambridge, 1988).
2. K.E. Boulding, Management Science 2(3), 197-208 (1956); also published in: K.E. Boulding, Beyond Economics. Essays on Society, Religion and Ethics (University of Michigan Press, Ann Arbor, 1968).
3. D.W. Bromley, Economic Interests and Institutions – The Conceptual Foundations of Public Policy (Blackwell, New York, 1989).
4. R. Delorme, in Beyond Market and Hierarchy, Ed. A. Amin and J. Hausner (Elgar, Cheltenham, 1997).
5. R. Delorme, in Frontiers of Evolutionary Economics. Competition, Self-Organization and Innovative Policy, Ed. J. Foster and J.S. Metcalfe (Elgar, Cheltenham, 2001).
6. S.C. Dow, The Methodology of Macroeconomic Thought: A Conceptual Analysis of Schools of Thought in Economics (Elgar, Cheltenham, 1996).
7. N. Georgescu-Roegen, in Energy and Economic Myths. Institutional and Analytical Economic Essays, Ed. N. Georgescu-Roegen (Pergamon Press, New York, 1976).
8. M. Kalecki, Political Quarterly 14 (1943).
9. K.W. Kapp, in Economics in the Future: Towards a New Paradigm, Ed. K. Dopfer (Macmillan, London, 1976).
10. B.J. Loasby, Equilibrium and Evolution. An Exploration of Connecting Principles in Economics (Manchester University Press, Manchester, 1991).
11. B.J. Loasby, Knowledge, Institutions and Evolution in Economics (Routledge, London, 1999).
12. B.J. Loasby, Econ Journal Watch 2(1), 56-65 (2005).
13. M.A. Louie, K.M. Carley, The Role of Dynamic-Network Multi-Agent Models of Socio-Political Systems in Policy, CASOS Technical Report (2007), http://reports-archive.adm.cs.cmu.edu/anon/isri2007/CMU-ISRI-07-102.pdf.
14. A.M. Law, D.W. Kelton, Simulation Modelling and Analysis (McGraw-Hill, New York, 1999).
15. S.G. Medema, W.J. Samuels, in Economics, Governance and Law. Essays on Theory and Policy, Ed. W.J. Samuels (Elgar, Cheltenham, 2002), pp. 151-169.
16. B. Nooteboom, Cambridge Journal of Economics 23, 127-150 (1999).
17. K. Polanyi, in Trade and Market in the Early Empires: Economies in History and Theory, Ed. K. Polanyi et al. (The Free Press, New York, 1957), pp. 243-270.
18. M. Polanyi, Personal Knowledge. Towards a Post-Critical Philosophy (Routledge, London, 1962).
19. D. Rooney et al., Public Policy in Knowledge-Based Economies. Foundations and Frameworks (Elgar, Cheltenham, 2003).
20. W.J. Samuels, A.A. Schmid, in The Economy as a Process of Valuation, Ed. W.J. Samuels, S.G. Medema, A.A. Schmid (Elgar, Cheltenham, 1997).
21. A.A. Schmid, Property, Power, and Public Choice. An Inquiry into Law and Economics, 2nd edition (Praeger, New York, 1987).
22. W.R. Scott, Institutions and Organizations (Sage Publications, Thousand Oaks, 1995).
23. A. Sen, in Choice, Welfare and Measurement (Basil Blackwell, Oxford, 1982).
24. A. Sen, in Development, Democracy and The Art of Trespassing. Essays in Honor of Albert O. Hirschman, Ed. A. Foxley, M.S. McPherson, G. O'Donnell (University of Notre Dame Press, Notre Dame, 1986), pp. 343-354.
25. H.A. Simon, in Method and Appraisal in Economics, Ed. S.J. Latsis (Cambridge University Press, Cambridge, 1976), pp. 129-148.
26. H.A. Simon, The Sciences of the Artificial (MIT Press, Cambridge, MA, 1981).
27. H.A. Simon, A. Newell, Human Problem Solving (Prentice Hall, Englewood Cliffs, 1972).
28. I. van Staveren, The Values of Economics. An Aristotelian Perspective (Routledge, London, 2001).
29. L.G. Zucker, American Sociological Review 42, 726-743 (1991).
MOTIVATIONAL ANTECEDENTS OF INDIVIDUAL INNOVATION
PATRIZIA PICCI, ADALGISA BATTISTELLI
Department of Psychology and Cultural Anthropology
University of Verona, Italy
E-mail: [email protected]

The current work focuses on innovative work behavior and, in particular, on the stage of idea generation. An important factor that stimulates the individual to carry out the various emergent processes of change and innovation within the organization is intrinsic motivation; under certain conditions, however, the presence of different forms of extrinsic motivation, such as external regulation, introjection, identification and integration, positively influences innovative behavior at work, specifically the creative stage of the process. Starting from this evidence, the organizational environment may be capable of stimulating or indeed inhibiting the potential creativity and innovation of individuals. About 100 employees of a local government health department in Central Italy were given a purpose-designed questionnaire. The results show that, among the external factors that affect the individual, such as control, rewards and recognition for work well done, controlled motivation influences overall innovative behavior, whereas autonomous motivation plays a significant role in the specific behavior of idea generation. At the same time, a clearly articulated task which allows an individual to identify with it seems to favor overall innovative behavior, whilst a task which allows a fair degree of autonomy influences the behavior of generating ideas.

Keywords: innovation, antecedents of individual innovation, motivation, self-determination theory, work characteristics.
1. Introduction

One of the most common convictions nowadays is that, in terms of innovation, it is not only the remit of the organization as a whole to be innovative but also of its individual parts. Whereas the organization can provide experts and specialists in Research and Development, it is also possible to develop and use individuals' innovative potential, in order to respond successfully to the constant challenges of the market and social system. Betting on personnel becomes a deciding factor in competitive advantage, often reflected in a high-quality service based on the idea of continuous improvement.

Currently, the focus of research on organizational citizenship behavior, employees' creativity, positive individual initiative and on critical and reflective
behavior patterns which accompany people at work is the primary motivation of personnel to commit themselves to various proactive behaviors, identified as “extra-role” behaviors. This is the general concept according to which individuals in certain situations will do more than requested, and it comes under the definition of innovative work behavior (IWB), whereby individuals begin a certain course of action and intentionally introduce new behaviors in anticipation of the benefits of innovative changes (Janssen, Van De Vliert and West, 2004 [26]). For many modern organizations, public or private, competitive advantage depends on their ability to favor and program innovation, by activating and converting ideas within the innovative process and transforming them into marketable products (Salaman and Storey, 2002 [32]).

The emerging character of innovation, when one adopts the framework of theories of emergence (see, for instance, Crutchfield 1994 [12]; Holland 1998 [24]; Minati and Pessa 2006 [28]), appears as a sort of challenge for every theory of this kind. Namely, by its very nature, innovation itself is unpredictable, a circumstance which seems to rule out any attempt to state the conditions granting the occurrence of this special form of ‘intrinsic’ emergence (to use Crutchfield's classification). Nevertheless, while a complete model of the emergence of innovation is obviously unfeasible, we could still try to characterize the different psycho-social factors which seem to have some relevant influence on the development of innovation itself. We could then try to answer questions such as: what particular factor favors individuals' innovative behavior at work? What stimulates them to be creative or to adhere and contribute to processes of change and improvement in their specific work?

By concentrating on the individual innovative process in general, and on the phase of idea generation in particular, the current paper proposes to examine in what way specific motivational factors, linked to the individual and their work, can increase the frequency of the creation of ideas and the expression of new and better ways of doing things by individuals in the workplace. The psychological motivational construct of creativity and innovation at work as an antecedent factor has been defined on the basis of studies carried out by Amabile (1983; 1988) [2,3], according to the distinction between intrinsic and extrinsic motivation. Such a distinction built upon previous studies by Deci (1971) [14] and Deci and Ryan (1985) [15] and led to the Gagné and Deci (2005) [19] theory of self-determination, which represents a recent attempt to operationalize extrinsic motivation (autonomous vs controlled). Furthermore, by
continuing to follow the line taken by Amabile, various characteristics of the organizational/work environment which can stimulate or inhibit the expression of potential individual creativity are described. Among the abovementioned characteristics, the important role of the various elements of the task (Hackman and Oldham, 1980 [21]) will be examined, highlighting in particular possible influences on the creative phase of the individual innovative process.

2. Motivational factors which influence the emergence of individual innovation and idea generation

Individual (or role) innovation, understood as “the intentional introduction within the role of new and useful ideas, processes, products and procedures” (Farr and Ford, 1990 [18, p. 63]), is that type of innovative behavior which the individual puts into practice to improve the quality of their work. A recent widespread current of thought considers individual innovation as a complex process made up of phases, often defined in different ways, which can be essentially divided into two distinct behaviors, namely the generation and the implementation of ideas (Rank, Pace and Frese, 2004 [29]).

The moment of idea generation is the initial phase of innovative work behavior. It is considered to be the phase most closely linked to creativity, that is to say, to “the production of new and useful ideas” (Scott and Bruce, 1994 [31, p. 581]), in which it is principally the individual who acts, according to their interpretation of internal environmental factors. This phase, which is characterized by “subjectivity”, differentiates itself from the other, more “social” phases of innovative behavior at work (idea promotion and idea realization), that is to say, those which give space to necessary moments of interaction between individuals. Given its fundamental importance for the development of emerging innovation, idea generation is among the most discussed tasks of individual innovative behavior. These innovations may arise both spontaneously and intentionally from individuals and groups at work, with the aim of making the work at hand better, simpler and more efficient (among the many papers devoted to a theory of this subject we may quote West, 1990 [34]; McGahan, 2000 [27]; Alkemade et al., 2007 [1]; Cagan, 2007 [11]).

One specific study aimed at researching the individual characteristics which stimulate or inhibit creativity, as expressed by employees towards their work, is that of Amabile and Gryskiewicz (1987) [8]. They analyzed individual performance within a problematical situation in the workplace. Among the qualities of the problem solver which favor
creativity, there emerge not only a number of typical personality traits – such as persistence, curiosity, energy, intellectual honesty, emotional involvement in the work per se and willingness to accept a challenge – but also the possession of fundamental cognitive abilities for certain sectors and, finally, characteristics more closely linked to the particular situation. The latter include being part of a team with dependable intellectual and social qualities and showing good social, political and relational skills.

Amabile (1988) [3] proposes a componential model of psycho-social creativity, in which three components necessary for good creative performance are described, namely domain-relevant skills for the task, creativity-relevant skills and intrinsic task motivation. Intrinsic motivation involves people carrying out an activity for its own sake, given that they find the activity interesting and that it gives a spontaneous satisfaction. Extrinsic motivation, on the other hand, requires a combination of the activity and certain consequences thereof, in such a way that satisfaction does not originate in the activity per se but rather in the consequences that derive from it (Porter and Lawler, 1968 [30]).

A number of authors (Woodman, Sawyer and Griffin, 1993 [35]) maintain that it would be preferable to avoid the development of extrinsic motivation among workers, given that it would direct the focus “beyond the heuristic aspects of the creative task and towards the technical and normative aspects of performance” [35, p. 300], even if certain conditions exist in which these aspects play a favorable role in the creative execution of work, and in which it may even be necessary and desirable that their positive effects increase. For example, with the imposed limitations of deadlines, expectations, controls and contractual rewards, work tends to be completed on time and well. Furthermore, not only do people need financial recompense for their work, but they also need positive rewards of other types, such as feedback, recognition and behavioral guidelines (Amabile, 1988 [3]).

Amabile in particular dedicates a major part of her work to the study of the role of motivation in task performance, positing the hypothesis that intrinsic motivation may favor the emergence of the creative process, whereas extrinsic motivation may actually be destructive, even if at times, in simpler tasks, it can act in synergy with intrinsic motivation and actually increase the expression of creativity, to such an extent that high levels of performance, such as innovative ones, clearly emerge (Amabile, 1996, 2000 [5,6]; Amabile, Barsade, Mueller and Staw, 2005 [7]).
The inclusion of intrinsic motivation as a determinant of innovative behavior directs our attention towards the motivational potential of work characteristics, such as the variety of the task and of the competences required, the degree of significance and of perceived identity in and of itself, feedback and autonomy (Hackman and Oldham, 1980 [21]). Farr (1990) [17] confirms that, compared to simplified tasks, complex tasks are more challenging and potentially encourage innovation. Hatcher, Ross and Collins (1989) [23] highlight a positive correlation between task complexity (a comprehensive measurement of autonomy, variety and feedback) and the idea generation phase. In order to understand the degree of commitment and the level of motivation that an individual has with regard to their work, it is therefore also useful at this point to consider the nature of the task, which, as previously noted, is strongly related to satisfaction with it (Hackman and Oldham, 1975 [22]).

From what has already been stated, it seems patently obvious that in order to be creative in carrying out tasks at work, it is necessary to be intrinsically motivated, and this only becomes possible if two fundamental conditions exist, namely that a person loves what they are doing and that their work takes place in a positively motivational context. Only when these conditions have been met does the probability of being ready for innovation within organizations increase, thus guaranteeing creative contributions by employees, which in turn produce important benefits in the long term.

Given that the process of innovation includes not only the development but also the implementation of creative ideas, the objective of the present paper is to consider how individual work motivation and work characteristics influence the emergence process of idea generation. It has therefore been decided to concentrate on the initial phase, which also represents the phase of maximum individual creativity in the process.

3. Intrinsic and extrinsic motivation in the idea generation process

The relationship that is established between intrinsic and extrinsic motivation has attracted great interest in the pertinent literature. The most prevalent psychological models proposed to date in the field have tended to concentrate on a deep antagonism between these two forms of motivation, in that as one increases, the other decreases. Nonetheless, various pieces of evidence point to a more complex and articulated reality. Firstly, even though the relationship between intrinsic and extrinsic motivation is usually taken to be inversely proportional, we will underline how
under certain conditions, the synergic presence of these two motivational forms may actually have positive effects on creative performance. It is useful to remember in this regard that an innovative project is made up of various phases and that, whilst it may be helpful in the initial phases to propose as many ideas as possible, in the successive phases it is more important to dwell upon those produced, examining and choosing the most appropriate (Gersick, 1988 [20]). It is generally maintained that the synergic action of extrinsic motivators is more useful in those phases where a high level of new ideas is not required, such as the phase of collecting data or the implementation of the chosen solutions.

Amabile (1994) presents intrinsic and extrinsic motivation as relatively independent factors, rather than completely opposing poles of the same dimension. Indeed, certain empirical evidence shows how people simultaneously maintain a strong orientation towards both intrinsic and extrinsic motivation. An interesting fact has emerged from examining the link between motivation and creativity: not only do reported test scores for creativity in professionally creative individuals correlate positively with the so-called “challenge” component of intrinsic motivation, but they also correlate positively with “external acknowledgement”, a component of extrinsic motivation (Amabile, 1994).

The current research uses the Gagné and Deci (2005) theory of self-determination as a reference point. The theory works within a continuum that distinguishes between autonomous and controlled motivation. These two forms of intentional motivation can, by their nature, be differentiated from “amotivation”, which implies a complete absence of motivation on the part of the subject. Autonomy presupposes a degree of willingness and of choice in the actions to be performed, e.g. “I am doing this job because I like it”, whereas controlled motivation, while still intentional, differentiates itself by the fact that the subject acts under pressure from external factors, e.g. “I am doing this work for the money”. Intrinsic motivation is a classic example of maximally autonomous motivation. With regard to extrinsic motivation, however, the theory identifies four types of motivation along a continuum, from the complete absence of autonomy to its absolute presence (self-determination). Among these types, two belong to controlled motivation (externally controlled motivation and introjection) and two belong to autonomous motivation (identification and integration).
Activities which may be of little personal interest require, in order to be undertaken successfully, external motivational forms, such as financial rewards, positive acknowledgements and promotions. This is a form of externally regulated motivation and is the prototype of extrinsic or controlled motivation. The other types of extrinsic motivation are related to those behaviors, values and attitudes which have been interiorized by people at differing levels. It is possible to distinguish three fundamental processes of interiorization: introjection, identification and integration, which are differentiated by the degree of autonomy characterizing them.

Introjection is the process by which a value or a behavior is adopted by an individual but is not fully accepted or lived by said individual as their own. Unlike identification and integration, which belong to autonomous motivation, this type of extrinsic motivation is controlled. Identification is characterized by the fact that a behavior or a value is accepted by the individual because they have judged it to be personally important and coherent with their identity and objectives. For example, if a nurse truly has the well-being of their patients at heart, they will be prepared to operate independently, undertaking their own initiatives with tasks of little interest or even with highly unpleasant ones. Finally, integration is the form of extrinsic motivation characterized by the greatest degree of autonomy, in which certain values and behaviors are not only tacitly accepted by the individual but also incorporated and integrated into their value system and way of life. This motivational form, even if it shares many aspects with intrinsic motivation, is still part of extrinsic motivation, due to the fact that the person who is acting is not interested in the activity per se but considers the activity at hand to be instrumental in reaching personal objectives “similar to” but “different from” said activity.

As is stressed in the theory, people who are autonomously motivated, even if the motivation is extrinsic in nature, are potentially more inclined to introduce changes in the way they work, because they constantly wish to do their work in the most efficient manner possible. It is for this reason that the work itself becomes even more intrinsically motivating, without however excluding the wish on the part of the individual to have other forms of external acknowledgement regarding the quality of their work. This implies that even those individuals with a more controlled motivation may potentially develop creative ideas while carrying out their work, by simply introjecting a value which does not belong to them, in order to adapt to their organization's wishes and ultimately obtain a certain form of positive acknowledgement or avoid other forms of disapproval.
Thus, in line with the salient aspects of self-determination theory as reexamined by Gagné and Deci (2005) [19], and particularly with regard to its as yet unstudied possible relationship to innovative behavior, it appears pertinent to hypothesize that not only autonomous motivation (due to its proximity to intrinsic motivation) but also controlled motivation (externally regulated at differing levels) may have a positive influence on the behavior of idea generation. We therefore formulate the following hypothesis:

H1: Both autonomous motivation (in the form of identification and integration) and controlled motivation (in the form of external regulation and introjection) positively influence idea generation behavior.

4. Work characteristics influencing the emergence of the innovation process

Undoubtedly, the motivation that a subject shows towards their work does not depend on individual personality alone: various studies have also shown the crucial role that the work environment plays in stimulating creative capacity. In other words, external background and other factors stimulate the individual and condition their creative and innovative capacity. For example, resources include all those aspects which the organization places at the individual's disposal so that a creative performance becomes practicable: sufficient time to produce an innovative piece of work, competent and prepared personnel to work with, the availability of funds, materials, systems and adequate processes, along with the relevant information, and the possibility of learning and training (Siegel and Kaemmerer, 1978 [33]; Ekvall, 1996 [16]).

It has been consistently observed over time that the structural organization of work has direct effects on creative performance. The more complex the task, the more motivated, satisfied and productive individuals become (Cummings and Oldham, 1997 [13]). A greater complexity of the task should stimulate a higher creative potential, to the extent that it clearly implies a higher degree of responsibility and autonomy in the choices made by the individual. In this case, we are dealing with tasks that require adopting various perspectives and observing the same problem from different points of view. Such tasks are characterized by the fact that they require a high level of ability in order to be carried out. They enable individuals to follow through with the task from beginning to end, in such a manner that the
individual is fully aware of the meaning of their work, and they provide important feedback during execution. Finally, such tasks have a strong impact on people's lives, both within and outside the organization. By contrast, the simplest or most routine tasks tend to inhibit enthusiasm and interest, and consequently do not stimulate the expression of creative potential (Scott and Bruce, 1994 [31]). Some jobs, in contrast to others, thus offer people a greater opportunity for innovative behavior. Hackman and Oldham (1980) [21] identified three conditions under which people can feel motivated by their work: they must recognize the results of their work; they have to experience the sensation of taking responsibility for those results; and they must live their work as something significant and relevant to their value system. The research literature highlights five work characteristics useful for handling the demands of the task at hand (Hackman and Oldham, 1975, 1980 [22,21]): skill variety, task identity, task significance, autonomy and job feedback. Combined together, these five job characteristics determine the motivational potential of a working role or position. For this reason, if a job has a low motivational potential, intrinsic motivation will be correspondingly low and a person's feelings will no longer be positively influenced, even by a job done well. Farr (1990) [17] confirmed that "complex" jobs, compared to simpler ones, are more challenging and require more thought, and consequently tend to promote innovation. Studies following this hypothesis generally confirm the existence of a relationship between job characteristics and the creative phase of innovation, known as idea suggestion (Axtell, Holman, Unsworth, Wall, and Waterson, 2000 [10]). Using a measure of job complexity based on the Job Diagnostic Survey (Hackman and Oldham, 1980 [21]), Oldham and Cummings (1996) found a significantly positive correlation with the creativity scores attributed to employees by their superiors, highlighting the interaction between job complexity and personality characteristics in predicting idea generation. Overall, studies of job characteristics suggest that when individuals are committed to various tasks with high levels of control, they have a greater propensity to find new solutions for improving their work (Axtell et al., 2000 [10]).
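Since the chapter does not restate it, it may be worth recalling how the Job Diagnostic Survey literature combines the five characteristics: Hackman and Oldham's Motivating Potential Score averages the three "meaningfulness" characteristics and multiplies the result by autonomy and feedback, so that a near-zero score on autonomy or feedback collapses the whole score. A minimal sketch (the function name and example ratings below are illustrative):

    def motivating_potential_score(variety, identity, significance, autonomy, feedback):
        # Hackman and Oldham's combination rule: the three "meaningfulness"
        # characteristics are averaged; autonomy and feedback enter multiplicatively.
        return (variety + identity + significance) / 3 * autonomy * feedback

    # Example with 1-5 ratings: a varied, autonomous job with good feedback
    print(motivating_potential_score(4, 3, 5, 4, 4))  # -> 64.0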
It is therefore in pursuit of the objective of this work that we outline the following hypothesis:

H2: Job characteristics (task identity, task significance, feedback, autonomy and skill variety) positively influence idea generation behavior.

Finally, in the light of such evidence, and considering the predictable relationship between job characteristics and intrinsic motivation, it is plausible to hypothesize a synergic role for these factors, not only in the idea generation phase but also within the whole process of individual innovation. It is therefore proposed to test the following hypothesis:

H3: Autonomous and controlled motivation and job characteristics positively influence the entire behavior of individual innovation.

5. The method

The work environment to which this study was applied was the Health Service, chosen because of the changeable nature of the organization, which underwent a notable number of reforms, legislated during the 1990s, that transformed hospitals into "Health Service Firms". The research was carried out according to a quantitative, cross-sectional methodology, through a specifically designed questionnaire. The questionnaire was presented to the subjects in a generic manner, providing them with general indications while clarifying total anonymity and the final objective of the project. This was done in complete agreement with the Head of the Psychology Unit and the Head of the Quality Business Centre for Health Service Firms operating in a region of Central Italy.

5.1. The sample

The sample is made up of 100 subjects currently employed in the Health Service and in the administrative area of the relative management service, situated in a region of Central Italy. 53% of the total sample are Health Service personnel and the remaining 47% are administrative personnel within the hospital service. 48% of the sample are male and the remaining 52% are female. The average age of the participants is 44.7. Information regarding the qualifications of those involved in the sample revealed a relatively diverse reality, divided as follows: 40% are in possession of a High School Certificate, 10% of a Diploma
from Professional Schools/Technical Institutes, 3% of a three-year University Diploma, 26% of a Graduate Degree and, finally, 21% of a Postgraduate Qualification. The average tenure within the Health Service of those sampled is 15 years. Regarding the function served within their respective sectors, 22 subjects are Directors/Managers/Referees in situ or in organizational positions, 20 are part of Management, and the major part of the sample (58 subjects) declared that they belong to Ward Personnel. Finally, all those involved in the study have officially spent an average of 12 years in service.

5.2. The measures

The questionnaire was composed of two sections: the first comprised a general enquiry into personal details, and the second included three scales measuring innovative behavior, motivation and perceived job characteristics. The innovative work behavior (IWB) construct of Scott and Bruce (1994) [31], as revisited by Janssen (2000) [25], was used to measure innovative work behavior. Janssen's nine-item scale, based on three stages of innovation, conceives three items referring to idea generation, three referring to idea promotion and three referring to idea realization; to observe the innovative behavior of idea generation, the three items specific to this dimension were used. The response format was a 5-point Likert scale (1 = never, 5 = always), on which subjects indicated how frequently they engaged in innovative work behavior, e.g. "How frequently do you happen to come up with original solutions to problems?" The measurement scale for job characteristics was taken from the Job Diagnostic Survey (JDS) of Hackman and Oldham (1980) [21]. The scale was composed of ten items and had already been used in previous unpublished Italian research that had tested the validity of its structure. Five dimensions were considered, each covered by two items of the scale: task variety, task identity, task significance, autonomy and feedback. For example: "My job requires me to use a number of complex capacities at a high level." (task variety); "My job offers me the opportunity to finish that part of the work which I had previously started." (task identity); "My job is not very significant or important in my life." (task
significance); "In doing my job, I am constantly provided with a considerable degree of independence and freedom." (autonomy); and finally "My job provides me with little or no indication upon which I may judge whether I am doing well or badly." (feedback from the work itself). Subjects were requested to respond on a 5-point Likert scale (1 = absolutely false, 5 = absolutely true), according to their level of agreement with these potentially descriptive aspects of their job. To observe the forms of motivation that drive individuals to perform their job, a recently constructed 20-item scale, currently in the publication and validation stage in Italy and based on the self-determination theory of Gagné and Deci (2005) [19], was used. The motivational forms considered refer to intrinsic motivation, which is completely autonomous, and to the various types of controlled and autonomous motivation identifiable along the continuum of extrinsic motivation: externally regulated motivation (e.g. "I am doing this job because it allows me to have a high salary."), introjection (e.g. "I am doing this job because the esteem in which my colleagues hold me depends on my work."), identification (e.g. "I am doing this job because it is important to me."), and integration (e.g. "I am doing this job because it allows me to reach my goals in life."). Subjects were asked to respond on a 7-point Likert scale (1 = absolutely false, 7 = absolutely true), according to their level of agreement with the motivation described in each item.

6. The results

For the measures used, Table 1 summarizes the means, standard deviations and reliability coefficients (Cronbach's alpha). With regard to the motivational scale of Gagné and Deci (2005) [19], an exploratory analysis of the results showed that the original four-dimensional structure hypothesized by the authors (external controlled motivation, introjection, identification and integration) did not appear in the data obtained from the sample used in this research. The resulting three-dimensional structure consists of the following: externally regulated motivation (M=2.67; SD=1.12), introjection (M=2.58; SD=1.32) and identification/integration (M=4.14; SD=1.31). This last dimension incorporates the two motivational types nearest to intrinsic motivation, i.e. those which according to the theory fall into the autonomous category. In order to obtain this structure, 6 of the 20 items in the original scale were eliminated, due to a lack of saturation in the sample.
Table 1. Descriptive analyses of variables.

Variable                              N     Range    M      SD    Alpha
Autonomy                             100    1-5    3.77    .93      -
Task Variety                          98    1-5    3.63    .98      -
Feedback                             100    1-5    3.74    .80      -
Identification                       100    1-5    3.81    .92      -
Significance                         100    1-5    3.97    .77      -
Job Characteristics                  100    1-5      -      -     .70
Innovative work behavior (IWB)       100    1-5    3.26    .69    .88
IWB Idea suggestion                  100    1-5    3.18    .80    .85
M_Integration/Identification          97    1-7    4.14   1.31    .90
M_Externally Controlled Motivation    99    1-7    2.67   1.12    .70
M_Introjection                        99    1-7    2.58   1.32    .69

Note: the reliability coefficient of the global Job Characteristics scale was calculated on the total of 10 items.
In line with the central hypothesis of this research, we then proceeded with an analysis of the possible specific relationships between each of the variables hypothesized as antecedents (motivation and job characteristics) and the single phase of idea generation. Table 2 shows the results of the regressions carried out to study the relationship between idea generation and motivation in its autonomous and controlled forms (hypothesis H1). Within the sample, the results of the regression show a significantly positive influence of the integration/identification dimension on idea generation behavior. It is therefore possible to confirm that motivation, only in the form of identification/integration, influences the emergence of the innovative behavior of idea generation. From these data it clearly emerges that the H1 hypothesis cannot be confirmed in its totality, showing once again that only autonomous motivation, which is the nearest to the intrinsic form, appears to be implicated to a greater degree in the creative process. In fact, this process, which is based on the emergence of individual innovation, does not reveal any significant result in relation to the dimensions of controlled motivation (external regulation and introjection). Table 3, on the other hand, shows the possible relationships of influence between job characteristics, according to the job characteristics model (Hackman and Oldham, 1980 [21]), and the specific phase of idea generation. As can be seen from the Table, idea generation behavior proves to be positively influenced by two specific job characteristics, namely task variety and autonomy.
Table 2. Motivation in the phase of idea generation.

Dependent variable       Predictors                    β      t       p
Idea generation (IWB)    Integration/identification   .345   3.636   .000
R² adjusted = .110; F = 13.219; p < .000

Table 3. Job characteristics in the phase of idea generation.

Dependent variable       Predictors       β      t       p
Idea generation (IWB)    Task Variety    .365   3.602   .000
                         Autonomy        .234   2.307   .023
R² adjusted = .262; F = 18.573; p < .000

Table 4. Motivational antecedents of IWB.

Dependent variable          Predictors                          β      t       p
Innovative work behavior    Task Variety                       .337   3.650   .000
                            Externally controlled Motivation   .238   2.709   .008
                            Task Identification                .210   2.265   .026
R² adjusted = .250; F = 11.975; p < .000
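Regressions of this kind are straightforward to reproduce; as a minimal sketch, the following fits an OLS model analogous to Table 3 on synthetic data (the statsmodels library, the generated values and the coefficients are illustrative stand-ins, not the study's data set):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100

    # Synthetic stand-ins for perceived task variety, autonomy and idea generation
    task_variety = rng.normal(3.6, 1.0, n)
    autonomy = rng.normal(3.8, 0.9, n)
    idea_generation = 0.35 * task_variety + 0.23 * autonomy + rng.normal(0, 1, n)

    X = sm.add_constant(np.column_stack([task_variety, autonomy]))
    fit = sm.OLS(idea_generation, X).fit()
    print(fit.params, fit.tvalues, fit.pvalues, fit.rsquared_adj)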
With regard to this hypothesis, it is therefore possible to confirm that a more complex task, which requires various abilities and knowledge and which allows the subject ample margins of autonomy, facilitates putting an initial innovative behavior of idea production into practice. Finally, an analysis of the influence of the two classes of variables generally understood as antecedents of innovative behavior shows a significantly positive influence of task variety and task identification, in addition to externally regulated motivation, on innovative work behavior. It is therefore possible to confirm, even if only partially, the H3 hypothesis of the current study, to the extent that only two job characteristics correlate with IWB, namely task variety and identification, whereas among the various motivational dimensions only external regulation correlates significantly (Table 4). Nevertheless, such a result constitutes an interesting opportunity for discussion, reflecting as it does the complex nature of innovative behavior, its manifestation in terms of specific process phases, and the tasks and factors able to influence the emergence of each in different ways. In fact, an explorative analysis of the data confirms that innovative work behavior seems to be influenced by certain task characteristics, including its complexity (skill variety), the possibility for the individual to follow through with the whole task (task identity) and the
prevalently externally controlled motivation. However, from a more detailed analysis of idea generation behavior, it seems evident that the presence of a complex and varied task (skill variety) must, at a systemic level, be accompanied by factors more strictly linked to the decisional capacity of individuals: the perception of a certain degree of autonomy in the task at hand (autonomy) and the presence of autonomous or self-determined motivational forms (identification/integration).

7. Discussion

The results of this research suggest that the variety and the degree of autonomy that individuals perceive in relation to their task are conditions of equal importance to autonomous, functionally intrinsic motivation in relation to the process of idea generation. From the point of view of the emergence of implemented innovation, it appears necessary to pay attention to the coherence of the action programs that lead the individual to complete a specific task, in such a way that the task can be identified from beginning to end, accompanied by an equally coherent system of rewards and external acknowledgement. In fact, the subjects of the sample seem to have a higher propensity for putting innovative behaviors into practice, including idea generation and realization, when they perceive certain characteristics of the task at hand and certain motivational forms. In particular, the motivational forms perceived during the process are not only forms of extrinsic motivation (rewards, acknowledgements, promotions and the controlling mechanisms of superiors), which make workers act in a concrete way, unlike normal procedure, but also types of self-determined motivation, guided by a process of personal interiorizing of the values and objectives imposed by the work context, which facilitate the emergence of the creative process. Overall, it is possible to confirm that a contextual work situation influences the adoption of innovative behavior and of idea generation, in addition to giving the subject the possibility of carrying out a varied task that requires high levels of competence, skills and knowledge. The presence of a task thus characterized (Cummings and Oldham, 1997 [13]) stimulates the development of a higher level of creative potential in the subject, to the extent that in order to carry out a complex task they are more likely to look at the problem from different points of view and to advance diverse opinions and solutions. By way of contrast, the simplest and most routine tasks tend to inhibit interest and
enthusiasm, by not stimulating the expression of creative potential (Scott and Bruce, 1994 [31]). This result was also confirmed by the present study, both in relation to general innovative behavior and to the specific phase of idea generation. Despite some positive indications provided by the current study regarding a number of the most important motivational factors that can influence the emergence of the innovation process, a critical reflection on the observed data cannot but open up other questions linked to the effective nature of the relationship between contextual and individual aspects of motivation and innovation. In fact, the analysis conducted to date appears to confirm the hypothesis of a motivational influence of "rich" jobs, in terms of complexity and competence, on emerging innovation. On the other hand, from a systemic point of view and given the longitudinal nature of the process, we cannot exclude that the arrival of an innovation is itself perceived as an increase in task complexity (if only because of the inherently difficult cognitive reconstruction that forms the basis of every process of change). This observation should alert us to the possibility of falling into a vicious circle of relationships that continually require more in-depth analysis and further clarification. Beyond a more in-depth confirmatory analysis of the instruments used, capable of better testing the consistency of all the factors considered (some of which are theoretically grounded but still to be validated), the research further proposes testing the relationships suggested by the results. All this should be done by considering the phenomenon as something that cannot be traced back to a purely individual process but that, due to its psycho-social nature, can ultimately be captured only through the adoption of a multidisciplinary, complementary approach and of diversified methods of analysis and observation. This can reduce the limitations of an exclusively retrospective analysis, accompanying the questionnaire method with direct observation of and participation in the phenomenon. It was however possible to identify, even if only partially, the clear influence of two variables, the one individual and the other socio-contextual. Both are of fundamental importance in sustaining individual motivation towards innovative performance, which, focused as it is nowadays on the moment of idea production and generation, proves ever more essential in the most varied professional contexts. Despite the results obtained, the study presents some limitations. The first is represented by the fact that only two variables were considered as
behavioral antecedents of idea suggestion, completely excluding a whole series of other possible intervening variables in the process, traceable to group and organizational factors. A second limitation, identifiable in the research method adopted (quantitative and cross-sectional), is that it examined the perceptions of the variables in an exclusively retrospective manner, thereby excluding the possibility of conducting a longitudinal analysis of the process. A third fundamental limitation stems from the nature of the statistical tools used to analyse the data. They are intrinsically linear and, as emergent phenomena can occur only in the presence of non-linear relationships between variables, it could seem that, as regards the detection of factors affecting the studied emergence, the results obtained would be unreliable. However, it can be remarked that this limitation holds only if we are searching for the exact form of the laws linking the variables under consideration. If, on the contrary, as in our study, the interest is focused only on the detection of possible cues about factors influencing the emergence of innovation, then even linear tools are enough: if we evidence a cue through a linear method, a fortiori it will be evidenced by a non-linear method. The problem would arise only in the case of failure of the linear tools, which is not our case. Unfortunately, non-linear tools for detecting emergence in social, cognitive and economic systems are still poorly available, and this often forces researchers to use traditional linear methods. In any case, we feel that new, hoped-for analyses of our data through genuinely non-linear tools should confirm our first conclusions and, at the same time, evidence new kinds of phenomena.

References
1. F. Alkemade, C. Kleinschmidt, M. Hekkert, International Journal of Foresight and Innovation Policy 3, 139-168 (2007).
2. T.M. Amabile, Journal of Personality and Social Psychology 45, 357-376 (1983).
3. T.M. Amabile, Research in Organizational Behavior 10, 123-167 (1988).
4. T.M. Amabile, R&D Innovator 3, 1-9 (1994).
5. T.M. Amabile, Creativity in context (Westview Press, Boulder, CO, 1996).
6. T.M. Amabile, in Basic Principles of Organizational Behavior: A Handbook, Ed. E.A. Locke (Blackwell Publishers, Oxford, 2000).
7. T.M. Amabile, S.G. Barsade, J.S. Mueller and B.M. Staw, Administrative Science Quarterly 50(3), 367-403 (2005).
8. T.M. Amabile and S.S. Gryskiewicz, Creativity in the R&D laboratory, Technical Report 30 (Centre for Creative Leadership, Greensboro, NC, 1987).
9. N. Anderson, C.K.W. De Dreu and B.A. Nijstad, Journal of Organizational Behavior 25, 147-173 (2004).
10. C.M. Axtell, D.J. Holman, K.L. Unsworth, T.D. Wall and P.E. Waterson, Journal of Occupational and Organizational Psychology 73, 265-285 (2000).
11. J. Cagan, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21, 13-14 (2007).
12. J.P. Crutchfield, Physica D 75, 11-54 (1994).
13. A. Cummings and G.R. Oldham, California Management Review 40(1), 22-38 (1997).
14. E.L. Deci, Journal of Personality and Social Psychology 18, 105-115 (1971).
15. E.L. Deci and R.M. Ryan, Journal of Research in Personality 19, 109-134 (1985).
16. G. Ekvall, European Journal of Work and Organizational Psychology 5(1), 105-123 (1996).
17. J.L. Farr, in Innovation and creativity at work: psychological and organizational strategies, Ed. M.A. West and J.L. Farr (Wiley, Chichester, 1990), pp. 207-230.
18. J.L. Farr and C. Ford, in Innovation and creativity at work: psychological and organizational strategies, Ed. M.A. West and J.L. Farr (Wiley, Chichester, 1990), pp. 63-82.
19. M. Gagné and E.L. Deci, Journal of Organizational Behavior 26, 331-362 (2005).
20. C.J.G. Gersick, Academy of Management Journal 31, 9-41 (1988).
21. J.R. Hackman and G.R. Oldham, Work redesign (Addison-Wesley, Reading, MA, 1980).
22. J.R. Hackman and G.R. Oldham, Journal of Applied Psychology 60(2), 159-170 (1975).
23. L. Hatcher, T.L. Ross and D. Collins, Journal of Applied Behavioral Science 25, 231-248 (1989).
24. J.H. Holland, Emergence: from Chaos to Order (Perseus Books, Cambridge, MA, 1998).
25. O. Janssen, Journal of Occupational and Organizational Psychology 73, 287-302 (2000).
26. O. Janssen, E. Van De Vliert and M.A. West, Journal of Organizational Behavior 25, 129-145 (2004).
27. A. McGahan, Business Strategy Review 11, 1-16 (2000).
28. G. Minati, E. Pessa, Collective Beings (Springer, Berlin, 2006).
29. J. Rank, V.L. Pace and M. Frese, Applied Psychology: an International Review 53(4), 518-528 (2004).
30. L.W. Porter and E.E. Lawler III, Managerial attitudes and performance (Irwin-Dorsey, Homewood, IL, 1968).
31. S.G. Scott and R.A. Bruce, Academy of Management Journal 37, 580-607 (1994).
32. G. Salaman and J. Storey, Journal of Management Studies 39(2), 147-163 (2002).
33. S.M. Siegel and W.F. Kaemmerer, Journal of Applied Psychology 63(5), 553-562 (1978).
34. M.A. West, in Innovation and creativity at work: psychological and organizational strategies, Ed. M.A. West and J.L. Farr (Wiley, Chichester, 1990), pp. 309-333.
35. R.W. Woodman, J.E. Sawyer and R.W. Griffin, Academy of Management Review 18, 293-321 (1993).
AN E-USABILITY VIEW OF THE WEB: A SYSTEMIC METHOD FOR USER INTERFACES VERA STARA(1), MARIA PIETRONILLA PENNA(2), GUIDO TASCINI(1) (1) Università Politecnica delle Marche, Facoltà di Ingegneria, DEIT
[email protected] [email protected] (2) Università degli Studi di Cagliari, Facoltà di Scienze della Formazione
[email protected] Different approaches can be applied to assess the usability of a web application. Each one of them presents advantages and drawbacks, as well as cost-benefits trade-offs. This contribution contains a short review of the state-of-the-art in web usability assessment issues, by focusing on a new systemic approach, called “e-usability”, designed to deal in a more integrated way with the human-computer interaction, so as to allow a more realistic assessment of web interfaces. The applied methodology is quick and doesn’t need any artifact design or evaluation cost. Keywords: user interfaces, usability, usability methods.
1. Introduction

The Web revolutionized the computer and communications world, becoming a world-wide broadcasting tool, a mechanism for information dissemination, and a medium for cooperation and interaction between individuals and their computers regardless of geographic location (Leiner et al., 2003 [10]). The Web is now 15 years old and in its short life it has also heavily affected our way of life (The Observer, 2006 [17]); millions of web sites offer users information, goods, services, and entertainment. Unfortunately, many web sites are not as successful and usable as they should be (Fang and Holsapple, in press [4]); usability problems may cause difficulties in user interaction with the system (Cockton et al., 1999 [3]). Users may in fact find an element of the interface problematic for various reasons, such as difficulty in learning system operation, slow task performance, or errors of use (Wang, 2001 [18]). The usability of a system is considered a key quality in web activities such as e-commerce and e-banking, but, unfortunately, designers spend huge amounts of money on developing fancy interfaces rather than investing in quality (Anandhan et al., 2006 [1]). As a matter of fact, a usability-oriented approach has several benefits both for the producer and for the users:
• Easy to use products are more competitive on the market;
• Taking user requirements properly into account during development can reduce the need for expensive late re-design work;
• Improved usability reduces the demand for support and increases the users' perception of the overall quality of the product;
• Usable products enable customers to efficiently achieve their goals, rather than waste time struggling with the product interface;
• Well designed products aid learning and can reduce the time spent on training.
In order to improve usability, we need standards describing how a user interface should be designed for given tasks, users, and contexts. User interface standards are usually encoded in style guides and are available for every major operating system. Standards help to ensure consistency across applications, thus reducing learning time and helping to prevent user errors. This is important in an area such as web usability, which is still relatively young and contains many conflicting opinions on what makes a website usable. In fact, standards are independent of the opinions of single companies and present a balanced, authoritative view. But standards also mean business, as companies cannot ignore them: compliance is a mandatory requirement in many contracts (especially in the EU). This contribution contains a short review of the state of the art in web usability assessment, focusing on a new systemic approach, called "e-usability", designed to deal in a more integrated way with human-computer interaction, so as to allow a more realistic assessment of web interfaces.

2. Usability assessment: Methods and Tools

Different approaches can be used to assess the usability of a web application. Each of them presents advantages and drawbacks, as well as cost-benefit trade-offs. Most evaluation methods are based on three steps: collecting usability data, analyzing these data and, finally, suggesting solutions or improvements in order to remove problems. According to Hornbæk (2006) [7], it is possible to classify the usability measures introduced so far into three groups: those related to effectiveness, those related to efficiency, and those related to the degree of fulfillment of the ISO 9241 standard for usability [9]. This standard defines usability as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.
Table 1. Effectiveness measures.

Measure                   Explanation
Binary task completion    Percentage of tasks that a user successfully completes
Accuracy                  The accuracy with which users complete tasks
Recall                    Users' ability to recall information from the interface
Completeness              The extent of users' solutions to tasks
Quality of outcomes       Measures of the quality of the interaction outcome
Experts' assessment       Assessment of the interaction quality

Table 2. Efficiency measures.

Measure                   Explanation
Time                      The duration of tasks or parts of tasks
Input rate                Input rate of the users using mouse or keyboard
Mental effort             The users' mental effort when using the interface
Usage patterns            Measures of the steps made on the interface to perform tasks
Communication effort      Resources spent in communication processes
Learning                  The amount of learning required to use the interface effectively

Table 3. Satisfaction measures.

Measure                            Explanation
Standard questionnaires            Measure of satisfaction using a standardized questionnaire
Preference                         Measure of preferences in using the interfaces
Satisfaction with the interface    User satisfaction as regards their interaction with the interface
Users' attitudes and perceptions   User satisfaction as regards their interaction with other persons and contents through the interface
Effectiveness can be defined as the accuracy and completeness with which users achieve specific tasks; possible measures are described in Table 1. Efficiency is the amount of resources spent with respect to the accuracy and completeness with which users achieve goals; possible measures are listed in Table 2. Satisfaction concerns the comfort and acceptability of use; possible measures are described in Table 3 (a short computational sketch follows the list below). Beyond the standard usability measures of Shneiderman (1997) [16], we mention other, no less important, ones such as:
• Learnability: a measure of the degree to which a user interface can be learned quickly and effectively;
• Memorability: the amount of interface operation whose steps do not need to be learned again;
• Errors: the ability of the system to detect user errors.
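As a minimal illustration of how the three ISO 9241 dimensions can be operationalized from raw test logs, consider the sketch below; the record layout and all values are invented for the example and are not part of any cited method:

    from statistics import mean

    # One record per attempted task: (participant, completed?, seconds, satisfaction 1-5)
    sessions = [
        ("p1", True, 42.0, 4), ("p1", False, 95.0, 2),
        ("p2", True, 30.5, 5), ("p2", True, 51.0, 4),
        ("p3", False, 120.0, 1),
    ]

    completed = [s for s in sessions if s[1]]
    effectiveness = len(completed) / len(sessions)   # binary task completion rate
    efficiency = mean(s[2] for s in completed)       # mean time on completed tasks
    satisfaction = mean(s[3] for s in sessions)      # mean questionnaire score

    print(f"completion {effectiveness:.0%}, time {efficiency:.1f}s, "
          f"satisfaction {satisfaction:.1f}/5")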
Usability assessment methods can also be classified using different criteria (Hom, 1998 [6]), such as:
• the kind of user interface; in this regard it is possible to distinguish between: a) methods that can evaluate only WIMP (windows, icons, menus and pointer) user interfaces; b) methods that can evaluate only web interfaces; c) methods that can evaluate both WIMP and web interfaces;
• the approach used for evaluation; typical approaches are: a) inquiry, b) inspection, c) testing, d) related techniques.
As regards the latter distinction, we can introduce further subdivisions as follows:
a) Inquiry methods include:
Contextual Inquiry: a structured field interviewing method; one of the best methods to understand users' work context.
Ethnographic Study/Field Observation: users are observed in the field.
Survey: an ad-hoc interview with users.
Questionnaires: written lists of ad-hoc questions.
Journaled Session: a remote inquiry method based on data capture with journalizing software.
Self-Reporting Logs: a paper-and-pencil journal in which users are requested to log their actions and observations while interacting with a product.
Screen Snapshots: a method in which the users take screenshots at different times during the execution of a task or series of tasks.
b) Inspection methods include:
Heuristic Evaluation: usability experts judge whether each element of a user interface follows established usability principles.
Cognitive Walkthrough: expert evaluators construct task scenarios from a specification or early prototype and then play the part of a user working with that interface.
Formal Usability Inspection: a code inspection; this technique is designed to reduce the time required to discover defects in a tight product design cycle.
Pluralistic Walkthroughs: meetings during which users, developers and usability professionals step through a task scenario, discussing and evaluating each element of the interaction.
Feature Inspection: analyzes only the feature set of a product; each set of features used to produce the required output is analyzed for its availability, understandability, and general usefulness.
Consistency Inspection: begins with a usability professional analyzing the interfaces and noting the various implementations of a particular user interaction or function; an evaluation team, using the usability analysis, negotiates and decides on the best implementation for the usability attributes of each product.
Standard Inspection: ensures compliance with industry standards; in such inspections, a usability professional with extensive knowledge of the standards analyzes the elements of the product.
Guideline Checklists: help to ensure that usability principles will be considered in a design; usually, checklists are used in conjunction with a usability inspection method, the checklist giving the inspectors a basis on which to compare the products.
c) Testing methods include:
Thinking Aloud Protocol: a popular technique used during usability testing; during a test, while performing a task as part of a user scenario, the participant is asked to vocalize his or her thoughts, feelings, and opinions.
Co-discovery: a type of usability testing where two participants attempt to perform tasks together while being observed. The advantage of this method over the thinking aloud protocol is two-fold: in the workplace, most people have someone else available for help, and the interaction between the two participants can produce more insights than a single participant vocalizing his or her thoughts.
Question-Asking Protocol: evolves from the thinking aloud protocol; instead of waiting for users to vocalize their thoughts, they are asked direct questions about the product. Their ability (or lack thereof) to answer questions can help in determining which parts of the product interface were obvious, and which were not.
Performance Measurement: some usability tests are targeted at obtaining hard, quantitative data. Most of the time these data take the form of performance metrics: how long does it take to select a block of text with a mouse, touchpad, or trackball? How does the placement of the backspace key influence the error rate?
Eye Tracking: allows testers to identify where the participant's eyes are pointing during a test. Eye tracking equipment uses several different technologies, including skin electrodes, marked contact lenses, image-processing cameras, and reflector trackers. The last type is probably the most effective, as it does not require physical contact with the user's eye or eye socket: a beam of light is projected onto the eye, and a sophisticated camera picks up the difference between the pupil reflection and known reference points to determine what the user is looking at.
d) Related techniques are:
Prototyping: models a final product and tests its attributes.
Affinity Diagramming: a categorization method in which users sort various concepts into several categories; it is used by a team to organize a large amount of data according to the natural relationships between the items.
Blind Voting: a voting procedure in which each person does not know the opinions of the others; it is often implemented as an electronic meeting system.
Card Sorting: a categorization method where users sort cards depicting various concepts into several categories.
According to Frøkjær et al. (2000) [5], when researchers use one of these measures they make implicit or explicit assumptions about the specific method and the particular context, and they may automatically ignore other aspects of usability, thereby possibly failing to capture usability itself: at the moment it is not clear which usability measure should be used in a particular situation. Some data are difficult to collect, and for this reason notable problems are associated with every method (Hornbæk, 2006 [7]): measures of interaction quality, the outcome of the users' interaction, measures of learning and retention, how users interact with the interface, and measures of users' satisfaction and perception that ignore validated questionnaires.

3. A systemic method

Usability is context dependent (Newman and Taylor, 1999 [13]) and shaped by the interaction between tools, problems and people. It therefore cannot be directly measured; yet, to create easy-to-use Web pages, an evaluation of usability is necessary. Our approach starts from a consideration: usability represents a complex scenario in which neither quantitative nor qualitative measures are exhaustive, because neither subjective nor objective data alone can describe this complexity.
We use the term "e-usability" to define the usability of the online, computer and electronic world in a systemic view, in which the user factor, the machine factor, the web factor and the context factor together constitute the complex focus of web usability evaluation. Conceptually, the usability of an artefact (an e-artefact) measures the cognitive distance between the designer's model and the user's model: according to Norman (2002) [14], a good design is based on a mapping between the user's mental model and the designer's design or conceptual model. The user's model is the mental model developed through interaction with the system. The system image results from the physical structure that has been built (including documentation, instructions, and labels). The designer expects the user's model to be identical to the design model. But the designer doesn't talk directly to the user: communication takes place through the system image. If the system image does not make the design model clear and consistent, the user will end up with a wrong mental model. For this reason usability is not a property of objects; it always refers to the task, to the user and to the relationship between humans and artifacts. How can we make computer technology more usable by people? In this regard we claim that:
1. this requires the understanding of at least four components: the user who interacts with the technology, the system (the computer technology and its usability), the interaction between the user and the system, and the context of use;
2. usability is a multi-disciplinary subject, since the designer of an interactive system should have expertise in a range of fields: psychology and cognitive science, to understand the user's perceptual, cognitive and problem-solving skills; sociology, to understand the wider context of interaction; ergonomics, to understand the user's physical capabilities; graphic design, to produce an effective interface presentation; and computer science and engineering, to be able to build the necessary technology;
3. usability is related to the design of interfaces; however, the interface and the user can be considered as two subsystems of a broader system which also includes all the domains which influence (and are influenced by) human-computer interaction. The problem of the human-computer interface is an aspect of the adaptation process which occurs between humans and their surrounding environment (Penna and Pessa, 1994 [15]).
The three principles sketched above characterize our approach to e-usability as "systemic". In the following we briefly summarize our systemic model and its four factors (user factor, system factor, interaction factor and context-of-use factor).
The user factor can be analyzed by classifying experience, educational level, age and all the information about final users which could help to define their target and to discover their mental models and expectations. According to the User Centered Design approach, the user is identified by asking:
• How much experience do the users have with: computers? The Web? The domain (subject matter)?
• What are the users' working/web-surfing environments?
• What hardware, software, and browsers do the users have?
• What are the users' preferred learning styles?
• What language(s) do the users speak? How fluent are they?
• What cultural issues might there be?
• How much training (if any) will the users receive?
• What relevant knowledge/skills do the users already possess?
• What do the users need and expect from this web site?
• What are the tasks users need to perform? How do they currently perform these tasks? What is the workflow?
• Why do the users currently perform their tasks the way they do?
• What are the users' information needs?
• How do users discover and correct errors?
• What are the users' ultimate goals?
As regards the second factor, the system or computer technology, it constitutes the hardware aspect that influences usability perception through input and output devices, operating system and network connection. Input to computers consists of sensed information about the physical environment. Familiar examples include the mouse, which senses movement across a surface, and the keyboard, which detects a contact closure when the user presses a key. Humans communicate with the system using motor functions or voice emissions. Output from computers can include any emission or modification of the physical environment, such as that produced by a display, speakers, or tactile and force-feedback devices. The perception channels used by humans are traditionally three: visual, auditory and tactile. However, in most cases information coming from a computer consists of visual patterns, which can be characterized by means of shape, colour, luminance, dimension, position, orientation, texture and the temporal evolution of the above. Auditory patterns, whose use in computer interfaces is ever increasing, are
characterized by frequency, objective and perceived amplitude, duration and spatial localization (Penna and Pessa, 1994 [15]). Hardware or the operating system can influence input and output devices: engineering parameters of a device's performance, such as sampling rate, resolution, accuracy and linearity, can all influence performance. Latency is the end-to-end delay between the user's physical movement, its sensing, and the ultimate system feedback provided to the user. It should be taken into account that, despite fast technological development, most users concentrate their hardware and software choices on a very small number of products. For instance, today almost a single operating system dominates the personal computer market (with important exceptions in specific contexts). Similarly, only two website browsers are favoured by the vast majority of users. More than ninety percent of users have their monitor resolutions set to 800×600 or 1024×768 pixels; thus, by designing for 800×600, designers will satisfy the needs of this most common resolution as well as those of any higher resolution. And, while most users in their place of work have high-speed Internet access, most users at home connect at dial-up (56K or less) speeds. All the circumstances quoted above impose strong constraints on interface design and on the possibility of satisfying usability requirements. As regards interaction, we remark first of all that, according to Krug (2000) [8], people tend to spend very little time reading most web pages: they prefer to scan them. For this reason the organization of information within a web site is vital to its usefulness. Websites have higher usability scores when text is written concisely, is easily scannable, and is written in an objective instead of a promotional style (Morkes and Nielsen, 1997 [12]). Viewers tend to move quickly from page to page, and usually scan for information that is of direct interest to them. Accordingly, it is suggested that text should be concise, include only one key idea per paragraph, use highlighted keywords or phrases, and use bulleted lists when possible. Users have grown accustomed to looking in certain areas of a screen to find specific items (Bernard, 2001 [2]). All these considerations have led researchers to formulate a brief list of suggestions to improve the interaction between users and machine/web pages: create a clear visual and logical hierarchy on each page, use conventions accepted by the community, break pages up into clear areas and, finally, make it obvious what is clickable. The context of use defines the actual conditions under which a given artefact/software product is used, or will be used, in a normal day-to-day working situation. It involves identifying the intended users of a product, the tasks they will perform and the environment in which they will use the product (the
physical, social, and organizational environments in which the system/site is used). It is possible to describe the context of use in terms of the following:
• Capabilities of intended users, including impairments/disabilities.
• User tasks (scenario level). This allows the goals of each task performed by the users, as well as their overall goal for using the system, to be identified. Task characteristics affecting usability (frequency, dependency, and duration) should be described.
• Environmental aspects. They describe the environment in which the users will use the system, including hardware, software, and supplemental materials. In most cases, a set of products (and their characteristics) should be sufficient.
Understanding the context of use guides early requirements and user interface design decisions, providing a basis for later usability evaluations. In an attempt to summarize the points discussed above, we propose the "e-usability" model as a working model with which to design or evaluate a usable artefact (Figure 1 and Table 4).

Figure 1. "E-Usability" Model.

4. Conclusions

Usability is the real challenge of the web. Instruments for discount usability or advanced operating methods are available to anybody who desires to produce user-friendly web pages. In this paper, a practical method has been proposed, oriented to the four factors which are the key points for characterizing good human-computer interaction; the method is quick and does not require any artefact design or evaluation cost.
Table 4. Descriptive factors of the "E-Usability" model.

User:
- How much experience do the users have with: computers? The Web? The domain (subject matter)?
- What are the users' working/web-surfing environments?
- What hardware, software, and browsers do the users have?
- What are the users' preferred learning styles?
- What language(s) do the users speak? How fluent are they?
- What cultural issues might there be?
- How much training (if any) will the users receive?
- What relevant knowledge/skills do the users already possess?
- What do the users need and expect from this web site?
- What are the tasks users need to perform? How do they currently perform these tasks? What is the workflow?
- Why do the users currently perform their tasks the way they do?
- What are the users' information needs?
- How do users discover and correct errors?
- What are the users' ultimate goals?

System:
- Known input/output devices and operating system
- Design for 800×600 pixels (monitor resolution)
- Design for the connection speed of most users

Interaction:
- Scannable pages
- Follow the standard: main menu on the left
- Clear visual and logical hierarchy on each page

Context:
- Capabilities of intended users, including impairments/disabilities
- User tasks (scenario level): identify the goals of each task users perform and their overall goal for using the system; task characteristics affecting usability (frequency, dependency, and duration) should be described
- Environmental aspects: describe the environment in which the users will use the system
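The factors of Table 4 lend themselves to being encoded directly as a reusable checklist structure. A minimal sketch, assuming a simple key-value encoding (the class and all field names are illustrative and not part of the original method):

    from dataclasses import dataclass, field

    @dataclass
    class EUsabilityChecklist:
        # One dictionary per factor; None marks an item still to be assessed.
        user: dict = field(default_factory=lambda: {
            "experience_with_web": None, "languages": None,
            "training": None, "tasks_and_goals": None})
        system: dict = field(default_factory=lambda: {
            "target_resolution": "800x600", "connection_speed": "dial-up"})
        interaction: dict = field(default_factory=lambda: {
            "scannable_pages": None, "menu_on_left": None,
            "clear_hierarchy": None})
        context: dict = field(default_factory=lambda: {
            "user_capabilities": None, "task_scenarios": None,
            "environment": None})

    checklist = EUsabilityChecklist()
    checklist.user["languages"] = ["it", "en"]   # fill in as the evaluation proceeds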
The method is presented in the form of a checklist which:
• collects data referring to the real life and past experience of users, proposing a user-centered approach;
• allows the collection of data referring to an unlimited number of users;
• does not have excessive costs of preparation, operation, data analysis or user time;
• can be applied in whatever section of interest and at any step of artefact design and evaluation.
Such instruments could be useful tools to make any website usable.
References
1. A. Anandhan, S. Dhandapani, H. Reza, K. Namasivayam, in Proceedings of the Third International Conference on Information Technology: New Generations (ITNG'06) (2006).
2. M.L. Bernard, Proceedings of CHI '01, 171-172 (2001). URL: http://psychology.wichita.edu/hci/projects/CHI%20web%20objects.pdf (accessed March 2007).
3. G. Cockton and D. Lavery, in Human-Computer Interaction – INTERACT '99: Proceedings of the Seventh IFIP Conference on Human-Computer Interaction, Ed. M.A. Sasse and C. Johnson (IOS Press, London, 1999), pp. 344-352.
4. X. Fang, C.W. Holsapple, Decision Support Systems (in press).
5. E. Frøkjær, M. Hertzum, K. Hornbæk, in Proceedings of the ACM Conference on Human Factors in Computing Systems (ACM Press, New York, NY, 2000), pp. 345-352.
6. J. Hom, The usability methods toolbox (1998). URL: http://jthom.best.vwh.net/usability/ (accessed March 2007).
7. K. Hornbæk, International Journal of Human Computer Studies 64, 79-102 (2006).
8. S. Krug, Don't Make Me Think: A Common Sense Approach to Web Usability (New Riders Publishing, 2000).
9. ISO, 9241 Requirements for Office Work with Visual Display Terminals, Draft International Standard (DIS) (ISO, 1999).
10. B.M. Leiner, V.G. Cerf, D.D. Clark, R.E. Kahn, L. Kleinrock, D.C. Lynch, J. Postel, L.G. Roberts, S. Wolff, History of Internet (2003). URL: http://www.isoc.org/internet/history/brief.shtml (accessed March 2007).
11. C. Mariage, J. Vanderdonckt, C. Pribeanu, in The Handbook of Human Factors in Web Design, Ed. R.W. Proctor, K.-Ph.L. Vu (Lawrence Erlbaum Associates, Mahwah, 2005), Chapter 41.
12. J. Morkes and J. Nielsen, Concise, scannable, and objective: How to write for the Web (1997). Alertbox. URL: http://www.useit.com/papers/webwriting/writing.html (accessed March 2007).
13. W. Newman, A. Taylor, in Proceedings of the IFIP TC.13 International Conference on Human-Computer Interaction (IOS Press, Amsterdam, 1999), pp. 605-612.
14. D. Norman, The Design of Everyday Things (Basic Books, New York, 2002).
15. M.P. Penna, E. Pessa, Le Interfacce Uomo Macchina (Di Renzo Editore, Roma, 1994).
16. B. Shneiderman, Designing the User Interface (Addison-Wesley, 1997).
17. The Observer, Websites that changed the world (2006). URL: http://observer.guardian.co.uk/review/story/0,,1843263,00.html (accessed March 2007).
18. C.H. Wang, A Survey of Design Guidelines for Usability of Web Sites, Proceedings of HCI International 1, 183-187 (2001).
EMERGENCE
EVOLUTIONARY COMPUTATION AND EMERGENT MODELING OF NATURAL PHENOMENA
R. RONGO(2,3), W. SPATARO(1,3), D. D'AMBROSIO(1,3), M.V. AVOLIO(1,3), V. LUPIANO(2), S. DI GREGORIO(1,3)
(1) Department of Mathematics, University of Calabria, Italy
(2) Department of Earth Sciences, University of Calabria, Italy
(3) Center of Excellence for High Performance Computing, University of Calabria, Italy
Cellular Automata are discrete dynamical systems capable of producing interesting and heterogeneous emergent behaviors even with simple local rules of evolution. This review paper focuses on an evolutionary methodology for evolving models of complex natural macroscopic systems through Macroscopic Cellular Automata. Two examples of application to the simulation of lava and debris flows, when compared with real cases of study, have confirmed the soundness of the approach, both in qualitative and quantitative terms.
Keywords: cellular automata, modeling of lava and debris flows, local evolution rules.
1. Introduction

Among natural phenomena, lava and debris flows belong to a class of geological processes which are generally considered hard to model and simulate, due to the variability of their rheological behavior. For instance, depending on temperature, lava flows can range from Bingham to Newtonian fluids, as can debris flows (in this case the role of temperature can be played by other factors, such as water content). Nevertheless, these phenomena are among the most dangerous ones, as they may produce serious damage to people and property. Their forecasting, however, could significantly decrease these hazards, for instance by simulating the flow paths and evaluating the effects of control works (e.g. embankments or channels). Hence, good simulation models, able to reproduce the phenomena of interest with a satisfying degree of precision, can represent an interesting and useful tool for civil defense purposes. Unfortunately, the above-mentioned phenomena may be difficult to model through classic methods (e.g. by differential equations; cf. McBirney and Murase 1984 [15]), and many alternative methods have been proposed in
the literature in the attempt to approach the problem from different perspectives. Among these, one of the most promising is based on Cellular Automata (CA), which allow the dynamics of (even complex) systems to be described in a natural manner, by simultaneously applying local rules of evolution to the elements (called cells) of a discrete domain. Well-known examples of CA are Lattice Gas Automata and Lattice Boltzmann models (cf. Succi 2004 [18]), which are particularly suitable for modeling fluid dynamics at a microscopic scale. However, lava and debris flows are difficult to model at such a scale, as they generally evolve over very large areas. For these cases, Macroscopic Cellular Automata (MCA; cf. Di Gregorio and Serra 1999 [11]) can represent a valid alternative, as they offer a convenient formal context where, as the name suggests, the main features of the phenomena of interest can be directly described at a macroscopic level. Often, the local rules of evolution of an MCA model depend on a set of parameters, whose variations can produce different emergent dynamical behaviors. In other words, once the model parameters have been set, the model exhibits a particular (emergent) dynamics, which can be more or less similar to the desired one. For instance, if one attempts to reproduce a real debris flow, the desired behavior for the model is a simulation as similar as possible to the real event. Unfortunately, it is not always simple to discover the best set of parameters for an MCA model without relying on an automated calibration technique. In such a case, as assessed in this review paper, an evolutionary approach based on the application of Genetic Algorithms (GAs; Holland 1975 [13]) can represent a valid solution, at least for the calibration of models of natural complex phenomena, as applied by Di Gregorio et al. (1997) [12] to the specific problem of modeling soil bioremediation by MCA. In the next sections, Macroscopic Cellular Automata and Genetic Algorithms are briefly presented: the former are considered in the definition of the simulation models, while the latter are the evolutionary algorithms utilized for their calibration. Specifically, the MCA lava flow simulation model SCIARA-fv and the debris flow model SCIDDICA-S4c are illustrated, together with calibration results on two real cases of study. A final discussion concludes the paper.
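Since the calibration procedure is only summarized in this review, the following sketch illustrates the general shape of a GA-based parameter search; the parameter bounds, population settings and toy fitness function are placeholders, not those of SCIARA or SCIDDICA (in the real setting, evaluating an individual would run an MCA simulation and compare it with the mapped real event, e.g. by areal overlap):

    import random

    BOUNDS = [(0.1, 10.0)] * 4      # hypothetical ranges for 4 model parameters
    POP, GEN, MUT = 20, 50, 0.1     # population size, generations, mutation rate

    def fitness(params):
        # Placeholder objective: in practice, run the MCA model with these
        # parameters and score the similarity of the simulated and real events.
        return -sum((p - 2.0) ** 2 for p in params)

    def random_individual():
        return [random.uniform(lo, hi) for lo, hi in BOUNDS]

    population = [random_individual() for _ in range(POP)]
    for _ in range(GEN):
        population.sort(key=fitness, reverse=True)
        parents = population[: POP // 2]          # truncation selection
        children = []
        while len(children) < POP - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(BOUNDS))   # one-point crossover
            child = a[:cut] + b[cut:]
            for i, (lo, hi) in enumerate(BOUNDS):    # uniform mutation
                if random.random() < MUT:
                    child[i] = random.uniform(lo, hi)
            children.append(child)
        population = parents + children

    print(max(population, key=fitness))   # best parameter set found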
Subsequently, MCA were adopted for the simulation of many macroscopic phenomena, such as other kinds of lava flows (Crisci et al. 2004) [4], debris flows (Iovine et al. 2005) [14], as well as pyroclastic flows (Avolio et al. 2006) [1], bioremediation processes (Di Gregorio et al. 1997) [12] and traffic control (Di Gregorio et al. 1996) [9]. With respect to the classical definition of Cellular Automata, MCA introduce some extensions. Firstly, the state of the cell is decomposed into "substates", each one representing a particular feature of the phenomenon to be modeled. The overall state of the cell is thus obtained as the Cartesian product of the considered substates. Moreover, some parameters are generally considered, which allow the model to be "tuned" to reproduce different dynamical behaviors of the phenomenon of interest. Furthermore, as the cell state is subdivided into substates, the state transition function is also split into "elementary processes", which constitute the local rules of evolution for the system, each one describing a particular aspect of the considered phenomenon. Finally, several "external influences" can be considered in order to model features which are not easy to describe in terms of local interactions. An example of a MCA model for the simulation of lava flows is presented in the following sections.

2.1. The Minimization Algorithm of the Differences

Phenomena involving flows which evolve at a macroscopic level of description are particularly suitable to be modeled through MCA. If the cell dimension is constant throughout the cellular space, as usually occurs in MCA models, it is possible to express characteristics of the cell (i.e. substates) typically given in terms of volume (e.g. lava volume) in terms of thickness. This simple assumption permits the adoption of a straightforward but efficacious strategy that computes outflows from the central cell to the neighboring ones in order to minimize the non-equilibrium conditions. Outflow computation is performed by the "minimization algorithm of the differences", described in detail in (Di Gregorio and Serra 1999) [11]. It is based on the following assumptions:
• two parts of the considered quantity must be identified in the central cell: the unmovable part, u(0), and the mobile part, m;
• only m can be distributed to the adjacent cells. Let f(x,y) denote the flow from cell x to cell y; m can be written as:
m = Σ_{i=0}^{#X} f(0,i)
where f(0,0) is the part which is not distributed, and #X is the number of cells belonging to the neighborhood X;
• the quantities in the adjacent cells, u(i) (i=1,2,...,#X), are considered unmovable;
• let c(i) = u(i) + f(0,i) (i=0,1,...,#X) be the new quantity content in the i-th neighboring cell after the distribution, and let c_min be the minimum value of c(i) (i=0,1,...,#X). The outflows are computed in order to minimize the following expression:

Σ_{i=0}^{#X} (c(i) − c_min)    (1)
The minimization algorithm operates as follows:
1. the following average is computed:

a = (m + Σ_{i∈A} u(i)) / #A

where A is the set of not eliminated cells (i.e. those that can receive a flow); note that at the first step #A = #X;
2. cells for which u(i) ≥ a (i = 0, 1, …, #X) are eliminated from the flow distribution and from the subsequent average computation;
3. the first two points are repeated until no cells are eliminated; eventually, the flow from the central cell towards the i-th neighbor is computed as the difference between u(i) and the last average value a:

f(0,i) = a − u(i) if i ∈ A;  f(0,i) = 0 if i ∉ A
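As an illustration, the procedure above translates into a few lines of code; the following minimal Python sketch (ours, not from the original paper) takes the unmovable quantities u, with u[0] for the central cell, and the mobile part m, and returns the outflows:

```python
def minimize_differences(u, m):
    """Minimization algorithm of the differences (Di Gregorio and Serra 1999).
    u: unmovable quantities, u[0] for the central cell, u[1:] for neighbors.
    m: mobile part of the central cell. Returns the flows f(0, i)."""
    active = set(range(len(u)))  # A: cells not yet eliminated
    while True:
        a = (m + sum(u[i] for i in active)) / len(active)
        eliminated = {i for i in active if u[i] >= a}
        if not eliminated or eliminated == active:
            break
        active -= eliminated
    # flows equalize the non-eliminated cells to the last average a
    return [max(a - u[i], 0.0) if i in active else 0.0 for i in range(len(u))]
```

For the lava flows model described below, the quantities passed in would be u(0) = a(0)+ν, m = t(0)−ν and u(i) = a(i)+t(i).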
Note that the simultaneous application of the minimization principle to each cell gives rise to the global equilibrium of the system. The correctness of the algorithm, i.e. the fact that it minimizes equation (1), is proved in (Di Gregorio and Serra 1999) [11].

3. Genetic Algorithms

Once a MCA model has been (well) defined, it generally needs a calibration phase in order to identify the parameter values which best reproduce the phenomenon. This is a crucial phase of model development, needed to assess its goodness. In fact, if the model was built by including all the peculiar aspects of the system to be simulated, a proper calibration phase can discover
suitable values for the parameters, which allow the phenomenon to be simulated with a satisfying degree of accuracy. On the contrary, if some important aspect was omitted, a proper calibration will point out such a weakness, as no good simulation will be possible. Note that, in this discussion, the "unlucky" possibility of obtaining good simulation results from "bad" models is not considered. Even if no standardized optimization techniques exist for MCA, in previous works concerning debris and lava flows simulation (D'Ambrosio et al. 2006 [6]; Spataro et al. 2004 [17]), an evolutionary approach based on the application of Genetic Algorithms proved to be a good choice. Briefly, GAs are adaptive heuristic search algorithms inspired by Natural Selection and Genetics, in which a solution to a given search problem is encoded as a genotype (or individual), and the set of all possible values it can assume is named the search space. At the beginning, the GA randomly creates a population of individuals (candidate solutions), each one evaluated by means of a fitness function. Subsequently, the selection operator, a metaphor of Darwinian Natural Selection, chooses the individuals that undergo reproduction, favoring the fittest ones (i.e. those having higher fitness). Reproduction is then performed by means of genetic operators (generally crossover and mutation, a metaphor of sexual reproduction), and a new population of offspring is obtained. The evolution towards a good solution is typically obtained by the iterative application of selection and genetic operators to the initial population. The iterative process continues until a termination criterion is met, e.g. a known optimal or acceptable solution is attained, or the maximum number of steps is reached. The convergence towards a good solution is stated by the "Fundamental Theorem of GAs" (Holland 1975) [13].

4. Lava and Debris Flows MCA Simulation Models

In the following, the models SCIARA-fv (D'Ambrosio et al. 2006) [5] and SCIDDICA-S4c (D'Ambrosio et al. 2007) [7] are described. However, while the first is (briefly) illustrated by describing both local rules of evolution and parameters, the second is only outlined and the role of its parameters pointed out.

4.1. The Lava Flows Model SCIARA-fv

SCIARA-fv is the latest release of a family of MCA models for lava flows simulation. The main model characteristics can be summarized by the following points:
• it is a bi-dimensional model, based on hexagonal cells;
• the cell neighborhood, X, is composed of the cell itself and the six adjacent ones;
• the model substates are Qa, Qt, Qf6 and QT for altitude, lava thickness, lava flows from the central cell towards the six adjacent ones, and temperature, respectively;
• lava feeding is modeled as an external influence by specifying cells which behave as vents;
• lava flows are computed by applying the minimization algorithm of the differences, as described in the next section;
• lava temperature drop is modeled by applying the irradiation equation;
• lava viscosity varies according to lava temperature; it is modeled in terms of adherence, which specifies the minimum amount of lava that cannot flow out of the cell at each step;
• the solidification process depends on lava temperature; it is trivially modeled by adding the solidified lava thickness to the cell altitude.
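As a concrete illustration of the substate decomposition, a cell record for SCIARA-fv might be sketched as follows (ours; field names are illustrative and not taken from the original implementation):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SciaraCell:
    """One cell of the hexagonal cellular space; the overall state
    is the Cartesian product Qa x Qt x QT x Qf6."""
    altitude: float        # substate Qa [m]
    lava_thickness: float  # substate Qt [m]
    temperature: float     # substate QT [K]
    outflows: List[float] = field(default_factory=lambda: [0.0] * 6)  # Qf6
```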
In formal terms, SCIARA-fv is defined as

SCIARA-fv = ⟨R, L, X, Q, P, τ, γ⟩
where:
• R is the set of hexagonal cells covering the finite region where the phenomenon evolves;
• L ⊂ R specifies the lava source cells (i.e. vents);
• X = {Center, NW, NE, E, SE, SW, W} identifies the hexagonal pattern of cells that influence the cell state change: the cell itself, "Center", and the "North-West", "North-East", "East", "South-East", "South-West" and "West" neighbors;
• Q = Qa × Qt × QT × Qf6 is the finite set of states, considered as the Cartesian product of "substates". Their meanings are: cell altitude, cell lava thickness, cell lava temperature, and lava outflow thickness (from the central cell toward the six adjacent cells), respectively;
• P = {ps, pTv, pTsol, padv, padsol, pcool, pa} is the finite set of parameters (invariant in time and space) which affect the transition function; their meanings are: time corresponding to a CA step, lava temperature at the vent, lava temperature at solidification, lava adherence at the vent, lava adherence at solidification, the cooling parameter, and the cell apothem, respectively;
• τ: Q⁷ → Q is the cell deterministic transition function;
• γ: Qt × N → Qt specifies the emitted lava thickness from the source cells at each step k ∈ N (N is the set of natural numbers).
The transition function is outlined in the following, in order of application.

4.1.1. Lava flows computation

Lava rheological resistance increases as temperature decreases; consequently, a certain amount of lava, the adherence ν, cannot flow out from the central cell towards any neighboring one. It is obtained by means of the inverse exponential function:
ν = k1 e^(−k2 T)

where T ∈ QT is the lava temperature, while k1 and k2 are parameters depending on lava rheological properties (Park and Iversen 1984). The values of k1 and k2 are simply obtained by solving the system of equations:
padv = k1 e^(−k2 pTv)
padsol = k1 e^(−k2 pTsol)

Let a ∈ Qa and t ∈ Qt be the cell altitude and the cell lava thickness, respectively; in order to compute the lava outflows from the central cell towards its neighbors, the minimization algorithm is applied to the following quantities:
• u(0) = a(0) + ν
• m = t(0) − ν
• u(i) = a(i) + t(i) (i=1,2,…,6)
Eventually, a relaxation rate factor, related to the CA clock ps and to the cell size pa, may be considered in order to reach the local equilibrium condition in more than one CA step. This can significantly improve the realism of the model since, in general, more than one step may be needed to displace the proper amount of lava from a cell towards the adjacent ones, especially when a small value for ps and a high value for pa are considered. However, since a relatively high value for ps and a small value for pa were adopted in the simulations presented here, the relaxation rate was not taken into account in practice, and its exact specification is thus omitted.
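For illustration, the system above can be solved in closed form by taking the ratio of its two equations; the following sketch (ours) derives k1 and k2 and evaluates the adherence at a given temperature, using the calibrated values of Table 1 as an example:

```python
import math

def adherence_coefficients(p_adv, p_adsol, p_Tv, p_Tsol):
    """Solve p_adv = k1*exp(-k2*p_Tv) and p_adsol = k1*exp(-k2*p_Tsol):
    dividing the two equations eliminates k1 and yields k2."""
    k2 = math.log(p_adsol / p_adv) / (p_Tv - p_Tsol)
    k1 = p_adv * math.exp(k2 * p_Tv)
    return k1, k2

def adherence(T, k1, k2):
    """Minimum lava thickness that cannot leave a cell at temperature T."""
    return k1 * math.exp(-k2 * T)

# 0.7 m of adherence at the vent temperature, 12 m at solidification
k1, k2 = adherence_coefficients(0.7, 12.0, 1373.0, 1165.35)
```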
4.1.2. Temperature Variation

A two-step process determines the new cell temperature. In the first step, the cell temperature is obtained as the weighted average of the residual lava inside the cell and the lava inflows from the neighboring cells:

Tav = (tr · T(0) + Σ_{i=1}^{6} f(i,0) · T(i)) / (tr + Σ_{i=1}^{6} f(i,0))
where tr ∈ Qt is the residual lava thickness inside the central cell after the outflow distribution, T ∈ QT is the lava temperature and f(i,0) the lava inflow from the i-th neighboring cell. Note that f(i,0) is equal to the lava outflow from the i-th neighboring cell towards the central one, computed by means of the minimization algorithm. The second step updates the previously calculated temperature by considering the thermal energy loss due to lava surface irradiation:
T = Tav / (1 + Tav³ · C · A / V)^(1/3)
where C = pcool is the "cooling parameter", which depends on lava rheology, A is the surface area of the cell, and V the lava volume (Crisci et al. 2004) [4].

4.1.3. Lava Solidification

When the lava temperature drops below the threshold Tsol, the lava solidifies. Consequently, the cell altitude increases by an amount equal to the lava thickness, and the new lava thickness is set to zero.

4.2. The Debris Flows Model SCIDDICA-S4c

SCIDDICA is a family of Macroscopic CA models designed for a wide range of hazardous flow-type landslides, including earth flows, debris flows, and debris avalanches. The latest release, S4c (D'Ambrosio et al. 2007) [7], is able to simulate fast moving inertial flows by also taking into account the effect of collisions among masses, as well as the effects of erosion and the activation of secondary sources. It is formally defined by the quintuple

SCIDDICA-S4c = ⟨R, X, Q, P, τ⟩

where
• R is the hexagonal cellular space, i.e. the set of regular hexagons covering the finite region in which the phenomenon evolves.
• X = {Center, NW, NE, E, SE, SW, W} identifies the hexagonal pattern of cells that influence the cell state change: the cell itself, "Center", and the "North-West", "North-East", "East", "South-East", "South-West" and "West" neighbours.
• Q = Qz × Qh × Qr × Qo⁶ × QE × Qpx × Qpy is the finite set of states of the generic cell, defined as the Cartesian product of substates. Their meanings are: cell altitude, cell debris thickness, depth of soil that can be eroded (regolith), outflows debris thickness, energy, and the components of momentum along the x and y directions, respectively. Note that the superscript of Qo refers to outflows from the central cell towards the six neighboring cells.
• P = {pc, pt, pd, padh, pf, pet, ppef} is the set of model parameters utilized in the transition function; their meanings are: cell apothem, CA clock (i.e. the time corresponding to a CA step), energy dissipation, debris adherence, friction angle, erosion threshold, and progressive erosion factor, respectively. Their further specification is given below. Note that pd is a family of three dissipative parameters, as specified in the following.
• τ: Q⁷ → Q is the deterministic transition function. It is made of the local rules of evolution (or "elementary processes").
In the following, the elementary processes of the transition function are described, with the aim of specifying the role of the model parameters. For a complete description of the model, see D'Ambrosio et al. (2007) [7].

4.2.1. Regolith erosion

As stated above, the flow is able to erode the soil under a specific condition, i.e. when the flow energy exceeds a prefixed threshold, pet. In this case, the depth of erosion is proportional to the mass energy through the progressive erosion parameter, ppef. In any case, the depth of erosion cannot be greater than the depth of soil that can be effectively eroded, specified by the substate Qr. Hence, if qr ∈ Qr is the depth of regolith in a generic cell and qe ∈ QE its energy, with the condition qe > pet, the depth of erosion, he, is given by the following formula:
he = min{qr, qe · ppef}
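A minimal sketch (ours) of this elementary process, under the min-based reading of the formula reconstructed above:

```python
def erosion_depth(q_r, q_e, p_et, p_pef):
    """Regolith erosion: when the flow energy q_e exceeds the threshold
    p_et, a depth proportional to the energy (through p_pef) is eroded,
    capped by the available regolith depth q_r."""
    if q_e <= p_et:
        return 0.0  # not enough energy: no erosion occurs
    return min(q_r, q_e * p_pef)
```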
4.2.2. Computation of the "minimizing" debris outflows

In SCIDDICA-S4c, the concept of "minimizing" outflows was introduced. These are simply the outflows computed by applying the minimization algorithm of the differences, without considering any kind of relaxation factor. Hence, such flows are those that, if distributed to the neighboring cells, lead the neighborhood to the state of equilibrium. The parameters involved in this elementary process are pf and padh. The first specifies a "critical angle": if the angle between two adjacent cells does not exceed pf, the candidate cell is eliminated from the distribution and will not receive any flow. The parameter padh specifies the thickness of flow that cannot leave the cell due to the effect of adherence; it generally depends on the characteristics of the flowing mass. Minimizing outflows are then taken as a "starting point" to derive the "effective flows", as described in the following.

4.2.3. Conservation of mass, energy and momentum

This is the elementary process where the effective flows are computed. In general, if qo(x,y) denotes the minimizing flow from cell x to cell y, v(x,y) its velocity and d(x,y) the distance between cells x and y, the effective flow, f(x,y), is given by the following formula:
f(x,y) = qo(x,y) · [v(x,y) · pt / d(x,y)]

Here, the quantity v(x,y) · pt, pt being the CA clock, represents the distance the flow covers on the basis of its velocity v(x,y). Consequently, v(x,y) · pt / d(x,y) can be considered an index of how much of the maximum allowed distance, d(x,y), the flow covers in a CA step, and thus acts as a variable relaxation rate. Note that v(x,y) · pt cannot exceed d(x,y), otherwise the flow would overrun the neighborhood; if this happens, the CA clock must be diminished and the simulation restarted. Once the effective flows are determined, the associated values of energy and momentum are also computed so that, in the subsequent distribution phase, mass, energy and momentum are preserved.

4.2.4. Energy loss

This elementary process is responsible for the velocity drop and, consequently, for energy dissipation. Three parameters are involved, which together form the overall dissipation parameter pd. If v denotes the modulus of the velocity of the mass in a generic cell, the velocity drop is modeled as follows:
v = v − h·pdN − v·pdP − v²·pdQ
where:
• pdN is the velocity-independent dissipation parameter, which produces a velocity drop only on the basis of the mass weight (in SCIDDICA-S4c modeled in terms of the height h);
• pdP is the dissipation parameter which produces a velocity drop proportional to the current velocity;
• pdQ is the dissipation parameter which produces a velocity drop proportional to the square of the current velocity.
Note that these dissipation mechanisms can be considered either alone or in combination. For instance, if one conjectures that the behavior of the flow to be modeled is essentially turbulent, the proportional velocity drop can be neglected by simply setting pdP to 0. Similarly, if the behavior of the flow is conjectured to be essentially laminar, the dissipation which characterizes turbulent flows (i.e. the one proportional to v²) can be neglected by simply setting pdQ to 0.

5. Calibration

As previously stated, once a MCA model has been defined, a calibration phase is generally needed to find a set of parameters which allows the model to reproduce the phenomenon of interest in a satisfying manner. For this purpose, maps of real cases can be compared with simulations, and a quantitative measure of the quality of the results can be expressed through suitable "fitness functions". As discussed in D'Ambrosio et al. (2006) [6], the trivial comparison of the extents of the real and simulated cases can be adopted for a simplified, preliminary calibration. However, when proper input data are available, a more "articulated" fitness function, based on a more representative set of characteristics of the phenomenon (e.g. erosion depth or landslide thickness for a debris flows model, or even information about the duration of the real event), is a better choice, and allows for a more "refined" calibration. In the following, the application of Genetic Algorithms to the calibration of SCIARA-fv and SCIDDICA-S4c is described with respect to two real cases of study.

5.1. SCIARA-fv Calibration

Among the numerous variants of Genetic Algorithm models proposed in the literature (cf. Mitchell 1996 [16]; Cantù-Paz 2000 [2]), the one employed for the calibration of SCIARA-fv represents (encodes) the parameters to be optimized as bit strings. Moreover, the GA is steady-state and elitist, so that at each step only the
[Figure 1 plot: lava emission rate, m³/sec (0 to 10), versus time, days (1 to 10).]
Figure 1. Lava emission rate of the 2001 Etnean eruption started from Mount Calcarazzi which threatened the towns of Nicolosi and Belpasso.
worst individuals are replaced; the remaining ones, required to form the new population, are copied from the old one by choosing the best. In order to select the individuals to be reproduced, the "binary tournament without replacement" selection operator was utilized. It consists of a series of "tournaments" in which two individuals are selected at random, and the winner is chosen according to a prefixed probability, which must be set greater for the fittest individual; in our case, this probability was set to 0.6. Moreover, as the variant without replacement was adopted, individuals cannot be selected more than once. The employed genetic operators are classic Holland's crossover and mutation, with probabilities of 1.0 and 2/44, respectively. In particular, this mutation probability yields, on average, two mutated bits per individual, as the genotype length (obtained as the sum of the number of bits chosen for the encoding of each considered SCIARA-fv parameter - cf. Table 1) was exactly 44. The number of individuals forming the initial population was set to 256, while the number of individuals to be replaced at each GA step was set to 16. Finally, the original fitness function e1 (cf. Spataro et al. 2004 [17]) was replaced with a new one. The e1 fitness function took into account only the comparison between the areal extents of the real and simulated events; it was defined as:
e1 = m(R ∩ S) / m(R ∪ S)    (2)
where R and S represent the areas affected by the real and by the simulated event, respectively, while m(A) denotes the measure of the set A. Note that e1 ∈ [0,1]: its value is 0 if the real and simulated events are completely disjoint, being m(R ∩ S) = 0; it is 1 in case of perfect overlap, being m(R ∩ S) = m(R ∪ S).
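For illustration, equation (2) reduces to a few lines of code when the affected areas are represented as boolean rasters; the following sketch (ours, not the original implementation) assumes NumPy arrays of equal shape:

```python
import numpy as np

def e1_fitness(real_mask, sim_mask):
    """Areal fitness e1 = m(R intersect S) / m(R union S), equation (2).
    real_mask, sim_mask: boolean rasters marking the cells affected by
    the real and by the simulated event, respectively."""
    union = np.logical_or(real_mask, sim_mask).sum()
    if union == 0:
        return 0.0
    return np.logical_and(real_mask, sim_mask).sum() / union
```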
Table 1. The best set of SCIARA-fv parameters as obtained through the calibration phase, together with their explored ranges. The number of bits used for the genetic algorithm encoding is also listed (prefixed parameters are marked with "-").

Parameter | Explored range   | Bits | Best value
ps        | [60, 180]        | 8    | 155.29 s
pTv       | -                | -    | 1373 °K
pTsol     | [1123, 1173]     | 8    | 1165.35 °K
padv      | [0.1, 2.0]       | 4    | 0.7 m
padsol    | [6.0, 30.0]      | 6    | 12 m
pcool     | [10⁻¹⁶, 10⁻¹³]   | 16   | 2.9×10⁻¹⁴ m °K⁻³
pa        | -                | -    | 5 m
First calibration experiments were performed by considering the 2001 Etnean eruption (Sicily, Italy), which started from the fracture of Mount Calcarazzi and pointed southwards, creating the main danger for the towns of Nicolosi and Belpasso (cf. Figure 1 for the lava emission rate at the vent). In this preliminary phase the fitness function e1 was adopted. However, even if the results seemed quite satisfactory, the best simulation reached its final shape at the end of the 8th day, instead of the 10th as in the real lava flow. As a consequence, the obtained parameters corresponded to lava flows with different rheological characteristics, e.g. with greater viscosity. Hence, an improved fitness function, f1, was devised, which takes into account both the areal extents of the real and simulated events and their temporal duration. It is defined as follows:

f1 = e1(t1) · e1(t2)
where e1 is defined as before, while t1 and t2 represent two different temporal instants at which it is evaluated. In particular, t1 is the time at which the real event reaches its stationary state; at this instant, the function e1 is evaluated for the first time, giving information about the overlapping ratio of the simulation at that particular moment. However, contrarily to the real event, the simulation might not reach its final configuration at the same instant, and its shape may change further in time. In this case, if the function e1 is evaluated again, for instance at a time t2 > t1, its value could differ from the previous one, meaning that the overlapping ratio changed and thus the simulation did not stop when the real event did. Hence, like e1, the function f1 gives values in the interval [0,1], with the difference that the value 1 is obtained when the real and simulated events perfectly overlap, with the further condition
Figure 2. Comparison between the 2001 Nicolosi Etnean Event and the best SCIARA-fv simulation, as obtained by adopting the parameters listed in Table 1. Key: 1) Area affected by the real event; 2) Area affected by the simulation; 3) Area affected by both real and simulated events; 4) Limits of real event, 5) Limits of simulated event.
that the simulation stops exactly at the same time as the real event does. In other words, f1 = 1 if and only if e1(t1) = e1(t2) = 1. Note that, given the available data for the cases of study considered here (limited to the areal extent and duration), f1 can be considered a satisfying objective function for the model calibration phase. A more refined function could certainly be devised, e.g. by evaluating intermediate results along the overall period of evolution and not only at the end; however, its definition is constrained by the availability of reliable information about the real phenomenon, which is usually difficult to obtain. Accordingly, the goal for the GA was to find a set of CA parameters that maximizes f1. On the basis of previous empirical attempts, the ranges within which the values of the CA parameters are allowed to vary were identified in order to define the GA search space (cf. Table 1), and a set of 10 experiments was iterated for 100 steps. As regards the fitness function, t1 was set to 10 days (which corresponds to the duration of the real event), while t2 was set to 13 days. As concerns the prefixed parameters, pTv was set to a value which corresponds to the typical temperature of Etnean lava flows at the vents, while pa was set on the basis of the detail of the available topographic map of the area of interest. Coupled with the prefixed parameters, those obtained through the calibration phase allowed the considered 2001 Nicolosi Etnean lava flow to be satisfactorily reproduced (cf. Figure 2), giving rise to a fitness of 0.72, corresponding to a value of 0.74 in terms of areal comparison (i.e. in terms of e1 - cf. equation 2).
Figure 3. The May 1998 Curti landslide. Key: 1) area affected by the real case; 2) limit of the zones with constant depth of regolith (assumed values in meters, in italics); 3) border of the area considered for comparison between the real and the simulated cases; 4) secondary source locations.
This value exceeds the "classic" threshold (0.7) commonly assumed as "acceptable" for calibration experiments.

5.2. SCIDDICA-S4c Calibration

As for SCIARA-fv, the calibration of SCIDDICA-S4c was performed through a genetic algorithm by considering a real case of study, specifically the May 1998 Curti-Sarno (Campania, Italy) debris flow (Del Prete et al., 1998) [8]. In Figure 3, the map of the Curti real case is shown, depicting the location and extent of
Table 2. List of SCIDDICA-S4c parameters, either prefixed or optimized through the GA. Variation ranges and best values are also shown (prefixed parameters are marked with "-").

Parameter | Explored range | Bits | Best value
pc        | -              | -    | 1.25 m
pt        | -              | -    | 0.25 s
pdN       | [0, 5]         | 8    | 4.7
pdP       | -              | -    | 0
pdQ       | [0, 1]         | 8    | 0.74
padh      | -              | -    | 0.001 m
pf        | [0, 16]        | 8    | 2.3 degrees
pet       | [0, 10]        | 8    | 0 J
ppef      | [0.0001, 5]    | 8    | 0.18
the sources, and the thickness of erodable regolith. The information concerning topography, sources and erodable regolith, together with the values assigned to the CA parameters, define the "initial conditions" of each simulation. However, differently from the case of SCIARA-fv, the fitness function e1 was adopted (cf. equation 2), as only the areal extent of the real case was known with sufficient detail. Similarly to SCIARA-fv, the model parameters were encoded as bit strings, and populations of 200 individuals were considered. Furthermore, the adopted GA is a steady-state and elitist model with a "binary tournament without replacement" selection operator (with probability 0.6 of selecting the best individual) and classic Holland's crossover and mutation as genetic operators (with probabilities 0.8 and 1/40, respectively). In particular, this mutation probability yields, on average, one mutated bit per individual, as the genotype length (obtained as the sum of the number of bits chosen for the encoding of each optimized SCIDDICA-S4c parameter - cf. Table 2) was exactly 40. The number of individuals to be replaced at each GA step was set to 15. Previous empirical attempts at simulating the considered case of study, manually performed by iteratively assigning reasonable values to the model parameters, helped in hypothesizing the ranges within which the values of the CA parameters are allowed to vary (cf. Table 2). The values of the CA parameters, either optimized by the GA or prefixed, are listed in Table 2. As concerns the prefixed parameters, the value adopted for pc suggested a small value for pt; such values allowed for a quite detailed description of the phenomenon, both in spatial and in temporal terms. The dissipative parameter pdP was set to zero, in order to consider only frictional and turbulent effects.
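To make the calibration loop concrete, the following sketch (ours; simplified, e.g. the "without replacement" bookkeeping of the tournament is omitted) shows a steady-state, elitist GA of the kind described above; evaluate is assumed to run a complete simulation and return the e1 fitness of a genotype:

```python
import random

GENOTYPE_LEN = 40          # sum of the bits encoding the optimized parameters
P_CROSSOVER, P_MUTATION = 0.8, 1.0 / 40
POP_SIZE, REPLACED = 200, 15

def tournament(pop, fit, p_best=0.6):
    # binary tournament: the fitter of two random individuals
    # wins with probability p_best
    i, j = random.sample(range(len(pop)), 2)
    best, worst = (i, j) if fit[i] >= fit[j] else (j, i)
    return pop[best] if random.random() < p_best else pop[worst]

def crossover(a, b):
    # classic Holland one-point crossover
    if random.random() < P_CROSSOVER:
        cut = random.randrange(1, GENOTYPE_LEN)
        return a[:cut] + b[cut:]
    return a[:]

def mutate(g):
    # flip each bit with probability P_MUTATION (about one bit per genotype)
    return [bit ^ int(random.random() < P_MUTATION) for bit in g]

def ga_step(pop, evaluate):
    fit = [evaluate(g) for g in pop]
    ranked = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)
    survivors = [pop[i] for i in ranked[:POP_SIZE - REPLACED]]  # elitism
    children = [mutate(crossover(tournament(pop, fit), tournament(pop, fit)))
                for _ in range(REPLACED)]
    return survivors + children
```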
Figure 4. The May 1998 Curti landslide: comparison between the real case and the best simulation. Key: area affected by 1) real landslide, 2) simulated landslide, 3) both cases; 4) border of the area considered for comparison.
Eventually, the value of padh allowed the observed effect of thin mud plastering along the trail to be simulated. As regards the optimized parameters, the values obtained for pdN and pdQ seem to evidence an important role of dissipation for the considered case of study: the values are in fact very close to the right end of the explored ranges. The value obtained
for pf well reflects the high assumed fluidity of the event. Although a quite wide range was selected for optimizing pet, its resulting value corresponds well to the extremely erodable type of detrital cover (mainly allophane soils) of the study area. Furthermore, the related factor of progressive erosion, ppef, also ended up with a rather low value, again reflecting the character of the real event, in which erosion occurred progressively. In Figure 4, a graphic comparison between the map of the best simulation (i.e. the one obtained with the optimized parameters) and the real case is shown. Even though only simplified, the calibration allowed the real phenomenon to be satisfactorily reproduced, both in qualitative and in quantitative terms. In particular, the overall affected area and the depth of regolith erosion along the path are in quite good accordance with the surveyed evidence; the branching of the flow at the base of the slope is fairly well simulated (better than by previous releases of the model), and so is the subsequent merging of the branches right upslope of the urbanized area. The obtained fitness value of 0.76 exceeds the "classic" threshold (0.7) commonly assumed as "acceptable" for calibration experiments.

6. Conclusions

In this review paper we described an evolutionary approach for the calibration of two Macroscopic Cellular Automata models for the simulation of lava and debris flows, respectively. Such phenomena, classified among the most dangerous geological processes both for living beings and for property, are unfortunately very difficult to model through standard approaches. However, Macroscopic Cellular Automata have proved to be a valid alternative, on condition that their great capability of exhibiting many different emergent dynamical behaviors is properly governed. As evidenced by the application of the considered simulation models to two real cases of study, Genetic Algorithms can perform this task in a satisfying manner. This is not surprising, as Genetic Algorithms have demonstrated a high and general ability as optimization algorithms in many scientific fields. In our cases too, they were able to properly calibrate the considered simulation models, so that the phenomena of interest could be reproduced in a satisfying manner. Nevertheless, particular attention must be devoted to the definition of the fitness function. In our experience, the choice of a "poor" fitness, like one based only on an areal comparison between the real and simulated events, can lead the search for the desired emergent model behavior towards fictitious solutions (local optima). This was confirmed by the first calibration performed on the
SCIARA-fv lava flow model, for which only the adoption of a more refined objective function, which besides information about the areal extent also considered the duration of the real event, allowed a new set of model parameters able to produce the desired model behavior to be obtained. Unfortunately, reliable information about the phenomena discussed here is often difficult to obtain. This is due to several reasons, among which the fact that they are rapid phenomena (in particular debris flows), and thus difficult to monitor during their evolution. For the Curti debris flow, for instance, it was not possible to reconstruct the duration of the event with sufficient precision, and only a simplified fitness function could be considered. In such cases the model reliability should be further confirmed, and a validation phase evaluating the goodness of the model against a sufficient number of different cases of study would certainly be desirable.

References

1. M.V. Avolio, G.M. Crisci, S. Di Gregorio, R. Rongo, W. Spataro and D. D'Ambrosio, Computers and Geosciences-UK, 32, 897-911 (2006).
2. E. Cantù-Paz, Efficient and accurate Parallel Genetic Algorithms (Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000).
3. G.M. Crisci, S. Di Gregorio and G.A. Ranieri, in Proceedings International AMSE Conference Modelling & Simulation, Paris, France, Jul.1-3 1982, (1982), pp. 65-67.
4. G.M. Crisci, R. Rongo, S. Di Gregorio and W. Spataro, Journal of Volcanology and Geothermal Research 132, 253-267 (2004).
5. D. D'Ambrosio, R. Rongo, W. Spataro, M.V. Avolio and V. Lupiano, in LNCS 4173, Ed. S. El Yacoubi, B. Chopard and S. Bandini, (2006), pp. 452-461.
6. D. D’Ambrosio, W. Spataro and G. Iovine, Computers and Geosciences-UK 32, 861-875 (2006).
7. D. D’Ambrosio, G. Iovine, W. Spataro and H. Miyamoto, Environmental Modelling & Software 22, 1417-1436 (2007).
8. M. Del Prete, F.M. Guadagno and A.B. Hawkins, Bulletin of Engineering Geology and the Environment 57, 113-129 (1998).
9. S. Di Gregorio, D.C. Festa, R. Rongo, W. Spataro, G. Spezzano and D. Talia, in Parallel Computing: State-of-the-Art and Perspectives, Ed. E.H. D'Hollander, G.R. Joubert, F.J. Peters and D. Trystam, (1996), pp. 69-76.
10. S. Di Gregorio, R. Serra and M. Villani, in Proceedings of 3rd Systems Science European Congress, Roma 1-4 October 1996, Ed. E. Pessa, M.P. Penna, A. Montesanto, (Kappa, Roma, 1996), pp. 1127-1131.
11. S. Di Gregorio and R. Serra, Future Generation Computer Systems 16, 259-271 (1999).
12. S. Di Gregorio, R. Serra and M. Villani, Complex Systems 11, 31-54 (1997).
13. J.H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, 1975).
14. G. Iovine, D. D'Ambrosio and S. Di Gregorio, Geomorphology 66, 287-303 (2005).
15. A.R. McBirney and T. Murase, Annual Review of Earth and Planetary Sciences 12, 337-357 (1984).
16. M. Mitchell, An Introduction to Genetic Algorithms (MIT Press, Massachusetts, 1996).
17. W. Spataro, D. D'Ambrosio, R. Rongo and G.A. Trunfio, in Proceedings of the 7th International Conference on Cellular Automata for Research and Industry (Perpignan, France, Sep. 20-23), LNCS 4173, (2004), pp. 725-734.
18. S. Succi, The Lattice Boltzmann Equation for Fluid Dynamics and Beyond (Oxford University Press, Oxford, 2004).
A NEW MODEL FOR THE ORGANIZATIONAL KNOWLEDGE LIFE CYCLE
LUIGI LELLA, IGNAZIO LICATA
ISEM, Institute for Scientific Methodology, Palermo, Italy
E-mail: [email protected]

Today's organizations, in particular those operating in evolving and distributed environments, need advanced frameworks for the management of the knowledge life cycle. These systems have to be based on the social relations which constitute the pattern of collaboration ties of the organization. We demonstrate here, with the aid of a model taken from graph theory, that it is possible to provide the conditions for an effective knowledge management. A promising way is to involve the actors with the highest betweenness centrality in the generation of discussion groups. This solution allows the externalization of tacit knowledge, the preservation of knowledge and the rise of innovation processes.

Keywords: organizational knowledge, theory of graphs, network models.
1. The knowledge life cycle

Nowadays every organization must be able to learn quickly and continually from the environment in which it operates (Nonaka, 1994 [18]). New knowledge comes from the experiences of the individuals operating within the organization, and it is constructed through their social and collaborative interactions (Nonaka and Takeuchi, 1995 [19]; Nonaka, Toyama and Konno, 2000 [21]). For this reason technology should focus on the problem of finding innovative solutions to improve the cooperation among individuals and the awareness of the knowledge and skills reached by each of them (Stenmark, 2003 [24]). As noticed by McElroy (McElroy, 2003 [17]), the knowledge management systems (KMS) of the first generation focused principally on the processes of knowledge diffusion and integration. This means they were based on the assumption that valuable knowledge was already present within the organization. Therefore the main purpose of a KMS was to provide the right information to the right people and to codify all the explicit and tacit knowledge embodied in organizational processes and in the beliefs of individuals. But the purpose of a KMS should also be the production of new knowledge, not only the integration of the existing organizational knowledge. So, as stated
by McElroy, the KMS of the second generation all have to deal with the problem of the creation of knowledge, favouring the detection of problems and needs and the finding of solutions. This innovation process proceeds in a form that McElroy calls the "Knowledge Life Cycle" or KLC. The KLC is not just a model but, using McElroy's exact definition, a "framework for placing models in context": a complex of different competing ways and views of how knowledge can be produced and integrated. The way this framework works is influenced by the following assumptions. First of all, the foundation of learning is the experience of gaps in everyday activities. The detection of these gaps, which are the lack of the knowledge needed to carry out the activities in the right way and in the shortest time, represents a sort of emergence of problems. The detection of gaps is just the first step toward the formulation of the problems, which McElroy calls "knowledge claims" and which comprise an analysis, an elaboration of the problems to be solved. Knowledge claims can be conjectures, assertions, reports, guidelines or entire theories on the right processes to follow in order to fill the detected gaps. The formulation of a knowledge claim can involve more individuals, leading to the generation of groups. These communities, in a formal or informal way, share ideas, submitting them to a sort of peer review. This process is required to validate the emergent innovative ideas. Such a process of knowledge claim formulation and evaluation is considered by McElroy a process of knowledge production. Not all knowledge claims survive within the organization by finding the interest and the approval of the other individuals. The ones that do not succeed in the evaluation process can be "undecided knowledge claims" or "falsified knowledge claims". The reports which certify the failure of knowledge claims are called "meta-claims", i.e. claims about claims. The knowledge claims which pass the validation are instead integrated into the activities of a wider group of people. The integrated knowledge can take the form of knowledge mentally held by individuals or groups, or of explicit artifacts like documents and files. The first type of container can be considered a special kind of tacit knowledge, but only the explicit forms of knowledge are considered "knowledge claims" by McElroy. The following phase is knowledge use, which regards business processing rather than knowledge processing, even if new problems, and so other knowledge claims, can arise in this last phase as well.
Summing up, the processes of Knowledge Production, Knowledge Integration and Business Processing must not be conceived as isolated: they interact with each other in a complex manner, and their degree of complexity has to be appreciated in order to support the organizational processes of innovation. This means that the capture, coding and deployment of knowledge alone are not sufficient to guarantee the creation of innovation. These efforts are merely examples of information management or information processing, not knowledge management. The main intuition of McElroy is that a KMS has to provide strategies and environments where knowledge can also be evaluated, producing knowledge claims and meta-claims. Only an evaluative and critical process can integrate and coordinate the different phases of knowledge management.

2. A new framework to support the KLC

According to the definition of McElroy, in order to support the knowledge life cycle the KMS has to provide and promote knowledge sharing spaces where individuals can discuss given problems, conjectures and theories. The system we are going to present tries to achieve this goal in two steps. First, the social network of the entire organization is analyzed to detect the points through which knowledge and information principally flow. Once these individuals have been detected, they are prompted by the system to create a community to discuss a given knowledge claim of common interest. This group can take the form of a community of practice (Hildreth and Kimble, 2000 [10]; Wenger, McDermott and Snyder, 2002 [25]; Saint-Onge and Wallace, 2003 [23]), where individuals meet each other in face to face encounters, or of a network of practice (Hildreth and Kimble, 2004 [11]), where individuals carry on the debate in virtual environments such as forums, blogs (Jensen 2003 [14]) or wikis (Ebersbach, Glaser and Heigl, 2005 [9]). The encounter has to produce a document, for example a report, a guideline or a directive, which summarizes the ideas, problems and solutions which have emerged in the debate. This document has to be structured as a hypertext, meaning that it has to contain references to other documents and reports. The network of documents can be considered as a network of ideas which have been externalized by a single author or by a community of authors, as in the discussion reports. It has to be stressed that the present work does not aim to define an opinion formation model (Di Mare and Latora,
2006 [8]; Bordogna and Albano, 2007 [6]). It only aims to present a preliminary study on the effects of the introduction of a new knowledge management platform on the evolution of the networks of ideas and knowledge claims within an organization. Figure 1 shows the two different dimensions taken into consideration by the system: the organizational social network and the network of ideas which emerge from the knowledge production spaces provided by the system and maintained by the individuals who intercept the majority of knowledge and information flows. The emerging network of ideas can be conceived as a complex system which constantly evolves in time. This implies that the structure of the network continuously changes through the addition or removal of nodes and links. In this kind of network, the survival of nodes seems to depend on some quality of the nodes, for example the quality or the perceived interest or utility of the exposed idea or knowledge claim. Thanks to the innovative value of the claimed ideas, it can happen that some research papers acquire in a short timeframe a very large number of citations, many more than other contemporary or older publications. As stated by Bianconi and Barabasi (Bianconi and Barabasi, 2000 [4]), this example suggests that the nodes of a network of ideas have a different ability (fitness) to compete for links. The success of an idea depends also on its popularity and on its foundations, which are represented respectively by the number of other documents which reference the externalized idea (for example the in-bound links of an electronic hypertext) and by the number of documents referenced by the externalized idea (for example the out-bound links of an electronic hypertext). A good model which considers all these factors is the one presented by Bianconi and Barabasi (Bianconi and Barabasi, 2000 [4]). The process starts with a net consisting of N disconnected nodes. At every step t = 1, …, N each node establishes links with m other units. If j is the selecting unit, the probability that it establishes a link with the unit i is:
Pi = Ui ki / Σ_j Uj kj    (1)
where ki is the degree of the unit i, i.e. the number of links established by it, while Ui is the fitness value associated with the node.
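The following sketch (ours, not part of the original paper) illustrates this growth rule; since all nodes start disconnected (ki = 0), a +1 degree smoothing is our assumption, needed to bootstrap the attachment probabilities, and duplicate links are not filtered:

```python
import random

def grow_fitness_network(U, m=2):
    """Growth rule of equation (1): at step t node t establishes m links,
    choosing its partners with probability proportional to U_i * k_i."""
    N = len(U)
    k = [0] * N      # degrees
    links = []
    for t in range(N):
        candidates = [i for i in range(N) if i != t]
        weights = [U[i] * (k[i] + 1) for i in candidates]  # +1: bootstrap
        for j in random.choices(candidates, weights=weights, k=m):
            links.append((t, j))
            k[t] += 1
            k[j] += 1
    return links
```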
Figure 1. Network of externalized ideas at t = 0.
We want to define such a network principally by connecting the knowledge claims which have been externalized in the debates and encounters promoted by the individuals who control the flows of information and knowledge. This choice is due to the fact that these individuals have a broader and more generalized vision of the problems and needs of the organization, so they are more capable of suggesting knowledge claims which inspire the interest of a wide organizational community. Betweenness centrality has been considered in the literature (Marsden, 2002 [16]; Alony, Whymark and Jones 2007 [2]) as a way to find the most valuable nodes within a social network. It can be said that a node with a high betweenness centrality plays a "broker" role in the network, i.e. it has a great influence over what flows (and what does not) in the network. These nodes play indeed an important role, but at the same time they constitute a failure point of the network, because without their presence some subgroups of individuals within the
organization could be cut off from information and knowledge. The betweenness centrality bi of a node i belonging to a social network is obtained as:
bi = Σ_{j,w} g_jiw / g_jw    (2)
where g_jw is the number of shortest paths from node j to node w (j, w ≠ i) and g_jiw is the number of shortest paths from node j to node w passing through node i. Betweenness centrality is indeed the best way to select the people who can start and promote discussions among the other individuals within the organization. The main purpose of our system is twofold. First of all, our platform has to promote environments where people can share their ideas on topics of common interest. To achieve the best results, the system detects the most important people in the knowledge life cycle, that is, those with the highest betweenness centrality. These people have to suggest (possibly with the aid and the prompting of the system) a knowledge claim which can concern the largest audience. All the suggestions, problems and solutions which emerge from the community that grows around the promoters of the discussion are grouped, organized and externalized. In this way a certain amount of tacit knowledge, that is, knowledge principally held in the minds of individuals or embedded in processes (Polanyi M., 1967 [22]; Nonaka I., 1994 [18]; Nonaka et al., 1998 [20]; Hildreth and Kimble, 2000 [10]; Bhatt, 2001 [3]; Bosua and Scheepers, 2002 [7]), can be exteriorized in an explicit form like a report or a guideline. Thanks to such sharing environments new ideas can arise, promoting the creation of knowledge and innovation. At the same time the system achieves another important outcome, that is, the preservation of knowledge. During the encounters the participants can get in touch with people never seen before or people with whom they have never collaborated. In this way the knowledge can survive even without the promoter of the discussion. For this reason it has to be considered that the fitness function, that is, the betweenness centrality, is not constant but changes over time. The participation in a common activity such as a discussion forum, or the updating of the content of a collaborative work environment such as a wiki or a community blog, can be conceived as a form of communication or social relation. So in our model the fitness function we choose for a given externalized idea takes the following form:
bi(t) = Ui(t) = 0 for t < ti;  bi(t) = Ui(t) = Σ_{j,w} g_jiw(t) / g_jw(t) for t ≥ ti    (3)
where ti is the instant at which the discussion group is constituted. This implies that the betweenness centrality of each node of the social network may vary over time as an effect of the collaborative process of knowledge creation. We assume that bi(t) is a decreasing function for t > ti, considering that the creation of a discussion group involves a rewiring process in the social network, localized around the node i which promotes the discussion. This factor has important effects on the evolution of the network of opinions and ideas, as we will show hereafter.
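As an aside, the promoter-selection step can be sketched in a few lines on top of a graph library; the snippet below (ours) uses networkx, with normalized=False so that the values match the raw sum of equation (2):

```python
import networkx as nx

def discussion_promoters(social_graph, n=3):
    """Return the n members with the highest betweenness centrality,
    i.e. the candidates to be invited to promote discussion spaces."""
    bc = nx.betweenness_centrality(social_graph, normalized=False)
    return sorted(bc, key=bc.get, reverse=True)[:n]
```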
In another work, Bianconi and Barabasi (Bianconi and Barabasi, 2001 [5]) compared their model to the evolution of a Bose gas, assigning to each node an energy εi determined by its fitness Ui and by a parameter β acting as an inverse temperature (β = 1/T):

εi = −(1/β) log Ui    (4)
According to this mapping, a link between two nodes i and j with different fitnesses Ui and Uj corresponds to two different noninteracting particles on the energy levels εi and εj. The addition of a new node to the network corresponds to the insertion of a new energy level εi and of 2m particles in the gas. In particular, m particles, corresponding to the m out-bound links of node i, distribute themselves on the level εi, while the other m particles are deposited on the energy levels corresponding to the in-bound links coming from node i. The probability that a particle settles on a level i is given by (1), and deposited particles are not allowed to jump to other energy levels. Each node, added at time ti and corresponding to an energy level εi, is thus characterized by an occupation number ki(εi, t, ti) representing the number of links (particles) that the node has established at time t. Bianconi and Barabasi (Bianconi and Barabasi, 2001 [5]) made the assumption that each node increases its connectivity following the power law:
ki(εi, t, ti) = m (t/ti)^f(εi)    (5)
By introducing a chemical potential µi, they also demonstrated that the dynamic exponent f(εi) takes the following form:
f(εi) = e^(−β(εi − µi))    (6)
This mapping has led to the prediction of the existence of three different phases in the evolution of their network model. When all the nodes have the same fitness, (6) predicts that f(εi) = 1/2 and, according to (5), the occupation number, which corresponds to the connectivity of node i, increases as (t/ti)^(1/2). This means that older nodes, having smaller ti, have larger ki, and the model reduces to the scale-free model (Albert and Barabasi, 2001 [5]). In our case this result indicates that new ideas tend to originate from the most popular ones, which establish more connections than the others, and that old ideas have more chances than the others to become popular and survive. Clearly this phase, called by Bianconi and Barabasi "first-mover-wins" (FMW), does not correspond to a real network of opinions, where the value of an idea influences its success more than its age does. In systems where nodes have different fitnesses, the fittest nodes acquire links at a higher rate even if they have been introduced at a later time with respect to the others. This phase is called by Bianconi and Barabasi "fit-get-rich" (FGR). In our model the value of an externalized idea is approximated by the fitness (3), which can be considered the extent to which the promoter of the knowledge claim is capable of interesting the members of the community which discusses it. But in the first two phases there is no clear winner, as the fittest node's share of all links decreases to zero in the thermodynamic limit, leading to the emergence of a hierarchy of a few large hubs surrounded by many less connected nodes. In the "first-mover-wins" phase the relative connectivity of the oldest node follows the law:
k_max(t) / (mt) ≈ t^((1/2)−1) = t^(−1/2) → 0    (7)
In the “fit-get-rich” phase the relative connectivity of the fittest node decreases as:
k(ε_min, t) / (mt) ≈ t^(f(ε_min)−1) → 0    (8)
considering that f(ε_min) < 1. Bianconi and Barabasi demonstrated that below a given temperature T_BE = 1/β_BE the fittest node maintains a finite fraction of the total number of connections during the growth of the network. This particular phase has been compared by Bianconi and Barabasi to Bose-Einstein (BE) condensation (Huang, 1987 [12]).
Figure 2. Evolution of the network of externalized ideas at t = 1.
In a network of knowledge claims this phase has to be avoided, because it means that a given idea prevails over the other ones, limiting the innovation process of the organization. As stressed by Bianconi and Barabasi, real networks have a T-independent fitness distribution, meaning that their status (BE or FGR) is independent of T. Luckily, our KMS model tends to level the fitnesses of the nodes: the rewiring of social ties around the individuals with the greatest betweenness centralities leads to the appearance of new individuals with the highest betweenness centralities. In this way the BE condensation can be avoided. This assertion cannot be demonstrated mathematically, as the fitness function depends on an unspecified number of variables, but Figures 1 and 2 show the way in which the network of externalized ideas evolves in time. At t = 0 we have three individuals A, B and C with the highest betweenness centralities, who are invited by the system to promote a space of discussion, reporting all the arisen knowledge claims in the documents A, B and C. Some people within the discussion groups A and B notice a correlation among the themes treated by A and B, and a reference is generated between the corresponding reports. A correlation is detected between the reports B and C, and another reference is added. The spaces of discussion are open to every interested participant, and alerting measures could be adopted in order to spread the invitations over the entire organization. In this way there can arise large
communities which do not include only the strongest ties of the discussion promoter. At t = 1 the pattern of ties within the social network has changed. The betweenness centrality of A, B and C has decreased, and the individuals D and E, now having the highest betweenness centralities, are invited to promote two other discussion spaces. Individuals A, B and C continue to attend the discussions of their groups, but the interest in their knowledge claims has vanished, as attested by the decrease of the betweenness centralities of the promoters A, B and C. Thanks to its high fitness, the node E within the network of externalized ideas can establish the same number of connections as node B: probably the individual E promotes a meta-claim on the knowledge claim sustained by the individual B, which justifies the presence of the connection between reports B and E. But the ever changing values of the betweenness centralities of the promoters guarantee that no externalized knowledge claim will prevail over the other ones.
3. Conclusion and future work

A KMS needs techniques and strategies to support the entire knowledge life cycle. This process has to lead to the formulation of knowledge claims and meta-claims, which are produced by problem analysis and problem validation processes. These activities cannot be scheduled and structured beforehand in a top-down fashion by the management; they have to arise in an emergent manner, considering the knowledge gaps encountered by the agents in their activities and involving the right people, who can effectively judge and deal with the knowledge claims that arise. A possible way could be to monitor the evolution of the social network which characterizes the organization in order to detect the individuals with the highest betweenness centralities and prompt them to detect problems. These individuals intercept the majority of information and knowledge flows and are therefore the most appropriate people to suggest knowledge claims, submitting them to the peer review of a large, involved and interested community. In other words, these actors are invited by the system to produce knowledge. The discussion environments promoted and sustained by these individuals drive a complexity of social relations oriented towards the generation of meta-claims or other knowledge claims. In this work we presented a knowledge management framework that is designed to externalize tacit knowledge by producing knowledge claims. We have tried to demonstrate that our framework is capable of preserving organizational
knowledge from being lost and, most of all, of creating the right conditions for sustaining the innovation processes. We have chosen the model of Bianconi and Barabasi to represent the growth of the network of knowledge claims, as this is the only model which allows one to consider in the evolutionary process both the popularity of the externalized ideas, i.e., the number of references made by other knowledge claims to the externalized idea, and the value or fitness of the externalized idea, represented by the betweenness centrality of its promoter. Every time an individual is invited to suggest and promote a knowledge claim, his/her betweenness centrality decreases, favoring an increase in the betweenness centralities of the members of the community generated by the knowledge claim. This sort of leveling effect on the fitnesses of the externalized ideas makes it possible to avoid the situation where a certain knowledge claim prevails over the others. It is important to stress that without this particular mechanism of involving in discussions the individuals with the highest betweenness centrality, the knowledge gaps perceived by these individuals could remain internalized in a tacit form or, once externalized, could remain limited to a small group of individuals strongly tied to them. A number of issues still have to be treated. First of all, we are going to model and evaluate the effects of the introduction of discussion promoters on the overall structure (Iansiti and Levien, 2004 [13]) of the network of ideas. For example, we will evaluate the robustness of the network of ideas in the presence of specific kinds of perturbations, the productivity of the network of ideas in terms of delivery of innovations, the niche creation in terms of variety, i.e., the number of new ideas in a given period of time, and the overall value of the new options created. In this effort we need to take into account both local and global resources of the ecosystem. For example, the fitness function should not depend exclusively on the betweenness centralities of the nodes, but also on global measures of network health such as those previously introduced. It has been demonstrated that ecosystems governed by local and global resources can lead to the emergence of stable hubs, which are a strong indicator of system robustness (Lella and Licata, 2007 [15]), and we will try to evaluate whether our knowledge management model follows this particular trend. After this preliminary study of the knowledge model we will choose the communication channels to monitor in order to obtain a good representation of the organizational social network. Many researchers have tried to evaluate the possibility of approximating the pattern of organizational relations, principally
by following face-to-face encounters, telephone communications, teleconferences, and email flows. We will review the works regarding organizational social network analysis and we will try alternative ways to reconstruct the pattern of social ties represented by networks of k-logs. The second problem to be solved is the choice of the most suitable environment to promote the creation of knowledge. We will compare the performance of different solutions such as face-to-face debates, forums, community blogs and wikis. Finally, we will have to define appropriate mechanisms and strategies to induce individuals to share their knowledge and experiences, perhaps by suggesting a list of possible topics to debate with their colleagues. We will also have to define ways to invite individuals to the discussion groups. We will compare the effects of different solutions, such as direct invitation by the promoter or the definition of alerting mechanisms which, for example, suggest the discussion groups potentially interesting for the activities of each individual operating in the organization.
Acknowledgments

This work has been partially supported by the PRIN-2005 research project "Dinamiche della Conoscenza nella Società dell'Informazione", national coordinator Prof. Cristiano Castelfranchi. One of the authors (IL) thanks Ginestra Bianconi for her valuable suggestions and encouragement.
References
1. R. Albert, A. Barabasi, Rev. Mod. Phys. 74, 47-97 (2001).
2. I. Alony, G. Whymark and M. Jones, Informing Science Journal 10 (2007).
3. G.D. Bhatt, Journal of Knowledge Management 5(2) (2001).
4. G. Bianconi and A.-L. Barabasi, arXiv:cond-mat/0011029v1 (2000).
5. G. Bianconi and A.-L. Barabasi, Physical Review Letters 86(24) (2001).
6. C.-M. Bordogna, E.-V. Albano, Journal of Physics: Condensed Matter 19 (2007).
7. R. Bosua and R. Scheepers, in Proceedings of the 25th Information Systems Research Seminar in Scandinavia (IRIS25), Ed. K. Bodker, M.K. Pedersen, J. Norbjerg, J. Simonsen, M.T. Vendelo (Roskilde University, Denmark, 2002).
8. A. Di Mare and V. Latora, arXiv:physics/0609127 (2006).
9. A. Ebersbach, M. Glaser and R. Heigl, Wiki: Web Collaboration (Springer, 2005).
10. P. Hildreth and C. Kimble, Journal of Knowledge Management 4(1), 27-38 (2000).
11. P. Hildreth and C. Kimble, Knowledge Networks: Innovation through Communities of Practice (Idea Group, Hershey, PA, 2004).
12. K. Huang, Statistical Mechanics (Wiley, Singapore, 1987).
13. M. Iansiti, R. Levien, The Keystone Advantage: What the New Dynamics of Business Ecosystems Mean for Strategy, Innovation and Sustainability (Harvard Business School Press, Boston, 2004).
14. M. Jensen, Columbia Journalism Review (2003).
15. L. Lella, I. Licata, EJTP 4(14), 31-50 (2007).
16. P.-V. Marsden, Social Networks 24(4), 407-422 (2002).
17. M.-W. McElroy, The New Knowledge Management: Complexity, Learning, and Sustainable Innovation (KMCI Press, Butterworth-Heinemann, Boston, MA, 2003).
18. I. Nonaka, Organization Science 5(1) (1994).
19. I. Nonaka and H. Takeuchi, The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation (Oxford University Press, New York, 1995).
20. I. Nonaka, P. Reinmoeller and D. Senoo, Euro. Manag. J. 16(6), 673-684 (1998).
21. I. Nonaka, R. Toyama and N. Konno, Long Range Planning 32, 5-34 (2000).
22. M. Polanyi, in Knowledge in Organizations, Ed. L. Prusak (Butterworth-Heinemann, Boston, MA, 1967), pp. 135-146.
23. H. Saint-Onge and D. Wallace, Leveraging Communities of Practice (Butterworth-Heinemann, Boston, MA, 2003).
24. D. Stenmark, Knowledge and Process Management 10(3), 207-216 (2003).
25. E. Wenger, R. McDermott and W.-M. Snyder, Cultivating Communities of Practice (HBS Press, 2002).
ON GENERALIZATION: CONSTRUCTING A GENERAL CONCEPT FROM A SINGLE EXAMPLE
SHELIA GUBERMAN
Digital Oil Technologies, Cupertino, California, USA
E-mail: [email protected]

Using the linguistic approach it is possible to generalize from a single example.

Keywords: linguistic approach, concept formation, generalization.
1. Introduction and background

In Artificial Intelligence it is accepted that a computer can create a pattern by applying a pattern recognition algorithm to a set of examples that represent at least two classes of objects (for example, various representations of the characters "A" and "B"). Because there are no precise definitions of "pattern" and "concept", these two terms were considered synonyms in the AI context. That substitution does not solve any problems, but it reflects the general tendency in AI to baselessly elevate the level of "intelligence" achieved at some point in time by using philosophical vocabulary. So, it was decided that if one can say that the computer creates a pattern, it is correct to say that the computer creates a concept. The reality is quite different. The computer does not create patterns. Pattern recognition programs generate decision rules. The decision rule built by the computer represents the difference between a given class of objects (a "pattern") and another given class (or a number of classes) – not the essence of the class, i.e., the pattern. This state of affairs becomes clear when we consider, for example, that water can be distinguished from ice by density, from oil by electrical resistance, from acid by its effect on living tissue. Here it is important to observe that none of these discernible attributes is sufficient to describe water as a concept. Consequently, because pattern recognition programs do not generate patterns, nobody can say that pattern recognition is a solution for creating concepts (without discussing the equivalence between pattern and concept). At the same time these two problems – pattern recognition and creating a concept – can be treated using a similar approach, i.e., the linguistic approach.
Figure 1. Winston’s examples for constructing the notion of “an arch”.
2. On the philosophical approach

In philosophy the definition of a concept is as follows: "A concept is an abstract idea or a mental symbol, typically associated with a corresponding representation in language" [13]. In this paper we follow the "representation in language" approach. P.H. Winston was the first to reject the idea of creating a concept from a number of examples [14]. In that paper Winston proposed the use, for machine learning, of the concept of "an arch". In his approach he emphasized the importance of an adequate "visual language" of description and considered, as a matter of fact, a single example of an arch, and a number of examples of what might be considered "not an arch", each example calling attention to a crucial feature of an arch (see Figure 1). It turns out that this approach also embodies an attempt to define a "concept" through differentiating attributes, and it fails (see the example of the concept of "water" above). Let us now take a close look at the problem from the linguistic point of view, using the same example – an arch. Let us describe an arch in simple English. The result might be something like this:

One block rests on two standing blocks    (1)
We can transform this sentence into a more formal grammatical structure. Specifically:

[(4-block) rests] on {[(4-block) stands] [(4-block) stands]}    (2)
where brackets reflect the levels of the grammatical structure, and "4" indicates that the base of the block is a 4-angle. In our opinion, this relatively simple construction makes a lot of sense and allows us to construct a concept. The reasons are these:
1. The level of the term in that structure reflects the importance of the feature in the definition of the notion "an arch": the higher the structural level of the term, the more important it is in defining the concept of "an arch"; the lower the level, the less important it is. For example, the lowest-level term (for a block in our example it is 4) can be altered (to become a 3 or a 10) but it will influence the nature of the "arch" object very little (see, for example, Figure 2(a)). The term appearing on the next level (the block) can be changed to a pyramid, or to an "H-profile block"; nevertheless the structure will still be interpreted as roughly an arch. If the term "block" is changed to a "pyramid", we will still recognize the shape of an arch (or at least a caricature of an arch – see Figure 2(b)). Changes at a higher level, however, put an end to perceiving this structure as an arch (see Figure 2(c), where "stands" is changed to "lies").
2. We note that in sentence (2) there is no indication of the relation between the two upright blocks. This means that there is nothing specific to be said about this relation, i.e., the upright blocks are in a "nonspecific" position. If they were in a particular position (for example, touching each other) this would be reflected in the natural-language description ("two upright blocks are touching each other and the third one rests on them").
3. The use of the terms "upright" and "rests" may trigger the realization that the "upright block" is vertical and the "resting block" is horizontal.

Figure 2. (panels (a)-(c): the variations of the arch structure discussed in point 1)
Nevertheless, the most important information captured in sentence (2) is the hierarchical grammatical structure of the statement. It allows us to emphasize the key features of the object we are pondering and, as a consequence, allows us to define the notion of "an arch". We thus posit that this grammatically structured sentence is the notion of an arch. The word "arch" is only a label for that structure. So far we have considered a particular example of description. We now consider whether there exists a general approach which can allow us to create concepts through the use of an adequate language of description.

2.1. Constructing a description

According to Bongard's "imitation principle" [1], the most effective way of solving any recognition problem is to describe the objects to be recognized in
terms of how the objects were created. We point out that sentence (2) discussed above defines the concept of "an arch". Moreover, it comprises an instruction on how to build an arch (stand two vertical bars apart and rest a third one across the top of the two vertical ones). If we can develop a program that will "look" at a single example of an arch (Figure 1) and create description (2), this will solve the problem of recognizing arches in general, because structure (2) is an arch. In other words, if an object is described using an adequate language of description, this description contains the description of the class to which the object belongs, and thus the "concept". To further illustrate the construction of a description, let us now consider the computer recognition of handwriting. A language adequate for describing written scripts has been introduced [3]. In this language, the description of any given character becomes the "name" of the character. We consider it remarkable that the language adequate for handwriting recognition is the language used in describing the process of writing. We note that this is in accordance with the "imitation principle" introduced by Bongard [1]. Most importantly, though, the script is treated not as a static picture, but as a track of the movement of the writing implement. The language of description for writing recognition consists of 8 basic elements ("words"), which are interrelated: in free-hand writing, one element can be transformed into another, which is its neighbor in the line of elements (3) (the eight elements are graphical stroke symbols). Therefore, in this language of description, the canonical character is described as a sequence of elements. If we have such a description of the canonical "a" we can then apply the transformation rule and get all possible shapes of a written "a":
• If the first element is changed into its neighbor in the sequence (3), this will produce one written form of "a".
• If the second element is changed into its neighbor, this will produce another written form of "a".
• If the third element is changed into its neighbor, this will produce yet another written form of "a",
• and so on.
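The neighbor-substitution rule can be sketched in a few lines of Python. Since the eight stroke elements are graphical, they are abstracted here as indices 0-7 along the line (3), and the three-element description of the canonical "a" is a hypothetical placeholder:

# The eight basic stroke elements are abstracted as indices 0..7 along
# the transformation line (3); the actual glyph shapes are graphical
# and are not reproduced here.
def neighbors(e):
    """An element may be replaced, in free-hand writing, by its
    neighbor(s) in the line of elements."""
    return [n for n in (e - 1, e + 1) if 0 <= n <= 7]

def variants(canonical):
    """All one-step variants of a canonical description: every shape
    obtained by changing a single element into one of its neighbors."""
    out = []
    for pos, e in enumerate(canonical):
        for n in neighbors(e):
            out.append(canonical[:pos] + (n,) + canonical[pos + 1:])
    return out

canonical_a = (2, 5, 3)   # hypothetical three-element description of "a"
print(len(variants(canonical_a)), "one-step variants of the canonical 'a'")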
Thus, we get exactly what we are after: from the appropriate description of a single example of a written "a" we get the description of the complete class of "a". A number of other applications demonstrate that real solutions to old unsolved problems of pattern recognition are found only when an adequate language of description is used to describe the objects under consideration. We again emphasize that it is remarkable that when the description is adequate it not only gives the right answer, but the decision rule becomes extremely simple, one may say primitive [4, 5]. What this means is that such an approach may reveal a new understanding of the intellectual processes that occur in the human brain. It seems to us that what intelligence requires in order to take shape is not sophisticated procedures for decision-making and recognition, but the ability to create an adequate description of the world. That leads us to a deeper problem: why is our brain endowed with the ability to adequately reflect the world (the outer one as well as the inner one)? This problem was discussed by Wittgenstein at the beginning of the 20th century [15]. His answer was as follows. There exists an infinite number of potential objects in the world, but there are only a finite number of notions in our language. Every object that is reflected in the language as a notion has a social meaning. The knowledge about these notions as social objects (the structure of the objects as well as their key features) has to be transmitted from one person to another, from one generation to the next. If so, there would be no objects in the world which could not be articulated with a language, and, at the same time, they would all be socially important. This means that, to a great degree, the world in which we live consciously is a world filtered through our language. To put it simply (and perhaps facing the danger of oversimplification) we may say this: we live in a world which is defined by our language.

3. Implicit generalization

Let us now analyze a very simple image (Figure 3). The description of this image in a "natural language" might be:

"big white triangle"    (4)
This sentence can be transformed into its formal structure:

(big) (white) (3) [angle]    (5)

The grammatical structure of (5) is:
Figure 3. (the image under discussion: a big white triangle)

    plane figure
    ├── big
    ├── white
    └── polygon = "n-angle"
        └── 3
                                        (6)
Sentence (5) represents the notion of a "big white triangle". Now, let us try to generalize this notion by eliminating some of the terms in the grammatical structure of (5) shown by (6). If we start at the lowest level of this structure, we may eliminate the term "3". As a result we get the description of a general class: "big white polygon (n-angle)". If we now omit "big" from the second level of the structure in (6), we obtain the broader notion of a "white n-angle". Now, if we leave out "white" from (6), we get the notion of a "big n-angle". Finally, if we omit "n-angle", we get the more general notion of a "big white" plane figure. We now note that in natural language the information embedded in a given sentence is obtained not only from the terms explicitly used in the sentence but also from related terms that are not explicitly brought up. Again, let us consider sentence (5). The "big white triangle" implies the existence of a "small white triangle", of a "big black triangle", of a small black 4-angle, and so on. Thus, a single sentence which describes a simple object may represent a world populated by big triangles, small triangles, white quadrangles, black pentagons, and so on. And that is not all. As soon as we contemplate a small object and draw it on paper in a certain way, this may generate a new description, for example: "small white triangle in the left upper corner". This opens a new dimension in the world of figures (more precisely, two dimensions if we deal with figures on a plane).
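The pruning procedure just described is mechanical enough to sketch in code. The following Python fragment (an illustration, not part of the original paper) encodes structure (6) as a nested dictionary and enumerates the generalizations obtained by omitting one term together with its subtree; the four results are exactly the four generalized notions listed above.

# The grammatical structure (6) as a nested dict: keys are terms,
# values are the terms one level below them.
structure = {"big": {}, "white": {}, "n-angle": {"3": {}}}

def prune(node, path=()):
    """Yield (omitted_term_path, generalized_structure) for every term
    that can be dropped; dropping a deeper term generalizes least."""
    for term, sub in node.items():
        # Omit this term (and everything under it).
        rest = {t: s for t, s in node.items() if t != term}
        yield path + (term,), rest
        # Or omit a term inside this subtree instead.
        for p, pruned_sub in prune(sub, path + (term,)):
            new = dict(node)
            new[term] = pruned_sub
            yield p, new

for omitted, general in prune(structure):
    print("omit", "/".join(omitted), "->", general)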
Figure 4. (panels (a) and (b): the two search displays discussed below)
We observe that in natural language, even for a simple sentence, layers of understanding arise from knowledge about the language and the world. This represents an important difference between natural and algorithmic languages. Another difference between natural and algorithmic languages lies in their goals: the goal of natural language is to explain, the goal of algorithmic language is to ensure program execution. Each subsequent sentence in a text written in natural language has to be understandable; each subsequent sentence written in an algorithmic language has to be executable [6]. Now we will show how the linguistic interpretation of a concept can explain some of our mental abilities.

4. Psychological experiments

The experiments concerning visual observation described in this section of the paper have been previously presented in detail [10]. The task presented to the human subject of the experiment is to find, in Figures 4(a) and 4(b), an object different from all the others shown. It turns out that the time to search for such an object is significantly longer when the human subject is presented with Figure 4(b). We will attempt to explain this observation by making use of the approach presented in this paper. We note that the whole image is perceived as a number of similar objects: pairs of parallel lines (in accordance with the proximity principle of Gestalt psychology) [10]. The description of the object in natural language is "two parallel lines". As with sentence (5), explored in the previous section of the paper, we note that it is possible to generate objects dissimilar to the "two parallel lines" description by inserting "non" before each element of the structure:
1. "non-two parallel lines"
2. "two non-parallel lines"
3. "two parallel non-lines" (for example, arcs)
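This negation scheme is purely combinatorial and can be sketched directly; the following few lines of Python (an illustrative fragment) generate the three descriptions above from the standard description:

def dissimilar_descriptions(description):
    """Negate each term of the description in turn; one of the
    resulting "non-" descriptions should fit the odd object."""
    terms = description.split()
    return [" ".join(t if i != j else "non-" + t
                     for j, t in enumerate(terms))
            for i in range(len(terms))]

for d in dissimilar_descriptions("two parallel lines"):
    print(d)
# -> non-two parallel lines / two non-parallel lines / two parallel non-lines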
Figure 5. (panels (a) and (b))
One can see that the object that is different from the rest, which has to be found, fits description 2. This means that as soon as we recognize that the ordinary object here is "two parallel lines" we implicitly possess the idea of what we have to find. It is obvious that description 1 has to be rejected: all objects consist of two lines. Description 3 has to be rejected as well, because all objects in Figure 4(a) are lines. The first proposition tested is 2, and this leads to the right solution. For Figure 4(b), the description of the object is "two lines". The dissimilar object is described as:
1. "non-two lines" (for example, "three lines" or "one line"),
2. "two non-lines" (for example, "two arcs"),
3. "two parallel lines".
For the situation shown in Figure 4(b), the description "two parallel lines" does not correspond to the unconsciously perceived patterns. It is thus reasonable to predict that the decision time for finding a dissimilar object might be longer. The actual experiment [10] is in agreement with this prediction, arrived at by analyzing the structure. The same analysis and conclusion can be obtained for the two images shown in Figures 5(a) and 5(b). The task is the same: find a dissimilar object. The natural-language description of the object in Figure 5(a) is "vertical line". The dissimilar object described by the "non-vertical line" structure turns out to be precisely the description of the object that is the subject of the required task. The description of a standard object in Figure 5(b) is "line"; we observe that the "non-line" description will not generate the description of the dissimilar object – "vertical line".

4.1. Gestalt as generalization

All basic Gestalt principles (similarity, proximity, good continuation and so on) help in recognizing the organization of the image, i.e., in dividing the image into appropriate parts and finding the relationships between them. In the case
Figure 6. (panels (a) and (b))
of the simple drawing in Figure 6(a), the image can be described as two crossing lines "ab" and "cd", or two touching angles "ac" and "bd", or four lines "aO", "bO", "cO", "dO". The "good continuation" principle helps describe the image (i.e., represent our perception) as containing two parts – two crossing lines "ab" and "cd". Why is this choice of representing (describing) the image preferable? From all potentially possible partitions of the whole, the preferred set of parts is the one with the simplest description [3]. The simplicity of the description reflects 1) the number of parts (the lower the number, the simpler the description), 2) the relationships between the parts (touching, crossing, above, to the right), and 3) the simplicity of the description of each of the parts. So, the hypothesis of creating the image in Figure 6(a) by drawing the lines point-by-point and in random order has to be rejected as being extremely complicated and practically impossible. The number of parts in the case of two crossing lines and two touching angles is the same – two – but to create the whole from the chosen parts is much more difficult in the case of angles. It is simple to draw the first angle, but drawing the second one takes a lot of concentration. First, its vertex has to coincide with the vertex of the first angle. Secondly, the direction of its first leg has to be precisely the same as the direction of the appropriate leg of the first angle; that gives the smooth continuation at the crossing point. The same condition has to be satisfied for the second leg. Overall, it is a very arduous problem. This means that the relationships between the parts are very complicated. In the case of crossing lines the relationships are described by one condition only – crossing. The perception of Figure 6(a) as "two crossing lines", as a matter of fact, represents not only the given image, but also a set of images (see Figure 6(b), first row) which our perception will refer to one class, which carry the same
pattern, the same Gestalt. One of the important features of that pattern is stability: one can change some parameters of an object (the curvature of the lines, the intersection point, or the length of the lines) but the resulting image will still carry the same Gestalt. On the contrary, should one choose to describe Figure 6(a) as consisting of two angles, changes in the parameters will create a set of images (Figure 6(b), second row) which will not be accepted by our perception as belonging to the same class, the same pattern, the same Gestalt, as the initial image does. So, the first row in Figure 6(b), containing figures with the same Gestalt, is a correct generalization of Figure 6(a).

4.2. Grammatical structure of a common sentence

Although the examples analyzed above are formally expressions in natural language, they are rather "technical". Let us now analyze an ordinary sentence in natural language:

The black horse jumped over the half-decayed fence.    (7)
As noted before, there are many things we may deduce from this sentence, depending on our knowledge of the language and of the world. This sentence may describe a scene of a chase out of a city, perhaps near an abandoned farm. We may have seen such a scene many times in western movies. Here is its grammatical structure:

    jumped over
    ├── horse
    │   └── black
    └── fence
        └── half-decayed
                                        (8)
Let us now begin to change the low-level terms in this structure and observe how this may influence our perception of the scene.

The white horse jumped over the half-decayed fence.    (9)
Nothing essential changes in our perception. As a matter of fact, our intellect is very sensitive to different kinds of details: in our understanding of the world (or at least the movie world) a white horse is something special and, in most cases, may be associated with an important person in the story.

The black horse jumped over the painted fence.    (10)
The essence of the scene is still the same but now it could take place in a different environment – perhaps closer to a populated area, perhaps on the outskirts of a small town.
The black horse kicked the half-decayed fence.    (11)
Our perception, and the pattern, have changed dramatically. There is no longer a chase. We may imagine a rather comical scene: a drunken cowboy on an unhappy horse. Now let us ponder this:

The black fly flew over a half-decayed fence.    (12)
This sentence may create a completely different perception of the scene. We may imagine that it is a hot afternoon. Cowboys are sitting in a restful repose near the saloon. They are half-asleep. Silence. Only a black fly is making a loud buzzing noise. We now note that the analysis of a transformation of sentence (9) may offer us a glimpse into a cultural phenomenon worth pondering. The history of medicine shows that some diseases are referred to by the name of the physician who first described them (Alzheimer's disease and Korsakov's syndrome, for example). The descriptions of a particular case of the disease were then used by generations of physicians as a description of the disease. However, we know that the same disease may create, in different patients, patterns of symptoms with a number of variations. Therefore, the generalization could be successful only if the description contained not only the list of characteristic symptoms of the disease but also their relative importance. As demonstrated in the previous sections of this paper, such information can be represented in the grammatical structure of the description. The fact that these descriptions were really good means that these physicians were well educated and that their skill in natural language was high. The physicians of future generations will have to be linguistically educated to be able to extract the knowledge from the grammatical structure. All this shows the importance of wielding a skilful pen, not only for physicians but also for many other professionals – a truism which has been and is still disputed by many, many students in school.

5. Conclusion

We have demonstrated that if the description of a single object or situation is obtained in an adequate language, its grammatical structure will contain information on the relative importance of the properties of objects or situations. This allows the creation of generalizations (abstractions) of the object or situation. This, in turn, allows us to discover relationships between the structure of the language and the system of concepts about the world around us. One of
the manifestations of this kind of relationship is that the name of an object (not a symbol denoting it, but the name expressed in an adequate language of description) reflects its essence. We note here that this theme is the subject of discussion in a specific branch of philosophy – the philosophy of name [7]. In conclusion, let us quote two authors whose point of view we share. Plato: "Words do imitate Ideal Forms in a perfect and consistent way" [8]. Russell: "For my part, I believe that, partly by means of the study of syntax, we can arrive at considerable knowledge concerning the structure of the world" [9].

References
1. M. Bongard, Pattern Recognition (Spartan Books, New York, 1970).
2. A. Church, Introduction to Mathematical Logic, Vol. 1 (Princeton University Press, Princeton, NJ, 1956).
3. S. Guberman, in Proceedings of the 6th Systems Science European Congress, Sept. 19-22, 2005 (Ecole Nationale Supérieure d'Arts et Métiers (ENSAM), Paris, 2005).
4. S. Guberman and E. Andreevsky, Cybernetics and Human Knowing 3(4), 41-53 (1996).
5. S. Guberman, Y. Pikovskii, E. Rantsman, in Proc. SPE Western Regional Meeting (Long Beach, California, 1997).
6. S. Guberman, W. Wojtkowski, Res-Systemica, 2005, http://www.afscet.asso.fr/resSystemica/
7. A. Losev, Philosophy of Name (in Russian: Context Publ., Moscow, 1992).
8. Plato, Cratylus, available at http://www.journals.uchicago.edu/ISIS/journal/issues/v94n4/940415010/940415010.web.pdf
9. B. Russell, An Inquiry into Meaning and Truth (Allen & Unwin, London, 1940), Preface.
10. A. Treisman, Scientific American 254(1), 114-125 (1986).
11. M. Wertheimer, Productive Thinking (Harper & Brothers, New York, 1959).
12. M. Wertheimer, Philosophische Zeitschrift für Forschung und Aussprache 1, 39-6 (1924).
13. Wikipedia, "concept", "gestalt", www.wikipedia.org
14. P.H. Winston, Learning Structural Descriptions from Examples, Technical Report (Massachusetts Institute of Technology, 1970), available at https://dspace.mit.edu/bitstream/1721.1/6884/2/AITR-231.pdf
15. L. Wittgenstein, Tractatus Logico-Philosophicus (Taylor & Francis, London, 2001).
GENERAL THEORY OF EMERGENCE BEYOND SYSTEMIC GENERALIZATION
GIANFRANCO MINATI
Italian Systems Society, Milan, Italy
E-mail: [email protected]

The problem in defining generalization is considered by examining some core aspects, such as (a) the extent of the domain of validity of a property, (b) the transformation between different non-equivalent representations and (c) the respective representations of different observers and their relationships, i.e., a dynamic theory of relationships between levels of observation as introduced by the Dynamic Usage of Models (DYSAM). The purpose of this paper is to better clarify the conceptual framework of generalization in order to be able to set the context for a General Theory of Emergence as meta-theory, using models of models (as for logical openness) and interacting hierarchies. After considering some approaches used to generalize and focussing upon the purpose of General System Theory for generalizing, we examine some concrete approaches, such as DYSAM, for building up a General Theory of Emergence with specific theories of disciplinary emergence as particular cases.

Keywords: emergence, generalization, meta-theory, models, system, trans-disciplinarity.
1. Introduction

We introduce the concept of generalization by distinguishing between the concepts of transposition and translation of properties. We then specify the meaning of the concept of generalizing as: 1) extension of the domain of validity of properties; 2) transformation between different non-equivalent representations; 3) respective representations of different observers and their relationships. The discussion relates to the possibility of using models, representations, methodologies and results obtained in one domain in another one, and of simultaneously using different, non-equivalent models, as in the Dynamic Usage of Models (DYSAM) briefly described below. We present a list of classical approaches, i.e., in non-systemic frameworks, used for generalizing, such as Abstraction, Analogy, Concept, Homomorphism, Induction, Isomorphism, Knowledge Representation, Language, Learning, Metaphor, Model, Relation and Structure. We briefly discuss what is considered to be the opposite of generalizing, that is, making unique and non-repeatable, focussing upon the level of representation used.
We then examine the generalization of systemic properties, i.e., we consider them as properties of categories of new entities, i.e., systems. We consider the so-called General System Theory (GST) approach and the prospective framework of the General Theory of Emergence (GTE). While GST is concerned with the generality of properties of systems established through organization, a GTE is expected to focus upon collective phenomena establishing systems and particularly upon: 1) correspondences between models and representations of phenomena considered emergent; 2) the development of tools for detecting and verifying processes of emergence and de-emergence; 3) the identification and classification of different possible non-equivalent kinds of emergence; 4) the identification of the limitations of its generalization by defining the domain of validity of such a theory. We then introduce DYSAM as an approach for meta-modeling within the framework of the search for a GTE as meta-theory, still unavailable. We then introduce aspects considered relevant for a future GTE.

2. What is generalization?

2.1. An introduction

The word generalization denotes the process of making general. The adjective general identifies the fact that a property is considered suitable for a larger quantity or wider variety of elements than that originally considered. The process of generalizing is dealt with in various disciplines, such as in logic when dealing with induction [1], in philosophy [2], and in psychology and AI in processes of learning [3]. It may take place with or without changing variables or rules in representations and models. In the first case there is a transposition of the same property into different contexts; in the second there is an adaptation, a translation, which, however, maintains the fundamental aspects. In linguistics, for instance, the same words or expressions may be transposed between different languages or may be translated in such a way as to keep the same meaning by using different words and concepts. Examples of transposition take place when the same model is used to model different kinds of phenomena, i.e., transposed, as for the Lotka-Volterra equations used to describe both population and market dynamics by changing the meaning of the variables. In music, an example of transposition is when an orchestra plays adapting its music to a special, historical, recorded theme or to natural sounds (e.g., Mozart's Kindersinfonie, the "Toy Symphony"). Examples of translation take place when the same conceptual approach is used to model different kinds of
phenomena, such as looking for attractors and processes of convergence or equilibrium in different models. In music, an example of translation is when the same score is adapted for another musical instrument. We recall that generalizing is different from making generic. The process of making generic makes properties imprecise and fuzzy. In this way the field of validity is not well-defined and, because of that, extended. A generic property possesses little rigor and is accepted as being imprecise. In everyday usage of the concept there is an unfortunate correspondence between impreciseness and general validity. Similarly, popularizing is based on simplifying complex concepts. Generalizing in this case is intended as an extension of the validity of well-defined properties to other entities in a domain, or to other domains, by reducing accuracy.

2.2. Specifying the concept of generalization

We may now try to specify the concept of generalization more precisely. It relates, for instance, to:
1. the problem of extending the domain of validity of a property. In this case, for instance, the process of generalizing consists of replicating a behavior in any context (the success or failure is not cognitively learned, but evolutionarily selected). For instance, the use of pheromone trails by ants is a behavior repeated in any context. It does not work on lava, ice or water, because these are non-survivable or unsuitable environments for ants. In this case the behavioral rules of the system are fixed;
2. the transformation between different non-equivalent representations, such as continuous and discrete by allowing interpolation and extrapolation in mathematics, conservative and non-conservative systems [4], and biological and physical modeling [5]. This relates to the relationships between different models. By the way, we are not postulating the assumption that it is always possible to transform one model into another, to reduce one to another. It is possible to find correspondences which may allow reducibility or furnish the reasons for irreducibility;
3. the respective representations of different observers and their relationships. This relates to the lack of a dynamic theory of relationships between levels of observation. One attempt to introduce a suitable approach was the introduction of the Dynamic Usage of Models (DYSAM) [6,7], sketched in code after this list. We underline how this view is the opposite of reductionism, based on the usage of a single, specific disciplinary model to deal with any kind of phenomenon. An extension of this approach is given by learning. Learning is intended as in modern cognitive science, i.e., as the process of suitably changing behavioral rules to better fit the environment.
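As a caricature of the DYSAM attitude (an illustration only; the two "models" below are arbitrary placeholders, not part of DYSAM itself), one can keep several non-equivalent models of the same observations alive at once and report all of their, possibly conflicting, readings instead of reducing them to a single one:

from typing import Callable, Dict, List

def dysam_view(models: Dict[str, Callable[[List[float]], float]],
               observations: List[float]) -> Dict[str, float]:
    """Apply every model to the same observations and return all the
    (possibly conflicting) readings for the user to combine."""
    return {name: model(observations) for name, model in models.items()}

# Two deliberately non-equivalent "models" of a series of measurements.
models = {
    "mean-level": lambda xs: sum(xs) / len(xs),   # static view
    "trend":      lambda xs: xs[-1] - xs[0],      # dynamic view
}
print(dysam_view(models, [1.0, 1.5, 2.5, 2.0, 3.0]))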
These three points may be considered as issues for the fulfillment of strategies for generalizing. The interest in a better understanding of the concept of generalization relates to the possibility of implementing two different strategies:
• the use of models, representations, methodologies and results obtained in one domain in another one, as with inter-disciplinarity, mentioned below in Section 5;
• the simultaneous usage of different non-equivalent models, by adopting multidimensional representations and leaving it to the user to identify a suitable strategy for dealing with multiple levels, based upon acting on one level to influence the others. This is a typical instance of Collective Beings (CBs), particular cases of Multiple Systems (MSs). MSs are established by the same components interacting in different ways, such as interacting networked computer systems performing cooperative tasks and the Internet, where different systems play different roles in continuously new, emerging usages. CBs are established when the same components, interacting in different ways, are autonomous agents, i.e., possessing a natural or artificial cognitive system, able to simultaneously or dynamically decide to interact in different ways [7]. Examples include components which are simultaneously members of families, workplaces, traffic systems, mobile telephone networks or groups of consumers.
How is it possible to combine generalizations? Which combination of generalized properties is required to produce a property still possessing general validity (e.g., linear, non-linear and connectionist)? What about the domain of validity? Is it just an intersection of domains, or is it possible to consider more sophisticated solutions? We may think of a kind of algebra of generalized properties. This is the problem of trans-disciplinarity, as mentioned below. We briefly recall another aspect to be considered, that is, theoretical issues which are also obstacles to generalization in real life. Actually, we have examples of powerful approaches for generalizing which have received little or no interest from researchers. We need to discover the real reasons for this lack of interest. We are faced with the problem that theoretical research is performed by human beings living in real social systems affected by several kinds of real problems, such as careers, the need for gratification and support, the stereotyping of approaches, difficulties in publishing, economic and human resources, and interest from students, colleagues and industry. We mention a couple of examples in the field of mathematics related to a domain of research which has achieved results for generalization that remain unconsidered. The first relates to research on
continuity extended from the real numbers to cardinal or transfinite numbers, i.e., the supercontinuum [8]. The other relates to the Löwenheim-Skolem theorem in model theory, asserting that the existence of a model with transfinite cardinality implies the existence of the same model with any other transfinite cardinality. This theorem established the existence of "non-standard" models of arithmetic. It states that even systems in which we may prove Cantor's Theorem, stating the existence of transfinite numbers, have countable models [9].

3. What is the opposite of generalization?

In the preceding section we listed some crucial characteristics of the concept of generalization in conceptual, non-systemic frameworks. From this discussion we may conclude that a first, general characteristic of the concept of generalization is related to that of repeatability. This correspondence considers, of course, the context, the model, the level of description and the observer. We may say that we apply the concept of generalization to itself. The concept of repeatability may be a good entry point to specify what generalization is not. The opposite of generalizing is to make particular, that is, to consider a specific, non-repeatable situation. In this case the interest is in specifying the uniqueness and how it has been achieved. Depending on the level of description, any event may be represented as unique. Such a situation may be related to (a) a level of description considering a virtually infinite number of details, (b) events having a very low probability of occurring (unique events are more frequent than probable ones) and (c) models representing and allowing the simulation of unique events, as for chaos. Nevertheless, unique events are interesting not only for modeling and, eventually, simulating how they have been generated, but also because they may be represented and memorized. It is then possible to reproduce the representation. Examples of unique events are recordings of artistic events and pictures of unique astronomical events. It is then possible to scientifically consider representations of unique events. Thanks to modeling and computer techniques it is now possible to simulate unique events, i.e., make them repeatable and interactive. Of course, uniqueness exists only in relation to some specific aspects of the representation, i.e., a level of description. On the one hand, representation, reproduction, modeling and simulating are intrinsically limited by the information available. For instance, pictorial images do not explicitly provide information about temperature and speed, but these may be logically inferred. On the other hand, the representation is related to the constructivist process used by the observer [10].
After these brief comments, we can see that the characteristic of being specific and non-repeatable mainly relates to the level of description used by the observer. When considering the opposite of generalizing we can also mention a process usually considered to have this characteristic: the process of specializing. It usually refers to applied, research and educational activities in specific disciplines. Specializing is intended as a process that identifies a well-defined and restricted area where specific disciplinary knowledge and expertise is applied in a repeatable way. As we will see, specialization can be considered as the opposite of generalization once we have a theoretical, and not only operational, definition of generalization.

4. Outline of some classical approaches used for generalizing

As mentioned in Section 2, there are at least two ways of generalizing:
1. the process of generalizing consists of making something applicable to a wider variety of cases. Problems of generalizing in this case regard the repeatability and applicability of cognitive results to cases having a context of validity different from the initial one;
2. the process of generalizing consists of representing the same property in different, equivalent ways. An example of a way of representing the same property in different contexts is isomorphism, a structure-preserving mapping between two algebraic structures. An example of a way of representing the same property in a partial way in different contexts is given by analogy and metaphor (see below).
In the following we list and briefly comment upon some approaches used to generalize, i.e., to extend the validity of properties and operators within a specific domain and from one specific domain to another. They are called classical because they do not necessarily apply to systemic properties. Briefly, we may distinguish between systemic and non-systemic properties according to whether properties are considered, at the level of description of the observer, as related to elements or to the systems established by interacting elements. Elements or parts are constructivistically identified by the observer when using (equivalent or non-equivalent) models to explain the whole, i.e., the system [11]. In short, systemic properties are properties of systems established through organization or collectively. Systemic properties are stationary, i.e., existent while the process of organization or emergence is active, such as functions for organizations, e.g., assembly lines and electronic circuits, computational functions for connectionist devices (e.g., Neural Networks), and life itself. A system may also have non-systemic properties, whereas a non-system cannot possess systemic properties,
because these are only generated by emergence or organization. Examples of non-systemic properties, i.e., not necessarily applying to systems, are: weight, speed, quantity, position, shape and odd/even. Examples of systemic properties, i.e., necessarily applying to systems, are: complexity, dissipation, openness/closedness and organization. Behavior, on the contrary, may refer both to elements and to systems established by interacting elements. For instance, consider the behavior of a particle and the behavior of a gas of particles in a changing environment (pressure and temperature). Generalization of not necessarily systemic properties may take place through:
• Abstraction. Cases can be considered as special cases of more general ones, as in learning.
• Analogy. At a certain level of description, two different items are considered equivalent.
• Conceptualizing. Cognitive inference producing concepts from abstraction, objects to be used in a generalized way through tools such as language, and in particular analogy and metaphor. It is possible to have the concept of abstraction whereas it is not possible to have the abstraction of a concept. Concepts, distinguished from abstractions themselves, relate to the usability of abstractions as elements for other, higher levels of abstraction.
• Morphisms. A homomorphism enables the conservation of an algebraic structure from one domain to another. An isomorphism, a bijective morphism, enables the conservation of an algebraic structure between two different domains in a bijective way (a numerical check of this property is sketched after this list).
• Induction. It enables one to assume as probable the extension of a property p1, detected in all the elements considered (all of which also have another property p2), to all elements having property p2.
• Knowledge representation. Knowledge has a generalizing content per se, referring to the possibility of applying it to different cases. Representing has a generalizing content allowing a higher level of abstraction, i.e., using knowledge to process representations of itself.
• Language. Languages are of crucial interest for generalizing because they use symbols (e.g., words) as representations and, recursively, as elements to generate higher, i.e., more abstract, levels of representation, such as statements and systems of statements, e.g., books, stories, hypotheses and models.
• Learning. Learning may be intended as a representative process of generalizing, being related to cognitive restructuring in such a way as to build up models for larger, i.e., more general, cases.
• Metaphor. It allows description of something less well-known in terms of something more well-known. It is a kind of hypothetical analogy between the familiar and the less familiar. For this reason the process of producing metaphors may also be very misleading, because the suggested analogy may be completely unsuitable.
• Modeling. It has a particular power of generalization because knowledge is not only represented to be transmitted, recorded and used to induce the generation of other knowledge, but also for simulation.
• Relations. They constitute a very basic kind of correspondence between elements in different sets. Putting different sets in correspondence with each other is the first basic step towards the possibility of transporting properties from one set to others.
• Structures. They are simple ways to represent symbolic knowledge and may be generalized as properties through transpositions between different sets.
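The structure-preserving character of a morphism, mentioned in the list above, admits a simple numerical illustration (a sketch added here, not part of the original text): the exponential maps the reals under addition to the positive reals under multiplication, and the defining property h(a + b) = h(a) · h(b) can be checked on random samples.

import math
import random

def is_homomorphic(h, op_src, op_dst, samples):
    """Check h(op_src(a, b)) == op_dst(h(a), h(b)) on sample pairs,
    up to floating-point tolerance."""
    return all(math.isclose(h(op_src(a, b)), op_dst(h(a), h(b)))
               for a, b in samples)

rng = random.Random(0)
pairs = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(100)]
print(is_homomorphic(math.exp,
                     lambda a, b: a + b,    # structure in the source
                     lambda a, b: a * b,    # structure in the target
                     pairs))                # -> True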
When considering processes of generalization we may also consider the measurability of generalization, to deal with questions such as:
• Which property is more general?
• How general is a property?
Quantifying generalization may relate to the extension of the domain of validity, by considering limitations. For instance, flexible and diluted are properties which may be used in a general way beyond their original domains of validity, physics and chemistry. Which property is more general? The one provided with the larger domain of validity. Probably the property of being flexible applies to a larger number of cases than the property of being diluted, because the latter applies to environments having, even metaphorically, different densities, whereas the former applies to virtually any environment. To answer the question "How general is a property?" it is necessary to introduce a measure. This is not the focus of this paper. For our purpose it is sufficient to consider that it is possible to define the problem. Another issue is what we may call "artificial generalization". Examples of artificial processes of generalization include, for instance, areas of artificial intelligence such as pattern recognition, image understanding, language processing and automated diagnosis. Another aspect concerns the ability of artificial systems to learn, i.e., to generalize non-explicitly represented knowledge, as in artificial Neural Networks. Another example concerns reasoning by example. Most of these approaches are based upon Gestalt principles [12,13,14]. Their application is related to methods, focussing on the
general problem of pattern recognition, for generalizing experimental data, allowing computers to use sets of examples, as in geology and medicine. This subject is not the focus of this paper; we only mention that the problem can also be considered from this point of view. We conclude this Section with some comments regarding the question "Is it possible not to generalize?". Generalizing is a property of cognitive systems. Systems having cognitive systems of different levels of complexity generalize in different ways and at different levels. This may easily be detected by considering animal behavior. Moreover, any learning process is based on some level of generalization. So the question does not apply to systems provided with cognitive systems. We are interested in how and what to generalize, as discussed in the following section.

5. Generalization for systemic properties

Systemic properties, as introduced above, are general in that they not only refer to categories of items, but are properties adopted by categories of new entities, i.e., systems. How the interaction between elements is a necessary condition for the establishment of systems has been widely discussed in the literature [7]. The so-called General System Theory (GST) was introduced by von Bertalanffy [15] for systems design, properties, usage, representation, and inter- and trans-disciplinary applications. Inter-disciplinarity occurs when approaches and models of one discipline are used by another, while trans-disciplinarity arises when properties are studied per se, considered without reference to specific disciplinary cases; trans-disciplinarity also studies the relationships between properties. In this line of thought we had the establishment of various approaches such as Systems Dynamics [16], Systems Theory [17], Systems Engineering [18], Information Systems [19], Living Systems [20], the Social Study of Information Systems [21], the Soft Systems Approach [22], and Systems Practice [23]. After considering processes such as collective behavior, self-organization, emergence and the constructivist role of the observer, the focus in the literature was then generalized by considering 1) how collective phenomena establish systems, 2) how processes of acquisition of new properties take place within systems, 3) hierarchies of emergence and interactions between them, 4) multi-modeling and 5) the establishment of Multiple Systems. The new problems relate to the search for a General Theory of Emergence (GTE) to account for any kind of collective phenomenon [24]. This expression is used in various disciplinary contexts with the aim of generalizing processes of emergence, as when introducing the concept of an "evolutionary Mechanics" [25] and for an
ontology of levels and when discussing agent-based computing [26]. While the framework of GST regards systems established through organization, structured into subsystems and possessing properties to be managed by using inter- and trans-disciplinary approaches, GTE deals with processes of the establishment of systems through emergence, where the dynamics relate not only to processes with respect to time, but also to multi-modeling, hierarchies of interacting levels, the acquisition of new properties and the multiple roles of elements. In this way GTE considers GST as a particular case. Moreover, GST has been controversially named a "Theory" even though it is more of an approach and a cultural framework. Theories are statements of a language having generalized explicative and predictive power. In science a theory consists of hypotheses related to experimental data and logically connected, as in the hypothetico-deductive method. A theory is then a formalization of observations allowing it to be used as a model to explain, predict, simulate, and to be falsified. Examples of very well-known theories are: electromagnetism, game theory, gravitation, quantum theory and relativity. The term theoretical relates to the representation of a result which is predicted by theory but has not yet been observed. Typical examples include the prediction of the existence of black holes. Failed predictions are useful to prove a theory wrong, as in the famous Michelson-Morley experiment performed to detect the aether wind. GTE is expected to become a real theory able to explain collective processes in different disciplinary fields. It is conceptually possible to use theories of phase transitions in any disciplinary field if we are able to properly describe processes using suitable variables in the equations. In the same way we could have either a single GTE or, more likely, different disciplinary approaches expressing the same principles in different ways (at the moment we have neither!).
6. General Theory of Emergence
The study of emergence and emergent phenomena is at the focus of current research interests in several disciplines, such as Physics, Biology, Artificial Intelligence, Economics and Cognitive Science. There are general theories such as Analytical Mechanics, related to a mechanical description of phenomena; Thermodynamics, related to a description of phenomena in terms of energy and the collective motion of particles; theories of phase transitions, related to changes of state of matter (e.g., solid, fluid, gaseous, superconductive and superfluid), the development of biological organisms, learning and aspects of social systems; and Synergetics, related to processes of self-organization (describable through order parameters) of patterns and structures
in open systems far from thermodynamic equilibrium, as in many different physical, biological, chemical and social systems. For such modeling it is possible to achieve generality by giving different meanings to the variables (e.g., particle, cell, or social agent as buyer in a market). The novelty is that a GTE should deal with hierarchies of processes of emergence, where a property emerges from the interaction of entities themselves emerging from lower-level structures, such as in Baas hierarchies [27,28,29], with acquired emergent properties (see the paper presented at this conference) and with systemic, emergent properties having current theories as particular cases. It may be expected to be a theory of modeling emergent phenomena, a meta-theory like meta-mathematics (i.e., mathematics used to study the foundations and methods of mathematics) within a formalist approach. The purpose in this case is not to try to demonstrate self-coherence (destroyed by Gödel's theorem), but to study the relationships and interdependence between models. Meta-theorizing is now being focussed upon in various disciplinary fields, including physics [30]. Meta-theories are related to modeling by using models, i.e., meta-modeling. Meta-modeling is a consolidated approach in several disciplines, such as software engineering, which uses meta-languages to describe other languages and to create semantic models [31,32,33,34]. Another example of meta-modeling in systems science is the concept of logical openness, related to the establishment of meta-levels, i.e., models of models [35]. Finally, there is the concept of DYSAM, not a single, procedural, rule-based methodology, but a systemic general model, a meta-model (i.e., a model of models), used to carry out specific, contextual methodologies [7]. A GTE is thus very closely related to the generalization of processes establishing collective phenomena which possess and generate multiple systemic properties. It is expected, at least, to:
• deal, in a systematic way, with correspondences between models and representations of phenomena considered emergent;
• allow the development of tools for detecting and verifying processes of emergence and de-emergence in general [7,36];
• identify and classify possible different non-equivalent kinds of emergence, such as biological and physical [5];
• identify the limits of its generalization. It is necessary to define the domain of validity of such a theory, because any attempt to produce a theory having unlimited validity carries in itself the contradiction stated by the well-known Gödel theorem in meta-mathematics.
Table 1. Emergence of new properties in emergent systems.

Emergent systems | Properties of the emergent systems | Emergent properties within emergent systems
Social systems   | Language and population dynamics   | Swarm Intelligence and collective learning abilities (e.g., industrial districts)
Living systems   | Homeostasis, Autopoiesis           | Psychosomatic illnesses
Brain            | Cognitive abilities                | Mental illnesses
GST should be understood as a general theory of processes establishing systems and systemic properties. The general process to be theorized is that of emergence [37], which encompasses that of organization. In its turn, a GTE should have particular theories as specific cases and should be able to multi-model processes of the establishment of systems and of systemic properties. This relates to meta-properties, such as the ability to:
• make a process emergent (i.e., induce processes of emergence);
• detect the occurrence of a process of emergence;
• make a process de-emerge (i.e., disappear);
• transform one process of emergence into another;
• manage a process of emergence (e.g., slow it down, speed it up, split it, change its parameters);
• mix processes of emergence (e.g., swarming and markets);
• separate processes of emergence;
• find possible categories of non-equivalent processes of emergence;
• describe interactions between levels in hierarchies making new properties emerge;
• distinguish between processes of emergence of systems and processes of emergence of properties in complex systems (i.e., processes of emergence within processes of emergence).
In this view, systems not only possess properties; new properties may also be established through processes of emergence taking place within them (Tab. 1). GTE may be able to generate meta-models, i.e., models of models of specific processes of emergence as studied in the literature. We may have an anticipation of this theory when considering systemic properties as trans-disciplinary properties and trying to represent relationships between them. Using this language we may, for instance, describe conditions for their compatibility/incompatibility, sequential occurrence over time, randomness, multi-modeling, and power to activate, detect, de-emerge, transform, influence, mix, and separate processes of
emergence. Today we have phenomenological approaches, but not a theory able to model embedded processes of the emergence of properties. Systems do not only objectively possess properties, but are also able, in their turn, to make new ones emerge (complex systems are systems within which processes of emergence occur). Examples of the emergence of systemic properties in systems established by processes of emergence are given by cognitive abilities in natural and artificial systems, collective learning abilities in social systems such as flocks, swarms, markets and firms, and functionalities in networks of computers (e.g., the Internet). Evolutionary processes are assumed to establish properties in living systems together with processes of self-organization [38]. The generality that we have today relates to the validity of a theory in different domains. With GTE we are looking for a theory of properties, having such theories as particular cases. We also think that a good approach is both to try to clarify why theories such as, for instance, Synergetics, theories of phase transformations and dissipative structures, and quantum models of emergence are not themselves General Theories of Emergence (i.e., what is missing?) and to ask how it is possible to generalize them in such a way as to extend them towards a more general one. On this point we recall von Bertalanffy's expression about the Unity of Science: "A unitary conception of the world may be based, not upon the possibly futile and certainly farfetched hope finally to reduce all levels of reality to the level of physics, but rather on the isomorphy of laws in different fields" [15]. Is this the prospect for a General Theory of Emergence?
7. Conclusions
Scientific research is a continuous balance between delving deeper and deeper into details and generalizing results in order to apply them to a more general domain and to create relations between specialized results at a suitable level of description. We have discussed here the definition of generalization with special regard to the context of systems research and emergence. The problem of generalizing as related to systems research, and particularly that of founding a General Theory of Emergence, has been discussed. We have introduced a framework within which it is possible to build up the basis of such a theory as a meta-theory, based on meta-modeling. A first step in this endeavor is to find out why theories such as Synergetics, theories of phase transformations and dissipative structures, and quantum theories are not themselves General Theories of Emergence, and how they may be particular cases of a more general theory. The interest in laying the basis of such a theory is related to the possibility of
managing systemic properties and processes of emergence, as with DYSAM and other approaches. Those improvements are expected, for instance, to allow:
• the availability of a theory able to go beyond the cross-disciplinary usage of models (i.e., inter-disciplinarity), towards the usage of multiple modeling, such as for CBs, and the modeling of embedded processes of emergence, as for the emergence of properties in emergent systems (i.e., trans-disciplinarity);
• the search for systemic properties where they have not yet been considered;
• transfer between different levels: the study of problems in specific disciplines and of details used to define systemic properties, and the use of systemic properties for acting upon those details.
The aim of this paper was to introduce:
• the meaning of generalization;
• how the issue of generalization is related to emergence and Systemics;
• a framework for the search for a General Theory of Emergence.
References
1. J.H. Holland, K.Y. Holyoak, R.E. Nisbett and P.R. Thagard, Induction (MIT Press, Cambridge, MA, 1986).
2. C.W. Evers and E.H. Wu, Journal of Philosophy of Education, 511 (2006).
3. T.M. Mitchell, Machine Learning (McGraw-Hill, New York, 1997).
4. G. Nicolis and I. Prigogine, Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations (Wiley, New York, 1977).
5. E. Pessa, in Systemics of Emergence: Research and Development, Ed. G. Minati and E. Pessa (Springer, New York, 2006), pp. 355-374.
6. G. Minati and S. Brahms, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer, New York, 2002), pp. 41-52.
7. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
8. J. Hintikka and G. Sandu, The Journal of Philosophy 290 (1992).
9. A. Rohn, The Journal of Symbolic Logic 25 (1941).
10. R. Butts and J. Brown, Eds., Constructivism and Science (Kluwer, Dordrecht, Holland, 1989).
11. S. Guberman and G. Minati, Dialogue about Systems (Polimetrica, Milan, Italy, 2007).
12. M. Bongard, Pattern Recognition (Spartan Books, New York, 1970).
13. M. Wertheimer, Productive Thinking (Harper, New York, 1943).
14. M. Wertheimer, Social Research, 78 (1944).
15. L. von Bertalanffy, General System Theory: Foundations, Development, Applications (George Braziller, New York, 1968).
16. J.W. Forrester, Industrial Dynamics (MIT Press, Cambridge, MA, 1961).
17. S.A. Umpleby and E.B. Dent, Cybernetics and Systems 79 (1999).
18. W.A. Porter, Modern Foundations of Systems Engineering (MacMillan, New York, 1965).
19. R. Hirschheim, H.K. Klein and K. Lyytinen, Information Systems Development and Data Modeling: Conceptual and Philosophical Foundations (Cambridge University Press, Cambridge, 1995).
20. J.G. Miller, Living Systems (McGraw-Hill, New York, 1978).
21. C. Avgerou, Omega, 567 (2000).
22. P. Checkland and J. Scholes, Soft Systems Methodology in Action (Wiley, New York, 1990).
23. P. Checkland, Systems Thinking, Systems Practice (Wiley, New York, 1981).
24. G. Minati, in Systemics of Emergence: Applications and Development, Ed. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), pp. 667-682.
25. J.P. Crutchfield, in Complexity: Metaphors, Models, and Reality, Ed. G. Cowan, D. Pines and D. Meltzer (Addison-Wesley, Reading, MA, 1994), pp. 515-537.
26. C. Emmeche, S. Køppe and F. Stjernfelt, Journal for General Philosophy of Science 83 (1997).
27. N.A. Baas, in Alife III, Santa Fe Studies in the Science of Complexity, Proc. Volume XVII, Ed. C.G. Langton (Addison-Wesley, Redwood City, CA, 1994), pp. 515-533.
28. N.A. Baas and C. Emmeche, Intellectica 67 (1997).
29. K. Kitto, Modeling and Generating Complex Emergent Behavior, Ph.D. thesis, The School of Chemistry, Physics and Earth Sciences (The Flinders University of South Australia, 2006).
30. S. Blaha, The Metatheory of Physics Theories, and the Theory of Everything as a Quantum Theory Computer Language (Pingree-Hill Publishing, Auburn, NH, 2005).
31. G. Booch, J. Rumbaugh and I. Jacobson, The Unified Modeling Language User Guide (Addison Wesley Longman Publishing Co., Redwood City, CA, 1999).
32. J.P. van Gigch, System Design Modeling and Metamodeling (Plenum Press, New York, 1991).
33. J.P. van Gigch, Applied General Systems Theory (Harper & Row, New York, 1978).
34. J.P. van Gigch, Metadecisions: Rehabilitating Epistemology (Kluwer, New York, 2003).
35. G. Minati, M.P. Penna and E. Pessa, Systems Research and Behavioral Science 131 (1998).
36. G. Minati, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer, New York, 2002), pp. 85-102.
37. J.P. Crutchfield, Physica D, 11 (1994).
38. S. Kauffman, Investigations (Oxford University Press, New York, 2000).
UNCERTAINTY, COHERENCE, EMERGENCE
GIORDANO BRUNO
Department of MEMOMAT, Sapienza University of Rome
Via A. Scarpa 16, 00164 Rome, Italy
E-mail: [email protected]
In a previous paper (Uncertainty and the Role of the Observer, co-authored with G. Minati and A. Trotta, Proceedings of the 2004 Conference of the Italian Systems Society, in publication by Springer), we focused on the deep epistemological contribution of the Italian mathematician Bruno de Finetti (1906-1985), from a systemic point of view. He considered the probability of an event nothing but the degree of belief of the observer in its occurrence, relating this degree of belief to the information available, at that moment, to the observer. He pointed out how, when considering probability, we need to focus on the role of the observer expressing the degree of belief and how S/He can construct a system of coherent probabilities. The purpose of this paper is to show how this subjective conception of probability is based on assuming a systemic framework, even in cases of conditional events. In this regard, we underline how the fundamental conceptual and methodological tool is the well-known Bayes Theorem. With reference to this theorem, we introduce examples to show how its usage is not only crucial in generating probabilities suitable for the emergence of a system of coherent evaluations, but is even able to explain some paradoxical aspects.
Keywords: subjective probabilities, Bayes theorem, role of the observer, coherent evaluations.
"It would therefore seem natural that the habitual ways of thinking, of reasoning, of deciding should explicitly and systematically hinge on the factor of uncertainty as the conceptually preeminent and determining element." (Bruno de Finetti, Teoria delle probabilità, Einaudi, 1970)
1. Introduction
In a previous paper [1] we focused on the deep epistemological contribution of the Italian mathematician Bruno de Finetti (1906-1985), from a systemic point of view. He considered the probability of an event nothing more than the degree of belief of the observer in its occurrence, relating this degree of belief to the information available, at that moment, to the observer [2]. The goal of the paper was to show how the subjectivist approach to probability has a systemic validity, in the sense that the observer plays a fundamental role in bringing about the emergence
of a system of coherent probabilities when S/He must assign a set of probabilities to different events relating to a given random phenomenon. We now recall the principal aspects treated there. First of all, it was remarked how it is possible to assign to a family of events a qualitative measure of probability, by means of a natural order relation: not less possible than [3]. This relation lets us construct an axiomatic probability theory, even in the case of conditional events, by introducing a further axiom. Obviously the observer is free, according to S/His opinion, to choose the preferred relation among all those that are admissible. Secondly, it was recalled how de Finetti introduced a numerical measure of the degree of belief in an event. By referring to a betting scheme, he determines such a measure (probability) as the price p which a coherent person is willing to pay to get 1 if the event turns out to be true, or 0 otherwise. In this definition of probability, a coherent person is a person who accepts only bets in which S/He does not face an "a priori" loss. This subjective approach to probability, founded on this numerical measure, suggests how de Finetti's conception is best able to assure a systemic procedure, based on the role of the observer and on the tool of coherence.
2. Conditional events and their probabilities
In this paper we continue our argument in the case of conditional events. We recall that, given any two events E and H (H ≠ Φ), we can consider a conditional event E|H, which has the following meaning:

$$E|H = \begin{cases} \text{TRUE}, & \text{if } H \text{ true and } E \text{ true} \\ \text{FALSE}, & \text{if } H \text{ true and } E \text{ false} \\ \text{INDETERMINATE}, & \text{if } H \text{ false} \end{cases}$$
So if we want to bet on E|H we must do it, following de Finetti [2], in the following way:

$$\text{PAY } p \text{ TO GET } \begin{cases} 1, & \text{if } H \text{ true and } E \text{ true} \\ 0, & \text{if } H \text{ true and } E \text{ false} \\ p, & \text{if } H \text{ false} \end{cases}$$
By this definition of a conditional bet, we can estimate the uncertainty of E|H by means of the price p. This measure is named conditional probability and is denoted by P(E|H). De Finetti [2] proves that the conditional probability P(E|H), so introduced, satisfies all the properties (axioms) of a probability and represents the probability of E, H supposed true. In particular, he shows how the natural condition of coherence (random winnings "not all negative" in a set of bets) leads, for any events E and H (H ≠ Φ), to the theorem:

$$P(E \cap H) = P(H)\,P(E|H);$$

and to its corollary, the well-known Bayes Theorem (E ≠ Φ):

$$P(H|E) = K\,P(H)\,P(E|H), \quad \text{with } K = \frac{1}{P(E)},\ P(E) \neq 0.$$

3. Bayes theorem meaning and its applications
Let us dwell on the meaning of Bayes Theorem. If we regard E as an event which represents an experimental result of a random phenomenon and H a hypothesis concerning the same phenomenon, then the theorem asserts that the probability of H conditionally on E is proportional to the probability of the hypothesis H multiplied by the probability of E conditionally on H. To clarify this explanation, let us resort to the classical model of an urn of unknown composition. Let us consider an urn which contains N balls, of which an unknown number are red (from 0 to N red balls). Let us indicate by E the event "h red balls out of n", as a possible result of an extraction of n balls (for example, without replacement). Let the event "in the urn there are r red balls out of N" be the hypothesis H. The Bayes theorem allows us to evaluate the probability of the hypothesis H conditionally on the experiment E (called final probability): it is proportional, by the factor K, to the probability of H (called initial probability) multiplied by the probability of E conditionally on H (called likelihood). In other words, the Bayes theorem shows us how we must update our evaluations in the presence of further information (better, by supposing we receive further information):

final probability = K × initial probability × likelihood.
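As a numerical aside (ours, not part of the original paper), the updating schema can be written as a few lines of Python; the function name bayes_update and its argument names are our own illustration, and the normalization step plays the role of the factor K = 1/P(E):

```python
def bayes_update(priors, likelihoods):
    """Posterior P(H_j|E) from initial probabilities P(H_j) and likelihoods P(E|H_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H_j) * P(E|H_j)
    p_e = sum(joint)                                      # P(E), summed over a partition
    if p_e == 0:
        raise ValueError("P(E) must be non-zero")
    return [x / p_e for x in joint]                       # multiply by K = 1/P(E)
```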
Let us observe that initial and final have, in this context, only the meaning of before and after E becomes known, respectively. Of course, in the same way we must evaluate the final probability of the contrary hypothesis H^c, obtaining:

$$P(H^c|E) = K\,P(H^c)\,P(E|H^c), \quad \text{with } K = \frac{1}{P(E)},\ P(E) \neq 0.$$

Much more generally, if we want to evaluate the final probabilities of m different hypotheses H_j, which form a partition of the certain event Ω, we obtain the following expression of Bayes theorem:

$$P(H_j|E) = K\,P(H_j)\,P(E|H_j), \quad j = 1, 2, \ldots, m; \quad \text{with } K = \frac{1}{P(E)}, \quad P(E) = \sum_{j=1}^{m} P(H_j)\,P(E|H_j), \quad P(E) \neq 0.$$
Let us go back to our example. The H_j represent the possible hypotheses about the urn's composition. We may, after an initial guess, formulate, via Bayes Theorem, the final answer. In the general framework of objectivistic probability (classical as well as frequentist probability) the observer has only to execute, in the correct manner, the calculations, using symmetries or self-similarity (we need to point out that some subjective choices have been made even here, e.g., all the outcomes are considered equiprobable and the extractions independent). In these cases Bayes Theorem loses part of its importance and stands as a pure mathematical result. In de Finetti's subjectivistic context [2], Bayes Theorem better shows the meaning of "learning by experience". This is exactly what happens in medical practice. While investigating a possible illness, a medical doctor usually starts from an initial guess, then asks for specific instrumental exams and, on the basis of the exams' results, comes out with the final answer. Even in this case we are in the realm of uncertainty: the event "the patient suffers from such an illness" is only possible, neither certain nor impossible. Fortunately, despite human and instrumental errors, the probabilities of discovering an illness are close to 1 (or 0). In applications the maximum likelihood method is often used to estimate the value of a parameter that may contribute to some stochastic phenomenon: one computes the probabilities of E (probability densities in the case of a continuous parameter) conditionally on the hypotheses H_j (the likelihoods) and takes the hypothesis with the maximum value as the parameter's estimate.
In the urn example we evaluate P(E|H_j) for all j, and we take the largest value: assuming that the maximum of P(E|H_j) is obtained for j = 3, we say that the hypothesis H_3 is an estimate of the "true" urn composition. Let us imagine that the urn contains ten balls, without our knowing how many of them are red. We sample five balls (extracting one ball at a time and putting it back into the urn); three are red. We call E this outcome. We now evaluate P(E|H_j), j = 0, 1, ..., 10, and we get:

$$P(E|H_j) = \frac{5!}{3!\,2!} \left(\frac{j}{10}\right)^3 \left(\frac{10-j}{10}\right)^2.$$
In particular, we have:
P(E|H_0) = 0
P(E|H_1) = 0.0081
P(E|H_2) = 0.0512
P(E|H_3) = 0.1323
P(E|H_4) = 0.2304
P(E|H_5) = 0.3125
P(E|H_6) = 0.3456
P(E|H_7) = 0.3087
P(E|H_8) = 0.2048
P(E|H_9) = 0.0729
P(E|H_10) = 0

The maximum value, 0.3456, is attained for H_6; therefore we conclude that the (maximum likelihood) estimate of the urn's composition is: six red balls out of ten. There are two side effects in this approach, both of a logical nature. First of all, we are using the inverse conditional probabilities P(E|H_j) instead of the P(H_j|E), which are the correct ones; only through the P(E|H_j) may one be able to find out the most probable urn composition among the available hypotheses and take it as the estimate of the urn's composition. Moreover, and this is the second side effect, we have not made any use of the P(H_j) (the hypothesis probabilities), and this may lead to a wrong conclusion. In the urn-composition example, if one uses (as one should) Bayes Theorem to evaluate P(H_j|E), one needs the P(H_j). Any distribution of them other than the equiprobable one may lead to a different result.
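These likelihood values can be checked with a short Python sketch (ours, not part of the paper); with replacement, P(E|H_j) is simply a binomial probability:

```python
from math import comb

N, n, h = 10, 5, 3   # balls in the urn, draws (with replacement), red balls observed

def likelihood(j):
    """P(E|H_j): probability of observing 3 red in 5 draws when j of the 10 balls are red."""
    p = j / N
    return comb(n, h) * p**h * (1 - p)**(n - h)

likes = [likelihood(j) for j in range(N + 1)]
j_hat = max(range(N + 1), key=lambda j: likes[j])
print(j_hat, round(likes[j_hat], 4))   # 6 0.3456: the maximum likelihood estimate
```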
That is, the maximum posterior probability may not be associated with H_6. If we know, by chance, that it is more probable to find in the urn as many red balls as non-red balls, i.e., P(H_5) = 0.6, P(H_4) = P(H_6) = 0.15, and P(H_0) = P(H_1) = P(H_2) = P(H_3) = P(H_7) = P(H_8) = P(H_9) = P(H_10) = 0.0125, we get:

P(H_0|E) = 0
P(H_1|E) = 0.00035
P(H_2|E) = 0.00225
P(H_3|E) = 0.00581
P(H_4|E) = 0.12185
P(H_5|E) = 0.66111
P(H_6|E) = 0.18278
P(H_7|E) = 0.01357
P(H_8|E) = 0.00902
P(H_9|E) = 0.00320
P(H_10|E) = 0

Therefore we now have H_5 as the most probable hypothesis. Let us discuss a different example, perhaps even more significant for its "paradoxical" outcomes. Paul is looking for a new job, but he does not show up for the interview. Let us denote this event by E. The human resources director wants to know why, and he comes up with some hypotheses:
H_1 = Paul found a new job;
H_2 = Paul went to jail;
H_3 = Paul won a lottery, or any other different reason.
If the director uses the likelihood alone to reach a conclusion, he will come out with H_2, since H_2 implies E and therefore P(E|H_2) = 1. On the other hand, using Bayes Theorem, the most probable hypothesis may not be H_2. As a matter of fact, if we assume P(E|H_1) = 0.6, P(E|H_3) = 0.2 and P(H_1) = 0.7, P(H_2) = 0.25, P(H_3) = 0.05, we get:
P(H_1|E) = 0.618
P(H_2|E) = 0.368
P(H_3|E) = 0.014

Hence the most "reasonable" hypothesis is also the most probable.
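Both final computations, the urn with the non-uniform prior and Paul's missed interview, can be reproduced in a few self-contained lines of Python (again our sketch, not the authors' code):

```python
from math import comb

def bayes_update(priors, likelihoods):
    """Posterior P(H_j|E) proportional to P(H_j) * P(E|H_j), normalized by P(E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return [x / sum(joint) for x in joint]

# Urn: prior concentrated on H_5 ("as many red balls as non-red ones")
likes = [comb(5, 3) * (j / 10)**3 * (1 - j / 10)**2 for j in range(11)]
priors = [0.0125] * 4 + [0.15, 0.6, 0.15] + [0.0125] * 4
post = bayes_update(priors, likes)
print(round(post[5], 5))   # 0.66108, vs 0.66111 in the text (small rounding differences)

# Paul: H_1 = new job, H_2 = jail, H_3 = lottery or other
post = bayes_update([0.7, 0.25, 0.05], [0.6, 1.0, 0.2])  # P(E|H_2) = 1 since H_2 implies E
print([round(p, 3) for p in post])  # [0.618, 0.368, 0.015]; the text truncates the last to 0.014
```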
4. Conclusions
The examples we have been discussing suggest the following remarks. Bayes Theorem is the cornerstone of "coherence" upon which we build the updating of probabilities. Only by using Bayes Theorem is one able to reduce the uncertainty, though not to eliminate it, and thereby come to conclusions. It represents a good example of non-linear thinking. The updating of probabilities must follow a unique principle: coherence. It guarantees that the observer does not fall into contradictions when moving from initial to final guesses. The final probabilities of the m events are admissible if and only if they satisfy Bayes Theorem. The observer, therefore, is able to pick one or more sets of admissible hypotheses by coherently using his degree of belief in the events being considered and the available information. The observer models the emerging system (the set of events interacting with the probabilities assigned by the observer himself) taking into account the likelihoods, which are merely probabilities even though they are considered as "certain" data. The observer is eventually the sole "responsible" for his evaluations, and in any case it is not fair to say whether he was right or wrong in his prediction. This is because any prediction lives in the realm of uncertainty and cannot become a forecast (the realm of certainty). One may only argue whether the observer has been coherent or not. From what we have discussed in the present work, as well as in the previous one, we may deduce that the uncertainty logic of Bruno de Finetti [4] is a significant example of a systemic approach to the study of "reality", and it may be considered as an example of logical openness. The observer, as a matter of fact, in his quest for an admissible system of probabilities, does not need (as in the objectivistic framework) to know the whole space of events, its elements and their probabilities; he can start from the probability evaluation of a single event and proceed step by step in evaluating the probabilities of those events he is interested in. Obviously the observer must consider all the interactions among the events (in the unconditioned as well as in the conditioned cases), and he must respect, as we have mentioned several times, "only" the coherence.
References 1. G. Bruno, G. Minati and A. Trotta, in Proceedings of the 2004 Conference of the Italian Systems Society (Springer, New York, 2006).
2. B. de Finetti, Theory of Probability, A Critical Introductory Treatment (translated by A. Machi and A. Smith) (Wiley, London, 1974).
3. B. de Finetti, Annales de l’Institut Poincaré 7(1), (1937). 4. B. de Finetti, La logica dell'incerto (Il Saggiatore, Milano, 1989).
EMERGENCE AND GRAVITATIONAL CONJECTURES
PAOLO ALLIEVI(1), ALBERTO TROTTA(2)
(1) Sogin S.p.A., Via Torino 6, 00184 Roma, Italy
E-mail: [email protected]
(2) Department of Mathematics, Scientific Lyceum "Innocenzo XII"
Via Ardeatina 87, 00042 Anzio (RM), Italy
E-mail: [email protected]
The behaviour of coherent structures emerging as the outcome of a phase transition can be ruled by classical or by quantum laws. The latter circumstance depends in a critical way on the relative importance of quantum fluctuations which, in turn, depends on the numerical value of Planck's constant. In this paper we explore the consequences of the hypothesis according to which there are different kinds of Planck's constant, each one related to the kind of interaction entering into play in the specific phase transition. Within this paper we deal with the simplest case, in which we have only two Planck's constants: the usual one, interpreted as related to electromagnetic interactions, and another, related to gravitational interactions. We feel this framework should be useful to describe cosmological phase transitions, such as galaxy and star formation, as well as the birth of black holes. According to our hypotheses, these emerging coherent structures should be ruled by suitable quantum laws (expressed, for instance, by a suitable kind of Schrödinger equation), including a "gravitational" Planck's constant. Even if the present paper deals with the particular case of gravitational interactions, it seems that its methodology could also be useful to study other kinds of emergent phenomena.
Keywords: Planck's constant, gravitational interaction, corpuscular models, gravitational waves.
1. Introduction
The behaviour of coherent structures emerging as the outcome of a phase transition can be ruled by classical or by quantum laws. The latter circumstance depends in a critical way on the relative importance of quantum fluctuations which, in turn, depends on the numerical value of Planck's constant. Usually people assume the latter to be a universal constant, whose value is very small. It is, however, worthwhile to explore the consequences of the hypothesis according to which there are different kinds of Planck's constant, each one related to the kind of interaction entering into play in the specific phase transition. Were this the case, the whole theory of emergence would have to be reformulated. Within this paper we deal with the simplest case, in which we have only two Planck's constants: the usual one, interpreted as related to electromagnetic interactions, and another,
related to gravitational interactions. In order to keep the theory as simple as possible, we adopted a semiclassical description, based on a heuristic corpuscular model of long-range interactions, which lets us find the numerical value of the "gravitational" Planck's constant, as well as give a more correct estimate of the frequency of gravitational waves. We feel this framework should be useful to describe cosmological phase transitions, such as galaxy and star formation, as well as the birth of black holes. According to our hypotheses, these emerging coherent structures should be ruled by suitable quantum laws (expressed, for instance, by a suitable kind of Schrödinger equation), including a "gravitational" Planck's constant. We even found a more general formula to compute the value of the associated Planck's constant for whatever kind of long-range interaction, provided it be described by a corpuscular model. Even if the present paper deals with the particular case of gravitational interactions, it seems that its methodology could also be useful to study other kinds of emergent phenomena. As a consequence of the foregoing, the paper deals with the following topics:
• the order of magnitude of the graviton wavelength λG = 10^13 m and frequency νG = c/λG = (3·10^8)/(10^13) ≅ 10^-5 Hz (section 2);
• the gravitational Schrödinger's Equation (section 4) and the gravitational Planck's constant hg = 2·10^21 h (section 6, where h = 6.62·10^-34 Js is the electromagnetic Planck's constant);
• the quantized orbits of the solar system (section 7);
• the limit mass Mlimit = 284 solar masses of the black Hole (section 8).
The conclusions are important because it is possible to place the frequency range of the gravitational waves around 10^-5 Hertz, a circumstance allowing precise experimental tests. The gravitational Planck's constant hg = 2·10^21 h and the graviton wavelength λG allow us to estimate the order of magnitude (10^100) of the ratio of the largest (lPlanckG = 10^67 cm) to the smallest (lPlanckE = 10^-33 cm) Universe dimension. The topic has been treated in a classical/quantum way to find a confirmation of a heuristic corpuscular Model of Electrodynamics and Gravitation (see Table 1, where the orders of magnitude of some parameters of interest are shown, and Table 2). In fact, considering a heuristic corpuscular Model of Electrodynamics [1], based on the hypothesis that every charge naturally and continually emits particles of energy εf ≅ 10^-24 eV and diameter df ≅ 10^-19 m, it is possible to state the mathematical structures of ε0 (permittivity of free space) and h (electromagnetic Planck's constant).
Table 1. Comparison between Atomic and Solar Systems.

ATOMIC System                                                       | SOLAR System
Particle     | Mass (kg) | Mass (eV) | Diameter D (m) | Wavelength (m) | 10^22·D (m) | Diameter (m) | Mass (kg) | Heavenly body
Atom U238    | 4E-25     | 2E+11     | 1E-10          | -              | 1E+12       | 1.6E+12      | -         | Sun-Jupiter
Photon (uv)  | 2E-35     | 1E+01     | 1E-10          | 1E-07          | 1E+12       | -            | -         | -
Nucleus U238 | 4E-25     | 2E+11     | 1E-13          | -              | 1E+09       | 1.4E+09      | 2E+30     | Sun
Nucleon      | 1.67E-27  | 9E+08     | 1E-14          | -              | 1E+08       | 1.4E+08      | 2E+27     | Jupiter
Graviton     | 2E-34     | 1E+02     | 1E-14          | 1E-13          | 1E+08       | -            | -         | -
Electron     | 9E-31     | 5E+05     | 1E-15          | -              | 1E+07       | 1.3E+07      | 6E+24     | Earth
Fotino       | 2E-60     | 1E-24     | 1E-19          | -              | 1E+03       | 1.0E+03      | 1E+10     | mountain
Gravitino    | 2E-84     | 1E-48     | 1E-22          | -              | 1           | 1            | 1E+03     | rock
Table 2. Present universal values (constants over time??).

                 | stationary | waves
Gravitation      | G, c       | hg, c
Electromagnetism | ε0, c      | h, c
me = const, e², e
In particular for h we have [2]:
$$h = \frac{2 \cdot 9\pi}{16} \cdot \frac{k_D^2}{k^4} \cdot \frac{d_f}{\varepsilon_f\, c} \cong \frac{2 \cdot 9\pi}{16} \cdot \frac{(10^{21})^2}{(3 \cdot 10^{13})^4} \cdot \frac{10^{-19} \cdot 1.6 \cdot 10^{-19}}{10^{-24} \cdot 3 \cdot 10^{8}} = 6.62 \cdot 10^{-34}\ \mathrm{Js} \qquad (1)$$
Considering, moreover, a heuristic corpuscular model of gravitation [3], based on the hypothesis that every body naturally and continually emits particles of energy εg ≅ 10^-48 eV and diameter dg ≅ 10^-22 m, with time constant τ = cR_p²/(4m_pG) = 2·10^9 years (where R_p and m_p are respectively the proton radius and mass), it is possible to state the mathematical values of G (gravitational constant) and hg (gravitational Planck's constant). In particular for hg we have [4]:
$$h_g = \frac{2 \cdot 9\pi}{16} \cdot \frac{k_D^2}{k^4} \cdot \frac{d_g}{\varepsilon_g\, c} \cong \frac{2 \cdot 9\pi}{16} \cdot \frac{(10^{21})^2}{(3 \cdot 10^{13})^4} \cdot \frac{10^{-22} \cdot 1.6 \cdot 10^{-19}}{10^{-48} \cdot 3 \cdot 10^{8}} \cong 1.32 \cdot 10^{-12}\ \mathrm{Js} \qquad (2)$$
Some consequences of such a corpuscular model of gravitation are the following: (1) the velocity of light c decreases over time (3 m/s in 10 years); (2) the solar system expands; (3) the Earth's radius increases by 3 mm/year; (4) the earthquake recurrence period is 7 years; (5) the Earth's revolution (year) and rotation (day) periods increase by about 8 s and 0.004 s in 100 years, respectively. Therefore the ratio between hg and h is:
$$\frac{h_g}{h} = \frac{\varepsilon_f}{\varepsilon_g} \cdot \frac{d_g}{d_f} \cong 2 \cdot 10^{21} \qquad (3)$$
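As a quick numerical check (our Python sketch, not part of the paper), the ratio in Eq. (3) follows directly from the corpuscular parameters quoted above:

```python
eps_f, d_f = 1e-24, 1e-19   # fotino energy (eV) and diameter (m), from the model [1,2]
eps_g, d_g = 1e-48, 1e-22   # gravitino energy (eV) and diameter (m), from the model [3,4]

ratio = (eps_f / eps_g) * (d_g / d_f)   # Eq. (3): h_g / h
h = 6.62e-34                            # electromagnetic Planck's constant (Js)
print(f"h_g/h = {ratio:.0e}")           # 1e+21, the order of magnitude of Eq. (3)'s 2e21
print(f"h_g   = {ratio * h:.1e} Js")    # 6.6e-13 Js, same order as Eq. (2)'s 1.32e-12 Js
```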
It is noticeable that for any other process X by which particles of energy εx and diameter dx are emitted, it is possible to state the following mathematical structure for the new emerging physical quantity:
$$h_x = \frac{2 \cdot 9\pi}{16} \cdot \frac{k_D^2}{k^4} \cdot \frac{d_x}{\varepsilon_x\, c} \qquad (4)$$
2. Tuning Error
Although the theory of general relativity foresees that an accelerated mass emits gravitational waves, which travel in space at the velocity of light and consist of gravitons, since 1950 scientists have been trying in vain to prove the existence of gravitational waves. Antenna systems to catch gravitational waves have been built for resonance frequencies of the order of some kHz. In the following we suppose that there is a tuning error in the astronomical observations, in that frequencies which are too high are considered. We moreover suppose that the mass and the dynamical parameters of the heavenly bodies which emit gravitational waves determine only the wave intensity and not its frequency, which is characteristic of the elementary units (accelerated nuclei) constituting the heavenly bodies. If we refer to Table 3, we can see that, proceeding from the strong interaction to the electromagnetic one and then to the gravitational one, the characteristic frequencies associated to the particles transferring the action progressively decrease. We therefore assume that this is a natural manifestation which all existing matter shows, and this leads us to place the frequency range of the gravitational waves around 10^-5 Hertz.
Table 3. Characteristic frequencies associated to the particles transferring action.

Emitting source | Source Radius (m) | Radiation (Particle)          | Wavelength (m) | Frequency (Hz)
Nucleus         | Rn = 1E-14        | Gamma (Photon)                | λn = 1E-12     | fn = 1E+20
Atom            | RA = 1E-10        | Light (Photon)                | λA = 1E-06     | fA = 1E+14
Macromolecules  | Rmol = 1E-08      | Thermic rays (Photon)         | λmol = 1E-04   | fmol = 1E+12
Heavenly Bodies | RHB = 1E+09       | Gravitational wave (Graviton) | λG (*) = 1E+13 | fG = 1E-05

(*) λG = (RHB/RA)·λA = (10^9/10^-10)·10^-6 m = 10^13 m.
The frequency νG of the gravitational wave can be calculated as follows. During the gravitational collapse, every elementary system which constitutes the collapsing mass can prevalently be reduced (given the relative abundance of Hydrogen) to a nucleon of mass m which orbits round another nucleon, as shown in Figure 1. The order of magnitude of the wavelength λG of the graviton which is emitted from such a system is:
$$\lambda_G = 4\pi \left( \frac{r^3 c^2}{G m} \right)^{1/2} = 4\pi \left( \frac{(10^{-10})^3 \cdot (3 \cdot 10^8)^2}{6.67 \cdot 10^{-11} \cdot 1.67 \cdot 10^{-27}} \right)^{1/2} \cong 10^{13}\ \mathrm{m} \qquad (5)$$
where r = 10^-10 m is the atomic radius, c = 3·10^8 m/s the velocity of light, G = 6.67·10^-11 Jm/kg² the gravitational constant and m = 1.67·10^-27 kg the nucleon mass. The corpuscular model of gravitation, based on the loss of mass [4], leads to the same value for λG. The order of magnitude of the frequency νG of the gravitational wave is then:
$$\nu_G = \frac{c}{\lambda_G} = \frac{3 \cdot 10^8}{10^{13}} \cong 10^{-5}\ \mathrm{Hz} \qquad (6)$$
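Eqs. (5) and (6) are straightforward to verify numerically; the following Python sketch (ours, not in the paper) reproduces the quoted orders of magnitude:

```python
from math import pi, sqrt

r, c = 1e-10, 3e8            # atomic radius (m), velocity of light (m/s)
G, m = 6.67e-11, 1.67e-27    # gravitational constant, nucleon mass (kg)

lam_G = 4 * pi * sqrt(r**3 * c**2 / (G * m))  # Eq. (5)
nu_G = c / lam_G                              # Eq. (6)
print(f"lambda_G = {lam_G:.1e} m")   # 1.1e+13 m
print(f"nu_G     = {nu_G:.1e} Hz")   # 2.7e-05 Hz, i.e. around 1e-5 Hz
```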
3. Planck's Quantum-Electromagnetic Length (lPlanckE)
After the Big Bang, the following equation of energy conservation is valid:

$$E_0 = Mc^2 + U_p \qquad (7)$$

where, with reference to the Universe, E_0 is the total energy,
Figure 1. A nucleon of mass m which orbits round another nucleon (two nucleons of mass m, separated by 2r, each with velocity v).
$$M = M_{rest}\left(1 - \frac{v^2}{c^2}\right)^{-1/2} \cong M_{rest}\left(1 + \frac{1}{2}\frac{v^2}{c^2}\right)$$

the mass, v the expansion speed, r the radius, c the velocity of light, and

$$U_p = -\frac{3}{5}G\frac{M^2}{r}$$

the potential energy (G is the universal attraction constant). Differentiating Eq. (7), we have:
$$0 = dM - \frac{6}{5}\frac{GM}{c^2 r}\,dM + \frac{3}{5}\frac{GM^2}{c^2 r^2}\,dr \qquad (8)$$

which becomes:

$$1 = \frac{3}{5}\frac{GM}{c^2 r}\left(2 - \frac{M}{r}\frac{dr}{dM}\right) \qquad (9)$$
The quantization of the angular momentum yields:

$$Mvr = \frac{h}{2\pi} \qquad (10)$$

or:

$$M = \frac{h}{2\pi v r} \qquad (11)$$

where h is the Planck's constant.
Putting Eq. (11) into Eq. (9), the latter becomes:

$$1 = \frac{3}{10\pi}\frac{Gh}{c^2 v r^2}\left(2 - \frac{h}{2\pi v r^2}\frac{dr}{dM}\right) \qquad (12)$$
Bearing Eq. (35) in mind, Eq. (12) becomes:
$$1 = \frac{3}{10\pi}\frac{Gh}{c^2 v r^2}\left(2 + \frac{7}{9}\right) = \frac{5}{6\pi}\frac{Gh}{c^2 v r^2} \qquad (13)$$

Extracting r from Eq. (13), we have:

$$r^2 = \frac{5}{6\pi}\frac{Gh}{c^2 v} \qquad (14)$$
At the beginning of the Big Bang v = c, so Eq. (14) becomes, representing by lPlanckE the Planck's quantum-electromagnetic length:

$$l_{PlanckE}^2 = \frac{5Gh}{6\pi c^3} \qquad (15)$$

and finally:

$$l_{PlanckE} = \sqrt{\frac{5Gh}{6\pi c^3}} \qquad (16)$$
Utilizing the present values of the constants, namely G = 6.67·10^-11 Jm/kg², h = 6.62·10^-34 Js, c = 3·10^8 m/s, Eq. (16) yields the following value for lPlanckE:
$$l_{PlanckE} = \sqrt{\frac{5 \cdot 6.67 \cdot 10^{-11} \cdot 6.62 \cdot 10^{-34}}{6\pi (3 \cdot 10^{8})^3}} = 2 \cdot 10^{-35}\ \mathrm{m} \cong 10^{-33}\ \mathrm{cm} \qquad (17)$$
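A two-line numerical check of Eq. (16) in Python (our sketch, not part of the paper):

```python
from math import pi, sqrt

G, h, c = 6.67e-11, 6.62e-34, 3e8
print(f"{sqrt(5 * G * h / (6 * pi * c**3)):.1e} m")  # Eq. (16): 2.1e-35 m, i.e. ~1e-33 cm
```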
4. Planck's Quantum-Gravitational Length (lPlanckG)
The gravitational Schrödinger's equation, relating to a system of two bodies of which one of mass m is gravitating round the other of mass M at distance r with velocity v and momentum p = mv, is:
$$-\frac{h_g^2}{8\pi^2 m_p^2}\,\Delta\Psi - \frac{GM}{r}\,\Psi = \frac{E}{m}\,\Psi \qquad (18)$$

where Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² is the Laplacian, m_p the nucleon mass,
hg is the gravitational Planck's constant, E the total energy and Ψ(x,y,z,t) the state function of the system mentioned above. Eq. (18) is determined in the following way. Let us consider the total energy of the system:
$$\frac{mv^2}{2} - G\frac{Mm}{r} = E \qquad (19)$$

or:

$$\left(\frac{m}{m_p}\right)^2 \frac{m_p^2 (v_x^2 + v_y^2 + v_z^2)}{2m} - G\frac{Mm}{r} = E \qquad (20)$$
Let us assume the following operators (where i² = −1):

$$p_{px} = m_p v_x = \frac{h_g}{2\pi i}\frac{\partial}{\partial x}; \quad p_{py} = m_p v_y = \frac{h_g}{2\pi i}\frac{\partial}{\partial y}; \quad p_{pz} = m_p v_z = \frac{h_g}{2\pi i}\frac{\partial}{\partial z} \qquad (21)$$
Inserting these operators in Eq. (20) and multiplying both sides by Ψ(x,y,z,t), we obtain:

$$\frac{1}{2m}\left(\frac{m}{m_p}\right)^2 \left[\left(\frac{h_g}{2\pi i}\frac{\partial}{\partial x}\right)^2 + \left(\frac{h_g}{2\pi i}\frac{\partial}{\partial y}\right)^2 + \left(\frac{h_g}{2\pi i}\frac{\partial}{\partial z}\right)^2\right]\Psi - G\frac{Mm}{r}\,\Psi = E\,\Psi \qquad (22)$$
or:

$$-\frac{h_g^2}{8\pi^2 m}\left(\frac{m}{m_p}\right)^2 \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\Psi - G\frac{Mm}{r}\,\Psi = E\,\Psi \qquad (23)$$
and finally:

$$-\frac{h_g^2}{8\pi^2 m_p^2}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\Psi - G\frac{Mm}{r}\,\Psi = E\,\Psi \qquad (24)$$
Comparing Eq. (18) with the well-known Schrödinger's electromagnetic equation, which we cast in the form:

$$-\frac{h^2}{8\pi^2 m_e^2}\,\Delta\Psi - \frac{e^2}{4\pi\varepsilon_0 m_e r}\,\Psi = \frac{E}{m_e}\,\Psi \qquad (25)$$
where me and e are respectively the electron mass and charge and ε0 is the permittivity of free space, we deduce that we can utilize the results obtained from differential equation (25), taking care to substitute into them:
$$\frac{h_g}{m_p}\ \text{for}\ \frac{h}{m_e} \qquad (26)$$

$$GM\ \text{for}\ \frac{e^2}{4\pi\varepsilon_0 m_e} \qquad (27)$$
Therefore we can get the Planck's quantum-gravitational length lPlanckG utilizing the following expression of the Bohr radius for the hydrogen atom:

$$r_{Bohr} = \frac{h^2 \varepsilon_0}{\pi m_e e^2} = \left(\frac{h}{m_e}\right)^2 \cdot \frac{1}{4\pi^2} \cdot \frac{1}{\dfrac{e^2}{4\pi\varepsilon_0 m_e}} \qquad (28)$$
by the substitutions (26) and (27) and putting, for the hydrogen atom, M = m_p:

$$l_{PlanckG} = \left(\frac{h_g}{m_p}\right)^2 \cdot \frac{1}{4\pi^2} \cdot \frac{1}{G m_p} = \frac{h_g^2}{4\pi^2 G m_p^3} \qquad (29)$$
Remembering Eq. (44) in section 6, we can finally compute the following value for lPlanckG:

$$l_{PlanckG} = \frac{(2 \cdot 10^{21} h)^2}{4\pi^2 G m_p^3} = \frac{(2 \cdot 10^{21} \cdot 6.62 \cdot 10^{-34})^2}{4\pi^2 \cdot 6.67 \cdot 10^{-11} \cdot (1.67 \cdot 10^{-27})^3} \cong 10^{65}\ \mathrm{m} = 10^{67}\ \mathrm{cm} \qquad (30)$$
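Again a quick numerical check in Python (our sketch, not the authors'):

```python
from math import pi

h_g = 2e21 * 6.62e-34        # gravitational Planck's constant (Js), Eq. (44)
G, m_p = 6.67e-11, 1.67e-27
print(f"{h_g**2 / (4 * pi**2 * G * m_p**3):.1e} m")  # Eq. (30): 1.4e+65 m, i.e. ~1e67 cm
```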
Remembering Eq. (43), lPlanckG can also be computed through the expression:

$$l_{PlanckG} = \frac{h^3 c}{4\pi^2 G^2 m_p^4 m_e} \qquad (31)$$

5. Change of the Radius R of a heavenly Body when its Mass M changes
We can show that the volume change dV of a heavenly body, when its mass M changes, is:
$$\frac{dV}{V} = \underbrace{-3\frac{dM}{M}}_{\text{Gravitational effect}} + \underbrace{\alpha\frac{dM}{M}}_{\text{Fusion kinetic effect}} + \underbrace{\beta\frac{dM}{M}}_{\text{Mass effect}} \qquad (32)$$

where, for example: α = 2, β = 1 for the black hole; α = 0, β = 1 for the Earth; α = 3.3, β = 1 for the Star; α = 2/3 ≈ 0.66 for the kinetic effect alone.
For the whole Universe, whose mass M, after the Big Bang, decreases over time (because part of it changes into energy), Eq. (32) becomes:

$$\frac{dV}{V} = 3\frac{dR}{R} = \underbrace{-3\frac{dM}{M}}_{\text{Gravitational effect}} + \underbrace{\frac{2}{3}\frac{dM}{M}}_{\text{kinetic effect}} = \left(-3 + \frac{2}{3}\right)\frac{dM}{M} = -\frac{7}{3}\frac{dM}{M} \qquad (33)$$

consequently:

$$\frac{dR}{R} = -\frac{7}{9}\frac{dM}{M} \qquad (34)$$

and so:

$$\frac{dR}{dM} = -\frac{7}{9}\frac{R}{M} \qquad (35)$$
6. Gravitational Planck's Constant hg
Comparing the physical characteristics of the Graviton and of the Photon, it is possible to arrive at the result that the ratio of the gravitational Planck's constant hg to the electromagnetic one h is equal to the ratio of λG (Graviton wavelength) to λF (Photon wavelength), that is:

$$\frac{h_g}{h} = \frac{\lambda_G}{\lambda_F} \qquad (36)$$

This expression is coherent with the supposition that the order of magnitude of the Photon energy E_F = hν_F = hc/λ_F is equal to the Graviton energy E_G = h_gν_G = h_g c/λ_G.
We know that the wavelength λG of the Graviton, emitted from a system of a nucleon which orbits round another nucleon at distance r, is:

$$\lambda_G = \left(\frac{4\pi^2 r^3}{R_{Schwarzschild}}\right)^{1/2} = \left(\frac{4\pi^2 r^3 c^2}{G m_p}\right)^{1/2} \cong 10^{13}\ \mathrm{m} \qquad (37)$$

being R_Schwarzschild = G m_p/c², and where m_p is the nucleon mass and r the atomic radius.
On the other hand we know that the wavelength λF of the Photon, emitted again from the hydrogen atom, is, remembering Eq. (27):

$$\lambda_F = \left(\frac{4\pi^2 r^3 c^2}{\dfrac{e^2}{4\pi\varepsilon_0 m_e}}\right)^{1/2} \cong \left(\frac{4\pi^2 r^3 c^2}{\dfrac{hc}{m_e}}\right)^{1/2} \qquad (38)$$

where e = 1.6·10^-19 C and me = 9.1·10^-31 kg are respectively the electron charge and mass, and ε0 = 8.85·10^-12 F/m is the permittivity of free space. By relations (37) and (38), Eq. (36) becomes:
$$\frac{h_g}{h} = \frac{\lambda_G}{\lambda_F} = \left(\frac{hc/m_e}{G m_p}\right)^{1/2} = \left(\frac{hc}{G m_p m_e}\right)^{1/2} \qquad (39)$$

Putting into Eq. (39) h = 6.62·10^-34 Js, c = 3·10^8 m/s, G = 6.67·10^-11 Jm/kg², m_p = 1.67·10^-27 kg and m_e = 9.1·10^-31 kg, the ratio hg/h assumes the following value:

$$\frac{h_g}{h} = \left(\frac{6.62 \cdot 10^{-34} \cdot 3 \cdot 10^8}{6.67 \cdot 10^{-11} \cdot 1.67 \cdot 10^{-27} \cdot 9.1 \cdot 10^{-31}}\right)^{1/2} = 1.4 \cdot 10^{21} \qquad (40)$$
Therefore the value of the gravitational Planck's constant is given by:

$$h_g = 1.4 \cdot 10^{21} \cdot h = 1.4 \cdot 10^{21} \cdot 6.62 \cdot 10^{-34}\ \mathrm{Js} = 9.3 \cdot 10^{-13}\ \mathrm{Js} \qquad (41)$$
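Eqs. (39)-(41) can be verified with a short Python sketch (ours, not the authors'):

```python
from math import sqrt

h, c = 6.62e-34, 3e8
G, m_p, m_e = 6.67e-11, 1.67e-27, 9.1e-31

ratio = sqrt(h * c / (G * m_p * m_e))   # Eq. (39)
print(f"h_g/h = {ratio:.2e}")           # 1.40e+21, Eq. (40)
print(f"h_g   = {ratio * h:.1e} Js")    # 9.3e-13 Js, Eq. (41)
```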
From Eq. (39), moreover, raising it to the 2nd power we have:

$$\frac{h_g^2}{h^2} = \frac{hc}{G m_p m_e} \qquad (42)$$

or:

$$h_g^2 = \frac{h^3 c}{G m_p m_e} \qquad (43)$$
It is possible to express the gravitational Planck's constant in the following different way:

$$\frac{h_g}{h} = \frac{1}{\sqrt{4\pi\varepsilon_0 G}} \cdot \frac{e}{m_e} = 2 \cdot 10^{21} \qquad (44)$$
This expression is obtained by taking into account that the Photon/Graviton force has the following mathematical structure:

$$F = \frac{1}{c}\frac{\Delta E}{\Delta t} \propto \frac{1}{\lambda^2} \qquad (45)$$

and therefore, remembering Eq. (36):

$$\left(\frac{h_g}{h}\right)^2 = \left(\frac{\lambda_G}{\lambda_F}\right)^2 = \frac{F_F}{F_G} = \frac{\dfrac{e^2}{4\pi\varepsilon_0 r^2}}{\dfrac{G m_e^2}{r^2}} = \frac{1}{4\pi\varepsilon_0 G} \cdot \frac{e^2}{m_e^2} \qquad (46)$$
7. Stationary States (stationary Orbits) of the solar System
As an example we utilize the results obtained in section 4 for the solar system. Recalling the values of the physical constants mentioned above and utilizing for the solar mass the value M = M_S = 2·10^30 kg, we compute the radius of the 1st stationary orbit and the related total energy (without intrinsic energy) per orbiting unit mass:
$$r_1 = \frac{h_g^2}{4\pi^2 m_p^2\, G M} = \frac{(2 \cdot 10^{21} \cdot 6.62 \cdot 10^{-34})^2}{4\pi^2 \cdot (1.67 \cdot 10^{-27})^2 \cdot 6.67 \cdot 10^{-11} \cdot 2 \cdot 10^{30}} = 120\,000\ \mathrm{km} \qquad (47)$$

$$\frac{E_1}{m} = -\frac{1}{2}\cdot\frac{GM}{r_1} = -\frac{1}{2} \cdot \frac{6.67 \cdot 10^{-11} \cdot 2 \cdot 10^{30}}{1.2 \cdot 10^8} = -5.5 \cdot 10^{11}\ \frac{\mathrm{J}}{\mathrm{kg}_{\mathrm{orbiting}}} \qquad (48)$$
E1 / c 2 kg = − 6 ⋅ 10 − 6 m kg orbiting
(49)
It is interesting to note that the energy, which is necessary to the orbiting unitmass to go away from the energy level E1 (1st stationary orbit) to infinity, is:
En → ∞ E E kg − 1 2 = − 1 2 = 6 ⋅ 10 − 6 2 kg mc mc mc orbiting
(50)
The full mathematical description of the system is analytically intractable and is left for future developments.
Emergence and Gravitational Conjectures
277
Table 3 shows radii rn of the stationary orbits, for some significant values of n, to which the real Planets orbits correspond. The values of rn are computed by resorting to the following relationship:
rn = n 2 ⋅ r1 = n 2 ⋅120000 km .
(51)
8. Limit Mass of a black hole The Dirac’s Equations for an electron in a central field, as for example for hydrogenoid systems, yield the following expressions for total energy Wn of the stationary states of the hydrogenoid system (Wn = mec2 + En , that is: electron intrinsic energy + electron kinetic energy + electromagnetic potential energy of the system electron/nucleus, disregarding the intrinsic energy of the nucleus which is at rest) and for radii rn of the stationary orbits:
me c 2
Wn, j , H = 1+
1 ( j + )2 − α 2 2
rn, j , H = −
(52)
α2
1 Z e2 1 = 2 2 4πε 0 (Wn, j , H − me c ) 2
2
1/ 2
+ n'
Z e2 2
4πε 0 me c 1 −
Wn, j , H
(53)
me c 2
where:
α=
2π Ze 2 2π me Ze 2 Z ⋅ = ⋅ ⋅ = hc 4πε 0 c h 4πε 0 me 137
(54)
is, for Z = 1, the fine-structure constant, 1 j =l ± is the internal quantum number , 2 l = 0 , 1, is the azimuthal quantum number,
n= j+
1 + n'= 1, 2 , 2
is the total quantum number,
the symbol H, initial letter of Hydrogen, is referred to Hydrogenoid systems of atomic number Z, while the values of the other constants are defined in sections. 4 and 6.
Table 3. Radii rn of the stationary orbits, to which the real orbits of the Planets correspond.

  n | Planet  | Planet Mass mP (kg) | Average distance from Sun (10^6 km) | rn (10^6 km) | Orbiting velocity vn (km/s)
  1 | -       | -                   | -                                   | 0.12         | 1054
  2 | -       | -                   | -                                   | 0.48         | 527
  3 | -       | -                   | -                                   | 1.08         | 351
 22 | Mercury | 3.3E+23             | 58                                  | 58           | 48
 30 | Venus   | 4.9E+24             | 108                                 | 108          | 35
 35 | Earth   | 6.0E+24             | 150                                 | 147          | 30
 44 | Mars    | 6.5E+23             | 228                                 | 232          | 24
 80 | Jupiter | 1.9E+27             | 778                                 | 768          | 13
109 | Saturn  | 5.7E+26             | 1427                                | 1426         | 10
155 | Uranus  | 8.7E+25             | 2870                                | 2883         | 7
194 | Neptune | 1.0E+26             | 4496                                | 4516         | 5
222 | Pluto   | 1.0E+22             | 5900                                | 5914         | 4.7
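The rn column can be regenerated from Eq. (51) with a few lines of Python (our sketch); the observed distances below are the ones quoted in the table:

```python
r1 = 0.12  # radius of the first stationary orbit, in units of 10^6 km (Eq. 47)
observed = [("Mercury", 22, 58), ("Venus", 30, 108), ("Earth", 35, 150),
            ("Mars", 44, 228), ("Jupiter", 80, 778), ("Saturn", 109, 1427),
            ("Uranus", 155, 2870), ("Neptune", 194, 4496), ("Pluto", 222, 5900)]
for planet, n, dist in observed:
    rn = n**2 * r1   # Eq. (51): r_n = n^2 * r_1
    print(f"{planet:8s} n={n:3d}  r_n = {rn:6.0f}  observed = {dist}")
```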
Expressions (52) and (53), on the grounds of the above-mentioned analogy between electromagnetic and gravitational phenomena (see section 4), can also be utilized for gravitational phenomena, as in the case of a heavenly body of mass m which gravitates, at a distance r, round another heavenly body, at rest, of greater mass M, when we substitute into them (see relations (26) and (27)): m for me, αg for α, hg/mn for h/me and GM for (Ze²)/(4πε0 me). Relations (52), (53) and (54) therefore become, for gravitational phenomena:
$$W_{n,j} = \frac{m c^2}{\sqrt{1 + \dfrac{\alpha_g^2}{\left(\sqrt{\left(j + \frac{1}{2}\right)^2 - \alpha_g^2} + n'\right)^2}}} \qquad (55)$$

$$r_{n,j} = \frac{1}{2}\cdot\frac{GM}{c^2}\cdot\frac{1}{1 - \dfrac{W_{n,j}}{m c^2}} \qquad (56)$$

where m_n = m_p = 1.67·10^-27 kg is the nucleon mass, hg = 2·10^21 h is the gravitational Planck's constant,
$$\alpha_g = \frac{2\pi m_n}{c\, h_g} \cdot GM \qquad (57)$$

is the gravitational fine-structure constant; j = l ± 1/2 is the internal quantum number; l = 0, 1, ... is the azimuthal quantum number; n = j + 1/2 + n' = 1, 2, ... is the total quantum number.
W_{n,j} are the values of the total energy of the stationary states of the system (W_{n,j} = mc² + T + U = mc² + E, that is: intrinsic energy of mass m + kinetic energy of mass m + potential energy of the system of the two masses m and M), disregarding the intrinsic energy Mc² of mass M. The fundamental state of the gravitational system in question is identified by the following values of the quantum numbers:

$$n = 1, \quad l = 0 \qquad (58)$$

consequently, by the conditions above, we deduce:

$$j = \frac{1}{2} \quad \text{and} \quad n' = 0 \qquad (59)$$
Inserting the values (58) and (59) into Eq. (55), the total energy of the system at the fundamental state, per unit mass orbiting round mass M, is:

$$\frac{W_{1,\frac{1}{2}}}{mc^2} = \left(1 + \frac{\alpha_g^2}{1 - \alpha_g^2}\right)^{-1/2} = \sqrt{1 - \alpha_g^2} \qquad (60)$$
In case mass M reaches the following limit value:

$$M_{limit} = \frac{c\, h_g}{2\pi G m_n} = \frac{3 \cdot 10^8 \cdot 2 \cdot 10^{21} \cdot 6.62 \cdot 10^{-34}}{2\pi \cdot 6.67 \cdot 10^{-11} \cdot 1.67 \cdot 10^{-27}} = 5.7 \cdot 10^{32}\ \mathrm{kg} = 284\, M_{Sun} \qquad (61)$$
where M_Sun = 2·10^30 kg, we have, by relation (57), αg = 1, and expression (60) becomes:

$$W_{1,\frac{1}{2}} = 0 \qquad (62)$$
from which:

$$m c^2 = -E_{1,\frac{1}{2}} \qquad (63)$$
while the radius of the 1st stationary (fundamental) orbit is, recalling Eqs. (56), (61) and (62):

$$r_{1,\frac{1}{2}} = \frac{G M_{limit}}{2 c^2} = \frac{h_g}{4\pi c\, m_n} = \frac{2 \cdot 10^{21} \cdot 6.62 \cdot 10^{-34}}{4\pi \cdot 3 \cdot 10^8 \cdot 1.67 \cdot 10^{-27}} = 210\ \mathrm{km} \qquad (64)$$
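Both limit values can be checked numerically with Python (our sketch, not part of the paper):

```python
from math import pi

h_g = 2e21 * 6.62e-34
G, c, m_n = 6.67e-11, 3e8, 1.67e-27

M_limit = c * h_g / (2 * pi * G * m_n)   # Eq. (61)
r_fund = h_g / (4 * pi * c * m_n)        # Eq. (64), equal to G*M_limit/(2*c**2)
print(f"M_limit = {M_limit:.1e} kg = {M_limit / 2e30:.0f} solar masses")  # 284
print(f"r       = {r_fund / 1e3:.0f} km")                                 # 210 km
```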
Relation (63) expresses that mc² is equal to the energy which is necessary for mass m to leave the fundamental orbit and, overcoming the gravitational field due to mass M, to reach infinity with remaining mass equal to zero. This behaviour is peculiar to masses which are placed on the surface of a black Hole. Therefore, the radius r_{1,1/2} = 210 km also represents the radius of the limit mass M_limit = 284 solar masses, which then behaves as a black Hole.
9. Conclusions
This article shows that it is possible to place the frequency range of the gravitational waves around 10^-5 Hertz. This allows detailed experimental tests. Moreover, we sketched how, by making use of a corpuscular semiclassical approach, we can compute the values of two fundamental Planck's constants associated to two different interactions, as well as their interrelationships. The method could easily be extended to other contexts, allowing the building of a general theory of emergence with a multiplicity of different "Planck's constants". This could generalize in an interesting way the usual theories of emergence introduced so far.
References
1. P. Allievi, Theory on the corpuscular nature of the interactions among moving charged particles, in Proceedings of the Mathesis National Conferences (Anzio/Nettuno, 2004).
2. P. Allievi, Theory on the Photon structure, in Proceedings of the Mathesis National Conferences (Anzio/Nettuno, 2004).
3. P. Allievi, Theory on the corpuscular nature of the gravitation, in Proceedings of the Mathesis National Conferences (Trento, 2006).
4. P. Allievi, Theory on the Graviton structure, in Proceedings of the Mathesis National Conferences (Trento, 2006).
EMERGENCE IN SOCIAL SYSTEMS
INDUCING SYSTEMS THINKING IN CONSUMER SOCIETIES
GIANFRANCO MINATI(1,2), LARRY A. MAGLIOCCA(2)
(1) Italian Systems Society, Milan, Italy
E-mail: [email protected]
(2) Ohio State University, Columbus, Ohio, USA
E-mail: [email protected]
We introduce some core principles related to systems thinking: interaction, the establishment of systems through organization and self-organization (emergence), and the constructivist role of the observer, including the use of language. It is not effective to deal with systemic properties in a non-systemic way, by adopting a reductionist way of thinking, i.e., when properties acquired by systems are considered as properties possessed by objects. We consider the reduced language adopted in consumer societies as functional to maintaining a consumerist attitude. In consumer societies, language is suitable for keeping people in the role of consumers, with a limited ability to design and create. In this context freedom is intended as freedom of choice. To counteract this reduced language, we propose the diffusion of suitable games, cartoons, comics and pictures, images, concepts and words which can enrich everyday language, especially that of young people, and provide an effective way of inducing some elementary aspects of systems thinking in everyday life. The purpose is to have a language with which to design and develop things, and not merely to select from what is already available. We list a number of proposals for the design of such games, stories and pictures.
Keywords: consumerism, games, induction, language, systems thinking.
1. Introduction
In this paper we use the term Systemics to refer to the usage of systemic concepts in various activities, such as scientific research, management, education, economics, politics and culture in general. Some currently used expressions and terms are: systems theory; systems thinking; general system theory; and systemic view, principles, approach, properties and problems, as frequently introduced and discussed in the scientific literature. Systems are not observer-independent, but observer-dependent, in the sense that it is the observer who models a phenomenon as a system. This contrasts with the objectivistic view assuming reality to exist as it is. In Systemics, modeling is carried out by the observer through a language used to represent it. This approach is known as constructivism. The opposite of Systemics is reductionism. It is based on assuming the level of description related to the composing elements to be a generally effective strategy for dealing with systems, and the macroscopic level to be a linear
extension of the microscopic one. In this paper we will better specify the concepts mentioned above and use them for dealing with social systems, i.e., communities established by interacting autonomous agents. An agent is said to be autonomous when provided with a cognitive system allowing it to decide how to interact. Examples of social systems are anthills, beehives, teams, temporary communities (queues and markets), corporations, cities and hospitals. This paper focuses on the usage of reductionism in consumer social systems (i.e., societies supporting their economic activities by artificially increasing the consumption of resources and products). Support for this consumer attitude is given through the use of a reductionist language, reducing systems to objects and processes to products. This is a way to hide the inherent non-sustainability of the underlying and induced processes. We consider that a positive contribution towards allowing social systems to overcome this non-sustainable consumer phase is to introduce and expand the usage of non-reductionist language in daily life, for instance through games, cartoons, comics, pictures and stories able to induce Systems thinking and to make people realize its full complexity and effects.
2. The general view
There are many instances in the scientific literature describing how it is ineffective to deal with social issues (e.g., economic, educational, or related to family, health and safety) without considering the need to model them using a systemic approach, based on the fundamental work by Bertalanffy [1] and other researchers [2,3,4,5]. In more recent times the usage of systemic models based on emergence has been introduced to deal with problems of self-organized social systems, such as industrial districts, markets and organizational learning [6,7,8], and of living systems such as flocks and swarms, modeled using, for instance, Artificial Life [9,10] and Swarm Intelligence [11,12,13]. Systemic concepts are more and more widespread at professional and academic levels, whereas common, everyday thinking is based on simplified and reductionist approaches. We believe that one reason may be that consumer societies sustain this approach by diffusing short-term, symptomatic, cause-effect, and local views for the marketing of solutions and functions, with no reference to the more general picture. Within the framework of the constructivist approach illustrated later, this is mainly done by imposing a language (through processes of standardization, reduction, and simplification of the verbal and pictorial language used in advertising, TV, and other media) which only represents and deals with what is considered suitable for consumers. Induction [14] is a logical inference
allowing one to infer from a finite number of particular cases to another case or to a general conclusion. For instance, if we extract a number of balls from a given box and see that all are white, we may infer that all the balls in the box are white.

We introduce here the idea of using games, comics, cartoons and stories to popularize the core principles of Systemics, as a contribution towards reproducing systemic problems in the players. In this way it is possible to induce some elementary aspects of systems thinking, suitable for designing behaviors and not just for applying, optimizing and using current rules (i.e., designing a game vs. playing a game with predefined rules) derived from reductionist thinking. In short, the idea is to induce the use of a language suitable for constructivistically representing and managing systems in a non-reductionist way. The purpose is to induce systemic concepts, representations and approaches to be adopted as commonplace when considering many other problems.

3. Constructivism and Language

In this section we focus upon the constructivist role of the observer, based on language, when dealing with natural (i.e., non-artificially designed) systems. The role of the observer is not to perturb or to produce relativity, as in classical views, but, following the introduction of Gestalt psychology [15], Cognitive Science [16,17] and Constructivism [18], to create cognitive existence, as when the observer detects / cognitively generates coherence (e.g., when dealing with self-organized, emergent phenomena such as swarming and flocking). The existence of the phenomenon is necessarily related to the cognitive model used by the observer [19,20].

3.1. Constructivism

The constructivist approach, or constructivism [18,21,22,23], has historically been connected with the principles mentioned above. Von Glasersfeld, for instance, asks: "What is radical constructivism?" He defines it as an unconventional approach to the problem of knowledge and knowing. It starts from the assumption that knowledge, no matter how it is defined, is in the heads of people, and that the thinking subject has no alternative but to construct what he or she knows on the basis of his or her own experience. What we make of experience constitutes the only world we consciously live in. It can be sorted into many kinds, such as things, self, others, and so on. But all kinds of experience are essentially subjective, and though I may find reasons to believe that my experience may not be unlike yours, I have no way of knowing that it is
the same. The experience and interpretation of language are no exception [18]. In the same book he explicitly deals with the issue of language (in Chapter 1, where a paragraph is entitled "Which language tells it 'As It Is'?", p. 2, and in Chapter 7, in a paragraph entitled "Language and Reality", p. 136). In the same book, the Sapir-Whorf hypothesis (see Section 3.2) is mentioned, on p. 3, as an important source for his work.

To summarize, we may say that the more extensive, accurate and articulated our language is, and the more able it is to express nuances and abstractions, the more effective we can be, because, correspondingly, we may construct sophisticated representations and model our actions with increasing accuracy. In a constructivist view, properties depend upon the level of description adopted by the observer [24] (in short, the level of description used by an observer is given by the disciplinary knowledge used, the purposes of the observer in modeling, and the kind and quantity of variables, scaling, relations and interactions used to model a system). Examples are:
• A ballpoint pen may be intended as an object (for suppliers, sellers and users) or as a system of interacting components (for instance, for the designer).
• A device, such as a TV set, may be intended as an object (for suppliers, sellers and users) or as a system of interacting components (for a technician having to fix it).
• An autonomous system (i.e., a system provided with a cognitive system) may be intended as a buyer (i.e., an agent making an economic transaction) or as a system of interacting components (for a physician or a psychologist considering the future usage and effects of what is bought).

3.2. Language

There are controversial theories regarding different research aspects of language, for instance language and human behavior, linguistics and cognitive science, general semantics theory, knowledge representation, the linking of human processes of thinking to human behavior and learning, the linking of the thinking process to language, man-machine interfacing, the representation of meaning through language, the linguistic construction of reality, and translation [25,26,27,28,29,30,31,32,33,34,35,36,37]. For the purpose of this paper we limit ourselves to some of the most fundamental, general principles which have been the basis of research activity; for instance, how learning involves language and how language
influences learning, as introduced by Vygotsky: "The relationship between thought and word is not a thing but a process, a continual movement back and forth from thought to word and from word to thought: ... thought is not merely expressed in words; it comes into existence through them" [38]. This view was subsequently elaborated and formulated as the celebrated Sapir-Whorf hypothesis [39,40] (now accepted in its weaker sense), mentioned in Chapter 10 of Von Bertalanffy's book cited above. In this context we just want to give a general idea of the approach. The general, 'strong' idea introduced by this approach (we mention below the so-called 'weaker' versions) is that what we can think is enabled by the language that we use for describing, hypothesizing, designing, rejecting, and so on. If we do not have the language to say it, it does not exist for us. There are many approaches to the ideas introduced by the Sapir-Whorf hypothesis. Their versions may be briefly summarized [41] as below:
1. Strong hypothesis: language determines thinking;
2. Weak hypothesis: language influences perception and thinking;
3. Weakest hypothesis: language only influences memory.
Below we consider the weak hypothesis as the most suitable for dealing with processes of influencing, and even manipulating, the behavior of social systems.

The purpose of this paper is not to propose how to teach systemic knowledge, but rather to make its adoption natural at the various social levels, such as in schools, families, workplaces, management, politics, and in particular social systems (i.e., hospitals, prisons, temporary communities such as transport, social events and distribution, and so on). As mentioned in the introduction, we think that one possible, effective approach is to improve the language used in social systems for constructivistically representing and managing systems in a non-reductionist way [42,43]. Unfortunately, in consumer societies the purpose is to simplify and standardize [19,44,45] the languages used in everyday life, making people concentrate on marketing and business issues only.

4. How to have the language to imagine it?

As introduced by constructivism, there is a continuous correspondence between real life style and what can be represented by the social language. We focus on the fact that the simplified language of consumer societies is, in short, suitable for supporting consumer activities such as selecting, comparing and optimizing. By using this kind of reduced and simplified language, people
may only have wishes and projects that can be dealt with by consuming; consider the economics of consumer credit, invented to support this strategy. Moreover, the increase in consumer credit reduces, as a secondary effect, savings [46,47]. An example of the effects of this simplified language is the confusion of freedom with the freedom to select among pre-defined choices. In consumer societies the strategy is not to explicitly reduce freedom (as in authoritarian societies), but to strongly induce a mono-dimensional freedom, such as that of selecting among pre-established choices. In this way social systems make ineffective (though they do not forbid) the selection of something that is not already on offer. Examples include:
• Replacement of a process with a product, such as products replacing diets, physical exercise and active remedies to correct unhealthy lifestyles. The possibility of selecting products is assumed to substitute for the possibility of changing lifestyle.
• Offering ways of spending free time (e.g., in shopping centers and at organized events) versus autonomously designing activities (for instance, local excursions). Many people do not know their own town or country because micro-tourist offers are not standardized. This is common for artistic places and beauty spots.
• Offering technologies for producing microclimates. The user can select a specific air-conditioner from the market, but cannot decide upon or influence the processes producing this need. Air conditioning reduces temperature locally by using energy, thereby contributing to an increase in the outside temperature.
• Offering pre-cooked food together with pharmaceutical products to reduce the problems deriving from fast food and the preservatives it contains. The user may select the product, but has no influence on the lifestyle producing those needs.
• Offering evening TV programs that the consumer may select, versus the possibility of reading a book, listening to music, discussing, writing or navigating on the Internet. Alternative evenings are not forbidden, but unusual. Consider the pathological dependence of the younger generations upon TV, very useful, on the one hand, to control children (i.e., avoiding having to spend time directly interacting with them) and to establish shared time in families (with possible low-level interactions limited to the issues proposed by TV) and, on the other, to advertise products. The user selects the programs, but not the lifestyle.
The language used by advertisements supports the reductionist view, i.e., attention is directed towards objects and properties assumed to be self-established and self-consistent, with little or no attention paid to the underlying and related processes. Examples are:
• Advertisements using a language which combines a given food with given effects, inducing the idea, without stating it explicitly, that this food produces those effects;
• Advertisements using a language which combines specific products with values, such as naturalness, good health, reliability, beauty and success, inducing one to accept a relationship between them;
• Advertisements using references to science to support the truthfulness and objectivity of the quality of a product, even using scientific language to say something which is certainly not scientific (such as percentages and statistics referring not to variables, but to immeasurable aspects).
In this view it is more important to present a convincing statement than an accurate description of the product or the process.

Previous classical strategies for manipulating social systems were based on persuading or convincing people to adopt certain beliefs, ideals, or political, religious or racial views. The strategy for manipulating and controlling social systems in order to establish consumer societies is different: it is sufficient to reduce the possibility of thinking in a different way and, above all, of imagining different scenarios. This strategy is pursued by depriving social systems of a language suitable for imagining, and thus designing, change. We think that one aspect of this strategy is to relegate systemic thinking to scientific and professional issues, kept separate from everyday thinking.

As mentioned above, systemic properties are supported and maintained by the continuous interaction of components, with the observer playing a constructivist role in modeling them in this way. If interactions stop, systemic properties disappear (e.g., device functions, teams, organizations and life itself), and if interactions are ignored, systemic properties become just properties. The reductionist way of considering systemic properties is to assume them to be objectivistic properties, ignoring the complex supporting processes of interaction and the constructivist role of the observer. Systemic properties are thus reduced to mere properties of objects and considered as such.
5. Having a language for designing. The entry point.

The ideal strategy, i.e., to spread in everyday activities the use of systemic concepts for designing evolutionary rules and establishing more self-aware behavior, may be pursued by increasing the level of social knowledge. Moreover, in our societies the problem lies not so much with the non-availability of knowledge (available through the Internet, broadcasting, popularization, an increasing offer of books, and schools), but with its poor or unsuitable usage. Knowledge is more accessible today than it has ever been. We believe that, in spite of this, its social influence has been de-activated by social behavioral models based, for instance, on language-manipulating techniques [19,45]. In order to reverse this de-activation, one possible approach is to diffuse suitable images of the meaning and usage of knowledge:
• to provide examples which tend to induce more general approaches, and
• to enrich everyday language in such a way as to make possible the modeling of life in a systemic way, thus allowing the detection and representation of, and the focussing upon, interactions and systemic properties.
Systemic knowledge has the power to allow the design of micro and macro evolutionary social rules, more than simply organizational ones. We believe that the most important target is the younger generation, allowing them to design new rules and not just perpetuate the current ones as the only possible choice. In economics, for instance, the perspective need not only be absolute and continuous growth of the same kind [48,49,50]. Development may be intended as changing the ways and fields of growth [20,51]. It may be designed, or be a constraint deriving from previous growth processes.

One way to induce systems thinking may be through comics and cartoons, video backgrounds, images, films, linguistic games, interactive on-line videogames and quiz games based on systemic principles, able to enrich everyday language with words, images and concepts. The usual consumer offer of games in general, especially for young people in the age range 2–13 [52], is mostly for leisure and entertainment, implicitly advertising other products and services and supporting consumer lifestyles. In this section we list some approaches for the design of games and other means which may induce systemic thinking, i.e., lead people to consider the systemic aspect of problems and situations as natural.
5.1. Linguistic games

We propose linguistic games, such as:
• Translating a phrase from a language into the same language (i.e., by using different words): what should remain invariant is the meaning.
• Finding out what is lacking in a word to express a certain meaning and how to overcome it; for instance, how can the lack of conditional or subjunctive tenses for verbs be substituted?
• Finding out how the meaning of words lies not in the words themselves (as labels), but in interactions, within the mind of the observer, with the preceding and subsequent words. The same process occurs for the meaning of phrases within a story or a book.
• Building up ambiguous statements (whose meaning depends upon the observer and the context).
• Building up phrases by using small predefined sets of words and discovering meanings which it is not possible to represent using only those words.
One way to play these games is to have different competing players or teams, as typically occurs in a classroom.

Systemic content of the game: the issues relate to (1) the systemic nature of language, i.e., how meaning is established through the interactions between words in the mind of the observer, and (2) multiple representations, i.e., the process of re-formulating and translating, and their equivalence or non-equivalence.

5.2. Distinguishing between the composition of elements and the establishment of systems

This game has the purpose of distinguishing between processes of composing elements and processes of establishing systems. The process of composing is able to establish entities having new properties (new with respect to the component elements); in this case, however, the process of interaction is not required to be continuous. It is expected to produce some effects, giving rise to new properties which may be considered non-systemic. Examples are the processes of cooking and of mixing colored water or light to give rise to new colors (for instance, blue and yellow giving green). On the contrary, the process of establishing systems (through organization and/or emergence) is based on continuous interaction between elements. Examples are the properties of devices established when powered on (i.e., when elements are made to interact, as in electronic circuits). One way to play a game involving these aspects may be to categorize processes as composing, organizing or emerging. It may be based on producing pictures, movies, animations, on-line quizzes and discussions in classrooms.

Systemic content of the game: the issues relate to distinguishing processes of composition of elements from the ways a system may be established by organizational and/or self-organizational rules. In the first case elements give rise to new structures, whereas in the second, systems are designed by setting explicit organizational rules or behavioral rules for components.
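The distinction at the heart of this game can also be made concrete in code. The following is a minimal sketch under our own assumptions (the names mix and Device are invented for illustration): a composition produces its new property in a single act, after which no interaction is needed, whereas a systemic property exists only while the components continue to interact.

```python
# A minimal sketch (hypothetical names) contrasting one-shot composition,
# whose new property persists without further interaction, with a system,
# whose property exists only while its components keep interacting.

def mix(color_a: str, color_b: str) -> str:
    """Composition: a single act produces a new property that then
    persists on its own (blue + yellow -> green)."""
    if {color_a, color_b} == {"blue", "yellow"}:
        return "green"
    return "undefined"

class Device:
    """System: the systemic property ('function') is available only
    while the components are made to interact (powered on)."""
    def __init__(self, components):
        self.components = components
        self.powered = False  # no interaction yet

    def power_on(self):
        self.powered = True   # components start interacting

    def power_off(self):
        self.powered = False  # interactions stop

    @property
    def function(self):
        # The property disappears as soon as interactions stop.
        return "working" if self.powered else None

print(mix("blue", "yellow"))       # 'green' persists: no ongoing interaction needed
d = Device(["circuit", "display"])
d.power_on();  print(d.function)   # 'working': a systemic property
d.power_off(); print(d.function)   # None: the property has vanished
```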
5.3. Acting on a system

It is well known, and illustrated by the cases considered below, that acting at the microscopic level has non-linear effects on the macroscopic level and is an ineffective way of managing it. Acting upon single elements is a poor strategy for acting upon a system. The systemic view requires one to consider the interactions among elements and, specifically, the functional roles of elements in organized systems and the behavioral rules in emergent systems. Ways of influencing systems include:
• influencing interactions among elements, for instance by perturbing them through the introduction of noise and varying its intensity;
• changing the organizational rules followed by elements when interacting, such as changing how electronic components are connected or the organization of an assembly line. Replacement in this case is due to a missing role in the organization;
• changing the behavioral rules followed by elements when interacting, such as how single agents behave in traffic or in anthills. Replacement in this case is due to unorthodox behavior of the agent.
Besides considering specific cases, representing and commenting upon them, a very effective approach could be to act on simulated systems and then detect, comment upon and explain the reactions. This could be a useful idea for designing innovative educational videogames [53], for instance in the framework of the well-known Game of Life, the cellular automaton devised in 1970 by J.H. Conway. It is a zero-player game because the evolution is determined by the initial state inserted by the player, who then observes how the system evolves. Other games close to this purpose available today are based upon loops of action and reaction, as in models based on System Dynamics, such as SimCity™.
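As a concrete illustration of this last point, the following minimal sketch of Conway's Game of Life (our own construction; the grid coordinates and patterns are arbitrary choices) shows how acting upon a single element of the initial state can redirect the evolution of the whole configuration, not just its own neighbourhood.

```python
# A minimal Game of Life: a microscopic change (one added cell) can have
# non-linear macroscopic effects on the evolution of the configuration.
from collections import Counter

def step(live: set) -> set:
    """One generation: a live cell survives with 2 or 3 live neighbours;
    a dead cell becomes live with exactly 3 live neighbours."""
    neighbours = Counter((x + dx, y + dy)
                         for (x, y) in live
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

def evolve(live: set, generations: int) -> set:
    for _ in range(generations):
        live = step(live)
    return live

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}  # a well-known moving pattern
perturbed = glider | {(0, 0)}                      # act upon one element only

print(evolve(glider, 20) == evolve(perturbed, 20))
# Typically False: the single extra cell changes the whole trajectory.
```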
Systemic content of the game: when any phenomenon is represented as a system, effective action upon it requires the adoption of a systemic approach, and not merely acting upon individual components and processes on the assumption that this will produce linear effects upon the system.

5.4. More is not always better

The untying of a knot requires clarity about the entire configuration: applying stronger force usually makes the problem worse. The key point is to balance actions, keeping in mind the global configuration of effects. The addition of more energy and resources to a system, for instance, does not linearly imply an improvement: a system may need to reduce its temperature and the amount of resources to be processed in order to allow reorganization. What is good for an element may not be good at all for the system. Local aspects cannot be extended linearly to the entire system. Games may be invented by contrasting cases where the purpose of the system is accomplished by increasing or by reducing input. Suitable supplies to the system are possible only with a good knowledge of the whole, and not only of its parts.

Systemic content of the game: the expression more is better is usually based on the assumption that such abundance refers to resources essential for the system and that the receiver can always limit or refuse any excess. The general idea is that abundance is better than shortage. Note, by the way, that resources are often information per se. The limitation of one thing and the abundance of another are important inputs for a system in order to establish, for instance, balancing, compensating and evolutionary processes. The supplier may have a wrong model of the system. Examples are diets and daily activities able to reinforce, model and balance a system.

5.5. Improvement of parts does not always imply improvement of the system

For instance, larger wings do not allow an airplane to fly faster, and higher production may be a disaster for a company lacking a suitable distribution network. The message is that the functioning of a system is based upon keeping a balance between inputs and effects, allowing variations over specific optimal ranges. This relates to the so-called homeostatic principle. Homeostasis is the property of an open system of regulating its internal environment so as to maintain a stable condition in a changing external environment. In biology, for instance, an automatic mechanism produces opposite reactions to external influences in order to maintain equilibrium. The homeostatic principle is the key ordering force for
all self-sustaining systems and, above all, living systems. Games may be based on quizzes and on finding examples where a means of improving the performance of a system has effects upon its component parts and their interactions.

Systemic content of the game: another way to make evident how the strategy of acting upon an element is ineffective for acting upon the system.

5.6. A problem in a component part is a problem for the entire system

How do negative actions upon component parts affect the system or diffuse through it? We may consider two aspects of the problem by considering the evolutionary scenarios of a social system:
• Perturbing effects of an action upon parts and interactions have unbalancing effects throughout a system, often considered only a long time after their occurrence. This delay worsens the situation because (a) corrective actions are often no longer possible, and (b) the system will have adapted its evolution on the assumption of the persistence of the action. This is the typical situation of non-sustainable processes. The system will adapt to the new situation by producing changes, e.g., adaptation to pollution or to a poisoned food chain.
• Some adopt the view that a negative situation caused by such an imbalance or non-sustainability will be solved by scientific improvements. This is what is called progress. The point, moreover, is that progress, when successful, will enable the system to adjust to the new situation by restructuring its evolutionary rules not on the basis of its design, but so as to tolerate and adapt to changes as constraints due to particular, local interests. An example is the consumption of exhaustible resources, such as oil, used as a fuel while also being used for producing artificial substances, such as plastics.
The arrival point is the same in both cases: a new reconfiguration of the system. Is this the only possible definition of development? Analogous ways of thinking apply, for instance, to processes related to medicine, psychology and economics. A cartoon may show a house where the first floor is on fire, or a ship with a hole in one side, and people saying "I am pleased that the fire is not in our room", or "I am pleased that the hole is not on our side". Help may arrive before the disaster, but the system will no longer be the same. Is this the only way to develop a system, by constraints?
Systemic content of the game: any negative, destructive effects on parts and interactions will have, in time, evolutionary, non-linear and different effects on the entire system.

6. Conclusions

Some basic concepts have been introduced concerning processes of the establishment of systems, constructivism, and the role of the observer, based on language, in modeling and producing cognitive reality. Dealing with systems requires systemic knowledge, as opposed to reductionist knowledge based on considering systems which acquire properties as objects which possess properties. Social systems should, in our view, be dealt with not only by using systemic knowledge, but also by taking into account that the agents involved are autonomous agents behaving through the use of their cognitive systems and cognitive models. In nature, the cognitive models used are normally fixed, predefined for each species (ants, for instance, build anthills, bees build beehives, and felines always hunt in the same way; only evolution can change this). Human beings may vary, invent and design their own social systems through knowledge and cognitive models. The fundamental tool for designing worlds is language, as introduced by constructivism. Consumer societies are based on a reduced language maintaining the use of a specific cognitive model supporting marketing and consumption, by considering agents merely as economic agents, buyers. Reductionism applied to social systems has the consequence (i.e., in consumer societies, the purpose) of assuming social evolutionary rules to be fixed and not subject to improvements or changes. We have presented here some ideas for designing games and other approaches able to induce systemic ways of thinking in young people and, consequently, the ability and the freedom to design, i.e., to design the future and not just accept a continuation of the present. What is new in this paper is the approach for understanding the strategy at the basis of consumer societies.

References

1. L. von Bertalanffy, General System Theory: Foundations, Development, Applications (George Braziller, New York, 1968).
2. P. Checkland, Systems Thinking, Systems Practice (Wiley, New York, 1981).
3. C.W. Churchman and M. Verhulst, Eds., Management Sciences (Pergamon, New York, 1960).
4. R.L. Flood and M.C. Jackson, Eds., Critical Systems Thinking: Directed readings (Wiley, Chichester, UK, 1991).
5. M.C. Jackson, Systems Approaches to Management (Kluwer, New York, 2000).
6. S.Y. Auyang, Foundations of Complex System Theories in Economics, Evolutionary Biology, and Statistical Physics (Cambridge University Press, Cambridge, UK, 1998).
7. F. Belussi and G. Gottardi, Eds., Evolutionary Patterns of Local Industrial Systems: Towards a Cognitive Approach to the Industrial District (Ashgate, Aldershot, UK, 2000).
8. G. Dei Ottati, European Planning Studies, 463 (1994).
9. E. Bonabeau and G. Theraulaz, Artificial Life 303 (1994).
10. P. Cariani, in Artificial Life II, Ed. C. Langton, D. Farmer and S. Rasmussen (Addison-Wesley, Redwood City, CA, 1992), pp. 775-797.
11. E. Bonabeau, M. Dorigo and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems (Oxford University Press, UK, 1999).
12. M.M. Millonas, in Artificial Life III, Ed. C.G. Langton (Addison-Wesley, Reading, MA, 1994), pp. 417-445.
13. G. Theraulaz, S. Goss, J. Gervet and J.L. Deneubourg, in Proceedings of the 1990 IEEE International Symposium on Intelligent Control, Ed. A. Meystel, J. Herath and S. Gray (IEEE Computer Society Press, Los Alamitos, CA, 1990), pp. 135-143.
14. J.H. Holland, K.Y. Holyoak, R.E. Nisbett and P.R. Thagard, Induction (MIT Press, Cambridge, MA, 1986).
15. M. Wertheimer, Philosophische Zeitschrift für Forschung und Aussprache, 39 (1925).
16. P.H. Lindsay and D.A. Norman, Human Information Processing (Academic Press, New York, 1972).
17. D.A. Norman, Cognitive Science 1 (1980).
18. E. von Glasersfeld, Radical Constructivism: A Way of Learning (RoutledgeFalmer, New York, 1995).
19. G. Minati, in Systemics of Emergence: Research and Applications, Ed. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), pp. 569-584.
20. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
21. H.R. Maturana and F. Varela, The Tree of Knowledge: The Biological Roots of Human Understanding (Shambhala, Boston, MA, 1992).
22. H. von Foerster, Understanding Understanding: Essays on Cybernetics and Cognition (Springer, New York, 2003).
23. P. Watzlawick, Ed., Invented Reality: How Do We Know What We Believe We Know? (Norton, New York, 1983).
24. S. Guberman and G. Minati, Dialogue about Systems (Polimetrica, Milano, Italy, 2007).
25. J.R. Anderson, Language, Memory, and Thought (Erlbaum, Hillsdale, NJ, 1976).
26. J.R. Anderson, The Architecture of Cognition (Harvard University Press, Cambridge, MA, 1983).
27. J.R. Anderson, Rules of the Mind (Erlbaum, Hillsdale, NJ, 1993).
28. D. Bickerton, Language & Species (University of Chicago Press, Chicago, 1992).
29. D. Bickerton, Language and Human Behavior (The Jessie and John Danz Lectures) (University of Washington Press, Washington, 1996).
30. N. Chomsky, Knowledge of Language: Its Nature, Origin and Use (Praeger, New York, 1986).
31. T.W. Deacon, The Symbolic Species (W.W. Norton & Company, New York, 1997).
32. T. Givon, Evolution of Communication 45 (1998).
33. G. Grace, The Linguistic Construction of Reality (Croom Helm, London, 1987).
34. N. Lund, Language and Thought (Routledge, Hove, UK, 2003).
35. R.A. Müller, Behavioral and Brain Sciences, 611 (1996).
36. S. Pinker and P. Bloom, Behavioral and Brain Sciences, 707 (1990).
37. W. Wilkins and J. Wakefield, Behavioral and Brain Sciences, 161 (1995).
38. L.S. Vygotsky, Thought and Language (MIT Press, Cambridge, MA, 1962).
39. E. Sapir, Language 207 (1929). Reprinted in Selected Writings of Edward Sapir, Ed. D.G. Mandelbaum (University of California Press, Berkeley, 1949), pp. 34-41.
40. B. Whorf and J. Carroll, Ed., Language, Thought and Reality: Selected Writings of B.L. Whorf (John Wiley & Sons, New York, 1956).
41. E. Hunt and F. Agnoli, Psychological Review 377 (1991).
42. L.A. Magliocca and G. Minati, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer, New York, 2002), pp. 235-250.
43. L.A. Magliocca and A.N. Christakis, Systems Research and Behavioral Science, 259 (2001).
44. H. Marcuse, One Dimensional Man (Beacon, Boston, 1964).
45. G. Minati, World Futures, 29 (2004).
46. G. Bertola, R. Disney and C. Grant, Eds., The Economics of Consumer Credit: European Experience and Lessons from the US (MIT Press, Cambridge, MA, 2006).
47. L.C. Thomas, J. Ho and W.T. Scherer, IMA Journal of Management Mathematics, 89 (2001).
48. N. Georgescu-Roegen, in The Political Economy of Food and Energy, Ed. L. Junker (University of Michigan, Ann Arbor, MI, 1977), pp. 105-134.
49. N. Georgescu-Roegen, in Prospects for Growth: Changing Expectations for the Future, Ed. K.D. Wilson (Praeger, New York, 1977), pp. 293-313.
50. N. Georgescu-Roegen, in Energy: International Cooperation on Crisis, Ed. A. Ayoub (Presses de l'Université Laval, Québec, 1979), pp. 95-105.
51. G. Minati, in Proceedings of the First Italian Conference on Systemics, Ed. G. Minati (Apogeo, Milano, Italy, 1998), pp. 93-106.
52. J.B. Schor, Born to Buy: The Commercialized Child and the New Consumer Culture (Scribner, New York, 2004).
53. I. Bogost, Persuasive Games (MIT Press, Cambridge, MA, 2007).
CONTEXTUAL ANALYSIS. A MULTIPERSPECTIVE INQUIRY INTO EMERGENCE OF COMPLEX SOCIO-CULTURAL SYSTEMS
PETER M. BEDNAR (1,2)
(1) Department of Informatics, Lund University, Sweden.
(2) School of Computing, University of Portsmouth, PO1 3HE, Portsmouth, Hampshire, UK.
E-mail: [email protected]

This paper explores the concept of organizations as complex human activity systems, through the perspectives of alternative systemic models. The impact of alternative models on the perception of individual and organizational emergence is highlighted. Using information systems development as an example of management activity, individual and collective sense-making and learning processes are discussed, and their roles in relation to information systems concepts are examined. The main focus of the paper is on individual emergence in the context of organizational systems. A case is made for the importance of attending to individual uniqueness and contextual dependency when carrying out organizational analyses, e.g. information systems analysis. One particular method for contextual inquiry, the framework for Strategic Systemic Thinking, is then introduced. The framework supports stakeholders in owning and controlling their own analyses. This approach provides a vehicle through which multiple levels of contextual dependencies can be explored, and allows individual emergence to develop.

Keywords: strategic systemic thinking, contextual analysis, individual emergence, contextual dependency.
1. Introduction

Minati [1] suggests that a study of processes of emergence implies a need to model and distinguish the establishment of structures, systems and systemic properties. He goes on to point out that, in a constructivist view, an observer identifies such properties by the application of models. Different perceptions of structures and systems correspond to different, irreducible models. Perceived emergence of systemic properties, e.g. functionality in computer systems or collective learning abilities in social systems, then ensues from the application of such models. The author of this paper wishes to compare and contrast two alternative models that may be applied in forming constructivist views of organizational systems. The paper shows how one particular model highlights the importance of individual, as well as organizational, emergence. Its contribution is to argue for a
move away from reductionist cybernetic models towards critical systemic thinking: from attempts to reduce the uncertainties inherent in the management of organizations towards approaches which embrace 'complexification'. Using information systems development as an example, the implications for individual and collective learning in organizations are explored, and a case for contextual methods of inquiry to support organizational learning is made. A particular framework for contextual inquiry is then described in outline.

An organisation may be viewed as a complex social system, affected by the goals and values of the individuals within it [2]. We are reminded by Senge [3] that "Today, systems thinking is needed more than ever because we are becoming overwhelmed by complexity. Perhaps for the first time in history, humankind has the capacity to create far more information than anyone can absorb, to foster far greater interdependency than anyone can manage, and to accelerate change far faster than anyone's ability to keep pace ... organizations break down, despite individual brilliance and innovative products, because they are unable to pull their diverse functions and talents into a productive whole." [3, p. 69]. The nature of these social systems, their sub-systemic structures and the relations which sustain them over time vary widely from one organization to another.

An organization can also be viewed as a purposeful human activity system [4]. However, objective agreement on the nature of such systems is elusive, since the defining properties of 'the system' will depend upon the viewpoint of the individual who considers it. For example, when a person enters a bank as a customer, he is likely to view this organization as a system for providing him with financial services. However, to a person who enters that bank as an employee, it may appear to be a system for providing her with a livelihood. Checkland refers to these differing perspectives as "Weltanschauungen" or "worldviews" [4]. Schein [2] suggested that organizational culture is formed over time through shared goals. Such sharing can only be achieved through a negotiation of the differing perspectives held by individuals [4]. For this reason, agreement on a single description of a "real" human activity system will remain elusive, and consensus on its goals difficult to achieve. Within any 'organization' an interacting collection of living individuals can be found, each with a unique life history and worldview. Every individual produces her/his own unique understanding of context, constructed through interaction with organizational systems and environment by means of a variety of sense-making strategies [5,6,7]. Those taking on responsibility for
management as an activity need to be aware of the challenges posed by these differing perspectives. One possible definition of 'management' is 'a set of practices and discourses embedded within broader asymmetrical power relations, which systematically privilege the interests and viewpoints of some groups, whilst silencing and marginalizing others' [8].

Langefors [9] discusses the role of organizational information systems. He considered that, in order to manage an organization, it would be necessary to know something about the current state and behaviour of its different parts, and also about the environment within which it was interacting. These parts would need to be coordinated and inter-related, i.e. to form a system. Thus, means to obtain information from the different parts of a business would be essential, and these means (information units) would also need to be inter-related. Since the effectiveness of the organization would depend upon the effectiveness of the information units, an organization could be seen as crucially 'tied together' by information. For Langefors, therefore, the organization and its information system could be viewed as one and the same [9].

The next section of the paper sets out some of the theoretical background within which contemporary systemic models have been framed. This is followed by a discussion of learning and knowing in an organizational context. Contrasting models of organizational systems are then set out, showing how different perspectives on emergence result from their application. A role for contextual inquiry in enabling individual, as well as organizational, emergence to be explored is then set out, and one possible method of contextual inquiry is explained. The final section of the paper attempts to summarize the arguments.

2. Background

Many attempts have been made in the past to understand and manipulate social phenomena by applying laws derived from the natural world. Ackoff [10] quotes examples set out by the sociologist Sorokin [11] where researchers had attempted to establish laws of 'social physics'. He also notes that the philosopher Herbert Spencer referred to general characteristics of 'life' (accepted in relation to biological phenomena) as no less applicable to society, i.e. characteristics of growth, increasing differentiation of structure and increasing definition of function [10]. A great deal of research is available on systems perspectives in social science (see for example West Churchman [12], Simon [13]). However, as Emery [14] points out, these contributions have been fragmented and diverse, often using similar terms to denote quite different concepts. Attempts have been made to liken the operation of social 'systems' to
mechanistic models derived from engineering (see, for example, applications of the Shannon-Weaver [15] model from telecommunications to human interaction and communication), or to organic models from biology (e.g. applications of Maturana and Varela's theory of autopoiesis [16]). Ulrich [17] provides a discussion of the way that root metaphors in systems thinking influence the way in which a person conceives of 'a system'. Without these metaphors, the concept of a system might have remained 'empty'. The scope for systemic research to inform management thinking has therefore been diverse and confused.

Perhaps one of the most influential works has been the General Systems Theory of Von Bertalanffy [18]. He did not favour the direct application of mechanistic models to human problems, suggesting instead: "... systems science, centered in computer technology, cybernetics, automation and systems engineering, appears to make the systems idea into another – and indeed the ultimate – technique to shape man and society ever more into the 'mega machine' ..." [18, p. viii]. In his chapter on 'The Meaning of General Systems Theory' he points out that models which are essentially quantitative in nature have limited application to phenomena where qualitative interpretations 'may lead to interesting consequences' [18, p. 47]. Nevertheless, cybernetic models derived from GST have had great appeal in the management literature. In particular, a concept of sub-optimality has been the focus of attention. Boulding [19], for instance, attempts to establish laws of organization. His law of instability suggests that organizations fail to reach a stable equilibrium in relation to their goals, owing to cyclic fluctuations resulting from the interaction of sub-systems. Ways to remove sub-optimality, a result of conflict between systemic and sub-systemic goals, have therefore been identified as a key function of management as it attempts Fayol's [20] classic tasks of planning, directing and controlling.
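Boulding's point can be suggested with a toy simulation. The following sketch is our own construction, not Boulding's formulation; the gains, goals and coupling strength are arbitrary assumptions, chosen only to make the cyclic fluctuation visible.

```python
# A toy illustration of the 'law of instability': two coupled sub-systems,
# each correcting towards its own goal, keep the whole cycling instead of
# settling at a stable equilibrium.

def simulate(steps: int = 12) -> None:
    a, b = 0.0, 0.0               # states of two sub-systems
    goal_a, goal_b = 10.0, -10.0  # conflicting sub-systemic goals
    for t in range(steps):
        # Simultaneous update: each sub-system corrects towards its own
        # goal while being pushed by the other; the coupling term (0.9)
        # makes the joint dynamics oscillate rather than converge.
        next_a = a + 0.5 * (goal_a - a) - 0.9 * b
        next_b = b + 0.5 * (goal_b - b) + 0.9 * a
        a, b = next_a, next_b
        print(f"t={t:2d}  a={a:9.2f}  b={b:9.2f}")

simulate()  # the printed states fluctuate cyclically, never settling at either goal
```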
The reflection is that learning must surely be a prerequisite to purposeful activities of the kind Fayol describes [20]. Bateson [21] reminds us that a critical element of learning is reflexivity, awareness of one's own responses to context. Such reflexivity should inform any systemic view of human activities. From an interpretive perspective, an individual's sense-making is co-dependent with the organizational culture within which it takes place, and requires continual construction/re-construction through reflection over time [2]. A perception of organizational life focused on goal-seeking is therefore problematic. Vickers [22] argues that life consists in experiencing relations rather than seeking 'ends'. He challenges the cybernetic paradigm which a goal-seeking model implies, suggesting instead a cyclical process in which experience generates individual norms and values. These in turn create a readiness in people to notice aspects of their situation, to measure them against norms and to discriminate between them. Our 'appreciative settings' condition our perceptions of new experiences, but are also modified by them. The development of an individual's appreciative system is thus ongoing over time as a backdrop to social life.

If individual sense-making is co-dependent with organizational culture, there must be some interaction between them, built on communication. Information can be defined as data which is rendered meaningful in a particular context. The meaning attributed to an item may well vary when understood from the points of view of different individuals. Each individual produces her/his own understanding of the contexts within which information is formed, constructed through interaction with organizational systems and their environment by means of a variety of sense-making strategies [5]. During the 1960s, Borje Langefors [23] developed the 'Infological Equation'. This work identifies the significance of interpretations made by unique individuals within specific organizational contexts [9]. The Infological Equation [9,23], I = i(D, S, t), shows how meaningful information (I) may be constructed from the data (D) in the light of the participant's pre-knowledge (S) by an interpretive process (i) during the time interval (t). The necessary pre-knowledge (S) is generated through the entire previous life experience of the individual. Individuals perform different systemic roles within organizations, and have unique perspectives derived from the sum of their previous life experiences. Meanings are constructed by different individuals reflecting their unique worldviews. While it is possible to construct a 'conduit' through which data may flow around an organization, information is constructed by individuals in their interactions within the organizational context. Logically, therefore, it is possible to develop a data system to support management tasks, but this can only become an information system through direct and interpretive participation by the individuals using it. The logic demonstrated by the Infological Equation suggests that individual learning and organizational development are inextricably bound together. Information systems must therefore provide support for contextually relevant individual learning, and for organizational analysis drawing on this learning, as a systemic process over time [24].
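The Infological Equation is conceptual rather than computational, but its import can be suggested by a minimal sketch. The interpreters and their pre-knowledge below are invented stand-ins, used only to show the same data D yielding different information I for individuals with different pre-knowledge S.

```python
# A minimal sketch of the Infological Equation I = i(D, S, t): the same
# data yields different information for interpreters with different
# pre-knowledge. All example values are hypothetical.

def i(D: str, S: dict, t: float) -> str:
    """The interpretive process: information is constructed from data in
    the light of pre-knowledge, within the available time interval t."""
    if t <= 0:
        return "no time to interpret: no information is constructed"
    return S.get(D, f"'{D}' carries no meaning for this interpreter")

D = "account balance: -250"
customer_S = {"account balance: -250": "I am overdrawn and risk charges"}
clerk_S = {"account balance: -250": "a routine overdraft within the agreed limit"}

print(i(D, customer_S, t=1.0))  # one individual's information...
print(i(D, clerk_S, t=1.0))     # ...and another's, from the same data
```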
3. Learning and Knowing

Those theories that an individual creates through sense-making will be influenced by multiple contextual dependencies arising from her/his experience and environment [24]. Such dependencies derive from the particular experiences of the individuals involved, in the context of their own working situations. The distinctiveness of each work situation lies in the construction of the meanings that individuals attach to it. In relation to systems design in particular, therefore, there is no reason to assume consensus among the different actors as to the desirable properties of a proposed system. Indeed, as the Infological Equation demonstrates [9,23], it is not possible for any individual to know in advance precisely what requirements she/he might have. Instead, actors need support to engage in a collaborative endeavour of requirement shaping, in which individuals partake in a learning spiral through reflection on sense-making in a work context, in order to create an understanding of those emergent 'systems' in their minds.

Individual learning may be described as taking place through sense-making processes, as a response to messy and uncertain contexts in which resolutions are sought. Different orders of learning may be identified, based on a cycle of experience and reflection on experience [6,25]. Higher orders of learning involve reflection on the sense-making processes themselves, i.e. the learning cycle transforms into a spiral, and reflection on sense-making becomes an exercise in practical philosophy. Certain points follow from this. If individual learning is a creative process based in sense-making, then context is clearly important. Any unique individual's view is based in reflection on experience [6], and experience is context-specific. Therefore, an examination of contextual dependencies, as part of analysis, will be important.

Knowing, as a creative process, is inextricably linked to learning. Bateson [6] suggests that information may be defined as 'a difference that makes a difference', existing only in relation to a mental process. This process is what leads to an individual 'knowing'. Bateson [6] describes a hierarchy of different orders of learning. At level zero, learning represents no change, since the same criteria will be used and reused without reflection. This is the case in the rote learning of dates, code words, etc., which is contextually independent and in which repeated instances of the same stimuli produce the same resulting 'product'. All other learning, according to Bateson's hierarchy [6], involves some element of trial and error and of reflection. Orders of learning can be classified according to the types of errors and the processes by which correction is achieved. Level I involves some revision using a set of alternatives within a repeatable context; level II represents revision based on revision of the context itself; and so on.
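These orders of learning can be suggested schematically in code. The sketch below is an invented illustration, not Bateson's own formalism: level 0 is a fixed lookup, level I revises a choice within a fixed set of alternatives, and level II revises the set of alternatives itself.

```python
# A schematic sketch (invented example) of Bateson's orders of learning.
# Level 0: the same stimulus always yields the same response; no revision.
# Level I: trial and error revises the choice within a fixed set of
#          alternatives in a repeatable context.
# Level II: the set of alternatives (the context itself) is revised.

ROTE = {"1066": "Battle of Hastings"}  # level 0: contextually independent recall

def level_one(alternatives, works):
    """Revise the choice within a fixed context until an alternative works."""
    for option in alternatives:
        if works(option):
            return option
    return None  # no alternative in this context succeeds

def level_two(contexts, works):
    """Revise the context itself: move to a different set of alternatives
    when the current one yields nothing."""
    for alternatives in contexts:
        choice = level_one(alternatives, works)
        if choice is not None:
            return choice
    return None

succeeds = lambda option: option == "negotiate"
print(ROTE["1066"])                           # level 0: fixed response
print(level_one(["push", "pull"], succeeds))  # level I fails here: None
print(level_two([["push", "pull"], ["negotiate", "wait"]], succeeds))  # level II
```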
Bateson's hierarchy [6] finds an echo in the work of Argyris and Schon [25] on single- and double-loop learning. Double-loop learning comes about through reflection on learning processes, in which individuals may attempt to challenge prejudices and assumptions arising from their experiences [25,26]. When individuals need to solve an immediate problem, i.e. to close a perceived gap between expected and actual experience, they may harness their sense-making processes within contexts of existing goals, values, plans and rules (Vickers's appreciative settings [5]), without questioning their appropriateness. However, if individuals challenge received wisdom and critically appraise the assumptions previously applied, double-loop learning occurs. The resulting process creates a productive learning spiral, which is at the heart of any successful organizational innovation.

As mentioned previously, the Infological Equation [9,23] suggests that individuals develop unique understandings (meaningful information) by examining data in the light of (their own) pre-knowledge, gained by reflecting on experience during a previous time interval. Information, and the 'knowledge' derived from it, cannot therefore be seen as commodities to be transmitted from one individual to another (or stored) as containers of objective meaning. Furthermore, it is through these processes of constructing new understandings/meanings, by examining data in the light of experience, that organizations, their goals and their cultures are constituted. If individual learning is a creative process, organizational learning is so also.

4. Complexification and Emergence

Attempts by students of management to reduce organizational problems to considerations of 'sub-optimality', drawing on mechanistic models from systems science, can be seen as reductionism. Exploration of multiple levels of contextual dependency may help analysts to avoid entrapment in various types of reductionism, including undue reliance on sociological, psychological or technological concepts. It may also help to eliminate tendencies towards generalization, or the substitution of an external analyst's own views for those of the participating stakeholders. The need to promote deep understandings of problem spaces requires us to go beyond the grounding of research in phenomenological paradigms. In order to avoid various types of reductionism and achieve a deepened understanding, analysts must attempt to incorporate philosophy as an integral part of their research practice [5,17,21,27,28].

As pointed out by Werner Ulrich [29] in his discussion of boundary critique, the perception of a system varies with the stance of the observer; this
differentiates between an observer's and an actor's picture of reality, which means that anyone wishing to inquire into IS use must continually align themselves with actor perspectives. For example, the shaping of meaning in particular situations can be described through comparisons of different actors' perspectives within given structural criteria, or a 'circling of realities'. This refers to the necessity of acquiring a number of different perspectives (in time-space) in order to obtain a better and more stable appreciation of an actor's reality [30]. The whole person includes dimensions of both 'heart' and 'mind' [31]. Personal perspectives which transcend received, organizational 'common sense thinking' may be encouraged to emerge through methods which emphasise individual uniqueness and contextual dependency.

Those engaged in management tasks such as IS design should not forget that they set up personal boundaries for a situation by defining it from their own experiences and preferences. As human beings we all have pre-understandings of phenomena, which are influenced by our own values, by 'wishful thinking', and by how each of us has been socialized into a particular society. These pre-understandings are gradually reviewed with the support of our experience; in a continual interchange between an individual's pre-understanding and experience, a process of inquiry may progress. It follows from the preceding discussion that, from the point of view of each individual's perception, an organization is an emergent property of inter-individual sense-making processes and activities. The organization is continually constructed/reconstructed for each individual as a result of emergence from individual sense-making perspectives. A critically informed approach to research involves recognition/understanding of this emergence. Without recognition of the uniqueness of each particular individual's experience of organizational life, this critical approach may be undermined.

Within a traditional scientific paradigm, the focus of a researcher's attention rests on increasing the precision and clarity with which a problem situation may be expressed. This can lead to an artificial separation of theory from praxis, of observation from observer and observed. 'Knowing' about organizational context (formed by the ongoing construction of meanings through the synthesis of new data with past experience) may be deeply embedded and inaccessible to the individuals concerned. The perspective promoted in this paper emphasises the self-awareness of human individuals. In research undertaken from this perspective, a focus on emancipation and transparency, rather than on clarity and precision, is adopted. A researcher taking such a perspective will recognize that there are uncertainties and ambiguities
inherent in socially constructed everyday worldviews (a similar discussion can be found in Radnitzky [32]).

In some approaches, a human activity system is regarded as a mental construct derived from an interrelated set of elements, in which the whole has properties greater than the combination of its component elements. When such a model is adopted, individual uniqueness is subsumed in the perceived emergent properties of a conceptualised system. Even when considered as a duality, seen as a system to be served and a serving system [4,33], individuals remain invisible. In order to take into account unique individual sense-making processes within an organizational problem arena, analysts need to explore multiple levels of contextual dependencies. Every observation is made from the point of view of a particular observer [32]. Since it is not possible to explore problem spaces from someone else's point of view, it follows that external analysts can only play supportive roles in enabling individuals within given contexts to explore their own sense-making.

In an alternative model, an organizational system may be seen as an emergent property of unique, individual sense-making processes and interactions within a particular problem arena [34,35]. When considered in this way, it is possible to perceive some individuals themselves as having emergent properties of their own, which can be larger than (e.g. lie outside of) those of one particular organizational system seen as a whole. Consider, for instance, a football club seeking to recruit skilful players for its team. The manager may perceive a need for a creative, attacking midfielder to play a role as one component part of the team's efforts to win. The Los Angeles Galaxy Club recently experienced such a need, but chose to recruit the former England captain, David Beckham. Beckham can play the role of an attacking midfielder for the team. However, he brings with him qualities which transcend this role, in terms of his personal notoriety, publicity potential and marketing value for sales of Club products such as replica shirts. Beckham has emergent properties beyond those of any other midfield footballer in relation to the human activity system which is that Club. This model is not, of course, the same as a non-systemic, fragmented view which focuses on individuals but fails to perceive an emergent system arising through their interactions, and hence ignores the impact of norms, values, expectations, communicational acts, etc. on individual sense-making processes [36].

5. Contextual Inquiry

The importance of context for systemic analysis has been widely recognized [4,17,24,28]. Contextual inquiry, as described here, is viewed as a special case
of contextual analysis. This paper describes an application of a framework for contextual inquiry, the Strategic Systemic Thinking (SST) framework [24]. This forms an exploration into the nature of open systems thinking and how systemic identities are maintained and generated within a specific human activity context. SST maintains a particular focus on the ways in which human analysts can deal with complexification and uncertainty, although this poses apparently insuperable epistemological problems. Particular emphasis is placed on the multiplicity of individual sense-making processes and the ways these are played out within organizations. SST can support groups of organizational actors in taking contextual dependencies into consideration, and is intended as a means to enable them to cope with escalations in complexity. A cardinal principle of the framework is that actors should own and control their own inquiry, supported but not dominated by a facilitating professional analyst.

When an attempt is made to evaluate effectiveness in managing or 'designing' organizational systems, concepts of analysis become important. Good practice requires an understanding that addresses the intrinsic and contextually-dependent characteristics of organizational activities. Such an understanding can only come about through relevant evaluative and analytical strategies. Evaluation is a result of both inquiring and reflecting thought processes, i.e. mental activity intrinsically dependent upon a demonstrated, contextually-dependent desire to explore a certain problem space. Analysis is an inquiry into the assumed-to-be unknown and/or a questioning of the assumed-to-be known. Evaluation is a consolidating process, where judgments are made and assumed 'truths' and 'knowledge' are incorporated into some kind of hierarchy. Together, analysis (i.e. creation of 'new' knowledge) and evaluation (i.e. categorization of 'existing' knowledge) represent the closing of a learning circle. Any conscious reflection over the requirements for a higher-quality learning circle could become a daunting exercise, as it involves raising the quality of 'knowing'. This is why a framework such as SST has an important role to play.

SST involves three aspects: intra-analysis, inter-analysis and value-analysis. These should not be regarded as sequential, as it is possible to begin at any point in the framework. SST is intended to be iterative, and it is therefore possible to move from one analysis to another repeatedly and in any direction, at any time. A range of methods is available to the actors, and their facilitating external analyst, in seeking to articulate their worldviews. These methods include rich pictures, brainstorming, mind-maps, diversity networks, drama transfers and role-playing, all of which support the creation, visualization and communication
of mental models and narratives. Each of the three aspects of the framework helps to guide inquiries around a number of themes.

The purpose of intra-analysis is to enable the creation of an individual process for structuring a problem. This analysis aims to create and capture a range of narratives from participating stakeholders by providing an enrichment and visualization process for them. Inter-analysis is the aspect of the inquiry which represents collective reflection on decision-making alternatives. The aim is to have a dialogue and to reflect upon the ranges of narratives derived through intra-analysis. The purpose is not to achieve consensus or to establish common ground, but to produce a richer base upon which further inquiry and decision-making can proceed. Grouping of narratives takes place through consideration and discussion of individually produced narratives. The results of these inquiries might be considered to form a knowledge base relating to the problem spaces under investigation. A critical and reflective approach in considering these results is needed to ensure a basis for 'good' decision-making and to avoid unintended, negative consequences for the actors and organizations concerned. Evaluation could be said to be an examination of the 'known': what has been learned from the analyses in a socio-cultural context. Here actors may examine the values influencing and constraining the analyses, and consider prioritization from political and cultural perspectives.

SST can be explained as involving groups of professional members of organizations acting as analysts of their own problem spaces under the guidance of expert analysts as external facilitators. This includes examination of their activities and their specific use of methodologies, rhetoric and strategies to construct local arguments and findings. By the end of an initial analysis, the analysts (e.g. organizational actors) might, for example, be familiar with some of the strategies available within their organization for further inquiries into contextual dependencies. SST is complementary, rather than an alternative, to traditional approaches to analysis. However, there may be conflicts relating to unproblematized ontological beliefs and logical empiricism (i.e. unquestioned beliefs in 'objectivities and truths'). Other assumptions may also arise which are incompatible with the underlying philosophy of SST, e.g. traditional communicational theories focusing on a 'sender-receiver' perspective.

To give a simplified example: in a traditional approach, an inquiry might ask what a company wants to achieve with its information and communication system. A contextual inquiry, on the other hand, would ask what the people who will use the system want to achieve, and what roles and specific purposes their activities might have in organizational contexts. What makes their
unique situation recognizable for them? What specific role do they give to information (and to the organizational business)? This inquiry is to be seen as an investigation by users themselves into their own assumptions and needs within the space of an open information system (an 'organization', human activity system or socio-cultural system). This is a bottom-up perspective on organization, information and (technical) communication systems. Systems are envisaged which are shaped with the intention to serve specific organizational actors and their needs – from their own points of view.

6. Conclusions
Contextual inquiry is intended to support analysts in recognizing individual emergence, multiperspectivity and open systems thinking in combination. Two different categories of emergence are highlighted. In the first, each individual's identity is an emergent property of a number of emergent systems of which the individual is a member. In the second category, each organization is an emergent property of the multiple perspectives of all the interacting individuals for whom its existence is relevant. There are multiple views of what comprises the organization, formed from the multiple perspectives of many individuals. From a systems analyst's point of view, many possible descriptions will emerge in any organizational inquiry, through the differing experiences of context among many individuals. The boundaries of an organizational system will be dependent upon multiple perspectives and descriptions from individuals. This requires consideration to be given to the sense-making, emotion and learning processes that those individuals engage in. It is helpful to highlight the different levels of abstraction involved in discussions about systems as emergent properties of socio-cultural phenomena.

The Strategic Systemic Thinking framework is discussed as a contemporary version of contextual analysis. Its aim is to support the application and use of specifically adapted methods by groups of individual stakeholders in their efforts to construct understanding and meaning. Its focus is on the ways in which information needs and information use are created by individuals. The concept of contextual dependency is of interest because it supports a focus of inquiry, by unique individuals, on their own beliefs, thoughts and actions in specific situations and contexts. Through this kind of inquiry, support is provided for a contextually-dependent creation of necessary knowledge. This has the potential to provide a foundation for more successful communication, systemic analysis and, eventually, information systems development. The
purpose is to create a form of organizational transformation that allows individual emergence to surface.

References
1. G. Minati, Call for Papers, AIRS Congress 2007 (http://www.airs.it/AIRS/indexEN.htm, accessed 18 September 2007).
2. E. Schein, Organizational Culture and Leadership (2nd edition) (Jossey-Bass, San Francisco, 1992).
3. P.M. Senge, The Fifth Discipline: The Art & Practice of the Learning Organization (Doubleday, New York, 1990).
4. P. Checkland, Systems Thinking, Systems Practice (John Wiley & Sons, Chichester, 1981).
5. K. Weick, Sensemaking in Organizations (Sage Publications, Thousand Oaks, CA, 1995).
6. G. Bateson, Steps to an Ecology of Mind (University of Chicago Press, Chicago, 1972).
7. P.L. Berger and T. Luckmann, The Social Construction of Reality: A Treatise in the Sociology of Knowledge (Anchor Books, New York, 1966).
8. D.L. Levy, M. Alvesson and H. Willmott, in Studying Management Critically, Eds. M. Alvesson and H. Willmott (Sage, London, 2003).
9. B. Langefors, Essays on Infology - Summing up and Planning for the Future (Studentlitteratur, Lund, 1995).
10. R.L. Ackoff, Ackoff's Best: His Classic Writings on Management (Wiley, New York, 1999).
11. P. Sorokin, Contemporary Sociological Theories (Harper, New York, 1928).
12. C.W. Churchman, The Systems Approach (Delacourt Press, New York, 1968).
13. H.A. Simon, The Sciences of the Artificial (MIT Press, Cambridge, Mass., 1969).
14. F.E. Emery, Systems Thinking (Penguin Books Ltd., Harmondsworth, 1969).
15. C.E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Champaign, IL, 1999).
16. H.R. Maturana and F.J. Varela, Autopoiesis and Cognition: The Realization of the Living (Reidel, Boston, 1980).
17. W. Ulrich, Critical Heuristics of Social Planning (Wiley, Chichester, 1983).
18. L. von Bertalanffy, General Systems Theory (George Braziller, New York, 1969).
19. K.E. Boulding, The Organizational Revolution (Harper & Row, New York, 1953).
20. H. Fayol, General and Industrial Management (Pitman Publishing Company, London, 1949).
21. G. Bateson, Mind and Nature: A Necessary Unity (Hampton Press, Cresskill, NJ, 2003).
22. G. Vickers, Freedom in a Rocking Boat (Allen Lane-Penguin Books Ltd, London, 1970).
23. B. Langefors, Theoretical Analysis of Information Systems (Studentlitteratur, Lund, 1966).
24. P.M. Bednar, Informing Science 3(3), 145-156 (2000).
312
P.M. Bednar
25. C. Argyris and D.A. Schon, Organizational Learning (Addison Wesley, Reading Mass., 1978).
26. C. Argyris and D.A. Schon, Organizational Learning II - Theory, Method and Practice (Addison Wesley, Reading Mass., 1996).
27. H.K. Klein, Fourth Leverhulme Lecture January 12 2007 (Salford Business School, UK, 2007).
28. H.E. Nissen, Informing Science 10, 21-62 (2007).
29. W. Ulrich, The J. of Information Technology Theory and Application (JITTA) 3(3), 55-106 (2001).
30. P.M. Bednar and C. Welch, Informing Science 10, 273-293 (2007).
31. C.U. Ciborra, Getting to the Heart of the Situation: The Phenomenological Roots of Situatedness (Interaction Design Institute, Ivrea, Symposium 2005; accessed June 2007 at http://projectsfinal.interaction-ivrea.it/web/2004_2005.html, 2004).
32. G. Radnitzky, Contemporary Schools of Metascience (Akademiforlaget, Gothenburg, 1970).
33. P. Checkland and S. Holwell, Information, Systems and Information Systems: Making Sense of the Field (John Wiley & Sons, Chichester, 1998).
34. G. De Zeeuw, Systemica: J. of the Dutch Systems Group 14(1-6), ix-xi (2007).
35. P.M. Bednar, Systemica: J. of the Dutch Systems Group 14(1-6), 23-38 (2007).
36. N. Hay, The Systemist 29(1), 7-20 (2007).
JOB SATISFACTION AND ORGANIZATIONAL COMMITMENT: AFFECTIVE COMMITMENT PREDICTORS IN A GROUP OF PROFESSIONALS
MARIA SANTA FERRETTI
Dipartimento di Psicologia, Università di Pavia, Piazza Botta 6, 27100 Pavia, Italy
E-mail: mats.larsson@physto.se

Job satisfaction and organizational commitment have long been identified as relevant factors for the well-being of individuals within an organization and for the success of the organization itself. As well-being can, in principle, be considered as emergent from the influence of a number of factors, a main goal of a theory of organizations is to identify these factors and the role they can play. In this regard, job satisfaction and organizational commitment have often been identified with structural factors allowing an organization to be considered as a system, or a wholistic entity, rather than a simple aggregate of individuals. Furthermore, recent studies have shown that job satisfaction has a significant, direct effect in determining individuals' attachment to an organization and a significant but indirect effect on their intention to leave a company. However, a complete assessment of the role of these factors in establishing and maintaining the emergence of an organization is still lacking, due to a shortage of measuring instruments and to practical difficulties in interviewing organization members. The present study aims to give a further contribution to what is currently known about the relationship between job satisfaction and affective commitment by using a group of professionals, all at management level. A questionnaire to measure these constructs, following a pilot study, was designed and administered to 1042 participants, all professionals with the title of industrial manager or director. The factors relating to job satisfaction and the predictive value of these factors (to predict an employee's emotional involvement with their organization) were simultaneously tested with a confirmative factorial model. The results were generalized with a multisample procedure using structural equation models. This procedure was used to check whether these factors could be considered as causes producing the measured affective commitment. The results showed that the four dimensions of job satisfaction (professional development, information, remuneration and relationship with superiors) are not equally predictive of affective commitment. To be more specific, the opportunity for professional development or growth provided by a company was shown to be the best predictor of affective commitment. This seems to suggest that, as expected, the emergence of organizations could be a true emergence, not reducible to a sum of single causes. Implications, future lines of research and limitations are discussed.

Keywords: job satisfaction, affective commitment, professionals.
1. Introduction
In past years, theories of well-being in an organizational context have shown that well-being is an emergent construct influenced by a number of factors, the main ones being job satisfaction and organizational commitment. A complete theory of this kind of emergence should clarify the roles these factors play by resorting to suitable investigations based on the use of questionnaires. The results of such investigations should then constitute the main departure point for building a more complete theory of the emergence of well-being.

As regards job satisfaction, it has a central role in many contemporary models and theories of behavior and attitude to work. It is also an important element for the improvement of the quality of employees' working lives and for organizational efficiency. In the past, satisfaction was studied and observed in an essentially one-dimensional way, but today it is recognized as a complex and multidimensional construct. Previous observations tended to be fairly general and simplistic, but they have since been developed into detailed conceptual and empirical definitions. The body of research formed by previous studies and theories on this topic has allowed the identification of the multiple factors that influence satisfaction, and of the possible consequences of job satisfaction. The main consequences, according to Locke (1976) [17], are: turnover, absenteeism, physical health, mental health, complaints, attitude to work and self-esteem.

The causes of satisfaction, or the factors that can influence it, are highlighted in many different models which aim to classify them. Two of the main classifications are those of Herzberg (1967) [15] and Locke (1976). According to Herzberg (1967), the causes of dissatisfaction are linked to factors inherent in the environmental context (e.g. company policies and procedures, technical competence of superiors, remuneration, inter-personal relationships, physical conditions of work), while the causes of satisfaction are linked to factors inherent in the content of a job (e.g. the nature of the work itself, responsibility held, professional promotion, recognition gained and results achieved). Locke proposes another way of dividing up the factors that influence satisfaction: working conditions (e.g. remuneration, promotion, recognition and benefits), the people one comes into contact with at work, and individual characteristics (age, sex, how long an employee has been with the company, level of training and hierarchical position held).

Previous literature presents many different definitions of commitment, but all of them include the attachment of the individual to their organization. The
most commonly studied conception of commitment is based on the work of Mowday et al. (1979; 1982) [26,27], who identify three components: acceptance of the company's objectives, willingness to work for the company and a wish to stay within the organization. More recently, a three-component conception of commitment has been developed (Meyer, Allen & Smith, 1993) [25]. Meyer and Allen (1997) made a distinction between the target of an employee's commitment (e.g. the organization, the job, supervisors and the group in which one works) and the relationship that is created between the employee and the target. This relationship may be of an emotional kind; alternatively, one may feel morally obliged not to leave the organization or job, or may recognize the costs associated with leaving it (Meyer & Allen, 1991 [21]; Meyer & Herscovitch, 2001 [24]).

Most of the past research has analyzed organizational commitment, defined as the psychological state that characterizes the relationship between employee and organization (Meyer and Allen, 1991; 1997) [21,22]. As we have seen, Meyer and Allen identified three types of commitment: normative, continuance and affective. In the case of normative commitment, employees stay because they feel a duty to do so. Continuance commitment expresses a utilitarian bond: employees are conscious of the costs involved in changing, so they stay within the organization because it is convenient for them to do so. Affective commitment corresponds to emotional attachment: employees stay within the organization because they want to.

Meyer, Allen and Smith (1993) [25] discussed the origin and nature of the three types of commitment, suggesting that various factors are involved. Normative commitment derives from personal values and from the moral sense of obligation that a person has after having obtained favors or benefits from the organization. Continuance commitment, by contrast, is a product of the benefits obtained from working in a certain organization and of a lack of alternative job opportunities. Finally, affective commitment originates from an employee's working conditions and from their expectations of results: whether the job gives the employee what he or she expects to receive.

The model of commitment developed by Penley and Gould (1988) follows a slightly different approach from that taken by Meyer and Allen. Based on Etzioni's (1961) multi-faceted conceptualization of involvement, Penley and Gould argue that individuals' commitment to an organization exists in an emotional and an instrumental form. In fact, it is possible to display an emotional commitment, a calculated commitment or a commitment due to reasons alien to the organization. In addition, moral commitment is described as a form of emotion with high levels of positivity, characterized by an acceptance of and
identification with the organization's objectives. Penley and Gould (1988), therefore, seem to conceptualize moral commitment in a way similar to how affective commitment is defined by Meyer and Allen.

In the literature on organizational commitment, many studies have been carried out with the aim of identifying the factors that precede it (Meyer & Allen, 1990 [1]; Meyer, Allen & Smith, 1993 [25]; De Cotis & Summers, 1987 [11]; Wasti, 2003 [32]). A variety of factors preceding organizational commitment have been highlighted; the form of commitment that has received particular attention is affective commitment. Mowday, Porter & Steers (1982) [27] classified these preceding factors into four categories: individual characteristics (demographic, personality and attitudinal), structural characteristics of the organization, job characteristics and work experience.

Many studies have supported the idea that the different forms of commitment have a considerable impact on work performance, absenteeism, turnover and work-related stress (Meyer & Allen, 1991 [21]; Mowday, Porter & Steers, 1982 [27]; Hackett, Bycio & Hausdorf, 1994 [14]; De Cotis & Summers, 1987 [11]). The three forms of commitment were all shown to be negatively correlated with turnover. As for the behavioral consequences of organizational commitment, it was found that affective commitment (more so than normative or continuance commitment) is correlated with a large number of outcome variables (e.g. turnover, work performance, citizenship behavior) and presents stronger correlations with every one of these variables (Meyer & Herscovitch, 2001) [24].

Organizational commitment has been studied in relation to many personal and organizational variables, one of which is job satisfaction. There are many parallels between organizational commitment and job satisfaction. This is no surprise, as many different studies have shown that there is a strong correlation between commitment and job satisfaction. For example, Mathieu and Zajac (1990) obtained a correlation equal to .49 between the two constructs. Although these two factors are closely correlated, they constitute two constructs that are empirically separate (Brooke, Russell & Price, 1988 [6]; Glisson & Durick, 1988 [13]; Shore, Newton & Thornton, 1990 [29]). Recent studies have shown that job satisfaction is of key importance, as it has a direct effect on employees' commitment to their organization and an indirect effect on employees' intention to leave a company (Powell & Meyer, 2004) [28].
2. Field measures & Targets
The theories and models proposed in this area of research so far have been based upon studies that used participant samples taken from various groups of professionals. However, only a limited number of studies have specifically aimed to study these factors in professionals at a managerial level, and few research projects have attempted to determine the factors preceding organizational commitment in different occupational groups. Ritzer & Trice (1969) suggest that the relationship between organizational commitment and its preceding factors might be stronger for non-professionals than for professionals, as professionals do not direct their expectations toward the organization but toward their job. Therefore the organization, as an object to which they could become committed, is not as important to them as it is for non-professionals. Nystrom (1990) emphasized the importance of vertical communication exchanges between managers and their subordinate employees. His research supports the expectation that managers who experience low-quality communication exchanges with their direct superiors tend to feel less involved in the organization, while managers who take part in good-quality vertical communication exchanges express a high level of organizational commitment.

The results of the meta-analysis conducted by Cohen (1992) show that the variable 'salary' has a stronger relationship with organizational commitment in professionals. Salary is included in the organizational model as a preceding factor inasmuch as the organization controls this variable. The fact that this calculative factor has a strong effect on professionals shows that their expectations are not only of an intrinsic character. This result indicates that although general intrinsic aspects are important for the commitment of workers with a high-level job status, extrinsic factors also play a key role (Angle & Perry, 1981).

A previous study by Ferretti & Argentero (2006) [12] forms the basis of the current paper. Ferretti & Argentero (2006) showed that job satisfaction in professional work is multi-dimensional: they describe four dimensions which are specific to this type of worker. The first is satisfaction connected to the information an employee receives; the second is satisfaction connected to the opportunities for development and professional growth; the third relates to satisfaction linked to remuneration; and the final dimension relates to the employee's relationship with his or her superiors. The present study aims to investigate the relationship between these dimensions of satisfaction and commitment (affective commitment) concerning the organization, while also considering different types of satisfaction and commitment.
Past research has shed light on the causal effect between job satisfaction and affective commitment. Darwish (2002) [10] suggested that job satisfaction has a noteworthy effect on affective commitment (regression coefficient = .44). Moreover, Jernigan, Beggs & Kohut (2002) carried out a study on a group of nurses and showed that affective commitment was explained by work satisfaction. Their line of research related the variations in the types of commitment (moral, calculated or alien) to the level of work satisfaction (autonomy, interaction, remuneration, professional status, organizational policies and required characteristics). Therefore, one could expect that in a sample of professionals these different aspects of job satisfaction could help to explain emotional involvement in the organization. After having revised and expanded Locke's model (1997), Meyer et al. (2004) [23] claimed that workers with strong affective commitment are driven by intrinsic motivation. One could therefore expect that, among the different dimensions of job satisfaction observed in the directors, it is personal development that provides the biggest contribution to an employee's affective commitment.

To examine the role that specific aspects of job satisfaction play as predictors of affective commitment, the following hypotheses have been formulated and tested:

H1: Satisfaction due to the opportunity for development, interaction with superiors, remuneration and the information an employee receives all have a positive influence upon affective commitment.

H2: Satisfaction due to the opportunity for development is the best predictor of affective commitment.

STUDY
The present study aims to identify which variables relating to job satisfaction encourage emotional involvement with the company.

3. Method
Participants
1042 individuals participated in the current investigation (men = 553; women = 489), all professionals (professional is used here to mean an employee who is responsible for managing other employees and who has a title such as
"manager" or "director") in a multinational manufacturing company which has its headquarters in Italy. The participants (Table 1) were mostly over 40 years old (65.9%) and had the title "manager" (92.3%); 22.7% worked in production and 26.3% were staff. 75% of the participants had worked for the company for more than 10 years. These characteristics reflect the company's business activity (manufacturing) and the employees' high level (managers or directors) within the investigated company.

Table 1. Participant characteristics (N = 1042; values are percentages).
Gender: M 53.07; F 46.93
Age: <30 years 4.64; 30-39 years 29.51; 40-49 years 32.57; ≥50 years 33.28
Job level: Managers 7.69; Professionals 92.31
Organizational seniority: ≤5 years 13.76; 6-10 years 11.21; 11-20 years 20.36; >20 years 54.67
Professional macro-area: Technical 17.27; Production 22.74; Sales 13.81; Staff 26.30; Product and Program Management 1.00; Procurement 3.26; ICT Systems 8.25; Other 7.37

3.1. Materials
The investigative questionnaire (displayed in the appendix) is anonymous and made up of 20 items: 16 measure the four dimensions of job satisfaction, while 4 investigate commitment to the organization. The participants were asked to indicate, for each item, how much they agreed with the statement presented (on the basis of their experience within the company) on a 5-point Likert-type scale, where 1 = do not agree at all and 5 = completely agree. The four dimensions of satisfaction respectively measure:
• Satisfaction for opportunities of development and professional growth (4 items): this relates to the employee's competence, to which suitable training and mobility programs have contributed. An example of this kind of item is: "I think that in this company I have good opportunities to take part in training courses and to develop professionally".
• Satisfaction for information received (3 items): this refers both to company strategies and, consequently, to the company's results. An example of this kind of item is: "I think that I am well-informed of the company's results".
• Satisfaction for pay (4 items): this area refers to the recognition gained at work for the employee's personal contributions, which can be seen in terms of personal ability and willingness to propose alternative solutions to problems. An example of this kind of item is: "I can say, with satisfaction, that my personal contributions are suitably recognized within the company I work for".
• Satisfaction for relationship with superiors (3 items): this refers to the presence of a professional relationship with a superior who supplies concrete support and who encourages the professional growth of the employees under him. An example of this kind of item is: "When I am in a difficult situation or when I don't know what to do I can count on the support of my direct superior".
The last dimension, affective commitment (4 items), refers to emotional commitment and the employee's emotional involvement in the organization. An example of this kind of item is: "The more I work in this company the more I feel I am a part of it".

The statements used in the questionnaire were tested in a pilot study consisting of interviews with individual employees and with groups. It was found that the dimensions claimed by past studies to be important for job satisfaction also emerged from the interviews. Therefore, as seen in previous literature such as Huang & Van de Vliert (2003) [16], it is possible to classify the factors into intrinsic and extrinsic: characteristics that refer to the job itself and those that relate to interpersonal relationships belong to the first category, while the aspects of the job connected to remuneration, status and career belong to the second.

It should be noted that, in the sample used, the aspects of the job cited as a source of satisfaction refer to four areas that significantly reflect the managerial level of the participants. For this reason they are only partially interchangeable with those found in samples of lower-level professionals. For
example, elements such as those connected to information received and those connected to the relationship with superiors are suggested to be of key importance, while aspects relating to interpersonal relationships with colleagues of an equal level are not identified as important.

3.2. Procedure
The administration of the questionnaire was organized in special meetings, where the professionals were informed of the aims of the questionnaire, its contents and the instructions on how to complete it. It was totally anonymous, and a guarantee was given that the company's management would not be able to trace the identity of the participants in any way.

3.3. Analysis of the data
The four dimensions of job satisfaction identified in a previous study (Ferretti & Argentero, 2006) [12] were hypothesized to be predictors of commitment using a structural equation model. A cross-validation procedure (Cudeck & Browne, 1983 [9]; Bagozzi & Baumgartner, 1994 [3]) was followed, which consisted in randomly dividing the sample into two sub-samples, developing a model on the first of the two subgroups and then establishing its generalizability. The analyses were carried out in three steps: 1) formulation and elaboration of the model on an initial subgroup of participants; 2) generalization of the results to the second sub-sample; 3) testing the hypothesis of structural invariance across gender.

The model that was subjected to testing is shown in Figure 1. As can be observed, it is composed of a measurement model (20 observable variables that measure 5 latent variables) and a structural model (the causal relationships between dimensions of satisfaction and commitment). To be more specific, the model allows both the validity of the factorial structure of the job-satisfaction scale and the strength of the relationship joining the dimensions of job satisfaction to affective commitment to be tested simultaneously. In this model, each item loads only on its own original factor. The structural equation model was analyzed with the statistical software AMOS 5 (Arbuckle, 2003) [2]. The goodness of fit was tested using the χ² test: fit is considered sufficient when χ² is not significant; however, given its dependence on sample size, other indices independent of this characteristic were also considered, in particular the CFI – Comparative Fit Index (Bentler, 1990) [5], the TLI – Tucker-Lewis Index (Tucker & Lewis, 1973) [31] and the RMSEA – Root Mean Square Error of Approximation (Steiger, 1990) [30].
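For illustration only (the study used AMOS 5, and no code is given in the paper), the hypothesized model can be written in lavaan-style syntax and estimated with the Python library semopy. The item-factor assignment below follows Table 4, "responses.csv" is a hypothetical data file, and the random split mirrors the cross-validation procedure described above.

import pandas as pd
from semopy import Model

# Measurement model (five latent variables, twenty items, as in Figure 1)
# and structural model (four satisfaction factors predicting commitment).
MODEL_DESC = """
Pay =~ v1 + v5 + v7 + v10 + v13 + v15
Supervisors =~ v3 + v8 + v12
Information =~ v2 + v6 + v11
Development =~ v4 + v9 + v14 + v16
Commitment =~ v17 + v18 + v19 + v20
Commitment ~ Pay + Supervisors + Information + Development
"""

data = pd.read_csv("responses.csv")  # hypothetical file, one row per respondent

# Cross-validation: random split into two equal sub-samples.
first_half = data.sample(frac=0.5, random_state=0)
second_half = data.drop(first_half.index)

model = Model(MODEL_DESC)
model.fit(first_half)   # estimate the model on the first sub-sample
print(model.inspect())  # parameter estimates (cf. Tables 4-6)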
Figure 1. The hypothesized model. [Path diagram: the latent factors Pay (v1, v5, v7, v10, v13, v15), Supervision (v3, v8, v12), Information (v2, v6, v11) and Development (v4, v9, v14, v16), each measured by its items, with paths to the latent factor Affective Commitment (v17, v18, v19, v20).]
The first two indices range in value from 0 to 1; values above .90 were considered satisfactory, as suggested by Bentler (1990) [5]. For the RMSEA, the instructions given by Browne (1990) [7] were followed: Browne suggests that values below .08 should be considered satisfying and that those below .05 should be considered good (Browne & Cudeck, 1993 [8]; Marsh, Balla & Hau, 1996 [19]).

The generalizability of the model was then checked on the second sub-sample (Bagozzi & Foxall, 1995) [4]. This multi-sample procedure allows the simultaneous analysis of data taken from different samples, forcing all or some of the parameters to be identical in the different groups. Four progressively more restrictive hypotheses were tested: a) the equivalence of the factor loadings; b) the equivalence of the regression coefficients; c) the equivalence of the covariances; and d) the equivalence of the measurement errors. The test of invariance requires the specification of a model in which certain parameters are forced to be the same, and the comparison of this model with a less restrictive one in which the parameters are free to take any value whatsoever. The comparison between the models was carried out using the chi-squared difference test, whose significance indicates non-invariance. Finally, by employing the same procedure, the hypothesis of structural invariance across gender was tested.
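For concreteness, the two quantities driving this procedure can be sketched in Python as follows (a minimal illustration, not the authors' code); the check at the end uses the first-sample values reported in Table 3 below.

from math import sqrt
from scipy.stats import chi2

def rmsea(chisq: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (Steiger, 1990)."""
    return sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def invariance_p_value(chisq_constrained: float, df_constrained: int,
                       chisq_free: float, df_free: int) -> float:
    """p-value of the chi-squared difference test between nested models.

    A non-significant result supports invariance: constraining parameters
    to be equal across groups does not significantly worsen the fit.
    """
    return chi2.sf(chisq_constrained - chisq_free, df_constrained - df_free)

# Check against Table 3 (first sample): chi-squared = 387.08, df = 160, N = 521.
print(round(rmsea(387.08, 160, 521), 2))  # 0.05, as reported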
Table 2. Reliability of the Job Satisfaction Scale: Cronbach's alpha.

Factor                  Items   Alpha
Pay                     6       0.82
Information             3       0.81
Supervisors             3       0.84
Development             4       0.84
Affective Commitment    4       0.78
Table 3. Fit indices.

                       χ²       df    χ²/df   RMSEA   TLI   CFI
1st sample (N = 521)   387.08   160   2.40    .05     .94   .95
2nd sample (N = 521)   379.26   160   2.40    .05     .94   .95
4. Results
The internal validity of all the items belonging to the scales of the questionnaire was evaluated using Cronbach's coefficient alpha. As displayed in Table 2, the coefficients vary from .78 to .84 and are therefore acceptable. The item-total analysis also showed positive correlations.

Table 3 presents the goodness-of-fit indices for both samples. As far as the formulation of the model (1st sample) is concerned, the indices prove satisfactory according to the criteria suggested by previous literature (Bentler, 1990) [5]: the model therefore gives an appropriate explanation of the data. Even though the value of χ² is significant, the fit indices are above the threshold of .90, while the error (RMSEA) remains at the threshold of .05. As the data and the theoretical model have been shown to be congruent, it is possible to proceed to the interpretation of the parameters. Table 4 shows that the parameters are all high and all significant (p < .001), which is also true of the correlations between the different dimensions of job satisfaction (Table 5).
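The reliability step is straightforward to reproduce. Below is a minimal sketch, assuming the item responses are held in a pandas DataFrame with one column per item (column names follow Table 4).

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sums)."""
    k = items.shape[1]
    sum_of_item_variances = items.var(ddof=1).sum()
    variance_of_total = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_of_item_variances / variance_of_total)

# e.g. the six pay items of Table 4; Table 2 reports alpha = 0.82 for this scale:
# cronbach_alpha(data[["v1", "v5", "v7", "v10", "v13", "v15"]])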
Table 4. Measurement model: standardized parameters.

Pay:                  V1 = .66, V5 = .56, V7 = .77, V10 = .73, V13 = .72, V15 = .57
Supervisors:          V3 = .85, V8 = .78, V12 = .72
Information:          V2 = .76, V6 = .76, V11 = .81
Development:          V4 = .56, V9 = .52, V14 = .71, V16 = .73
Affective Commitment: V17 = .79, V18 = .82, V19 = .64, V20 = .50
The structural coefficients (Table 6) are all significant (p < .05) except one: the quality of the relationship with superior members of staff does not seem to produce effects on the participants' affective commitment to the organization. This indicates that the existence of a professional relationship in which the superior encourages open discussion, provides support and encourages the professional growth of the employees under him is not predictive of affective commitment. For the other three dimensions of job satisfaction, the significant regression coefficients indicate that the component relating to remuneration, defined as recognition gained at work for the personal contribution given in terms of competence and willingness to provide alternative solutions to problems (β = .27; p < .05), positively influences affective commitment, as does the component relating to how available information is to the employee (β = .12; p < .05). However, the best predictor of affective commitment is the opportunity of development and professional growth offered by the company (β = .48; p < .001). Overall, the portion of variance explained by the model is decidedly good (R² = .68).
Table 5. Factor correlations (standard errors in brackets).

              Pay        Supervisors   Information   Development
Pay           -
Supervisors   .63 (.06)  -
Information   .63 (.05)  .40 (.06)     -
Development   .80 (.06)  .64 (.07)     .65 (.06)     -
Table 6. Standardized path coefficients (criterion: Affective Commitment).

Pay           .27 **
Supervisors   .02
Information   .12 *
Development   .48 ***

* = p < .05; ** = p < .01; *** = p < .001.
The second sample also presents adequate fit indices. The generalizability of the results is shown in Table 7. As can be observed, the tests of the hypotheses of structural invariance show substantial invariance between the two groups for all the progressively restrictive models that were compared. The differences all prove to be non-significant, which confirms that the identified structure can be generalized. If we consider gender, the move from less to more restrictive models did not result in any significant rise in the value of χ², which indicates that for both men and women the parameters can be considered invariant (Table 8). Therefore, the proposed model is equally valid for both sexes.

5. Discussion
Meyer and Allen (1997) [22] claim that employees' commitment to their organization is still important and that it will become even more so in the future, for both the well-being of the individual and the success of the organization. The present study aimed to identify which variables linked to job satisfaction could encourage emotional involvement in the company.

Dimensions of job satisfaction have long been identified by the previous literature in this area as being correlated with different types of commitment. Furthermore, it has also been known for years that affective commitment originates from job satisfaction due to working conditions. The results presented by the current study allow us to add some more information to what is already
Table 7. Analysis of the invariance between the two samples.

Model                        χ²       RMSEA   TLI   CFI   Hypothesis test
M1: Unconstrained model      766.34   .04     .95   .94   -
M2: Factor loadings          775.43   .04     .95   .94   M2-M1: Δχ² = 9.08, p = .87
M3: Regression weights       777.83   .03     .95   .94   M3-M2: Δχ² = 2.40, p = .66
M4: Structural covariances   788.54   .03     .95   .94   M4-M3: Δχ² = 10.71, p = .38
M5: Measurement residuals    788.56   .03     .95   .94   M5-M4: Δχ² = .02, p = .89
known. The current study accepts hypothesis two but only partially accepts hypothesis one. In the specific sample of managers, affective commitment to the organization was shown to be determined by satisfaction due to the employee receiving adequate recognition for their personal contribution, by the employee's perception of being adequately informed about company procedures and results and, more strongly, by the opportunity of professional development (defined as the opportunity for the employee to use his competences, partly developed through suitable training provided by the company). However, the employees' relationship with their superiors appears to have a negligible effect on affective commitment. A possible interpretation might be that, for employees at this contractual level, emotional involvement with the company develops mainly via a strictly individual-organization bond which is not mediated by a direct boss-employee relationship.

In conclusion, it is the development, recognition and communication offered by the organization that encourage an increase of commitment to a company. As Cohen (1992) observed, the intrinsic aspects of a job (in this case the opportunity to develop and grow professionally as offered by an organization) have an important role, even if the economic aspects (satisfaction with the job's remuneration), which refer to extrinsic aspects, should not be ignored.
Table 8. Analysis of the invariance between males and females.

Model                        χ²       RMSEA   TLI   CFI   Hypothesis test
M1: Unconstrained model      578.58   .04     .94   .93   -
M2: Factor loadings          592.56   .04     .94   .93   M2-M1: Δχ² = 13.97, p = .53
M3: Regression weights       600.12   .04     .94   .93   M3-M2: Δχ² = 7.55, p = .11
M4: Structural covariances   788.54   .03     .95   .94   M4-M3: Δχ² = 11.01, p = .36
M5: Measurement residuals    788.56   .03     .95   .94   M5-M4: Δχ² = 1.38, p = .24
6. Contributions and implications
The results of this study have numerous implications for managers. First of all, the results seem to suggest that managers should undertake regular and systematic assessments of their employees' type of commitment and job satisfaction. It is also necessary that managers identify the level of their employees' commitment in relation to their job satisfaction, in order to develop better and more efficient strategies to improve the quality of the employees' perception of the organization. Secondly, the organization should also evaluate managers on their ability to create a "committed" group of employees. Some evidence does, in fact, exist that organizations can financially benefit from strategies which promote employee commitment as part of selective staffing, evaluation of development, compensation schemes which are competitive and impartial, and complete training and development activities. Thirdly, the results of this study suggest that engineering organizations and managers should work to create positive relationships and respect between the various organizational units. Dessler (1999) provided convincing reasons, as did Pfeffer & Veiga (1999), supporting the idea that both organizational commitment and organizational achievements could be significantly increased.
7. Limits and directions for future research
This article has presented the results of an investigation carried out to predict the nature of employees' commitment on the basis of their perceptions of job satisfaction. The study underlines the usefulness of efforts to understand how the various factors influence the nature and form of an individual's commitment. If managers do not know the causes of an attitude, they cannot accurately predict the behavior that might follow it. By using a prospective theory, this study throws light upon the possible relationship between affective commitment and the specific dimensions of job satisfaction. Managers can play an important role in promoting commitment by ensuring that the organization makes suitable efforts to shape the content of the work to be done and the context in which it is to be done, and by applying management skills that minimize employees' dissatisfaction.

References
1. N.J. Allen and J.P. Meyer, Journal of Occupational Psychology 63, 1-18 (1990).
2. J.L. Arbuckle, AMOS 5.0 User's Guide (Smallwaters, Chicago, 2003).
3. R.P. Bagozzi and H. Baumgartner, in Principles of Marketing Research, Ed. R.P. Bagozzi (Blackwell, London, 1994).
4. R.P. Bagozzi and G.R. Foxall, European Journal of Personality 9, 185-206 (1995).
5. P.M. Bentler, Psychological Bulletin 107, 238-246 (1990).
6. P.P. Brooke, D.W. Russell and J.L. Price, Journal of Applied Psychology 73, 139-145 (1988).
7. M.W. Browne, Mutmum PC: User's Guide (Ohio State University, Department of Psychology, Columbus, 1990).
8. M.W. Browne and R. Cudeck, in Testing Structural Equation Models, Eds. K.A. Bollen and J.S. Long (Sage, Newbury Park, CA, 1993), pp. 136-162.
9. R. Cudeck and M.W. Browne, Multivariate Behavioral Research 18, 147-167 (1983).
10. A. Darwish, International Journal of Stress Management 9, 99-112 (2002).
11. T.A. De Cotis and T.P. Summers, Human Relations 40, 445-470 (1987).
12. M.S. Ferretti and P. Argentero, in Systemics of Emergence: Research and Development, Eds. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), pp. 535-548.
13. C. Glisson and M. Durick, Administrative Science Quarterly 33, 61-81 (1988).
14. R.D. Hackett, P. Bycio and P.A. Hausdorf, Journal of Applied Psychology 79, 15-23 (1994).
15. F. Herzberg, Work and the Nature of Man (World Book, Cleveland, 1967).
16. X. Huang and E. Van de Vliert, Journal of Organizational Behaviour 24, 159-179 (2003).
17. E.A. Locke, in Handbook of Industrial and Organizational Psychology, Ed. M.D. Dunnette (Rand McNally, Chicago, 1976).
18. E.A. Locke, in Advances in Motivation and Achievement (Vol. 10), Eds. M.L. Maehr and P.R. Pintrich (JAI Press, Greenwich, CT, 1997), pp. 375-412.
19. H.W. Marsh, J.R. Balla and K.T. Hau, in Advanced Structural Equation Modelling: Issues and Techniques, Eds. G.A. Marcoulides and R.E. Schumaker (Erlbaum, Mahwah, NJ, 1996), pp. 315-353.
20. J.E. Mathieu and D.M. Zajac, Psychological Bulletin 108, 171-194 (1990).
21. J.P. Meyer and N.J. Allen, Human Resource Management Review 1, 61-89 (1991).
22. J.P. Meyer and N.J. Allen, Commitment in the Workplace (Sage, Thousand Oaks, CA, 1997).
23. J.P. Meyer, T.E. Becker and C. Vandenberghe, Journal of Applied Psychology 89, 991-1007 (2004).
24. J.P. Meyer and L. Herscovitch, Human Resource Management Review 11, 299-326 (2001).
25. J.P. Meyer, N.J. Allen and C.A. Smith, Journal of Applied Psychology 78, 538-551 (1993).
26. R.T. Mowday, R.M. Steers and L.W. Porter, Journal of Vocational Behaviour 32, 92-111 (1979).
27. R.T. Mowday, L.W. Porter and R.M. Steers, Organizational Linkages: The Psychology of Commitment, Absenteeism, and Turnover (Academic Press, San Diego, CA, 1982).
28. D.M. Powell and J.P. Meyer, Journal of Vocational Behaviour 65, 157-177 (2004).
29. L.M. Shore, L.A. Newton and G.C. Thornton, Journal of Organizational Behaviour 11, 57-67 (1990).
30. J.H. Steiger, Multivariate Behavioural Research 25, 173-180 (1990).
31. L.R. Tucker and C. Lewis, Psychometrika 38, 1-10 (1973).
32. S.A. Wasti, Applied Psychology: An International Review 52, 533-554 (2003).
ORGANIZATIONAL CLIMATE ASSESSMENT: A SYSTEMIC PERSPECTIVE
PIERGIORGIO ARGENTERO, ILARIA SETTI
Department of Psychology, University of Pavia, Piazza Botta 6, 27100 Pavia, Italy

A number of studies have shown how the set-up of an involving and motivating work environment represents a source of organizational competitive advantage: in this view, organizational climate (OC) research occupies a preferred position in current I/O psychology. The present study is a review carried out to establish the breadth of the literature on the characteristics of OC assessment considered from a systemic perspective. An organization with a strong climate is a work environment whose members have a similar understanding of the norms and practices and share the same expectations. OC should be considered as a sort of emergent entity and, as such, it can be studied only within a systemic perspective, because it is linked with organizational variables in terms of both antecedents (such as the organization's internal structure and its environmental features) and consequences (such as job performance, psychological well-being and withdrawal) of the climate itself. In particular, when employees have a positive view of their organizational environment, consistent with their values and interests, they are more likely to identify their personal goals with those of the organization and, in turn, to invest greater effort to pursue them: the employees' perception of the organizational environment is positively related to key outcomes such as job involvement, effort and performance. OC analysis can also be considered an effective Organizational Development (OD) tool: in particular, Survey Feedback, that is, the return of the OC survey results, can be an effective instrument to assess the efficacy of specific OD programs, such as Team Building, TQM and Gainsharing. The present study focuses on the interest in investigating all possible variables which are potential moderators of the climate-outcome relationship: therefore, future research in the OC field should consider a great variety of organizational variables, in terms of antecedents and effects of OC, and OC studies should be conducted as cross-level and multilevel research.

Keywords: organizational climate, assessment, systemic models.
1. Introduction
In the last 15-20 years, business organizations have been placed in a context characterized by economic, social and technological changes [21]. In order to be competitive in this type of context, organizations must seek to achieve high levels of flexibility and productivity. Human resources have a key role in reaching these goals: when employees perceive that the employer wants to
satisfy their psychological needs, they engage themselves at a higher level and invest more time and effort in their work [31,49], which results in greater competitiveness and productivity. Various empirical studies have shown how the creation of an involving and motivating organizational environment represents a basic source of competitive advantage for organizations [35,49,46]. This is why organizational climate (OC) research occupies a favored position in current industrial and organizational psychology [24,56].

At the beginning of research in this field, organizational climate was defined as a set of characteristics that describe an organization and that: (a) distinguish the organization from other organizations, (b) are relatively enduring over time and (c) influence people's behavior in the organization itself [16, p. 362]. Reichers and Schneider [56] described climate in terms of shared perceptions of organizational policies, practices and procedures, both formal and informal [7]. OC is a product of individual perceptions of work features, events and processes, but only when a consensus between individuals exists do the various perceptions join each other and globally represent the organizational climate [43]. Therefore, because OC results from the aggregation of individual climate ratings, climate strength can be described as the degree of agreement among the organization's members about practices and policies [8,38,67]. An organization with a strong climate is "a place where events are perceived the same way and where expectations are clear" [67, p. 221], where there is no ambiguity about organizational norms and practices and there are uniform perceptions and expectations among its members: it is an organization whose members have a similar understanding of the norms and practices and share the same expectations.

The degree of agreement among the organization's members can be explained through the ASA (attraction-selection-attrition) model: similar types of people are attracted to, selected by and retained by organizations [62]. As a result of these processes, those people who stay with an organization are likely to have similar personalities, values and attitudes [66]. In addition, similar people usually have a similar view of the world, which leads to greater consensus regarding climate perceptions [64]. Dickson, Resick and Hanges [15] defined organizations with formally stated rules and expectations as "mechanistic organizations": this type of workplace would support a high level of agreement among people because of the defined policies, the formalized practices and the centralized decision-making responsibility, factors that increase the consistency of the members' behavior and performance.
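Climate strength, in the sense of within-organization agreement used above, is commonly operationalized with an agreement index such as rwg (James, Demaree & Wolf, 1984). The following is a minimal sketch for a single climate item rated on a 5-point scale; the index compares the observed variance of members' ratings with the variance expected if members answered at random.

import numpy as np

def rwg(ratings: np.ndarray, options: int = 5) -> float:
    """Single-item within-group agreement: 1 - observed variance / null variance."""
    null_variance = (options ** 2 - 1) / 12.0  # variance of a uniform (random) response
    return 1.0 - ratings.var(ddof=1) / null_variance

# e.g. five members rating the same climate item:
print(rwg(np.array([4, 4, 5, 4, 3])))  # 0.75: fairly strong agreement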
2. Organizational climate origins
Mayo and Lewin (1939) paved the way for the study of OC in the field of work group research. In particular, Mayo developed the notion of group "atmosphere" in order to refer to OC. Building on these studies of group dynamics, Litwin and Stringer [39] defined OC as "a molar concept which describes situations' influence on personal motivation of achievement, power and affiliation", and were the first to identify some applicative aspects of OC. Campbell et al. [6] then described OC as a set of attributes specific to a particular organization that may be inferred from the way the organization deals with its members and its environment. For the individual within an organization, climate takes the form of a set of attitudes and expectancies that describe the organization in terms of both static characteristics (such as degree of autonomy) and behavior-outcome and outcome-outcome contingencies [6, p. 390].

In the past, OC was often confused with similar constructs, such as satisfaction [19,34] and organizational culture. But in the last 25 years, thanks to various studies on Psychological Climate, it has been defined in a more specific manner, as a molar construct which includes the most significant psychological representations that individuals form of organizational structure, processes and events [25,58]. The main interest in Psychological Climate is due to the fact that some authors argue that Psychological Climate research gave rise to OC research. On the one hand, psychological climate refers to the individuals' perceptions of their environment and the meaning they attach to it; it is therefore an individual feature and must be placed at an individual level of analysis. On the other hand, OC reflects beliefs about the organization's environment that are shared by the members and to which they attach psychological meaning, through which they make sense of their environment [26,24,61,64]. Therefore, the unit of analysis for OC researchers should typically be the aggregate of the individual climate ratings, with aggregation occurring only after sufficient within-organization agreement has been reached [15]: an organizational climate is said to exist only when agreement among individual members exists [29]. Two distinct and important characteristics of organizational climate are thus the means of the aggregate perceptions and the amount of agreement among members [67].

3. Meaning and method of OC measurement
Some climate studies seem to be more concerned with measurement techniques and instruments than with understanding the theoretical concepts or constructs they are measuring. On the contrary, they should first outline the conceptual
boundaries of organizational climate and then measure it. In other words, the theoretical definition should guide the choice of measurement tools and techniques; but, as some past studies show, organizational climate researchers have frequently adopted a strategy that goes the other way, from practice to theory. What we are underlining is that theoretical and conceptual issues should serve as the basis for practical measurement [24]. An OC survey is usually conducted to carry out a self-evaluation process through which it is possible to understand how people perceive and interpret their work environment: just as a weather forecast should anticipate atmospheric events, organizational assessment should anticipate organizational ones [41, p. 157]. As for measurement instruments, in general we can distinguish between specific measures and broader ones. If the aim is to predict a specific outcome (e.g., safe behavior), it is useful to measure perceptions of a specific climate (e.g., safety climate), because there is evidence that specific climates can predict specific outcomes; conversely, if the interest is in predicting broad organizational outcomes (e.g., job performance), it is practical to use a broader taxonomy of climate as a molar construct [7]. Independently of the adopted methodology, an OC assessment can be considered a snapshot of the organization at the exact time the survey was conducted. However, OC assessment should be developed in an action-research perspective: it should be managed as a means of knowing the current situation, in order to implement improvement plans. In fact, the theoretical basis of OC assessment is Lewin's action-research theory, characterized by cyclicity: an organizational analysis should be the first step for the development of subsequent ameliorative actions and processes. Therefore, a climate survey should be completed by an action plan developed on the basis of the organization's strengths and weaknesses: the former must be used in order to improve the latter. The measurement process should be repeated regularly, in order to check the efficacy of the actions developed after the last survey and to examine the changes that have occurred. Measuring organizational climate implies measuring several organizational variables, in terms of antecedents and components of the construct: from this point of view, OC assessment should be considered a systemic assessment process, because it involves the simultaneous evaluation of many dimensions.
3.1. Measurement variables and instruments

As previously argued and as suggested by various other studies, there are many dimensions which could be assessed through an OC survey. For example, Forehand and Gilmer [16] claimed that the dimensions of organizational climate to be included in an evaluation process are: organizational size, structure, systems complexity, leadership style and goal directions [24]. In a review of four studies [30,39,65], Campbell et al. [6] proposed a complex model identifying the following dimensions of organizational climate and the factors on which they are based:
1) individual autonomy, whose related factors are individual responsibility, agent independence, rules orientation and opportunities to exert individual initiative;
2) the degree of structure imposed upon the position, which includes the variables structure, managerial structure and closeness of supervision;
3) reward orientation, whose related factors are reward, general satisfaction, promotion-achievement orientation, and being profit-minded and sales-oriented;
4) consideration, warmth and support, which includes the factors managerial support, management of subordinates, warmth and support.
Campbell et al. [6] admitted that their list of dimensions was partial and that many factors of organizational climate still had to be determined. Ostroff's taxonomy (1993) may be considered the most complete taxonomy of organizational climate [7]. It identifies a framework made up of 12 climate dimensions, grouped into three higher-order facets (affective, cognitive and instrumental climate perceptions), described as follows:
1. affective perceptions, concerned with interpersonal and social relations among workers; this facet covers participation, cooperation, warmth and social rewards;
2. cognitive perceptions, concerned with the individuals' involvement in their work activities; the variables included are growth, innovation, autonomy and intrinsic rewards;
3. instrumental perceptions, concerned with task involvement, which include achievement, hierarchy, structure and extrinsic rewards [7].
Majer, Marcato and D'Amato [41] developed an organizational climate model identifying four major measurement areas: relationship quality, stress perception, power perception, and creativity and risk acceptance. Relationship quality consists of individuals' perceptions of personal relationships in their organization; it covers socialization level, participation level, trust level, relational equality level and friendship level.
Stress perception concerns the quality of the bond between the individual and his/her environment. People feel stressed when environmental demands exceed their personal resources, so stress can be defined as a symptom of imbalance between environmental demands and individual resources. Power perception is the degree of power each individual perceives to exert on his/her organization and, conversely, the degree of power each individual perceives the organization exerts on him/her; it refers to subjective and psychological perception rather than to hierarchical and objective position. Creativity is the ability to create unconventional ideas, while risk acceptance is the willingness to support personal ideas with a certain amount of risk. Although the above classifications describe the major areas of measurement, the dimensions to be included in an OC survey should be chosen on the basis of the specific organization's values, policy and mission. Thanks to its advantages, the questionnaire is the most common instrument used to measure OC: it can be administered to many people and, being usually self-report, it allows quick quantitative analysis, so the time required is remarkably reduced. A questionnaire usually includes multiple-choice questions which are analyzed on the basis of significant variables, such as gender, length of service and work status. However, some authors suggest relying on qualitative instruments too, such as interviews, observation and focus groups [10], because the combined use of quantitative and qualitative instruments improves the efficacy and quality of studies in the field of psychosocial research [23]. The sequential use of focus group, questionnaire and interview therefore seems to be the most effective methodology: through focus groups we can gather information about people from an explorative perspective; then, through a questionnaire, it is possible to gather information about many people in a very short time; lastly, the information gathered through the questionnaire can be used to conduct deeper and more detailed analyses through interviews.
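As a purely illustrative sketch of the questionnaire analysis just described, the following Python fragment computes mean climate scores broken down by gender, length of service and work status; the file name and all column names are hypothetical, not taken from any instrument cited here.

import pandas as pd

# Hypothetical questionnaire export: one row per respondent, one column per
# climate item plus the background variables mentioned in the text.
df = pd.read_csv("oc_survey.csv")  # e.g. gender, tenure_band, work_status, item1..item20

items = [c for c in df.columns if c.startswith("item")]
df["climate"] = df[items].mean(axis=1)  # scale score = mean of the climate items

# Mean climate score broken down by the significant background variables.
breakdown = (df.groupby(["gender", "tenure_band", "work_status"])["climate"]
               .agg(["mean", "std", "count"]))
print(breakdown)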
4. The organizational climate in a systemic perspective

Interest in conducting studies that simultaneously consider individual characteristics (e.g., attitudes, perceptions, behaviors) and organizational ones (e.g., climate, culture, performance) in the field of industrial and organizational psychology has grown in the past decade. Such studies can be defined as cross-level and multilevel and can be placed in a systemic perspective [45]. More specifically, OC is linked with many organizational variables, in terms of antecedents and consequences of the climate itself: it has been theoretically related to antecedent variables, such as the organization's internal structure and its environmental features, and has been described as an important determinant of individual and organizational outcomes [6,28,27,33,48]. An example of a systemic model of organizational climate was developed by James and Jones [24,27], who focused on the conceptual relationship between climate perceptions and some end-result criteria. Their model was revised by Kopelman, Brief and Guzzo [33], who argued that climate perceptions impact important behaviors and attitudes through key cognitive and affective states. In the light of these theories, Carr et al. [7] proposed the model shown in Figure 1. On the left of the figure are the three higher-order factors of climate postulated by Ostroff [44]: affective (related to social relations among workers), cognitive (concerned with involvement in work activities) and instrumental (which concerns getting things done in the organization); in the central part are the two process variables of job satisfaction and organizational commitment; lastly, on the right, are the three outcomes: job performance, withdrawal and psychological well-being. Carr et al. [7] assert that OC impacts the outcomes of interest indirectly, through its effect on the specified cognitive and affective states. Thus, through the mediation of these states, the three higher-order facets of OC explain a meaningful amount of variance in individual-level work outcomes.

4.1. Organizational safety climate in the systemic perspective
4.1.1. Relationship between meaningful/safety climate and individual outcomes
Like Carr et al. [7], Brown and Leigh [5] also studied OC in a systemic perspective, on the premise that employees' favorable perceptions of the organizational environment are positively related to key outcomes such as job involvement, effort and performance. They argued that when employees perceive the organizational environment positively, that is, in a manner consistent with their values and interests, they are likely to identify their personal goals with those of the organization and, in turn, to invest greater effort in pursuing them. In particular, when employees perceive the work environment as meaningful and psychologically safe, they are likely to show higher levels of job involvement, effort and performance.
Figure 1. A systemic model of organizational climate (Carr et al., 2003). [Figure: the affective, cognitive and instrumental climate facets act on the outcomes of job performance, psychological well-being and withdrawal through the cognitive and affective states of job satisfaction and organizational commitment.]
In order to understand Brown and Leigh's model, the concepts of "psychological safety", "psychological meaningfulness", "job involvement" and "effort" must be described and their links with OC explained. Psychological safety is the employee's "sense of being able to show and employ one's self without fear of negative consequences to self-image, status, or career" [31, p. 708]. The climate dimensions indicative of psychological safety are: (1) supportive management: employees feel they have control over their tasks and work methods thanks to managers who are perceived as flexible and supportive; (2) clarity: organizational roles and norms are clearly perceived; (3) self-expression: employees feel free to express their self-concepts in their work roles. Psychological meaningfulness can be described as "a feeling that one is receiving a return on investments of one's self in a currency of physical, cognitive, or emotional energy" [31, pp. 703-704]. In other words, people perceive their work as meaningful when they consider it challenging and rewarding. The dimensions of psychological climate indicative of psychological meaningfulness are: (1) perceived meaningfulness of contribution: employees feel that they contribute significantly to the achievement of organizational goals; (2) recognition: employees perceive that the organization adequately recognizes their contributions; (3) challenge: people feel their work is challenging and could lead to personal growth. Lastly, job involvement can be defined as a cognitive belief state of psychological identification with one's job [32,36,40,55], and effort as the means by which motivation is translated into work outcomes; effort is also the mediator between job involvement and work performance.
Figure 2. Safety climate in a systemic perspective (Brown and Leigh, 1996). [Figure: Organizational Climate → Job Involvement → Effort → Performance.]
Brown and Leigh's model is represented in Figure 2. It explains the process through which employees' perceptions of OC are related to job involvement, effort and performance. If employees perceive their work environment as psychologically safe and meaningful, they are likely to be more involved and committed in terms of the time and energy they spend in the workplace; in turn, involvement and effort are positively related to greater performance. An organizational climate perceived as psychologically safe and meaningful thus results in greater productivity through the mediation of job involvement and effort. Job involvement has an indirect impact on performance through the mediation of effort: when effort is not included in the model, there is a modest but statistically significant direct relationship between job involvement and performance, which becomes non-significant once effort is included, as the sketch below illustrates. At the same time, the effect of psychological climate on effort is indirect, mediated by job involvement. In detail, when management is perceived as supportive, work roles are clear, employees feel free to express themselves, feel they are contributing to organizational outcomes, feel appropriately rewarded by their organization and perceive their work as challenging, they are likely to be more job-involved and to exert greater effort. A high level of job involvement and effort thus equates to high performance. Brown and Leigh's model can be considered in a systemic perspective because it explains how OC perceptions impact organizational performance through the mediation of the individual variables of job involvement and effort [35,49].
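The mediation pattern described here (a significant involvement-performance relationship that vanishes once effort enters the model) can be checked with two hierarchical regressions. The following sketch, using statsmodels with hypothetical variable names, is one conventional way to run such a test; it is an illustration, not Brown and Leigh's original analysis.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set: one row per employee, with scale scores for
# job involvement, effort and rated performance.
df = pd.read_csv("employees.csv")

# Step 1: total effect of job involvement on performance (mediator omitted).
total = smf.ols("performance ~ involvement", data=df).fit()

# Step 2: add the mediator; the pattern described in the text predicts that
# the involvement coefficient drops to non-significance once effort is included.
mediated = smf.ols("performance ~ involvement + effort", data=df).fit()

print("involvement alone:  b = %.3f, p = %.3f"
      % (total.params["involvement"], total.pvalues["involvement"]))
print("with effort added:  b = %.3f, p = %.3f"
      % (mediated.params["involvement"], mediated.pvalues["involvement"]))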
4.1.2. Safety climate as the mediator between job insecurity and safety outcomes
Another approach to the study of OC focuses on the safety behaviors people display in their work environment. For example, Probst [53] studied, in a systemic perspective, the role of organizational safety climate as a mediator between perceptions of job insecurity and employees' safety outcomes.
Organizational safety climate can be defined as "a unified set of cognitions (held by workers) regarding the safety aspects of their organization" [68, p. 101]. When people work in a climate of job insecurity, caused by fear of layoffs, they frequently display multiple negative effects, in terms of poor physical health [57], high levels of psychological distress [14] and low job satisfaction [12]. Moreover, when employees are dissatisfied with their perceived job security, they are also less committed to the organization [2], they frequently engage in work-withdrawal behaviors such as absenteeism, tardiness and task avoidance [52], and they are likely to quit their job [2,12]. On these premises, employees' safety climate perceptions could explain the relationship between job insecurity and safety attitudes, behaviors and outcomes, reducing the detrimental effects on employees' physical and psychological health. In other words, safety climate plays a key role in determining whether, and to what extent, job insecurity has a negative impact on the individual's safety outcomes. Organizational safety climate includes a set of dimensions predictive of work safety-related outcomes such as accidents and injuries, safety compliance, safety motivation and safety knowledge [4,13,22,42]. These factors are: management values, the extent to which management focuses on safety; safety communication, the extent to which information regarding safety is openly exchanged; safety training, the extent to which training is accessible, relevant and comprehensive; and safety systems, the extent to which safety procedures are perceived to be effective in accident prevention. Probst [53] argues that employees who perceive that their organization has a strong safety climate also exhibit good safety knowledge, show a high level of safety compliance and experience fewer accidents and injuries than employees who perceive their organization to have a weak safety climate. Organizations demonstrate the importance of safety when their employees are required to focus on safety compliance in order to retain their jobs; conversely, organizations that pay less attention to safety matters may lead their employees to believe that attention to safety is not critical to retaining their jobs [53]. In conclusion, organizational safety climate perception plays a moderating role in the relationship between job insecurity and safety outcomes [47,54]: a strong organizational safety climate softens the negative effects of job insecurity on employees' safety outcomes. An organization's safety climate therefore has an important moderating effect on the negative consequences of job insecurity; in particular, a strong safety climate lowers or erases the negative effects of job insecurity on safety knowledge, safety compliance, and the number of accidents and workplace injuries.
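A moderating effect of this kind is conventionally tested with a regression that includes the product of the predictor and the moderator. The sketch below, again with hypothetical variable names, shows one standard way to test whether safety climate buffers the effect of job insecurity on a safety outcome; it is an illustration, not Probst's original analysis.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey data: job insecurity, safety climate and a safety
# outcome (e.g., self-reported safety compliance), one row per employee.
df = pd.read_csv("safety_survey.csv")

# The '*' operator expands to both main effects plus their interaction.
# An interaction coefficient that offsets the negative main effect of
# job_insecurity indicates the buffering role described in the text.
model = smf.ols("safety_compliance ~ job_insecurity * safety_climate", data=df).fit()
print(model.summary())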
4.2. OC survey as an organizational development tool

In pursuing the organization's outcomes, each employee can contribute to organizational development. Employees have two basic resources with which to show their involvement in the organization: time and energy, which give body to effort [5]. These resources are completely under the individual's control. Because of this high degree of individual control, effort is likely to be sensitive to employees' subjective perceptions of OC. When people perceive that the organization satisfies their psychological needs, they are likely to respond by investing more time and energy in the achievement of organizational outcomes; that is, there is a direct positive relationship between OC perceptions and employees' effort. Nowadays organizations operate in a context characterized by continuous change; Hitt argues that "in the new millennium organizations must be in continuous transformation. They have to develop and constantly react to new technologies, new markets, new business and people, both employees and clients" [21, p. 51]. In order to reach this objective, Organizational Development (OD) is an effective strategy; it can be defined as an organization's effort to improve efficacy through knowledge of human behavior [3]. The many OD programs share common features [50]: they are supported by organizational management, engage all organizational levels, are focused on specific goals, and support processes of organizational diagnosis and the implementation of action plans [37]. Organizational climate is considered a construct capable of supporting innovative changes [1] because it influences work attitudes which, in turn, support innovation [11,17,18]. An OC survey can therefore be considered an OD program. Levy [37] suggests a classification of the most effective OD strategies, described below. An OC survey can be run together with other OD strategies, though it is characterized by a key component: feedback.
• Team Building: a strategy used to build new work teams or to improve the efficacy of existing teams. It is often used as an OD strategy [59].
• Total Quality Management (TQM): based on the employees' engagement in quality control, it is adopted by organizations focused on product and service quality.
• Gainsharing: after productivity improvements, employees receive a salary bonus. It is based on a link between performance and monetary bonuses in order to support individual engagement, satisfaction and organizational productivity.
• Techno-structural interventions: they can support the efficacy and effectiveness of organizations whose OD processes are based on technology and structural features. For example, transforming the organization chart into a "matrix" structure, in which every subject reports to two authorities, one for the product and one for the function, is a complex solution but has some important advantages: for example, it supports communication and cooperation. Reengineering, or business process redesign, is another example of a techno-structural process: it is a general redesign of organizational processes carried out in order to improve performance, measured in terms of costs, quality, available services and speed [20].
• Organizational Transformation: interventions which imply a change of values, objectives and mission [51]. Organizational culture change is one example of an Organizational Transformation process: changing the organizational culture can improve organizational efficacy and productivity. Knowledge Management is another example of this kind: it consists of efforts made to develop, spread and effectively use knowledge. Organizations which use Knowledge Management are usually defined as learning organizations and are characterized by: a structure that supports learning processes, a knowledge-management system focused on organizational success, recognition of training effort, an organizational culture focused on change and creativity, and a leadership actively engaged in learning processes [9].
• Survey feedback: questionnaires are used to systematically gather organizational data, which are later analyzed and processed in order to plan a change process. In this way an OC survey can be considered a change strategy but, for it to be effective, it is important to pay attention to the way the results are returned to the employees, because they will be the promoters and executors of the change process itself [37]. Organizational management has a key role in this step: it has to return the results to the employees. Results feedback makes people aware of the areas of strength and weakness and is fundamental for shaping an action plan built in strict collaboration between managers and employees.
In short, there are many OD strategies that can be used to assess individual perceptions of different organizational events and processes. In particular, the
Survey Feedback, that is, the return of OC survey results, is an effective instrument for assessing the efficacy of the different OD programs described above, such as Team Building, TQM, Gainsharing and so on.

5. Conclusion

In the present economic context, characterized by many technological and productivity changes, the resources necessary for organizational success are not material but human, because people are the subjects really able to improve competitiveness. This is the reason why I/O psychology nowadays focuses on the improvement of employees' organizational behaviors and quality of work life [60]. Competitive organizations can improve employees' motivation and satisfaction: hence the strong interest in OC research. OC surveys should be used to program and implement action plans built on the areas of strength and weakness they identify. This helps organizations to execute changes and achieve improvement objectives, which in turn will influence employees' motivation and satisfaction levels. Organizational Development (OD) processes confirm the key role of human resources: these processes are usually focused primarily on people, rather than on technologies or products. Among the different OD programs, organizational climate has been identified as a construct apt to support innovative changes [1]; in fact, it influences work attitudes which, in turn, favor innovation [11,17,18]. The most relevant aspect of an OC survey is its feedback: its efficacy depends on the organizational management, which should be actively engaged in the process and should return the results to the employees. Survey feedback shows areas of strength and weakness and, through strict teamwork between managers and employees, allows an action plan to be shaped. When survey feedback is managed this way, it can be used to assess the efficacy of OD processes such as Team Building, TQM, Gainsharing and so on. OC studies are cross-level and multilevel, because they involve a large variety of organizational variables, considered in terms of antecedents and effects of OC. That is, OC research can be viewed in a systemic perspective: climate perceptions impact individual behaviors and attitudes (i.e., performance, well-being and withdrawal) through key cognitive and affective states (i.e., satisfaction and commitment) [33]. The interest in studying OC in a systemic perspective is also due to the fact that employees' perception of their work environment as meaningful impacts organizational performance through the mediation of individual variables (such as job involvement and effort [35,49]).
In conclusion, the scientific literature in the OC field includes many empirical studies focused on specific practical issues, while there are few theoretical studies that consider OC in a systemic perspective; in particular, one of the identified gaps in the climate literature concerns research on the mediating links between organizational climate and outcomes [43]. It is important to understand what kinds of environmental and individual features can influence individuals' perceptions of shared experiences and how these perceptions translate into important outcomes. From an applicative point of view, knowing which aspects of the work environment are related to a particular outcome can help determine where to focus an intervention effort. Future research in the OC field should aim to investigate the cognitive and affective states that play a key role in the climate-outcome relationship, besides those identified by Carr et al. (job satisfaction and organizational commitment); for example, work motivation may be assumed to be a mediating variable in this relationship. A further aim should be to investigate all the variables that potentially moderate the climate-outcome relationship: that is, to study OC from a systemic perspective, analyzing both organizational variables (e.g., size and demographic characteristics) and individual ones (e.g., cognitive ability, conscientiousness and core self-evaluative traits) which could mediate this relationship. From a systemic perspective, it could also be useful to better understand the relationship between climate and culture [43,63]: both constructs focus on the way people experience and make sense of their organizations [63], but climate refers specifically to what happens in an organization, while culture refers to why it happens [43]. Thus they are not competing but complementary concepts, and both should be studied in depth when understanding the psychological life of organizations is one's research objective [63].

References

1. G.A. Aarons, Child and Adolescent Psychiatric Clinics of North America 14, 255-271 (2005).
2. S. Ashford, C. Lee and P. Bobko, Academy of Management J. 32, 803-829 (1989).
3. R. Beckhard, Organization development: Strategies and models (Addison-Wesley, Reading, MA, 1969).
4. R.L. Brown and H. Holmes, Accident Analysis and Prevention 18, 455-470 (1986).
5. S.P. Brown and T.W. Leigh, J. of Applied Psychology 81(4), 358-368 (1996).
6. J.P. Campbell, M.D. Dunnette, E.E. Lawler III and K.E. Weick Jr., Managerial behavior, performance, and effectiveness (McGraw-Hill, New York, 1970).
7. J.Z. Carr, A.M. Schmidt, J.K. Ford and R.P. DeShon, J. of Applied Psychology 88(4), 605-619 (2003).
8. J.A. Colquitt, R.A. Noe and C.L. Jackson, Personnel Psychology 55, 83-109 (2002).
9. T.G. Cummings and C.G. Worley, Organization development and change, 7th ed. (South-Western College Publishing, Cincinnati, 2001).
10. A. D'Amato and V. Majer, Il vantaggio del clima (Cortina Editore, Milano, 2005).
11. F. Damanpour, Academy of Management J. 34, 555-590 (1991).
12. J. Davy, A. Kinicki and C. Scheck, J. of Vocational Behavior 38, 302-317 (1991).
13. N. Dedobbeleer and F. Beland, J. of Safety Research 22, 97-103 (1991).
14. S.W. Dekker and W.B. Shaufeli, Australian Psychologist 30, 57-63 (1995).
15. M.W. Dickson, C.J. Resick and P.J. Hanges, J. of Applied Psychology 91, 351 (2006).
16. G.A. Forehand and B.V.H. Gilmer, Psychological Bulletin 62, 361-382 (1964).
17. R.T. Frambach and N. Schillewaert, J. of Business Research, Special Issue: Marketing Theory in the Next Millennium 55, 163-176 (2002).
18. C. Glisson, Clinical Child and Family Psychology Review 5, 233-253 (2002).
19. R.M. Guion, Organizational Behavior and Human Performance 9, 120-125 (1973).
20. M. Hammer and J. Champy, Reengineering the corporation: A manifesto for business revolution (HarperCollins, New York, 1993).
21. M.A. Hitt, Organizational Dynamics, Winter, 7-17 (2000).
22. D.A. Hofmann and A. Stetzer, Personnel Psychology 49, 307-339 (1996).
23. T.L. Jacob, in American Pragmatism and Communication Research, Ed. D.K. Perry (Lawrence Erlbaum Associates, Mahwah, 2001).
24. L.R. James and A.P. Jones, Psychological Bulletin 81, 1096-1112 (1974).
25. L.R. James, J.J. Hater, M.J. Gent and J.R. Bruni, Personnel Psychology 31, 783 (1978).
26. L.A. James and L.R. James, J. of Applied Psychology 74, 739-751 (1989).
27. L.R. James and A.P. Jones, Organizational Behavior and Human Performance 16, 74-113 (1976).
28. L.R. James, L.A. James and D.K. Ashe, in Organizational climate and culture, Ed. B. Schneider (Jossey-Bass, San Francisco, 1990), pp. 40-84.
29. L.R. James, W.F. Joyce and J.W. Slocum Jr., Academy of Management Review 13, 129-132 (1988).
30. R. Kahn, D. Wolfe, R. Quinn, J. Snoek and R. Rosenthal, Organizational stress: Studies in role conflict and ambiguity (Wiley, New York, 1964).
31. W.A. Kahn, Academy of Management J. 33, 692-724 (1990).
32. R.N. Kanungo, J. of Applied Psychology 67, 341-349 (1982).
33. R.E. Kopelman, A.P. Brief and R.A. Guzzo, in Organizational climate and culture, Ed. B. Schneider (Jossey-Bass, San Francisco, 1990), pp. 282-318.
34. W.R. LaFollette and H.P. Sims, Organizational Behavior and Human Decision Processes 13, 257-278 (1975).
35. E.E. Lawler III, The ultimate advantage: Creating the high-involvement organization (Jossey-Bass, San Francisco, 1992).
36. E.E. Lawler III and D.T. Hall, J. of Applied Psychology 54, 305-312 (1970).
37. P.E. Levy, Industrial/Organizational Psychology: Understanding the workplace (Houghton Mifflin, Boston/New York, 2003).
38. M.K. Lindell and C.J. Brandt, J. of Applied Psychology 85, 331-348 (2000).
39. G.H. Litwin and R. Stringer, Motivation and Organizational Climate (Harvard University Press, Cambridge, 1968).
40. T.M. Lodahl and M. Kejner, J. of Applied Psychology 49, 24-33 (1965).
41. V. Majer, A. Marcato and A. D'Amato, La dimensione psicosociale del clima organizzativo (Franco Angeli, Milano, 2002).
42. A. Neal, M.A. Griffin and P.M. Hart, Safety Science 34, 99-109 (2000).
43. C. Ostroff, A.J. Kinicki and M.M. Tamkins, in Comprehensive handbook of psychology, Vol. 12: I/O psychology, Ed. W.C. Borman, D.R. Ilgen and R.J. Klimoski (John Wiley & Sons, New York, 2003), pp. 565-594.
44. C. Ostroff, Organizational Behavior and Human Decision Processes 56, 56 (1993).
45. C. Ostroff, A.J. Kinicki and M.A. Clark, J. of Applied Psychology 87, 355-368 (2002).
46. C.P. Parker, B.B. Baltes, S.A. Young, J.W. Huff, R.A. Altmann, H.A. Lacost and J.E. Roberts, J. of Organizational Behavior 24, 389-416 (2003).
47. S.K. Parker, C.M. Axtell and N. Turner, J. of Occupational Health Psychology 6, 211-228 (2001).
48. R.L. Payne and D.S. Pugh, in Handbook of industrial and organizational psychology, Ed. M.D. Dunnette (Rand McNally, Chicago, 1976), pp. 1125-1173.
49. J. Pfeffer, Competitive advantage through people: Unleashing the power of the work force (Harvard Business School Press, Boston, 1994).
50. C. Piccardo and L. Colombo, Governare il cambiamento (Cortina, Milano, 2007).
51. J.I. Porras and R.C. Silvers, in Annual Review of Psychology 42, Ed. M.R. Rosenzweig and L.W. Porter (Annual Reviews Inc., Palo Alto, 1991), pp. 51-78.
52. T.M. Probst, in The psychology of work: Theoretically based empirical research, Ed. J.M. Brett and F. Drasgow (Erlbaum, Mahwah, 2002), pp. 141-168.
53. T.M. Probst, J. of Occupational Health Psychology 9(1), 3-10 (2004).
54. T.M. Probst and T.L. Brubaker, J. of Occupational Health Psychology 6, 139 (2001).
55. S. Rabinowitz and D.T. Hall, Psychological Bulletin 84, 265-288 (1977).
56. A.E. Reichers and B. Schneider, in Organizational climate and culture, Ed. B. Schneider (Jossey-Bass, San Francisco, 1990).
57. E. Roskies and C. Louis-Guerin, J. of Organizational Behavior 11, 345-359 (1990).
58. D.M. Rousseau, in International review of industrial and organizational psychology, Ed. C. Cooper and I. Robertson (Wiley, New York, 1988), pp. 139-158.
59. S. Salas, J.F. Cox and H.P. Sims Jr., Group and Organization Management 22(2), 185-209 (1997).
60. W.B. Schaufeli and A.B. Bakker, J. of Organizational Behavior 25, 293-315 (2004).
61. B. Schneider, Personnel Psychology 28, 447-479 (1975).
62. B. Schneider, Personnel Psychology 40, 437-453 (1987).
63. B. Schneider, in Handbook of organizational culture and climate, Ed. N.M. Ashkanasy, C.P.M. Wilderom and M.F. Peterson (Sage, Thousand Oaks, CA, 2000), pp. 17-21.
64. B. Schneider and A. Reichers, Personnel Psychology 36, 19-39 (1983).
65. B. Schneider and C.J. Bartlett, Personnel Psychology 21, 323-333 (1968).
66. B. Schneider, H.W. Goldstein and D.B. Smith, Personnel Psychology 48, 747 (1995).
67. B. Schneider, A.N. Salvaggio and M. Subirats, J. of Applied Psychology 87, 220 (2002).
68. D. Zohar, J. of Applied Psychology 65, 96-102 (1980).
ENVIRONMENT AND URBAN TOURISM: AN EMERGENT SYSTEM IN RHETORICAL PLACE IDENTITY DEFINITIONS

MARINA MURA
Dipartimento di Psicologia, Università degli Studi di Cagliari
[email protected]

Within the systemic framework of Environmental Psychology (Bechtel and Churchman, 2002) and following Urry's (2002) and Pearce's (2005) approaches, the aim of this research is to investigate, in the context of urban tourism, which world views emerge from a Discourse Analysis (Edwards, Potter, 1993) of the speech of native and non-native Sardinian residents. It addresses the issue of how social-physical diversity might be preserved (the problem of tourism sustainability; Di Castri, Balaji, 2002). To this end, forty in-depth narrative interviews with inhabitants with short- and long-term residential experience in Cagliari (Italy) were conducted and examined through Discourse Analysis. It was found that the rhetorical devices of natives and non-natives expressed similar representations of urban places, but in diverse relationships to social and place identity; their environmental transactions were based on the tourist gaze, the functional view or heritage pride. This displays some basic central dimensions of sustainable tourism.

Keywords: sustainable urban tourism, social identity, place identity, Discourse Analysis.
1. Introduction

Urban tourism has not been sufficiently investigated in psychology, although its social importance increases with its diffusion. Understanding tourist behavior and experience in a globalized world, and especially in urban places where an increasing number of people live, is very important today because of its changing impact on the social environment (Di Castri and Balaji, 2002 [11]; Vereczi, 2002). The framework of the Transactional-Contextual Paradigm of Environmental Psychology (Bechtel and Churchman, 2002 [6]) underlines the need to take into account the systemic connections between people's representations, the social-cultural aspects of places and their bio-physical features in order to understand an environment and the trends of its changes. The origins of environmental psychology are rooted in a variety of social and scientific issues that include a worldwide concern with the environment and ecological movements, increased criticism of laboratory methods and the
advancement of naturalistic research, an interdisciplinary ethos, and a focus on molar, global units of analysis. The present Transactional-Contextual Model, based upon the organismic world views of systems theorists (Von Bertalanffy, 1968 [32]; Laszlo, 1972 [20]), assigns a central role to the complex set of relationships between elements. The systemic approach defines psychology as the study of dynamic and holistic psychological systems in which human and environmental components exhibit complex reciprocal relationships and influences. This approach is sensitive to the role of temporal factors and describes feedback loops and ongoing reciprocal and mutual influences within the system. Some environmental psychologists (Altman, Rogoff, 1987 [1]) raise criticisms of change processes "linked to underlying regulatory principles such as homeostasis, and/or teleological principles. [...] change is usually associated with system movement towards an ideal state and reflects the 'location' of a system in respect to an ideal stable condition [...] Although both transactional and organismic orientations emphasize the study of holistic person-environment units of analysis, they differ in their conceptions of how holistic systems are composed and operate. In the transactional view the whole is composed of inseparable aspects that simultaneously and conjointly define the whole" (pp. 23-24). The term "aspects" refers to features of a system that must be considered together with the other features of the system in order to be adequately understood. From this perspective, change is inherent to the system, and studying such transformations is necessary in order to understand the phenomenon. The analysis focuses on the configuration of change, as in the quantum and relativity theories of physics. Psychological outcomes are variable, emergent, and novel because the configurations of people, psychological processes, and contexts cannot be wholly predicted from a knowledge of the separate aspects of the system. Psychological events are purposive, intentional, and goal directed; goals and purposes are based on short- and long-term motives, social norms, emergent qualities of phenomena and other factors. Often there are multiple and flexible goals at work in the same transactional configuration. A transactional view does not eschew prediction or general principles of psychological functioning, because dynamic psychological events, while variable, may form and display general patterns across similar events. Phenomena are intrinsically dynamic, but not necessarily random. Therefore consistencies across similar events may or may not allow for general statements and theories, and the transactional perspective is interested both in unique events and in patterns across similar events. From this perspective, we studied urban tourism as a system, to understand how urban social representations of "tourism" and social-place identities
(Proshansky et al., 1983; Tajfel, Turner, 1986 [28]; Bonaiuto et al., 2003 [8]) affect the behavior of different groups of residents, native and non-native, with varying degrees of residential stability in loco. In this study, tourism was regarded and analyzed as a very important social-psychological experience based on a transactional relationship between the physical-social environmental features of a place (in this case the city of Cagliari, on Sardinia, the second-largest island in the Mediterranean Sea) and its inhabitants (natives and tourists). Following the post-tourism approach (Urry, 2002 [31]), we define tourism as a social-cultural form, a "gaze" experience pre-formed by direct and mediated communication, no longer confined by spatial and temporal borders or well-defined places. We investigated how the representation of a given place as "tourist" comes about through a Discourse Analysis (Antaki, 1994 [2]; Speer and Potter, 2000 [27]) of a set of narrative and semi-structured interviews (Atkinson, 1998 [5]) with native and non-native inhabitants. The choice of city and of subject typology was dictated by the assumptions of a number of psychologists and sociologists (Mannell, Iso-Ahola, 1987 [22]; Urry, 2002 [31]; Argyle, 1996 [3]; Ashworth, 2003 [4]), according to whom contemporary cities reveal a convergence of "tourism" and leisure activities, since the objects, modes, and places involved in well-being and recreation/relaxation are many and varied. Furthermore, Pearce (2005) [24] underlines the influence of the representations of a place on tourists' on-site experiences: social representations of tourism are rooted in mass-media messages and interpersonal dialogues, and drive tourist and inhabitant behavior. The choice of Discourse Analysis was therefore dictated by considerations of a theoretical and methodological nature. In terms of theory, we followed the methodological principles of the transactional view: we consider the observer's role in events, take setting and context into account, seek to understand the participants' perspective, and emphasize the study of process and change. Thus we chose an "emic approach" (Harris, 1985 [17]) - in-depth interview Discourse Analysis - because we agree with the claim that the communication of daily life is mainly rhetorical, and that rhetorical statements create and express the representations and attributions of social groups (Billig, 1996 [7]; Potter and Wetherell, 1998 [25]; van Dijk, 2003) and the system in which they develop: "Discourse psychology disputes the 'window on mind' epistemology of language that is generally implicit in attribution theories [...] by outlining an alternative set of principles, which can be termed the Discursive Action Model (DAM)" (Edwards, Potter, 1993 [14, pp. 23-24]). Its three major principles are:
1. attributions are discursive actions;
2. attributions are factual reports and descriptions rhetorically organized to counter alternatives;
3. reports involve two levels of accountability: that in the reported event and that of the current speaker who is making the report.
DAM is a model of action.
From a methodological perspective, it became clear that only a discourse analysis of narrative and semi-structured interviews is able to capture how the "arena of social action, with constructive and pragmatic relationship to world and thought" (Edwards, Potter, 1993 [14, p. 37]) is revealed through speech acts. The interpretational repertoires of "versions of the world" fall within the constraints imposed by language, but remain "socially" constructed and vary on the basis of the discursive actions that emerge in any given situation. The choice of which of the available "versions" to utilize is the outcome of social-individual "construction" and reflects a social sub-system of the world. In the interviewees' "versions of the world", we expected to find out how people-place transactions influence the system of the "city of Cagliari" as a "tourist place", and to contribute something to the reasons why this city, despite its many tourist resources and services (a nice skyline, a beautiful beach, a temperate climate, an important historical center, an interesting ecosystem - the sea and two ponds hosting important animal species - and transport facilities - airport, port, railway), is not considered a "tourist place". Data available on visitors confirm that the past few years have seen a constant drop in presences during the summer months with respect to the other months of the year.

2. Method

With the aim of understanding how and which "versions of the world" are developed by people-environment transactions in Cagliari, we performed an in-depth narrative discourse analysis of 40 narrative interviews (Atkinson, 2000; Bruner, 1992) centered on a specific place (the city of Cagliari) considered a "potential" tourist city. In the spring of 2003, 8 graduates - trainees rather than professional interviewers, unaware of the aims and objectives of this research project - carried out and recorded a series of interviews with native and non-native inhabitants, resident in the city for varying periods of time and with diverse residential perspectives. Participants were 23 females and 17 males, balanced for age (20 under and 20 over 30 years; range 19-55) and origin (11 natives of Cagliari; 10 natives of Sardinia; 10 Italians born outside Sardinia; 9 Europeans).
Table 1. Transcription notation.
(.)          Micro-pause
(2.0)        Pause length in seconds
[overlap]    Overlapping speech
↑            Rising intonation
↓            Lowering intonation
Underline    Emphasis
> faster <   Encloses speeded-up talk
(brackets)   Enclose words the transcriber is unsure about (empty brackets enclose talk that is not hearable)
Rea::lly     Elongation of the prior sound
.            Stopping intonation
[(laugh)]    Comments from the transcriber
Fann(h)y     Laughter within speech
=            Immediate latching of successive talk, without any silence (equal signs indicate a "latched" relationship)
°            Talk appearing within degree signs is lower in volume relative to surrounding talk
?            "Questioning" intonation, irrespective of grammar
The choice of non-professional interviewers was based on the intention of creating a communicative exchange in which both parties could co-construct a dialogue through a series of exchanges very similar to a "natural" conversation. The framework was that indicated by Speer and Potter: "The interviewer, for example, structures the talk by making certain issues and identities relevant and not others. Likewise, in such contexts, the 'respondent' or 'interviewee' orientates to the research interview as relevant, by speaking as a generic person who is keen to offer a suitably qualified response [...]. Nonetheless, we would like to emphasize that we are treating these as natural materials in the specific sense that we are not privileging the actions and orientations of the researcher, but are instead treating her as an active implicated part of what is going on [...]. If the participants' institutional status (as interviewer/interviewee) is relevant to the interaction, then, it will be oriented to" (Speer, Potter, 2000 [27, note 7, pp. 565-566]). Interviews were transcribed by 3 psychology graduates, likewise unaware of the final aims of the research project, following Jefferson's transcription rules (Jefferson, 1989 [18]; see Table 1). Following the Discursive Action Model (Edwards, Potter, 1993 [14]), a specific analysis was implemented to identify rhetorical devices. Each interview was initialized by specifying: the order of discourse analysis (random); interviewer (I)/respondent (R); interviewer/respondent gender (M/F = male/female); interviewer/respondent age bracket (Y/A = young/adult: under/over 30); interviewer/respondent extraction (C = native of Cagliari; S = native of Sardinia; I = Italian, not born in Sardinia; E = European); date of interview; date of transcription; and total number of linguistic passages identified (each passage was identified by a progressive number referring to the whole interview).
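For illustration, the identification code attached to each interview (e.g., [XI/F-M/Y-Y/S-C/2003-2003/890]) can be read mechanically. The Python sketch below reconstructs the format described above; the field names and the exact regular expression are assumptions, since the original coding scheme was applied by hand.

import re
from dataclasses import dataclass

@dataclass
class InterviewHeader:
    order: str           # Roman numeral position in the (random) analysis order
    genders: str         # interviewer-respondent gender, e.g. "F-M"
    ages: str            # age brackets, e.g. "Y-A" (young/adult)
    origins: str         # extraction, e.g. "S-C" (Sardinian / Cagliari native)
    year_interview: str
    year_transcription: str
    passages: int        # total number of linguistic passages

HEADER_RE = re.compile(
    r"\[(?P<order>[IVXLC]+)\s*/(?P<genders>[MF]-[MF])/(?P<ages>[YA]-[YA])"
    r"/(?P<origins>[CSIE]-[CSIE])/(?P<y1>\d{4})[-/](?P<y2>\d{4})/\s*(?P<n>\d+)\]"
)

def parse_header(code):
    m = HEADER_RE.search(code)
    if m is None:
        raise ValueError("unrecognized header: " + code)
    return InterviewHeader(m["order"], m["genders"], m["ages"], m["origins"],
                           m["y1"], m["y2"], int(m["n"]))

# e.g. parse_header("[XI/F-M/Y-Y/S-C/2003-2003/890]")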
The analysis was facilitated, where relevant, by indications about the tone of the interview based on specific comments by the interviewer and other non-verbal elements. The interpretations were supported by a series of extracts; however, to save space, only some aspects felt to be particularly significant for the purposes of this study are analyzed here. Other significant partial statements present in the interviews are reported between inverted commas within the text.

3. Results: rhetorical devices and interpretational repertoires

Discourse analysis displayed a central element of the representations co-constructed by interviewer and respondent: they were never dissociated from individual self-presentation, that is, self-identity (Extract 1). Interviewers and respondents adopted rhetorical devices to control the interaction and demonstrate competence. The most common device adopted was "footing" (Goffman, 1979 [16]; Levinson, 1988 [21]; Edwards, Potter, 1993 [14]): while reporting and constructing questions or explanations, speakers were accountable for their own actions in speaking, for the veracity of their accounts, and for the interactional consequences of those accounts. In turn, the respondent replied in such a way as to present an image of himself/herself which was distinctive and felt to be most appropriate to the relational context. The interpretational repertoire most frequently used by interviewers, and also spontaneously by respondents, was the simulation of the role of a "friend" who identifies places which "he/she likes" and frequents during leisure time. The "friend" was asked to identify a pathway or recommend a holiday, because one can get to know a city only by "wandering around it" or, better still, being "taken around" to "see something". The "pathway" was presented in two main ways: the most expert guides (tourism professionals) started from an elevated position (Monte Urpinu), offering an overview of the city which subsequently became more defined as the tourist/friend went through the historic quarters of the city and other significant historical sites. Non-natives of the city and non-professionals linked their representation to an element considered "touristic" par excellence: the sea (in Cagliari the reference is to Poetto beach), which is presented as different from and more attractive than other beach areas frequented by non-Sardinians or non-residents, and is often compared with the other coasts of the island.
Extract 1. [XXI/F-F/Y-Y/S-C/2003/2003/490]
I: and if your friend was from another place maybe from Florence and came here to Cagliari what would you [show?] him =
R: [of Cagliari in particular?] =
I: yes [(laughs)]
R: uhm good question, well the sea is too obvio::us (.3) but, come on... it's surely the first thing (.) why?: (.) they've never seen anything like that ↑ (.) and not Poetto's beach/sea and the reasons: of its being ruined ↓ ((I smile)), but because the mo::st beautiful places are Costarei °Calapira° (.) I don't know if you're familiar with them (.) ((I nod)) ↓ uhm while I don't like Costa Smeralda ↑ (.)
This is another rhetorical device, the "factual report": events are described as directly perceived, or by means of graphic description and sequential narrative that imply perceptual clarity. In this case, the interpretational repertoire displayed diverse aspects of the "tourist city". Another is the historic center, the Castello quarter, which respondents found attractive for its steep, narrow streets and alleys offering a view of the city from above, drawing the eye down to the port and gulf. One young female respondent also identified an area by pinpointing a trendy dive frequented in the small hours by youngsters (Extract 2). Sometimes the historic quarter received a negative evaluation when compared with others, but it was always presented as a tourist reference. Other respondents tended to suggest a specific thing (a church, a corner and so on) in order to present a distinct self; thus tourist-recommended resources were strongly linked to individual identity. "One-of-a-kind" elements were also present in the representation of a "tourist city" and referred to the model of co-construction centering on comparison: one-of-a-kind indicates the presence of elements not to be found elsewhere, in other cities or places visited. Even a small single characteristic element is sufficient, provided it is not "available" elsewhere.
Extract 2. [XI/F-M/Y-Y/S-C/2003-2003/890]
I: then what would I show them from the roundabouts/here?
R: surely not the area around Castello, the centre is certainly the ugliest part of Cagliari and the fact is that here uhm: I mean that it's not ancient it's just old and that's something I just hate (.) The historical centre is not beautiful according to me (.) As in the case of other cities (.) However, there's the area of Ibarium, I don't know if you know where [it is] =
I: [I do] =
R: and it's all along the edge: and: (.) it is like hanging from a cliff high up above the city it's beautiful that I surely would show (.) then museums and other places there's nothing (.) one thing is a monument... and... a a monument (.) a: church, a beautiful one that very few know is San Michele (.) and the church of San Michele (.) that I would surely show
An essential element of the factual reports was the "original small dive", which can draw holidaymakers originally oriented towards a beach holiday to the city like a magnet. The "proof" of a place's attractiveness and uniqueness is based on how crowded and well photographed it is. Factual reports and accountability (footing) produced an interesting interpretative repertoire: the simulated amicable relationship induced a sort of merging of personal, social and place identity: friendship ensured veracity, and the identified "special" places represented a distinctive place identity. This was confirmed by the fact that descriptions were rhetorically organized to undermine alternative ways of being: in their special role as "friend", speakers revealed things that "only a few know", or else mentioned and rejected better-known spots, such as the "Costa Smeralda" (Emerald Coast, northern Sardinia), highlighting their own diversity vis-à-vis unidentified, but always rhetorically (albeit implicitly) present "others" (Extract 1). Moreover, if the (non-Sardinian) respondent's statements were contradictory or negative with respect to the evaluation of the city as attractive to tourists, the interviewer "compelled" the respondent to make a precise choice, moved the discourse in the direction of a choice or towards certain specific conclusions, or requested an indication of resources to be enhanced, making the respondent take on the role of public administrator (Extract 3). Thus the positive identity of locals (both interviewers and native respondents) appeared linked to the presence of characteristics of Cagliari that are attractive from the point of view of tourism. This fact was particularly evident when the person appeared strongly attached to the place, through his/her self-categorization as a young intellectual with roots running deep in the land of origin, passionate about its culture and traditions, very often summarized in a self-declaration of "Sardinian citizenship" (Extract 4).
Extract 3. [XXI/F-M/Y-Y/C-E/2003-2003/902]
I: of a person [from outside] = [from outside or anyway of a person that lives in Cagliari but travels] =
R: [but who travels a lot] = uhm boh (.) I told you I can simply say that: (.) it can potentially be a lot and: (.) respect to those cities that really don't have, that which they, that is, they don't have (.) It has the means but doesn't use them (.) this is the only thing I can say to you (.) ↓ And: because let me repe::at in Germany you look at the banks of the lakes they're all extremely well kept ↑ (.) Then you look at Poetto without mentioning the blunder that they have recently done with the story of widening the beach: (.) Blunder ↓ (.) I mean the idea was good, but it didn't come out right However (.) all said, in the end, they ruined something (.) and they're continuing to do:: it because they're building I don't know if you're aware of it (.) Anyway in the area of Villasimius they're building right down close to the sea (.) violating all possible and imaginable laws ↑ (.)
Extreme case formulation (Pomerantz, 1986) was another rhetorical device used by highly identified respondents: a "trip" is seen only as something which takes the traveler outside Sardinia (defined as "home"), and this "departure" implies a change of Self marked by the wish to exhibit distinctive symbols such as language (defined as a "banner"), and by the desire to return home as soon as possible. Paralinguistic phenomena highlighted and rhetorically communicated the "passion" at the basis of such statements, while laughter consistently preceded, accompanied and followed affirmations that the respondent felt to be too strong or exaggerated. This paralinguistic rhetorical device made it possible to express one's identity in the conversational field while preventing the counterpart from assuming a negative approach or giving judgments of non-normality. In this "version of one's own world", travel had meaning only if directed at some form of "attractive and worthwhile" activity, opposed to the passivity of the trips of "others", defined as "purely to see something" or a "waste of time".
Extract 4. [VI/F-M/Y-A/S-C/2003-2003/490].
R: For me to go on a trip is when you cross the sea when you go beyond the sea And : I realise that I change the moment I cross the sea why ?: (.) because > every time I have happened to be outside Sardinia < the need of identity has been evermore acute with a growing need to speak Sardinian, , co : n (0.5) in other words it’s as if one wants (.) to bring forward some sort of: banner ((laughs)) a flag of this I think that this thing is a little pathological ((laughs and coughs )) I: Can you tell me about your travel experience? R: Sure > ah here’s another important thing I didn’t tell you < for me there are no trips (.) > this is certainly a huge distortion < ↑ for me there are no trips uhmm (0.5) without it being > doing something < ((moving of hands in the air )) that has to do with work. I have already started to travel I have been part of a folklore group since I was : 11 years old
This behavioral alternative was represented by symbol-localities: the Maldives are compared to Cagliari’s beach (Poetto) as symbols, the former representing the passive, useless trips of the “others”. It is better to stay home if one is not equipped with an approach of “discovery and comparison with a completely different reality”, better still if referred to the great European capitals and visits to museums. Hence we found enunciations of what is attractive in the city, which for all intents and purposes include everything: food and drinking water, the impossibility of staying “away” for “long periods”, “the mistral wind” whose “voice” is a “great companion”, the sun (emphasized by the tone of voice and a pause) which, unlike in other places, peeks through even on cloudy days, the sea, etc. It became clear that the identity of a place and its representation as being of tourist interest were intrinsically entwined in the profound conviction of those questioned when respondents (non-natives of Cagliari unable to provide the information requested, or to pinpoint its tourist resources, due to lack of knowledge of the city), faced with a native-born interviewer from Cagliari, enacted the “footing” rhetorical device:
1. respondents expressed regret and at times ascribed their inability to provide positive answers to their own shortcomings, implicitly recognizing the affective link which binds any individual to his/her place of origin;
2. respondents, after a fraction of a second’s silence and other paralinguistic signs, followed with the affirmation that the city cannot be defined as a tourist city on first impact, but becomes one once it is “known” better, and only if one has the patience to “discover things”.
There were those who, so as not to offend the sensibilities of the interviewer, shifted the emphasis from recommending a holiday in the city to recommending it as a place to live: the city is suitable for a class of inhabitants in search of peace and quiet, who hate traffic, because Cagliari is a “people-friendly” city, beautiful in the sense that the “quality of life is excellent” and it is not chaotic. Based on the conversational context, to give rhetorical force to affirmations (accountability), the respondent might also utilize different forms of identity self-categorization, perhaps stating that he/she does not truly feel “a citizen” of Cagliari, although born there, because at the time his/her parents were still living in their city of origin. Here the aim was to indicate belonging to a famous city (in this case Florence). In other sequences, by contrast, the respondent defined him/herself as “a citizen” of Cagliari to support the validity of his/her opinion on the tourist resources of the city. Again, in some cases he/she stated not always living in the city, self-defining as a “person who travels a lot” to support the role of “expert” in the evaluation of places and persons, while at other times he/she affirmed living there to support the statement that places and events are not sufficiently publicized, given that he/she had found out about them only recently. As was to be expected, a constant element in the repertoires of Cagliari as a tourist city was the fact that it is only potentially so, expressed through factual reports and systematic vagueness, rhetorical devices used to present this view as the distillation of common wisdom. This was also expressed through affirmations that the city has many “means”, especially in comparison with other similar cities, but fails to “use” them (Extract 3). To make a city attractive to tourists, it is necessary to “publicize” and “well keep” it. Moreover, in this discourse we also found another dimension of the profound conviction that the place and the identity of its inhabitants are closely related: indeed, it was stated that the “Sardinian” has no concept of “marketing” because they invest in “product quality” but not in “image quality”; the failure of tourism to “take wing” would appear linked to the “local mentality”. The same judgment was made about the inhabitants of Cagliari, defined as “generous” (with a low tone of voice) but not open towards tourists, “sociable”, but only if “you know how to take them”, “courteous” but tending to see tourists as “strange beings”. In this they demonstrate having the same mentality as other “Sardinians”.
Extract 5. [XVI/F-M/Y-A/C-C/2003-2003/772].
Re: Well then can you give me a definition of tourist city? To: Mmm well a tourist city (0.4) must be (.) should be a city that is at the disposal of the tourist but without forgetting (.) the ..the.. the people that live there. Let’s take Venice it’s a great tourist city However I find that for the people living there it must be a nightmare Because they don’t own their own city = = owners at home mmm for this reason I think that a city a tourist city should have the: (0.8) should show and put its beauties in the showcase (.) its own characteristics So that on the behalf of > the administrators there should be greater care < of all its esthetical features, (.) the gardens, the facades, to the urban furniture etc.etc,… But at the same time (0.3) think about them > think about these things < for those living there and not for those coming to see them
However, this judgment was attenuated by recognition of the fact that the “Cagliaritano-Sardinians” are also the only true artificers of the city’s beauty, and moreover by highlighting the fact that these characteristics might well derive only from a personal impression. In these interactional situations the interviewer shifted the inhabitants’ attention to the theme of organization and repeated the question on the city’s potential. The respondent generally intervened by reaffirming his/her position but also accepting the standpoint of the interviewer: the city is defined as “beautiful”, thanks to its sea, historic center and small tourist train, but lacking in organization and “in style”, as shown respectively by the difficulty in accessing information and by the “shuttered” shops. On the theme of the city’s lack of a true characteristic tourist vocation, we saw the emergence, among natives of the city, of the need to achieve compatibility between the transformation of the city into a tourist center and the well-being of its residents (Extract 5): tourism brings with it the risk of no longer being “lords and masters in one’s own home” (the negative example quoted here is “Venice”). There was an implicit contrast between the population of relatively stable residents and those who are only passing through; the “hosts” present their beauty spots/characteristics as though in a “show case”, through which they
evidently also present themselves, and the “visitors”, indicated as “guests”, must observe and admire, but must be neither too numerous nor too intrusive. In this case, the “footing” rhetorical device was used to support the veracity of accounts and to manage their interactional consequences. An important role in the setting up of this city “show case” was assigned to public administrators, in an attempt to ward off negative judgment of the inhabitants (because the interviewer was an inhabitant). One of the “weakest” elements of the city was in fact identified as a lack of care or nurture: public places and areas are sketchily cleaned, and administrators are at times responsible for true damage to the city, witness the attempt to restore Poetto beach with the introduction of dark sand, or the damage caused by private building activity in defiance of scenic and heritage protection legislation (Extract 3).

4. Conclusion
It would appear clear that discursive analysis is capable of bringing to the fore “in vivo”, in conversation, the co-construction of interpretative repertoires which are rhetorically expressed, that is, talk in action. As Edwards and Potter (1993) [14] claim, attributions are themselves discursive actions, performing social acts such as refusing invitations, blaming, and defending. Reports and descriptions were rhetorically organized to undermine alternatives: factual reports, systematic vagueness, extreme case formulations, graphic description and “footing” are rhetorical devices for managing interest and for externalizing an account by constructing certain actions as universal or normative. Discourse analysis of the interviews also displayed interpretative repertoires which confirmed our expectation of finding in “versions of the world” the most important people-place transaction: representations. Interpretative repertoires are the result of place and social representations, constructively formed around what is “true” and “accepted” for the interlocutors: we know that social representations influence group behavior and thus the “city of Cagliari” system as a “tourist place” (Bechtel and Churchman, 2002 [6]). “Versions of the world” gave us representations of the city as a system in which tourist resources are at once objective (a beautiful beach, an important historical center, entertainment services, and so on) and subjective (what the guide likes), possibly unique and authentic (the guide should be a friend). Resources were a native cultural expression, providing a sense of pride and individuality, like the entire city, and the only difference between natives and tourists is that the latter’s attitude is driven by the “tourist gaze” (Urry, 2002 [31]). Nevertheless, city tourist resources were an important part of the self-esteem of natives’ social and place identity, whereas pleasure references
constituted the tourists’ limited social-place identity. In the repertoires, the social construction of Cagliari as a tourist city is strongly linked to the identity presented. This blending means that only an artificial separation of self-presentation and representation of the object is possible, since the statements perform both functions: they strengthen the persuasive value of arguments through the presentation of a positive, distinct identity, constructed rhetorically with respect to implicitly present “others”. The representation of the city was indeed marked by the presence both of elements unanimously agreed upon, obviously representing “stereotypes” of the general representation of an urban “tourist venue” (sea and historic center), and of evaluations of very different types and of “idiosyncratic” places, which blend the distinctiveness of those indicating them with the model of “visitor”, implicit or explicit, to which reference is made. Our analysis moreover confirmed the attempt to establish a conceptualization of place identity as constructed in the same terms as social identity (Tajfel, 1981) and Breakwell’s identity process theory (Bonaiuto, Twigger-Ross, Breakwell, 2004 [8]). When the “ambiguous” origin of the subject brought about variable regional self-categorizations in relation to the conversational and rhetorical activity in progress, people chose the one more conducive to a positive self-presentation and accountability. Indeed, a profound conviction would seem to emerge of the co-essence of place and inhabitant identity, in a version of the world in which the place, as an end-product of the culture (becoming technique and art) of its inhabitants, represents both successes and failings. The place cannot be spoken of in a derogatory manner in front of a native inhabitant, because this is regarded as offensive (Twigger-Ross, Uzzell, 1996 [30]). The identification of “tourist” resources themselves (buildings of historical importance, lesser places of typical morphology, cultural events, etc.), for visitors and residents alike, would seem to provisionally support the hypothesis that in the city, and for its residents, the representation of tourist behavior does not differ from that of leisure time: the places and activities objectifying it were the same. In conclusion, the social construction of this tourist place in communicative relationships simultaneously represented social reality and individual identity as possessing value (self-esteem) and as distinct (distinctiveness). Discourse analysis of different “residents” (native and non-native) allows us to identify the “tourist-place system” as a molar effect of the physical, architectural and social elements which form the basis on which natives and visitors award the “beautiful” label.
Moreover the interpretative repertoire, in referring to natives as “masters of the house”, provided a very significant cue for maintaining sustainable tourism: visitors want a native friend to take them sightseeing (they look for authenticity), they want new, unique “things” to “gaze” upon, and this is recognized by everyone as the goal of a typical (socio-physical) environment. A “tourist” place is an emergence of a place with preserved bio-social diversity: the system of socio-physical environmental features and the native people’s sense of belongingness give “identity” to that place.

References
1. I. Altman, B. Rogoff, in Handbook of Environmental Psychology, Vol. 1, Ed. D. Stokols, I. Altman (Wiley, New York, 1987), pp. 1-40.
2. C. Antaki, Explaining and Arguing: The social organization of accounts (Sage, London, 1994).
3. M. Argyle, The Social Psychology of Leisure (Penguin Books, London, 1996).
4. G.J. Ashworth, in Classic Reviews in Tourism, Ed. C. Cooper (Channel View Publications, Clevedon, 2003).
5. R. Atkinson, The Life Story Interview (Sage Publications, London, 1998).
6. R.B. Bechtel, A. Churchman, Eds., Handbook of Environmental Psychology (Wiley, New York, 2002).
7. M. Billig, Arguing and Thinking (Cambridge University Press, Cambridge, 1996).
8. M. Bonaiuto, C. Twigger-Ross, G. Breakwell, in Psychological Theories for Environmental Issues, Ed. M. Bonnes, T. Lee, M. Bonaiuto (Ashgate, Aldershot, 2003), pp. 203-233.
9. M. Bonnes, A.M. Nenci, “Ecological Psychology”, in UNESCO Encyclopedia of Life Support Systems (UNESCO-EOLSS, Oxford, 2002).
10. D. Capozza, R. Brown, Social Identity Processes (Sage, London, 2000).
11. F. di Castri, V. Balaji, Tourism, Biodiversity and Information (Backhuys, Leiden, 2002).
12. W. Doise, La forza delle idee. Rappresentazioni sociali e diritti umani (Il Mulino, Bologna, 2002).
13. W. Doise, A. Clemence and F. Lorenzi Cioldi, Représentations sociales et analyses des données (Pug, Grenoble, 1992).
14. D. Edwards, J. Potter, Psychological Review 100(1), 23-41 (1993).
15. R.M. Farr, S. Moscovici, Eds., Social Representations (Cambridge University Press, Cambridge, 1984).
16. E. Goffman, Semiotica 25, 1-29 (1979).
17. M. Harris, Culture, People, Nature. An Introduction to General Anthropology (Harper & Row, New York, 1985).
18. G. Jefferson, in Conversation. An Interdisciplinary Perspective, Ed. D. Roger, P. Bull (Multilingual Matters, Clevedon, 1989), pp. 156-197.
19. K.M. Korpela, T. Hartig, F.G. Kaiser and U. Fuhrer, Environment and Behaviour 33(4), 572-589 (2001).
20. E. Laszlo, The Systems View of the World (George Braziller, New York, 1971).
21. S.C. Levinson, in Erving Goffman: Studies in the Interactional Order, Ed. P. Drew and A. Wootton (Polity Press, Cambridge, 1988), pp. 161-289.
22. R.C. Mannell, S. Iso-Ahola, Annals of Tourism Research 14, 314-331 (1987).
23. M. Mura, in Culture, Quality of Life and Globalization, Ed. R.G. Mira, J.M. Sabucedo, J. Romay (Book of Proceedings, 2002), pp. 870-871.
24. P.L. Pearce, Tourist Behaviour: Themes and Conceptual Schemes (Channel View Publications, Clevedon, 2005).
25. J. Potter, M. Wetherell, in The Psychology of the Social, Ed. U. Flick (Cambridge University Press, Cambridge, 1998), pp. 138-155.
26. C. Puchta, J. Potter, British Journal of Social Psychology 41, 345-363 (2002).
27. S. Speer, J. Potter, Discourse & Society 11(4), 543-572 (2000).
28. H. Tajfel, J.C. Turner, in Psychology of Intergroup Relations, 2nd Edition, Ed. S. Worchel, W.G. Austin (Nelson-Hall, Chicago, 1986).
29. H. Te Molder, J. Potter, Eds., Conversation and Cognition (Cambridge University Press, Cambridge, 2005).
30. C.L. Twigger-Ross, D.L. Uzzell, Journal of Environmental Psychology 16, 205-220 (1996).
31. J. Urry, The Tourist Gaze (Sage Publications, London, 2002).
32. L. von Bertalanffy, General System Theory (Braziller, New York, 1968).
EMERGENCE IN ARTIFICIAL INTELLIGENCE
DIFFERENT APPROACHES TO SEMANTICS IN KNOWLEDGE REPRESENTATION
S. DAVID, A. MONTESANTO, C. ROCCHI
DEIT – Università Politecnica delle Marche
{s.david|a.montesanto|c.rocchi}@univpm.it
There are different approaches to modeling a computational system, each providing a different Semantics. We present a comparison between different approaches to Semantics with the aim of identifying which peculiarities are needed to provide a system with a uniquely interpretable Semantics. We discuss different approaches, namely Description Logics, Artificial Neural Networks, and Databases, and we identify classification (the process of building a taxonomy) as a common trait. However, in this paper we also argue that classification is not enough to provide a system with a Semantics, which emerges only when relations between classes are established and used among instances. Our contribution also analyzes additional features of the formalisms that distinguish the approaches: closed vs. open world assumption, dynamic vs. static nature, the management of knowledge, and the learning process. We particularly focus on the open/closed world assumption, providing real-world modeling examples to highlight the differences and the consequences of one choice vs. the other. We also consider an important difference: in symbolic systems the notion of Semantics is “declared” by means of axioms, rules, or constraints, whereas in subsymbolic ones the notion of Semantics emerges with the evolution of the modeling system.
Keywords: description logics, artificial neural networks, databases, open world assumption.
1. Introduction
Recently there has been a growing interest in the notion of Semantics. Probably pushed forward by the development of the Semantic Web, many researchers in computer science have started investigating the field of Semantics. Such a notion has already been widely investigated in many fields, like linguistics, philosophy and logic. Following the analytic stream as conceived in (Russell, 1908 [19]) and (Frege, 1918 [7]), the concept of Semantics in philosophy evolved in formal terms until Montague provided a formalization (Montague, 1974 [15]), which is widely accepted in the field of logic-based linguistic studies. In philosophy, a less formal trend led to the definition of Semantics in terms of “correspondence to the world” (Wittgenstein, 1921 [23]), an approach
influenced by the formal work of Tarski about the notion of truth (Tarski, 1944 [22]). Meanwhile, work in cognitive psychology explored the human process of categorization and classification, which led to the development of models, inspired by formal logic, but more focused on representational issues (scripts, frames, etc.). In (Balkenius and Gärdenfors, 1991 [4]) the authors show that by developing a high-level description of the properties of neural networks it is possible to bridge the gap between the symbolic and the subsymbolic levels (Smolensky, 1993 [21]). We can see this relation by giving a different interpretation of the structure of a neural network. They highlight “scheme” as the key concept for this construction. Schemata are neutral with respect to the different views of cognition and have been used in many fields (Balkenius, 1993 [5]). Moreover Balkenius uses the term scheme as a collective name for the structure used in the works of (Piaget, 1952 [17]), (Piaget and Inhelder, 1973 [16]), (Rumelhart and McClelland, 1986 [18]), and (Arbib and Hanson, 1987 [2]), including also concepts such as Frames (Minsky, 1986 [14]) and Scripts (Schank and Abelson, 1977 [20]). Nowadays Semantics is involved in many research fields: natural language processing, semantic web, knowledge representation and medical informatics. Our purpose is to analyze the concept of Semantics in different approaches adopted in the design and implementation of computational systems. We consider three approaches to domain modelling: Description Logics (DLs), Relational Databases (DBs), and Artificial Neural Networks (ANNs). DLs and DBs establish their foundation in the theory of propositional logic and its connectives: AND, OR, and NOT, with the addition of universal (∀x) and existential (∃x) quantification from predicate logic, while some particular types of ANNs (e.g., see McCulloch and Pitts, 1943 [13]) can express connectives of propositional logic. Research and discussion in the fields of DLs, DBs and ANNs often involve the notion of Semantics. In this paper we closely look at each approach, highlighting peculiarities and common traits. Each approach is examined in relation with a simple domain, including entities such as wines and courses, which we introduced to show commonalities and differences between the three approaches. We will first consider in Section 3.1 Description Logics, a widely known approach in the knowledge representation field, which exploits logic tools (subsumption, consistency and satisfiability) to implement reasoning over structured data. Such an approach finds its roots in logical formalisms and exploits a clear definition of Semantics, often expressed in terms of set theory.
We will then analyze in Section 3.2 the relational database approach, which allows for fast retrieval of structured data via queries over tables. This approach is widely adopted in commercial applications, where data are organized in terms of tuples. Finally, we will consider in Section 3.3 Artificial Neural Networks, which show a fairly different and peculiar approach to the modeling of a domain, comprising a necessary learning phase to classify input data. Our analysis has been carried out in terms of some main features, namely: static vs. dynamic nature, closed vs. open world assumption, management of implicit knowledge, need for a learning phase, and classification. We identify the last one as a common feature. Nevertheless, we argue that classification is not enough to have a “system with Semantics”; we believe Semantics arises when relations between classes are considered. We also highlight the different nature of Semantics in the three approaches: in DBs it is explicitly declared (via tables), in DLs it is partially explicit (via axioms) and partially computed through reasoning algorithms, and in ANNs Semantics emerges from the state transitions of the system.

2. Motivating example
In order to ease the understanding of our purpose, we present an example. We describe in plain English a sample scenario from which we expect an output (e.g., a result from a query): this scenario will later be represented in each of the three approaches, assuming the same domain and granularity. We consider how the formalization varies across the different representation approaches, and we finally discuss the different outcomes, observing how the different features of the approaches behave and sum up to form the Semantics. As we want to keep our example simple, we neither describe nor represent our domain in detail, but concentrate on the aspects relevant to the purpose of our work. We introduce a domain containing wines and courses, and a relation that states when a given wine is ideal with a course, because the flavor of the wine fits the taste of the course. We describe the general elements of the domain and their characteristics as follows:
• Wine is a class.
• Wine has an alcoholic degree.
• Wine has a color.
• Wine is a liquid.
• Course is a class.
• “ideal with” is a relation between Wine and Course.
With the term class, we refer to a set of elements that share the same characteristics. Assertions about particular elements (instances) of the domain are:
• Marzemino is an instance of wine.
• Marzemino’s alcoholic degree is 12.
• Marzemino’s color is red ruby.
• Rabbit is a course.
• Marzemino is ideal with rabbit.

3. Formalization of the Example
In this section, for each of the approaches investigated, we introduce its formalism, then we formalize the example presented in Section 2, and finally we show how the different features of the formalisms are used.

3.1. The Example in Description Logics
Description Logics are a family of languages for the representation of knowledge, which also allow reasoning over it. Before formalizing the example in Description Logics, we briefly introduce the basic concepts underlying DLs: components, syntax, semantics, and reasoning. We point the interested reader to the Description Logic Handbook (Baader et al., 2003 [3]) or to the bibliography for a deeper presentation.

3.1.1. Components of a DL system
The three main components of a Description Logic system are depicted in Figure 1. A Knowledge Base (KB) is a set of assertions (also called statements or axioms) about a domain, defined by means of Classes and their Properties and Relationships; it can be described in Description Logics by means of a concept language. The axioms are organized in a TBox and an ABox, and reasoning services provide the ability to deduce additional information from the knowledge stored in the KB.

Figure 1. The Components of a Description Logic System.

3.1.2. TBox and ABox
The TBox contains the intensional knowledge, which is general knowledge concerning the domain of discourse, fixed in time and usually not subject to
change. The information stored in a TBox represents Concepts and Roles (i.e., properties of concepts and relationships among concepts), which are aggregates of elements in the domain and form the terminology, i.e., the vocabulary used to describe the whole domain and to assign names to complex concept descriptions. Concepts are defined by unary predicates and Roles by binary predicates; Concepts can be either atomic or complex. As an example, consider the following axioms:
1. Wine ⊑ Liquid ⊓ ∃hasColor.⊤ ⊓ ∃hasAlcoholicDegree.⊤. “A wine is liquid, has a color, and has an alcoholic degree”. Liquid is an atomic concept (not defined in terms of others), Wine is a complex concept, and hasColor and hasAlcoholicDegree are Roles.
2. ⊤ ⊑ ∀idealWith.Course, ⊤ ⊑ ∀idealWith⁻.Wine. This is subtler, but it basically says “a wine is ideal with a course”. It expresses the range (Course) and the domain (Wine) of the role idealWith. The value after the “.” (e.g., in ∀idealWith.Course it would be Course) is called the filler of the property. The superscript ⁻ denotes the inverse property. DL syntax indeed only allows specifying the filler (i.e., the range) of a property, and not its domain directly.
On the other hand, the ABox contains the extensional knowledge, which can change over time and represents assertions about individuals in the domain. Individuals are instances of concepts and roles defined in the TBox, e.g., Wine(marzemino), idealWith(marzemino, rabbit). The first assertion states that Marzemino is a specific instance of class Wine, whereas the second is an instance of the role idealWith, stating that Marzemino is ideal with rabbit.
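To make the TBox/ABox split concrete, the following minimal sketch (our own illustration, not part of the original formalism) encodes the example terminology and assertions as plain Python data; the dictionary layout and all key names are simplifying assumptions of ours, not DL syntax.

# A minimal encoding of the example KB; the layout is our own
# simplification, not DL syntax.
TBOX = {
    "subclass": [("Wine", "Liquid")],     # Wine is subsumed by Liquid
    "domain":   {"idealWith": "Wine"},    # domain axiom for idealWith
    "range":    {"idealWith": "Course"},  # range axiom for idealWith
}
ABOX = {
    "concepts": [("Course", "fish")],
    "roles":    [("idealWith", "marzemino", "rabbit"),
                 ("hasColor", "marzemino", "redRuby"),
                 ("hasAlcoholicDegree", "marzemino", 12)],
}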
TBox and ABox form the KB, denoted by Σ = (T,A), where T is the TBox and A the ABox. A single statement (or axiom) denotes information contained in either the TBox or ABox and is represented by α, β. The example presented in Section 2 can be expressed in the DL formalism as follows:
T = {Wine ⊑ Liquid ⊓ ∃hasColor.⊤ ⊓ ∃hasAlcoholicDegree.⊤,
⊤ ⊑ ∀idealWith.Course,
⊤ ⊑ ∀idealWith⁻.Wine}
A = {hasAlcoholicDegree(marzemino, 12), hasColor(marzemino, redRuby), Course(fish), idealWith(marzemino, rabbit)}
A KB also contains so-called implicit knowledge: information not explicitly stated, but that can be logically deduced from the existing statements. Implicit knowledge can be discovered with the help of reasoning services, for example in answer to a query posed by a user to the system. A reasoning engine (reasoner) also provides additional inference services: depending on the input and on the goal, the system carries out different processes. Note that it is not necessary to state explicitly either Wine(marzemino) or Course(rabbit), i.e., that Marzemino is a wine and that rabbit is a course, since these conclusions are automatically derived from the idealWith(marzemino, rabbit) assertion.

3.1.3. Semantics of DL Languages
We will denote with ∆ the domain of discourse, with × the Cartesian product of two generic sets, and with ⊑ the subsumption relation between Concepts or Roles. The semantics of the languages is given in terms of an interpretation, defined as a pair I = (∆I, ·I), where ∆I is a non-empty set called the domain and ·I is the interpretation function: a mapping from every Concept to a subset of ∆I, from every Role to a subset of ∆I × ∆I, and from every Individual to an element of ∆I. An interpretation I is a model for a Concept C if the set CI is non-empty.

3.1.4. Reasoning
A Description Logic system provides many basic inference services. Here we present some of them, along with a sketch of how they are used to perform complex operations.
1. Subsumption: decide whether a concept is more general than another one. Upon Subsumption the process of Classification is built, which is the process that builds the hierarchy of the concepts in the TBox T.
2. Consistency Check: decide if Σ is satisfiable, i.e., if it is coherent and admits a model.
3. Instance Checking: the problem of deciding if an assertion C(a) is satisfied in every model of Σ. On Instance Checking is based the process of Retrieval or Query Answering: given a Knowledge Base Σ and a concept C, find the set of all instances C(a) in Σ.
4. Concept Satisfiability: decide whether a concept C is satisfiable in a Knowledge Base Σ.
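Continuing the Python sketch above, a deliberately naive materialization step shows how the domain and range axioms yield the implicit assertions Wine(marzemino) and Course(rabbit); real DL reasoners rely on tableau algorithms rather than this toy forward chaining, which we include only to make the idea of implicit knowledge tangible.

def implied_concept_assertions(tbox, abox):
    # Apply domain/range axioms to every role assertion ...
    facts = set(abox["concepts"])
    for role, subj, obj in abox["roles"]:
        if role in tbox["domain"]:
            facts.add((tbox["domain"][role], subj))
        if role in tbox["range"]:
            facts.add((tbox["range"][role], obj))
    # ... then close the result under the subclass hierarchy.
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox["subclass"]:
            for concept, ind in list(facts):
                if concept == sub and (sup, ind) not in facts:
                    facts.add((sup, ind))
                    changed = True
    return facts

# Derives Wine(marzemino), Course(rabbit) and Liquid(marzemino),
# none of which is explicitly stated in the ABox.
print(implied_concept_assertions(TBOX, ABOX))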
3.1.5. Characteristics of DL Semantics
CWA/OWA. The Closed World Assumption and the Open World Assumption represent two different approaches to the evaluation of implicit knowledge in a KB. The difference in their behavior is usually clarified by a comparison between the structure of a Knowledge Base Σ = (T,A) and a Database, where the schema of the latter (i.e., tables and structure) roughly corresponds to T and its tuples correspond to A. On one side, we have a single Database instance, which represents the only possible interpretation of the stored data, while on the other we have one out of all possible interpretations of Σ. Hence, while in a Database a piece of information not explicitly stated in a tuple is interpreted as “negative” or false knowledge, in a KB it is considered false only if it contradicts some other axiom of the domain or if its falsity is explicitly stated (see Section 3.2).
Static and dynamic systems. We already noted that a KB stores two types of knowledge: intensional and extensional. The former is also said to be timeless, since it is unlikely to change over time. Description Logics and KBs are indeed suitable for describing domains that can evolve over time, but only at the ABox level, i.e., with assertions on the individuals and not on the structure stored in the TBox. A TBox is designed in such a way that it can hardly be changed. Description Logic systems are therefore static systems, in that they cannot automatically update the TBox, as this implies the redefinition of an existing concept (see De Giacomo et al., 2006 [6]). Only the intervention of the KB designers can modify a concept. Note also that the literature about updates w.r.t. the ABox is very limited (investigated in Liu et al., 2006 [12]), and ontology evolution (i.e., ontology update w.r.t. the TBox) is likewise poorly investigated (an exception is Haase and Stojanovic, 2005 [8]).
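The practical side of the CWA/OWA contrast described above can be hinted at with two toy query functions over the same sketch (again our own illustration): under CWA an unstored fact is simply false, while under OWA it is merely unknown unless it is derivable or contradicts the KB (a real reasoner would also test for provable falsity, which this sketch omits).

def holds_cwa(abox, role, a, b):
    # Closed world: whatever is not stored counts as false.
    return (role, a, b) in abox["roles"]

def holds_owa(abox, role, a, b):
    # Open world: distinguish "provably true" from "unknown".
    return True if (role, a, b) in abox["roles"] else "unknown"

print(holds_cwa(ABOX, "idealWith", "marzemino", "fish"))  # False
print(holds_owa(ABOX, "idealWith", "marzemino", "fish"))  # 'unknown'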
Incomplete knowledge. Description Logics (like other Knowledge Representation formalisms) provide constructors that allow one to specify what an object is not, or what it can be, rather than describe what it is. For example, it is acceptable and correct to define statements like the following.
• ¬Student(paul): we know that Paul is not a student, but we do not care whether he is a professor or perhaps an assistant.
• hasParent(john, paul) ∨ hasParent(jane, paul): either John or Jane is a parent of Paul, but which of them? Perhaps both? Again, we do not care who the parent or parents of Paul are; we only know that he has (at least) one.
The reason for this behavior lies in one of the purposes of the DL formalism: it needs to be able to manage information that is added while the domain is described in more detail. This amounts to having the ability to deal with objects that are not fully specified.

3.2. The Example in Relational Databases
To the extent of this work, we consider a Database (DB) as a structured pool of data. We do not go into the details of DB theory, so we point the reader to (Abiteboul et al., 1995 [1]) for a comprehensive introduction to the topic. A DB relies on a scheme, which describes the “objects” that are held in the DB. Among the different approaches to the structure of a scheme, the most commonly known and commercially adopted is the relational one, which allows defining information in terms of multi-related records (tables). Less known approaches are the hierarchical one (e.g., used in LDAP) and the network model, which allows multiple inheritance via lattice structures (a). The representational tool in relational DBs is the table, which allows representing instances and relations between instances. For example, a wine can be represented as in Table 1. The first row of the table is the definition of the wine concept: it is named “Wine” and has three attributes: name, color, and alcoholic degree. The second row represents a particular instance of wine, whose name is Marzemino, red ruby colored and with 12 degrees. More formally, both a definition and an instance can be expressed as a tuple. For example, the following is a tuple representing the wine definition (b):
Wine: name, color, alcoholic degree
(a) It is important to note that, for the scope of this paper, we focus on the representational approach and not on technical details or implementations like SQL-based systems.
(b) Note that attributes are written lowercase and fillers capitalized.
Table 1. Wine definition.
Wine:   name        color      alcoholic degree
        Marzemino   Red Ruby   12
Table 2. Definition and instantiation of IdealWith.
idealWith:   wine        course
             Marzemino   Rabbit
whereas the Marzemino instance can be expressed as:
Wine(Marzemino, Red Ruby, 12)
In the definition of the concept it is possible to specify the type of an attribute (e.g., integer, string, etc.) and whether it is mandatory or not. Attributes’ fillers (namely, column values) have to be of the same type: for example, the color of a wine has to be a string, the degree has to be an integer or decimal number, etc. A table can also express a relationship. Consider Table 2: the first row defines the IdealWith relation, which can hold between a wine and a course. Assuming rabbit is an instance of course, the second row states the fact that Marzemino is an ideal wine for a rabbit course. It is important to notice that Table 2 states a relation between two tuples, the Marzemino instance and the rabbit instance. This simple DB allows making queries, for example to retrieve the list of wines or courses, the wines with a degree greater than ten, etc. The purpose of a relational DB, especially in commercial applications, is fast querying, supported by views built during batch procedures. A view is a sort of virtual table containing tuples of some particular table column, with pointers to the rows associated with the values, usually built to quickly locate tuples (rows) in a table. Views simply speed up retrieval: without them the application still works, just more slowly. Views can be built on any combination of attributes, usually the most frequently queried ones. We can distinguish three main actions performed on a DB: schema construction, population, and querying. The first action is the definition of the tables and of the relations between them. During this phase some semantics is already involved, especially in the establishment and statement of relations. For example, the definition of the IdealWith table requires that the elements that populate it come from the Wine and Course tables. More formally, in set-theoretic terms, the pairs that populate the IdealWith table have to be a subset of the Cartesian product obtained by coupling tuples of the Wine
table and tuples of the Course table. The same “semantic requirement” will apply in the subsequent phases, population and querying. Population is the construction of instances: tuples describing the objects that populate the database. During population there is a sort of type check, to control the type-correctness of the tuples populating IdealWith. The querying phase also involves the specification of properties, which introduces additional semantic operations. For example, it is possible to retrieve all the wines with an alcoholic degree greater than ten. In other words, retrieval allows specifying constraints on the values of the tuples to be searched in a specific table or in a set of tables.

3.2.1. Characteristics of DB Semantics
OWA/CWA. The underlying assumption of relational DBs is that the world is closed. This means that, during querying, if a tuple does not exist (or is void) the result is an empty set of tuples.
Static/Dynamic system. Relational DBs are static systems. In this case static means that, unlike in ANN approaches (see Section 3.3), there is no training phase required for the system to work. Once the schema has been declared and populated, the system is ready to be used.
Incomplete knowledge. Incomplete knowledge is not handled by relational DBs. A DB always retrieves information that describes what an element is, so it must have certainty about the data stored. It is important to notice that the main purpose of relational DBs is fast querying of explicitly stated knowledge. A DB, by design, would allow some “reasoning” on data: for example, the table IdealWith, instead of being populated by hand, could be constructed dynamically by implementing rules which couple wines and courses according to their characteristics. This mechanism resembles the one adopted by rule-based systems, which dynamically apply rules to structured data (knowledge bases). Although this might look like a strong similarity between two different approaches, we should focus on their foundations, by which we particularly mean the representational approach on which they rely. A clear difference, for example, concerns constraints. In a DB, constraints are taken into account during the population phase: if color is a mandatory attribute and an instance of colorless wine is inserted, the database simply rejects the statement (integrity constraint check). A DL knowledge base, in the same situation, would simply insert an instance of wine with an unknown color. This is due to the purpose of the system: in a DL-based approach, incomplete data are in any case considered a source of possible computation, whereas the DB is based on the assumption that incomplete data, namely data that do not fit the schema, are to be rejected.
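The three actions, and the integrity check just mentioned, can be made concrete with SQLite; this is a sketch under our own naming assumptions (the paper itself deliberately abstracts from SQL-based implementations).

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Schema construction: color is declared mandatory, as in the example.
cur.execute("CREATE TABLE Wine (name TEXT PRIMARY KEY,"
            " color TEXT NOT NULL, alcoholic_degree REAL)")
cur.execute("CREATE TABLE Course (name TEXT PRIMARY KEY)")
cur.execute("CREATE TABLE IdealWith (wine TEXT REFERENCES Wine(name),"
            " course TEXT REFERENCES Course(name))")
# Population.
cur.execute("INSERT INTO Wine VALUES ('Marzemino', 'Red Ruby', 12)")
cur.execute("INSERT INTO Course VALUES ('Rabbit')")
cur.execute("INSERT INTO IdealWith VALUES ('Marzemino', 'Rabbit')")
# Querying: wines with an alcoholic degree greater than ten.
print(cur.execute(
    "SELECT name FROM Wine WHERE alcoholic_degree > 10").fetchall())
# A colorless wine is rejected outright (integrity constraint check),
# where a DL KB would instead accept a wine with an unknown color.
try:
    cur.execute("INSERT INTO Wine (name) VALUES ('Nameless')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)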
3.3. The Example within Artificial Neural Networks
An Artificial Neural Network (ANN) can be seen as a system of programs and data structures that approximates the operation of the human brain. ANNs involve a large number of processors operating in parallel, each with its own small “sphere of knowledge” and access to the data in its local memory.

Figure 2. A generic ANN unit.

Figure 2 shows how a generic neural network unit works. The action potential function P(t) defines how a single unit collects signals by summing all the excitatory and inhibitory influences acting on it. This potential feeds the activation function that calculates the output value of the unit itself. In (Balkenius and Gärdenfors, 1991 [4]), an artificial neural network N is defined as a 4-tuple <S,F,C,G>. S is the space of all possible states of the neural network; the dimensionality of S corresponds to the number of parameters used to describe a state of the system. F is a set of state transition functions, or activation functions. C is the set of possible configurations (that is, weight distributions) of the network. G is a set of learning functions that describe how the configurations develop as a result of various inputs to the network. We can identify two interacting subsystems in a neural network: <S,F>, which governs the fast changes in the network, i.e., the transient neural activity, and <C,G>, which controls the slower changes, corresponding to the whole learning in the system.
ANNs have a distributed representation of knowledge (Kurfess, 1999 [11]): an item is not represented by a single symbol or a sequence of symbols, but by the combination of many small representational entities, often referred to as micro-features. The concept “wine”, for example, would not be represented as a string of characters, but as an entity that has the properties “color”, “alcoholic degree”, and the other characteristics of wine. Such representational schemata have properties like similarity-based access, fault tolerance, and quick response time. We could say that a static scheme is a stable pattern of activity in a neural network. A scheme α corresponds to a vector <α1,...,αn> in the state space S. A scheme α is currently represented in a neural network by an activity vector x = <x1,...,xn> when xi ≥ αi for all 1 ≤ i ≤ n. Let α, β be two schemata. If αi ≥ βi for all i, then β can be considered to be a more general scheme than α, and α can thus be seen as an instantiation of the scheme β.
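A toy rendering of this scheme formalism (our own illustration; the feature values are invented): schemes and network states are vectors, representation is a componentwise comparison, and generality amounts to containment of representing states.

def represents(x, alpha):
    # The activity vector x currently represents scheme alpha
    # iff x_i >= alpha_i for every component i.
    return all(xi >= ai for xi, ai in zip(x, alpha))

def more_general(beta, alpha):
    # beta is more general than alpha iff alpha_i >= beta_i for all i:
    # every state representing alpha then also represents beta.
    return all(ai >= bi for ai, bi in zip(alpha, beta))

wine   = (0.9, 0.8, 0.0)   # hypothetical micro-features of 'wine'
liquid = (0.9, 0.0, 0.0)   # a more general scheme
state  = (1.0, 0.9, 0.3)   # current network activity

print(represents(state, wine))     # True
print(more_general(liquid, wine))  # True: 'wine' instantiates 'liquid'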
Semantics in ANN. According to (Healy and Caudell, 2006 [9]), concepts are symbolic descriptions of objects and events, observable or imagined, at any arbitrary level of generality or specificity. They are organized as a multithreaded hierarchy ordered from the most abstract to the most specific. In this context, the semantics of a neural network can be expressed as an evolving representation of a distributed system of concepts, many of them learned from data via weight adaptation. Usually we use definitions or constraints to build a class; these are conditions to be satisfied or, better, features and attributes of the classes themselves. Classes are composed of members representing a specific domain. ANNs create subsymbolic class relations strongly tied to the particular domain described. These relations are embedded into the structure of the dynamical system, and the architecture of this dynamical system is a model of the learned domain.
OWA/CWA. A clear distinction between the closed world assumption and the open world assumption is not so easy to draw in the field of ANNs. Usually the standard models of neural networks are closed world systems, but we can evaluate the “openness” of a neural network by first considering its physical structure: for example, if we need a variable number of nodes, we can apply a pruning approach that removes redundant units from a network (Wynne-Jones, 1991 [24]). On the other hand, we can use a fixed structure but change the amount of information in the training set. An example can be found in (Rumelhart and McClelland, 1986 [18]), about learning the past tenses of English verbs: a simple perceptron-based pattern associator interfaced with an input/output encoding/decoding network, which allows the model to associate verb stems with their past tenses using a special phoneme-representation format.
Static/dynamic system (learning and relational semantics). Learning modifies the structure of the weights in the neural network in order to maximize the number of constraints satisfied. In this way ANNs capture the constraint structure of the particular context modeled, so we can say that the network has “learned” the relational semantics of that domain. This point of view shows that semantics is a kind of Gestalt that constrains data into a coherent structure; the understanding of meaning could consist of the emergence of coherence, starting from a chaotic initial state, through a phase transition. (Balkenius and Gärdenfors, 1991 [4]) have also shown that by introducing an appropriate schema concept and exploiting the higher-level features of the resonance function in a neural network it is possible to define a form of nonmonotonic inference relation. So, the “truth” in ANNs consists of the dynamic state in which a node is active or not; that is, the truth is embedded into the knowledge state of the system. The particular dynamical system represented by a specific ANN structure is the model of the learned domain. Typically, a neural network is initially “trained” with large amounts of data and rules about data relationships. One of the most important features of a neural network is its ability to adapt to new environments; therefore, learning algorithms are critical for the study of neural networks. Learning is essential to most of these neural network architectures, and hence the choice of the learning algorithm is a central issue in the development of an ANN. Learning implies that a processing unit can change its input/output behavior as a result of changes occurring in the environment. Since the activation functions are usually fixed when the network is constructed and the input/output vector cannot be changed, the weights corresponding to that input vector need to be adjusted in order to modify the input/output behavior. A method is thus needed, at least during a training stage, to modify weights in response to the input/output process; a number of such learning functions are available for ANN models. Learning can be either supervised, in which the network is provided with the correct answer for the output during training, or unsupervised, in which no external teacher is present. Multilayer Perceptron (MLP) training algorithms are examples of supervised learning using the error backpropagation rule (EBP) (Rumelhart and McClelland, 1986 [18]). EBP is a gradient descent algorithm that uses input vectors and the corresponding output vectors to train a multiple layer network
until it can approximate a given function. It has been proved that MLPs, which are networks with biases, a sigmoid layer, and a linear output layer, can approximate any function with a finite number of discontinuities. Self-Organizing Maps (Kohonen, 2000 [10]) are based on unsupervised learning. The preservation of neighbourhood relations is a very useful property that has attracted great interest: similar input patterns from the input space will be projected onto neighbouring nodes of the output map and, conversely, nodes that are adjacent in the output map will decode similar input patterns. All self-organizing networks have generally been considered as preserving the topology of the input space, as a consequence of competitive learning. However, by recent definitions of topology preservation, not all self-organizing models have this property.
Incomplete knowledge. In order to manage incomplete knowledge, full logical negation is necessary, but ANNs can only implement atomic logical negation (e.g., see McCulloch and Pitts, 1943 [13]). For example, we can build an ANN that distinguishes between “Wine” and “not Wine”. However, in ANNs there is no correspondent of DLs’ full negation, and therefore we cannot represent negation between relations among classes, for example the concepts “marzemino not idealWith fish” or “verdicchio not idealWith rabbit”, as in Section 3.1. So, incomplete knowledge cannot be implemented in ANNs in the same way as it is intended in knowledge representation.
Example. If we want to build a suitable neural network model that represents the concept IdealWith for wines and courses, we can choose among different neural network architectures. We decided to use a Multilayer Perceptron that models the relation between the different instances of wine (input data) and the different typologies of course (rabbit and fish). As Figure 3 shows, the input layer of the MLP consists of 3 nodes, the completely connected hidden layer consists of 2 nodes, and the output layer consists of only 1 unit. The latter unit codes the different typologies of courses, using 1 for rabbit and 0 for fish. We chose a sigmoidal activation function for the output unit in order to have a more refined classification: the output will not be merely 0 or 1, but can take any value in the range [0,1]. In this way we have more complete information not only about the course concept but also about the concept IdealWith. The learning algorithm of the MLP is represented by equation (1), taken from (Rumelhart and McClelland, 1986 [18]).
Figure 3. Multilayer Perceptron that models the concept IdealWith.
Δwij(n) = −η ∂E/∂wij + α Δwij(n−1)    (1)
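A compact sketch of such a 3-2-1 Perceptron trained with rule (1) is given below; the training data, learning rate, and momentum values are invented assumptions of ours, chosen only to make the toy association (red-like wines with rabbit, white-like wines with fish) learnable.

import math, random

random.seed(0)
sig = lambda z: 1.0 / (1.0 + math.exp(-z))

# Invented toy data: three wine features -> course (1 = rabbit, 0 = fish).
DATA = [([0.9, 0.8, 0.2], 1.0), ([0.8, 0.7, 0.3], 1.0),
        ([0.2, 0.1, 0.9], 0.0), ([0.1, 0.3, 0.8], 0.0)]

W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 3+bias -> 2
W2 = [random.uniform(-1, 1) for _ in range(3)]                      # 2+bias -> 1
dW1 = [[0.0] * 4 for _ in range(2)]   # previous updates, kept for momentum
dW2 = [0.0] * 3
eta, alpha = 0.3, 0.7                 # learning rate and momentum

for epoch in range(3000):
    for x, t in DATA:
        xb = x + [1.0]
        h = [sig(sum(w * v for w, v in zip(row, xb))) for row in W1]
        hb = h + [1.0]
        y = sig(sum(w * v for w, v in zip(W2, hb)))
        dy = (t - y) * y * (1 - y)                      # output error term
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Rule (1): new delta = -eta * dE/dw + alpha * previous delta.
        for j in range(3):
            dW2[j] = eta * dy * hb[j] + alpha * dW2[j]
            W2[j] += dW2[j]
        for j in range(2):
            for i in range(4):
                dW1[j][i] = eta * dh[j] * xb[i] + alpha * dW1[j][i]
                W1[j][i] += dW1[j][i]

def predict(x):
    h = [sig(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    return sig(sum(w * v for w, v in zip(W2, h + [1.0])))

print(predict([0.85, 0.75, 0.25]))  # near 1: classified as 'ideal with rabbit'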
The minimum error (E) obtained during training depends on a suitable learning rate (η) and momentum (α). Once the training is complete, it is necessary to test the network’s effective ability to recognize the concept IdealWith. The training set is represented by all the instances of wine associated with a specific course. So, when we minimize the error on the training set of wine/course associations, we will have a model of this kind of concept embedded in the Perceptron structure. Moreover, after the training phase this model can associate a new instance of wine (never seen before) with one of the course classes. In this way a neural network is able to fill in missing information.

4. Résumé and Discussion
Table 3 summarizes the results we obtained from our work. It shows the components that the formalisms share and those in which they differ. We argued that classification does not suffice to provide Semantics, but we claimed that it is a fundamental trait and the first building block of the Semantics of a system. Relations represent the second necessary building block; they are defined among classes and then used between instances of classes. These two blocks are shared by all approaches.
Table 3. Summary of the components that build Semantics in different approaches.

                          Description Logics   Databases   Artificial Neural Networks
Classification            yes                  yes         yes
Relations among classes   yes                  yes         yes
OWA/CWA                   open                 closed      both
Static/Dynamic            static               static      dynamic
Incomplete knowledge      yes                  no          no
Learning                  no                   no          yes
We want to draw attention to the difference between a taxonomy and classification in logic-based formalisms. A taxonomy is a hierarchy of concepts connected by means of an IS-A relation; that is, we can only express that one class is a superclass of another or, in other terms, that the set of instances of a class is contained in the set of instances of the superclass. There is no possibility of defining additional relations among classes, for example if they belong to different hierarchies. Hence, we do not consider a taxonomy expressive enough to provide the semantics of a system. Classification, on the other hand, is the process that modifies an existing taxonomy by adding new elements to it. In particular, we can distinguish classification at two different levels, according to (Baader et al., 2003 [3]).
• Classification of concepts (at TBox level) determines subconcept/superconcept relationships (called subsumption relationships in DL) between the concepts already present in a given taxonomy and a freshly defined concept, placing the latter in the correct place in the hierarchy, or notifying its inconsistency with the existing taxonomy. Hence, classification of concepts allows one to structure the taxonomy of a domain in the form of a subsumption hierarchy.
• Classification of individuals (at ABox level) determines whether a given individual is always an instance of a certain concept (i.e., whether this instance relationship is implied by the description of the individual and the definition of the concept). It thus provides useful information on the properties of an individual.
From this formal definition, it might not be straightforward to see that a DB can classify elements, since there is no relation among tables other than foreign keys or constraints. However, such relations can clearly be identified in the conceptual model of the DB itself. Moreover, as every tuple must belong to a table, we can see the process of adding a tuple to a table as a classification of individuals. We have shown that, besides classification and relations among classes, there are other characteristics that are needed to build the Semantics of a system. Some of these are peculiar to one formalism (e.g., the learning phase of an artificial neural network); others are common to some of them but used differently (e.g., the
assumption of a closed world or an open world), resulting in different effects when querying the stored data.

5. Conclusion and further work
We have investigated three popular formalisms used in Knowledge Representation that use different Semantics, and we identified their commonalities (classification and relations among classes) and several other peculiarities (closed vs. open world assumption, dynamic vs. static nature, the management of knowledge, and the learning process), used in different ways. The approaches were chosen because they cover a wide range of combinations in the different use of these peculiarities. We showed that Semantics is based on classification and on relationships between classes, and is refined by additional peculiarities that are used differently, according to the purpose of the resulting system. At the moment, we have not been able to find a definition of Semantics that can be used across different systems: besides the commonalities shared by all formalisms, there are peculiarities too different to be reconciled in a unique definition. We have also shown the difference between the notion of Semantics in symbolic systems (e.g., DLs and DBs) and in subsymbolic systems (e.g., ANNs): in the former, Semantics is described with the use of axioms, rules, or constraints, whereas in the latter, Semantics emerges with the evolution of the modeling system. Nevertheless, we foresee two directions in which this work might be extended. On the one side, we can put additional effort into the review of other formalisms, like Datalog, lambda calculus, fuzzy logic, SQL and so on, to confirm that we identified the correct peculiarities, or to find out whether we missed some. On the other side, we have spent a lot of time discussing what can sum up to define Semantics, but we have also left behind a number of issues and ideas that we decided to investigate at a future time. Hence, additional effort should concern a deeper investigation of the following peculiarities:
• Algorithms: are they part of the semantics of a system, or are they a consequence of the semantics?
• User interaction: can a user modify the Semantics of a system, or is she constrained to use it as-is?
• Context: to state whether or not it is involved in Semantics, we need further investigation.
References
1. S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases (Addison-Wesley, 1995).
2. M.A. Arbib and A.R. Hanson, Eds., Vision, brain, and cooperative computation (MIT Press, Cambridge, MA, USA, 1987).
3. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P.F. Patel-Schneider, Eds., Description Logic Handbook: Theory, Implementation and Applications (Cambridge University Press, 2003).
4. C. Balkenius and P. Gärdenfors, KR, 32-39 (1991).
5. C. Balkenius, in Selected readings of the Swedish conference on connectionism, Ed. M. Bodén and L. Niklasson (1993).
6. G. De Giacomo, M. Lenzerini, A. Poggi, and R. Rosati, in AAAI, 2006.
7. G. Frege, Beiträge zur Philosophie des Deutschen Idealismus, 58-77 (1918).
8. P. Haase and L. Stojanovic, in ESWC (2005), pp. 182-197.
9. M.J. Healy and T.P. Caudell, Axiomathes 16(1-2), 165-214 (2006).
10. T. Kohonen, Self-Organizing Maps (Springer, 2000).
11. F.J. Kurfess, Applied Intelligence 11(1), 5-13 (1999).
12. H. Liu, C. Lutz, M. Milicic, and F. Wolter, KR, 46-56 (2006).
13. W.S. McCulloch and W.A. Pitts, Bull. Math. Biophys. 5, 115 (1943).
14. M. Minsky, The society of mind (Simon & Schuster, New York, NY, 1986).
15. R. Montague, Formal Philosophy: Selected Papers of Richard Montague (Yale University Press, New Haven, Connecticut, 1974; edited, with an introduction, by R.H. Thomason).
16. J. Piaget and B. Inhelder, Memory and intelligence (Basic Books, 1973).
17. J. Piaget, The Origins of Intelligence in Children (Norton, New York, NY, 1952).
18. D.E. Rumelhart and J.L. McClelland, Parallel distributed processing (MIT Press, Cambridge, MA, USA, 1986).
19. B. Russell, American Journal of Mathematics 30, 222-262 (1908).
20. R.C. Schank and R.P. Abelson, Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures (Erlbaum, Hillsdale, NJ, 1977).
21. P. Smolensky, On the proper treatment of connectionism (1993), pp. 769-799.
22. A. Tarski, Philosophy and Phenomenological Research 4, 341-376 (1944).
23. L. Wittgenstein, Logisch-Philosophische Abhandlung (1921).
24. M. Wynne-Jones, in 13th IMACS World Congress on Computation and Applied Mathematics, Volume 2, Ed. J.J. Vichnevetsky and R. Miller (International Association for Mathematics and Computers in Simulation, 1991), pp. 747-750.
BIDIMENSIONAL TURING MACHINES AS GALILEAN MODELS OF HUMAN COMPUTATION†
MARCO GIUNTI
Dipartimento di Scienze Pedagogiche e Filosofiche, Università di Cagliari
via Is Mirrionis 1, 09123 Cagliari, Italy
E-mail: [email protected]
Even though simulation models are the dominant paradigm in cognitive science, it has been argued that Galilean models might fare better on both the description and explanation of real cognitive phenomena. The main goal of this paper is to show that the actual construction of Galilean models is clearly feasible, and well suited, for a special class of cognitive phenomena, namely, those of human computation. I will argue in particular that Turing's original formulation of the Church-Turing thesis can naturally be viewed as the core hypothesis of a new empirical theory of human computation. This theory relies on bidimensional Turing machines, a generalization of ordinary machines with a one-dimensional tape to two-dimensional paper. Finally, I will suggest that this theory might become a first paradigm for a general approach to the study of cognition, an approach entirely based on Galilean models of cognitive phenomena.
Keywords: Turing machines, Galilean models, Church-Turing thesis.
1. Introduction
Typically, a model of a cognitive phenomenon H is a dynamical system that (i) is implemented on a digital computer by means of appropriate software and (ii) allows us to produce correct simulations of the phenomenon H. Even though simulation models are the dominant paradigm in cognitive science, it has been argued (Giunti 1992, 1995, 1996, 1997, 1998a, 1998b, 2005) [7-13] that Galilean models might fare better on both the description and explanation of real cognitive phenomena. Galilean models are dynamical models of a different kind, in that they are dynamical systems with n (1 ≤ n) state components, where each component has a precise and definite empirical interpretation, as it corresponds to a measurable magnitude of the real phenomenon that the model describes.
† The main ideas of this paper were first presented at the seminar "Prendere Turing davvero sul serio", Poligono delle Idee Seminar Series, Dip. di Scienze Pedagogiche e Filosofiche, Università di Cagliari, October 21, 2005.
Ordinary Turing machines operate on a potentially infinite linear tape divided into adjacent squares. Bidimensional Turing machines (Dewdney 1989)
[5] work instead on a potentially infinite checkerboard, where they are capable of moving one square right or left (as ordinary Turing machines do) and, in addition, one square up or down. Bidimensional Turing machines are computationally equivalent to ordinary ones, and they are mostly known for the complex geometric patterns they can generate on the checkerboard.
I will argue in this paper that Turing's original formulation (1936) [20, sec. 9] of the Church-Turing thesis can naturally be interpreted as implying (A) a detailed description of a type of cognitive phenomenon, namely, the one of human computation; (B) the claim that, for any specific phenomenon H of human computation, there is an appropriate bidimensional Turing machine that turns out to be a Galilean model of H. I will then sketch how claim (B) might function as the core hypothesis of a new empirical theory of human computation and, finally, I will suggest that this theory might become a first paradigm for a general approach to the study of cognition, an approach entirely based on Galilean models of cognitive phenomena.^a
^a The spirit of the Galilean approach is somehow consonant with some of the ideas of Wells 1998, 2006 [21,22].
2. Phenomena simulation vs. Galilean models
A simulation model of a real phenomenon H (Giunti 1995, 1997) [8,10] is a mathematical dynamical system that (i) is implemented on a digital computer by means of appropriate software and (ii) allows us to produce empirically correct simulations of H. A simulation is empirically correct if we are able to establish empirically that the simulating process is similar to H in some relevant respect. Which respects are to be considered relevant, and which empirical methods we may employ to establish the similarity, is usually clear in each specific case.
Simulation models are the dominant paradigm in cognitive science. However, because of their design, they have severe limitations with respect to both data description and explanation. The descriptive limit concerns the correspondence between simulation data and real ones, which is not direct and intrinsic to the model, but at most indirect and extrinsic. For a simulation model does not incorporate measurable properties (magnitudes) of the real phenomenon among its basic components; instead, quantitative descriptions are typically obtained by matching such properties with superficial or global features of the model. The explanatory limit concerns the quality of the
explanations supported by the model. Typically, they are neither realistic nor comprehensive; rather, they are cast in a somewhat fictional and "in principle" style. This second limit, like the first one, is due to the fact that the basic components of a simulation model do not directly correspond to real aspects of the phenomenon itself. As a consequence, any explanation based on the analysis of a model of this kind is bound to introduce a whole series of fictional characters, which have no real counterpart in the phenomenon.
As a first approximation, we can think of a Galilean model as a dynamical system with n (1 ≤ n) state components, where each component has a precise and definite empirical interpretation, as it corresponds to a measurable magnitude of the real phenomenon that the model describes. A more precise characterization of a Galilean model presupposes a preliminary analysis of the very notion of a phenomenon. In general, a phenomenon H can be thought of as a pair (F, BF) of two distinct elements. The first one, F, is a functional description of (i) an abstract type of real system ASF and (ii) a general spatio-temporal scheme CSF of its causal interactions; in particular, the functional description of the abstract system ASF specifies its structural elements (or functional parts) and their mutual relationships and organization, while the description of the causal scheme CSF specifies the initial conditions of ASF's evolution. The second element, BF, is the set of all concrete systems of type ASF that also satisfy the causal interaction scheme CSF; BF is called the application domain^b of the phenomenon H.
For example, let He = (Fe, BFe) be the phenomenon of the free fall of a medium-sized body in the vicinity of the earth (from now on, I will refer to He just as the phenomenon of free fall). In this case, the functional description Fe is as follows. The abstract type of real system ASFe has just one structural element, namely, a medium-sized body in the vicinity of the earth; the causal interaction scheme CSFe consists in releasing the body at an arbitrary instant, with a vertical velocity (relative to the earth's surface) and position whose respective values are within appropriate boundaries. BFe is then the set of all concrete medium-sized bodies in the vicinity of the earth that satisfy the given scheme of causal interactions.
^b Since the functional description F typically contains several idealizations, no concrete or real system RS exactly satisfies F; rather, it fits F up to a certain degree. Thus, from a formal point of view, the application domain BF of a phenomenon (F, BF) might be better described as a fuzzy set.
Let DS = (X1 × … × Xn, (g^t)t∈T) be a dynamical system^c whose state space M = X1 × … × Xn has n components Xi (1 ≤ i ≤ n, where i, n ∈ Z+ = the non-negative integers). An interpretation IH of DS on a phenomenon H consists in identifying each component Xi with the set of all possible values of a magnitude Mi of the phenomenon H, and the time set T with the set of all possible instants of the time T of H itself. An interpretation IH of DS on H is empirical if the time T and all the magnitudes Mi are measurable properties of the phenomenon H. A pair (DS, IH), where DS is a dynamical system with n components and IH is an interpretation of DS on H, is said to be a model of the phenomenon H. If the interpretation IH is empirical, then (DS, IH) is an empirical model of H. Such a model is said to be empirically correct if, for any i, all measurements of magnitude Mi are consistent with the corresponding values xi determined by DS. An empirically correct model of H is also called a Galilean model of H (Giunti 1995 [8]; Giunti 1997 [10], ch. 3). A Galilean model is then any empirically correct model of some phenomenon.
As an example, let us consider the following system of two ordinary differential equations: dx(v)/dv = k, dy(v)/dv = x(v), where k is a fixed positive real constant. The solutions of these equations uniquely determine the dynamical system DSe = (X × Y, (h^v)v∈V), where X = Y = V = R (the real numbers) and, for any v, x, y ∈ R, h^v(x, y) = (kv + x, kv²/2 + xv + y). On the other hand, let us consider again the phenomenon of free fall He, and let IHe be the following interpretation of DSe on He.
^c A dynamical system (Arnold 1977 [1]; Szlensk 1984 [19]; Giunti 1997, 2006 [10,14]) is a kind of mathematical model that formally expresses the notion of an arbitrary deterministic system, either reversible or irreversible, with discrete or continuous time or state space. Examples of discrete dynamical systems are Turing machines and cellular automata; examples of continuous dynamical systems are iterated mappings on R, and systems specified by ordinary differential equations. Let Z be the integers, Z+ the non-negative integers, R the reals, and R+ the non-negative reals; below is the exact definition of a dynamical system. DS is a dynamical system iff there are M, T, (g^t)t∈T such that DS = (M, (g^t)t∈T) and:
1. M is a non-empty set; M represents all the possible states of the system, and it is called the state space;
2. T is either Z, Z+, R, or R+; T represents the time of the system, and it is called the time set;
3. (g^t)t∈T is a family of functions from M to M; each function g^t is called a state transition or a t-advance of the system;
4. for any t, v ∈ T and any x ∈ M, g^0(x) = x and g^(t+v)(x) = g^v(g^t(x)).
The first component X of the state space of DSe is the set of all possible values of the vertical velocity of an arbitrary free-falling body, the second component Y is the set of all possible values of the vertical position of the falling body, and the time set V of DSe is the set of all possible instants of physical time. Since all three of these magnitudes are measurable or detectable properties of the phenomenon of free fall He, IHe is an empirical interpretation of DSe on He, and (DSe, IHe) is thus an empirical model of He. For an appropriate value of the constant k, such a model also turns out to be empirically correct.^d Then, according to the previous definition, the pair (DSe, IHe) is a Galilean model of He.
It is quite clear that Galilean models can go well beyond the descriptive and explanatory limits of simulation models. Note first that data description in Galilean models is direct and intrinsic, for each component of a model of this kind determines the values of a specific magnitude of the corresponding phenomenon. Second, the explanations supported by a Galilean model are realistic and comprehensive: each of its components corresponds to a specific magnitude of the intended phenomenon, and so any explanation based on an analysis of such a model cannot introduce any arbitrary or fictional character. For these reasons, anyone interested in improving the results of Cognitive Science, on both the descriptive and the explanatory score, should seriously consider the prospect of constructing Galilean models of cognitive phenomena.^e
^d Quite obviously, if k = the value of the acceleration due to gravity, the model (DSe, IHe) turns out to be empirically correct within limits of precision sufficient for many practical purposes.
^e For a further defense of this tenet, see Giunti 1995, 1997, 2005 [8,10,13]. Eliano Pessa and Maria Pietronilla Penna pointed out to me (personal communication) that they pursued the idea of employing models of a Galilean kind in some of their work on iconic memory (Penna and Pessa 1992, 1998 [17,18]; Penna and Ciaralli 1996 [16]).
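To make the free-fall example concrete, the following is a minimal sketch in Python; the value of k, the tolerance, and the sample "measurements" are illustrative assumptions of mine, not data from the text. It implements the v-advances h^v of DSe, checks the composition law of note c, and performs the kind of observed-versus-theoretical comparison that empirical correctness requires.

import math

# Minimal sketch of the Galilean model (DSe, IHe) of free fall.
# k is set to the acceleration due to gravity, as in note d; the
# "observed" series below is invented purely for illustration.
K = 9.81  # m/s^2

def advance(v, x, y):
    """v-advance h^v of DSe: h^v(x, y) = (k*v + x, k*v**2/2 + x*v + y),
    where x is vertical velocity and y is vertical position."""
    return K * v + x, K * v ** 2 / 2 + x * v + y

# Composition law of a dynamical system (note c): h^(t+v) = h^v . h^t.
a = advance(0.5, *advance(0.5, 0.0, 0.0))
b = advance(1.0, 0.0, 0.0)
assert all(math.isclose(p, q) for p, q in zip(a, b))

# Hypothetical measured time series (t, velocity, position) for a body
# released at rest; each measurement is compared with the model's values.
observed = [(0.1, 0.99, 0.05), (0.2, 1.95, 0.19), (0.3, 2.96, 0.44)]
for t, v_obs, y_obs in observed:
    v_th, y_th = advance(t, 0.0, 0.0)
    print(f"t={t:.1f}s  velocity {v_obs:.2f} vs {v_th:.2f}  position {y_obs:.2f} vs {y_th:.2f}")
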
This, however, surely is not an easy task. The main problem we face is that of focusing on a particular class of cognitive phenomena for which the construction of Galilean models is clearly feasible and well suited. This, in turn, requires the availability of a sufficiently detailed description of this class, such that (i) a few cognitive magnitudes relevant to phenomena of this type are clearly identified, and (ii) a suitably detailed sketch of a specific kind of Galilean model, appropriate for this particular class of phenomena, is also given. My contention is that a quite natural interpretation of Turing thesis (1936, sec. 9.I) indeed provides us with a detailed description of a particular class of cognitive phenomena, namely, those of human computation, and that such a description satisfies both requirements (i) and (ii) above. I will explicitly address this issue in sec. 4. In the next section, however, we need to take a preliminary look at bidimensional Turing machines.
3. Bidimensional Turing machines
An ordinary Turing machine can be thought of as a device formed by a head, just one slot of internal memory, and a linear tape (external memory) infinite in both directions. The internal memory slot always contains exactly one symbol (internal state) taken from a finite alphabet Q = (q1, …, qm) with at least one element. The tape is divided into adjacent squares, where each square contains exactly one symbol taken from a second finite alphabet A = (a0, a1, …, an) with at least two elements. The first symbol a0 (the blank) is a special one, and is usually indicated by b. The blank is special in that a square containing it should in fact be thought of as being empty. At any discrete instant of time (step), only a finite number of tape squares contain non-blank symbols or, in other words, the tape is always completely empty except for a finite portion.
The head is always located on exactly one square of the tape (the scanned square), and it is capable of performing five basic operations: read the symbol on the scanned square, write a new symbol on that square, move one square to the right (indicated by R), move one square to the left (L), do not move (H). At each time step, the head performs a sequence of exactly three operations: the first is a read operation, the second a write operation, and the third a moving one. The result of the read operation is the symbol aj contained in the scanned square. Exactly which writing and moving operations the head performs next is determined by this symbol, by the current internal state qi, and by the set of instructions of the machine. In fact, for each possible internal-state/scanned-symbol pair qi aj, the machine has exactly one instruction (quintuple) of the form qi aj : ak M qr, where ak indicates the symbol to be written, M (either R, L, or H) is the moving operation to be performed, and qr is the internal state the machine goes into at the next time step.
It is thus clear that any ordinary Turing machine is a dynamical system whose state space has three components, for the future behavior of an arbitrary machine is completely determined by its set of quintuples and by the current values of the following three state variables: the head position (expressed by an integer coordinate that corresponds to the scanned square), the complete tape content (expressed by a function that assigns a symbol of the alphabet A to each tape coordinate), and the machine's internal state.
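As an illustration of this dynamical description, the following minimal sketch (in Python; the function names and the toy machine are mine, not part of the paper) represents an ordinary Turing machine by its set of quintuples and computes one t-advance of the three state variables.

BLANK = "b"

def step(quintuples, head, tape, state):
    """One step of an ordinary Turing machine: read the scanned square,
    then write, move, and change internal state as the quintuple
    q_i a_j : a_k M q_r prescribes. The tape is a dict from integer
    coordinates to non-blank symbols (an absent coordinate is a blank
    square)."""
    scanned = tape.get(head, BLANK)
    written, move, next_state = quintuples[(state, scanned)]
    tape = dict(tape)
    if written == BLANK:
        tape.pop(head, None)  # a square containing the blank is empty
    else:
        tape[head] = written
    return head + {"R": 1, "L": -1, "H": 0}[move], tape, next_state

# Toy machine: replaces a scanned 1 by 0, then idles on an identical
# quintuple (same states, same symbol, movement H).
quintuples = {("q1", "1"): ("0", "H", "q2"),
              ("q2", "0"): ("0", "H", "q2")}
print(step(quintuples, 0, {0: "1"}, "q1"))  # -> (0, {0: '0'}, 'q2')
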
The simplest form of a bidimensional Turing machine is obtained by just replacing the linear tape with a checkerboard infinite both in the right/left direction and in the upward/downward one. Accordingly, the head can now also move one square up (U) and one square down (D). As before, the dynamic behavior of the machine is completely determined by its set of quintuples. The only difference is that now a generic quintuple has the form qi aj : ak M qr, where M stands for either R, L, U, D, or H. Bidimensional Turing machines of this simple kind are known for the complex geometric patterns they can generate on the checkerboard (Dewdney 1989) [5].
For our present purposes, we need bidimensional Turing machines of a somewhat more general kind. The main difference with respect to the simple ones concerns the more sophisticated syntax we allow for the machine quintuples. In fact, in the general case, an instruction is not just a simple quintuple, but a quintuple schema that typically represents a possibly infinite set of quintuples of a specific form. This is obtained by allowing four different kinds of basic symbols, namely: (i) constants a0, a1, …, an, …; (ii) variables x0, x1, …, xn, …; (iii) function terms f1, f2, …, fn, …; (iv) simple internal states q1, q2, …, qn, … In addition, we also allow internal state schemata, which are obtained by concatenating a simple internal state with strings of other symbols. The exact definitions of both an internal state schema and a quintuple schema will be given below.
Constants are the only symbols that can in fact be written on the checkerboard squares. As usual, the first constant a0 is the blank, and it is also indicated by b. Each variable may stand for a specified set of constants (metalinguistic variable), or for a set of numbers or other specified entities. Whenever a new variable is introduced, its variation range (i.e. the set of all its possible values) must be declared explicitly; it is permitted, but not mandatory, to also declare the initial value of the variable. The value of a constant may be explicitly declared as well, and in this case the constant is said to be bound (or interpreted). A constant that is not bound is called free.
Besides constants and variables, we also allow the function terms f1, f2, …, fn, … as basic symbols of our bidimensional Turing machines. Function terms stand for functions, and whenever a function term is introduced, the corresponding function^f must be declared explicitly. Functions should be thought of as auxiliary operations that can be performed as needed during the execution of a routine.
^f Any such function must be computable in the intuitive sense. A function f is computable in the intuitive sense just in case there is a mechanical procedure P that computes f, where P computes f iff, for any argument x, if P is started with initial data that correspond to x, it terminates in a finite number of steps, and its final data correspond to f(x). For an explicit characterization of a mechanical procedure, see sec. 4, par. 2.
Function terms, together with variables and constants, allow the formation of simple functional expressions. Complex functional expressions, obtained by (repeatedly) nesting function terms, are also allowed. For example, let us suppose that, for a given machine, the following basic symbols have been introduced: two numeric variables m and n and, as constants, the arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with their usual meanings; in addition, we also introduce two function symbols, + and ×, which respectively stand for the addition and multiplication operations on the non-negative integers. Then (m+n), (2+4), (n+7), (m×m), (m×2), (5×3) are all simple functional expressions, while ((m×m)+3) and (((m×m)+3)+n) are complex ones.
It is convenient to distinguish between functional expressions in which variables occur (open functional expressions) and those in which no variables occur (closed functional expressions). Closed functional expressions can be thought of as additional constants, open ones as additional function terms. Functional expressions in which no free constants occur are called bound. A functional expression that is not bound is said to be free. A closed and bound functional expression always stands for some object, and it can thus be thought of as an additional interpreted constant. An open and bound functional expression always stands for a function, and it can thus be thought of as an additional function term together with the relative function declaration. Free functional expressions can instead be thought of as either free constants or function terms for which the relative function declaration has not been given.
Internal state schemata are obtained by concatenating a first string of variables, a simple internal state, and a second string of constants, variables, or bound functional expressions. Thus, the general form of an internal state schema is vqi s, where v is an arbitrary string of variables, s is an arbitrary string of constants, variables, or bound functional expressions, and qi is any simple internal state; either v or s may be empty. Note that internal state schemata include simple internal states as particular cases, for a simple internal state is an internal state schema where both strings v and s are empty. By definition, an internal state is any sequence of objects singled out by an internal state schema.
The simple internal state qi occurring in an internal state schema vqi s is best interpreted as a disposition of the machine to do a certain action that takes as objects the values of the constants, variables, or functional expressions in the string s, and may also depend on the values of the variables in the string v. These latter variables are in fact parameters, that is, global variables of a certain
routine. More precisely, a parameter is a variable whose value does not change during the whole routine execution, and which is available at each step of the execution itself.
The instructions of a bidimensional Turing machine are quintuple schemata, where a quintuple schema is any sequence of five strings of symbols that satisfies the following conditions: (i) the first and the fifth strings are internal state schemata; (ii) the second and the third strings are either constants, metalinguistic variables, closed and bound functional expressions that stand for constants, or open and bound functional expressions that stand for functions whose codomain is a set of constants; (iii) the fourth string is either H or any finite string of the four basic movement signs R, L, U, D. Such a string stands for the complex movement obtained by sequentially combining (from left to right) the basic movements expressed by each string component. A combination of n (0 ≤ n) movements of the same kind is respectively expressed by R^n, L^n, D^n, U^n. If n = 0, then, by definition, R^0 = L^0 = D^0 = U^0 = H and, if n = 1, then R^1 = R, L^1 = L, D^1 = D, U^1 = U. More generally, we may also allow functional movement signs of the forms R^e, L^e, D^e, U^e, where e is either a closed and bound functional expression that stands for a non-negative integer, or an open and bound functional expression that stands for a function whose codomain is a set of non-negative integers; in the general case, the fourth string of a quintuple schema is then either H or any finite string of movement signs (basic or functional). Finally, by definition, a quintuple is any sequence of five objects singled out by some quintuple schema.
The whole behavior of a bidimensional Turing machine is determined by its machine table or set of instructions. A machine table is any finite and consistent set of quintuple schemata, where a set of quintuple schemata is consistent iff the set of quintuples singled out by the quintuple schemata does not include any two different quintuples that begin with the same internal-state/scanned-symbol pair. A machine table is also intended to be complete, in the following sense: if the quintuple schemata of the machine table do not single out any quintuple that begins with some possible internal-state/scanned-symbol pair, then it is intended that the machine table also includes an identical quintuple that begins with any such pair. (A quintuple is identical iff its initial and final internal states are identical, the scanned symbol is identical to the written one, and the movement is H.)
Like ordinary Turing machines, bidimensional ones are dynamical systems with three state components. In this case the three instantaneous state variables are: the head position (expressed by a pair of integer coordinates that
corresponds to the scanned square), the complete checkerboard content (expressed by a function that assigns a constant to each checkerboard coordinate pair), and the machine's internal state.
EXAMPLE 1
I give below the table of a machine that computes the function σ whose domain is the set of arabic numerals {0, 1, 2, 3, 4, 5, 6, 7, 8}, whose codomain is the set of arabic numerals {1, 2, 3, 4, 5, 6, 7, 8, 9}, and whose values are specified as follows: σ(0) = 1, σ(1) = 2, σ(2) = 3, σ(3) = 4, σ(4) = 5, σ(5) = 6, σ(6) = 7, σ(7) = 8, σ(8) = 9. From now on, I will refer to σ as the numeral successor function. The constants of this machine are the blank b and the ten arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. In addition, this machine has just one metalinguistic variable, d, whose range is the set of numerals {1, 2, 3, 4, 5, 6, 7, 8, 9}. No function terms are allowed for this machine. Finally, its simple internal states are just two. The first one, S, can be thought of as the disposition to produce the next numeral, while the second simple internal state, E, should be thought of as the disposition to end the computation. The input numeral is written on a square of an otherwise blank checkerboard. Initially, the head scans the input numeral, and the internal state is S. The machine table is as follows.
Table 1
(1)   S 0 : 1 H E
      S 1 : 2 H E
      S 2 : 3 H E
      S 3 : 4 H E
      S 4 : 5 H E
      S 5 : 6 H E
      S 6 : 7 H E
      S 7 : 8 H E
      S 8 : 9 H E     Writes the successor of any numeral between 0 and 8, remains on the same square, and calls the ending routine (2);
(2)   E d : d H E     STOP.
EXAMPLE 2
As a second example, let us consider the table of a machine that computes the successor function s in decimal notation. This machine has the same eleven constants as the previous one. The machine has two metalinguistic variables: d, whose range is the set of numerals {1, 2, 3, 4, 5, 6, 7, 8, 9}, and c, whose range is the set of numerals {0, 1, 2, 3, 4, 5, 6, 7, 8}. This machine has just one function term, σ, which stands for the numeral successor function (see example 1). The simple internal states are two. The first one, S, can be thought of
as the disposition to add one to an arbitrary number, while the second simple state, E, can be interpreted as the disposition to end the computation. The input number is written in decimal notation on a row of an otherwise empty checkerboard. Initially, the head scans the rightmost digit of the input number, and the machine is in internal state S. The machine table is as follows.
Table 2
(1a)    S c : σ(c) H E      TEST: finished; writes the successor of any numeral between 0 and 8, remains on the same square, and calls the ending routine (1ac1);
(1b)    S 9 : 0 L S         TEST: not finished; writes 0 in place of 9, goes one square to the left, and calls test (1);
(1c)    S b : 1 H E         TEST: finished; writes the carried digit 1, stays on the same square, and calls the ending routine (1ac1);
(1ac1)  E d : d H E         STOP.
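A quick way to see how the schemata of Table 2 work is to expand the metalinguistic variables c and d into concrete quintuples and run the resulting machine. The sketch below (in Python; the expansion code and all names are mine) does exactly that, writing sigma for the numeral successor function σ of example 1.

BLANK = "b"
DIGITS = "0123456789"

def sigma(c):
    """Numeral successor function of example 1: '0'->'1', ..., '8'->'9'."""
    return DIGITS[DIGITS.index(c) + 1]

table = {}
for c in "012345678":                  # schema (1a): S c : sigma(c) H E
    table[("S", c)] = (sigma(c), "H", "E")
table[("S", "9")] = ("0", "L", "S")    # quintuple (1b): S 9 : 0 L S
table[("S", BLANK)] = ("1", "H", "E")  # quintuple (1c): S b : 1 H E
for d in "123456789":                  # schema (1ac1): E d : d H E
    table[("E", d)] = (d, "H", "E")

def successor(n):
    """Run the Table 2 machine on the decimal numeral of n."""
    row = {i: ch for i, ch in enumerate(str(n))}
    head, state = max(row), "S"        # head starts on the rightmost digit
    while True:
        write, move, next_state = table[(state, row.get(head, BLANK))]
        row[head] = write
        if next_state == "E":          # the ending routine idles in E
            break
        head += {"L": -1, "R": 1, "H": 0}[move]
        state = next_state
    return int("".join(row[i] for i in sorted(row)))

print(successor(399))   # -> 400
print(successor(999))   # -> 1000
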
EXAMPLE 3
Below is a machine that adds to an arbitrary non-negative integer n an integer d such that 1 ≤ d ≤ 9. This machine has the same eleven constants as the ones of examples 1 and 2. Its variables are three: n, d, and c. The range of n is the whole set of the non-negative integers, and its initial value is the input number n; d is an integer variable such that 1 ≤ d ≤ 9, and its initial value is the other input number d. The variable c is metalinguistic: its range is the set of all arabic numerals that denote numbers from 0 up to, but not including, d, and its initial value is the numeral 0. This machine has three function terms: s, σ, and ν. The first one stands for the successor function, σ stands for the numeral successor function, and ν stands for the function that, to any number d (1 ≤ d ≤ 9), associates the corresponding arabic numeral. There is just one simple internal state, S, whose intuitive meaning is the disposition to add one to an arbitrary number. The initial value of the counter c (i.e. the numeral 0) is written on a square of an otherwise blank checkerboard, and the computation starts with the head on that square. The machine table is below. Note that the output (n+d) is given by
the final value of the variable n, and that the variable d is in fact a parameter, for its value is constant during the whole computation and available at each step.
Table 3
(1a)  dSn c    : σ(c) H dSs(n)   TEST: not finished; increases both the counter c and the output variable n, and then calls test (1);
(1b)  dSn ν(d) : ν(d) H dSn      TEST: finished; output is n, and thus STOPS.
It is surprising to realize that this simple machine has a quite intriguing psychological interpretation. Let us notice first that the machine needs only one external memory location, for it always stays on the same square of the checkerboard. Let us then consider the cognitive phenomenon Hf of a human being who computes the sum (n+d) with the help of her hands. Initially, she memorizes both the number n and the one-digit number d, while both her hands are closed. She then opens one finger while she mentally adds one to the number n, and keeps repeating this pair of operations until the number of open fingers is equal to the number d. Now she stops, having in mind the result (n+d).
Let us then consider the following quite natural interpretation of the bidimensional Turing machine above on the phenomenon Hf. In the first place, the machine's discrete time steps correspond to the steps of the human being's computation (that is to say, they correspond to the discrete time of the real phenomenon Hf). Second, each internal state of the Turing machine corresponds to a possible content of the human being's working memory during the computation. Third, the position of the head on the checkerboard corresponds to the spatial location attended to by the human being while computing (that is to say, the location of her hands). And finally, the possible checkerboard contents (i.e. the numerals 0, 1, 2, 3, 4, 5, 6, 7, 8) correspond to the possible configurations of her hands (closed, with one finger open, with two fingers open, etc.).
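Under this interpretation, the Table 3 machine can be traced directly. The sketch below (in Python; the trace format and the function names are mine) runs the computation of 7 + 3 and prints, at each step, the content of the single square (the hand configuration) and the value of n held in working memory.

DIGITS = "0123456789"

def sigma(c):
    """Numeral successor function (example 1)."""
    return DIGITS[DIGITS.index(c) + 1]

def add_with_fingers(n, d):
    """Table 3 for 1 <= d <= 9; d is a parameter of the routine and
    DIGITS[d] plays the role of nu(d), the numeral denoting d."""
    square = "0"                       # counter c on the single square
    trace = [(square, n)]
    while square != DIGITS[d]:         # schema (1b): scanned nu(d) means STOP
        square, n = sigma(square), n + 1   # schema (1a): increment both
        trace.append((square, n))
    return n, trace

result, trace = add_with_fingers(7, 3)
for fingers, memory in trace:
    print(f"open fingers: {fingers}   working memory holds n = {memory}")
print("result:", result)               # -> 10
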
EXAMPLE 4
This machine writes from left to right, in decimal notation, an arbitrary non-negative integer n and then stops. The machine has the same eleven constants as the previous examples (see example 1). The variables are two: n and s. The range of n is the whole set of the non-negative integers, and its initial value is the number n to be written. The range of the second variable s is the set of all strings of decimal digits, that is, all strings made up of the ten arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. This machine has two function terms: h and t. The term h (short for head) stands for the function that takes as argument either an arbitrary non-negative integer n or a string of decimal digits c and returns as value, respectively, the leftmost digit of the decimal representation of n or the leftmost digit of c. The term t (short for tail) stands for the function that takes as argument either an arbitrary non-negative integer n or a string of decimal digits c and returns as value, respectively, the string obtained by deleting the leftmost digit of the decimal representation of n or the string obtained by deleting the leftmost digit of c; if either the decimal representation of n or c has just one digit, then, respectively, t(n) = 0 or t(c) = 0. There are two simple internal states, E and ??. E is the disposition to write the leftmost digit of the input number, while ?? is the disposition to check whether the writing of that number has been completed. The computation begins in simple internal state E, on an arbitrary square of an empty checkerboard. As said, the initial value of n is the number n to be written. The machine table is below.
Table 4
(1)   En  b : h(n) R ??t(n)   Writes the leftmost digit of number n, goes to the right, keeps the tail of n, and calls test (2);
(2a)  ??s b : h(s) R ??t(s)   TEST2: not finished; writes the leftmost digit of the kept tail, goes to the right, keeps the new tail, and calls test (2);
(2b)  ??0 b : b H ??0         TEST2: finished; STOP.
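The function terms h and t are the only machinery Table 4 needs. A minimal rendering follows (in Python; the sentinel convention that a one-digit argument has tail 0 follows the text, the rest is mine).

def h(x):
    """Leftmost digit of a non-negative integer or digit string."""
    return str(x)[0]

def t(x):
    """What remains after deleting the leftmost digit; 0 if only one
    digit remains, as stipulated in the text."""
    s = str(x)[1:]
    return s if s else 0

def write_left_to_right(n):
    """Table 4: write h of the kept value, keep its tail, and repeat
    until the kept tail is 0 (state ??0 means: STOP)."""
    out, s = [h(n)], t(n)
    while s != 0:
        out.append(h(s))
        s = t(s)
    return "".join(out)

print(write_left_to_right(2019))  # -> 2019
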
EXAMPLE 5
I give below (see table 5) a bidimensional Turing machine that computes the sum of an arbitrary number of addends in decimal notation, by implementing a procedure very similar to the well-known column-based rule. Each addend is written in a different row of the checkerboard, with its rightmost digit in a specified column (that is, the addends are justified to the right). Immediately above the uppermost addend and immediately below the lowermost one there are two horizontal lines, as long as the longest addend; the two
horizontal lines are justified to the right as well. The result will be written below the lower line, justified to the right. This machine has twelve constants; they are the eleven constants of the previous machines (see example 1) and, in addition, the horizontal line sign –. The variables are three: n, d, and s. The range of the variable n is the whole set of the non-negative integers, and its initial value is zero; d is a metalinguistic variable whose range is the set A of the nine arabic numerals 1, 2, 3, 4, 5, 6, 7, 8, 9; s is a variable whose range is the set of all strings of decimal digits, that is to say, all strings made up of the ten arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. This machine has seven function terms: ν, +, u, r, l, h, t. The term ν stands for the function that, to each numeral in the set A above, associates the number denoted by that numeral. The function term + stands for the addition of an arbitrary non-negative integer n and an integer d such that 1 ≤ d ≤ 9 (see example 3). The term u stands for the function that takes as argument an arbitrary non-negative integer n and returns as value the units' digit of the decimal representation of n. The term r stands for the function that takes as argument an arbitrary non-negative integer n, and returns as value the number obtained by deleting the units' digit of the decimal representation of n; if the decimal representation of n has only one digit, r(n) = 0. The term l stands for the function that takes as argument an arbitrary non-negative integer n, and returns as value the number of digits of its decimal representation. The terms h and t stand for the functions (respectively, head and tail) specified in example 4. There are six simple internal states: S, E, W, C, ?, ??. S can be thought of as the disposition to sum all the numbers in a given column; E is the disposition to write the leftmost digit of the sum of the leftmost column; W is the disposition to write the units' digit of the sum of any other column; C is the disposition to perform a carrying operation; ? is the disposition to check whether or not the current column sum is the final one; ?? is the disposition to check whether the writing of the final result has been completed. Note that the machine of example 4 is in fact part of the final subroutine (2a1)-(2a2) of table 5. The computation begins in simple internal state S, with the head scanning the rightmost digit of the uppermost addend. As said, the initial value of n is zero.
Table 5
(1)     Sn d : d D S(n+ν(d))
        Sn 0 : 0 D Sn
        Sn b : b D Sn
        Sn – : – L ?n               START: adds n and all the numbers in a column; at the end of the column, goes one square to the left, keeps the sum result, and calls test (2);
(2a)    ?n b : b R D^2 L^l(n) En    TEST2: finished; goes to the appropriate square to start writing the final result, and calls the ending routine (2a1);
(2a1)   En b : h(n) R ??t(n)        writes the leftmost digit of the result of (1), goes to the right, keeps the tail of the result, and calls test (2a2);
(2a2a)  ??s b : h(s) R ??t(s)       TEST2a2: not finished; writes the leftmost digit of the kept tail, goes to the right, keeps the new tail, and calls test (2a2);
(2a2b)  ??0 b : b H ??0             TEST2a2: finished; STOP.
(2a2c)  ??0 d : d H ??0             TEST2a2: finished; STOP.
(2b)    ?n – : – R D Wn             TEST2: not finished; calls routine (2b1);
(2b1)   Wn b : u(n) L U^2 Cr(n)     writes the units' digit of the result of (1), goes to the bottom of the column to its left, keeps the carrying number, and calls routine (2b2);
(2b2)   Cn d : d U Cn
        Cn 0 : 0 U Cn
        Cn b : b U Cn
        Cn – : – D Sn               carries the number kept by routine (2b1) up to the uppermost column square, and calls routine (1);
4. Turing thesis as the core hypothesis of an empirical theory of human computation
Section 9 of Turing's well-known 1936 paper begins with an explicit question as to the kind of operations that a human being typically performs when engaged in computing:
The real question at issue is "What are the possible processes which can be carried out in computing a number?" (Turing 1936 [20, p. 249]; my emphasis)
In the terminology of sec. 2, we can say that Turing is asking for a suitably detailed description of a specific type of cognitive phenomenon, namely, the one of human computation. In general, by a phenomenon of human computation we mean any activity of a human being that consists in executing a purely mechanical or effective procedure, where a mechanical procedure is a finite set of clear-cut formal instructions for symbol manipulation; given a finite series of data, a human being must be able to carry out such instructions in a definite sequence of steps, with the exclusive aid of paper and pencil (or equivalent external devices), and without resorting to any special insight or ingenuity.^g
A few lines below, in sec. 9.I, Turing provides a comprehensive answer to the above question. In the first place, he proposes to focus just on computations carried out on a tape divided into squares (one-dimensional paper), with the motivation that "… it will be agreed that the two-dimensional character of paper is no essential of computation." After introducing this simplifying hypothesis, Turing points out that a computation can always be thought of as involving a finite number of symbols that can be printed on the paper, and a finite number of different "states of mind" of the human being that carries out the computation (the computer). Furthermore:
The behavior of the computer at any moment is determined by the symbols which he is observing and his "state of mind" at that moment. We may suppose that there is a bound B to the number of symbols or squares which
^g This, except for one point, is Copeland's characterization of a mechanical procedure (2002) [3]. The only difference is that Copeland also requires that a mechanical procedure "… will, if carried out without error, produce the desired result in a finite number of steps". This requirement is not essential for a correct general characterization of a mechanical procedure and, in fact, it makes it too restrictive, for this condition would immediately rule out any program that does not terminate for some input. On this point, also see Corradini, Leonesi, Mancini and Toffalori (2005, sec. 1.3) [4].
the computer can observe at one moment. If he wishes to observe more, he must use successive observations. (Turing 1936 [20, p. 250])
Turing then invites us to
… imagine the operations performed by the computer to be split up into "simple operations" which are so elementary that it is not easy to imagine them further divided. Every such operation consists of some change of the physical system consisting of the computer and his tape. We know the state of the system if we know the sequence of symbols on the tape, which of these are observed by the computer (possibly with a special order), and the state of mind of the computer. (Turing 1936 [20, p. 250]; my emphasis)
This is a crucial passage, for Turing is in fact suggesting that any phenomenon of human computation is completely described by exactly three different magnitudes (the state variables of the phenomenon), which are (i) the complete content of the (one-dimensional) paper, (ii) the exact location of the symbols observed by the human computer, and (iii) his state of mind. Having thus singled out the state variables, Turing goes on to describe the kind of change that these variables may undergo and, on the basis of this analysis, he finally concludes that we can always construct a machine that does essentially the same operations as the ones carried out by the human being:
We may now construct a machine to do the work of this computer. To each state of mind of the computer corresponds an "m-configuration" of the machine. The machine scans B squares corresponding to the B squares observed by the computer. In any move the machine can change a symbol on a scanned square or can change any one of the scanned squares to another square distant not more than L squares from one of the other scanned squares. The move which is done, and the succeeding configuration, are determined by the scanned symbol and the m-configuration. (Turing 1936 [20, pp. 251-252])
This machine is very similar to an ordinary Turing machine, for it is formed by a tape, an internal memory, and a head that, at each time step, scans a fixed small number of adjacent squares. Turing thesis properly so-called^h can thus be formulated as follows:
[TT] Apart from inessential simplifications or idealizations, ordinary Turing machines are adequate models of the phenomena of human computation.
^h Church thesis (1936) [2] is the following assertion: [CT] a numeric function is computable in the intuitive sense iff it is recursive (or, equivalently, lambda-definable). The non-trivial part (the left-to-right implication) of [CT] is usually called the Church-Turing thesis (Copeland 2002 [3]; Gandy 1980 [6, p. 124]), for Turing thesis [TT], in conjunction with the equivalence between the two formal notions of recursive and Turing-computable function, logically entails [CT]. It is widely agreed that this inference, sometimes called the analogic argument, is the strongest available argument in favor of [CT]. For an extended discussion of the relationships between [TT] and [CT], see Giunti and Giuntini 2007 [15, sec. 5].
As we have just seen, [TT] should be interpreted in the strong sense that, apart from inessential details, the mechanical procedure executed by the human being involved in an arbitrary phenomenon of human computation is identical to the one executed by an appropriately chosen Turing machine.^i
^i The mechanical procedure executed by an ordinary Turing machine can be identified with its set of quintuples. The set of quintuples of a Turing machine obviously is a mechanical procedure in the intuitive sense specified in the second paragraph of this section.
As stated, [TT] is a philosophical hypothesis, which turns out to be exceedingly plausible in the light of Turing's speculative analysis of human computing. However, the very same analysis also provides us with the necessary elements for transforming [TT] into the core methodological hypothesis of an empirical theory of the phenomena of human computation.
First, we should notice that most of the idealizations or over-simplifications introduced by Turing in his analysis of human computing can be eliminated by taking into account not ordinary Turing machines, but bidimensional ones. In particular, this kind of machine allows us to deal directly with (a) human computations carried out on two-dimensional paper, (b) more complex movements than one square at a time, and (c) more complex operations as well, for the reading, writing, moving, and change-internal-state operations can now be carried out with the help of auxiliary functions.
Second, Turing's analysis tells us how to interpret the three state variables of any bidimensional Turing machine BT on the three corresponding magnitudes of any phenomenon of human computation C. For, according to the crucial passage quoted above, we should obviously interpret (i) the head position (expressed by a pair of integer coordinates that corresponds to the scanned square) as the spatial location (typically, the region of the two-dimensional paper) attended to by the human being at each computation step, (ii) the complete checkerboard content (expressed by a function that assigns a constant
to each checkerboard coordinate pair) as the complete content of the paper, and (iii) the machine's internal state as the content of the human being's working memory during the computation. Furthermore, Turing's analysis also makes clear that the machine's discrete time steps should be interpreted as the steps of the human being's computation (that is to say, as the discrete time of the real phenomenon C).
Since the bidimensional Turing machine BT is a dynamical system with three components (see sec. 3, par. 13-14), the interpretation SC proposed above is in fact an interpretation of BT on C in exactly the sense defined in sec. 2, par. 6, so that (BT, SC) is a model of C. In addition, if SC is empirical and (BT, SC) is empirically correct, then (BT, SC) is a Galilean model of C. From now on, I will refer to the proposed interpretation SC as the standard interpretation of a bidimensional Turing machine BT on a phenomenon C of human computation. The previous observations thus motivate the following methodological version of Turing thesis:
[MTT] For any specific phenomenon C of human computation, there is an appropriate bidimensional Turing machine BT such that (BT, SC) turns out to be a Galilean model of C.
Unlike the original formulation [TT], this version of Turing thesis has a definite empirical content. In the next section, I will sketch how [MTT] might function as the core hypothesis of a new empirical theory of human computation.
5. Developing the theory
The proposed methodological version [MTT] of Turing thesis claims that, for any phenomenon of human computation, there is an appropriate bidimensional Turing machine that, when interpreted in the standard way, turns out to be a Galilean model of that phenomenon. Let us then consider the set of all such models, and let us call this set the [MTT]-based theory of human computation.^j
^j More precisely, the [MTT]-based theory of human computation is the set of all pairs (BT, SC) such that, for some phenomenon C of human computation, BT is a bidimensional Turing machine, SC is the standard interpretation of BT on C, and (BT, SC) is a Galilean model of C.
If [MTT] is true, then such a set is obviously non-empty, and thus the [MTT]-based theory is consistent in a semantic sense; in addition, it is also complete, in the sense that it contains at least one Galilean model for each
phenomenon in its intended domain (that is to say, for each phenomenon of human computation). Conversely, if [MTT] is false, then the [MTT]-based theory is incomplete and, possibly, inconsistent as well. Thus, investigating the truth/falsity of [MTT] is tantamount to investigating the completeness/incompleteness of the [MTT]-based theory of human computation. At the moment, we do not know whether the [MTT]-based theory is complete, or even consistent. However, the very formulation of [MTT], together with Turing's general analysis of human computation, suggests a definite empirical method for systematically investigating the content of this theory, so that we will then be able to put forth our best informed guess as to both its consistency and completeness. An all too concise sketch of this method is below.
EMPIRICAL METHOD FOR INVESTIGATING THE [MTT]-BASED THEORY OF HUMAN COMPUTATION
1. Focus on a specific phenomenon C = (F, BF) of human computation, where each specific phenomenon is singled out by its functional description F (see sec. 2, par. 4), which is based on the particular mechanical procedure P executed by the human computer involved in C;
2. try to specify a bidimensional Turing machine BT that executes a mechanical procedure (i.e. a set of quintuple schemata) as similar as possible to the one executed by the human computer of the phenomenon C;
3. consider the standard interpretation SC of BT on C, and claim that (BT, SC) is a Galilean model of C;
4. then try to confirm this claim; that is to say, specify observation methods for each of the three state-magnitudes of the standard interpretation SC, as well as for its time-magnitude;
5. on the basis of the specified observation methods, gather empirical time series for each state-magnitude;
6. compare the observed time series with the corresponding theoretical ones determined by BT;
7. if the fit between observed and theoretical time series is sufficiently good, (a) take claim 3 to be confirmed; otherwise, (b) do not take claim 3 to be confirmed;
7a1. if (a), consider a new specific phenomenon of human computation and start again from 1;
7b1. if (b), carefully revise the previous steps in reverse order; more precisely, first revise 7, then 6, 5, and 4;
7b2. if none of the previous revisions is sufficient to get confirmation of claim 3, revise claim 3 itself, by revising either step 2 (modify BT) or step 1 (refine the functional description F that singles out C);
7b3. then go on to step 4 and repeat from there.
The method above will allow us to put forth an informed guess as to the consistency of the [MTT]-based theory in the following sense. As soon as, for some phenomenon C of human computation and some model (BT, SC), a confirmation of claim 3 is reached, our guess about the consistency of the theory will be positive. If, on the contrary, after a sufficiently long and careful implementation of the method, no confirmation of claim 3 is forthcoming for any model of any phenomenon, then we will have good grounds to conjecture that the theory is inconsistent. As for completeness, if the application of the method yields the prospect of an ever increasing series of confirmed models, with no especially hard open puzzle or anomaly, then our guess about the completeness of the theory will be positive. If, on the contrary, after a sufficiently long and careful application of the method to a particular phenomenon C of human computation, no confirmation of claim 3 is forthcoming for any model of C, then we will have good grounds to conjecture that the theory is incomplete.
A special feature of the method above is worth mentioning. Suppose that, for some phenomenon of human computation C and some model (BT, SC), claim 3 has been confirmed; also suppose that the bidimensional Turing machine BT includes some auxiliary function f. Then, since the function f is computable in the intuitive sense (see note f), there is a mechanical procedure Pf that computes f. As Pf is a mechanical procedure, it singles out a phenomenon of human computation Cf (see step 1 of the method above). We can thus go on applying the method to Cf, and eventually repeat, until we find a phenomenon C* and a model (BT*, SC*) whose bidimensional Turing machine BT* does not include auxiliary functions. The phenomenon C* can thus be thought of as a basic phenomenon of human computation, while all the ones we encountered along our way from C to C*, taken in reverse order, can be thought of as phenomena of increasing complexity. It is then natural to advance the following hypothesis as to the relationship between a phenomenon Cn and its more complex successor Cn+1 in the previous chain: the mechanical procedure Pn (constitutive of the simpler phenomenon Cn) has been previously rehearsed, and then internalized, by the human computer involved in Cn+1, so that Pn can now be internally and automatically executed as needed during the external and conscious execution of Pn+1.
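Computationally, steps 5-7 of the method reduce to generating the model's theoretical time series and measuring its fit against the observed one. The following sketch (in Python; the machine interface, the toy data, and the exact-match fit criterion are illustrative assumptions of mine) shows the shape of such a comparison for the finger-counting phenomenon of example 3.

def theoretical_series(initial_state, advance, n_steps):
    """Iterate the t-advance of the model BT from the initial state."""
    series, state = [initial_state], initial_state
    for _ in range(n_steps):
        state = advance(state)
        series.append(state)
    return series

def fit(observed, theoretical):
    """Fraction of observed steps that match the model exactly (a toy
    stand-in for whatever fit criterion step 7 adopts)."""
    pairs = list(zip(observed, theoretical))
    return sum(o == t for o, t in pairs) / len(pairs)

# Toy phenomenon: the finger-counting computation of 7 + 3 (example 3),
# observed as (attended location, square content, working memory) triples.
advance = lambda s: (s[0], str(int(s[1]) + 1), s[2] + 1)
observed = [((0, 0), "0", 7), ((0, 0), "1", 8), ((0, 0), "2", 9), ((0, 0), "3", 10)]
theory = theoretical_series(((0, 0), "0", 7), advance, 3)
print(fit(observed, theory))  # -> 1.0, i.e., claim 3 confirmed for this run
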
6. Concluding remarks − toward a Galilean approach to cognition The main thrust of this paper has been to show that the actual construction of Galilean models is clearly feasible, and well suited, for a special class of cognitive phenomena, namely, those of human computation. Whether this Galilean approach to cognition can be extended to other types of cognitive phenomena, or even to all of them, is a question that, at the moment, is difficult to settle in the positive sense. At the very least, however, there seems not to be any special reason for denying this possibility. Also, it is sufficiently clear how this rather abstract possibility of today might become a concrete one in the future. In order to extend the Galilean approach to new types of cognitive phenomena K1, K2, … , Kn, … we should first of all give an explicit characterization D1 of the phenomena of type K1. In particular, the level of precision of D1 should be comparable to the one of the informal definition of human computation given in sec. 5, par. 2. Second, on the basis of D1, a further analysis A1 of K1 should clearly point out (i) whether the time magnitude T of an arbitrary phenomenon of type K1 is discrete or continuous, and (ii) a finite number of magnitudes M1, M2, …, Mk which determine the state evolution of any phenomenon of type K1. The level of depth of A1 should be comparable to Turing’s speculative analysis of human computing (1936, sec. 9.I ). Third, we should then give an explicit and detailed characterization E1 of a special class of ideal systems, and a corresponding formal definition L1 of a special class of dynamical systems, in such a way that there is an obvious one-to-one correspondence between the E1-ideal systems and the L1-dynamical systems. In addition, an arbitrary L1-dynamical system should have as many state components as the number of magnitudes specified by A1 and, on the basis of both E1 and A1, each state component should be naturally identified with the set of possible values of exactly one of these magnitudes; in other words, E1 and A1 should provide a standard interpretation SH1 of any L1-dynamical system on any phenomenon H1 of type K1. Finally, for any phenomenon H1 = (F1, BF1) of type K1, it should be possible to naturally identify the abstract type of real system ASF1 (see sec. 2, par. 4) with an appropriately chosen E1-ideal system.k Fourth, we should state the basic methodological hypothesis for a new empirical theory of the cognitive phenomena of type K1: k
[MT 1] For any specific phenomenon H1 of type K1, there is an appropriate L1-dynamical system DL1 such that (DL1, SH1) turns out to be a Galilean model of H1.

k Note that, in the case of the phenomena of human computation, this identification is provided by the original formulation [TT] of the Turing thesis.
Fifth, we should then consider the [MT 1]-based theory of K1,l and start its empirical investigation by applying an empirical method analogous to the one described in sec. 5. If we are able to effectively carry out this detailed research program for K1, K2, …, Kn, …, then the Galilean approach to cognition will be implemented.

l In general, the [MT i]-based theory of Ki is the set of all pairs (DLi, SHi) such that, for some phenomenon Hi of type Ki, DLi is an Li-dynamical system, SHi is the standard interpretation of DLi on Hi, and (DLi, SHi) is a Galilean model of Hi.

References
1. V.I. Arnold, Ordinary Differential Equations (The MIT Press, Cambridge, MA, 1977).
2. A. Church, American Journal of Mathematics 58, 345-363 (1936).
3. B.J. Copeland, in The Stanford Encyclopedia of Philosophy, Ed. E.N. Zalta (2002).
4. F. Corradini, S. Leonesi, S. Mancini and C. Toffalori, Teoria della computabilità e della complessità (McGraw-Hill, Milano, 2005).
5. A.K. Dewdney, Scientific American 261(9), 180-183 (1989).
6. R. Gandy, in The Kleene Symposium, Ed. J. Barwise, H.J. Keisler and K. Kunen (North-Holland Publishing Company, Amsterdam, 1980), pp. 123-148.
7. M. Giunti, Computers, Dynamical Systems, Phenomena and the Mind, Ph.D. dissertation (Indiana University, Bloomington, IN, 1992; published by University Microfilms Inc., Ann Arbor, MI, UMI order number 9301444).
8. M. Giunti, in Mind as Motion: Explorations in the Dynamics of Cognition, Ed. R.F. Port and T.J. van Gelder (The MIT Press, Cambridge, MA, 1995), pp. 549-571.
9. M. Giunti, in Proceedings of the 18th Annual Conference of the Cognitive Science Society, Ed. G.W. Cottrell (L. Erlbaum Associates, Mahwah, NJ, 1996), pp. 71-75.
10. M. Giunti, Computation, Dynamics, and Cognition (Oxford University Press, New York, 1997).
11. M. Giunti, in Prospettive della logica e della filosofia della scienza: Atti del convegno triennale della Società Italiana di Logica e Filosofia delle Scienze, Roma, 3-5 gennaio 1996, Ed. V.M. Abrusci, C. Cellucci, R. Cordeschi and V. Fano (Edizioni ETS, Pisa, 1998), pp. 255-267.
12. M. Giunti, in Storia e filosofia della scienza: Un possibile scenario italiano. Atti del convegno Storia e filosofia delle scienze: lo stato delle ricerche italiane di punta, Padova, 28-30 maggio 1997, Ed. E. Bellone and G. Boniolo (Milano, 1998), pp. 89-98.
13. M. Giunti, in Atti del XIX congresso nazionale dell'Associazione Italiana di Psicologia, sez. di psicologia sperimentale, Cagliari, 18-20 settembre 2005 (AIP, Cagliari, 2005).
14. M. Giunti, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa and M. Abram (Springer, Berlin, 2006), pp. 683-694.
15. M. Giunti and R. Giuntini, in Title yet to be announced, Ed. S. Mancini (Mimesis Edizioni, Milano, 2007).
16. M.P. Penna and P. Ciaralli, in Third Systems Science European Congress, Ed. E. Pessa, M.P. Penna and A. Montesanto (Kappa, Roma, 1996), pp. 533-536.
17. M.P. Penna and E. Pessa, Comunicazioni Scientifiche di Psicologia Generale 8, 151-178 (1992).
18. M.P. Penna and E. Pessa, in Proceedings of the Second European Conference on Cognitive Modelling (ECCM-98), Ed. F.E. Ritter and R.M. Young (Nottingham University Press, Nottingham, UK, 1998), pp. 120-126.
19. W. Szlenk, An Introduction to the Theory of Smooth Dynamical Systems (John Wiley and Sons, Chichester, England, 1984).
20. A.M. Turing, Proceedings of the London Mathematical Society, Series 2, 42, 230-265 (1936).
21. A.J. Wells, Cognitive Science 22(3), 269-294 (1998).
22. A.J. Wells, Rethinking Cognitive Computation: Turing and the Science of the Mind (Palgrave Macmillan, New York, 2006).
A NEURAL MODEL OF FACE RECOGNITION: A COMPREHENSIVE APPROACH

VERA STARA(1), ANNA MONTESANTO(1), PAOLO PULITI(1), GUIDO TASCINI(1), CRISTINA SECHI(2)
(1) Università Politecnica delle Marche, Facoltà di Ingegneria, DEIT
[email protected], [email protected], [email protected], [email protected]
(2) Università degli Studi di Cagliari, Facoltà di Scienze della Formazione
[email protected]

Visual recognition of faces is an essential human behavior: our performance in everyday life is close to optimal, and just such a performance makes us able to establish the continuity of the actors in our social life and to quickly identify and categorize people. This remarkable ability justifies the general interest in face recognition among researchers belonging to different fields, and especially among designers of biometric identification systems able to recognize the features of a person's face in a background. Due to the interdisciplinary nature of this topic, in this contribution we deal with face recognition through a comprehensive approach, with the purpose of reproducing some features of human performance relevant to face recognition, as evidenced by studies in psychophysics and neuroscience. This approach views face recognition as an emergent phenomenon resulting from the nonlinear interaction of a number of different features. For this reason our model of face recognition has been based on a computational system implemented through an artificial neural network. This synergy between neuroscience and engineering efforts allowed us to implement a model that has biological plausibility, performs the same tasks as human subjects, and gives a possible account of human face perception and recognition. In this regard, the paper reports on an experimental study of the performance of a SOM-based neural network in a face recognition task, with reference both to the ability to learn to discriminate different faces, and to the ability to recognize a face already encountered in the training phase when presented in a pose or with an expression differing from the one present in the training context.

Keywords: face recognition, biometrics, neural networks.
1. Introduction

Visual recognition of faces is an essential human behavior: human activity relies on the classification of faces as distinct from other objects, and the ability to recognize facial stimuli has significant social implications. The importance of this recognition process can be easily understood when we take into account the fact that, from birth, infants show a special interest in faces (Johnson et al., 1991 [15],
Johnson and Morton, 1991 [16]) and are able to discriminate their mother's face from a stranger's face. Experimental studies performed with neurologically normal subjects have suggested that faces are perceived as a special class of stimuli, distinct from other patterned objects. Face recognition differs from object recognition in that the former involves representing a face as a single, complex whole, whereas the latter typically involves decomposition into constituent elements (Farah et al., 1999 [8]). Consistent with this point of view, recognition performance in neuropsychologically normal adults is worse when faces are presented upside down than when objects are presented upside down (Valentine, 1988 [28]; Farah et al., 1995 [9]). Research on patients and neuroimaging studies have reported increased activation in the fusiform gyrus in concomitance with the presentation of faces, although less activation is observed if the faces are presented upside down (Kanwisher et al., 1997 [17]). The same group has reported greater activation in this region to faces than to human or animal heads (Kanwisher et al., 1999 [18]). Also using fMRI, Gauthier et al. (1999) [12] reported that, as subjects acquired expertise in recognizing artificial stimuli, the middle fusiform gyrus in the right hemisphere was recruited and showed a pattern of activation indistinguishable from that elicited by faces. Similarly, under passive viewing, activation in this area was greater in a single subject with expertise in viewing "greeble" faces than in individuals lacking such expertise (Gauthier et al., 1997, 1998, 2004 [10,13,11]). Overall, these results suggest that the fusiform 'face area' becomes specialized with experience. In terms of facial emotion, Whalen et al. (1998) [30] reported increased activation in the amygdala to fearful faces, but decreased activation to happy faces. Morris et al. (1996) [20] observed that neuronal activity in the left amygdala was significantly greater in response to fearful as opposed to happy faces. Collectively, the neuroimaging studies indicate that regions in and around the fusiform gyrus play a role in face recognition, whereas the amygdala plays a particularly important role in the recognition of facial expressions. Within the amygdala, some nuclei have been found to be responsive to individual faces, whereas others respond to individual expressions (Aggleton et al., 1980 [2]; Nahm et al., 1991 [21]). Although an unambiguous picture is emerging that faces may be accorded special status by the brain, it remains unclear upon what basis face specialization develops. From an evolutionary perspective, recognizing faces would be adaptive and, thus, selected for through evolution.
Both human and non-human primates do in fact use their faces to produce a range of social signals; more importantly, non-human primates may depend more on this medium for communication than adult humans do, given the absence of oral language. Thus, it is not surprising that monkeys are adroit in both face and emotion recognition (Boysen and Bernston, 1989 [4]; Phelps and Roberts, 1994 [25]; Parr et al., 1998 [23]; Pascalis and Bachevalier, 1998 [24]).
The body of knowledge summarized above is endowed not only with scientific value but also with practical relevance. Namely, there is a large number of commercial, security, and forensic applications requiring the use of face recognition technologies: crowd surveillance, access control, "mugshot" identification, face reconstruction, design of human-computer interfaces, multimedia communication and content-based image database management. So far, the performance of human subjects in face recognition tasks largely outperforms that of Automatic Face Recognition (AFR) systems. This entails the need to emulate the ability of the human visual system by relying on the knowledge previously quoted. In this regard, we must underline that, when trying to emulate human ability, some AFR systems could be forced to rely on genuinely systemic approaches. This could derive not so much from the need to integrate biological, as well as psychological, knowledge with technological knowledge. Rather, it could be a consequence of the fact that the human visual system operates per se in a systemic way, as its recognition output emerges from the interactions occurring within a complex network of specific visual subsystems, each devoted to a specific detection task (see, e.g., Zeki, 1993 [31]; Bartels and Zeki, 2005 [3]). Of course, the adoption of such a perspective entails that, among AFR systems, the ones based on artificial neural networks should be at the same time the most biologically plausible and the most successful. Namely, artificial neural networks are, among the available models of emergent processes, perhaps the easiest ones as regards concrete implementation and operation. For just this reason, in this paper we will introduce a particular kind of AFR system based on a specific neural network model. As regards the domain of AFR, we remark that in most cases concern for a systemic approach is totally absent. An overall picture of this field is very complicated, but it will be briefly summarized in the following.
Before beginning our exposition, it is to be recalled that, after about 30 years of research, the field of face recognition has given rise to a number of feasible technologies whose development required interdisciplinary cooperation between experts in very different domains, such as image processing, pattern recognition, neural networks, computer vision, computer graphics and
psychology. This development was driven, in most cases, by practical needs. In particular, the need grew for user-friendly systems that can secure privacy without losing a person's identity. Moreover, whereas some biometric personal identification methods rely on the cooperation of the participants (for example fingerprint analysis or retinal/iris scans), face recognition systems are often effective without people's cooperation. For this reason, typical applications of face recognition are found in:
• the entertainment area, for specific applications in video games, virtual reality, training programs, human-robot interaction and human-computer interaction;
• the smart card area, for specific applications in drivers' licenses, entitlement programs, immigration, national ID, passports, voter registration and welfare fraud;
• the information security area, for specific applications in TV parental control, personal device logon, desktop logon, application security, database security, file encryption, intranet security, internet access, medical records and secure trading terminals;
• law enforcement and surveillance, for specific applications in advanced video surveillance, CCTV control, portal control, post-event analysis, shoplifting, suspect tracking and investigation.
The purpose of these systems is to recognize the shapes of the features of a person's face in a background. To achieve this goal, a camera captures the image of a face, and the software then extracts pattern information. The latter is compared with the information contained in user templates. In this regard, two techniques are used: one compares feature sizes and relationships (for example the nose length and the distance between the eyes); the other matches the person's most significant image data with a record stored in a face database. Due to the interdisciplinary nature of face recognition and its possible systemic view, in this contribution we decided, owing mostly to the reasons discussed above, to deal with face recognition through a comprehensive approach, with the purpose of reproducing some features of human performance relevant to face recognition, as evidenced by studies in psychophysics and neuroscience, through a computational system based on a neural network. What are the advantages and drawbacks of this systemic view? An obvious advantage would be that of assuring the emergence of recognition in a way mimicking the one evidenced by neurophysiological studies on visual cortex operation. However, this advantage exists only in principle, as the details of the operation of the human visual cortex are still poorly known. For this reason,
comparisons between data coming from neuroscience or psychology and data coming from computer science could be very useful. For psychologists, comparing human performance to the performance of computational models of face recognition can potentially give insight into the processes used by the human visual system to encode and retrieve faces. For computational theorists, knowledge of the way human face processing works may yield insights into processing strategies that will allow for more flexible and robust face processing systems (O'Toole et al., 2000 [22]). The principal drawback is obviously related to the difficulty of emulating the ability of the human visual system. This difficulty is due to some open issues that need to be solved:
1. The face has a 3D shape. For this reason the appearance of the face can change depending on projective deformations, which lead to stretching and foreshortening of different parts of the face, and on self-occlusion and dis-occlusion of parts of the face. If a face has previously been seen from only one viewing angle, in general it is difficult to recognize it from different angles.
2. Illumination variation is an inevitable problem, owing to changes of lighting within and between days and between indoor and outdoor environments. Its direct effect is due to the 3D shape of the face, which can create strong shadows and shading that accentuate or diminish certain facial features; the inherent amount of light reflected off the skin and the non-linear adjustment of internal camera controls can also strongly affect facial appearance.
3. The face is also a non-rigid object: facial appearance varies with the facial expression of emotion and with the paralinguistic communication that accompanies speech acts. This problem is crucial for geometry-based algorithms: facial expression affects the apparent geometrical shape and position of the facial features.
4. Faces change over time in hair style, makeup, muscle tension and appearance of the skin, and presence or absence of facial hair, and over longer periods owing to effects related to aging.
5. Algorithms may be more or less sensitive to gender and race. Males and females might be harder to recognize owing to day-to-day variation in makeup or to differences in structural facial features, local features and shape. Men's faces have thicker eyebrows and greater texture in the beard region, whereas in women's faces the distance between the eyes and the brows is greater, the protuberance of the nose smaller, and the chin narrower than in men.
Despite these problems, the synergy between neuroscience and engineering efforts allowed us to implement a model that has biological plausibility,
performs the same tasks as human subjects, and gives a possible account of human face perception and recognition. The obtained results evidence a good performance of the SOM network we used. The latter, however, is crucially dependent on a correct choice of the number of categorizing units, as well as of the learning parameter values. Moreover, the gender of the faces also seems to play an important role, as female faces entail a lower performance with respect to male ones. In any case, the use of a SOM network in face recognition appears very convenient, owing to the fact that it allows a satisfactory performance despite its simplicity, and to the possibility of finding a neurophysiological interpretation of its operation.

2. The Neural Network Model

The neural network model used in this research was a standard two-layered network with feedforward connections. Both the input and the categorization layer were 2-dimensional. The input stimulations consisted of 2-dimensional images, suitably preprocessed through the methods described in the next section. The training phase was based on the usual shortcut Kohonen algorithm. The explicit form of the laws ruling the time evolution of the learning parameter and of the bubble radius, as well as the parameter values adopted in the simulations, were the following:
R(t) = R_0 exp(−b_0 t)    (radius of the activity bubble)
η(t) = η_0 exp(−β t)    (learning parameter)

with η_0 = 0.10, β = 0.0001, R_0 = 4, b_0 = 0.0001.
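As an illustration, a minimal sketch of this training scheme is given below, assuming a 12×16 output grid, 64×64 input images and a Euclidean best-matching-unit rule; all names and sizes are our assumptions, and the bubble update is one common reading of the shortcut Kohonen algorithm, not the authors' code.

```python
import numpy as np

eta0, beta = 0.10, 0.0001      # learning parameter law: eta(t) = eta0 * exp(-beta*t)
R0, b0 = 4, 0.0001             # bubble radius law:      R(t)  = R0 * exp(-b0*t)

rows, cols, dim = 12, 16, 64 * 64          # assumed output grid and image size
W = np.random.rand(rows * cols, dim)       # one weight vector per output node
grid = np.array([(r, c) for r in range(rows) for c in range(cols)])

def train_step(x, t):
    """One shortcut-Kohonen step on the flattened input image x at time t."""
    eta = eta0 * np.exp(-beta * t)
    R = R0 * np.exp(-b0 * t)
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))       # best-matching unit
    inside = np.linalg.norm(grid - grid[winner], axis=1) <= R    # activity bubble
    W[inside] += eta * (x - W[inside])                           # move bubble toward x
```

A training phase of, say, 20000 steps would then call train_step once per randomly drawn training image, with t running from 0 to 19999.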
The duration of each training phase was fixed in advance by the experimenter. Once the training process was performed, we left open two possibilities for using the network, keeping fixed the weight values obtained at the end of training, that is: a) a test (or categorization) task, in which a stimulus image was fed as input to the network and associated to the category represented by the winning categorization unit (according to the Kohonen algorithm prescription); b) a recognition task, in which two sets of images were used, the target set and the probe set. For each image of both sets, the output activation u_i of each categorizing unit was computed through the formula

u_i = Σ_k w_ik x_k

where w_ik denotes the components of the weight vector associated to the i-th categorizing unit, and x_k is the output of the k-th input unit (to shorten the notation, we neglected here the 2-dimensional character of both input and
categorization layer). In this way, each input image was associated to a pattern of activation of the categorizing units. Then, for each image of the probe set, the Euclidean distances between its activation pattern and those of the single images of the target set were computed. The probe image under consideration was associated to the image of the target set whose activation pattern was at minimum distance. In most cases the images of the target set and of the probe set referred to the same faces, the only difference being facial expression or pose. In this regard, if an image of the probe set was associated to another image of the target set containing the same face, the former face image was considered as recognized; otherwise it was considered as not recognized. The face recognition accuracy was defined as the percentage ratio between the number of probe images recognized and the total number of probe images.

3. Face Image Preprocessing

In order to ease the operation of the network and to lower computational costs, a suitable preprocessing of the face images was introduced. To this aim, a mask was superimposed on each face image, making some predefined points coincide with the eyes, mouth and nose. The mask dimensions can be adjusted to exactly match the points with the corresponding face parts. Due to this operation, the images obtained with this method may have different dimensions. They were therefore rescaled to a standard size, through an interpolation routine, to allow a comparison between different faces. Then, the image was subdivided into two areas by means of a line passing vertically through the center of the nose, and the left part was specularly reproduced on the right, obtaining a perfectly symmetric face. This step was introduced to avoid the interference of natural asymmetries with the categorization procedure of the network, since such asymmetries could mask the peculiar characteristics of the face due to the positioning of different face parts. Images were in gray-scale uncompressed PNG format. A normalization process was applied after network acquisition, to compensate for luminance differences.
In these experiments we used the FERET dataset, a collection of images acquired through a 35-mm camera and then digitized. Two frontal views for each subject were taken into consideration: 1) a neutral facial expression (the learning probe); 2) a different facial expression (the testing probe). Each image of a given individual was taken on the same day with the same lighting. Moreover, we created 3 special classes of stimuli differing in gender and race: 1) a class including 40 different male white faces (CLASS 1); 2) a class including 40 different female white faces (CLASS 2); 3) a class including 40 different faces, of which 20 were female faces (10 white and 10 black) and 20 male faces (10 white and 10 black) (CLASS 3). For each class there was a learning set and a testing set, so as to have a total of 80 face images per class.

4. The Experimental Design

A study of the performance of a SOM-based network in a face recognition task must be designed so as to answer a number of questions, the most important being the following:
1. what is the optimal number of nodes of the categorization layer?
2. what is the optimal number of learning steps?
3. are gender or race differences important in explaining network performance?
To this aim, we performed six successive experiments, in each of which the training was carried out on 40 different face images of the class taken into consideration. The same images used in the training phase were fed as input in the categorization phase, while the recognition phase dealt with the other 40 face images of the same class, not taken into consideration in the training phase. As previously remarked, each face image used in the recognition phase represented the same face as one of the face images used in the training phase, the only difference being the facial expression or the pose. The features characterizing the six experiments are summarized in Table 1.

Table 1. The experimental design.

Experiment    Output nodes    Class of stimuli    Number of learning steps
1             6×8             Class 1             20000
2             9×12            Class 1             20000
3             12×16           Class 1             20000
4             12×16           Class 1             40000
5             12×16           Class 2             40000
6             12×16           Class 3             40000

We underline that the first three experiments were done in order to identify the best dimension of the SOM, based on the idea that the number of neurons influences the recognition accuracy. Once the best dimension was known, in the last experiments we studied the accuracy of the SOM as regards gender and race, giving homogeneous stimuli in the 4th experiment and inhomogeneous stimuli in the 5th experiment. The accuracies evidenced for these classes were then compared with the accuracy found in Class 1.
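Before turning to the outcomes, the following is a minimal sketch of how the recognition accuracy of sec. 2 could be computed; it assumes a trained weight matrix W as in the previous sketch, and that probe_set[j] and target_set[j] portray the same face with a different expression or pose. All names are ours, not the authors'.

```python
import numpy as np

def activation_pattern(W, x):
    """Output activation u_i = sum_k w_ik x_k of every categorizing unit."""
    return W @ x

def recognition_accuracy(W, target_set, probe_set):
    targets = [activation_pattern(W, x) for x in target_set]
    recognized = 0
    for j, x in enumerate(probe_set):
        u = activation_pattern(W, x)
        nearest = int(np.argmin([np.linalg.norm(u - v) for v in targets]))
        recognized += (nearest == j)   # matched to the target image of the same face
    return 100.0 * recognized / len(probe_set)   # percentage accuracy
```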
Table 2. The activated categorization nodes for each face in Experiment 1. [Grid of output nodes with the face images F1-F40 assigned to each activated node.]

Figure 1. Recognition accuracy vs. number of training steps in Experiment 1, using 6×8 output nodes.
5. The Outcomes of the Experiments

5.1. Experiment 1

Table 2 shows the specific node that each face image used in the learning phase is associated to in the categorization phase. As one can see, there are 11 nodes associated to one and only one face, while the other 29 faces partially overlap on shared nodes. Then, relying on the other 40 face images of the same class belonging to the probe set, we computed the recognition accuracy as a function of the duration of the training phase. The results, shown in Figure 1, evidence that the accuracy grows from 65% after 5000 steps to 77,5% after 10000 steps, corresponding to the correct recognition of 31 faces.
Table 3. The activated categorization nodes for each face in Experiment 2. [Grid of the 9×12 output nodes with the face images F1-F40 assigned to each activated node.]

Figure 2. Recognition accuracy vs. number of training steps in Experiment 2, using 9×12 output nodes.
A prolongation of the training phase does not produce an improvement of recognition accuracy, which remains unchanged even after 20000 steps.

5.2. Experiment 2

The activated categorization nodes in this experiment (36 in total) are shown in Table 3. A partial overlapping of 4 faces occurs, more precisely on node 8,0 for faces 6 and 38 and on node 0,4 for faces 7 and 22. The recognition accuracy grows from 72,5% after 5000 steps to 77,5% after 15000 steps (see Figure 2).
Table 4. The activated categorization nodes for each face in Experiment 3. [Grid of the 12×16 output nodes with the face images F1-F40 assigned to each activated node.]

Figure 3. Recognition accuracy vs. number of training steps in Experiment 3, using 12×16 output nodes.
Even in this case, a further increase in the number of learning steps does not produce an improvement of accuracy. From these data we can deduce that the new choice of the number of nodes does not seem to influence the recognition accuracy. However, this choice led to a strong decrease of the overlaps of different faces within the same category.

5.3. Experiment 3

The association between faces and categorizing nodes is shown in Table 4. Two faces (F3 and F22) are categorized in the same node, 0,3. The recognition accuracy grows from 72,5% after 5000 learning steps to 80% after 15000 learning steps (see Figure 3). This maximum value does not change if we add a further 5000 learning steps.
Figure 4. Recognition accuracy vs. number of training steps for the three output-node configurations (6×8, 9×12 and 12×16).

Figure 5. Recognition accuracy vs. number of training steps in Experiment 4.
These data show that the new choice of the number of categorizing nodes is able to produce an increase in the maximum value of recognition accuracy and, at the same time, a decrease of the categorization overlaps. These first three experiments showed us that the output layer configuration based on 12×16 nodes was the best among the tested ones (see Figure 4 for a graphical summary of these findings). Thus, in the following experiments we always used it.

5.4. Experiment 4

In this experiment we again used 40 white male faces (CLASS 1), but we trained the network for 40000 epochs; 80% of the faces were recognized (see the plot of accuracy vs. number of learning steps in Figure 5). What should happen if we added 20 faces to CLASS 1? Should we repeat the training phase?
Figure 6. Recognition accuracy vs. number of training steps in Experiment 5.

Figure 7. Recognition accuracy vs. number of training steps in Experiment 6.
To answer this question we built a new dataset of 20 male faces (20 faces in the learning probe and the corresponding 20 faces in the testing probe) and, using the weights obtained at the end of the previous experiment, we checked the recognition accuracy for the 20 new male faces: 95% of them were recognized. Thus it seems that the network does not need to repeat the training phase when new faces are added. However, this optimal accuracy decreases if the stimuli are not of the same gender as those used in the learning phase: if we add 20 female faces, they are recognized with 42,5% accuracy, whereas if we add 20 faces, partly female and partly male, partly with white and partly with black skin, they are recognized with 60% accuracy.

5.5. Experiment 5

In this experiment we used 40 white female faces (CLASS 2) and we trained the network for 40000 epochs. Faces were recognized with 60% accuracy (see Figure 6). These data evidence that different performances are associated to the different classes: faces of CLASS 2 seem to be less well recognized than faces of CLASS 1.
Figure 8. Recognition accuracy in the different classes.
Also in this case we built a new dataset of 20 female faces to check the accuracy for the added stimuli while keeping unchanged the weights of this experiment. The new faces were recognized with 90% accuracy and, also in this case, this optimal accuracy decreased if the stimuli were not of the same kind as those of the learning phase: if we add 20 male faces, the accuracy is 85%, whereas if we add 20 mixed faces the accuracy falls to 67,5%.

5.6. Experiment 6

In this experiment we used 40 faces (CLASS 3), namely 20 female faces (10 white and 10 black) and 20 male faces (10 white and 10 black), and we trained the network for 40000 epochs. Faces were recognized with 72,5% accuracy, confirming that different kinds of stimuli are associated to different levels of recognition accuracy (see Figure 7). As in the other experiments, we built a new dataset of 20 mixed faces to check the accuracy for the added stimuli with the weights obtained in this experiment. The new faces were recognized with 90% accuracy. The accuracy fell to 85% using 20 male faces and to 50% using 20 female faces. The findings obtained in experiments 4, 5 and 6 are graphically summarized in Figure 8.

6. Conclusions

Even if the use of neural networks in face recognition dates back some 25 years (Stonham, 1984 [26]; Abu-Mostafa and Psaltis, 1987 [1]; Kohonen, 1989 [19]; Golomb and Sejnowski, 1991 [14]; Brunelli and Poggio, 1992, 1993 [5,6]), techniques based on neural networks nevertheless deserve further study (Chellappa et al., 1995 [7]), especially as regards gender classification and facial expression. We used a neural network architecture in order to account for some features of the visual information involved in human face processing. It is to be remarked that
our model reproduces neither the processes underlying the individuation and extraction of the face representation from the visual field, nor the processes leading from the face representation to the semantic information about the individual the face belongs to. We showed how a Kohonen neural network can recognize face stimuli in a way which depends upon its past experience. Therefore we can say that, in principle, a SOM-based network can attain a satisfactory performance as a face recognizer. However, experiments 1-6 showed that: I) network performance is crucially dependent on the number of categorizing units; II) network performance is strongly dependent on the nature of the face images used in the training phase, more precisely on their gender. With regard to the latter point, by using 3 kinds of stimulus classes we investigated how the recognition accuracy varies as a function of the nature of the class: we found the best accuracy in a class composed only of men (80%), followed by a "mixed" class (men and women together) (72,5%). The worst performance was found in a class of only women (60%). The data show that the system is able to produce stable representations from exemplars, as humans do. Within these categories, a gender effect exists. We explain it by means of face space theory. Faces may be encoded by reference to a generalized prototype: a sort of schema that emerges as a result of a lifetime's experience with faces (Valentine and Bruce, 1986 [29]). The prototype can be considered as the ideal format of a subject or object, stored in long-term memory. This assumption is based on the "prototypicality hypothesis": relevant stimuli are represented within a multidimensional feature space and, thus, considering each face as a point in this space, faces are organized in a face space with the most prototypical one in the center (Valentine, 1991 [27]). Faces dissimilar from this prototype are the ones more distant from the center, and are called "distinctive faces". The distinctiveness effect shows that some faces are much more easily recognizable than others. In particular, some faces are more distinctive in the general population and for this reason they are also easier to recognize. Distinctive faces are recognized better because they are further from other neighbouring faces in the face space, and so are less susceptible to confusion with faces located near each other in the space. Face space theory seems to explain our data. Kohonen's network produces categories based on statistical features of the incoming stimuli. In these categories there are distinctive faces, such as male faces. Male and female faces differ in feature-based information such as the size of the nose and the prominence
of the brow. We could therefore interpret these categories as modelling the salience maps emerging from the interactions between different modules in the visual cortex, each of them being identified with a particular categorizing unit. These results seem to confirm the plausibility of our biological model. In any case, in order to improve the performance of a SOM-based network as a face recognizer, we should better understand the reason for the different performances with faces of different gender. In turn, this requires a further extension of both experimental studies on human subjects and computational modelling activities within the domain of face recognition.

References
1. Y.S. Abu-Mostafa, D. Psaltis, Scientific American 256, 88-95 (1987).
2. J.P. Aggleton, M.J. Burton, R.E. Passingham, Brain Research 190, 347-368 (1980).
3. A. Bartels, S. Zeki, Philosophical Transactions of the Royal Society B 360, 733-750 (2005).
4. S.T. Boysen, G.G. Bernston, Journal of Comparative Psychology 103, 215-220 (1989).
5. R. Brunelli and T. Poggio, in Proc. DARPA Image Understanding Workshop (1992), pp. 311-314.
6. R. Brunelli and T. Poggio, IEEE Transactions on PAMI 15, 1042-1052 (1993).
7. R. Chellappa, C.L. Wilson and S. Sirohey, Proc. IEEE 83, 705-740 (1995).
8. M. Farah, G.W. Humphreys, H.R. Rodman, in Fundamental Neuroscience, Ed. M.J. Zigmond, F.E. Bloom, S.C. Landis, J.L. Roberts, L.R. Squire (Academic Press, San Diego, CA, 1999), pp. 1339-1361.
9. M.J. Farah, J.W. Tanaka, H.M. Drain, Journal of Experimental Psychology: Human Perception and Performance 21, 628-634 (1995).
10. I. Gauthier and M.J. Tarr, Vision Research 37(12), 1673-1682 (1997).
11. I. Gauthier, M. Behrmann and M.J. Tarr, Neuropsychologia 42(14), 1961-1970 (2004).
12. I. Gauthier, M.J. Tarr, A.W. Anderson, P. Skudlarski, J.C. Gore, Nature Neuroscience 2, 568-580 (1999).
13. I. Gauthier, P. Williams, M.J. Tarr and J. Tanaka, Vision Research 38, 2401-2428 (1998).
14. B.A. Golomb, T.J. Sejnowski, in Advances in Neural Information Processing Systems 3, Ed. D.S. Touretzky and R. Lippmann (Morgan Kaufmann, San Mateo, CA, 1991), pp. 572-577.
15. M.H. Johnson, S. Dziurawiec, H. Ellis, J. Morton, Cognition 40, 1-19 (1991).
16. M.H. Johnson, J. Morton, Biology and Cognitive Development: The Case of Face Recognition (Blackwell Press, Cambridge, MA, 1991).
17. N. Kanwisher, J. McDermott, M.M. Chun, Journal of Neuroscience 17, 4302-4311 (1997).
18. N. Kanwisher, D. Stanley, A. Harris, NeuroReport 10, 183-187 (1999).
19. T. Kohonen, Self-Organization and Associative Memory (Springer, Berlin, 1989).
20. J.S. Morris, C.D. Frith, D.I. Perrett, D. Rowland, Nature 383, 812-815 (1996).
21. F.K. Nahm, T.D. Albright, D.G. Amaral, Society for Neuroscience Abstracts 17, 473 (1991).
22. A.J. O'Toole, Y. Cheng, P.J. Phillips, B. Ross, H.A. Wild, in Fourth IEEE International Conference on Automatic Face and Gesture Recognition (28-30 March 2000), pp. 552-557.
23. L.A. Parr, T. Dove, W.D. Hopkins, Journal of Cognitive Neuroscience 10, 615-622 (1998).
24. O. Pascalis, J. Bachevalier, Behavioural Processes 43, 87-96 (1998).
25. M.T. Phelps, W.A. Roberts, Journal of Comparative Psychology 108, 114-125 (1994).
26. T.S. Stonham, in Aspects of Face Processing, Ed. H.D. Ellis, M.A. Jeeves, F. Newcombe and A. Young (Nijhoff, Dordrecht, 1984), pp. 426-441.
27. T. Valentine, Quarterly Journal of Experimental Psychology 43A, 161-204 (1991).
28. T. Valentine, British Journal of Psychology 79, 471-491 (1988).
29. T. Valentine, V. Bruce, Perception 15, 525-535 (1986).
30. P.J. Whalen, S.L. Rauch, N.L. Etcoff, S.C. McInerney, M.B. Lee, M.A. Jenike, Journal of Neuroscience 18, 411-418 (1998).
31. S. Zeki, A Vision of the Brain (Blackwell, Oxford, UK, 1993).
32. W. Zhao, R. Chellappa, A. Rosenfeld and P.J. Phillips, Face Recognition: A Literature Survey, CVL Technical Report (University of Maryland, 2000), ftp://ftp.cfar.umd.edu/TRs/CVL-Reports-2000/TR4167-zhao.ps.gz.
ANTICIPATORY COGNITIVE SYSTEMS: A THEORETICAL MODEL
GRAZIANO TERENZI
Department of Psychology, University of Pavia, Italy
E-mail: [email protected]

This paper deals with the problem of understanding anticipation in biological and cognitive systems. It is argued that a physical theory can be considered biologically plausible only if it incorporates the ability to describe systems which exhibit anticipatory behaviors. The paper introduces a cognitive-level description of anticipation and provides a simple theoretical characterization of anticipatory systems on this level. Specifically, a simple model of a formal anticipatory neuron and a model (i.e. the τ-mirror architecture) of an anticipatory neural network based on the former are introduced and discussed. The basic feature of this architecture is that a part of the network learns to represent the behavior of the other part over time, thus constructing an implicit model of its own functioning. As a consequence, the network is capable of self-representation; anticipation, on a macroscopic level, is nothing but a consequence of anticipation on a microscopic level. Some learning algorithms are also discussed, together with related experimental tasks and possible integrations. The outcome of the paper is a formal characterization of anticipation in cognitive systems which aims at being incorporated in a comprehensive and more general physical theory.

Keywords: anticipation in cognitive systems, anticipatory neural networks, learning algorithms.
1. Introduction

The ability to anticipate specific kinds of events is one of the most amazing features of biological organization. Mainly, it is related to the functioning of complex biological systems which perform cognitive functions. An anticipatory system is a system which decides its behavior by taking into account a model of itself and/or of its environment; i.e., it determines its current state as a function of the prediction of a state made by an internal model for a future instant of time (Rosen, 1985) [17]. A system endowed with anticipatory qualities is also called a proactive system. Proactivity and anticipation, thus, are two important features of biological organization. As a matter of fact, anticipatory properties play an important role in the mechanisms which rule the learning and control of complex motor behaviors. Indeed, as it stands, motion is the only means biological organisms have to interact both with their environment and with other organisms (Wolpert,
Ghahramani and Flanagan, 2001 [24]). Recent empirical studies have demonstrated the involvement of the motor system in processes ranging from the observation and anticipation of action to imitation and social interaction (Gallese, Fadiga, Fogassi and Rizzolatti, 1996 [4]; Rizzolatti and Arbib, 1998 [14]; Liberman and Whalen, 2000 [10]; Wolpert, Doya and Kawato, 2003 [25]). In this context the problem of anticipation is essentially the problem of understanding how the cognitive systems of complex organisms can take into account the expected future evolution of the events which take place in the interaction with their environment, in order to take decisions and determine their behavior (Sutton, 1990 [20]; Stolzmann, 1998 [19]; Baldassarre, 2002, 2003 [1,2]). The study of anticipation in the context of sensory-motor learning and coordination has been carried out by developing an interesting ensemble of computational models which aim to take into account the anticipatory properties of biological neural networks. Among these models we must mention the forward models by Jordan and Rumelhart (1992), the architectures based on the Feedback-Error Learning procedure (Kawato, 1990) [9] and on the Adaptive Mixture of Local Experts (Jacobs, Jordan, Nowlan and Hinton, 1991 [6]; Jacobs, 1999 [7]), such as the MOSAIC model (Wolpert and Kawato, 1998 [23]; Haruno, Wolpert and Kawato, 2001 [5]). Essentially, on the basis of computational studies it has been hypothesized that the central nervous system is able to simulate internally many aspects of the sensory-motor loop; specifically, it has been suggested that a dedicated neural circuitry is responsible for such processing and that it implements suitable "internal models" which represent specific aspects of the sensory-motor loop. Internal models predict the sensory consequences of motor commands and, for this reason, are called forward models, as they model forward causal relations (i.e. towards a future instant of time) between actions and their sensory consequences. A forward model is employed to predict how the state of the motor system will change in response to a given motor command (motor command → state of the motor system). A forward model is therefore a predictor or a simulator of the consequences of an action. The inverse transformation from the state to the motor command (state → motor command), which is needed to determine the motor command required to reach a given goal, is performed by what is called an inverse model. Since they take into account only the global patterns of behavior of the system, a basic feature of these models is that they are essentially bound to a coarse-grained treatment of anticipation, thus giving rise to a macroscopic (i.e. global) notion of anticipation. Incidentally, they do not consider the possibility for this global
notion of anticipation to be a consequence of a more general theory of anticipation working on other levels as well; stepping closer towards this goal is one of the main aims of this paper. On the other hand, it is also clear that biological systems are first of all physical systems. For this reason, understanding anticipation is a problem for any physical theory which aims at explaining the emergence of biological organization. For the same reason, the quest for such an explanation requires both the clarification of the levels of description of the systems under study, and their integration in a unitary and more comprehensive theoretical framework. The main goal of this paper, then, is to identify a possible characterization of anticipation in biological systems which perform cognitive processing, in such a way as to bring to light some properties that any biologically plausible physical theory must incorporate.

2. Anticipatory Neurons

Essentially, a formal neuron is a function F which maps an input space I to a corresponding output space A. A non-anticipatory formal neuron is a function of the form
a_t = F(i_t)    (1)
where i_t and a_t are respectively the input and the activation of the neuron at time t. An anticipatory formal neuron can instead have the following form:
a_t = F(i_t, i_{t+τ})    (2)
i_{t+τ} = G(i_t)    (3)
where i_{t+τ} is computed by a suitable function G, named a predictor. If the activation a_t depends both on the input at time t and on the activation of the neuron predicted at time t+τ, the neuron has the following form:
a_t = F(i_t, a_{t+τ})    (4)
a_{t+τ} = G(a_t)    (5)
This amounts to saying that the activation of a neuron at time t does not depend only on its input at time t, but also on its input (or, as the case may be, its output) at time t+τ, as computed by a suitable function G. Here, τ represents a "local time" parameter strictly related to the predictor G. In the context of this model, we have indeed to distinguish between the global time t and the local time t' = t+τ. Whereas the global time t describes the dynamics of function F,
local time t’ describes both the activation dynamics and the learning dynamics of the predictor function G. Parameter τ identifies a temporal window over which the dynamics of the anticipatory neuron is defined. It sets up the limit of an internal counter which is used, step by step, to fix values of function G during the learning phase (approximation). In this sense τ represents a measure for the quantity of information that a neuron can store while executing its computation. This fact also means that the neuron is endowed with a kind of internal memory. This memory makes the neuron take into account more information than the information which is already available instantaneously. 3.
An Example of Dynamical Evolution of a Single Anticipatory Neuron
The following example illustrates the dynamics of a single anticipatory neuron trained to associate input and output by means of the Widrow-Hoff rule (Widrow and Hoff, 1960) [22]. The predictor too is trained by means of the Widrow-Hoff rule. Let us consider a neuron of the generic form illustrated in Figure 2.

Figure 2. An anticipatory formal neuron with suitable connection weights.

Let us suppose that F is a sigmoidal function, that is, it has the form
a_t = 1 / (1 + e^(−P_i))    (6)
where P_i is the activation potential of the neuron, computed as

P_i = Σ_j w_ij i_j    (7)
and where w_ij are the connection coefficients of the input links to the i-th unit, and i_j are its input values. Let us suppose that G has the same form as F, except
for computing the quantity i'_{t+τ}, which approximates its input at time t+τ, instead of computing the output a_t; that is:

i'_{t+τ} = 1 / (1 + e^(−P_G))    (8)
where P_G, i.e. the potential of G, is computed in the same way as the potential of F. As can be seen, the anticipatory neuron is a system which includes two different subsystems, the anticipator F and the predictor G. Whereas F computes its activation on the global time scale t, G computes its activation on the local time scale t'; the activation is computed by storing in memory the input to the anticipator F both at time t and at time t+τ. Function G, on the other hand, is approximated by resorting to these values. Table A illustrates synthetically a generic dynamics for the previously introduced anticipatory formal neuron. Essentially, in the course of the dynamical evolution of the neuron, for each couple of patterns presented to it, a propagation cycle and a learning cycle are first executed for the predictor G, and then a propagation cycle and a learning cycle are executed for the anticipator F. It must be underlined that the choice of the Widrow-Hoff rule for the training cycle of the neuron has been made only for simplicity and for illustrative purposes; other algorithms would suit as well.

Table A. Example of the computation of the system at subsequent time steps.

t = 0 (τ = 1, t' = t + τ = 1, i_t = i_0):
- Initialize variables.
- Compute G: i'_1 = G(i_0).
- Propagate activation: a_0 = F(i_0, i'_1).
- Approximate F: 1) compute the error E_F = ½(a_0 − a_0^T)²; 2) update the weights, Δw_12 = −η ∂E_F/∂w_12 and Δw_11 = −η ∂E_F/∂w_11.

t = 1 (τ = 1, t' = 2, i_t = i_1):
- Approximate G: 1) compute the error E_G = ½(i'_1 − i_1)²; 2) update the weights, Δw_21 = −η ∂E_G/∂w_21 = −η(i'_1 − i_1)·G'(P_G)·i_0, where P_G = (w_21 i_0) − s_G.
- Compute the activation of G: i'_2 = G(i_1).
- Propagate activation: a_1 = F(i_1, i'_2).
- Approximate F: 1) compute the error E_F = ½(a_1 − a_1^T)²; 2) update the weights Δw_12 and Δw_11 as above.

t = 2 (τ = 1, t' = 3, i_t = i_2):
- Approximate G: 1) compute the error E_G = ½(i'_2 − i_2)²; 2) update the weights, Δw_21 = −η(i'_2 − i_2)·G'(P_G)·i_1, where P_G = (w_21 i_1) − s_G.
- Compute the activation of G: i'_3 = G(i_2).
- Propagate activation: a_2 = F(i_2, i'_3).
- Approximate F: 1) compute the error E_F = ½(a_2 − a_2^T)²; 2) update the weights Δw_12 and Δw_11 as above.

4. Networks of Anticipatory Neurons

By taking inspiration from the simple model of an anticipatory neuron introduced in the previous section, networks of anticipatory neurons can be designed that carry out complex tasks which involve the employment of suitable "internal models" (be they implicit or explicit) of their external or internal environment in order to take decisions.
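As a concrete illustration of this building block, the following is a minimal sketch of the anticipatory neuron of sec. 3 (cf. Figure 2 and Table A), with the anticipator F weighted by w11 and w12 and the predictor G weighted by w21; the learning rate, the initial weights and the treatment of the threshold s_G are our illustrative assumptions, not prescriptions of the model.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

class AnticipatoryNeuron:
    """Sketch of the neuron of Table A: predictor G trained against the input
    that actually arrives; anticipator F trained against a target output."""
    def __init__(self, eta=0.5):
        self.w11 = self.w12 = self.w21 = 0.1   # assumed initial weights
        self.s_g = 0.0                          # threshold s_G of the predictor
        self.eta = eta
        self.prev = None                        # (i_t, prediction) kept for G's update

    def step(self, i_t, target):
        # Approximate G: correct the prediction made tau = 1 steps ago (Widrow-Hoff)
        if self.prev is not None:
            i_prev, pred = self.prev
            err = pred - i_t                                        # i'_t - i_t
            self.w21 -= self.eta * err * pred * (1 - pred) * i_prev  # G'(P_G) factor
        # Predict the next input and propagate the anticipator F
        i_next = sigmoid(self.w21 * i_t - self.s_g)             # i'_{t+1} = G(i_t)
        a_t = sigmoid(self.w11 * i_t + self.w12 * i_next)       # a_t = F(i_t, i'_{t+1})
        # Approximate F against its target output (Widrow-Hoff)
        d = (a_t - target) * a_t * (1 - a_t)
        self.w11 -= self.eta * d * i_t
        self.w12 -= self.eta * d * i_next
        self.prev = (i_t, i_next)
        return a_t
```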
The scheme in Figure 3 describes a simple example of a neural network constructed by using anticipatory neurons as its building blocks, which calculate their activation as a function of input values predicted at time t+τ.

Figure 3. A simple neural network constructed with anticipatory neurons as its building blocks.

Input units within the scheme are represented by an array i of input values. Hidden units are represented by a composition of sigmoidal functions, g and f respectively. Whereas the functions g within the hidden layer give rise to specific activation values, collected in an array x_g, the functions f give rise to another set of activation values. Similarly, for the output layer, the activation of the output units is computed by means of the composition of two suitable functions g and f, and their activation values are collected respectively in the arrays u_g and u_f. It must be stressed that each function g is endowed with a stack A, i.e. a memory which stores the input vectors spread across a suitable temporal window t + τ; it can be represented by a τ × N dimensional matrix M, where N is the number of storable patterns, and τ is the time step when they are given as input to g. For each layer of the network there are two kinds of connections which play different roles: connections to the functions f (f-connections) and connections to the functions g (g-connections). The f-connections between the input and hidden layers are characterized by connection weights w (represented in the figure without indices), while the g-connections are characterized by connection weights w'. Similarly, the f-connections between the hidden and output layers are characterized by connection weights c, and the g-connections by connection weights c'.
The case reported in the figure shows a network in which each anticipatory neuron is endowed with a neuron-specific stack A, which lets the neuron store and compare patterns given as input to it within its temporal window (i.e. input or output patterns, depending on the nature of the device). With regard to the latter, the parameter τ can be approximated by suitable learning algorithms, thus giving rise to networks which are able to perform tasks whose execution requires indefinitely long-range time correlations. Training in such a network can be carried out in many ways. A basic option is to employ the back-propagation algorithm (Rumelhart, Hinton and Williams, 1986) [18] to learn the weights w and c of the functions f within the layers of the network, and to use the Widrow-Hoff rule for the weights w' and c' of the corresponding functions g. In the case of an "on-line" learning procedure, a learning step for the whole network can be implemented, for example, through the following succession of phases:
• initialize variables;
• give a pattern as input and propagate it;
• for each layer:
  - modify the weights of the functions g according to their temporal window (Widrow-Hoff);
  - compute g;
  - compute f;
  - modify the weights of the functions f according to the corresponding target pattern (back-propagation).
Table B lists the basic variables of the model, i.e. a set of constructs that can be used to implement and simulate the model under study.
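The stack A introduced above (the constructs A_w and A_c of Table B) can be rendered, for instance, by a simple bounded buffer; the sketch below is one possible reading, with one stored pattern per time step and all names chosen by us.

```python
from collections import deque

class Stack:
    """Sketch of the stack A: a bounded memory holding the patterns seen inside
    the temporal window, so that a predictor g can later be corrected against
    the input that actually arrived tau steps after the prediction was made."""
    def __init__(self, tau):
        self.tau = tau
        self.buf = deque(maxlen=tau)   # at most tau stored (time, pattern) pairs

    def push(self, pattern, t):
        self.buf.append((t, pattern))

    def recall(self, t):
        """Return the pattern stored tau steps before global time t, if any.
        Call before pushing the pattern of the current step t."""
        for t0, p in self.buf:
            if t0 == t - self.tau:
                return p
        return None
```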
5. The τ-Mirror Architecture: Networks

In the previous sections we have introduced both single anticipatory neurons and a simple network architecture. In the example of the last section, each function g was connected to one and only one function f within its layer and, moreover, was characterized by a neuron-specific time window. This entails the employment of a specific stack for each function g. According to this description, the stack can be represented as a variable-structure four-dimensional matrix which comprises 1) the parameter τ of the temporal window, 2) the number N of storable patterns, 3) a dimension which represents the neurons, and 4) a dimension which represents the value of τ for those neurons.
Table B. The basic variables of the model.

Construct    Description
i            Input vector
x_f          Vector of the activations of the functions f within the hidden layer
x_g          Vector of the activations of the functions g within the hidden layer
u_f          Vector of the activations of the functions f within the output layer
u_g          Vector of the activations of the functions g within the output layer
W            Weight matrix of the f-connections between input and hidden layers
W'           Weight matrix of the g-connections between input and hidden layers
C            Weight matrix of the f-connections between hidden and output layers
C'           Weight matrix of the g-connections between hidden and output layers
A_w          τ × N matrix which stores N inputs to the functions g of the input-hidden layer within the temporal window τ
A_c          τ × N matrix which stores N inputs to the functions g of the hidden-output layer within the temporal window τ
Another possibility is to consider network architectures where the functions g are connected in output not only to a single corresponding function f but to all of the other functions f belonging to the same layer. Moreover, we can consider a generalized window t + τ for every g of the same layer, or of the whole network. The set of all functions g of the network constitutes a real "temporal mirror" for the corresponding functions f. In a sense, we can say that a part of the network learns to represent the behavior of the other part from the point of view of the specific task at hand, by constructing an implicit model of its own functioning. Briefly, the network is capable of self-representation. For this reason the units that approximate the functions g of the network within the temporal window t + τ (i.e. the ones we have previously dubbed "predictors") are called "τ-mirror units" of the network. The schemes presented in Figures 4 and 5 represent this idea synthetically. Figure 4 illustrates an anticipatory network which predicts over its input space. As anticipated, if each function g is connected to every function f within the same layer, and not only to the corresponding one, then they can be grouped together in a suitable layer, called a "mirror layer". Moreover, if the same temporal window t + τ is generalized to all of the mirror units within the same layer, then the complexity of both the representation and the implementation of
the model can be further reduced. Figure 5 represents an anticipatory neural network which predicts over its output space.

Figure 4. An anticipatory neural network which anticipates over its input space.

Figure 5. Anticipatory neural network which anticipates over its output space.
6. The τ-Mirror Architecture: Learning Algorithms

We have previously introduced a learning scheme for the training of the neural network model under study. This is a supervised scheme based on 1) the employment of the back-propagation rule for the adjustment of the weights of the functions f (W and C) and 2) the employment of the Widrow-Hoff rule for the adjustment of the weights of the functions g. However, there are other possibilities that can be explored.
6.1. Reinforcement Learning

A straightforward solution is given by the employment of a reinforcement learning algorithm for approximating the weights of the functions f. Typically, reinforcement learning methods are used to approximate sequential dynamics in the context of the online exploration of the effects of actions performed by an agent within its environment (Sutton and Barto, 1998 [21]; Baldassarre, 2003 [2]). At each discrete time step t the agent perceives the state of the world s_t and selects an action a_t according to this perception. As a consequence of each action the world produces a reward r_{t+1} and a new state s_{t+1}. An "action selection policy" is defined as a mapping \pi : S \times A \to [0, 1] from states to action selection probabilities. For each state s in S, a "state-evaluation function" V^\pi[s] is also defined, which depends on the policy \pi and is computed as the expected future discounted reward starting from s:

V^\pi[s] = E[r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \ldots] = \sum_{a \in A} \pi[a, s] \sum_{s' \in S} p^a_{ss'} \left( r_{t+1} + \gamma V^\pi[s'] \right) ,    (8)
where \pi[a, s] is the probability that the policy selects action a given the state s, E is the expectation operator and \gamma is a discount coefficient between 0 and 1. The goal of the agent is to find an optimal policy which maximizes V^\pi[s] for each state s in S. In a reinforcement learning setting, the estimation of the expected values of states s, V(s), is typically computed and updated by resorting to Temporal Difference methods (TD methods); this amounts to saying that the estimation of V(s) is modified by a quantity proportional to the TD-error, defined as follows:
\delta = r + \gamma V(s') - V(s) .    (9)
Figure 6. An Actor-Critic architecture.

An effective way to build a neural network based on reinforcement learning is to employ a so-called Actor-Critic architecture (Figure 6). Actor-Critic architectures are based on the employment of distinct data structures for action-selection policy management (the "actor") and for the evaluation function (the "critic"). The "actor" selects actions according to its input. In order to select an action, the activation of each output unit of the actor is given as input to a stochastic action selector which implements a winner-take-all selection strategy. The probability for an action a_g to become the selected action a_w is given by
P[a_g = a_w] = \frac{m_g[y_t]}{\sum_k m_k[y_t]} ,    (10)
where m_g[y_t] is the activation of the output units as a function of the input y_t. The "critic", on the other hand, is constituted by both an Evaluator and a TD-critic. The Evaluator is a network which estimates the mean future rewards which can be obtained in a given perceived state of the world y_t = s_t. The Evaluator can be implemented as a feed-forward two-layer network with a single output unit which computes an estimate V'[s] of V^\pi[s], where the latter is defined as above. The TD-critic, in turn, is a neural implementation of the function that computes the TD-error e_t as a difference between the estimations of V^\pi[s] at time t+1 and time t:
e_t = (V^\pi[y_t])_{t+1} - (V^\pi[y_t])_t = (r_{t+1} + \gamma V^\pi[y_{t+1}]) - V^\pi[y_t] ,    (11)
where r_{t+1} is the observed reward at time t+1, V'[y_{t+1}] is the estimated value of the expected future reward starting from state y_{t+1} and \gamma is, as usual, a suitable discount parameter. According to such a TD-error, the "critic" is trained by means of the Widrow-Hoff algorithm; specifically, its weights w_j are updated by means of the following rule:
\Delta w_j = \eta e_t y_j = \eta ((r_{t+1} + \gamma V^\pi[y_{t+1}]) - V^\pi[y_t]) y_j .    (12)
The "actor" too is trained by resorting to the Widrow-Hoff algorithm, and only the weights which correspond to the winning unit are updated, by the rule
\Delta w_{wj} = \zeta e_t (4 m_j (1 - m_j)) y_j = \zeta ((r_{t+1} + \gamma V'^\pi[y_{t+1}]) - V'^\pi[y_t]) (4 m_j (1 - m_j)) y_j ,    (13)
where \zeta is a suitable learning parameter between 0 and 1 and (4 m_j (1 - m_j)) is the derivative of the sigmoid function corresponding to the j-th output unit which wins the competition, multiplied by 4 in order to homogenize the learning rates of the two networks.
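Equations (10)-(13) can be condensed into a small working sketch. The single-layer linear Evaluator and the toy one-reward world below are assumptions made only to obtain runnable code; they are not part of the original architecture.

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    N_IN, N_ACT = 4, 3
    W_actor = rng.normal(0.0, 0.1, (N_ACT, N_IN))   # actor output weights
    w_critic = rng.normal(0.0, 0.1, N_IN)           # linear Evaluator: V'(y) = w . y
    GAMMA, ETA, ZETA = 0.9, 0.1, 0.1

    y = rng.random(N_IN)                            # toy initial state
    for t in range(500):
        m = sigmoid(W_actor @ y)                    # activations m_g[y_t]
        p = m / m.sum()                             # Eq. (10): stochastic selector
        a = rng.choice(N_ACT, p=p)
        r = 1.0 if a == 0 else 0.0                  # toy world: action 0 is rewarded
        y_next = rng.random(N_IN)
        # Eq. (11): TD-error from the Evaluator's estimates
        e = (r + GAMMA * (w_critic @ y_next)) - (w_critic @ y)
        # Eq. (12): Widrow-Hoff update of the critic
        w_critic += ETA * e * y
        # Eq. (13): update only the winning actor unit; 4 m (1 - m) is the
        # sigmoid derivative rescaled to homogenize the two learning rates
        W_actor[a] += ZETA * e * (4.0 * m[a] * (1.0 - m[a])) * y
        y = y_next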
6.2. Reinforcement Learning for Anticipatory Networks

Extending a reinforcement learning algorithm to the anticipatory neural networks described above is quite straightforward. Let us consider an anticipatory network like the one described in Section 4, but with two layers only; it will therefore have a single layer of τ-mirror units. Then, let us consider an "actor" constructed by resorting to the latter network; in this case, input units code for the state of the world at time t, whereas output units code for the possible actions that can be executed at time t. The "critic" can be considered as analogous to the one discussed in the previous section. In this context, the approximation of the weights of the functions f within the output layer of the network proceeds exactly as described in the previous section, that is, by modifying the weights of the f-connections according to the TD-error computed by the critic. But how can the weights over the g-connections be updated? Also in this case the extension is straightforward, if we consider the fact that the adjustment of the weights of these units is based on the Widrow-Hoff rule. The idea is to employ the TD-errors corresponding to the reference temporal window rather than the quadratic error corresponding to input (or output) patterns. This essentially means that the stack of the g units has to store, besides the input patterns to g within the temporal window τ, also the corresponding TD-errors. Therefore, the error signal is computed not by considering the quadratic error with respect to suitable targets, but by resorting to the TD-error corresponding to the reference time step.
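A minimal sketch of the modified stack, storing each input pattern together with the TD-error of its time step (the exact pairing convention is our assumption; the paper leaves it implicit):

    import numpy as np
    from collections import deque

    TAU, N_IN = 3, 4
    stack = deque(maxlen=TAU)    # entries: (input pattern, TD-error of that step)
    wp = np.zeros(N_IN)          # weights of a single g-unit (illustrative)
    ETA = 0.05

    def g_update(current_input, current_td_error):
        # Widrow-Hoff step driven by the TD-error of the reference time
        # step, in place of the quadratic error with respect to a target.
        if len(stack) == TAU:
            old_x, old_e = stack[0]
            wp[:] = wp + ETA * old_e * old_x
        stack.append((current_input, current_td_error))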
6.3. Other Learning Algorithms

Besides the ones discussed in the preceding sections, there is a wider set of learning methods that could be employed for the training of networks of this kind. However, their application must be suitably assessed and analyzed,
firstly by discussing their theoretical implications. Even if it is not the goal of this paper to discuss this topic, we recognize that extending the application of non-supervised learning algorithms (such as Kohonen-like methods) to this training setting is a very interesting possibility. Another possibility is to employ genetic algorithms.
7. Experimental Tasks for Simulation Assessment

How can the viability of such an approach to the study and design of neural architectures be assessed? If, on the one hand, the assessment of the viability of a theoretical model essentially amounts to testing its predictive and/or explicative power with respect to natural phenomena, on the other hand, the assessment of its utility from a practical point of view requires it to satisfy suitable constraints and goals within a design setting; in this sense it must solve specific scientific and design problems. A relatively simple simulation task is to follow and anticipate the trajectory of an object in an abstract manner, that is, by employing the whole body of the simulated agent rather than a simulated arm. Other experimental settings can be conceived that involve the execution of both planning and navigation tasks in simulated environments. More abstract simulation tasks can involve learning and anticipation of objects such as numeric series, grammars and automata, which do not necessarily possess a material counterpart.
8. Conclusions and Future Work

The construction of neural network architectures endowed with self-representation abilities, as well as with the proactive ability to show behaviors which depend not only on their state at time t but also on their predicted state at a future time step, is a strategic lever for understanding phenomena ranging from sensory-motor control to thought and consciousness. In this explorative paper we have introduced a possible form of both anticipatory neurons and networks, which have been named "τ-mirror networks". We have proposed some learning algorithms for networks of this kind and have put forward the possibility of extending other suitable algorithms to their training. In a sense, it can be acknowledged that anticipation at the system level (i.e. what we have called the "global" notion of anticipation) can be considered as an emergent property of the interaction among the anticipatory components of the system. For this reason, the model presented in this paper provides a more
general treatment of anticipation on a fine-grained basis, thus relating anticipation to the micro-behaviors of the components of the system. Among the future developments, a possibility (which has to be evaluated in detail) is to change the form of the anticipatory neurons and networks in order to make them more biologically and physically plausible. For example, Dubois (1998) has introduced an anticipatory form of the McCulloch and Pitts neuron which is based on the notion of hyperincursion and could be employed in the construction of another kind of anticipatory network. Hyperincursive computing systems have been shown to exhibit stability properties which are highly relevant when compared to those of their non-anticipatory counterparts. Therefore, a future development could be the integration of hyperincursivity within the theoretical framework presented in this paper. Notwithstanding the aforementioned theoretical developments, the most important and critical step is the development of a more comprehensive physical theory able to describe anticipatory systems, including those of the kind introduced in this paper. We strongly believe that such an integration is necessary to understand this very important aspect of biological organization. Without such a step, indeed, it would be impossible to account for a wide range of phenomena occurring in both the biological and cognitive realms.
References
1. Baldassarre, Planning with Neural Networks and Reinforcement Learning, PhD Thesis (Computer Science Department, University of Essex, Colchester, UK, 2002).
2. Baldassarre, in Adaptive Behaviour in Anticipatory Learning Systems, Ed. Butz, Sigaud and Gérard (Springer, Berlin, 2003), pp. 179-200.
3. Dubois, in AIP Conference Proceedings 437 (1998), pp. 3-29.
4. Gallese, Fadiga, Fogassi and Rizzolatti, Brain 119, 593-609 (1996).
5. Haruno, Wolpert and Kawato, Neural Computation 13, 2201-2220 (2001).
6. Jacobs, Jordan, Nowlan and Hinton, Neural Computation 3, 79-87 (1991).
7. Jacobs, Trends in Cognitive Sciences 3(1), 31-38 (1999).
8. Jordan and Rumelhart, Cognitive Science 16, 307-354 (1992).
9. Kawato, in Neural Networks for Control, Ed. Miller III, Sutton and Werbos (MIT Press, Cambridge, MA, 1990).
10. Liberman and Whalen, Trends in Cognitive Science 4, 187-196 (2000).
11. Luppino and Rizzolatti, News in Physiological Science 15, 219-224 (2000).
12. McClelland and Rumelhart, Eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge, MA, 1986).
13. Miller III, Sutton and Werbos, Eds., Neural Networks for Control (MIT Press, Cambridge, MA, 1990).
14. Rizzolatti and Arbib, Trends in Neurosciences 21, 188-194 (1998).
15. Rizzolatti, Fogassi and Gallese, Nature Reviews Neuroscience 2, 661-670 (2001).
16. Rizzolatti and Craighero, Annual Review of Neuroscience 27, 169-192 (2004).
17. Rosen, Anticipatory Systems (Pergamon Press, 1985).
18. Rumelhart, Hinton and Williams, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Ed. McClelland and Rumelhart (MIT Press, Cambridge, MA, 1986).
19. Stolzmann, in Genetic Programming 1998: Proceedings of the Third Annual Conference (1998), pp. 658-664.
20. Sutton, in Proceedings of the Seventh International Conference on Machine Learning (Morgan Kaufmann, San Mateo, CA, 1990), pp. 216-224.
21. Sutton and Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, 1998).
22. Widrow and Hoff, IRE WESCON Convention Record 4, 96-104 (1960).
23. Wolpert and Kawato, Neural Networks 11, 1317-1329 (1998).
24. Wolpert, Ghahramani and Flanagan, Trends in Cognitive Science 5(11) (2001).
25. Wolpert, Doya and Kawato, Philosophical Transactions of the Royal Society 358, 593-602 (2003).
DECISION MAKING MODELS WITHIN INCOMPLETE INFORMATION GAMES
NATALE BONFIGLIO, SIMONE PERCIVALLE, ELIANO PESSA
Dipartimento di Psicologia, Università di Pavia
Piazza Botta 6, 27100 Pavia, Italy
E-mail: [email protected]

According to Evolutionary Game Theory, decision making in games with incomplete information should be viewed as an emergent phenomenon. However, the adoption of this framework tells us nothing about the concrete modeling of the emergence of decisions within specific games. In this paper we took into consideration the case of the Iterated Prisoner Dilemma Game (IPDG). In this regard we compared the outcomes of computer simulations of three different decision making models, two of which were implemented through particular neural network architectures, with experimental data coming from observations of the behavior of human players in IPDG. The comparison was based on the use of a Genetic Algorithm, which allowed us to find, for each kind of model, the parameter values granting the best reproduction of the observed pattern of experimental data. We found that the best fit was obtained by a model directly taking into account the inner expectancies of each player. This result suggests that the emergence of decision cannot be described by resorting to the simplest models of self-organization. More complex models are needed, including a detailed account of the operation of the player's cognitive system.

Keywords: Iterated Prisoner Dilemma Game, decision making, expectancy, neural networks, genetic algorithms.
1. Introduction

Decision making was initially studied through the so-called normative approach, describing the way in which a person should make decisions if he/she were to behave in a rational way. This approach is therefore prescriptive, in that it prescribes the principles an individual should refer to when making a rational choice. Later, because of evident failures of the predictions made by the normative approach, a second perspective, called the descriptive approach, arose, especially influencing psychologists studying thought and decision making [1]. This approach aims at building models that can describe, and thus predict, the decisional process implied in the choices individuals make, and at determining the factors that influence them. The theoretical support for the descriptive approach has been given by Evolutionary Game Theory [2,3,4,5,6,7], according to which a decision is nothing but the outcome of a process of emergence characterized by
the complex interaction of different factors, as well as of a number of contextual features, in turn connected to the history of the individual interactions which occurred within a specific game. While this perspective underlines the capital role of a theory of emergence [8,9,10] in describing decision making, it leaves unspecified the criteria to be adopted when choosing particular models of emergence in order to describe decision making processes in given game situations. In this regard, the only strategy so far available seems to be that of choosing particular decision making models on the basis of reasons of mathematical convenience alone, and of comparing their predictions with the observed behavior of human subjects engaged in a real game. This comparison should allow the individuation of the best model among the ones taken into consideration. However, while this strategy could at first sight appear natural and easy, we will show that, besides a number of implementation problems, it could be dangerous, leading to recognizing as the "best ones" some models which, in practice, are unsuited to account for the observational data. In this regard, we will consider the decision making processes occurring within the context of incomplete information games, focusing our attention on the Iterated Prisoner Dilemma Game. The latter, as is well known, belongs to the category of incomplete information games. This choice was made for two main reasons: first of all, there is a large amount of literature on it [4,11,12,13,14]; in the second place, the iterated version of the Prisoner Dilemma Game (PDG) allows a number of different equilibrium situations (while one-shot PDG is characterized by only one equilibrium situation). Moreover, the relatively simple structure of IPDG makes simpler both the building of models of players' behavior and the implementation of experiments on IPDG played by real human subjects. As regards the kind of models used to describe players' behavior, our choice fell on artificial neural networks, or on computational models equivalent to them in one way or another. This choice was suggested by a number of considerations, such as:
1. neural network models allow one to describe emergence phenomena in a simple way;
2. these models are, in a sense, "universal"; namely, a number of studies proved that most models, even of a different nature, can be reduced to some kind of neural network;
3. in principle, we could make use of these models to find relationships between their behaviors and phenomena described by more refined models of the behavior of biological neurons.
Table 1. Payoff matrix for IPDG with human subjects.

                                      Player 2
                          cooperation            defection
Player 1   cooperation    (5.000 ; 5.000)        (-25.000 ; 30.000)
           defection      (30.000 ; -25.000)     (0 ; 0)
In particular, in this paper we took into consideration three different models of this kind, two based on neural networks and one on a computational mechanism which could be expressed through a neural-network-like language. As a consequence of the foregoing discussion, the goals of the present study can be stated as follows:
• to compare the performances of three different decision making models related to IPDG, two of which implemented through neural networks and one through a neural-like computational model;
• to find which model best reproduces the behavioral trends exhibited by human players.

2. Subjects and Experimental Procedure

Before testing the models, we performed an experiment to investigate the actual interactions between human subjects playing IPDG. The experiment was carried out on 30 subject pairs, each composed of University students. The members of each pair confronted each other on an "economic" version of IPDG, the duration of which was fixed in advance at 100 moves. The latter information was not revealed to the subjects since, knowing the length of the interaction, they could transform the game into a series of single PDGs, each of which has a single equilibrium, defined by reciprocal defection [15]. The task the subjects had to face was that of pretending to be an entrepreneur who must choose whether to increase or reduce the price of an object. He could earn, lose or have a neutral outcome according to the choice made by the other player. The payoff matrix of this "economic" version is represented in Table 1.

3. The Models

As specified before, the models tested through computer simulations are three. Their underlying theories, operation and neural network structures are described in what follows.
3.1. Pessa, Montesanto and Longobardi's Model (1996)

The difference between this model [16] and classical theories lies in the impossibility of describing the individual's behavior without considering the context it refers to. Each player's strategic choice changes with time, according to the experience accumulated during the game, and mainly through the interaction with the adversary, mediated also by the players' cognitive systems. In turn, this mediation depends on three components:
• a set of schemes which guides the decision as to what move to make in the presence of particular interaction states;
• a set of motivational values;
• a motivational predisposition which influences the use of schemes in the game situation. In this regard, previous studies [17,18] have identified three kinds of predispositions: to cooperation, to competition and to individualism.
Pessa, Montesanto and Longobardi's model implies that the choice of the next move is determined on the basis of the subjective maximum utility U. The latter is computed through a sum of three products: the previous move (x is a numerical variable coding the kind of move) times the weight attributed to it (α), the value of the previous move of the adversary (y) times the weight attributed to it (β), and the previous outcome (k) times the weight attributed to it (γ). Formally:
U = \alpha x + \beta y + \gamma k

Structurally, the neural network associated with the model in question is composed of two layers with feedforward connections (see Fig. 1). The first layer, of input, is composed of three units which code the values of the previously introduced variables. The second layer, of output, is composed of two units that respectively code for cooperation and defection. The activation of the output units is computed according to a "winner-takes-all" mechanism, so that only one unit can produce an output signal. This unit is the one associated with the maximum value of the weighed sum of its entries. The activation has the value 1 if the winning unit is that of cooperation, while its value is −1 if the defection unit wins. Given that in the model in question the utility function is not fixed in time but varies as a function of the learning process activated by the previous game history, a rule of connection weight modification is introduced. This modification takes place through a process based on the production of an association between stimulus and response given by temporal proximity. This mechanism was implemented in the model through the increase in value of the connection weights of the winning unit, by means of a fixed quantity δ.
Fig. 1. Artificial Neural Network of Pessa, Montesanto and Longobardi.
Formally:
w'_{ij} = w_{ij} + \delta

Furthermore, a simultaneous decrease of the weights of the connections relative to the other output unit takes place, by means of the following law:
w'_{ij} = (w_{ij} - \delta_1)^2

where \delta_1 denotes a further parameter.
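A compact sketch of this decision rule and of the two weight-update laws above (the initial weight ranges and parameter values are illustrative assumptions, and the squared decrease law follows our reconstruction of the formula):

    import numpy as np

    rng = np.random.default_rng(2)
    # Rows: output units [cooperation, defection]; columns: weights for
    # (own previous move x, adversary's previous move y, previous outcome k).
    W = rng.uniform(0.1, 0.5, (2, 3))
    DELTA, DELTA1 = 0.05, 0.01

    def choose_and_learn(x, y, k):
        U = W @ np.array([x, y, k])          # subjective utilities, one per unit
        winner = int(np.argmax(U))           # winner-takes-all
        loser = 1 - winner
        W[winner] += DELTA                   # strengthen the winning unit
        W[loser] = (W[loser] - DELTA1) ** 2  # decrease law, as reconstructed above
        return 1 if winner == 0 else -1      # +1 = cooperation, -1 = defection

    move = choose_and_learn(x=1, y=-1, k=0)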
3.2. Bonfiglio's Model (1999)

The decisional process model put forward by Bonfiglio [19] can be considered as a development of the previous model by Pessa, Montesanto and Longobardi. Bonfiglio started from a series of hypotheses according to which every player who begins a game has expectations concerning the adversary's strategy; this expectation often concerns a game model that is highly similar to one's own, it being the only one known. Consequently, two situations may take place during the game:
1. Both players start by experiencing the effects of different moves to gain knowledge of the adversary, until one of the two, or both, adopts a strategy that is adaptable to the other; in this case, it is hypothesized that a cooperative subject should be more inclined to change his/her strategy to adapt to that of the adversary, whereas a competitive player would instead pursue his/her own goals.
2. Both reach an equilibrium already from the first moves and have no intention of trying alternative ones; they therefore keep the initial strategies unchanged.
The basic hypothesis can thus be formulated in terms of three points:
• Expectation: each individual who is in a situation of uncertainty regarding a decisional choice about what behavior to adopt will have expectations about the adversary.
• The behavioral model: each individual endowed with expectations regarding the behavior of his/her adversary begins to build a model of that behavior, supposing that the other person's model is similar to his/her own.
• Individual differences: each subject can be more or less reluctant to adapt his/her behavior to that of another. This depends on individual differences, on being more or less competitive, on personal motivations regarding a specific situation, on the goals an individual pursues and on his/her investment in the game.
Structurally, the neural network in Bonfiglio's model is more complex than the previous one. Each player is represented by a neural network, the scheme of which is shown in Fig. 2. The expectation at time t is computed on the basis of the 5 previous moves played by the player and by the adversary (hence the ten units in the input layer), through a three-layer neural network with feedforward connections. Subsequently, a comparison is carried out with the Expectation at time t − 1, previously memorized. The comparison between the two expectations, the players' moves and the payoff obtained by the player, all at t − 1, jointly determine the decision as to the actual move.
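The structure of Fig. 2 can be sketched as follows; the weights are random and untrained, and the way the comparison enters the decision is our guess at one plausible reading, not the authors' exact specification:

    import numpy as np

    rng = np.random.default_rng(3)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W1 = rng.normal(0.0, 0.3, (6, 10))   # 10 inputs: own and adversary's last 5 moves
    W2 = rng.normal(0.0, 0.3, (1, 6))    # hidden layer -> expectation unit
    W_dec = rng.normal(0.0, 0.3, 5)      # decision weights (illustrative)
    memory = {"expectation": 0.5}        # Expectation(t-1), kept in memory

    def decide(own_moves, adv_moves, payoff):
        x = np.concatenate([own_moves, adv_moves])       # last 5 moves of each player
        expectation = sigmoid(W2 @ sigmoid(W1 @ x))[0]   # Expectation(t)
        comparison = expectation - memory["expectation"] # compare with Expectation(t-1)
        # Decision from the comparison, both last moves and the payoff (rescaled).
        features = np.array([comparison, own_moves[-1], adv_moves[-1], payoff, 1.0])
        move = 1 if sigmoid(W_dec @ features) > 0.5 else 0
        memory["expectation"] = expectation
        return move                                      # 1 = cooperation, 0 = defection

    move = decide(np.ones(5), np.zeros(5), payoff=-0.25)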
Figure 2. Structure of Bonfiglio’s artificial neural network (1999).
3.3. Busemeyer and Stout's Model

This model is generally known as the Theory of Valence Expectation Learning [20,21,22]. It is an adaptation to game situations of the Decision Field Theory (DFT) put forward by Busemeyer and Townsend [23] in 1993. The DFT is a theory rooted in economics which aims at explaining the decisional process in conditions of uncertainty. The DFT is particularly interesting for its description of the dynamical processes that occur between the moment of the choice and the final decision. This dynamical formulation allows one to explain the relationship between choice probability and time; also, it helps to explain the role of the effect of time pressure on the choice probability. Within it, the temporal effects in the decisional process are accounted for by resorting to the influence of two components:
• a memory of the decision made in the immediately previous time step;
• the value attributed to the present situation, considered as stochastically fluctuating.
However, DFT cannot be directly applied to IPDG in its original form, and this led to the Busemeyer and Stout model. According to the latter, the subject integrates the gains and the losses experienced in each test into a single affective
reaction called valence. During the tests the player creates expectations relative to the valences of the possible alternatives, through a process of adaptive learning. These expectations then determine (in a probabilistic way) the choices that the subject makes at each test. The determination of the valence associated with the choice of a particular alternative A in the t-th test can be obtained through the following rule:
v_A(t) = w \times P_A(t) + (1 - w) \times G_A(t)

where P_A(t) is the loss following choice A in the t-th test (of course expressed by a negative number), G_A(t) is the gain following choice A in the t-th test, and w is a parameter called attention, whose value lies in the interval between 0 and 1 and which determines how sensitive the subject is to losses. The actual determination of the expectation relative to the value obtainable from choice A in the t-th test can be obtained through the following rule:
Asp[v_A(t)] = a \times v_A(t) + (1 - a) \times Asp[v_A(t - 1)]

In it, a stands for a parameter that fixes the speed of learning; this too lies between 0 and 1. If the value of a is too close to 1, there are strong and rapid changes, but also fast forgetting; opposite effects are obtained if a has a value too close to 0. Finally, the probability of choosing the alternative A in the (t + 1)-th test can be obtained through the following rule:
Pr_A(t + 1) = \frac{e^{s(t) \times Asp[v_A(t)]}}{\sum_k e^{s(t) \times Asp[v_k(t)]}}
Here, s(t) denotes a sensitivity parameter. If the value of s(t) is very low, then the choice becomes practically random, while if the value is high, the alternative with the highest expectation will be chosen in a deterministic way. It is further supposed that sensitivity increases with experience, according to the following law:
s(t) = \left( \frac{t}{10} \right)^c
where c is a further parameter. The DFT has been applied to IPDG by using:
• a different model of decision for each player, with different values for w, a and c;
• different initial expectations for each player;
• a probabilistic choice procedure;
• two possible types of choice: to speak or not to speak (cooperation or defection);
• the values attributed to each choice are modified on the basis of the payoffs obtained in each move.
Busemeyer and Stout's model has not been implemented through a neural network, but simply through computational rules. However, it can be shown that its effect could be reproduced by the operation of a suitable equivalent neural network [24]. For this reason this model can be considered as "neural-like".

4. Simulations

As already said, to find the best model of IPDG we compared the sequences of outcomes of games played by neural network pairs – or Busemeyer and Stout pairs – with the ones occurring in games played by human subjects. This comparison required a suitable coding of game outcomes. In this regard each game was described by two vectors, each with 100 components, coding the moves made by each single player in the different game time steps. The components of each vector were of a binary nature, their value being 0 when coding a defection and 1 when coding a cooperation. The comparison between the games played according to a given model and those played by human subjects was based on a single index C, in turn computed starting from two Bravais-Pearson correlation indices. The formula chosen for computing C was:
C = \frac{(1 + \rho_1)(1 + \rho_2)}{4}
where \rho_1 denotes the Bravais-Pearson correlation coefficient between the series of moves played, within each players' pair, by the first human player and the series played by the first neural network or Busemeyer-Stout player, while \rho_2 denotes the analogous correlation coefficient between the moves played by the second human player and those played by the second neural network, or Busemeyer-Stout, player.
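Computing C for one pair of games is straightforward; the sketch below assumes SciPy for the Bravais-Pearson coefficients and uses random move sequences as stand-ins for real data.

    import numpy as np
    from scipy.stats import pearsonr

    def similarity_C(human1, model1, human2, model2):
        """Index C built from the two Bravais-Pearson coefficients rho_1, rho_2."""
        rho1, _ = pearsonr(human1, model1)   # first human vs first artificial player
        rho2, _ = pearsonr(human2, model2)   # second human vs second artificial player
        return (1.0 + rho1) * (1.0 + rho2) / 4.0   # C = 1 only for two perfect matches

    rng = np.random.default_rng(5)
    # Stand-in move series: 100 moves each, 0 = defection, 1 = cooperation.
    h1, m1 = rng.integers(0, 2, 100), rng.integers(0, 2, 100)
    h2, m2 = rng.integers(0, 2, 100), rng.integers(0, 2, 100)
    print(similarity_C(h1, m1, h2, m2))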
In this regard we remark that the behavioral models taken into consideration in this paper are all rather complex and contain many parameters. The main problem to solve is that of finding the correct values of the parameters granting a good reproduction of specific games played by human beings. If this problem admitted a solution, at least for a specific model, then we could reasonably assume that the latter accounts for human decision making processes in IPDG. On the contrary, if we should recognize that such a solution does not exist, this would mean that none of the previous models can be used to explain the behavior of human players. However, it must be taken into account that, owing to the complexity of the models themselves, it is very difficult to find the correct values of the model parameters, if any, by resorting to traditional methods. Therefore we tried to solve the problem through the use of Genetic Algorithms (GA). As is well known, GA [25,26] are adaptive methods which can be used to solve search and optimization problems. They are based on computational rules inspired by the genetic processes occurring in biological organisms. By imitating these processes, genetic algorithms can find solutions to real-life problems, if adequately coded. They work with a population of individuals, each of which represents a possible solution to the problem. Each individual is associated with an adaptation score, the fitness score, according to whether or not it is an adequate solution to the problem. The best individuals can reproduce, cross-breeding with other individuals of the population. This produces new descendants who share some characteristics of each parent. The least fit individuals are less likely to reproduce, and therefore die out. A whole new population of possible solutions is thus produced by the selection of the best individuals of the current generation, who, reproducing among themselves, produce a new group of individuals. This new generation contains a higher proportion of the characteristics of the fit individuals of the previous generation. In such a way, after several generations, the good characteristics spread through the whole population, in that they are mixed and exchanged with other good characteristics. By favouring reproduction among the fittest individuals, the most promising areas of the search space can be explored. If the GA has been well built, the population will converge towards an optimal solution to the problem. Each individual is also called a chromosome, and the parameters are called genes. In this paper, each parameter is coded through a real number. As previously described, the GA needs a fitness function, which returns the fitness score of the individual considered. In our case, this function is given by the maximum value, over all games played by human subjects, of the correlation index C previously introduced. The operations carried out on chromosomes concern cross-over and mutation: the first takes place through a random cut, only on individuals whose fitness is superior to a threshold; the second consists of a random increase or reduction of a suitable percentage of single gene values. The fitness threshold under which individuals were eliminated was equal to 90% of the maximum fitness value registered in the whole population at time t.
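A minimal real-coded GA consistent with this description might look as follows; the stand-in fitness function replaces the expensive simulation of IPDG games against the recorded human data, and all numeric settings are illustrative.

    import numpy as np

    rng = np.random.default_rng(6)
    POP, N_GENES, GENS = 40, 3, 50

    def fitness(params):
        # Stand-in for: simulate a model with these parameters against all
        # recorded human games and return the maximum index C (assumption).
        return 1.0 / (1.0 + np.sum((params - 0.3) ** 2))

    pop = rng.uniform(0.0, 1.0, (POP, N_GENES))   # real-coded chromosomes
    for gen in range(GENS):
        fit = np.array([fitness(ind) for ind in pop])
        keep = pop[fit >= 0.9 * fit.max()]        # eliminate below 90% of the best
        children = []
        while len(children) < POP:
            p1, p2 = keep[rng.integers(len(keep), size=2)]
            cut = rng.integers(1, N_GENES)        # cross-over at a random cut
            child = np.concatenate([p1[:cut], p2[cut:]])
            j = rng.integers(N_GENES)             # mutate one gene by a small
            child[j] *= 1 + rng.uniform(-0.1, 0.1)  # random percentage
            children.append(child)
        pop = np.array(children)

    best = pop[np.argmax([fitness(ind) for ind in pop])]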
Table 2. Average and maximum fitness values of each model.

MODEL                                      AVERAGE FITNESS   MAXIMUM FITNESS
Pessa, Montesanto and Longobardi (1996)    0.3901961         0.4591469
Bonfiglio (1999)                           0.4116290         0.4901118
Busemeyer and Stout (2002)                 0.3884554         0.4618777

5. Simulation Outcomes

The values coming from the comparisons between the games played by human subject pairs and by artificial player pairs, as obtained from the best individuals found after applying the GA to the models previously described, are shown in Table 2. As can be seen, none of the models was able to perfectly reproduce at least one of the games played by human subjects.

6. Conclusions

The fact that none of the models taken into consideration can perfectly reproduce at least one of the games played by human subjects shows that a modeling of the emergence of decisions, as postulated by Evolutionary Game Theory, is not feasible through simple modeling means. Cheap emergence is not possible! However, the models' different abilities to reproduce the games must be highlighted. In particular, as Table 2 shows, the best performances occur in the neural network models (Bonfiglio; Pessa, Montesanto and Longobardi) rather than in the Busemeyer-Stout model. The data seem to suggest (even if in a very weak way) that, going towards a higher complexity of networks, related to the need to take into account subjects' expectations, it would be possible to reproduce the observed subjects' behavior in a better way. Such a strategy, however, implies that, in order to account in a realistic way for the emergence of decisions in incomplete information games, we should resort to a detailed description of the operation of the whole cognitive system of each player, absent from the models taken into consideration. This, in turn, implies again that the hope of describing cognitive emergence by resorting only to simple general principles, expressed by simple computational rules, is vain. Such a
circumstance, on the other hand, is nothing but a consequence of the failure experienced by actual Cognitive Science when trying to describe cognitive processing as an emergent phenomenon. New conceptual and technical tools are needed to make progress in this domain.

References
1. H.A. Simon, The Quarterly Journal of Economics 69, 99-118 (1955).
2. R.M. May, Stability and Complexity in Model Ecosystems (Princeton University Press, Princeton, NJ, 1974).
3. J. Maynard Smith, Evolution and the Theory of Games (Cambridge University Press, Cambridge, UK, 1982).
4. R. Axelrod, The Evolution of Cooperation (Basic Books, New York, 1984).
5. E. Akiyama, K. Kaneko, Physica D 147, 221-258 (2000).
6. H. Gintis, Game Theory Evolving (Princeton University Press, Princeton, NJ, 2000).
7. E. Akiyama, K. Kaneko, Physica D 167, 36-71 (2002).
8. J.P. Crutchfield, Physica D 75, 11-54 (1994).
9. J.H. Holland, Emergence: From Chaos to Order (Perseus Books, Cambridge, MA, 1998).
10. G. Minati, E. Pessa, Collective Beings (Springer, Berlin, 2006).
11. R. Axelrod, in L. Davis, Ed., Genetic Algorithms and Simulated Annealing (Morgan Kaufmann, Los Altos, CA, 1987), pp. 32-41.
12. W. Poundstone, Prisoner's Dilemma (Doubleday, New York, 1992).
13. J. Andreoni, J.H. Miller, The Economic Journal 103, 570-585 (1993).
14. R. Axelrod, The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration (Princeton University Press, Princeton, NJ, 1997).
15. R.D. Luce, H. Raiffa, Games and Decisions (Wiley, New York, 1957).
16. E. Pessa, A. Montesanto, M. Longobardi, in E. Pessa, M.P. Penna, A. Montesanto, Eds., 3rd Systems Science European Congress (Kappa, Roma, 1996), pp. 1017-1021.
17. D.M. Kuhlmann, A.F.J. Marshello, Journal of Personality and Social Psychology 32, 922-931 (1975).
18. G.P. Knight, S. Kagan, Developmental Psychology 17, 783-790 (1981).
19. N.S. Bonfiglio, General Psychology 2 (1999).
20. J.R. Busemeyer, I.J. Myung, Journal of Experimental Psychology: General 121, 177-194 (1992).
21. I. Erev, A.E. Roth, American Economic Review 88, 848-881 (1998).
22. J.R. Busemeyer, J.C. Stout, Psychological Assessment 14, 253-262 (2002).
23. J.R. Busemeyer, J.T. Townsend, Psychological Review 100, 432-459 (1993).
24. R.M. Roe, J.R. Busemeyer, J.T. Townsend, Psychological Review 108, 370-392 (2001).
25. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, MA, 1989).
26. S.N. Sivanandam, S.N. Deepa, Introduction to Genetic Algorithms (Springer, Berlin, 2007).
EMERGENCE IN MEDICINE
BURNOUT AND JOB ENGAGEMENT IN EMERGENCY AND INTENSIVE CARE NURSES
PIERGIORGIO ARGENTERO, BIANCA DELL'OLIVO
Department of Psychology, University of Pavia
Piazza Botta 6, 27100 Pavia, Italy
E-mail: [email protected]

The burnout phenomenon emerges from a constellation of factors which cannot be described in terms of cause-effect relationships. This study investigated levels of burnout in nurses working in Critical Care Units with a systemic approach, giving evidence of the relation between nursing staff burnout and psychosocial workplace factors. The purpose of this study was to examine the relationship between job burnout in emergency and intensive care nurses and specific areas of work life in their organizations, using Maslach and Leiter's work life model [23]. A cross-sectional survey was designed using the Italian version of the "Organizational Checkup System" in a sample of 180 Italian nurses. Results showed that high burnout levels were strongly related to high demands, low control, low fairness, lack of social support, and individual disagreement on values in the workplace. High professional efficacy levels were instead correlated with professional reward and leadership involvement. The article concludes by suggesting possible areas for intervention in order to prevent job burnout and build job engagement.

Keywords: critical care nursing staff, job burnout, emergence, systems.
1. Introduction

Emergence is a fundamental feature of complex systems and can be thought of as a new characteristic or behavior which appears due to non-linear interactions within systems. Chaotic behavior in systems research has been observed within many fields of science, including biology, physics and the social sciences. These newer developments of complexity theories made use of systems theories as well as theories of chaos and complex adaptive systems. Interrelated agents, self-organization, evolution towards chaos and constant evolution are further characteristics of complex adaptive systems. Complex systems can exhibit multilevel feedback behavior. Complexity theory's goal is the examination of a whole system and its environment. For example, within social systems, health care organizations are complex adaptive systems and, as such, they share common characteristics with
these systems [25]. Emergent characteristics are dynamic in nature [12]: this means that the emergent properties of a social system require continuous attention [17]. Thus, to provide an explanation for the complexity of the problems in health organizations, there are a number of different investigative routes: from an individual's behavior, through the behavior of a whole group, workplaces and societies, to that of the whole ecosystem. Furthermore, studies on complexity use chaos theory to suggest that organizations should strive continuously to be at the edge of chaos, because living in a state of continuous chaos is costly.
Working in emergency (not to be confused with 'emergence' as quoted above) and intensive healthcare departments is often chaotic, and for this reason the staff is not always able to deal with the difficulties they may face during their shifts. This often leads to feelings of loss of control. Intensive and emergency care units are stressful workplaces, both for physicians and for nurses [11]. Many emergency events, such as a patient with a severe trauma or a patient with cardiac arrest, are difficult to manage and stressful for the staff. A further source of stress is the need to make important management decisions in a very short time. Stress associated with chaotic working conditions in hospital environments has been identified as a major cause of burnout among health care staff.
Recent work on burnout has developed new theoretical frameworks that more explicitly integrate both individual and situational factors. This job-person fit approach would seem to be an appropriate framework for understanding burnout. In particular, Maslach and Leiter [23] formulated a model that focuses on the degree of match, or mismatch, between the person and his or her job environment. The greater the gap, or mismatch, between the person and the job, the greater the likelihood of burnout; conversely, the greater the match, the greater the likelihood of engagement with work. One new aspect of this approach is that the mismatch focus is on the enduring working relationship people have with their job.
Leiter and Maslach [22] describe burnout as a psychological syndrome of Exhaustion, Cynicism and Inefficacy which is experienced in response to chronic job stressors. The first dimension, Exhaustion, measures fatigue without referring to other people as the source of one's tiredness. The second dimension, Cynicism, reflects indifference or a distant attitude towards work in general, not necessarily towards other people. Finally, Professional Efficacy encompasses both occupational and individual accomplishments. High scores on Exhaustion and Cynicism, and low scores on Professional Efficacy, are indicative of burnout.
Job engagement is assumed to be the positive antipode of burnout. As Maslach and Leiter [23] affirmed, Energy, Involvement and Efficacy are the direct opposites of the three dimensions of burnout. In their view, burnout is an erosion of Engagement, whereby Energy turns into Exhaustion, Involvement turns into Cynicism, and Efficacy turns into Ineffectiveness [23]. According to Maslach and Leiter [23], job engagement is assessed by the opposite pattern of scores on the three burnout dimensions: that is, low scores on Exhaustion and Cynicism, and high scores on Efficacy, are indicative of job engagement.
Maslach and Leiter's model has brought order to the wide variety of situational correlates by proposing six areas of work life that encompass the central relationships with burnout: Workload, Control, Reward, Community, Fairness, and Values. Workload refers to the relationship of work demands with time and resources. Increasing workload is consistently related to higher levels of emotional exhaustion [23]. Control incorporates role clarity within the organization, which provides a clear understanding of expectations and responsibilities and leads to efficiency, decreased stress and autonomy in the work setting [20]. Reward refers to the actual time invested in the work setting and the recognition associated with it. Community includes the quality of social relationships within the organization. Fairness in the workplace involves trust, openness and respect and, to some extent, the quality of managerial support [19]. Finally, Values refers to the congruence between the organization's expectations and those of its employees [23]. Maslach and Leiter [23] argued that low levels in each of these areas of work life reflect a job-person mismatch that, if it persists, may result in low levels of engagement [22]. According to this model, a mismatch in one or more of these areas of work life can result in burnout [22].
Some previous research found that lower levels of burnout were associated with staff nurses' perceived autonomy and control over practice [18], while high levels of emotional exhaustion among nurses were related to time constraints and work overload [15,16,2]. Leiter and Maslach [22] examined the effect of the six areas of work life on burnout and found that the mismatched areas of work life most strongly related to emotional exhaustion were workload, fairness, and control. Cynicism was
most strongly related to a mismatch between values and fairness. Personal efficacy was most strongly related to control and values.
Moreover, the Maslach and Leiter model introduces four further dimensions which the authors assume to be correlated with burnout, that is, Change and three management processes: Leadership, Skill Development and Group Cohesion. The Change dimension is composed of 10 items that explore the perception of changes inside the organization, for example regarding the quality of leadership and the cooperation among health care team members. The three management processes are composed of 13 items. Leadership refers to the opinion about the management and the quality of communication inside the organization. Skill Development refers to the opportunities for professional development that the organization offers. Finally, Group Cohesion regards group identity and work group cohesion.
This model, beyond offering a theoretical framework for the study and understanding of burnout, provides direction for improving work environments. According to Maslach and Leiter [23], as mismatches are interrelated, action on any one area will tend to improve at least some of the other ones. Addressing areas of mismatch fosters work engagement, thereby decreasing or preventing burnout.

2. Objective and Hypothesis

The purpose of this study is to investigate the relationship between work-life perceptions and burnout levels in nurses working in Emergency and Intensive Health Care Units. It is clear that at this stage it is not possible to propose a complete systemic model of the emergence of burnout phenomena. The first goal is therefore that of individuating the factors which can influence the behavior of subjects and whose synergy makes burnout itself emergent. This preliminary individuation is here based on the use of typically linear tools. In any case, this preliminary stage will be followed by a further stage in which it will be possible to check the nonlinearity of the interactions produced by the factors quoted above. In particular, the objectives of the study are to examine:
1. the differences between nurses operating within four different Intensive Health Care Departments in relation to job burnout and some work life areas;
2. the relationships between the three burnout dimensions and several areas of work life, organizational changes and management processes.
Leiter and Harvie [19] also suggested that it is realistic to expect that when nurses feel that they have reasonable workloads, control over their work,
adequate rewarding, positive working relationships and values congruent with organizational values, they are less likely to experience burnout. Thus, in agreement with previous research [23,20,15,16,2,22], we can formulate the following hypotheses:
1. Emotional exhaustion is related to workload, working control and fairness;
2. Cynicism is related to values, fairness and group cohesion;
3. Professional efficacy is related to values, rewarding and leadership involvement.

3. Methods

3.1. Design

A cross-sectional correlational study was carried out in four Emergency and Intensive Health Care Departments of an Italian hospital. Approval for the use of human subjects was obtained from the Hospital Technical-Scientific Committee of Bioethics. The nurses of each health care unit were informed of the aim of this study, stressing the opportunity to identify the critical work-life areas that needed enhancement so as to improve workplace quality and prevent burnout. In the introductory letter, the confidential and voluntary nature of participation in the research was asserted. All the research participants were asked to fill in the informed consent form attached to the questionnaire. They were also reassured about anonymity and about the exclusive use of their data for the present research.

3.2. Participants

Nurses (N = 180) working full-time or part-time in four intensive health care units of a hospital in Northern Italy (Emergency Care, General Intensive Care, Post-Surgical Intensive Care and Coronary Intensive Care Units) were involved in this research.

3.3. Instrument

Job burnout was assessed using the Italian version [7] of the Organizational Check up Survey (OCS) [23], designed to measure job burnout/engagement and workers' evaluations of some aspects of their organizational context. This questionnaire is constituted by 4 Scales and 68 items, measuring subjects' relation with their workplace and the areas of working life. The first
Scale is the Italian version of the Maslach Burnout Inventory - General Survey (MBI-GS) [26]. The MBI-GS includes 16 items distributed among three subscales: Exhaustion (five items; e.g., 'I feel used up at the end of a work day'); Cynicism (five items; e.g., 'I doubt the significance of my work'); Professional Efficacy (six items; e.g., 'I can effectively solve the problems that arise in my work'). All items were scored on a seven-point frequency rating scale ranging from 0 ('never') to 6 ('always'). High scores on Exhaustion and Cynicism and low scores on Professional Efficacy are indicative of burnout. The second Scale, designed to investigate the Work Life Areas, is the Italian version of the Areas of Work Life Survey [23]. This scale is composed of 29 items, distributed among 6 sub-scales, which measure the six areas of working life. The six areas (Workload, Control, Reward, Community, Fairness and Values) are distributed on a continuum (low-high). The third Scale is composed of 10 items that measure the perception of organizational Changes that have occurred. Finally, the fourth Scale is composed of 13 items that investigate Management Processes, that is: Leadership, which refers to the opinion about one's own superiors; Skill Development, that is, the possibilities of professional development that the organization offers; and Work Group Cohesion. For all statements except the Change items, the Scales were scored from 1 to 5: 1 = I disagree very much; 2 = I disagree; 3 = difficult to decide; 4 = I agree; 5 = I agree very much. As for Change, the participants were asked to estimate the quality of the changes that had occurred inside the organization using a 5-point Likert scale ranging from 1 to 5: 1 = very negative change, 2 = negative change, 3 = no change at all, 4 = positive change, 5 = very positive change. After obtaining written consent from the nurses, and before providing them with the Organizational Check up Survey, they were asked to fill in a questionnaire about their sociodemographic data (gender, age, marital status, children) and working conditions (permanent employment, seniority in the hospital and weekly working hours). Nurses rated each item and reported their feelings about their job using a 7-point Likert scale ranging from 0 to 6 for the first OCS Scale, while a 5-point Likert scale ranging from 1 to 5 was used for the other three Scales. Each nurse of the four health care units was rated according to the three burnout dimensions of Emotional Exhaustion, Cynicism and Professional Efficacy.
3.4. Data analyses

Data analysis was carried out by means of the "Statistical Package for the Social Sciences" (SPSS), version 13.01. The following statistical analyses were performed on the data collected:
1. descriptive analysis, to calculate the frequency distribution and percentages of categorical variables, and the mean values and standard deviations of continuous variables;
2. analysis of variance, using the socio-demographic characteristics of workers and the Health Care Units as independent variables, and the questionnaire items regarding job burnout dimensions and work life areas as dependent variables; Tukey's multiple comparison post hoc analysis was used to compare each nurse group to the others;
3. correlational analysis (Pearson's r), in order to determine the correlations between the four OCS Scales;
4. rank order correlation analysis (Spearman's rho) between the three mean burnout measures and the work environment variables for each Health Care Unit.
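Although the original analyses were run in SPSS, the same four steps can be sketched with SciPy on toy data (the distributions and sample sizes below are illustrative assumptions, not the study's data; tukey_hsd requires a recent SciPy).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    # Toy Emotional Exhaustion scores for two of the four units.
    unit_a = rng.normal(18.9, 4.5, 30)
    unit_b = rng.normal(14.8, 7.3, 30)

    # 1. Descriptive analysis: means and standard deviations.
    print(unit_a.mean(), unit_a.std(ddof=1))
    # 2. One-way analysis of variance across units, with a Tukey HSD
    #    post hoc comparison between the groups.
    F, p = stats.f_oneway(unit_a, unit_b)
    posthoc = stats.tukey_hsd(unit_a, unit_b)
    # 3. Pearson correlation between two questionnaire scales.
    r, p_r = stats.pearsonr(rng.normal(size=30), rng.normal(size=30))
    # 4. Spearman rank correlation between burnout and work-life measures.
    rho, p_rho = stats.spearmanr(rng.normal(size=30), rng.normal(size=30))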
Table 1. Means (M) and standard deviations (SD) of the Organizational Check-up Survey dimensions by Health Care Unit, shown as M (SD), with the ANOVA F and p values and the Italian norms.

OCS dimension          Post-Surgical IC  General IC     Emergency Care  Coronary IC    Total          Italian Norms      F      p
Emotional Exhaustion   18.90 (4.48)      16.18 (4.87)   15.63 (4.55)    14.75 (7.35)   16.37 (5.31)   18.53 (7.35)    6.56   <.01
Cynicism               22.70 (4.77)      16.65 (5.47)   17.57 (5.90)    13.22 (6.70)   17.54 (5.71)   21.55 (6.65)    6.45   <.01
Professional Efficacy  20.60 (4.51)      22.27 (4.85)   22.06 (4.71)    27.00 (6.52)   22.98 (5.15)   25.57 (6.97)   10.34   <.01
Workload               20.25 (5.48)      20.41 (5.67)   20.71 (5.00)    20.22 (5.34)   20.40 (5.37)   17.73 (4.60)    2.27   ns
Control                 9.26 (2.38)      10.74 (2.44)   10.00 (2.90)    10.42 (3.36)   10.11 (2.77)    9.98 (2.40)    3.34   <.05
Reward                  9.17 (2.93)      10.83 (3.86)   10.66 (3.56)    12.79 (3.80)   10.86 (3.54)   12.27 (3.54)    6.62   <.01
Community              15.71 (4.63)      16.58 (4.66)   17.79 (4.18)    17.46 (4.79)   16.89 (4.57)   15.92 (4.19)    2.48   ns
Fairness               15.23 (5.82)      14.82 (5.52)   15.73 (5.57)    15.78 (5.72)   15.39 (5.66)   14.98 (4.66)    2.46   ns
Values                 10.93 (4.33)      11.33 (4.97)   11.90 (4.63)    13.57 (4.68)   11.93 (4.65)   11.78 (3.28)    5.92   <.05
Change                 28.39 (7.85)      30.21 (7.85)   32.25 (8.87)    33.50 (6.57)   31.09 (7.78)   29.07 (5.66)    4.97   <.05
Leadership             15.18 (5.16)      18.12 (4.91)   17.69 (5.28)    22.92 (4.60)   18.48 (4.99)   17.87 (5.82)   14.57   <.01
Skill Development       8.15 (3.38)      10.97 (3.16)   11.53 (3.24)    13.18 (3.64)   10.96 (3.36)   11.96 (3.50)   13.54   <.01
Group Cohesion          8.80 (2.20)       9.47 (2.03)    8.92 (2.61)    10.34 (2.25)    9.38 (2.27)    9.65 (2.61)    5.14   <.01

Note: ns = not significant.
As regards gender, the data indicated that female nurses tended to report higher Cynicism scores than males, but this difference did not reach statistical significance. In order to compare the four Critical Care Units (Emergency Care, General Intensive Care, Post-Surgical Intensive Care and Coronary Intensive Care) in relation to burnout and the Areas of Work Life, Table 1 shows the mean values and standard deviations of the thirteen Organizational Check-up Survey dimensions. Table 1 also reports the F and p values, together with the Italian norms.
Consistent with previous results, Emotional Exhaustion and Cynicism were included as indicators of the burnout variable, whereas Professional Efficacy was included as an indicator of the engagement variable. The analysis of variance revealed statistically significant differences for most dimensions. Nurses working in the Post-Surgical Intensive Care Unit reported a significantly higher degree of burnout than nurses working in the Coronary Intensive Care Unit (p < .01). As for Emotional Exhaustion, nurses in the Post-Surgical Intensive Care Unit registered the highest level (M = 18.90; SD = 4.48), with a statistically significant difference (F = 6.56; p < .01) compared with all other nurses; the Coronary Intensive Care nurses reported the lowest Emotional Exhaustion scores (M = 14.75; SD = 7.35). Tukey's multiple comparison post hoc analysis revealed significant differences (p < .01) between the Post-Surgical Intensive Care ward and the Coronary Intensive Care ward. A similar pattern, with significant differences between the groups of nurses (p < .01), was also noted for the other two burnout dimensions, Cynicism and Professional Efficacy. For Cynicism, the highest score (above the Italian normative values) was observed in the Post-Surgical Intensive Care Unit (M = 22.70; SD = 4.77), while the lowest belonged to the Coronary Intensive Care Unit (M = 13.22; SD = 6.70), a statistically significant difference (F = 6.45; p < .01). As regards the Professional Efficacy dimension, most health care units obtained scores below the Italian norms; by contrast, the Coronary Intensive Care Unit obtained a high score (M = 27.00; SD = 6.52), and a Tukey post hoc test showed significant differences (p < .01) between the Coronary Intensive Care Unit and all the other Health Care Units. Considering overall burnout together with the other dimensions of working life, we can assert that the nurses participating in this study have a fairly balanced profile, but that they suffer from a lack of job reward and insufficient prospects of organizational and personal development. Our results show that the main risk factors for the onset and development of burnout are an unsustainable workload, reflected in high workload scores in all four health care units, and poor professional reward. In terms of the work setting, workload or work pressure is highest in the Emergency Department compared with the other Health Care Units (in agreement with Adali and
Priami [1]); however, the analysis of variance between the four groups of nurses did not produce any statistically significant difference on this dimension. More than 80% of the participants felt that their level of autonomy was average or higher compared with the Italian norms. Almost 80% of the subjects displayed average or higher levels of satisfaction with the Leadership dimension. These results also agree with Bell [6]. In order to verify the hypothesized relationships between the three burnout dimensions, as proposed by Maslach and Leiter [23], and the other ten dimensions of the Organizational Check-up Survey (OCS) (the six Areas of Work Life, perceived organizational Change and the three Management Processes), a correlation analysis (Pearson's r) was conducted among the thirteen OCS scales, as shown in Table 2. As regards the Emotional Exhaustion dimension, Table 2 shows that burnout was associated with Control, Fairness, Values, Leadership and Group Cohesion, but not with Workload. Thus our first hypothesis is only partially confirmed: contrary to what Leiter and Maslach [23] supposed, our results did not find a significant positive correlation between Workload and Emotional Exhaustion. Our data indicate that higher Emotional Exhaustion is negatively correlated with the sense of work Control (r = -.27, p < .01), with Fairness (r = -.34, p < .01), with Values (r = -.33, p < .01), with Leadership (r = -.28, p < .01) and with Group Cohesion (r = -.29, p < .01). Higher Cynicism is negatively correlated with Values (r = -.22, p < .01), with Leadership (r = -.32, p < .01) and with Group Cohesion (r = -.42, p < .01). Higher Professional Efficacy, on the other hand, is positively correlated with Reward (r = .29, p < .01), with Fairness (r = .26, p < .01), with Values (r = .27, p < .01), with Leadership (r = .40, p < .01) and with Group Cohesion (r = .26, p < .01). These data support hypothesis 3 and partially support hypothesis 2; our results contrast with Leiter and Maslach [23], who assumed that a lack of Fairness exacerbates burnout, producing Exhaustion and deepening Cynicism about the workplace. Our results demonstrated a significant negative correlation (p < .01) between Fairness and Emotional Exhaustion, but not between Fairness and Cynicism. The rank-order correlations between the mean burnout measures and the work environment variables for each health care unit were calculated using Spearman's Rho. The significant rank-order correlations are illustrated in Figures 1, 2 and 3.
Table 2. Correlation coefficients (Pearson's r) between the 13 Organizational Check-up Survey Scales (lower triangle).

OCS dimension                1       2       3       4       5       6       7       8       9      10      11      12
 1 Emotional Exhaustion      -
 2 Cynicism                .122      -
 3 Professional Efficacy  -.398#  -.298#      -
 4 Workload                .072    .114   -.073       -
 5 Control                -.286#  -.085    .192*   -.021      -
 6 Reward                 -.171*  -.107    .290#   -.017    .074      -
 7 Community              -.055   -.038    .108    -.065    .141    .201*     -
 8 Fairness               -.337#  -.007    .256#   -.033    .091    .179*   .187*     -
 9 Values                 -.325#  -.220#   .273#    .142    .140    .241#   .140    .359#     -
10 Change                 -.247#  -.065    .153     .049    .115    .377#   .192*   .360#   .309#     -
11 Leadership             -.281#  -.322#   .400#   -.038    .138    .404#   .076    .250#   .218*   .338#     -
12 Skill Development      -.058   -.227#   .165     .024   -.021    .230#   .072    .176*   .207*   .363#   .425#     -
13 Group Cohesion         -.288#  -.424#   .264#   -.003    .152    .271#   .025    .311#   .289#   .383#   .531#   .378#

Note: # p < .01; * p < .05.
Figures 1, 2 and 3 show dispersion graphs that respectively illustrate the correlation between: the nurses' Emotional Exhaustion and their perception of Values across the four health care units (Rho = -.81; p < .01); the nurses' Cynicism and their opinion of Leadership (Rho = -.95; p < .01); and the nurses' Professional Efficacy and their Reward (Rho = .97; p < .01). These graphs indicate that high levels of Emotional Exhaustion and Cynicism correspond, respectively, to low levels of perceived Values and to low ratings of Leadership, and vice versa; moreover, high Professional Efficacy levels are associated with higher levels of perceived Reward.
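Each dispersion graph of this kind plots one point per Health Care Unit: the unit's mean score on a burnout dimension against its mean score on a work-environment variable. A minimal matplotlib sketch, using the Emotional Exhaustion and Values unit means from Table 1, might look as follows; it illustrates the plotting method only and is not expected to reproduce the exact coefficients reported above.

import matplotlib.pyplot as plt
from scipy.stats import spearmanr

units = ["Post-Surgical IC", "General IC", "Emergency", "Coronary IC"]
values_mean = [10.93, 11.33, 11.90, 13.57]       # Values means, Table 1
exhaustion_mean = [18.90, 16.18, 15.63, 14.75]   # Emotional Exhaustion means

rho, p = spearmanr(values_mean, exhaustion_mean)  # rank-order correlation
plt.scatter(values_mean, exhaustion_mean)
for name, x, y in zip(units, values_mean, exhaustion_mean):
    plt.annotate(name, (x, y))
plt.xlabel("Values")
plt.ylabel("Emotional Exhaustion")
plt.title("Spearman Rho = %.2f (p = %.2f)" % (rho, p))
plt.show()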
[Figure 1: dispersion graph of the four Health Care Units' mean scores, Values (x-axis, approx. 11-14) against Emotional Exhaustion (y-axis, approx. 14-19); Post-Surgical Intensive Care lies at the high-exhaustion, low-values end and Coronary Intensive Care at the low-exhaustion, high-values end.]
Figure 1. Correlation between the nurses' ratings of Emotional Exhaustion and Values.
5. Discussion
The main results of this study indicate that a lack of social support from co-workers (Group Cohesion) and from one's Leadership can be associated with both Emotional Exhaustion and Cynicism. Health care personnel with a high sense of Community, adequate Leadership support and personal Values consistent with organizational values seem less likely to experience Emotional Exhaustion. The importance of adequate social support was expounded in previous studies [8]. This study also shows that lack of support from one's co-workers and from leadership seems to contribute to burnout, increasing Cynicism and reducing Professional Efficacy. However, the most critical dimensions in all the Health Care Units considered are Workload and Reward. These results are consistent with those obtained by other researchers, who identified lack of work Control, heavy Workload and low Reward as probably the strongest determinants of Emotional Exhaustion [15]. One seemingly contradictory finding of our study is that while the Reward dimension is positively correlated with Professional Efficacy and negatively correlated with Emotional Exhaustion, Workload shows no relationship with the other dimensions. In fact, workload is perceived as high in all our Health Care
[Figure 2: dispersion graph of the four Health Care Units' mean scores, Leadership (x-axis, approx. 15-23) against Cynicism (y-axis, approx. 12-24); Post-Surgical Intensive Care shows the highest Cynicism and lowest Leadership ratings, Coronary Intensive Care the opposite.]
Figure 2. Correlation between the nurses' ratings of Cynicism and perceived Leadership.
Units, including those where it showed no relation at all to the nurses' level of exhaustion or professional fulfilment. This finding suggests that the onset of burnout in staff nurses cannot be explained by Workload alone. Hard physical work, patient typology, or other psychosocial variables can limit social support from one's colleagues and Leadership. As argued by Schaufeli and Enzmann [27], patient-related stressors seem to be less correlated with burnout than stressors such as a high workload under pressure, role ambiguity, role conflicts and decreasing autonomy or loss of control. The lower burnout level found in the Coronary Intensive Care Unit, in the presence of a heavy workload but high levels of reward, consistency of values and a positive opinion of one's leadership, points to an important aspect of one's work: it might reflect the variability in the form of social support [14], identifiable as a moderator between work-life stressors and burnout. The purpose of this study was to provide insight into the experience of nurses in Emergency departments with job burnout and work-life related stress. Understanding this experience will allow others (such as hospital management) to become aware of the negative effects that burnout can have upon staff morale and patient care.
[Figure 3: dispersion graph of the four Health Care Units' mean scores, Reward (x-axis, approx. 9-13) against Professional Efficacy (y-axis, approx. 20-27); Coronary Intensive Care shows the highest Professional Efficacy and Reward, Post-Surgical Intensive Care the lowest.]
Figure 3. Correlation between the nurses' ratings of Professional Efficacy and perceived Reward.
Intensive Care and Emergency department nurses face various work-related stressors as a result of the hectic and chaotic environment in which they work. These nurses must be adept at handling multiple tasks simultaneously in order to meet the demands arising from an increased volume of patients and high patient acuity, as well as a shortage of personnel. These circumstances often lead to situations where Intensive Care and Emergency department nurses provide care in less than desirable conditions, such as overloaded Emergency departments where there are not enough nurses to deliver therapeutic care to the patients. It is essential that Intensive Care and Emergency department nurses maintain a sense of order in what is happening around them, so that they can provide health care to their patients while preserving their own ability to react quickly. A feeling of loss of control was present in nurses who felt they could not properly manage the workplace environment and were unable to meet patient care goals. Some nurses described a sense of pressure and helplessness as they faced situations they could not handle. Others described their loss of control as the result of frequent encounters with an increased volume of patients and high patient acuity, which decreased the amount of time devoted to the care of patients. As a result, feeling out of balance with the work environment led some nurses to doubt their care-
giving abilities. In line with a systemic approach, these nurses' experiences of job burnout have a negative impact not only on their ability to provide proper care to patients but also on the whole health system. The consequences of such experiences can turn into feelings of cynicism and a perceived lack of time. The burden of a heavy workload plays a significant role in nurses' uncaring actions toward patients: inadequate time to spend with patients leads to increased frustration, which affects the quality of care provided. Some limitations of this study should be noted. First, this research investigated the experience of Intensive Care and Emergency department nurses with burnout/engagement; however, the findings cannot be generalized to the population of all Intensive Care and Emergency departments, because only a small number of nurses participated in the present research and they worked in a single hospital in Northern Italy. These nurses' experiences may differ from those of other Intensive Care and Emergency department nurses in Italy and in other countries, and similar studies performed on other populations may yield different results. Likewise, the work-life areas and management processes related to burnout faced by other Intensive Care and Emergency department nurses may not resemble those described here. Moreover, the number of individuals in the different nurse groups is unequal, especially regarding gender; this issue is mainly relevant because the low number of male nurses makes it impossible to establish, for instance, whether Cynicism is gender related. In addition, because this study is based on self-reported cross-sectional data, inferences about causality are less feasible. Despite these limitations, the results still provide a meaningful contribution to burnout research, yielding valuable new information about the role that work-life areas and management processes play in burnout. These results should encourage further investigations of burnout in health care organizations. For instance, quantitative studies investigating the risk of burnout among other health care providers could also be conducted. Such studies could be useful in predicting the risk and growth of burnout, which could support the development of policies and programs that help health providers prevent this phenomenon. By maintaining the well-being of their care operators, health care systems can maintain the well-being of their patients: health care systems can ensure the well-being of the entire system if they prevent burnout in their health workers and improve engagement. The systems themselves may have a specific state of mind or mood, which Work Psychology defines as
"organizational climate". The climate of an organization is both cause and effect of the state of individuals' moods. A disintegrated climate is evidence of a general situation that can lead to diffuse burnout across the system. Therefore, the monitoring and improvement of the organizational climate could be useful tools for preventing burnout. In line with the findings of the research literature, a focus on the job environment, as well as on the persons working in it, is essential for dealing with burnout. This suggests that the most effective form of intervention combines changes in managerial practice with training interventions. Coping with work overload alone is not enough for an effective intervention, and focusing on other kinds of mismatch may be more successful. For example, in line with Leiter and Maslach [21], we found that Coronary Intensive Care nurses may be able to bear greater workloads, probably because their work values are consistent with the organizational values and they feel well rewarded for their efforts. This suggests that an organizational intervention to improve engagement should target the Values and Reward areas. As argued by other authors [21,29], the advantage of a combined intervention (managerial approach plus training) is that it aims to build engagement in the workplace. In conclusion, the data that emerged from the present research confirm the need to measure the overall organizational variables in order to reach a systemic understanding of the burnout phenomenon. In particular, our results confirm that chaotic and unforeseeable job situations can be a cause of stress and that, in turn, stress and workplace climate can be causes of burnout. A systemic vision of the burnout phenomenon helps in understanding the emergence of problems and in promoting operators' engagement. Further studies will provide the empirical basis for introducing more complete systemic models of burnout, building on the factors identified by this first study.

References
1. E. Adali and M. Priami, ICUs and Nursing Web J. 11, 1-19 (2002).
2. L.H. Aiken, S.P. Clarke, D.M. Sloan, J. Sochalski and J.H. Silber, J. of the American Medical Association 288(16), 1987-1993 (2000).
3. I. Altun, Nursing Ethics 9(3), 269-278 (2002). 4. A.B. Bakker, E. Demerouti and M.C. Euwema, Journal of Occupational Health Psychology 10(2), 170-180 (2005).
5. A.B. Bakker, W.B. Schaufeli, E. Demerouti, P.M.P. Janssen, R. Van der Hulst and J. Brouwer, Anxiety Stress Coping 13, 247-268 (2000).
6. R. Bell, M. Davison and D. Sefcik, J. of the American Academy of Physician Assistants 15(3), 40-56 (2002).
7. L. Borgogni, D. Galati, L. Petitta and Centro Formazione Schweitzer, Manuale dell’adattamento Italiano (O.S. Organizzazioni Speciali, Firenze, 2004).
8. R. Bourbonnais, M. Comeau and M. Vezina, J. of Occupational Health Psychology, 4, 95–107 (1999).
9. J. Cho, H.K. Laschinger and C. Wong, Canadian J. of Nursing Leadership 19(3), 43-60 (2006).
10. E. Demerouti, A.B. Bakker, J. De Jonge, P.P.M. Janssen and W.B. Schaufeli, J. of Work and Environment and Health 27, 279-286 (2001).
11. L. Goh, P.A. Cameron and P. Mark, Emergency Medicine Australasia 11(4), 250-257 (1999).
12. J. Goldstein, Emergence 1(1), 49-72 (1999).
13. E.R. Greenglass, R.J. Burke and L. Fiksenbaum, J. of Community & Applied Social Psychology 11(3), 211-215 (2001).
14. J.R.B. Halbesleben and M.R. Buckley, J. of Management 30, 859-879 (2004).
15. P. Janssen, J. Jonge and A. Bakker, J. of Advanced Nursing 29, 1360-1369 (1999).
16. P.P.M. Janssen, W.B. Schaufeli and I. Houkes, Work and Stress 13, 74-86 (1999).
17. S. Johnson, Emergence: The Connected Lives of Ants, Brains, Cities, and Software (Addison-Wesley, New York, 2001).
18. H.K.S. Laschinger, P. Greco and C. Wong, Nursing Leadership 19(4), 41-56 (2006).
19. M.P. Leiter and P.L. Harvie, International J. of Social Psychiatry 42, 90-101 (1996).
20. M.P. Leiter and C. Maslach, J. of Health and Human Service Administration, 472-489 (1999).
21. M.P. Leiter and C. Maslach, Preventing Burnout and Building Engagement: A Complete Program for Organizational Renewal (Jossey-Bass, San Francisco, CA, 2000).
22. M.P. Leiter and C. Maslach, in Research in Occupational Stress and Well Being, Ed. P.L. Perrewe and D.C. Ganster (JAI Press/Elsevier Science, Oxford, UK, 2004).
23. C. Maslach and M.P. Leiter, The Truth About Burnout: How Organizations Cause Personal Stress and What to Do About It (Jossey-Bass, San Francisco, CA, 1997).
24. C. Maslach, W.B. Schaufeli and M.P. Leiter, Annual Review of Psychology 52, 397-422 (2001).
25. R.R. McDaniel and D.J. Driebe, Advances in Health Care Management 2, 11-36 (2001).
26. W.B. Schaufeli, M.P. Leiter, C. Maslach and S.E. Jackson, in The Maslach Burnout Inventory: Test Manual, 3rd ed., Ed. C. Maslach, S.E. Jackson and M.P. Leiter (Consulting Psychologists Press, Palo Alto, CA, 1996), pp. 22-26.
27. W. Schaufeli and D. Enzmann, The Burnout Companion to Study and Practice: A Critical Analysis (Taylor & Francis, London, 1998).
28. M. Siegall and T. McDonald, Personnel Review 33, 291-301 (2004).
29. L. Vickman, Medical Group Management J. 47(1), 18-21 (2000).
30. W. Zhu, Z.M. Wang, M.Z. Wang, Y.J. Lan and S.Y. Wu, Journal of Sichuan University. Medical Science Edition 37(4), 632-663 (2006).
THE “IMPLICIT” ETHICS OF A SYSTEMIC APPROACH TO THE MEDICAL PRAXIS
ALBERTO RICCIUTI
AIRS - Associazione Italiana per la Ricerca sui Sistemi and Attivecomeprima-Onlus (Breast Cancer Association) Via Livigno 3 – 20158 Milano, Italy E-mail: [email protected]
The unstoppable acceleration of the scientific and technological development that has been revolutionizing our socio-economic systems in recent years has made the critical aspects and the inadequacy of medical epistemology more and more evident. Several elements have underlined the insufficiency of traditional ethical points of reference in Medicine: the change in individual needs, the technical possibility of long-term management of severe diseases, the change in social and health systems caused by the interaction of different ethnic groups and cultures, and several problems linked to the fair distribution of resources under the regime of fiscal scarcity involving all the industrialized countries of the world. This has made it necessary for Medicine to modify its coordinates, centring them on the person rather than on the disease. In order to reach this objective, the author proposes Systemics as the epistemological guide of the knowledge process, one that can make the scientific method operate within an ethical and cultural horizon centred on the valorization of the human being, on respect for his/her needs and on respect for his/her environment. A systemic approach to medical thought allows the clinical gaze to be re-oriented from the biological to the biographical; the aim of medical intervention to be redefined as the restoration and support of the self-organizing and self-regulating processes of the biological system; savings in social and health expenditure to be achieved through greater appropriateness of prescription and the inherent preventive value of medical interventions; and new and larger horizons to be opened for the development of scientific research.
Keywords: medical ethics, bioethics, Systemics, medical epistemology, Systems theory.
1. Introduction
When a civilization perceives that its system of values, which for centuries has ensured its stability and provided the coordinates with which it regulated the organization of the social system and planned its future, is falling apart, the need for ethical reflection emerges, and such reflection inevitably involves many fields of knowledge and action. This is what has been happening at exponential speed in recent decades, deeply involving Medicine and related fields.
For example, the fact that Amartya K. Sen was awarded the 1998 Nobel Prize in Economic Sciences for an economic theory that considers the degree of individual "happiness" a fundamental criterion for evaluating the appropriateness and effectiveness of economic policies inevitably leads us to reflect on how, and how much, the image that individuals have of themselves, of their needs and of their role in the world is changing today. We are facing a radically new conception of Economics, in strong synergy with Ethics, exactly where Sen writes that "to the value of wealth, which always remains a basic element of the market, must be added happiness, which is a different concept from wealth. A person is richer than another when he/she is happier and has reached a better quality of life". The link between these issues and the problems that Medicine has to face today is evident. These problems are linked to the changed needs of the individual, to the radical change in the social and health system determined by the interaction between different ethnic groups and cultures, and to conflicts over the fair distribution of resources under the regime of fiscal scarcity involving nearly all the industrialized countries of the world. All of this raises further important problems, not only related to health, between the North and the South of the world, which we can only mention here. In short, the old normative Aristotelian conception of Ethics, based on the idea that the individual's task is to identify correct and universal criteria for living a good life (from which a moralistic type of ethics derives), is not only no longer sufficient: it is proving wrong, because it is unsuitable for guiding the overwhelming process of individual, social and cultural change of which we are witnesses.

2. On the threshold of change
Today, the way all of us feel our life and our role in the world is changing not only deeply but also extremely quickly; consequently, the hierarchical relationships between individual, professional and social roles are changing too. The essential difference, compared with past centuries, lies in the role that scientific-technological development plays today in its relationships with our daily life, with scientific research, and finally with our health and our illness. Once, scientific knowledge produced technology. Today this process is largely inverted: new technical possibilities induce and allow the production of new knowledge (as has long been, and still is, the case in cosmology, thanks to the Hubble space telescope, or in neurophysiology, thanks to sophisticated NMR techniques, and so on). In the last century, as never
happened before, scientific and technological knowledge (today the two can no longer be separated) disrupted our way of thinking about health and illness, as well as the physician-patient relationship. The whirling acceleration of the possibilities offered by ever more sophisticated technology is even upsetting our perception of time. The difference in scale between the natural time of biological evolution, which in past centuries gave stability to daily life and trust in future projects, and the drugged time of our "live life", to which we are becoming accustomed through the technical possibilities of communication and information manipulation, is worryingly conditioning our ability to perceive events in their process structure (including the biological events that concern health and illness). In other words, we are losing the sense of the historical dimension of events, concerning not only our present but also our past and future: the past that today's young people are losing, because they perceive it as increasingly disconnected from a present whose sense they find hard to understand, and the future that they see as more and more uncertain, harder and harder to plan, and frightening to face. The contradictory and conflictual aspects we have to deal with are at least two. The first is that scientific-technological development is perhaps at once the cause of the problem and the most important, indeed unavoidable, tool for reaching its solution; the second is that the acceleration of these changes has already far outstripped our ability to metabolize them, at both the individual and the collective level (Schiavone, 2007) [8]. Redesigning the criteria for the use of scientific-technological instruments, reconstructing the relationships between generations along a line of historical continuity and, above all, redefining the sense and role of a new Ethics is a huge task. This work demands planning ability, time (exactly the time that is growing shorter and running away) and full awareness of being involved in a process that is fascinating and unstoppable, but also overwhelming if we do not succeed in governing it, because we have already passed well beyond the threshold of reversibility, if one ever existed.

3. Searching for a new epistemology
"Our civilization has led us, through the last dizzying stretch of its walk, to the extreme edge of a threshold beyond which a passage full of risks but also of extraordinary opportunities awaits us", Aldo Schiavone writes, hypothesizing that we may be on the threshold of a "new humanism" rather than destined to succumb.
The change, despite ourselves, has already begun, and undertaking this walk is something like jumping onto a moving car. The priority appears to be the need to change our coordinates of thought in order to govern the evolutionary process of our contemporary world and not be overwhelmed by it. Changing our coordinates does not mean throwing away experience or knowledge already acquired, but reconsidering them according to a different logical order, one that can at least lead us to a new exploitation of the contents of the experience and knowledge that have allowed us to arrive here. To describe the web of relationships and interdependences that characterizes our time, for good and ill, we speak more and more often of the complexity and non-linearity of its constituent processes. To move within complexity, we need to integrate the paradigm of explanation, which inspires our current scientific method, with that of understanding. And to understand the world and its constituent dynamics, we need to assume an orientation of thought that allows us, as a precious instrument of knowledge for acting ethically, to bring out the net of relationships between the parts that constitute every complex system. To define this cultural orientation, Heinz von Foerster, one of its most important exponents, proposed the term Systemics; not a new "discipline", but the epistemological orientation that has taken shape through the interaction of extremely valuable contributions, beginning with the formulation of the General Systems Theory of Ludwig von Bertalanffy (1968) [1]. Kuhn affirms that the passage from one paradigm to another happens thanks to scientists who have one foot in tradition and the other in the future. In order to respect the past on one side and to turn to the future on the other, our proposal is to assume the evolution of systemic thought as a fundamental epistemological operation of the knowledge process (Telfener and Casadio, 2003) [9]. The strategic approach we believe most suitable for managing the great problems we have to face is to succeed in systematically applying a methodology of study, research and work characterized by the constant and fruitful interaction of the scientific method with systemic thought. There should be two complementary ways to see and to think, two ways of observing and thinking - [...] Heinz von Foerster said [...] - One is
the way proper to science, which comes from sci, "I divide", and which also proposes a definite methodology (currently too often written with its "S" in capital letters); and the second is a complementary way of thinking and observing, the systemic one, which comes from sun, "I put together", so that the different parts together form a whole. [...] I propose to consider systemics as a position, a way of observing, a cognitive attitude. To make a distinction is scientific; to see the complementarity is systemic. In this view the two concepts overlap through the mutual definition of themselves and each other. It is not necessary to choose one approach or the other; we must use the two approaches simultaneously in order to achieve greater depth. From what has been said so far, it is evident how much the new cultural horizon taking shape through the openness and richness of contents of this new epistemology can constitute an opportunity, and a reference of extraordinary value, for giving a solid theoretical structure to the evolutionary change of our medicine in the humanized direction that both patients and physicians have long wished for.

4. Systemics and humanistic ethics
The methodological premise for developing the cognitive attitude characterizing the epistemology of complexity that we have indicated as Systemics is the observer's awareness of being part of the observed phenomenon, and of the fact that the simple act of observing a phenomenon modifies it. In other words, it is the awareness that there is no world "external" to us, of which we are not part and which we can observe and study aseptically, as an object detached from us. This position of the observer, involved in the very phenomenon he or she observes, is what emerged from the revolution that shook physics in the first thirty years of the last century. If quantum and relativistic physics had not been born, our medicine today would not have sophisticated instruments such as CAT, NMR and PET. On the other hand, medicine has always followed physics at a distance. It happened when Newton gave a strong physical justification to the new conception of the world that emerged from the Copernican revolution, and the image of the universe took shape as a great mechanical device governed by strict physical laws, in which every event is scanned by a linear time and is observable from the outside by an uninvolved observer. This cognitive attitude was soon
adopted by the physicians of the time, who in practice still studied on the texts of Galen ("to do as Newton did" was the fashionable motto). The leap for the medicine of that time was great, and the road undertaken has led, along the path we all know, to the extraordinary development of contemporary technological medicine. The following great revolution in physics, during the first decades of the twentieth century, set in motion, within the complex political and socio-economic dynamics of the past century, the exponential scientific-technological acceleration mentioned above, which allowed a further evolution of medicine's technical possibilities and strengthened the image of a "science" that takes care of diseases in order to cure them. In other words, the change of paradigm that emerged with quantum and relativistic physics, and the new position of the observer involved in the observed world, have not changed the epistemological approach of medicine. The technological instruments that the evolution of physics made possible have certainly entered medicine, and abundantly so, but medicine still uses them according to the "old paradigm" (if I may put it this way), focused on a reified disease treated as if it were the failure of a mechanical device, and not on the person who lives that condition in an existential dimension far more complex than medicine can grasp with an epistemological approach inherited from more than three centuries ago, one which medicine itself now perceives as inadequate. And here too, it is useful to point out, we find, at least in large part, the problems raised by the new technical possibilities of medicine, which have highlighted the increasing inadequacy of its theoretical approach and its difficulties in ethically justifying its actions; these difficulties emerge especially in the many conflicting situations generated and made possible by the application of its technological instruments, with the technical possibilities of long-term management of disabling diseases, assisted fertilization and genetic manipulation. But Systemics, as a cultural orientation that opens to dialogue, to interaction, to strategies of action more cooperative than competitive, and to awareness of the recursive unity between observer and observed, is the bearer of an "implicit" ethics, an ethics of respect. A humanistic ethics, which does not act in accordance with "values" (an "ethics of principles": a moralistic and ideological ethics that imposes its values on others because it perceives them as universally valid, and which for this reason has been, and still is, responsible for many human conflicts over the centuries), but with the constant tension of the search
for adequacy between means and aims, along with a regular assessment of the consequences of its actions. An ethics that can overcome the critical point of the relationship between means and aims (even a bad means can lead to a good aim), which the "ethics of responsibility" of Max Weber, for all its value, was unable to resolve, probably because it was still based on a conception of the observer as separate from the observed reality. On the contrary, when the observer is aware of being an inseparable part of reality and able to modify the observed, and when he/she acts consistently, thinking in terms of systems, the ethical implication can be grasped wherever it lies, that is, as Wittgenstein wrote, within the logic of the action itself. Ethical behavior, in a systemic perspective, is therefore not something to "teach" or "explain"; it is not realized through the application of requirements and prohibitions, but emerges from action consistent with the epistemological orientation of the acting subject. It is an embedded ethics (Varela, 1992) [10]. Von Foerster says: [...] I try to adhere to this rule, and to be sufficiently skilled in the use of language to make ethics implicit in every speech I make (in science, philosophy, epistemology, therapy, etc.). To make this happen, I intend to make language and action flow in the underground river of ethics, so that they are not bounced out. This allows ethics not to become explicit and language not to degenerate into moralism. But, von Foerster continues: [...] How is it possible to hide ethics from everybody, yet at the same time let it determine language and actions? Fortunately, ethics has two sisters that allow it to remain hidden. They create a beautiful and visible setting for us, a tangible fabric within which we can weave the texture of our lives. And who are these two sisters? One is metaphysics, the other is dialectics. My intention is to talk about these two ladies and about how they can ensure that ethics becomes evident without becoming explicit. And, as Telfener and Casadio (2003) [9] clarify (this quotation comes from them), "von Foerster means by the term metaphysics the epistemological choice that everyone makes, that is, the lens through which we choose to observe ('the need to choose in respect to decisions which are in principle impossible to take'), and he means by the term dialectics language, the dialogical dance, and the use
of language in the relationship with ourselves, with the other, and with the community". If, under the still prevailing "old paradigm", the reflection on ethical problems that began at least a couple of decades ago can in itself only be considered a positive step, on the other hand Ethics, viewed through the technical-scientific epistemological approach, has quickly taken on the characteristics of a new "discipline" managed by experts. Ethical reflection, which should have a fundamental value in the evolution of individual and collective consciousness, especially in the multi-ethnic society in which we live, with its many complex problems, has instead been confined to commissions and ethics committees (Rocca, 2004) [7]. Even the drafting of informed consent, extremely important and timely in itself, has taken on the connotations of a bureaucratic set of documents that must be present in the patient's case history and duly signed, most of the time without the patient having understood its meaning, let alone its content, and perceiving it erroneously as a mere document of protection for physicians. It is therefore clear that the "old paradigm" can no longer suggest proper strategies for managing the complex problems that arise from individual and collective consciousness and from the deep changes in the economic and social systems of our time. Not only scientists require a new epistemology, and consequently a new ethics; all of us require it, because we are all cultured people, in the sense that every individual has his/her own culture to cultivate and through which to educate his/her children, his/her own system of beliefs and values that must be able to coexist with others in the search for a possible mutual enrichment. That is why the search for a new epistemology and the consequent ethical reflection concern all individuals and all fields of human competence. Medicine is only one of the activities of human beings; and certainly, for the complex problems it has to deal with, regarding the human being in his/her situations of suffering and the people who live close to him/her, it is the systemic activity par excellence, steeped in ethical problems, because it deeply and simultaneously involves the "biological ego" and the "biographical ego" of two people, the patient and the physician, who meet in order to travel a stretch of their history together. It is evident how central the education of the person first, and of the professional afterwards, is for the emergence of an ethical sensitivity that can saturate every word, every choice, every act. Only then does a diagnostic or therapeutic indication cease to be a simple prescription and become a medical act.
5. Systemics and Medicine
But, one might ask, what changes does the choice of this epistemological orientation determine in medical practice? Simply moving the coordinates of medicine from the disease to the person changes the perspective through which clinical judgement is delivered, the contents of communication between patient and physician, the modulation of diagnostic choices and treatment strategies, and the prescriptive style of the physician, whose training and professional education are greatly enriched. The ethics implicit in the systemic attitude becomes a natural fertilizer of the relationship between patient and physician, and it allows more appropriate choices, better understood and shared by the patient.
The clinical gaze is re-directed from the biological to the biographical
The criticism most often levelled at medicine is essentially the following: medicine takes care of the biological body as if it were a container of the disease, the disease being considered a failure of that mechanism; it devotes little attention to the person, to personal needs, to the patient's system of values, and in practice none of this enters the clinical judgement expressed by the physician. We must recognize that we cannot leave this problem behind for good unless we put the biological and the biographical together in a "theoretical" sense. And this unity must belong to medical epistemology: the physician should have, as a solid reference, a theory which holds that in turning to the biological body we are, instantly and simultaneously, turning to the biographical body, the latter being nothing other than the biological body contextualized, that is, living in its space-time, in its emotional, personal and social dimension, a key node of the network of relationships in which it is immersed and which it has itself helped to create (Ricciuti, 2005) [4]. In practice, this means a medicine centred on the person and not on the disease, with all the consequences this implies from the ethical point of view. It is not a matter of classifying or "forcing" the clinical case into a predetermined nosography (the diagnosis of the "disease"), followed by the automatic prescription of a therapy codified by "guidelines" (which must remain what they are: a useful reference tool). It is a re-appropriation by the physician of his/her knowledge and professionalism, of his/her capacity, thinking in terms of "processes" and highlighting their relations, to understand more deeply the meaning of the biological transformations under way in the patient and to identify
the most effective tools for re-orienting the process towards a renewed self-organizing capacity, and for involving the patient more actively in changing his/her lifestyle if necessary. It is a way for the physician to make his/her own the ethical imperative of Heinz von Foerster: "Always act in order to increase the total number of choices".
The aim of the physician's action is the re-establishment of the autonomy of the self-regulating processes
In a medicine centred on the disease, the aim of the physician, as we were taught, is the treatment of the disease. But when the fundamental axis of medical epistemology is oriented in a systemic way, medical praxis is centred on the person. The aim of the physician's intervention is no longer the cure of the disease, but the re-establishment and support of the autonomous self-organizing, self-repairing and self-defensive capacities of the individual as a complex system; recovery belongs to the field of possible outcomes, and in this richer cultural context it is more easily attainable and more durable. Then, hopefully, when the physician faces an irreversible pathological process, the arrogant judgement "Nothing more can be done" will never again be pronounced, a judgement that sounds like a condemnation without appeal for the patient and reveals the physician's frustration at having failed in his/her mandate as healer. On the contrary, if "medicine" (from the Latin mederi, to heal, to treat) is a competent aid, offered in solidarity to the person who is suffering, there is always something to be done to help him/her live his/her time in the best way. Because, as Umberto Galimberti says, no one dies because he/she falls ill; we fall ill because we die. This cognitive attitude implies that the physician is aware of his/her own humanity in the relationship with the patient and uses it as a means of treatment. For if it is true that from a technical-scientific point of view the physician gives the patient what he/she has and what he/she knows, from a human and communicative point of view he/she gives the patient what he/she is; and an inadequate intervention from this point of view can be no less devastating than a technically incorrect one (Ricciuti, 2006) [5].
Greater appropriateness of prescription and the preventive value of medical action reduce public spending
A systemically oriented clinical gaze leads to a redefinition of the concept of appropriateness of prescription (which is closely linked to the physician's freedom of prescription): a prescription is appropriate when it takes into account not only the technical and scientific data, but also the historical context in which the disease has occurred and must be treated, namely the context of the individual's personal history. We are not talking about the search for the best therapy to treat the disease as an abstract nosological entity, but for the best therapy to treat that person, in a state of suffering, within his/her cultural context, life and values. In doing so, we often discover that the best therapy is not always the most expensive one. A systemic approach to the patient's health problems also has an intrinsic preventive value, because it is strategically oriented towards restoring the best possible degree of autonomy of the biological system, which will thus tend to fall ill less often. And this can only result in a further reduction of socio-health spending. Unfortunately, however, today it happens that whenever the physician expresses a clinical judgement centred on the person which conflicts with the "statistically based" indications of scientific knowledge about the disease the patient suffers from, he/she does so, lacking an appropriate medical epistemology, at his/her own risk, and may even be considered ethically incorrect. Through this cheating use of ethics, a prejudice is laid upon him/her that brands his/her behavior as morally reprehensible; a prejudice with the flavor of an excommunication. A systemic epistemology of medicine, on the contrary, also implies a renewed concept of the physician's freedom of prescription. Certainly this is not the freedom to prescribe eccentric therapies (there would be very little that is systemic in such behavior), but the freedom of prescription that emerges from an assessment that is not only scientifically grounded, but also calibrated on the care of the patient's personal needs, resources, system of values and emotional context. The physician should be free to calibrate his/her treatments on these considerations whenever he/she finds it appropriate or necessary, in agreement with his/her patient. Naturally, any therapeutic tool chosen must have the best possible degree of validation in relation to the purpose for which it is used.
A wider horizon for scientific research
A systemic epistemology, finally, is the most solid theoretical justification for opening scientific research to the vast field of studies that awaits it: research focused on supporting life and not just on fighting disease; on producing strategies and tools that, by improving the quality of life of the person and of the environment in which he/she lives, allow greater well-being for the individual and for society, and a real reduction (not always and only a reduction pretentiously induced by drugs) of the many diseases that we now merely "fight" with increasingly expensive tools and technologies. Diseases that are mostly closely related to the lifestyle that has brought us this far. When it comes to ethics and medicine, our thoughts generally run to the conflict situations that sometimes arise when we cross the border regions of entry into and exit from life. These are enormous problems, with regard to which, for all that has been said so far, systemic thinking can surely help to identify the right mediations. But there are other aspects affecting everyday medical praxis that should be rethought and revised. Today medicine has very powerful tools that often make it possible to live for a long time with serious and disabling diseases which, until a few decades ago, allowed only a short time of life. So far, however, we have not been sufficiently concerned with supporting the quality of life of these patients, often made problematic by the long-term side effects of the treatments themselves (Ricciuti, 2006) [6]. Calibrating the physician's action to better support the quality of life of the person, in health and in disease, implicitly means enriching the content of the care relationship, improving the physical and psychological well-being of the individual, expanding the horizons of medicine and reducing the related social cost. In this direction a greater commitment to scientific research is desirable: research oriented to life, and not just to disease; research aimed not only at studying the pathological processes characterizing the disease, but also at the processes that allow the body to preserve and enhance its self-regulating, self-repairing and self-defensive properties. These are the processes that define, from the systemic point of view, the autopoietic properties of the living body that we are, that is, a body whose biological activity consists precisely in the network of production processes of the very components that constitute it (Maturana and Varela, 1973) [3]. Much knowledge of these processes already exists in the literature, but its practical effect in daily medicine is far less than it deserves.
And this, we believe, is due to an essentially cultural fact that we have already abundantly emphasized: the orientation of clinical medicine towards the treatment of the disease and not towards the care of the person. Such knowledge, however, is valuable and should be further strengthened; not only because it has a preventive value in itself, with all that this can achieve in terms of reduced morbidity and mortality for many diseases, but also because it allows better support of those processes that are deeply disturbed by many of the often "heavy" therapies available today, which, compared with the past, permit the long-term management of disabling illnesses such as diabetes, cancer and autoimmune diseases, and of complex situations of multiple diseases in the same individual. The production of therapeutic tools addressed to the support of life processes (ad hoc drugs, biological supports and supplements "dedicated" to specific typologies of metabolic change induced by treatments that are certainly necessary but that strongly disturb the biological homeostatic processes in the long term, to the point of inducing, in some cases, other diseases, and so on) can greatly contribute to creating, through concrete acts, a cognitive orientation of research and a clinical practice whose strategic objectives are not only to add years to life, but also life to years. These, in brief, are the issues and the path that have led us, and medicine, to the edge of a change that now seems irreversible, as in every other sphere of human life in our time. And in medicine the signals are strong and clear, from both patients and physicians (Attivecomeprima-Onlus, AIOM Foundation, 2007) [2]. If medicine does not have the courage to welcome these demands for a constructive and profound reflection on these questions, and to re-orient its epistemological coordinates in this direction, politics too will be in great difficulty, and perhaps unable to act in the most effective ways to enable a reorganization of health care paths focused on the person and not on the disease.

References
1. L. von Bertalanffy, Teoria generale dei sistemi (Mondadori, Milano, 1971) (orig. ed.: General System Theory, Braziller, New York, 1968).
2. Attivecomeprima-Onlus, Fondazione AIOM, Eds., Quando il medico diventa paziente. “Progetto Chirone”: la prima indagine in Italia sui medici che vivono o hanno vissuto l’esperienza del cancro (Franco Angeli, Milano, 2007).
3. H.R. Maturana and F.J. Varela, Autopoiesi e Cognizione (Marsilio, Venezia, 1985) (orig. ed.: Autopoiesis and Cognition, Reidel, Dordrecht, 1980).
4. A. Ricciuti, in Medicina/Medicine - Le cure “altre” in una società che cambia. Salute e Società, Anno IV, 3/2005, Ed. D. Secondulfo (Franco Angeli, Milano, 2005), pp. 49-66.
5. A. Ricciuti, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa, M. Abram (Springer, New York, 2006), pp. 133-146.
6. A. Ricciuti, La terapia di supporto di medicina generale in chemioterapia oncologica (Franco Angeli, Milano, 2006).
7. B. Rocca, “Bioetica e diritto”, intervista a Mariachiara Tallacchini, L’Ospedale Maggiore (2004).
8. A. Schiavone, Storia e destino (Einaudi, Torino, 2007).
9. U. Telfener and L. Casadio, Sistemica (Bollati Boringhieri, Torino, 2003).
10. F.J. Varela, Un know-how per l’etica (Laterza, Bari, 1992).
POST TRAUMATIC STRESS DISORDER IN EMERGENCY WORKERS: RISK FACTORS AND TREATMENT
PIERGIORGIO ARGENTERO, BIANCA DELL’OLIVO, ILARIA SETTI
Department of Psychology, University of Pavia, Piazza Botta 6, 27100 Pavia, Italy. E-mail: [email protected]

Post-traumatic stress disorder (PTSD) is an emergent phenomenon resulting from exposure to a traumatic event that causes actual or threatened death or injury and produces intense fear, helplessness, or horror. In order to assess the role of the different factors contributing to this kind of emergent phenomenon, prevalence rates across genders, cultures, and samples exposed to different traumas are examined. Risk factors for PTSD, including pre-existing individual factors, features of the traumatic event, and post-trauma interventions, are examined as well. Several characteristics of the trauma related to cognitions, post-trauma social support and therapeutic interventions for PTSD are also considered. Further work is needed to analyze the inter-relationships among these factors and their underlying mechanisms. The chaotic nature of traumatic processes and the multiple, interactive impacts of traumatic events require a comprehensive perspective aimed at planning effective interventions. Treatment outcome studies recommend the combined use of training and therapies as the first-line treatment for PTSD.

Keywords: post-traumatic stress disorder, risk factors, interventions.
1. Introduction

This paper reviews the literature dealing with a particular kind of emergent phenomenon occurring in emergency workers: post-traumatic stress disorder. Despite the possible (and amusing) confusion between ‘emergence’ and ‘emergency’, PTSD appears to be a kind of emergence that is very difficult to study. So far, it is possible only to identify the main factors contributing to it, while avoiding the search for possible cause-effect relationships which, in the context of an emergent phenomenon, would be devoid of any sense. In this regard, research has traditionally focused on the development of physical and mental health symptoms in people who experienced trauma directly, but has overlooked the impact of traumatic situations on emergency personnel [58]. In fact, the association between mental health and occupational factors among emergency workers has not been thoroughly investigated [1]. The topic of emergency personnel’s well-being may be considered in a systemic perspective because of its links with the
helpers’ personality features and the events they experience. In fact, the effects on workers’ physical and mental health largely depend both on their personal characteristics and previous experiences, and on the intensity and duration of the trauma. Therefore, when considering the health consequences for emergency workers, the more general features of their personality and the characteristics of specific traumatic events need to be investigated together: this is the systemic perspective upon which this study has been structured.

A number of different professionals belong to the occupational category of “emergency workers”, including ambulance personnel, firefighters, paramedics, nurses, policemen and trauma workers (e.g. psychiatrists): they must cope with a variety of duty-related stressors and for this reason are often exposed to traumatic stress while helping people in emergency situations [41]. Despite the fact that emotional disturbance after a traumatic event has been recognized in the literature, and emergency workers have been identified as a “high-risk” occupational group, there is a lack of data concerning the impact of psychological problems among them [6]. As a result of being involved in mass emergencies or critical incidents, disaster relief workers may develop various symptoms, both physical (such as sleeping problems, headache and stomach ache) and mental (such as depression, anxiety and great worry), which can reach significant levels of intensity and require proper treatment. The onset of these symptoms is due to the potentially traumatizing situations with which emergency personnel are required to deal. Some of those rated as most stressful are: accidents involving children, cot death, mass incidents, major fires, road traffic accidents, burns patients, dead-on-arrival cases, violent episodes and murder scenes [76,21].

Emergency workers represent a population at high risk for post-traumatic stress disorder (PTSD) as a result of their daily work. Despite this, most PTSD research has focused on primary traumatic stress disorder and has partially neglected the symptoms associated with the secondary traumatic stress experienced by emergency workers [29], even though PTSD in this population has been the focus of considerable interest in recent years [39]. In fact, not only the victims but also the helpers can develop psychological and physical symptoms after a traumatic event; Selley [69] in particular argues that health workers, as well as primary victims, are at high risk of developing PTSD [41]. Given the repeated daily exposure to human suffering and death, trauma workers are constantly exposed to traumatic stress and have been shown to be vulnerable to the development of post-traumatic stress symptomatology [41]. Since at present little is known about the variables that might eventuate in post-traumatic stress disorder in high-risk occupational groups [5], it is important to deepen our
understanding of event variables, including the type and frequency of exposure, that may increase the risk of developing this symptomatology [5]. The hypothesis is that the severity and frequency of PTSD symptoms are related to the proximity and exposure to trauma [41]. Firstly, the interest in studying PTSD in emergency personnel is due to its remarkable frequency: prevalence rates of PTSD in rescue workers have been estimated to range from 3% to 24.5% [41]. Secondly, it should be noted that the perception of safety is an important component of psychological health and a determinant of the ability to work effectively after exposure to traumatic events [33]: the subject’s emotional and behavioral responses might complicate an otherwise successful response. These are the reasons why the psychological consequences of traumatic events or disasters require both immediate clinical attention and long-term treatment [67].

2. Post-Traumatic Stress Disorder (PTSD)

Since its introduction into the official psychiatric language in 1980, the notion of post-traumatic stress disorder (PTSD) has developed through three revisions of the Diagnostic and Statistical Manual of Mental Disorders (DSM): the 3rd edition (DSM-III) (APA, 1980); the 3rd edition revised (DSM-III-R) (APA, 1987); and the 4th edition (DSM-IV) (APA, 1994). Whereas the typical symptoms of PTSD have remained fairly constant across these revisions, the nature of the requisite stressor has always been subject to debate and considerable uncertainty. At present, the stressor criterion (DSM-IV PTSD A.1) is met if the individual “experienced, witnessed, or was confronted with [italics added] an event or events that involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others” [2, p. 424].

2.1. Definition of PTSD

According to the DSM-IV [2], the essential feature of PTSD is the development of specific symptoms after exposure to a traumatic stressor. PTSD is characterized by three interacting groups of symptoms: intrusive symptoms, or re-experiencing of the trauma; avoidant symptoms, or avoidance of trauma-related stimuli; and hyperarousal phenomena, or increased emotional arousal [81]. The DSM-IV lists five intrusive symptoms, of which only one is required for a PTSD diagnosis. These symptoms are: distressing memories; distressing dreams of the event; acting or feeling as if the traumatic event were recurring; intense psychological distress; and physiological reactivity (sweating, heart racing, etc.) when reminded of the event. Avoidance symptoms represent
attempts to block out unpleasant memories and feelings, and three of them are required for a PTSD diagnosis. Examples of this kind of symptom are: efforts to avoid thoughts, feelings or conversations associated with the specific trauma; efforts to avoid activities, places or people that arouse recollections of the trauma; inability to recall an important aspect of the trauma; markedly diminished interest in participation in significant activities; feelings of detachment from others; restricted range of affect and emotional responsiveness; and a sense of foreshortened future. Hyperarousal symptoms lead individuals to feel constantly at risk, and two of them are required for a diagnosis of PTSD. Hyperarousal symptoms are: difficulty in falling or staying asleep; irritability or outbursts of anger; difficulty in concentrating; exaggerated startle response; and hypervigilance for signs of danger. PTSD should only be diagnosed if symptoms persist for at least one month, whilst the remission of symptoms within one month after the traumatic event indicates a diagnosis of acute stress disorder [81].

Depending on the type of exposure to trauma, directly as a victim or indirectly as a helper, PTSD can be classified as primary or secondary traumatic stress disorder respectively [22]. Primary stress symptoms are directly related to the experience of primary traumatic stress as a victim, whereas secondary stress can be defined “as the natural consequent behaviors and emotions resulting from knowing about a traumatizing event experienced by a significant other; the stress resulting from helping or wanting to help a traumatized or suffering person” [22].
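The diagnostic thresholds above amount to a simple counting rule. The following sketch, in Python, is purely illustrative and not a clinical instrument: it assumes the symptom counts and the symptom duration in days have already been established by a clinician, and it encodes only the thresholds stated above (at least one intrusive, three avoidance and two hyperarousal symptoms, with the one-month duration separating PTSD from acute stress disorder).

    def classify_dsm_iv(n_intrusive, n_avoidance, n_hyperarousal, duration_days):
        # Symptom-count thresholds from the DSM-IV criteria described above.
        if n_intrusive < 1 or n_avoidance < 3 or n_hyperarousal < 2:
            return "criteria not met"
        # Persistence for at least one month indicates PTSD; remission
        # within one month indicates acute stress disorder instead [81].
        return "PTSD" if duration_days >= 30 else "acute stress disorder"

    print(classify_dsm_iv(2, 3, 2, 45))  # -> PTSD
    print(classify_dsm_iv(2, 3, 2, 10))  # -> acute stress disorder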
2.2. PTSD’s causes and predictors

Experiencing a traumatic event while helping traumatized or distressed people produces stress reactions among helpers which can be considered natural behaviors and reactions [29]. Emergency workers have to cope with many highly traumatic events, and this repeated daily stress exposure makes them vulnerable to developing PTSD [3,21,40]. In more detail, risk factors for PTSD are the victim’s age (e.g., infants’ and children’s injuries/deaths have a greater impact [37]), exposure to serious injuries and/or death [35], and facing dangerous and/or unpredictable situations [51]. Although it is clear that emergency occupations are associated with a high risk of psychiatric morbidity, it is equally clear that trauma experience is necessary but insufficient to explain the development of post-trauma reactions. Rather, psychiatric morbidity appears to result from an interaction between specific event and individual characteristics [39]: because of these linked causes, PTSD cannot be viewed as an isolated phenomenon but should be considered in a systemic perspective.

In a study on the London Ambulance Service, Ravenscroft [64] concluded that job stress was the main factor in sickness: occupational stress in emergency situations can induce an acute or extended stress disorder [54]. If the symptoms remit within one month after the traumatic event, a diagnosis of acute stress disorder is indicated, while if the symptoms persist for more than one month, PTSD is said to occur [41]. After involvement in a disaster, individuals may be at risk of developing acute stress disorders, which in turn can be a risk factor for developing PTSD. In fact, previous research has shown that 80% of those who fulfilled the criteria for a diagnosis of acute stress disorder went on to develop PTSD [14].

Research studies have pointed out that there are many factors apt to support PTSD development; some of these can be considered occupational variables, others individual and social variables. Therefore, PTSD onset and development cannot be viewed as an individual phenomenon, limited to the single person, but should be considered within a systemic perspective, because of this combination of occupational, individual and social variables. Among the occupational factors that can potentially lead to PTSD, previous traumatic events have been shown to increase the risk of this disorder, in accordance with previous research [68]. There is also a strong connection between the number of years in emergency service and PTSD symptoms: personnel with many years in the service appear more likely to develop PTSD. So the key factor seems to be not the number of traumatic events but rather the length of service, in terms of a long history of many stressful events [41]. Consequently, background stressors such as shift work and time pressure may be factors that contribute to the emergence of overall distress and to the likelihood of some PTSD symptoms.

With reference to individual and social variables, earlier studies [42,60,23,69] showed that factors such as age, family patterns and previous experiences of violence probably play a role in the development of PTSD. Post-stressor factors such as lack of social support, being unmarried, and drug abuse are also associated with PTSD development. It is probable that personality traits influence how well the subject can cope with different types of stressors; in fact, some people seem less likely to develop PTSD [42,16], typically those with a high level of education and experience in coping with life stressors. On the contrary, a lack of coping strategies may facilitate PTSD symptoms and other mental and physiological illnesses. Furthermore, some studies on emergency services personnel have identified pre-trauma, peri-trauma and post-trauma risk factors in the development of PTSD.
Pre-trauma individual risk factors include personality dimensions such as neuroticism and, to a lesser extent, introversion, as well as previous psychiatric history [20,53]. Gender appears to be another consistent risk factor, with female subjects appearing more vulnerable [9]. In terms of peritraumatic variables, the severity of the trauma to which the worker is exposed is an important factor, as is the presence of peritraumatic dissociation; and social support appears to be a strong post-trauma factor in promoting recovery [20,83]. In short, some authors [81] argue that the length of service and the number of distressing missions within one month can be considered the most valid predictors of the occurrence of PTSD symptoms: the longer an emergency worker is on duty, and the more likely he/she is to participate in distressing missions, the higher the probability of developing PTSD symptoms. At the same time, other studies [15] show that the number of recent incidents may not be a predictor of PTSD. Therefore, future work will need to identify precisely the factors responsible for PTSD onset in emergency personnel.

2.3. Epidemiological data and comorbidity

The prevalence of PTSD in a population-based sample is estimated to be 7.8% [45], although estimates for specific “at-risk” populations may be higher. For example, in a study on firefighters, prevalence was estimated at 18% [81]. This result was confirmed by another study on German professional firefighters, in which a prevalence rate of 18.2% for PTSD symptoms was found [81]. Another example of an “at-risk” group is ambulance personnel: estimates of the prevalence of PTSD in this group are around 20% [21,36]. In general, the prevalence of PTSD in emergency workers is usually estimated to be between 20% and 21% [21,36]. In some cases, a high prevalence of comorbid psychiatric symptoms associated with PTSD has been found, such as depressive mood, social dysfunction, and substance abuse, as shown by several studies [48,85,52,44,25]. In fact, subjects with PTSD can also present various psychiatric and psycho-physical symptoms, such as recurring physical pains more intense than in the general population, especially those related to general condition, cardiovascular problems, tension, pain and elevated blood pressure [57].
3. Preventive and Treatment Strategies for Post Traumatic Stress Disorder
Disasters, whether collective or personal, require careful management. In addition to the primary victims, other people may become involved in the disaster. Massive disasters evoke deep feelings in the families and friends of primary victims, in emergency and rescue workers, and in those who carry responsibility for the management of the disaster, including police forces, volunteers and clinicians. The disaster, and the chaos it produces, disrupt both emergency workers and trauma victims. These secondary victims must be helped to become secondary survivors, for their own well-being as well as for the benefit of all their significant others. The chaotic nature of these traumatic processes, together with the multiple and interactive impacts of traumatic events, requires a comprehensive perspective in order to address the situation successfully. Because of the chaotic variety of factors, and of their interactions, involved in treating the secondary survivors and in interfacing such treatment with that of the primary survivors and of their relationship, some structure apt to facilitate coordination is helpful. Viewing intervention along two dimensions (treatment goals and therapeutic setting) can be useful: by combining four levels of goals (training, awareness, personal development and skill acquisition) with individual or group therapy, a balanced intervention can be provided to help people understand the chaos and find ways out of it.

Treatment of acute post-traumatic stress reactions is fundamental to prevent the development of secondary chronic co-morbidity, not only in primary victims but also in emergency workers [77]. Most of the treatment literature on PTSD has focused on the management of acute stress-induced symptoms. According to the American Psychiatric Association (APA) and other expert panels [32,78], the current treatment recommendations for PTSD and co-morbid conditions include the use of medication and psychological intervention. A number of recent reviews and meta-analyses provide strong support for the efficacy of psychological interventions for PTSD [8,24,38,72,80]. Given the substantial personal and societal costs of chronic post-traumatic stress disorder, mental health care professionals have developed early intervention methods designed to mitigate acute emotional distress and prevent the emergence of post-traumatic psychopathology. Psychological preventive strategies following traumatic exposure include psychological debriefing and psychotherapy.
Psychological debriefing is a brief crisis intervention usually administered within days of a traumatic event; it is designed to relieve and prevent event-related distress in normal people experiencing abnormally stressful circumstances [62,55,17,4,26]. Debriefing relies on three therapeutic ingredients: ventilating and reliving the emotional responses evoked by the trauma in a context of group support, normalization of responses, and education about post-event psychological reactions. Treatment usually consists of a single extended session. A debriefing session, especially with a group of individuals (e.g., firefighters), usually lasts about three to four hours. By helping the trauma-exposed individual “talk about his feelings and reactions to the critical incident” [55], the debriefing facilitator aims “to reduce the incidence, duration, and severity of, or impairment from, traumatic stress” [56]. According to Mitchell [55], a single debriefing session “will generally alleviate the acute stress responses which appear at the scene and immediately afterwards and will eliminate, or at least inhibit, delayed stress reactions”. Mitchell and Everly [56] recommended that debriefing should be offered to anyone exposed to a critical incident, regardless of whether the person is experiencing stress-related symptoms. During the debriefing process, participants are encouraged to describe their own experience of the accident and its aftermath in both factual and emotional terms, followed by an educational presentation of common stress reactions and stress management. This gives an opportunity for a collective appraisal of the event in both factual and emotional memory, and facilitates normalization of reactions and support. Studies of debriefing have been ambiguous: a recent meta-analysis found no evidence to support the use of debriefing to decrease psychological distress or prevent the onset of PTSD [79]; other authors, however, argue that this procedure is probably the most widely indicated preventive intervention, representing a strategy designed to promote the processing of emotional and cognitive traumatic memories [63]. Other studies [13,12] showed that undergoing a few sessions of psychotherapy, starting 2-3 weeks after the traumatic event, can also be successful in preventing PTSD.

Psychotherapeutic treatment strategies extend along a continuum from therapies whose focus is more cognitively reflective to therapies whose focus is more directly sensorially experiential and skill-based. Each of the
therapies includes reflective and experiential elements to different degrees. More reflective therapies include interpersonal and psychodynamic therapies. Combinations of reflective and experiential therapies include cognitive behavioral therapy. More experiential and skill-based therapies include somatic approaches (relaxation training and biofeedback) and exposure approaches (graded exposure, eye movement desensitization reprocessing “EMDR”, and hypnosis).

3.1. Reflective Psychotherapies

Interpersonal psychotherapy was developed to address interpersonal and social problems influenced by social interactions. Interpersonal and social problems are often responsible for a patient with PTSD seeking treatment and can often influence the symptom course. Some studies have demonstrated an improvement in social functioning in subjects treated with this method, but found that it had limited effects on more PTSD-specific symptoms [65,66]. Psychodynamic psychotherapy explores a patient’s underlying personality structure, which gives rise to the way the patient responds to life events, including traumatic episodes. However, there is minimal evidence to support the use of psychodynamic psychotherapy in the treatment of PTSD [65].

3.2. Combination Reflective and Experiential Psychotherapies

Cognitive behavioral therapy helps patients to identify distorted automatic cognitive, affective, physiological, and behavioral responses to current events and to focus instead on more rational responses appropriate to the situation, through reflective dialogue with therapists. The use of cognitive behavioral therapy for the treatment of PTSD is supported by several controlled studies [65,7,10,11,12].

3.3. Experiential psychotherapies

These therapies focus on developing self-regulation in an attempt to master and control troublesome symptoms. The most common experiential therapies can be divided into: 1. somatic (self-regulation through relaxation training and biofeedback), 2. attentional (developing control over cognitive processing), 3. exposure (graded exposure, eye movement desensitization reprocessing “EMDR”, and hypnotherapy).
Somatic therapies for PTSD include a skill-based somatic component. Relaxation exercises have an autonomic emphasis: patients are trained to reduce the sympathetic arousal associated with PTSD symptoms and to improve parasympathetic recuperation through progressive muscle relaxation and biofeedback. Biofeedback can be useful as a method of self-regulation: clinicians monitor the physiological data and suggest somatic exercises for the patient to use, so that the specific approach to self-regulation is adapted to the requirements of each patient. Such physiological monitoring and feedback is a useful instrument for continually monitoring objective arousal in patients with PTSD and assisting patients in regaining a sense of mastery over their symptoms [84].

Attentional therapies address the fact that when trauma victims or emergency workers think back to the most intense experience of their trauma, those intense experiences continue to play over and over in their minds. Attentional therapies have been developed to facilitate shifting attention away from those intrusive thoughts, allowing patients to significantly reduce their arousal [73].

In exposure-based therapy for PTSD, the patient is led through a vivid recollection of the trauma until extinction occurs. This therapy is applied using different forms of exposure. In in vivo exposure, for example, people are generally required to return to the site of the traumatic event in order to reduce avoidance and promote mastery over the associated trauma cues. Imaginal exposure, on the other hand, is often used when in vivo exposure is not possible: cues are presented as images describing details of an event, or set of events, from the perspective of the stimulus propositions, response propositions and meaning propositions associated with the event [49]. Like in vivo exposure, imaginal exposure aims to reduce avoidance and promote mastery. Common approaches to exposure therapy include graded exposure, eye movement desensitization reprocessing (EMDR), and hypnosis.

Graded exposure therapy attempts to elicit arousal at a level the patient can tolerate and then to increase exposure gradually over time as he/she learns new skills to modulate arousal. This approach is most often coupled with a skill-based de-arousal method, such as relaxation training. Skills might include relaxation training, breathing retraining, trauma education, guided self-dialogue, cognitive restructuring, and communication skills training.

A more recent approach used to treat PTSD is eye movement desensitization reprocessing (EMDR) [71], a technique accidentally discovered to alter disturbing thoughts [70]. It has been applied to psychological
problems in general, but it has shown particular efficacy with PTSD. The technique involves the patient focusing on a disturbing memory while the therapist initiates saccadic eye movements by asking the patient to track the horizontal motion of his/her finger moving rapidly in front of them. Following the therapist’s finger movement is thought to dissociate memories from the associated emotions. The mechanism of action of EMDR is not based on any contemporary theory of human behavior or cognitive science [43]. A meta-analysis of EMDR and other exposure techniques found no significant differences in their outcomes [24]; at this time, the evidence base for EMDR treatment of PTSD shows it to be equivalent to other exposure therapies [65].

The inclusion of hypnosis as a possible treatment for traumatized people added a systematic and technical approach to the reconstruction of traumatic memories. Hypnotherapy with trance techniques has been used clinically to treat stress-related disorders, work-related stress and PTSD [82]. The patient is usually induced into a comfortable, relaxed mental and physical state while simultaneously reviewing and distancing himself or herself from the traumatic episode, thus learning to dissociate the traumatic event from arousal. Previous studies indicate that patients receiving hypnosis in addition to cognitive behavior therapy showed a greater reduction of symptoms [46], in particular re-experiencing and hyperarousal symptoms, than patients who received supportive counseling [10]. However, only a few results from controlled studies are available at this time [19].

Combination treatments may be helpful in addressing the multiple problems that people with PTSD may exhibit. Glynn et al. [34] assessed the effects of adding a skill-training intervention to exposure therapy: they found a greater decline in symptoms of anxiety, arousal, and reliving of the trauma when using the combination treatment than when using the therapies separately. Studies examining the efficacy of exposure therapy among traumatized refugees found that these interventions were highly effective in reducing PTSD symptoms, anxiety, and depression [61,59]. Other studies, examining the efficacy of PTSD interventions for victims of interpersonal violence, sexual assault, and abuse, found that exposure therapy was superior to supportive counseling in reducing PTSD at post-treatment [30,32].

A large body of research documents the efficacy of combined PTSD interventions among people reporting heterogeneous trauma histories. A study that examined the efficacy of a combination therapy incorporating exposure
therapy techniques and cognitive therapy among a sample of battered women found a significant reduction in PTSD and depression, and a concomitant increase in self-esteem [47]. Blanchard et al. [7] examined an intervention that made use of psychoeducation and exposure therapy, and also incorporated anger management: this intervention was better than supportive therapy in reducing symptoms of PTSD. Tarrier et al. [74] compared exposure therapy and cognitive therapy in the treatment of outpatients with PTSD stemming from many different traumatic events, finding that both treatment groups manifested similar levels of improvement. Consistent with earlier work, Bryant et al. [12] studied a sample of civilian trauma survivors and found that those receiving exposure therapy combined with cognitive restructuring and supportive therapy manifested greater reductions in PTSD symptomatology and anxiety than those receiving only supportive therapy. Likewise, a more recent study found that the combination of exposure therapy and cognitive restructuring led to a greater reduction in PTSD symptoms than exposure therapy alone, suggesting an added benefit of this combined intervention [27]. Lower global distress, anxiety, and depression were evidenced by Shapiro [71] using EMDR to treat PTSD in individuals reporting different forms of trauma. In a clinic setting, Marcus et al. [50] found that EMDR, tested in a health maintenance organization, was more effective than standard psychological care in reducing symptoms of PTSD. A subsequent study that compared EMDR, exposure therapy, and relaxation training among a sample of persons reporting various forms of trauma found that all three interventions reduced PTSD, related depressive symptoms, dissociative symptoms, and trauma-related anger at post-treatment [75].

4. Conclusions

To sum up, exposure therapy, cognitive therapy, and relaxation training are treatments with considerable empirical support across various groups of trauma survivors. Current studies demonstrate the efficacy of the extended use of exposure therapies, cognitive therapies and relaxation training to treat PTSD patients. Some preliminary evidence indicates that psychological interventions [14,12,11,28] may be effective in preventing the development or progression of
PTSD symptoms. As argued by Keane [43], our growing knowledge of trauma may lead to a substantial reduction of human suffering. The worldwide problems associated with war and disasters are not yet in decline; as a result, a sound community strategy is needed to guide the social system’s response to the survivors of these experiences [43]. PTSD in its most chronic form is a debilitating condition that affects individuals, their families and the entire social system. Emergency health care providers may ultimately become emergency victims themselves, thus contributing to the vicious cycle. Interventions to address the occurrence of traumatic events (primary prevention) or to mitigate their effects once they occur (secondary prevention) are therefore needed. These findings suggest that PTSD constitutes a major problem for public health systems across the world, and that developing and implementing such prevention strategies could be the best resolution of the problems associated with PTSD.

References

1. D.A. Alexander and S. Klein, British J. of Psychiatry 178, 76-81 (2001).
2. American Psychiatric Association, DSM-IV, Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, Washington DC, 1994).
3. H.S. Andersen, A.K. Christensen and G.O. Petersen, Anxiety Research 4, 245-251 (1991).
4. K. Armstrong, W. O’Callahan and C. Marmar, J. of Traumatic Stress 4, 581-593 (1991).
5. R. Beaton, S. Murphy, C. Johnson, K. Pike and W. Corneil, J. of Traumatic Stress 11(4), 821-828 (1998).
6. P. Bennett, Y. Williams, N. Page, K. Hood and M. Woollard, Emergency Medicine Journal 21(2), 235-236 (2004).
7. E.B. Blanchard, E.J. Hickling, T. Devineni, C.H. Veazey, T.E. Galovski, E. Mundy, L.S. Malta and T.C. Buckley, Behaviour Research and Therapy 41, 79-96 (2003).
8. R. Bradley, J. Greene, E. Russ, L. Dutra and D. Westen, American Journal of Psychiatry 162, 214-227 (2005).
9. N. Breslau, H.D. Chilcoat, R.C. Kessler, E.L. Peterson and V.C. Lucia, Psychological Medicine 29, 813-821 (1999).
10. R.A. Bryant, M.L. Moulds, R.D.V. Nixon, J. Mastrodomenico, K. Felmingham and S. Hopwood, Behaviour Research and Therapy 44, 1331-1335 (2006).
11. R.A. Bryant and R.M. Guthrie, Psychological Science 16, 749-752 (2005).
12. R.A. Bryant, M.L. Moulds and R.V. Nixon, Behaviour Research and Therapy 41(4) (2003).
13. R.A. Bryant, T. Sackville, S.T. Dang, M. Moulds and R. Guthrie, American J. of Psychiatry 156(11), 1780-1786 (1999).
14. R.A. Bryant, A.G. Harvey, S.T. Dang, T. Sackville and C. Basten, J. of Consulting and Clinical Psychology 66(5), 862-866 (1998).
15. R.A. Bryant and A.G. Harvey, J. of Traumatic Stress 9, 51-62 (1996).
16. V. Bunkhold, Psykologi (Studentlitteratur, Lund, 1996).
17. J.N. Butcher and C. Hatcher, American Psychologist 43, 724-729 (1988).
18. S.P. Cahill, M.H. Carrigan and B.C. Frueh, J. of Anxiety Disorders 13(1-2), 5-33 (1999).
19. E. Cardena, International J. of Clinical and Experimental Hypnosis 48(2), 225-238 (2000).
20. I.V.E. Carlier, R.D. Lamberts and B.P.R. Gersons, The J. of Nervous and Mental Disease 185, 498-506 (1997).
21. S. Clohessy and A. Ehlers, British J. of Clinical Psychology 38, 251-65 (1999).
22. W. Corneil, R. Beaton, S. Murphy, C. Johnson and K. Pike, J. of Occupational Health Psychology 4(2), 131-141 (1999).
23. D.W. Corneil, Prevalence of posttraumatic stress disorders in a metropolitan fire department (The Johns Hopkins University, Baltimore, 1993).
24. P.R. Davidson and K.C. Parker, J. of Consulting and Clinical Psychology 69(2), 305-316 (2001).
25. J.R.T. Davidson, D. Hughes, D. Blazer and L. George, Psychological Medicine 21, 1-9 (1991).
26. C. Dunning, in Mental Health Response to Mass Emergencies, Ed. M. Lystad (Brunner/Mazel, New York, 1988) pp. 284-307.
27. A. Ehlers, D.M. Clark, A. Hackmann, F. McManus and M. Fennell, Behaviour Research and Therapy 43, 413-431 (2005).
28. A. Ehlers, D.M. Clark, A. Hackmann, F. McManus, M. Fennell, C. Herbert and R. Mayou, Arch. Gen. Psychiatry 60, 1024-1032 (2003).
29. C.R. Figley, in Compassion Fatigue, Ed. C.R. Figley (Brunner/Mazel, New York, 1995) pp. 1-20.
30. E.B. Foa and S.A.M. Rauch, J. of Consulting and Clinical Psychology 72, 879-884 (2004).
31. E.B. Foa, J. of Clinical Psychiatry 61(5), 43-48 (2000).
32. E.B. Foa, T.M. Keane and M. Friedman, Effective Treatments for PTSD: Practice Guidelines from the International Society for Traumatic Stress Studies (The Guilford Press, New York, 2000).
33. C.S. Fullerton, R.J. Ursano, J. Reeves, J. Shigemura and T. Grieger, The J. of Nervous and Mental Disease 194(1), 61-63 (2006).
34. S.M. Glynn, E.T. Randolph, D.W. Foy, M. Urbaitis, L. Boxer, G.G. Paz, G.B. Leong, G. Firman, J.D. Salk, J.W. Katzman and J. Crothers, J. of Consulting and Clinical Psychology 67, 243-251 (1999).
35. B. Green, M. Grace and G. Gleser, J. of Consulting and Clinical Psychology 53, 672-678 (1985).
36. F. Grevin, Psychological Reports 79, 483-95 (1996).
37. D. Hartsough, in Role Stressors and Supports for Emergency Workers, NIMH, DHHS Publication No. 85-1408 (National Institute of Mental Health, Rockville, 1985).
38. E.A. Hembree and E.B. Foa, J. of Clinical Psychiatry 61, 33-39 (2000).
39. G.A. Hodgins, M. Reamer and R. Bell, The J. of Nervous and Mental Disease 189, 541-547 (2001).
40. A.K. Houston, Clinical Issues in Critical Care Nursing 4, 558-65 (1993).
41. A. Jonsson, K. Segesten and B. Mattsson, Emergency Medicine J. 20, 79-84 (2003).
42. S. Joseph, R. Williams and W. Yule, Understanding Posttraumatic Stress: A Psychosocial Perspective on PTSD and Treatment (Wiley, Chichester, 1997).
43. T.M. Keane, A.D. Marshall and C.T. Taft, Annual Review of Clinical Psychology 2, 161-197 (2006).
44. T. Keane and J. Wolfe, J. of Applied and Social Psychology 20, 1776-1788 (1990).
45. R.C. Kessler, A. Sonnega, E. Bromet, M. Hughes and C.B. Nelson, Archives of General Psychiatry 52, 1048-60 (1995).
46. I. Kirsch, G. Montgomery and G. Sapirstein, J. of Consulting & Clinical Psychology 63, 214-220 (1995).
47. E.S. Kubany, E.E. Hill, J.A. Owens, C. Iannce-Spencer, M.A. McCaig, K.J. Tremayne and P.L. Williams, J. of Consulting & Clinical Psychology 72, 3-18 (2004).
48. R.A. Kulka, W.E. Schlenger, J.A. Fairbank, R.L. Hough, B.K. Jordan, C.R. Marmar and D.S. Weiss, Trauma and the Vietnam War Generation: Report of Findings From the National Vietnam Veterans Readjustment Study (Brunner/Mazel, New York, 1990).
49. P.J. Lang, Behav. Ther. 8, 862-866 (1977).
50. S.V. Marcus, P. Marquis and C. Sakai, Psychotherapy 34, 307-315 (1997).
51. T. Martelli, L. Waters and J. Martelli, Psychological Reports 64, 267-273 (1989).
52. A.C. McFarlane and B.M. Papay, The J. of Nervous and Mental Disease 180, 498-504 (1992).
53. A.C. McFarlane, British J. of Psychiatry 152, 116-121 (1988).
54. J.T. Mitchell and A. Dyregrov, in International Handbook of Traumatic Stress Syndromes, Ed. J.P. Wilson and B. Raphael (Plenum, New York, 1993) pp. 305-14.
55. J.T. Mitchell, J. of Emergency Medical Services 13, 47-52 (1988).
56. J.T. Mitchell and G.S. Everly, Critical Incident Stress Debriefing: An Operations Manual (Chevron Press, Ellicott City, 1995).
57. S.A. Murphy, R.D. Beaton, K.C. Pike and K.C. Cain, Association of Occupational Health Nurses J. 42, 534-540 (1994).
58. B.S. Nelson, A. Goff, M.J. Reisbig, A. Bole, T. Scheer, E. Hayes, K.L. Archuleta, S. Blalock, C.B. Henry, B. Hoheisel, J. Nye, E. Osby, K.L. Sanders-Hahs Schwerdtfeger and D.B. Smith, American J. of Orthopsychiatry 76(4), 451-460 (2006).
59. F. Neuner, M. Schauer, C. Klaschik, U. Karunakara and T. Elbert, J. of Consulting & Clinical Psychology 72, 579-587 (2004).
60. L.S. O’Brien and S.J. Hughes, British J. of Psychiatry 59, 135-141 (1991).
61. H. Paunovic and L.G. Ost, Behaviour Research and Therapy 39, 1183-1197 (2001).
62. B. Raphael and J.P. Wilson, Psychological Debriefing: Theory, Practice and Evidence (Cambridge University Press, Cambridge, 2000).
63. B. Raphael, L. Meldrum and A.C. McFarlane, British Medical J. 310, 1479-1480 (1995).
64. T. Ravenscroft, Going Critical: GMB/Apex and T&G Unions 1994 Survey of Occupational Stress Factors in Accident and Emergency Staff in the London Ambulance Service (GMB/Apex and T&G Unions, London, 1994).
65. M. Robertson, L. Humphreys and R. Ray, J. of Psychiatric Practice 10(2), 106-118 (2004).
66. M. Robertson, P.J. Rushton, D. Bartrum and R. Ray, International J. of Group Psychotherapy 54(2), 145-175 (2004).
67. M.T. Sammons, American Psychologist 60(8), 899-909 (2005).
68. M.J. Scott and S.G. Strandling, Counselling for Post Traumatic Stress Disorder (Sage, London, 1994).
69. C. Selley, Practitioner 235, 635-639 (1991).
70. F. Shapiro, Eye Movement Desensitization and Reprocessing: Basic Principles, Protocols, and Procedures (Guilford, New York, 1995).
71. F. Shapiro, J. of Behavior Therapy and Experimental Psychiatry 20, 211-217 (1989).
72. J.J. Sherman, J. of Traumatic Stress 11, 413-435 (1998).
73. J. Spira, Using Meditation and Hypnosis to Modify EEG and Heart Rate Variability, paper presented at the Annual Meeting of the American Association of Biofeedback and Psychophysiology, Colorado Springs, CO (2004).
74. N. Tarrier, H. Pilgrim, C. Sommerfield, B. Faragher, M. Reynolds, E. Graham and C. Barrowclough, J. of Consulting & Clinical Psychology 67, 13-18 (1999).
75. S. Taylor, D.S. Thordarson, L. Maxfield, I.C. Fedoroff, K. Lovell and J. Ogrodniczuk, J. of Consulting & Clinical Psychology 71, 330-338 (2003).
76. J. Thompson and I. Suzuki, Disaster Management 3, 193-197 (1991).
77. G.J. Turnbull, A Review of Post-Traumatic Stress Disorder. Part II: Treatment, Injury 29(3), 169-175 (1998).
78. R.J. Ursano, C. Bell, S. Eth, M. Friedman, A. Norwood, B. Pfefferbaum, R.S. Pynoos, D.F. Zatzick and D.M. Benedek, American J. of Psychiatry 161(11), 1-61 (2004).
79. A.A. van Emmerik, J.H. Kamphuis, A.M. Hulsbosch and P.M. Emmelkamp, Lancet 360(9335), 766-771 (2002).
80. M.L. van Etten and S. Taylor, Clinical Psychology and Psychotherapy 5, 144-154 (1998).
81. D. Wagner, M. Heinrichs and U. Ehlert, American J. of Psychiatry 155, 1727-1732 (1998).
82. J.G. Watkins, International J. of Clinical and Experimental Hypnosis 48(3), 324-335 (2000).
83. D.S. Weiss, C.R. Marmar, T.J. Metzler and H.M. Ronfeldt, J. of Consulting and Clinical Psychology 63, 361-368 (1995).
84. B.K. Wiederhold, D.P. Jang, S.I. Kim and M.D. Wiederhold, Cyberpsychology and Behavior 5(1), 77-82 (2002).
85. R. Yehuda and A.C. McFarlane, American J. of Psychiatry 152, 1705-1713 (1995).
STATE VARIABILITY AND PSYCHOPATHOLOGICAL ATTRACTORS. THE BEHAVIOURAL COMPLEXITY AS DISCRIMINATING FACTOR BETWEEN THE PATHOLOGY AND NORMALITY PROFILES
PIER LUIGI MARCONI
ARTEMIS Neuropsichiatria, Via Amba Aradam 22, 00184 Roma, Italy
E-mail: [email protected]

369 patients, selected within a set of 1215 outpatients, were studied. The data were clustered into two sets: the baseline set and the endpoint set. The clinical parameters had a higher variability at the baseline than at the endpoint. 4 to 5 factors were extracted in the total group and in 3 subgroups (190 “affective”, 34 type-B personality, 166 with neither disorder). In all subgroups there was a background pattern of 6 components: 3 components confirming the three-factor temperamental model of Cloninger; 1 component related to the quality of social relationships; and 2 components (the main components of the factorial model in nearly all groups) relating to the quality of life and adjustment self-perceived by patients, and to the externally evaluated pattern of dysfunctional behavior, inner feelings, and thought processes. These background components seem to aggregate differently in the subgroups, in accordance with the clinical diagnosis. These patterns may be interpreted as the expression of an increased “coherence” among parameters, due to a lack of flexibility caused by the illness. The different classes of illness can be further distinguished by the intensity of maladjustment, which is related to the intensity of clinical signs only at the baseline. These data suggest that the main interfering factors are clinical psychopathology at the baseline and stable personality traits at the endpoint: a persistent, personality-driven chronic maladjustment is evidenced after the clinical disorder has been cured by treatment. An interpretative model is presented by the author.

Keywords: psychopathological attractors, behavioral complexity, state oscillations.
1. Introduction

In the perspective of complex systems, the human mind may be modeled as an adaptive system whose goal is to maintain the best internal state in any external environment. In such a perspective, the best adaptive cognitive system is the one able to produce the highest number of “useful behaviors” for each environment and for each internal affective state (the affective state being the psychological perception of the inner state) (Fig. 1). On the other hand, the capability to produce useful behaviors is also linked to the subject’s ability to plan and evaluate behaviors of increasing levels of complexity.
Figure 1. In the proposed model, the human mind has the goal of maintaining the most stable inner state; this goal is reached by planning, enacting and monitoring congruent behaviors able to act usefully on the environment. In planning these behaviors, the mind has to know the characteristics of the system, so as to choose the best way and the best time to perform the action, obtaining the best efficacy with minimum effort.
When this behavior fails to be adaptive, a psychobiological distress occurs, whose consequences may overload the biological-somatic functions (psychosomatic disorders) and/or the cognitive functions (psychotic disorders) and/or the behavior-controlling functions (behavioral disorders) (Fig. 2). Which kinds of cognitive dysfunction may occur has been discussed elsewhere [1].

The study of psychopathological events through computational tools, like statistics and mathematics, is now possible owing to the development of psychometric instruments for clinical psychiatry. These instruments make it possible to cast in a quantitative format both the clinical observations of specialists and the self-descriptions of patients. Groups of symptoms (syndromes) may be considered as “expressions” of the same “pathological event”, which may be assessed using a combination (rating scale) of the ratings of each symptom. The rating scale total score is usually computed by simply summing the partial scores used to describe the single items; sometimes an item score is weighted before being added. A statistical validation is performed to confirm that all items converge on only one measure (dimension).
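As an illustration of the scoring scheme just described, the following Python sketch computes an (optionally weighted) rating-scale total score, together with Cronbach’s alpha, one common statistic for checking that the items converge on a single dimension. The item data and weights are invented, and the statistic actually used for validation is not specified in the text.

    import numpy as np

    def scale_score(item_scores, weights=None):
        # Total score: sum of the item scores, each optionally weighted.
        item_scores = np.asarray(item_scores, dtype=float)
        weights = np.ones_like(item_scores) if weights is None else np.asarray(weights, dtype=float)
        return float(np.dot(item_scores, weights))

    def cronbach_alpha(data):
        # data: (n_subjects, n_items) matrix of item ratings.
        data = np.asarray(data, dtype=float)
        k = data.shape[1]
        item_var = data.var(axis=0, ddof=1).sum()
        total_var = data.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_var / total_var)

    # Invented example: 5 subjects rated on a 4-item syndrome scale (0-4).
    ratings = np.array([[3, 2, 3, 2], [1, 1, 0, 1], [4, 3, 4, 4], [2, 2, 1, 2], [0, 1, 0, 0]])
    print([scale_score(r) for r in ratings])   # total scores per subject
    print(round(cronbach_alpha(ratings), 2))   # internal consistency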
Figure 2. When the cognitive system fails to plan and enact adaptive behavior because of a lack of cognitive resources, it loses the ability to decrease the emotional stress. The stress becomes distress and is perceived as subjective discomfort. The emotional overload further interferes with the cognitive process: behavioral and thought dysfunctions may appear. This process progressively increases in severity, toward the complete lack of adjustment behaviors, with the appearance of the “residual syndrome”.
Using this psychometric approach it is possible to describe numerically all the main components (syndromes) considered in the mental status assessment, and therefore to study the statistical properties within and between them; both external evaluations and self-descriptions can be considered.

2. Material and methods

The study was performed on the clinical database of the Artemis Neuropsichiatrica Outpatient Service, which collects about 1215 patients evaluated using psychometric scales (externally evaluated), questionnaires (self-evaluated) and free clinical descriptions. Not all the cases were studied with a full assessment of all parameters. For this reason the initial population was reduced from 1215 subjects to a selected group of 369 patients, using a conformance level criterion (Fig. 3).
Figure 3. Number of visits and conformity in the set of clinical parameters assessed in the original population of 1215 patients (total group), in the non-selected group and in the selected group. The selected group of 369 patients has a higher conformity (81.2% at baseline, 81.4% at endpoint) and a higher number of visits both at baseline and at endpoint.
Figure 4. Number and age of subjects by group, diagnosis and gender.
The level of conformity was about 81% in the selected group and about 45% in the excluded group. The average age of the patients included in the selected group was 45.33 years, and the female/male ratio was 1.4 (Fig. 4).
Table 1. Description of clinical parameters and abbreviations used in the tables.

Parameter Description                      Type of assessment   Baseline Initials   Endpoint Initials
Quality of Life - Physical                 Self Evaluated       QOL_PHY1            QOL_PHY2
Quality of Life - Environmental            Self Evaluated       QOL_AMB1            QOL_AMB2
Quality of Life - Private Relationships    Self Evaluated       QOL_SOC1            QOL_SOC2
Quality of Life - Psychological            Self Evaluated       QOL_PSY1            QOL_PSY2
Personal Discomfort - Anxiety              Self Evaluated       RAI_ANX1            RAI_ANX2
Personal Discomfort - Depression           Self Evaluated       RAI_DEP1            RAI_DEP2
Personal Discomfort - Aggressiveness       Self Evaluated       RAI_AGG1            RAI_AGG2
Psychopathology - Affective                External             3TR_AFF1            3TR_AFF2
Psychopathology - Thought Processes        External             3TR_IDE1            3TR_IDE2
Psychopathology - Behavior                 External             3TR_COM1            3TR_COM2
Personality - Novelty Seeking              Self Evaluated       PER_ESP1            PER_ESP2
Personality - Reward Dependence            Self Evaluated       PER_DIP1            PER_DIP2
Personality - Harm Avoidance               Self Evaluated       PER_PAU1            PER_PAU2
Personality - Dysthymic State              Self Evaluated       PER_AST1            PER_AST2
Personality - Trusting                     Self Evaluated       PER_FID1            PER_FID2
51.5% of the subjects in the selected group were affected by a mood disorder, and 9.2% of the patients were diagnosed with a type-B personality disorder. 15 parameters were included in the statistics (Tab. 1): 4 parameters describe the quality of life and level of adjustment of the patient; 3 parameters describe the perceived inner feelings; 5 parameters describe personality features; and 3 parameters describe externally observed psychopathology.

To evaluate the type and significance of the change in the variability of values between baseline and endpoint, the average variance of the values observed in each patient per period was computed (at baseline and at endpoint). The significance of the difference between the two observations, both in the absolute values and in the average variance between intrasubject observations, was then computed. To evaluate the trend toward a “coherence” among values, a factor analysis was performed on the baseline and endpoint data sets. The analysis was performed on the total selected group and on 3 subgroups with different diagnoses. To evaluate other functions distinguishing the subgroups, a discriminant analysis was performed using the factors extracted at baseline and at endpoint in the total group.
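The paper does not detail the computation, but a sketch along the following lines reproduces the idea: the variance of each patient’s visit scores is averaged per period, and the baseline-endpoint difference is tested. The data layout, the simulated values and the choice of a paired t-test are assumptions made here for illustration only.

    import numpy as np
    from scipy import stats

    # Hypothetical layout: for each of the 369 patients, a series of visit
    # scores on one clinical parameter in the baseline and endpoint semesters.
    rng = np.random.default_rng(0)
    baseline = [rng.normal(60, 12, size=rng.integers(3, 7)) for _ in range(369)]
    endpoint = [rng.normal(50, 6, size=rng.integers(3, 7)) for _ in range(369)]

    # Average intrasubject variance per period, as described in the methods.
    var_base = np.array([v.var(ddof=1) for v in baseline])
    var_end = np.array([v.var(ddof=1) for v in endpoint])
    print(var_base.mean(), var_end.mean())

    # Significance of the baseline-endpoint difference in intrasubject
    # variance (a paired t-test is one plausible choice; the paper does not
    # state which test was used).
    t, p = stats.ttest_rel(var_base, var_end)
    print(f"t = {t:.2f}, p = {p:.3g}")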
Figure 5. The change between the first and last assessment in three parameters adopted as outcome indicators: CGI (Clinical Global Impression), QoL (Quality of Life) and MFP (average score of all psychopathological factors), for the total group and by subgroup. The bars indicate T-scores from the general population average. Differences are statistically significant (p < .001).
3. Results

3.1. Study of variance between intrasubject observations

All of the included outcome measures show a decrease in absolute values and in computed variability between baseline and endpoint (Fig. 5); the variability among the scores obtained in different visits decreases further between the first semester and the last one, even when corrected for the variability explained by clinical change (Fig. 6).

3.2. Factor Analysis in the total selected group

From the baseline data set, 4 factors were extracted (Tab. 2) in the total group of 369 patients. Three of these factors (the 1st, 3rd and 4th) were related to the three components of temperament described by Cloninger as Harm Avoidance (HA), Novelty Seeking (NS) and Reward Dependence (RD) [2]. The self-perception of quality of life is linked to Cloninger’s HA, both converging in the main factor (“maladjustment”).
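As a hedged illustration of this kind of extraction (the paper does not state which software or estimator was used), the following sketch runs a factor analysis with varimax rotation using scikit-learn; the data matrix is randomly generated here and merely stands in for the 15 clinical parameters of Table 1.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    # Stand-in data: 369 patients x 15 clinical parameters (random here).
    rng = np.random.default_rng(1)
    cols = [f"param_{i}" for i in range(15)]  # e.g. QOL_PHY1 ... PER_FID1
    X = StandardScaler().fit_transform(rng.normal(size=(369, 15)))

    # Extract 4 factors with varimax rotation, as in Table 2.
    fa = FactorAnalysis(n_components=4, rotation="varimax", random_state=0).fit(X)

    # Loadings: rows = parameters, columns = factors; like the published
    # tables, loadings below a threshold are blanked out.
    loadings = pd.DataFrame(fa.components_.T, index=cols,
                            columns=[f"F{k + 1}" for k in range(4)])
    print(loadings.where(loadings.abs() >= 0.35).round(3))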
Figure 6. Variability computed on all outcome parameters (Var), and corrected for the amount of clinical change (VR), for the total group and by subgroup. The bars indicate T-scores from the general population average.
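Figures 5 and 6 report results as T-scores relative to the general population average: a raw score is rescaled so that the reference mean maps to 50 and one reference standard deviation to 10 points. A minimal sketch of the convention (the norm values below are invented):

    import numpy as np

    def t_score(raw, norm_mean, norm_sd):
        # T = 50 + 10 * (raw - mean) / sd: the norm average maps to T = 50.
        return 50.0 + 10.0 * (np.asarray(raw, dtype=float) - norm_mean) / norm_sd

    # Invented norms: general-population mean 40, SD 8 on some outcome scale.
    print(t_score([40, 48, 64], norm_mean=40.0, norm_sd=8.0))  # [50. 60. 80.]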
The “affective dimension” gains relevance, and both the “hypomanic” and “dysthymic” components correlate within the same “bipolar factor”. From the endpoint data set, 5 factors were extracted (Tab. 3). Three factors (the 1st, 3rd and 5th) are related to the three Cloninger components of temperament, and they are correlated with the corresponding Cloninger-related factors at baseline (Tab. 4). The fourth factor is mainly related to the dysthymic state parameter. The main factor (maladjustment), at the endpoint as at the baseline, describes the quality of life and the level of adjustment. It is related both to Cloninger’s HA and to the descriptors of subjective anxious-depressive discomfort and distress. This factor is correlated with the main factor extracted at baseline (Tab. 4). The 2nd factor (“psychopathology”) is related to all the externally evaluated psychopathological parameters, and it is also correlated with the self-description of emotional distress. At the endpoint this factor is not correlated with any baseline factor, not even with the corresponding “psychopathology” factor (Tab. 4).

The extracted factors at baseline and endpoint are statistically orthogonal to each other. However, some of the original variables are correlated with more than one factor. Six parameters at the baseline are correlated with more than one factor (Tab. 2): the dysthymic state component correlates not only with the “bipolar” factor but also with the psychopathology factor; and the relational attitude to perceive
Table 2. Factor analysis on the total selected group at baseline, after varimax rotation; the 4-factor solution explained 61.3% of the total variance. Loadings by factor:
• Factor 1 "Maladjustment" (25.0% of variance): QOL_PHY1 -.774; RAI_ANX1 .741; QOL_PSY1 -.672; RAI_DEP1 .651; QOL_AMB1 -.651; QOL_SOC1 -.643; RAI_AGG1 .465.
• Factor 2 "Psychopathology" (19.6%): 3TR_IDE1 .856; 3TR_AFF1 .821; 3TR_COM1 .816.
• Factor 3 "Bipolarity" (9.1%): PER_ESP1 .760; PER_AST1 .641; PER_PAU1 -.552.
• Factor 4 "Dependence" (7.6%): PER_DIP1 .849; PER_FID1 .506.
• Secondary loadings (six parameters load on more than one factor): -.390, .406, -.446, .375, .481, .422.
Table 3. Factor analysis on the whole selected group at the endpoint, after varimax rotation; the 5-factor solution explained 71.8% of the total variance. Loadings by factor:
• Factor 1 "Maladjustment" (29.2% of variance): QOL_PHY2 -.844; QOL_AMB2 -.785; PER_PAU2 .773; QOL_PSY2 -.748; RAI_ANX2 .733; QOL_SOC2 -.733; RAI_DEP2 .647.
• Factor 2 "Psychopathology" (17.4%): 3TR_AFF2 .848; 3TR_IDE2 .716; 3TR_COM2 .678; RAI_AGG2 .480.
• Factor 3 "Hypomania" (9.4%): PER_ESP2 .823; PER_FID2 .617.
• Factor 4 "Dysthymia" (8.4%): PER_AST2 .846.
• Factor 5 "Dependence" (7.5%): PER_DIP2 .954.
• Secondary loadings (four parameters load on more than one factor): .454, -.369, .431, .399.
The relational attitude of feeling that others are reliable is linked both to personal discomfort and maladjustment and to the level of dependence reported by the subject. Four parameters are correlated with more than one factor at the endpoint (Tab. 3): the self-descriptions of anxiety and sadness and the quality of affective relationships are correlated with both the "psychopathology" and the "maladjustment" factors.
Table 4. Correlations between factors extracted at baseline (rows) and factors extracted at endpoint (columns) in the total group; only the correlations printed in the table are reported. Maladjustment column: .504 (baseline Maladjustment). Psychopathology and Hypomania columns: no values reported. Dysthymia column: .256, .507. Dependence column: .186, .309, .495.
The external assessment of thought disorders at the endpoint is also linked to the presence of a dysthymic state. Almost all factors at baseline are linked to each other by common parameters (Tab. 2), while only three of the final factors are linked by common parameters (Tab. 3). At baseline (Tab. 2) the Cloninger components of personality are less statistically independent of each other than is observed at the endpoint (Tab. 3). All the factors found are more "coherent" with each other at baseline than at the endpoint, suggesting a greater reduction of flexibility in the pathological state diagnosed at baseline, even if "persistent" "pathological" components are still found at the endpoint. Correlations between baseline and endpoint factors are present, but they are relatively weak (Tab. 4).

3.3. Factor Analysis in the affective disorder subgroup

When only the subgroup with an affective disorder diagnosis (190 subjects) is selected, 5 factors are extracted at baseline (Tab. 5). The first four are highly correlated with the four factors extracted at baseline in the total selected group (Tab. 6); a fifth factor is also extracted, with some relation to the first factor of the total group (Maladjustment) (Tables 5, 6). In the factor analysis performed at the endpoint only 4 factors are extracted (Tab. 7), highly correlated with four of the five factors extracted at the endpoint in the total group (Tab. 8). The fifth factor extracted in the total group (dysthymia) is related mainly to the second factor (Tab. 8), with which the psychopathological parameters and the social distress self-descriptors are correlated (Tab. 7). The dysthymic component is also related to the 4th factor (related to Cloninger's RD component) (Tab. 8). Both at baseline and at endpoint four factors have common correlated parameters (Tab. 7); however, at baseline all four factors are linked to each other as a single "cluster", while at the endpoint we find two "clusters" of interrelated factors.
Table 5. Factor analysis on the patients affected by a mood disorder at baseline, after varimax rotation; the 5-factor solution explained 69.3% of the total variance. Loadings by factor:
• Factor 1 "Adjustment" (24.5% of variance): QOL_PHY1 .823; QOL_PSY1 .719; RAI_DEP1 -.715; QOL_AMB1 .706; RAI_ANX1 -.663; QOL_SOC1 .657.
• Factor 2 "Psychopathology" (19.1%): 3TR_AFF1 .909; 3TR_IDE1 .872; 3TR_COM1 .821.
• Factor 3 "Bipolarity" (9.6%): PER_AST1 .772; PER_ESP1 .685; PER_PAU1 -.563.
• Factor 4 "Dependence" (8.3%): PER_FID1 .780; PER_DIP1 .722.
• Factor 5 "Irritability" (7.7%): RAI_AGG1 .840.
• Secondary loadings: .447, -.381, -.458, .383.
Table 6. Correlations between factors extracted at baseline in the affective group (rows) and the corresponding baseline factors of the total group (columns): Adjustment × Maladjustment -.955; Psychopathology × Psychopathology .974; Bipolarity × Bipolarity .969; Dependence × Dependence .924; Irritability × Maladjustment .248.
The first "cluster" (maladjustment, subjective distress and psychopathology) appears to have no common parameters with the second "cluster" (hypomania and dependence) (Tab. 7). No correlation is found between the psychopathology factor at baseline and the psychopathology factor at endpoint (Tab. 9). The presence of a state of irritability at baseline is correlated with the level of maladjustment at endpoint; the level of pathologic hypomania at endpoint is correlated with the severity of bipolarity and with the level of dependence at baseline. Irritability detected at baseline, and a higher level of adjustment, relate to a lower level of hypomania at endpoint. Only weak correlations between baseline and endpoint factors are found (Tab. 9).
Table 7. Factor analysis on the patients affected by a mood disorder at endpoint, after varimax rotation; the 4-factor solution explained 65.6% of the total variance. Loadings by factor:
• Factor 1 "Maladjustment" (25.0% of variance): QOL_PHY2 -.844; QOL_AMB2 -.775; PER_PAU2 .753; QOL_PSY2 -.712; RAI_ANX2 .700; QOL_SOC2 -.697; RAI_DEP2 .585.
• Factor 2 "Psychopathology Anxious-Depress." (19.6%): 3TR_AFF2 .847; 3TR_IDE2 .843; PER_AST2 .519.
• Factor 3 "Psychopathology Hypomania" (9.1%): PER_FID2 .673; PER_ESP2 .606; RAI_AGG2 .604.
• Factor 4 "Dependence" (7.6%): 3TR_COM2 .502; PER_DIP2 .879.
• Secondary loadings: .381, -.443, .421, .549.
Table 8. Correlations between factors extracted at endpoint in the affective group (rows) and the endpoint factors of the total group (columns): Maladjustment × Maladjustment .992; Psychopathology Anxious-Depress. × Psychopathology .781; Dependence × Dependence .917. Further printed values in the Hypomania, Dysthymia and Dependence columns: .541, .456, .779, -.459, -.215, .412, .218.
Table 9. Correlations between factors extracted at baseline (rows) and at endpoint (columns) in the affective subgroup. Maladjustment column: -.454 (Adjustment), -.329, .286 (Irritability). Psychopathology Anx.-Depress. column: no values reported. Psychopathology Hypomania column: .242 (Bipolarity), .233 (Dependence), -.228 (Irritability). Dependence column: .322, .323.
Table 10. Factor analysis on the patients affected by a Personality Type-B Disorder at baseline, after varimax rotation; the 4-factor solution explained 71.1% of the total variance. Loadings by factor:
• Factor 1 "Psychopathology" (29.6% of variance): 3TR_IDE1 .905; 3TR_AFF1 .899; 3TR_COM1 .875; QOL_SOC1 -.694; QOL_PSY1 -.646.
• Factor 2 "Maladjustment" (17.1%): RAI_DEP1 .842; RAI_AGG1 .704; QOL_PHY1 -.670; QOL_AMB1 -.524; RAI_ANX1 .484.
• Factor 3 "Dependence" (14.6%): PER_DIP1 .824; PER_FID1 .770; PER_ESP1 .627.
• Factor 4 "Dysthymia" (9.8%): PER_PAU1 -.836; PER_AST1 .536.
• Secondary loadings: -.465, -.351, .470, -.424, .443, .432, -.357, .498.
Table 11. Correlations between factors extracted at baseline in the Personality Type-B Disorder subgroup (rows) and the baseline factors of the total group (columns): Maladjustment × Maladjustment .846; Psychopathology × Psychopathology .945; Dysthymia × Bipolarity -.768; Dependence × Dependence .930.
3.4. Factor Analysis in the Personality Disorder type B subgroup

In the Personality Type-B subgroup, which includes a cluster of severe personality disorders, the number of subjects (N = 34) is lower than the limit recommended for factor analysis (N = 4 × [number of original parameters] = 4 × 15 = 60). This lack of conformance with the methodological constraints may lead to "low resolution" factors: the low-communality variables are at risk of not being reliably linked to one factor or another. However, matching with the factorial model extracted in the global group may support an appropriate interpretation of the results. In this subgroup 4 factors are extracted at baseline (Tab. 10), highly correlated with the four baseline factors extracted in the total group (Tab. 11).
Table 12. Factor analysis on the patients with a Personality Type-B Disorder at endpoint, after varimax rotation; the 4-factor solution explained 72.9% of the total variance. Loadings by factor:
• Factor 1 "Maladaptive Psychopathology" (42.4% of variance): QOL_PHY2 -.899; QOL_SOC2 -.871; RAI_DEP2 .852; QOL_AMB2 -.805; 3TR_IDE2 .792; QOL_PSY2 -.791; RAI_ANX2 .765; PER_PAU2 .741; 3TR_AFF2 .615; PER_AST2 .591; PER_FID2 .495.
• Factor 2 "Hypomania" (13.5%): PER_ESP2 .835; RAI_AGG2 .761.
• Factor 3 "Dependence" (8.8%): PER_DIP2 .806.
• Factor 4 "Incongruent Behavior" (8.2%): 3TR_COM2 .923.
• Secondary loadings: .374, -.356, .644, .364.
The main factor ("psychopathology") is related to a perceived low quality of life, to a hypomanic state, and to the externally evaluated psychopathological signs. The parameters describing subjective distress and level of adjustment correlate instead with the 2nd factor ("maladjustment"). Cloninger's HA is negatively related to the 4th factor ("dysthymia") (Tab. 10). The "subjective anxiety" parameter correlates with all four dimensions extracted, and Cloninger's NS is correlated with Cloninger's RD and with the relational attitude expressed by the patient (Tab. 10). At the endpoint 4 factors are also extracted (Tab. 12). Three of them highly correlate with 3 of the 4 factors extracted in the total group at the endpoint (Tab. 13); a new factor is extracted ("Incongruent Behavior"), and the externally evaluated psychopathological parameters are partially included in the 1st factor (Tab. 12). The two initially distinct dimensions linked to psychopathology and subjective distress in the total group here converge into a single factor, correlated with almost all parameters (11 of 15 variables). This factor correlates with Cloninger's HA (Tab. 12) and with the subjective distress self-described at baseline (Tab. 14), and it is highly correlated with the final main factor of the total group (Tab. 13). The 3rd factor, correlated with Cloninger's RD component, stands beside the "hypomania" factor (related to Cloninger's NS component of temperament) and the psychopathology still detected at the final observation in the total group (Tab. 13).
Table 13. Correlations between factors extracted at endpoint in the Personality Type-B Disorder subgroup (rows) and the endpoint factors of the total group (columns): Maladaptive Psychopathology × Maladjustment .925; Hypomania × Hypomania .940; Dependence × Dependence .784; Incongruent Behavior × Dysthymia -.475. Further printed values: .485 (Psychopathology column), -.459.
Table 14. Correlations between factors extracted at baseline (rows) and at endpoint (columns) in the Personality Type-B subgroup: Maladaptive Psychopathology column: .529 (baseline Maladjustment); Hypomania column: .585; Incongruent Behavior column: .481; no value is reported in the Dependence column.
The "hypomania" factor is highly correlated with the "hypomania" factor extracted in the total group (Tab. 13). The last factor ("incongruity") is correlated with a deviant behavioral psychopathology that is statistically independent from the other externally assessed psychopathological signs (Tab. 12); it correlates inversely with the "dysthymia" factor seen in the total group, indicating a relationship with a hyperthymic state. The 4 factors extracted at baseline and those extracted at the endpoint are weakly correlated with each other. In this subgroup the main factor is correlated with 10 of 15 parameters at baseline and with 11 of 15 at the endpoint, while the average in the other subgroups ("affective disorders" and "other disorders") is 8 and 7 parameters respectively. Moreover, 6 parameters are related to more than one factor at baseline and 4 at the endpoint. All four factors are linked to each other by common parameters at baseline, and only 3 of the 4 factors are linked at the endpoint. These findings, together with the large number of parameters correlated with the main factor, may be explained by the small number of subjects; however, a strong interrelationship between parameters (i.e. a high coherence) due to the severe personality disorders included (i.e. a high psychological rigidity) cannot be excluded. The "psychopathology" factor has more relevance here than in the other subgroups and in the total group.
Table 15. Factor analysis on the patients not included in the other two subgroups, at baseline, after varimax rotation; the 4-factor solution explained 62.7% of the total variance. Loadings by factor:
• Factor 1 "Maladjustment" (25.7% of variance): QOL_PHY1 -.820; RAI_ANX1 .804; QOL_AMB1 -.712; QOL_SOC1 -.707; QOL_PSY1 -.635; PER_FID1 .632; PER_PAU1 .517.
• Factor 2 "Psychopathology" (17.1%): 3TR_IDE1 .832; 3TR_COM1 .822; 3TR_AFF1 .695.
• Factor 3 "Attachment Disorder" (11.2%): PER_DIP1 -.730; RAI_AGG1 .648; RAI_DEP1 .613.
• Factor 4 "Cyclothymia" (8.8%): PER_ESP1 .818; PER_AST1 .666.
• Secondary loadings: -.359, -.390, .351, .407.
Table 16. Correlations between factors extracted at baseline on the patients not included in the other two subgroups (rows) and the baseline factors of the total group (columns): Maladjustment column: .883 (Maladjustment), .448 (Attachment Disorder); Psychopathology column: .969 (Psychopathology); Bipolarity column: .242 (Attachment Disorder), .952 (Cyclothymia); Dependence column: .491, -.666 (Attachment Disorder).
At baseline it is the main factor, and at the endpoint the "maladjustment" and "psychopathology" factors converge into a single factor ("maladaptive psychopathology"). Mood descriptors seem to be the parameters related to more than one factor at the endpoint (Harm Avoidance, Affective Psychopathology, Angriness, Novelty Seeking), while at baseline both bipolar and anxiety descriptors (Subjective Anxiety, Dysthymic state, Novelty Seeking) relate to more than one factor.

3.5. Factor Analysis in the remaining subgroup

The group of 169 patients not included in the "affective" subgroup or in the "Personality Type-B" subgroup is considered apart.
Table 17. Factor analysis on the patients not included in the other two subgroups, at endpoint, after varimax rotation; the 4-factor solution explained 68.3% of the total variance. Loadings by factor:
• Factor 1 "Maladjustment" (26.9% of variance): QOL_PHY2 -.799; QOL_PSY2 -.786; QOL_AMB2 -.774; PER_PAU2 .723; QOL_SOC2 -.705; RAI_ANX2 .696; RAI_DEP2 .640.
• Factor 2 "Psychopathology" (21.3%): 3TR_AFF2 .889; 3TR_IDE2 .782; 3TR_COM2 .699; RAI_AGG2 .507.
• Factor 3 "Dependence" (11.2%): PER_DIP2 .747; PER_ESP2 -.690.
• Factor 4 "Trusting" (8.9%): PER_AST2 -.390; PER_FID2 .811.
• Secondary loadings: -.379, .417, -.417, .509, .479, -.465, .464, .486.
Table 18. Correlations between factors extracted at endpoint on the patients not included in the other two subgroups (rows) and the endpoint factors of the total group (columns): Maladjustment × Maladjustment .988; Psychopathology × Psychopathology .953; Hypomania column: -.612, .638; Dysthymia column: .331, -.567; Dependence column: -.621, .750.
Four factors are extracted at baseline (Tab. 15), but only three are correlated with the baseline factors extracted in the total group (Tab. 16): "maladjustment", "psychopathology" and "bipolarity"-"cyclothymia". The remaining factor (the 3rd, "attachment disorder") is inversely related to the "dependence" factor and positively correlated with the "maladjustment" and "bipolar" factors extracted in the total group (Tab. 16). At the endpoint four factors are also extracted (Tab. 17), but only the first two ("maladjustment" and "psychopathology") are correlated with the first two extracted in the total group (Tab. 18). The other two factors are linked to both the "dysthymia" and "hypomania" endpoint factors of the total group (Tab. 18); one of them is also related to "dependence" (Tabb. 17, 18), and the other to the patient's belief in the reliability of others' attitude toward the self (Tab. 17).
Table 19. Correlations between factors extracted at baseline (rows) and at endpoint (columns) on the patients not included in the other two subgroups: Maladjustment × Maladjustment .445; Dependence column: .240; Trusting column: .270 (Maladjustment), -.436 (Psychopathology), -.526 (Attachment Disorder), .274 (Cyclothymia).
At baseline 4 parameters are correlated with more than one factor, and all baseline factors are linked to each other by means of correlated parameters (Tab. 15). The variables linked to more than one component are very few, mainly concerning "anxiety" symptoms (somatic complaints, harm avoidance) and "depressive" feelings (dysthymia and sadness). At the endpoint 7 parameters are correlated with more than one factor, and still all the factors are linked to each other (Tab. 17). Again, the variables linked to more than one factor concern anxiety (somatic complaints, harm avoidance) and "hyperthymia" (hyperthymia, aggressiveness). At baseline, taking into account the factorial models extracted from the other subgroups, the peculiarity is the "attachment disorder" factor, which is related to a low expression of dependence together with a high subjective discomfort and a high mood level (Tab. 15). This factor appears to be correlated with a personal discomfort linked to a "decrease" of the feeling of attachment and dependence, associated with an increase of explorative behavior and a hypomanic affective state. Such a description is congruent with the psychopathological description of people with "attachment pathology", in whom the actual dependence state is "denied", with an "independence attitude" (while the actual affective dependence still remains) sustained by an elevated activation related to a high subjective distress. The "dependent" factor observed at the endpoint has fewer "pathological features" and is also linked to a euthymic state.

3.6. The Discriminant Analysis between subgroups

When a discriminant analysis is performed between 4 subgroups (affective disorder only, cluster-B personality disorder only, both of the previous disorders, and neither), three discriminating functions can be extracted at baseline ("Maladaptive Psychopathology", "Dependent Attitude", "Hypomania") (Tab. 20) and at the endpoint ("Persistent Maladjustment", "Hypomania", "Persistent Psychopathology") (Tab. 21).
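The discriminant step just described can be sketched as a standard linear discriminant analysis of the four factor scores against the four diagnostic subgroups; this is our reading, and the names (factor_scores, subgroup) and placeholder data are assumptions.

```python
# Sketch of the discriminant analysis between the 4 diagnostic subgroups.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
factor_scores = rng.normal(size=(369, 4))   # placeholder for real factor scores
subgroup = rng.integers(0, 4, size=369)     # 0: PDB-/AD- ... 3: PDB+/AD+

lda = LinearDiscriminantAnalysis(n_components=3)
scores = lda.fit_transform(factor_scores, subgroup)

# Group centroids on the discriminant functions (cf. the lower part of Tab. 20).
centroids = np.array([scores[subgroup == g].mean(axis=0) for g in range(4)])
print(np.round(centroids, 3))
```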
Table 20. Standardized coefficients of the discriminant functions and group centroids. The parameters used are the factors extracted at baseline in the total group; only the first function has significant discriminating power. BLF: baseline factor; PDB: Personality Disorder type B; AD: Affective Disorder.
Standardized coefficients (Function 1, p = .003 / Function 2, NS / Function 3, NS):
• BLF1 Adjustment Distress: .512 / -.421 / .445
• BLF2 Psychopathology: .835 / .075 / -.469
• BLF3 Hypomania: .141 / -.202 / .645
• BLF4 Dependence: .204 / .882 / .399
Group centroids (Function 1 "Maladaptive Psychopathology" / Function 2 "Dependence" / Function 3 "Hypomania"):
• Group 1 (PDB- / AD-): -.255 / .008 / -.009
• Group 2 (PDB+ / AD-): .138 / .178 / .651
• Group 3 (PDB- / AD+): .235 / .057 / -.043
• Group 4 (PDB+ / AD+): .166 / -.572 / .068
However, only the first function in both analyses has a significant discriminating power (p = .003) (Tabb. 20, 21). These functions involve maladjustment, with subjective distress and a lowering of the perceived quality of life: the patient's self-perception of a dysfunctionality in his own capability to accomplish the best relationship between self and non-self, while caring at the same time for the best inner state (see Fig. 1). At baseline this factor is mainly correlated with psychopathology; the presence of hypomania and of a dependent attitude can also increase the severity on this dimension (Tab. 20). At the endpoint this factor seems not to be correlated with psychopathology, dependent attitude or dysthymia, being only slightly influenced by hypomania (Tab. 21). Ranking the subgroups by the "maladjustment-psychopathology" factor at baseline, the most severe diagnosis is Affective Disorder, followed by the two groups with a Personality Type-B Disorder (with or without an Affective Disorder); the least severe is the last subgroup (Tab. 20). At the endpoint the most severe maladjustment is observed in the two subgroups with a Personality Type-B Disorder, the worst being the one with a comorbid Affective Disorder; the subgroup with an Affective Disorder ranks lower and is less maladaptive than the first two subgroups (Tab. 21). Matching the results obtained at admission and after treatment, it can be argued that at baseline and at endpoint two different causes of maladjustment can be evidenced.
Table 21. Standardized coefficients of the discriminant functions and group centroids. The parameters used are the factors extracted at endpoint in the total group; only the first function has significant discriminating power. EPF: endpoint factor; PDB: Personality Disorder type B; AD: Affective Disorder.
Standardized coefficients (Function 1, p = .003 / Function 2, NS / Function 3, NS):
• EPF1 Adjustment Distress: .931 / -.268 / .221
• EPF2 Psychopathology: -.058 / .230 / .795
• EPF3 Hypomania: .360 / .740 / -.413
• EPF4 Dependence: -.081 / .373 / .353
Group centroids (Function 1 "Maladjustment" / Function 2 "Hypomania" / Function 3 "Psychopathology"):
• Group 1 (PDB- / AD-): -.280 / .019 / .016
• Group 2 (PDB+ / AD-): .245 / .878 / -.106
• Group 3 (PDB- / AD+): .184 / -.099 / -.034
• Group 4 (PDB+ / AD+): .466 / .121 / .205
At baseline the maladjustment is linked mainly to the affective state of the patient, which interferes with functioning and leads to subjective discomfort and distress. At the endpoint, stable personality components can be the cause of the persistent maladjustment.

4. Discussion

The data confirm the naturalistic model of expected outcome, where the highest variability is observed at baseline and the lowest at endpoint; the goal of the clinical intervention is to make the patient population as similar as possible to the normal population (Fig. 7). This model differs from the model of an experimental setting, in which the baseline population is as homogeneous as possible and the outcome has to discriminate the different effects of different settings and treatments (Fig. 7). One reason why a loss of "flexibility" of symptoms is not detected may be that the lack of flexibility is already built into the rating criteria of the adopted scales: the severity "score" increases as the "stability" (= rigidity) of the "symptoms", and their non-affectability by external events or by inner control of wellness, increase. The "lack of flexibility" expected to appear in the "affected group" as a loss of variability may thus be "translated" by these scoring criteria into a "high score" on the "severity level" of the evaluated symptoms. However, it is possible to argue that the higher instability found at baseline can be linked to the "lack of flexibility" of the system: an emergent property of a "complex and flexible" system may be a higher "stability" in spite of environmental input changes.
Figure 7. The experimental and the clinical model compared. DS0: standard deviation at baseline; DS1: standard deviation at endpoint; T = T-score; F = F parameter from the ANOVA; P = probability of F.
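The quantities named in the caption can be reproduced with a small sketch, under our assumptions: outcome values are converted to T-scores against the general-population mean and SD, and the baseline/endpoint variances (DS0, DS1) are compared with an F test.

```python
# Hedged sketch of the Fig. 7 comparison: T-scores and a variance-ratio F test.
import numpy as np
from scipy import stats

def t_scores(x, pop_mean, pop_sd):
    """Standard T-score: mean 50, SD 10 in the reference population."""
    return 50.0 + 10.0 * (np.asarray(x) - pop_mean) / pop_sd

def variance_f_test(baseline, endpoint):
    """Two-sided F test for DS0^2 / DS1^2 between the two time points."""
    F = np.var(baseline, ddof=1) / np.var(endpoint, ddof=1)
    df0, df1 = len(baseline) - 1, len(endpoint) - 1
    p = 2 * min(stats.f.cdf(F, df0, df1), stats.f.sf(F, df0, df1))
    return F, p
```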
A lack of flexibility in the psychopathologic state is one possible interpretation of the factor analysis results. In fact these results highlight a phenomenon of "increased coherence" between symptoms, subjective feelings, and self-reported thoughts and behavior, more evident at baseline than at the endpoint. A constant presence of factors describing mood was detected, as was the importance of parameters describing anxiety and relational attitudes. The selected group was chosen as the most "complete" as regards the set of recorded descriptors, but it is characterized by a prevalence of mood disorders, anxiety disorders (the latter at risk of developing into an anxious-depressive state), dependent personalities, attachment disturbances, and temperamental instabilities which lead to maladjustment and personal or social distress. The diagnoses of the included subjects belonged mainly to the affective and anxiety disorder classes and, as personality traits, to the "C" and "B" personality subtypes (Fig. 8).
Table 22. Factors extracted in a previous study in which Quality of Life, Subjective Distress and hetero-evaluated psychopathology were not included [3].
• F1 External control: mean 3.72; SD 6.09; t 4.68; sig (2-tail) 0.001.
• F2 Harm Avoidance: mean 5.16; SD 7.92; t 5.00; sig 0.001.
• F3 Novelty Seeking: mean 0.63; SD 6.78; t 0.72; sig 0.475.
• F4 Reward Dependence: mean 1.23; SD 8.85; t 4.68; sig 0.289.
• F5 Psycho-Astenia: mean 1.28; SD 10.41; t 0.95; sig 0.347.
In these subtypes of personality disorders the linkage with affective and anxiety disorders had already been found, as regards the presence of a dependent attitude (type C). This composition can influence the type of factorial models extracted and the parameters with the highest influence on more than one factor. As for the "total group" factor model, the 5-factor model extracted from the endpoint data set is similar to the one (Tab. 22) extracted from a different group of patients studied with a similar but not identical set of parameters, which lacked the "psychopathological descriptors" externally evaluated by the specialist as well as some quality of life descriptors. In that previously published study [3] the main component was linked to the relational attitude of the patient, and 3 of the 5 factors were correlated with the three Cloninger temperament components (Novelty Seeking, Harm Avoidance, Reward Dependence); the 5th factor was instead linked to a "dysthymia" component. These data confirm the importance of Cloninger's three-factor model of temperament as the basic background over which psychopathology superimposes its own interference in affective disorders. The quality of the affective relationships linked to the patient's psychology sustains a background level of distress. The presence of an interfering factor linked to the psychiatric pathology, and sensitive to drug treatment, may be the reason why the correlations between baseline and endpoint factors are low (Tabb. 4, 9, 14, 19). In the present study the 5 factors extracted from the endpoint data also contained parameters describing the externally evaluated psychopathology and a self-evaluated level of maladjustment perceived as reduced "quality of life" (Tab. 3). These factors are clearly linked to the new set of parameters, which confirms their descriptive importance; in fact they aggregate into the two main factors which describe two important aspects of the model proposed in Figg. 1, 2.
[Figure 8: bar chart of comorbidity rates (0-80%) for adjustment disorder, anxiety disorder, eating disorder, substance abuse, impulse dyscontrol and personality C-type, in the affective, Personality B-Type, "Other" and total groups.]
Figure 8. Comorbidity detected in the total group and in the subgroups. Affective Disorder and Personality B-Type diagnoses were included only in the affective subgroup and in the Personality B-Type subgroup; these two diagnostic classes were excluded from the "Other" subgroup.
The "subjective" descriptors of distress and the external descriptors of psychopathology are found to correlate into two different factors. These data suggest the existence of different and unrelated criteria in the client's and the specialist's evaluations of the "problem" (Fig. 9). On the one hand, the client's request seems to be to reduce his own discomfort and increase his level of perceived adjustment; on the other hand, the specialist "translates" such a request as equivalent to reaching the disappearance of psychopathological signs and symptoms (Fig. 9). However, a common factor "linking" the two perspectives may be found in the "functional dimension". In our data this functional dimension seems to be the comfort of "social relationships", together with the related aggressiveness, incongruent behavior, anxiety and/or subjective distress (Tabb. 2, 3, 5, 7, 17). The importance of "social relationships" may be linked to the finding of a shared group of parameters correlated with both the psychopathology descriptors and the subjective maladjustment descriptors; these parameters are in fact linked to relational discomfort (anxiety, somatic complaints, quality of social relationships, sadness).
Figure 9. The relationship between the patient's goal and the specialist's targets: the latter are intermediate goals directed toward the patient's final goal. Sometimes, however, an uncoupling between the two levels is observed, as the present data seem to show.
This relational dysfunction (related to the underlying psychopathological problem) may lead to a personal discomfort which makes the client ask for help. In the total group at endpoint the "psychopathology" factor (the second factor in many of the factor analyses) has the "thought disorder" parameter "shared" with the "dysthymic" factor (Tab. 3). In other subgroups, at endpoint, a hyperthymic/dysthymic component (Tab. 12) or a bipolar component (Tab. 17) is related to the psychopathology factor, directly or indirectly through common shared parameters. All these results may be interpreted as the presence of persistent mood disorders at endpoint that lead to relational problems, which are the perceived component of the personal discomfort in almost all chronic psychopathological processes. Many studies, moreover, give importance to the quality of social relationships as a "protecting" factor that also reduces the risk of psychopathology. These results concern the "endpoint" data, where "residual" underlying psychopathological processes and the personal discomfort can interfere with each other, creating a linkage between components originally statistically independent but now linked by the emergence of "symptoms". In Table 23 an overview of all the results of the factor analyses performed in each subgroup is presented together with the above considerations. In almost all the analyses 4 factors are extracted, using the criterion of an eigenvalue > 1; only at the baseline of the affective subgroup and at the endpoint of the total group were 5 factors extracted.
Table 23. Overview of the factor analysis results in the total group and in the three subgroups.
Number of subjects: Total Group 369; Affective Disorder 190; Personality Type-B 34; Other 166.
Baseline factors:
• Total Group: 4 (Maladjustment, Psychopathology, Bipolarity, Dependence);
• Affective Disorder: 5 (Adjustment, Psychopathology, Bipolarity, Dependence, Irritability);
• Personality Type-B: 4 (Psychopathology, Maladjustment, Dependence, Dysthymia);
• Other: 4 (Maladjustment, Psychopathology, Attachment Disorder, Cyclothymia).
For the total group the baseline "central" factor is HA-QoL (social anxiety), with warm parameters QOL_PHY, QOL_SOC, PER_AST, PER_PAU, PER_FID and RAI_ANX, of socio-relational, anxious-depressive type; the baseline "central" factors of the subgroups include Psychopathology, HA-QoL and Maladjustment (subjective anxiety), with warm parameters drawn from QOL_PSY, QOL_PHY, QOL_SOC, QOL_AMB, PER_PAU, PER_AST, PER_ESP, RAI_ANX and RAI_DEP (bipolar, socio-relational, anxious-depressive and adjustment types).
Endpoint factors:
• Total Group: 5 (Maladjustment, Psychopathology, Hypomania, Dysthymia, Dependence);
• Affective Disorder: 4 (Maladjustment, Psychopathology Anx.-Dep., Psychopathology Hypomania, Dependence);
• Personality Type-B: 4 (Maladaptive Psychopathology, Hypomania, Dependence, Incongruent Behavior);
• Other: 4 (Maladjustment, Psychopathology, Dependence, Trusting).
At the endpoint the "central" factors include HA-QoL, Maladjustment and Psychopathology (social anxiety, cyclothymia, hyperthymia), with warm parameters drawn from QOL_SOC, QOL_PHY, PER_PAU, PER_ESP, PER_AST, RAI_ANX, RAI_DEP, RAI_AGG, 3TR_AFF and 3TR_IDE (socio-relational, hypomanic, anxious-depressive, hyperthymic and obsessive types). Emergent features include hypomania, QoL & HA, socio-relational factors, adjustment distress, attachment, psychopathology vs. QoL, anxiety & adjustment, mood state & RD, pathological social anxiety, and bipolarity. Notes: diagnostic diversity (total group); low number of cases (Personality Type-B).
However, a factor number higher than 4 seems less reliable, even though in the previously quoted study 5 factors were extracted (Tab. 22) [3].
The factors extracted are sensitive to the covariation of parameters between subjects. If subjects show different patterns of covariation, this can lead to a splitting of factors, and the use of different populations can lead to different factorial models [4]. What is found here is just this: different diagnostic groups of patients show different factorial models on the basis of a common background structure. The underlying structure seems to be characterized by 6 background "components":
1. The subjective distress linked to the adjustment level and perceived as quality of life. There is a background interference on this perception from the attitude of the patient to feel insecure, unable, fearful of what is not well known about the present and/or the future (linked to Cloninger's HA component).
2. The observed "psychopathology" as dysfunction of behaviour, thought processes and inner feelings.
3. The quality of private relationships, with inner feelings of anxiety, sadness and aggressiveness linked to the perceived dissatisfaction.
4. The level of mood as activity level, tendency toward exploration and frequency of trail-shifts (linked to Cloninger's NS component).
5. The level of mood as energy level, attention, strength, feeling of competence and social matching capability.
6. The need for an external action for full personal emotional satisfaction (dependence), which can lead the patient to be emotionally dependent on others and to seek others' reward (linked to Cloninger's RD component). This background component may be twisted into a "denied dependence", which actually persists under a surface attitude of "independence". The expression of the underlying "dependent" or "independent" (as distinct from "autonomous") attitude may be modulated by the mood level.
These background components are extracted differently in the different subgroups: the specific pathological process leads to the distinction or to the aggregation of these basic components, building the specific diagnostic pattern as the specific factorial model. However, we must also take into account that the number and quality of the listed "background components" may be sensitive to the background diagnostic composition of the total population [4]. The presence of a pattern of covariation among parameters that is more "coherent" at baseline than at endpoint is evident, as is the low correlation between the factors extracted at baseline and those extracted at endpoint.
The low correlations between the factors extracted before and after treatment can be caused by the disappearance of interfering psychological factors sensitive to drugs. At the endpoint a persistent "coherence" between parameters and factors is still detected. This finding can be interpreted as the expression of a persistent "psychopathology", as can also be argued from the persistence of a psychopathology factor and of a maladjustment factor in the endpoint factorial model. Results from the discriminant analysis confirm the presence of a main interference at baseline, represented by the mood disturbance, which is the best-cured component, while at the endpoint different sources of interference still persist. The common functional factor evidenced is the maladjustment and personal discomfort. It can, however, be detached from the clinical judgment of "recovery" or "ill state", and it can be linked to the patient's compliance with treatment. In Personality Type-B Disorder this compliance is very unreliable, and the mismatch between the clinician's and the patient's judgments about the presence of an "illness state" is as evident in our data as it is frequent in actual clinical experience. The best change in the severity of maladjustment is observed in affective patients, where drug treatment has demonstrated the highest efficacy.

5. Conclusion

The values of the clinical parameters (either self-described or externally evaluated) show more variability at baseline than at endpoint, where they tend to converge toward "normal" values. These data can be interpreted as the lower stability of "rigid" systems. Four or five factors are usually extracted by factor analysis, both in the whole selected population and in the three diagnostic subgroups. These factors are the expression of a "coherence" between parameters that can arise when a pathological interference occurs. Pathological states can be judged subjectively by the discomfort and subjective distress caused by maladjustment. The presence of an adaptive dysfunctionality (in controlled behaviour, perceived feelings, and thought processes) can, however, also be judged externally, with evaluations that are not always correlated with the patient's judgment. There are probably some background components linked to "personality", as Cloninger described (Novelty Seeking, Harm Avoidance, Reward Dependence), that influence the base reactivity of patients [2]. Psychological, life-experience-based components (character) have to be added to this temperamental bias.
[Figure 10: diagram contrasting a distorted and rigid (maladaptive) system, in which psychopathologic processes and diagnoses distort the statistical independence of components, with a flexible (adaptive) system; treatment moves the system from the "maladjustment risk area" toward "normality" and the "adjustment recovery area", although a "persistent distortion" may remain.]
Figure 10. The pathological state is characterized by a "distortion" of the statistical independence between parameters and background components. This condition leads to a lack of flexibility of the system and, in turn, to high maladjustment. Such a condition may be represented as a "flat plane" in the "adaptive graph" drawn at the bottom of the figure (see also Figg. 1, 2). The treatment "reduces" the "coherence" and increases the statistical independence between parameters and components; the unstable state assessed on the abscissa becomes more stable as it becomes more adaptive. At the endpoint the "flat plane" in the "adaptive graph" is changed back to normal adaptability.
The "character" affects the attitude toward, and the quality of, social relationships, which can also help people in managing personal stress. There are therefore three "patterns" which can be overlapped and evidenced in illness states: 1. the acute clinical disorder pattern (in this study mainly affective and/or anxiety disorders); 2. the background temperament; 3. the presence of abnormal personality traits (character), which lead either to a kind of protection from the disease or to a chronic maladjustment. The different illnesses can be classified not only by the "typicality" of the "syndrome" factorial pattern but also by the intensities of dysfunction (maladjustment), symptoms (subjective discomfort) and signs (psychopathology).
The model proposed is presented in Fig. 10. The presence of a clinical disorder leads to a "distortion" of the "physiologic" pattern, which is characterized by a high statistical independence among parameters, except those linked to temperament features (constitutional and linked to the genome). The pathologic "distortion" is linked to a lack of flexibility and causes a drop in adjustment capabilities and the rise of a greater instability (increase of variance) in the symptoms and signs detected. The clinical picture, however, is complicated by an interference also exerted by coping style and by the background reactivity linked to the personality factors (character and temperament). As the clinical disorder (in this case an affective or anxiety disorder) is cured, the instability (detected as variability) decreases and the system becomes more adaptive. At the endpoint, however, the interference exerted by personality factors becomes more relevant, and it can be the main cause of the persistence of maladjustment and of personal and/or social discomfort. The present model has to be considered a working model, and it needs to be confirmed by further evidence, obtained either from a larger number of cases or from the use of more parameters, to increase sensitivity to the different physiopathological components.

References

1. P.L. Marconi, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006).
2. C.R. Cloninger, Arch. Gen. Psychiatry 44(6), 573-588 (1987).
3. E. Marchiori and P.L. Marconi, in VIII Congress of the International Society for the Study of Personality Disorders (ISSPD, Florence, 2003).
4. P.L. Marconi, in Psicopatologia dimensionale e trattamento farmacologico, Ed. T. Cantelmi and A. D'Andrea (Antonio Delfino Editore, Roma, 2003).
5. P. Pancheri and P.L. Marconi, Giornale Italiano di Psicopatologia 2(1) (1996).
6. P.L. Marconi, P. Cancheri and R.M. Petrucci, in X World Congress of Psychiatry, Ed. J.J. Lopez-Ibor, F. Lieh-Mak, H.M. Vistosky and M. Maj (Hogrefe and Huber Publishers, Kirkland, WA, 1999).
7. P.L. Marconi and C. Gambino, in La clinica dell'ansia, Volume II, Ed. G.B. Cassano, P. Cancheri and L. Ravizza (Il Pensiero Scientifico Editore, Roma, 1992).
8. P.L. Marconi and F. De Palma, in Ossessioni, Compulsioni e continuum ossessivo, Ed. P. Pancheri (Il Pensiero Scientifico Editore, Roma, 1992).
9. P.L. Marconi and P. Cancheri, in Atti del IX Congresso Nazionale di Informatica Medica (Associazione Italiana di Informatica Medica, Università Cà Foscari, Venezia, 3-5 ottobre 1996).
MODELS AND SYSTEMS
DECOMPOSITION OF SYSTEMS AND COMPLEXITY
MARIO R. ABRAM
AIRS - Associazione Italiana per la Ricerca sui Sistemi, Milano, Italy

Recalling the decomposition methodology, the complexity of the decomposition process is described. The complexity of a system is connected with the depth reached in the decomposition process; in particular, the number of subsystems and the number of active relations present in a decomposition are the elements used to define a complexity index. Some considerations about decomposition sequences make it possible to highlight some basic properties useful for defining the maximum values of complexity. Given some hypotheses about the relation patterns produced by the starting steps of the decomposition process, the range of complexity for each decomposition level is evaluated through computer simulations. In addition, some connections with other knowledge contexts, such as graph theory, are presented.

Keywords: decomposition, subsystem, complexity, graph theory.
1. Introduction

A possible way to describe a system and highlight its structure lies in developing a method for the characterization of the system, involving all the aspects connected with the definition of the subsystems and with the possible existence of relations between them. Maintaining control of this description process is not easy: some intuitive and elementary considerations show how quickly the number of binary relations between the subsystems increases (with a quadratic law). It is therefore necessary to apply a rigorous methodology in order to keep complete control over the decomposition of a system into subsystems. In this paper, following the decomposition approach and using a general decomposition methodology [1] (section 2), we investigate the possibility of defining and computing some complexity indices of a system. Recalling briefly some ideas about complexity, we show how complexity may be defined and evaluated in a decomposition process (section 3); in particular, we define complexity as a property connected with the number of subsystems and the number of active relations between the subsystems. We then see how the complexity of a system is defined, and is meaningful, within a specific decomposition level (section 4); in this context
some elementary complexity indices for each decomposition level will be defined, evidencing their maximum values. Not all complexity indices are computable, so some of them will be evaluated through simulations of the decomposition process (section 5). Some methodological remarks are reported in the conclusions (section 6).

2. Decomposition of systems

The decomposition process enables the partition of a system into subsystems while keeping the development of the relations between all subsystems under control. In this way the patterns of relations between the subsystems coming out of the decomposition process are always strictly connected with the decomposition steps of each subsystem. Very briefly, each decomposition step is summarized by the following activities [1,2]:
• A subsystem $S_k^{n-1}$ is duplicated, with its relations, into two new subsystems $P_k^n$ and $N_k^n$.
• The subsystems $P_k^n$ and $N_k^n$ are respectively identified by a property $\mathcal{P}_k^n$ and its negation $\mathcal{N}_k^n = \neg \mathcal{P}_k^n$.
• The relations between the subsystems $P_k^n$, $N_k^n$ and the other subsystems are reduced, eliminating those not coherent with the properties $\mathcal{P}_k^n$ and $\mathcal{N}_k^n = \neg \mathcal{P}_k^n$.
• The new subsystems are then labeled $S_k^n = P_k^n$ and $S_{k+1}^n = N_k^n$.
Figure 1 shows a simplified schema of step $n$ of the decomposition process; a minimal computational sketch of this duplication-and-reduction step is given below, after the figure. The matrix of relations is a picture of all binary relations between the subsystems forming the decomposition pattern. Alternatively, the decomposition process may be represented as a directed graph $G_n(S_n(P_n), R_n)$, in which $S_n$ is the set of subsystems, $R_n$ the set of relations between the subsystems, and $P_n$ the set of defining properties used to characterize the subsystems. We remark that in this representation the graph structure is enriched by labeling each vertex with the specific property defining the subsystem associated with that node. Nevertheless, the matrix of relations is by definition the incidence matrix of the graph $G_n(S_n, R_n)$; so we can investigate which properties of incidence matrices may be useful in describing the relation pattern of the decomposition. The decomposition process is thus a synthesis of graphs: a graph of properties and a graph of subsystems.
Figure 1. Decomposition of a system into two subsystems.
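A minimal computational sketch of the decomposition step, written by us in Python (the paper defines the step abstractly; the notation and representation as a boolean adjacency matrix are our assumptions):

```python
# One decomposition step of Sec. 2: S_k is duplicated into P_k (index k) and
# N_k (index k+1) together with its relations; incoherent relations are removed.
import numpy as np

def decomposition_step(A, k, drop):
    """A: boolean matrix of relations; k: subsystem to split;
    drop: list of relations (i, j), touching P_k or N_k, to eliminate."""
    A = np.insert(A, k + 1, A[k], axis=0)      # duplicate outgoing relations
    A = np.insert(A, k + 1, A[:, k], axis=1)   # duplicate incoming relations
    A[k, k + 1] = A[k + 1, k] = True           # direct relations P_k <-> N_k
    A[k, k] = A[k + 1, k + 1] = False          # no self-relations
    for i, j in drop:                          # reduction of incoherent relations
        A[i, j] = False
    return A
```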
In this way it is possible to give a further definition of a system: a system is a couple $S_n = (T_n, B_n)$, where $T_n$ is the graph of properties and $B_n$ is the graph of subsystems. By construction the graph $T_n$ is a tree with $n$ vertices and $n+1$ leaves, and the graph $B_n$ is a directed graph with $n+1$ vertices and a maximum number of $n(n+1)$ edges.

3. Complexity and decomposition

Different definitions of complexity have been developed in order to give a quantitative description and evaluation of the variety of configurations or, more generally, of the multiplicity of properties in a system. Some authors have associated the concept of complexity with the idea of redundancy, interpreted as multiplicity of choices. Some of these aspects are taken into consideration when defining a system and become evident when we attempt to decompose a system into subsystems. We are therefore interested in giving a measure of the complexity involved in the decomposition process; in particular, we will attempt to investigate how the number of subsystems, and of the active relations between them, affects the complexity of systems. In a decomposition process the matrices of relations are isomorphic to the incidence matrices of a directed graph, so a first index of complexity is given by counting the number of nonzero elements in the incidence matrix; this complexity index is thus given by the number of binary relations involved in that decomposition (the number of edges in the graph). When considering the decomposition process, some deeper evaluation of the hypotheses involved in its definition is useful, because we need a more realistic evaluation of the maximum value of the complexity indices related to our applications. An important aspect then emerges from these considerations: the complexity of a system is related to the specific decomposition built for the description of that system.
In particular, the level of complexity of each system is connected with the level of decomposition reached for that system. Speaking about the complexity of a system is therefore improper; it is correct instead to speak about the complexity of the description of a system. Some have attempted to describe the complexity of a system in a general way, but in reality even such a description is given with reference to the background model chosen for the system.
4. Evaluating complexity

Complexity may be evaluated in different ways, and it is convenient to introduce a definition and a metric that are meaningful, and therefore useful, in connection with the specific aspects of the problem. Three kinds of complexity indices are available; the simplest are direct functions of the decomposition index, and their values follow a linear law. They can be listed as follows:
• Logical: indices related to the number of properties involved in the definition of a system and to the number of subsystems. These values are intrinsic to the decomposition process.
• Relational: indices connected with the number of relations between the subsystems.
• Mean: indices related to average values computed by means of the logical and relational indices.
While the logical indices are linear and give the dimension of the decomposition (the basic dimension of the problem), the relational indices give an idea of the number of connections involved and of the number of interaction or loop structures present in the representation. When evaluating complexity indices it is easy to see that the logical indices are directly computable, while the relational indices, being dependent on the choices adopted in the decomposition process, are not computable in a universal way; in this regard it is possible only to compute the maximum values of these indices, which characterize the range of values they can span. Given a decomposition step $n$ and an index $X$, in defining each metric we will use the following convention: effective value $N_X^n$; maximum value $U_X^n$; the only independent variable is the decomposition index $n$.
4.1. Logical indices

Logical indices are related to the number of properties involved in the definition of a system and, consequently, to the number of subsystems involved. We introduce the following nomenclature:
• Number of properties $N_P^n$ used for the definition of a system at decomposition level $n$: a system defined by a decomposition instantiating $n$ properties has $N_P^n = n$.
• Number of subsystems $N_S^n$ in the decomposition $n$ (number of vertices): $N_S^n = N_P^n + 1 = n + 1$. It is given by the number of subsystems defined by the decomposition procedure and exceeds the number of properties by one because it includes the "environment" subsystem.
4.2. Relational indices

The relational indices involve the number of relations and help to investigate some structural properties of systems. Some useful indices are:
• Number of relations $N_R^n$ between the subsystems involved in a specific decomposition $n$ (number of edges). We are not able to give a formula for this index, because its value depends on the number of relations surviving each decomposition step.
• Number of reducible relations $N_{RR}^n$ between the subsystems involved in a specific decomposition step $n$, where $h$ counts the hypothesized reductions of the direct relations between the subsystems $P_k^n$ and $N_k^n$ of step $n$:
  $N_{RR}^n = 4(n-1) + h$, with $h = 0, 1, 2$.  (1)
• Number of interactions $N_I^n$ present in the decomposition $n$ (number of cycles); for a graph it is usually called the cyclomatic number $\gamma_n$ [4] and represents the number of cycles present in the incidence matrix of the decomposition $n$. If $N_R^n$ is the number of relations (edges) and $N_S^n$ the number of subsystems (vertices), then
  $N_I^n = \gamma_n = N_R^n - N_S^n + 2 = N_R^n - n + 1$.
• Number of simple interactions $N_{I1}^n$ present in the decomposition $n$ (cycles involving only two subsystems); if $N_{RS}^n$ is the number of binary symmetric relations between the subsystems, it is given by
  $N_{I1}^n = N_{RS}^n / 2$.
• Number of non-simple interactions $N_{IK}^n$ present in the decomposition $n$ (cycles involving more than two subsystems):
  $N_{IK}^n = N_I^n - N_{I1}^n$.
• Sparsity: in a decomposition $n$, the proportion of active relations with respect to the maximum number of possible relations (see below for the meaning of $U_R^n$):
  $N_{SP}^n = N_R^n / U_R^n = N_R^n / [N_S^n (N_S^n - 1)] = N_R^n / [n(n+1)]$.
Because we are not able to give an explicit formula for the computation of some indices, such as $N_R^n$, $N_{RS}^n$ and hence $N_{I1}^n$, it is convenient to compute the maximum values of some indices in order to evaluate their range. We can consider the following values:
• Maximum number of relations involved in a decomposition $n$ (maximum number of edges):
  $U_R^n = (N_S^n)^2 - N_S^n = N_S^n (N_S^n - 1) = (n+1)n$.
• Maximum number of reducible relations, i.e. the maximum number of relations involving the subsystems $P$ and $N$ in decomposition $n$ (maximum number of edges to be eliminated):
  $U_{RK}^n = 4(n-1) + 2 = 4n - 2$.
• Maximum number of interactions involved in a specific decomposition $n$ (maximum number of cycles), obtained for $N_R^n = U_R^n$:
  $U_I^n = U_R^n - n + 1 = n(n+1) - n + 1 = n^2 + 1$.
• Maximum sparsity involved in a specific decomposition $n$:
  $U_{SP}^n = U_R^n / U_R^n = 1$.
All these indices can be computed directly from the matrix of relations, as in the sketch below.
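A sketch of the index computation (function and variable names are ours; the matrix of relations is assumed to be boolean with no self-relations):

```python
# Sketch: complexity indices of Sec. 4 from the adjacency matrix of B_n.
import numpy as np

def complexity_indices(A):
    n_S = A.shape[0]                 # subsystems (vertices), N_S = n + 1
    n = n_S - 1                      # decomposition index
    N_R = int(A.sum())               # active relations (edges)
    U_R = n_S * (n_S - 1)            # maximum number of relations
    N_I = N_R - n_S + 2              # interactions: cyclomatic number
    N_I1 = int((A & A.T).sum()) // 2 # simple interactions (symmetric pairs)
    return {
        "N_S": n_S, "N_R": N_R, "N_I": N_I, "N_I1": N_I1,
        "N_IK": N_I - N_I1,          # non-simple interactions
        "N_SP": N_R / U_R,           # sparsity
        "U_R": U_R, "U_I": n * n + 1 # maximum relations and interactions
    }
```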
4.3. Mean values

Mean indices are related to the average values computed by means of the logical and relational indices; they are referred to the number of subsystems involved in the decomposition step.
• Mean number of relations per subsystem (mean number of edges):
  $\bar{N}_R^n = N_R^n / N_S^n = N_R^n / (n+1)$.
• Mean number of interactions per subsystem for a specific decomposition $n$ (mean number of cycles):
  $\bar{N}_I^n = N_I^n / N_S^n = N_I^n / (n+1)$.
• Mean number of simple interactions per subsystem for a specific decomposition $n$ (mean number of simple cycles):
  $\bar{N}_{I1}^n = N_{I1}^n / N_S^n = N_{I1}^n / (n+1)$.
• Mean number of non-simple interactions per subsystem for a specific decomposition $n$ (mean number of non-simple cycles):
  $\bar{N}_{IK}^n = N_{IK}^n / N_S^n = N_{IK}^n / (n+1)$.
5. Simulation

In order to evaluate the numerical values of the relational indices it is necessary to use computer simulations; for this reason the decomposition process was implemented in Matlab software. The hypotheses about the decomposition process influence the evolution of the decomposition itself.
Figure 2. Matrix of relations (left, 724 nonzero entries) and matrix of binary relations of simple interactions (right, 278 nonzero entries) at the end of the decomposition.
• • • • •
The direct relations between the new subsystems of the last decomposition step are not reduced (h = 0 in equation (1)). One relation of the two direct relations between the subsystems in the decomposition step may be reduced (h = 1 in equation (1)). The two direct relations in the decomposition step may be reduced (h = 2 in equation (1)). The maximum number of relations to be reduced for each decomposition step is fixed. The number of relations to be reduced is in accordance with a defined reduction law.
These possibilities were implemented in our simulations and the identification of the relations to be eliminated was given by a random choice (with uniform distribution) as regards: • The number k of the subsystem S kn to be decomposed into two subsystem Pkn +1 and N kn +1 . • The choice of the relations to be eliminated between the relations involving the two subsystems Pkn +1 and N kn +1 . As an example we report the simulation of a case with 50 properties (then 51 subsystems will be present at the end of the decomposition). The figures show the evolution of the previously defined indices and give an idea of the trend of larger decomposition processes. In particular in figure 2 the matrices of relations
Decomposition of Systems and Complexity 1
50 Decomposed subsystem Properties
45
0.9
40
0.8
35
0.7
30
0.6
25
0.5
20
0.4
15
0.3
10
0.2
5
0.1
0
541
0
5
10
15
20 25 30 Decomposition
35
40
45
50
0
0
5
10
15
20 25 30 Decomposition
35
40
45
50
Figure 3. Sequence of decompositions (left) and sparsity index (right) vs. decomposition index n.
and of simple interactions at the end of the decomposition process are shown. Figure 3 depicts the random choices of the subsystem to decompose and the evolution of sparsity index. The evolution of relation and interaction indices vs. decomposition index n are reported in figure 4; the values of indices, shown in logarithmic scales, are compared with the maximum number of properties U Pn , the maximum number of relations U Rn and the maximum number of reducible n relations U RR . In figure 5 are reported the mean values of some indices for each subsystem referred to the maximum numbers as before. It is evident the importance and the role of the reduction law, the choice of which influences the evolution of the process toward a manageable number of relations or an excessive amount of binary relations involved. In our simulation the reduction law gives the number of relation to n n eliminate N RE and has the form N RE = int(α n) , where we set the reduction coefficient α = 0.4 . The equation gives a value that is bounded by the effective number of the reducible relations between the two subsystems. The choice of the parameters in reduction law affects directly the sparsity index of the decomposition (figure 3). The results obtained from the previous simulations suggest some interesting considerations. They put into evidence the role of the observer, intended as the agent who drives the decomposition process. In previous numerical experiments the random choice of the subsystem to be decomposed can be interpreted as a simulation of the cognitive system of the observer [8]. Therefore the testing of different statistical distributions for the choice algorithm and its probabilistic modeling may give the possibility of investigating different cognitive strategies
542
M.R. Abram
Properties Relations (Maximum) Reducible Relations (Maximum) Relations (Total) Reducible Relations Eliminated Relations
3
10
Properties Relations (Maximum) Reducible Relations (Maximum) Interactions (Total) Simple Interactions Not Simple Interactions
3
10
2
2
10
1
10
10
1
10
0
0
10 0 10
10 Decomposition
1
10 0 10
1
10 Decomposition
Figure 4. Relations indices (left) and interactions indices (right) vs. decomposition index n. 2
2
10
10
Properties Relations (Maximum) Relations (Mean) Reducible Relations (Mean) Eliminated Relations (Mean)
Properties Relations (Maximum) Interactions (Mean) Simple Interactions (Mean) Non Simple Interactions (Mean)
1
1
10
10
0
0
10
10
-1
10
-1
0
10
1
10 Decomposition
10
0
10
1
10 Decomposition
Figure 5. Mean values of relations indices (left) and mean values of the interactions indices (right) vs. decomposition index n.
and consequently of evaluating the effects produced by different dynamics in the evolution of decomposition process.
6. Conclusions The previous findings suggest some considerations. • Implementing a decomposition process increases the description detail of systems and this gives an advantage in acquiring information about system properties; but we ask ourselves whether there is a limit beyond which the decomposition process is useless. • The decomposition process has a twofold meaning connected with the goal of decomposition itself; if we decompose a system to analyze it, we have a
Decomposition of Systems and Complexity
• •
•
•
•
543
description suited to grasp the properties of an existing system and the increasing complexity of decomposition coincides with the “analysis” complexity. On the contrary, in a design activity the decomposition is used to dominate the development of a project. Then the decomposition complexity is connected with the “project” complexity. The advantage of decomposition increases linearly with the number of subsystems but the number of relations increases in a quadratic way. It can be shown that when we increase the number of subsystems, as the latter is an increasing number of available functions, then the system performance can be higher and we can say that an increase of the decomposition level may increase the advantages due to the existence of subsystems. These advantages may increase linearly with the number of subsystems, but the costs for implementing subsystems themselves increase quickly in a quadratic way with the number of relations connecting them. This means that, beyond a given decomposition level, inevitably the costs will overcome the advantages. This implies that increasing the decomposition level is not always convenient. Then it is important to evaluate the cost/benefit ratio related to the number of subsystems and relations. The decomposition process is a rigorous methodology by means of which each subsystem is defined by a property strictly connected with the properties recalled in the previous decomposition steps. Each definition of a property then is important, because its activation into the decomposition process may duplicate the subsystem relations or reduce them. The definitions of indices of properties and subsystems show how each decomposition intrinsically gives rise to an environment subsystem defined through the properties complementary with respect to the ones of subsystems. This entails that whatever kind of decomposition implies the unavoidable introduction of an environment, whose properties are not absolute and objective, but defined in terms of hypotheses made by the system analyzer. It could be possible to describe the behavior of subsystems, independently from the nature of the elements, by resorting to concepts of information theory. This informational approach [5,7], mainly based on quantities such as mutual information, can be useful to evidence the presence of systemic features and therefore of the past occurrence of emergence processes giving rise to the system itself.
544
•
M.R. Abram
When an effective application must be developed and each relation is instantiated by a more detailed description, the definition of complexity must be specified and more specific and detailed definitions of complexity indices can be used. Then the complexity of a system is related to the complexity of the description adopted for that system. Further researches may be useful to go deeper into investigating the results coming from directed graphs theory, so as to find the possible connections of the latter with practical applications [3,6]. This may be advantageous when developing a software to manage the amount of available data and to evidence the structural properties of decomposition process.
•
Acknowledgments I would like to thank Prof. Eliano Pessa who provided information and useful discussion.
References 1. M.R. Abram, in Emergence in Complex, Cognitive, Social and Biological Systems, 2. 3. 4. 5. 6. 7. 8.
Ed. G. Minati and E. Pessa, (Kluwer Acadmic/Plenum Publishers, New York, 2002), pp. 103-116. M.R. Abram, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa, M. Abram, (Springer, New York, 2006), pp. 377-390. A.-L. Barabasi, in Handbook of Graphs and Networks, Ed. S. Bornholdt and H.G. Schuster, (Wiley-Vch, Weinheim, 2003), pp. 69-84. C. Berge, The theory of graphs (Dover, Mineola, NY, 2001). R.C. Conant, IEEE Trans. Sys. Man Cybernetics 6, 240 (1976), (reprinted in [6]). R. Diestel, Graph Theory, (3rd Ed.), (Springer-Verlag, Heidelberg, New York, 2005). G.J. Klir, Facets of Systems Science, (2nd Ed.), (Kluwer Academic/Plenum Publishers, New York, 2001). G. Minati, E. Pessa, Collective beings (Springer, New York, 2006).
HOW MANY STARS ARE THERE IN HEAVEN ? THE RESULTS OF A STUDY OF UNIVERSE IN THE LIGHT OF STABILITY THEORY
UMBERTO DI CAPRIO Stability Analysis s.r.l., Via A. Doria 48/A - 20124 Milano, Italy E-mail: [email protected] “Visible universe” is a spherical matter crust that rotates at a convenient speed around a central massive body which represents a black-hole. In addition it expands itself in all radial directions. Such structure was first postulated in 2004 and now is fully confirmed by experimental observations from the WMAP (Wilkinson Microwave Anisotropy Probe) released by NASA in March 2007. Using stability theory (ST) we explain present state and future evolution up to final reach of a stable dynamical equilibrium. We present a consistent set of closed form equations that determine basic quantities as radius, age, Hubble constant, mass, density and “missing mass”. At the end of the expansion the number of typical stars of visible universe will be equal to the Avogadro Number. Keywords: stability theory, structure of universe, emergence of stars.
List of symbols
G = 6.67258 × 10−11 Joule ⋅ m / Kg 2 gravitational constant; c = 1 ε 0 µ 0 = 2.99792458 × 108 m / s speed of light in empty space; ε 0 = 1/ 4π k permittivity in vacuum; k = 8987551788 = 1.6 × 10 −19 Joule ⋅ m / Coulomb 2 Coulomb constant; m0 = 9.1093897 × 10 −31 Kg electron mass; m p = 1.6726231× 10−27 Kg proton mass; q = 1.60217733 × 10 −19 Coulomb unitary charge; −34 h = 6.6260755 × 10 Joule ⋅ s Planck constant; µ0 = 4π × 10−7 Joule ⋅ s 2 / Coulomb2 ⋅ m magnetic permeability in vacuum; rB = 5.29177249 × 10−11 Bohr radius; H0 Hubble constant; α = 7.297353080 × 10 −3 fine structure constant; N A = 6.02213607 × 1023 Avogadro number; α mq = Gm p mq kq 2 = 4.406758406 × 10−40 pure number; 1 α mq = 2.269241715 × 1039 TSM = 3.7459739 × 1030 Typical Stellar Mass;
545
546
U. Di Caprio
γ r = (1 + 5 ) 2
d = d (t ) d 0 = d (t0 ) d f = d (t f ) = 0.5 t0 , t f M 0 = M (t0 ) , M f = M (t f ) M B , MG ρ 0 = ρ (t0 ) , ρ f = ρ (t f )
de-acceleration parameter; de-acceleration parameter, present value; de-acceleration parameter, final value; present time; final time; present mass, final mass; black-hole mass, visible galactic mass; present density, final density; ρc critical density; ρG , ρG galactic density, seeming galactic density; Ep , Ep Potential energy, equivalent Potential energy τ H = H 0−1 Hubble time; Hubble constant at time t ; H (t ) R0 = R (t 0 ) , R f = R (t f ) present radius, final radius; H (t ) R (t ) expansion speed; m0 , m p , q, α are invariant; G, k , µ0 , ε 0 , c vary with time; 2 2 G (t ) c (t ) = const = G c = 7.424257637 × 10 −28 . Note: In order to avoid confusion we use symbol d0 (rather than usual q0) to designate the de-acceleration parameter.
1. Introduction On March 2007 NASA has released a suggestive imago of the whole universe as “viewed” by the WMAP (Wilkinson Microwave Anisotropy Probe). This exceptional result is the culmination of years of practical and theoretical research primarily based upon observations by the spatial telescope Hubble, from 1995 on, and by interconnected observatories disseminated on earth. A first sensational synthesis was made known on October 2003 and successive months, giving numerical estimates of fundamental quantities as “age of Universe”, geometric form, radius, density of matter, expansion rate with time, birth of galaxies, Hubble constant. Such data deeply modified our knowledge of Universe and put in crisis the majority of existing cosmological theories. A first innovative study to cope with this new situation was presented at the 2004 AIRS Congress of Castel Ivano, Trento (I) and published by Springer in 2006 [5]. Here we illustrate further developments in the light of the most recent acquisitions. • The start point is that (cfr. with [5]) visible Universe has a finite extension and a spheroidal form. This undoubtedly means that Universe has a
How Many Stars are there in Heaven? The Results of a Study of Universe …
•
•
•
•
547
geometric center and a center of mass (with Cosmological Principle permission). The two properties must agree. We propose an original approach based on stability theory (ST) and on the equivalence Potential energy/mass (in extreme synthesis special relativity SR). The closed and spheroidal form is explained by a two-body structure, in which visible Universe consists of a matter crust that rotates around a central black-hole and simultaneously undergoes radial expansion. The Newton attraction force is counterbalanced by the centrifugal force. Of course this is a mathematical model only! However it proves to be effective in general and, in particular, for explanation of the missing mass problem: in the two-body problem the coupling Potential energy (of gravitational nature) is negative and since this energy is equivalent to mass, visible mass is only a fraction of the effective mass. This fraction grows in time, due to the fact that potential energy is inversely proportional to expansion radius and, by consequence, the absolute value of the energy in question decreases with time. A stabilization is got when expansion stops, i.e. when expansion speed becomes equal to zero. By application of Relativistic Stability Theory (RST) we establish a direct relation between the mass MB of the central black-hole and the final radius Rf of visible Universe at the end of the expansion. On the other hand the value of Rf is autonomously identified by a quantum gravitational condition that resents the correspondent of the classical Bohr condition in the hydrogen atom. Consequently the relation between MB and Rf allows us to determine MB (i.e. the black-hole mass) from Rf (i.e. the Universe final radius). A fundamental scale factor links together gravitational quantization and electromagnetic quantization . Such factor is defined by the ratio between electric force and gravitational force on the rotating electron in hydrogen and is in the order 1039 m. Multiplying the radius of Compton (about 10-13 m) by 1039 we obtain a radius 1026, which is in the order of the observed experimental value of the radius of Universe (however, in subsequent illustration we give precise figures). The preceding similarity is in itself of a “static” nature as it does not account for expansion of the radius (in Universe and not in hydrogen). The seeming gap is overcome by taking account of the de-acceleration parameter d of classical cosmology (e.g. see Weinberg). We give a formula that connects present value of the radius R0 to final value Rf via the present value d0. The latter is determined so that two basic physical
548
•
U. Di Caprio
properties are simultaneously verified: the present age of Universe is 13.7 billions years and the missing mass is about 97%. In parallel we show the final value of the de-acceleration parameter is df = 0.5 and that the corresponding “age” is tf = 40.043 billions years while the final missing mass is 0%. All in all it is possible to determine closed form equations for defining the final mass and density of universe inasmuch as the transients that lead from present values to final ones. An astonishing finding is that the total mass (visible Universe plus central black-holes) is equal to the mass of a NA typical stars (e.g. see Weinberg), with NA the Avogadro number.
2. Closed form equations of universe A) We postulate similarity between Universe and the hydrogen atom via the adimensional scale factor
α mq = (Gm p mq kq 2 ) = 4.406758406 × 10−40 . The present value H0 of the Hubble constant and the final mass of visible Universe are defined by
H 0 = τ H−1 ;
τ H−1 =
c
α
4/ 3
rB
α mq = (5.669546974 × 1017 ) −1 s −1
(1)
ρ f = ρ c = (3 8π G ) H 02 = 5.565321257 × 10 −27 Kg / m 3 ( ρ c critical density) (2) Above equations implicate
G m p m0 1 1 ρ f = ρ c = 6π 2 c µ 0 q 2 rB α 4 3
2
Final radius R f is given by R f = α rB 2α mq = 4.381444219 × 10 26 m . Mass M B of central black-hole is derivable from (2G c 2 ) M B = R f , which results in M B = 2.950762509 × 1053 Kg . In parallel the final mass of visible Universe is M f = ρ f (4π 3) R 3f = 19.60788 × 1053 Kg . And then the total Universe mass is M tot = M B + M f = 2.255864 × 1054 Kg and satisfies the relation
M tot = N A TSM ; ( N A Avogadro number; TSM typical stellar mass )
(3)
How Many Stars are there in Heaven? The Results of a Study of Universe …
h c TSM = 2π G
32
m 2p
γ r3
549
32
m 3 1+ 0 4π mp
= 3.7459739 × 1030
Kg .
B) Universe age is determined from the equation
t 0 = τ H (1 − 2d 0 ) −1 − 2d 0 (1 − d 0 ) −3 2 cosh −1
with
1 −1 d0
(4)
τ H = H 0−1 = 5.66946974 × 1017 s , γ r = (1 + 5 ) 2 = 1.61833989 , d 0 = d 0 (t 0 ) ; d = − RR R 2 de-acceleration parameter.
Equation (4) represents a significant extension of a noted Weinberg formula referring to the classical analysis of the cosmological problem (i.e. without assuming a two-body structure and, then, without considering rotation of visible Universe). The extension is represented by the factor 1/γr. Imposing that t0 ≈ 13.7 billion years (cfr. with experiments), we find d0 ≅ 0.2342 . Such value turns out equal to the positive solution of the algebraic equation
d0 1 − 2d 0
= 0.5 − 1 − 2d 0 ;
b0 = 0.2452 ≈
1 . 1.5e
(5)
C) The Universe radius at time t ≥ t0 is individualized by equation
R(t ) =
(
)
(
α rB 0.5 − b0 1 − 2d (t ) = R f 1 − 2b0 1 − 2d (t ) α mq
which, in particular, gives
R0 =
d0 α rB = 2.8148 × 10 26 m α mq 1 − 2d 0
)
(6)
(7)
The corresponding value of the density ρ 0 satisfies the equation ρ 0 = 2d 0 ρ c = 2d 0 ρ f and then is numerically equal to ρ 0 = 2.6068 × 10 −27 kg/m3. Consequently present value of mass of visible Universe is M 0 = ρ 0 (4π 3) R03 = 2.435 × 1053 Kg . Note that M 0 has the same order of magnitude of M B (black-hole mass) and contains about 1080 protons, in agreement with most reliable estimates in the literature. The following meaningful relations come into evidence
550
U. Di Caprio
R0 2d 0 = = 0.6424 ; Rf 1 − 2d 0
M0 R = 2d 0 0 Mf Rf
3
= 0.1262 =
1 7.925
They put into dramatic evidence the problem of missing mass and simultaneously point out that the mass deficit disappears at t = tf since then d → df = 0.5 . Remark 1. The Hubble time τ H defined by equation (1) is proportional, via the cosmological factor 1/αmq , to travel time of light when light crosses the atom of hydrogen. In fact radius α 4/3 rB is but the average distance of the electron from the proton in hydrogen, as pointed out by the relation
α m0 c 2 (h m0 c 2 ) 3 = m0 c 2 (α 4 3 rB ) 3 . 3. Equations of future expansion The expansion constant K0 can be computed from the equation
d0
K0 =
2
α1 3
c 2 ≈ 1.4579 c 2 expansion constant
(8)
which derives from the Weinberg K0 = (R0H0)2 (1 – 2 d0) and from our equation (see preceding analysis)
R0 ⋅ H 0 =
α rB d0 d0 c c ⋅ α mq = α mq 1 − 2d 0 α 4 3 rB 1 − 2d 0 α 1 3
On the other hand it must be [5]
R 2 (t 0 ) c2
=
K0 c2
− ϕ B0 − 1 −
R 2 (t 0 ) c2
ϕ0
with
ϕ B0 =
GM 0 GM B ≈ 0.7782 ; ϕ 0 = ≈ 0.6423 2 R0 c R0 c 2
Consequently
R 2 (t 0 ) K0 1 = − ϕ 0 − ϕ B 0 ≈ 0.1043 2 1 − ϕ0 c 2 c R 2 (t0 ) = 0.1043 c = 0.323 c = 9.682 × 107 m / s
(9)
How Many Stars are there in Heaven? The Results of a Study of Universe …
551
This gives present time expansion speed. In addition we take into account rotation (round central black-hole). The corresponding equations can be derived as follows: mass M of visible Universe satisfies the differential equation
γ rot M R = − which leads to
−R
GM MB R2
+
2 (γ rot M )ν rot R
R 1 G M B ν 2 c2 = − 2 R γ rot c 2 R c R2
i.e.
R2
d
c
2
=
2 ϕ B ν rot − 2 γ rot c
with ϕ B =
GMB
(10)
Rc 2
2 With our values we get 0.2342 × 0.1043 = (1 γ rot ) 0.7782 −ν rot c 2 wherefrom γ rot ≈ 1.438 ν rot c ≈ 0.7186 . Equation (10) allows us to determine in closed form the final rotation speed. In fact as R (t f ) = 0 and ϕ B (t f ) = GM B R f c 2 = 1 2 then 2 γ rot (t f )ν rot (t f )
c
2
=
1 2
→ γ rot (t f ) =
1 + 17 4
→
ν rot (t f ) c
= 0.6249 .
A further important step completes definition of the initial state. Using the eq.
R(t 0 ) = −d (t 0 )
R 2 (t 0 ) = −7.7128 ×10 −12 m / s R (t 0 )
we determine the initial acceleration. On the other hand we know the final values of radius, expansion speed, expansion acceleration and de-acceleration parameter. Therefore, by a convenient polynomial interpolation we can “reconstruct” future transients. We use the following
R (t ) t 1− = b1 Rf tf
1 3
+ b2
t tf
2 3
+ b3
t tf
+ b4
t tf
4 3
+ b5
t tf
5 3
+ b6
t
2
tf
where the bi’s are convenient numerical coefficients so that the boundary conditions (at t = t0 and at t = tf ) remain satisfied. An additional condition directly involves the de-acceleration parameter d(t) and is derived from eq. (6):
552
U. Di Caprio
1 − 2d (t ) =
1 R (t ) . 1− 2b0 Rf
4. Energy and missing mass The coupling Potential energy is defined by
E p (t ) = −
G M B M (t ) . R (t )
It is
E p (t 0 ) c
2
=−
G MB M0 ; R0 c 2
E p (t f ) c
2
=−
Mf GMB Mf =− 2 2 Rf c
G M B G M B Rf 1 Rf = = = 0.7749 R0 c 2 R f c 2 R0 2 R0
(11) (12)
Universe total energy is constant and equal to present value E(t0)
E (t 0 ) = M B c 2 + γ rot (t 0 ) M 0 c 2 + E p (t 0 ) ≈ 3.878 × 10 70 J E p (t0 ) = 1 −
R 2 (t0 ) c
2
E p (t0 ) = − 1 −
R 2 (t0 ) GM 0 M B ≈ −1.77 ×1070 J 2 R c 0
The experimental galactic density ρG is bonded to the critical density ρc by the numerical relationship (Weinberg) ρG ≈ 0.028 ρc. In the light of our formulation we give the following theoretical explanation. At present time
E p (t 0 ) 5 ρ G (t 0 ) = 1+ − (d f − d 0 ) = 1 − 0.7749 − 0.1656 = 0.05947 ρ0 M 0 c2 8 with df = 0.5
ρG ρ = 2d 0 G = 2 × 0.2349 × 0.05947 = 0.02794 . ρc ρ0
(13)
Hence E p (t0 ) represents the obscure energy; (5 8)(d f − d 0 ) represents the dark matter. The first eats 77.49% of present density; the second 16.56% . Finally Visible density amounts to 5.94% . Also, with reference to critical density (rather than to effective density) it is
ρ G (t0 ) = 0.02794 = 1 − 0.7749 − 0.1972 ρc
How Many Stars are there in Heaven? The Results of a Study of Universe …
553
We assume that at any future time the visible galactic density is determined by
E p (t ) ρ G (t ) 5 = 1+ − (d f − d (t )) 2 ρ 0 (t ) M (t ) c (t ) 8 In particular
E p (t f ) ρ G (t f ) ρ G (t f ) = =1+ + 0 = 50% ρ (t f ) ρc M f c 2f
which means that final galactic mass is half total universe mass.
5. Duration of expansion Universe total energy is constant (by assumption) while the Potential energy and the kinetic energy vary with time. Consequently the speed of light must vary and, indeed, as shown in [1] this is a general feature of Universe evolution from time zero. In parallel the gravitational constant varies as well, so that
G (t ) c 2 (t ) = G c 2 = const . It must be E(t) = const = E(t0) = 3.878×1070 J with
E (t ) = M B c 2 (t ) + γ rot (t ) M (t ) c 2 (t ) + E p (t ) E p (t ) = 1 −
R 2 (t ) c2
E p (t ) = − 1 −
R 2 (t ) G (t ) M (t ) M B R (t ) c2
In particular
E (t f ) = ( M f + M B )c 2f −
Gf M f MB Rf
with cf and Gf “final values” of the speed of light in empty space and of the gravitational constant. As
Gf M f MB Rf
=
GM B 1 M f c 2f = M f c 2f 2 R f c2
it follows from above equations that
E (t f ) = ≈
Mf 2
+ M B c 2f
19.60788 + 2.95076 ×1053 c 2f ≅ (12.75 ×10 53 ) c 2f Kg 2
554
U. Di Caprio
rotation
Galaxies
Black Hole R0
Figure 1. Universe imago from WMAP (left). Two body dynamic structure of Universe (right).
and, in the final analysis, it must be
15.5 × 1053 Kg × c 2f = E (t0 ) ≈ 3.878 × 1070 Joule c 2f = 2.57 × 1016 = 0.279 c 2 → c f = 0.5275 c The total expansion time tf is given by tf = τH (2/γr)(c/cf) ≈ 42.9 billion years. Such formula can be derived from (4) d0 with df (df = 0.5), t0 with tf , and introducing the factor (c/cf). Note that this corresponds to replace c with cf in formula (1) (that gives the present value of the Hubble constant). In other words the Hubble constant too varies with time and its final value Hf is smaller than present value H0. Another point is worthy mentioning: even in the final state visible mass Mf is a fraction (one half) of the existing one, since the remaining part is submerged by the residual obscure energy owing to the central blackhole. Hence we can define an “effective de-acceleration factor”
d eff = d f (2M f M f ) = 0.5 × 2 = 1 . If we further added the black-hole mass we would find
d eff = d f
2M f + M B Mf
= df 2+
MB = 1.075 Mf
in substantial agreement with noted experimental measurements.
6. Variation of the speed of light The following equation gives the time evolution of the speed of light in empty space
How Many Stars are there in Heaven? The Results of a Study of Universe …
c 2 (t ) =
1 R 2 (t ) M B + M (t ) 1 − 0.5 2 c (t )
E p (t ) = − 1 −
[ E (t0 ) − E p (t )]
555
(20)
R 2 (t ) G M (t ) M B ; E (t 0 ) = 3.878 × 10 70 J R (t ) c 2 (t )
The derivation of (20) from preceding equations is conceptually complex and passes through computation of the ratio R 2 (t ) c 2 (t ) in an autonomous way.
7. Temperature The following formula proves to be effective in computation of temperature at various stages of evolution,
9 1 Θ (t ) = 25 N A
4 3
M 0 c 2 rB R gas R(t )
(21)
where NA is the Avogadro number, Rgas the gas constant 8.31451 j/°K, rB the Bohr radius. With our numerical value it is
Θ (t ) =
7.12664 × 10 26 m° K R (t )
(22)
At time t ≈ 380000 years when matter dominate era began (e.g. see [6], [7]) radius was
R(t m ) = 6.327 × 10 −4 R0 = 1.7809 × 10 23 m
(23)
hence
Θ (t m ) =
7.12664 × 10 23 m ° K 1.7809 × 10 23 m
= 4000 ° K
(24)
Such temperature perfectly agrees with the theoretical value pointed out in [17]. At time ts when visible universe came out the region of influence of black-hole radius was half the final radius Rf (because of physical constraints deriving from stability [6]
R(t s ) = R f 2 = 2.1907 × 1026 m hence
(25)
556
U. Di Caprio Black Hole (MB)
Region of Instability
Region of Stability
Rs
Rss Rs Radius of Instability Rss Radius of Stabilization
Figure 2. Regions of influence of the black-hole.
Θ (t s ) =
7.12664 × 10 26 m ° K 2.1907 × 1026 m
= 3.253 ° K
(26)
Such value is in excellent agreement with the experimental value of temperature of fossil radiation. Using again formula (22) we can derive a reliable estimate of present temperature and of final temperature at the end of expansion. We find
Θ (t 0 ) =
7.12664 ×10 26 m ° K 2.8269 × 10 26 m
= 2.521 ° K
Θ (t f ) = Θ (t s ) 2 = 1.6265 ° K
(27) (28)
Remark 2. We have already seen that NA equals the number of typical stars contained in the whole Universe. The same number appears in formula (21) and that should not be considered a mere coincidence. 8. Diagrams We report a selected set of transients referring to evolution in (t0,tf): radius, radial speed, de-acceleration parameter, visible mass, density (real and visible), speed of light in empty space, gravitational constant. From them one can reascend to correlated quantities such as, e.g., permittivity in empty space (while
How Many Stars are there in Heaven? The Results of a Study of Universe … 11
1.1
557
x 10 7
10 9 8
Radial Speed
Normalized Radius
1
0.9
0.8
0.7
7 6 5 4 3 2
0.6
1 0
0.5
15
20
25
30
35
40
Time [Billion Years]
15
20
25
30
35
40
Time [Billion Years]
(a)
x 10 54
6
(b)
x 10 -27
2 5.5
1.8
5
1.6
4.5
Density
Visible Mass
1.4 1.2 1
4 3.5 3
0.8
2.5
0.6 0.4
2
0.2
1.5
0
15
20
25
30
35
3.5
1
40
Time [Billion Years]
15
20
25
30
35
40
Time [Billion Years]
(c)
x 10 8
(d)
x 10 -11
Gravitational Constant
Speed of Light
7
3
2.5
2
15
20
25
30
Time [Billion Years]
35
6
5
4
3
40
(e)
15
20
25
30
Time [Billion Years]
35
40
(f )
Figure 3. Two body dynamic structure of Universe. (a) Normalized radius vs. time; (b) Radial speed vs. time; (c) Visible mass vs. time; (d) Density vs. time; (e) Speed of light vs. time; (f ) Gravitational constant vs. time
the magnetic permeability is kept constant), rotation speed around the central black-hole, temperature.
9. Recapitolatory description of mathematical structure Our formulation of the cosmological problem is complex and innovative. We think it useful to frame it into a mathematical scheme. • First we have presented 6 equations for computation of H0 , ρc , ρf , Rf , Mf , MB (i.e. the Hubble constant, the critical density, the final density, the
558
•
•
•
•
•
U. Di Caprio
final radius, the final mass, the mass of central black-hole). Then, using 4 more equations we have shown that from above quantities we can derive present values of t0 , ρ0 , R0 , M0 (age, density, radius, Mass), provided that we know t0 (i.e. present value of the de-acceleration parameter). Since d0 is not known directly while t0 and R0 are measured quantities, we went back to a double estimate of d0. The estimates agreed and allowed us to compute ρ0 , R0 , M0 . A further coherency check has been provided by the temperature formula (the eleventh equation) which establishes a bond between temperature and radius. Such formula yields the correct value of Pezias fossil radiation (by the assumption that the corresponding temperature is that at the time when visible Universe firstly “went out” the region of influence of black-hole). In addition the same formula allows us to correlate temperature 4000 °K at the beginning of the “matter dominated era” (cfr. with Weinberg) with a radius R(tm) = 7.12664×1023 m from which, using a Weinberg’s further condition, we go back present radius R0 ≈ 2.8148×1026 m, in full agreement once more with experimental findings. A thirteenth equation determines K0 (expansion constant) from d0 ; and a further 14-th equation leads from K0 to R(t0 ) , present value of expansion speed. Knowledge of R(t0 ) (and of R0, M0, MB) allowed us to determine the Potential Energy E p (t0 ) and the total Energy E(t0) = 3.878×1070 J. In addition we were able of computing visible galactic density as a percentage of the effective density. At this stage we had used 18 equations. By means of polynomial interpolation we have “reconstructed” future transients of Universe expansion. The picture includes the variations of the speed of light and of the gravitational constant. The typical stellar mass is that identified by a noted formula pointed out by Weinberg, with a very marginal correction to account for relativistic stability.
10. Conclusions Using Stability Theory (ST), SR and a principle of similarity between microcosm and macrocosm we have set forward closed form equations that give a complete picture of Universe present structure and state as well us of future evolution up to final Equilibrium at the end of expansion. We found several striking results. • At the end of expansion Universe will contain a number of stars equal to the Avogadro number. That should be correlated to the property that all in all
How Many Stars are there in Heaven? The Results of a Study of Universe …
•
•
•
•
•
•
•
559
Universe is similar to an ideal gas. Stars are gas bubbles each of which represents a particle. Universe has a two-body structure similar to that of the atom of hydrogen and, in particular, its final radius is univocally determined by a quantization rule which is the correspondent of the noted Bohr rule. Final radius is about 40 billion light-years. While in hydrogen the coupling energy is electric, in Universe the coupling energy is gravitational. The scale factor is equal to about 1039 and determines both radius and Hubble constant (which amounts to about 54 MegaParsec/Km/s). The Hubble constant however varies with time and in future becomes smaller. From measured value of age, which is about 13.7 billion years, we have determined (via a closed form equation) present value of the de-acceleration parameter d0 = 0.2349. The final value at the end of expansion is df = 0.5 . Present value of expansion speed is 0.968×108 m/s. Final value is zero. Present value of rotation speed is 2.154×108 m/s and the corresponding relativistic coefficient is γrot = 1.434. The final value is γrot = 1.2807 (emigolden value). Two basic quantities keep constant in Universe evolution: total energy E0 = 3.878×1070 J and ratio G(t)/c2(t) = G/c2. As shown in [5] such properties are “verified” from time zero (big bang). However, both the speed of light and the gravitational constant vary with time. Final value of the speed of light is 0.5275 present value c. Time tf , i.e. total duration of expansion from time zero, is equal to 42.9 billion years. This means that within 29.2 billion years Universe will reach its dynamical Equilibrium. The formula that gives tf is the same that gives t0, provided that d0 is replaced with df and c with cf . Dark energy is determined by the gravitational coupling between visible Universe and central black-hole. In addition we put into evidence a hidden energy linked together with the variation of the de-acceleration parameter. As energy is equivalent to mass, both contribute to reduction of visible mass respect to effective mass and to reduction of visible galactic density. Dark energy “eats” 16.5% of density; hidden energy “eats” 77.5%. All in all visible density is 5.9% the effective density (and 2.79% the critical density), in agreement with noted experiments. (Note that present density is smaller than critical density). Ratio between final mass Mf and present mass M0 is however smaller and equal to about 7.92 .
560
•
U. Di Caprio
In the final Equilibrium, at time tf , a residual dark energy persists (while hidden energy becomes zero) owing to the residual gravitational coupling between rotating Universe and black-hole. This residual dark energy is equivalent to a mass Mf /2. Consequently the total Universe mass is (Mf + MB) and the effective value of the de-acceleration parameter is about 1.075, in agreement with cosmic observations.
Above results have both practical and conceptual relevance. They revolutionize standard cosmological models and show the effectiveness of the theory of Relativistic Stability [6], furthermore framing GR and SR into an entirely new context.
References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.
G. Arcidiacono, Relatività e Cosmologia (Veschi, Roma, 1973). H. Bondi, Cosmology (Cambridge University Press, Cambridge, UK, 1961). U. Di Caprio, Supplement to Hadronic J. 16(1), 163-182 (2001). U. Di Caprio, Hadronic J. 23, 689 (2000). U. Di Caprio, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa, M. Abram, (Springer, New York, 2006). U. Di Caprio, Relativistic Stability, AIRS Congress, Castel Ivano, Trento (2007). R.H, Dicke, in Relativity, Groups and Topology, C. De Witt and B. De Witt, Eds., (Gordon and Breach, New York, 1964). A.D. Dolgov and Ya. B. Zeldovich, Rev. of Modern Physics 53, 1-41 (1981). G.C. Macvittie, General Relativity and Cosmology (Chapman & Hall, London, UK, 1965). E.A. Milne, Relativity, Gravitation and World Structure (Clarendon Press, Oxford, UK, 1935). J.D. North, The Measure of the Universe (Oxford University Press, Oxford, UK, 1952). H.P. Robertson and T.W. Noonan, Relativity and Cosmology (Saunders, Philadephia, 1968). M.P. Ryan and L.C. Shepley, Homogeneous Relativistic Cosmologies (Princeton University Press, Princeton, NJ, 1975). D.W. Sciama, Modern Cosmology (Cambridge University Press, Cambridge, UK, 1971). D.N. Schramm and G. Steigman, Scientific American, 6 (1988). Universe today; http://www.universetoday.com. S. Weinberg, Gravitation and Cosmology (John Wiley & Sons, New York, 1972).
DESCRIPTION OF A COMPLEX SYSTEM THROUGH RECURSIVE FUNCTIONS
GUIDO MASSA FINOLI Vicolo Arno, 2 - Silvi Marina (Teramo) E-mail: [email protected] Starting from an enough shared definition of “complex system”, we describe the hierarchies of levels and the emergent phenomena within a system, by resorting to concepts of measure and measure invariance. Through recursive functions, we introduce a mathematical representation allowing to show how symmetries, hierarchical structures and emergent properties take place. Keywords: complex systems, hierarchical levels, measure, measure invariance, recursive functions, symmetries.
1. Introduction The definition of complex system is still the subject of a number of discussions, anyway it is possible to base it on the following main features: 1. a complex system can be defined in terms of its elements and of interactions between these elements and with external environment but, except the simple (linear and classical) systems, it is never completely separable from the latter. We could say that the relationship with its environment is a crucial feature of its complexity. 2. This relationship cannot be expressed in terms of or reduced to linear functions; it emphasizes one of the characterizing aspects of these systems, that is their contextuality, i.e. the complete inseparability from their environment. This contextuality is also linked to the measure operation which, unlike the classical systems, can here influence the system itself [2,3,5,20,21]. 3. The relationships with the environment include both relationships among elements of the same level and relationships with the elements of lower and higher levels [6,7]. Therefore, if we denote with ei the i-th element of the K-th hierarchical level of a given system and with ℜ the relation it has with its environment Ai , we have that the system could be described by an expression like:
e ℜ( Ai ) i i
561
(1)
562
4.
G. Massa Finoli
The latter formally defines a system as an aggregate of elements, taking into account that ℜ includes both relationships with other elements of the K-th level and relationships with ( K − 1) -th and ( K + 1) -th levels [18,19]. If (1) behaves like a single system, it means it assumes some properties which are not attributed to those of the components and these emergent properties can be recognized within a reference system which isn’t that one of the components [14,24].
A complex system generally is a system which has a level leap and entails emergent properties. But there can be both systems with emergent properties without level leaps and systems with level leaps without emergent properties. 2. Measure systems and hierarchical levels One of the most important aspects of contemporary physics is the theory of measure, and the comprehension of the emergent phenomena goes through this theory even if redefined according to the necessary specificities of the complexity theory [1,12,17]. We begin with a reference system definition which isn’t connected with the spatio-temporal dimensions. In general, a measure is the recursive application of a unit value µ and this application defines a group structure. This involves a series of consequences: the first being that the lowest possible measure always exists, and it is the generator of the considered group. The second is that all obtainable measures are µ multiples. The third is that a measure is such only if it ends in its recursion, a measure that doesn’t end isn’t a measure. The existence of a possible lowest value of the measure is an aspect of great importance connected with the indetermination and computation concepts. We will use the definition according to which the measure is equivalent to a Turing machine that stops, so a measure is a computable entity but with a sure stop [24]. Starting from this presupposition it will always exist, once given a measure system, a not measurable aspect related to all infinite values of system variables lesser then µ ; we deal here with a continuous and uncountable infinity of these values. The consequence is that for them there isn’t the possibility to establish in a deterministic way the evolution of the system on the base of the initial measures [25,26,27]. But we emphasize another aspect: the system, whose variables have values obtained as multiples of the lowest generator value, implies an infinite series of possible values-elements that we can identify with the elements of a level. For
Description of a Complex System through Recursive Functions
563
this level any employed measure multiple of µ (nµ = k ) generates Z k congruence classes. So all the possible measure systems generable with nµ are invariant among them, it means that any combination of the elements of a level can be always expressed through a linear combination of a basic element; therefore the use of a measure system for the elements of a level doesn’t generate any change in the structure of that level elements. Our definition of hierarchical level includes the concept of measure invariance. So we say that the elements e1 en belong to the K-th level if the measures are invariant as regards the measure systems µ1 µ k , where such measure systems are all multiples of a µ generator system. Invariance means that all the elements of a measure system can be expressed as linear combinations of elements of another system, until all can be expressed as multiples of only one basic element. So a level leap, i.e. the passage from the elements of K-th level to the elements of ( K + 1) -th level, occurs when it isn’t preserved the measure invariance, i.e. the elements of ( K + 1) -th level cannot be expressed as linear combinations of the elements of K-th level. The relationship between the basic element of K-th level and that one of ( K + 1) -th level cannot be expressed through a rational value but it is expressed through a transcendental value. After all the passage from K-th level to ( K + 1) -th level cannot be reduced to a finite computation, i.e. the system as a whole will never be reducible in terms of its components [21]. 3. Recursiveness as expression of contextuality One of the most intricate and unsolved aspects is related to the problem of representing the contextuality of complex systems. In turn, it reduces to the problem of finding a way in order to represent the ℜ relationship. It is possible, for instance, to introduce this aspect as a noise element on the system. This is the point of view adopted by the “dynamical physics” framework proposed by researchers of the Flinders University (FU) [15,16]. It is summarized in the following equation defining the K-th level of a system:
Bij → Bij − α ( B − B −1 ) + ωij where Bij are nodes which are generated in conformity with the quoted relation. But this proposal entails a number of issues that don’t make it satisfying. 1. The impossibility to explicit the complex relationship with the environment is reduced to introduce a more or less stochastic term which should interfere
564
G. Massa Finoli
with the regular development of the system, denoted as ωij .This is
2.
3.
4.
unsatisfying because in this way the relationship ℜ with Ai is viewed as an interference on the system, while ℜ is the constitutive reason of the same system. A living organism is such since there is a given relationship with the environment and this relationship is an integrating part of the same organism. The second shortcoming arises from the fact that this approach considers a hierarchical level on a par with a whole system, with the same equation, but enclosed in a node B, and so on for the other nodes. This kind of solution defines the levels hierarchies through an almost banal relationship (a simple matrioska) and puts moreover some unsolved issues. If in a node there is an entire system, the problem is where ω takes its origin for that structure from and why the structure present in a node cannot interfere with the one of another node. Namely their relationship only occurs at the node level at the high level and not among the structures of the constituent level. The idea of the “monad” of Leibnitz appears as not suitable because in this way the complexity of the levels seems to be the same, in a sort of specular reflection. One of the greater anomalies in the FU proposal is that it doesn’t produce evident and stable symmetries, while the atomic physics of the second half of XX century emphasizes that the presence and the study of symmetries are the key to understand such complex and articulated phenomena. The FU approach, also starting from that one of QED and of QCD, in reality loses later each conservative and symmetric aspect of these theories because it introduces the ω factor which doesn’t allow any kind of stable symmetry. Another limitation is that there can be only one general formula in order to express the intricate jungle of the complex phenomena. The specific law of development, which is particular for that phenomenon and classes of similar phenomena, cannot be confused with the general conditions (or law) that are respected by all phenomena. After all there can be many laws and patterns for complex systems even if all must respect some general conditions that make them so.
In a complex phenomenon the relationship ℜ of contextuality cannot be separated from the relationships among the system elements, moreover there isn’t any possibility of separation among the measures carried out later on. The system is an interconnected whole, therefore it holds in some way the memory of its previous states. In this regard the recursive equations are one of the typologies of equations that allows to find the variations of a relationship on the measured element. Moreover they are introduced as interconnected in their development.
Description of a Complex System through Recursive Functions
565
For example if W0 is an initial vector in the vector space V on the complex numbers, an application ϕ of a vector of V in another vector of V in a recursive way gives:
ϕ (Wn −1 ) = Wn from which
Wn−1 = ϕ n (W0 ) .
Moreover the recursive equations have a property which can be the right representation of a contextuality factor. In a previous essay we emphasized that the deep aspect revealed by the quantum physics is that in the physical world the phenomena show themselves in their states as intimately superposed, and this superposition is at the origin of the relationship between measure and measured, and in order to explain it we make a purely theoretical example [19]. In this regard let us suppose that we deal with a world in which there is a minimum possible value for the determination of a suitable physical quantity. This occurs because, owing to uncertainty principle, values lesser than this minimum are not accessible to a measure. In this way we deal with a sort of quantized universe, like the one postulated by Fredkin [9,10,11]. Let us denote this minimum value by Bk . If we denote the action of measure by r ( Bk ) , then it consists, for every value of the physical quantity under consideration, in finding how many times Bk is contained within this value. In turn this implies a recursive process of application of Bk . And this recursion entails a sort of superposition of the different states reached in the single steps of measure process. This suggests that only a recursive function could allow us to grasp those aspect of superposition that the complex systems have in their intimate structure and that show themselves in the patterns of contextuality. 4. A recursive pattern for complex systems We start from the hypothesis that a relation ℜ between a system and its environment can occur and that this relation can represent the whole system. Moreover let us suppose that it remains invariant during the system evolution, being representative of its intrinsic nature. On the other hand, when we take into consideration a measure operation acting on the system we must recognize that, owing to the previous arguments, it must characterized by a recursive process based on a minimum accessible value. This does not mean that the real number resulting from this process exhausts the
566
G. Massa Finoli
possibilities which the system is endowed with. Namely the non-measurable part, i.e. the one beyond the minimum accessible value, can have an influence on system dynamics, as occurs in quantum mechanics. In order to take into account this circumstance we represent all measures related to the system by introducing a vector of measures having complex components, in such a way that each real part represents the actual measured value, while the imaginary part represents the influence described before. In general terms we can therefore write for the measure vector W : W = M + ig
where M is the real part and g the coefficient of its imaginary part. The latter can also be interpreted as a sort of noise acting on real measure. Now we can introduce the concepts of local and global symmetry in the following way. Given a system within a spatio-temporal context (i.e. defined by 3 spatial coordinates and 1 temporal), we can speak of a global symmetry when the conditions defining the system are the same for each temporal instant. We instead speak of a local symmetry when there is a conditions preservation for a cyclical interval of time, i.e. the conditions, present in the interval ∆ t , are the same again for each interval n ∆ t (n integer). However inside the interval ∆ t we can have an asymmetry that spreads following a given law. This circumstance is particularly important and allows us to distinguish between systems with immediate interaction and systems with time linked interaction [1,12,17]. If we consider the system with reference to the relation (1), when a global symmetry is associated to ℜ , the latter is the same for each part of Ai . A local symmetry is a symmetry among the Ai -fields and inside this sphere there is an asymmetry, that is equivalent to a propagation of a local interaction related to the considered sphere [8]. 5. An elementary recursive function In order to illustrate the foregoing concepts we will start with a simple example in which we have an initial complex vector W1 given by [19]:
W1 = M 1 + ig1 . Let us now consider a set of relations ℜ1 ℜ k endowed with the structure of a cyclical group. The composition Ψ of all relations belonging to the set, when acting on the initial vector according to the rules of multiplication between complex numbers, generates in turn a cyclical group whose elements are given
Description of a Complex System through Recursive Functions
567
from the successive transforms of this vector. This leads to the individuation of an invariant operator associated to the cycle of this group, once we focus our attention on the system defined by a vector V whose elements coincide with the single transforms of the initial vector. Namely the action of the group consists only in permuting the components of this general system vector. In other words the cycle of Ψ generates an invariant operator on the system vector V. Now we introduce a new interpretation according to which this cycle define a hierarchical level of order K . Within it we have a temporal succession of different systems states. However we can conceive this succession as associated to a superposition of these possible states and, when we treat it like a single object, we can say that it defines a new hierarchical level of order K + 1 . This passage is equivalent to a sort of temporal reduction, where the local symmetry of Ψ turns into global symmetry in a new time scale t '. In synthesis, in order to have a level transition within a spatio-temporal reference frame, we need that the component vectors are aggregated in an only vector. The latter has a local symmetry in the time t of the level of order K (components) which becomes a global symmetry in the time t 'of the level of order K + 1 . 6. More complex recursive patterns A more complex example is given by a nested recursive system which is constituted by 3 relationships and 2 initial vectors:
W0 = z1 = x1 + iy1 = M 1 + ig1 W1 = z 2 = x2 + iy 2 = M 2 + ig 2 and C ( z1 z 2 ) = Conjugate ( z1 z 2 ) x + 1 + iy 2 x +1 iy 2 x +1 iy 2 ℜ1 = 2 + i ; ℜ2 = 1 − 2 − ; ℜ3 = 1 − 2 + C ( z1 z 2 ) z1 z 2 C ( z1 z 2 ) z1 z 2 C ( z1 z 2 ) which are applied as follows:
W2 = ℜ1 (W0 ,W1 ) W3 = ℜ 2 (W1 ,W2 ) = ℜ 2 (W1 ; ℜ1 (W0 ,W1 )) W4 = ℜ 3 (W2 ,W3 ) = ℜ 3 (ℜ1 (W0 ,W1 ); ℜ 2 (W1 ; ℜ1 (W0 ,W1 ))) W5 = ℜ1 (W3 , W4 ) = ℜ1 (ℜ 2 (W1 ; ℜ1 (W0 ,W1 )); ℜ 3 (ℜ1 (W0 ,W1 ); ℜ 2 (W1 ; ℜ1 (W0 ,W1 )))) etc.
568
G. Massa Finoli
The result is a cyclic recursive space of order 4 composed by 3 vectors, in which we have:
V1 = (W x , Wx +1 ,W x + 2 ) V2 = (W x + 3 ,W x + 4 ,W x +5 ) = Ψ1 (V1 )
V3 = (W x + 6 ,W x + 7 , Wx +8 ) = Ψ 2Ψ1 (V1 )
V4 = (W x + 9 ,W x +10 ,W x +11 ) = Ψ 3Ψ 2Ψ1 (V1 ) V5 = Ψ 4Ψ 3Ψ 2Ψ1 (V1 ) = Ψ a (V1 ) = V1
The problem is: are Wi also interchanging? In this regard we remark that in many systems the change of the starting conditions assures the interchangeability of the values of W, in this case V = (Wi ,W j , Wk ) with i, j, k any value in the sphere of the 12 values shown before. So the structure is constant in time and the values are interchangeable according to the starting conditions. If we consider only one relation
ℜ=
x2 + 1 iy 2 + z1 z 2 z1 z 2
and we restrict ourselves to a configurational space given by the values of N ( z n ) (Norm of z n ) we see that it assumes some configurations which are function of the values of M 1 and M 2 and of g1 and g 2 . The values of N ( z n ) have a cycle of order 7; these values are expressed by 7 functions of which 4 are straight lines. For g1 = g 2 = 0 the other 3 functions have their starting point in M 1 , M 2 and M 1 + M 2 and oscillate in a sinusoidal way between 0 and M 1 + M 2 . We can now ask ourselves whether, when we go to a higher order level, this system shows or not emergent properties. In this regard we define as emergent properties, in the case under consideration, those values which are not included between 0 and M 1 + M 2 . It allows us to emphasize that emergent states are influenced by the values of g and therefore are not referred to M. In the quoted case since the states are directly depending on the conditions of all starting components, there is a leap of level but there aren’t any emergent properties. On the contrary, if we consider
ℜ'= −
x2 + 1 iy 2 + z1 z 2 z1 z 2
Description of a Complex System through Recursive Functions
569
with values g1 = g 2 ≠ 0 the function f i ( x) associated to the cycle keep a cyclic behaviour but at the same time their values don’t oscillate between 0 and M 1 + M 2 . In this case we have the presence of emergent states associated to the level change. To summarize: we have a level change when a local symmetry of a level changes into a global symmetry of a superior level. A level change is associated also to emergent states when the performance of the states isn’t a simple combination of the starting conditions of the elements. This generally happens in recursive systems that converge to states not depending on the initial conditions. Or, if there is a dependence on the initial conditions, the system must oscillate among the values that go beyond the initial conditions and this is generally obtained when g1 and g 2 are different from zero. It’s evident the contribute that is given by the values of g1 and g 2 , i.e. by the virtual measures of the system elements. Moreover we can see that if g ≠ 0 the below levels influence the level determinations in two fundamental ways: 1. All the variations of the below levels are synchronized so as to preserve the local cyclicity of the components level. 2. There are phase-displacements in the below cyclicity that determine a local no-cyclicity. If there isn’t a local symmetry at level K, a global symmetry at level K + 1 cannot take place, therefore a level passage doesn’t occur. It means the performance in K + 1 level depends on the particular situations of the not predictable below level. We call such a situation with the name “chaotic”. An example is obtained if we resort to the relation:
ℜ''=
x2 + 1 iy 2 − C ( z1 z 2 ) C ( z1 z 2 )
7. Conclusions In synthesis the study of particular recursive functions in the complex field can reproduce interesting and characterizing aspects of complex systems including those where there can be emergent systems, but without level changes (the case of the chaotic systems) and local asymmetry systems which go towards a global symmetry (dissipative systems).
570
G. Massa Finoli
ISSUES ON CRITICAL INFRASTRUCTURES
MARIO R. ABRAM(1), MARINO SFORNA(2)
(1) Cesi Ricerca S.p.A., Via Rubattino 54, 20134 Milano, Italy E-mail: [email protected]
(2) Terna S.p.A., Via Arno 64, 00198 Roma, Italy E-mail: [email protected]
In recent decades, the interactions between the infrastructures of a country have gained increasing importance, and consequently people have started to become conscious of their mutual interdependencies. This situation became evident in recent years when a number of large blackouts occurred in the U.S.A. and in Europe, and portions of electric power systems collapsed and forced other infrastructures to collapse as a consequence. The paper gives a brief description of critical infrastructures, recalling properties such as safety, security, emergency, vulnerability and stability. Then, the implications of these characteristics for control actions are briefly evaluated. The need to build a reference model useful for analyzing and simulating the phenomena connected with critical infrastructures is discussed. Investigating these concepts in terms of emergence of properties may be an opportunity; this point of view may be useful to identify and evaluate a systemic approach for dealing with critical infrastructure problems. Some remarks about the complexity of the problems involved in managing the interaction between critical infrastructures are finally reported.
Keywords: infrastructure, critical infrastructure, interaction, control, security, criticality.
1. Introduction
Modern societies have developed a large set of systems and methods that enable mankind to use different forms of energy. Computer technology accelerated the development of instruments for spreading and using information, and enhanced the possibility of controlling more complex systems. These new resources changed ways of life and offered the chance to overcome disease, illness, hunger and ignorance, allowing a part of mankind to reach a more stable welfare and a potentially peaceful coexistence. These results were reached by increasing the availability of energy in its different forms and by developing the ability to convert energy from one form into another, especially into electricity. The supply networks thus spread, and operating the main energy systems became a strategic activity for all countries. The situation has evolved, the need for energy is still growing, and the world periodically faces very serious energy crises.
Recently, everybody experienced the effects of severe malfunctioning, such as the large blackouts of electric power systems that occurred in the U.S.A. in 2001 and 2003 and in Europe in 2003 (Italy and United Kingdom) and 2006 (Western Europe). We thus became conscious of the existence of criticalities in infrastructures that can lead to large and severe malfunctioning. In addition, the pervasiveness of information technology now gives evidence of the mutual interactions between the different networks of public services. Moreover, tragic criminal events drew attention to terrorist actions and to the necessity of knowing and securing the uncontrolled critical parts of the infrastructures. The widespread consciousness of the vulnerability of public services focused the experts' attention on possible dangerous instabilities. The defense approach that prevailed in recent years addressed the necessity of solving the security problem by eliminating the vulnerabilities of the infrastructures and mitigating the effects of unavoidable damages. In any case, it is recognized that security problems have many aspects, and often concentrating on only one of them may reduce the attention paid to the others. Thus, terrorism problems captured the attention of decision makers, who often forgot the intrinsic vulnerability of the systems. These problems involved economic and political communities and became the object of legislative action. The main examples are the U.S. Presidential Acts [17,18] and the proposal for a European Union Directive on the identification and defense of critical infrastructures [3,4,5]. So, only recently has the vulnerability of infrastructures gained enough importance to get the attention of Governments and Institutions. In this paper we will investigate these themes considering different approaches. Even if the electric power system is our leading reference, we will attempt to develop some general subjects that, hopefully, can be useful for investigating the interactions among different infrastructures and how these interactions may amplify the effects of vulnerabilities. In particular, after a definition of critical infrastructure (Section 2), the paper describes some general concepts useful to identify structural and functional properties of the infrastructures from the viewpoint of Customers and Providers (Section 3). Considering the evaluation of the global properties of an infrastructure, we recall the drivers that influence its operating strategies (Section 4). Then, some aspects involved in the interaction of infrastructures are examined (Section 5), and the related problems connected with control strategies as well (Section 6). In addition, the role of the human factor is briefly evaluated in Section 7.
The emergence and the systemic approach related to the problem of interacting infrastructures are considered in Section 8. Finally, methodological remarks are discussed in Section 9.
2. About critical infrastructures
Usually an infrastructure is considered as the set of all the physical components which need to be built and operated to supply customers with goods, e.g. materials, energy, information, communication, etc. When the extension of these services involves a whole country, or a community of countries, the possible problems of an infrastructure may become problems for the country, or for many countries. Actually, an infrastructure is a complex organization structured in multilevel hierarchies. Its components must work and be operated in an integrated way. Examples of infrastructures are the electric power system, the cold chain, fuel supply, communication networks, transportation, social services, health care, military defense, etc. Each infrastructure uses the services and the resources provided by other infrastructures, and this can create the conditions for a mutual dependence. It is enough to think how many services today depend on electricity and telecommunications [1,13,14]. Recently, this dependence increased quickly and evolved into a strong interaction between the infrastructures. The mentioned blackout events showed how these interactions become critical, and it is now usual to refer to the main infrastructures as critical. The problem is already at the economic and political levels. So, Institutions started to consider, evaluate and plan the protection criteria for critical infrastructures, and some reference definitions have been prepared. Critical infrastructures are: "Systems and assets, whether physical or virtual, so vital to the United States that the incapacity or destruction of such systems and assets would have a debilitating impact on security, national economic security, national public health or safety, or any combination of those matters" [17]. Again: "There exists a number of critical infrastructures in the European Union, which if disrupted or destroyed, would affect two or more Member States. It may also happen that failure of a critical infrastructure in one Member State causes effects in another Member State. Such critical infrastructures with a trans-national dimension should be identified and designated as European Critical Infrastructures (ECI). This can only be done through a common procedure concerning ECI identification and the assessment of the need to improve their protection" [4].
3. Modeling infrastructures
Each infrastructure, by means of the related Organization, applies and exercises its activity on specific domains. The Organization in charge of a certain infrastructure controls its components. We can attempt to describe this scenario using a simplified model, providing the relationships among the main states in which the infrastructure, or the technical system representing it, may operate. There can be three main operating states for the system (Figure 1):
• (S) System in Service. The system is correctly in operation and the values of the state variables are within the assigned range. All its subsystems are in state (S). The status performance indexes (stability, working point, reserve availability) are in a predefined range.
• (O) System Out of Service. The system is not working. The state values are out of the assigned range. All of its subsystems are in state (O).
• (D) System in Degraded Service. The system is in operation; however, some of the state values are out of the assigned range. Its subsystems can be in state (D) or (O). The delivered service is degraded and the whole infrastructure can potentially evolve to the out-of-service state (O).
These definitions of operating states give a meaning to a global property of the system. They are an attempt to collect into a single variable the synthesis of all the available information about the system. Following the formalism of superimposed automata, this situation is shown in Figure 1(b), where the stable states (S), (D) and (O) are white and the transition states are gray. In general, the possible system control actions are designed with the goal of managing the states of the system according to planned procedures. For example, many levels of control conditions may be present. Using a global index of system performance, the service state may range from System in Service (S), with the maximum security index, i.e. the service is continuous and affordable with minimum risk conditions, to System Out of Service (O), when no service is provided. Between these two conditions many intermediate levels of service may be present, and they are grouped under the definition of System in Degraded Service (D). This simple representation is also applicable to each subsystem of the infrastructure. With reference to Figure 1(b), the transitions (3) and (4) are usually due to the free evolution of the system, while the transitions (1) and (2), towards normal operation, are possible only with the application of manual or automatic control actions.
Figure 1. Simplified state diagrams of an infrastructure: (a) status of the system as it is perceived by the user (local perception of the system status by the customer); (b) status of the system as it is known by the operator in charge of the infrastructure (global perception of the system status).
The Organization/Company in charge of a certain infrastructure knows the status of the components, by means of the SCADA apparatuses/systems, during system operation, and human operators can infer from it a general knowledge of the infrastructure operating states (S), (D) or (O). The state diagram for a component, a plant, or the infrastructure can be represented by many automata, the transitions of which are driven by the values of a large quantity of process variables and control signals. Moreover, the perception of the service status is different for the service Company and for each Customer. In fact, Customers usually perceive only the two states System in Service (S) and System Out of Service (O), as represented in Figure 1(a). Only special manufacturers can perceive a Degraded Service (D) state, and only if their apparatuses (lamps, motors, computers, etc.) work improperly, thereby revealing a certain subset of degradations. This is a very simplified description, but this example can serve as a reference to show how the interactions of the Customer with infrastructures usually reduce to a very simple ON/OFF condition. Exercising an infrastructure in the state (S), (D) or (O) is usually a management choice that defines the goals and the strategies of the Company. So, for example, an infrastructure may be operated with the goal of working between the states (S) and (D), achieving high operating quality standards. On the contrary, with poor operating standards the infrastructure works between the states (D) and (O). Each Customer experiences that the same kind of service has different quality levels depending on time and location. Those levels can be a Company choice, but they also depend on the occurrence of external events such as natural or artificial perturbations/contingencies. Definitively, the Customer has a local perception of the quality of service, while the Company is more interested in a global perception.
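The three-state model above can be made concrete with a short sketch. The following Python fragment is illustrative, not from the paper: the wiring of the numbered transitions to particular state pairs is our reading of the description of Figure 1(b), and all names are our own.

# A minimal sketch of the operator's view of Figure 1(b). The exact
# assignment of transitions (1)-(4) to state pairs is our assumption.
from enum import Enum

class OpState(Enum):
    S = "in service"        # state variables within the assigned range
    D = "degraded service"  # some state variables out of range
    O = "out of service"    # system not working

# Transitions (3) and (4): free (uncontrolled) evolution of the system.
FREE = {OpState.S: OpState.D,   # (3) degradation
        OpState.D: OpState.O}   # (4) collapse

# Transitions (1) and (2): require manual or automatic control actions.
CONTROLLED = {OpState.O: OpState.D,  # (1) partial restoration
              OpState.D: OpState.S}  # (2) return to normal operation

def evolve(state, control_applied):
    table = CONTROLLED if control_applied else FREE
    return table.get(state, state)  # stay put if no transition applies

s = evolve(OpState.S, control_applied=False)  # S -> D by free evolution
s = evolve(s, control_applied=True)           # D -> S via a control action
print(s)  # OpState.S

A Customer-side model, as in Figure 1(a), would simply collapse (D) into (S) or (O), depending on whether the degradation is perceivable.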
4. Evaluating infrastructure global properties
For describing the states of an infrastructure it may be useful to identify some of its global properties. It is now usual to consider the following terms:
• Risk. Commonly speaking, it is the possibility of an event occurring that will have an impact on the achievement of objectives. More precisely, it is the product of the probability of an event and the numerical evaluation of the damage it causes.
• Safety. It is the condition in which risks are managed to acceptable levels.
• Security. It is the condition of being protected against danger or loss.
• Emergency. It is a situation which poses an immediate risk to health, life, property or environment. Most emergencies require urgent intervention to prevent a worsening of the situation, or to mitigate the consequences.
• Vulnerability. It is a condition of weakness of the system. The vulnerabilities do not compromise the system in themselves, but they can potentially be exploited by a perturbation, letting the system evolve towards undesirable effects.
• Stability. It is the condition in which the variations of the system state variables remain within a certain range, and the state variables fade to stable final values after a perturbation.
The meaning of these concepts is related to the chosen model. Moreover, a model must be adequately defined, because the local meaning may be very different from the global meaning. The priorities associated with the previous definitions change drastically with the setting of the main goals, or drivers, that inspire the managing strategies of the system. Some reference drivers are:
• Quality of service. Attention focused on Customer satisfaction.
• Economy. Attention focused on cost reduction.
• Remuneration. Attention focused on revenue and Stockholders.
• Environment. Attention focused on increasing the respect for the Environment.
• Sustainability. Attention focused on sustainable development in time, i.e. the operating activity of the infrastructure is sustainable both for the system and for its environment.
The previous considerations suggest that using these drivers is a consequence of the existence and application of a background economic and social model that is the actual reference in setting priorities.
5. Interactions between infrastructures
The analysis and solution of the possible problems related to the operation of infrastructures is generally complex, due to the fact that their interactions involve a large number of components. The problem of interaction among infrastructures thus becomes a problem of interactions among different components [12,13]. They could be single components, but they can also be macro-constraints, such as the economic, political, social, environmental and energy sectors. Each component, or macro-constraint, influences each infrastructure and a greater number of external components to which they are connected. Consequently, the approaches to infrastructure interactions should be considered from a global point of view. Figure 2 shows a simplified description of the propagation of causal effects in the case where an infrastructure (b) goes into state (D) or (O). If the degradation is large enough, it can be perceived locally by the Customer (Figure 2a) as a state of Out of Service (O). In addition, the affected infrastructure (Figure 2b) may force another infrastructure (Figure 2c) into states (D) or (O). If the two infrastructures interact, when the system (2c) goes into the state (D) or (O), it can feed back on the infrastructure (2b), amplifying its degradation into the state (D) or (O) and, eventually, pushing the customer (2a) into state (O). Figure 3 shows this sequence. So, it is possible to state that an infrastructure is critical if it can force another infrastructure into states (D) or (O). This condition has already been recognized in the following sentence: "Determining interdependencies and cascading failure modes in critical infrastructures is a complex problem that is exacerbated further by the diverging characteristics of the interconnected infrastructure types. Services in some types of infrastructure such as telecommunications or the electric grid are provided and consumed instantly. Others, notably oil and gas but also other infrastructures built on physical resources, however, exhibit buffering characteristics" [16].
Figure 2. Simplified state diagrams: (a) local perception of the service of the infrastructure (b) and possible effects on the infrastructure (c).
In addition, the time scales of the phenomena and processes involved in each infrastructure play a crucial role. For example, the very fast dynamics of the electric power system interact with the slow dynamics of other infrastructures. Large processes, like those supported by infrastructures, thus evolve and interact exhibiting a large spectrum of dynamics. This is the case of thermal processes, which usually have time constants of the order of 10^2 ÷ 10^3 s, while the power system dynamics evolve with time constants in the range 10^−1 ÷ 1 s. The transient properties of infrastructures influence how the interactions evolve in time. The correct evaluation of these data and the identification of system parameters would be the basic requirements to build and validate realistic models for studying the possible interaction phenomena.
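To make the separation of time scales concrete, the following Python sketch integrates two coupled first-order processes with time constants taken from the ranges quoted above. The coupling term and all numerical values are our illustrative assumptions, not identified parameters of any real infrastructure.

# Two coupled first-order processes with widely separated time
# constants; a sudden degradation on the fast (power) side propagates
# slowly to the thermal process that depends on it. All values are
# illustrative assumptions only.

def simulate(t_end=900.0, dt=0.05):
    tau_power, tau_thermal = 0.5, 300.0   # seconds, from the quoted ranges
    power, thermal = 1.0, 1.0             # 1.0 = nominal service level
    history, t = [], 0.0
    while t < t_end:
        target = 0.2 if t >= 10.0 else 1.0               # disturbance at t = 10 s
        power += dt * (target - power) / tau_power       # fast dynamics
        thermal += dt * (power - thermal) / tau_thermal  # slow dynamics
        history.append((t, power, thermal))
        t += dt
    return history

for t, p, th in simulate()[::3600]:  # sample every 180 s
    print(f"t = {t:6.1f} s   power = {p:.2f}   thermal = {th:.2f}")

The fast process collapses almost immediately after the disturbance, while the slow process degrades over several minutes: exactly the kind of mixed dynamics a realistic interaction model would have to capture.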
Figure 3. Simplified state diagrams: the infrastructure (c) feeds back on infrastructure (b) and on the customer (a).
6. Infrastructure control
To control a process means to act on some of the variables governing it in order to force a desired evolution. Consequently, each infrastructure is controlled to reach the goals established for the given services. In managing and operating an infrastructure, control involves different levels of the Organization. For example, the control strategies at the low levels of a factory are implemented and operated by technical personnel, taking into account the operating constraints and the resources available. On the contrary, the strategies and policies that drive the technical operations are forecast by the Top management, instantiated by the Executive levels and designed by the sector Experts. Enlarging the vision, since a power system is intrinsically unstable, its basic processes have a natural tendency to degrade towards Out of Service (O). Permanence in the state In Service (S) is obtained only artificially, with the help of dedicated and complex automatic control systems [15], called primary and secondary regulation in normal operation (Figure 4). Often the automatic control is integrated with manual adjustments, called tertiary regulation. The strategies used to design the control of processes are very refined, and usually they are implemented by means of sophisticated and complex computerized systems that in real time supervise and manage a large amount of information coming from an extensive set of variables and measures. In particular, the procedures for controlling the degradation of service and putting the systems in secure conditions are usually planned, simulated and possibly optimized off-line. Furthermore, the Company can define the operating conditions for each subsystem, and the global status can be composed from the information collected from each subsystem. So, for example, the availability of an affordable ready system reserve, provided by plants in operation, can increase the global level of security of the electric infrastructure. It is a matter of redundancy of resources.
Figure 4. Simplified state diagrams: system with two states of degraded service, Alarm (A) and Emergency (E); the diagram distinguishes the natural evolution from the controlled evolution of the system (legend: S = In Service, A = Alarm, E = Emergency, O = Out of Service).
Instead, some emergency control strategies may deliberately degrade the service of a subsystem just to maintain the security of the remaining whole system. Following this approach, it is common in power systems to adopt automatic load-shedding strategies, which disconnect a defined and limited portion of Customers so that all the remaining Customers, and subsystems, can survive and continue to provide the service, even if in a temporarily degraded status [7,15]. Similarly, the availability of redundant communication channels increases the probability of surviving and overcoming interruptions, or of mitigating their effects on the information flows. For national infrastructures, emergency/security policies are evaluated, verified and influenced by the Government Authorities or Regulators, which define the constraints and responsibilities.
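As an illustration of the load-shedding strategies mentioned above, the following Python sketch implements a stepwise under-frequency shedding rule. The thresholds and shed fractions are our illustrative assumptions, not the plans actually used by any operator.

# A hedged sketch of an automatic load-shedding rule: disconnect
# predefined blocks of load at successive frequency thresholds so that
# the remaining system can survive. All numbers are illustrative only.

SHED_PLAN = [  # (frequency threshold in Hz, cumulative fraction of load shed)
    (49.0, 0.10),
    (48.7, 0.20),
    (48.4, 0.35),
]

def load_to_shed(frequency_hz, already_shed=0.0):
    """Return the additional fraction of load to disconnect for the
    measured frequency, following the stepwise plan."""
    target = 0.0
    for threshold, cumulative in SHED_PLAN:
        if frequency_hz <= threshold:
            target = cumulative
    return max(0.0, target - already_shed)

print(load_to_shed(48.9))         # 0.10: first block disconnected
print(load_to_shed(48.5, 0.10))   # 0.10 more: second block disconnected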
It should be considered that the problem of controlling an infrastructure involves the attribution of responsibility for choosing the best operating conditions (with certain goals, constraints and resources). This responsibility is implemented in the control of the given infrastructure (Figure 5). It is possible to split the Responsibility concept into the following three levels:
• Organization. It is the Company that is in charge of the infrastructure. It is constituted by all the human resources and assets, organized in hierarchical levels, that operate the infrastructure.
Figure 5. Simplified picture of the interaction levels between two infrastructures A and B.
• Control. It is the level that operates the infrastructure. It is a structure in which human experts constantly use the following level (the SCADA) to remotely supervise and control the infrastructure. The Control itself can be structured into hierarchical sub-levels.
• SCADA (Supervisory Control And Data Acquisition). It is the set of apparatuses and systems that implements the control and supervision functions. It is the technological structure that implements and actuates the control functions in each component and subsystem of the infrastructure. The SCADA structure can itself be structured into hierarchical sub-levels; it is the level that mostly relies on the services of other infrastructures, and its effectiveness and quality are a function of those infrastructures' performances.
The interactions emerge by means of the relationships existing among the elements of the infrastructures. These relationships are different in kind and are allocated on different logical and structural planes. At the lowest level the infrastructures interact physically, exchanging matter and energy regulated by commercial contracts. The SCADA systems can interchange information. Between control centers, human operators communicate through languages. Organizations exchange as market agents. The control problem is strictly connected with the attribution of responsibilities along the chain of hierarchy. As a consequence, each infrastructure is responsible for, and has direct control of, its own structures and assets. Usually the infrastructures are managed as systems disjoint from the environment in which they are embedded (Figure 5).
Moreover, inside one infrastructure there is no perception of how another infrastructure is organized and how it is working. Definitively, the interactions are only physical, for the exchanged services, and commercial, for the economic compensations. There is no exchange of knowledge on how the related infrastructures work, not to mention information on weak points or, as a minimum, the reciprocal exchange of basic signals about the other system's states. It should be noted that within this scenario of deep knowledge of one's own infrastructure and a lack of exchanged information, a sort of protectionism, it is difficult to manage the vulnerabilities arising from the interactions among different critical infrastructures. This is a problem that could be faced by introducing a higher level of control acting on the interaction processes. From this viewpoint there are three possible approaches, which can be summarized as follows:
• Centralized Control. A unique control center supervises and manages the interactions among the infrastructures. The control strategy can be rigid, vertical, hierarchical. Building such a structure requires aggregating a very large political and economic consensus in order to gain the authority to operate. On the other hand, the centralized control strategy is possible and realistic inside a single Company, where a Security Operation Center can supervise all the aspects concerning safety, security, asset protection, etc., beside the common network operation center that controls the core business grid, i.e. the power system in the case of a power company.
• Distributed Control. The control strategy is structured in decoupled subsystems. The central control delegates the local control functions to peripheral systems, while the central operating control authorities give the parameters for security and quality of service. A central SCADA operates the high-level control. The electric power market is an example. This structure could be implemented with the new agents that can emerge in large aggregations of States, as could happen in the European Community.
• Network Interactions. It is the more realistic situation, a de facto condition. The interactions between the agents are not imposed but negotiated, and the interaction protocols are developed and formalized by standardization committees. Naturally the role of politics, cultures, traditions, etc. can define and allocate the priorities for accomplishing the goals listed in Section 4.
Figure 6. Another way to picture the interactions between two infrastructures A and B.
Besides that, the hoped-for interactions among infrastructures could be enforced as an enlarging superimposition of the domains of influence. In this way, the control center of each infrastructure can have a glimpse of the operating status of the other interacting infrastructures (Figure 6). There are no physical constraints preventing this, only confidentiality ones. Moreover, potentially conflicting situations could arise, to be solved with clear agreements having the security and continuity of common operation as the unique driver. Every attempt to control the interactions substantially causes the redefinition of the chain of responsibilities. However, it should be recognized and stated that each Organization must adopt the following fundamental principle of self-responsibility: "The responsibility for a public service cannot be transferred to an involved Provider". As a consequence, each Organization must be aware of the vulnerabilities it acquires with its Provider(s) and must set up preventive recovery actions. The realization of the previous approaches may be in contradiction with the goals set by the Companies in charge of the critical services supplied by critical infrastructures. At the least, it has an economic impact. In a free market scenario, these requirements can unbalance competition, depending on the robustness of each infrastructure and on the adopted strategy of quality of service, whether actual or only promised. If this is the path taken, further problems arise: how must infrastructures be controlled? Which requirements must be imposed?
In this context, the first step could be the common adoption of the international standards that regulate the internal functions and processes of each infrastructure. A relevant example is the widespread application of Quality Standards such as ISO 9001:2000 [8], Environment Standards such as ISO 14001:2004 [9], Safety Standards such as BS OHSAS 18001:2007 [2], Supply Chain Standards such as ISO 28000:2007 [10], and other special and sector standards. The application of these standards can furthermore contribute to building the basic principles and the culture of the adopted model. In other words, it may be a step toward the building of a common background reference model. Besides, each Country is called to develop a coherent law system in accordance with the structures and values of the actual technical standards and of the accepted international agreements. In this way, the choice of interaction standards is based on the definition of constraints that are approved, accepted, applied and then supported internationally.
7. The role of the human factor
The evaluation of the human factor in operating infrastructures is another relevant aspect. The key role of the operators and experts located at the different hierarchical levels may be very critical, and it can be the most critical factor in infrastructure performance. Operators, when correctly trained and motivated, can make efficient and effective use of the procedures and of the complex and sophisticated supervision and control tools. In addition, in emergency conditions they can operate in the best way, using all the possibilities and resources intrinsic to the controlled processes and correctly understanding the available information. The role of operators is essential in the supervision and control of emergency states, because they can choose the most adequate strategy to reach a goal, especially when a critical situation has not been modeled or planned for automatic control by predefined apparatuses. For example, in the context of multilevel organizations, Company experts have a large knowledge base and the ability to evaluate and develop successful strategies. They are a fundamental resource in solving critical situations by means of their experience, technical and theoretical knowledge, spirit of invention, and definition and execution of local procedures. They can develop different strategies in accordance or in opposition with prescriptions, if needed. They can find solutions outside the specific technical context of automatic systems. Another level at which the human factor is very important lies in investigating the processes and designing control functions and systems.
This consideration is even more relevant in the decision-making process. In fact, the choices of the drivers at the economic and political levels have the strongest influence on infrastructure performance and security. On the contrary, for the same reasons, experts and managers can become a source of criticality when they operate with lack of experience, fraud, absence of knowledge, carelessness, discontent, or lack of commitment.
8. Systemic Approach and Emergence
The arguments described in the previous sections may be read from a systemic point of view. This gives a global perspective from which the increasing importance of the relationships among the systems emerges. The problem of knowing and evaluating the status of the system, and of modeling critical phenomena as emergent properties acquired by the system during processes of emergence occurring within it, is a live issue and involves all the large and complex systems of all the infrastructures. The concept of emergence is strictly related to the theoretical role of the Observer [11,12]. Only the assumption of a suitable level of description by the observer may allow detection and modeling of emergent properties. For instance, the level of description considering single bodies, cars or industries is necessary (as for a reductionistic level of description), but it will never be sufficient to detect the emergence of flocks, traffic and industrial districts with their emergent properties. In this view emergence is intended as a process of acquisition of new properties by a system modeled at a level of description higher than the one used to model what the observer considers as the interacting components. Crutchfield introduced the concept of intrinsic emergence, giving rise to profound changes in system structures and requiring the formulation of a new model of the system itself [6]. Following this paradigm, interactions among infrastructures establish a new system with new, emergent properties that cannot be modeled using the level of description used for the structures or the single infrastructures. In this context, even the role of humans can be seen from a different viewpoint if we consider them as Observers. At the various levels, humans may, or should, be able to see the emerging properties of systems and, as a consequence, should be able to use them in designing and operating infrastructures. This position suggests considering the set of infrastructures as a super-system in which we attempt to know and to use all the possible relationships among the existing systems.
In this sense we can describe the service provided by an infrastructure as the property emerging from all the elements concurring to form the system. Following this approach, the vulnerability of an infrastructure/system is connected with the disappearance of the emergence processes that constituted or characterized the system. This is due to a perturbation that cancels the interactions among its elements. In this sense the infrastructure is the emergence of a service given by means of a physical structure. The vulnerability of infrastructures is located in the interactions among subsystems; each interdependence relationship is thus a source of vulnerability. From a systemic point of view, controlling a system means being able to control, manage and drive the emergence processes. Controlling and managing vulnerability means controlling the interactions among systems. Recalling Figure 5, this involves controlling all the interactions at the physical, information, supervision and organization levels. These considerations have practical consequences. A need is emerging for a more extensive call to responsibility for the Regulators, standardization Committees and decision makers in general, who must identify, evaluate, update, communicate and support the application of new operating models and procedures.
9. Conclusions
Some general considerations were developed to investigate the emerging phenomena originated by interacting infrastructures. The complexity of the problems is evident. It is difficult to develop an adequate quantitative model, simple enough to be used and complete enough to describe the core of the problem and all the interaction phenomena. Many modeling approaches are available, but a relevant amount of work is necessary to develop effective and useful representations of emerging properties in systems. The emergence of interactions in heterogeneous systems and in multilevel hierarchies shows how the techniques and representations developed in many disciplines, even if sophisticated and detailed, are often inadequate and give only partial representations of the properties of interacting systems. For this reason it is difficult to develop models of the interaction among infrastructures which give a complete account of the problems' complexity. The building of an adequate model is the basic requirement for analyzing and eventually developing a possible control of the interacting processes.
The goal is to build quantitative models complete enough to be useful but simple enough to be manageable. The problems involved in controlling the infrastructures manifest their importance when the different infrastructures interact at multiple interaction levels. Because the interaction problems involve many technical, organizational and political levels, the effective management of interactions implies the solution of a control problem that involves all the hierarchical levels. In order to bound the propagation of perturbations among systems and to increase the chances of surviving and recovering, a possible strategy could lie in reconsidering the design requirements, the operating procedures and the managing criteria, with the goal of reducing the levels of interaction. The standardization processes evolve quickly, and the acknowledgment of systemic approaches could accelerate the integration and harmonization of the existing norms and laws. These additional notes could orient future research and activities.
Acknowledgments
We would like to thank Dr. Ing. Dario Lucarella (Cesi Ricerca) and Dr. Ing. Paolo Bossi (Cesi), who provided information, useful discussions and support.
References
1. M. Amin, IEEE Control Systems Mag., 22 (2002).
2. BS OHSAS 18001:2007, Occupational health and safety management systems - Requirements (BSI, London, 2007).
3. Commission of the European Communities, Green Paper on a European Programme for Critical Infrastructure Protection, COM 2005/576, 17.11.2005, (Brussels, 2005).
4. Commission of the European Communities, Communication from the Commission on a European Programme for Critical Infrastructure Protection, COM 2006/786, 12.12.2006, (Brussels, 2006).
5. Commission of the European Communities, Proposal for a Directive of the Council on the identification and designation of European Critical Infrastructure and the assessment of the need to improve their protection, COM 2006/787, 12.12.2006, (Brussels, 2006).
6. J.P. Crutchfield, Physica D 75, 11 (1994).
7. U. Di Caprio, in Systemics of Emergence: Research and Applications, Ed. G. Minati, E. Pessa and M. Abram, (Springer, New York, 2006), p. 293.
8. ISO 9001:2000, Quality management systems - Requirements (International Organization for Standardization, 2000).
9. ISO 14001:2004, Environmental management systems - Requirements with guidance for use (International Organization for Standardization, 2004).
10. ISO 28000:2007, Specification for security management systems for the supply chain (International Organization for Standardization, 2007).
11. G. Minati and M.R. Abram, AEI 90, 41 (2003).
12. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
13. S.M. Rinaldi, Proceedings of the Hawaii Int. Conf. on System Science (2004).
14. S.M. Rinaldi, J.P. Peerenboom and T.K. Kelly, IEEE Contr. Sys. Mag. 21, 11 (2001).
15. M. Sforna, Safety & Security 1, 14 (2007).
16. N.K. Svendsen and S.D. Wolthusen, Inform. Sec. Tech. Rep. 12, 44 (2007).
17. The White House, The National Strategy for the Physical Protection of Critical Infrastructures and Key Assets (Washington, DC, 2003).
18. USA Patriot Act, Public Law 107-56, October 26, 2001, (Washington, DC, 2001).
THEORETICAL PROBLEMS OF SYSTEMICS
DOWNWARD CAUSATION AND RELATEDNESS IN EMERGENT SYSTEMS: EPISTEMOLOGICAL REMARKS
LEONARDO BICH CE.R.CO. – Center for Research on the Anthropology and Epistemology of Complexity, University of Bergamo, Piazzale S. Agostino 2, 24129 Bergamo, Italy Email: [email protected] In this article we analyse the problem of downward causation in emergent systems. Our thesis, based on constructivist epistemological remarks, is that downward causation in synchronic emergence cannot be characterized by a direct causal role of the whole on the parts, as these levels belong to two different epistemological domains, but by the way the components are related: that is by their organization. According to these remarks downward causation, considered as relatedness, can be re-expressed as the noncoincidence of the operations of analysis and synthesis performed by the observer on the system. Keywords: emergence, downward causation, organization, (M,R)-systems, constructivism
1. Introduction
Emergence is usually approached from two different points of view. The first is an ontological one, which concerns the formation of new levels of reality in a way that is meant to maintain coherence with the physicalist perspective. The other approach is epistemological, and it focuses on the observer and on his limits in modelling complex systems. In this article we assume the second approach. The reason is that we think that the issue of the unpredictability, or the non-deducibility, of the description of a complex system from that of the behaviour of its parts in isolation concerns primarily the interaction between the observer and the system, for example the limits in the precision of his measurements or the different kinds of observation he can perform at different levels of analysis. Also, unpredictability and non-deducibility concern the models the observer builds, and not the systems themselves. Following these remarks, we assume a constructivist approach mainly derived from the one proposed by Humberto Maturana and Francisco Varela in the autopoietic theory [9,19,21,22,36,37]. According to it, the observer does not have direct access to reality but only to the experiences he performs in interaction with it.
In such a way, knowledge is not characterized by a registration of the features of an objective reality, but occurs in a relational domain where the regularities of these experiences are expressed in the models the observer builds. The autopoietic epistemological approach is based on the concept of unity, which makes it very suitable in a systemic perspective. The primary operation which characterizes the activity of the observer is the distinction of a unity from its background, an action that relies on his purpose and point of view, and which specifies the identity of the unity together with its domain of existence [21,33]. Through this epistemic operation at least three levels can be distinguished on a unity:
• its material parts considered in isolation, or better, distinguished from a generic background;
• the composite unity, which corresponds to the internal point of view and which constitutes the level of the interactions of the functional components; these, differently from the material parts, are distinguished in relation to the system they integrate;
• the simple unity, which corresponds to the external point of view and is distinguished from the medium it interacts with. It is considered as a whole with given properties.
Each of these observational levels on the same unity determines a domain of existence characterized by the presence of specific elements and relevant properties. The problem of emergence is placed at the level of the relations between these domains [5] and consists in the possibility or not of expressing the properties of one domain in terms of another. The first two levels, which in the case of purely additive interactions can be considered as coincident, have an ambiguous status, for they cannot be considered in a hierarchic relation like that between them and the third level. Their difference depends instead on the direction of the distinction, bottom-up for level 1 and top-down for level 2, and it is crucial for understanding emergence.
2. Complex Emergence
The use of the term "emergent" to name a non-additive property of a compound, in opposition to "resultant", dates back to George Henry Lewes' "Problems of Life and Mind" in 1875 [16]. With time it assumed more precise connotations, and it came to be used to define the phenomena of appearance of qualitative novelties in nature, like properties, relations or levels not present in the pre-existing entities.
The acknowledgment of the importance of emergent phenomena found its first rigorization during the 1920s with the rise of a line of thought called British Emergentism [2,7,18,23,34]. Considering the most recent studies since the 1970s and 1980s, which are focused on the role of the observer, it is possible to distinguish between emergence as unpredictability and emergence as non-deducibility from some initial conditions or basic level [4,8,11,24]. To the first group belong the processes of self-organization like those studied by the thermodynamics of dissipative structures (thermodynamical emergence) of Ilya Prigogine [25] and the computer simulations of Artificial Life (computational emergence): the ordered and unitary behaviour which can be observed in a system is in principle deducible from the laws which characterize the model describing the behaviour of the constituents, but it is not predictable because of the non-linear interactions between them. An infinite precision in the knowledge of the initial conditions is required of the observer, whose limits have the consequence that it is not possible to determine the evolution of the system after a certain amount of time: even a very slight difference in the initial conditions can give rise to very different behaviours. The emergence characterizing these phenomena, usually called of "self-organization", depends on the intrinsic limits of the process of measurement performed by the observer [30]. Therefore, what emerges is not a qualitatively new level or behaviour which needs to be described by a new model, but just an ordered pattern which is recognized by the observer according to his properties as a cognitive agent. According to these remarks, the models of these processes are better called models of self-ordering than of self-organization [1]. The new level that we observe is only epiphenomenal. Considering the interaction between some objects A_1^i, A_2^i ... A_n^i with properties P_1^i, P_2^i ... P_n^i at the basic level of analysis i, the emerging level characterized by a new object A^{i+1} is totally determined and describable in terms of the elements belonging to the level i and their properties (cfr. Humphrey, 1997) [13]:
A_1^{i+1}(P_1^{i+1})  - - >  A_2^{i+1}(P_2^{i+1})
      ↑ Realizes                   ↑ Realizes
A_1^i(P_1^i) + A_2^i(P_2^i)  →  A_3^i(P_3^i) + A_4^i(P_4^i)

(- - > = epiphenomenal causality; → = effective causality; ↑ = realization)
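As a standard illustration of this kind of unpredictability, not taken from the paper, the following Python fragment shows sensitive dependence on initial conditions in the logistic map: two trajectories that start 10^-10 apart quickly become uncorrelated.

# Sensitive dependence on initial conditions in the logistic map,
# a standard example of emergence as unpredictability.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

a, b = 0.4, 0.4 + 1e-10
for _ in range(60):
    a, b = logistic(a), logistic(b)
print(abs(a - b))  # typically of order 1: the 1e-10 gap has been amplified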
To the second group, which includes those processes which can be considered emergent in a proper sense, belong the phenomena of spontaneous symmetry breaking studied by Quantum Field Theory [3,24] and autopoietic processes [5]: the formation of a system or of a new behaviour is not deducible in principle from the model which describes the initial dynamics, and requires a new kind of description, a new model which is reducible neither to the initial one nor to a more comprehensive one. In this case, where the new level is not describable in terms of the lower one, what emerges is not merely epiphenomenal. The relatedness between components gives rise to a qualitatively new level with its proper characteristics, which needs new kinds of models in order to be described.
A_1^{i+1}(P_1^{i+1})  →  A_2^{i+1}(P_2^{i+1})                (simple unities)
        |                          |
C_a^i(P_a^i) * C_b^i(P_b^i)    C_c^i(P_c^i) * C_d^i(P_d^i)   (composite unities)
  ↑ interaction (non additive)   ↓ degradation
A_1^i(P_1^i), A_2^i(P_2^i)     A_3^i(P_3^i), A_4^i(P_4^i)    (material parts)
In this scheme, adapted from Humphrey 1997 [13], we can identify the three levels which an observer can distinguish on a unity. According to our approach, the components C_a^i, C_b^i ... C_n^i integrating the composite unity are denoted by names different from those of the basic parts, and their interaction, being non-additive, is expressed by *. While in the case of emergence as unpredictability the levels of material parts and components coincide and that of the simple unity is merely epiphenomenal, in the phenomena of emergence as non-deducibility the three levels are qualitatively different and the lower one is not pertinent to the description of the others. Different terms are usually used to express this kind of emergence. The first one, which points out that the non-deducibility depends on the way the system is organized and not on the limits of the measurement process, is "intrinsic emergence" [11], but this term tends to obscure the role of the observer.
Another definition is "emergence relative to a model" [8], which is focused on the limits of the descriptions proposed by the observer but does not express the character of necessity of this epistemological limit: the impossibility could depend on the kind of model, while in this case it is meant as a limit in principle for all models. This problem can be solved using another terminological option, that of "complex emergence", which can express in a proper way both the limit in principle and the role of the observer. We refer here to Robert Rosen's definition of complexity: "to say that a system is complex [...] is to say that we can describe the same system in a variety of distinct ways [...]. Complexity then ceases to be an intrinsic property of a system, but it is rather a function of the number of separate descriptions required [...]. Therefore, a system is simple to the extent that a single description suffices to account for our interactions with the system: it is complex to the extent that this fails to be true" [30]. In such a way, complex emergence expresses the failure or inadequacy of a single modality of description and the necessity of passing to a new one. The lack of relation between the different modes of description is what makes emergent phenomena, and especially downward causation, so puzzling, as we will see in the next paragraph. In the analysis of emergent phenomena, and in order to deal with the problem of downward causation, a further distinction is necessary, that between synchronic and diachronic emergence [35]:
• synchronic emergence concerns the recognition by the observer of a unitary system rising from the interaction between components, and the consequent presence of different levels of analysis. It focuses on the hierarchic relation between parts and wholes and is strictly connected to the problem of distinction exposed in the introduction.
• diachronic emergence concerns the appearance in time of new entities in the natural world, for example the origin of living systems or the birth of new species in the evolutionary process, and of new behaviours at the level of the interactions between the system and its environment.
We will mainly deal here with synchronic emergence in order to face the epistemological problem of downward causation in those systems characterized by complex emergence, but a few remarks will also be made on the diachronic one.
3. Downward Causation
With the term "downward causation" we generically mean the causal effect of the emergent whole upon the elements that constitute it, such that their behaviour inside the system is different from their behaviour in isolation. The direction is the opposite of that of the relation of realization, which goes from the elements to the whole they give rise to. It is possible to distinguish three different ways to consider this relation [12]:
• Strong downward causation: it implies an ontological difference between the levels considered, due to the introduction of a non-scientific principle, as in vitalist theories. It violates the physicalist assumption of materialism, because the upper level is not completely realized by the entities of the lower level, and consequently the gap between levels does not depend on the ways the constituents are related. The causality of the whole on its parts in fact has the form of a non-material principle which directly influences the way the constituents behave.
• Weak downward causation: it considers the difference between levels as merely formal. The emergent levels show an ordered pattern, but do not have an effective causal power on the elements of the lower levels. They are just epiphenomenal, in that they do not imply an effective change in the system.
• Medium downward causation: the upper level influences the lower ones without appealing to a sort of vitalist explanatory principle. This influence consists in the fact that the constituents behave differently in the system than in isolation, due to the way they are related in order to realize the unitary whole they belong to.
While we exclude the first kind of downward causation as non-scientific, it is easy to associate the other two versions of downward causation with the two forms of emergence analysed above. Weak downward causation is exhibited by the self-ordering processes which are characterized by emergence as unpredictability. It is epiphenomenal because the behaviour of the elements of the system does not need a new model to be described. Consequently it has just a heuristic value. The third kind of downward causation is the more interesting one, because it implies an effective difference in the behaviour of the constituents, which requires the construction of a new model. Problems arise when we try to conceptualize this causal relation.
How can we conceive a causal power of the whole on the parts if these belong to two different epistemological levels with no direct connection, described by different and incommensurable models? According to the epistemological remarks we made on complex emergence, which are based on models and not on natural objects in themselves, the whole cannot directly act at the level of its components, for the two levels belong to different and incompatible domains. They are in fact described by different models, in such a way that a direct interaction between them is not approachable from either the conceptual or the operational point of view. A direct causal relation would imply that the two levels could be described by the same model, in contradiction with the definition of emergence. This epistemological problem can be solved if we consider downward causation as depending on relatedness [32], that is, on organization. The way components are related in order to integrate and realize the emerging system is what makes an effective difference in their behaviour. A possible way to deal with the problem of downward causation is thus to consider the difference between the two levels of the constituents which are identified by the action of distinction of a unity performed by an observer. From this epistemological point of view, downward causation consists in the irreducibility of the models which describe the material parts of the system and those which describe the components, these last ones distinguished against the background of the organization of the unity they give rise to. The former depend on their intrinsic properties and are identified through a bottom-up approach; the latter depend on the organization of the emergent system and are distinguished through a top-down approach. While in self-ordering processes a bottom-up approach is sufficient and allows us to model the system, here it is the top-down one which catches the internal dynamics of the system and determines the model of the emergent behaviour of the components [6]. Downward causation, according to this epistemological approach, can be expressed as the irreducible difference between the bottom-up and the top-down approaches, which are not one the inverse of the other. This difference is not considered as a direct consequence of the action of the system as a whole, but of the organization, the relatedness, to use an expression of the emergentist Conway Lloyd Morgan [18], which determines the identity of the functional components it connects. Emergence comes up as a problem of observational levels which are characterized by a lack of connection.
material parts and that of analysis from the composite unity, do not coincide [31]. Two different kinds of models come out of these operations. These models do not deal with the states of a system S in itself, but with mappings on them, which express the measurement interaction between the observer and the system. These mappings, called observables, define equivalence relations R on S. Given an observable f:
\( f(S) = S / R_f \)    (1)
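As a minimal illustration (ours, with states and values chosen arbitrarily): let S = {s_1, s_2, s_3, s_4} and let the observable f assign the value a to s_1 and s_2, and the value b to s_3 and s_4. Then R_f identifies exactly the states which the measurement cannot distinguish, and

\( f(S) = S/R_f = \{\{s_1, s_2\}, \{s_3, s_4\}\} \)

so that the observable resolves the system only up to equivalence classes of its states.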
In analytic models the observables are defined on the system S and are characterized as its projections onto its factors, which correspond to the components belonging to the second level of the distinction. The starting point is the observation of the system as a whole, which is the domain of the operation. Analysis, according to Rosen [31], can be expressed formally in Category Theory as a direct product:
\( M(S) = \prod_{\alpha} f_{\alpha}(S) \)    (2)
Synthesis, on the contrary, is an assembly operation where S is not the starting point, the domain of the operation, but its range. It is not a projection of S but an injection of its constituents into it. It is a construction of the system from the analytic models describing its subsystems, that is, its material parts, which belong to the first level of the distinction. The observables, in fact, are not defined on S but on its subsystems A_n. The synthetic model of the system S can be expressed through the direct sum of the models describing its parts [31], as the smallest set containing them:
\( M(S) = \bigoplus_{n} M(A_n) \)    (3)
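Continuing the same toy illustration (again ours): given two observables f_1 and f_2 defined on S, and two material subsystems A_1 and A_2, the two constructions read

\( M_{an}(S) = f_1(S) \times f_2(S), \qquad M_{syn}(S) = M(A_1) \oplus M(A_2) \)

and the two constructions agree exactly when each state of S is fixed by its measured values and, conversely, each combination of subsystem states assembles into a state of S.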
In simple systems (see the first scheme in Sec. 2 above) the analytic and synthetic models are each the inverse of the other and the system is fractionable: the properties of the system are localised in its material parts and can be expressed by models describing them. In the case of complex emergence (see the second scheme in Sec. 2 above) downward causation can be characterized as the non-coincidence of the analytic and synthetic models of S. Its main consequence is non-fractionability, which consists in the lack of a one-to-one correspondence between the organization of the composite unity, identified by a top-down distinction, and the structure that realizes it through the material parts, which is instead identified by a bottom-up distinction.
An observer who does not consider the problem of the incompatibility of models, but assumes just an ontological point of view focused on objects in themselves instead of on the relation between models, theoretically approaches downward causation as a direct action of the whole on the parts. In this way he puts different levels, which are acknowledged as irreducible by the very definition of emergence as non-deducibility and which are the result of different operations of distinction, on the same level, thus making an epistemological mistake. Therefore this approach involves a kind of downward causation which contradicts the very definition of emergence it starts from. Here we have faced the problem of downward causation from a synchronic point of view, focused on the relation between the whole and its constituents. Assuming a diachronic approach, we can identify some differences. In fact, observing the temporal evolution of the system in interaction with its environment, we can identify the correlations between the changes in the composite and in the simple unity. In order to do this we need to place ourselves in a metadomain of second-order descriptions, which are symbolic and conceptual and not operational [5]. In this case the causality between the whole and its constituents appears to the observer as the reciprocal modulation between the internal and the external dynamics [20]. Nevertheless this is not a direct causal relation, as the two domains do not intersect and the two levels to which they belong still need to be considered distinct. It is rather a relation of reciprocal selection that the observer induces from the correlations he observes in time between the two different levels. As we said above, a direct causal relation would imply a reduction between those levels, which would then be described together within one more comprehensive model. Here instead there is just the identification, on a metadomain, of some correspondences which come out of the observation of the two different levels at the same time.

4. Conclusive Remarks
In this article, by assuming an epistemological perspective on emergence, focused on the models built by the observer and on the three levels he can distinguish in a system, we showed that downward causation is more complex than the causal effect of the whole on the parts. It depends in fact on relatedness and can be considered as the non-congruence, in emergent systems, between the synthetic models, which describe a system starting from the parts in isolation, and the analytic ones, which describe the composite unity starting from its organization. These two different classes of models depend respectively on a bottom-up and a top-down operation which are not each the inverse of the other.
We also showed that, in a physicalist perspective, the assumption of an ontological approach to emergence and downward causation can lead to mistakes, for it makes different levels interact on the same one and contradicts the definition of emergence, based on non-deducibility. A further step along this line of research could be to show examples of the non-coincidence between analytic and synthetic models. An interesting case for the application of this theoretical framework is the formal model proposed by Robert Rosen to characterize the identity of living systems: the (M,R)-system [15,26,27,28,29]. His demonstration of the inequality of the operations of analysis and synthesis in this class of systems [31] is obscure and quite controversial [10,17,38], also because it is connected with his thesis about the non-computability of the models describing living systems, which would place serious limitations on the goals of Artificial Life. Studies in this direction would be important for a better understanding of emergence in biological systems. In particular, they would open the way to the development of a non-reductionist approach to the biological study of the organization of living systems, focused on the analysis of their functional components in terms of the unity they integrate.

References
1. D.L. Abel and J.T. Trevors, Phys. Life Rev., 3 (2006).
2. S. Alexander, Space, Time and Deity (Macmillan, London, 1920).
3. P.W. Anderson and D.L. Stein, in Self-Organizing Systems: The Emergence of Order, Ed. E.F. Yates, (Plenum Press, New York, 1985), p. 445.
4. N.A. Baas, in Artificial Life III, A Proceedings Volume in the Santa Fe Institute Studies in the Sciences of Complexity, Ed. C.G. Langton, (Addison-Wesley, Reading, 1994), p. 515.
5. L. Bich, in Systemics of Emergence, Research and Development, Ed. G. Minati, E. Pessa and M. Abram, (Springer, Berlin Heidelberg New York, 2006), p. 281.
6. L. Bich and L. Damiano, Orig. Life Evol. Biosph., 37 (2007).
7. C.D. Broad, The Mind and Its Place in Nature (Routledge and Kegan Paul Ltd., London, 1926).
8. P. Cariani, On the Design of Devices with Emergent Semantic Functions (Ph.D. Dissertation, State University of New York at Binghamton, 1989).
9. M. Ceruti, La danza che crea (Feltrinelli, Milano, 1989).
10. D. Chu and W.K. Ho, Artif. Life, 12 (2006).
11. J.P. Crutchfield, Physica D, 75 (1994).
12. C. Emmeche, S. Køppe and F. Stjernfelt, in Downward Causation. Minds, Bodies and Matter, Ed. P.B. Andersen, C. Emmeche, N.O. Finnemann and P.V. Christiansen, (Århus University Press, Århus, 2000), p. 13.
13. P. Humphreys, Philos. Sci., 64 (1997).
14. J. Kim, in Emergence or Reduction? Essays on the Prospects of Nonreductive Physicalism, Ed. A. Beckermann, H. Flohr and J. Kim, (De Gruyter, Berlin, 1992), p. 119.
15. J-C. Letelier, J. Soto-Andrade, F. Guinez-Abarzua, M-L. Cardenas and A. Cornish-Bowden, J. Theor. Biol., 238 (2006).
16. G.H. Lewes, Problems of Life and Mind (Houghton, Osgood and Company, Boston, 1875).
17. A.H. Louie, J. Integr. Neurosci., 4 (2005).
18. C.L. Morgan, Emergent Evolution (Williams and Norgate, London, 1923).
19. H. Maturana, Irish J. Psychol., 9 (1988).
20. H. Maturana, J. Mpodozis and J.C. Letelier, Biol. Res., 28 (1995).
21. H. Maturana and F. Varela, De Máquinas y Seres Vivos: Una teoría sobre la organización biológica (Editorial Universitaria, Santiago, 1973).
22. H. Maturana and F. Varela, El árbol del conocimiento (Editorial Universitaria, Santiago de Chile, 1984).
23. B.P. McLaughlin, in Emergence or Reduction? Essays on the Prospects of Nonreductive Physicalism, Ed. A. Beckermann, H. Flohr and J. Kim, (De Gruyter, Berlin, 1992), p. 49.
24. E. Pessa, in First Italian Conference on Systemics, Ed. G. Minati, (Apogeo, Milano, 1998), p. 59.
25. I. Prigogine and I. Stengers, La Nouvelle Alliance. Métamorphose de la science (Gallimard, Paris, 1979).
26. R. Rosen, Bull. Math. Biophys., 20 (1958).
27. R. Rosen, Bull. Math. Biophys., 20 (1958).
28. R. Rosen, Bull. Math. Biophys., 21 (1959).
29. R. Rosen, in Foundations of Mathematical Biology, Ed. R. Rosen, (Academic Press, New York, 1972), vol. II, p. 217.
30. R. Rosen, Fundamentals of Measurement and Representation of Natural Systems (North-Holland, New York, 1978).
31. R. Rosen, Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life (Columbia University Press, New York, 1991).
32. J. Schroder, Philos. Quarterly, 48 (143) (1998).
33. G. Spencer Brown, Laws of Form (George Allen and Unwin Ltd, London, 1969).
34. A. Stephan, in Emergence or Reduction? Essays on the Prospects of Nonreductive Physicalism, Ed. A. Beckermann, H. Flohr and J. Kim, (De Gruyter, Berlin, 1992), p. 25.
35. A. Stephan, Grazer Philosophische Studien, 65 (2002).
36. F. Varela, Principles of Biological Autonomy (North-Holland, New York, 1979).
37. F. Varela, H. Maturana and R. Uribe, Biosys., 5 (1974).
38. O. Wolkenhauer, Artif. Life, 13 (2007).
TOWARDS A GENERAL THEORY OF CHANGE
ELIANO PESSA
Centro Interdipartimentale di Scienze Cognitive, Università di Pavia and Dipartimento di Psicologia, Università di Pavia, Piazza Botta 6, 27100 Pavia, Italy
E-mail: [email protected]
This paper deals with the feasibility of a general theory of the changes occurring both in the non-biological and in the biological world. The aim of such a theory should be that of classifying, describing, and forecasting the consequences of changes, as well as finding the conditions which ensure the possibility of controlling them. The most important sub-case of this investigation would consist in a general theory of emergence, clarifying whether or not the latter could be obtained through a suitable generalization of the physical theory of phase transitions. We will argue that this enterprise could be feasible, provided the theoretical framework currently adopted in physics is enlarged in a suitable way, so as to include phenomena not reducible to particles interacting through force fields of immutable nature.
Keywords: emergence, phase transitions, biological models, quantum field theory.
1. Introduction
The most important feature of the world of phenomena is the occurrence of changes. All of them occur in time. Some occur in time and space. Others occur in time and configurational variables (describing the inner structure of a given system). Among these changes some appear to be of utmost importance, as they lead to deep structural changes in the system being observed. When not reducible to an observable direct action of the environment, these latter are qualified by the word “emergence”. In recent years, the topic of emergence has been the subject of an intense debate (see Minati and Pessa, 2006, for a review and a list of references) between anti-reductionist philosophers, claiming that emergence cannot be described by present physical theories, and theoretical physicists, asserting that emergence is nothing but a special kind of phase transition. Whatever the conclusion of this debate may be, a large body of experimental evidence has made clear that a number of features, at first seen as typical of emergent phenomena, depend only on the adopted observational time scale. Thus, while it is commonly felt that the sudden occurrence of a ferromagnetic state at the Curie temperature is an example of an emergent phenomenon, whereas the
evolution of reptiles over billions of years is not an emergent phenomenon, it is easy to acknowledge that, by adopting a time scale whose unit is some tens of millions of years, the time trend of reptile evolution mimics that of residual magnetization close to the Curie point. Such a simple fact forces us to enlarge the framework adopted for studying emergence so as to include all kinds of change. In this way the difficult question of the possible difference between physical and biological (or psychological, economical, social) emergence is reduced to the simpler question of the difference between physical and biological models of change. And, as is well known, about both kinds of models there is a conspicuous body of experience. Within this paper we will briefly discuss some aspects of physical models of change, trying to put into evidence their advantages and shortcomings. The latter will then be compared with some typical models of biological change, in order to understand why present physical models fail to account for some typical features of the biological world. We will argue, in this regard, that there would be a possibility of remedying this failure, provided the framework currently used in physical theories were generalized in a suitable way. As the search for this generalization appears to be of utmost importance for the development of Systemics and of a general theory of emergence, hereafter the attribute “biological” will be used in its widest sense, that is, referring to all phenomena commonly related to the existence of natural living beings. Therefore even psychological, economical and social phenomena will be labelled as “biological”.

2. Physical models of change
The framework so far adopted by physicists to describe changes is based on two main components: elementary units and force fields. The latter give rise to the interactions between elementary units which, through a typically linear cause-effect mechanism, drive the dynamics of the units themselves. In turn, the spatio-temporal dynamics of force fields is ruled by evolution equations including source terms due to the elementary units. Both kinds of dynamics fulfill conservation principles, such as that of total energy, and are therefore based on a Lagrangian or Hamiltonian formalism. This approach is characterized by a number of heavy shortcomings, which will not be mentioned here, and has been seriously criticized since Newton's times. However, none of the alternative proposals made so far has obtained wide consensus. The previous scheme needs some modification when the number of elementary units is so great as to prevent a detailed description of their
individual dynamics. In these cases physicists resort to a distinction between two levels of description, the microscopic and the macroscopic one. The goal of Statistical Mechanics is, then, that of deriving the phenomenological laws ruling the macroscopic behavior starting from the knowledge of the interactions between elementary units at the microscopic level. Needless to say, this goal is very difficult to reach, so that Statistical Mechanics still cannot be considered a firmly grounded discipline. Before going further, it is to be remarked that the general scheme sketched above still permeates almost all theoretical physics, in a way which is partly independent of the general principles adopted. Thus, it works in classical as well as in relativistic mechanics, in quantum mechanics as well as in Quantum Field Theory (QFT). We will now focus on the application of the above framework to the most interesting kind of change (at least from the point of view of Systemics): the so-called phase transitions. The theory of the latter has been built by resorting to a suitable combination of phenomenological observations, of phenomenological theories describing them, of suitable theoretical frameworks into which the latter have been embedded, and of statistical arguments supporting all this machinery. As regards phenomenological observations, we will introduce a distinction between two kinds of them. The first kind includes the behaviors observed when the temperature approaches the critical temperature, that is, the divergence of (generalized) susceptibility, the divergence of the amplitude of fluctuations of the order parameter, the critical slowing down, and the discontinuity in the curve giving specific heat as a function of temperature. The second kind refers to the existence of universality classes, evidenced by the fact that different phase transitions are associated with (almost) the same set of critical exponents appearing in the laws describing macroscopic behaviors near the critical point. Let us now focus on phenomenological theories, by remarking that in this context we have two main approaches: the Ginzburg-Landau theory and the Renormalization Group. While neglecting any technical description of these topics (which can easily be found in standard textbooks; see, for instance, Goldenfeld, 1992; Benfatto and Gallavotti, 1995; Cardy, 1996; Domb, 1996; Minati and Pessa, 2006, Chap. 5), we will limit ourselves to mentioning the outstanding importance of two main ideas: on one hand, the identification of phase transitions with symmetry breaking phenomena (and in practice with bifurcation phenomena in the Ginzburg-Landau functional), and, on the other hand, the scaling hypothesis, allowing the elimination of all irrelevant parameters close to the critical point. While both ideas are correct only in a limited number of cases, they underlie a huge number of models of phase transitions, so
that every concrete computation, done within this context, is, in a direct or indirect way, based on them. However, the main problem of the theory of phase transitions (hereafter shortly denoted as TPT) is to fit the foregoing ideas into the general framework briefly sketched at the beginning of this section. In a number of cases this reconciliation is achieved by resorting to classical physics. However, this approach entails the belief in classical thermodynamics and in classical statistical physics. The latter, as is well known (cf. [61,1]), is based on the so-called correlation weakening principle, stating that, during a relaxation process, all long-range correlations between the individual motions of single particles tend to vanish when the volume tends to infinity. This, in turn, implies that, while spontaneous symmetry breaking phenomena, like the ones described by Landau theory, are allowed by classical physics without any problem (see, for instance, [33,65,26]), they give rise to new equilibrium states which are unstable with respect to thermal perturbations, even of small amplitude (one of the first proofs of this fact was given in [67]). At first sight, such a circumstance might not appear as a problem. After all, nobody pretends to build models granting structures absolutely insensitive to any perturbation. However, a deeper analysis shows that instability with respect to thermal perturbations is equivalent to instability with respect to long-wavelength disturbances, and hence entails the impossibility of taking infinite volume limits. The latter is a serious flaw, as one of the main pillars of TPT, that is, the divergence of the correlation length at the critical point, due to the occurrence of infinite-range correlations, has a meaning if and only if we go to the infinite volume limit. Therefore the aforementioned results imply that classical physics is an untenable framework if we are searching for a wholly coherent formulation of TPT, in which phenomenological models are in agreement with statistical physics. How to find an alternative framework? At this point the only remaining possibility is to resort to QFT. The attractiveness of the latter stems from the fact that within QFT, and only within it, there is the possibility of having different, non-equivalent representations of the same physical system (cf. [34,37]; a more recent discussion of the consequences arising from this result, often denoted as the “Haag Theorem”, can be found in [9,8,60]). As each representation is associated with a particular class of macroscopic states of the system (via quantum statistical mechanics) and this class, in turn, can be identified with a particular thermodynamical phase of the system (for a proof of
the correctness of such an identification, see [63]), we are forced to conclude that only QFT allows for the existence of different phases of the system itself. Within this approach everything seems to work very well. Namely, the occurrence of a phase transition through the mechanism of spontaneous symmetry breaking (SSB) implies the appearance of collective excitations, which can be viewed as zero-mass particles carrying long-range interactions. They are generally called Goldstone bosons (cf. [32]; for a more general approach see [70]). Such a circumstance endows these systems with a sort of generalized rigidity, in the sense that, when one side of the system is acted upon by an external perturbation, such a perturbation can be transmitted to a very distant location essentially unaltered. The reason for the appearance of Goldstone bosons is that they act as order-preserving messengers, preventing the system from changing the particular ground state chosen at the moment of the SSB transition. Moreover, the interaction between Goldstone bosons explains the existence of macroscopic quantum objects [70,43]. Here it must be stressed that the long-range correlations associated with an SSB arise as a consequence of a disentanglement between the condensate mode and the rest of the system [64]. This finding, related to the fact that within QFT we have a stronger form of entanglement than within Quantum Mechanics [20], explains how the structures arising from an SSB in QFT have a very different origin from those arising from an SSB in classical physics. Namely, while in the latter case we need an exact, and delicate, balance between short-range activation (due to non-linearity) and long-range inhibition (due, for instance, to diffusion), a balance which can be broken even by a small perturbation, in the former case we have systems which, from the start, lie in an entangled state, with strong correlations which cannot be much altered by the introduction of perturbations. Despite these advantages, however, the situation within the QFT-based approach is not so idyllic as it could appear at first sight. Namely, the mathematical machinery of QFT is essentially based on propagators (or Green functions), which allow one to compute only transition probabilities between asymptotic states, without any possibility of describing the transient dynamics occurring during the transitions themselves. And, what is worse, it is easy to prove that, in most cases, such a dynamics must be described by classical physics. This raises a number of further problems, because the transition between the quantum regime, existing far from the critical point, and the classical regime, holding close to the critical point, is nothing but a phenomenon of decoherence, due to the interaction with the external environment. But what are the physical features of the latter process (and of the symmetric process of recoherence,
taking place after the phase transition and re-establishing the quantum regime)? How should the external environment be described? How can these phenomena influence the formation of structures (like defects) surviving even after the completion of the phase transition and signaling its past occurrence? All these questions cannot, unfortunately, be answered within the traditional framework of QFT. Namely, these problems have been dealt with by resorting to suitable generalizations of it.

3. Beyond TPT
Starting from the Eighties, the need for applying QFT to condensed matter and phase transition theory, rather than to particle physics, stimulated the introduction of suitable generalizations of old schemata. We will quote, in this regard, three main advances:
a) the acknowledgement that Goldstone bosons can undergo a condensation, giving rise to observable macroscopic effects; among the methods introduced to describe such phenomena we mention the so-called boson transformation [70]; in this way it becomes possible to deal with defects arising as a consequence of a symmetry-breaking phase transition (an example is given by solitary waves, or solitons), as well as with macroscopic quantum objects; these developments allowed a deeper understanding of effects which could account for the occurrence of biological coherence, such as the Davydov effect and the Fröhlich effect; the former (for reviews of this topic, see [62,28,22,11,30]) consists in the production of solitons moving on long biomolecular chains, when the metabolic energy inflow is able to produce a localized deformation of the chains themselves; the latter, in its essence (see, for recent discussions, [47,48]), consists in the excitation of a single (collective) vibrational mode within a system of electric dipoles interacting in a nonlinear way with a suitable source of energy, or with a thermal bath; it is to be added that these dipoles are thought to be present both in the water constituting the intracellular and extracellular liquid and in biological macromolecules, owing to the ionization state connected to the existence of high-intensity electric fields close to cellular membranes; the coupling of both effects allowed the introduction of a quantum theory of biological coherence (for a recent review see [23]) as well as of a quantum brain theory (the huge amount of literature on this topic is summarized in [40,41,72]), which had a number of important applications, such as the study of the role of the cytoskeleton in neural cells [35,69], the operation of memory [71,2,55,29], and the basis of consciousness (see, for instance, [36]);
b) the introduction of a more refined theoretical description of the environment; starting from the pioneering work of Caldeira and Leggett (see, for instance, [12]), theoretical physicists began to introduce more and more explicit models of the environment, in turn described as a set of interacting entities (for instance, quantum Brownian oscillators); this made it possible to deal in a more correct way with phenomena such as dissipation and decoherence (see, among others, [13,14,15,73]); among the most interesting methods used to take dissipation into account within QFT we quote the so-called doubling mechanism [18]; the latter describes the influence of a dissipating environment by doubling the original dissipative system through the introduction of a time-reversed version of it, which acts as an absorber of the energy dissipated by the original system, so that the whole system, including the environment, can be dealt with as if it were a Hamiltonian system; this framework makes it possible to argue that the presence of a field-mediated interaction (present at the level of biological macromolecules) could work against decoherence, provided the field were of a particular kind; let us suppose, for instance, that we have a simple quantum system lying in an entangled state (for example the Schrödinger cat state), interacting with a classical field inducing a dissipative dynamics; then (we follow here the argument of Anglin et al. [6]), owing to the fact that the different degrees of freedom of the system react in different ways to the action of the field, the interference which supported the entanglement disappears and the system state reduces to the product of the single states of its components; in other words, decoherence has occurred; however, as the dynamics is dissipative, the system is forced to evolve, independently of its initial conditions, towards an attractor whose dimensionality is less than the number of degrees of freedom of the system; thus, after a suitable relaxation time, some degrees of freedom (or even all of them, when the attractor is an equilibrium point) fall into the same state, just as if the system were in an entangled state; decoherence has disappeared, and recoherence has taken place! This apparent paradox is easily solved if we take into account that the dissipation induced another, stronger kind of entanglement, the one between the system and the environment; and just such entanglement was responsible for the relaxation towards the equilibrium state;
c) the explicit modeling of phenomena close to the critical point of a phase transition in the presence of realistic conditions, that is, finite volume, finite time, boundary constraints (see, in this regard, [42,27,57,45,3,4,44,58,5,7]); this allowed distinguishing, during a phase transition, a number of different stages: the initial one of decoherence (in which quantum fluctuations become much smaller than thermal fluctuations), the classical one,
close to the critical point, in which the influence of chaos and noise becomes of utmost importance, and the final one of recoherence, in which the quantum regime is re-established; the features of the final phase arising after the completion of the phase transition depend in a crucial way on the dynamics occurring in the classical stage, which, in a sense, structures the landscape for the processes occurring after recoherence; in some cases it is possible to influence the structuring processes occurring within the classical stage through suitable external control actions, so as to transform an intrinsic emergence (in the sense of Crutchfield; see [21]) into a controlled pattern formation.
The advances quoted above gave rise to a generalized form of phase transition theory which seems better suited to describing the changes occurring in biological matter at the most basic level, that is, the one of the behavior of biological macromolecules and of processes involving cell membranes. However, there is the suspicion that this framework may be unable to work for processes of biological change occurring at levels beyond the basic one.

4. Biological models of change
Starting near the beginning of the twentieth century, the development of theoretical models of biological change (including economical, psychological and social change) has been so intense as to give rise to a huge number of different models. However, differently from what occurs in physics, it is impossible to find a common framework to which all models can be reduced. This circumstance (essentially due to a lack of interdisciplinarity) produced a low efficiency of most models of this kind. And it is to be recalled that Von Bertalanffy introduced General System Theory precisely to remedy this state of affairs. As is well known, his work, as well as that of the founding fathers of Systemics, helped to understand the capital role of Dynamical Systems Theory in describing changes in a number of different domains through a sort of unified language. However, this also put into evidence the intrinsic limitations of this approach, deriving from the special features of models of biological change. The latter became easily recognizable after the advent of computer-based simulations of biological models which were analytically intractable. We briefly list these features in the following.

4.1. Importance of individuality
Contrary to what occurs in physical models, whose single components have identical features, in biological models each component is endowed with individual features, partly differing from those of the other components. And the
form of the distribution of these individual features among the components is crucial for the operation of the whole system. This entails that most biological models describe disordered systems, that is, systems for which it is often impossible to forecast a priori the nature of their dynamics on the sole basis of their macroscopic statistical features.

4.2. Reactive nature of the environment
In most cases the environment of a biological system (often constituted by other living beings) has a reactive nature, as it counteracts the actions of the system under study by sending suitable responses to it and, in some cases, even selecting some features of the system itself (a process which often amounts to changing the very laws ruling the dynamics of the system or the nature of its components). This interplay of action and reaction is at the basis of the adaptation process and can sometimes be described in a shortened (even if unrealistic) way by resorting to concepts such as that of fitness. Nothing similar occurs in physical models of change, where we deal, in the best case, with passive environments constituted by thermal baths or Brownian oscillators.

4.3. Creation of new kinds of components
In a number of cases the dynamics of biological change can lead to the creation of new constituents, of a kind not existing before the creation. The appearance of these new elements generally modifies in a radical way the very form of the dynamical laws fulfilled by the system. On the contrary, in every physical model the form of the dynamical laws remains unchanged with time.

4.4. Absence of conservation principles
Almost the totality of models of biological change lacks general conservation principles, such as that of total energy. Therefore they cannot be put under a Lagrangian or Hamiltonian form, making it difficult to make direct use of methods and results obtained within models of physical change, which rely heavily on the Hamiltonian formalism.

4.5. Non-equilibrium dynamics
Most, if not all, models of biological change describe non-equilibrium situations, because they deal with systems which are far from reaching adaptation. However, this implies the impossibility of resorting to general results
holding in most physical models, such as the detailed balance principle, the fluctuation-dissipation theorem, and all the machinery of equilibrium Thermodynamics.

4.6. Importance of configurational variables
Most constituents of biological models are characterized by a complex inner structure, described by suitable configurational variables, whose values have a crucial importance for the dynamics of the constituents themselves. Nothing similar occurs in physical models, characterized by very simple constituents (typically point particles).

4.7. Multi-level hierarchical structure
Most biological systems are characterized by a multi-level hierarchical structure, which contrasts with the simple two-level (macroscopic and microscopic) description occurring in physical models. Besides, the nature and the very number of levels can change as a function of the interaction with the environment. The presence of inter-level interactions should be underlined, their most interesting aspect often being the direct influence of higher levels on lower-level behaviours. A typical example is given by ant polymorphism (see, for instance, [59,68,39]), in which the number of ant castes (workers, soldiers, etc.) varies as a function of macroscopic variables such as the size of the ant colony and the environmental constraints.
While all these features evidence the large differences between models of physical and of biological change, the most obvious question is: why should we attempt to reduce these differences? After all, most models of biological change are based on sophisticated mathematics and this makes us confident in their efficiency in describing biological phenomena. As the latter seem to be very different from physical phenomena, where is the need for a unification? The answer lies in the need for assessing model reliability (or, in other terms, validity). Namely, the latter, within the highly fragmented world of biological models, is a very difficult enterprise. Most researchers, in this regard, make use of sophisticated statistical analyses which, however, almost always give significant results only in the presence of a very large number of experimental data. And, as is well known, the latter is a condition which, in most cases of practical interest, is impossible to fulfill. On the contrary, in models of physical change reliability is based on very general principles which, in turn, are embodied within specific models. The latter give rise to precise predictions (which can be falsified even by only one experimental finding, without the need for statistics). In general, the lack of reliability is due to details of specific
models, which can be easily changed without changing the overall framework. Within this approach, therefore, the assessment of model reliability is a far simpler affair than in the biological case. There is, however, another reason for searching for a general theory of change, including the physical and biological ones as special sub-cases: the hope of identifying and classifying the possible different scenarios of change, each scenario being associated with a set of specific strategies for predicting, detecting, and (when possible) controlling changes, independently of their biological or physical nature. In order to evidence how far we are from reaching this goal (despite the advances of theoretical physics), it is instructive to look at very simple models of biological change, asking ourselves what should be added to the present framework of theoretical physics in order to include the features of the models themselves.

5. Bridging the gap between physics and biology?
Before starting the analysis of a simple model of biological change, we remark that, among the features quoted in the previous section, one of the most disturbing for models of physical change is the existence of inter-level influences which, in most cases, support the reactions of the environment on the system under study. Typically these influences take the form of a global-local interaction, in which some microscopic variables are influenced by the values of suitable macroscopic variables (describing macroscopic environmental states or system states or both). The usual physical models do not include interactions of this kind, which, on the other hand, could not be described within a Hamiltonian framework. Namely, in the simplest cases they require integro-differential or even functional equations whose mathematics is still largely unknown. In order to evidence the powerful influence of this kind of interaction, we will resort to an almost trivial example, consisting of an artificial neural network containing N units, totally interconnected. By adopting a discretized time scale, the output of the i-th unit at time t + 1 is given by:
\( x_i(t+1) = \tanh\Big( \sum_j w_{ij} x_j(t) - s_i(t) \Big) \)    (1)
While the weights w_{ij} are initially chosen at random (the only condition being the vanishing of self-connections) and then kept fixed during the whole network evolution, the individual thresholds s_i(t) (also initially chosen at random) are allowed to vary according to the dynamical law:
Fig. 1. The three figures show the evolution of the average output of a neural network containing 50 units in three conditions, corresponding to α = 0, α = 0.1, α = 2. For further details see the text.
\( s_i(t+1) = s_i(t) + \alpha \, [ m(t) - s_0 ] \)    (2)
Here m(t) denotes the average output of the network at time t (which is a macroscopic variable, while the single outputs and thresholds are microscopic variables) and s_0 is a suitable threshold value for this average. Besides, α is a parameter chosen by the experimenter. Of course, when the value of this parameter is different from zero, (2) gives a simple example of global-local interaction. In order to make this example as simple as possible, we avoided any external input. Thus, our model describes nothing but a closed disordered system (more precisely, one with quenched disorder) whose only task is to evolve. In Figures 1(a), 1(b) and 1(c) we show three snapshots of computer simulations of the evolution, over 200 time steps, of the average output of a network containing 50 units, all starting from exactly the same initial conditions (same weights, same initial threshold values, same initial output values), with s_0 = 0.1. The only difference lies in the values of α: in 1(a) we have α = 0 (absence of global-local interaction), in 1(b) α = 0.1 (weak global-local interaction), and in 1(c) α = 2 (strong global-local interaction). As is easy to see, while in the absence of global-local interaction we have a fast (and expected) relaxation towards an equilibrium state, even a weak global-local interaction induces a deep change in the network dynamics (proving that such interactions are very effective in controlling microscopic dynamics), whose equilibrium state is shifted towards s_0. The presence of a strong global-local interaction produces an even deeper change in the very nature of the network dynamics, which becomes quasi-periodic (and presumably chaotic).
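For concreteness, the following sketch (in Python; not part of the original paper) implements the dynamics (1)-(2); the uniform distributions used to initialize weights, outputs and thresholds, and the seed, are our assumptions, the text specifying only that they are chosen at random:

import numpy as np

def simulate(alpha, N=50, steps=200, s0=0.1, seed=0):
    # Same seed for every run, so all conditions share identical
    # initial weights, outputs and thresholds (as in the text).
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, (N, N))   # fixed random weights
    np.fill_diagonal(w, 0.0)             # vanishing self-connections
    x = rng.uniform(-1.0, 1.0, N)        # initial outputs
    s = rng.uniform(-1.0, 1.0, N)        # initial thresholds
    m_hist = []
    for _ in range(steps):
        x = np.tanh(w @ x - s)           # Eq. (1)
        m = x.mean()                     # macroscopic average output m(t)
        s = s + alpha * (m - s0)         # Eq. (2): global-local interaction
        m_hist.append(m)
    return m_hist

# The three conditions of Fig. 1: no, weak and strong global-local interaction
for alpha in (0.0, 0.1, 2.0):
    print(alpha, simulate(alpha)[-3:])   # last few values of m(t)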
This elementary example clearly shows the dramatic effect produced by the introduction of a global-local interaction. Therefore a first problem to be solved by theoretical physicists should be that of generalizing the usual formalisms so as to include interactions of this kind. Unfortunately, no convincing solution of this problem has been proposed so far, despite the development of theories such as that of viscoelasticity (standard textbooks are [19,56,38]), which explicitly try to generalize Lagrangian methods to cases in which global-local interactions occur. In order to better illustrate the difficulties arising when trying to connect phenomenological models with the usual framework adopted in models of physical change, we will use a particular version of a very simple model of population evolution, introduced by Michod (see, among others, [49,50]) and applied to describe the evolutionary biology of volvocine algae. The interest of this model stems from the fact that, starting only from a microscopic dynamics, it predicts the occurrence of global correlations, a circumstance often attributed, within physical models, only to QFT-based models. Within Michod's model the i-th individual of a population is characterized by the values of two variables: its generative ability b_i and its survival ability v_i. In general, as shown by biological data, the two variables b and v are not reciprocally independent, but are connected through a general law expressed by a function ν(b) which is decreasing in b. In this regard, we introduced a specific form of ν(b) given by:
\( \nu(b) = \frac{\gamma \, (1 + \gamma)}{b + \gamma} - \gamma \)    (3)
where γ is a suitable parameter. For reasons of convenience we constrained the values of v and b within the interval [0,1], and (3) tells us that v(0) = 1, v(1) = 0. However, as the total number of individuals is not constant (owing to the existence of a reproductive ability) but a function of time, N = N(t), we supposed (Michod did not explicitly make this hypothesis) that the value of the parameter γ appearing in (3) (that is, the convexity of the curve) depends on the momentary value of N through a law of the form:
\( \gamma = \frac{\beta_0}{N} + \alpha_0 \)    (4)
where β_0 and α_0 are further parameters. What should we expect from the behavior of this simple model? Michod introduced two different kinds of fitness measure: the average individual fitness of the population members, denoted by ŵ, and the total (normalized) population fitness, denoted by W. Their definitions are:
\( \hat{w} = \frac{1}{N} \sum_i b_i \nu_i , \qquad W = \frac{1}{N^2} B V , \qquad B = \sum_i b_i , \qquad V = \sum_i \nu_i \)    (5)
From both biological observations and the results of computer simulations Michod recognized that, while both fitness measures obviously vary with time, for most of the time they have different values. Thus he was led to introduce a quantity, called covariance and here denoted by Cov, which measures this difference through the simple relationship:
\( \mathrm{Cov} = \hat{w} - W \)    (6)
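It may be worth making explicit a step left implicit here (the remark is ours): substituting the definitions (5) into (6) gives

\( \mathrm{Cov} = \frac{1}{N} \sum_i b_i \nu_i - \Big( \frac{1}{N} \sum_i b_i \Big) \Big( \frac{1}{N} \sum_i \nu_i \Big) = \langle b \nu \rangle - \langle b \rangle \langle \nu \rangle \)

that is, the statistical covariance between generative and survival abilities over the population; since ν(b) is decreasing in b, negative values are to be expected.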
When the value of Cov is negative, the total population fitness is greater than the average individual fitness. In other words, we are in a situation in which there seems to be some sort of global cooperation (or coordination) among individuals, producing an increase of the total fitness. Clearly this is not quantum coherence, but it recalls some aspects of the latter. How is this increased total fitness reached? The simulations show that it is sometimes due to a sort of increase in the specialization of population members. In this regard, we remark that every pair of values (v, b) characterizes a single individual and that the form of the distribution of these pairs (or better, only of the b values, as (3) lets us find the value of v once the value of b is known) gives a measure of the degree of specialization present within the population at a given time instant. Namely, a flattened distribution means a strong difference between the individuals, and therefore a high degree of specialization, while a strongly peaked distribution means small differences between the individuals and a low degree of specialization.
Fig. 2. Group fitness (upper curve) and average individual fitness (lower curve) vs time.
In order to check whether these effects are present under the hypotheses introduced above in (3)-(4), we performed suitable numerical simulations of the evolution of populations. They were based on a subdivision of the interval [0,1] of values of the continuous variable b into a suitable number of equal sub-intervals, the value of b within each sub-interval being identified with its middle point. The mechanism of reproduction was random, based on a uniform distribution, and such that the generative value of each individual was interpreted as the probability, at each generation, of producing a number of offspring equal to a fraction of a maximum possible number fixed in advance by the experimenter. The produced descendants were assigned at random to the different generative-ability sub-intervals. Likewise, the value of v for each individual was interpreted as the probability of its survival in the next generation.
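The sketch below (in Python; our own illustration, not the code actually used for the figures) implements this procedure together with the fitness measures (5)-(6); the use of binomial sampling for reproduction and survival, and the random seed, are our assumptions:

import numpy as np

rng = np.random.default_rng(1)              # arbitrary seed (assumption)

K, CAP, MAX_OFF = 100, 100, 5               # sub-intervals, class cap, max offspring
ALPHA0, BETA0 = 3.0, 100.0                  # parameters of Eq. (4)
b = (np.arange(K) + 0.5) / K                # middle points of the sub-intervals

def nu(b, gamma):
    # Eq. (3): decreasing survival ability, with nu(0) = 1 and nu(1) = 0
    return gamma * (1.0 + gamma) / (b + gamma) - gamma

counts = np.zeros(K, dtype=np.int64)
counts[49] = 50                             # all 50 individuals in the 50th class

for gen in range(100):
    N = counts.sum()
    if N == 0:
        break
    gamma = BETA0 / N + ALPHA0              # Eq. (4): gamma depends on N(t)
    v = np.clip(nu(b, gamma), 0.0, 1.0)
    # Fitness measures of Eqs. (5)-(6)
    w_hat = (counts * b * v).sum() / N
    W = (counts * b).sum() * (counts * v).sum() / N**2
    cov = w_hat - W
    # Reproduction: b_i is the probability of each of the MAX_OFF possible offspring
    births = int(rng.binomial(counts * MAX_OFF, b).sum())
    counts += np.bincount(rng.integers(0, K, births), minlength=K)
    # Survival: each individual survives with probability v_i; enforce the class cap
    counts = np.minimum(rng.binomial(counts, v), CAP)
    print(gen, int(N), round(float(cov), 4))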
In Figure 2 we can see a plot of both kinds of fitness vs. generation number in a “life history” characterized by 100 generations, an initial total number of 50 individuals, 100 different sub-intervals of generative ability, and a maximum allowable number of descendants for each population member and each generation equal to 5. Moreover, the maximum allowable number of individuals for each generative-ability sub-interval was fixed to 100, and the initial value of γ was 5. The values of the remaining parameters were α_0 = 3, β_0 = 100. As is immediate to see, the covariance is always negative and the group fitness prevails over the average individual fitness. What has been, in this case, the effect of the population evolution on the distribution of the values of b among the individuals? We remark that at the beginning of this simulation we chose to put all individuals within the same generative-ability class, corresponding to the 50th sub-interval. This distribution is depicted in Figure 3(a). For comparison, we show in Figure 3(b) the final distribution obtained after 100 generations. As can be seen, not only does the final distribution deeply differ from the initial one, but it also evidences a very high degree of specialization of the single individuals. It is easy to understand that this simple model accounts not only for the evolution of populations of volvocine algae but, more generally, for the fact that most biological organisms survive in a complex environment precisely because their components (cells or organs) are highly specialized and reciprocally cooperating. Now let us deal with the main question: can this model be dealt with through the methods, for instance, of QFT? If yes, through which algorithms? If not, for which reasons? To begin with, we could argue that the model was not cast in the form of a system of differential equations and, therefore, cannot be dealt with through Hamiltonian-based methods. As a matter of fact, it can easily be shown that it is not possible to cast the model even in the form of stochastic differential equations (unless we introduce suitable approximations which, however, would destroy the very nature of the model). On the other hand, even methods such as that of Doi-Peliti (see [24,53]; applications and generalizations are described by Cardy and Täuber [17], Pastor-Satorras and Solé [52], Smith [66]), which transform a model based on a master equation into a sort of Hamiltonian field theory, cannot be applied, because it is easy to recognize that the master equation of Michod's model is very complicated and not reducible to a reaction-diffusion form. Nor does a mean-field approximation seem useful. This argument is, however, very weak, as we could hypothesize that, in the near future, by using suitable new kinds of tricks and approximations, it would be possible, in one form or another, to cast Michod's model into a mathematical format closer to the ones popular in theoretical physics. In any case, this would be a problem of a merely technical (mathematical) nature, and not a serious conceptual obstacle. A stronger argument takes into consideration the nature of the environment which, as implicitly described in the previous model, has a typically reactive character.
Fig. 3. (a) Initial distribution of generative abilities; (b) Final distribution of generative abilities.
On the contrary, QFT models are placed within simpler environments, such as thermal baths, and the very concept of fitness is absent. We can therefore claim that QFT will never give rise to a reliable description of phase transitions in biological matter unless we generalize it so as to include more realistic descriptions of biological environments. Of course, this generalization should also take into account the fact that, owing to the previous reasons, phase transitions in biological matter are often of a non-equilibrium type. The doubling mechanism quoted in Section 3 constitutes a first step in this direction, but further steps are probably needed. A third argument deals with the nature of the system components. While QFT models can be interpreted as describing assemblies of particles all having the same nature, the members of the population introduced previously are different individuals. In other words, once translated into the language of particle creation and annihilation, the model operating according to the rules (3)-(4) describes the creation and annihilation not only of particles, but even of kinds of particles. In terms of a field language, it is equivalent to a theory describing the birth and the
disappearance of fields. Within QFT this would require the introduction of third quantization. This is still a somewhat exotic topic, dealt with almost exclusively within the domain of quantum theories of gravitation and so far with only very few contacts with the world of QFT models of condensed matter (for a first attempt see [46]; see also [54]). However, the latter domain could be the better context for applying this kind of extension of QFT, owing to the presence of “effective force fields” which appear and disappear as a function of environmental constraints. On the contrary, it would be useless within the world of elementary particle physics at high energies, where from the start people have been searching for evidence of universal force fields whose nature remains unchanged. As a consequence of the previous arguments we can claim that, notwithstanding the theoretical efforts quoted in the previous sections, at present the gap between models of biological and of physical change, in particular QFT, is still very large, except in a limited number of cases related to low-level phenomena. However, provided the generalizations mentioned before were included within a larger theoretical framework, the gap could probably be filled.
6. Conclusions
The considerations made within this paper evidenced the difficulty of the road to be followed to reach at least a first form of a general theory of change. In this regard, we hope to have clarified which generalizations of physical models are needed to fill the gap between models of biological and physical change. We remark that, even if at first sight the problems to be solved appear to be only of a technical nature, they are really of a conceptual nature. Namely, the technical methods adopted are strongly dependent on the conceptual framework adopted in describing the world, and above all on the goals underlying this description. In particular, since Newton's times the world has been conceived as populated by systems constituted by elementary (and irreducible) entities plus the interactions between these entities. While this framework can be useful for describing planets revolving around the Sun, or electrons orbiting around a nucleus, it becomes useless when describing other kinds of systems, such as biological ones, in which notions such as cause-effect, elementary entity and system-environment boundary can become devoid of any sense. We could thus say that, even if the traditional physical framework adopts a sort of restricted systemic view, to describe biological systems we need a generalized systemic view. Present-day Systemics can strongly help in developing the latter, thus
contributing to the revolutionary conceptual transformation needed to build a general theory of change.
References
1. A. Akhiezer, S. Péletminski, Les méthodes de la physique statistique (Mir, Moscow, 1980).
2. E. Alfinito, G. Vitiello, Int. J. Mod. Phys. B 14, 853 (2000).
3. E. Alfinito, G. Vitiello, Phys. Rev. B 65, 054105 (2002).
4. E. Alfinito, O. Romei, G. Vitiello, Mod. Phys. Lett. B 16, 93 (2002).
5. C.A. Almeida, D. Bazeia, L. Losano, J.M.C. Malbouisson, Phys. Rev. D 69, 067702 (2004).
6. J.R. Anglin, R. Laflamme, W.H. Zurek, J.P. Paz, Phys. Rev. D 52, 2221 (1995).
7. N.D. Antunes, P. Gandra, R.J. Rivers, Phys. Rev. D 71, 105006 (2005).
8. A. Arageorgis, J. Earman, L. Ruetsche, Studies in the History and Philosophy of Modern Physics 33, 151 (2002).
9. J. Bain, Erkenntnis 53, 375 (2000).
10. G. Benfatto, G. Gallavotti, Renormalization Group (Princeton University Press, Princeton, NJ, 1995).
11. L. Brizhik, A. Eremko, B. Piette, W. Zakrzewski, Phys. Rev. E 70, 031914 (2004).
12. A.O. Caldeira, A.J. Leggett, Ann. Phys. 149, 374 (1983).
13. E. Calzetta, B.L. Hu, Phys. Rev. D 61, 025012 (2000).
14. E. Calzetta, A. Roura, E. Verdaguer, Phys. Rev. D 64, 105008 (2001).
15. E. Calzetta, A. Roura, E. Verdaguer, Phys. Rev. Lett. 88, 010403 (2002).
16. J. Cardy, Scaling and Renormalization in Statistical Physics (Cambridge University Press, Cambridge, UK, 1996).
17. J.L. Cardy, U.C. Täuber, J. Stat. Phys. 90, 1 (1998).
18. E. Celeghini, M. Rasetti, G. Vitiello, Ann. Phys. 215, 156 (1992).
19. R.M. Christensen, Theory of Viscoelasticity. An Introduction (Academic Press, New York, 1971).
20. R.K. Clifton, H.P. Halvorson, Studies in the History and Philosophy of Modern Physics 32, 1 (2001).
21. J.P. Crutchfield, Physica D 75, 11 (1994).
22. L. Cruzeiro-Hansson, S. Takeno, Phys. Rev. E 56, 894 (1997).
23. E. Del Giudice, A. De Ninno, M. Fleischmann, G. Vitiello, Electromagnetic Biology and Medicine 24, 199 (2005).
24. M. Doi, J. Phys. A 9, 1465 (1976).
25. C. Domb, The critical point (Taylor and Francis, London, 1996).
26. J.R. Drugowich de Felício, O. Hipólito, Am. J. Phys. 53, 690 (1985).
27. J. Dziarmaga, P. Laguna, W.H. Zurek, Phys. Rev. Lett. 82, 4749 (1999).
28. W. Förner, Int. J. Quantum Chem. 64, 351 (1997).
29. W.J. Freeman, G. Vitiello, Physics of Life Reviews 3, 93 (2006).
30. D.D. Georgiev, Informatica 30, 221 (2006).
31. N. Goldenfeld, Lectures on phase transitions and the renormalization group (Addison-Wesley, Reading, MA, 1992).
32. J. Goldstone, A. Salam, S. Weinberg, Phys. Rev. 127, 965 (1962).
33. D.M. Greenberger, Am. J. Phys. 46, 394 (1978).
34. R. Haag, in W.E. Brittin, B.W. Downs and J. Downs (Eds.), Lectures in Theoretical Physics, vol. 3 (Wiley, New York, 1961), pp. 353-381.
35. S. Hagan, S.R. Hameroff, J.A. Tuszyński, Phys. Rev. E 65, 061901 (2002).
36. S.R. Hameroff, A. Nip, M.J. Porter, J.A. Tuszyński, Biosystems 64, 149 (2002).
37. K. Hepp, Helvetica Physica Acta 45, 237 (1972).
38. D.E. Hill, Continuum Mechanics: Elasticity, Plasticity, Viscoelasticity (CRC Press, Boca Raton, FL, 2006).
39. W.O. Hughes, S. Sumner, S. Van Borm, J.J. Boomsma, Proc. Nat. Acad. Sci. USA 100, 9394 (2003).
40. M. Jibu, K. Yasue, Quantum Brain Dynamics and Consciousness: An Introduction (Benjamins, Amsterdam, 1995).
41. M. Jibu, K. Yasue, in G.G. Globus, K.H. Pribram and G. Vitiello (Eds.), Brain and being. At the boundary between science, philosophy, language and arts (Benjamins, Amsterdam, 2004), pp. 267-290.
42. P. Laguna, W.H. Zurek, Phys. Rev. Lett. 78, 2519 (1997).
43. H. Leutwyler, Helvetica Physica Acta 70, 275 (1997).
44. F.C. Lombardo, R.J. Rivers, F.D. Mazzitelli, Int. J. Theor. Phys. 41, 2121 (2002).
45. G. Lythe, Int. J. Theor. Phys. 40, 2309 (2001).
46. V.P. Maslov, O.Yu. Shvedov, Phys. Rev. D 60, 105012 (1999).
47. M.V. Mesquita, A.R. Vasconcellos, R. Luzzi, Int. J. Quantum Chem. 66, 177 (1998).
48. M.V. Mesquita, A.R. Vasconcellos, R. Luzzi, Braz. J. Phys. 34, 489 (2004).
49. R.E. Michod, Proc. Nat. Acad. Sci. USA 103, 9113 (2006).
50. R.E. Michod, Y. Viossat, C.A. Solari, M. Hurand, A.M. Nedelcu, J. Theor. Biol. 239, 257 (2006).
51. G. Minati, E. Pessa, Collective Beings (Springer, Berlin, 2006).
52. R. Pastor-Satorras, R.V. Solé, Phys. Rev. E 64, 051909 (2001).
53. L. Peliti, Journal de Physique 46, 1469 (1985).
54. E. Pessa, G. Resconi, in G. Minati, E. Pessa (Eds.), Emergence in Complex, Cognitive, Social, and Biological Systems (Kluwer, New York, 2002), pp. 141-149.
55. E. Pessa, G. Vitiello, Int. J. Mod. Phys. B 18, 841 (2004).
56. Yu.N. Rabotnov, Elements of Hereditary Solid Mechanics (Mir, Moscow, 1980).
57. R.J. Rivers, Int. J. Theor. Phys. 39, 1779 (2000).
58. R.J. Rivers, F.C. Lombardo, F.D. Mazzitelli, Int. J. Theor. Phys. 41, 2145 (2002).
59. G.E. Robinson, Annual Review of Entomology 37, 637 (1992).
60. L. Ruetsche, Philosophy of Science 69, 348 (2002).
61. Yu.B. Rumer, M.Sh. Rivkyn, Thermodynamics, Statistical Physics, and Kinetics (Mir, Moscow, 1980).
62. A.S. Scott, Phys. Rep. 217, 1 (1992).
63. G.L. Sewell, Quantum Theory of Collective Phenomena (Oxford University Press, Oxford, UK, 1986).
64. Y. Shi, Phys. Lett. A 309, 254 (2003).
65. J. Sivardière, Am. J. Phys. 51, 1016 (1983).
Towards a General Theory of Change
623
66. E. Smith, Santa Fe Institute working paper #06-11-40 (Santa Fe Institute, Santa Fe, NM., 2006).
67. D.L. Stein, J. Chem. Phys. 72, 2869 (1980). 68. W.R.Tschinkel, BioScience 48, 593 (1998). 69. J.A. Tuszy ski, Ed., The emerging physics of consciousness (Springer, Berlin, 2006).
70. H. Umezawa, Advanced Field Theory. Micro, Macro, and Thermal Physics (American Institute of Physics: New York, 1993).
71. G. Vitiello, Int. J. Mod. Phys. B 9, 973 (1995). 72. G. Vitiello, My double unveiled (Benjamins, Amsterdam, 2001). 73. W.H. Zurek, Rev. Mod. Phys. 74, 715 (2003).
ACQUIRED EMERGENT PROPERTIES
GIANFRANCO MINATI
Italian Systems Society, Milan, Italy
E-mail: [email protected]
We first discuss the concept of structure in order to differentiate between structural and systemic properties. Within this framework we discuss processes of establishing structures, such as phase transitions, organization and self-organization. The paper introduces concepts and insights about the process of Acquiring Properties (AP) in systems, as opposed to merely possessing them. The last point relates to establishing, sustaining and managing new properties in emergent and organizational systems. In the Appendix we briefly discuss this approach for the concept of mind possessed by living matter.
Keywords: acquired property, emergence, self-organization, structure.
1. Introduction
In various disciplinary fields there is increasing interest, witnessed by the increasing number of publications, in concepts such as emergence, self-organization, collective behavior and phase transitions. This interest relates to theories, models and simulations not only in physics, but also in a variety of disciplines, such as Artificial Life, Biology, Cognitive Science, Economics and Social Systems. The interest is particularly evident in the fact that original models developed in physics could not simply be transposed to other disciplines by just changing the meaning of the variables considered, as has instead been possible with specific approaches such as Synergetics. We expect that different discipline-specific approaches will be represented in a future generalized Theory of Emergence based on multiple, non-equivalent approaches. How are systems established, or phenomena modeled as such? A systemic approach is based on considering components interacting in a structured way. We first introduce the concept of structure deriving from relations between elements. New properties of structured elements may be produced by the structure itself, such as order. We then distinguish between:
• Structured interactions, occurring when interactions, i.e., the action of one element affecting another, follow a predefined structure. Examples of structured interactions are given by crystals and assembly lines; and
• Self-organized interactions. In this case elements interact in a non-structured way, i.e., the constraints within which elements interact are variable and self-established by the elements depending upon boundary conditions, external input and with reference to parameters such as distance, timing, position and number of elements. Examples are flocks and swarms; lasers and ferromagnetism.
This is followed by a discussion of specific processes of establishing structures, such as:
• Phase transitions. Second-order phase transitions, for instance, consist of an internal, global and simultaneous process of restructuring.
• Organizing. Organizing is understood as introducing structures for interactions. We refer to the process of introducing organization into a set considered as having no organization at all, or a different organization.
• Self-organizing. The meaning of the prefix self relates to the fact that the variable structure by which elements interact (i.e., organization) is not imposed from without, but adopted autonomously, whether or not as a reaction to an external input.
We are then able to focus upon two ways of establishing systems:
• through organization, or
• through self-organization, considered as emergence when requiring the crucial constructivist role of the observer.
We present this condensed and necessarily limited review in order to introduce crucial points related to processes by which systems acquire new properties, subsequent to those originally possessed:
• Acquisition of new properties. A system, i.e., the model of a phenomenon, may have embedded in it the ability to establish new, unexpected properties, i.e., properties not explicitly designed by the observer, which can be modeled as properties of an autonomous system. Moreover, such new properties may influence the original system.
• How to sustain acquired properties. In the hierarchy of levels a property requires other, lower levels. Is it possible to sustain a property without involving the lower levels?
In the Appendix we consider the prospect of retaining acquired properties in the case of living matter provided with a suitable cognitive system. One hotly
debated property is mind. A process not yet modeled, because of the evident impossibility of obtaining experimental information, is that related to new, emergent properties acquired by living matter provided with cognitive systems having higher levels of complexity, following the transition from living to non-living matter.

2. Structures and Systems
In mathematics we consider the structure of sets. This relates, for instance, to additional mathematical aspects, such as algebraic structures (groups, rings and fields), equivalence relations, measures, metric structures (i.e., geometries), orders and topologies. An abstract structure is a formal object defined by a set of composition rules, properties and relationships. These definitions make it possible to distinguish between structural and systemic properties. The structures of elements derive from relations between elements (e.g., positions of elements in a configuration, in a network or in relational models for databases). Relations may represent static or dynamic configurations of elements. In this case the new property is given by the structure itself and not by the interaction between elements. Examples are given by elements configured in an order (e.g., alphabetical, by age, dimension or weight) or placed following some functional principle (e.g., lamps per wattage, food per expiry date and electronic components per function). A structure of relations is given by relations between relations (e.g., corporate organization), and structures of structures of elements are given by relations between relations between relations between elements (e.g., regional economics). With reference to systems, a structure in general describes the way by which interactions between elements take place. Interactions among elements are organized when following a structure. Sufficient conditions for the establishment of systems, in our current knowledge, relate to suitable interaction [1,2,3] in (a) organized and (b) self-organized ways. Let us first consider systems established by (a) organized interacting components.

2.1. Structured Interactions
We may distinguish two cases:
• Change of structure - Interactions between elements driving towards the establishment of a new structure (e.g., phase transitions). In this case we have a new structure, a new configuration of relationships between elements, and then between their behaviors, as a result of the process.
• Establishment of a system - Interaction between elements driving towards the establishment of a system. In this case the elements, in order to interact continuously and to avoid the system dissolving into the environment, must respect some structural constraints for the interactions to be effective. Constraints may be, for instance, that elements must be at such a distance as to allow interaction to be effective in the case where this takes place through an exchange of energy, or that elements must be connected when interactions take place through the exchange of information.
The first case has already been well described using theories of phase transitions available in the literature (see Section 3.1). In the second case, as mentioned above, components are considered able to have (1) a reaction to an external input or (2) a behavior. In case (1) elements only react to inputs by following laws, such as those of physics and chemistry, without any autonomous processing. The conceptual framework is that of stimulus-reaction. In this way, structured interactions between elements allow the establishment of systems through collective reactions to an external input: inputs, i.e., actions on elements, propagate to all the others, making the system, i.e., the structure of interacting elements, adopt properties that individual elements do not possess. In this case systems are established through organization - processes of mutual input/output exchange of energy, matter or information between components of a network of pre-established relationships. Examples of systems established by elements interacting in a structure of relationships are given by chemical bonds and electronic circuits. In case (2) elements have a behavior due to the cognitive processing [4] of the input. In this case interacting components are autonomous agents, i.e., agents possessing a natural or artificial cognitive system (the latter deriving, for instance, from the computational modeling of cognitive processes), such as birds or autonomous robots, allowing them to process the input and not just to react. Structured interaction between autonomous agents allows the establishment of systems having a behavior due to the organization adopted. Examples of systems established by autonomous agents interacting in a structure of relationships are given by assembly lines, military units and sports teams. In both (1) and (2), internal changes are processed as destructuring and eventually collapsing perturbations (e.g., the breakdown of an electronic circuit or the fault of an element in an assembly line).
2.2. Self-organized interactions
Let us now consider systems established by elements interacting in a non-structured way. Non-structured means that the structure, i.e., the constraints by which elements interact, is not pre-established, but variable and self-established by the elements depending upon boundary conditions and with reference to certain parameters, such as distance, timing, position and number of elements, establishing attractors and order parameters [5,6]. Also in this case components may be able to (1) react to an external input or (2) show autonomous behavior. In the first case we have processes of mutual input/output exchange of energy, matter and information between elements in a non-organized way, i.e., without following pre-established rules. The process leads to the establishment, within suitable boundary conditions, of new self-organized collective entities from the coherent behavior of interacting components. Examples of collective entities established by interacting components are lasers, oscillating chemical phenomena such as the Belousov-Zhabotinsky reaction, ferromagnetism and superconductive systems. In the latter case we consider interaction as taking place between autonomous agents. Examples of collective entities established by autonomous agents are flocks, swarms, industrial districts and markets. We must stress that the two ways in which suitable processes of interaction take place, i.e., structured and self-organized, may coexist. For instance, social systems may be considered as organizations, ignoring the emergent processes taking place within them, and vice versa. One typical example where these two ways coexist is an anthill, where there are well-defined roles and simultaneous processes of emergence.

3. Processes of establishing structures
We may consider three kinds of processes able to establish structures: phase transitions, organizing and self-organizing.

3.1. Phase transitions
A phase in physics relates to the state of matter. In short, a phase is a set of states of a physical system having uniform properties such as electrical conductivity, density, structure and index of refraction. Examples of phases of matter are the liquid, solid and gas phases. Phases, however, are not thermodynamic states. For instance, two liquids at different temperatures are in different thermodynamic states, but in the same state of matter. The expression
phase transition relates to the process of changing from one phase to another. It is possible to distinguish between first-order and second-order phase transitions. First-order transitions require a finite time, and it is possible to have the coexistence of different phases in the same system, such as ice in water, or liquid and vapor. A suitable external perturbation (e.g., a change of temperature or pressure) is able to induce the total or partial disappearance of one phase in favor of the other. Examples of first-order phase transitions are solid-liquid-gas transitions. Second-order transitions occur without continuity and simultaneously within the whole system involved in the process. In this kind of transition there is no coexistence of the two phases. The transition consists of an internal, global and simultaneous process of restructuring. Second-order transitions are activated by the fact that the structure corresponding to the initial phase instantaneously becomes no longer valid and a new structure is established. Examples of second-order transitions are transitions from paramagnetic to ferromagnetic states, and the occurrence of superconductivity and superfluidity. Processes of phase transitions have been reported in the literature using different approaches and theories [2].

3.2. Organization / Self-organization
Organization may be introduced into a set considered as having no organization or a different organization. This is a process for building an artificial system [7]. The concept of self-organization has been widely studied and is often adopted as a synonym of emergence. The meaning of the prefix self relates to the fact that the structure by which elements interact (i.e., organization) is not imposed from the outside, but adopted autonomously, whether or not as a reaction to an external input. Processes of self-organization are processes which can make elements (a) adopt a structure or (b) interact while following a continuously self-established structure. The first case (a) regards, for instance, phase transitions, with particular reference to so-called order-disorder transitions. Following the work of I. Prigogine [8] and H. Haken [9,10,11], processes of second-order transitions, the so-called order-disorder transitions, have been considered as processes of self-organization [12], and the terms emergence and self-organization considered as synonyms. However, the two concepts are different, inasmuch as a system is considered self-organizing when it is able to change its structure in reaction to external inputs [13]. Emergence, as introduced in the literature [2,14,15,16,17], requires the constructivist role of the observer, able
to detect not only structural or ergodic changes, but also the establishment of new properties [18,19,20,21,22,23]. Processes of self-organization are, for instance, described by models formulated in terms of partial differential equations. Such systems may allow for an infinite number of solutions, whose general form cannot be identified using suitable parameters: the difference between two solutions of a partial differential equation is given by an arbitrary function. Moreover, it is possible to find in these models locally stable solutions as descriptions of self-organizing processes. The simplest of such models is the so-called Brusselator [8,24]. The model was very useful for studying the Belousov-Zhabotinsky reaction mentioned above [25]. Case (b) relates to the establishment of Collective Behaviors. In this case elements interact by establishing coherence rather than a fixed organizational structure. Coherent behavior (named Collective Behavior) of particles or agents is established not due to an explicit design setting functions and roles, but as a consequence of their coherent rather than structured interaction. The phenomenon consists of the occurrence of coherence between microscopic behaviors detected by an observer using a model different from that used for individual components. It means that the level of description used for modeling the microscopic behavior is insufficient for detecting coherent, collective behavior. Processes of Collective Behavior are processes of self-organization able to give rise to emergent entities (i.e., systems) having new properties, such as in physics with the establishment of lasers, fluids and plasmas [26,27], in biophysics with DNA replication [28], in sociology [29,30,31] (e.g., urban growth), in chemistry (pattern formation, dissipative structures), biology (morphogenesis, evolution), economics (markets), brain activities, computer science and meteorology, as well as non-linear phenomena involving a macroscopically large number of agents, as in the case of insect societies [32,33] and swarm intelligence [34]. We briefly mention how the following equivalences, often taken as valid, are not completely correct [2]:
• Emergence equivalent to Phase Transitions.
• Emergence equivalent to Self-organization [35].
With reference to the first equivalence, the difference relates to properties acquired by continuous and coherent change of structures versus properties acquired by a specific change of structure. The unsuitability of the equivalence also relates to the problem of generalizing the concept of phase transition when
dealing with similar processes taking place in other disciplinary contexts. In physics the indicators relate to physical variables, such as temperature and pressure. If we want to apply the same model to other disciplines we need to find the corresponding indicators, for instance, in cognitive science, social systems and biology. With reference to the second equivalence, the unsuitability of the equivalence relates to the absence of the constructivist role of the observer, able not only to detect processes of establishment of coherence, but also to realize new properties, i.e., usages through modeling and meaning. We conclude this section by mentioning approaches to detect that a process of emergence is taking place by assuming self-organization as a necessary condition [2,36,37,38].

4. Processes of establishing systems
We need to briefly recall that in a constructivist view a system is a model established by the observer for comprehending a phenomenon. In this way the observer identifies parts while trying to model a phenomenon as a system. Observer and designer are one and the same only for artificial systems. In this case we know the parts, interactions and structure because they have been designed. A different partitioning corresponds to different, mutually equivalent or irreducible, models. Moreover, the same process relates to the identification of interactions among parts and of how they are organized, i.e., their structure or organization. We have a system when we are able to describe the parts, their interaction and their structure [17]. A system is then a model of a phenomenon assumed to be able to represent, explain and simulate it. Designing a system, i.e., new structures, partitions and interactions, is a way to establish new entities having new properties (new with reference to components and interactions).

5. Processes of establishing Acquired Properties (AP)
With reference to what has been introduced above, new properties should correspond to the adoption of (a) a new structure, and/or (b) a new way of interacting, and/or (c) new parts becoming involved. In effect, this means establishing new systems in order to get new properties. Moreover, systems do not only possess properties, but are also able, in their turn, to establish new ones in different ways. This means that the same modeling has the power to explain the establishment of new properties previously not considered. There are several ways by which a system may acquire new properties, for instance:
• Establishing new subsystems - this occurs when a system is established by subsystems and a new configuration of the same subsystems establishes a
system with new properties. Examples are the extended new functionalities of electronic devices obtained by adding to or rearranging their configuration.
• Introducing parameter changes - relating to a system's way of working, such as the speed of information exchange among parts, the intensity of interactions and the level of sensitivity to external parameters (e.g., temperature and pressure).
We may assume that a system keeps its identity when keeping the same partitions, structure and interaction between elements. After processes, for instance, of phase transitions, re-organization and self-organization, the system is considered as no longer being the same. The same phenomenon may be modeled at different levels of description, corresponding to systems differing in their partition and/or interactions and/or structure. This approach is well known in physics, where multiple descriptions are possible by using, for instance, wave-particle duality, or quantum and non-quantum models. A system may itself activate the process of establishing new properties thanks to simultaneous or multiple interactions, as in Multiple Systems (MSs) and Collective Beings (CBs) [2]. A MS is a set of systems established by the same elements interacting in different ways, i.e., having multiple simultaneous or dynamical roles. Precisely because of these multiple simultaneous or dynamical roles, they make different systems emerge, having different properties. This concept was introduced several years ago in psychology by considering multiple-memory-system models [39]. Examples in systems engineering include interacting networked computer systems performing cooperative tasks, and the Internet, where different systems play different roles, continuously establishing new, emergent usages. CBs are particular MSs established by agents possessing the same (natural or artificial) cognitive system. In CBs multiple belonging is active, i.e., decided by the component autonomous agents. Examples of CBs are Human Social Systems, where agents may belong to different systems (e.g., families, workplaces, traffic systems, mobile telephone networks and, as buyers, markets) or give rise to different systems, such as temporary communities (e.g., audiences, queues, passengers on a bus). In this case a multi-modeling approach, known as the Dynamic Usage of Models (DYSAM), has been introduced, based on approaches already considered in the literature sharing a common strategy of not looking for a unique, optimum solution. These include the well-known Bayesian method, Peirce's abduction, Machine Learning, Ensemble Learning and Evolutionary Game Theory [2,40,41]. We consider below some specific processes able to make a system adopt new properties.
5.1. Acquiring new properties
A system as such, i.e., the model of a phenomenon, may have embedded in it the ability to establish new, unexpected properties, i.e., properties not explicitly designed by the observer. Moreover, such new properties are able to influence the original system. Consider the case of computational systems, i.e., Turing Machines. Since the 1950s researchers have realized how the ability to compute could give rise to properties of different kinds, such as playing chess, learning, pattern and handwriting recognition processes, establishing so-called intelligent behavior. A new generation of computational systems is able to perform so-called sub-symbolic computation, including Neural Networks, Cellular Automata and Genetic Algorithms, where computational rules are not explicitly established, but emergent (computational emergence). Physically speaking, they are all based on an electronic system (the computer) performing machine cycles, i.e., steps performed by the processor unit processing digital data using a program. This system is able to acquire properties not only related to the explicit purpose of the program, but also non-explicitly designed properties, such as the computational emergent properties in Neural Networks, e.g., learning, classifying, pattern recognition and game-playing. These are examples of acquired properties due to computational emergence. Similar processes take place in living matter provided with neurological systems able to perform processes of signaling, leading to cognitive functions able to establish a cognitive system "intended as a complete system of interactions among activities, which cannot be separated from one another, such as those related to attention, perception, language, the affective-emotional sphere, memory and the inferential system" [42]. These different interacting levels establish the cognitive system, and each level influences and is dependent upon the others. So far we have not said anything new. Within this conceptual framework, however, we would like to introduce some new conceptual problems able to produce new approaches. One possible way of formalizing the process of acquisition of new properties is based on considering hierarchical levels, as introduced in the Baas formalization of emergence. Consider S1, a set of interacting elements having observable properties Obs1(S1) at the level of single elements. Let S2 be a second-order structure, namely the result R obtained by applying interactions Int1 to the elements of S1 with their observable properties Obs1(S1): S2 = R(S1, Obs1(S1), Int1). In this way a property P of S2 is emergent if and only if it is observable at the S2
level but not at a lower level, i.e., at the S1 level [43,44]. It is then possible to construct hyperstructures [45]. We refer to processes of emergence taking place in emergent systems. The processes of acquisition of new emergent properties may be formalized as a hierarchy of processes of emergence, where a property emerges from the interaction of entities emerging from lower structures, as in Baas hierarchies [43,44,45]. Another way is to consider processes of organization taking place in previously established systems: for instance, further processes of organization taking place in organizations needing to specialize their activity better, such as a corporation which needs to sub-divide its market and adopt different, specialized strategies. Another example is given by living matter adopting specialized functions, e.g., organs and specializations as in the brain. It is inappropriate to consider the kinds of processes mentioned above as being completely separate, rather than considering them as simultaneous and integrated processes. An effective modeling of such systems is not based on separate models, but rather on integrated models able to represent interactions between levels and processes of emergence. We think that a suitable approach is that represented by DYSAM. The problem of managing processes of emergence relates on the one hand to the ability to induce, sustain and regulate acquired properties and, on the other, to the ability to deactivate them, i.e., make them de-emergent.

5.2. How to keep acquired properties
In the hierarchy of levels a property requires other, lower levels. Cognitive properties, for instance, are based on necessary lower levels such as the physiological ones. This also relates to maintaining the properties of MSs. Is it possible to sustain a property without the lower levels being involved, or the properties of a MS without the original system in which the process of establishing MSs took place? Lower levels are necessary, as well as being influenced by the higher ones. One approach can be based on substituting (by reproducing the same) the necessary lower levels in order to sustain and keep the higher ones. The process of substituting is possible for virtual systems. Virtual systems are temporarily established by components belonging to another system, as in the case of Multiple Systems. For example, a virtual company really exists only as a temporary way of using resources belonging to other companies [46]. Another approach is based on reproducing what is emergent without reproducing the
process of emergence. For instance, it is possible to reproduce effects without reproducing the generating processes, as when recording and reproducing music. We must also consider, in natural systems, the process of reproduction together with the representation and transmission of knowledge. In this case processes of transmission from one supporting system to others take place thanks to educational and representational processes. There are different kinds of processes characterized by gradualism in the replacement of supporting levels. We refer, for instance, to teams replacing their members over time, in the same way as new cells replace dead cells in living matter. This could be considered a new concept, that of re-emergence, referring to the reproduction of a similar process of emergence supported by the presence of new replacement elements.

6. Appendix: The acquired mind
One crucial and interesting case is that of the properties acquired by living matter (several definitions of living exist), provided with a suitable cognitive system, through processes of emergence [47,48]. After centuries of interdisciplinary study of the topic, mind is still controversial and the subject of different approaches [49,50,51]. The subject of mind is closely related to another which has also been the subject of interdisciplinary investigations for many years: consciousness. Science has made tremendous progress in the study of the system considered crucial for the establishment of mind, i.e., the brain. In the philosophy of mind, researchers study the so-called mind-body problem. Among the many varied approaches [52,53], mind may be considered as an emergent, acquired property of the system established by brain and body. There are different approaches for introducing ways of modeling the emergence of mental processes. Here we briefly mention the so-called Computational Theory of Mind: in the philosophy of mind, a line of research known as Emergent Materialism [50,54] considers mental phenomena as emergent from interactions occurring at the physical level (i.e., brain and body), in the same way as learning emerges in Artificial Neural Networks through interactions among neurons in the Connectionist Theory of Mind. Another approach is that based upon Quantum Field Theory [55]. It builds upon the Quantum Field Theory description of living matter, suitable for modeling the physical emergence of its main biological features. On one hand we have evidence of the autonomy of mind by considering, for instance, its specific illnesses, confronted using specific levels of description, such as those of psychology and psychiatry. Moreover, mind is able to act upon
its supporting, supposedly establishing processes, such as those related to brain and body. Actually mind is able to influence brain and body by imposing certain behaviors. One extreme is given by so-called self-destructive behaviors, such as drug and alcohol abuse, and suicide. We may have different possible approaches when trying to model processes of establishment of mind, such as considering it established by processes of organization and self-organization involving both brain and body. It should also be noted that we use mind to study mind. A process not modeled, because of the evident impossibility of obtaining experimental information, is that related to new properties, such as mind itself, acquired by living matter provided with cognitive systems having higher levels of complexity, following the transition from living to non-living matter. Some religions and philosophies mention, in different ways, the possibility of an eternal life, evidently without the need for a body. We may understand eternal life as the supporting of newly acquired properties, such as mind, without the support of living matter. Death may then be understood as the moment when the change in the supporting process takes place, as discussed in Section 5.2. Of course this is just a conceptual framework, and not even a hypothetical one (how could one experiment?).

7. Conclusions
The new contributions introduced in this paper relate to the following points. We considered some insights into the role of structure in the process of establishing systems. We then discussed three points: 1) processes of establishing structures; 2) processes of establishing systems; and, as a major contribution, 3) processes of establishing Acquired Properties (AP). The last point relates to the establishment of new properties in emergent and/or organizational systems. This refers to a kind of hierarchy of properties, one being based upon other, preceding ones. We then mentioned related problems, such as how to acquire and sustain new properties. These problems give a further idea of the richness of a future General Theory of Emergence [56], to be understood not as a single theory, but rather as a multi-dimensional theory. One initial approach uses the concept of the Dynamic Usage of Models (DYSAM) to cope with Multiple Systems and Collective Beings. We mentioned, in the Appendix, how this view may allow the conception of a new point of view related to crucial existential problems for humanity, such as the possibility of supporting minds after the death of the supporting biological matter. Creating, supporting and managing acquired properties are, in sum, perspectives of a GTE. It is expected to also introduce new epistemological
approaches, including multi-dimensionality and the simultaneous usage of irreducible models, in order to deal with problems such as those related to life, mind, development considered as an emergent property of systems of growth, sustainability as an emergent property not reducible to linear combinations of local sustainabilities, and the supporting of acquired properties in general.

References
1. G. Minati, Multiple Systems, Collective Beings, and the Dynamic Usage of Models, Systemist, 200 (2006).
2. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006). 3. L. von Bertalanffy, General System Theory: Foundations, Development, Applications (George Braziller, New York, 1968).
4. L.W. Barsalou, Cognitive Psychology: An overview for cognitive scientists (Erlbaum, Hillsdale, NJ, 1992).
5. W.R. Ashby, in Principles of Self-Organizing Systems, Ed. H. von Foerster and G.W. Zopf (Pergamon, Oxford, 1962), pp. 255-278.
6. H. von Foerster, in Self-Organizing Systems, Ed. M.C. Yovitts and S. Cameron, (Pergamon, New York, 1960), pp. 31-50.
7. S. Guberman and G. Minati, Dialogue about systems (Polimetrica, Milano, Italy, 2007).
8. G. Nicolis and I. Prigogine, Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations (Wiley, New York, 1977).
9. H. Haken, Erfolgsgeheimnisse der Natur (Deutsche Verlags-Anstalt, Stuttgart, 1981).
10. H. Haken, Advanced Synergetics (Springer, Berlin-Heidelberg-New York, 1983). 11. H. Haken, in Self-organizing systems: The emergence of order, Ed. F.E. Yates, (Plenum, New York, 1987).
12. J.H. Holland, Emergence from Chaos to Order (Perseus Books, Cambridge, Massachusetts, 1998).
13. W. Banzhaf, in Encyclopaedia of Physical Science and Technology, 3rd edition, vol. 15, Ed. R.A. Meyers (Academic Press, New York, 2001), pp. 589-598.
14. P. Corning, Complexity, 18 (2002). 15. J.P. Crutchfield, Physica D, 11 (1994). 16. E. Pessa, in Proceedings of the First Italian Conference on Systemics, Ed. G. Minati (Apogeo scientifica, Milano, Italy, 1998).
17. E. Pessa, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa, (Kluwer, New York, 2002), pp. 379-382.
18. R. Butts and J. Brown, Eds., Constructivism and Science (Kluwer, Dordrecht, Holland, 1989).
19. E.M.A. Ronald, M. Sipper and M.S. Capcarrère, Artificial Life, 225 (1999). 20. A. Rueger, Synthese, 297 (2000). 21. H. von Foerster, Observing Systems, Selected Papers of Heinz von Foerster (Intersystems Publications, Seaside, CA, 1981).
22. H. von Foerster, Understanding Understanding: Essays on Cybernetics and Cognition (Springer, New York, 2003).
23. E. von Glasersfeld, in The invented reality, Ed. P. Watzlawick (Norton, New York, 1984), pp. 17-40.
24. A. Babloyants, Molecules, Dynamics & Life: An Introduction to Self-Organization of Matter (Wiley, New York, 1986).
25. B.P. Belousov, A periodic chemical reaction and its mechanism (Sbornik Referatoo po Radiatsionnoi Meditsine, Medgiz, Moscow, 1959), pp. 145-147.
26. A.S. Iberall and H. Soodak, Collective Phenomena, 9 (1978). 27. G.L. Sewell, Quantum Theory of Collective Phenomena (Oxford University Press, Oxford, 1986).
28. E. Bieberich, BioSystems, 109 (2000). 29. N.J. Smelser, Theory of Collective Behavior (Free Press, New York, 1963). 30. H. Blumer, in New Outline of the Principles of Sociology, Ed. A.M. Lee (Barnes and Noble, New York, 1951), pp. 167-222.
31. R.H. Turner, in Handbook of Modern Sociology, Ed. R.E.L. Faris (Rand McNally, Chicago, 1964), pp. 382-425.
32. M.M. Millonas, Journal of Theoretical Biology, 529 (1992). 33. G. Theraulaz and J. Gervet, Psychologie Francaise, 7 (1992). 34. M.M. Millonas, in Artificial Life III, Ed. C.G. Langton (Addison-Wesley, Reading, MA, 1994), pp. 417-445.
35. P.W. Anderson and D.L. Stein, in Self-Organizing Systems: The Emergence of Order, Ed. F.E. Yates (Plenum, New York, 1985), pp. 445-457.
36. E. Bonabeau and J.-L. Dessalles, Intellectica, 85 (1997).
37. F. Boschetti, M. Prokopenko, I. Macreadie and A.-M. Grisogono, in Proceedings of Knowledge-Based Intelligent Information and Engineering Systems, 9th International Conference, KES 2005, Ed. R. Khosla, R.J. Howlett and L.C. Jain (Melbourne, Australia, September 14-16, 2005; Part IV, volume 3684 of Lecture Notes in Computer Science), pp. 573-580.
38. G. Minati, in Proceedings of the Second Conference of the Italian Systems Society, Ed. G. Minati and E. Pessa (Kluwer Academic/Plenum Publishers, London, 2002), pp. 85-102.
39. E. Tulving, American Psychologist, 385 (1985).
40. G. Minati and S. Brahms, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer, New York, 2002), pp. 41-52.
41. G. Minati and E. Pessa, Eds., Emergence in Complex Cognitive, Social and Biological Systems, Proceedings of the Second Conference of the Italian Systems Society (Kluwer Academic/Plenum Publishers, London, 2002).
42. E. Pessa, La Nuova Critica, 53 (2000).
43. N.A. Baas, in Artificial Life III, Ed. C.G. Langton (Addison-Wesley, Redwood City, 1993).
44. N.A. Baas and C. Emmeche, Intellectica, 67 (1997).
45. K. Kitto, Modeling and Generating Complex Emergent Behavior, Ph.D. thesis, School of Chemistry, Physics and Earth Sciences (The Flinders University of South Australia, 2006), http://scieng.flinders.edu.au/cpes/postgrad/kitto_k/01front.pdf
46. W.H. Davidow and M.S. Malone, The Virtual Corporation: Structuring and Revitalizing the Corporation for the 21st Century (HarperCollins, New York, 1992).
47. E. Pessa, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa and M. Abram, (Springer, New York, 2006), pp. 355-374.
48. E. Pessa, M.P. Penna, and G. Minati, Chaos & Complexity Letters, 137 (2004). 49. J.L. McClelland and D.E. Rumelhart, Eds., Parallel Distributed Processing. Explorations in the microstructure of cognition (MIT Press, Cambridge, MA, 1986).
50. J.R. Searle, Minds, Brains and Science (Harvard University Press, Cambridge, Massachusetts, 1984).
51. J.R. Searle, Mind: A Brief Introduction (Oxford University Press Inc, Oxford, 2005). 52. J. Kim, in Oxford Companion to Philosophy, Ed. T. Honderich, (Oxford University Press, Oxford, 1995).
53. J. Heil, Ed., Philosophy of Mind: A Guide and Anthology (Oxford University Press, Oxford, 2003).
54. P. Churchland, Matter and Consciousness (Massachusetts Institute of Technology, Cambridge, 1988).
55. G. Vitiello, My double unveiled (Benjamins, Amsterdam, 2001). 56. G. Minati, in Systemics of Emergence: Research and Development, Proceedings of
the Third Italian Systems Conference, Ed. G. Minati and E. Pessa, (Springer, New York, 2006), pp. 667-682.
THE GROWTH OF POPULATIONS OF PROTOCELLS ROBERTO SERRA (2,1), TIMOTEO CARLETTI (3,1), IRENE POLI (1), ALESSANDRO FILISETTI (2) (1) Dipartimento di Statistica, Università Ca’ Foscari, San Giobbe - Cannaregio 873, 30121 Venezia, Italy (2) Dipartimento di Scienze Sociali, Cognitive e Quantitative, Università di Modena e Reggio Emilia, Via Allegri 9, 42100 Reggio Emilia, Italy (3) Département de Mathématique, Université Notre Dame de la Paix Namur, Rempart de la Vierge 8, B 5000 Namur, Belgium The growth of protocells is discussed under different hypotheses (one or more replicators, linear and nonlinear kinetics) using a class of abstract models (Surface Reaction Models). A method to analyze the dynamics of successive protocell generations is presented, and it is applied to the problem of determining whether the duplication times of the protocell itself and of its genetic material eventually tend to a common value. The importance of the phenomenon of emergent synchronization for sustained protocell population growth and for evolvability is discussed. Keywords: protocells, Surface Reaction Models, emergent synchronization.
1. Introduction
Protocells are lipid vesicles or micelles which are endowed with some rudimentary metabolism and contain "genetic" material, and which should be able to grow, reproduce and evolve. While viable protocells do not yet exist, their study is important in order to understand possible scenarios for the origin of life, as well as for creating new "protolife" forms which are able to adapt and evolve [8]. This endeavour has an obvious theoretical interest, but it might also lead to an entirely new "living technology", definitely different from conventional biotechnology. Theoretical models can be extremely useful to devise possible protocells and to forecast their behavior. In this paper we address an important issue in protocell research. The protogenetic material in a protocell is composed of a set of molecules which, collectively, are able to replicate themselves. At the same time, the whole protocell undergoes a growth process (its metabolism) followed by a breakup into two daughter cells. This breakup is a physical phenomenon which is frequently observed in lipid vesicles, and it has nothing to do with life, although it superficially resembles the division of a cell. In order for evolution to
be possible, some genetic molecules should affect the rate of duplication of the whole container. Mechanisms have been proposed whereby this can be achieved (see below). But then a new problem arises: the genetic material duplicates at a certain rate, while the lipid container grows, in general, at another rate. When the container splits into two, it may be that the genetic material has not yet doubled: in this case its density would be lower in the daughter protocells. Through the generations, this density might eventually vanish. On the other hand, if the genetic material were faster than the container, it would accumulate in successive generations. So, in order for a viable population of evolving protocells to form, it is necessary that the rhythms of the two processes be synchronized. In some models (like the Chemoton [2]) this is imposed a priori in the kinetic equations, but it is unlikely that such a set of exactly coupled reactions springs up spontaneously. It is therefore interesting to consider the possibility that such synchronization be an emergent phenomenon, without imposing it a priori. In the following we will consider this possibility by analyzing an abstract version of the so-called "Los Alamos bug", a model of protocells where the genetic material is composed of strands of PNA [6,7]. These resemble the better-known nucleic acids DNA and RNA, but have a peptide backbone, and it is believed that they might be found in the lipid phase of the protocell. According to this hypothesis, different PNAs may influence the growth rate of their "container" by catalyzing the formation of amphiphiles (which form the protocell membrane) from precursors. The detailed mechanisms whereby this might happen can be found in [6,7]. Inspired by the Los Alamos bug, we developed a more abstract class of models (which can also describe different specific models), called Surface Reaction Models [9]. The simplest case (where the genetic material is composed of a single type of self-replicating molecule) will be described in section 2. This model couples the growth of the genetic material and that of the container, and a mathematical technique can be introduced to study how the quantity of the former varies in successive generations. This is described in section 3, where it is also shown that synchronization is indeed an emergent property, both in the case of linear and of nonlinear kinetics. Note that the term "linear" refers to the rate equation of the replicator only: the overall model, with its coupling to the container growth and breakup, is definitely nonlinear. Since there may be different kinds of replicators, with different rates, the case of two coexisting replicators (linear and nonlinear) is discussed in section 4. Section 5 is then devoted to the case where replicators directly interact: a
comprehensive analytical theory can be developed for the linear case, while nonlinear kinetics is approached through simulations. A major consequence of synchronization is that the competition among protocells is darwinian, even if that of the replicators is not [4]. This aspect is discussed in the final section. This paper aims at presenting a unified view of the major results concerning synchronization, while for detailed calculations and demonstrations the reader is referred to [9,10,11,1], where further references to the scientific literature can also be found.

2. Surface reaction models
Let us first consider the case where there is a single replicator in the protocell lipid phase, and let its quantity (mass) be denoted by X. Let also C be the total quantity of "container" (e.g. lipid membrane in vesicles or bulk of the micelle). We suppose that the lipid density is constant, so the volume V of the lipid phase is proportional to C. We assume, according to the Los Alamos bug hypothesis, that the replicator favours the formation of amphiphiles and that, since precursors are found outside the protocell, only the fraction of X which is near the external surface is effective. We assume that the replication of X also takes place near the external surface. Let us further assume that:
• spontaneous amphiphile formation is negligible
• the precursors (both of amphiphiles and of genetic molecules) are buffered
• the surface area S is proportional to V^β, and therefore also to C^β (β ranging between 2/3 for a micelle and 1 for a very thin vesicle)
• diffusion is very fast within the protocell, so concentrations are constant everywhere in the lipid phase
• the protocell breaks into two identical daughter units when it reaches a certain volume (C = θ)
• the rate-limiting step which may appear in the replicator kinetic equations does not play a significant role when the protocell is smaller than the division threshold
• the contribution of X to the growth of C is linear
• the rate of replication of X in the bulk (d[X]/dt) would be proportional to [X]^ν (square brackets indicate concentrations).
Under these hypotheses, as shown in [9], one obtains the following approximate equations, which describe the growth of a protocell between two successive divisions:
$$\frac{dC}{dt} = \alpha\, C^{\beta-1} X, \qquad \frac{dX}{dt} = \eta\, C^{\beta-\nu} X^{\nu} \qquad (1)$$
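The C-dependence in Eq. (1) follows directly from the assumptions just listed; the following is a brief sketch of the scaling argument (our reconstruction from those assumptions, not the derivation given in [9]):

```latex
% S \propto C^\beta and, by fast diffusion, [X] = X/V \propto X/C; hence
% the amount of X effective near the surface scales as S\,[X], and
\begin{align*}
  \frac{dC}{dt} &\propto S\,[X] \propto C^{\beta}\,\frac{X}{C} = C^{\beta-1}X,\\
  \frac{dX}{dt} &\propto S\,[X]^{\nu} \propto C^{\beta}\,\frac{X^{\nu}}{C^{\nu}}
                 = C^{\beta-\nu}X^{\nu},
\end{align*}
% which, with rate constants \alpha and \eta, gives Eq. (1);
% \nu = 1 recovers the linear case considered in Section 3.
```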
When C reaches a critical value θ, the cell breaks into two equal daughter protocells; then, until the next duplication, the system is again ruled by Eq. (1). At the beginning of a new generation, both the initial value of X and that of C equal one half of the values attained at the end of the previous generation, i.e. at the time of cell division. Note that, under the above assumptions, the doubling time at generation i is determined by the initial value of X. Synchronization implies constant division times, so it is achieved if one observes the same initial value of X in two successive generations. Synchronization can of course also be detected by the fact that doubling times become equal in successive generations.

3. One type of replicator per cell
Let us first consider the linear case, i.e. let ν = 1 in Eq. (1). It is then immediate to observe that the quantity

$$Q = \eta C - \alpha X \qquad (2)$$
is conserved during the continuous growth phase, so its value at the end of the growth phase is the same as at the beginning. But since the protocell splits into two equal daughter cells, the initial value of Q at the next generation will be exactly one half of the previous value. As generations pass (i.e. as t → ∞), Q → 0 and therefore the initial value of X approaches a constant value:
$$X_\infty = \frac{\eta\,\theta}{2\alpha} \qquad (3)$$
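Equivalently, the halving of Q yields a recursion map for the initial value of X at successive generations; a short sketch of the derivation (our reconstruction, for β = 1 and ν = 1):

```latex
% A daughter cell starts with C = \theta/2 and X = X_k, so by Eq. (2)
% Q_k = \eta\theta/2 - \alpha X_k. Q is conserved until division
% (C = \theta, X = X_{div}), so Q_k = \eta\theta - \alpha X_{div};
% halving X_{div} then gives the next initial value:
\[
  X_{k+1} = \frac{X_{div}}{2}
          = \frac{\eta\theta - Q_k}{2\alpha}
          = \frac{X_k}{2} + \frac{\eta\theta}{4\alpha},
\]
% a contraction map whose fixed point is exactly Eq. (3).
```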
It can also be proven that the doubling time asymptotically approaches the value ln 2/η. Therefore synchronization is achieved in the case where the replicator follows linear (i.e. first-order) kinetics. The mathematical technique quickly described above can be applied to more general cases [9,10]. The key ingredient is that of finding a first integral of the equations which describe the continuous growth phase, and of obtaining a recursion map for the initial values of X at successive generations, on the basis of the halving hypothesis. It can then be proven that the above result holds also for equations which are more general than the one considered above, and also for realistic protocell geometries. What is even more important, by renormalizing
time it can be proven that the asymptotic behavior is not affected by the value of β, so it suffices to consider the simpler β = 1 case. In particular, in the nonlinear case of Eq. (1) the conserved quantity is
$$Q = C(t)^{2-\nu} - \frac{\alpha}{\eta}\, X(t)^{2-\nu} \qquad (4)$$
and synchronization can be proven using the same methods as those described above.
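These results are easy to check numerically. The following sketch (an illustration under the assumptions of Section 2, with arbitrary parameter values alpha, eta, theta; it is not the authors' code) integrates Eq. (1) for β = 1, halves C and X whenever C reaches θ, and prints the doubling time and the new initial value of X at each generation:

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, eta, theta, nu = 1.0, 0.5, 2.0, 1.0   # illustrative values only

def rhs(t, y):
    # Eq. (1) with beta = 1: dC/dt = alpha*X, dX/dt = eta*X^nu*C^(1-nu)
    C, X = y
    return [alpha * X, eta * X**nu * C**(1.0 - nu)]

# Division event: the container reaches the threshold C = theta
division = lambda t, y: y[0] - theta
division.terminal, division.direction = True, 1

C, X = theta / 2, 0.1                        # state of a newborn protocell
for gen in range(25):
    sol = solve_ivp(rhs, (0.0, 1e3), [C, X], events=division, rtol=1e-9)
    C, X = sol.y[0, -1] / 2, sol.y[1, -1] / 2    # halve C and X at division
    print(f"gen {gen:2d}: doubling time {sol.t[-1]:.4f}, initial X {X:.4f}")

print("Eq. (3) predicts X_inf =", eta * theta / (2 * alpha))
print("predicted doubling time =", np.log(2) / eta)
```

For these values the iteration settles on X_∞ = 0.5 and a doubling time ln 2/η ≈ 1.386, in agreement with Eq. (3); setting ν < 1 in the same sketch shows that synchronization emerges with nonlinear kinetics as well.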
4. Coexisting replicators There may be different replicators in a protocell: this would certainly be the case if they were nucleic acids, which can undergo random mutations, but the remark may hold also for more general hypotheses concerning their chemical nature. Let us then suppose that in the same cell there are two self-replicators X and Y . The generalization of Eq. (1) is then
$$\frac{dC}{dt} = \alpha' X + \alpha'' Y, \qquad \frac{dX}{dt} = \eta'\, X^{\nu} C^{1-\nu}, \qquad \frac{dY}{dt} = \eta''\, Y^{\nu} C^{1-\nu} \qquad (5)$$
In this case one finds two first integrals of the continuous Eqs. (5), and one can then prove synchronization with the methods of section 3. It is interesting to consider what happens when the fastest replicator gives a smaller contribution to the growth of the whole container than the other one, e.g. the case α′ < α″ and η′ > η″ [9]. In the linear case (ν = 1) one finds that the fastest replicator displaces the other one, whose quantity per protocell eventually vanishes. The "altruist" gets extinct in the long run. On the other hand, if ν < 1 the two can co-exist, and tend asymptotically to a situation where their relative ratio is proportional to that of their kinetic coefficients η′ and η″.
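The displacement/coexistence dichotomy is equally easy to see by simulating Eq. (5) with the same halving rule (again an illustrative sketch with arbitrary rate constants, not taken from [9]; here X replicates faster but contributes less to container growth):

```python
from scipy.integrate import solve_ivp

theta = 2.0                                   # division threshold (arbitrary)
division = lambda t, y: y[0] - theta
division.terminal, division.direction = True, 1

def make_rhs(nu, a1=0.5, a2=1.0, e1=1.0, e2=0.7):
    # Eq. (5): X is the faster replicator (e1 > e2) which contributes
    # less to container growth (a1 < a2); all rate constants arbitrary.
    def rhs(t, y):
        C, X, Y = y
        return [a1 * X + a2 * Y,
                e1 * X**nu * C**(1.0 - nu),
                e2 * Y**nu * C**(1.0 - nu)]
    return rhs

for nu in (1.0, 0.5):
    C, X, Y = theta / 2, 0.1, 0.1
    for _ in range(60):
        sol = solve_ivp(make_rhs(nu), (0.0, 1e3), [C, X, Y],
                        events=division, rtol=1e-9)
        C, X, Y = sol.y[:, -1] / 2            # halving at each division
    print(f"nu = {nu}: X/Y after 60 generations = {X / Y:.3g}")
# nu = 1.0: X/Y keeps growing, so the "altruist" Y is displaced;
# nu = 0.5: X/Y settles to a finite value, i.e. the two coexist.
```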
5. Interacting replicators
In the case considered in section 4 there were different replicators in the same container, but they did not directly affect each other's synthesis. Let us now consider the case where replicators interact in a linear way. The model equations for the continuous growth between two successive divisions are then
$$\frac{dX}{dt} = C^{\beta-1} M X, \qquad \frac{dC}{dt} = C^{\beta-1}\, \alpha \cdot X \qquad (6)$$
where X is now a vector of replicator quantities, α a vector of growth coefficients, and the matrix element M_ij describes the effect of a molecule of type j on the growth rate of a molecule of type i. By considering the case β = 1 and using the techniques of section 3, one finds [11] the following conditions for X_∞, the asymptotic value of the quantity of X at the beginning of each replication cycle:
$$M X_\infty = \lambda X_\infty, \qquad \lambda = \frac{\ln 2}{\Delta T_\infty} \qquad (7)$$
therefore X_∞ must be an eigenvector of the matrix M belonging to the eigenvalue λ. It can also be proven that
$$X(T_k) = e^{M (T_k - T_0)}\, \frac{X_0}{2^{k-1}} \qquad (8)$$
From Eq. (8), by considering the limit of very large times, one finds that the eigenvalue which must be considered in Eq. (7) is the one with the largest real part, let us call it λ₁. Physical interpretation of these results requires that λ₁ (which by Eq. (7) is inversely proportional to the duplication time) be real and positive, and that X_∞ be real and nonnegative (some components may vanish in the long time limit). If the matrix M is nonnegative and non-null (i.e. if every M_ij ≥ 0 and there is at least one M_ij ≠ 0), both conditions are guaranteed by the Perron-Frobenius theorem, and in this case it can be proven that synchronization is always achieved. Numerical simulations [1] show that this is also the case whenever λ₁ is real and admits a nonnegative eigenvector. When these conditions are not fulfilled, one often finds that some species get extinct, and that the above conditions apply to the remaining reduced system of equations. However, when the eigenvalue with the largest real part does not admit a nonnegative eigenvector, cases where synchronization does not take place may be observed. So the analytical theory is able to describe all those cases where the eigenvector corresponding to λ₁ is nonnegative^a, while simulations are required when this condition is not satisfied.
^a And also the trivial cases where there is no eigenvalue with a nonnegative real part.
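In the tractable linear case, then, the asymptotic division time and replicator composition can be read directly from the spectrum of M. A minimal numerical sketch (the interaction matrix below is a hypothetical example, not one of the cases studied in [1]):

```python
import numpy as np

# Hypothetical interaction matrix: M[i, j] = effect of replicator j on
# the growth rate of replicator i. Entries are nonnegative and not all
# zero, so the Perron-Frobenius conditions discussed above hold.
M = np.array([[0.5, 0.2],
              [0.3, 0.4]])

eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)             # lambda_1: largest real part
lam1 = eigvals[k].real
v = np.abs(eigvecs[:, k].real)
v /= v.sum()                            # normalized Perron eigenvector

print("lambda_1 =", lam1)                           # cf. Eq. (7)
print("asymptotic division time =", np.log(2) / lam1)
print("asymptotic replicator fractions =", v)       # direction of X_inf
```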
When the replicators interact in a nonlinear way, although analytical theory may provide useful results, simulation is the main tool for exploring the system behavior. Preliminary experiments with some models of this kind show that, while synchronization is the most frequent outcome, interesting dynamical phenomena can also be observed: the system approaches synchronization and looks almost synchronized for fairly long times, but this seemingly stable situation may abruptly change; it may then be recovered after a turbulent transient. Nonlinear replication kinetics needs further exploration.
6. Conclusions We have seen in section 4 that two replicators which are found in the same protocell, and which grow in a parabolic way (i.e. with sublinear kinetics, ν < 1 ) can coexist. This phenomenon is typically observed also in population dynamics (i.e. without containers): sublinear kinetics leads to asymptotic coexistence of several species [3], a phenomenon which has been called “the survival of anybody”. On the other hand, selection pressure is much more effective in the darwinian case: the survival of only the fittest is guaranteed in population dynamics if the leading term in the kinetic equations is linear. Note that this corresponds to exponential growth, i.e. constant doubling time. But synchronization guarantees that this is exactly the case for protocells: even if the replicators interact in a parabolic way, the containers undergo exponential growth. Therefore, if different types of protocells exist, we can expect darwinian dynamics among them. While we have proven this for the case of surface reaction model, the same phenomenon has also been observed in different models [4]. Therefore selection pressure might be much more effective at the protocell level than at the molecular level. While synchronization is an interesting phenomenon per se, this remark shows that it may have profound effects on the evolvability of protocells populations.
Acknowledgments Support from the EU FET- PACE project within the 6th Framework Program under contract FP6-002035 (Programmable Artificial Cell Evolution) is gratefully acknowledged. We had stimulating and useful discussions during the warm hospitality at the European Center for Living Technology in two workshops which were held on march 16-18, 2006 and on march 18-19, 2007.
648
R. Serra et al.
References 1. A. Filisetti, M.Sc thesis (Dept. of Social, Cognitive and Quantitative Sciences, Modena and Reggio-Emilia University, 2007).
2. T. Ganti, Chemoton Theory, (Vol. I: Theory of Fluid Machineries; Vol. II: Theory of Living Systems), (KluwerAcademic/Plenum Publishers, New York, 2003).
3. J. Maynard-Smith and E. Szathmary, Major transitions in evolution (Oxford University Press, New York, 1997).
4. A. Munteanu, C.S. Attolini, S. Rasmussen, H. Ziock and R.V. Solé, Generic 5. 6. 7. 8. 9. 10. 11.
Darwinian selection in protocell assemblies, DOI: SFI-WP 06-09-032 (Santa Fe Institute, Santa Fe, 2006). T. Oberholzer, R. Wick, P.L. Luisi and C.K. Biebricher, Biochemical and Biophysical Research Communications 207, 250-257 (1995). S. Rasmussen, L. Chen, M. Nilsson, and S. Abe, Artificial Life 9, 269-316 (2003). S. Rasmussen, L. Chen, B. Stadler and P.F. Stadler, Origins Life and Evolution of the Biosphere 34, 171-180 (2004). S. Rasmussen, L. Chen, D. Deamer, D.C. Krakauer, N.H. Packard, P.F. Stadler and M.A. Bedeau, Science 303, 963-965 (2004). R. Serra, T. Carletti, and I. Poli, Artificial Life 13, 1-16 (2007). R. Serra, T. Carletti and I. Poli, in BIOMAT 2006, Ed. R.P Mondaini and R. Dilão (World Scientific, Singapore, 2007). R. Serra, T. Carletti, I. Poli, M. Villani and A. Filisetti, Submitted to ECCS-07: European Conference on Complex Systems (2007).
INVESTIGATING CELL CRITICALITY R. SERRA (1), M. VILLANI (1), C. DAMIANI (1), A. GRAUDENZI (1), P. INGRAMI (1), A. COLACCI (2) (1) Dipartimento di Scienze Sociali, Cognitive e Quantitative Università di Modena e Reggio Emilia, Via Allegri 9, 42100 Reggio Emilia, Italia (2) Excellence Environmental Carcinogenesis, Environmental Protection and Health Prevention Agency Emilia-Romagna, Viale Filopanti 22, Bologna, Italia Random Boolean networks provide a way to give a precise meaning to the notion that living beings are in a critical state. Some phenomena which are observed in real biological systems (distribution of "avalanches" in gene knock-out experiments) can be modeled using random Boolean networks, and the results can be analytically proven to depend upon the Derrida parameter, which also determines whether the network is critical. By comparing observed and simulated data one can then draw inferences about the criticality of biological cells, although with some care because of the limited number of experimental observations. The relationship between the criticality of a single network and that of a set of interacting networks, which simulate a tissue or a bacterial colony, is also analyzed by computer simulations. Keywords: Random Boolean networks, cell criticality, interacting networks.
1. Introduction The idea that complex adaptive systems are driven to a “critical” state has been proposed by different authors [11,3,2,10], although with somewhat different meanings, as a powerful general principle, which could be useful to understand biological as well as social systems. In order to make precise statements about this hypothesis, and to test it against available experimental data, it is convenient to provide a precise, albeit not all-encompassing, definition of criticality. Random Boolean networks (shortly, RBNs) [8,9] are particularly interesting in this regard as they allow such a precise statement to be made. They represent a well-known model of genetic networks which has proven fruitful, as it has allowed to uncover some features of the relationship between genome size and number of cell types in multicellular organisms, as well as of the relationship between genome size and typical length of the cell cycle. Two kinds of dynamical regimes are usually observed in RBNs, an “ordered” and a “disordered” one (the name “chaotic” is also sometimes used in 649
650
R. Serra et al.
the latter case, although it should be kept in mind that the system attractors, in the case of finite size networks, are always cycles). Networks with different parameters tend to be either in the ordered or in the disordered regime [9,4,17]. RBNs will be briefly described in section 2. Recently, it has been shown that random Boolean networks can also accurately describe the statistical properties of perturbations in gene expression (“avalanches”, defined in section 3) induced by silencing single genes, one at a time, in the yeast S. cerevisiae. It is also possible to relate the distribution of avalanches to a parameter (the Derrida parameter) which determines whether a cell is in the critical state and, by comparing the results of theoretical analyses and computer simulations with those of the actual experiments, it is possible to draw inferences about the value of this parameter in S. cerevisiae cells. Section 3 is dedicated to a discussion of this approach. There is suggestive evidence that the cells which have been examined are in an ordered state, but the value of the Derrida parameter is close to the one which corresponds to criticality. Note that the arguments in favor of the fact that life tends to be found “at the edge of chaos” apply to organisms as a whole, not to isolated cells. Many organisms tend to form colonies, where cells grow close to each other and communicate by transferring molecules to each other. Intercellular communication is even more intense in tissues of multicellular organisms. It is then important to understand the relationship between the dynamics of isolated cells and that of a collection of interacting cells. When a critical cell interacts with others, what is the overall dynamics? Is it more or less ordered? In order to investigate this issue it is possible to use a cellular automaton model, where each cell site is occupied by a RBN, which simulates a single cell. The interaction is modeled by letting the expression of some genes be influenced not only by the genes which are in the same cell, but also by the neighboring ones, in a way which mimics the transmembrane transfer of proteins or other molecules. Section 4 is dedicated to a description of the results of these studies. The final section is dedicated to critical comments and indications for further research. Both the results concerning the distribution of avalanches in gene expression, and those concerning the dynamical properties of interacting RBNs have been to a large extent published in, or submitted to, technical journals, which are quoted in the appropriate Sections and where also reference to other relevant works can be found. The original aspect of the present paper is that of focusing the discussion on the issue of cell criticality.
Investigating Cell Criticality
651
2. Random Boolean networks There are some excellent reviews and books on RBNs [9,10,1] so we will briefly summarize here only their main features. Let us consider a network composed of N genes, or nodes, which can take either the value 0 (inactive) or 1 (active). In a classical RBN each node has the same number of incoming connections kin , and its kin input nodes are chosen at random with uniform probability among the remaining N − 1 nodes (multiple connections from the same node being prohibited). It then turns out that the distribution of outgoing connections per node follows a Poisson distribution. The output (i.e. the new value of a node) corresponding to each set of values of the input nodes is determined by a Boolean function, which is associated to that node, and which is also chosen at random, according to some probability distribution. The simplest choice is that of a uniform distribution among all the possible Boolean functions of kin arguments. However, a careful analysis of some biological control circuits has shown that there is a strong bias in favour of the so-called “canalyzing” functions [6], where at least one value of one of the input nodes uniquely determines the output, independently of the values of the other input nodes. Both the topology and the Boolean function associated to each gene do not change in time. The network dynamics is discrete and synchronous. In order to analyze the properties of an ensemble of random Boolean networks, different networks are synthesized and their dynamical properties are examined. While individual realizations may differ markedly from the average properties of a given class of networks [4] one of the major results is the discovery of the existence of two different dynamical regimes, an ordered and a disordered one, divided by a “critical zone” in parameter space. Attractors are always cycles in finite RBNs: in the ordered regime their length scales as a power of N , moreover in this regime the system is stable with respect to small perturbations of the initial conditions. In the disordered regime the length of the cycles grows exponentially with N , and small changes in initial conditions often lead to different attractors. For fixed N , the most relevant parameter which determines the kind of regime is the connectivity per node, k : one typically observes ordered behavior for small k , and a disordered one for larger k . The parameter which determines whether a network is in the ordered or in the disordered regime is the so-called Derrida parameter, which measures the rate at which nearby initial states diverge. For a more detailed discussion, the reader is referred to [9,1,4,17].
652
R. Serra et al.
3. Avalanches in gene expression data The experiments discussed below are described in [7], while the theoretical analyses are discussed in depth in [13,14,15]: the reader interested in a deeper understanding of these topics is referred to these works. Hughes and co-workers have performed several experiments where a single gene of S. Cerevisiae has been knocked-out and, using DNA-microarrays, have compared the expression levels of all the genes of such perturbed cells with those of normal, wild type cells. The knock-out experiment can be simulated in silicon by comparing the evolution of two RBNs which start from identical initial conditions, except for the fact that one gene (the “knocked-out” one) is clamped permanently to the value 0 in the network which simulates the perturbed cell. The results of both experiments and simulations can be described by the distribution of "avalanches": an avalanche is the number of genes which are modified in a given experiment. In order to compare continuous experimental data with the results of Boolean models it is necessary to define a threshold for the former, so that two expression levels are considered "different" if their ratio exceeds the threshold. The initial simulations were performed using a classical RBN with 2 input connections per node, restricting the set of boolean functions to the so-called canalyzing ones, for the reasons given in Section 2. The comparison of the simulation results with the experimental distribution of avalanches is really good. This was fairly surprising, since the simplest model with practically no parameters, where all nodes have an equal number of inputs (a condition which is certainly not satisfied in real cells) was used. It was then possible to analytically determine that the distribution of avalanches in RBN, as long as they involve a number of genes which is much smaller than the total number of genes in the network, depends only upon one relevant parameter. Let q ≤ 1 be defined as follows. For a node chosen at random, say node A , suppose that one (and only one) of its inputs, also chosen at random, is changed: then q is the probability that node A does not change its value. Let pn be the probability that an avalanche involves n nodes, and let pout (k ) be the probability that a node has k outgoing connections. It can be proven that the distribution of avalanches depends only upon the distribution of outgoing connections: that’s why a simple model with an equal number of input links per node may work well. All the pn can be found from the knowledge of the “outgoing” moment generating function F :
Investigating Cell Criticality
F=
N −1 m=0
q m pout (m)
653
(1)
In classical RBN pout (k ) is Poissonian, and in this case it can be proven that
F = e −λ
λ ≡ (1 − q) A
(2)
Here λ is indeed the Derrida exponent, which also determines the network dynamical regime (cfr. section 2). Therefore the distribution of avalanches depends only upon a single parameter, namely the Derrida exponent. The simple model which we had used in our simulations had a value of this parameter slightly smaller than the critical one, and this turned out to be a fortunate choice. As suggested in [12] the dependency of the distribution of avalanches on λ can then be used to try to infer the value of the parameter which corresponds to it in real cells, and which should discriminate between ordered and disordered states. Among the different cases which have been considered, the best agreement (according the well-known χ 2 measure) with experimental data is provided by the case where λ = 6 7 , slightly smaller than the critical value 1. This supports the suggestion that life forms tend to exist at the critical state or in the ordered region, close to criticality [18]. Note however that, since only a single data set is available, it would be inappropriate to draw definite conclusions concerning this point.
4. Interactions among random Boolean networks The simulations of the interaction among Boolean networks is described in detail in [19,16, 5]. Our interest here concerns the behavior of interacting critical networks, and the main question concerns the effects of interaction on the dynamical regime. In order to model the interaction let us consider a 2D square lattice cellular automaton with M2 cells, each of them being occupied by a complete RBN. The neighborhood is of the von Neumann type (composed by the cell itself and its N, E, S, W neighbors). We assume wrap around so the overall topology is toroidal. Every RBN of the automaton is structurally identical, while the initial activation states of the various genes may differ . In particular, each of the RBNs has the following common features: 1. same number ( N ) of Boolean nodes;
654
2. 3.
R. Serra et al.
same topology, i.e. same ingoing and outgoing connections for each node of the network; same Boolean function associated to each node.
A key aspect of the model is the representation of interactions: the fact that some proteins can pass from one cell to another is modeled by assuming that a cell can be affected by the activation of some genes of a neighboring cell. Nodes able to interact with other cells are defined as shared nodes and they are a subset of the total number of nodes of the RBN. Let f be the fraction of interacting nodes. We define as elementary value of a certain node the value computed according its Boolean function and to the value of its input nodes, belonging to the same RBN. The shared value of a shared node, instead, is calculated taking into account also the activation value of the nodes of its neighboring cells, depending on a precise interaction rule. In our initial study we concentrated on the rule “AT LEAST ONE ACTIVE” node (ALOA), where the shared value of a node x in the cell A is 1 if its value or at least one of those of the nodes x in the four neighboring cells is 1 (and it is 0 otherwise). Let a G-automaton (or, equivalently, a G-colony) be a set of interacting cells, defined by: • the topology of interaction T (in our case this is fixed) • the interaction rule R • the interaction strength, measured by the fraction f of shared nodes • the genome G of the RBNs which are placed in each cell of the automaton We have considered a number of different indicators of the degree of order of the tissue, and we have observed that there is no common tendency in all the networks towards either a more ordered or a more disordered behavior, as the interaction strength grows. So it seems that one cannot simply claim that interaction favors order or disorder. In order to measure the influence of interaction on the degree of order, a useful variable is
Ω = DA + CWA
(3)
where DA is the number of different attractors of a definite G-automaton, while CWA is the number of cells whose RBN reached no attractors. The number of different attractors can be considered as an indicator of the homogeneity of the cells in the G-automaton. Yet, in many cases some cells could reach no attractors and their number would not be computed into this variable. Adding the number of cells with no attractor to the number of different
Investigating Cell Criticality
655
attractors is a way to compensate this effect. Thus, Ω is a variable which measures a kind of order; it is indeed a decreasing function, which attains its minimum value (1) when the order is maximum, and its maximum when the order is minimum (all the cells do not reach any attractor or reach different attractors). The analysis on several G-automata demonstrates the presence of three recognizable kinds of behavior, concerning the dependency of Ω upon f : • Ω constant and equal to 1: all the G-automata reach the same attractor, independently of the value of f and also in absence of interaction. The attractors of this class of G-automata are fixed points • increasing Ω : it reaches a maximum when f = 1 • bell-shaped Ω : we define as bell-shaped a curve with a single maximum for f ∉ 0,1 . It has proven convenient to introduce a further sub-distinction among the Gautomata characterized by a bell-shaped curve of Ω : • left-oriented bell shaped: the maximum of the curve is for f ≤ 0.5 . • right-oriented bell shaped: the maximum is for f > 0.5 . The above criterion divides the G-automata into classes according to a measure of the way in which their behavior changes as a function of the strength of interactions among neighboring cells. The different classes tend to have a similar behavior also with respect to other order indicators which are described in [16,5]. The most interesting observation, however, is that the average period of the attractors observed at f = 0 (i.e. non interacting cells) provides useful information to predict the class to which a particular G-automaton belongs and therefore provides useful information to forecast whether increasing interaction leads, for a given genome, to a more ordered or to a more disordered behavior. In particular, RBNs which, in isolation, have long periods tend to become more disordered as the interaction strength increases, while on the other hand cells with short attractors tend to become more ordered.
5. Conclusions Concerning avalanches, although our best estimate is slightly smaller than the critical value, it must be observed that the available data are not yet conclusive: the best estimate of Ramo et al [12] for the Derrida parameter coincides exactly with the critical value. Shmulevich, Kauffman and Aldana [18] have studied a
656
R. Serra et al.
different system (time courses in HeLa cells) and found estimates for l in the critical or ordered region. Interestingly enough, when one tries to simulate the distribution of avalanches using random boolean networks with a scale-free distribution of outgoing connections (shortly, SFRBNs) one obtains a reasonably good agreement with the experimental data, except for the largest avalanches. When the parameters are chosen in such a way that there is a good agreement with the distribution of the (by far most frequent) small avalanches, one finds that in scale-free RBNs the number of large avalanches is greater than that observed in their classical counterparts, and greater than that observed in real data [15]. However this last phenomenon may be due to the limited number of gene knockouts which have been performed (227 in a network with 6325 genes). Since large avalanches are related to the silencing of a hub node, the possibility that such a hub has never been hit, although unlikely, cannot be ruled out. Nonetheless, the comparison of observed avalanche distribution with theoretical behavior provides the most direct way to investigate the issue of cell criticality so far devised. Concerning the interactions of Boolean networks, the most interesting aspect seems to be that interaction tends to amplify some peculiar features of a given network. Briefly, ordered networks become more ordered when they interact with others networks of the same kind, while disordered networks become more disordered. One could develop interesting speculations about the interplay of evolution with such dynamical properties.
Acknowledgments This work has been partially supported by the Italian MIUR-FISR project nr. 2982/Ric (Mitica).
References 1. M. Aldana, S. Coppersmith, L.P. Kadanoff, in Perspectives and Problems in 2. 3. 4. 5. 6.
Nonlinear Science, Ed. E. Kaplan, J.E. Marsden and K.R. Sreenivasan (Springer, New York, 2003), also available at http://www.arXiv:cond-mat/0209571 . P. Bak, How Nature works (Springer, New York, 1996). P. Bak, C. Tang and K. Wiesenfeld, Phys. Rev. A 38, 364 (1988). U. Bastolla and G. Parisi, Physica D 115, 219-233 (1998). C. Damiani, M.Sc thesis (Dept. of Social, Cognitive and Quantitative Sciences, Modena and Reggio-Emilia University, 2007). S.E. Harris, B.K. Sawhill, A. Wuensche and S.A. Kauffman, Complexity 7, 23-40 (2002).
Investigating Cell Criticality 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.
657
T.R. Hughes, et al., Cell 102, 109-126 (2000). S.A. Kauffman, Curr. Top. Dev. Biol 6, 145-182 (1971). S.A. Kauffman, The origins of order (Oxford University Press, NewYork, 1993). S.A. Kauffman, At home in the universe (Oxford University Press, New York, 1995). C.G. Langton, in Emergent computation, Ed. S. Forrest (MIT Press, Cambridge, MA, 1991). P. Ramo, J. Kesseli and O. Yli-Harja, J. Theor. Biol. 242, 164 (2006). R. Serra, M. Villani and A. Semeria, J. Theor. Biol. 227, 149-157 (2004). R. Serra, M. Villani, A. Graudenzi and S. A. Kauffman, J. Theor. Biol. 246(3,7), 449-460 (2007). R. Serra, M. Villani, A. Graudenzi, A. Colacci and S.A. Kauffman, Submitted to ECCS-07: European Conference on Complex Systems (2007). R. Serra, M. Villani, C. Damiani, A. Graudenzi, A. Colacci, S.A., Kauffman, Submitted to ECCS-07: European Conference on Complex Systems (2007). J.E.S. Socolar and S.A. Kauffman, Phys. Rev. Let. 90 (2003). I. Shmulevich, S.A. Kauffman and M. Aldana, PNAS 102, 13439-13444 (2005). M. Villani, R. Serra, P. Ingrami and S.A. Kauffman, in Cellular Automata, LNCS 4173 (Springer, Berlin/Heidelberg, 2006), pp. 548-556.
This page intentionally left blank
RELATIVISTIC STABILITY. PART 1 - RELATION BETWEEN SPECIAL RELATIVITY AND STABILITY THEORY IN THE TWO-BODY PROBLEM
UMBERTO DI CAPRIO Stability Analysis s.r.l., Via Andrea Doria 48/A - 20124 Milano, Italy E-mail: [email protected] With reference to the restricted two-body problem we show that Stability Theory (ST) and Special Relativity (SR) can be joined together in a new theory that explains a large class of physical phenomena (e.g. black-holes, cosmological dynamics) and overcomes the dualism between SR and General relativity (GR). After recalling the main features of ST (from the Method of Lyapunov to more recent developments up to analysis of fractals) we determine the canonic relativistic equations of the restricted two-body problem. A substantial novelty with respect to noted formulations is pointed out: three state variables (and not two only) are needed for “defining” said equations. They include variable v (magnitude of the rotation speed) in addition to radius and to radial speed. By means of eigenvalue analysis and by application of the Lyapunov theorem on stability in the first approximation we show that linearized system analysis gives a necessary condition only for stability: the radius must be greater than half the Schwarzschild radius. The derivation of a sufficient condition passes through the definition of a convenient Lyapunov function that represents the “local energy” around a given Equilibrium Point. Such derivation is deferred to Part II and results in the proof that the Schwarzschild radius actually represents the reference stable radius of the two-body problem. Keywords: stability theory, Lyapunov function, special relativity theory.
1. Introduction Actual theories of emergence are shaped on two main frameworks introduced long time ago: the theory of stability and relativity theory. The importance of the former doesn’t need to be emphasized, as it constitutes, already from the times of Von Bertalanffy, one of the pillars of systemics. On the other hand, relativity theory, already in the special case and more strongly in the general one, was the first framework in which it was possible to overcome the traditional cause-effect relationship, owing to the presence of nonlinear terms which, while needing a more complex mathematical apparatus, open the way to theories of pattern formation. It is therefore of outmost importance for systemics to investigate what occurs when stability theory and relativity theory meet in the study of particular large systems, such as stars, galaxies, or the whole universe.
659
660
U. Di Caprio
No general study is so far available about the relationships between Stability Theory and Special Relativity. Here we present results referring to the “restricted” two-body problem. We start with the following question: in which way the fact that the rotation speed cannot exceed the speed of light does influence the problem? Also, what is the form of the “canonic” state equations when relativistic effects are taken into account? What constraints are imposed by stability? To introduce the discussion we start with an intuitive reasoning: consider a circular orbit, with constant values of the radius R and the magnitude of the rotation speed v. Such orbit identifies a dynamical Equilibrium Point in the sense of Stability Theory, namely a point whose coordinates are R = const , R = 0 , v = const . In addition R = 0 and then the acting forces balance each other, i.e. F = Fc with F = GMm0 / R 2 , gravitational force and Fc = m0 v 2 / R centrifugal force. It follows from above eq. that
GMm0 R
2
=
m0 v 2 R
→
GM = v2 R
(1)
and so, the relativistic condition v < c results in
GM ≤ R → R ≥ RMIN c2
with
RMIN =
GM c2
(2)
This means that no equilibrium can exist if the radius is smaller than RMIN , Since the existence of an Equilibrium is a preliminary condition for stability we can see that SR by itself puts a stability problem into evidence. Also, the aforesaid minimum radius is half the Schwarzschild radius. This lets us foresee a possible connection between the purely mathematical Schwarzschild singularity and a physical condition derivable from stability. Consequently the known theory of black-holes could be reset on more solid grounds and the connections between SR and GR can be explored in a novel optic, along lines that represent a significant evolution of studies by various authors in the sixties (e.g. P. Caldirola of the University of Milan, H. Bondi of Cambridge University) and in the eihties (G. Jumarie, University of Quebec); the author of the present work belongs to the school of Milan. Another source of inspiration is the “post-newtonian approximation” by S. Weinberg (1972) [39]. That being said we think it important to recall something general about Stability theory and its practical applications. A complex physical system formed by interacting subsystems possesses defined functional characteristics or satisfies assigned requirements in a steady-state mode (Equilibrium), which must be kept at all times for sudden
Relativistic Stability. Part 1 - Relation between SR and ST …
661
“disturbances”, with an adequate margin of “safety”. Stability either represents the fruit of a deliberate design or, in the minimal case, it should be considered a cogent hidden property whose roots need anyhow be explored (in view, e.g., of detecting the number and kind of forces that in reality operate on the system, or in view of deriving physical constraints on the system parameters). Elementary examples of complex systems are the Atom, the Proton, an Electric Power System. Properties to be maintained are, respectively, the emission spectrum of the Atom, the mass of the Proton, the capability of a multimachine electric power system of meeting the “demand” without loosing synchronism (50 or 60 Hz). In our schematization the “disturbances” are either initial conditions in the classical sense or step disturbances that primarily determine a change of the reference Equilibrium Point. Stability analysis has a very solid theoretical foundation and a fairly wide coverage in the literature (please mainly refer to the book by W. Hahn “Theory and application of the Lyapunov' s Direct Method”). It dates back to 1892, before the formulation of Special Relativity and well beforehand the formulation of Quantum theory. We recall, among others the developments and extensions due to M.A. Aizerman, H. A. Antosiewicz. N.G. Chetaev, G.N. Duboshin, N. P. Erugin, E. Gibson; D.R. Inwerson, R.E. Kalman, S. Lefschetz, S. Letov, A.I. Lure, I.G. Malkin, S.K. Persidskii, K.P. Persidskii, V.M. Popov, G.K. Pozarickii, B.S. Razumchin, J.N. Roiteberg, E.N. Rozenwasser, D.G. Schultz, V.M. Stazinskii; G.P. Szegö, O. Taussky. More recently: E. Barbashin and N.V. Krasovskii [4], J.P. Lasalle [28], V.V. Nemyskii and V.V. Stepanov [32], which includes “dynamical systems defined in metric spaces”, N.N. Krasovskii [25], J.K. Hale [24], R.D. Driver [21], who extended the Lyapunov Method to functional differential equations in infinite dimensional space, V.I. Zubov [43], who extended the stability theory to partial differential equations, also giving a method for determining the Region of Asymptotic Stability. More recent extensions regard: strange attractors, fractals, the theory of catastrophes, the Whitney theory of singularities. For these subjects it is suitable to quote the following works: H. Withney “Singularities” [41], B.B. Mondelbroot [31] “Fractals”, R. Thom [38] “Theory of catastrophes”, V.I. Arnol' d [3], for a more rigorous treatment of all the above topics. A variety of applicative works have been published, among which those ones dealing with electric power systems EPS [15], and those with the cosmological problem [16]. The first show a definite way for the study of dissipative systems with n degrees of freedom. The energy function no more is an “integral” of the motion but represents a Lyapunov function. If in the neighborhood of an Equilibrium Point x 0 the rate of change dE / dt is negative, the energy will continually decrease
662
U. Di Caprio
until it finally assumes its minimum value E ( x 0 ) . A Lyapunov function in general possesses local properties: it is positive-definite at x 0 (namely it has a local minimum there) and at one time its “time-derivative along the system trajectories” V ( x) is negative-definite at x 0 (namely it has a local maximum). The study of the stability of the Universe leads to more than interesting results about present state (age, mass, radius, density etc) and future evolution, as well as past evolution. We make reference to an elementary and intuitive notion of stability: if, after the occurrence of a disturbance, the system returns to the stable state we say that the system is stable (Asymptotic stability). The system is also termed stable if it converges to another equilibrium in proximity of the initial equilibrium point (Weak stability). If the system “runs away” so that certain physical variables go on increasing as t → ∞ or leave a convenient bonded region (i.e. the so called Stability Region) then we say that the system is unstable. Physical systems can be described by their mathematical models. The following are some typical descriptions: 1. x = f ( x, u , t ) Nonlinear, time-varying and forced 2.
x = f ( x, t ) Nonlinear, time varying and force free
3.
x = f ( x, u ) Nonlinear, time invariant and forced
4.
x = f ( x) Nonlinear time invariant and force free (Autonomous).
where x represents the n-dimensional state vector, u represents the rdimensional input vector and t represents the independent time variable. Here we are interested to systems of type 4). Then it is understood that the internal forces are suitably “embodied” in the model, while no external force (vector u ) is present. With regard to linear autonomous systems the theory of stability is well known through the methods of Nyquist, Routh-Hurwitz, etc. On the contrary, in the case of nonlinear systems no such systematic procedures exist: closed form solutions of nonlinear differential equations are exceptions rather than the rule. A.M. Lyapunov in 1892 already (“Problem general de la stabilitè du movement”, Russian Ed.1892, reprinted in Annals of Mathematical Studies N.17, Princeton University Press, Princeton, N.J., 1949) set forth the general framework for the solution of such a problem. He outlined two approaches, known popularly as Lyapunov' s “first method” and the other the Second method of Lyapunov or the Direct Method. The distinction is based on the fact that the “first method” depends on finding approximate solutions to the differential equations. In the Second method no such a knowledge is necessary. We make reference to the second method. The existing connection between the Lyapunov method and the classic techniques of analysis of linear systems is illustrated by
Relativistic Stability. Part 1 - Relation between SR and ST …
663
the following Theorem on stability “in the first approximation”, which will be utilized in this Part 1. Consider the system represented by the non-linear constant equation
x = f ( x ) ; 0 = f (0 )
(3)
and the corresponding linear differential equation
x = A x ; A (n × n) matrix ; aij =
∂f i ∂x j
(4) x0
Then 1. The Equilibrium Point x = 0 is asymptotically stable if all the eigenvalues of matrix A have a negative real part. 2. The Equilibrium Point x = 0 is unstable if at least one eigenvalue of A has a real positive part. Consequently, the condition that none of the eigenvalues of A had a real positive part represents a necessary condition for stability of x = 0 . 3. If none of the eigenvalues in question has a real positive part but, however, some eigenvalues have real part equal to zero, the non-linear differential eq. (3) has a “critical behavior” with regard to the Equilibrium. Namely, the eventual stability or instability of x = 0 , cannot be derived from the analysis of the stability of the linear system (1.6), that represents the first order approximation of x = f (x) at x = 0 . The direct analysis of “nonlinear stability” will be afforded in Part II and will lead us to an absolutely original interpretation of the famous Schwarzschild radius of GR. Turning back to present Part 1, we proceed as follows. In Sec 2 we derive the canonic equations of the restricted two-body problem under the assumption that the mass of the rotating body varies with the velocity according to the classic Einstein relation. The canonic equations serve us in view of subsequent stability analysis via the Lyapunov Method Such equations are partially equivalent to those proposed by Caldirola [6], but differ in this fundamental respect: the state variables are three and not two since, in addition to radius R, and to radial speed R they include the magnitude of the rotation velocity v. Such result is not surprising and deploys its full potentiality in the study of black-holes (Part 2). In Sec. 3 we illustrate eigenvalues analysis and apply the aforementioned Lyapunov theorem on “stability in the first approximation”. We conclude that such analysis is absolutely inadequate.
664
U. Di Caprio
In Sec 4 we afford a crucial issue which looks worthy of greater attention than usual: in the special case in which the two-body problem is formulated in a classical (i.e. non relativistic) how can we derive from the Lyapunov method sufficient conditions for stability? Moreover we exploit the principle of equivalence Potential energy / mass for deducting from our dynamic model a necessary condition for relativistic stability. The determination of sufficient conditions is deferred to Part 2, altogether with the study of the Schwarzschild radius. 2.
Derivation of the canonic system equations of the two-body relativistic problem
Consider the plane motion of a m0 by the action of a central attractive force of newtonian type. The special relativity equation of the motion is given by
d (m v ) =F dt
→
dm dv v + ma = F dv dt
(5)
with
m0
m(v) = γ m0 =
2
2
1 − (v / c )
;
a=
dv dt
(6)
Setting v = v R + jvT ; v = v R2 + vT2 ; v R = R ; vT = Rθ and a = aR + jaT ;
a R = R − Rθ ; aθ = 2 Rθ + Rθ ; furthermore FR = − F = −
ρ
R2
( ρ > 0) ; FT = 0
(7)
(force F is always oriented according to the radius that connects the mobile point to the center of force) we derive from eq.s (5) (6) the three scalar equations
dm dv v R + ma R = FR ; dv dt
dm dv vT + maT = 0 ; dv dt
dm mv = 2 d v c − v2
(8)
Since v = R 2 + R 2θ 2 , it is
v v = RR + RRθ 2 + R 2θθ On the other hand eq.s (8) bring about
(9)
Relativistic Stability. Part 1 - Relation between SR and ST …
mv 2
c −v
2
v vT + m(2 Rθ + Rθ ) = 0
Rθ = −
→
vv 2
c − v2
vv 2
c − v2
665
Rθ + 2 Rθ + Rθ = 0
Rθ + 2 Rθ
(10)
From eq.s (9) and (10) we derive
vv 1+
( Rθ ) 2 = R ( R − Rθ 2 ) = v R a R c2 − v2
(11)
and, as ( Rθ ) 2 = vT2 = v 2 − vR2 , then
v v = vR aR
c2 − v2 c 2 − v R2
On the other hand (from (8)) we have a R = (12) leads to
v v = vR
eq.s
R = Rθ 2 +
FR c 2 − v R2 and hence equation m c2
FR v2 F v2 1− 2 = R R 1− 2 m m c c
Taking into account that Rθ 2 =
(12)
(13)
vT2 v 2 − vR2 = and v R = R we get from above R R
FR vv F R2 v2 − R2 − 2 2 R → R = R 1− 2 + m c −v m R c
(14)
Eq. (14) represents an equation with the canonic form R = f ( R, R, v) in the three state variables R, R, v . In order to completely define our state equations we need one more equation of the form v = g ( R, R, v) . It can be obtained from (13):
v=
R v2 F 1− 2 R v m c
(15)
The system of eqs. (14) and (15) where FR = − F = − ρ / R 2 and m(v ) given by eq. (6) defines a dynamic and relativistic model of the 3-rd order in the state variables R , R and v . The Points of Equilibrium of such model are identified by the solutions of the system of equations
666
U. Di Caprio
R = 0; R =0;v =0 →
FR v2 + = 0 ; R = R0 = cost ; v = v0 = cost m (v ) R
(16)
Therefore the Equilibrium Points are defined to within an arbitrary constant that represents the value of the radius R0 . The corresponding value of the speed is obtained from the algebraic eq.s
γ0
v02 c2
=
ρ R0 m0 c 2
=
GM R0 c 2
; γ0
v02 c2
=
γ 02 − 1 γ0
which give γ 0 as the positive solution of the 2nd order eq.
γ 02 − γ 0
GM −1 = 0 R0 c 2
and v0 as v0 / c = GM / γ 0 . Note that eq. (14) has the same structure of the corresponding nonrelativistic equation (given by Newton' s theory) and, indeed, the latter can be directly derived from it simply replacing m(v ) with m0 and letting c → ∞ . On the other hand eq. (15) turns automatically satisfied when c → ∞ : infact eq. (15) leads to R FR v= when c → ∞ (17) v m and, since FR / m = R + Rθ 2 , eq. (17) implicates
v v = RR + RRθ 2
c→∞
when
which, on the other hand, directly follows from computing the time derivative of both members of the equation v 2 = R 2 + ( Rθ ) 2 and recalling that, when c → ∞ , then (2 Rθ + Rθ ) → 0 . Finally, by introducing the state variables
x1 = R , x2 = R , x3 = v we find the canonic representation
x1 = x2 ;
x2 = −
x x2 x3 = 2 1 − 32 x3 c
−
ρ mx12
ρ mx12
1− ;
x22
c2
+ m=
x32 − x22 ; x1 m0
(18)
1 − ( x32 / c 2 )
Remark 1: The canonic state equations identify a third order dynamic model. This represents a fundamental advancement both with regard to Newton and to
Relativistic Stability. Part 1 - Relation between SR and ST …
667
Einstein as well as to other existing formulations. Furthermore the analysis remains unchanged if we postulate equivalence Potential energy mass (cfr. with Caldirola [6]) and replace m with mˆ
mˆ = γ mˆ 0 ; mˆ 0 = m0 +
Ep
ρ
= m0 +
(19) c Rc 2 Thirdly, we can deal either with the gravitational problem or with the electrical problem (e.g. the photon problem). In the first case we assume ρ = GMm0 , 2
while in the second case we assume ρ = kq 2 with k the Coulomb constant and q the unitary charge. 3. Linearized system equations Equation (18) has the general form x = f ( x) . From it we can determine the linearized system equation x = A( x − x 0 ) with x 0 Point of Equilibrium
( R0 ,0, v0 ) and with A 3×3 matrix of elements aij = ∂f i / ∂x j 0 A = a21 0 a 21 =
1 0 a32
∂FR 1 m(v0 ) ∂R
a 21 =
v02 R02
;
0 a23 ; 0 − ( R0 ,v0 )
a23 =
v02
R02
a 21 =
∂FR 1 m(v0 ) ∂R
;
a32 =
v0 2 − (v02 / c 2 ) R0 1 − (v02 / c 2 )
1−
;
− ( R0 ,v0 )
v2 c
2
a32 =
v02
R02
FR vm(v)
(20)
(21) ( R0 ,v0 )
v02 2
v0 −1 R0 c
(22)
The eigenvalue equation Det ( A − λI ) (I identity matrix) results in
− λ[λ2 − (a23a32 + a21 )] = 0 which gives λ1 = 0 ; λ2 = a23 a32 + a21 ;
λ3 = − a23 a32 + a21 = −λ2 with a23a32 + a21 =
v02 v02 −1 R02 c 2
As v0 ≤ 0 above eq.s entail that a23 a32 + a21 ≤ 0 and, also, a23 a32 + a21 < 0 if
v0 < c . Therefore if v0 < c the system eigenvalues are equal to λ1 = 0 ;
668
U. Di Caprio
λ2 = j − (a23 a32 + a21 ) ; λ3 = − j − (a23a32 + a21 ) , j = − 1 while if v0 = c the system eigenvalues are coincident and equal to λ1 = λ2 = λ3 = 0 . In the first
case λ2 and λ3 identify an undamped oscillatory mode, while λ1 identifies a constant mode. The linearized system is weakly stable and nothing can be said about stability of the original non-linear system. In the second case (i.e. v0 = c ) the linearized system is unstable and, due to the Viola Theorem, the non-linear system is unstable as well; however such case is without practical value, since when v0 → c then (γ 0 v02 / c 2 ) → ∞ and R0 → 0 .
4. A necessary condition for relativistic stability The condition of equilibrium (16) brings about ( ρ / R0 ) = m(v0 )v02 . As the
potential energy in Equilibrium is defined by E p = − ρ / R0 aforesaid equation
results in
E p = −m(v0 )v02
→ E p = −γ 0 m0 v02
(23)
Postulating that E p is equivalent to mass (which is primarily justified by the classical Einstein' s equivalence E = m c 2 ) and using eq. (19), we obtain from (23) and (19)
mˆ 0 = 1 − γ 0
v02 c2
m0
(24)
Replacing m0 with mˆ 0 we find the relations
Tˆ0 = (γ 0 − 1)mˆ 0 c 2 ;
Eˆ p 0 = −γ 0 mˆ 0 v02
(25)
(we call Eˆ p 0 relativistic Potential energy and Tˆ0 relativistic Kinetic energy in equilibrium). It is
v2 Tˆ0 + Eˆ p 0 = m0 c 2 1 − γ 0 02 c
1− γ0
γ0
(26)
From eq. (26), we derive the following necessary condition for stability (Appendix)
Relativistic Stability. Part 1 - Relation between SR and ST …
γ0
v02 c2
<1
669
(27)
which results in
γ < γr ; γr =
1+ 5 = 1.618 2
(the “golden ratio”)
γ r2 − 1 vr = = 0.7816 = c γr
v < vr ;
5 −1 2
(28)
12
(29)
We call vr reference speed and summarize the preceding results this way: in the two-body relativistic (restricted) problem a necessary condition for stability is that the rotation speed be smaller than the “reference value” and the corresponding mass coefficient be smaller than the golden ratio. Furthermore, since in Equilibrium radius R0 is bonded to the rotation speed by
R0 =
GM
γ 0 v02
(30)
condition (27) implicates the existence of a minimum radius for stability, Rmin = (GM c 2 ) . This lead us back to the elementary and intuitive reasoning in Chapter 1 (see eq. (2)).
Remark 2: In the formulation of the principle of equivalence energy/mass Einstein did not made an explicit distinction between Potential and Kinetic energy. In other writings he pointed out the equivalence Kinetic energy/mass. Hence the equivalence between Potential energy and mass might seem partially obscure. We emphasize that such equivalence is clearly illustrated by Caldirola and by systemic studies of the Universe and of particle physics. Moreover, it directly follows from our dynamic model, and directly descends from equivalence Kinetic energy/mass itself. In fact in Equilibrium
Ep m0 c 2 and consequently
=−
γ v2 c2
=−
γ 2 −1 1 = −γ + γ γ
670
U. Di Caprio
Ep m0 c
2
=−
T 1 +1 + 2 m0 c (T m0 c 2 ) + 1
Ep T T +1 + +1 2 2 m0 c m0 c m0 c 2
2
=1
Such relation between T and E p is of the type
( x + 1) y + ( x + 1) 2 = 1 → x 2 + (2 + y ) x + y = 0 x = − 1+
y y2 + 1+ + 2y 2 4
If it was y = 0 (i.e. if the Potential energy was not equivalent to mass), and x = 0 (i.e. Kinetic energy neither would be equivalent to mass). Also, as x = (T m0 c 2 ) = γ − 1 and, in Equilibrium, y = ( E p m0 c 2 ) = − γ 0 v02 c 2 , then if y = −1 , it is x = −0.5 + 1.25 = 0.618 → γ − 1 = 0.618 → γ = γ r .
5. Conclusion With reference to the two-body restricted problem ,either gravitational or electrical, we have derived a necessary condition for relativistic stability. The radius (i.e. the distance between the two bodies) must be greater than half the Schwarzschild radius. This result has been obtained by preliminary deriving the canonic relativistic equations and, afterwards, applying the Lyapunov Direct Method. The canonic equations point out three (and not two) state variables which are radius, radial speed and magnitude of the rotation speed. In addition we have postulated equivalence Potential energy /mass in agreement with a classical orientation of the school of Milan as well as with some important (though incomplete) extrapolations of GR (Landau, Weinberg) [27,39]. We have also seen that the aforesaid necessary condition imposes a constraint on the maximum rotation speed and then on the maximum value of the relativistic mass coefficient γ : it is γ max = 1.618 , i.e. equal to the golden ratio. In Part 2 we expand the analysis by determining a sufficient condition for asymptotic stability that directly mixes up the Schwarzschild radius and allows us to formulate an original theory of black-holes.
Bibliography See PART 2.
Relativistic Stability. Part 1 - Relation between SR and ST …
671
Appendix. Eigenvalue analysis and necessary condition for stability As ρ R0 = v02 m(v0 ) then
∂FR ∂R
∂ ρ − 2 ∂R R
= ( R0 , v 0 )
FR vm(v)
=− ( R0 , v 0 )
= R0
2 2 [v0 m(v0 )] R02
v ρ 1 =− 0 R0 R02 v0 m(v0 )
Also, as
dm(v) m ( v )v = 2 dv c − v2
FR m( v )
;
=− ( R0 ,v0 )
v02 R0
then
∂ FR ∂v m(v)
FR m (v )
=− ( R0 , v 0 )
( R0 , v 0 )
c
2
v0
− v02
=
v0 v02 R0 c 2 − v02
Consequently
2v02
a 21 =
R02
a23 =
c
−
2
v02
R02
v02
− v02
=
v02
R02
v02
if
c
2
<1 ;
v0 v02 −1 R0 c 2
v0 2v0 v0 2 − (v02 / c 2 ) + = R0 R0 R0 1 − (v02 / c 2 )
a23 a32 + a21 = a23 a32 + a21 < 0
a32 =
;
v02
v02
R02 c 2
−1
a23 a32 + a21 = 0
if
v02 c2
=1
As our relativistic model presupposes v0 ≤ c we can see that a23 a32 + a21 < 0 and consequently the eigenvalues are
λ1 = 0 ;
λ2 = j a23 a32 + a21 ;
It is inferred from (26)
λ3 = − j a23a32 + a21 .
672
U. Di Caprio
Tˆ0 + Eˆ p 0 < 0
if
1− γ 0
v02 c2
>0 ;
Tˆ0 + Eˆ p 0 > 0 if
and consequently the condition (1 − γ 0 v02 c 2 ) > 0 Further on, as
(1 − γ 0 v02
2
c ) = (1
γ 0 )(γ 0 − γ 02
+ 1)
1− γ 0
v02 c2
<0
is necessary for stability. and (γ 0 − γ 02 + 1) > 0 in
γ 1 < γ 0 < γ 2 , with γ 1 = (1 − 5 ) 2 and γ 2 = (1 + 5 ) 2 , then the condition γ 0 < (1 + 5 ) 2 is necessary. In parallel, since γ is an increasing function of v and (v / c) = ( γ 2 − 1 γ ) the condition v0 < v2 , with (v2 / c) = 0.7815 is a necessary (and equivalent) condition as well. Furthermore, as shown in Part 2, the equivalence Potential energy /mass leads to a reversing of the sign of the system eigenvalues for R < Rmin , so that one eigenvalue becomes real and positive (instability!).
RELATIVISTIC STABILITY. PART 2 - A STUDY OF BLACK-HOLES AND OF THE SCHWARZSCHILD RADIUS
UMBERTO DI CAPRIO Stability Analysis s.r.l., Via A. Doria 48/A - 20124 Milano, Italy E-mail: [email protected] We point out a sufficient condition for existence of a stable attractor in the two-body restricted problem. The result is strictly dependent on making reference to relativistic equations and could not be derived from classical analysis. The radius of the stable attractor equals the well known Schwarzschild radius of General Relativity (GR). So we establish a bridge between Special Relativity (SR) and GR via Stability Theory (ST). That opens one way to an innovative study of black-holes and of the cosmological problem. A distinguishing feature is that no singularities come into evidence. The application of the Direct Method of Lyapunov (with a special Lyapunov function that represents the local energy) provides us the theoretical background. Keywords: stability theory, Schwarzschild solution, black-holes.
1. Introduction We have seen in Part I that in the two-body relativistic (and restricted) problem there exists a critical radius Rmin so that orbits with radius R < Rmin are unstable. Let us deepen this issue and next expand the analysis. The assumption that Potential energy is equivalent to mass leads to a meaningful change of the system eq.s for R < Rmin . In fact like shown in Part 1, it is m g < 0 for R < Rmin , with
mˆ 0 = m0 +
E p0 c2
= m0 1 −
GM GM ; Rmin = 2 R0 c 2 c
(1)
Consequently the gravitational force becomes repulsive and the centrifugal force becomes centripetal. Due to such reversing the differential equation (14) in Part 1 is to be replaced with
R=
GM v 2 − R 2 − R R2
673
(2)
674
U. Di Caprio
while eq. (15) in Part 1 remains unchanged. By linearization of the system equations, according to the same scheme illustrated in Part 1, we find that the coefficients a 21 and a 23 change their sign whilst the coefficient a32 remains ˆ unchanged. In other words the linearized system matrix becomes equal to A with
0 ˆ A = aˆ 21 0 with
1 0 aˆ32
0 aˆ 23 ; aˆ 21 = − a21 ; aˆ 23 = −a23 ; aˆ32 = a32 0
aˆ 21 = −a21 ; aˆ 23 = −a23 ; aˆ32 = a32
(3)
(4)
then
aˆ 21 aˆ32 + a21 = −(a23 a32 + a21 ) = −
v02 v02 −1 > 0 R02 R
(5)
Equation (5) brings about that one of the system eigenvalues is real positive. Of course this means that the linearized system equations are unstable, in conformity with the results shown in Part 1. The circular motion is decomposed in two exponential motions one of which diverges from the Equilibrium. Hence an Equilibrium does not exists at all for R < Rmin (namely body m0 falls on body M ). This analysis gives us a sound basis for an innovative study of blackholes. The Lyapunov Method proves to be the right tool for affording the problem. In Sec. 2 we preliminary study the non-relativistic case and point out the crucial role of the Potential Energy. In Sec. 3 we analyze the relativistic case (with reference to the equations illustrated in Part 1) and answer the following question: there exists a circular orbit along which the Potential Energy takes a minimum? The answer is affirmative and the radius of the orbit in question is Rs = 2GM c 2 i.e. equal to the radius of Schwarzschild. In Sec. 3 we discuss a behavior of Potential Energy. In Sec. 4 we show the application of the Lyapunov Method. 2. Recall of the stability conditions for the non-relativistic case In the non-relativistic formulation (which can be obtained from relativistic one by letting c → ∞ ) the differential equation of motion is the well known eq.
R=−
GM a 2 − ; with R3 R2
−
GM a 2 1 + 3 = ( FR + Fc ) 2 m R R 0
(6)
Relativistic Stability. Part 2 - A Study of Black-Holes …
FR = −
GMm0
gravitational force; Fc = m0
R2 a areolar velocity.
a2 R3
675
centrifugal force;
Equation (6) has the following canonic representation
R
x = f ( x) ; x =
R
;
f ( x) =
0 − (GM x 2 ) + a 2 x 3
and any point x 0 defined by x0T = [ R0 0] with R0 = a 2 GM Equilibrium point. The linearized system matrix is given by
A=
0
1
a21 0
is an
; a 21 = R0−3 [2GM − 3a 2 R0−1 ] = R0−3 (GM )
and its eigenvalues are the solutions of the equation
λ2 = − a21 → λ1 = j a21 ; λ2 = − j a21 . So the linearized system equations are weakly stable. As regards the non linear system behavior the following function is a Lyapunov function R
V ( R, R ) =
GMm0 a 2 m0 1 m0 R 2 + − dR 2 R2 R3 R
(7)
0
This function is positive-definite at x 0 and its time derivative along the system trajectories is globally equal to zero. In fact V ( R0 ,0) = 0 , [∂V ∂R ]R0 = 0 , [∂V ∂R] R0 = 0 and
∂ 2V ∂R 2
=− R0
GMm0 R03
;
∂ 2V =0; ∂R∂R
∂ 2V ∂R 2
= m0
(8)
GMm0 a 2 m0 dV ∂V ∂V GM a 2 = ×R+ ×R= − R + m R − + 3 = 0 (9) 0 dt ∂R ∂R R2 R3 R3 R Above equations implicate that function V has a local minimum at x 0 and then V is positive-definite at x 0 . Such property brings about that the surfaces V = const are closed (around x 0 ). Moreover, as grad V ≠ 0 for x ≠ x 0 the aforesaid closure is kept up to R = ∞ (i.e. stability is of global type). It is
676
U. Di Caprio
readily seen that V ( R, R) = E − E0 where E is the classical energy function defined by E = (m0v 2 2) − (GMm0 R) . Of course we can write
1 m0 R 2 + ( E p − E p 0 ) 2 GMm0 1 m0 a 2 1 GMm0 Ep = + ; E p0 = 2 R 2 R 2 R0 V ( R, R ) =
The function E p has a local minimum at R0 (such property is not possessed by function E p ). We could extend our analysis to elliptic rather than circular orbits (and would find that elliptic orbits are weakly stable). However such extensions is beyond the scopes of the present work.
3. The stable orbit with minimum potential energy in equilibrium In Equilibrium
v2 v2 Eˆ p = −γ 0 02 1 − γ 0 02 m0 c 2 ≡ f (γ 0 ) m0 c 2 c c
(10)
with
f (γ 0 ) = − It is
dEˆ p dγ 0
=
γ 02 − 1 γ 02 − 1 + γ0 γ0
2γ 04 − γ 03 − γ 0 − 2
γ 03
2
γ 04 − γ 03 − 2γ 02 + γ 0 + 1 γ 02
=
=
2
(γ 02 + 1) 2
γ0
γ 02 − 1 1 − γ0 2
v02
2
1 = 2 (γ 02 + 1) γ 0 2 − 2 γ0 c Therefore
dEˆ p dγ 0
Also
d 2 Eˆ p dγ 02
=
= 0 in γ 0 = γ s
2γ 04 + 3γ 02 + 2γ 0 + 6
γ 04
→
with γ s
d 2 Eˆ p dγ 02
v s2 c2
=
1 2
> 0 for any γ 0 ≥ 1
(11)
(12)
(13)
(14)
Relativistic Stability. Part 2 - A Study of Black-Holes …
677
We conclude that the function Eˆ p (γ 0 ) has a minimum at γ 0 = γ s with γ s defined by the equations
γs
vs2 c
2
=
1 1 ; γs = 2 1 − (vs2 c 2 )
→ γs =
1 + 17 ≅ 1.28 4
For our commodity we call this number emi-golden ratio. (The corresponding value of the rotation speed is vs c = 0.6249 . From eq.
ρ
= γ s m0 v s2 ; ρ = GMm0
Rs
we get the value of the radius Rs of the orbit along which v = vs and γ = γ s
Rs =
ρ m0 c
2
(γ s2 vs2
2
c )
=2
ρ m0 c
2
=
2GM c2
(15)
What is the physical meaning of the aforesaid property? The application of the Direct Method of Lyapunov clarifies the issue.
4. Application of the Lyapunov method Any point x 0 = ( R, R0 , v0 ) with
γ 0 v02 =
2GM ; R0 > 0 ; R0 = 0 R0
is an Equilibrium Point (and R0 determines the value of v0 ). The special Point x 0 individualized by
γ 0 v02 c
2
R0 = Rs =
2GM c2
γ s vs2
2GM
=
2
=
2
c Rs
(16)
=
1 2
(17)
is asymptotically stable. In fact the following function V ( x) = V ( R, R, v) can be proven to be a Lyapunov function [8]
678
U. Di Caprio
V ( R, R, v ) = mˆ 0
R R0
+ + with
a 2 = GMR0 ;
GM R2 GM R02
2GM
1−
+
Rc 2
R −
GM R02
dR +
1 a2 2 R2
7 GM 1 ( R − R0 ) 2 + R 2 3 6 R0 2
(18)
b b1 b ( R − R0 ) R + 2 R (v − v0 ) + 3 (v − v0 ) 2 2 2 2
b1 = −2
v0 ; R0
b2 = 4(γ 02 + 1) ;
mˆ 0 = m0 1 −
γ 0 v02 c
2
=
b3 =
γ0
γ 02
+1
m0 2
(19) (20)
More precisely V-function (18) is positive-definite at x 0 and its time derivative along the system trajectories is locally negative-definite at x 0 . Consequently x 0 is asymptotically stable.
Theorem 1: In the (restricted) two-body relativistic problem there exists an asymptotically stable solution. Such solution is identified by a circular orbit with radius equal to the radius of Schwarzschild ( Rs = 2GM Rc 2 ). The corresponding radial speed is (obviously) equal to zero while the magnitude of the velocity v is equal to the emigolden speed defined by v s c = (γ s2 − 1) / γ s2 with γ s = 1.2807 the emigolden ratio.
Theorem 2: The condition $R>R_{min}$, with $R_{min}=GM/c^2$, is necessary for stability, while the condition $R=R_s=2GM/c^2$ is sufficient. The radius $R=R_s$ identifies a stable attractor. The aforesaid conditions impose conditions on the rotation speed (and on the relativistic mass coefficient $\gamma$). They are expressed by

$$v<v_{max}\,;\qquad \gamma<\gamma_{max} \qquad\text{(necessary condition)} \qquad (21)$$
$$v=v_s\,;\qquad \gamma=\gamma_s \qquad\text{(sufficient condition)} \qquad (22)$$

with

$$\gamma_{max}=\gamma_r=\frac{1+\sqrt{5}}{2}\,,\quad \gamma_{max}\frac{v_{max}^2}{c^2}=1\,;\qquad \gamma_s=\frac{1+\sqrt{17}}{4}\,,\quad \gamma_s\frac{v_s^2}{c^2}=\frac{1}{2} \qquad (23)$$
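Both thresholds in Eq. (23) can be verified with a few lines of code; a small sketch (using the identity $v^2/c^2=1-1/\gamma^2$):

```python
# The golden ratio solves gamma*v^2/c^2 = 1 and the emigolden ratio solves
# gamma*v^2/c^2 = 1/2, as stated in Eq. (23).
from math import sqrt

for target, gamma in [(1.0, (1 + sqrt(5)) / 2), (0.5, (1 + sqrt(17)) / 4)]:
    beta2 = 1 - 1/gamma**2            # v^2/c^2 for the given gamma
    assert abs(gamma * beta2 - target) < 1e-12
    print(f"gamma = {gamma:.6f}: gamma*v^2/c^2 = {gamma*beta2:.6f}")
```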
Theorem 3: The Lyapunov function (18) represents an Energy and satisfies the equation $V=E_{loc}-E_0$, with $E_{loc}$ the “local energy” defined by
$$E_{loc}=\hat m_0\int\frac{GM}{R^2}\sqrt{1-\frac{2GM}{Rc^2}}\,dR+\frac{GM\hat m_0}{R_0^2}R+\frac{1}{2}\hat m_0\frac{a^2}{R^2}-\frac{7}{6}\frac{GM}{R_0^3}\hat m_0(R-R_0)^2$$
$$\qquad+\ \hat m_0\left[\frac{\dot R^2}{2}+\frac{b_1}{2}(R-R_0)\dot R+\frac{b_2}{2}\dot R(v-v_0)+\frac{b_3}{2}(v-v_0)^2\right] \qquad (24)$$

$$\frac{\gamma_0v_0^2}{c^2}=\frac{1}{2}\,;\qquad \hat m_0=\frac{m_0}{2}\,;\qquad R_0=\frac{2GM}{c^2}\,;\qquad a^2=GMR_0 \qquad (25)$$

$$E_0=\frac{GM\hat m_0}{R_0} \qquad (26)$$
When $c\to\infty$ the local energy becomes equal to the classical conservation energy.
Proof: The statement is evident. Moreover, the V-function satisfies $V=E_{loc}-E_0$ with

$$E_0=E_{loc}(x_0)=\hat m_0\left[\int\left(\frac{GM}{R^2}-2\frac{(GM)^2}{R^3c^2}\right)dR\right]_{x_0}+\frac{GM\hat m_0}{R_0}+\frac{1}{2}\hat m_0\frac{a^2}{R_0^2}$$

and, as

$$\int\left(\frac{GM}{R^2}-2\frac{(GM)^2}{R^3c^2}\right)dR=-\frac{GM}{R}+\frac{(GM)^2}{R^2c^2}$$

$$\left[\int\left(\frac{GM}{R^2}-2\frac{(GM)^2}{R^3c^2}\right)dR\right]_{x_0}=-\frac{GM}{R_0}+\frac{GM}{R_0}\frac{GM}{R_0c^2}=-\frac{1}{2}\frac{GM}{R_0}$$

then

$$E_0=\hat m_0\left[-\frac{1}{2}\frac{GM}{R_0}+\frac{GM}{R_0}+\frac{1}{2}\frac{GM}{R_0}\right]=\hat m_0\frac{GM}{R_0}=\frac{m_0}{2}\frac{GM}{R_0}$$
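This chain of equalities lends itself to a symbolic check. A minimal sketch (assuming, as in the text, the first-order expansion of the square root, $\hat m_0=m_0/2$, $a^2=GMR_0$ and $R_0=2GM/c^2$):

```python
# Verify the antiderivative and E0 = G M m0 / (2 R0) stated above.
import sympy as sp

G, M, m0, c, R = sp.symbols('G M m0 c R', positive=True)
R0 = 2*G*M/c**2

antider = sp.integrate(G*M/R**2 - 2*(G*M)**2/(R**3*c**2), R)
print(sp.simplify(antider))             # -> -G*M/R + G**2*M**2/(c**2*R**2)

at_R0 = antider.subs(R, R0)
print(sp.simplify(at_R0 / (G*M/R0)))    # -> -1/2, as in the text

m0_hat = m0/2
a2 = G*M*R0
E0 = m0_hat*(at_R0 + G*M/R0 + sp.Rational(1, 2)*a2/R0**2)
print(sp.simplify(E0 - G*M*m0/(2*R0)))  # -> 0, i.e. E0 = G M m0 / (2 R0)
```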
Furthermore the function Eloc is representable in the form
$$E_{loc}=\hat m_0\int\left(\frac{GM}{R^2}-2\frac{(GM)^2}{R^3c^2}\right)dR+\frac{GM\hat m_0}{R_0^2}R+\frac{1}{2}\hat m_0\frac{a^2}{R^2}-\frac{7}{6}\frac{GM}{R_0^3}\hat m_0(R-R_0)^2$$
$$\qquad+\ \hat m_0\left[\frac{\dot R^2}{2}+\frac{b_1}{2}(R-R_0)\dot R+\frac{b_2}{2}\dot R(v-v_0)+\frac{b_3}{2}(v-v_0)^2\right]$$

and hence, since

$$\lim_{c\to\infty}\hat m_0=m_0\,,$$

it is

$$\lim_{c\to\infty}E_{loc}=m_0\left[\int\frac{GM}{R^2}\,dR+\frac{1}{2}\frac{a^2}{R^2}+\frac{\dot R^2}{2}\right]\equiv\text{classical energy function}$$
Corollary 1: The local energy at the stable Equilibrium Point is positive (and equal to $GMm_0/2R_0$).
Remark 1: When we apply Corollary 1 to the analysis of the two-body electrical problem (photon problem) we find that the local energy in equilibrium is precisely the photon energy $E_\Phi=kq^2/2R_0$, with $k$ the Coulomb constant.
The preceding analysis allows us to reset the current theory of black holes. Around any massive body $M$ there exists a “forbidden region”, such that an external body $m_0$ entering that region “falls” on $M$. E.g., if $M=M_{sun}$ the radius $R_B$ of the forbidden region is $R_B=GM/c^2$, about equal to 1.5 km (much smaller than the radius of the Sun itself). Clearly an anomalous situation arises when the radius $R_M$ of the massive body is smaller than $R_B$. Such a situation can be traced back to a condition on volume and on density:
$$\mathrm{Volume}(M)<\frac{4}{3}\pi R_B^3 \qquad\text{if}\qquad R_M<R_B \qquad (27)$$

$$\rho_M>\frac{M}{(4/3)\pi R_B^3}=\frac{M}{(4/3)\pi\,(GM/c^2)^3}=\frac{1}{(4/3)\pi}\,\frac{c^6}{G^3M^2} \qquad (28)$$
When condition (28) is satisfied we say that $M$ is a black-hole. The critical density is defined by
$$\rho_b=\frac{1}{(4\pi/3)}\left(\frac{c^2}{G}\right)^3\frac{1}{M^2} \qquad (29)$$
Such a quantity is “relatively” larger for small black-holes. As an example, $\rho_b=1.46\times10^{20}\ \mathrm{kg/m^3}$ if $M=2\times10^{30}\ \mathrm{kg}$, while $\rho_b=5.83\times10^{80}\ \mathrm{kg/m^3}$ if $M=1\ \mathrm{kg}$. An external body $m_0$ entering the region $R_M<R<R_B$ undergoes chaotic events. Its mass becomes negative and the trajectory cannot be “closed”: it is decomposed into two distinct and contradictory “modes”, an increasing exponential and a decreasing exponential. The body splits into two separate parts, one of which falls on the black-hole (while the other departs from the black-hole and finally comes out of the forbidden region). As total energy must be conserved, and the energy of the part falling on the black-hole is negative, such an event determines a partial “evaporation”. The outgoing part acquires a positive energy and travels toward the stable attractor, which eventually consists of a matter crust with radius $R_s$ (equal to the Schwarzschild radius). In a subsequent work we study the Region of Attraction of the stable attractor and the related “accretion disk”. Here we confine ourselves to a mention of Hawking's formula for the temperature
$$\Theta_b=\frac{\hbar c^3}{4\pi GM k_B} \qquad (30)$$
and of the way in which our results shed further light on such a formula. We find that (30) is equivalent to
$$\Theta_b=\frac{m_0c^2}{k_B}\,\frac{\alpha\,r_B}{2\pi R_s}\,;\qquad R_s=\frac{2GM}{c^2}$$
with $r_B$ the Bohr radius, $m_0$ the electron mass, $\alpha$ the fine-structure constant and $k_B$ the Boltzmann constant. No singularities appear in our formulation: in particular, the gravitational force of the black-hole has a finite value.
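The numerical values quoted above, and the equivalence of the two temperature formulas (which rests on the identity $\alpha\,r_B=\hbar/m_ec$), can be sketched as follows (all constants are standard SI reference values, inserted for illustration only):

```python
# Critical density, Eq. (29), and the two equivalent forms of Eq. (30).
from math import pi

G, c = 6.674e-11, 2.998e8
hbar, k_B = 1.055e-34, 1.381e-23
m_e, alpha, r_B = 9.109e-31, 7.297e-3, 5.292e-11   # electron mass, alpha, Bohr radius

def rho_b(M):                        # Eq. (29)
    return c**6 / ((4*pi/3) * G**3 * M**2)

print(f"{rho_b(2e30):.2e}")          # ~ 1.5e20 kg/m^3, as quoted
print(f"{rho_b(1.0):.2e}")           # ~ 5.8e80 kg/m^3, as quoted

M = 2e30
Rs = 2*G*M/c**2
theta_1 = hbar*c**3 / (4*pi*G*M*k_B)              # Eq. (30)
theta_2 = (m_e*c**2/k_B) * alpha*r_B / (2*pi*Rs)  # Bohr-radius form
print(theta_1, theta_2)              # agree up to rounding of the constants
```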
5. Conclusion
Using the Lyapunov Direct Method and the local energy as the Lyapunov function, we determined a sufficient condition for asymptotic stability in the two-body relativistic (restricted) problem. There exists a special orbit, with radius equal to the Schwarzschild radius, that represents a stable attractor. This result completes the analysis in Part 1 (where we pointed out a necessary condition, which required that the radius of a stable orbit must be greater than half the
Schwarzschild radius). Thus we have established a bridge between Special Relativity (SR) and General Relativity (GR) via Stability Theory (ST). We have delineated the application to black-holes, setting forward an innovative analysis. The concept of local energy is novel and should be considered a primary contribution. It substantially generalizes the classical conservation energy and reduces to it when the speed of light in vacuum tends to infinity. The local energy is defined with regard to a specific Equilibrium Point and possesses peculiar sign-definiteness properties. With reference to the stable attractor, the time derivative of such energy is negative-definite: this means that the two-body system is locally dissipative. This reminds us of the theoretical results illustrated in [15] with regard to dissipative Electric Power Systems. Another striking finding is that along the basic stable orbit the local energy is positive (while the classical energy is negative). This explains the positive value of the energy of the photon in the two-body electrodynamical problem. Last but perhaps not least, we have connected the radius of the stable attractor with a special value of the relativistic mass coefficient $\gamma$: the latter must be equal to the emigolden ratio, much the same as (see Part 1) the minimum radius for stability turns out to be connected to a value of $\gamma$ equal to the golden ratio.
6. BIBLIOGRAPHY
1. M.A. Abramowicz, Scientific American 268, 26 (1993).
2. G. Arcidiacono, Relatività e Cosmologia (Veschi, Roma, 1973).
3. V.I. Arnol'd, La teoria delle catastrofi (Boringhieri, Italy, 1987).
4. E.A. Barbashin, N.N. Krasovskii, Prikl. Mat. Mekh. 18, 345-350 (1954).
5. A. Bruce, Nature 347, 615 (1990).
6. P. Caldirola, Teoria quantistica relativistica (Viscontea, Milano, 1963).
7. S. Chandrasekhar, The mathematical theory of black holes (Clarendon Press, Oxford, 1983).
8. U. Di Caprio, Int. J. of EPES 8(4), 225-235 (1986).
9. U. Di Caprio, Int. J. of EPES 8(1), 27-41 (1987).
10. U. Di Caprio, W. Prandoni, Int. J. of EPES 10(1), 41-53 (1988).
11. U. Di Caprio, G. Spavieri, Hadronic J. 22, 675-692 (1999).
12. U. Di Caprio, Hadronic J. 23, 689 (2000).
13. U. Di Caprio, Int. J. of EPES 23(3), 229-235 (2001).
14. U. Di Caprio, Supplement to Hadronic J. 16(1), 163-182 (2001).
15. U. Di Caprio, Int. J. of EPES 24(5), 421-429 (2002).
16. U. Di Caprio, in Emergence in Complex, Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer Academic/Plenum Publishers, New York, 2002), pp. 127-140.
17. U. Di Caprio, in Systemics of Emergence: Research and Development, Ed. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), pp. 31-66.
18. U. Di Caprio, Application of the Di Caprio-Lyapunov Method to the study of Cosmological Problems, Stability Analysis, Int. Rep. 2006-1 (2006).
19. R.H. Dicke, in Relativity, Groups and Topology, Ed. C. De Witt and B. De Witt (Gordon and Breach, New York, 1964).
20. A.D. Dolgov, Ya.B. Zeldovich, Rev. of Modern Physics 53, 1-41 (1981).
21. R.D. Driver, Arch. Mech. Analysis 10, 401-426 (1962).
22. R. Gautreau, W. Savin, Modern Physics (McGraw-Hill, New York, 1978).
23. W. Hahn, Theory and Application of Liapunov's Direct Method (McGraw-Hill, New York, 1963).
24. J.K. Hale, J. of Diff. Eqs. 1, 452-482 (1965).
25. N.N. Krasovskii, Certain problems of the theory of stability of motion (Russian ed.: Moscow, 1959; American ed.: Stanford, CA, 1963).
26. G. Jumarie, Subjectivity, Information, Systems: Introduction to a Theory of Relativistic Cybernetics (Gordon and Breach, New York, 1986).
27. L.D. Landau, E.M. Lifshitz, The classical theory of fields (Butterworth-Heinemann, Oxford, 1980).
28. J.P. La Salle, IFAC Congress, art. 415 (1963).
29. G.C. McVittie, General Relativity and Cosmology (Chapman & Hall, London, UK, 1965).
30. E.A. Milne, Relativity, Gravitation and World Structure (Clarendon Press, Oxford, UK, 1935).
31. B.B. Mandelbrot, Encyclopedia of Mathematical Sciences, Vols. 1, 4, 5, 39 (Springer-Verlag, Berlin, 1988).
32. V.V. Nemytskii, V.V. Stepanov, Qualitative theory of differential equations (Russian ed.: Moscow, 1947; American ed.: Princeton, 1960).
33. J.D. North, The measure of the Universe (Oxford University Press, Oxford, UK, 1952).
34. H.P. Robertson, T.W. Noonan, Relativity and cosmology (Saunders, Philadelphia, 1968).
35. M.P. Ryan, L.C. Shepley, Homogeneous Relativistic Cosmologies (Princeton University Press, Princeton, NJ, 1975).
36. D.W. Sciama, Modern Cosmology (Cambridge University Press, Cambridge, UK, 1971).
37. J. Stachel, Ed., Einstein's miraculous year. Five papers that changed the face of physics (Princeton University Press, Princeton, NJ, 1998).
38. R. Thom, Stabilité structurelle et morphogénèse (W.A. Benjamin, Reading, MA, 1972).
39. S. Weinberg, Gravitation and cosmology (John Wiley & Sons, New York, 1972).
40. S. Weinberg, The quantum theory of fields: Foundations, Vol. I (Cambridge University Press, Cambridge, 1995).
41. H. Whitney, Ann. Math. 62(3), 374-410 (1955).
42. C.M. Will, Was Einstein right? (Basic Books, New York, 1986).
43. V.I. Zubov, Methods of A.M. Lyapunov and their application (Leningrad, 1959; English transl.: Noordhoff, Groningen, 1964).
THE FORMATION OF COHERENT DOMAINS IN THE PROCESS OF SYMMETRY BREAKING PHASE TRANSITIONS
EMILIO DEL GIUDICE (1), GIUSEPPE VITIELLO (2)
(1) Istituto Nazionale di Fisica Nucleare, Sezione di Milano, Via Celoria 16, I-20133 Milano, Italia
(2) Dipartimento di Matematica e Informatica, Università di Salerno and Istituto Nazionale di Fisica Nucleare, Gruppo Collegato di Salerno, 84100 Salerno, Italia
The emergence of the phase locking among the electromagnetic modes and the matter components on an extended space-time region is discussed. The stability of mesoscopic and macroscopic complex systems arising from fluctuating quantum components is considered under such a perspective.
Keywords: symmetry breaking, coherent domains, Anderson-Higgs-Kibble mechanism, gauge fields.
The general problem of the stability of mesoscopic and macroscopic complex systems arising from fluctuating quantum components is of great interest in many sectors of condensed matter physics and elementary particle physics, for example in the problem of defect formation during nonequilibrium symmetry breaking phase transitions characterized by an order parameter [1]. Examples of topological defects are vortices in superconductors and superfluids, magnetic domain walls in ferromagnets, and many other extended objects in condensed matter physics, also including cosmic strings in cosmology, which may have played a role in the phase transition processes in the early Universe [2]. In the study of spontaneously broken symmetry theories in quantum field theory (QFT) the Anderson-Higgs-Kibble (AHK) mechanism is well established [3-5]: the gauge field is expelled out of the ordered domains and confined, through self-focusing propagation, into “normal” regions where the order parameter vanishes. In this report our attention is focused on the dynamics governing the radiative gauge field inside the ordered region, in particular on its role in the onset of phase locking among the e.m. modes and the matter components. In our discussion we will closely follow Ref. [6].
At first sight one would say that in the AHK mechanism the gauge field is in competition with the long range correlation established among the system components, as a dynamical consequence of the spontaneous symmetry breakdown, by the Nambu-Goldstone (NG) particles. However, as we show here, the radiative gauge field plays a role also in the ordered regions, where it sustains the emergence of coherence. As we will show, such a role is in some sense complementary to, not in contradiction with, the AHK mechanism. Phase locking between the matter field and the radiative gauge field in an extended space-time region is found to be the dynamical response of the system aimed at preserving the local gauge invariance of the theory. The physical meaning of local gauge invariance is to guarantee the stability of the system at mesoscopic and macroscopic space-time scales against the quantum fluctuations characterizing the behavior of the quantum components at the microscopic scale. The QFT solution to the problem of building a stable system out of fluctuating components consists in prescribing that the Lagrangian of the system be invariant under the local phase transformation of the quantum component field, $\psi(\mathbf{x},t)\to\psi'(\mathbf{x},t)=\exp(ig\theta(\mathbf{x},t))\,\psi(\mathbf{x},t)$. Local phase invariance is then achieved by introducing the gauge fields, e.g. the electromagnetic (e.m.) field $A_\mu(\mathbf{x},t)$, such that the Lagrangian is also invariant under the local gauge transformation $A_\mu(\mathbf{x},t)\to A'_\mu(\mathbf{x},t)=A_\mu(\mathbf{x},t)-\partial_\mu\theta(\mathbf{x},t)$. This is devised to compensate terms proportional to $\partial_\mu\theta(\mathbf{x},t)$ arising from the Lagrangian kinetic term for the matter field $\psi(\mathbf{x},t)$. The gauge field may thus be described as a compensating “reservoir” against variations in the many accessible microscopic configurations of the system. Our model system consists of an ensemble of a given number of two-level atoms, say $N$ per unit volume $V$, which may represent rigid rotators endowed with an electric dipole. We consider the interaction of these atoms with the e.m. quantum radiative modes generated in the transitions between the atom levels, and disregard the static dipole-dipole interaction. The system is assumed to be spatially homogeneous and in a thermal bath kept at a non-vanishing temperature $T$. Under such conditions the system is invariant under dipole rotations and, since the atom density is assumed to be spatially uniform, the only relevant variables are the angular ones. In our discussion we use natural units $\hbar=1=c$. By closely following the presentation of Ref. [6], we denote with $d\Omega=\sin\theta\,d\theta\,d\phi$ the element of solid angle and with $(r,\theta,\phi)$ the polar coordinates of $\mathbf{r}$. The dipole wave field $\phi(\mathbf{x},t)$, integrated over the sphere of unit radius $r$, gives:
$$\int d\Omega\,|\phi(\mathbf{x},t)|^2=N\,, \qquad (1)$$

which, in terms of the rescaled field $\chi(\mathbf{x},t)=\frac{1}{\sqrt N}\,\phi(\mathbf{x},t)$, reads

$$\int d\Omega\,|\chi(\mathbf{x},t)|^2=1\,. \qquad (2)$$
Under the assumed conditions the field $\chi(\mathbf{x},t)$ may be expanded in the unit sphere in terms of spherical harmonics, $\chi(\mathbf{x},t)=\sum_{l,m}\alpha_{l,m}(t)\,Y_l^m(\theta,\phi)$. By setting $\alpha_{l,m}(t)=0$ for $l\neq 0,1$, this reduces to the expansion in the four levels $(l,m)=(0,0)$ and $(1,m)$, $m=0,\pm1$. The populations of these levels are given by $N|\alpha_{l,m}(t)|^2$ and, at thermal equilibrium in the absence of interaction, they follow the Boltzmann distribution. Since thermal equilibrium and the dipole rotational invariance imply that there is no preferred direction in the dipole orientation, the $|\alpha_{1,m}(t)|$ are independent of $m$ and we may put

$$\alpha_{0,0}(t)\equiv a_0(t)\equiv A_0(t)\,e^{i\delta_0(t)}\,,\qquad \alpha_{1,m}(t)\equiv A_1(t)\,e^{i\delta_{1,m}(t)}e^{-i\omega_0 t}\equiv \bar\alpha_{1,m}(t)\,e^{-i\omega_0 t}\,. \qquad (3)$$

Here $\bar\alpha_{1,m}(t)\equiv A_1(t)\,e^{i\delta_{1,m}(t)}$. The amplitudes $A_0(t)$ and $A_1(t)$ and the phases $\delta_0(t)$ and $\delta_{1,m}(t)$ are real quantities; we will also define $\omega t\equiv\delta_{1,0}(t)-\delta_0(t)$. In Eqs. (3), $\omega_0\equiv 1/I$, where $I$ denotes the moment of inertia of the atom and gives a relevant scale for the system, $\omega_0\equiv k=2\pi/\lambda$. The eigenvalue of $L^2/2I$ on the states $(1,m)$ is

$$\frac{l(l+1)}{2I}=\frac{1}{I}=\omega_0\,,$$

where $L^2$ denotes the squared angular momentum operator. The three levels $(1,m)$, $m=0,\pm1$, are on average equally populated under normal conditions. This is confirmed by the absence of permanent polarization in the system. Indeed, in the assumed conditions, the time average of the polarization $P_\mathbf{n}$ along any direction $\mathbf{n}$ is found to be vanishing [6,7]. Therefore, we can safely write

$$\sum_m|\bar\alpha_{1,m}(t)|^2=3\,|a_1(t)|^2\,.$$

The normalization condition (2) gives
$$Q\equiv|\alpha_{0,0}(t)|^2+\sum_m|\bar\alpha_{1,m}(t)|^2=|a_0(t)|^2+3\,|a_1(t)|^2=1\,,\quad\forall t \qquad (4)$$

and therefore $\partial Q/\partial t=0$, i.e.

$$\frac{\partial}{\partial t}|a_1(t)|^2=-\frac{1}{3}\frac{\partial}{\partial t}|a_0(t)|^2\,. \qquad (5)$$
The conservation law $\partial Q/\partial t=0$ expresses the conservation of the total number $N$ of atoms; Eq. (5) means that, due to the rotational invariance, the rate of change of the population in each of the levels $(1,m)$, $m=0,\pm1$, equally contributes, on average, to the rate of change in the population of the level $(0,0)$, at each time $t$. We can set, consistently with Eq. (4), the initial conditions at $t=0$ as

$$|a_0(0)|^2=\cos^2\theta_0\,,\qquad |a_1(0)|^2=\frac{1}{3}\sin^2\theta_0\,,\qquad 0<\theta_0<\frac{\pi}{2}\,. \qquad (6)$$
The $\theta_0$ values zero and $\pi/2$ are excluded, since it is physically unrealistic for the state $(0,0)$ to be completely filled or completely empty, respectively. The parameter $\theta_0$ can be properly tuned in its range of definition; for example, $\theta_0=\pi/3$ describes the equipartition of the field modes of energy $E(k)$ among the four levels $(0,0)$ and $(1,m)$, $|a_0(0)|^2\cong|a_{1,m}(0)|^2$, $m=0,\pm1$, as given by the Boltzmann distribution when the temperature $T$ is high enough, $k_BT\gg E(k)$. Below we show that the lower bound for the parameter $\theta_0$ is imposed by the dynamics in a self-consistent way. The field equations are [8,9]:
$$i\,\frac{\partial\chi(\mathbf{x},t)}{\partial t}=\frac{L^2}{2I}\,\chi(\mathbf{x},t)-i\,d\sqrt{\rho}\sum_{\mathbf{k},r}\sqrt{\frac{k}{2}}\,(\boldsymbol{\varepsilon}_r\cdot\mathbf{x})\left[u_r(\mathbf{k},t)\,e^{-ikt}-u_r^{+}(\mathbf{k},t)\,e^{ikt}\right]\chi(\mathbf{x},t)\,,$$
$$i\,\frac{\partial u_r(\mathbf{k},t)}{\partial t}=i\,d\sqrt{\rho}\,\sqrt{\frac{k}{2}}\,e^{ikt}\int d\Omega\,(\boldsymbol{\varepsilon}_r\cdot\mathbf{x})\,|\chi(\mathbf{x},t)|^2\,, \qquad (7)$$

where $u_r(\mathbf{k},t)=(1/\sqrt N)\,c_r(\mathbf{k},t)$, and $c_r(\mathbf{k},t)$ denotes the radiative e.m. field operator with polarization $r$; $d$ is the magnitude of the electric dipole moment, $\rho\equiv N/V$, and $\boldsymbol{\varepsilon}_r$ is the polarization vector of the e.m. mode (the transversality condition $\mathbf{k}\cdot\boldsymbol{\varepsilon}_r=0$ is assumed to hold). Notice the enhancement by the factor $\sqrt N$ appearing in the coupling $d\sqrt\rho$ in Eqs. (7), due to the rescaling of the fields. In Ref. [6] it has been shown that such a rescaling is actually responsible for the collective behavior of the system in the large $N$ limit. This is related to the fact that, as is evident from Eqs. (7), the collective interaction time scale is
shorter, by a factor $1/\sqrt N$, than the time scale of the short range interactions among the atoms. Hence the mesoscopic/macroscopic stability of the system against the quantum fluctuations of the microscopic components. In obtaining Eqs. (7) we have restricted ourselves to the resonant radiative e.m. modes, for which $k=2\pi/\lambda=\omega_0$, and we have used the dipole approximation $\exp(i\mathbf{k}\cdot\mathbf{x})\approx 1$, since we are interested in the macroscopic behavior of the system. This means that the wavelengths of the e.m. modes we consider, of the order of $2\pi/\omega_0$, are larger than (or comparable to) the system linear size. The amplitude of the e.m. mode coupled to the transition $(1,m)\leftrightarrow(0,0)$ is denoted by
$$u_m(t)=U(t)\,e^{i\varphi_m(t)}\,, \qquad (8)$$
where $U(t)$ and $\varphi_m(t)$ are real quantities. We remark that Eqs. (7) are not invariant under time-dependent phase transformations of the field amplitudes. Our task is to investigate how the local (in time) gauge symmetry can be recovered. Eqs. (7) are of course consistent with the conservation law $\partial Q/\partial t=0$ and they also show that

$$\frac{\partial}{\partial t}|u_m(t)|^2=-2\,\frac{\partial}{\partial t}|\bar\alpha_{1,m}(t)|^2\,, \qquad (9)$$

from which we see that $|u_m(t)|$ does not depend on $m$, since $|\bar\alpha_{1,m}(t)|=|a_{1,m}(t)|$ does not depend on $m$. We can derive another conservation law,

$$|u(t)|^2+2\,|a_1(t)|^2=\frac{2}{3}\sin^2\theta_0\,,\quad\forall t \qquad (10)$$

where $|u(t)|\equiv|u_m(t)|$, $|a_1(t)|\equiv|a_{1,m}(t)|$, the initial condition (6) has been used and we have set

$$u(0)=0\,. \qquad (11)$$
Equations (7) give

$$\dot A_0(t)=\Omega\,U(t)\,A_1(t)\cos\alpha_m(t)\,, \qquad (12)$$
$$\dot A_1(t)=-\Omega\,U(t)\,A_0(t)\cos\alpha_m(t)\,, \qquad (13)$$
$$\dot U(t)=2\Omega\,A_0(t)\,A_1(t)\cos\alpha_m(t)\,, \qquad (14)$$
$$\dot\varphi_m(t)=2\Omega\,\frac{A_0(t)\,A_1(t)}{U(t)}\sin\alpha_m(t)\,, \qquad (15)$$
where the dot over the symbol denotes the time derivative,

$$\Omega\equiv\frac{2d}{\sqrt3}\sqrt{\frac{\rho}{2\omega_0}}\,\omega_0\equiv G\,\omega_0$$

and

$$\alpha_m\equiv\delta_{1,m}(t)-\delta_0(t)-\varphi_m(t)\,. \qquad (16)$$
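A minimal numerical sketch of Eqs. (12)-(14) is given below. It treats the relative phase $\alpha$ as a fixed parameter (an assumption made only for illustration: in the full model $\alpha$ evolves dynamically) and $\Omega=1$ in arbitrary units; at $\alpha=\pi/2$ the amplitudes freeze, and the conservation law (10) holds along the flow:

```python
# Integrate Eqs. (12)-(14) with alpha held constant, starting from the
# initial conditions (6) and (11) with theta0 = pi/3.
import numpy as np
from scipy.integrate import solve_ivp

Omega = 1.0

def rhs(t, y, alpha):
    A0, A1, U = y
    ca = np.cos(alpha)
    return [Omega*U*A1*ca, -Omega*U*A0*ca, 2*Omega*A0*A1*ca]

theta0 = np.pi/3
y0 = [np.cos(theta0), np.sin(theta0)/np.sqrt(3), 0.0]

for alpha in (np.pi/2, np.pi/4):
    sol = solve_ivp(rhs, (0.0, 20.0), y0, args=(alpha,), rtol=1e-10, atol=1e-12)
    A0, A1, U = sol.y
    inv = U**2 + 2*A1**2                       # conservation law (10)
    print(f"alpha = {alpha:.4f}: U(T) = {U[-1]:.4f}, "
          f"drift of U^2 + 2 A1^2 = {np.ptp(inv):.2e}")
```

For $\alpha=\pi/2$ the amplitude $U$ stays at its initial value (phase locking, cf. Eq. (18) below), while for $\alpha=\pi/4$ it grows; in both cases the invariant drifts only at the integration tolerance.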
Equations for $\delta_{1,m}$ and $\delta_0$ can be derived in a similar way. Eqs. (12)-(14) show that the phases turn out to be independent of $m$. Indeed, the right hand sides of these equations have to be independent of $m$ since their left hand sides are independent of $m$: so either $\cos\alpha_m(t)=0$ for any $m$ at any $t$, or $\alpha_m$ is independent of $m$ at any $t$. In both cases, Eq. (15) shows that $\varphi_m$ is then independent of $m$, which in turn implies, together with Eq. (16), that $\delta_{1,m}(t)$ is independent of $m$. We therefore put $\varphi\equiv\varphi_m$, $\delta_1(t)\equiv\delta_{1,m}(t)$, $\alpha\equiv\alpha_m$, $u(t)\equiv u_m(t)$ and $a_1(t)\equiv a_{1,m}(t)$. One can always change the phases by arbitrary constants. However, if they are equal in one frame they are unequal in a rotated frame, and gauge invariance is lost. The independence of the phases of $m$ is here of dynamical origin, and the phase locking which we will find (see Eq. (18)) among $\delta_0(t)$, $\delta_1(t)$ and $\varphi(t)$ has indeed the meaning of recovering the gauge symmetry. The study of the system ground states for each of the modes $a_0(t)$, $a_1(t)$ and $u(t)$ shows that spontaneous breakdown of the global SO(2) symmetry (the global phase symmetry) in the plane $(a_{0,R}(t),a_{0,I}(t))$ occurs [6] (the indexes $R$ and $I$ denote the real and the imaginary component, respectively, of the field). In the semiclassical approximation [5], we find [6] that for the mode $a_0(t)$ there is a quasi-periodic mode with pulsation
$m_0=\sqrt2\,\Omega\sqrt{1+\cos^2\theta_0}$ (the 'massive' mode, with real mass $\sqrt2\,\Omega\sqrt{1+\cos^2\theta_0}$) and a zero-frequency mode $\delta_0(t)$ corresponding to a massless mode playing the role of the NG field. Note that the value $a_0=0$ consistently appears to be a relative maximum of the potential, and therefore an instability point out of which the system (spontaneously) runs away. On the other hand, $a_1(t)$ is found [6] to be a massive field with (real) mass (pulsation) $\sigma$, with $\sigma^2=2\Omega^2(1+\sin^2\theta_0)$. For the $u(t)$ field, the global SO(2) cylindrical symmetry around an axis orthogonal to the plane $(u_R(t),u_I(t))$ can be spontaneously broken or not, according to the negative or positive value of the squared mass
$\mu^2=2\Omega^2\cos2\theta_0$ of the field, respectively, as usual in the semiclassical approximation. In the case $\mu^2<0$, i.e. $\theta_0>\pi/4$, the potential has a relative maximum at $u_0=0$ and a (continuum) set of minima given by

$$|u(t)|^2=-\frac{\mu^2}{6\Omega^2}=-\frac{1}{3}\cos2\theta_0\equiv v^2(\theta_0)\,,\qquad \theta_0>\frac{\pi}{4}\,, \qquad (17)$$
representing (infinitely many) possible vacua for the system. They transform into each other under shifts of the field $\varphi$: $\varphi\to\varphi+\alpha$. The global phase symmetry is broken, the order parameter is given by $v(\theta_0)\neq0$, and one specific ground state is singled out by fixing the value of the $\varphi$ field. We have a 'massive' mode, as indeed expected in the AHK mechanism [5], with real mass $\sqrt{2|\mu^2|}=2\Omega\sqrt{|\cos2\theta_0|}$ (a quasi-periodic mode), and the zero-frequency mode $\varphi(t)$ (the massless NG collective field, also called the “phason” field [10]). The fact that in such a case $u_0=0$ is a maximum of the potential means that the system dynamically evolves away from it, consistently with the similar situation noticed for the $a_0$ mode. We therefore find that dynamical consistency requires $\theta_0>\pi/4$. We now observe that, provided $\theta_0>\pi/4$, a time-independent amplitude $U(t)\equiv U$ is compatible with the system dynamics (e.g. the ground state value $A_0\neq0$ implies $U=\mathrm{const}$). Equations (14) and (15) indeed show that such a time-independent amplitude $U=\mathrm{const}$ exists, $\dot U(t)=0$, if and only if the phase locking relation
$$\alpha=\delta_1(t)-\delta_0(t)-\varphi(t)=\frac{\pi}{2} \qquad (18)$$
holds. Therefore,

$$\dot\varphi(t)=\dot\delta_1(t)-\dot\delta_0(t)=\omega\,, \qquad (19)$$
and this shows that any change in time of the difference between the phases of the amplitudes $a_1(t)$ and $a_0(t)$ is compensated by the change of the phase of the e.m. field. When Eq. (18) holds we also have $\dot A_0=0=\dot A_1$ (cf. Eqs. (12), (13)). The phase relation (18) shows that, provided $\theta_0>\pi/4$, $\dot\alpha=0$. It expresses nothing but the local (in time) gauge invariance of the theory. Since $\delta_0$ and $\varphi$ are the NG modes, Eqs. (18) and (19) exhibit the coherent feature of the collective dynamical regime. The system of $N$ dipoles and of the e.m. field is characterized by the “in phase” dynamics expressed by Eq. (18) (phase locking): the local gauge invariance of the theory is preserved by the
dynamical emergence of the coherence between the matter field and the e.m. field. Finally, we consider the case in which an electric field $\mathbf{E}$, due for example to an impurity or to any other external agent, is applied to the atom system in the phase locking regime. Let us assume $\mathbf{E}$ to be parallel to the $z$ axis. Then the term $\epsilon=-\mathbf{d}\cdot\mathbf{E}$, where $\mathbf{d}$ is the electric dipole moment of the atom, has to be added to the system energy. This will break the dipole rotational symmetry. The polarization $P_\mathbf{n}$ is given by [6]

$$P_\mathbf{n}=\frac{1}{3}\,(A_0^2-A_1^2)\sin2\tau+\frac{2}{3}\,A_0(t)A_1(t)\cos2\tau\,\cos\!\left[\left(\omega-\sqrt{\omega_0^2+4\epsilon^2}\right)t\right] \qquad (20)$$
whose time average is nonvanishing:

$$\langle P_\mathbf{n}\rangle=\frac{1}{3}\,(A_0^2-A_1^2)\sin2\tau\,.$$
Here $\tau$ is given [6] by

$$\tan\tau=\frac{\omega_0-\sqrt{\omega_0^2+4\epsilon^2}}{2\epsilon}\,.$$
The non-zero difference in the level populations, $(A_0^2-A_1^2)$, as indeed found in the phase locking regime (see [6]), is therefore crucial in obtaining the non-zero polarization. As shown by Eq. (20), the polarization persists as long as the field $\mathbf{E}$ is active (i.e. $\epsilon\neq0$). The finite size of the system indeed prevents a persistent polarization from surviving the $\epsilon\to0$ limit [11,12]. In such a limit the dipole rotational symmetry is thus restored. In conclusion, the system may be prepared with initial conditions given by Eqs. (6) and (11), where the value of the parameter $\theta_0$ is in principle arbitrary within reasonable physical conditions. Starting at $t=0$ from the initial condition $|u(0)|^2=0$, the system then evolves towards the minimum energy state, where $|a_0(t)|^2\neq0$ and the amplitude $u(t)$ departs from its initial zero value. This implies a succession of (quantum) phase transitions [13] from the initial $u_0=0$ symmetric vacuum to the asymmetric vacuum $|u(t)|^2\neq0$, which means that in Eq. (17) $\theta_0$ has to be greater than $\pi/4$. In this way the lower bound for $\theta_0$ is dynamically fixed as an effect of the radiative dipole-dipole interaction. This results in turn in the phase locking (18), which expresses the coherence in the time behavior of the phase fields (cf. Eq. (19)). The role of the phason mode $\varphi$ is to recover the local gauge symmetry, thus re-establishing the
local gauge invariance of the theory. This is done through the emergence of the coherence implied by the phase locking between the matter field and the e.m. field. The gauge arbitrariness of the field $A_\mu$ is meant to compensate exactly the arbitrariness of the phase of the matter field in the covariant derivative $D_\mu=\partial_\mu-igA_\mu$. Should one of the two arbitrarinesses be removed by the dynamics, the invariance of the theory requires that the other be simultaneously removed as well. This is the physical meaning of the phase locking. The link between the phase of the matter field and the gauge of $A_\mu$ is stated by the equation $A_\mu=\partial_\mu\varphi$ ($A_\mu$ is a pure gauge field). When $\varphi(x,t)$ is a regular (continuously differentiable) function, it can easily be shown that $\mathbf{E}=0=\mathbf{B}$, namely the potentials, and not the fields, are present in the coherent region. In agreement with the AHK mechanism we thus find that in the ordered domains the fields $\mathbf{E}$ and $\mathbf{B}$ are vanishing; however, the gauge potentials are there nonvanishing and they sustain the phase locking in the coherent regime. We also observe that the existence of non-vanishing fields $\mathbf{E}\neq0$ and $\mathbf{B}\neq0$ is then connected to the topological singularities of the gauge function $\varphi(x,t)$ [11], as happens, e.g., in the presence of a vortex or other topologically non-trivial solutions, again in agreement with the AHK mechanism. As already observed, the coupling enhancement by the factor $\sqrt N$ implies that for large $N$ the collective interaction time scale is much shorter than the short range interaction time scale among the atoms. This in turn implies the mesoscopic/macroscopic stability of the system against the quantum fluctuations of the microscopic components [6]. In a similar way, for sufficiently large $N$ the collective interaction is protected against thermal fluctuations. The larger the energy gap compared with $k_BT$, the more robust the protection against thermal fluctuations. As a final comment we note that energy losses from the system volume, which we have not considered in the discussion above, do not substantially affect the collective dynamical features. An analysis of energy losses when the system is enclosed in a cavity has been presented in [14], in connection with the problem of efficient cooling of an ensemble of N atoms. Another problem which we have not considered above is the one related to how much time the system needs to set up the collective regime. This problem is a central one in the domain formation in the Kibble-Zurek scenario [2,15,16]. We only remark that, since the correlation among the elementary constituents is kept by a pure gauge field, the communication among them travels at the phase velocity of the gauge field [6].
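The symmetry-breaking threshold discussed above can be made concrete with a small numerical sketch (illustrative only, with $\Omega=1$ in arbitrary units): the squared mass $\mu^2=2\Omega^2\cos2\theta_0$ of the $u$-mode changes sign at $\theta_0=\pi/4$, and only beyond it do the broken-symmetry minima (17) exist.

```python
# Sign of mu^2 and the broken-phase vacuum value v^2(theta0) of Eq. (17).
import numpy as np

Omega = 1.0
for theta0 in np.array([0.20, 0.25, 0.30, 1/3]) * np.pi:
    mu2 = 2 * Omega**2 * np.cos(2*theta0)
    v2 = -np.cos(2*theta0)/3 if mu2 < 0 else 0.0
    phase = "broken (coherent)" if mu2 < 0 else "symmetric"
    print(f"theta0 = {theta0/np.pi:.2f}*pi: mu^2 = {mu2:+.3f}, "
          f"v^2 = {v2:.3f} -> {phase}")
```

Note that the equipartition value $\theta_0=\pi/3$ lies in the broken phase, with $v^2=1/6$.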
References
1. Y.M. Bunkov and H. Godfrin, Eds., Topological defects and the non-equilibrium dynamics of symmetry breaking phase transitions, NATO Science Series C 549 (Kluwer Acad. Publ., Dordrecht, 2000).
2. T.W.B. Kibble, J. Phys. A 9, 1387 (1976); Phys. Rep. 67, 183 (1980); A. Vilenkin, Phys. Rep. 121, 264 (1985).
3. P. Higgs, Phys. Rev. 145, 1156 (1966).
4. T.W.B. Kibble, Phys. Rev. 155, 1554 (1967).
5. C. Itzykson and J.B. Zuber, Quantum Field Theory (McGraw-Hill Book Co., N.Y., 1980).
6. E. Del Giudice and G. Vitiello, Phys. Rev. A 74, 022105 (2006).
7. E. Del Giudice, G. Preparata and G. Vitiello, Phys. Rev. Lett. 61, 1085 (1988).
8. C.C. Gerry and P.L. Knight, Introductory quantum optics (Cambridge University Press, Cambridge, 2005).
9. W. Heitler, The Quantum theory of radiation (Clarendon Press, 1954).
10. L. Leplae and H. Umezawa, Nuovo Cimento 44, 410 (1966).
11. E. Alfinito, O. Romei and G. Vitiello, Mod. Phys. Lett. B 16, 93 (2002).
12. E. Alfinito and G. Vitiello, Phys. Rev. B 65, 054105 (2002).
13. E. Del Giudice, R. Manka, M. Milani and G. Vitiello, Phys. Lett. B 206, 661 (1988).
14. A. Beige, P.L. Knight, G. Vitiello, New J. Phys. 7, 96 (2005).
15. T.W.B. Kibble, in Topological defects and the non-equilibrium dynamics of symmetry breaking phase transitions, NATO Science Series C 549, Ed. Y.M. Bunkov and H. Godfrin (Kluwer Acad. Publ., Dordrecht, 2000), p. 7.
16. W.H. Zurek, Phys. Rep. 276, 177 (1997) and refs. therein.
COGNITIVE SCIENCE
ORGANIZATIONS AS COGNITIVE SYSTEMS. IS KNOWLEDGE AN EMERGENT PROPERTY OF INFORMATION NETWORKS?
LUCIO BIGGIERO
University of L'Aquila, Piazza del Santuario 19, Roio Poggio, 67040, Italy
E-mail: [email protected]; [email protected]
The substitution of knowledge for information as the entity that organizations process and deliver raises a number of questions concerning the nature of knowledge. The dispute on the codifiability of tacit knowledge, and the one juxtaposing the epistemology of practice and the epistemology of possession, can be better faced by revisiting two crucial debates: one concerns the nature of cognition, the other the famous mind-body problem. Cognition can be associated with the capability of manipulating symbols, as in the traditional computational view of organizations; with interpreting facts or symbols, as in the narrative approach to organization theory; or with developing mental states (events), as argued by the growing field of organizational cognition. Applied to the study of organizations, the mind-body problem concerns the possibility (if any) and the forms in which organizational mental events, like trust, identity, cultures, etc., can be derived from the structural aspects (technological, cognitive or communication networks) of organizations. By siding at extreme opposite positions, the two epistemologies appear irreducible to one another, and each pays for its inner consistency with remarkable difficulties in describing and explaining some empirical phenomena. Conversely, by legitimating the existence of both tacit and explicit knowledge, by emphasizing the space of human interactions, and by assuming that mental events can be explained by the structural aspects of organizations, Nonaka's SECI model seems an interesting middle way between the two rival epistemologies.
Keywords: cognition, emergent properties, knowledge, mental states, organization.
1. Introduction
A growing concern about knowledge, information and data as crucial competitive factors and main drivers of social development has obscured the differences among them. The possibility of creating and transferring them within and between organizations, with or without the intervention of knowledge management systems, was largely taken for granted. In this “epistemology of possession” (Cook and Brown, 1999) there are (if any) only slight differences among knowledge, information and data, and all of them can be considered “things” producible and transferable within and between organizations. This
view has been challenged by a different approach, which started as an underground and minority position and has now reached the surface and the legitimation of an alternative paradigm. In this “epistemology of practice” knowledge appears as radically different from information and data, and it refers to the action of knowing rather than to an object. These rival views have many implications at the theoretical, empirical and managerial levels. It is not just a question of replacing information with knowledge, and of considering organizations as knowledge processors instead of information processors. Once it is stated that organizations are cognitive systems (and even this assumption is questionable), their properties should be investigated and become a disputable matter, because they depend on what “cognition” means. In extreme synthesis, cognition can be marked by three types of capabilities, listed in the following order of growing complexity:
A) the recognition, manipulation and production of those special sensorial data which are symbols;
B) the interpretation of symbols, objects and events;
C) the manifestation and activation of mental states, which usually are identified with speech acts, intentionality, emotions, purposive behavior.
The two epistemologies take a univocal position for, respectively, the “A” and the “C” alternatives. In the epistemology of possession it is admitted that even artificial cognitive agents can create and transfer knowledge, and the peculiar nature of tacit knowledge is substantially denied. However, this epistemology has been heavily criticized for not taking into account the specificity and the complexity of human interaction (Richardson, 2005; Tsoukas, 2005). Moreover, the theories consistent with that epistemology do not explain well the dynamics of competitiveness of firms and territorial systems (Amin and Cohendet, 2004; Nightingale, 2003). The epistemology of practice states that only individuals and human organizations can create knowledge, and that tacit knowledge is irreducible to explicit knowledge, whose very existence is questioned. However, the epistemology of practice has serious problems in explaining such irreducibility and how tacit knowledge can be stored and transferred. Other approaches allow differentiating the required capabilities between the activities of knowledge creation and transfer, and between the creation of tacit and explicit knowledge. Nobody contends that cognitive agents can have different cognitive capabilities, but the core question is: “where does cognition start from?” In other words, does a minimum threshold exist for detecting the presence of cognitive capabilities, or is it just a question of degree, according to which our refrigerator,
being designed on a feedback mechanism, would be a very simple but nonetheless cognitive agent? If we keep the highest threshold (the “C” capability), would agent-based simulation models show mental properties? They certainly have “A”, and likely also “B” properties, and further they can be self-organizing and learning systems. Is all this enough to say that they have mental states? The debate on the nature of tacit knowledge and the limits of codification is fully immersed in the previous issues. The supporters of the epistemology of practice, though claiming a non-reified nature of knowledge, often treat knowledge transfer in a rather traditional way, thus raising the question: transfer of what? The confrontation of the two epistemologies raises a number of other questions: is the juxtaposition between the two paradigms so radical as to prevent any compromise position? If so, what would be the practical consequences for management and organization science? In particular, would it still make sense to speak about knowledge management systems? If so, in which terms? What could such systems eventually manage? The answers can be found only by revisiting and facing two crucial debates which have developed during the last decades, but which have received little attention from economists and organization scholars. The problem of the nature of cognition and the mind-body problem ran parallel in the second half of the last century, though the former dates back much longer, to Cartesian philosophy and, to some extent, to ancient Greek and Indian philosophy. The problem of the nature of cognition concerns what “thinking” and computation mean and, paradigmatically, whether computers can think. In organization science, given that knowledge creation is the peculiarity of cognitive (thinking) agents, the issue immediately affected by this debate refers to the eventual differences between data, information and knowledge. The mind-body problem deals with the relationship between the physical and the mental states (events) in cognitive systems. It gives indications for the question whether organizations can be considered collective minds, and thus whether they can have mental states, and whether they can be assigned socio-cognitive and social-psychological properties, like intentions, identity, trust, reputation, etc. Both debates (cognition and mind-body) supply insights on the nature of tacit knowledge and its codifiability, and on the related issue of a theory of knowledge management systems. The confusion, difficulties and ambiguities marking the plethora of positions in these fields come from the illusion that it would have been possible to avoid the confrontation with those fundamental issues. The major aim of this paper is to show the strong connections between them and the problems of theorizing in organizational cognition, in simulation modeling, in firm or
territory competitive analysis, and finally in knowledge management systems. Here of course we cannot revisit both problems extensively, but just refer to them for what matters most when considering organizations as cognitive systems. In the next section the problem of the nature of cognition is discussed, showing the differences between (old and new) cognitivists and constructivists. In the third section the mind-body problem is addressed and applied to the issue of organizational knowledge creation and transfer; a correspondence is proposed between the philosophical approaches and the main current positions in economics and organization science. In the fourth section the codification dispute is addressed and referred to the different positions occurring in the debate on the nature of cognition and the mind-body problem; it is shown how the seemingly distant question of intentionality gives interesting suggestions for the codification problem. The constraining consequences of the rival epistemologies for the crucial question of knowledge transfer are discussed in section five. Finally, in the sixth section some implications for organizations, territorial systems, and knowledge management systems are developed. The issue of complexity is not treated separately, but it will become evident how it crosses all the others. The discussion of this paper is conducted mostly at the organizational level, with a few indications for the individual and the inter-organizational (and territorial) levels.
2. Constructivism vs. cognitivism and connectionism
Winograd and Flores (1986), Varela et al. (1991), Varela (1992), and Venzin et al. (1998) look at the debate on artificial intelligence as a key reference for understanding knowledge in organizations. They identify three epistemological positions: the cognitivist, the connectionist, and the autopoietic. In the field of economics and organization science the first perspective is well represented by Simon (1969, 1977, 1997; March and Simon, 1958), Galbraith (1973) and non-evolutionary economists, while the second has been developed in different ways by Nelson and Winter (1982), Kogut and Zander (1992), Kogut (2000), Cohendet and Llerena (2003), and Monge and Contractor (2003). Connectionists differ from cognitivists in that information and knowledge are supposed to be distributed within organizations and computed in parallel, instead of centralized and computed sequentially. Moreover, behaviors are embedded into a set of routines and rules, which are held together by means of socio-economic relationships. Indeed, the cohort of “old cognitivists” has now been replaced by “new cognitivists”, who can be considered as one single group with
connectionists. The distinction between old cognitivists and connectionists (or “new cognitivists”) is addressed by Casti (1989), who suggests identifying them with the supporters of, respectively, the strong and the weak program of artificial intelligence, or even the top-down and the bottom-up approaches to artificial intelligence. In the autopoietic perspective information is seen as interpreted data, and as such it cannot cross organizational borders. Only data can do that. As for knowledge, while in the connectionist position it depends on the organizational network, in the autopoietic epistemology “knowledge is always private”. Magalhães (1998) underlines the difference between data and information, which in its essence corresponds to that between syntax and semantics: by simply manipulating data it is never possible to get information. It is necessary to interpret data through meanings. Human interactions create knowledge, which is seen as a process and not as an object. From knowledge to knowing (Orlikowski, 2002). Magalhães (1998) and Venzin et al. (1998) argue that this autopoietic viewpoint is well represented by Nonaka, Nishiguchi and Takeuchi (Nonaka and Nishiguchi, 2001; Nonaka et al., 1998; Nonaka and Takeuchi, 1995), but indeed these authors consider explicit knowledge as one of the forms in which knowledge can not only be obtained or transferred, but also created. On the contrary, for the supporters of the autopoietic view, explicit (codified) knowledge is an oxymoron (Zeleny, 2005). Knowledge would be associated only with tacitness, which, in its own turn, is seen in the action of knowing and not as a state, as something which can be possessed. The supporters of autopoiesis differ quite a bit with regard to data and information. Some underline the distinction between information and knowledge (Zeleny, 2000, 2005), thus assuming the conventional view that information is a sort of structured data. Others (Aadne et al., 1996; Magalhães, 1998; Venzin et al., 1998; Von Krogh et al., 1996) make a sharp distinction already between data and information, the latter being seen as interpreted data. Besides the criticisms that Biggiero (2001) [4] leveled at the fundamental argument that organizations are autopoietic systems, one of the problems in this debate is that constructivists often attribute to cognitivists overly naive or positivist epistemological positions. Although it is not impossible to find such positions, it is too easy a game to assign cognitivists the most traditional view of extreme rationalism and positivism. That reality is subjectively perceived does not seem a concept so hard to accept for new cognitivists (Biggiero, 2001 [5]). By fighting against the idea of objective perceptions and observers, and by emphasizing the issue of self-reference, constructivists seem to “force an open door”.
Approaches close to social constructivism (Berger and Luckmann, 1967; Brown and Duguid, 1991, 1998; Gherardi, 2001; Organization, 2000; Orlikowski, 2002; Weick, 1969, 1995) or to cybernetic constructivism (Magalhães, 1998; Mingers, 1995; Varela et al., 1991; Von Krogh et al., 1998; Watzlawick, 1984; Yolles, 2006; Zeleny, 2005) belong to the epistemology of practice. If cognition were identified only with mental states (the “C” capability), and these were allowed to pertain only to humans, and, finally, if methodological individualism were accepted while connectionism rejected, then organizations would not be seen as cognitive systems. While we have many contributions towards a theory of organizational knowledge creation, with few exceptions (Yolles, 2006; Zeleny, 2005) we lack indications for knowledge management systems, assuming that such systems are possible and would not result in another oxymoron. If we come back to the graduated scale of cognition, from the symbol-processing capability to the property of exhibiting and performing mental states, we see that cognitivists and constructivists lie at the opposite extremes: the former tend to consider symbol processing as a full sign of cognitive ability, while constructivists tend to limit it to systems able to exhibit and perform intentionality. Although not yet developing a theory of organizational knowledge creation, the supporters of social simulation through artificial societies are naturally consonant with the epistemology of possession.
3. The mind-body problem
As is well known, this problem concerns the relationship between the physical and the mental states of humans (Guttenplan, 1994; Haugeland, 1981) and, we could say by extension, of any cognitive system. It is a very old question, whose controversial development in a systematic way dates back at least to Descartes, who first proposed his famous ontological dualism: thought and consciousness derive from mind, which has a different substance from body. There are no laws connecting the phenomena generated by them. Thus, the original position denies any relationship between physical and mental states. Cartesian ontological dualism is anything but anachronistic in the social sciences, and specifically in economics. The technological and economic structure of organizations has little or nothing to do with their socio-cognitive and social-psychological aspects, like identity, reputation, emotion, etc. Even more radically, the inner structure of organizations was regarded as a black box preventing investigations. Non-economic or non-technological variables were neglected or excluded from the attention of economists. This view is still the mainstream, but the evolutionary
theory of the firm and organizational economics have decided to open the black box, and at least to recognize the existence of the two spheres of physical (structural) and mental events. The second interesting position to be listed here is eliminative materialism (Churchland, 1981), which denies any peculiar existence to mental events. They are pure appearance; there is only body. This view can be found in current organizational economics too, for instance when trust is treated as a phenomenon masking calculativeness (Williamson, 2002). Trust would be considered just and purely as an unclear term indicating risk and agents' ability to calculate their most convenient decisions. An analogous treatment is reserved for “epiphenomena” like reputation, identity, etc. Management and organization science do not follow this line of denying real existence to mental states, though they are faced by unilateral approaches reflecting single specialized disciplines, like accounting, marketing, etc. Thus, it is possible to find an accounting perspective on reputation, where this phenomenon is looked at and measured essentially in terms of financial results. However, the difference with economics is that, even in these partial approaches, the complex nature of mental events is recognized, and these events are further supposed to feed back on accounting or other variables as well. What has long been (and partly still is) debated is whether mental states and intentionality can be assigned at the organizational level, rather than restricted to the individual level, and ultimately whether organizations are cognitive systems. According to methodological individualism the answer is negative, because these are properties peculiar only to individuals. This is also the dominant position in economics, with some differentiation in evolutionary economics. Conversely, in management and organization science that possibility is increasingly accepted, and the field of study of organizational cognition, identity, trust, reputation, etc. is consolidating. Even more, and following the same idea, these concepts are beginning to be applied also at the inter-organizational and territorial levels (Biggiero and Sammarra, 2003; Sammarra and Biggiero, 2001). The third perspective is reductionism, according to which the physical and the mental states have their own ontologies and proper concepts and theories, but the mental derive from the physical states. This is currently the dominant view in natural and social sciences at the individual level, with differentiated positions concerning the organizational level. Reductionism has many variants, among which two matter most for our discussion. Both can be referred to the theory of tokens, which states that to each mental token there corresponds a specific and unique physical token. The two variants differ in the epistemological relationships between physical and mental events. In one variant mental events
are totally predictable by studying the corresponding physical events, while in the other variant such predictability is denied. The latter is the position of Davidson (1980), who argues that the predictability of mental from physical events is precluded because psychology, he says, is not a science but just a categorization and rationalization of behavior. The ontological reducibility of mental to physical events is complete, but the nomological reducibility is definitely incomplete. Physical and mental events are mutually influencing, but the former have a major autonomy, because there can be physical without mental states but not vice versa. Now, besides the philosophical implications and the disputable judgment on the epistemological status of psychology (MacDonald and MacDonald, 1995), Davidson's view, once transferred to the field of organizational cognition, is interesting for our discussion. It would suggest that, although mental events and knowledge as practice come from the cognitive patterns of human interactions within an organization, the created knowledge would remain not completely available, traceable and explainable. As we will see in the next section, this could be the philosophical support for an explanation of why part of knowledge, which can be identified with tacit knowledge, will never be reduced to bytes or otherwise codified.
4. The codification dispute
One of the main results of developing an evolutionary theory of the firm has been that of replacing the view of firms as information processors with a view of firms as knowledge processors (Amin and Cohendet, 2004). This change has had very valuable outcomes: in better understanding firms' behavior, in linking routines and decision making, and ultimately firm competitiveness, to capabilities and learning processes, and finally in reducing the gap between economists and management scholars by looking inside the black box. However, the lack of a clear distinction between information and knowledge, or the reduction of knowledge to systematized or structured information, diminishes the impact of the evolutionary theory of the firm. The current debate on tacit knowledge and its codifiability mirrors this state of ambiguity and confusion. Connectivists and constructivists seem to share the same reductionist approach to the mind-body problem: mental states can be reduced to and explained by the underlying cognitive networks. However, the measure and extension of reduction and explanation can vary to a large extent. It can be almost substantially eliminated by constructivists, or conversely viewed as a strict mapping by some connectivists. The latter is the position taken by many
economists in the current debate on the possibility of codifying tacit knowledge (Cowan, 2001; Cowan et al., 2000; David and Foray, 1995; Foray and Cowan, 1995). They argue that in principle it is always possible for any practice to generate a codebook, which would gather all the necessary tacit knowledge, making it explicit and consequently transferable. The concrete production of such codebooks would depend on economic convenience in a broad sense. Nightingale (2003) suggests an intermediate view between constructivists and connectivists. He shares the connectivist idea of cognitive networks as the generators of mental states, language, and tacit and explicit knowledge. Thus, he accepts the existence of both a reified and an interactive form of knowledge, and he also agrees to consider tacit knowledge as resident in cognitive patterns. He suggests justifying the existence and non-codifiability of tacit knowledge by placing it partially in unconsciousness and in that part of conscious mental states where non-verbal knowledge resides. The rationale comes from merging the studies on neural networks and consciousness by Damasio (1994, 1999) and Edelman (1987, 1989, 1992) with those on intentionality, language and consciousness by Searle (1983, 1993, 1995, 1998). This way, tacit knowledge remains non-reducible to codebooks, but at the same time it escapes the mist of something non-definable and pseudo-scientific. Nightingale's position appears consistent with Davidson's approach to the mind-body problem. The seeming similarity between connectivists (or new cognitivists) and constructivists, in considering cognitive systems as composed of cognitive networks and their mental states as characterized in terms of emergent properties, can disappear when looking at the different outcomes of their approaches to the nature of knowledge and the mind-body problem. For connectivists tacit knowledge would present no peculiar status preventing its codification into explicit knowledge. At the very end, it would be just a question of bytes. Moreover, knowledge and mental states can be precisely, though with difficulty, understood and predicted by studying the structure and evolution of the cognitive networks which are supposed to produce them. Finally, knowledge is reified to the status of an object, like a database or a book. “The range of knowledge is much greater than the range of action” (Carley, 1999: 8), while for most constructivists in practicing there is much more knowledge than in explicit knowledge. According to connectivists and new cognitivists, artificial cognitive agents can create knowledge, because for cognition just the minimum capability (the alternative “A”) is required and, due to their recursive cognitive patterns, such agents are supposed to generate (or simulate) mental events. Not being able to use natural language and, from an evolutionary point of view, being at an infant stage, their thinking is poor, at least if compared with that of humans. Agent-based
simulation models (Conte and Castelfranchi, 1995; Gilbert and Terna, 2000; Gilbert and Troitzsch, 2005; Hegselsmann et al., 1996; Pitt, 2005; Sichman et al., 1998) are interesting cases for this issue. Indeed, most theorists in this field are perfectly aware of the social nature of information and knowledge, albeit they usually do not make sharp distinctions between the two. Some of them are fully engaged in steering their scientific communities away from purely computational approaches (Castelfranchi, 2002). When their simulation models are built in an emergentist way, that is, when their agents are able to see their own individual and/or aggregate behavior, they possess both the property of self-reference and that of emergent cognition. Thus, they "think" and create knowledge. Is there any tacit knowledge? Computational scientists engaged in social simulation would answer this question positively as well (Falcone et al., 2002). When models are complex enough, it would be possible to obtain the knowledge-creation effects of collective behaviors while lacking the possibility of tracing exactly when, where, and how they occurred. If tacit knowledge is considered as a new form of the "ghost in the machine", it would be much closer to the supposed ghost operating in human interactions. Some computational scientists (Carley, 1999; Krackhardt, 1992, 1995) often take an even more extreme position than "simple" cognitivists by assigning cognitive properties also to pure symbol repositories, like databases or books. Conversely, if computers or artificial cognitive networks were denied the capacity to think and create knowledge, or if some form of ontological dualism or anti-reductionism between physical and mental states were assumed, then any social simulation obtained by designing and "running" artificial societies would be totally meaningless. On the other hand, since artificial cognitive networks are based on bytes and (seemingly parallel) computation, if the previous conditions were reversed, then explicit knowledge would make sense. Tacit knowledge could be interpreted as residual knowledge, which would be measurable and explainable but not detectable and codifiable, because embedded in the processes and "ecologies" (Carley, 1999) of interacting patterns. Nonaka seems to hold an intermediate position, which admits the existence of both tacit and explicit knowledge. According to the SECI model, one of the main goals of knowledge management systems is precisely to enable the formation of a space of interaction in which knowledge can be easily created and converted from one form to another.
5. Knowledge transfer

Although related, the questions of creation and transfer can also be considered separately: a system could be able to transfer but not to create knowledge. For example, if knowledge were conceived in both its explicit and tacit forms, but at the same time the creation of knowledge were associated only with the highest cognitive capability (the "C" category of cognition), then a simple information technology platform for knowledge management would be a system able to transfer but not to create knowledge. It is noteworthy that this possibility is also allowed by Nonaka's SECI model. The reverse is apparently harder to imagine, because intuitively the capacity to create seems more sophisticated than the capacity to transfer, so one may think that if a system is able to create knowledge, then it is also able to transfer it. However, this logical relationship is not so certain, and it depends on the categorization of knowledge. For instance, if knowledge is narratives, if it is action in progress, if it is knowing, how can it possibly be transferred? For radical constructivists the non-reification of knowledge prevents the possibility of transfer, regardless of how distant the supposed senders and receivers are. This limit holds even in the face-to-face relationship between a teacher and her disciples (Von Glasersfeld, 1995): knowledge not being an object, nothing can be transferred. This is a very crucial issue, because even those who draw their positions from some form of constructivism do accept that possibility when dealing with the problem of transfer. Indeed, almost all the economic, management, and sociological literature on territorial systems, inter-firm networks, innovations and strategies focuses on, and takes for granted, the possibility of knowledge transfer. It would be quite uncomfortable to have a theory of knowledge creation that implies the impossibility of transferring that knowledge. The epistemology of possession assumes that all three entities are transferable, possibly with some difficulty when using computer-mediated communication technologies and/or when treating tacit knowledge. Indeed, in this view of tacitness there is no impossibility, but just inefficiency (difficulty) in terms of computational or economic resources. Its supporters could also argue that, when knowledge is well codified, computer-mediated communication technologies are more, and not less, efficient than face-to-face interaction. These positions are firmly rejected by constructivists (Magalhães, 1998; Venzin et al., 1998). It is hard to believe that explicit knowledge does not require the intervention of tacit knowledge to produce any useful and effective outcome: no amount of cookbooks guarantees being a good cook. A reason of interest for
Nonaka's SECI model is that it admits the existence of tacit and explicit knowledge and considers them as two complementary forms. Nonaka et al. (2006) seem particularly aware of the problems connected with the definition of knowledge and cognition, and they try to find a way to make their model consistent with constructivism. In their model, Ba represents the contextual condition for the most critical operations of transformation between the various forms of knowledge. It could be seen as covering the area between unconscious and conscious actions where Nightingale (2003) places the limits of codifiability, and where Davidson (1980) could put the breach in the reducibility of mental events to physical events. The reinterpretation of the Japanese concept of Ba refers to that space of interaction where gnosiological and semiotic complexity (Biggiero, 2001 [6]) move perception and cognition into the sphere of the unconscious and of non- or para-verbal knowledge.

6. Conclusions

In the light of the debates on the nature of cognition and on the mind-body problem, the two rival epistemologies of possession and practice can be updated and better reformulated by underlining their implications. In the former perspective: (i) cognition is the emergent outcome of complex adaptive (partially self-referential) networks; (ii) knowledge can also be explicit; (iii) artificial societies can have mental states; and (iv) tacit knowledge is, at least in principle, completely codifiable. Conversely, in the epistemology of practice: (i) cognition is associated with mental states in a one-to-one relationship; (ii) explicit knowledge is an oxymoron; (iii) artificial societies, being composed of non-cognitive agents, do not have mental states and thus cannot simulate any mental event in a satisfying way; and (iv) (tacit) knowledge is a peculiarity of organizations and is irreducible to information and data, because it is qualitatively different from them. The two epistemologies appear counterposed and have even opposite implications. However, especially but not only in organization science, we are in a rather paradoxical situation. On one side, the epistemology of practice is recruiting more and more adherents, giving a strong and exciting impulse to the understanding of organizational mental events. On the other side, the computational approach to social science has been renewed by the development and the extraordinary heuristic power of agent-based simulation models. But, as we have seen, this perspective finds its scientific sense only within the epistemology of possession. Moreover, many social simulation scientists are supposed to be
absolutely sensitive to the question of organizational mental events. Thus, it seems that the research agenda of the coming years has to cope with this question. Being a sort of third way between the two epistemologies, Nonaka's SECI model is an interesting point of reference. It has the great merit of taking into account (or at least of being open to consider the relevance of) organizational mental events, of legitimating the existence of explicit knowledge, and of admitting that computational social simulation models can be cognitive systems in the highest sense. Its weak point is that it lacks a clear theory explaining where cognition comes from, what gives space to tacit knowledge, and to what extent mental events can be derived from organizational structure. In short, it is necessary to take clear positions with respect to the two problems of the nature of cognition and the mind-body relationship.

References

1. J.H. Aadne, G. Von Krogh and J. Roos, in Managing Knowledge. Perspectives on Cooperation and Competition, Ed. G. Von Krogh and J. Roos, (Sage, London, 1996), pp. 9-31.
2. A. Amin and P. Cohendet, Architectures of Knowledge, Firms, Capabilities and Communities (Oxford UP, Oxford, 2004).
3. P. Berger and T. Luckman, The Social Construction of Reality (Allen Lane, London, 1967).
4. L. Biggiero, in Sociocybernetics, Complexity, Autopoiesis and Observation of Social Systems, Ed. G. van der Zouwen and F. Geyer, (Greenwood, Westport (CT), 2001), pp. 125-140.
5. L. Biggiero, Systemica 12, 23-37 (2001) (reprinted in LUISS International Journal, 2000).
6. L. Biggiero, Nonlinear Dynamics and Chaos in Life Sciences 5, 3-19 (2001).
7. L. Biggiero, Entrepreneurship & Regional Development 18(6), 1-29 (2006).
8. L. Biggiero and A. Sammarra, in The Net Evolution of Local Systems, Knowledge Creation, Collective Learning and Variety of Institutional Arrangements, Ed. F. Belussi, G. Gottardi and E. Rullani, (Kluwer, Amsterdam, 2003), pp. 205-232.
9. J.S. Brown and P. Duguid, Organization Science 2(1), 40-47 (1991).
10. J.S. Brown and P. Duguid, California Management Review 40(3), 90-111 (1998).
11. J.S. Brown and P. Duguid, The Social Life of Information (Harvard Business School Press, Boston, 2000).
12. K.M. Carley, Research in the Sociology of Organizations 16, 3-30 (1999).
13. K.M. Carley and M. Prietula, Eds., Computational Organization Theory (Lawrence Erlbaum Associates, Hillsdale (NJ), 1994).
14. C. Castelfranchi, International Journal of Cooperative Information Systems 11, 381-403 (2002).
15. J.L. Casti, Paradigms Lost (Avon Books, NY, 1989).
16. P.M. Churchland, Journal of Philosophy 78, 67-90 (1981).
17. P. Cohendet and P. Llerena, Industrial and Corporate Change 12(2), 271-297 (2003).
18. R. Conte and C. Castelfranchi, Cognitive and Social Action (UCL Press, London, 1995).
19. S.D.N. Cook and J.S. Brown, Organization Science 10(4), 381-400 (1999).
20. R. Cowan, Research Policy 23(9), 1355-1372 (2001).
21. R. Cowan, P. David and D. Foray, Industrial and Corporate Change 9 (2000).
22. R. Cowan and D. Foray, Industrial and Corporate Change 6, 592-622 (1997).
23. A. Damasio, Descartes' Error, Emotion, Reason and the Human Brain (Putnam, NY, 1994).
24. A. Damasio, The Feeling of What Happens, Body and Emotion in the Making of Consciousness (William Heinemann, London, 1999).
25. D. Davidson, Essays on Actions and Events (Clarendon Press, Oxford, 1980).
26. G. Edelman, Neural Darwinism, The Theory of Neuronal Group Selection (Basic Books, NY, 1987).
27. G. Edelman, The Remembered Present, A Biological Theory of Consciousness (Basic Books, NY, 1989).
28. G. Edelman, Bright Air, Brilliant Fire, On the Matter of the Mind (Basic Books, NY, 1992).
29. R. Falcone, M. Singh and Y. Tan, Eds., Trust in Cyber-societies, Integrating the Human and Artificial Perspectives (Springer, NY, 2002).
30. D. Foray and R. Cowan, Industrial and Corporate Change 6, 595-622 (1997).
31. B. Gallupe, International Journal of Management Reviews 3(1), 61-77 (2001).
32. J. Galbraith, Designing Complex Organizations (Addison-Wesley, Reading (MA), 1973).
33. S. Gherardi, Human Relations 9 (2001).
34. N. Gilbert and P. Terna, Mind & Society 1, 57-72 (2000).
35. N. Gilbert and K.G. Troitzsch, Simulation for the Social Scientist (Open University, Buckingham, 2005).
36. S. Guttenplan, Ed., A Companion to the Philosophy of Mind (Blackwell, Oxford, 1994).
37. J. Haugeland, Ed., Mind Design. Philosophy, Psychology, Artificial Intelligence (The MIT Press, Cambridge (MA), 1981).
38. R. Hegselsmann, U. Mueller and K.G. Troitzsch, Eds., Modelling and Simulation in the Social Sciences from the Philosophy of Sciences Point of View (Kluwer Academic, Dordrecht, 1996).
39. B. Kogut, Strategic Management Journal 21, 405-425 (2000).
40. B. Kogut and U. Zander, Organization Science 3, 383-397 (1992).
41. D. Krackhardt, in Networks and Organizations, Structure, Form and Action, Ed. N. Nohria and R. Eccles, (Harvard Business School Press, Boston (MA), 1992).
42. D. Krackhardt, Entrepreneurship Theory and Practice 19, 53-69 (1995).
43. J. Liebowitz, Ed., Knowledge Management Handbook (CRC Press, London, 1999).
44. G. MacDonald and C. MacDonald, Eds., Connectionism, Debates on Psychological Explanation (Blackwell, London, 1995).
45. R. Magalhães, in Knowing in Firms. Understanding, Managing and Measuring Knowledge, Ed. G. Von Krogh, J. Roos and D. Kline, (Sage, London, 1998), pp. 87-122.
46. J.G. March and H.A. Simon, Organizations (revised edition) (Wiley, NY, 1958).
47. J. Mingers, Self-Producing Systems. Implications and Applications of Autopoiesis (Plenum Press, NY, 1995).
48. P.R. Monge and N.S. Contractor, Theories of Communication Networks (Oxford UP, Oxford, 2003).
49. R.R. Nelson and S. Winter, An Evolutionary Theory of Economic Change (Belknap Press of Harvard UP, Cambridge (MA), 1982).
50. P. Nightingale, Industrial and Corporate Change 12, 149-183 (2003).
51. I. Nonaka and T. Nishiguchi, Eds., Knowledge Emergence. Social, Technical and Evolutionary Dimensions of Knowledge Creation (Oxford UP, Oxford, 2001).
52. I. Nonaka and H. Takeuchi, The Knowledge-Creating Company (Oxford UP, NY, 1995).
53. I. Nonaka, K. Umemoto and K. Sasaki, in Knowing in Firms. Understanding, Managing and Measuring Knowledge, Ed. G. Von Krogh, J. Roos and D. Kline, (Sage, London, 1998), pp. 146-172.
54. I. Nonaka, G. Von Krogh and S. Voelpel, Organization Studies 27(8), 1179-1208 (2006).
55. Organization, Special Issue on Knowing in Practice (2000).
56. W.J. Orlikowski, Organization Science 13(3), 249-273 (2002).
57. J. Pitt, Ed., The Open Agent Society, Normative Specifications in Multi-agent Systems (Wiley & Sons, NY, 2005).
58. K. Richardson, Managing Organizational Complexity. Philosophy, Theory, Application (IAP, Greenwich (CT), 2005).
59. A. Sammarra and L. Biggiero, Journal of Management and Governance 5, 61-82 (2001).
60. J.R. Searle, Intentionality (Cambridge UP, Cambridge, 1983).
61. J.R. Searle, The Rediscovery of the Mind (MIT Press, Cambridge (MA), 1993).
62. J.R. Searle, The Construction of Social Reality (Free Press, NY, 1995).
63. J.R. Searle, Mind, Language and Society. Philosophy in the Real World (Basic Books, NY, 1998).
64. J.S. Sichman, R. Conte and N. Gilbert, Eds., Multi-agent Systems and Agent-based Simulation (Springer, Berlin, 1998).
65. H.A. Simon, The Sciences of the Artificial (MIT Press, Cambridge (MA), 1969).
66. H.A. Simon, Models of Discovery (Reidel, Dordrecht, 1977).
67. H.A. Simon, Models of Bounded Rationality, Vol. 3, Empirically Grounded Economic Reason (The MIT Press, NY, 1997).
68. D.S. Staples, K. Greenaway and J.D. McKeen, International Journal of Management Reviews 3(1), 1-20 (2001).
69. H. Tsoukas, Complex Knowledge. Studies in Organizational Epistemology (Oxford UP, Oxford, 2005).
70. F.J. Varela, in Understanding Origins, Contemporary Views on the Origin of Life, Mind and Society, Ed. F. Varela and J. Dupuy, (Kluwer Academic, Dordrecht, 1992), pp. 235-263.
71. F.J. Varela, E. Thompson and E. Rosch, The Embodied Mind. Cognitive Science and Human Experience (MIT Press, Cambridge (MA), 1991).
72. M. Venzin, G. von Krogh and J. Roos, in Knowing in Firms. Understanding, Managing and Measuring Knowledge, Ed. G. Von Krogh, J. Roos and D. Kline, (Sage, London, 1998), pp. 26-66.
73. H. Von Foerster, Observing Systems (Intersystems Publications, Seaside, 1982).
74. H. Von Foerster, in Self-Organization and Management of Social Systems, Ed. U. Ulrich and G.J.B. Probst, (Springer, NY, 1984), pp. 2-24.
75. H. Von Foerster, Understanding Understanding. Essays on Cybernetics and Cognition (Springer, NY, 2003).
76. E. Von Glasersfeld, Radical Constructivism. A Way of Knowing and Learning (The Falmer Press, London, 1995).
77. G. Von Krogh, J. Roos and K. Slocum, in Managing Knowledge. Perspectives on Cooperation and Competition, Ed. G. Von Krogh and J. Roos, (Sage, London, 1996), pp. 157-183.
78. G. Von Krogh, J. Roos and D. Kline, Eds., Knowing in Firms. Understanding, Managing and Measuring Knowledge (Sage, London, 1998).
79. P. Watzlawick, Ed., The Invented Reality (Norton, NY, 1984).
80. K.E. Weick, The Social Psychology of Organizing (Award Records Inc., Newberry, 1969).
81. K.E. Weick, Sensemaking in Organizations (Sage, London, 1995).
82. T. Winograd and F. Flores, Understanding Computers and Cognition. A New Foundation for Design (Ablex Publishing Co., NJ, 1986).
83. M. Yolles, Organizations as Complex Systems. An Introduction to Knowledge Cybernetics (IAP, Greenwich (CT), 2006).
84. M. Zeleny, in The IEBM Handbook of Information Technology in Business, Ed. M. Zeleny, (Thomson Learning, Padstow (UK), 2000), pp. 162-168.
85. M. Zeleny, Human Systems Management. Integrating Knowledge, Management and Systems (World Scientific, London, 2005).
COMMUNICATION, SILENCE AND MISCOMMUNICATION

MARIA PIETRONILLA PENNA (1), SANDRO MOCCI (1,2), CRISTINA SECHI (1)
(1) Univ. degli Studi di Cagliari, Fac. di Scienze della Formazione, Dip. Psicologia
(2) Univ. degli Studi di Cagliari, Fac. di Scienze della Formazione, Dip. Studi Filosofici
Email: [email protected], [email protected], [email protected]

The classical theories of communication hold different views on the relevance of the requirement of intentionality on the part of the communicative agent. Composing these views appears problematic, since it leads to incompatible outcomes when we try to classify communicative behaviors. Some approaches, in order to build a synthesis, shift onto the addressee the task of detecting intentionality, and thus cannot account for a number of interesting communicative phenomena. The systemic perspective, instead, through the circularity of the inferences on the system elements and the sharing of the attributes and overall communicative characteristics of the system, defines, specifies, and generalizes the concept of communication, making it possible to better single out the variety of phenomena connected to it and to capture the emergence of their communicative value.

Keywords: communication, communicative system, inferential circularity, silence.
1. The multidimensionality of human communication

Human communication is a multidimensional activity: it has cognitive, social, cultural, economic, and political dimensions, and it is strictly connected to action. Therefore we must deal with communication from different perspectives, according to the peculiar concern of the specific thematic area that frames the analysis. As a matter of fact, each perspective tends to define communication in the light of its own disciplinary interests and of the phenomena included in its domain. That is why there is no univocally accepted definition of communication. Unfortunately, it is not only a matter of definition: the issue is also linked to the contents and specific features of the communicative act. Every approach tends to regard as communicative those phenomena which appear meaningful and functional from its own point of view. This fact has produced, and still produces, much confusion. Besides, the concept of communication is heavily dependent on the (mostly recent) history of the attempts made to define the concept itself. For example, Shannon and Weaver's mathematical theory basically identifies communication with a unilateral transmission of
information. The later integration of the concept of feedback could have constituted an element of generalization in the definition of the communicative act, since it implied the concept of exchange; but there exchange is used only to regulate the informative flow. Since then, the scheme based on directionality has continued to be attached to communication, conceived as addressed from a transmitter (Tx) to a receiver (Rx) and vice versa through a specific channel. In this way communication has lost the sense of diffusion, of putting the context in "communis", the sense of sharing and of circularity. The requirement of purposiveness is another example of strong conceptual conditioning. Within the psychological approach to linguistic and communicative phenomena, the Palo Alto Group (Watzlawick et al., 1967 [12]) introduced the notion of behavioral interaction, thus widening the domain of analysis so as to include the context. This allows us, in principle, to build the basis of an interactive approach to human communication. This point of view lets us go beyond the traditional framework of communication as a unidirectional phenomenon (from the speaker to the listener) and regards as communicative all behaviors occurring within a definite context. In this way communication assumes a pragmatic meaning, enabling the communicative act to affect the future behaviors of the communicating subjects.

2. It's not true that all is communication

Watzlawick's famous metacommunicational axiom "One Cannot Not Communicate" (1967) was meant to emphasize the pragmatic aspect of the message and to stress the need to look beyond its simple semantic value, but it has often been misunderstood. It has ultimately brought some confusion between two important concepts: behavior and communication. It is true that "non-behavior does not exist in an interactive situation" (1967), because behavior has no opposite, but non-communication is not excluded a priori: it can occur in the case of purely accidental interaction, of a state of strong confusion, or of the issuer's unconsciousness. Watzlawick nevertheless took an important step in defining the area of communication in an interactive situation. But the consequence of his pragmatic approach rests on the fact that each communicative act is regarded as "preterintentional" (beyond our control, to borrow an Italian legal term), that is, beyond the agent's intentionality. Whoever interacts cannot be exempted from communicating. Even when intentionally silent, we can communicate in a non-verbal way, through gestures, attitudes, posture. But it is not granted that every behavior is communication, while there are many chances that a phenomenon, apparently
without communicative features, is communicative instead. Behavior and communication are different phenomena, and if we make them coincide: "[…] everything becomes communication (even the most accidental and unaware action) and we no longer have the possibility of understanding which are the properties and specificities of communication as such…" (Anolli, 2002 [1]). Moreover, Anolli clarifies that communication must be an observable interactive exchange endowed with reciprocal intentionality and awareness (Anolli, op. cit.). The exchange requires that behavior not be unilateral. So Shannon and Weaver's theory is no longer sufficient, as interaction presupposes the interdependence and circularity of the relationship: a mutual change of the respective communicative attributes must occur. Anolli clarifies the requirements of intentionality and awareness as well. The requirement of the intentionality of the communicative act thus appears essential in order to decide when an act is communicative. On the contrary, the whole scientific tradition related to the Palo Alto School defends the non-intentionality of communication. "Popular" psychology holds a similar view, induced by the high diffusion power of the media, which consider a good communicator whoever is able to draw attention to himself.

3. The problem of the requirement of intentionality

Many scientists are changing their minds, holding, unlike the Palo Alto supporters, that an act must be intentional in order to be communicative. D.C. Dennett (1987) [3] holds, in general, that man assumes that every natural agent, human being, animal, or natural power bases its behavior on goals. Miller and Steinberg (1975) [5], in particular, hold that every kind of human communication is made to obtain an answer or to influence the interlocutor: the communicative act is not possible without the intentional element. Grice (1975) [4] introduced the distinction between informative intention and communicative intention. While the former only aims to increase the informative content of the addressee, the latter constitutes a specific will to communicate, because it is based on the addressee's awareness, on the issuer's will to share the knowledge of the diffused informative contents. These different points of view seem to pose a dilemma: is intention fundamental to communication or not? The problem is partially overcome by Buller and Burgoon (1994) [2], who add a further qualifying element for deciding whether we are facing a proper communicative act.
They analyze the perception, by the receiver, of the intention to communicate. Therefore there would be communication only when the transmitter has the intention to transmit, expressing that intention positively, and the receiver detects this will. In this way a relationship of communicative awareness is established. If the fundamental requirements are two, then we communicate only when both are present. When the intention is evidently absent, the receiver can only attribute communicative processes to the issuer, but cannot define them as such. If, conversely, the receiver does not perceive the issuer's intention to communicate, even though it is manifestly present (because it is openly declared, or observable by third parties), then this can only be considered a communicative attempt. Finally, mere behavior results when neither the issuer reveals an intention nor the receiver perceives one. Buller and Burgoon's model, in trying to mediate the dispute over intentionality, opens more problems than it solves. It introduces a gray area of attempted or attributed communication, which is the realm of miscommunication: ironic, seductive, false, or pathological communication. All of these are, however, fully-fledged ways of communicating. In fact, in these cases the relation of intentionality is always present; it is modulated or, better, disguised, with the purpose of "saying in order not to say" (Anolli, 2002 [1]). Whoever is unable to detect the communicative intention of the interlocutor has perhaps been deceived by that very interlocutor about his/her intentions: the communicative content appears not to be addressed to him/her, although it actually is. Thus intention plays a metacommunicational role, because it is informative about the content of the communication itself. Paul Watzlawick (1967) [12] holds this when claiming that: "Every communication has a content and relationship aspect such that the latter classifies the former and is therefore a metacommunication". Sometimes, paradoxically, the true communicative content is the intention itself. In fact, the pragmatic valence of the message lies in the agent's will to focus attention on his/her own will to push the addressee towards a definite behavior. That is why the content of the message can easily be inferred from clear contextual indications, as asserted by Gregory Bateson. He maintained that in every communication "a level of news and a level of order exists" (1951): the intention, transmitted or, from time to time, disguised on purpose, constitutes the order, that is, the pragmatic component of the message, which aims at determining a specific behavior in the interlocutors. On the other hand, when no intention is either transmitted or received even though there is interaction, we are not in the presence of simple behavior,
as Buller and Burgoon think, but we enter the field of pathological communication. This is characterized by the impossibility of defining clear relationships, even when interacting with it, as in the case of schizophrenic communication (Selvini Palazzoli et al., 1975 [10]).

4. The circularity of the inferences

Specifying the concept of communication by defining its exclusive characteristics and adding attributes creates a paradoxical situation. The more you attempt to specify, that is, to distinguish the functions of transmitter and receiver and to characterize the specific parameters, the more you lose the general sense of the concept of communication as "sharing"; in addition, you progressively lose its resolutive power. The latter consists in the ability to characterize, decipher, and distinguish all those phenomena that could be communicative but, through those added specifications, are taken away from their field of attribution. Such a situation is described by the usual metaphor: "you can see the trees, but not the forest". An obvious case is that of silence: if the requirement of intentionality is not considered essential, everyone is inclined to recognize silence as a communicative act. But if intention is necessary for communication, then considering silence as an act of communication becomes problematic, unless we are satisfied with what Watzlawick (1967) [12] asserts when he says that silence manifests the will "to communicate what is not wanted to be communicated", which is banally obvious. It is now necessary to turn the paradoxical formulation upside down: specifying is correct, because theoretical confusions and conceptual conflations among similar phenomena are thereby avoided (Anolli, 2002) [1]. We must enlarge the analysis of the interactions to the whole communicative field, considering it as a system. In this way the communicative interactions do not concern only the transmitter and the receiver: the observers also come into play, being in turn observed, within and outside the same system. In fact, through an explicit systemic framework we can define communication in a way which is specific, complete, coherent and synthetic. Only in this way does it make sense to speak of the circularity of inference. We have long known that the inferential process is fundamental in the communicative act: it was Peirce (1894) [7] who defined the inferential property of the sign. A message richly equipped with signs can express a communicative content richer than that expressed in the interaction. On the contrary, a message poor in signs, in order to be fully understood, may require a
remarkable inferential activity by the receivers. But if inference is confined to the bilateral interaction alone, we fall back into the previously mentioned contradictions. It is its circularity among the elements of the system that allows us to recover the sense of the "putting in common" of the meanings of communication, which can therefore emerge from the circularity of the inferences and become the emerging result of a relational system with the attributes of interactivity, intentionality, visibility, awareness, and expectation. With the help of other paradoxes, it is perhaps the phenomenon of silence that provides the key for overturning the interpretation. We have already analyzed silence as a communicative phenomenon of systemic nature (Penna, Mocci, 2005) [8]. In fact, we have found that within this perspective it can be a communicational carrier, and this occurs when it is inserted within a process of inferential circularity among the constitutive elements of the system themselves, determining the rise of a global communicative meaning (emergence of meaning). In order to overcome the impasse of intentionality we must then consider the whole of communication as the emerging result of the interactions in a communicative system. Such a system is the means which enables a more integrated representation and avoids the localized approach based only on the communicative relationships. The system could be viewed as a whole emerging from the overall elements, their attributes, the possible communicative relations among the elements themselves, and the characteristics of these relations, while considering the elements as endowed with common properties. In this case all elements would be potential communicators, that is, endowed with the faculty of communicating. These abilities, if exercised, would be able to influence the state of all the other elements constituting the system, modifying attributes and relations. A change in the attributes of any member of the system, that is, a change in a single one, would be reflected in that of all the others, so the system as a whole would be modified. Such a change of the system would make a systemic property emerge, not present before and not noticeable by considering only the elementary relations among the members of the same system. In this way the relations between the elements, that is, interactivity, visibility, mutual intentionality, awareness, and the sharing of meanings and of symbolic systems, are distributed among all the actors of the communicative field. They are not fixed in a deterministic way between the issuer and the addressee. It is mainly the circular inferential process that distributes such characteristics, for example intentionality. At the microscopic level of the system, an interactive relation described in the classical sense may not appear communicative.
The addressee may lack the awareness of being, for instance, the addressee of a message. However, this knowledge can emerge from another relation, for example between him/her and the other observers. The problem, arising in Buller and Burgoon's model, of the addressee who does not detect the intentionality of the transmitter would thus be overcome, being solved by the presence in the system of an observer who attests the intentionality. In this case the observer qualifies the action as communicative, making the content come up (emerge). As already said, in this context we are speaking about relations and interactions, and therefore we consider communicative phenomena basically from a pragmatic point of view: we analyze how communicative phenomena are determined by, and reflected in, the actions of the members of the system. Besides specifying the meaning, the systemic character of the phenomenon will also enable predictivity: the more open the system, the more it facilitates communication. Consequently, the communicative content of the system will be a function of its degree of openness. In closed systems there could only be an increase of information, while in open ones, besides detecting the state of the single relations among the elements, communication could be found too. It is in fact important to be able to estimate the communicative content of an act and its pragmatic effects. Such effects often determine substantial modifications of behaviors, even non-communicative ones. For example, in complex organizations, companies, and institutions, the communicative system nearly always reflects the global relational structure of the institution itself. Therefore a communicative act can foreshadow a substantial organizational modification. That is why it is important to extend consideration to all the elements of the field, since their role is crucial, beyond their specific characteristics and their communicational attributes. Naturally, the systemic redefinition of communication can be considered useful also from the semantic point of view, that is, in terms of production and circulation of meanings. In this case the system of relations is isomorphic to the context, as a matrix of meanings. The model of the communicational system can be described in terms of the characteristic levels of open systems (Minati, 2004 [6]). The microscopic level could be articulated in a preliminary identification and description of the communicative system, of its members in their respective roles of actors and observers, of the relational net itself, and of the state and awareness of the interactions. This could then be followed by a macro-description, basically constituted by an analysis of the average effects of the system. The average effects are equivalent to the description given by the classical theories of communication. Finally, we
can enter the real phase of description of the emergence of a communicative content, constituted by the circulation of meaning that is created through the net of observations, inferences, and intentions of the observers-actors. This is the true emergence phase of the systemic properties which, in our case, are connected to the creation and circulation of meaning and of communicative content by means of the interaction between elements.

5. Conclusions

What is the conceptual gain of the systemic approach in comparison with the classical theories? First of all, the systemic approach completes the definition of the phenomenon and allows a wider generalization, because it attributes a communicative value to a wider number of phenomena. In fact, actions such as silence, pauses in conversation, and miscommunication are considered communicative because they make meanings come up (emerge) and have pragmatic valence. Furthermore, it better distinguishes communicative phenomena from merely informative ones through the concept of the circularity of the inferences on awareness and intentionality, and it is coherent, since it uses the same concepts, characteristics and requirements of interaction to explain every communicative act.

References

1. L. Anolli, Ed., Psicologia della Comunicazione (Il Mulino, Bologna, 2002).
2. D.B. Buller and J.K. Burgoon, in Strategic Interpersonal Communication, Ed. Daly and Wiemann, (Erlbaum, Hillsdale, N.J., 1994), pp. 191-223.
3. D.C. Dennett, The Intentional Stance (The MIT Press, Cambridge, Mass., 1987).
4. H.P. Grice, in Syntax and Semantics, Vol. 3, Speech Acts, Ed. P. Cole and J.L. Morgan, (Academic Press, New York, 1975), pp. 41-58.
5. G. Miller and M. Steinberg, Between people: a new analysis of interpersonal communication (Science Research Associates, Chicago, 1975).
6. G. Minati, Teoria Generale dei Sistemi – Sistemica – Emergenza: un’introduzione, (Polimetrica, Monza, 2004).
7. C.S. Peirce, in Collected Papers, Vol. 2 (Harvard University Press, Cambridge, Mass., 1931-1935); original text 1894.
8. M.P. Penna and S. Mocci, in Proceedings of the 6th Systems Science European Congress, Paris, September 19-22, 2005, (Paris, 2005), CD-ROM.
9. J. Ruesch and G. Bateson, Communication: The Social Matrix of Psychiatry (Norton, New York, 1951).
10. M. Selvini Palazzoli, L. Boscolo, G. Cecchin and G. Prata, Paradosso e controparadosso. Un nuovo modello nella terapia della famiglia a transazione schizofrenica (Feltrinelli, Milano, 1975).
11. C. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, 1949).
12. P. Watzlawick, J.H. Beavin and D.D. Jackson, Pragmatics of Human Communication. A Study of Interactional Patterns, Pathologies, and Paradoxes (Norton & Co., New York, 1967).
MUSIC: CREATIVITY AND STRUCTURE TRANSITIONS
EMANUELA PIETROCINI
Accademia Angelica Costantiniana, http://www.accademiacostantiniana.org
Dipartimento di Musica Antica, Piazza A. Tosti 4, Roma (RM), Italy
E-mail: [email protected]

Music, compared to other complex forms of representation, is fundamentally characterized by constant evolution and a dynamic succession of structure reference models. This holds without taking into account the historical perspective, the analysis of forms and styles, or questions of a semantic nature; the observation rather refers to the phenomenology of the music system. The more abstract a compositional model, the greater the number and frequency of variables that are not assimilated to the reference structure. This "interference", which happens more often than not in an apparently casual manner, modifies the creative process to varying but always substantial degrees: locally, it produces a disturbance in perceptive, formal and structural parameters, often resulting in a synaesthetic experience; globally, on the other hand, it defines the terms of a transition to a new state, in which the relations between elements and components modify the behavior of the entire system from which they originated. Examples of this phenomenon can be found across the whole range of musical production, in particular in improvisation, in the use of the Basso Continuo, and in some contrapuntal works of the baroque period: music whose temporal dimension can depart from the limits of mensurability and symmetry to define an open compositional environment in continuous evolution.

Keywords: music, emergence, complexity, creativity, structure transitions.
1. Introduction “The Changes have no consciousness, no action; they are quiescent and do not move. But if they are stimulated, they penetrate all situations under heaven. If they were not the most divine thing on earth, how could they do this?” (T'uan Chuan, Kung Tsë) In his foreword to the English translation of the Book of Changes (I Ching), a time-honored monument to Chinese thought, C.G. Jung (I Ching, 1950 [23]) puts forward an interesting interpretation of the text, in particular with regard to the philosophy of events and their succession. In this, as in other sacred Taoist and Confucian texts, the principles of causality pass almost unnoticed compared to the great importance attached to chance: “The moment under actual observation appears to the ancient Chinese view more of a chance hit than a
clearly defined result of concurring causal chain processes. The matter of interest seems to be the configuration formed by chance events in the moment of observation, and not at all the hypothetical reasons that seemingly account for the coincidence. While the Western mind carefully sifts, weighs, selects, classifies, isolates, the Chinese picture of the moment encompasses everything down to the minutest nonsensical detail, because all of the ingredients make up the observed moment". More significant still is his analysis of the temporal dimension of events, from which Jung extracts the 'principle of synchronicity': "… synchronicity takes the coincidence of events in space and time as meaning something more than mere chance, namely, a peculiar interdependence of objective events among themselves as well as with the subjective (psychic) states of the observer or observers …" (I Ching, 1950 [23]). If applied to music, these considerations offer a new key to the reading of musical events, and especially of the creative moment. Composition has always made use of procedural models of various degrees of complexity to organise sounds (Bruno and Pietrocini, 2003 [12]): from the basic operations (ordering, classifying, assembling, sequencing), to syntax (melody, harmony, rhythm) and formal systems (modality, tonality, seriality …); these and other models utilised over time can be described mathematically. It is to be pointed out, though, that because of the way the models relate and interrelate, development, as in all complex systems, has been decidedly nonlinear (Bruno, 2002 [11]; Benvenuto, 2002 [9]): the really relevant transformations and changes seem to have happened ex abrupto, not supported by sufficient connections of structural, historical and aesthetic causality. Very often, these great changes are ascribed to the genius of a great composer, whose extraordinary intuition produced a turning point that decisively influenced all subsequent production (David and Mendel, 1966 [16]; Apel, 1967 [2]). This theory is certainly valid and supported by numerous examples, although it does not hold at all times or in all cases. However, whatever the historical considerations and aesthetic consequences, it is this phase transition that is the focus of this paper. From the systemic point of view, in fact, we can say that change is the product of an emergence process, characterized by a particular form of implicit learning by the observer-composer, and of collective learning when the new system is shared, acknowledged, generalized and reworked (Von Bertalanffy, 1968 [40]; Minati and Pessa, 2006 [26]).
How structural change comes about, and exactly when, we cannot know for sure; it is possible, though, to trace signs of anomalies, "interferences", that seem more often than not to appear by chance during the musical discourse; it is even more interesting to see how, even in quick succession, an accident can substantially modify the creative process. In this work I will examine some phenomenological aspects of the emergence process in music, concentrating on transformation and change; the first part will deal with elements of the compositional process connected to structure reference systems; the second part is concerned with variation in relation to the composition and performance of music, quoting some significant examples of works from the baroque period. Given the nature and purpose of the paper, use will not be made of purely musicological analytical models, but rather of an investigatory method pertaining to the field of research.

2. Part I

2.1. "Musica est scientia bene modulandi … et bene movendi"

These are the opening words of Augustine's "De Musica" (4th century A.D.), one of the most important musical treatises from Late Antiquity. Coherently with the principles of classical aesthetics, Augustine underlines that music should be regarded as a science, since reason is used and "proceeds according to the law of numbers in the proportional respect of time and interval" [1]. Music, therefore, is defined as the "science of well regulated movement, movement sought in itself" [1]. But what is meant by "movement" in music? We may attempt a definition by referring to dynamism: the way elements relate and interact, changes of state, modifications, transitions … Everything that takes place in the act of "cum ponere" in music is thus a continuous becoming which, taken in absolute terms, harks back to the ideal representation of the "music of the spheres" of Plato and the Pythagoreans: perfect, immanent and imperceptible to the senses. However, it is through this very becoming, in the world of the senses, that the idea takes concrete shape in the form of an artistic object. In this case, composition can be seen as the construction of musical architectures. The operational strategies and models utilized refer firstly to discrete space-time, which defines the framework in which the musical event is to be represented; secondly to sound and its physical qualities - pitch, intensity, timbre; thirdly to the duration of the sound, taken as describing the event in space-time (Nattiez, 1977 [30]). The application of models in the compositional process consists
Figure 1.
Figure 2.
Figure 3.
essentially in the organization of sound material into structures of different levels of complexity (Pietrocini, 2006 [33]). To make the principal features of this process clear, we will quote a few examples of musical construction. One of the simplest compositions is the sequence: all you need is a certain number of sounds set out in time and ordered according to a criterion (Figs. 1, 2). If each sound in the sequence is then given a duration, we have a melodic-rhythmic structure, a "musical phrase"; to this we can apply different degrees of loudness (Forte, Piano, Crescendo, Diminuendo, etc.), determining the dynamics (Figure 3). Naturally, in this case the sounds used were structured a priori. In fact, the chosen sounds belong to a range defined in the tempered system (an octave divided into 12 equal semitones), and the duration values used, which have been part of the musical code since the 13th century (Gallo, 1979 [20]), are the result of time-honoured reflection on rhythmic units and symbolic systems (Cattin, 1979 [14]). Furthermore, it is important to specify that the values represent only duration ratios, not the actual length of the sound in time. This is in fact established by pulse, more commonly defined as "musical tempo". Simple operations in complex systems … or rather simple operations that have contributed to the development of complex systems?

2.2. The importance of number five

To represent the range of sounds in the example quoted, we used the image of a simple keyboard (Figure 1). In mechanical keyboard instruments (piano, harpsichord, organ, etc.), the pitch of the individual sounds is predefined: each key corresponds to one or more vibrating bodies (strings or pipes) which are
Figure 4.

Table 1.
Sounds   Frequency ratio
1        1
2        3/2 × 3/2 × 1/2
3        3/2 × 3/2 × 3/2 × 3/2 × 1/2 × 1/2
4        2/3 × 2
5        3/2
6        3/2 × 3/2 × 3/2 × 1/2
7        3/2 × 3/2 × 3/2 × 3/2 × 3/2 × 1/2 × 1/2
8        2
tuned to produce a determinate frequency. The frequencies are arranged in ascending order, from left to right. From the way the keys are arranged it is easy to identify a pattern that is repeated at different pitches (Figure 4). The sequence defines a framework (the octave) which has 12 sounds, here identified with numbers (white keys) and symbols (black keys). When the sequence is repeated, the same symbols and numbers identify sounds whose frequencies are in a ratio of 2:1; for example, if sound 6 in the first series has a frequency of 440 Hz, the corresponding sound in the second series has 880 Hz. This relationship between the two sounds, the octave interval (diapason), was first identified by the Pythagoreans in their studies on the monochord: by dividing the string in half you get the next highest octave. Applying the same principle, but taking 2/3 of the string, you get a fifth (diapente). The octave and the fifth stand in ratios to the base sound of 2:1 and 3:2 respectively. All the other sounds of the octave can be identified through numerical relationships and the succession of fifths (Righini, 1994 [35]) (Table 1).
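The construction just described is easy to make concrete. The following sketch (Python is assumed here; it is not part of the original text) stacks fifths of ratio 3/2 and halves every result that leaves the octave frame between 1 and 2, reproducing the ratios of Table 1:

```python
# Minimal sketch: generate white-key ratios by stacking fifths (3/2)
# and folding each result back into the octave frame [1, 2).
from fractions import Fraction

def stack_fifths(n):
    """Return n ratios generated by successive upward fifths, octave-reduced."""
    ratios = []
    r = Fraction(1)
    for _ in range(n):
        ratios.append(r)
        r *= Fraction(3, 2)   # up a fifth (diapente)
        while r >= 2:
            r /= 2            # down an octave (diapason) to stay in the frame
    return ratios

# Six upward fifths give C, G, D, A, E, B; F is reached by one fifth
# downwards, i.e. 2/3 doubled = 4/3 (sound 4 of Table 1).
scale = sorted(stack_fifths(6) + [Fraction(4, 3)])
print(", ".join(str(r) for r in scale))
# 1, 9/8, 81/64, 4/3, 3/2, 27/16, 243/128
```

The same fractions reappear, with note names, in Table 2 below.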
Figure 5.
Figure 6.
Figure 7.
Using the names of the notes to identify the sounds of the white keys in the sequence, the ratios involved in the succession of fifths can be seen in Table 2. In the present symbolic system, arranging the sounds on a five-line staff as on the keyboard, we get a scale. The ratios between adjoining sounds are called tones (9/8) and semitones (256/243). If we continue with intervals of a fifth starting from note B (the last obtained with the ratio of 2:3 in the series in Figure 5), we get the sounds of the black keys, which musically are indicated with the signs # (sharp) and b (flat) (Figure 6). By using this method, based on a succession of fifths, we obtain all 12 sounds of the octave, and the cycle should close with the note we started with (Figure 7). However, this does not happen: between the initial sound and the final note there is a difference defined by the ratio 531441/524288, that is, about 23.46 cents: the Pythagorean comma. (The small residual difference between the Pythagorean comma and the syntonic comma is called the schisma.)
Table 2.
Notes   Pythagorean scale ratios
C       1
D       9/8
E       81/64
F       4/3
G       3/2
A       27/16
B       243/128
C       2
The equal division of the octave (using a ratio of 1/2) is in reality mathematically irreconcilable with the cycle of fifths (based on the ratio of 2/3), because no power of two can ever equal any power of three.

2.3. A blanket that is too short

The problem, first identified by the Pythagorean Archytas of Tarentum (c. 428-347 B.C.), has long been the subject of study for mathematicians and musical theorists (Boyer, 1968 [10]), also because of its wide-ranging scientific and philosophical implications. Still, the Pythagorean system remained in use in Western musical practice until the emergence of polyphony and fixed-note instruments in the 15th century (Apel, 1962 [3]). In this period, increasing use was made of intervals of a third and a sixth in polyphonic compositions, which, however, were particularly unpleasant to the ear; for this reason, organ builders began to "temper" the fifths, that is, to tune them in such a way as to distribute the Pythagorean comma and obtain major thirds that were more consonant, that is, closer to the ratio of 5/4. It is in the 16th century that "temperament" is first mentioned by writers: this is meantone temperament, in which all fifths are flattened by the same amount, distributing the comma and eliminating the beats on major thirds in the tuning process (Bellasich, Fadini, Leschiutta and Lindley, 1984 [7]). However, in meantone temperament the cycle of fifths still does not close properly, since it produces a very sharp interval called the "wolf fifth". Other solutions were found that offered greater consonance, and modifications were made to the construction of some instruments. In 1558 the musical theorist Gioseffo Zarlino proposed a radical reform of the musical scale. To the ratios of 2/1 (octave), 3/2 (fifth) and 4/3 (fourth) he
Table 3.
Unison          1:1
Major Second    9:8
Major Third     5:4
Perfect Fourth  4:3
Perfect Fifth   3:2
Major Sixth     5:3
Major Seventh   15:8
Octave          2:1
Figure 8.
added the major and minor third, with ratios of 5/4 and 6/5 respectively. The remaining intervals were obtained by interpolating the ones already determined: major tone = fifth − fourth = 9/8; sixth = fourth + major third = 5/3; seventh = fifth + major third = 15/8 (Table 3). In Zarlino's scale ("scala naturale", or natural scale) (Figure 8) there are two different tone intervals, the major tone (9/8) and the minor tone (10/9). It cannot be considered a temperament, because it cannot be obtained by a cyclic procedure, and the intervals are perfect only with respect to the base note; it was thus impractical for musical practice, despite the fact that specific instruments with 31 keys per octave, such as the archicembalo or arciorgano, were built for the purpose; they soon fell into disuse. Even though Zarlino's theory came closer than any other to the phenomenon of harmonic sounds, which was not described until Sauveur, around 1700, meantone temperament was used in musical practice for much of the 17th century, while research on the cyclic method continued in parallel with the evolution of instruments and performance techniques (Tuzzi, 1993 [39]) (Figure 9). In 1691 the German Andreas Werckmeister discovered that cyclic tuning using five tempered fifths and seven Pythagorean fifths could close the cycle of fifths and eliminate the "wolf fifth", so that music could be performed in all tonalities. Numerous variations of this system were introduced, known in Germany as well temperament and today often called unequal temperament.
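Zarlino's interpolations quoted above follow from the fact that "adding" intervals means multiplying their frequency ratios; a minimal check (again assuming Python, not part of the original text):

```python
# Interval arithmetic: adding intervals = multiplying ratios,
# subtracting intervals = dividing ratios.
from fractions import Fraction

fifth, fourth, major_third = Fraction(3, 2), Fraction(4, 3), Fraction(5, 4)
print(fifth / fourth)        # 9/8  : major tone = fifth - fourth
print(fourth * major_third)  # 5/3  : major sixth = fourth + major third
print(fifth * major_third)   # 15/8 : major seventh = fifth + major third
```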
Figure 9. Harmonic sounds; differences between harmonic sounds and sounds of the natural scale expressed in cents
"The Well Tempered Clavier" of J.S. Bach was the first work to systematically explore its potentialities, although we still do not know for sure to which of these temperaments the author was referring. In "well tempered" tuning systems the tonalities differ, because interval width is not constant; this aspect helps explain the reason for choosing a certain tonal framework to produce a desired expressive effect or rhetorical function, at least until the mid-19th century (Raschl, 1977 [34]). In the 18th and 19th centuries an increasing number of theorists and musicians turned their attention to the problem of temperament: Leibniz, Mersenne, d'Alembert and, among musicians, Rameau placed the physical-mathematical modelling of acoustic phenomena and the theory of harmonic sounds at the basis of music theory, and began to consider the possibility of equal temperament, which would allow music to be played in all tonalities (Fubini, 1976 [19]). Following on from the theories of Werckmeister, in 1706 the mathematician Neidhart formalised equal temperament with the introduction of a very simple idea: he divided the octave into 12 equal parts using an exponential function. Given the octave ratio of 2/1, a semitone will have a value of 2^(1/12), the twelfth root of 2, that is, a number which multiplied by itself 12 times gives 2 (Righini, 1994 [35]). Considering the question in acoustic terms, we can say that by multiplying a frequency by the twelfth root of two we obtain a frequency that is a semitone higher than the base frequency (Table 4). Equal temperament is a theoretical expedient that became a stable part of musical practice between the 19th and 20th centuries: it eliminated the distinction between major/minor tone and diatonic/chromatic semitone, and between sharps and flats (for example G# = Ab), dividing the tone into two equal semitones. This simplification eliminates many of the inconveniences for fixed-note instruments. The only disadvantage is that, compared to the natural harmonics, the notes are slightly out of pitch. And this is not exactly a negligible detail, even though the ear is by now almost used to ignoring the difference.
Table 4. Ratios of the equal-tempered intervals.

Unison           1
Major Second     2^(2/12)
Major Third      2^(4/12)
Perfect Fourth   2^(5/12)
Perfect Fifth    2^(7/12)
Major Sixth      2^(9/12)
Major Seventh    2^(11/12)
Octave           2
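As a numerical check of Table 4 (an illustrative sketch, not taken from the source), the equal-tempered ratios and their deviation from the just ratios of Zarlino’s scale can be computed directly; the cents measure used in Figure 9 is 1200·log2(r):

```python
import math

def tet(n):
    """Equal-tempered ratio for n semitones."""
    return 2 ** (n / 12)

def cents(r):
    """Size of a ratio in cents; 100 cents = one tempered semitone."""
    return 1200 * math.log2(r)

# Just (natural-scale) ratios quoted in the text.
just = {"major second": (2, 9/8), "major third": (4, 5/4),
        "perfect fourth": (5, 4/3), "perfect fifth": (7, 3/2),
        "major sixth": (9, 5/3), "major seventh": (11, 15/8)}

for name, (n, j) in just.items():
    # Positive difference: the tempered interval is wider than the just one.
    print(f"{name:15s} tempered {tet(n):.5f}  just {j:.5f}  "
          f"diff {cents(tet(n)) - cents(j):+.1f} cents")

# One tempered semitone above A4 = 440 Hz:
print(f"A#4 ≈ {440 * tet(1):.2f} Hz")
```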
The blanket is still too short but it gives cover, as long as we keep still … a pity that music is in continuous movement.

3. Part II

3.1. The first move on the chessboard

Returning to our simple keyboard, all 12 sounds of the octave finally have a name, a symbol to represent them effectively and a well-defined frequency: the chessboard is ready and the pieces are in place for the great game of composition. The first move is to generate a musical idea: this could be a fairly simple structure, such as an interval, a chord or a rhythmic element, or a more extended sequence, such as a series of intervals, chords or rhythms that make up a musical phrase. But when and how does the idea first come to the composer? This is the most mysterious aspect of the game. For the Romantics, the generative idea was the fruit of artistic inspiration, a manifestation of the infinite spirit which emerges through finite determinations, by virtue of being able to preconceptually sense the noumenal reality beyond phenomenal limits: “music is the artform which is most devoid of corporeal elements, in that it represents a movement in itself, detached from objects and carried by invisible wings, such as the wings of the spirit” [36].
Table 5. Note names, alterations and German letter notation.

Natural notes:     Do    Re    Mi    Fa    Sol    La    Si
Sharp:             Do#   Re#   −     Fa#   Sol#   La#   −
Flat:              −     Reb   Mib   −     Solb   Lab   Sib
German notation:   C     D     E     F     G      A     B (= Sib), H (= Si)
Figure 10.
Indeed, if we think of the immortal tunes of great works, which have become part of the shared heritage of civilization and a universal language (is there a community that does not identify, for example, with Beethoven’s “Ode to Joy”?), we cannot but recognize in them a transcendent principle. Great themes “speak” to individual and collective conscience, perhaps because they represent a moment of unified experience, a space-time linking the bodily self, conscience and the mind (Solms and Turnbull, 2002 [38]). We must not forget, however, the perceptive aspect of the musical phenomenon: an idea, understood as an event-object, needs to undergo a construction process to manifest itself. In this sense, a musical theme can take shape simply by selecting sounds according to a logical criterion, or even by trusting in chance and subsequently using a structural model to combine the various elements (Bent and Drabkin, 1980 [8]). There are numerous examples of these procedures throughout the history of musical production. The most famous of all is the B-A-C-H theme, which associates the alphabet with German musical notation (Table 5). The “Great Bach” (Johann Sebastian) transformed his name into a musical signature, and even used it in his dedications to “God the Creator of everything”. Very often, at the end of his manuscripts we find the letters S.D.G., which stand for Soli Deo Gloria; with the help of gematria, it is immediately clear that SDG (18+4+7 = 29) corresponds to JSB (9+18+2 = 29).
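The gematria above uses the natural-order values of the 24-letter Latin alphabet, in which I/J and U/V share a value; a minimal sketch of that computation (the numbering is our reconstruction, not spelled out in the source):

```python
# 24-letter Latin alphabet: J collapses onto I, V onto U (assumed numbering).
letters = "ABCDEFGHIKLMNOPQRSTUWXYZ"  # no J, no V
value = {c: i + 1 for i, c in enumerate(letters)}
value["J"], value["V"] = value["I"], value["U"]

def gematria(word):
    """Sum of the letter values of a word."""
    return sum(value[c] for c in word.upper())

print(gematria("SDG"), gematria("JSB"))   # 29 29
print(gematria("BACH"))                   # 14, another number dear to Bach
```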
Figure 11. The autograph manuscript of the last page of The Art of Fugue.
The theme B-A-C-H appears in several of his compositions (Hofstadter, 1979 [21]): in the Kleines Harmonisches Labyrinth BWV 591 for organ, in the Canonic Variations on “Vom Himmel hoch” BWV 769, in the St Matthew Passion (Matthäuspassion) BWV 244, in which the chorus sings “Wahrlich, dieser ist Gottes Sohn” (“Truly, this man was the Son of God”), and in the final fugue of The Art of Fugue (Die Kunst der Fuge) BWV 1080. In this last work, which Bach left unfinished at his death, the theme appears five beats before the point where the score stops; there follows a note written by his son Carl Philipp Emanuel: “Über dieser Fuge, wo der Nahme BACH im Contrasubject angebracht worden, ist der Verfasser gestorben” (“At the point where the name BACH is introduced in the countersubject of this fugue, the composer died”). Beyond the musicological controversies about the truth of this statement, we would like to think that, yet again and for the last time, the great Bach had placed his seal on an inheritance whose very incompleteness carries the most profound significance: the eternal tension of continuous transformation (Figure 11).

3.2. The game of changes

As briefly mentioned before, the initial musical idea can emerge in different ways, not ascribable to a plan or to logical criteria established a priori. There is a particular musical practice in which the theme stems from the performance itself: improvisation (Simpson, 1667 [37]). Improvisation is present in all musical cultures and it is reasonable to believe that it contains the germs of creative production: an intuitive process in which the generative idea emerges through a form of implicit learning involving both cognitive and affective aspects. “Musical thinking”, in fact, involves an integrated experience of the
perceptive world in every moment of the temporal continuum (Solms and Turnbull, 2002 [38]). In improvisation, each single phrase emerges from the need to continually transform the thematic material; there is no development in the mere repetition or reiteration of models. Only through variation, understood as a dynamic process of revision, can creative production manifest itself fully. The act of variation, which involves modifying the organizational models and structures of musical material, is a conscious, ordered operation (Pietrocini, 2006 [33]). The generative idea is adapted by modifying pitch, duration and dynamic characteristics in the present, consciously and above all with reference to relational systems. It is impossible in this paper to discuss the enormous number of variation techniques: we will simply try to describe the basic elements briefly and to illustrate, from a historical perspective, some examples that can help shed light on different levels of complexity.

3.3. Variations and simple systems of musical organization

3.3.1. Rhythmic variations → rhythmic system

Given the initial rhythmic element, changes can be made to duration values, pulse and/or subdivision.

Original rhythmic element
Example of duration variation
Example of duration variation
3.3.2. Melodic variations → melodic system

Given a series of initial sounds, changes can be made to the order:
Original series
Example of inverse variation (the original is turned upside down and the intervals are inverted)
Example of retrograde variation (the original is reversed, starting from the end).

The primary structure, or musical phrase, is the result of interaction between rhythmic and melodic systems and their respective forms of variation: in the example, a process of rhythmic variation has been implemented by applying duration values, according to an established pulse, to a series of consecutive notes in the octave.
The original series is still recognizable, but it can no longer be assimilated to the product of the transformation, because their formal characteristics are incompatible. Interaction between rhythmic and melodic systems through variation has led to the configuration of an entity whose relational models belong to another level of complexity and to a new reference system (Pessa, 2000, 2002 [31,32]). In fact, the original has undergone a process of change; the two elementary operations involved, inversion and retrograde, are sketched below.
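As an illustration (our sketch; the pitch series and its encoding as MIDI numbers are hypothetical, not from the source), inversion and retrograde can be written as simple operations on a sequence of pitches:

```python
# Pitches as MIDI note numbers; the series is an arbitrary example.
series = [60, 62, 64, 67, 65]  # Do Re Mi Sol Fa

def inverse(s):
    """Turn the melody upside down: each interval from the first
    note is mirrored around that note."""
    return [2 * s[0] - p for p in s]

def retrograde(s):
    """Play the series backwards, starting from the end."""
    return list(reversed(s))

print(inverse(series))     # [60, 58, 56, 53, 55]
print(retrograde(series))  # [65, 67, 64, 62, 60]
```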
3.4. Variations and complex systems of musical organization

Historically, the organization of a musical phrase according to pitch and duration arose from the evolutionary needs of language: from the earliest recorded history, vocal modulation of sounds has been associated with the rhythm and stress of words in both Western and Eastern civilizations.
Figure 12. Melisma on the syllable DE.
As regards the classical Western tradition, the first documented forms of melodic-rhythmic variation are to be found in mediaeval liturgical chant: these are ornamental procedures obtained by performing several notes on the same syllable of text; the new melodic structures, called melismas, were then sung to other texts, giving rise to Tropes and Sequences, self-contained pieces which soon became accepted parts of the sacred repertoire (Cattin, 1979 [14]) (Figure 12). Melismas are probably one of the most archaic forms of improvisation: ethnomusicologists have identified numerous examples in primitive cultures and in popular tradition. In the same context, in particular in group improvisation, the use of melismas has been ascertained in practices involving polyphony, i.e. the simultaneous performance of different sounds or series of sounds, which are superimposed and proceed in parallel (de la Motte, 1981 [29]).

3.5. Counterpoint

Polyphony in classical music is documented for the first time in Scotus Eriugena’s De Divisione Naturæ in the ninth century, but we may suppose that before this there was a consolidated custom of sacred and profane vocal practice, arising probably from the performance of the same melodic line by voices of different register (Apel, 1962 [3]; Howen, 1992 [22]). The structures obtained from the simultaneous association of several sounds or of several melodic lines are called harmony and counterpoint, respectively. In these frameworks the procedural models are very complex: the techniques of variation establish intricate networks of relationships between these systems. As described above, the problem of the relations between intervals and of the correct relationship between sounds has always been at the heart of musical research. Counterpoint (punctum contra punctum) involves a series of interval relationships established between superimposed sounds, which in turn are organised into independent melodic lines. The relationships and techniques of variation in this system were first formalised in the 12th century, with the development of organum (de la Motte, 1981 [29]). This is a composition for several voices in which one or more
Figure 13. Parallel, note against note, at the same interval distance.
Figure 14. Contrary motion, note against note.
Figure 15. Melisma, several notes against one.
overlying or underlying parallel melodies are added to the vox principalis, a monodic chant from the liturgical repertoire; the former are variations of the cantus firmus sung by the vox principalis (Figures 13, 14, 15). In organum, as in later polyphonic forms, the process of composition consists in the variation of the cantus firmus or tenor (which from the 13th century onwards was a melodic phrase complete with mensural values) and in the superimposition of the original idea and of melodic lines or phrases derived from it. Once again, a process of change has taken place: the contrapuntal process determines the emergence of properties that configure a new dynamic system (Baas and Emmeche, 1997 [4]). Counterpoint has greatly influenced the evolution of musical production; indeed, it was the only system of reference for classical music until the 16th century and it has determined irreversible changes in the formal representation of musical thought. In a certain sense, it can be said that the contrapuntal system constitutes, even today, the archetype of musical architecture.
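The simplest of the organum textures of Figure 13, parallel motion at a fixed interval, lends itself to a one-line sketch (ours; the cantus firmus fragment and the choice of a parallel fifth are illustrative assumptions):

```python
# Cantus firmus as MIDI note numbers (an invented fragment).
cantus_firmus = [62, 64, 62, 60, 62]

# Parallel organum: the added voice shadows the chant at a fixed
# interval below, here a perfect fifth (7 semitones), note against note.
vox_organalis = [p - 7 for p in cantus_firmus]

for cf, vo in zip(cantus_firmus, vox_organalis):
    print(cf, vo)
```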
Figure 16.
3.6. Harmony

Harmony can be seen as a “vertical reading” of counterpoint. Probably this new model stemmed, as always, from a practice widespread among musicians who played a polyphonic instrument: the reductio partituræ (Del Sordo, 1996 [17]), which involved playing the polyphonic score on a single instrument; inevitably, the musician had to read vertically, concentrating on simultaneous sounds. This practice, adapted to the resources of the instrument, led to the formation of chords (Figure 16). Very soon the process of variation introduced through the vertical reading of polyphony gave rise to a completely autonomous system, based on a succession of chords in relation to the main melody (Caccini, 1614 [13]; Bianconi, 1982 [6]). The most evident aspect of this change lies in the multiplication of the different levels of perspective on which the sound material is placed. In counterpoint all voices play an equal part in the musical texture; the generative idea, for example the theme of a fugue, is presented by each voice in turn and all the reworked elements (countersubjects, divertimenti, etc.) underline it, without any need for musical dynamics. In harmony, hierarchies are defined on the basis of the expressive primacy of the original idea (Bach, 1753 [5]), which appears in a single voice; it is therefore necessary to differentiate the functions and the dynamic levels of the other parts. The new system began to be assimilated as a form from the middle of the 16th century. In his Istituzioni harmoniche, Zarlino (1558) [41] speaks of music as an instrument that “moves the heart”, closely connected to the text: “we now have to try to make the harmony fit the words. I said make the harmony fit the words, because although in the second part … it was said that melody is a mixture of speech, harmony, and number, and that in a composition one should not be before the other, yet speech is the main thing and the other two parts are there to serve it …”.
This relational model took on concrete shape in the practice of the Basso Continuo, an extempore form of accompaniment which remained common in European musical culture until the early 1800s (Ferguson, 1975 [18]). It must be added that the entire harmonic system, in both modal and tonal frameworks, is based on hierarchical relationships: each chord has a specific function and can be placed in relationship with the others on the basis of procedural models defined in terms of agogic criteria. We spoke of Zarlino’s important studies and theories in the first part; we may add that the development of the harmonic system kept pace with research and developments in temperament, and that work on relational harmonic models has contributed decisively to the configuration of musical macro-systems (modality, tonality, seriality …). Nevertheless, we must again underline that the development of harmony is also fundamentally linked to the extemporaneousness of the variation process. At the height of the baroque period, instrumental music for keyboard and lute used harmonic models and redefined them on the basis of the resources of the instruments (Hubbard, 1965 [25]). Improvisation, widespread in all vocal and instrumental music, involves an exploration of technical and expressive possibilities that goes well beyond custom. Even where the relationship is one of subordination to the melody, as in the Continuo, chord agglomerations tend to accumulate extraneous elements such as appoggiaturas, chromaticisms and passing notes. In improvised compositions such as toccatas and unmeasured preludes, these “interferences”, which used to be accidental and sporadic, became consolidated practice and then an integral part of the musical discourse (Moroney, 1985 [28]).

3.7. Toccatas and Preludes

Obviously, this brings substantial change to the entire relational system: if we listen to a toccata by Johann Jakob Froberger(a) or an unmeasured prelude by Louis Couperin(b), it is easy to lose one’s musical orientation; in these codified pieces the “interferences” are used consciously to produce stupendous works of great expressive strength, with incredibly “modern” harmonic solutions.
(a) German composer (Stuttgart 1616 − Héricourt, Montbéliard 1667).
(b) French composer (Chaumes-en-Brie 1626 − Paris 1661).
Figure 17. Beginning of the first Toccata FbWV 101 by Johann Jakob Froberger.
Figure 18. Beginning of the prélude non mesuré in A flat by Louis Couperin (original in Ms Bauyn, Bibliothèque nationale de France, Rés. Vm (7) 674-675; modern edition: Oiseau-Lyre, 1985).
Froberger was active in France between 1652 and 1662: he was certainly acquainted with many musicians at the French court, and in all probability knew Louis Couperin, then organist at the church of Saint-Gervais in Paris. What is certain is that Couperin was well acquainted with the works of Froberger, even entitling one of his Préludes non mesurés “a l’imitation de Monsieur Froberger”. Its first beats, in fact, bear a strong analogy to the opening of one of the German composer’s toccatas; almost immediately, though, a different course is taken and, although assimilable phraseological models can be recognized, the compositional solutions and harmonic successions denote a completely original development (Figures 17, 18).
It should be noted that the types of notation used by the two composers are completely different: Froberger used mensural notation, with well-defined duration values, although the division into beats and pulses is not always indicated with precision; Couperin, on the other hand, used a white notation without duration values or indications of time, but with long curved lines (tenues) to indicate groupings of sounds belonging to the same chord structure. These two choices probably denote, on the one hand, the effort to represent a method of performance effectively and, on the other, the desire to leave the extemporary factors of interpretation up to the performer.

4. Conclusions

What is of greatest interest for this paper is the concrete example offered by these two compositions of some nonlinear effects of the transformations that take place in musical structures through variation. Perhaps it is not superfluous to underline that, in music, variation also constitutes a formal structure, much used in classical and popular musical production; however, we have deliberately ignored this aspect in order to focus on variation understood as a dynamic process involving the reworking and transformation of the original material. There is one thing that all the elements described have in common: variation substantially modifies relational systems only under determinate conditions of criticality. One of these seems to be extemporaneity: in improvisation, the temporal dimension is the here and now; there is no planning, correction or going back, but a course is plotted along the lines of intuitive thought (Morin, 1993 [27]). This is a very particular form of implicit learning on the go, which is perhaps analogous to the mysterious cognitive processes of early childhood. A second element of criticality is the saturation of procedural models; extremely high levels of complexity are liable to lead to the complete “collapse” of a compositional system. This phenomenon can be seen, for example, in some of the contrapuntal works of Johann Sebastian Bach (David, 1972 [15]): right in the middle of a highly intricate and faultless fugue, we may come across what could easily be labeled an “error”: a false relation, a prohibited movement, a dissonant interval that is not resolved in the way it should be. Incredibly, the presumed error is immediately followed by a structure that redefines its function according to a new procedural model, re-establishing the overall balance. In any event, what is most interesting is not so much the transformation itself as the collective learning that it determines (Minati and Pessa, 2006
[26]): in time the new systems are recognized, codified, used by the social community and incorporated into the shared heritage. We can say that the very meaning of music emerges from this continuous invention and transformation: in the great game of change, the match never ends.

References

1. Agostino, De musica (Sansoni, Florence, 1969).
2. W. Apel, Geschichte der Orgel und Klaviermusik bis 1700 (Bärenreiter-Verlag, Kassel, 1967).
3. W. Apel, Die Notation der polyphonen Musik, 900-1600 (Breitkopf & Härtel Musikverlag, Leipzig, 1962).
4. N.A. Baas and C. Emmeche, Intellectica 25(2), 67-83 (1997), (also published as: SFI Working Paper 97-02-008, Santa Fe Institute, New Mexico).
5. C.Ph.E. Bach, Versuch über die wahre Art das Clavier zu spielen (1753), (Italian translation: L’interpretazione della musica barocca - un saggio di metodo sulla tastiera, Ed. G. Gentili (Edizioni Curci, Verona, Milan, 1995)).
6. L. Bianconi, Il Seicento, in Storia della Musica, Vol. IV (Società Italiana di Musicologia, EDT, Turin, 1982).
7. A. Bellasich, E. Fadini, S. Leschiutta and M. Lindley, Il Clavicembalo (EDT, Turin, 1984).
8. I. Bent and W. Drabkin, Analysis (Macmillan Publishers Ltd., London, 1980), (Italian translation: Analisi musicale, Ed. C. Annibaldi (EDT, Turin, 1990)).
9. S. Benvenuto, Lettera Internazionale 73-74, 59-61 (2002).
10. C.B. Boyer, A History of Mathematics (John Wiley & Sons, 1968), (Italian translation: Storia della Matematica (Arnoldo Mondadori Editore, Milan, 1980)).
11. G. Bruno, Lettera Internazionale 73-74, 56-58 (2002).
12. G. Bruno and E. Pietrocini, in Mathesis Conference Proceedings, Vasto, April 10-12, 2003, Ed. E. Rossi (Abruzzo Regional Council Presidency, 2003).
13. G. Caccini, Le Nuove Musiche et nuova maniera di scriverle (1614), (facsimile of the original at the Florence National Library, Archivium Musicum S.P.E.S., Florence, 1983).
14. G. Cattin, Il Medioevo I, in Storia della Musica, Società Italiana di Musicologia, Vol. I, Part II (EDT, Turin, 1979).
15. H.T. David, J.S. Bach’s Musical Offering (Dover Publications, New York, 1972).
16. H.T. David and A. Mendel, The Bach Reader (W.W. Norton, New York, 1966).
17. F. Del Sordo, Il Basso Continuo (Armelin Musica - Edizioni Musicali Euganea, Padua, 1996).
18. H. Ferguson, Keyboard Interpretation from the 14th to the 19th Century (Oxford University Press, New York, 1975).
19. E. Fubini, L’estetica musicale dall’antichità al Settecento (Einaudi, Turin, 1976).
20. A. Gallo, in Storia della Musica, Vol. II, Società Italiana di Musicologia (EDT, Turin, 1979).
21. D.R. Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid (Basic Books, New York, 1979), (Italian translation: Gödel, Escher, Bach: un’Eterna Ghirlanda Brillante (Adelphi, Milan, 1984)).
22. H. Howen, Modal and Tonal Counterpoint from Josquin to Strawinskj (Wadsworth Group/Thomson Learning, Belmont, CA, 1992).
23. I Ching, The “I Ching” or the Book of Changes (Bollingen Foundation, New York, 1950).
24. H. Howen, Il Contrappunto modale e tonale da Josquin a Strawinskj, Italian translation of [22] (Ed. Curci, Milan, 2003).
25. F. Hubbard, Three Centuries of Harpsichord Making (Harvard University Press, Cambridge, 1965).
26. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
27. E. Morin, Introduzione al pensiero complesso. Gli strumenti per affrontare la sfida della complessità (Sperling e Kupfer, Milan, 1993).
28. D. Moroney, Critical Apparatus, in L. Couperin, Pièces de Clavecin, publiées par Paul Brunold (Éditions de l’Oiseau-Lyre, Monaco, 1985).
29. D. de la Motte, Kontrapunkt - Ein Lese- und Arbeitsbuch (Bärenreiter-Verlag, Kassel, 1981), (Italian translation: Il Contrappunto, un libro da leggere e da studiare (Ricordi & C., Milan, 1991)).
30. J.J. Nattiez, Il discorso musicale (Einaudi, Turin, 1977).
31. E. Pessa, La Nuova Critica 35, 53-93 (2000).
32. E. Pessa, in Emergence in Complex Cognitive, Social and Biological Systems, Ed. G. Minati and E. Pessa (Kluwer Academic/Plenum Publishers, New York, 2002), pp. 379-382.
33. E. Pietrocini, in Systemics of Emergence, Ed. G. Minati, E. Pessa and M. Abram (Springer, New York, 2006), pp. 399-415.
34. E. Raschl, Beihefte der Denkmäler der Tonkunst in Österreich 28, 29-103 (1977).
35. P. Righini, L'acustica per il musicista. Fondamenti fisici della musica (Zanibon, Padua, 1994).
36. F.W.J. Schelling, Philosophie der Kunst, in Schelling Werke, Vol. III (Eckardt Verlag, Leipzig, 1907).
37. C. Simpson, The Division-Viol or The Art of Playing Extempore upon a Ground (1667), lithographic facsimile of the second edition (J. Curwen & Sons, London).
38. M. Solms and O. Turnbull, The Brain and the Inner World (2002), (Italian translation: Il cervello e il mondo interno (Raffaello Cortina Editore, Milan, 2004)).
39. C. Tuzzi, Clavicembali e Temperamenti (Bardi Editore, Rome, 1993).
40. L. von Bertalanffy, General System Theory (George Braziller, New York, 1968).
41. G. Zarlino, Istituzioni harmoniche (Venice, 1558).
THE EMERGENCE OF FIGURAL EFFECTS IN THE WATERCOLOR ILLUSION

BAINGIO PINNA(1), MARIA PIETRONILLA PENNA(2)
(1) Department of Science of Languages, University of Sassari, Via Roma 151, I-07100 Sassari, Italy; email: [email protected]
(2) Department of Psychology, University of Cagliari, Via Is Mirrionis 1, Cagliari, Italy; email: [email protected]

The watercolor illusion is characterized by a large-scale assimilative color spreading (coloration effect) emanating from thin colored edges. The watercolor illusion enhances the figural properties of the colored areas and imparts to the surrounding area the perceptual status of background. This work explores interactions between cortical boundary and surface processes by presenting displays and psychophysical experiments that exhibit new properties of the watercolor illusion. The watercolor illusion is investigated as supporting a new principle of figure-ground organization when pitted against the principles of surroundedness, relative orientation, and Prägnanz. The work demonstrates that the watercolor illusion probes a unique combination of visual processes that sets it apart from earlier Gestalt principles and can compete successfully against them. This illusion exemplifies how long-range perceptual effects may be triggered by spatially sparse information. All the main effects are explained by the FACADE model of biological vision, which clarifies how local properties control depthful filling-in of surface lightness and color.

Keywords: perceptual organization, grouping principles, color spreading, figure-ground segregation, filling-in.
1. Introduction

The watercolor illusion (Pinna, 1987 [18]; Pinna et al., 2001, 2003 [19,20]) is an assimilative spread of color emanating from a thin colored edge (orange) lining a darker chromatic (purple) contour (see Figure 1). The spread of color (coloration effect) is uniform and extends over large distances; its spatial limit is approximately 45 deg. The coloration is complete within 100 ms. All colors can generate a strong coloration effect, and the watercolor illusion also occurs on colored and black backgrounds. The optimal line thickness is approximately 6 arcmin. The color spreading effect is
Figure 1. The coloration effect in the watercolor illusion: when a purple contour is flanked by an orange edge, the entire enclosed area appears uniformly colored by the color spreading of the orange edge. The coloration appearance is like a solid surface color.
Figure 2. When Fig. 1 is physically and entirely colored within the inside edge by the same orange used in the fringe, the frame appears as a flat plane with two inner wiggly rectangles lying on it. The frame does not manifest the strong figural effect of Fig. 1, where it appears as a rounded surface with two small wiggly rectangles perceived as holes within the solid frame, revealing the white empty space behind them.
much stronger with wiggly lines, but it also occurs with straight lines and with chains of dots (see Pinna et al., 2001 [19]). High luminance contrast between the inducing lines yields the strongest coloration effect; however, the color spreading is clearly visible even at near equiluminance. In high luminance contrast conditions there is an asymmetry in the amount of color spreading from the two lines: the line with the lower luminance contrast relative to the background spreads proportionally more than the line with the higher luminance contrast. The color spreads in directions other than the line orientation. In quasi-equiluminant conditions both lines spread with similar intensity and in opposite directions. Although the color spreading due to the two lines goes in opposite directions, they produce a coloration effect that combines both colors in terms of saturation. The reciprocal chromatic influence between lines also operates when there are more than two adjacent and parallel lines. When both lines are replaced by chains of dots and the dots of the inner chain alternate between different colors, they spread less strongly, but the resulting color is a combination (additive mixture) of the component colors. The colors also combine when there are more than two colors or when the dots of the outer
purple chain are replaced by dots of alternating different colors. By inserting an empty gap between the two lines, the color spreading is weakened. These phenomenological features, as well as others discussed in the following, suggest that the watercolor illusion should be considered an emergent effect fulfilling constraints which cannot be reduced to the standard ones introduced by the Gestalt psychologists (such as surroundedness, relative orientation, and Prägnanz). It rather evidences the operation of an entirely new figure-ground organization principle. The latter can be understood, from a theoretical point of view, as a consequence of the interaction between two different processes operating within visual cortex, that is, parallel boundary grouping and surface filling-in. The complementarity of these processes, described by a recent model of visual perception (Grossberg, 1994, 1997, 2000 [4,5,6]), seems to account both for the observed visual phenomenology and for the neurobiological findings about visual cortex.

2. Figural Effects in the Watercolor Illusion

The watercolor illusion not only imparts the color of the inner edge onto a large enclosed area (coloration effect); it also enhances the figural property (figural effect) of this area relative to the surrounding (complementary) area, which appears as background (Pinna, 1987 [18]; Pinna et al., 2001, 2003 [19,20]). In Figure 1, the frame surrounding the inner wiggly rectangles manifests a figural appearance and a univocal (poorly reversible) figure-ground segregation that is not comparable with a condition where the same outlined figure is physically and entirely colored within the inside edge by the same orange used in the fringe (see Figure 2). In Figure 1, the watercolor illusion strengthens the figural effect of the frame by segregating it in depth and giving it the perceptual property of a rounded surface extending out from an otherwise flat surface (volumetric effect). On the other hand, the two tilted, small wiggly rectangles appear as holes within the solid frame, revealing the white empty space behind them. In contrast to Figure 1, in Figure 2 the frame appears as a flat plane with two inner wiggly rectangles lying on it. While in Figure 1 the figure-ground organization is difficult to reverse in favor of the wiggly rectangles appearing as figures on top of a flat large rectangle, in Figure 2 this latter result may appear more saliently than the complementary perceptual result, where the large frame is perceived in front with two wiggly rectangular holes. In Figure 3, a brighter physical orange within the frame elicits a perceptual figure-ground organization
Figure 3. By making the physical orange of Fig. 2 brighter within the frame, it is easier than in Fig. 2 to perceive small wiggly rectangles upon a large flat rectangle.
Figure 4. A control for Fig. 1, obtained by replacing the orange line with a purple line like the outer one. Differently from Fig. 1, the frame does not show any figural salience comparable with the watercolor condition of Fig. 1.
similar to that of Figure 2, but differing more strongly from Figure 1; that is, it is easier to perceive small wiggly rectangles upon a larger flat rectangle. These percepts can be interpreted in the light of Rubin’s principle of relative contrast (Rubin, 1921 [22]): all else being equal, the region with the higher contrast tends to appear as figure. Therefore the frame in Figure 2, having a greater contrast than the one in Figure 3, tends to appear in front more as a figure; however, in contrast with Rubin’s principle, it appears much less as a figure than the watercolored frame in Figure 1. Figure 4 shows a control for Figure 1 obtained by replacing the orange fringe with a line of the same purple belonging to the outer boundary of the frame in Figure 1. In Figure 4, the frame does not present the same figural salience as in Figures 1, 2 and 3; rather, it is more clearly perceived as a rectangular background behind the inner wiggly rectangles, which are now more strongly perceived as two solid figures and not as empty spaces or holes, as in Figures 1, 2 and 3. By adding an orange fringe to Figure 3, the strong figure-ground conditions of Figure 1 are restored (see Figure 5). Notice that in Figure 5 the inner orange is physically the same as the one in Figure 3, but it appears darker, much denser, and much more like a surface color (Katz, 1911, 1930 [14,15]) than the one in Figure 3. The darkness is due to the coloration effect, while the surface color appearance depends on the figural effect of the watercolor illusion. Both come
from the darker orange fringe added to the purple outer line, which becomes the boundary of the figure. In Figures 1 and 5, the figural effect of the watercolor illusion is pitted against, and prevails over, the classical Gestalt factors of surroundedness (Rubin, 1921 [22]) and relative orientation (Bozzi, 1975 [1]). In the next sections, the figural effect of the watercolor illusion is investigated as a new principle of figure-ground organization when pitted against the principles of surroundedness, relative orientation and Prägnanz. It has been shown (Pinna et al., 2003 [20]) that the watercolor illusion can be considered a distinct and more effective principle of figure-ground segregation than the usual ones of proximity, good continuation, closure, symmetry, convexity, and past experience (Wertheimer, 1923 [23]).

2.1. Experiment 1: Watercolor illusion vs. Surroundedness

In this experiment, the watercolor illusion, assigning to a given region the status of figure against the Gestalt principle of surroundedness, is shown under simpler conditions than those illustrated in Figures 1 and 5. The surroundedness principle states that, all else being equal, a shape surrounded by a larger one tends to be perceived as figure, while the surrounding shape appears as background.

2.1.1. Subjects

Fourteen undergraduate students who were naive to the purpose of the experiment participated. All had normal or corrected-to-normal vision.

2.1.2. Stimuli

The basic stimulus was made of two squares of different size, concentrically included one within the other (Figure 6). The side of the outer square was fixed (10.2 deg), whereas the side length of the inner square was varied as follows: 7.9, 6.3, 4.0, and 2.3 deg. According to the Gestalt principle of surroundedness, the smaller, enclosed square should be perceived as figure. Five edge conditions were used: (i) Purple contour only. (ii) Orange fringes lining the interspace (“frame”) between the two squares; this condition pits the watercolor effect against the Gestalt factor of surroundedness. (iii) Orange fringes lining the inside edge of the small square and the outside edge of the large square; here the watercolor effect is synergistic with surroundedness. (iv) Red fringes lining the frame between the two squares; the watercolor illusion should be weaker
Figure 5. By adding an orange fringe to Fig. 3, the strong figure-ground conditions of Fig. 1 are restored. The inner orange is physically the same as the one in Fig. 3, but it appears darker, much denser, and much more like a surface color than the one in Fig. 3.
Figure 6. Stimulus used to test the watercolor illusion against the Gestalt factor of surroundedness in determining figure-ground organization. According to the Gestalt factor of surroundedness, the smaller enclosed square should be perceived as figure, but when this factor is pitted against the figural effect of the watercolor illusion, the small enclosed square appears as a hole and the frame between the two squares is perceived as a figure.
because of the smaller luminance contrast between purple and red (Pinna et al., 2001, 2003 [19,20]) than between the purple and orange of the previous condition (ii). (v) Finally, the same physical color as the orange fringe, uniformly covering the area of the frame between the two squares. This is a control to demonstrate that the watercolor illusion cannot be reduced to a condition where coloration is the only property implied; it shows that the figural property is not necessarily linked to the coloration effect. Thus, the results for condition (v) were expected to be similar to the purple-only condition (i). The stimuli were hand-drawn using a graphic tablet. The CIE x,y chromaticity coordinates of the chromatic components of the patterns were: purple 0.30, 0.23; orange 0.57, 0.42; red 0.62, 0.34. Stimuli were presented under Osram Daylight fluorescent light (250 lux, 5600 K) and were observed binocularly from a distance of 50 cm with freely moving eyes.
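For readers who want the stimulus sizes in physical units: at the stated 50 cm viewing distance, a visual angle θ converts to on-screen extent as s = 2d·tan(θ/2). A small sketch of this standard conversion (the helper function below is ours, not from the paper):

```python
import math

def deg_to_cm(angle_deg, distance_cm=50.0):
    """Physical extent subtending a given visual angle at the eye."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# Outer square side (10.2 deg) and the four inner square sides.
for deg in (10.2, 7.9, 6.3, 4.0, 2.3):
    print(f"{deg:5.1f} deg -> {deg_to_cm(deg):5.2f} cm")
```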
2.1.3. Procedure

There was a training period preceding each experiment to familiarize subjects with the task. During practice, subjects viewed some well-known figures from the literature (e.g., face-vase) to familiarize them with the concepts of figure and
ground. They practiced scaling the relative strength or salience of each figure using percentages. The task was to report what was figure and what was ground. In addition, subjects quantified the relative strength (in percent) of a given surface being perceived as figure or ground. Observation time was unlimited, but responses were prompt. Each stimulus was presented in a random sequence that was different for each subject.

2.1.4. Results

In Figure 7, mean ratings (in %) of the frame being perceived as a figure are plotted for the 5 edge conditions with frame width as a parameter. In the purple-only condition (i), the frame was perceived as a figure only when its width was smaller than the size of the inner square. This result was expected because of the proximity factor, which groups the stimulus so that it is perceived as a frame and not as a square inside another square. According to the proximity principle, when the frame was wider than the size of the inner square, the inner square appeared as a figure; the surrounding area became part of the larger square, which completed itself amodally behind the small square. The double organization perceived in the purple-only condition was a good control for the other conditions. When orange fringes were added to the inner edges of the frame to produce watercolor spreading (ii), the frame was always perceived as a figure. The opposite result was obtained when the orange fringes were added to the inside edges of the inner square (iii): under these conditions the inner square always appeared as a figure, even when the frame was so narrow that it should have been organized to be perceived as a figure due to the proximity factor. Differently from the purple-contour-only condition (i), where the width determined what (frame or square) was perceived as a figure, in the latter two conditions (ii and iii) the watercolor illusion won irrespective of the width. When red fringes were added to the inside edges of the frame, watercolor won again; however, the figural effect depending on the red line was weaker than the effect depending on the orange fringes (condition ii). Finally, as expected, when orange color was physically and uniformly added to the area of the frame, the results were not significantly different from the purple-only condition. On the basis of these results, the watercolor illusion imparts not only a coloration effect but also an independent figural effect. A two-way ANOVA revealed that the relative strength (in percent) of the frame being perceived as a figure changed significantly depending on the size of the frame (F(3,260) = 472.891, p < 0.0001) and the 5 edge conditions
Figure 7. Results of watercolor illusion vs. surroundedness: Means of the relative strength of the frame being perceived as figure plotted for five conditions and four frame widths.
(F(4,260) = 1116.621, p < 0.0001). The interaction between the two factors was also significant (F(12,260) = 71.825, p < 0.0001). In the Fisher PLSD post-hoc analysis, almost all the differences between the individual conditions, both for the width of the frame and for the 5 edge conditions, were significant (p < 0.0001). The only exception was the purple-line-only condition (i) vs. the physical-orange-color condition (v) (p = 0.2792).

2.2. Experiment 2: Watercolor Illusion vs. Relative Orientation

Here the two squares of the first experiment were arranged according to the Gestalt principle of relative orientation (Bozzi, 1975 [1]), a factor also present in Figure 1 and acting against the figural effect of the watercolor illusion. This principle states that, all else being equal, if, under conditions like those illustrated in Figure 6, the inner square is rotated relative to the outer square, thus breaking parallelism, it should appear as a figure much more strongly than the interspace (“frame”) between the two squares. This principle is the complement of Morinaga’s parallelism principle (1942) − contours with the same or similar orientations tend to group into one figure − which in turn is a special case of Wertheimer’s symmetry principle (1923). Note that, under these conditions, the relative orientation principle works jointly with the surroundedness factor. This experimental combination should then further weaken the effect of the watercolor illusion on figure-ground segregation.
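The analyses reported for these experiments are two-way ANOVAs (frame width × edge condition) on percentage ratings, followed by Fisher PLSD post-hoc comparisons. A minimal sketch of the ANOVA step, with hypothetical ratings in place of the authors' data, using the statsmodels formula API:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Hypothetical ratings: 14 subjects x 4 frame widths x 5 edge conditions.
widths = [7.9, 6.3, 4.0, 2.3]
edges = ["i", "ii", "iii", "iv", "v"]
rows = [{"width": w, "edge": e,
         "rating": rng.uniform(0, 100)}   # placeholder, not real data
        for _ in range(14) for w in widths for e in edges]
df = pd.DataFrame(rows)

# Two-way ANOVA with interaction; with 280 observations (14 x 4 x 5)
# the residual df is 260, matching the reported F(3,260), F(4,260), F(12,260).
model = ols("rating ~ C(width) * C(edge)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```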
Figure 8. Stimulus used to test the watercolor illusion against the factor of relative orientation in determining figure-ground organization. According to this factor, the smaller, enclosed and rotated square should be perceived as figure, but when this factor is pitted against the watercolor illusion, the inner rotated square appears as a hole and the frame between the two squares is now perceived as a figure.
2.2.1. Subjects

A different group of fourteen undergraduate students, naive to the purpose of the experiment, participated. All had normal or corrected-to-normal vision.

2.2.2. Stimuli

Stimuli were the same as in the previous experiment except for the inner square, which was rotated 10 deg clockwise relative to the outer square (see Figure 8). The procedure was the same as in the previous experiment.

2.2.3. Results

In Figure 9, mean ratings (in %) of the interspace between the two squares being perceived as figure are plotted for the 5 edge conditions with frame width as a parameter. Compared to the previous experiment, the strength of the proximity factor was decreased by virtue of the relative orientation of the two squares. In the purple-only condition (i), the inner square appeared as a figure most of the time; an equilibrium rating of 45% was reached when the square was 7.9 deg in width. However, the influence of the watercolor illusion in biasing the interspace to appear as figure, when orange fringes were added to the interspace (condition ii),
Figure 9. Results of watercolor illusion vs. relative orientation: Means of the relative strength of the interspace being perceived as figure plotted for five conditions and four sizes of the inner square.
remained very high. This also applied, although to a lesser degree, when the fringes were red (condition iv); here, only the largest interspace produced an equilibrium between the responses for the interspace and the inner square. When the effect of watercolor spreading was synergistic with relative orientation and surroundedness (condition iii), the figural effect on the inner square was higher than the results obtained previously. As in Experiment 1, physical coloration of the interspace with orange (condition v) yielded the same rating pattern as the purple-only condition. A two-way ANOVA revealed that the relative strength (in percent) of the interspace between the two squares being perceived as figure changed significantly depending on the width of the interspace (F(3,260) = 238.094, p < 0.0001) and the 5 edge conditions (F(4,260) = 838.119, p < 0.0001). The interaction between the two factors was also significant (F(12,260) = 22.464, p < 0.0001). In the Fisher PLSD post-hoc analysis all the conditions, both for the width of the interspace and for the 5 edge conditions, were significantly different (p < 0.0001), except for the purple-contour-only condition (i) vs. the physical-orange-color condition (v) (p = 0.1207).

2.3. Experiment 3: Watercolor illusion vs. Prägnanz

The Prägnanz principle states that, all else being equal, the most regular, simplest and most stable of the possible shapes may also contribute to figure-ground segregation (Koffka, 1935 [16, p. 138]). As demonstrated in the previous experiments, the frame can become figure due to the watercolor illusion. Thus, we used the
Figure 10. Stimuli used to test the watercolor illusion against the factor of Prägnanz in determining figure-ground organization. (a) When orange fringes are added to the frame obtained in between the larger circle and the square and in between the hexagon and the small circle, the frame appears as a figure, while the space in between the square and the hexagon and the space inside the small circle appear as holes (left). (b) When orange fringes are added to the other complementary frames obtained in between the square and the hexagon and inside the small circle and outside the larger circle, these fringed frames appear as figure and the complementary regions as holes (right).
watercolor illusion as a tool to change the appearance of some regular geometrical shapes by causing the complementary frames between them to be perceived as figure. In other words, the watercolor illusion is used to create different figures depending on where it occurs.

2.3.1. Subjects

A different group of fourteen undergraduate students, naive to the purpose of the experiment, participated. All had normal or corrected-to-normal vision.

2.3.2. Stimuli

The basic stimulus was made up of 4 geometrical figures of different sizes, one included in the other: a small circle was included in a larger hexagon, which was included in a larger square, which was included in a larger circle. The small circle had a 4 deg diameter, the larger circle an 11.31 deg diameter, the hexagon a 3 deg apothem, and the square a 7.4 deg side. The stimulus was varied in 5 different conditions. (i) Purple-only contour, where four different outlined figures, or four non-transparent surfaces overlapping one another, are predicted to be perceived. (ii) Orange fringes added to the frames in between the larger circle and the square and in between the
Figure 11. Results of watercolor illusion vs. Prägnanz: Means of the relative strength of the interspace in condition (ii – see the text) being perceived as figure, plotted for five conditions.
hexagon and the small circle (see Figure 10a). The watercolored frames are predicted to be perceived as figures, while the complementary spaces in between the square and the hexagon and inside the small circle should appear as holes. (iii) Orange fringes added to the frames complementary to (ii), i.e. those in between the square and the hexagon, inside the small circle and outside the larger circle (see Figure 10b). Under these conditions the fringed frames should appear as figures and the complementary regions as holes. (iv) Physical orange within the inner edges of the frames of condition (ii); no differences from the purple-only condition are predicted. (v) Physical orange within the inner edges of the frames of condition (iii); again, no differences from the purple-only condition are predicted. The stimuli were hand-drawn using a graphic tablet. The procedure was the same as in the previous experiment.

2.3.3. Results

Mean percentage ratings for the 14 subjects are plotted in Figure 11 as a function of the frames of condition (ii) being perceived as figures. In the purple contour condition (i), the 0 value is due to the fact that no subject reported perceiving the frames as figures; all of them perceived the entire figure as made up (at 50%) either of transparent outlined geometrical elements or of overlapping surfaces. When the orange fringes were added to the frames, as described in condition (ii), more than 95% of the subjects perceived them as figures and the complementary regions as holes. The opposite was true in condition (iii). But what is
Figure 12. Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization: purple contour only.
Figure 13. Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization: orange fringe added within the crosses.
Figure 14. Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization: orange fringe added in a complementary manner to Figure 13.
important for our purpose is that no subject in conditions (ii) and (iii) perceived either the outlined or the overlapping figures reported in condition (i). The overlapping figures were instead clearly reported in conditions (iv) and (v): the physical orange added to the frames destroyed the transparent outlined effect but not the overlapping figures. In both of these last conditions, fewer than 5% of the subjects perceived the frames. A one-way ANOVA revealed that the relative strength (in percent) of the frames being perceived as figures changed significantly across conditions (F(4,65) = 322.668, p < 0.0001). In the Fisher PLSD post-hoc analysis all the conditions were significantly different (p < 0.0001), except (i) vs. (iv) and (i) vs. (v). Given the strong links between coloration and figural effects, although, as shown, they can sometimes be decoupled, the aim of this section is to demonstrate that the watercolor effect is a powerful principle for creating camouflage or for reversing figure-ground segregations induced by Rubin’s classical principles of figure-ground organization.

2.4. Experiment 4: Camouflage and Disambiguation

The following experiment examines an effect first observed by Kanizsa (1980) [13]: the camouflage of some shapes (crosses) due to a special geometrical arrangement. He noticed that when some Greek crosses touch one another at their vertices, they tend to disappear inside a set of large intersected squares, because the good continuation principle wins over the Prägnanz principle of the crosses (see Figure 12). This is a clear camouflage result, quite
similar to others found in nature. The question we want to answer in this experiment is: can the watercolor illusion make the crosses emerge again, and can it create a stronger camouflage?

2.4.1. Subjects

Fourteen undergraduate students who were naive to the purpose of the experiment participated. All had normal or corrected-to-normal vision.

2.4.2. Stimuli

Each cross was composed of four juxtaposed squares, whose sides were 1.7 deg in length; the overall figure was 17.1 × 14.1 deg. To answer the above questions, Figure 12 was varied in 5 different ways. (i) The purple-contour condition was used as a control (Figure 12). (ii) Orange fringes were added to the inner edge of every cross, so that the crosses were expected to emerge by disambiguating figure-ground organization in their favor (Figure 13). (iii) Orange fringes were added to the outside edge of every cross; in this way the small squares were expected to appear as figures (Figure 14). (iv) The orange color was physically and uniformly added to the surface of the crosses. (v) The orange color was physically added to each of the small squares. The stimuli were hand-drawn using a graphic tablet. The CIE x,y chromaticity coordinates, presentation conditions and observation distance were the same as in the previous experiment, and so was the procedure.

2.4.3. Results

In Figure 15, mean ratings (in %) of the crosses being perceived as figures are plotted as a function of the 5 edge conditions. With the purple contour only (i), the mean rating was 35%. This value increased to 92% when the orange fringes were added to the inner edges of the crosses (condition ii). When the orange fringes were added to the small squares (condition iii), mean ratings dropped to 2%. The physically colored conditions (iv and v) produced results not significantly different from the purple-contour-only condition. A one-way ANOVA revealed that the relative strength (in percent) of the crosses being perceived as figures changed significantly depending on the 5 edge conditions (F(4,65) = 175.346, p < 0.0001). In the Fisher PLSD post-hoc analysis,
Figure 15. Results of watercolor illusion used to camouflage and disambiguate: Means of the relative strength of crosses being perceived as figures for test conditions.
all the conditions were significantly different (p < 0.0001), except for the purple-contour-only condition vs. the physical orange in the crosses, and the purple-contour-only condition vs. the physical orange in the small squares. The watercolor effect may thus be used to better define areas, to disambiguate areas camouflaged within the ground by simultaneously defining their boundaries and giving them figure status, as well as imparting coloration to the enclosed areas regardless of their shapes. The watercolor illusion can also be used to camouflage and split what would be unified if no colored fringe were added to create the illusion. The comparison between Figures 16, 17, and 18 demonstrates this case. In Figure 16, the horizontal rectangle is perceived as watercolored and as a continuous stripe along its direction; it is also perceived as transparent, partially showing parts of two annuli underneath. In Figure 17, the integrity of the horizontal rectangle and of one of the previously perceived rings is broken by the way the colored fringes are placed. Figure 18 shows a purple-line-only control for Figures 16 and 17. The superiority of the watercolored areas relative to the physically colored ones is shown in a different way in Figure 19: the crosses are not perceived as a homogeneous background but appear as segregated figures, although the strong articulation of the squares, by virtue of their chromatic variation and their size, smaller than that of the crosses, represents two figural principles acting against the watercolor illusion.
Figure 16. The horizontal rectangle is perceived as watercolored, transparent and partially showing parts of two annuli underneath, and as a continuous stripe along its direction.
Figure 17. Due to the way the colored fringes are placed, the integrity of the horizontal rectangle and of one of the rings perceived in Figure 16 is broken.
Figure 18. A purple-line-only control for Figures 16 and 17.
The experimental evidence reported in this section points to the inadequacy of traditional Gestalt organization principles in explaining the phenomenology of the watercolor illusion. In turn, this requires new models of visual perception which, while being able to account for the visual phenomenology, should also be consistent with neurobiological findings about visual cortex. To our knowledge, the only model so far proposed that fulfils both requirements is the FACADE neural model of 3D vision and figure-ground separation introduced by Grossberg (see Grossberg, 1994, 1997, 2000 [4,5,6]). We remark that, as this model is implemented in the language of artificial neural networks (see, for instance, textbooks such as Fausett, 1994 [2]; Rojas, 1996 [21]; Gupta et al., 2003 [11]), it can easily be compared with that of the biological neural networks
constituting the visual cortex. In the next section we briefly summarize the general principles underlying the model, as well as how it could explain our experimental findings.

3. The FACADE neural model of 3D vision and figure-ground segregation

The FACADE model posits that parallel boundary grouping and surface filling-in processes are realized by the cortical interblob and blob streams, respectively, within cortical areas V1 through V4 (Grossberg, 1994, 1997). These boundary and surface processes exhibit complementary properties (Grossberg, 2000), and their interaction generates a consistent perceptual representation that overcomes the complementary deficiencies of each stream acting on its own. These complementary properties include the following. Boundaries form inwardly between pairs or greater numbers of inducers, are oriented, and are insensitive to contrast polarity; that is, boundaries pool contrast information at each position from opposite contrast polarities. Surfaces fill in outwardly from individual lightness or color inducers in an unoriented fashion, using a process that is sensitive to contrast polarity; that is, surfaces are visible. These boundary and surface processes are modeled by the Boundary Contour System (BCS) and the Feature Contour System (FCS), respectively (Grossberg and Mingolla, 1985 [8,9]; Grossberg and Todorovic, 1988 [10]). FACADE theory proposes how the two-dimensional monocular properties of the BCS and FCS may be naturally embedded into a more comprehensive theory of 3-D vision and figure-ground separation, introduced in Grossberg (1994, 1997, 2000) [4,5,6]. In particular, FACADE theory proposes how brain processes that have evolved to represent the world in three dimensions also enable us to perceive two-dimensional images as figures and backgrounds in depth. Some of these figure-ground mechanisms enable partially overlapping, occluding, and occluded image parts to be separated and completed in depth. The same mechanisms shed light on how the watercolor illusion can support a figural percept. In Figure 1, for example, the watercolor illusion segregates the colored frame in depth and gives it the appearance of a rounded figural surface. This rounded percept becomes stronger as the contrast ratio between the two colored lines is increased. Several factors contribute to these percepts within FACADE theory. One factor is that there are depth-specific and color-specific (including achromatic) Filling-In Domains, or FIDOs, in which depthful surface capture occurs in
Figure 19. Although the strong articulation of the squares, by virtue of their chromatic variation and their smaller size relative to the crosses, represents two figural principles acting against the watercolor illusion, the crosses are not perceived as a homogeneous background but rather appear as segregated figures. This result confirms the superiority of watercolored areas over physically colored ones in being perceived as figures.
response to depth-specific boundary representations. In particular, regions of different color can be filled in on FIDO surface representations that represent different depths. The determination of figure and background can be traced to the spatial distribution and relative strengths of BCS boundaries, and to how they interact with surface inducers to selectively fill in FIDO surface representations that represent different depths. In particular, when two colored lines of different contrast are contiguous, three parallel rows of boundaries are generated, usually of progressively decreasing boundary strength. Such an array generates a spatially sparse version of a boundary web, that is, a spatial array of boundaries that can restrict filling-in within relatively small surface regions. It was predicted earlier that a boundary web can elicit a percept of a rounded surface in depth (Grossberg, 1987 [3]). The main idea behind this predictive success can be summarized as follows. Consider a 2D shaded ellipse. How does such a 2D image generate a percept of a 3D curved surface? Such a 2D image activates multiple filters, each sensitive to a different range of spatial scales. Other things being equal, larger filters need more contrastive evidence to fire than do smaller filters. Likewise, larger filters can, other things being equal, binocularly fuse closer, and thus larger and more binocularly disparate, images than can smaller filters. Smaller filters can respond to smaller features and to farther, and thus smaller and less binocularly disparate,
images than can larger filters. On the other hand, larger filters can respond to a wider range of disparities than can smaller filters. These disparity-selective properties of multiple-scale filters often go under the name of the size-disparity correlation (Julesz and Schumer, 1981 [12]). The multiple-scale filters then feed grouping cells, which use cooperative-competitive interactions to select and complete boundary representations that are sensitive to different ranges of relative depth from the observer. These competitive interactions include the spatial competition that helps to explain how the watercolor effect occurs. In particular, these various depth-selective boundary representations selectively capture lightness and color features at FIDOs that fill in the captured surface features at those depths, and create the boundaries of the regions within which the surface features are contained. If some of these boundaries are weakened, as in the contrast-sensitive spatial competition described above, then color can flow out of a region to the extent that the boundary has been weakened. With these caveats in mind, consider how multiple scales would respond to a shaded ellipse. Other things being equal, smaller scales can fire more easily nearer to the bounding edge of the ellipse. As the spatial gradient of shading becomes more gradual with distance from the bounding edge, it becomes harder for smaller scales to respond to this gradient. Thus, other things being equal, larger scales tend to respond more as the distance from the bounding edge increases, so the regions nearer to the center of the ellipse look closer due to the size-disparity correlation. A similar thing happens, albeit with a more spatially discrete filter input, in response to a watercolor image such as the one in Figure 1. Here, just as in response to a shaded ellipse, there is a spatial array of successively weaker filter responses as the distance increases from the most contrastive edge of the display. These successively weaker filter responses activate boundary and surface processes much as one would expect from a spatially discrete version of a shaded ellipse, and these processes can generate a rounded appearance using the same size-disparity correlation mechanisms. A key new feature of the watercolor effect, which is due to the discrete changes in successive boundary contrasts, is that the boundaries can weaken one another, allowing surface color to spread within the depth-selective boundaries that are formed in response to the multiple-scale filter responses. That is why the interior of the watercolor region can look a little closer to the observer than the bounding edge. Because of this perceived depth difference, a region suffused with the watercolor illusion can have a stronger figural quality than one filled with a uniform color, which tends to look flat. This point was illustrated in the demonstrations of camouflage and disambiguation.
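The two ingredients just described can be made concrete with a toy one-dimensional sketch in Python. This is our illustration under simplifying assumptions, not the published FACADE implementation: part (a) probes a sharp bounding edge and a shallow shading gradient with normalized edge filters of three scales, and part (b) models filling-in as boundary-gated diffusion, in which a weakened boundary lets color flow out of its region. All sigma values, gains and lattice sizes are illustrative choices.

import numpy as np

# (a) Multiple-scale filtering: small filters respond mainly at the sharp
# bounding edge, while larger filters keep responding where the shading
# gradient is shallow.
def edge_response(profile, sigma, x):
    k = -x / sigma**3 * np.exp(-x**2 / (2 * sigma**2))  # odd edge detector
    k /= np.abs(k).sum()                                # comparable scales
    return np.abs(np.convolve(profile, k, mode="same")).max()

x = np.linspace(-4, 4, 801)
sharp = 1 / (1 + np.exp(-x / 0.05))      # bounding edge of the ellipse
shallow = 1 / (1 + np.exp(-x / 1.5))     # gradual interior shading
for sigma in (0.1, 0.5, 2.0):
    print(f"sigma={sigma:3.1f}  sharp={edge_response(sharp, sigma, x):.3f}"
          f"  shallow={edge_response(shallow, sigma, x):.3f}")

# (b) Filling-in as boundary-gated diffusion on a FIDO-like lattice:
# permeability between neighbouring cells drops where boundaries are
# strong, so color stays trapped; a boundary weakened by spatial
# competition lets color spread outward.
def fill_in(inducer, boundary, steps=300, rate=0.2):
    v = inducer.copy()
    perm = 1.0 / (1.0 + 500.0 * boundary)    # strong boundary -> low flow
    for _ in range(steps):
        flux = perm * (v[1:] - v[:-1])       # gated nearest-neighbour flux
        v[:-1] += rate * flux
        v[1:] -= rate * flux
    return v

n = 60
inducer = np.zeros(n); inducer[28:32] = 1.0          # thin colored fringe
strong = np.zeros(n - 1); strong[[27, 31]] = 1.0     # intact boundaries
weak = 0.05 * strong                                 # weakened boundaries
print("cells tinted (strong):", int((fill_in(inducer, strong) > 0.05).sum()))
print("cells tinted (weak):  ", int((fill_in(inducer, weak) > 0.05).sum()))

In the printed responses, all scales react to the sharp edge, while only the larger scales continue to respond to the shallow interior gradient; this is the asymmetry that the size-disparity correlation converts into relative depth, and the diffusion demo shows how a weakened boundary lets the fringe color tint many more cells.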
Acknowledgements

Supported by: Fondazione Banco di Sardegna, Alexander von Humboldt Foundation, PRIN ex 40% Cofin. es. 2005 (prot. 2005112805_002) and Fondo d'Ateneo (ex 60%) (to BP). I thank Massimo Dasara and Maria Tanca for assistance in testing the subjects.

References
1. P. Bozzi, in Studies in Perception, Ed. Flores-D'Arcais (Giunti-Martello, Firenze, 1975), pp. 88-110.
2. L.V. Fausett, Fundamentals of Neural Networks (Prentice Hall, Upper Saddle River, NJ, 1994).
3. S. Grossberg, Perception & Psychophysics 41, 97-116 (1987).
4. S. Grossberg, Perception & Psychophysics 55, 48-120 (1994).
5. S. Grossberg, Psychological Review 104, 618-658 (1997).
6. S. Grossberg, Trends in Cognitive Sciences 4, 233-245 (2000).
7. S. Grossberg, Behavioral and Cognitive Neuroscience Reviews (in press, 2004).
8. S. Grossberg and E. Mingolla, Psychological Review 92, 173-211 (1985).
9. S. Grossberg and E. Mingolla, Perception & Psychophysics 38, 141-171 (1985).
10. S. Grossberg and D. Todorovic, Perception & Psychophysics 43, 241-277 (1988).
11. M.M. Gupta, L. Jin, N. Homma, Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory (Wiley, Hoboken, NJ, 2003).
12. B. Julesz and R.A. Schumer, Annual Review of Psychology 32, 572-627 (1981).
13. G. Kanizsa, Grammatica del vedere (Il Mulino, Bologna, 1980).
14. D. Katz, Die Erscheinungsweisen der Farben und ihre Beeinflussung durch die individuelle Erfahrung, Zeitschrift für Psychologie, Ergänzungsband 7 (Barth, Leipzig, 1911), pp. 6-31.
15. D. Katz, Die Erscheinungsweisen der Farben, 2nd edition (1930) (English translation: R.B. MacLeod and C.W. Fox, Eds., The World of Color (Kegan Paul, London, 1935)).
16. K. Koffka, Principles of Gestalt Psychology (Harcourt Brace, New York, 1935).
17. S. Morinaga, Archiv für die gesamte Psychologie 110, 309-348 (1942).
18. B. Pinna, in Il laboratorio e la città. XXI Congresso degli Psicologi Italiani, Ed. V. Majer, M. Maeran and M. Santinello (1987), p. 158.
19. B. Pinna, G. Brelstaff and L. Spillmann, Vision Research 41, 2669-2676 (2001).
20. B. Pinna, J.S. Werner and L. Spillmann, Vision Research 43, 43-52 (2003).
21. R. Rojas, Neural Networks: A Systematic Introduction (Springer, Berlin, 1996).
22. E. Rubin, Visuell wahrgenommene Figuren (Gyldendalske Boghandel, København, 1921).
23. M. Wertheimer, Psychologische Forschung 4, 301-350 (1923).
CONTINUITIES AND DISCONTINUITIES IN MOTION PERCEPTION
BAINGIO PINNA(1), RICHARD L. GREGORY(2)
(1) Facoltà di Lingue e Letterature Straniere, Università di Sassari, Via Roma 151, I-07100, Sassari, Italy, E-mail: [email protected]
(2) Department of Experimental Psychology, University of Bristol, 8 Woodland Road, BS8 1TN, Bristol, United Kingdom
New types of apparent motion effects, depending on continuities and discontinuities placed along continuous or discontinuous boundaries, are illustrated here. These effects suggest that global grouping processes (e.g., proximity, good continuation) may affect the local motion signals, and be affected by them. This bidirectional interaction between local and global motion signals may be considered the phenomenal result of a feedback between local and global motion processes.
Keywords: Illusions, Gestalt grouping principles, Illusory motion, Motion perception, Local and global motion processes.
1. Grouping discontinuities

In Figure 1 the squares, delineated by two white and two black edges each and grouped by proximity into two concentric rings, elicit an illusory counter-rotation of the two rings when the head is moved towards the figure or away from it while the gaze is fixed on the central dot (Pinna and Brelstaff, 2000 [12]). Each square shows a diagonal orientation polarity, obtained by joining the two vertices where black and white lines meet. The squares belonging to the two concentric rings have opposite orientation polarities. Direction-selective neurons at early stages of visual processing, signaling the speed of the diagonal orientation polarity of their preferred orientation through the receptive field (Grossberg, Mingolla and Viswanathan, 2001 [5]; Gurnsey, Sally, Potechin and Mancini, 2002 [6]; Morgan, 2002 [8]), may be responsible for the local motion vectors perpendicular to the orientation polarity (the aperture problem; Nakayama and Silverman, 1988 [9]). The local motion signals within single squares group into a whole circular flow according to the proximity factor. In Figure 2 the proximity grouping is not concentrically but radially oriented, while the single squares maintain their alternated orientation polarities from the outside to the inside of the figure (Pinna, 1990 [10]).
Figure 1.
Figure 2.
Due to the grouping factor, the counter-rotating effect is reduced, and a waving or twisting global motion through the radial grouping now appears. In Figure 3 the square elements of Figure 1 are grouped circularly through proximity and similarity of shape, due to the opposite rhombic skew of each square between the two rings of elements. Under these conditions the counter-rotating effect is even stronger than in Figure 1. The synergistic local motion signals, due to the orientation polarities and to the tilted sides of the skewed squares, may be responsible for the increased strength of the relative motion effect. In Figure 4 the radial grouping of the rhombic elements of Figure 3, due to proximity and good continuation (the sides of the elements create zigzag-like virtual contours), increases the strength of the waving and twisting apparent motion of Figure 2. If the diagonal orientation polarities and the tilted sides of the skewed squares are antagonistic, the strength of the relative motion effect is strongly decreased (see Figure 5). The apparent rotation is annulled in Figure 6, where the orientation polarities are not circularly and radially arranged around the annuli made up of squares as in Figure 1, where the same side of each square is tangent to the circumference. In Figure 6 orientation polarities and squares have the same orientation. Under these conditions, on the basis of the previous theoretical suggestions, the only possible apparent motion is not a rotation but a sliding motion, which can be described as follows: (i) an upward motion of the stimulus results in an apparent motion to the left; (ii) similarly, a downward motion of the stimulus
Figure 3.
Figure 4.
Figure 5.
Figure 6.
produces an apparent motion to the right; (iii) leftward and rightward motions result in upward and downward apparent motions, respectively. The apparent rotation is restored if four groups of squares, all oriented as in Figure 6, present four different orientation polarities circularly and radially arranged around the annuli (Figure 7). The apparent rotation, weaker than in Figure 1, depends on the fact that within each group the squares have the same orientation polarity. It is important to notice that, instead of producing four perpendicular sliding motion effects, the local motion vectors of the four sets of squares group circularly, inducing an apparent rotation. A second important remark is that orientation polarities (implicit orientations) are much stronger in inducing relative motion than the explicit orientations of the square sides.
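The direction predictions in (i)-(iii) follow from simple vector projection. The following minimal Python sketch is our illustration, not the authors' model: a local direction-selective unit is assumed to signal only the component of the true stimulus motion along the normal to its preferred contour orientation (the aperture problem), and the 45-degree polarity angle and unit motions are chosen for concreteness.

import numpy as np

# A local detector can only signal the component of the true 2D motion
# that is normal to its preferred contour orientation.
def local_motion(true_motion, contour_angle_deg):
    a = np.deg2rad(contour_angle_deg)
    normal = np.array([-np.sin(a), np.cos(a)])   # unit normal to contour
    return normal * np.dot(np.asarray(true_motion, float), normal)

# Orientation polarity tilted at +45 degrees (x = rightward, y = upward):
print(local_motion([0.0, 1.0], 45.0))    # upward stimulus -> up-left signal
print(local_motion([0.0, -1.0], 45.0))   # downward -> down-right signal
print(local_motion([-1.0, 0.0], 45.0))   # leftward -> upward component
print(local_motion([1.0, 0.0], 45.0))    # rightward -> downward component

Whether these local vectors are then read as sliding, rotation or waving depends, as described above, on how they are grouped by proximity and good continuation across the display.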
Figure 7.
Figure 8.
Linked to the previous effects are the so-called "phenomenal phenomena" first described by Gregory and Heard (1972) [4] and related to other phenomena such as the reverse phi and four-stroke motion (Anstis and Rogers, 1975, 1986 [1,2]) and the visual illusions based on single-field contrast asynchronies (Shapiro, Charles and Shear-Heyman, 2005 [13]). When the luminance of an edge-striped grey rectangle, which has a dark stripe on the left side and a light stripe on the right, is modulated, the entire figure appears to shift. When the background luminance is modulated, the apparent motion is reversed. Briefly, by grouping discontinuous elements differently, the local motion signals group accordingly.

2. Grouping continuities

The waving and twisting effect of Figure 4 is perceived (by zooming as in the previous figures) as a serpentine radial motion in Figure 8, where the previous zigzag discontinuities of the checks are now replaced by continuous S-shaped lines. In Figure 9, obtained by increasing the number of lines of Figure 8, the line proximity induces, besides the serpentine apparent motion through the line orientation, the counter-rotation of the concave and convex curves grouped to create a 3D surface (Manca and Pinna, 2000 [7]). Figure 10 demonstrates that a slight waving or curving of each radial line is sufficient to elicit the apparent rotation effect. However, the serpentine motion is much reduced and assimilated into the rotation. In brief, by varying the grouping of continuous lines, the local and global motion signals change accordingly.
Figure 9.
Figure 10.
Figure 11.
Figure 12.
3. Ungrouping continuities

In Figure 11 the radial wavy lines are replaced by concentric wavy circumferences. By slowly rotating the figure around its centre, instead of moving the head towards the figure or away from it, a wavy continuous motion through the concave and convex surface created by the concentric elements is perceived (Manca and Pinna, 2000 [7]). The whole surface loses its rigidity (Nakayama and Silverman, 1988). By introducing more and more abrupt discontinuities within the concentric elements, e.g. replacing the circular elements with concentric decagons (Figure 12)
Figure 13.
Figure 14.
and hexagons (Figure 13) (Manca and Pinna, 2000 [7]), the smooth and curved non-rigid motion of Figure 11, induced by the rotatory movement of the figure, increasingly becomes a rigid motion in different directions (roughly around themselves) of the separated and independent sectors. Related effects are reported by Weiss and Adelson (2000) [16] and by Sparrow and Stine (1998) [14]. These are obtained by grouping equally oriented sides of concentric polygons, and are separated by radial illusory contours across the abrupt discontinuities. Briefly, by ungrouping contour continuities, the holistic apparent motion continuity is also ungrouped.
4. Continuities (real, illusory and/or virtual) with discontinuities along them
When the boundaries of continuous straight and wide stripes are undulated, producing sinusoidal contours on both sides of the stripe, both the continuity of the stripes and the sinusoidal discontinuities along their boundaries are independently perceived. If the eye follows the tip of a pen moving vertically (up and down) along the space between two diamonds, whose inner and outer boundaries show sinusoidal (although not regular) discontinuities (see Figure 14), the discontinuities segregate from the wide contours of each diamond and appear to run along the boundaries of the diamonds in the direction opposite to the eye motion. As a consequence, the running discontinuities of the two diamonds appear to converge and diverge while they flow along converging and diverging boundaries. The apparent motion of the discontinuities induces a global deformation of each diamond, which appears to pulsate, expanding and
Figure 15.
Figure 16.
contracting along its short axis (Manca and Pinna, 2000). The diverging/converging apparent motion of the sinusoids induces, respectively, the expanding/contracting effect of the continuous wide contours of the diamonds. This effect can be better appreciated when the tip of the pen is moved vertically along the inner long axis of each diamond. When the eye follows the tip of a pen moving vertically, both effects, the running of the sinusoids along the boundaries of the diamonds and the global dynamic shape distortion of the diamonds, can be clearly perceived in Figure 15, where the stripes are crossed to create a grid with oblique fences. By slowly moving the eyes randomly, the running effect of discontinuities along continuous boundaries can be clearly perceived even when they are abrupt disconnections, as illustrated along the boundaries of the eight-like shapes of Figure 16 (Spillmann and Pinna, 2000 [15]). In Figure 17 the discontinuities created by the misaligned terminators of the gray stripes induce, even if the eyes do not move, a slow vertical (up and down) gliding motion in counter-phase along the black stripes. Some stripes appear to go slowly up, others to go down, as though they were looking for their impossible alignment (Spillmann and Pinna, 2000 [15]). In Figure 18 (Pinna, 1990 [10]) the discontinuities, due to the line-ends of displaced gratings inducing an illusory diamond, show strongly both of the effects described previously for Figure 14. The discontinuities appear to run along the illusory contours, in the direction opposite to the eye movement, when the eye follows the tip of a pen moving vertically along the longest axis of the diamond. Their diverging and converging apparent motion relative to the eye movement induces, respectively, the expanding and contracting global effect of the illusory diamond.
Figure 17.
Figure 19.
Figure 18.
Figure 20.
Both motion effects are stronger than in the previous conditions (Figures 14 and 15) and disappear when the width of the illusory contours is increased, or when the line terminators overlap. Within these variations, the dependence of the illusory contours on the line terminators plays a basic role. The global shape distortion effect becomes very weak, while the running effect is stronger than in Figure 18, when the illusory contours are not abrupt directional variations, as in the diamond, but sinusoidal-like shapes, as illustrated in Figure 19. The discontinuities appear to run along the illusory contours, while the inner illusory shape appears stationary, i.e. the global shape distortion is not perceived or is much reduced (Pinna, Anstis and Macleod, 2001 [11]). By following the tip of a pen while it moves vertically up and down, discontinuous dashed lines appear to run within and along the two undulated roads,
Figure 21.
Figure 22.
in the direction opposite to the eye motion (Figure 20; Spillmann and Pinna, 2000 [15]). The interaction between continuities and discontinuities, with differently defined motion signals, appears once more to be the basic factor for these motion effects. Again, discontinuities such as dashed lines appear to run along continuous virtual circles in the direction opposite to the eye motion (Figure 21; Spillmann and Pinna, 2000 [15]). The perceptual result is similar to the one illustrated in Figure 16. These effects are likely connected to the boogie-woogie illusion (Cavanagh and Anstis, 2002 [3]). When the gaze is fixed on the pen tip, while it is slowly moved horizontally along the middle of the upper triangle of the X-like shape of Figure 22, discontinuities due to separated black squares appear to run along the virtual axes of the X-like shape, induced by the continuation of the square elements. A global distortion of the two arms of the curved X-like shape is also perceived. More precisely, by following with the eye the pen tip moving from left to right, the squares of the left arm of the X-like shape appear to run upwards while the squares of the right arm run downwards. As a consequence, and according to the local motion of the squares, the whole arcs move relative to each other. The effects shown in this section demonstrate that the local motion signals of discontinuous elements (sinusoids, indentations, displaced line terminators and dashed lines) may affect the global motion signals, inducing a shape distortion; however, the opposite is also true: the whole shape, whether real, illusory or virtual, determines the direction of the local motion signals. Nevertheless, the discontinuities along different kinds of contours segregate from the contours, i.e. the discontinuities are perceived as independent. They move independently
(apparently running along the contour) even if they affect the global contour shape. Similarly, the global shape is also perceived independently from the discontinuities on its contours. Continuities of the contours and discontinuities of the elements split up (phenomenal scission), appearing reciprocally independent, but at the same time they affect each other. The direction of the running effect is opposite to the eye motion and flows along the contours, influencing the induced global shape. Converging and diverging of the running direction is accompanied by expanding and contracting of the contours along which the discontinuities run.

5. Conclusions

Summing up, we have shown two main kinds of apparent motion effects, perceptually distinct but reciprocally influencing each other in their motion direction. (1) Continuities (gradual or abrupt directional or luminance variations within lines) and discontinuities (sinusoids, indentations, displaced line terminators and dashed lines), placed along continuous contours, appear to run along them, even if the contours are illusory or virtual. (2) The running apparent motion may affect the global shape of the contours along which the microelements run, inducing a dynamic distortion effect of the whole shape. These effects may depend on: (a) the grouping of concentrically and radially arranged discontinuities made up of squares with two black and two white sides arranged to create opposite orientation polarities (Figures 1-7); (b) the grouping of concentrically and radially arranged discontinuities made up of elements whose luminance modulation produces an illusory shift; (c) the grouping of radially and circularly continuous lines (Figures 8-10); (d) the ungrouping of concentric distorted circles, replaced by concentric decagons and hexagons, which as a consequence induces the segregation of sectors of the whole shape (Figures 11-13); (e) the grouping of discontinuities along continuous contours, whether real, illusory or virtual (Figures 14-22). Through the previous categories we suggest that: (a) local motion signals of single elements (discontinuities) may be affected by global grouping processes; (b) local motion signals group on the basis of spatial grouping factors like proximity and good continuation; (c) conversely, the global motion effect is influenced, in its turn, by local motion signals and by their grouping, which induce a dynamic distortion effect within the whole shape. This reciprocal, bidirectional local-global determination of the main motion vectors may be considered the phenomenal result of a complex feedback between local and global motion dynamics (likely related to the size of V1 and MT/MST receptive fields and to the feedforward and feedback relationships between these motion
areas), which, although affecting each other in their resultant motion vector, keep their perceptual (phenomenal scission) and directional differences. The neurophysiological mechanisms underlying these phenomena and, more particularly, the feedback process remain to be studied.

Acknowledgements

Supported by: PRIN ex 40% Cofin. es. 2005 (prot. 2005112805_002), Fondo d'Ateneo (ex 60%), Fondazione Banco di Sardegna, and Alexander von Humboldt Foundation (to BP).

References
1. S.M. Anstis and B.J. Rogers, Vision Research 15(8-9), 957-961 (1975).
2. S.M. Anstis and B.J. Rogers, Perception 15(5), 627-640 (1986).
3. P. Cavanagh and S. Anstis, Perception 31, 1005-1011 (2002).
4. R.L. Gregory and P. Heard, Quarterly Journal of Experimental Psychology 35A, 217-237 (1972).
5. S. Grossberg, E. Mingolla and L. Viswanathan, Vision Research 41, 2521-2553 (2001).
6. R. Gurnsey, S. Sally, C. Potechin and S. Mancini, Perception 31, 1275-1280 (2002).
7. S. Manca and B. Pinna, Perception 29 (Supplement), 64 (2000) (ECVP 2000, abstract).
8. M. Morgan, "Running rings around the brain", The Guardian, 24 January 2002, http://www.guardian.co.uk/Archive/Article/0,4273,4341518,00.html.
9. K. Nakayama and G. Silverman, Vision Research 28, 747-753 (1988).
10. B. Pinna, Il Dubbio sull'Apparire (Upsel Editore, Padua, 1990).
11. B. Pinna, S.M. Anstis and D.I.A. Macleod, Journal of Vision 1(3), 373 (2001), http://journalofvision.org/1/3/373, DOI 10.1167/1.3.373.
12. B. Pinna and G.J. Brelstaff, Vision Research 40, 2091-2096 (2000).
13. A.G. Shapiro, J.P. Charles and M. Shear-Heyman, Journal of Vision 5, 764-782 (2005).
14. J.E. Sparrow and W.W. Stine, Vision Research 38(4), 541-556 (1998).
15. L. Spillmann and B. Pinna, Perception 29 (Supplement), 112-113 (2000) (ECVP 2000, abstract).
16. Y. Weiss and E.H. Adelson, Perception 29(5), 543-566 (2000).
MOTHER AND INFANT TALK ABOUT MENTAL STATES: SYSTEMIC EMERGENCE OF PSYCHOLOGICAL LEXICON AND THEORY OF MIND UNDERSTANDING D. ROLLO, F. BUTTIGLIERI Department of Psychology, University of Cagliari
In recent years, a number of studies examining how social experiences are related to children's theory of mind development have found that: (1) the frequency of mothers' mental state utterances in mother-child picture-book reading is correlated with children's theory of mind abilities; (2) mothers' use of cognitive terms is related more strongly to children's theory of mind performance than mothers' references to other mental states, such as desires or emotions (Adrian, Clemente, Villanueva, Rieffe, 2005; Ruffman, Slade, Crowe, 2002; Taumoepeau, Ruffman, 2006; Dunn, 2002). Despite the evidence for the role of mothers' language, there is disagreement over how exactly it improves children's theory of mind development. In short, mentalistic comments contain distinctive words, grammatical constructions and pragmatic features; the question, however, is which factor is critical (de Rosnay, Pons, Harris, Morrell, 2004). The present study addresses this issue and focuses on the relationship between mothers' mental state terms and children's performance in theory of mind tasks (emotion understanding and false belief tasks). Mothers were asked to read some pictures to 10 children aged between 3;0 and 5;0. Among the different mental state references (perceptual, emotional, volitional, cognitive, moral and communicative), it was found that the frequency and variety of mothers' mental state words were significantly associated with children's mental lexicon. In addition, emotional terms correlated positively with children's false belief performance. The kinds of emotional words used by the mothers, with reference to the Italian language, are discussed. Keywords: theory of mind, children's mental lexicon, mothers' language.
1. Introduction

Several studies have placed language, and in particular its use in social interactions, at the hard core of the processes involved in mind understanding. "Primitive access to the social-cultural world is available through participation in its routines, but access to the ways in which the world semiotically structures concepts, ideas, frames, and theories is available only through language" (Nelson, 1996 [16, p. 312]). Languages constitute systems of
symbols conventionally used in constructions that convey meaning between people. People use, and children learn to use, varying systems for talking together in different settings. "The basic assumption here is that language is constituted of socially shared symbols […] and the child is embedded from the outset within a social world that becomes increasingly articulated, definite, symbolic and systematized. In this view language itself is one of many cultural systems" (Nelson, Kessler Shaw, 2002 [17, p. 27]). Children can learn to think about their experience, and to interpret it, from conversations with their mothers. Parents tend to treat children as social partners and conversationalists almost from birth, and children respond with attentive looks, gurgles and smiles. This practice is important for children's entering into meaningful communicative exchanges (Astington, Jenkins, 1999) [2]. A cognitive development domain of great current interest is children's theory of mind, and the semantic domain corresponding to the purported theory is that of internal state terms. The development of mental state words has been investigated by a number of researchers (Bartsch, Wellman, 1995 [3]; Wellman, 1991 [24]; Wellman, Bartsch, 1988 [25]; Bretherton, Beeghly, 1982 [6]) for clues to children's understanding of the mind, under the assumption that the use of such words (especially know and think) to refer to internal states reflects an organized theory of those states (Nelson, Kessler Shaw, 2002) [17]. Other investigators (Taumoepeau, Ruffman, 2006 [22]; Adrian, Clemente, Villanueva, Rieffe, 2005 [1]; de Rosnay, Pons, Harris, Morrell, 2004 [8]; Pons, Lawson, Harris, de Rosnay, 2003 [20]; Ruffman, Slade, Crowe, 2002 [21]; Dunn, 2002 [9]; Dunn, Bretherton, Munn, 1987 [10]) have examined the relations between mothers' mental state language, produced during parent-child book reading, and children's psychological lexicon, but also the connection in children between mental state language and performance on tests of theory of mind understanding (e.g. false-belief tasks; Wimmer, Perner, 1983 [27]). In particular, there is a growing body of evidence supporting a social interactionist framework, in which parental input facilitates the development of children's social understanding and in which parental use of mental state language plays an important role in the development of false belief (Ruffman, Slade, Crowe, 2002 [21]; Taumoepeau, Ruffman, 2006 [22]). For instance, Ruffman, Slade and Crowe (2002) assessed the relation between mothers' mental state language and children's desire language and emotion understanding in 15- to 24-month-olds, in a study in which mothers were asked to talk about a series of pictures with their children. The results of this study demonstrated that mothers' use of desire language with 15-month-old children uniquely predicted a child's later mental state language and emotion task performance. Ruffman and
colleagues clarified in some important respects the role of mothers' mental state utterances in facilitating children's subsequent theory of mind at all three sets of time points of the study. This relation held even when many potentially mediating variables were accounted for, including the children's language ability, their initial social understanding (as manifested in their initial theory of mind and mental state language), their age and the mothers' educational background. Other studies have found that it is not a specific mental state term category that correlates with children's later theory of mind performance, but rather a composite series of references (mental state terms like think, know, want, hope), while other aspects of mothers' language, like descriptive or causal comments and links to the child's experience, seem to have less influence on the child's performance (Booth, Hall, Robison, Kim, 1997 [5]; Wellman, Woolley, 1990 [26]; Harris, Hutton, Andrews, Cooke, 1989 [11]; Beeghly, Bretherton, Mervis, 1986 [4]). Recently Ruffman, Slade and Crowe (2002) and Taumoepeau and Ruffman (2006) showed that mothers refer most frequently to desire terms when their children are younger, and increase the use of belief and knowledge references as their children grow up. References to think and know increased with age, although the proportion of desire terms to think and know may vary considerably for individual children. Thus, before 2 years, mothers' input about desire may be a mechanism by which children's emerging implicit understanding of mental life is made explicit. This mechanism can be conceptualized within the zone of proximal development, such that mothers' use of specific types of mental state language at critical points in the child's development bootstraps the child's social understanding (Vygotskij, 1934) [23]. Taumoepeau and Ruffman also examined the relation between mothers' mental state references and children's later desire and emotion understanding. With regard to this issue, the most important result was that mothers who talk about psychological themes promote their children's mental-state understanding. The effect of maternal language is not restricted to false belief understanding; it also applies to the later understanding of belief-based emotions. Why, then, do mothers' references to desire at 15 months relate to the development of children's later desire and emotion understanding? Part of the explanation can be accounted for by general word learning, such that children will learn the words that they most consistently hear in their environment. That is, children learn mental state terms in the same way that they learn ordinary language, perhaps even without appealing to any conceptual advances in their ability to understand these words as referring to mental states
(Taumoepeau, Ruffman, 2006 [22]; Peterson, Slaughter, 2003 [19]; Huttenlocher, Haight, Bryk, Seltzer, Lyons, 1991 [13]). In accordance with Ruffman, Slade and Crowe (2002), Harris, de Rosnay and Pons (2005) found that mothers' mentalistic descriptions predicted children's correct emotion attributions even when the sample was restricted to children who had mastered the simpler false-belief task. Harris, de Rosnay and Pons (2005) investigated whether mothers' mental-state discourse is linked to children's performance on a more demanding task, typically mastered at around 5 or 6 years of age. Recall the story of Little Red Riding Hood: only around the age of 5 or 6 years do many children realize that Little Red Riding Hood feels no fear of the wolf when she knocks at the door of grandmother's cottage. In a study of children ranging from 4 1/2 to 6 years, de Rosnay, Pons, Harris and Morrell (2004) found that mothers' use of a psychological lexicon when describing their children, and the children's own verbal ability, were positively associated not only with correct false belief attributions, but also with correct emotion attributions in tasks using stories akin to that of Little Red Riding Hood (Harris, de Rosnay, Pons, 2005) [12]. Astington and Jenkins (1999) found that preschoolers' theory-of-mind performance was not a predictor of subsequent gains in language. Rather, the reverse was true: language ability was a good predictor of improvement in theory-of-mind performance. Children with superior language skills, particularly in the domain of syntax, made greater progress in their conceptualization of mental states over the next 7 months than other children did. The claim that language makes a difference for children's developing theory of mind is convincing: children's own language abilities predict their rate of progress in understanding the mind, and their access to conversation, especially conversation rich in mentalistic words and concepts, is an equally potent predictor. There is disagreement, however, over how exactly children's and mothers' language helps the development of theory of mind. These studies show that, even if there is a clear relation between a mother's inclination to talk about mental states and a variety of measures of children's social understanding (mental state language, emotion and belief understanding), it is still not clear (1) whether this relation is a causal one, and (2) whether it is driven by specific aspects of the mother's language. In sum, the psychological lexicon contains distinctive words (e.g., think and know), grammatical constructions (e.g., embedded propositions), and pragmatic features (e.g., the enunciation of individual perspectives). Which factor is critical? (Harris, de Rosnay, Pons, 2005) [12].
A further aim of this study was therefore to look more closely into the contribution of belief, desire and emotion usage in maternal language to the prediction of children's theory of mind. The present study can be situated within Vygotskij's thesis that culture and society (e.g. the family) play an important role in facilitating the acquisition of higher-order mental functioning. In particular, the cooperative task of conversation enables the child to internalize ways of thinking through exposure to conversation about "thinking" with adult partners. There were two main goals in this study. First, we examined mothers' mental state lexicon, in order to describe its main characteristics in relation to the child's gender and age, and to test the relation between the frequency of mothers' mental state utterances and children's false belief performance. Second, we examined some aspects of child development involved in the understanding of psychological states, to test their relation with the child's age and gender. The main hypothesis posits a significant relation between mothers' narratives, with particular reference to the frequency and variety of psychological state terms, and the development of the child's social understanding, assessed in terms of emotional-cognitive abilities and of general linguistic competence, with special reference to the ability to use mentalistic terms.

2. Method

2.1. Subjects

The study was carried out with 10 children of 4 to 5 years of age (mean age 4.5 years) and their mothers; boys and girls were equally represented. The children were conventionally divided into two age groups (mean ages 4.1 and 4.9 years), with the aim of testing possible differences related to the child's age and gender. The 10 mother-child dyads, residing in the city of Carbonia (Cagliari), belonged to a homogeneous socio-economic level (middle class), and all the children attended nursery school.

2.2. Materials and Procedure

The 10 dyads were examined through three observation sessions, during which data were collected both about children's abilities in emotion and belief understanding and their use of mental state language, and about the main characteristics of mothers' mental state language. In particular, children were examined individually through: (1) recognition tasks for emotions in photographs and schematic faces, and tasks requiring the identification of the thoughts, desires and situations that cause emotions; (2) a first-order false-belief task;
Table 1. Statistical correlations between mothers' and children's mental state language (categories for both: emotional positive, emotional negative, cognitive, perceptual, moral judgment, obligation, volitive, ability, physiological, affect expression, communicative).
(3) a picture-book reading task, through which it was possible to assess children's mental state language abilities and their way of talking about emotions; (4) for linguistic knowledge, children were individually administered the evaluation test TVL (Cianchetti, Sannio Fancello, 1997) [7], which investigates all the specific functional sectors of language through comprehension and production measures. With regard to points (1) and (2), interactive tests from the CD-ROM "Autism and cognitive-emotional competences" (Pinelli, Santelli, 2005) were used; these consist of an assessment and training instrument for the analysis of cognitive and emotional abilities, through different activities presented as a sort of animated cartoon. With regard to point (3), relating to the assessment of the child's psychological lexicon, a story was used, presented to the children through 21 images arranged in chronological order and inspired by the picture-book "Frog, where are you?" (Mayer, 1969) [14]. Finally, mothers' narratives were examined through a picture-reading task: mothers were asked to read 20 pictures to their children, characterized by both emotional and mentalistic contents and concerning common aspects of everyday life (a girl hugging her mother, some children running a race, a sick boy in his bed, a mother making a cake, etc.), with the aim of obtaining information about the mother's mental state language as addressed to her child in an interactive and narrative context.
Figure 1. References to mental states used by children and their mothers.
Mothers' and children's narratives were audiotaped and transcribed, and later coded with respect to the variety and frequency of the mental state terms used during the narratives, using a coding manual comprising different mental state reference categories (positive and negative emotional state references, affect expression references, cognitive state references, volitional state references, perceptual state references, moral judgment and obligation state references, ability state references, physiological state references and communicative terms).

3. Results

One of the most important goals of this study was to describe the main characteristics of the maternal language produced in a dyadic, interactive context between mothers and their children through a picture-reading task, and moreover to verify how the maternal narrative style, with special regard to psychological state references, could produce significant effects on children's general theory of mind performance, and in particular on their ability to use a specific language to refer to their own and others' internal states. The results show that the maternal narrative style produces significant effects not so much on children's general theory of mind abilities (for example, false-belief understanding) as on the child's ability to use mental state language.
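The coding and correlation step described in the Method can be illustrated with a small Python sketch. The counts below are simulated, and the category subset and noise model are our assumptions, not the study's data; the sketch merely shows how per-dyad category frequencies for mothers and children can be correlated across the 10 dyads.

import numpy as np

# Simulated term counts for 10 mother-child dyads (illustrative only).
CATEGORIES = ["emotional positive", "emotional negative", "cognitive",
              "perceptual", "volitive", "affect expression", "communicative"]

rng = np.random.default_rng(0)
mother = {c: rng.poisson(8, size=10).astype(float) for c in CATEGORIES}
# Children's counts loosely track their mothers' counts plus noise.
child = {c: m + rng.normal(0.0, 2.0, size=10) for c, m in mother.items()}

# Pearson correlation between mothers' and children's use per category.
for c in CATEGORIES:
    r = np.corrcoef(mother[c], child[c])[0, 1]
    print(f"{c:20s} r = {r:+.2f}")

A table of such per-category correlations is what Table 1 summarizes for the actual transcripts.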
Figure 2. References to mental states used by mothers in relation to children's age.
Figure 3. References to mental states used by mothers in relation to children's gender.
In particular, this effect seems to be very marked with regard to the production of emotional terms: maternal use of both positive and negative emotion references and of affect expression references (like smile, cry, hug, kiss) seems to modulate children's ability to talk about emotions, correlating positively with the same term categories used by the children (see Figure 1 and Table 1). As expected, and consistent with the results of Taumoepeau and Ruffman (2006), we found that mothers talked more frequently about emotions when children were younger, with talk about beliefs increasing with age.
Mothers seem inclined to modulate their language according to the child's age and gender, for example preferring emotional terms when children are younger and gradually increasing the use of cognitive terms as they grow up, or referring more frequently to positive emotions when talking with girls than when talking with boys (see Figure 2 and Figure 3). With regard to this aspect, it is interesting to underline that similar inclinations were also found in the children's narratives, as a further confirmation that the type of language used by mothers affects the development of important language abilities in children, supporting them not only in the understanding of many different and increasingly complex mental states, but also in the ability to refer to them verbally. As we have emphasized throughout this paper, theory of mind understanding is not an individual construction. It is a collaborative construction that continues over many years, indeed throughout life.

References
1. J.E. Adrian, R.A. Clemente, L. Villanueva, C. Rieffe, Journal of Child Language 32, 673-686 (2005).
2. J.W. Astington, J.M. Jenkins, Developmental Psychology 35, 1311-1320 (1999).
3. K. Bartsch, H.M. Wellman, Children talk about the mind (Oxford University Press, New York, 1995).
4. M. Beeghly, I. Bretherton, C.B. Mervis, British Journal of Developmental Psychology 4, 247-260 (1986).
5. J.R. Booth, W.S. Hall, G.C. Robison, S.Y. Kim, Journal of Psycholinguistic Research 26, 581-603 (1997).
6. I. Bretherton, M. Beeghly, Developmental Psychology 18, 906-921 (1982). 7. C. Cianchetti, G. Sannio Fancello, TVL/Test di Valutazione del Linguaggio, livello prescolare (Erickson, Trento, 1997).
8. M. de Rosnay, F. Pons, P.L. Harris, J. Morrell, British Journal of Developmental Psychology 22, 197-218 (2004).
9. J. Dunn, in Growing points in developmental science, Ed. W.W. Hartup, R.K. Silbereisen, (US Psychological Press, Philadelphia, 2002).
10. J. Dunn, I. Bretherton, P. Munn, Developmental Psychology 23, 132-139 (1987).
11. P.L. Harris, C.N. Hutton, G. Andrews, T. Cooke, Cognition and Emotion 3, 379-400 (1989).
12. P.L. Harris, M. de Rosnay, F. Pons, Current Directions in Psychological Science 14, 69-73 (2005).
13. J. Huttenlocher, W. Haight, A. Bryk, M. Seltzer, T. Lyons, Developmental Psychology 27, 236-248 (1991).
14. M. Mayer, Frog, where are you? (Dial Press, New York, 1969). 15. E. Meins, C. Fernyhough, R. Wainwright, M. Das Gupta, E. Fradley, M. Tuckey, Child Development 73, 1715-1726 (2002).
16. K. Nelson, Language in cognitive development: The emergence of the mediated mind (Cambridge University Press, New York, 1996).
17. K. Nelson, L. Kessler Shaw, in Language, literacy and cognitive development. The development and consequences of symbolic communication, Ed. E. Amsel, J.P. Byrnes (Erlbaum, Mahwah, NJ, 2002).
18. J. Perner, T. Ruffman, S.R. Leekam, Child Development 65, 1228-1238 (1994).
19. C.C. Peterson, V. Slaughter, Cognitive Development 18, 399-429 (2003).
20. F. Pons, J. Lawson, P.L. Harris, M. de Rosnay, Scandinavian Journal of Psychology 44, 347-353 (2003).
21. T. Ruffman, L. Slade, E. Crowe, Child Development 73, 734-751 (2002).
22. M. Taumoepeau, T. Ruffman, Child Development 77, 465-481 (2006).
23. L.S. Vygotskij, Myšlenie i reč' (Gosudarstvennoe social'no-ekonomičeskoe izdatel'stvo, Moskva-Leningrad, 1934) (It. transl.: Pensiero e linguaggio, Laterza, Roma-Bari, 1990).
24. H.M. Wellman, in Natural theories of mind. Evolution, Development and Simulation of Everyday Mindreading, Ed. A. Whiten (Basil Blackwell, Oxford, 1991), pp. 19-38 (It. transl.: in La teoria della mente, Ed. L. Camaioni, Laterza, Bari, 1995).
25. H.M. Wellman, K. Bartsch, Cognition 30, 239-277 (1988).
26. H.M. Wellman, J. Woolley, Cognition 35, 245-275 (1990).
27. H. Wimmer, J. Perner, Cognition 13, 103-128 (1983).
CONFLICT IN RELATIONSHIPS AND PERCEIVED SUPPORT IN INNOVATIVE WORK BEHAVIOR
ADALGISA BATTISTELLI (1), PATRIZIA PICCI (1), CARLO ODOARDI (2)
(1) Department of Psychology and Cultural Anthropology, University of Verona, Italy, E-mail: [email protected]
(2) Department of Psychology, University of Florence, Italy
In recent years, the idea that innovation is one of the determining factors in the efficacy and survival of organizations has become strongly consolidated. Individuals and groups within organizations undertake specific creative activities with the express intention of deriving direct benefits from change, particularly in the idea generation phase. Innovative Work Behavior (IWB) is a complex behavioral pattern consisting of three different tasks, namely idea generation, idea promotion and idea realization. Considering the scant attention paid to date to the potentially different role of antecedent factors in the various phases of innovative behavior, the aim of the present work was to examine the combined roles of conflict and of support for innovation within the three stages of IWB. The results obtained from a sample of 110 public elementary school teachers confirm, as expected, a positive influence of both conflict and support for innovation in the realization phase, and a positive influence of support for innovation also in the idea promotion phase; unexpectedly, a positive influence of conflict is also exerted in the idea generation phase. Keywords: innovation, innovative work behavior, organizational support, antecedents of innovation.
1. Introduction

In recent years, the idea that innovation is one of the determining factors for the efficiency and survival of an organization has become more and more widespread. A number of different strategies exist by which modern organizations face changes in the socio-economic context. These include the decentralization of the productive process (the acquisition of smaller firms, made up of spin-offs) and the adoption and implementation of new technologies that permit a qualitative and quantitative leap in the potential of the productive process, as well as straightforward internal innovation covering areas such as processes, products, human resource management and other variables and services specific to the organization. Whatever strategy an organization ultimately decides to adopt, it must be cognizant of the complex dynamic of human resource management that the
process of change or innovation implies. Even the most strategic and detailed economic and financial plans can be destined to fail due to a lack of attention to the so-called "soft" dimension of the innovative process, strongly linked as it is to communication and trust [32]. The importance of these two processes is often underestimated, even though they strongly affect the level of staff commitment, motivation and identification with the organization and with work. When a specific organizational reality is looked at under the microscope, we encounter through observation the pure process of internal innovation, in which individuals, units, departments and working teams are involved. Together, they often play a deciding role in initiatives and act as the driving force for varying kinds of innovation [35], which thus appear as emergent (on theories of emergence see, for instance, [8,14,24]). By observing different organizational realities, it becomes evident how the traditional logic that views external pressures, such as market demands and increasingly challenging profit margins, as a strong incentive for supporting innovation must be weighed against their potentially negative effect on creativity. The beneficial effects of diversity/heterogeneity, in terms of people's knowledge and competence, on the generation of new ideas and the quality of decisions taken assume a negative value if one considers their potentially divisive strength. The necessity of giving employees financial rewards in acknowledgement of their efforts at innovation becomes critical at the point at which the effort itself threatens to overturn the positive effects of intrinsic motivation, which is usually a guiding factor and predictor of personnel's creativity in the workplace. In the light of such considerations, innovation appears to be a complex phenomenon which requires a better understanding of what facilitates it within organizations. Furthermore, innovation, being a systemic process characterized by emergence, forces us to consider that even the relationships which naturally develop in the surrounding psychosocial context, and all the consequential aspects of innovation at the individual and social levels of analysis, need to be kept under tight control. Numerous studies have attempted to enquire into the "antecedent" factors of innovation. They have shown not only how highly improbable it is to succeed in identifying unique factors that facilitate or inhibit all types of innovation, but also how it may even be considered vain to expect that the innovative process will develop in essentially the same way in all organizational contexts [3]. For many reasons, the School can be considered a mirror and expression of the deep and radical changes in contemporary society. Due to huge changes in the
political and social framework of the State over the last hundred years, a great number of reforms of the Italian School System took place one after another. These included the 1877 "Coppino Law" on obligatory school attendance and the 1923 reform introducing an entrance exam for access to the final three years of Primary School. A further reform took place in 1934 under Mussolini's regime, with the so-called Nationalization of School, which in effect transformed it into a State-controlled instrument for the "education" of citizens. At the other end of the democratic spectrum, reforms during the 1970s led to the integration of students with varying degrees of handicap and the introduction of Special Education Teachers into classes. More recent reforms include the abolition of the Primary School Exam and the reorganization of the so-called Scholastic Cycles. The theme of change in School has always been of real interest to teachers and students, and neither the ongoing relationship with a succession of socio-political events nor the innovations regarding scholastic programs, structure and organization can be denied [7]. Therefore, innovation necessarily plays a strong role in the educational sector, in which, because of sudden radical changes to programs and organizations, often linked to a succession of socio-political events, it has become necessary for teachers to continually adjust to the latest change. We therefore witness organizational changes in working teams, whereby every teacher is called to belong to a number of different teams that engage in varying projects, developing and realizing a large number of different activities at any given time. This ultimately leads to inherent changes in the understanding of the precise role of individual teachers, in particular with regard to their degree of discretion in choosing the most effective ways to perform their job. The current research involved seven Primary Schools, belonging to two Elementary School Districts just outside Florence, in which, for the last few years, a number of innovative projects have been underway, which may be described as follows: Intercultural projects, whose aim is the integration of schoolchildren from other cultures into the Italian system, through the use of fables as a teaching source. Projects linked to reading, with the involvement of various bodies and educators from outside the school. Projects linked to local community knowledge and geared towards the world of Arts, Culture and Tourism, with the aim of educating children to a
knowledge of their local environment, by increasing their capacities for critical observation, possibly through the use of multi-media technologies. In the light of the abovementioned considerations, it becomes interesting to examine the theme of innovation in the educational sphere, where sudden radical changes to programs and organizations, often linked to a succession of socio-political events, have become the order of the day, and where it has thus become necessary for teachers to continually adapt to such changes.

2. Innovative behavior

Innovative behavior in the workplace is a complex behavior that consists of three different phases and behaviors: idea generation, idea promotion and idea realization [23,28]. In every domain, individual innovation begins with the phase of idea generation, that is to say, the production of new and useful ideas [1,23,36,15]. This phase is distinguished by being the most significant expression of creativity, understood as a process through which individuals or groups develop unique ideas or new solutions in response to emerging problems [26]. The production of creative ideas, often arising from problems linked to the respective tasks, or from incongruities or discontinuities in performance, is a process which individuals face daily at work. Furthermore, it is also pertinent to the dynamics of problem-solving [11,23,27]. The subsequent task in the individual innovative process consists in the promotion of the generated ideas and in finding allies and supporters prepared to sponsor them. When a person in the workplace finds a new and useful way of doing things, they often get involved in social activities looking for support and assistance for their idea, ultimately building up a coalition of supporters capable of providing them with the necessary power to put it into practice [13,22,23,15]. The final phase of the innovative process is represented by idea realization, through the creation of a prototype or an innovation model, that is to say, a product or a new way of doing things, in order to have the possibility of testing and applying it within a working role, a group or indeed an entire organization [22,15]. Within this process, it is possible to discern the presence of numerous and varied predictors for every phase of innovative behavior. According to Anderson and King (1993), idea generation behavior implies a process that is exclusively intra-individual. For this reason, in this phase we can readily expect an effective and determining role of the variables linked to the
individual personality, which are ab initio strictly interconnected with the creative process, namely a cognitive creative style, intrinsic motivation, pro-activity and personal initiative. However, even in its intra-individuality, the process of idea generation also appears to be influenced by inter-individual and contextual factors, even if these latter have been less investigated. For example, the perception of a conflict situation (conflict with colleagues), understood as a manifestation of disagreement predominantly linked to the task, may encourage the arguments and discussions that are useful for the generation of new ideas and solutions [34,9], as may the support for innovation. On the other hand, the perception of an unfair balance between the efforts invested in bringing about an innovation and the organizational acknowledgement received (effort-acknowledgement injustice) seems to negatively influence innovative behavior [18]. The same reasoning applies to the idea promotion and idea realization phases, where support for the innovation in question and the availability of resources are only a few of the more “social” aspects which facilitate the realization of ideas [34].
With the aim of observing the innovative characteristics of an organizational system such as the educational system, and retracing the research analysis of the relevant literature, the current work concentrates on the role of two psychosocial variables which are antecedents of the innovative behavior of teachers at work: the conflict between colleagues and the level of support for innovation in the educational sphere.

3. Support for Innovation

Using an innovative approach in a mechanistic organization, designed to preserve a defined course of actions, could be marked above all by a series of conflicts, unlike what might occur in an organic organization, in which workers are stimulated to adapt themselves innovatively to rapid situational change and to the unusual circumstances they might face [25]. The system of rules, praxis and procedures and the organizational climate can stimulate individuals to innovate, can control and shape the relationship between the costs and benefits of innovation, and can reduce situational conflicts. An innovative individual strongly depends on their organization and supervisor for the following aspects of work: information (data, competences, objectives, policies and strategies), resources (material, financial and human) and socio-political support. In fact, the abovementioned elements are necessary for the innovative individual to proceed with the development, promotion and implementation of the relevant innovation [23].
According to Siegel and Kaemmerer (1978), organizations perceived as being innovative have a higher probability of motivating and anticipating the creative behavior of their members than those organizations which are viewed as traditional (non-innovative). In actual fact, they defined an innovative organization as one which fosters the creative functioning of its members, and a traditional organization as one which is not specifically oriented towards the creative functioning of its members. In the opinion of Siegel and Kaemmerer (1978), the five characteristics that distinguish innovative organizations from more traditional ones, making them more supportive of attempts to introduce new ways of doing things in the workplace, are as follows:
• Leadership, which must support the beginning, the development and the diffusion of new ideas, whilst simultaneously assuring a clear decentralization of power.
• Ownership, the situation in which the members of an organization or department feel like the originators and developers of the ideas, processes and procedures with which they work.
• Norms for diversity, which characterize an organization that confers a high value on the creative approach and on problem solving. Members of an innovative organization have a positive attitude towards diversity; the system responds positively to creative manifestations, and behaviors judged as deviant are rare.
• Continuous development, whereby change is an ongoing process within an innovative organization. Members have an attitude of continuous enquiry with regard to organizational responsibilities, with a constant change of emphasis in accordance with the goals of the organization.
• Consistency, the coherence between innovative organizational processes and the objectives of the organization itself. In this case, organizational members perceive that the way in which a task is carried out could have immediate but non-intentional consequences that could eventually conflict with the objectives of various workplace activities.
The literature regarding organizational support for innovation has shown how an organization that listens to and provides support, attention and incentives for the initiatives and ideas of its members receives in return higher affective involvement and
better role and extra-role performances, up to and including innovative behavior [28,6,10,30]. Supportive organizations should therefore communicate the presence of the abovementioned five characteristics, putting into practice various forms of verbal and material support, which may range from feedback in the form of acknowledgement and encouragement to decisions on how much space, time and resources to dedicate to innovation. Innovation will have a greater probability of appearing in contexts in which it is possible to perceive positive support for innovation, and in which attempts to introduce innovation, that is to say, to do things in a different way from usual, are more likely to be rewarded than opposed or indeed punished [2,22].
Support for innovation is the expectation, approval and practical support of attempts to introduce new and better ways of doing things in the workplace [33]. In the same way, Ekvall (1996) [12] emphasizes the importance of this support in trying to avoid the traditional “instinctive no”, which occurs when every proposal is automatically refused with a counter-argument. If people perceive that their attempts at change and improvement are supported, they will probably succeed in being more innovative; in other words, they will feel more confident in setting out and promoting their ideas. Furthermore, since support for innovation on the part of the organization can be expressed in various forms, from verbal support to assurances in terms of dedicating ample time and resources to putting new ways of doing things into practice, the behavior of converting ideas into work practice will clearly be favored by the perception of this contextual variable.
For these reasons, it can be presumed that the innovative behavior of the teachers in the current study, in its three behavioral phases, will be positively influenced by the support for innovation, as perceived by the teachers themselves. In particular, the following hypothesis may be formulated:
H1: Support for innovation will positively influence innovative behavior in its three phases of idea generation, idea promotion and idea realization.

4. Conflict and IWB

Innovation research proposes a positive vision of people’s involvement in their social and organizational environments, identifying real possibilities of
transforming and/or reshaping their organizations. Nonetheless, a few social determinants exist which often act as obstacles to innovation. Colleagues and collaborators can appreciate an innovative idea coming from another person at the precise moment in which they perceive that existing theories and procedures are beginning to be less appropriate for problem solving in the workplace. Often, however, following an initially tentative welcome of the idea, the successive phase of the process is marred by arising disagreements [16]. The cultural and structural changes which innovation implies entail, by their nature, interpersonal conflicts.
Conflict can represent the basis for building up the innovation in question when it is conducted in a constructive way, with the identification, selection and comparison of different perspectives aimed at improving the quality of ideas. At the same time, conflict can represent a negative consequence of innovation, if one considers the costs linked to the adaptation and the cognitive and emotional reconstruction of daily action patterns. The working out of the conflict, in terms of costs and benefits, depends on the motivation and capacity of people to resolve the problems emerging from innovative change.
Recently, three types of conflict have been identified [19,20], namely relational conflict, task conflict and process conflict. Relational conflict refers to personal situations, for example colleagues not liking each other. Task conflict concerns the awareness of different points of view on the task being carried out. The third and most recently identified form of conflict is that concerning process, that is to say, the awareness of disagreement regarding how the task should be carried out, including aspects such as individual responsibility or the delegation and sharing of resources. According to Shalley (2002), it could be useful to incorporate the three forms of conflict into innovation theory, specifying at which point of the innovative process each particular form could be more important than the others. In fact, it may be assumed that every innovative situation is linked to conflicting situations, understood both as a cause of the change and as a consequence of it.
In the few relevant studies conducted to date regarding conflict [16], the concept is understood as any behavior that a colleague intentionally engages in, as a consequence of innovation, in order to block the innovative individual. By its nature, the innovative individual’s proposal challenges the pre-existing structure of relationships, informal norms and reciprocal expectations. In these cases, the conflict finds its own justification given that, as Jones (2001) confirms, “the preference and the habit for procedures and familiar actions are
difficult to give up, to the extent that people have an inborn inclination to return to their original behaviors, a predisposition that prevents change” [21, p. 398]. Following this theoretical line, it is easy to imagine how inhibiting a conflictual reaction may be for idea generation: in order to avoid eventual conflict, individuals would probably be forced to suppress their innovative ideas. On the other hand, in the opinion of West (2002) and West et al. (2004), both in groups and in organizations conflict unfolds as an antecedent of innovative work behavior. Therefore, individuals would be more likely to be innovators in situations and environments in which it is easy to meet and compare notes, thereby also giving space to minorities and to the airing of disagreements, to the detriment of conformism.
Given the contradictory nature of the research results presented to date, it may be assumed that conflict, in the form of disagreement and relational tension linked mainly to the task, acts in a different way in relation to the various phases of the process. On the one hand, it stimulates the debate and comparison of perspectives linked to the generation of new ideas; on the other hand, it may censure the free expression and sponsoring of those ideas, thereby inhibiting their effective implementation, as individuals seek to avoid further negative consequences. The following hypothesis may therefore be formulated:
H2: The conflict between colleagues will have a positive influence on the idea generation phase and a negative influence on the idea promotion and idea realization phases.

5. The Method

The research was conducted according to a cross-sectional design and a quantitative methodology, using a questionnaire. The survey was presented to the teachers, providing them with a few general indications, whilst highlighting both the anonymous and non-judgmental nature of the research in relation to the innovation level of the school or of individual teachers.

5.1. The Sample

The sample was composed of 111 Primary School Teachers from the seven Primary Schools of the two Elementary School Districts described above. It represented two fundamental teaching areas, namely logical-mathematical and linguistic-creative. It was composed mainly of women (n = 107), with an average age of 44.6 (SD =
10.27). Total years of teaching service averaged 18.71 (SD = 11.47), whereas the average period of teaching service in the current school was 9.86 (SD = 8.75) years.
In order to have an indication of the teachers’ innovative level, it proved opportune to analyze the frequency of extra-curricular activities carried out by teachers and to look at the number of projects considered a relevant and integrative part of their teaching practice. 54.95% of teachers reported having performed tasks beyond formal teaching duties in the previous five years of their career, mainly tasks related to the coordination of the teaching system and responsibility for projects linked to the intercultural or multimedia sphere. With regard to such projects, 36.93% of subjects reported having been their instigators, in addition to having participated in other similar and integrative projects as part of the teaching activity relative to the general life of the school.

5.2. The Measures

The questionnaire was composed of two sections: the first comprised a general enquiry into personal details, such as age, sex, career seniority and seniority of teaching in the particular school; the second included three scales, with the aim of analyzing innovative work behavior, the perception of conflict with colleagues and organizational support for innovation. The response format was a 7-point Likert scale.
In particular, the 9-item scale of individual innovative behavior in the workplace (Innovative Work Behavior, IWB) by Janssen (2000) [15] was used. The items are based on the Scott and Bruce (1994) scale of innovative work behavior [28]. Three items refer to idea generation behavior, three to idea promotion and a further three to idea realization. The teachers were invited to indicate how often, during their job, they engaged in individual activities such as: “generating new ideas in relation to difficult problematical situations” (idea generation), “trying to garner support for innovative ideas” (idea promotion) and “transforming innovative ideas into useful applications” (idea realization).
Eight items regarding the support dimension for creativity were chosen in order to measure organizational support for innovation, in terms of continuous development and organizational orientation towards the ongoing search for new and useful ideas for problem solving. These items were drawn from the inventory of Siegel and Kaemmerer (1978) [31].
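Although the paper does not spell out its scoring procedure, responses to such Likert-type items are conventionally aggregated into subscale means and checked for internal consistency; the unlabeled reliability column in Table 1 is consistent with Cronbach’s alpha. The sketch below illustrates this convention on simulated data; the column names, the item-to-phase grouping and the data themselves are hypothetical, not those of the study.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# Hypothetical layout: one row per teacher, nine IWB items scored 1-7.
rng = np.random.default_rng(0)
iwb_items = [f"iwb{i}" for i in range(1, 10)]
df = pd.DataFrame(rng.integers(1, 8, size=(111, 9)), columns=iwb_items)

# Subscale scores as item means (assumed grouping: items 1-3 generation,
# 4-6 promotion, 7-9 realization).
df["IWB_gen"] = df[iwb_items[0:3]].mean(axis=1)
df["IWB_pro"] = df[iwb_items[3:6]].mean(axis=1)
df["IWB_rea"] = df[iwb_items[6:9]].mean(axis=1)
df["IWB"] = df[iwb_items].mean(axis=1)

print(df[["IWB", "IWB_gen", "IWB_pro", "IWB_rea"]].agg(["mean", "std"]).round(2))
# With random data alpha is near zero; real, correlated item responses
# would yield values like the .84-.90 reported in Table 1.
print("alpha:", round(cronbach_alpha(df[iwb_items]), 2))
```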
Table 1. Descriptive analysis of the studied variables.

VARIABLES   N    M      SD     α
AGE         104  44.62  10.27
SUBJAREA    109  1.91   1.01
CARR SEN    108  9.86   8.75
CONFLICT    110  3.35   1.20   .77
INN.SUPP    111  4.76   1.04   .87
IWB         111  4.47   1.07   .90
IWBgen.     111  4.87   1.29   .87
IWBpro.     111  3.98   1.21   .85
IWBreal.    111  4.54   1.29   .84
The subjects were requested to indicate their response on the 7-point scale, where 1 = strongly disagree and 7 = strongly agree, in relation to statements regarding support and the continuous organizational orientation towards creativity perceived in the relevant school; for example, “This school can be described as flexible and continually adaptive to change”.
In order to observe potential conflict between colleagues (conflict with co-workers), the 4-item Janssen (2003) scale [16] was used, in which individuals indicated the frequency of conflicting behavior with their colleagues, in relation to internal disagreements regarding problems linked directly to the school and to their interpersonal relationships (for example, “How often does it happen that … you and your colleagues find yourselves disagreeing over the principal educational values of the school?”).

6. The Results

Table 1 shows the descriptive analysis relative to the studied variables. The innovative behavior of the subjects presents an average of M = 4.47, putting it in the mid-high range of the Likert (1-7) scale used. From the averages it can further be seen that, comparing the measures of the three phases of innovative behavior, subjects reported having adopted idea generation behavior to a larger extent (M = 4.87; SD = 1.29) than idea realization behavior (M = 4.54; SD = 1.29) and the promotion of their personal ideas in the workplace (M = 3.98; SD = 1.21).
Table 2, regarding the correlation analysis, gives a first indication of some possible relationships between the three innovative behavior phases and the hypothesized antecedents. It should be noted that, even if the correlation is weak, conflict correlates negatively with support for innovation.
Table 2. Correlation Analysis.
Variables    1        2      3       4      5      6        7       8       9       10      11
1. AGE       1
2. SEX       -.13     1
3. QUAL      -.09     .07    1
4. SUBARE    -.02     -.02   .28(2)  1
5. CARSEN    .67(2)   -.02   -.04    -.06   1
6. CONF.     .03      .02    .01     .06    .04    1
7. IN.SUPP   .08      -.10   .00     -.02   .18    -.20(1)  1
8. IWB       -.09     -.12   .06     .01    -.07   .26(2)   .16     1
9. IWBsug    -.09     -.10   .03     .04    .01    .36(2)   .01     .84(2)  1
10. IWBpro   -.11     -.03   .19     .05    -.17   .06      .20(1)  .80(2)  .47(2)  1
11. IWBrea   -.07     -.12   .01     -.04   -.04   .22(1)   .20(1)  .88(2)  .63(2)  .60(2)  1
(1) Correlation is significant at the 0.05 level (2-tailed).
(2) Correlation is significant at the 0.01 level (2-tailed).
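As an aside for readers wishing to reproduce this kind of table, a minimal sketch of a Pearson correlation matrix with two-tailed significance flags, in the style of Table 2, could look as follows. The variable names and data here are hypothetical, and pairwise deletion of missing values is assumed (the Ns in Table 1 differ across variables).

```python
import numpy as np
import pandas as pd
from scipy import stats

def corr_with_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-triangular Pearson correlations, flagged (1) if p < .05 and (2) if p < .01 (two-tailed)."""
    out = pd.DataFrame("", index=df.columns, columns=df.columns)
    for i, a in enumerate(df.columns):
        out.loc[a, a] = "1"
        for b in df.columns[:i]:
            pair = df[[a, b]].dropna()  # pairwise deletion of missing values
            r, p = stats.pearsonr(pair[a], pair[b])
            flag = "(2)" if p < .01 else "(1)" if p < .05 else ""
            out.loc[a, b] = f"{r:.2f}{flag}"
    return out

# Hypothetical data frame with a subset of the study's variables as columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(111, 4)), columns=["AGE", "CONF", "IN.SUPP", "IWB"])
print(corr_with_flags(df))
```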
Multiple regressions with the stepwise method were used to analyze the influence of conflict between colleagues and of perceived support for innovation on the three phases of individual innovative behavior. The idea generation phase appears to be influenced positively by the conflict between colleagues, but not in a significant manner by the perception of organizational support (Table 3). With regard to the idea promotion phase, Table 4 shows how only the perception of support for innovation seems to positively influence said behavior. Finally, both the perceived conflict and the perceived support for innovation have a positive influence on the idea realization phase: it can be seen from Table 5 that the behavior of transforming one’s own idea into something concrete and effective appears to be favored both by conflict and by support.
Summarizing, therefore, innovative work behavior seems to be favored by the perception of a workplace that is conflictual and at the same time supportive of attempts at innovation by its personnel. In particular, the two examined psychosocial variables, as antecedents of innovative behavior, appear to influence the three phases of the process of individual innovation differently. With regard to the guiding hypotheses of the study, based on the data we cannot fully confirm H1: the positive influence of innovational support, hypothesized in relation to all three phases of individual innovative behavior, appears to be found only in relation to the phases of idea promotion and idea realization.
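The paper does not state which software or exact stepwise criteria were used; purely as an illustration of the kind of procedure summarized in Tables 3-5, a simple forward stepwise OLS can be sketched as follows. The data and effect size are toy values, and forward_stepwise is our own helper, not a library function.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(y, X: pd.DataFrame, alpha: float = 0.05):
    """Forward selection: repeatedly add the predictor with the smallest
    p-value, as long as that p-value stays below alpha."""
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        if not pvals or min(pvals.values()) >= alpha:
            break
        selected.append(min(pvals, key=pvals.get))
    return sm.OLS(y, sm.add_constant(X[selected])).fit() if selected else None

# Toy standardized scores: conflict weakly predicts idea generation, support does not.
rng = np.random.default_rng(2)
X = pd.DataFrame({"conflict": rng.normal(size=111), "support": rng.normal(size=111)})
y = 0.35 * X["conflict"] + rng.normal(size=111)

result = forward_stepwise(y, X)
if result is not None:
    print(result.summary2().tables[1])        # B, t and p values per retained predictor
    print("adjusted R2:", round(result.rsquared_adj, 3))
```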
Table 3. Conflict with colleagues and support as antecedents in the idea generation phase.

Dependent variable   Predictors            B      t      p
Idea generation      Conflict co-workers   .358   4.498  .000
R² adjusted = .120; F = 15.903; p < .000
Table 4. Conflict with colleagues and support as antecedents in the idea promotion phase.

Dependent variable   Predictors               B      t      p
Idea promotion       Support for innovation   .210   2.259  .027
R² adjusted = .035; F = 4.995; p < .023
Table 5. Conflict with colleagues and support as antecedents in the idea realization phase.

Dependent variable   Predictors               B      t      p
Idea realization     Conflict co-workers      .283   3.295  .001
                     Support for innovation   .277   3.094  .003
R² adjusted = .109; F = 7.675; p < .001
In other words, innovational support in this case does not appear to facilitate the spontaneous development of new ideas for improving methods, processes and work procedures; rather, its influence appears to be more linked to the successive, more practical and “social” phases of innovative behavior. As regards the second hypothesis, the perception of a conflicting situation with colleagues appears to positively influence not only idea generation behavior but also the more practical and socially shareable behavior of idea realization, whereas no discernible influence on the intermediate behavior of idea promotion can be observed. These data allow us to consider H2 as only partially confirmed: the hypothesis posited a positive relationship of conflict with idea generation behavior and a negative relationship with the other two behaviors of idea promotion and realization.

7. Discussion and Conclusions

The results of the study suggest that, generally speaking, individuals’ innovative work performance is favored by a conflictual environment which at the same time supports their efforts at finding alternative solutions to problems. The three behaviors of IWB appear to be favored in different manners and to different degrees by conflict and support: the first phase of the process (idea generation) appears to be helped exclusively by the perception of conflict, the intermediate phase (idea promotion) only by innovational support, and the final
phase (idea realization) by the perception of an environment that combines conflict and support.
The alternating influence of the analyzed factors across the different phases of the process shows clearly how difficult it is for the organizational systems concerned to control the emergence of the innovation process. On the one hand, the results show that people’s innovative behavior, taken in its totality, appears to be helped not only by the perception of conflict in the workplace but also by a supportive organizational context, particularly one oriented towards the continuous search for alternative solutions to problems, thus confirming a series of propositions drawn from the literature [35,29,5]. On the other hand, as has previously been shown, people’s innovative behavior is helped by the perception of a conflict with colleagues, but presumably only at an intermediate level, within an organization which directs said conflict towards a more constructive and performance-oriented confrontation, without contradicting the evidence of Janssen [16,17] on conflict understood as a negative consequence of IWB. Furthermore, not all phases of the innovative process are helped by conflict in the workplace; we refer here in particular to the insignificant relationship with idea promotion, probably because a context of continual doubting and questioning of ideas and opinions does not favor garnering support from colleagues.
As regards the support and importance of a “facilitating” organizational climate, the current study emphasizes that the idea implementation phases (promotion and realization) are influenced to a greater degree. In line with previous research [28,5], this observation contributes to defining support as a socio-contextual variable which has a determining role in the practical realization of innovation. This implies that companies and organizational systems should pay particular attention to the ongoing assurance of material and temporal resources, in order to guarantee the well-timed participation in changes on the part of all their employees.
The analysis of the perception of the variables in an exclusively retrospective manner, excluding the possibility of conducting a longitudinal study of the process, clearly represents a limitation of the current study. Nonetheless, through this study it has become possible to understand, if only partially, the influence of two of the most important socio-contextual variables on people’s innovative behavior in their daily work, from idea creation to idea realization, through the promotion phase.
References

1. T.M. Amabile, R. Conti, H. Coon, J. Lazenby and M. Herron, Academy of Management J. 39(5), 1154-1184 (1996).
2. T.M. Amabile, E.A. Schatzel, G.B. Moneta and S.J. Kramer, Leadership Quarterly 15(1), 5-32 (2004).
3. N. Anderson, C.K.W. De Dreu and B.A. Nijstad, J. of Organizational Behavior 25, 147-173 (2004).
4. N. Anderson and N. King, International Review of Industrial and Organizational Psychology 8, 1-34 (1993).
5. C.M. Axtell, D.J. Holman, K.L. Unsworth, T.D. Wall and P.E. Waterson, J. of Occupational and Organizational Psychology 73, 265-285 (2000).
6. M. Baer and M. Frese, J. of Organizational Behavior 24, 45-68 (2003).
7. E. Catarsi, Storia dei programmi della scuola elementare (1860-1985) (La Nuova Italia, Firenze, 1990).
8. J.P. Crutchfield, Physica D 75, 11-54 (1994).
9. C.K.W. De Dreu and M.A. West, J. of Applied Psychology 86, 1191-1201 (2001).
10. L. Dorenbosch, M.L. van Engen and M. Verhagen, Creativity and Innovation Management 14(2), 129-141 (2005).
11. P.F. Drucker, Innovation and Entrepreneurship: Practice and Principles (Heinemann, London, 1985).
12. G. Ekvall, European J. of Work and Organizational Psychology 5(1), 105-123 (1996).
13. J.R. Galbraith, Organizational Dynamics 10, 5-25 (1982).
14. J.H. Holland, Emergence from Chaos to Order (Perseus Books, Cambridge, Massachusetts, 1998).
15. O. Janssen, J. of Occupational and Organizational Psychology 73, 287-302 (2000).
16. O. Janssen, J. of Occupational and Organizational Psychology 76, 347-364 (2003).
17. O. Janssen, J. of Organizational Behavior 25, 201-215 (2004).
18. O. Janssen, E. Van De Vliert and M.A. West, J. of Organizational Behavior 25, 129-145 (2004).
19. K. Jehn and E. Mannix, Academy of Management J. 44, 238-251 (2001).
20. K.A. Jehn, G.B. Northcraft and M.A. Neale, Administrative Science Quarterly 44, 741-763 (1999).
21. G.R. Jones, Organizational Theory: Text and Cases (Addison-Wesley, New York, 2001).
22. R.M. Kanter, The Change Masters: Corporate Entrepreneurs at Work (Allen & Unwin, London, 1983).
23. R.M. Kanter, in Research in Organizational Behavior, Vol. 10, Eds. B.M. Staw and L.L. Cummings (JAI Press, Greenwich, CT, 1988), pp. 169-211.
24. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
25. H. Mintzberg, D. Raisinghani and A. Théorêt, Administrative Science Quarterly 21(2), 246-275 (1976).
26. L.W. Porter, G.A. Bigley and R.M. Steers, in Motivation and Work Behaviour, Vol. 7, Eds. L.W. Porter, G.A. Bigley and R.M. Steers (McGraw-Hill, Columbus, Ohio, 2003), pp. 559-591.
27. J. Rank, V.L. Pace and M. Frese, Applied Psychology: An International Review 53(4), 518-528 (2004).
28. S.G. Scott and R.A. Bruce, Academy of Management J. 37, 580-607 (1994).
29. C.E. Shalley, Applied Psychology 51(3), 406-410 (2002).
30. H. Shipton, D. Fay, M. West, M. Patterson and K. Birdi, Creativity and Innovation Management 14(2), 118-128 (2005).
31. S.M. Siegel and W.F. Kaemmerer, J. of Applied Psychology 63(5), 553-562 (1978).
32. D. Fay and H. Lührmann, European J. of Work and Organizational Psychology 13(2), 113-119 (2004).
33. M.A. West, Social Behavior 4, 173-184 (1989).
34. M.A. West, Applied Psychology: An International Review 51, 355-387 (2002).
35. M.A. West, G. Hirst, A. Richter and H. Shipton, European J. of Work and Organizational Psychology 13(2), 269-299 (2004).
36. R.W. Woodman, J.E. Sawyer and R.W. Griffin, Academy of Management Review 18, 293-321 (1993).
ROLE VARIABLES VS. CONTEXTUAL VARIABLES IN THE THEORY OF DIDACTIC SYSTEMS
MONICA ALBERTI, LUCIA CIRINA, FRANCESCO PAOLI
Dept. of Education, University of Cagliari
Via Is Mirrionis 1, 09123 Cagliari, Italy
E-mail: [email protected]

Partisans of the constructivist approach to mathematics education, such as Brousseau or Chevallard, developed an accurate theoretical framework in which didactic systems are viewed in a systemic perspective. What they somewhat fail to draw, however, is a sharp distinction between role variables – concerning the roles played in the didactic interaction by the individual elements of the system (Student-Teacher-Knowledge) – and contextual variables – concerning the action on the learning process of the system as a whole. Our research in progress on 2nd graders' word problem solving strategies applies this dichotomy to the class management strategies adopted by teachers. Partial evidence collected so far points to the tentative conclusion that, contextual variables being equal, differences in teaching styles and methods may deeply reshape the role component of didactic systems. If we take careful account of this distinction, we can shed additional light on some hitherto unexplained phenomena observed in the literature.

Keywords: System theory, didactic systems, mathematics education, constructivism, teaching methods, teaching styles.
1. System theory and didactics: The constructivist approach to maths education
System theory is intimately tied to mathematics education at least since Chevallard [8,9] formulated his concept of didactic system as a key to understanding mathematics teaching and learning processes in an integrated perspective. According to Chevallard, a didactic system is made up of three agents – Teacher, Student and Knowledge – and it is the task of educational science (in particular, of maths education) to investigate how such agents relate to one another and interact in the context of the instructional process:
The scientific study of didactic processes brings to the fore what may be called a didactic system, which is to be the main concern of the didactician [...] The crudest form is one in which the didactic system (as a theoretical model) is seen as made up of just one part, which is knowledge. [...] The
next step is taken when one becomes aware that the teaching process is concerned in the first place with the child, or the student. [...] The dividing line between [pedagogy and didactics of mathematics] separates those conceptions in which the teacher remains outside the system under consideration, and those in which the teacher is included as a new theoretical “part” in the didactic system, together with knowledge and student [8, p. 150].
Chevallard repeatedly highlights the connection between his concept and general system theory, pointing to didactic systems as pertinent examples of the notion of open system. In so doing, Chevallard is of course influenced by the other theoreticians of the French constructivist school of mathematics education (whose reflections are, in turn, deeply affected by Chevallard’s work). We are thinking, in primis, of the theory of situations of Guy Brousseau [6], whose openly professed systemic approach is partly borrowed from the field of organizational sociology, whence he derives the emphasis laid on negotiation processes and the marked game-theoretical flavor which permeates his view [25]. In particular, one of the key concepts appears to be the distinction between didactic situations – where teachers and students are explicitly aware of their respective roles in the didactic relation, in such a way that students are inclined to respond to the teacher’s real or perceived didactic expectations, rather than to engage in an autonomous construction of knowledge – and a-didactic situations – where students act and reason “as if the teacher were not there”, although the teacher is there and has expressly designed the situation itself in order to trigger a learning process. In other words, if the teacher has successfully disguised her/his didactic goals, the student’s behavior will merely depend on his relationship with the knowledge to be constructed [6,21].
In this approach, therefore, the ultimate goal for the teacher should be that of priming a devolution process in her/his students, driving them to take upon themselves the full responsibility of engaging in the activity of knowledge construction in a given learning situation – driving them, in other words, to learn for themselves and not to meet what they identify as the teacher’s expectations, or to get a good mark, or else to scrape a pass. But the didactic action cannot be fully effective if the devolution process is not matched by a complementary process of institutionalization, whereby the concepts which are being acquired receive an official status that warrants their reusability in different contexts. In this way, a personal discovery is turned into interpersonal, universal knowledge [2].
2. Role variables and contextual variables in didactic systems

So far, so good. Chevallard’s didactic systems are perfectly consistent with the mainstream systemic perspective, to the extent that it is expressly stated that such systems must be examined in their entirety, rather than as regards the specific features of the individual agents. This paper, however, aims at advocating an approach which takes this systemic feature at face value. If we really want to understand what is at stake in didactic systems, we should not only focus on a detailed investigation of the roles played therein by students, teachers, and knowledge, but also try to answer the following question: How should a didactic system as a whole evolve towards conditions which guarantee a successful construction of knowledge?
The preceding question appears all the more fundamental in the light of some research findings which receive little or no explanation if we confine ourselves to considering the roles of the single components of the model. A case in point is the phenomenon of sensitivity to the didactic contract [26,27]. An extreme variability has been observed in students’ willingness to break some implicit clauses of school problem solving (arithmetical problems are always solvable, the teacher invariably provides her students with a set of data which are sufficient to achieve the solution, mathematics requires computation, and so on) depending on the characteristics of their teachers: in some contexts “traditional” teachers increase such willingness more than “innovative” teachers, whereas in other contexts it is the other way around. An analysis which confines itself to an investigation of the roles of teacher, student, and knowledge, therefore, is bound to fail as a thorough explanation of the phenomenon [a].

[a] Among the data which should be assessed and explained in greater depth, we could also include some teachers’ observed failures in handling students’ reasoning processes (cfr. [7]).

The conceptual reason for this state of affairs is easy to find: since a didactic system is a holistic entity, the construction of knowledge is a typically emergent process which cannot be described in terms of traditional cause-effect relationships. Therefore, the adoption of a constructivist view necessarily entails a description of the evolution of a didactic system in terms of the concepts used within the theories of emergence (see, for instance, Crutchfield [11]; Holland [16]; Minati and Pessa [19]). This raises a number of difficulties, mainly stemming from the fact that most concepts and techniques used within this domain are borrowed from the physical theory of phase transitions, and nobody knows whether or not they can be transferred to biological or social contexts. At first sight this translation sounds implausible, at least within the context of
didactic systems, due to the intrinsic complexity of human beings and to the difficulty of finding a small number of relevant (and measurable) state variables suited to describing the essential dynamics of these systems. Such a circumstance seems to be a strong obstacle when trying to use the knowledge available from theories of emergence in order to influence the behavior of a didactic system.
To find our way around the previous difficulties, we suggest giving a somewhat more formal status to the distinction hinted at above between two kinds of variables governing the functioning of didactic systems:
• Role variables, which concern the individual components of the didactic system: Student, Teacher, and Knowledge. Pertinent examples are all the variables which call into play the didactic contract, or the variables regarding teaching methods or styles (see below);
• Contextual variables, which concern the functioning of the system as a whole. Examples include: how to avoid the inhibition in school contexts of naïve but effective after-school problem solving strategies, how to solve cognitive transfer problems, how to structure the school system in such a way as to minimize math anxiety.
Before we go on, two clarifications are in order. In the first place, the phrase “contextual variables” has been used by several authors in the literature with slightly or even markedly different meanings [1,5,13]. Henceforth, we will assign this term exclusively the meaning resulting from the above definition. Secondly, we do not mean at all to argue that contextual variables have received no attention in the constructivist approach to maths education. Quite to the contrary: they have been the focus of detailed and careful analysis. The problem, though, is that the key concepts of the approach – such as the notion of a-didactic situation – are formulated with explicit reference to the roles played therein by the individual components of the model. One could develop the deceptive impression that it suffices to intervene effectively on such role variables to improve the student’s learning. Yet, this would be plainly wrong: no deep modification of learning conditions can take place if we fail to leave a mark on the overall structure of the didactic system. For example, role-playing experiences in which junior high school students were asked to pretend to be elementary school teachers and explain to their imaginary students some mathematical notions (e.g. that triangles have three altitudes) resulted in an observed failure on the students’ part to assume the role at issue: students did not abandon at all the inappropriate response patterns which characterized their behavior as maths learners (e.g. the attempt to produce
statements which sounded as “formal” as possible), even when they were requested to play a different role within the system [12]. This suggests that while a modification of contextual variables can affect role variables in the didactic system, a sheer change of role variables which implies no deep transformation in the general conditions of the system as a whole is probably bound to be ineffective.
The above considerations suggest that a strategy of intervention on a didactic system based on slight modifications of either role or contextual variables has no hope, per se, of making the construction of knowledge emerge. In this respect didactic systems are very different from most physical systems, where even a very small change of parameter values can give rise to the emergence of entirely new behaviors. In turn, this entails the need to understand how to introduce deep changes in contextual variables, a problem which is still very difficult to solve in full generality.

3. Managing contextual variables: Towards a full-blooded systemic perspective on maths education
If there is any lesson to be drawn from our previous remarks, we could summarize it as follows: an intervention in a didactic system can only claim to be effective if it affects the system itself in its entirety. How can this happen in practice? And, in particular, how can we achieve this goal in the elementary school, where so many students develop math anxiety to a degree that will cause irreparable damage to their relationship with the discipline for years to come?
Perhaps the best model to look at is the kindergarten, by now widely recognized as a school system where children can learn in an effective way in appropriately structured environments; where some detrimental clauses of the didactic contract are not yet in force; where children can do and learn mathematics, as it were, without knowing it – more than that, without even knowing that there is such a horrible thing called mathematics which scares to death hordes of older pupils. Kindergarten teachers usually agree that the devolution process is attained in a comparatively easy way with their students.
Of course, this model is not easy to export into elementary school, where the knowledge which students are supposed to construct presents a much higher degree of formalisation. This is a fascinating challenge for maths education: is it possible for the devolution-oriented instructional model prevailing in kindergartens to espouse the formalized knowledge which is at stake already in the lowest grades of primary school? We will not venture to attempt an answer
to this terribly complicated question; we will confine ourselves to pointing out a couple of aspects which should perforce be taken into account if one were to boldly undertake such an attempt.
• Interdisciplinary didactics. In our primary schools, “interdisciplinary teaching” is a recurrent phrase which pops up all too often in planning documents or in worksheets, but which is seldom practiced for real. However, one of the reasons why the devolution process happens so naturally in kindergartens is the fact that disciplinary fences are so loose that children do not even notice whether they are developing a “mathematical” or a “linguistic” ability; the focus is always on the comprehensive development of the child’s personality. Imagine how great it would be if this could carry over to our elementary schools! After all, how can you develop an anxiety for mathematics, if its learning is constantly intertwined with the learning of (perhaps) less intimidating disciplines such as natural science, history or language? This, of course, presupposes the presence, at a practical level, of teaching teams which are ready to cooperate in the planning and management of their teaching activities. At a more theoretical level, the same goal calls for the development of a discipline which could be termed interdisciplinary didactics and which would stand halfway between general didactics and disciplinary didactics [20,17]. With the former, it would share the absence of a privileged link with an individual discipline; with the latter, it would share the readiness to get involved in the development of specific teaching contents (curricula, teaching units, concrete tools to be used in the classroom).
• Formative assessment. Boston [4] defines formative assessment as an assessment activity whose goal is “to gain an understanding of what students know (and don’t know) in order to make responsive changes in teaching and learning [...] Techniques such as teacher observation and classroom discussion have an important place alongside analysis of texts and homework”. The beneficial effects of formative assessment, both in terms of average improvements in students’ test scores and in terms of their enhanced awareness of any gaps that may exist between their desired goal and their current knowledge, understanding, or skill, have been amply documented in the literature (see e.g. [3]). In particular, applications to mathematics have been presented in Fontana and Fernandes [15] and in McIntosh [18]. Since the axiological, almost exclusively summative assessment procedures which are commonplace in our primary schools are known to be one of the main roots of maths anxiety (see e.g. [24,29]), the
adoption of a formative assessment model could go some way towards effectively modifying a decisive contextual variable, thereby recreating the favorable, devolution-driving environment that is pervasive in the kindergarten and that all too often is irretrievably lost in the elementary school.
Notice that both aspects mentioned above would require a modification of contextual variables; if they came to be accomplished, our primary schools would look like completely new environments, and children’s concepts of school and of mathematics would turn out quite different. We are certainly aware that this is bound to remain a sort of regulative ideal which is impossible to carry out to the hilt – after all, every good teacher must be ready to make compromises in her/his teaching practice in the light of the individual situation (s)he has to manage. This does not mean, however, that the underlying goal of modifying as radically as possible the contextual variables that hinder devolution should be given up.
As can be seen from the previous considerations, the complexity of both didactic systems and the social environment in which they are embedded has so far prevented the adoption of precise strategies for inducing the deep changes in contextual variables invoked in the previous sections. In such a situation, however, some lesson can still be learnt from the practical observation of specific examples. This is all we can offer in this case, but perhaps it will be useful in order to suggest new approaches.

4. An example from our research

We now want to illustrate, by means of an example drawn from our research activity, an attempt to set up an observation where role and contextual variables are viewed not as disjoint factors, but as tightly interconnected ambits whose mutual impact has to be taken into due account [22]. The object of our study was to assess the respective repercussions of different teaching methods and styles on the problem solving behavior of school-age children.
The distinction between teaching methods and teaching styles, regretfully enough, is sometimes overlooked in the literature. In our opinion, however, it is of the utmost importance to distinguish between these two aspects. Teaching methods can be defined as the different didactic strategies used in the teaching process of a given discipline. Examples of dichotomies regarding teaching methods are: explicative vs. heuristic methods [23]; teacher- vs. student- or task-oriented methods, and so on. On the other hand, teaching
styles can be defined as “a pervasive way of approaching the learners that might be consistent with several methods of teaching” [14]. Examples include the difference between “devolving” and “institutionalizing” styles [27], or the distinction between procedural and conceptual styles [10], directly related to the topic of teachers’ mathematical Pedagogical Content Knowledge (PCK: [28]).
The sample examined in our observation was a class of 23 2nd graders. As a preliminary move, we interviewed their teacher and spent some time in the class during regular school activities, in order to socialize with the students and to understand the socio-cognitive dynamics of the group. Altogether, these children showed good levels of group discussion and group activity skills. As part of a strategy directed at overcoming multiplicative problem solving strategies exclusively based on repeated addition, the class teacher had devised a set of multiplicative problems, two of which are reproduced below.
FIRST PROBLEM: CHESS PAWNS. During playtime four children decide to play chess. They use bottle caps as pawns. Each player draws from a box 16 caps of a color different from that of his opponent’s. How many caps will they draw from the box altogether?
SECOND PROBLEM: BLACK SQUARES. Denise draws a chessboard and has to color all its black squares. In each row there are 4 of them. She has just completed the sixth row, when she has to quit because it is lunchtime. How many black squares did Denise color so far?
The teacher structured these activities in such a way as to confront her students with a situation of unbalance. In fact, students do their class work in an autonomous way on the basis of a problem situation devised by the teacher. This situation presupposes an a priori analysis which makes it possible to funnel its mathematical contents without making them explicit, as well as to spot in advance the difficulties which students will encounter and their possible solutions. In this case, the teacher aimed on the one hand at evaluating to what extent the problem’s components could affect the reasoning on the text (hence the solving process), and on the other hand at observing whether students used multiplication or repeated addition in their computations. Some phrases in the texts could lead students to a wrong representation of the problem situation: e.g. the word “draw” in the first problem, or the phrase “black squares” in the second, which cued already encountered problems on the total number of black squares in a chessboard.
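For reference, the multiplicative solutions targeted by this a priori analysis are 4 × 16 = 64 caps for the first problem and 4 × 6 = 24 black squares for the second; the former follows directly from the problem text, while the latter is confirmed by the classroom discussion reported below.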
The observation of the relevant classroom activities allowed us to distinguish four stages in the teacher’s management of the situation.

4.1. First stage: Introducing the activity

This first stage lasted 7 minutes for each of the two problems. The teacher introduces the activity by drawing the pupils’ attention to the subject to be dealt with, so that the problem situation makes sense for them. It is important to remark that she does not stress the possible difficulties hidden in the problem, but encourages children to write down individually their solution strategies, as well as any difficulties they may have encountered. In this way, she not only makes sure that children cannot influence their classmates’ reasoning during the solving process, but also leads them to make sense of their solving strategies and to render their reasoning procedures explicit by writing down an explanation, which promotes an awareness of the steps performed.

4.2. Second stage: Individual solution

Children are engaged in the individual solution of the problem: they reread the text, underline the relevant information and shape their own path towards the solution. At this stage, the teacher refrains from driving them to the correct solution – she just walks amidst the desks, encourages children or asks them for clarifications, but provides no feedback on their suggested solutions, whether correct or not. Even in the presence of explicit calls for help, she confines herself to such statements as “Reread your text carefully and check what you have written”. At the same time, she stresses again the need for an “explanation” of any procedure or reasoning strategy they use. The class needed 33 minutes to complete the first problem, whereas the second one required only 20 minutes. We observed different attitudes emerging from the students’ behaviors:
• Some children provide first a graphical representation of the problem situation and then choose a calculation strategy, sometimes (but not always) matched by the explicit writing of an operation;
• Other children skip the graphical representation and prefer to explain directly how they solved the problem;
• Finally, some children provide a drawing, a written calculation and an explanation for the employed procedure.
At this stage, children focus on an individual action upon their learning environment.
Table 1.

Solving strategies                                      First problem   Second problem
Graphical representation with no written calculation   6%              17%
Repeated addition                                       18%             67%
Repeated addition and multiplication                    53%             11%
Multiplication                                          23%             5%
Other pertinent strategies                              –               –
Other non-pertinent strategies                          –               –
They realize that their previous knowledge is a prerequisite for attaining the correct solution, but that it is by no means sufficient in itself. The difficulties inherent to the problem can be fully overcome only by restructuring such knowledge in such a way as to find a personal solution and to advance an explanation for it. Children made recourse to different strategies in solving these problems; we grouped them according to the schema contained in Table 1.

4.3. Third stage: Group discussion

During the preceding stage the teacher already took note of some observed difficulties and of the respective attempts to overcome them. While setting up the ensuing discussion, she is aware that it will be impossible to debate all the mathematical contents which play a role in the solution of the problems: some choices are in order. She therefore selects text reconstruction and correct task identification as the focus of the discussion, which lasts 50 minutes for each of the two problems.
This stage presents the features of a formulation situation, but also of a validation and institutionalization situation. Students discuss which strategy is most appropriate for the problem situation at issue and acknowledge that, in order to do so, it is not enough to provide the result of the calculation: it is necessary to explicitly formulate the solving path they have chosen. Unlike in the previous stage, however, the students’ reflections are now subject to their classmates’ acceptance or rebuttal. The teacher does not unveil the correct solution, but leaves her students free to determine the validity of the discussed procedures. The correct answers for each problem surface out of the discussion, and only some aspects of the knowledge at stake are explicitly institutionalized.
To clarify the issue, we report an example from the discussion concerning the commutative property of multiplication and the different roles played by the two factors. While providing a graphical representation for the second problem, two students had colored all the black squares in 8 rows of the chessboard. Another pupil had noticed that the result of her written calculation (3 × 8 = 24)
was correct, for she had counted the black squares contained in the drawing, but she was uncertain about the correctness of the operands, because she had counted column-wise and not row-wise. The teacher compares the solutions 3 × 8 = 24 and 4 × 6 = 24, triggering the following discussion:
Teacher: If you had to choose between these two paths, taking into account what is written in the text, which one would you choose?
P.: 4 × 6.
Teacher: Why? Does someone want to explain why 4 × 6 is more appropriate to the text?
E.: Because these are the numbers in the text.
Teacher: Is that all there is to it? Are there any words in the text which suggest that this choice is better than the other one?
P.: It doesn’t say “8 columns”, it says that Denise stops when she has completed the sixth row, so it has to do with rows.
Teacher: So in this text we don’t have only numbers, but also words, and it is about...
L.: ...rows.
Teacher: While F. considered columns. Then, what do you think, should we look for the most appropriate calculation or not? In this way we are looking for numbers, not solving the problem.
The last sentence allows the teacher to point out the distinction between the multiplication algorithm, the properties of the operation, and their use in the problem solving process. The above discussion highlights the dual nature of numbers in word problems: they stand not only for abstract, mathematical objects, but also for concrete objects that contribute to a mental representation of the problem situation. Although the suggested solution (3 × 8 = 24) provided the correct result from an abstract viewpoint, it led to a wrong mental representation in that it failed to mirror the textual description.

4.4. Fourth stage: Extending the problem text

At this stage, students are permitted to extend the problem text by introducing new data and/or questions, and at the same time are asked to rephrase individually the arguments advanced during the group discussion. It is another institutionalization situation, which allows an immediate application to be found for the newly acquired knowledge. As regards the first problem, for example, some children chose to extend the problem text by focusing on the number variable
4.4. Extending the problem text

At this stage, students are permitted to extend the problem text by introducing new data and/or questions, and at the same time they are asked to rephrase individually the arguments advanced during the group discussion. This is another institutionalization situation, which makes it possible to find an immediate application for the newly acquired knowledge. As regards the first problem, for example, some children chose to extend the problem text by focusing on the number variable concerning players; some preferred to focus on the variable concerning caps; some modified both variables; finally, some brought out an emotional aspect of the text.

5. Conclusions

By means of the observational experience described above, we tried to analyse how teachers can exert an impact on the learning environment and on the evolution of the didactic system, which includes, as one of its parts, the evolution of the relation between students and knowledge. Our observations and the above-mentioned evidence from the literature concur in suggesting that the communicative styles of teachers affect every instructional situation, as well as its overall effects and dynamics, whatever their chosen teaching methods may be. Rather than claiming that a given method is superior to another, it seems more appropriate to conclude that the adopted methods (including not only the teacher's strategies and techniques, but also her/his intentional and unintentional behavior) have a greater impact on the students' motivational, operational and relational attitudes than on their specific abilities and cognitions. In short, inducing emergence within a didactic system is even more difficult than expected. Furthermore, in our observations we focused not only on the respective roles played by the teacher and the students in the didactic interaction, but also on the evolution of the didactic system in its entirety, from the characteristics of the learning environment to the aspects which tended to ease or to hinder the devolution process, in line with the systemic approach advocated at the beginning. In this way, we tried to provide a concrete example of the mutual interplay of role variables and contextual variables in a particular classroom setting, highlighting how complicated the problem of inducing emergence is, and how demanding it is in terms of new conceptual tools beyond those already known within the context of physical emergence.

Acknowledgments

Disagreement is the main spur to scientific advancement. Therefore we want, first and foremost, to express our gratitude to Maria Pietronilla Penna who, in many lively discussions, did not refrain from expressing her disagreement with most of the theses we advanced, and in this way greatly stimulated our work. We also thank her for her valuable observations on the notion of emergence. Moreover, we thank Maria Polo for
her helpful and friendly advice, and Silvana Saba for providing a great example of professional classroom management.

References

1. J.C. Abric, in Les représentations sociales, Ed. D. Jodelet (PUF, Paris, 1989), pp. 187-203.
2. A. Bessot, L'Educazione Matematica 12(1), 77-90 (1991).
3. P. Black and D. Wiliam, Assessment in Education 5(1), 7-74 (1998).
4. C. Boston, Practical Assessment, Research and Evaluation 8(9) (2002).
5. M. Brossard and P. Wargnier, Bulletin de Psychologie 412, 703-709 (1993).
6. G. Brousseau, Theory of Didactical Situations in Mathematics (Kluwer, Dordrecht, 1997).
7. G. Brousseau and P. Gibel, Educational Studies in Mathematics 59, 13-58 (2005).
8. Y. Chevallard, Recherches en Didactique des Mathématiques 2-1, 146-158 (1980).
9. Y. Chevallard, La transposition didactique: Du savoir savant au savoir enseigné (La Pensée Sauvage, Grenoble, 1985).
10. H.L. Chick and M.K. Baker, in Proceedings of the 29th Conference of the International Group for the Psychology of Mathematics Education, Eds. H.L. Chick and J.L. Vincent (Psychology of Mathematics Education-PME, Melbourne, 2005), pp. 249-256.
11. J.P. Crutchfield, Physica D 75, 11-54 (1994).
12. B. D'Amore and P. Sandri, L'Insegnamento della Matematica e delle Scienze Integrate 19A(3), 223-246 (1996).
13. M. Dunkin and B. Biddle, The Study of Teaching (Holt, Rinehart & Winston, New York, 1974).
14. B. Fischer and L. Fischer, Educational Leadership 36(4), 245-251 (1979).
15. D. Fontana and M. Fernandes, British Journal of Educational Psychology 64(3), 407-417 (1994).
16. J.H. Holland, Emergence: From Chaos to Order (Perseus Books, Cambridge, Massachusetts, 1998).
17. H. Martin, Integrating Mathematics Across the Curriculum (Skylight Training and Publishing, Arlington Heights, 1999).
18. M.E. McIntosh, Clearing House 71(2), 92-97 (1997).
19. G. Minati and E. Pessa, Collective Beings (Springer, New York, 2006).
20. D.P. Newton and L.D. Newton, Coordinating Science Across the Primary School (Falmer Press, London, 1998).
21. M. Polo, L'Educazione Matematica 20(1), 4-15 (1999).
22. M. Polo, M. Alberti, L. Cirina and S. Saba, L'Educazione Matematica (forthcoming) (2007).
23. A. Romiszowski, Designing Instructional Systems (Kogan Page, London, 1984).
24. R.E. Reys, M.N. Suydam, M.N. Lindquist and N.L. Smith, Helping Children Learn Mathematics (Allyn & Bacon, Boston).
25. B. Sarrazy, Revue Française de Pédagogie 112, 85-118 (1995).
26. B. Sarrazy, La sensibilité au contrat didactique, PhD Thesis (University of Bordeaux, 1996).
27. B. Sarrazy and J. Novotná, in SEMT 05, Ed. J. Novotná (Pedagogical Faculty of the University of Prague, 2005), pp. 33-45.
28. L.S. Shulman, Educational Researcher 15(2), 4-14 (1986).
29. S. Zemelman, H. Daniels and A. Hyde, Best Practice: New Standards for Teaching and Learning in America's Schools (Heinemann, Portsmouth, 1998).