Advances in Human-Computer Interaction
Edited by
Shane Pinder
In-Tech
Published by In-Teh. In-Teh is the Croatian branch of I-Tech Education and Publishing KG, Vienna, Austria.

Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by In-Teh, authors have the right to republish it, in whole or in part, in any publication of which they are an author or editor, and to make other personal use of the work.

© 2008 In-Teh
www.in-teh.org
Additional copies can be obtained from:
[email protected]

First published October 2008
Printed in Croatia
A catalogue record for this book is available from the University Library Rijeka under no. 120101004

Advances in Human-Computer Interaction, Edited by Shane Pinder
p. cm.
ISBN 978-953-7619-15-2
1. Human-Computer. 2. Interaction I. Shane Pinder
Preface
It is an impossible task to bring together, under a single cover, the many facets of human-computer interaction. After all, we have come a long way in the past several decades, to a point where we consider not only what may be intuitive to the designer, but rather the user, the environment, and the intent of our efforts. No single person can claim expertise across the entire field, which now brings together professions that formerly did not share the same vocabulary and still, in many cases, do not share the same philosophy. It is only appropriate that the study of human-machine interaction reveals a greater complexity within human-to-human interaction. In these 34 chapters, we survey the broad disciplines that loosely inhabit the study and practice of human-computer interaction. Our authors are passionate advocates of innovative applications, novel approaches, and modern advances in this exciting and developing field. It is our wish that the reader consider not only what our authors have written and the experimentation they have described, but also the examples they have set. This book brings together the work of experts around the world who have used their expertise to overcome barriers in language, culture, and abilities, to name only a few challenges. The editors would like to thank the authors, who have committed so much effort to the publication of this work.
Editor
Shane Pinder Director of Research Defiant Engineering Canada
[email protected]
Contents
Preface
V

1.
Technology Enabled Learning Worlds
001
Ray Adams and Andrina Granić
2.
Current Challenges and Applications for Adaptive User Interfaces
013
Victor Alvarez-Cortes, Víctor H. Zárate, Jorge A. Ramírez Uresti and Benjamin E. Zayas
3.
Towards a Reference Architecture for Context-Aware Services
031
Axel Bürkle, Wilmuth Müller and Uwe Pfirrmann
4.
A Robust Hand Recognition In Varying Illumination
053
Yoo-Joo Choi, Je-Sung Lee and We-Duke Cho
5.
How Do Programmers Think?
071
Anthony Cox and Maryanne Fisher
6.
Experiential Design: Findings from Designing Engaging Interactive Environments
085
Peter Dalsgaard
7.
Evaluation of Human Cognitive Characteristics in Interaction with Computer
107
Nebojša Đorđević and Dejan Rančić
8.
Audio Interfaces for Improved Accessibility
121
Carlos Duarte and Luís Carriço
9.
Intelligent Interfaces for Technology-Enhanced Learning
143
Andrina Granić
10.
Design of Text Comprehension Activities with RETUDISAuth
161
Grammatiki Tsaganou and Maria Grigoriadou
11.
Computer-based Cognitive and Socio-emotional Training in Psychopathology
173
Ouriel Grynszpan
12.
Facial Expression Recognition as an Implicit Customers Feedback
189
Zolidah Kasiran, Saadiah Yahya (Dr) and Zaidah Ibrahim
13.
Natural Interaction Framework for Navigation Systems on Mobile Devices
199
Ceren Kayalar and Selim Balcisoy
14.
Review of Human-Computer Interaction Issues in Image Retrieval
215
Mohammed Lamine Kherfi
15.
Smart SoftPhone Device for Networked Audio-Visual QoS/QoE Discovery & Measurement
241
Jinsul Kim
16.
Sonification System of Maps for Blind
263
Gintautas Daunys and Vidas Lauruska
17.
Advancing the Multidisciplinary Nature of HCI in an Undergraduate Course
273
Cynthia Y. Lester
18.
Simple Guidelines for Testing VR Applications
289
Livatino Salvatore and Koeffel Christina
19.
Mobile Device Interaction in Ubiquitous Computing
311
Thorsten Mahler and Michael Weber
20.
Integrating Software Engineering and Usability Engineering
331
Karsten Nebe, Dirk Zimmermann and Volker Paelke
21.
Automated Methods for Webpage Usability & Accessibility Evaluations
351
Hidehiko Okada and Ryosuke Fujioka
22.
Emotion Recognition via Continuous Mandarin Speech
365
Tsang-Long Pao, Jun-Heng Yeh and Yu-Te Chen
23.
Nomad Devices Adaptation for Offering Computer Accessible Services
385
L. Pastor-Sanz, M. F. Cabrera-Umpiérrez, J. L. Villalar, C. Vera-Munoz, M. T. Arredondo, A. Bekiaris and C. Hipp
24.
Rewriting Context and Analysis: Bringing Anthropology into HCI Research
397
Minna Räsänen and James M. Nyce
25.
Interface Design of Location-Based Services
415
Chris Kuo-Wei Su and Li-Kai Chen
26.
Brain-CAVE Interface Based on Steady-State Visual Evoked Potential
437
Hideaki Touyama
27.
Multimodal Accessibility of Documents
451
Georgios Kouroupetroglou and Dimitrios Tsonos
28.
The Method of Interactive Reduction of Threat of Isolation in the Contemporary Human Environment
471
Teresa Musioł and Katarzyna Ujma-Wąsowicz
29.
Physical Selection as Tangible User Interface
499
Pasi Välkkynen
30.
Geometry Issues of Gaze Estimation
513
Arantxa Villanueva, Juan J. Cerrolaza and Rafael Cabeza
31.
Investigation of a Distance Presentation Method using Speech Audio Navigation for the Blind or Visually Impaired
535
Chikamune Wada
32.
The Three-Dimensional User Interface
543
Hou Wenjun
33.
User Needs for Mobility Improvement for People with Functional Limitations
575
Marion Wiethoff, Sacha Sommer, Sari Valjakka, Karel van Isacker, Dionisis Kehagias and Dimitrios Tzovaras
34.
Recognizing Facial Expressions Using Model-based Image Interpretation
587
Matthias Wimmer, Zahid Riaz, Christoph Mayer and Bernd Radig
1
Technology Enabled Learning Worlds

Ray Adams (1) and Andrina Granić (2)
(1) CIRCUA, School of Computing Science, Middlesex University, United Kingdom
(2) Faculty of Science, University of Split, Croatia
1. Introduction

We live in a dramatically evolving knowledge society that is founded on the assumption of equal access to relevant skills and technology-dispensed knowledge. If so, then effective inclusion in society requires powerful new learning resources. In this newer social context, organisations may increasingly become learning organisations and employees may increasingly become knowledge workers. At the same time, new levels of accessibility are equally important to motivate the identification and removal of new barriers to inclusion created inadvertently by new technologies. On this basis, our purpose here is to identify and evaluate some of the key issues that are essential to the new types of learning that will be needed by knowledge workers in learning organisations. To do so, we combine expertise in cognitive science and computing science.

First, we present and evaluate three different approaches to human learning supported by technology:
• definition of learning resources; learning resources are defined as information that is stored in a variety of media that supports learning, including materials in, for example, print, video and software formats,
• definition of (technology-enhanced) learning environments; learning environments, as places arranged to enhance the learning experience, are defined on an interdisciplinary basis comprising three essential components: pedagogical functions, appropriate technologies and the social organization of education, and
• definition of learning worlds; learning worlds are partially immersive, virtual milieux that deploy smart and adaptive teaching and learning technologies to create novel experiences based on sound pedagogical and psychological principles.

Second, we present and evaluate some key issues that include:
• The changing role of digital libraries to meet the increasing thirst for knowledge.
• How can learning environments be designed and evaluated for accessibility, usability and ambient smartness?
• The design and development of more effective, technology-enhanced learning environments.
• How can ubiquitous learning environments be developed?
• What new assessment methods must be developed to guide the design and development of such systems?
• How can new technologies such as virtual reality applications and brain computer interfaces be applied to effective human learning?

We show how a simple but innovative synthesis of key disciplines such as computing science and cognitive science can be deployed in combination with such topics as ergonomics, e-learning, pedagogy, cognitive psychology, interactive system design and neuropsychology to create new learning worlds that boost human learning to meet the demands of the 21st century.
2. A framework for different approaches to human learning supported by technology

There are at least three different perspectives on human learning, namely learning resources, technology-enhanced learning environments and learning worlds, as defined in turn below. As our primary focus is on human learning, our treatment of learning resources, technology-enhanced learning environments and learning worlds will also need to have a focus on the human. To do so, we introduce a simple and convenient structure that may help you to see the key issues and what needs to be done with them in the dual context of human learning and e-learning technologies. Only the relevant details will be presented here, but you may wish to follow up any issues of interest, or where you need greater clarity, by referring to our reference list.

At a simple but effective level, a human technology system can be captured by a consideration of:
• A user model (a depiction of the requirements, preferences, strengths and weaknesses of the intended users / students).
• A technological model (a description of the key parameters of the technological platforms to be used, permanent, default or current).
• A context-of-use model (a model that captures the relevant aspects of the context or contexts for which the system is intended; namely the software context such as the operating system, the physical context such as at home or in a street, the psychological context such as working as part of a small or large team, and the social / cultural context such as a Western European country or a South American location).
• A task model (a model that captures the nature and features of the task or tasks that the system is intended to support, such as a traveller dealing with emails or a tourist storing and displaying photographs).

Here, of course, we are particularly looking at the subset of tasks that are to do with the human acquisition of new knowledge and skills. Also, in this sub-context, the user is more likely to be referred to as a student or learner. A code sketch of this four-model structure is given at the end of this section.

To make the above structure a little more concrete, we now present a typical user model structure in more detail. We have chosen our own user model structure, not because it is the best, but because it is both typical and relatively simple. As you will see from the diagram (see Fig. 1), Simplex Two is a theory that seeks to capture the key aspects of human information processing by identifying nine components of human cognition and related processes. These nine components have been validated in two recently published studies (Adams, 2007) that show that Simplex Two captures many, if not all, vital, global aspects of human psychology. Readers should consult this flagship paper if they want to consider the justification and nature of each component or module. The theory
is set out below as a flow diagram in which the human is depicted, in part, as a processor of information. Information enters the system through the senses into a sensory / perceptual system or into a feedback system and then into the Executive Function. This Executive Function orchestrates all conscious activities of the system and the eight other modules. However, each module possesses both memory and the capacity to process and transform any information that it holds. The Executive Function creates the necessary coordination required between the different functions so that a specific task can be carried out. Each module of Simplex Two captures an important aspect of the human learner's psychology. Each module has been selected for three reasons: first, it is an important overall aspect of human psychology; second, it is reflected in the concerns of interactive system designers; and third, it is identified in the meta-analyses reported by Adams (2007). The nine modules are summarised as follows:
1. Perception / input module. This module deals with the initial registration and evaluation of incoming sensory information and, in conjunction with other modules, initial attempts at sense making.
2. Feedback management. Surprisingly, the human brain seems to have at least two perceptual systems (Milner & Goodale, 1995), including a second input system that deals with the feedback that arises as a consequence of our own actions (both physical and cognitive). This dichotomy is also found in the interests of system designers and current work on system design (Adams, 2007). This module processes the feedback provided to the learner from the environment and from e-learning resources.
3. Working memory. When we carry out any task, we often have to hold information in our head whilst doing so. For example, we hold a phone number, a password or sets of instructions. This is referred to as working memory (Baddeley and Hitch, 1974; Baddeley, 2000). Timescales vary, but many tasks would be impossible were it not for this function (Ericsson & Kintsch, 1995). Working memory is an important component of Broadbent's Maltese cross theory (Broadbent, 1984), a theory from which Simplex has developed. This module of Simplex keeps and processes the information that we need to hold in mind whilst carrying out our tasks.
4. Emotions and drives. When we are dealing with incoming information, it is often of some significance to us, rather than being neutral. It may be interesting, threatening, stressful, frustrating, relevant etc. The human mind is quick to determine if something in our environment is a threat or an attraction and to respond accordingly. This module deals with the emotional and motivational responses to events, imbuing them with significance and meaning. Even with e-learning, the student's emotions and motivations become engaged and exert a significant influence on learning and performance. For computer learning systems, the designer must take significant account of the intended learners' emotional and motivational responses. Do they enjoy using the system or is it irritating or frustrating? Do they find the system a source of motivation or discouragement?
5. Output. This module stores and selects the correct responses that a student needs to make in a given context to a given stimulus. To do so, it must set up and hold the required response in memory and build up a long-term repertoire of possible responses associated with specific contexts and stimuli.
6. Output sequences.
In many cases, the learner must construct a complex sequence of responses as an important aspect of new skill acquisition. For example, we often learn a sequence of keystrokes on a computer that are required to carry out a task, such as sending out an email. The complex sequence of actions seems to "fire off" without reference to specific contexts or stimuli for specific actions. Both researchers and designers make the distinction between responses and complex response sequences.
7. Long term memory. This module provides the long term storage and processing of the knowledge that we require to carry out everyday activities such as studying and developing skills. It is the major source of declarative knowledge, i.e. knowledge that we can declare. It also provides that information to support the tasks that need it. For example, consider when a symbol on a computer screen reminds us of something we saw once before, or when we need to remember what a specific symbol means. Some tasks require only a little of our knowledge (for example, simple same / different judgements) whilst other tasks depend upon much greater quantities of learned information, for example language translation.
8. Mental models. This module provides the capacity to create and retain the mental models that are required to conduct specific tasks, such as navigating around the University Library or around a supermarket, solving logical problems (Some As are Bs, some Bs are Cs; are some As also Cs?) or counting the windows in your home.
9. Executive functions. The Executive Module transfers information between the different modules, transforms information, retains a record of current progress and records the transactions / structures that are required to carry out a task or set of tasks. It also learns to create more efficient transactions / structures with feedback. The Executive Function is far from being a homunculus (a fictional person in your head that tells you what to do) but is an important component of human cognition. It is often associated with the frontal lobes of the human brain, such that injuries to these areas can result in disastrous failures of executive functions.
Figure 1. Simplex Two
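To illustrate the four-model structure introduced at the start of this section in software terms, the following minimal sketch is our own illustration, not an implementation from the literature; every class and field name is an assumption chosen for readability:

from dataclasses import dataclass, field

@dataclass
class UserModel:
    # Requirements, preferences, strengths and weaknesses of the learner.
    preferences: dict = field(default_factory=dict)   # e.g. {"modality": "audio"}
    strengths: list = field(default_factory=list)
    weaknesses: list = field(default_factory=list)

@dataclass
class TechnologicalModel:
    # Key parameters of the platform: permanent, default or current.
    platform: str = "desktop"
    has_audio_output: bool = True
    screen_diagonal_inches: float = 15.0

@dataclass
class ContextOfUseModel:
    # Software, physical, psychological and social / cultural contexts.
    operating_system: str = "unknown"
    physical_context: str = "home"    # e.g. "home", "street"
    team_size: int = 1                # psychological context: solo or team work
    culture: str = "unspecified"      # social / cultural context

@dataclass
class TaskModel:
    # Nature and features of the task(s) the system is intended to support.
    name: str = ""
    subtasks: list = field(default_factory=list)

@dataclass
class HumanTechnologySystem:
    # A human technology system as the conjunction of the four models.
    user: UserModel
    technology: TechnologicalModel
    context: ContextOfUseModel
    task: TaskModel

An adaptive e-learning system would consult all four models together when making a decision, for instance whether to deliver a resource as audio or as text.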
3. Learning resources for e-learning students

Learning resources are defined as information that is stored in a variety of media that supports learning, including materials in, for example, print, video and software formats. Considering the nine-point approach of Simplex Two, resources can be classified accordingly, relating to each of the nine components. The educationalist should consider each of the following nine fields when designing learning resources for their intended students.
1. Input resources refer to the different human senses with which information is received and understood. For example, information may be presented in different ways (e.g. audio or visual) or through multimedia (e.g. audio, visual and video). Each type of input has different types of features that may be more or less helpful for different tasks. For example, sound may be dramatic and impressive but visual materials are more persistent.
2. Feedback is considered to be essential to successful learning by most experts (e.g. Annett, 1994; Juwah et al., 2004). But it can be delivered in many different ways and through many different modalities (sight, hearing etc). Juwah et al. suggest that principles of effective feedback can be identified. They tentatively suggest the following seven features of good feedback: it will facilitate self-assessment and reflection, stimulate teacher and peer dialogue, clarify the nature of good performance (goals, criteria, standards expected), give opportunities to close the gap between actual and required performance, deliver high quality information to support positive thinking, and give teachers information that they can use to enhance teaching.
3. Working memory is now recognised as an important contributory factor to intelligent human performance and learning (for example Engle, Tuholski, Laughlin and Conway, 1999; Baddeley, 2000; Oberauer, Schulze, Wilhelm and Süß, 2005). If so, the educationalist should be careful to allow the intended students to work within their working memory capacity most, if not all, of the time. Learning resources should be presented in bite-sized chunks and be relatively digestible (see the sketch after this list).
4. The emotions and drives of the students are important for learning success. O'Regan (2003) has concluded that emotions are central and essential to e-learning. Sankaran (2001) stressed the importance of motivation in e-learning. Clearly, learning resources must be chosen carefully to aim for a positive-emotion student experience and be suitably motivating, especially when using controversial or sensitive materials.
5. The response requirements of the developed learning resources should enable students to make appropriate responses that are neither too difficult nor arbitrary.
6. The learning resources should support the students in their attempts to develop complex and skilled response sequences.
7. The learning resources should not make unrealistic demands on the prerequisite knowledge that students must possess before they can participate in the proposed activities. They should not overload long-term memory.
8. The learning resources should be organised and presented so as to enable students to create suitably adequate mental models with which to structure newly acquired knowledge.
9. The students need to be supported so that they can deploy and develop their executive skills to develop overall learning strategies.
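As one hedged illustration of point 3, a content pipeline can enforce bite-sized presentation by splitting material into chunks no larger than a nominal working-memory budget. The capacity value below is an assumption for illustration only (common estimates put working-memory span at roughly three to five items); it is not a figure taken from this chapter:

def chunk_material(items, capacity=4):
    # Split a lesson's items into bite-sized chunks that stay within a
    # nominal working-memory budget. capacity=4 is an illustrative
    # assumption, not a value prescribed by the chapter.
    return [items[i:i + capacity] for i in range(0, len(items), capacity)]

# Example: nine learning points become three digestible screens.
lesson = ["point %d" % n for n in range(1, 10)]
screens = chunk_material(lesson)
assert len(screens) == 3 and len(screens[0]) == 4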
4. Technology-enhanced learning environments

Learning environments can be characterized as places arranged to enhance the learning experience. They are defined on an interdisciplinary basis built on three essential components: pedagogical functions, appropriate technologies and the social organization of education. Widdowson (posted 21st May, 2008) asks: "We can only create effective learning environments once we are clear about learning itself. What learning should young people be engaged in and what should learning be like for our 21st century learners, both today and in the future?" The author goes on to suggest some critical questions. They include (in our wording) the following. What learning spaces are needed to create better opportunities for active, practical, collaborative, individual and constructive learning that engages the students? How can we design learning spaces to enable learners to develop and progress? How can we measure learning environment effectiveness? Do our learning environments challenge and inspire young people? How do our learning environments support flexibility and student diversity? We suggest that systematic and substantial answers to these questions and other related questions depend, in part, upon the development of a framework such as Simplex Two, or something better. As Widdowson (2008) concludes, the design of a learning space must be based upon an understanding of learning itself. We would add that it should also be based on an appreciation of the diverse skills, requirements and preferences of the intended students (Adams, 2007; Adams and Granić, 2007).

Clearly, learning environments must be fit for purpose (Day, 1995). One way to ensure fitness for purpose of e-learning materials is the creation and maintenance of different versions that are suitable for different student populations (i.e. versioning; Brooks, Cooke and Vassileva, 2003). Second, they should also inspire a sense of wonder about learning itself. To quote Albert Einstein: "The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to WONDER and stand rapt in awe, is as good as dead: his eyes are closed" (http://www.quotationspage.com/quote/1388.html; accessed August 2008). For example, McGinnis, Bustard, Black and Charles (2008) have argued that "e-learning should be enjoyed in the same way [as computer games] and can be enhanced by incorporating games techniques in e-learning system design and delivery." However, Burg and Cleland (2001) have cautioned that poorly implemented computer based learning can also destroy any sense of wonder in learning. Third, learning environments must also be accessible. For example, Williams and Conlan (2007) would counteract cognitive accessibility problems by providing a means whereby users can visualize the complex space in which they are learning. In fact, accessibility problems can be found at any of the nine components of human information processing presented by Simplex Two (see above). A complementary approach is offered by Adams, Granić and Keates (2008), as shown in Table 1 below. They proposed five levels of accessibility that can be applied to an e-learning system and parallel them with the Internet layered model.
They are hardware access (problems caused by lack of access to the necessary technology), connectivity access (problems with access to systems and resources), interface access (where the design of the interface creates accessibility difficulties), cognitive access (problems of navigation and accessing the contents of an application or website) and goal / social access (whether a system allows you to access your goals). The e-learning system developer should find it a simple process to
monitor these five aspects of the accessibility of their system. These five aspects can be applied both to the specific requirements of their intended users and to the challenges set by the design of the e-learning system itself. Comparing:

     Accessibility types        Internet layered model
1    hardware access            physical
2    connectivity access        data link
3    interface access           network
4    cognitive access           transport
5    goal / social access       application

Table 1. Accessibility and the Internet layered model
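The five-level checklist lends itself to a simple audit harness. The sketch below is our own illustration of how a developer might monitor the five aspects; the check functions are hypothetical placeholders that a real project would supply:

ACCESSIBILITY_LEVELS = [
    ("hardware access", "physical"),
    ("connectivity access", "data link"),
    ("interface access", "network"),
    ("cognitive access", "transport"),
    ("goal / social access", "application"),
]

def audit_accessibility(checks):
    # checks maps a level name to a callable returning (passed, note).
    # Levels without a supplied check are reported as unassessed.
    report = []
    for level, layer in ACCESSIBILITY_LEVELS:
        check = checks.get(level)
        if check is None:
            report.append((level, layer, "unassessed", ""))
        else:
            passed, note = check()
            report.append((level, layer, "pass" if passed else "fail", note))
    return report

# Hypothetical usage: only two of the five levels have checks so far.
report = audit_accessibility({
    "hardware access": lambda: (True, "runs on commodity laptops"),
    "cognitive access": lambda: (False, "navigation depth exceeds 3 levels"),
})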
5. Learning worlds

Learning worlds (or environments) are partially immersive, virtual milieux that deploy smart and adaptive teaching and learning technologies to create holistic and novel experiences based on sound pedagogical and psychological principles. The learning world or environment provides the learner with a consistent, familiar and over-arching context in which the user can access a range of different learning experiences and systems. They can also be personalised to match the requirements, strengths and weaknesses of an individual learner, creating higher levels of accessibility whilst still supporting collaborative working. For example, van Harmelen (2006) argues that learning environments can support different pedagogic methods, be collaborative or personal, easily extended or not, capable of supporting customisation or not, and controlled by teachers or students or some balance of both.

Learning worlds can vary significantly in a number of dimensions. First, learning worlds vary significantly in size. For example, Warburton and Pérez-García (2008) present the concept of the massive multi-user virtual environment (MUVE), whilst Blanc-Brude, Laborde and Bétrancourt (2003) present a learning micro-world. Second, learning worlds can be outdoors and mobile. For example, Chang, Shih and Chang (2006) present outdoor mobile learning, whilst most applications are located within traditional physical settings. Third, learning worlds can focus on different aspects of learning. Whilst some learning worlds focus on cognitive skills (Jackson and Fagan, 2000), others focus on social factors. Skog, Hedestig and Kaptelinin (2001) report the Active Worlds system that is designed to support social interactions in 3D virtual learning spaces for distance education. The aim is to build on tacit aspects of social knowledge that are seen as critical for effective learning. A long list of these dimensions could be constructed, but the main point to be drawn is that learning worlds have the power to serve a wide variety of different learning objectives. The two caveats we would add are as follows. (a) The sheer power of the learning world concept means that it needs very careful handling. Technology for its own sake can detract from the learning experience unless learning world designs are learner centred. (b) Learning world designs should be sensitive to the accessibility requirements of the intended users.
6. The changing role of digital libraries to meet the increasing thirst for knowledge

It is clear that digital libraries contribute substantially to the provision of structured, declarative knowledge to online communities of students and scholars, who need to have substantial, accessible cognitive resources at their fingertips. Such libraries represent a major investment of both applied computing science and pedagogic expertise. They proffer potentially very valuable support for learning through the building of smart learning environments. At the moment, sad to say, digital libraries are often set behind monolithic interfaces that can offer an overwhelming richness of data. But that should not blind us to their potential to provide smart, accessible, cognitive support for human learning in the context of the inclusive knowledge society. There is no doubt that digital libraries can be an exciting and impressive way for students to glean new knowledge without straying too far from the study. They certainly offer considerable volumes of declarative knowledge to students working within the context of the modern information society. At the same time, some of our students find digital libraries difficult to use and their contents difficult to access. A considerable volume of work remains if we are to go beyond the simply impressive size and contents of digital libraries and establish the extent to which current and future digital libraries can be made sufficiently usable, accessible and smart to support an inclusive information society and the aspiration of universal access (for example, Adams and Granić, 2007). These authors used a set of converging methods (separate measures of usability, accessibility and system smartness) to evaluate a random sample of digital libraries through their websites. They concluded that, whilst digital libraries are both substantial and functional repositories of knowledge, they can be improved significantly, particularly in their accessibility and smartness, and they reported substantial statistical significance levels in their data. A new measure of system smartness was introduced and found to provide a useful metric for present purposes, though it is clear that further work will be needed.
7. How can learning environments be designed and evaluated for accessibility, usability and ambient smartness?

Many digital libraries are impressive in principle but often difficult to use in practice. If so, what comes after the present generation of digital libraries? One more future-proof concept is the notion of the ubiquitous knowledge environment, or post-digital library. It is defined by its aims. They include the achievement of suitable levels of accessibility, usability and smartness for its structures, contents and user interfaces. A second aim is to make the post-digital library available on a more ubiquitous and mobile basis. The first step towards such developments has been the evaluation of current digital libraries to identify their potential to support inclusive, smart, ubiquitous knowledge environments (Adams and Granić, 2007). Clearly, the second step is to begin to develop new designs for these ubiquitous knowledge environments, in the light of the specific design issues raised by current work. Can digital libraries become smarter and more accessible, in so doing creating ubiquitous knowledge environments? The concept of the ubiquitous knowledge environment seeks to capture the potential for convergence between current and future technologies, moving towards ambient intelligence and ubiquitous computing. Such knowledge environments move beyond the desktop or laptop to form part of our physical
environments. Thus we should consider the creation of smart settings, with ubiquitous knowledge environments as a vital component. Access to the encapsulated knowledge would be mediated by smarter everyday artifacts that deploy sensor-data-based responses in order to provide users with relevant, user-model-specific, situation-aware and context-of-use-aware knowledge at the point of use.
8. The design and development of more effective, technology-enhanced learning environments

Clearly, there is a long way to go in the development of better technology-enhanced learning environments. On the one hand, there is the potentially applicable power of new technologies that deliver learning materials and support student / system interactions in ways that are impressive, at least to the system designers and educators, if not always to all of the intended students. On the other hand, there is a growing awareness of the importance of insights into (a) the psychology of human learning as well as (b) the diversity of human learning requirements and the related user modelling needed to capture that diversity. The danger is that we depend upon the impact of new technologies, 3D effects, virtual worlds, augmented realities, ambient intelligence and the like in support of trivial applications. On the contrary, we must attain a basic level of understanding of the diversity of human learning and how to create user-sensitive methods with which to design learning environments. As discussed above, learning environments must, inter alia, be fit for purpose, usable and accessible, also responding to learner interactions in smarter ways. Drawing upon emerging technologies that are subordinated to learning objectives, the development of ubiquitous knowledge environments must also draw upon advanced design and development methodologies. Such advanced methodologies should pay sufficient attention to the evaluation of system relevance, fitness for purpose, usability and accessibility based upon robust measurement methods.
9. How can ubiquitous learning environments be developed?

If higher standards of user satisfaction, usability, accessibility and system smartness can be achieved, then it would be possible to create convergence between technologies such as digital libraries, artificial intelligence (in the weaker sense of simulating human behaviour), ambient intelligence and ubiquitous computing. The substantive contents of such knowledge environments could be unleashed into the external, physical world rather than staying on the desktop or laptop. If so, the present methods of questionnaire-based evaluation would focus not only on significant components of the smart environment, like the smart digital library, but more so on an evaluation of the overall smart environments themselves. These methods, or their successors, could be used to design and evaluate better ubiquitous knowledge environments. The encapsulated knowledge would be accessed and processed through smarter everyday artefacts that deploy sensor-data-based responses in order to provide users with relevant, user-model-specific, situation-aware and context-of-use-aware knowledge at the point of use.
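A hedged sketch of this point-of-use delivery idea follows; the decision rules and all names are our own invention, used only to make the combination of user model and sensed context concrete:

def select_presentation(user_prefers_audio, in_noisy_street, on_mobile_device):
    # Choose how a ubiquitous knowledge environment delivers an item,
    # combining the user model with the sensed context of use. The rules
    # are illustrative assumptions, not guidance from the chapter.
    if in_noisy_street:
        return "text"      # audio would be masked by ambient noise
    if user_prefers_audio:
        return "audio"
    return "summary" if on_mobile_device else "full_text"

# Example: an audio-preferring learner on a quiet train gets audio.
assert select_presentation(True, False, True) == "audio"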
10. What new assessment methods must be developed to guide the design and development of such systems?

Clearly, digital libraries provide an invaluable source of knowledge on a wide range of subjects for a significant number of scholars and students. There can be few working scholars who do not make use of them. Digital libraries are, at the moment, set behind monolithic interfaces that are typically accessed from desktop or laptop environments in working or academic buildings. Surprisingly, however, the sample of libraries evaluated here clearly needed improvements in both accessibility and smartness (the questionnaires used are available from the authors). Clearly, if digital libraries are to form the basis for the realization of ubiquitous knowledge environments, they must become smarter and more accessible. We strongly recommend that digital library developers and redevelopers should evaluate their systems for user satisfaction, usability, accessibility and system smartness. One approach would be to use expert judgements, but an equally important approach is to involve a sample of intended users, building useful user models in the process. A sketch of how such converging scores might be combined is given below.
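As a minimal sketch of the converging-measures recommendation (our own illustration; the 1-5 scale, the dimension names as dictionary keys and the threshold are assumptions, not values from the chapter), separate questionnaire scores can be folded into one report that flags the dimensions needing improvement:

def evaluate_system(scores, threshold=3.5):
    # scores: mean questionnaire ratings per dimension on a 1-5 scale.
    # Dimensions rated below the threshold are flagged for redevelopment.
    return {dim: ("acceptable" if rating >= threshold else "needs improvement")
            for dim, rating in scores.items()}

report = evaluate_system({
    "user satisfaction": 4.2,
    "usability": 3.9,
    "accessibility": 2.8,
    "smartness": 3.1,
})
assert report["accessibility"] == "needs improvement"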
11. How can new technologies such as virtual reality applications and brain computer interfaces be applied to effective human learning?

Exciting new technologies, such as virtual reality applications and brain computer interfaces, offer new potential for e-learning in the twenty-first century. Virtual reality applications (VRAs) allow us to create virtual learning worlds in which dangerous, risky, expensive or otherwise unexplored skills can be explored and acquired. However, it is clear that the design of such virtual worlds is not easy. Whilst design heuristics can guide their design, it turns out that VRA effectiveness depends on different design points than, say, a website. This, in turn, means that developers should use VRA-specific design heuristics. In the above discussions and analyses, we have shown how a simple but innovative synthesis of key disciplines such as computing science and cognitive science can be deployed in combination with such topics as ergonomics, e-learning, pedagogy, cognitive psychology, interactive system design and neuropsychology to create new learning worlds that will augment and boost human learning to meet the demands of the 21st century.
12. Conclusions and Recommendations

There are a number of important recommendations for both the practitioner and the researcher in the advancing world of e-learning environments. First, and perhaps most importantly, advances in technology per se should not be confused with advances in human learning capabilities enhanced by technology. Whilst it is clear that technological innovations offer the potential of exciting new developments in e-learning in functionality, usability and accessibility, it is also now clear that new technologies can create new problems for e-learning system design. For new technologies, new design heuristics may be needed, since the critical success factors of design change not only with different groups of intended students but also with different types of technologies. Second, this first point leads to the importance of user-sensitive design, where both the student and the teacher should be viewed as users of the system. The system design should not only support the requirements of each type of user, but also support true collaboration
between them. Prototyping, user and expert evaluations, and iterative design methods should all be used to support user-sensitive design. Third, powerful, effective but simple methods of evaluation of e-learning systems should be used, covering functionality, fitness for purpose, usability, accessibility and system smartness. We have used a series of questionnaires, available from the authors, to support expert and user judgement. Clearly, there is potentially a great diversity of different types of e-learning environments, with an equally diverse range of learning objectives to be achieved. These diversities should not be seen, in themselves, as problems, but rather as opportunities for the current and growing diversity of teachers, students, researchers and practitioners to use the best and most relevant e-learning system available to them.
13. References

Adams, R. (2007). Decision and Stress: Cognition and e-Accessibility in the Information Workplace. Universal Access in the Information Society, 5, 363-379.
Adams, R. and Granić, A. (2007). Creating Smart and Accessible Ubiquitous Knowledge Environments. Lecture Notes in Computer Science, 4555, 13-22. Springer, Berlin / Heidelberg.
Adams, R., Granić, A. and Keates, L. S. (2008). Are ambient intelligent applications more universally accessible? In Universal Access to Novel Interaction Environments: Challenges and Opportunities (chair: C. Stephanidis), AHFE International 2008.
Annett, J. (1994). The learning of motor skills: sports science and ergonomics perspectives. Ergonomics, 37, 5-16.
Baddeley, A.D. and Hitch, G.J. (1974). Working Memory. In G.A. Bower (Ed.), The psychology of learning and motivation: advances in research and theory (Vol. 8, pp. 47-89). New York: Academic Press.
Baddeley, A.D. (2000). The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 4, 417-423.
Blanc-Brude, T., Laborde, C. and Bétrancourt, M. (2003). Usability of speech based and multimodal interaction in a learning micro-world. In IHM 2003: Proceedings of the 15th French-speaking Conference on Human-Computer Interaction (15ème Conférence Francophone sur l'Interaction Homme-Machine).
Broadbent, D.E. (1984). The Maltese cross: A new simplistic model for memory. The Behavioral and Brain Sciences, 7(1), 55-68.
Brooks, C., Cooke, J. and Vassileva, J. (2003). Versioning of Learning Objects. In Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies, 296-297.
Burg, J. and Cleland, B. (2001). Computer-Enhanced or Computer-Enchanted? The Magic and Mischief of Learning With Computers. In ED-MEDIA 2001 World Conference on Educational Multimedia.
Chang, C., Shih, K. and Chang, H. (2006). Some Studies on Technology Support for Outdoor Mobile Learning. In E. Pearson and P. Bohman (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 (pp. 1170-1177). Chesapeake, VA: AACE.
Day, C. W. (1995). Qualitative Research, Professional Development and the Role of Teacher Educators: Fitness for Purpose. British Educational Research Journal, 21, 357-369.
Engle, R.W., Tuholski, S.W., Laughlin, J.E. and Conway, A.R. (1999). Working memory, short-term memory, and general fluid intelligence: a latent-variable approach. Journal of Experimental Psychology: General, 128, 309-331.
Ericsson, K. A. and Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211-245.
Jackson, R. L. and Fagan, E. (2000). Collaboration and learning within immersive virtual reality. In CVE '00: Proceedings of the Third International Conference on Collaborative Virtual Environments. New York, NY: ACM.
Juwah, C., Macfarlane-Dick, D., Matthew, B., Nicol, D., Ross, D. and Smith, B. (2004). Enhancing student learning through effective formative feedback. The Higher Education Academy Generic Centre, June 2004 (copy available from the present authors).
McGinnis, T., Bustard, D.W., Black, M. and Charles, D. (2008). Enhancing E-Learning Engagement using Design Patterns from Computer Games. In First International Conference on Advances in Computer-Human Interaction (IEEE ACHI 2008).
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press.
Oberauer, K., Schulze, R., Wilhelm, O. and Süß, H. (2005). Working Memory and Intelligence—Their Correlation and Their Relation: Comment on Ackerman, Beier, and Boyle. Psychological Bulletin, 131, 61-65.
O'Regan, K. (2003). Emotion and E-Learning. Journal of Asynchronous Learning Networks, 7, 78-92.
Sankaran, S. R. and Bui, T. (2001). Impact of Learning Strategies and Motivation on Performance: A Study in Web-Based Instruction. Journal of Instructional Psychology, September, 1-9.
Skog, D., Hedestig, U. and Kaptelinin, V. (2001). Supporting Social Interactions in Distance Education with a 3D Virtual Learning Space. In C. Montgomerie and J. Viteli (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2001 (pp. 1751-1752). Chesapeake, VA: AACE.
van Harmelen, M. (2006). Personal Learning Environments. In Proceedings of the Sixth International Conference on Advanced Learning Technologies (ICALT '06).
Warburton, S. and Pérez-García, M. (2008). Motivating pupils, linking teachers through active learning with Multi-User Virtual Environments. In Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 (pp. 3574-3578). Chesapeake, VA: AACE.
Widdowson, R. (posted 21st May, 2008). Designing effective learning environments. http://flux.futurelab.org.uk. Consulted August 2008 (copy available from the present authors).
Williams, F.P. and Conlan, O. (2007). Visualizing Narrative Structures and Learning Style Information in Personalized e-Learning Systems. In Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007).
2
Current Challenges and Applications for Adaptive User Interfaces

Victor Alvarez-Cortes, Víctor H. Zárate, Jorge A. Ramírez Uresti and Benjamin E. Zayas
Instituto de Investigaciones Eléctricas; Tecnológico de Monterrey
México
1. Introduction

In this chapter we present the current advances in the field of adaptive user interfaces, analysing the different research efforts, the challenges involved, and the more recent and promising directions in this field. Initially, we introduce the foundations of adaptive user interfaces, also referred to in the technical literature as Intelligent User Interfaces (IUIs); then we explore the motivation and rationale for their use; and finally we discuss the challenges they currently have to deal with. In this context, IUIs are presented as a multidisciplinary field, with relevant research and cross-fertilized ideas derived from different areas; however, special emphasis is put on the approaches taken by three core disciplines: Artificial Intelligence (AI), User Modelling (UM) and Human-Computer Interaction (HCI). After providing the foundations for IUIs, an in-depth review of each approach is presented, including the most recent findings in models, algorithms and architectures for adaptive user interfaces. Although adaptive user interfaces are considered a recent research field, this chapter is enriched with a state-of-the-art survey of IUI applications. The material included presents the most relevant IUIs developed and applied in different real domains, either as research prototypes or as complete systems. A methodological analysis of these systems is presented, contrasting their advantages, limitations and the domain-dependence of their success and acceptance by users. The analysis aims to uncover common principles for effective IUI design. This chapter also details our proposed taxonomy, which is applied to the comparison of the different IUI systems. Finally, the chapter presents the gaps left by the approaches under analysis and concludes with a discussion of the challenges currently open, presenting a number of possible future research directions.

User interfaces for computing systems have changed greatly over the last 20 years. The early text-based interfaces, which used the command line to access operating system resources, have been replaced by graphical interfaces that are manipulated through input devices such as the keyboard and mouse. Today, interfaces aim to be more intuitive for the user by presenting graphical elements that are easily associated with real-world objects through the use of metaphors (Dix et al., 2003).
An interaction paradigm used extensively in current operating systems is the use of multiple windows to present information, together with icons that represent elements of the environment such as folders, files and devices, and menus and buttons that facilitate interaction with the system. This paradigm is known as WIMP (Windows, Icons, Menus, Pointers); it was developed by Xerox PARC in the 1980s, used initially by Apple Macintosh computers, and is now available in other systems such as Microsoft Windows, OS/Motif, Risc OS and the X Window System (Shneiderman & Plaisant, 2004). However, even with these advances and the functionality offered by the user interfaces of current systems, most interfaces remain limited in their handling of the differences that exist among their diverse users, making it clear that there is a shortfall in the development of systems that can be personalized and adapted to the user and the environment. Intelligent User Interfaces (IUIs) form a sub-field of HCI whose objective is to improve human-computer interaction through the use of new interaction-device technology, as well as through artificial intelligence techniques that allow the interface to exhibit some kind of intelligent or adaptive behaviour.
2. Uses of IUIs

IUIs try to solve some of the problems that traditional, so-called direct-manipulation interfaces (Shneiderman, 1997) cannot address:
• Creating personalized systems. No two users are alike, and each has different habits, preferences and ways of working. An intelligent interface can take these differences into consideration and provide personalized methods of interaction. The interface knows the user and can use that knowledge to establish the best means of communication with the user.
• Information filtering and overload problems. Trying to find the information you need on a computer or on the Internet can be a complicated task. Here an intelligent interface can narrow large databases down to the relevant information. By filtering out irrelevant information, the interface can reduce the user's cognitive load. Additionally, an IUI can propose new and useful sources of information unknown to the user.
• Providing help for new programs. Information systems can be very complicated to use at first. It is common that, just as the user begins to understand the program's functionality, new versions or updates appear that include new functionality. In this situation an intelligent help system can detect and correct incorrect or suboptimal ways of carrying out a task, explain new concepts and provide information to simplify the user's tasks.
• Taking over tasks for the user. An IUI can watch what the user is doing, understand and recognize the user's intent, and take care of the complete execution of certain tasks, allowing the user to focus on others.
• Other forms of interaction. At present the most common interaction devices are the keyboard and mouse. A line of IUI research known as multimodal interfaces investigates new forms of interaction through
other input and output modalities for interacting with the user. By providing multiple forms of interaction, people with disabilities will also be able to use complex computing systems.
3. Definition and related areas

Over the years, numerous definitions of intelligence have emerged to characterise systems and behaviours, yet there is no consensus on what should be considered intelligent. Nevertheless, most definitions mention the ability to adapt (to learn and deal with new situations), the ability to communicate and the ability to solve problems (Russell & Norvig, 2003). A "normal" interface is defined simply as the communication between a (human) user and a machine (Mayhew, 1999). An extension of this definition for an intelligent interface is that it uses some kind of intelligent component to carry out the human-computer communication. For this reason such interfaces are also known as adaptive interfaces, because they have the ability to adapt to the user, communicate with the user and solve problems for the user. A more formal definition is: intelligent user interfaces seek to improve the flexibility, usability and power of human-computer interaction for all users. In doing so, they exploit knowledge of users, tasks, tools and content, as well as of devices, to support interaction within different contexts of use (Maybury, 1999).
Figure 1. IUIs and their different disciplines

Because adaptation and problem solving are central research topics in artificial intelligence, many IUIs are strongly oriented towards the use of AI techniques; however, not all IUIs have learning or problem-solving capabilities. Many interfaces that call themselves intelligent focus on the communication channel between the user and the system (machine) and generally apply
new interaction techniques such as language processing, gaze tracking and facial recognition. The truth is that many research fields influence IUIs, such as psychology, ergonomics, human factors, cognitive science and others, as shown in Figure 1. One of the most important properties of IUIs is that they are designed to improve the communication between the user and the machine; which kind of technique is used to achieve this improvement matters less. Various types of techniques used in IUIs include:
• Intelligent input technology: the use of techniques to obtain the user's input, including natural language (speech recognition and dialogue systems), gesture tracking and recognition, recognition of facial expressions, and lip reading.
• User modelling: the techniques that allow a system to maintain or infer knowledge about a user based on the input received.
• User adaptivity: all the techniques that allow human-computer interaction to be adapted to different users and different situations of use.
• Explanation generation: the techniques that enable the system to explain its reasoning and behaviour to the user.
• Personalization: in order to personalize, IUIs normally include a representation of the user. These user models record data about the user's behaviour, knowledge and skills. New knowledge about the user can be inferred from the input and from the user's interaction history with the system.
• Flexibility of use: to be flexible, many IUIs use adaptation and learning. Adaptation takes place over the knowledge stored in the user model, by drawing new inferences from the current input. Learning occurs when the stored knowledge changes to reflect newly encountered situations or new data.
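As a minimal, hedged sketch of the user modelling and adaptivity techniques just listed (our own illustration, not an architecture from this chapter; all names are invented for readability), a user model can record interaction events and infer a preference that an adaptive interface then exploits:

from collections import Counter

class SimpleUserModel:
    # Records the user's behaviour and infers knowledge from it, as the
    # user modelling bullet above describes.

    def __init__(self):
        self.history = []               # raw interaction events
        self.action_counts = Counter()  # stored knowledge about behaviour

    def observe(self, action):
        # Adaptation: fold a new interaction into the stored knowledge.
        self.history.append(action)
        self.action_counts[action] += 1

    def preferred_action(self):
        # Inference: generalize from the history to a preference.
        if not self.action_counts:
            return None
        return self.action_counts.most_common(1)[0][0]

# An adaptive interface could pre-select the user's habitual choice:
model = SimpleUserModel()
for event in ["open_mail", "open_mail", "open_calendar"]:
    model.observe(event)
assert model.preferred_action() == "open_mail"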
4. Intelligent Interfaces and Intelligent Systems

A frequent mistake is to confuse an IUI with an intelligent system. A system that displays some form of intelligence does not necessarily have an intelligent interface; there are many intelligent systems with very simple user interfaces that are not intelligent. Likewise, the fact that a system has an intelligent interface tells us nothing about the intelligence inside the system (see Figure 2). Unfortunately, the boundary between a system and its user interface is not always as clear as in the figure, and often the technology used in an IUI is also part of the system, or the IUI actually constitutes the whole system. Moreover, little emphasis is often placed on the interface, and separating it out so that it can be developed by a group specialized in interfaces and ergonomics is not always a reality. Frequently, when a system is developed, the internal, algorithmic part is given greater importance, leaving the interface as just one more component of the system (Hook, 1999).
Figure 2. Intelligent system vs. intelligent interface
5. Benefits and criticisms

The field of IUIs is by no means mature, and open problems remain to be solved. Perhaps this is one of the reasons why IUIs (thanks to AI) have met with scepticism from part of the HCI community. The central problem is that IUIs violate usability principles accepted in the development of direct-manipulation systems. The work of Maes at MIT (Maes, 1994) and that of Wernn (Wernn, 1997), while pointing to the potential of intelligent interfaces and their capabilities, highlight the issues that must be resolved before IUIs are accepted and used widely. For Shneiderman (Shneiderman, 1997), an adaptive system is unpredictable and less transparent than a traditional interface: if a system can adapt its response and does not give the same output twice for the same input, the system becomes unpredictable. Another, related problem is control over the interface. The possibilities are that the user takes control and decides the next action to be carried out, or that the system, autonomously and on the basis of its partial knowledge, takes control and decides the next action. This situation has been approached from different angles, giving rise to specializations within intelligent user interfaces. Adaptable interfaces leave control with the user, who directs and controls the adaptation through options for personalizing and adapting the interface. On the other side we find adaptive interfaces, where the system has sufficient intelligence and evaluates the state of the interaction in order to carry out some kind of adaptation without the user's intervention. An interaction scheme that has gained acceptance is mixed (or combined) initiative, where the interaction is shared between the user and the system (Armentano, 2006). A more extensive discussion among experts in the interface field of the advantages and disadvantages of IUIs versus direct-manipulation interfaces can be found in the work of Birnbaum (Birnbaum, 1997).
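The adaptable / adaptive / mixed-initiative distinction can be made concrete with a small control-policy sketch. This is our own illustration under assumed names (Initiative, apply_adaptation and confirm are invented), not a scheme taken from the cited works:

from enum import Enum, auto

class Initiative(Enum):
    ADAPTABLE = auto()   # the user drives every change
    ADAPTIVE = auto()    # the system changes the interface on its own
    MIXED = auto()       # the system proposes, the user confirms

def apply_adaptation(mode, proposal, confirm):
    # proposal: a UI change the system believes would help the user.
    # confirm: a callable that asks the user and returns True or False.
    if mode is Initiative.ADAPTABLE:
        return None                  # only explicit user settings apply
    if mode is Initiative.ADAPTIVE:
        return proposal              # applied silently; risks unpredictability
    if mode is Initiative.MIXED:
        return proposal if confirm(proposal) else None

# Mixed initiative keeps the system helpful but predictable:
change = apply_adaptation(Initiative.MIXED,
                          "promote 'open_mail' to the toolbar",
                          confirm=lambda p: True)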
6. Machine Learning in IUIs
The use of artificial intelligence techniques to design IUIs is an approach that has been accepted in the interface community and applied to a range of domains and applications. The techniques reported in the literature go from traditional production (rule-based) systems to more modern techniques such as planning and probabilistic graphical models (Horvitz, 1998). More recent AI techniques have also been used in IUIs, such as autonomous agents (Rich & Sidner, 1996), (Eisenstein & Rich, 2002). All of them have been used to provide different degrees of intelligence or adaptability and to exploit the knowledge available about the user and his tasks in order to deliver better interaction and a better experience of using the system. Machine Learning (ML) techniques and algorithms have developed rapidly, gaining acceptance within the AI community. Machine learning has been applied with relative success in several domains, mainly in Web applications, to collect information and mine data on users' interaction and navigation histories (Fu, 2000). Learning has likewise been used as a means of inferring user models (Stumpf et al., 2007) from past data, with the aim of discovering unknown patterns and adapting the behaviour of the interface. In ML terms, an intelligent interface can be conceptualized as "a software component that improves its ability to interact with the user by building a model based on partial experience with that user" (Langley, 1999). This definition makes clear that an intelligent interface is designed to interact with real, human users. Moreover, if the interface is to be considered intelligent, it must improve its interaction with the user over time; simple memorization of those interactions is not enough, and the improvement must come from generalizing over past experiences in order to shape new interactions with the user. Within the ML approach it is possible to identify two broad categories of intelligent interfaces, which differ in the type of feedback the user must provide:
• Informative: This type of interface tries to select or modify information for the user, presenting only those items the user may find interesting or useful for the task at hand. The most common examples are product recommendation systems (e.g. the online bookstore Amazon.com) and news filters, which direct the user's attention within a large space of options. In such systems user feedback generally consists of marking recommended options as desirable or undesirable and rating them with some score. However, this class of interface distracts attention from the central task, since the user must provide feedback; less intrusive methods of obtaining feedback by observing the access process have been reported in the literature (Sugiyama, 2004).
• Generative: These interfaces are mainly focused on generating some useful knowledge structure. They include programs for document preparation and spreadsheets, as well as planning and configuration systems. These areas support an improved type of feedback, since the user can not only ignore a recommendation but replace it with another. The types of feedback are tied to the types of interaction the interface supports. Several systems require the user to correct undesirable actions, which poses an interruption problem; more recent systems incorporate less intrusive schemes based on observing the user's actions (Franklin et al., 2002).
Mining Web usage logs has also been reported as a feasible approach to building adaptive Web interfaces. Web access history is an abundant source of information that makes it possible to experiment with real data sets. Adaptive Web systems ease navigation by providing the user with personalized shortcuts to sites. During the adaptation process, the user's access data is the central source of information and is used to build the user model, which reflects the pattern the system infers for its users and describes several of their characteristics. Data mining with association rules has been used to mine the navigation history of Web sites and to propose or suggest new links based on collaborative filtering (Mobasher et al., 2001).
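As a rough illustration of this idea, the sketch below (hypothetical; it is not the implementation of any of the cited systems) mines a set of navigation sessions for frequent page pairs and turns them into simple association rules for suggesting links; the support and confidence thresholds are arbitrary illustrative choices.

```python
from itertools import combinations
from collections import Counter

def mine_pair_rules(sessions, min_support=2, min_confidence=0.5):
    """Mine simple A -> B association rules from navigation sessions.

    sessions: list of sets of visited page URLs (one set per session).
    A rule A -> B is kept when the pair's support reaches min_support
    and confidence(A -> B) = support(A, B) / support(A) reaches
    min_confidence, in the spirit of Apriori-style pruning.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        item_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))

    rules = {}
    for (a, b), pair_support in pair_counts.items():
        if pair_support < min_support:
            continue  # prune infrequent pairs, as Apriori would
        for head, tail in ((a, b), (b, a)):
            confidence = pair_support / item_counts[head]
            if confidence >= min_confidence:
                rules.setdefault(head, []).append((tail, confidence))
    return rules

sessions = [
    {"/news", "/sports", "/weather"},
    {"/news", "/sports"},
    {"/news", "/finance"},
]
rules = mine_pair_rules(sessions)
# A user currently reading /news would be offered /sports (confidence 2/3).
print(rules.get("/news"))
```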
7. User Modelling in IUIs
The user interface and user modelling (UM) can be seen as two sides of the same component. The user model consists of the algorithms implemented in the software components, embodying the concept of personalization from the system's perspective. The intelligent interface, on the other hand, is the graphical interface (GUI) that presents the user with content generated by the UM, embodying personalization from the user's point of view, as shown in Figure 3.
Figure 3. Two perspectives of the user model
User modelling applications can be defined as applications in which "users with different goals, interests, levels of expertise, abilities and preferences can tailor the program's behaviour to individual needs, generally through the use of a user profile" (Kobsa, 1994). The purpose of the UM is to collect and process data and present it, adapted, at the interface level, gathering knowledge from two types of information:
• Implicit: This form of data gathering involves examining log files such as navigation histories or browser cookies (Fu, 2000). Discovering patterns in the data allows content to be filtered, targeted and delivered on the basis of assumptions made about the user's behaviour. For example, if a user regularly reviews certain particular items, the UM will identify the pattern and may alter the interface to display those items and associate the content on a Web page the user accesses. This is an automated UM technique that requires no direct feedback from the user in order to identify a behaviour pattern and modify the content accordingly.
• Explicit: Explicit data capture involves analysing data entered by the user, who provides information about his preferences by filling in a user profile. Examples of such data are age, sex, city, purchase history, and preferences regarding content and the on-screen layout of information. In the modelling field this is also known as an informed UM technique. It requires data input from users and a mechanism for recognizing patterns in their preferences in order to modify the interface based on those preferences.
User models can be used to create personalized services by adapting to individual needs through information filtering or collaborative filtering techniques (Carenini, 2003). Users can take control of their interaction by explicitly choosing content based on user profiles, or through recommender systems that associate an individual's purchase or navigation history with that of other, similar users (Kirsch-Pinheiro et al., 2005). An adaptive interface presents the user models at the interface level by displaying personalized content (Eirinaki & Vazirgiannis, 2003). Users also have the opportunity to interact directly with the user model when they create or edit their profiles. User modelling and adaptive interfaces essentially present personalized views of content, which can save time in locating information or guide users to available content unknown to them. In short, UM is a solid alternative for presenting personalized content and is considered a useful tool for the user that allows the use of explicit data. On the other hand, there is scepticism about the use of implicit data to create imprecise stereotypes based on complex inference algorithms (Peyton, 2003), since a large body of literature reports few tests with users: "A quick look at the first 9 years of UMUAI reveals that only a third of the articles include some type of evaluation. This is a very low percentage" (Chin, 2001).
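As a minimal sketch of how these two sources might be combined in one model (all class, field and threshold choices below are our own illustrative assumptions, not a design from the cited literature):

```python
from collections import Counter

class UserModel:
    """Toy user model combining an explicit profile with implicit logs."""

    def __init__(self, explicit_profile):
        # Explicit data: preferences the user entered into a profile form.
        self.profile = dict(explicit_profile)
        # Implicit data: counts of items observed in the interaction log.
        self.item_views = Counter()

    def observe(self, item):
        """Record one implicit observation (e.g. a page visited)."""
        self.item_views[item] += 1

    def preferred_items(self, k=3, min_views=2):
        """Infer the user's current interests from the implicit log."""
        return [item for item, n in self.item_views.most_common(k)
                if n >= min_views]

    def personalize(self, available_items):
        """Order content: explicit preferences first, then inferred ones."""
        explicit = [i for i in available_items
                    if i in self.profile.get("favourite_topics", [])]
        inferred = [i for i in self.preferred_items() if i in available_items]
        rest = [i for i in available_items if i not in explicit + inferred]
        # dict.fromkeys preserves order while removing duplicates
        return list(dict.fromkeys(explicit + inferred + rest))

um = UserModel({"favourite_topics": ["finance"]})
for page in ["news", "news", "sports", "news"]:
    um.observe(page)
print(um.personalize(["sports", "finance", "news", "weather"]))
# -> ['finance', 'news', 'sports', 'weather']
```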
8. HCI in IUIs
The HCI approach to solving interaction problems has centred on the use of different interaction modes, or modalities, for communication between the user and the system. The central idea is to make this interaction resemble, as closely as possible, the way humans interact with each other. Given that humans perceive the world through the senses (touch, sight, hearing, smell and taste), it seems logical to unify this information in a computer capable of processing it through different modes according to the available devices (keyboard, microphone, camera, etc.), giving rise to multimodal interfaces. Multimodal interfaces combine two or more different modes of communication in order to improve the interaction channel between the human and the system. Current multimodal interfaces can process two or more combined input modes using recognition-based technologies, with the aim of accurately identifying or interpreting the user's communicative intentions (Reeves et al., 2004). Several types of multimodal systems exist, including those that process speech together with manual input via pen or touch-sensitive screens, and audio-visual systems. Computer vision and speech processing technologies are fundamental to the development of this type of interface and are reviewed extensively in (Oviatt et al., 2000). A novel approach in multimodal interfaces is the integration of agent technology, multimedia and virtual reality components, together with artificial intelligence techniques, to develop human-like characters that interact with users in a more natural way. Using computer vision and speech processing, these animated characters or agents, known as Embodied Conversational Agents (ECAs), represent a line of research that integrates several areas (Cassell, 2000). In the virtual environments of ECAs, interfaces face new design challenges for HCI.
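A much-reduced sketch of recognition-based input fusion of the kind described above (a "put that there" style pairing of speech and pointing; the time window, event structure and vocabulary are illustrative assumptions on our part):

```python
from dataclasses import dataclass

@dataclass
class Event:
    modality: str   # "speech" or "gesture"
    payload: str    # recognized word or pointed-at object id
    t: float        # timestamp in seconds

def fuse(events, window=1.5):
    """Pair each deictic speech event ('that', 'there') with the gesture
    event closest in time, provided it falls within the fusion window."""
    gestures = [e for e in events if e.modality == "gesture"]
    commands = []
    for s in (e for e in events if e.modality == "speech"):
        if s.payload not in ("that", "there"):
            commands.append(("word", s.payload))
            continue
        nearest = min(gestures, key=lambda g: abs(g.t - s.t), default=None)
        if nearest and abs(nearest.t - s.t) <= window:
            commands.append(("resolved", s.payload, nearest.payload))
    return commands

events = [
    Event("speech", "move", 0.0),
    Event("speech", "that", 0.4),
    Event("gesture", "lamp", 0.5),
    Event("speech", "there", 1.2),
    Event("gesture", "table", 1.3),
]
print(fuse(events))
# -> [('word', 'move'), ('resolved', 'that', 'lamp'),
#     ('resolved', 'there', 'table')]
```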
9. Related Systems
This section presents some of the systems reported in the intelligent-interface literature that exhibit adaptation characteristics relevant to the doctoral proposal presented in this document.
9.1 The SurfLen System
One of the central elements of adaptive interfaces is their capacity to anticipate the tasks the user will perform, based on observation of his current actions, his behaviour and the user model. One line of research reported in the IUI literature establishes future actions (tasks, navigation, plans, strategies, etc.) by analysing the history of the user's past actions (or those of a set of users), applying AI techniques to reason over that information and infer the next action. The work of (Fu, 2000) falls into this category, focused on a Web-based information recommendation system. The authors' hypothesis is that a user's navigation history contains enough information to discover knowledge about the pages that interest him, without the user having to assign ratings or type in additional information, which is a drawback of existing systems. The authors argue that it is possible, actively but silently, to keep track of what the user has read, collect the navigation history in a centralized repository and apply AI techniques to discover knowledge. For example, if two users have read several similar pages, we can infer that they have similar interests. This knowledge about the similarity of navigation patterns is used to generate (infer) the recommendations. To test these ideas, the authors developed a prototype recommendation system called SurfLen, which suggests pages of interest on specific topics. The prototype applies Agrawal's algorithm to generate the set of association rules, using an Apriori optimization. The evaluation of the prototype reports that the number of correct suggestions per user increases as the navigation history becomes more representative of the user's interests. The relevant part of the article is the use of Agrawal's A-priori optimization algorithm, applied to navigation history for the purpose of making recommendations; moreover, the proposed association-rule scheme has an advantage over similar systems in that it requires no intervention from the user. The weak part of the article is that the tests are simulated and run in a controlled environment. The test results claim advantages over other techniques, yet no comparison with a similar system is made. Furthermore, building the datasets as the number of URLs and users grows is likely to be costly in processing time, even with the proposed optimization, an issue the article does not address. The article is relevant to the doctoral proposal in that it shows how AI techniques can be used to analyse the way operators navigate the interface of a supervision and control system, with the aim of implementing the adaptation mechanism. One possibility would be to analyse the navigation history of operators (expert, intermediate, novice) in different situations (normal, emergency) or phases (start-up, operation, shutdown), to build a navigation pattern that allows the next display (or action) to be predicted (recommended) from the history of one operator or of several similar operators.
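In that spirit, the following minimal sketch (our own illustration; SurfLen itself relies on association rules rather than this) predicts the next display from logged operator navigation using first-order transition counts conditioned on operator class and plant situation:

```python
from collections import Counter, defaultdict

def build_transitions(logs):
    """logs: list of ((operator_class, situation), [display sequence]).
    Counts first-order display transitions per context."""
    model = defaultdict(Counter)
    for context, displays in logs:
        for current, nxt in zip(displays, displays[1:]):
            model[(context, current)][nxt] += 1
    return model

def recommend_next(model, context, current_display):
    """Suggest the most frequent successor display in this context."""
    ranked = model.get((context, current_display))
    return ranked.most_common(1)[0][0] if ranked else None

logs = [
    (("expert", "emergency"), ["alarms", "trend", "pumps"]),
    (("expert", "emergency"), ["alarms", "trend", "valves"]),
    (("novice", "normal"),    ["overview", "alarms"]),
]
model = build_transitions(logs)
print(recommend_next(model, ("expert", "emergency"), "alarms"))  # -> trend
```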
9.2 Adaptive Interfaces for Process Control
The system reported in (Viano et al., 2000) is relevant because it addresses adaptive user interfaces for critical operations such as process control. The prototype presented by the authors is intended for two applications: a system for managing electrical networks and a system for supervising a thermoelectric generation plant. The authors argue that the ever-increasing complexity of control systems, the operators' limitations in handling large amounts of information in real time and in failure situations, and the demand for operational continuity together justify an adaptive-systems approach, in which systems can alter their structure, functionality or interface to accommodate the differing needs of individual users or user groups. The adaptive approach they propose assists the operator in acquiring the most relevant information in any particular context. The central claim of the article is that operator interface design is rigid: it is fixed at design time, following the best guidelines and practices recommended by HCI, and once established it remains unchanged while the application runs. The mapping between the type of information received from the field and the form (and media) used to present it to the user is not unique; the designer selects the mapping considered most effective and efficient, and the other possible mappings are discarded. Thus, however good the design, it is fixed, with the associated disadvantages: a rigid structure, no provision for emergency situations, and so on. The proposed adaptive interface architecture considers adaptation in the presentation of information. The authors call their model "Flexible Mapping", based on the current state of the process, the environment and knowledge of human factors. The article proposes a multi-agent architecture with agents for the following functions: process model, media, display resolution, presentation, human-factors database and operator. Two principles are contemplated for triggering the adaptation mechanism: detection of a deviation of the process, and deviation of the operator (who does not react according to the expected procedure). At the beginning the authors state that the benefits of the proposed approach are evaluated in two prototypes; however, section 6 (Prototypes) clarifies that both prototypes are still under development, and no evaluation of any kind is presented. The contribution of this article to my research work is its description of the needs of process control systems for which an adaptive scheme is necessary. The strong part of the article is the proposed architecture, even though the explanation of its components and of the interaction among them is scant. The authors do not mention coordination mechanisms, an important aspect and a recognized challenge in multi-agent systems. In general terms the authors do not go into depth on any aspect, and the state of the art they present is basic.
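The two triggering principles can be sketched as follows (hypothetical logic; Viano et al. give no implementation detail, and all parameter names and values are our own assumptions):

```python
def check_adaptation_triggers(process_value, expected_range,
                              operator_action, expected_procedure):
    """Fire the adaptation mechanism on process deviation or on
    operator deviation, the two principles in (Viano et al., 2000).
    All parameters and thresholds here are illustrative assumptions."""
    triggers = []
    low, high = expected_range
    if not (low <= process_value <= high):
        triggers.append("process_deviation")
    if operator_action != expected_procedure:
        triggers.append("operator_deviation")
    return triggers

# A boiler pressure outside its band while the operator skips the
# expected acknowledgement step would fire both triggers:
print(check_adaptation_triggers(9.2, (5.0, 8.0), "none", "ack_alarm"))
# -> ['process_deviation', 'operator_deviation']
```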
9.3 The ADAPTS System
The ADAPTS system presented by (Brusilovsky & Cooper, 2002) addresses the use of several models (tasks, users and environment) to adapt content and navigation in an adaptive hypermedia system. These characteristics are similar to those of the doctoral proposal, which puts forward a mixed-initiative adaptive interface model, applied to the electrical domain within an intelligent help system, whose aim is to adapt the content of the information presented according to the characteristics of the user and the state of the plant. The authors describe ADAPTS (Adaptive Diagnostics And Personalized Technical Support), an electronic support system for maintenance technicians that combines adaptive guidance from a diagnostic system with adaptive access to technical information, covering both sides of the process: what to do and how to do it. It is an extensive project resulting from the collaboration of researchers at the Naval Air Warfare Center, Aircraft Division, Carnegie Mellon University, the University of Connecticut and the company Antech Systems Inc. An important component of the system is the IETM (Interactive Electronic Technical Manuals), which provides a large amount of information about the target system: how it is built, how it operates, what to do in case of specific problems, and so on. ADAPTS is a complex adaptive system that adjusts its diagnostic strategy according to who the technician is and what he is doing, dynamically adapting the sequence of configurations, tests and repair/replacement procedures based on the technician's responses; it also integrates domain knowledge, maintenance tasks and user characteristics. The user model proposed by the authors determines which task should be performed, which technical information to select to describe the task, and how best to present that information to a particular technician. The problem posed in the article is that the amount of potentially relevant information at a given moment of the repair process can become very large, and finding the information best suited to one's experience and working context is a challenge for technicians. This calls for a permanent, dynamic assessment of the technician's knowledge, experience, preferences and working context. In the proposed model, expertise is computed from various pieces of evidence about the technician's level of experience, collected by the system as it interacts with him. ADAPTS uses a Multi-Aspect Overlay Model of the user, which is more expressive but at the same time more complex to generate and maintain. The authors present ADAPTS, its models, and how they can be used to build an adaptive system; however, they do not specify (or justify) why the 20 proposed aspects used to evaluate user experience are sufficient or appropriate. A questionable point is that the model assumes that if a user requests an IETM display he necessarily reads it; it would be advisable to condition this assertion on some probability or dwell-time threshold, as other similar systems do. It is striking that no data are presented on validation experiments, evaluation, real usefulness, field deployment or user satisfaction, despite this being an extensive and complete system.
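Such a refinement could look like the following sketch (entirely our own illustration; ADAPTS itself simply assumes a requested page was read, and the reading-speed constant and update rule here are invented):

```python
def reading_probability(time_on_page, words_on_page,
                        words_per_second=4.0):
    """Estimate how likely it is that a requested IETM page was actually
    read, from dwell time versus an assumed reading speed. Both the
    formula and the reading speed are illustrative assumptions."""
    expected = words_on_page / words_per_second
    return min(1.0, time_on_page / expected) if expected > 0 else 0.0

def update_expertise(expertise, topic, evidence, weight=0.1):
    """Overlay-style update: nudge the per-topic expertise estimate
    toward the new piece of evidence (a value in [0, 1])."""
    old = expertise.get(topic, 0.5)
    expertise[topic] = (1 - weight) * old + weight * evidence
    return expertise

expertise = {}
p = reading_probability(time_on_page=30.0, words_on_page=400)
# Weight the "read the hydraulics page" evidence by how likely the
# technician was to have actually read it.
update_expertise(expertise, "hydraulics", evidence=p)
print(p, expertise)  # 0.3 {'hydraulics': 0.48}
```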
9.4 Decision Model
The approach proposed in (Stephanidis et al., 1997) presents intelligent user interfaces as components characterized by their ability to adapt at run time and to take various communication decisions concerning the "what", "when", "why" and "how" of interacting with the user, all by means of an adaptation strategy. The use of models as a central device for deepening our understanding of the processes involved in interface adaptation is presented in (Puerta, 1998), which draws a clear distinction between a model and an architecture and the goals each pursues. The authors conceptualize the adaptation strategy as a decision-making process characterized by attributes. The aspects of the user interface that are subject to adaptation are called "adaptation constituents". Run-time adaptation also implies some monitoring of the interface in order to assess the state of the critical elements of the interaction, called "adaptation determinants", on which the adaptation decisions are conditioned. A further element is the set of goals of the adaptation process itself. In the proposed scheme, adaptations are carried out by a set of rules, the "adaptation rules", which, in essence and in simplified form, assign certain adaptation constituents to specific adaptation determinants, given a set of adaptation goals. One motivation for this work is that, although several approaches have been reported in the literature, there is currently no consensus on the characteristics, behaviour and essential components that intelligent interfaces should comprise. The problem the authors pose is that the critical elements of the adaptation process (determinants, constituents, goals and rules) differ substantially across current systems: existing systems adapt certain predefined constituents, based on a predetermined set of determinants, through specific rules, in order to achieve pre-specified goals. They therefore argue that the adaptation process is not flexible and cannot easily be transferred between applications. To overcome these limitations the authors propose: 1. a methodological approach that allows the set of adaptation determinants and adaptation constituents to be customized; 2. the incorporation of adaptation goals as an integral part of the adaptation process; and 3. the modification of the adaptation rules according to the adaptation goals. The central idea of the approach is a clear separation of these attributes from the adaptation process. The authors argue that this separation allows the attributes of the adaptation strategy to be customized easily to the requirements of different application domains and user groups, so that they can be reused with minor modifications in other applications. They present a general architecture of an intelligent interface, describing its components, the interaction among them and their relation to the adaptation strategy, together with a formal representation of the adaptation elements used. They conclude with the benefits of the proposed approach, which seem questionable on several points, mainly because no evidence is offered for the claims.
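To make this scheme concrete, a minimal sketch of such a rule set (the goal names, determinants and constituents below are our own illustrative inventions, not those of the cited paper, which gives a formal treatment):

```python
# Each rule: (required goal, determinant predicate, constituent changes).
# All concrete names below are illustrative assumptions.
RULES = [
    ("reduce_errors",
     lambda d: d.get("user_expertise") == "novice",
     {"menu_detail": "verbose", "confirmations": True}),
    ("speed_up_interaction",
     lambda d: d.get("user_expertise") == "expert",
     {"menu_detail": "terse", "confirmations": False}),
]

def adapt(determinants, goals):
    """Apply every adaptation rule whose goal is active and whose
    determinant condition holds; return the constituents to change."""
    changes = {}
    for goal, condition, constituents in RULES:
        if goal in goals and condition(determinants):
            changes.update(constituents)
    return changes

print(adapt({"user_expertise": "novice"}, {"reduce_errors"}))
# -> {'menu_detail': 'verbose', 'confirmations': True}
```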
9.5 Mixed-Initiative Systems
Mixed-initiative systems have emerged as an alternative interaction scheme that handles initiative from a more flexible, though more complex, perspective. The work presented by Bunt (Bunt et al., 2004) describes the importance of adaptive support, a central topic in the field of mixed-initiative adaptive interfaces (Horvitz, 1999). In such interfaces the system must at times perform adaptations automatically, accepting the associated problems of reduced control, transparency and predictability; at other times it must offer mechanisms for the user to take control and carry out the adaptation himself. There is evidence, however, that users often do not perform such adaptations, and that when they do, it is not clear whether they do so effectively (Jameson & Schwarzkopf, 2002). The article argues for offering users an adaptive scheme that supports them in the task of personalizing an interface. The authors propose a mixed-initiative solution: if the user is able to personalize efficiently on his own, no system-initiated adaptation is required; otherwise the system can intervene to provide assistance. The article also analyses the value of personalization and how to deliver what the authors call adaptive support, that is, helping users take advantage of an adaptable interface. An experimental study examines what users need in order to personalize a menu-based GUI effectively; the analysis of how task characteristics and personalization behaviours affect performance is particularly interesting. The authors also run a simulation study of the process model (GOMS), based on cognitive modelling, which generates predictions of user performance, and they use test data to analyse whether personalizing the interface is worthwhile and whether real benefits exist. To test their ideas, two exploratory experiments are carried out with the GLEAN simulator (Baumeister et al., 2000). The first experiment compares personalization strategies that vary in their timing, i.e. when the personalization is carried out, and analyses whether the overhead of personalizing pays off. The second experiment compares strategies that differ in which functionalities the user chooses to add to the personalized interface, and in their implications. A contribution of the work is the use of a GOMS model that makes it possible to simulate and evaluate the impact of personalization while jointly considering the timing of the personalization strategy, the frequency of task execution, task complexity and the user's level of experience. Finally, the experimental results show the importance of personalizing the interface, and consequently strengthen the case for "guiding" the user in personalizing his interface as an important line of research.
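The following sketch illustrates the mixed-initiative idea in miniature (our own toy example, not Bunt et al.'s implementation; the threshold and names are invented): the user owns the personalized menu, while the system merely volunteers suggestions when observed usage indicates unexploited personalization opportunities.

```python
def suggest_adaptation(usage_counts, custom_menu, threshold=5):
    """Mixed-initiative sketch: the user owns the menu (adaptable side),
    while the system proposes promoting frequently used but missing
    features (adaptive side). The user remains free to decline.

    usage_counts: dict mapping feature -> number of recent invocations.
    custom_menu: list of features the user has already added.
    Returns the features worth suggesting, most used first.
    """
    candidates = [(n, f) for f, n in usage_counts.items()
                  if n >= threshold and f not in custom_menu]
    return [f for n, f in sorted(candidates, reverse=True)]

usage = {"print": 9, "export_pdf": 7, "macros": 1}
menu = ["print"]
for feature in suggest_adaptation(usage, menu):
    print(f"Suggest adding '{feature}' to the personalized menu?")
# The user may accept (the system adapts) or ignore the suggestion
# (control stays with the user), sharing the initiative.
```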
10. Discussion
From the review of the IUI literature presented in this chapter, it can be seen that most of the systems analysed target Web-based, office or general-purpose applications. The system presented by Viano (Viano et al., 2000) is one of the few aimed at critical applications, such as operation and control; however, unlike the model we propose (MIA-IC), their system is adaptive and applies only to the visualization of content, not including adaptable navigation. Moreover, the MIA-IC system specifically targets infrequent, critical tasks, a niche that none of the systems presented addresses. The ADAPTS system (Brusilovsky & Cooper, 2002) is similar in some respects to the model we propose, since it uses several models to perform the adaptation: a user model, a domain model and a task model, as does MIA-IC. ADAPTS likewise relies on an intelligent system that reports fault diagnoses and on an electronic manual to find the information explaining to the user how to carry out the repair. This is similar to MIA-IC, which connects to the SAO help system and adapts both content and navigation. An important difference is that ADAPTS is an adaptive system, whereas in MIA-IC we propose a mixed-initiative scheme, offering greater flexibility in the interaction with the user. The emphasis of ADAPTS, on the other hand, is its complex and novel multi-aspect user model, which dynamically establishes the user's level of experience at all times. In our model, by having an integral model, we seek to make the adaptation process more precise through the information contributed by the different models: not through the quantity of that information, but through the information most appropriate to the operator and to the state of the plant. Another difference is that ADAPTS is an adaptive guide for maintenance and is not aimed at infrequent or critical situations.
Since IUIs build models by observing the behaviour of their users, generating useful models with algorithms that learn quickly is still an open challenge. The point here is not CPU processing time, but the number of training examples needed to generate an accurate model of the user's preferences. Most data mining applications assume that considerable amounts of data are available, enough to induce accurate knowledge. Adaptive interfaces, in contrast, depend on the time the user spends using them, and therefore require induction mechanisms that can deliver high accuracy from a small set of training data. Another open challenge is the limited amount of empirical evaluation that currently exists for adaptive systems; more research is needed to establish in a measurable way whether an adaptive interface is better than a non-intelligent one. Several empirical usability evaluation methods have been used and adapted in the context of adaptive interfaces, such as interviews, questionnaires and the Think Aloud protocol (Dix et al., 2003), which are essentially among the methods most widely used in validating and testing traditional systems; their applicability to adaptive systems, however, is considered limited.
11. Conclusion
Although we have presented in this chapter the research in the IUI area, its challenges and solution approaches, with special emphasis on the fact that it is a multidisciplinary field, the three approaches currently applied with greatest success to the challenges of IUIs still show a lack of exchange of findings and problems among the disciplines involved. The literature review shows that greater integration and exchange of ideas is needed among the different disciplines that make up the area of intelligent user interfaces. We believe that research into new interface models that integrate components from several disciplines will help to find different solutions, since current research generally tends to favour the same knowledge, techniques and methodologies proven and accepted within a single discipline, and the so-called cross-fertilization is still needed.
12. References
Armentano, M.; Godoy, M. and Amandi, A. (2006) Personal assistants: Direct manipulation vs. mixed initiative interfaces. Int. J. Hum.-Comput. Stud., 64(1):27–35.
Baumeister, L.K.; John, B.E. and Byrne, M.D. (2000) A comparison of tools for building GOMS models. In CHI ’00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press, pages 211–214, New York, NY, USA.
Birnbaum, L.; Horvitz, E.; Kurlander, D.; Lieberman, H.; Marks, J. and Roth, S. (1997) Compelling intelligent user interfaces: How much AI? In IUI ’97: Proceedings of the 2nd international conference on Intelligent user interfaces, pages 173–175, New York, NY, USA.
Brusilovsky, P. and Cooper, D.W. (2002) Domain, task, and user models for an adaptive hypermedia performance support system. In IUI ’02: Proceedings of the 7th international conference on Intelligent user interfaces, ACM Press, pages 23–30, New York, NY, USA.
Bunt, A.; Conati, C. and McGrenere, J. (2004) What role can adaptive support play in an adaptable system? In IUI ’04: Proceedings of the 9th international conference on Intelligent user interfaces, ACM Press, pages 117–124, New York, NY, USA.
Carenini, G.; Smith, J. and Poole, D. (2003) Towards more conversational and collaborative recommender systems. In IUI ’03: Proceedings of the 8th international conference on Intelligent user interfaces, ACM Press, pages 12–18, New York, NY, USA.
Cassell, J. (2000) Embodied conversational interface agents. Commun. ACM, 43(4):70–78, USA.
Chin, D. (2001) Empirical evaluation of user models and user-adapted systems. User Modeling and User-Adapted Interaction, 11(1-2):181–194, USA.
Dix, A.; Finlay, J.; Abowd, G. and Beale, R. (2003) Human-computer interaction (3rd ed.). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
Eirinaki, M. and Vazirgiannis, M. (2003) Web mining for web personalization. ACM Trans. Inter. Tech., 3(1):1–27, USA.
Eisenstein, J. and Rich, C. (2002) Agents and GUIs from task models. In IUI ’02: Proceedings of the 7th international conference on Intelligent user interfaces, ACM Press, pages 47–54, New York, NY, USA.
Franklin, D.; Budzik, J. and Hammond, K. (2002) Plan-based interfaces: keeping track of user tasks and acting to cooperate. In Proceedings of the 7th international conference on Intelligent user interfaces, ACM Press, pages 79–86, USA.
Fu, X.; Budzik, J. and Hammond, K. (2000) Mining navigation history for recommendation. In IUI ’00: Proceedings of the 5th international conference on Intelligent user interfaces, ACM Press, pages 106–112, New York, NY, USA.
Hook, K. (1999) Designing and evaluating intelligent user interfaces. In IUI ’99: Proceedings of the 4th international conference on Intelligent user interfaces, ACM Press, pages 5–6, New York, NY, USA.
Horvitz, E.; Breese, J.; Heckerman, D.; Hovel, D. and Rommelse, K. (1998) The Lumiere project: Bayesian user modeling for inferring the goals and needs of software users. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 256–265, Madison, WI, USA.
Horvitz, E. (1999) Principles of mixed-initiative user interfaces. In CHI ’99: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM Press, pages 159–166, New York, NY, USA.
Jameson, A. and Schwarzkopf, E. (2002) Pros and cons of controllability: An empirical study. In AH ’02: Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Springer-Verlag, pages 193–202, London, UK.
Kirsch-Pinheiro, M.; Villanova-Oliver, M.; Gensel, J. and Martin, H. (2005) Context-aware filtering for collaborative web systems: adapting the awareness information to the user’s context. In SAC ’05: Proceedings of the 2005 ACM symposium on Applied computing, ACM Press, pages 1668–1673, New York, NY, USA.
Kobsa, A. (1994) User modeling and user-adapted interaction. In CHI ’94: Conference companion on Human factors in computing systems, ACM Press, pages 415–416, New York, NY, USA.
Langley, P. (1999) User modeling in adaptive interfaces. In UM ’99: Proceedings of the seventh international conference on User modeling, Springer-Verlag New York, Inc., pages 357–370, Secaucus, NJ, USA.
Maes, P. (1994) Agents that reduce work and information overload. Commun. ACM, 37(7):30–40, USA.
Maybury, M. (1999) Intelligent user interfaces: an introduction. In IUI ’99: Proceedings of the 4th international conference on Intelligent user interfaces, ACM Press, pages 3–4, New York, NY, USA.
Mayhew, D.J. (1999) The usability engineering lifecycle: a practitioner’s handbook for user interface design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Mobasher, B.; Dai, H.; Luo, T. and Nakagawa, M. (2001) Effective personalization based on association rule discovery from web usage data. In WIDM ’01: Proceedings of the 3rd international workshop on Web information and data management, ACM Press, pages 9–15, New York, NY, USA.
Oviatt, S.; Cohen, P.; Wu, L.; Vergo, J.; Duncan, L.; Suhm, B.; Bers, J.; Holzman, T.; Winograd, T.; Landay, J.; Larson, J. and Ferro, D. (2000) Designing the user interface for multimodal speech and gesture applications: State-of-the-art systems and research directions for 2000 and beyond.
Peyton, L. (2003) Measuring and managing the effectiveness of personalization. In ICEC ’03: Proceedings of the 5th international conference on Electronic commerce, ACM Press, pages 220–224, New York, NY, USA.
Puerta, A.R. (1998) Supporting user-centered design of adaptive user interfaces via interface models. In Proceedings of the First Annual Workshop On Real-Time Intelligent User Interfaces For Decision Support And Information Visualization. ACM Press, USA.
Reeves, L.M.; Lai, J.; Larson, J.A.; Oviatt, S.; Balaji, T.S.; Buisine, S.; Collings, P.; Cohen, P.; Kraal, B.; Martin, J.; McTear, M.; Raman, T.V.; Stanney, K.M.; Su, H. and Ying, Q. (2004) Guidelines for multimodal user interface design. Commun. ACM, 47(1):57–59, USA.
Rich, C. and Sidner, C.L. (1996) Adding a collaborative agent to graphical user interfaces. In Proceedings of the 9th annual ACM symposium on User interface software and technology, ACM Press, pages 21–30.
Russell, S. and Norvig, P. (2003) Artificial Intelligence: A Modern Approach (2nd ed.). Prentice-Hall, Englewood Cliffs, NJ, USA.
Shneiderman, B. and Plaisant, C. (2004) Designing the User Interface: Strategies for Effective Human-Computer Interaction (4th ed.). Addison-Wesley Publishing, Boston, USA.
Shneiderman, B. (1997) Direct manipulation for comprehensible, predictable and controllable user interfaces. In IUI ’97: Proceedings of the 2nd international conference on Intelligent user interfaces, ACM Press, pages 33–39, New York, NY, USA.
Stephanidis, C.; Karagiannidis, C. and Koumpis, A. (1997) Decision making in intelligent user interfaces. In IUI ’97: Proceedings of the 2nd international conference on Intelligent user interfaces, ACM Press, pages 195–202, New York, NY, USA.
Stumpf, S.; Rajaram, V.; Li, L.; Burnett, M.; Dietterich, T.; Sullivan, E.; Drummond, R. and Herlocker, J. (2007) Toward harnessing user feedback for machine learning. In IUI ’07: Proceedings of the 12th international conference on Intelligent user interfaces, ACM Press, pages 82–91, New York, NY, USA.
Sugiyama, K.; Hatano, K. and Yoshikawa, M. (2004) Adaptive web search based on user profile constructed without any effort from users. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, ACM Press, pages 675–684, New York, NY, USA.
Viano, G.; Parodi, A.; Alty, J.; Khalil, C.; Angulo, I.; Biglino, D.; Crampes, M.; Vaudry, C.; Daurensan, V. and Lachaud, P. (2000) Adaptive user interface for process control based on multi-agent approach. In AVI ’00: Proceedings of the working conference on Advanced visual interfaces, ACM Press, pages 201–204, New York, NY, USA.
Wernn, A. (1997) Local plan recognition in direct manipulation interfaces. In Proceedings of the 2nd international conference on Intelligent user interfaces, ACM Press, pages 7–14, USA.
3
Towards a Reference Architecture for Context-Aware Services
Axel Bürkle, Wilmuth Müller and Uwe Pfirrmann
Fraunhofer Institute for Information and Data Processing, Germany
1. Introduction
This chapter describes an infrastructure for multi-modal perceptual systems which aims at developing and realizing computer services that are delivered to humans in an implicit and unobtrusive way. The framework presented here supports the implementation of human-centric context-aware applications providing non-obtrusive assistance to participants in events such as meetings, lectures, conferences and presentations taking place in indoor “smart spaces”. We emphasize the design and implementation of an agent-based framework that supports “pluggable” service logic, in the sense that the service developer can concentrate on the service logic independently of the underlying middleware. Furthermore, we give an example of the architecture’s ability to support the cooperation of multiple services in a meeting scenario, using an intelligent connector service and a semantic-web-oriented travel service. The framework was developed as part of the project CHIL (Computers in the Human Interaction Loop). The vision of CHIL was to provide context-aware, human-centric services which operate in the background, provide assistance to the participants in the CHIL spaces and undertake tedious tasks in an unobtrusive way. To achieve this, significant effort had to be put into designing efficient context extraction components, so that the CHIL system can acquire an accurate perspective of the current state of the CHIL space. However, the CHIL services required a much more sophisticated modelling of the actual event, rather than simple and fluctuating impressions of it. Furthermore, CHIL spaces are by nature highly dynamic and heterogeneous: people join or leave, sensors fail or are restarted, user devices connect to the network, etc. To manage this diverse infrastructure, sophisticated techniques were necessary that can map all entities present in the CHIL system and provide information to all components which may require it. From these facts, one can easily understand that, in addition to highly sophisticated components at the individual level, another mechanism (or a combination of mechanisms) should be present to handle this infrastructure. The CHIL Reference Architecture for Multi-Modal Systems lies in the background and provides the solid, high-performance and robust backbone for the CHIL services. Each individual need is assigned to a specially designed and integrated layer which is docked to the individual component and provides all the actions necessary to enable the component to be plugged into the CHIL framework.
2. The CHIL Project
The architecture presented in this chapter is a result of the work carried out in the FP6 research project CHIL (IP 506909), partially funded by the European Commission under the Information Society Technology (IST) program.
2.1 Project Goals
The main idea behind the CHIL project is to put Computers in the Human Interaction Loop (CHIL), rather than the other way round. Computers should proactively and implicitly attempt to take care of human needs without the necessity of explicit human specification. This implies that a service is aware of the current context of its users in order to react adequately to a user’s situation and environment. Computers are repositioned in the background, discreetly observe the humans and attempt to anticipate and serve their needs like electronic butlers. To realize this goal and to overcome the lack of recognition and understanding of human activities, needs and desires, as well as the absence of learning, proactive, context-aware computing services, research in the following areas was carried out within the CHIL project:
• Perceptual Technologies: Proactive, implicit services require a good description of human interaction and activities. This implies a robust description of the perceptual context as it applies to human interaction: Who is doing What, to Whom, Where and How, and Why.
• Software Infrastructure: A common and versatile software architecture serves to improve interoperability among partners and offers a market-driven exchange of modules for faster integration.
• Services: Context-aware CHIL services assisting humans interacting with each other are designed, assembled and evaluated. Prototypes are continuously developed and their effectiveness explored in user studies.
2.2 CHIL Services
CHIL focuses on two situations in which people interact with people and exchange information to realize common objectives: meetings and lecture rooms. In order to support the interaction between humans in these scenarios, the following context-aware services have been implemented:
• The Connector attempts to connect people at the best time and by the best media possible, whenever it is most opportune to connect them. In lieu of leaving streams of voice messages and playing phone tag, the Connector tracks and knows its masters’ activities, preoccupations and their relative social relationships, and mediates a proper connection at the right time between them.
• The Memory Jog is a personal assistant that helps its human user remember and retrieve needed facts about the world and people around him/her. By recognizing people, spaces and activities around its master, the Memory Jog can retrieve names and affiliations of other members in a group. It provides past records of previous encounters and interactions, and retrieves information relevant to the meeting.
• The Travel Service provides assistance with planning and rearranging itineraries. It detects if a user is going to miss a scheduled travel connection, e.g. due to the delay of a meeting, and automatically searches for alternative travel possibilities. Based on user
preferences, it provides a selection of the “best” travel possibilities found. The Travel Service can either be invoked directly by the user through one of his personal devices or triggered by another CHIL service.
• The Meeting Manager handles all meeting-related issues. For example, it keeps track of organisational meeting data, e.g. the list of planned participants and the meeting location. It can display a personalized welcome message when a participant enters the meeting room. Furthermore, the Meeting Manager Service provides support during a meeting. One of its functionalities is to automatically start the corresponding presentation when a presenter approaches the presentation area.
Other services such as a Socially Supportive Workspace were implemented by the various CHIL partners. In addition, a series of basic services for common and elementary domain tasks are available. CHIL, however, was not intended as a project to develop specific application systems. It was intended to explore them as exemplary placeholders for the more general class of systems and services that put human-human interaction at the centre and computing services in a supporting function on the periphery, rather than the other way round.
2.3 Project Results
The CHIL consortium (15 institutes and companies in 9 countries) was one of the largest consortia tackling the above-mentioned problems to date. It has developed a research and development infrastructure by which different CHIL services can be quickly assembled, proposed and evaluated, exploiting a User Centred Design approach to ensure that the developed services answer real users’ needs and demands. The CHIL project has pushed the state of the art in various perceptual technologies (Stiefelhagen & Garofolo, 2007; Stiefelhagen et al., 2008). Furthermore, we have developed a highly flexible architecture employing collaborative context-aware agents, which enable the implementation and provision of the CHIL services. The effectiveness of the developed components and technologies has been demonstrated by implementing a number of CHIL services.
3. Related Work
Developing architectural frameworks supporting ubiquitous computing projects is not a recent trend. Major pervasive and ubiquitous computing projects have placed significant effort in designing platforms that can facilitate integration, debugging, development, management and eventually deployment of the end services. The m4 project (multimodal meeting manager) (Wellner et al., 2005) presents a client-server architecture using JSP-generated frames in a meeting browser to produce the output of audio and video streams and services. The AMI project (Augmented Multi-Party Interaction) (AMI, 2008), probably the project most closely related to CHIL, focuses on technologies which are integrated by a plug-in mechanism of a relatively simple, browser-based framework that allows indirect communication between the modules. Both of them concentrate on technologies and adapt existing software for integration purposes. CHIL, in contrast, developed an architecture that is particularly designed for the integration of, and direct communication between, context-aware services. Other previous research focused on distributed middleware infrastructures (Roman et al., 2002), on architecture frameworks for developers and administrators (Grimm et al., 2000),
on the design process and the development of frameworks and toolkits (Dey, 2000), on context-aware broker agents (Chen et al., 2004), on client-server architectures with a central server and multiple clients to support the management of multiple sensor inputs for different services (Johanson, 2002), on flexible, decentralized networks connecting dynamically changing configurations of self-identifying mobile and stationary devices (Coen et al., 1999), on architectures for the coordination of I/O devices and the exploitation of contextual information (Shafer et al., 1998), and on systems that span wearable, handheld, desktop and infrastructure computers (Garlan et al., 2002). All these efforts attest to the importance of ubiquitous computing architectures. At the same time they make clear that there is no global unified framework addressing all needs; rather, the majority of these architectures concentrate on one or more application-specific goals. A fundamental limitation of these architectures is that they assume that all context-aware components (e.g., perceptual components, situation modeling middleware) are contributed by the same technology provider, which is not always the case. CHIL, as well as other emerging pervasive computing environments, is composed of a variety of distributed heterogeneous components from multiple vendors. Another drawback of these architectures is that they were built upon legacy technologies and therefore do not benefit from emerging technologies (e.g., the semantic web). The CHIL architecture presented in the following sections is proposed as a reference model architecture for multi-modal perceptual systems.
4. The CHIL Reference Architecture
Due to the scale of the CHIL project, with its large number of partners contributing a diversity of technical components such as services, event and situation modelling modules, context extraction components and sensors, as well as the complexity of these components, a flexible architecture that facilitates their integration at different levels is essential. A layered architecture model was found to best meet these requirements and allow a structured method for interfacing with sensors, integrating technology components, processing sensorial input and composing services as collections of basic services. In order to realize context-aware, proactive and intelligent services, both context-delivering and collaborating components have to be integrated. Context is any information that can be used to characterize the situation of a person or computing entity that is considered relevant to the interaction between a user and an application (Dey, 2000). In CHIL, context is delivered both by perceptual components and learning modules. Perceptual components continuously track human activities, using all perception modalities available, and build static and dynamic models of the scene. Learning modules within the agents model the concepts and relations within the ontology. Collaboration is enabled by a set of intelligent software agents communicating on a semantic level with each other. The result of the architecture design work is the CHIL Reference Architecture, an architectural framework along with a set of middleware elements facilitating the integration of services, perceptual components, sensors, actuators, and context and situation modelling scripts. The framework and the associated middleware elements facilitate the integration and assembly of ubiquitous computing applications in smart spaces. Specifically, they mitigate the integration issues arising from the distributed and heterogeneous nature of pervasive, ubiquitous and context-aware computing environments.
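As a purely illustrative sketch of this interplay between perceptual components, situation modelling and services (our own toy example, not CHIL code; all topic and event names are invented), consider a minimal publish/subscribe chain in which a perceptual observation is lifted to a situation event that a service reacts to:

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe middleware standing in for the
    communication backbone between layers (purely illustrative)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = EventBus()

# Situation Modelling layer: interprets low-level perceptual events
# and republishes higher-level situation events.
def situation_model(event):
    if event["type"] == "person_at" and event["where"] == "whiteboard":
        bus.publish("situation", {"situation": "presentation_started",
                                  "who": event["who"]})

# Services layer: a service reacts to situation events.
def meeting_manager(situation):
    if situation["situation"] == "presentation_started":
        print(f"Starting presentation for {situation['who']}")

bus.subscribe("perception", situation_model)
bus.subscribe("situation", meeting_manager)

# Perceptual Components layer: a body tracker publishes an observation.
bus.publish("perception", {"type": "person_at", "who": "Dr. Smith",
                           "where": "whiteboard"})
# -> Starting presentation for Dr. Smith
```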
4.1 Layer Model
The layered system architecture that facilitates the integration of these various components is presented in Fig. 1. According to the components’ different functional levels, such as audio and video streaming, perceptual components, sensor control, situation modelling, services and user interfaces, the architecture consists of 8 horizontal layers. They are completed by two vertical layers, the ontology and “CHIL utilities”, which are accessible by all components and provide elementary functions relevant to all layers.
Figure 1. The CHIL architecture layer model
The lower layers deal with the management and interpretation of continuous streams of video and audio signals in terms of event detection and situation adaptation, and contain the perceptual components. The upper layers of the infrastructure enable reasoning and the management of a variety of services and user-interfacing devices. “User-front-end” and “Services and control” are implemented as software agents; the service and user interface management is thus based on agent communication. All layers use a common ontology as a backbone. A detailed description of the layers is given in the following section; the agent infrastructure of the CHIL system is described in detail in section 6.
4.2 Description of the Layers
User Front-End: The User Front-End contains all user-related components such as the Personal Agents, Device Agents and the User Profile of a CHIL user. The Personal Agent acts as the user’s personal assistant, taking care of his demands. It interacts with its master through personal devices (e.g. notebook, PDA, smart phone) which are represented by
corresponding Device Agents. The Device Agents are the controlling interfaces to the personal devices. The Personal Agent also provides and controls access to its master's profile and preferences, thus ensuring user data privacy. The User Profile stores personal data and service-related preferences of a user.

Services and Control: The Services and Control layer contains the components which implement the different services, as well as service-to-service information exchange and user-service communication. The services are implemented through software agents. The Services and Control layer comprises both the service agents and their management. The interaction with other agents within this layer and the User Front-End layer follows well-defined standards and the communication mechanisms of the agent platform, while communication with the other layers follows internal mechanisms. Services are both reusable basic services and complex higher-level services composed of suitable basic services.

Situation Modelling: The Situation Modelling layer is the place where the situation context received from perceptual components is processed and modelled. The context information acquired by the components at this layer helps services respond better to varying user activities and environmental changes. For example, the situation modelling answers questions such as: Is there a meeting going on in the smart room? Is there a person speaking at the whiteboard? Who is the person speaking at the whiteboard? This layer is also a collection of abstractions representing the environmental context in which users act and interact. An ontological knowledge base maintains the up-to-date state of objects (people, artefacts, situations) and their relationships. Additionally, it serves as an "inference engine" that regularly deduces and generalizes facts while updating the context models in response to events coming from the underlying layer of Perceptual Components. The searched-for situations are dictated by the needs of the active services, such as detecting the level of interruptibility of a particular person in a room for whom a service has to monitor incoming calls. In the CHIL Reference Architecture, the situation model takes its information from the perceptual components and produces a set of situation events that are provided to services.

Perceptual Components: Perceptual Components are software components which operate on data streams coming from various sensors, such as audio, video and RFID-based position trackers; they process and interpret these streams and extract information relating to people's actions. Such information may include people's locations, identities, hand gestures, poses, etc. The CHIL project brought together perceptual component developers from different partners, all of them using customized tools, different operating systems, a wide variety of signal processing (and other) libraries, and in general a plenitude of equipment that had not been agreed upon in the early stages of the project. However, the CHIL project had the vision of combining the developed technologies and algorithms to deliver state-of-the-art services which would exploit the whole underlying sensor and actuator infrastructure and, ideally, would be independent of the technology provider. These circumstances had to be considered when designing the Perceptual Components tier.
The design of the Perceptual Components tier does not define the parameters of the core signal processing algorithms, but pertains to their input and output data modelling aspects; it considers a Perceptual Component as a "black box". The design specifies and gives guidelines on how perceptual components shall operate, "advertise" themselves, subscribe to a specific sensor data stream, and forward their extracted context to
the higher layer of the CHIL Reference Architecture. This specification incorporates all the interfaces for communicating with the sensors, the ontology and the services.

Logical Sensors and Actuators: Sensors and actuators are key to the design and implementation of multi-modal perceptual services. They act as the "eyes and ears" of the system and provide a continuous flow of output data to the processing components, which extract elementary context by applying appropriate algorithms. This layer comprises several abstractions which wrap the sensor control and transmission components for each of the sensors in the smart space. APIs are provided for initializing the component, capturing the sensor data, configuring the component, and starting and stopping it. Each sensor is controlled by a dedicated sensor controller, which provides the low-level commands to the sensor. The sensors, and with them the logical sensor components, produce a continuous flow of information which is transmitted through a specially designed interface, the ChilFlow middleware (see below), to any consuming component in the CHIL system. Finally, each logical sensor is able to communicate with the framework's knowledge base, where it can register and "advertise" itself to the rest of the framework.

Control/Metadata: Control and Metadata provide mechanisms for data annotation, synchronous and asynchronous system control, synchronizing data flows, and effective storage and retrieval of multimedia content and of the metadata generated by data sources.

Low-level Distributed Data Transfer: The Low-level Distributed Data Transfer layer of the CHIL architecture is responsible for transferring high-volume and/or high-frequency data from sensors to perceptual components, or between perceptual components. This layer is implemented by the ChilFlow data-transfer middleware. This middleware is heavily used by developers of perceptual components to distribute their components over several networked computers in order to exploit more computational power. In order to free developers from handling networking issues and managing the connections between components, ChilFlow offers an easy-to-master yet powerful object-oriented programming interface, which provides type-safe, network-transparent, one-to-many communication channels for data exchange between components. These communication channels are called flows. Flows provide a higher abstraction level than ordinary network connections (e.g. TCP sockets) that fits the communication characteristics of components and simplifies the work of the components' developers significantly.

CHIL Utilities: This layer provides basic functionality that is needed by components in all layers of the framework. One particular example is global timing, an important issue in distributed, event-driven systems like CHIL, where time-stamped messages and streams are sent through the infrastructure.

Ontology: In order to enable the intended cognitive capabilities of the CHIL software environment, it is necessary to conceptualize entities and to formally describe the relations among them. Through the ontology, CHIL software components both know the meaning of the data they are operating on and expose their functionality according to a common classification scheme. The CHIL ontology is defined using the Web Ontology Language OWL (OWL, 2008).
It comprises several modules that are physically represented by separate Web resources with distinct URLs, among them the fully integrated CHIL agent communication ontology. The idea of modularization is that software developers only need to reference those parts of the ontology that are relevant for them. Additionally, modularization increases performance, for
example when deploying the agent communication subset of the ontology in order to generate software agent code. For managing the ontological data, as well as for reasoning and detecting inconsistencies, access to an ontology is typically backed by a central knowledge base management system. The CHIL Knowledge Base Server exposes the functionality of arbitrary off-the-shelf ontology management systems through a well-defined API based on OWL. As such, it provides unified access to the central ontological knowledge base for heterogeneous platforms, programming languages and communication protocols. Together with this API and the CHIL ontology, the knowledge base server constitutes the Ontology layer of the CHIL architecture.
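As an illustration of what such access can look like, the following sketch loads an OWL ontology module and enumerates its concepts using the Jena API. Jena is named here only as one example of an off-the-shelf ontology management system (the chapter does not state which systems were wrapped), and the ontology URL is a placeholder.

import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.util.iterator.ExtendedIterator;

public class OntologyPeek {
    public static void main(String[] args) {
        // Load an ontology module from its Web resource (placeholder URL).
        OntModel model = ModelFactory.createOntologyModel();
        model.read("http://example.org/ontology/chil-agent-communication.owl");
        // Enumerate the concepts defined in (or imported by) this module.
        ExtendedIterator<OntClass> it = model.listClasses();
        while (it.hasNext()) {
            OntClass c = it.next();
            if (c.getLocalName() != null) {
                System.out.println(c.getLocalName());
            }
        }
    }
}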
5. Sensor Infrastructure & Context-Awareness

Sensing infrastructures are a key prerequisite for realizing context-aware applications. Within CHIL, several sites have constructed indoor environments comprising multi-sensor infrastructures, called "smart rooms". They include:
• Microphones and microphone arrays (Brandstein & Ward, 2001)
• Cameras (fixed, active with pan, tilt and zoom (PTZ), or panoramic (fish-eye))
• RFID-based location trackers (Ubisense, 2007)
Based on these sensor infrastructures, a variety of perceptual components have been built and evaluated, such as 2D and 3D visual perceptual components, acoustic components, audio-visual components and RFID-based location tracking, as well as output components such as multimodal speech synthesis and targeted audio.
Perceptual components derive elementary contextual information; in general, however, they lack information about the overall current status of people's interactions and environmental conditions. To be able to "fuse" many of these perceptual components together in order to determine more sophisticated and meaningful states, additional middleware interfaces have been developed to facilitate the intercommunication of these components. The middleware acts as a receptor of the whole range of elementary context cues and processes them in order to map the resulting contextual information onto a complex situation. This process is called situation recognition and is a major part of every human-centric ubiquitous application.
Situation recognition in our implementation follows the "network of situations" approach. This scheme models distinct situations as nodes connected by edges, forming a directed graph. A transition from one situation to another occurs if given constraints are satisfied. When a transition fires, the associated service logic is applied and the newly reached situation becomes the active state. Context-awareness is hence modelled by this network. Fig. 2 illustrates such a network of situations, which can be used to track situations during meetings in a "smart room"; a code sketch of the underlying state-transition mechanism follows Figure 2.
A meeting is a sequence of three main situations: "MeetingSetup", "OngoingMeeting" and "MeetingFinished". Starting with the initial "MeetingSetup" and its sub-state "Arrival", the situation changes to "WelcomeParticipant" when the person identification perceptual component signals that a person is entering the room; the new person is welcomed by displaying a message. Based on the location and activity information of all attendees, the "Meeting Detector" perceptual component recognizes the start of the meeting and triggers the transition to the next main situation, "OngoingMeeting". During the meeting, the body
tracker perceptual component tracks the locations of the participants, detects when a person approaches or leaves the presentation area, and switches the situation accordingly between the "Discussion" and "Presentation" states. The Meeting Detector again determines the end of the meeting and triggers the transition to the final "MeetingFinished" situation.
Figure 2. A “network of situations” model for tracking meetings in a smart room
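To make the mechanism concrete, the following minimal Java sketch captures the state-transition core of such a network. All class and cue names are invented for illustration; the real CHIL situation model is considerably richer.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Elementary context cues as delivered by perceptual components (illustrative).
class PerceptualCues {
    boolean meetingStarted, meetingEnded;
}

// A situation is a node of the directed graph; each outgoing edge fires
// when its constraint (guard) holds.
class Situation {
    final String name;
    final List<Transition> out = new ArrayList<>();
    Situation(String name) { this.name = name; }
}

class Transition {
    final Predicate<PerceptualCues> guard;
    final Situation target;
    Transition(Predicate<PerceptualCues> guard, Situation target) {
        this.guard = guard;
        this.target = target;
    }
}

class SituationModel {
    private Situation active;
    SituationModel(Situation initial) { active = initial; }

    // Called whenever new context events arrive from the layer below.
    void update(PerceptualCues cues) {
        for (Transition t : active.out) {
            if (t.guard.test(cues)) {   // constraint satisfied:
                active = t.target;      // the newly reached situation becomes active
                break;
            }
        }
    }

    // Builds the three main meeting situations of Fig. 2 (sub-states omitted).
    static SituationModel meetingNetwork() {
        Situation setup = new Situation("MeetingSetup");
        Situation ongoing = new Situation("OngoingMeeting");
        Situation finished = new Situation("MeetingFinished");
        setup.out.add(new Transition(c -> c.meetingStarted, ongoing));
        ongoing.out.add(new Transition(c -> c.meetingEnded, finished));
        return new SituationModel(setup);
    }
}

In a full system the service logic would be attached to each transition; here only the change of the active state is shown.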
6. Agent Infrastructure for Context-Aware Services

Context-aware services are usually based on complex, heterogeneous, distributed systems comprising sensors, actuators and perceptual components, as well as information fusion middleware. In such sophisticated systems agents are preferred (Huhns & Singh, 1997), since they are equipped with a large set of capabilities. An agent is "...any entity that can be viewed as perceiving its environment through sensors and acting upon its environment through
effectors" (Russell & Norvig, 2003). Adding that "An agent is a computer system, situated in some environment, that is capable of flexible autonomous action in order to meet its design objectives" (Jennings et al., 1998), together with intelligent inter-agent communication, a multi-agent system seems to meet the challenges of such complex human-centric systems perfectly. Autonomous agents are computational systems that inhabit some complex dynamic environment, sense and act autonomously in this environment, and by doing so realize a set of goals or tasks for which they are designed (Maes, 1994). Hence, they meet the major requirements for the CHIL architecture: to support the integration and cooperation of autonomous, context-aware services in a heterogeneous environment.
In short, the major goals for the infrastructure span the discovery, involvement and collaboration of services, as well as competition between services, in order to perform a certain task in the best way possible. Standardized communication mechanisms and protocols have to be considered to raise information exchange onto a semantic level and to ensure location transparency. To this end, we have devised a multi-agent framework that meets the following target objectives:
• Facilitate the integration of diverse context-aware services developed by numerous different service providers.
• Facilitate services in leveraging basic services (e.g. sensor and actuator control) available within the smart rooms.
• Allow augmentation and evolution of the underlying infrastructure independent of the services installed in the room.
• Control user access to services.
• Support service personalization through maintaining appropriate profiles.
• Enable discovery, involvement and collaboration of services.
• Provide the human-oriented services to the end user in the best way possible.
The JADE (Java Agent DEvelopment Framework) platform (JADE, 2008) was selected as the backbone of our system, as it is equipped with many features which make it a highly flexible and secure platform. As seen in (Altmann et al., 2001), JADE performs well against many evaluation criteria, including agent mobility, administration, network traffic, stability, security and debugging capabilities. Finally, JADE is compliant with the standards for agent communication, agent management and message transfer of the IEEE Computer Society standards organization FIPA (Foundation for Intelligent Physical Agents) (FIPA, 2008).
The following sections describe the CHIL agent infrastructure and how we achieved our objectives. Moreover, they demonstrate how we realized a multi-agent system that is capable of solving "problems that are beyond the individual capabilities or knowledge of each problem solver" (Jennings et al., 1998).

6.1 Agent Description

The CHIL software agent infrastructure, shown in Fig. 3, is composed of three levels of agents: personal and device agents, which are close to the user and offer the functionality and results of the human-centric services to the end user; basic agents, providing elementary tasks like service discovery, collaboration and agent communication to the service developers; and service agents, which implement or cover the functionality of the various context-aware services.
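Before detailing these agents, the following generic JADE agent, written purely for illustration and not part of CHIL, shows the platform primitives on which all of them build: a setup() phase that installs behaviours, and behaviours that process ACL messages from the agent's queue.

import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

public class EchoAgent extends Agent {
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            public void action() {
                ACLMessage msg = receive();      // non-blocking read of the message queue
                if (msg != null) {
                    ACLMessage reply = msg.createReply();
                    reply.setPerformative(ACLMessage.INFORM);
                    reply.setContent("received: " + msg.getContent());
                    send(reply);
                } else {
                    block();                     // sleep until the next message arrives
                }
            }
        });
    }
}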
Figure 3. The CHIL software agent infrastructure

User-related Agents
Every person in the CHIL environment has a dedicated Personal Agent that acts as a personal secretary for him. Personal Agents are assigned during a login procedure and manage the complete interaction between their user and the CHIL system, supported by specialized Device Agents, which are bound to specific input and output devices. The Personal Agent knows what front-end devices its master has access to, how it can best receive or provide information, and what input and notification types he prefers (notebook, cell phone call, SMS, targeted audio, etc.). Moreover, the Personal Agent is permanently aware of its master's context (location, environment, activity) by communicating with the Situation Watching Agent, in order to act or react in an appropriate way. Furthermore, the Personal Agent provides and controls access to its master's profile and preferences, thus ensuring user data privacy.
Similar to a user and his Personal Agent, each device has its own Device Agent that manages the complete communication between the device and a user's Personal Agent. The Notebook Device Agent handles the interaction between the graphical user front-end on the user's notebook and the user's Personal Agent; the Cell Phone Device Agent controls the communication between the Personal Agent and the user's cell phone. The Cell Phone Listener Agent supervises all incoming cell-phone-based requests: it watches a specific port and, in case of an incoming call, creates a Cell Phone Device Agent and passes the socket connection over to it. The newly created agent then handles all further communication between the cell phone and the user's Personal Agent.
Basic Agents
The major basic agents in the CHIL environment are the CHIL Agent and the CHIL Agent Manager. The CHIL Agent is a subclass of the JADE Agent, which itself is the elementary agent of the JADE agent platform, and acts as the fundamental abstract class for all agents in the CHIL agent framework. It provides methods for essential administrative functionality (agent setup and takedown), for ontology-based messaging (creating, sending and receiving messages, extracting the message content), for utility jobs like logging, and, in cooperation with the CHIL Agent Manager, for directory facilitator (DF) service tasks such as registering and deregistering agents, modifying agent and service descriptions, and searching for service-providing agents based on a semantic service description. Special importance is attached to keeping the agent communication conformant with the FIPA specifications (FIPA, 2008): the complete message transfer is compliant with the FIPA Interaction Protocols and the FIPA Communicative Acts, and is based on a well-defined communication ontology, as recommended in the FIPA Abstract Architecture Specification.
The CHIL Agent Manager is a central instance encapsulating and adding functionality to the JADE Directory Facilitator (DF) and coordinating the selection and use of agents. Each CHIL agent registers its services with the CHIL Agent Manager, so the agent manager is, at any time, aware of all available agents, their abilities and the required resources. The CHIL Agent Manager can act both as a matchmaker and as a broker. In case of an incoming service request, it knows which agents can solve the problem and which resources are needed. If the requested service can be provided by a single agent, it serves as a matchmaker and returns the agent identifier of the capable agent to the requester, which in turn may contact the service-providing agent. As a broker, in the case of a more complex problem, the CHIL Agent Manager decomposes the overall problem into sub-problems, computes a problem-solving strategy and invokes the subtasks on appropriate service agents. Having received all necessary partial results, it computes the overall solution and returns the final result to the initial requester.
Service Agents
Each Service Agent is associated with a specific service and provides access to the service functionality both to users and to other agents. Service agents register the service functions with the CHIL Agent Manager using ontology-based service descriptions, thus mapping the syntactical level of services to the semantic level of the agent community. These service descriptions can be queried both by Personal Agents, in order to provide the service to human users, and by other service agents, which may compose various functions from other services to supply their own. The basic Service Agent is an abstract ancestor class for all specific service agents; it provides the methods for the registration of service features, including the necessary resources. Specialized service agents comprise the core functionality of the associated service, whereas the service itself may be implemented as an agent or may be covered by an agent which then provides a suitable interface to the service. This applies to the Connector Agent, the Memory Jog Agent, the Travel Agent and the Meeting Manager Agent; the associated services have already been described in the section "CHIL Services".
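Underneath the CHIL Agent Manager, registration and matchmaking ultimately rest on the JADE Directory Facilitator. The following sketch shows the plain JADE DF calls involved; the service type "travel-planning" is an invented example, and CHIL's own registrations additionally carry the ontology-based descriptions discussed above.

import jade.core.Agent;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;

public class TravelServiceAgent extends Agent {
    protected void setup() {
        // Advertise this agent's service with the Directory Facilitator.
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("travel-planning");
        sd.setName("travel-service");
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }

    // Matchmaking query, e.g. issued on behalf of a Personal Agent:
    // which agents currently provide "travel-planning"?
    DFAgentDescription[] findProviders() throws FIPAException {
        DFAgentDescription template = new DFAgentDescription();
        ServiceDescription wanted = new ServiceDescription();
        wanted.setType("travel-planning");
        template.addServices(wanted);
        return DFService.search(this, template);
    }

    protected void takeDown() {
        try {
            DFService.deregister(this);   // remove the advertisement on shutdown
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }
}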
Two special agents provide common abilities useful for the whole agent society: the Smart Room Agent and the Situation Watching Agent. The Smart Room Agent controls the various optical and acoustic output devices in the smart room. It may communicate general messages to all attendees in the smart room by displaying them on the whiteboard
or transmitting them via loudspeaker. Furthermore, it is able to notify single participants without affecting other attendees, using the steerable video projector or targeted audio, depending on the user's preferences.
The Situation Watching Agent wraps the Situation Model of the smart room. It monitors the smart room and tracks situation-specific user information such as the current location, movement and activity on different semantic levels (simple coordinates as well as hints like "attendee X approaches whiteboard"). Moreover, it can provide location information on important artefacts like notebooks, whiteboards, tables, etc. Other agents may query the current situation at the Situation Watching Agent as well as subscribe to events; both information retrieval methods are supplied by well-defined access and subscription APIs to the Situation Model.

6.2 Intelligent Messaging

As proposed in the FIPA Abstract Architecture Specification (FIPA, 2008), the complete information exchange between agents is based upon a well-defined communication ontology. Thus, the semantic content of messages is preserved across agents, a concept that becomes highly important given that the various services are implemented by a great number of developers in different places, who need to understand each other correctly. The CHIL Communication Ontology is completely defined using the Web Ontology Language OWL (OWL, 2008) and fully integrated in the overall CHIL domain ontology. It is based upon the "Simple JADE Abstract Ontology", an elementary ontology provided by JADE which must be used for every ontology-based message exchange within the JADE agent management system. The communication ontology extends the CHIL domain ontology with the tokens that are indispensable for agent communication, particularly agent actions for requesting services and response classes to hold the results of the services and return them to the requesters. The ontology classes are used in the Java environment by means of the JADE abstract descriptor package, jade.content.abs. Their handling is implemented by the basic CHIL Agent, which provides it to the agent community through methods for ontology-based encoding, sending, receiving and decoding of ACL (Agent Communication Language) messages.
A second level of intelligent messaging is achieved by the implementation of advanced agent coordination. Coordination of agents can be performed using well-defined conversation protocols (Cost et al., 2001). As a first approach, we use the interaction protocols and communicative acts specified by FIPA (FIPA, 2008). The communication support methods of the basic CHIL Agent, supplemented by initiator and responder classes for submitting and receiving messages, ensure that the agent communication is strictly compliant with the FIPA specification. Together with the central CHIL Agent Manager as the agent-coordinating instance, the FIPA compliance and the ontology-based message transfer form a highly sophisticated approach to intelligent messaging.

6.3 Multiple Scalable Services

In the CHIL project, several complex context-aware services have been implemented in different places and had to be integrated in one system. To minimize the integration effort and to allow the service developers to concentrate on the core service functionality, a framework for the integration of services has been highly important. Hence, one of the
major goals of the CHIL agent infrastructure was to provide a mechanism that allows the distributed development of services, their easy integration and configuration, and the cooperation and communication of multiple services in the CHIL system.
A simple service can easily be integrated by creating an agent which handles the framework tasks and the service control, and by integrating this agent into the CHIL system. In this way the agent acts as a wrapper for the service; the agent could also embed the service logic itself. But usually a service is more complex and requires particular functionality from other agents. For example, in order to establish a connection in the best way possible, the Connector Service needs knowledge about the user's connection preferences (phone call, SMS, notebook, audio) and the social relationship between two or more participants (VIP, business or personal contact). For the sake of privacy, these personal information tokens must be held by the Personal Agent, although they are only used by the Connector Service. Thus, a service may require exclusive, service-specific functionality that is or must be realized by an agent other than the service agent itself. Implementing such functionality in the agent itself implies that all service providers which use this agent would have to synchronize the agent's code. This technique would quickly raise significant problems in coordinating the implementation and configuration of software components.

6.3.1 Pluggable Behaviours

To this end, a plug-in mechanism has been designed that allows an easy integration of service-specific functionality into other agents without modifying the agents' code: service-specific agent behaviours are moved to pluggable handlers, which are agent-independent and are plugged into appropriate agents during their start-up phase. Using this mechanism, the agents themselves remain service-independent, contain only the common methods and attributes all partners have agreed on, and thus become stable modules. Three types of pluggable handlers have been considered necessary, namely:
1. Setup handlers are responsible for service-specific initialization and are executed in the setup phase of an agent.
2. Event handlers are triggered by events from outside the agent world, e.g. the user's GUI, a perceptual component, the situation model or a web service.
3. Responders are added to the agent's behaviour queue and react to incoming ontology-based ACL (Agent Communication Language) messages sent by other agents.
At start-up time each agent parses the service configuration files, determines which behaviours have to be instantiated and adds them to its behaviour queue. Since the code for this mechanism is concentrated in the basic CHIL Agent, the plug-in mechanism is available to all agents without additional implementation work for the agent developer. Moreover, the source of the configuration data could easily be switched; instead of using XML-based files, the service configuration could similarly be imported from a knowledge base.
Table 1 shows a sample implementation of a pluggable handler; the example is a responder which informs a user (i.e. the user's Personal Agent) about connection requests from other users. The interface structure allows implementing the handler and the behaviour in two different classes; the handler is recognised by the plug-in mechanism and provides the actual behaviour that will be added to the agent's behaviour queue.
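From this description, the PluggableResponder interface can be pictured roughly as follows; this is a reconstruction for illustration, as the actual CHIL source of the interface is not reproduced here.

import jade.core.behaviours.Behaviour;
import jade.lang.acl.MessageTemplate;

public interface PluggableResponder {
    MessageTemplate getAcceptedMessages(); // which ACL messages the handler reacts to
    Behaviour getBehavior();               // the behaviour added to the agent's queue
}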
In this example, both are joined in one class: the responder implements the PluggableResponder interface and its getAcceptedMessages and getBehavior methods in order to be handled correctly by the plug-in
mechanism, as well as extending a generic CHIL responder behaviour and overwriting the prepareResponse and prepareResultNotification methods.

public class InformAboutConnectResponder
        extends CHILSimpleAchieveREResponder
        implements PluggableResponder {

    public InformAboutConnectResponder(Agent agent) {
        super(agent, ((PersonalAgent) agent).matchAbsOntoRequestAction(
                new AbsAgentAction("InformAboutConnect")));
    }

    public MessageTemplate getAcceptedMessages() {
        // Pattern specifying all message types this responder reacts to
        return getPA().matchAbsOntoRequestAction(
                new AbsAgentAction("InformAboutConnect"));
    }

    public Behaviour getBehavior() {
        // The responder itself realises the behaviour to be queued
        return this;
    }

    protected ACLMessage prepareResponse(ACLMessage request)
            throws CHILFipaRefuseException {
        // create the response message (agree/refuse path)
        return responseMessage;
    }

    protected ACLMessage prepareResultNotification(ACLMessage request,
            ACLMessage response) throws CHILFipaFailureException {
        // create the result message (failure/inform path)
        return resultMessage;
    }
}

Table 1. An example of a pluggable responder for the Personal Agent accepting ontology-based messages

GetAcceptedMessages returns a JADE message template, i.e. a pattern that specifies all types of messages the responder will react to. In this case, it returns a template that accepts messages of type request containing the ontology-based agent action InformAboutConnect, the same message template that is used in the constructor. GetBehavior returns the actual behaviour which realises the responder's functionality and which is added to the agent's behaviour queue. In this example it is the responder itself, but the developer may also create and return a separate behaviour object. The prepareResponse and prepareResultNotification methods ensure FIPA compliance by implementing the agree/refuse and the failure/inform paths of the FIPA Request Interaction Protocol, respectively. The (optional) prepareResponse method receives the original request and informs the initiator whether the request is agreed to or refused; additionally, it may start to process the request. The (mandatory) prepareResultNotification completes the processing of the request and returns the results or, in case of an error, a failure message to the initiator.

6.3.2 Scalable Services

The Pluggable Behaviours mechanism allows a service provider to plug new service-specific functions into multiple agents without the need for recompilation (see Fig. 4(a)). However, to
fully exploit the features of all CHIL services and to raise the functionality of the whole system beyond the sum of its components, a service must be able to define new messages which can be understood and processed by other services. This means that each service provider should be able to define his own service-specific ontology. As a consequence, each agent participating in such a multi-service communication has to be able to handle messages from different services and to work with several service ontologies, as illustrated in Fig. 4(b). Hence, service-specific ontologies have to be handled in the same way as Pluggable Behaviours: the framework must be capable of plugging them in without recompiling the agents' code. To this end, the Pluggable Behaviours mechanism has been extended to Scalable Services by realising a plug-in technique for service-specific communication ontologies.
Figure 4. Pluggable Handlers (a) and Scalable Services (b)

A new service is specified and integrated into the CHIL system by means of an XML-based configuration file. Table 2 shows an example of a service using its own ontology and adding an event handler to the service agent and a responder to the Travel Agent (an illustrative sketch of such a file is given after Table 3). The file defines all agents participating in the service, their behaviours and the service ontology. Each pluggable handler is specified by its type (setup, event or responder) and its class name. A priority value assigned to each handler can be used to determine the order of execution, which is particularly important for setup behaviours. The service ontology is specified by a name, the namespace, the location and the class name of the ontology class. Furthermore,
the configuration file provides an additional feature to system developers and administrators: it allows enabling or disabling certain functionality by simply adding or removing the appropriate configuration elements in the configuration file, without having to recompile the source code.

Table 2. Sample service configuration file using multiple agents and a service-specific communication ontology

Similar to a service configuration file informing a service about all participating agents, the CHIL system is informed about all participating services: a master configuration file, also based on XML, specifies the services that are activated on system start-up by their names and the locations of their configuration files. A sample can be seen in Table 3.

Table 3. Master configuration file for services
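As a purely hypothetical sketch of such a service configuration file, with all element names, class names and the namespace URL invented to match the description above:

<!-- Hypothetical sketch of a Table 2 style service configuration; names are invented. -->
<service name="TravelService">
  <ontology name="TravelOntology"
            namespace="http://example.org/chil/travel#"
            location="http://example.org/chil/travel.owl"
            class="travel.TravelOntology"/>
  <agent name="TravelServiceAgent">
    <handler type="event" priority="1" class="travel.TravelRequestEventHandler"/>
  </agent>
  <agent name="TravelAgent">
    <handler type="responder" priority="2" class="travel.FindConnectionResponder"/>
  </agent>
</service>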
6.4 Personalisation

A CHIL computing environment aims to radically change the way we use computers. Rather than expecting a human to attend to technology, CHIL attempts to develop computer assistants that attend to human activities, interactions and intentions. Instead of reacting only to explicit user requests, such assistants proactively provide services by observing the implicit human request or need, much like a personal butler would.
Each CHIL user is described by a user profile, which contains a set of personal attributes including administrative data relevant to the CHIL system (e.g. access rights, user capabilities and characteristics) and individual personality information, like professional and personal interests, contacts and the social relationships between the contacts and the user (VIP, business or personal), as well as interaction, device and notification preferences such as notebook, PDA, cell phone call, SMS, MMS, targeted audio, etc. The administrative part of the user profile is maintained by the system administrator; personal data can be added and modified exclusively by the user by means of a suitable GUI.
Access to and control of the user profile is managed by the user's Personal Agent. Thus, the Personal Agent not only operates as a personal assistant, but also as a privacy guard for both sensitive and public user data. Since the Personal Agent is the only one having access to the user's profile, it ensures user data privacy. The Personal Agent controls, via dedicated Device Agents, the complete interaction between its user and the CHIL system: it knows what front-end devices its master has access to, how it can best receive or send information to and from its master, and what input and notification types its master prefers. Furthermore, the Personal Agent communicates (using both requests and subscriptions) with the Situation Watching Agent to be permanently updated about its master's current context (location, activity, state of the environment) and the availability of the various devices in a dynamically changing situation. Based on the static data of the user profile and the dynamic context information, the Personal Agent handles user input as well as connection and notification requests to its master in the best way and with the most appropriate media possible.

6.5 Qualitative Advantages of the CHIL Agent Framework

The benefits of the CHIL multi-agent framework are manifold. On the one hand, the architecture undertakes a wide range of tedious tasks, easing the deployment of new services; on the other hand, it provides a transparent layer to the developer in terms of information retrieval. It offers high flexibility, scalability and reusability, and it facilitates the integration of components at different levels, like
• Services,
• Perceptual Components,
• Sensors and Actuators, and
• User Interfaces.
Particularly the plug-in mechanism for agent behaviours constitutes a powerful technique for the development, testing, integration, configuration and deployment of new services and components. Developers may create new agents and behaviours and use this mechanism for easy behaviour integration and agent configuration, thus facilitating and accelerating the process of development and testing. They may benefit from the reusability feature of the agent framework by including their own behaviours in already existing agents in order to use the functionality of these agents. And they may profit from the flexible configuration facility,
allocating behaviours to different agents and turning behaviours, and even complete services, on and off, both in the development and test phase and in the deployment and integration phase.
Another important quality factor is the use of ontology-based agent communication. Elevating the collaboration of components to a semantic level not only augments the robustness of the system in terms of the mutual understanding of internal components, but also reduces the error-proneness when integrating new components and significantly enhances the interoperability with external systems.
A high level of scalability is ensured by the fact that all agents can be distributed over a theoretically unlimited number of computers. Furthermore, the described technology for service composition enhances the scalability of the CHIL agent framework in terms of functionality. Additionally, detailed guidelines for service integrators are available, which help service developers to integrate their services into the CHIL architecture framework.
7. Prototype Implementation

The CHIL agent-based infrastructure has been utilized for implementing several non-intrusive services (cf. section 2.2), which demonstrate how this framework is appropriate for developing context-aware cooperating applications. The following paragraphs present one of the implemented example scenarios.

7.1 Example Scenario

This scenario incorporates the two CHIL services Connector Service and Travel Service and a number of elementary services such as the Meeting Service and the Smart Room Service. Both CHIL services are integrated into the CHIL architecture using the previously described plug-in mechanism.
A meeting takes place in a CHIL smart room equipped with sensors and output devices. The presence of each meeting participant is considered crucial for the outcome of the meeting; hence, the meeting will be delayed if one of the participants is late. All meeting participants are known to the CHIL system and have CHIL-enabled personal devices, i.e. notebooks, PDAs or smart phones with a CHIL software client. Most of the participants have itineraries with flights or trains leaving shortly after the scheduled end of the meeting.
One of the participants realizes that he will be late for the meeting. He uses one of the functionalities of the Connector Service by sending an "I'm late" notification (together with the expected arrival time) to the CHIL system from his personal device (e.g. a smart phone, see Fig. 5, left). The system then informs the other participants about the delay via the smart room devices or personal devices, depending on the current location of each participant, the user preferences and the available output media.
The Personal Agents of the other participants know the planned itineraries of their masters. Triggered by the delay message, each Personal Agent determines whether the delay is likely to make its master miss his return connection. If this is the case, the Personal Agent providently initiates a search for alternative connections. It provides the Travel Agent with the necessary information, including user preferences, e.g. whether its master prefers to fly or to take a train. The Travel Agent processes the request by retrieving information from online services of railway operators, airlines and travel agencies. Eventually, it sends a list of possible connections to the Personal Agent, which notifies its user. Notification is done unobtrusively, taking into account the current environment situation of its master (e.g. "in meeting"), the currently
available output devices (i.e. personal devices like smart phones, PDAs, notebooks and output devices of the smart room, e.g. targeted audio or steerable video projector) and the preferred way of notification (e.g. pop-up box or voice message).
Figure 5. Smart phone version of the CHIL service access client

A possible outcome of the search could also be that the Personal Agent informs its master that he should leave the meeting as planned, since there was no suitable alternative itinerary. In case the CHIL user is not satisfied with any of the proposed itineraries, or wants to look up travel connections himself, he can use his CHIL-enabled personal device to do so. Fig. 5, right, shows the query mask on a smart phone. An equivalent user front-end is available for notebooks.
8. Conclusion

Developing complex sensing infrastructures, perceptual components, situation modelling components and context-aware services constitutes an extremely demanding research task. Given the tremendous effort required to set up and develop such infrastructures, we strongly believe that a framework ensuring their reusability across a range of services is of high value. This is not the case with most ubiquitous, pervasive and context-aware systems, which tend to be very tightly coupled to the underlying sensing infrastructure and middleware (Smailagic & Siewiorek, 2002; Ryan et al., 1998). It is, however, expedient in projects like CHIL, where a number of service developers concentrate on radically different services.
In this chapter we presented a reference architecture for context-aware services. We put the main focus on the distributed agent framework, which allows developers of different services to concentrate on their service logic while exploiting existing infrastructures for perceptual processing, information fusion, and sensor and actuator control. The core concept of this framework is to decouple the service logic from the context-aware and sensor/actuator control middleware. Hence, service logic can be "plugged" into a specific placeholder based on well-defined interfaces. The agent framework has been implemented on top of the JADE environment and accordingly instantiated within real-life smart rooms comprising a wide range of sensors and context-aware components. The benefits of this framework have been manifested in the development of different cooperating applications providing assistance during meetings and conferences as well as facilitating human communication.
9. References

Altmann, J.; Gruber, F.; Klug, L.; Stockner, W.; Weippl, E. (2001). Using mobile agents in real world: A survey and evaluation of agent platforms, Proc. of the 2nd International Workshop on Infrastructure for Agents, MAS, and Scalable MAS at the 5th International Conference on Autonomous Agents, pp. 33-39, Montreal, Canada, June 2001
AMI (2008). Augmented Multi-party Interaction, Available from http://www.amiproject.org, accessed July 31, 2008
Brandstein, M.; Ward, D. (2001). Microphone Arrays: Signal Processing Techniques and Applications, Springer, ISBN 978-3-540-41953-2, Berlin, Germany
Chen, H.; Finin, T.; Joshi, A.; Perich, F.; Chakraborty, D.; Kagal, L. (2004). Intelligent Agents Meet the Semantic Web in Smart Spaces, IEEE Internet Computing, Vol. 8, No. 6, (November 2004), pp. 69-79, ISSN 1089-7801
Coen, M.; Phillips, B.; Warshawsky, N.; Weisman, L.; Peters, S.; Finin, P. (1999). Meeting the Computational Needs of Intelligent Environments: The Metaglue System, Proceedings of MANSE'99, pp. 201-212, Dublin, Ireland, December 1999
Cost, R. S.; Labrou, Y.; Finin, T. (2001). Coordinating agents using agent communication languages conversations, In: Coordination of Internet agents: models, technologies, and applications, Omicini, A.; Zambonelli, F.; Klusch, M.; Tolksdorf, R. (Eds.), pp. 183-196, Springer-Verlag, ISBN 978-3-540-41613-5, London, UK
Dey, A.K. (2000). Providing Architectural Support for Building Context-Aware Applications, Ph.D. dissertation, Georgia Institute of Technology, ISBN 0-493-01246-X, Atlanta, GA, USA
FIPA (2008). The Foundation for Intelligent Physical Agents, Available from http://www.fipa.org, accessed July 31, 2008
Garlan, D.; Siewiorek, D.; Smailagic, A.; Steenkiste, P. (2002). Project Aura: Towards Distraction-Free Pervasive Computing, IEEE Pervasive Computing, special issue on "Integrated Pervasive Computing Environments", Vol. 1, Issue 2, (April 2002), pp. 22-31, ISSN 1536-1268
Grimm, R.; Anderson, T.; Bershad, B.; Wetherall, D. (2000). A System Architecture for Pervasive Computing, Proceedings of the 9th ACM SIGOPS European Workshop, pp. 177-182, Kolding, Denmark, September 2000, ACM, New York, USA
Huhns, M.N.; Singh, M.P. (1997). Agents are everywhere, IEEE Internet Computing, Vol. 1, Issue 1, (January 1997), pp. 87-87, ISSN 1089-7801
JADE (2008). Java Agent DEvelopment Framework, Available from http://jade.tilab.com, accessed July 31, 2008
Jennings, N.R.; Sycara, K.; Wooldridge, M. (1998). A roadmap of agent research and development, Journal of Autonomous Agents and Multi-Agent Systems, Vol. 1, No. 1, (1998), pp. 7-38, ISSN 1387-2532
Johanson, B.; Fox, A.; Winograd, T. (2002). The Interactive Workspaces Project: Experiences with Ubiquitous Computing Rooms, IEEE Pervasive Computing, Vol. 1, Issue 2, (April-June 2002), pp. 67-74, ISSN 1536-1268
Maes, P. (1994). Agents that reduce work and information overload, Communications of the ACM, Vol. 37, No. 7, (July 1994), pp. 30-40, ISSN 0001-0782
OWL (2008). W3C Semantic Web, Web Ontology Language (OWL), Available from http://www.w3.org/2004/OWL, accessed July 31, 2008
Roman, M.; Hess, C.; Cerqueira, R.; Ranganat, A.; Campbell, R. H.; Nahrstedt, K. (2002). Gaia: A Middleware Infrastructure to Enable Active Spaces, IEEE Pervasive Computing, Vol. 1, No. 4, (October 2002), pp. 74-83, ISSN 1536-1268
Russell, S.J.; Norvig, P. (2003). Artificial Intelligence: A Modern Approach, Prentice Hall, ISBN 9780137903955
Ryan, N.; Pascoe, J.; Morse, D. (1998). Enhanced reality fieldwork: the context-aware archaeological assistant, In: Computer Applications and Quantitative Methods in Archaeology, Gaffney, V.; van Leusen, M.; Exxon, S. (Eds.), Oxford
Shafer, S.; Krumm, J.; Brumitt, B.; Meyers, B.; Czerwinski, M.; Robbins, D. (1998). The New EasyLiving Project at Microsoft Research, Proc. of the Joint DARPA/NIST Workshop on Smart Spaces, pp. 127-130, July 30-31, 1998, Gaithersburg, Maryland, USA
Smailagic, A.; Siewiorek, D.P. (2002). Application Design for Wearable and Context-Aware Computers, IEEE Pervasive Computing, Vol. 1, No. 4, (October 2002), pp. 20-29, ISSN 1536-1268
Stiefelhagen, R.; Garofolo, J. (Eds.) (2007). Multimodal Technologies for Perception of Humans, First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, April 6-7, 2006, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 4122, Springer, ISBN 978-3-540-69567-7
Stiefelhagen, R.; Bowers, R.; Fiscus, J. (Eds.) (2008). Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 4625, Springer, ISBN 978-3-540-68584-5
Ubisense (2007). The Ubisense Precise Real-time Location System, Available from http://www.ubisense.net, accessed July 31, 2008
Wellner, P.; Flynn, M. (Eds.) (2005). m4 - multimodal meeting manager, Deliverable D4.3: Report on Final Demonstrator and Evaluation, 25 February 2005, Available from http://www.dcs.shef.ac.uk/spandh/projects/m4/publicDelivs/D4-3.pdf, accessed July 31, 2008
4

A Robust Hand Recognition In Varying Illumination

Yoo-Joo Choi1, Je-Sung Lee2 and We-Duke Cho3
1Seoul University of Venture and Information, 2Korean German Institute of Technology, 3Ajou University
South Korea
1. Introduction

As ubiquitous computing provides upgraded smart environments in which humans desire various types of interaction with many kinds of media and information, research in the area of Human-Computer Interaction (HCI) is being emphasized in order to provide more convenient user interfaces. In particular, gesture interaction has been one of the important research areas in ubiquitous computing environments, since it requires only widespread consumer video cameras and computer vision techniques, without the aid of any other devices, to grasp human movements and intentions (Park, 2004; Jung, 2007). Among gesture interaction techniques, the recognition of hand poses and gestures has received particular attention due to its great potential for building varied, user-centric computer interfaces. The applicability of hand pose recognition is very high in applications where system users cannot use existing interface devices such as a keyboard and a mouse, for example because they are required to wear heavy protective gloves for industrial processes. Various types of gesture interfaces have also been presented in three-dimensional games based on virtual reality, and these interfaces have enhanced the users' interest and creativity within these environments.
Humans can distinguish hand poses very quickly through their complex optical systems, while it is very difficult for a computer system to rapidly and accurately understand hand poses. Therefore, many researchers have tried to simulate the human optical system, which can extract objects of interest from complex scenes and understand the context among objects. One of the major factors that disturb automatic gesture recognition is illumination change: sudden illumination changes lead to the misclassification of background and foreground regions. We propose a robust hand recognition technique that can stably extract hand contours even under sudden illumination changes. Figure 1 shows the flowchart of our proposed method.
The proposed method acquires background images for a restricted duration and calculates the mean and standard deviation of the hue and hue-gradient of each pixel within the captured background images. That is, a background model for each pixel is built. The hue and hue-gradient of the input images captured in real time are calculated and compared to those of the background images. The foreground objects are extracted based on the magnitude of the difference between the input image and the background model. To accurately extract a tight region of interest, we calculate the
eigenvalues and eigenvectors of the initially extracted object region and derive the object-oriented bounding box (OBB) of the optimized hand region based on the two eigenvectors (a sketch of this computation is given after Figure 1). The OBB region is then divided into 16 sub-regions, and the hand region profile is produced from the histogram of the number of edges in each sub-region. The profiles of nine hand poses are trained, and each hand pose is recognized using a multi-class SVM algorithm.
Figure 1. System flowchart
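As a sketch of the OBB step in this flow: the principal axis of the extracted hand pixels can be obtained from the closed-form eigen-decomposition of their 2x2 covariance matrix. The code below illustrates that standard formulation; it is not the chapter's exact implementation.

class ObbSketch {
    // Unit direction of the major axis of the foreground pixel cloud.
    static double[] principalAxis(int[] xs, int[] ys) {
        int n = xs.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += xs[i]; my += ys[i]; }
        mx /= n; my /= n;
        double cxx = 0, cyy = 0, cxy = 0;   // covariance matrix entries
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - mx, dy = ys[i] - my;
            cxx += dx * dx; cyy += dy * dy; cxy += dx * dy;
        }
        cxx /= n; cyy /= n; cxy /= n;
        // Largest eigenvalue of [[cxx, cxy], [cxy, cyy]] in closed form.
        double tr = cxx + cyy, det = cxx * cyy - cxy * cxy;
        double lambda = tr / 2 + Math.sqrt(Math.max(0, tr * tr / 4 - det));
        // The corresponding eigenvector (lambda - cyy, cxy) spans the major axis.
        double ex = lambda - cyy, ey = cxy;
        double len = Math.hypot(ex, ey);
        if (len < 1e-12) {                  // degenerate: already axis-aligned
            return (cyy > cxx) ? new double[]{0, 1} : new double[]{1, 0};
        }
        return new double[]{ex / len, ey / len};
    }
}

Projecting the pixel coordinates onto this axis and onto its perpendicular yields the extents of the oriented bounding box.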
2. Previous Work

Current research on hand pose recognition can be classified into hardware-sensor-based methods, which mostly utilize datagloves, and image-processing-based methods, which utilize two-dimensional pattern information or three-dimensional models. Since methods that use datagloves obtain three-dimensional data in real time through sensors directly attached to the gloves, they make hand gesture analysis easy. However, rather expensive equipment is required, and there are problems in implementing natural interfaces, since cables are necessary to connect the equipment to the computing system. Among image-processing-based methods, restricted approaches have been presented that use red or green gloves to skip the complex pre-processing of hand region extraction (KAIST, 2004). Other approaches have attempted to recognize hand movements based only on images captured from video cameras under normal lighting conditions, without any specialized equipment (Park et al., 2002; Jang et al., 2004; Han, 2003; Jang et al., 2006; Tanibata et al., 2002; Licsar et al., 2005). To support general and natural interfaces, interest in hand pose recognition without the use of any specialized equipment or gloves has increased.
A variety of hand features have been applied to hand pose recognition. In the research proposed in (Park et al., 2002), the direction and size of the hand region were considered to determine the area-of-interest window, and an improved CAMSHIFT algorithm was used to track the coordinates of the hand tip. The focal point of their research was recognizing the pointing direction by extracting the coordinates of the hand tip from a single hand pose instead of various hand poses. In (Jang et al., 2004), three main features were defined for the hand pose. First, the hand region was normalized with respect to direction and size; then, from the center point of the normalized hand region, straight lines were cast outward at regular angular intervals. The distance along each straight line from the center of the region to the outer boundary of the hand was used as a feature. The ratio of the length of the minor axis to that of the major axis was also used as a hand feature. In (Han, 2003)'s research on hand pose recognition, the entropy of the frame difference between adjacent frames was first estimated and, using the skin color distribution, only the hand region was extracted from the frame image. Next, the contour of the hand was detected using a chain-code algorithm, and an improved centroidal profile method was applied to recognize the hand pose at a speed of 15 frames per second. In (Jang et al., 2006), a method was proposed that divides the hand pose features into structural angles and hand outlines; the relation between these hand features was defined by means of enhanced learning. They demonstrated the appropriateness of their method by applying it to a hand pose recognition system that uses three cameras rather than one.
As shown above, although many methods have been proposed for hand pose recognition, most of the research has been carried out in limited environments with no change in lighting or background; thus, environment-dependent research results have been produced (Park et al., 2002; Jang et al., 2004; Han, 2003; Jang et al., 2006; Tanibata et al., 2002; Licsar et al., 2005). In contrast, the application areas for hand pose recognition techniques call for environments in which illumination changes can be handled, so there is an increased demand for research on hand pose recognition that is robust to illumination changes. In addition, a variety of hand features that characterize the hand pose have been proposed in order to improve the recognition success rate; however, these features are quite complex, making real-time processing difficult. This calls for further research on defining hand features that can efficiently and accurately represent hand poses.
3. Hand Region Extraction

3.1 Hue and Hue-Gradient Distribution of the Hand Region

The RGB color model is an additive color model that adds the three additive primaries R (Red), G (Green) and B (Blue) to create a desired color. The RGB color model is sensitive to light and shadow, making it difficult to extract object outlines during image processing (Kang, 2003). On the contrary, the HSI color model is a user-oriented color model that is based on how humans perceive light. This model is composed of H (Hue), S (Saturation) and I (Intensity). Figure 2 depicts the HSI color model as a basic triangle and color space. Hue is expressed as an angle ranging from 0° to 360°, and saturation is the radius ranging from 0 to 1. Intensity is the z-axis, with 0 being black and 1 being white (Kang, 2003).
Figure 2. HSI color model

The hue component in the HSI color model represents an object's inherent color value, independently of the object's brightness or saturation. The hue can be computed from the RGB model through Equation (1) (Kang, 2003); Figure 3 shows the hue and saturation difference between the hand region and the background region.
$$H = \begin{cases} \theta, & B \le G \\ 360^{\circ} - \theta, & B > G \end{cases}
\qquad
\theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\} \qquad (1)$$
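For illustration, the following is a minimal sketch of this standard hue computation; the function name and the NumPy-based implementation are ours, not the chapter's.

```python
import numpy as np

def rgb_to_hue(rgb):
    """Hue channel (degrees) of the HSI model from an RGB image, following
    the standard conversion of Equation (1); rgb is a float array in [0, 1]
    with shape (H, W, 3)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-10   # avoid /0
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return np.where(b > g, 360.0 - theta, theta)   # hue in [0, 360)
```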
Figure 3. Hue and hue-gradient in hand region and background region

In the HSI color model, the components related to color are represented by hue and saturation, and brightness is represented by intensity, which makes it easier to separate a region of interest regardless of illumination change.
HSI color models are widely used in the interpretation of camera images because a change in color alters all three components of the RGB model, whereas in the HSI model mainly the hue changes. Table 1 shows the histograms of the hue image and the saturation image under changes in illumination. When there is a sudden change in the lighting condition, the histogram of the saturation changes completely. On the other hand, although the histogram of the hue changes somewhat, its distribution characteristic is relatively well preserved.

[Table 1 images, per illumination condition: original image, saturation histogram, and hue histogram]
Table 1. Change of saturation and hue in response to illumination change

3.2 Background Subtraction Based on Hue and Hue-Gradient

Background subtraction is an object detection method that first obtains an average background image from the frames input over a set period of time, and then performs a pixel-wise comparison between the average background image and each incoming image (Haritaoglu, 1998). The method is summarized as follows:

$$\text{if } |I_n(x) - B_n(x)| > T_n(x), \ x \text{ is a foreground pixel; otherwise, } x \text{ is not a foreground pixel} \qquad (2)$$
In Equation (2), $B_n(x)$ is the background image pixel. In other words, if the difference between the background pixel $B_n(x)$ and the current image pixel $I_n(x)$ is larger than the threshold $T_n(x)$, the pixel is regarded as a foreground pixel. In contrast to frame subtraction methods, which can only detect areas of movement over a short period of time during a scene change, background subtraction methods can detect the overall region of the moving objects. Object detection can therefore be carried out efficiently as long as the background image represents the current environmental condition well. However, if the environment changes, for example under illumination changes, and the background image no longer accurately represents the actual environment, objects cannot be detected successfully using background subtraction. The accuracy of the background subtraction method is therefore highly dependent on how accurately the background model represents the current background. For accurate object detection, the background image needs to continuously learn the changing environment over time.
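As a concrete illustration of Equation (2), a minimal per-pixel sketch (assuming NumPy arrays; the function name is ours) could be:

```python
import numpy as np

def foreground_mask(frame, background, threshold):
    """Eq. (2): pixel x is foreground when |I_n(x) - B_n(x)| > T_n(x).
    frame, background and threshold are arrays of identical shape."""
    return np.abs(frame.astype(float) - background.astype(float)) > threshold
```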
The hue component represents the color of the image itself and minimizes illumination effects, while the hue-gradient image preserves the background image's features and eliminates illumination changes and shadow effects. This section introduces a background subtraction method based on hue and hue-gradient. First, to acquire the background model image in the HSI color space, a background model is trained over a certain period of time using the hue and hue-gradient of the background region. When an object appears, its hue and hue-gradient values are calculated, and the foreground region is separated from the input image by comparison against thresholds defined from the background model (Choi, 2007).
Figure 4. The procedure of background image creation

During the background learning stage, learning is carried out on both the hue component and the hue-gradient value of each pixel. To measure the color change caused by general illumination changes, background images are first acquired for a time period $T_i$, and the mean and variance of each pixel's hue are calculated. Likewise, the sequence of background images captured during $T_i$ is used to compute the hue-gradient background image: each pixel and its adjacent pixels are used to calculate the gradient magnitude of the hue component, and another averaged background image is built from the hue-gradient values. The hue-gradient magnitude $\lVert \nabla H(x,y) \rVert$ of each pixel in the background image is calculated as in Equation (3):

$$\lVert \nabla H(x,y) \rVert = \sqrt{\big(H(x{+}1,y)-H(x,y)\big)^2 + \big(H(x,y{+}1)-H(x,y)\big)^2} \qquad (3)$$

The mean and variance of the hue component of each pixel are updated incrementally over time $t$ as in Equation (4):

Mean update:
$$\mu^H_0(i) = H_0(i), \quad t = 0 \qquad\qquad \mu^H_t(i) = \frac{t\,\mu^H_{t-1}(i) + H_t(i)}{t+1}, \quad t \ge 1$$

Variance update:
$$\big(\sigma^H_1(i)\big)^2 = \big(H_1(i) - \mu^H_1(i)\big)^2, \quad t = 1 \qquad\qquad \big(\sigma^H_t(i)\big)^2 = \frac{(t-1)\big(\sigma^H_{t-1}(i)\big)^2 + \big(H_t(i) - \mu^H_t(i)\big)^2}{t}, \quad t \ge 2 \qquad (4)$$
The mean and variance of the hue-gradient component of each pixel are updated analogously over time $t$, as in Equation (5). Here $\mu^G_0(i)$ is the initial value of the hue-gradient mean and $\mu^G_t(i)$ is the mean at time $t$; $\big(\sigma^G_1(i)\big)^2$ is the initial value of the hue-gradient variance and $\big(\sigma^G_t(i)\big)^2$ is the variance of the hue-gradient at time $t$.

Mean update:
$$\mu^G_0(i) = G_0(i), \quad t = 0 \qquad\qquad \mu^G_t(i) = \frac{t\,\mu^G_{t-1}(i) + G_t(i)}{t+1}, \quad t \ge 1$$

Variance update:
$$\big(\sigma^G_1(i)\big)^2 = \big(G_1(i) - \mu^G_1(i)\big)^2, \quad t = 1 \qquad\qquad \big(\sigma^G_t(i)\big)^2 = \frac{(t-1)\big(\sigma^G_{t-1}(i)\big)^2 + \big(G_t(i) - \mu^G_t(i)\big)^2}{t}, \quad t \ge 2 \qquad (5)$$
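To make the training stage concrete, the following is a hedged sketch of Equations (3)-(5): a hue-gradient computation from forward differences to adjacent pixels, and a per-pixel running mean/variance using Welford's online update, which is equivalent to cumulative formulas of the kind shown above. All names are illustrative, not from the chapter.

```python
import numpy as np

def hue_gradient(hue):
    """Gradient magnitude of the hue channel (reconstructed Eq. 3):
    forward differences toward the right and lower neighbours."""
    gx = np.diff(hue, axis=1, append=hue[:, -1:])
    gy = np.diff(hue, axis=0, append=hue[-1:, :])
    return np.sqrt(gx ** 2 + gy ** 2)

class RunningStats:
    """Per-pixel running mean and variance over the training period T_i
    (Welford's online update)."""

    def __init__(self):
        self.n = 0
        self.mean = None
        self.m2 = None

    def update(self, x):
        x = x.astype(float)
        if self.n == 0:
            self.mean = x.copy()
            self.m2 = np.zeros_like(x)
        else:
            delta = x - self.mean
            self.mean += delta / (self.n + 1)
            self.m2 += delta * (x - self.mean)
        self.n += 1

    @property
    def variance(self):
        return self.m2 / max(self.n - 1, 1)

# Training: feed each background frame's hue image into hue_stats and its
# hue-gradient image into grad_stats.
hue_stats, grad_stats = RunningStats(), RunningStats()
```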
The input images containing the object are converted from the RGB color space to the HSI color space. If the difference between the converted hue value of a pixel and that of the background image is greater than the variance of the background hue image, the pixel is regarded as part of the object candidate region. Likewise, if the difference between the hue-gradient of a pixel of the input image and that of the background image is greater than the variance of the background hue-gradient image, the pixel is regarded as part of the object candidate region. The region that satisfies both conditions is extracted as the object region. The procedure of the proposed object extraction is shown in Figure 5.
Figure 5. The procedure of the proposed object extraction

The comparison used to extract the object region is shown in Equation (6):

$$\text{pixel } i \in \text{object} \iff |H_i - \mu^H(i)| > \alpha\,\big(\sigma^H(i)\big)^2 \ \text{ and } \ |G_i - \mu^G(i)| > \beta\,\big(\sigma^G(i)\big)^2 \qquad (6)$$
In Equation (6), $H_i$ and $G_i$ represent the hue and hue-gradient value of pixel $i$ of the current input image, respectively; $\mu^H(i)$, $\mu^G(i)$, $\big(\sigma^H(i)\big)^2$, and $\big(\sigma^G(i)\big)^2$ represent the means and variances of the hue and hue-gradient of the background model, respectively; and $\alpha$ and $\beta$ each represent the weight value for the threshold region.
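Building on the sketch above, Equation (6) could be applied as follows (again an illustrative sketch; the alpha and beta defaults are our assumptions):

```python
import numpy as np

def extract_object(hue, hue_stats, grad_stats, alpha=2.0, beta=2.0):
    """Object extraction test of Eq. (6): a pixel is foreground only if BOTH
    its hue and its hue-gradient deviate from the background model by more
    than the weighted variance thresholds."""
    grad = hue_gradient(hue)
    hue_fg = np.abs(hue - hue_stats.mean) > alpha * hue_stats.variance
    grad_fg = np.abs(grad - grad_stats.mean) > beta * grad_stats.variance
    return hue_fg & grad_fg   # binary mask: True = object region
```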
4. Hand Pose Recognition

4.1 Hand Pose Feature Extraction

This section presents a hand pose recognition method that recognizes the 1-9 hand signs of sign language regardless of their direction and size. Figure 6 shows the 1-9 hand signs. Eighteen scalar feature values are used as the features of a hand pose: the two normalized eigenvalues of the OBB of the detected hand region, and the number of hand edge points in each of the 16 sub-regions defined within the detected OBB.
Figure 6. Numbers 1 to 9 using hand signs

4.1.1 Extraction of the eigenvalues and eigenvectors

A detected object can have any direction and size, which makes image recognition difficult. To increase the recognition success rate regardless of the hand's direction and size, the object-oriented bounding box (OBB) is calculated by extracting the two eigenvectors of the detected hand object. The eigenvectors are preserved even when the object is rotated or scaled. An eigenvalue and an eigenvector always form a pair (Kreyszig, 1999a).

The eigenvalues and eigenvectors are derived from the covariance matrix, which expresses the variation of the object's data. The covariance matrix is a statistical measure that represents how two or more variables change together. For bivariate samples (Kreyszig, 1999b), if the random data is

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \qquad (7)$$

the covariance matrix $C$ is

$$C = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{pmatrix}, \qquad \sigma_{uv} = \frac{1}{n-1}\sum_{k=1}^{n}(u_k - \bar{u})(v_k - \bar{v}) \qquad (8)$$
The covariance $\sigma_{xx}$ is equal to the variance of $x$, and $\sigma_{yy}$ is equal to the variance of $y$. The covariance matrix is always symmetric and has as many eigenvectors as its dimension. The eigenvectors are always perpendicular to each other, as shown in Figure 7.
Figure 7. Example of eigenvectors for one hand pose

4.1.2 Definition of feature values

Figure 8 shows the flowchart of the OBB extraction process for a hand pose. Figure 8 (a) shows the target OBB region for the hand pose, and Figure 8 (b) shows the mean point A and the major/minor eigenvectors. Figure 8 (c) represents the transform that moves the object coordinates into x-y coordinates: the mean point is moved to the origin, and the major and minor vectors are aligned to the x axis and y axis, respectively. Figure 8 (d) shows the origin translated to the center of the window.

To align the major and minor eigenvectors of the hand region with the x and y axes, the matrix $W$ is defined from the eigenvectors $e_1$ and $e_2$ as in Equation (9). The contour points can then be transformed into x-y coordinates using Equation (10):

$$W = \begin{pmatrix} e_1^{T} \\ e_2^{T} \end{pmatrix} \qquad (9)$$

$$p' = W\,(p - A) \qquad (10)$$

After the main axes of the object are transformed, a maximally possible OBB is defined from the variances of x and y, $\sigma_{xx}$ and $\sigma_{yy}$. Holding a maximally possible OBB region in advance based on these variances prevents the size of the OBB from being grossly misestimated due to image noise outside the hand region. The OBB is then adjusted by finding the outermost contour points, starting from the origin, in both the positive and negative directions of the x and y axes within the maximally possible OBB. By defining the maximally possible OBB in advance from the variances and then optimizing it using the contour points it contains, the error rate of false ROI extraction due to background noise is reduced. Figure 8 (e) shows the result of the optimized OBB detection.

Eighteen feature component values are used to recognize the hand signs 1 to 9 from the hand images. First, the eigenvalues of the two directions of the hand region are normalized using the variances of x and y within the OBB. The OBB is then subdivided into 16 regions, as shown in Figure 9, and the number of contour pixels contained in each region is counted. The two normalized eigenvalues and the contour-point counts of the sub-regions are used as the feature information.
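The OBB construction of Equations (7)-(10) can be sketched as follows. The 3-sigma bound used for the maximally possible OBB and all function names are our assumptions; the sub-region counting reproduces the 16-region feature of Figure 9.

```python
import numpy as np

def optimized_obb(contour):
    """OBB extraction sketch (Eqs. 7-10): contour is an (N, 2) array of hand
    contour points. Returns the axis-aligned points and the OBB half-extents."""
    mean = contour.mean(axis=0)                  # mean point A
    cov = np.cov((contour - mean).T)             # covariance matrix, Eq. (8)
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix -> eigh
    W = eigvecs.T                                # Eq. (9): rows are e1 and e2
    aligned = (contour - mean) @ W.T             # Eq. (10): rotate to x-y axes
    # Maximally possible OBB from the variances (an assumed 3-sigma bound),
    # then shrink to the outermost contour points inside it.
    half = 3.0 * np.sqrt(eigvals)
    inside = np.all(np.abs(aligned) <= half, axis=1)
    if not np.any(inside):
        inside = np.ones(len(aligned), dtype=bool)
    extents = np.abs(aligned[inside]).max(axis=0)
    return aligned, extents, eigvals

def subregion_counts(aligned, extents, grid=4):
    """Count contour points in each of the grid x grid = 16 OBB sub-regions
    (Figure 9); part of the 18-dimensional feature vector."""
    norm = aligned / (2.0 * extents) + 0.5       # map the OBB into [0, 1]^2
    idx = np.clip((norm * grid).astype(int), 0, grid - 1)
    counts = np.zeros((grid, grid), dtype=int)
    np.add.at(counts, (idx[:, 1], idx[:, 0]), 1)
    return counts.ravel()
```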
Figure 8. Procedure for extracting the optimized OBB: (a) input image and OBB region of interest; (b) center point and eigenvectors; (c) alignment of eigenvectors; (d) translation of origin; (e) optimized OBB
Figure 9. Sixteen sub-regions of an OBB

4.2 Recognition using SVM

The SVM (Support Vector Machine) was developed by Vladimir Vapnik and his research team at the AT&T Bell Research Lab. It is an object identification method widely used from data mining to pattern recognition applications such as face recognition (Han, 2005; Cristianini, 2000; Weida, 2002). A linear SVM is grounded in Statistical Learning Theory (SLT); it classifies arbitrary data using a decision function defined over a PDF (Probability Density Function), where the PDF is obtained by learning from the training data and its category information.
Figure 10. Procedure of hand shape recognition using SVM

The SVM is an alternative learning method to polynomial classifiers, radial basis function networks, and multi-layer perceptrons. An SVM can map patterns into a higher-level feature space and makes a globally optimal classification possible. Most traditional pattern recognition methods, such as neural networks and statistical methods, are based on empirical risk minimization (ERM) over the training data, while the SVM is based on structural risk minimization (SRM), which minimizes the probability of misclassifying an unknown data set (Han, 2005). Originally the SVM was conceived for only two classes, but it has been expanded to multi-class classification as the need arose; expanding the SVM from a dual-class to a multi-class classifier involves combining dual-class SVMs. In this section, the nine hand poses are categorized using the multi-class SVM (Crammer, 2001).
The multi-class SVM technique is applied to the feature values extracted from the hand poses in Section 4.1 to produce the learning model. The hand pose is extracted from the real-time input images, and its feature values are extracted to classify the pose against the learned model. Figure 10 shows the process of hand pose learning and recognition using the SVM. The training data created for the SVM learning algorithm is organized as the class name, the feature index, and the feature value; learning is executed offline by the multi-class SVM learning function. The class name is the hand sign number (1 to 9), and each class defines 18 feature values, each stored as a feature index and the extracted feature value. The feature values consist of the two normalized eigenvalues and the contour-pixel counts of the OBB's 16 sub-regions.
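As an illustrative sketch of this training step (not the authors' code), a multi-class SVM with the Crammer-Singer formulation cited above can be trained on the 18-dimensional feature vectors, for example with scikit-learn; the feature files named here are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical files holding one row per training image: the 18 features of
# Section 4.1 (2 normalized eigenvalues + 16 sub-region contour counts) and
# the hand-sign labels 1..9.
X = np.load("hand_features.npy")
y = np.load("hand_labels.npy")

# LinearSVC supports the Crammer-Singer multi-class formulation
# (Crammer, 2001).
clf = LinearSVC(multi_class="crammer_singer", C=1.0, max_iter=10000)
clf.fit(X, y)

def recognise(features):
    """Classify one 18-dimensional feature vector as a hand sign 1-9."""
    return int(clf.predict(np.asarray(features).reshape(1, -1))[0])
```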
5. Experimental Results

5.1 Hand Region Detection Under Illumination Change

The testing environment was implemented in Visual C++ on Windows XP with a Pentium-IV 3.0 GHz CPU and 1 GB of memory. A Logitech Quickcam Chat camera was used to acquire the input test images for hand tracking. The images are 320 x 240 in size, captured in the 24-bit RGB color model. The hue-based background image was produced by converting the RGB color space into the HSI color space and then calculating the mean and variance of the hue and hue-gradient. In the next step, the hue and hue-gradient values are extracted from the image pixels captured in real time and compared with those of the background images. The result is a binary image containing the contour of the hand: the contour of the hand is marked in white and the background in black.
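An end-to-end capture loop in this spirit might look as follows. This is a sketch using OpenCV, which exposes HSV rather than HSI, so its hue channel stands in for the chapter's hue component; extract_object and the background statistics refer to the earlier sketches.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                        # webcam (e.g., a Quickcam)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

while True:
    ok, frame = cap.read()                       # 24-bit BGR frame
    if not ok:
        break
    # OpenCV provides HSV rather than HSI; its hue channel serves as a
    # stand-in for the chapter's hue component.
    hue = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[:, :, 0].astype(float)
    mask = extract_object(hue, hue_stats, grad_stats)
    binary = mask.astype(np.uint8) * 255         # hand region white on black
    cv2.imshow("hand region", binary)
    if cv2.waitKey(1) == 27:                     # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```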
[Table 2 images, per illumination condition (bright, dark): (a) input image; (b) RGB color model; (c) normalized RGB color model; (d) hue; (e) hue and hue-gradient]

Table 2. Comparison of background subtraction methods based on different color models under sudden illumination changes

For the experiment, we compared the results of the existing methods with those of the proposed method under exactly the same illumination under which the initial background was built.
We then also compared them under illumination conditions drastically different from the initial illumination. To create the illumination change, one condition used an 80 lm/W normal white fluorescent light together with an 18 W, 63.1 lm/W white desktop lamp aimed directly at the hand region. The other condition removed the desktop lamp, and the hand region was extracted without rebuilding the background model for the changed illumination.

Object | Covariance matrix | Eigenvalues | Eigenvectors
none | [0, 0; 0, 0] | 0, 0 | [1, 0; 0, 1]
1 | [589.96, 216.33; 216.33, 983.66] | 494.33, 1079.30 | [0.91, -0.40; 0.40, 0.91]
2 | [619.37, 293.45; 293.45, 1506.82] | 531.11, 1595.08 | [0.96, -0.29; 0.29, 0.96]
3 | [566.75, 182.57; 182.57, 1384.14] | 527.82, 1423.06 | [0.98, -0.21; 0.21, 0.98]
4 | [751.79, 99.61; 99.61, 1468.81] | 738.21, 1482.39 | [0.99, -0.14; 0.14, 0.99]
5 | [931.30, 5.57; 5.57, 585.33] | 585.24, 931.39 | [0.02, -1.00; 1.00, 0.02]
6 | [913.41, 117.50; 117.50, 952.35] | 813.78, 1051.99 | [0.76, -0.65; 0.65, 0.76]
7 | [831.65, 154.60; 154.60, 1278.43] | 783.37, 1326.71 | [0.95, -0.30; 0.30, 0.95]
8 | [832.16, 28.46; 28.46, 1413.11] | 830.77, 1414.51 | [1.00, -0.05; 0.05, 1.00]
9 | [1037.26, -73.61; -73.61, 1387.16] | 1022.40, 1402.02 | [0.98, 0.20; -0.20, 0.98]
Table 3. Covariance matrix, eigenvalue, and eigenvector means for the hand signs 1 to 9

As Table 2 shows, when the dark and bright illumination conditions are compared, the proposed method best preserves the hand contour. Table 2 (b) is the binary image produced from the difference between the RGB-based background image and the input image; it is the most sensitive to light change. Table 2 (c) is the result of using the normalized RGB color model as the background model; the contour of the hand is roughly preserved, but the hand's internal noise as well as shadow noise is more prominent. Table 2 (d) is the binary image based only on the hue value of the HSI color model. It improves on the RGB color model but still reacts to the shadows caused by the illumination changes.
Table 2 (e) shows that, even under sudden illumination changes, the region information can be stably extracted using the proposed method.

[Table 4 images: for each hand sign, (a) 1 through (i) 9, the detected eigenvectors and the optimized OBB]

Table 4. Optimized OBB detection results on the hand signs (1-9)
Table 3 shows the covariance matrix values together with the eigenvalue and eigenvector means for the hand signs 1 to 9. Each hand sign has eigenvalues proportional to the ROI size of the object in the input image. Figure 11 compares 90 samples of the hand signs 1 to 9; a correlation analysis of the major and minor eigenvalues shows that an initial classification of the nine hand signs is already possible. Table 4 shows the optimized OBB detection results for the object in the input image: the first row shows the eigenvectors extracted from the background subtraction result, and the second row shows the OBB optimized by transforming the object coordinates with the eigenvector matrix and detecting the object's outermost points.
Figure 11. Two-dimensional distribution of the eigenvalues for hand signs 1 to 9
Figure 12. Optimized OBB on hand sign 1 from different directions. (Left) Input image. (Middle) ROI and eigenvectors prior to optimization. (Right) Optimized OBB
Figure 12 shows the OBB optimized for the same hand sign 1 expressed from different angles. Even when the hand direction changes, the direction of the normalized OBB is steadily maintained, enabling stable hand recognition independent of hand direction.

5.2 Hand Recognition

The data used for this experiment consists of 1620 images: 180 images per hand sign (1 to 9). Each of the 6 subjects varied their hand pose for each hand sign, resulting in 30 captured images per subject per sign. The 18 hand features were learned with the SVM learning algorithm, and hand recognition was then performed. The results are summarized in Table 5; they include recognition failures by one subject on hand signs 1, 2, 4, and 7. The mean recognition success rate over the 9 hand signs was 92.6%. For video arriving at 30 frames per second, the proposed method was able to extract and recognize hand poses at 20 frames per second; the recognition processing time for one frame was 1.260 msec, enabling real-time processing.
Class | Number of tests | Normal recognition | Faulty recognition | Recognition rate (%)
1 | 180 | 150 | 30 | 83.3
2 | 180 | 150 | 30 | 83.3
3 | 180 | 180 | 0 | 100
4 | 180 | 150 | 30 | 83.3
5 | 180 | 180 | 0 | 100
6 | 180 | 180 | 0 | 100
7 | 180 | 150 | 30 | 83.3
8 | 180 | 180 | 0 | 100
9 | 180 | 180 | 0 | 100
Sum | 1620 | 1500 | 120 | 92.6
Table 5. Recognition results on the hand signs 1 to 9
6. Conclusions

This chapter introduced a hand pose recognition method that robustly extracts objects under sudden illumination changes.
The introduced method constructs the background model based on the hue and the hue-gradient, and then robustly extracts object contours from the images of a fixed camera by background subtraction. The method was shown to detect the object stably under the various lighting conditions set up in the experiment. A comparison experiment over different color models for image recognition systems was carried out: the HSI color model minimizes illumination and shadow effects, and using its hue and hue-gradient values in the background subtraction method sharply decreases the noise and shadow effects caused by sudden illumination changes. The optimized OBB produced from the extracted region is divided into 16 sub-regions; the number of hand edge points in each sub-region and the normalized eigenvalues define the hand pose features, which are then used to train the SVM learning algorithm. The method was designed for use in gaming environments, where illumination conditions can change rapidly within a limited, simple scene. When the proposed method was tested in a complex environment, the hand contour was lost whenever the hand region had a color similar to the background. In the future, this method will be expanded toward stable hand pose recognition in complex background environments.
7. References

Choi, Y.; Lee, J.; Cho, W. (2007). Robust Extraction of Moving Objects based on Hue and Hue Gradient, LNCS, Vol. 4555, pp. 784-791
Crammer, K.; Singer, Y. (2001). On the Algorithmic Implementation of Multi-class SVMs, JMLR
Cristianini, N. (2000). An Introduction to Support Vector Machines, Cambridge University Press
Han, H. (2005). Introduction to Pattern Recognition, Hanbit Media, pp. 274-282
Han, Y. (2003). Gesture Recognition System using Motion Information, The KIPS Transactions: Part B, Vol. 10, No. 4, pp. 473-478
Haritaoglu, I.; Davis, L.; Harwood, D. (1998). W4 (Who? When? Where? What?): A Real-Time System for Detecting and Tracking People, In: FGR98
Jang, H.; Bien, Z. (2006). A Study on Vision-based Robust Hand-Posture Recognition Using Reinforcement Learning, Journal of IEEK: CI, Vol. 43, No. 3, pp. 39-49
Jang, H.; Kim, D.; Kim, J.; Jung, J.; Park, K.; Bien, Z. Zenn. (2004). Decomposition Approach for Hand-Pose Recognition, International Journals of HWRS, Vol. 5, No. 1, pp. 21-25
Jung, K. (2007). Introduction to Ubiquitous Computing: Connection to New Media, Jinhan M&B, pp. 48-63
Licsar, A.; Sziranyi, T. (2005). User-Adaptive Hand Gesture Recognition System with Interactive Training, Image and Vision Computing, Vol. 23, No. 12, pp. 1102-1114
Kang, D.; Ha, J. (2003). Digital Image Processing using Visual C++, SciTech Media, pp. 314-322
KAIST (2004). Hand Signal Recognition Method by Subgroup-Based Classification, Korean Patent 10-0457928
Kreyszig, E. (1999a). Advanced Engineering Mathematics, Eighth Edition, John Wiley & Sons, Inc., pp. 371-375
Kreyszig, E. (1999b). Advanced Engineering Mathematics, Eighth Edition, John Wiley & Sons, Inc., pp. 1150-1151
Park, J.; Yi, J. (2002). Efficient Fingertip Tracking and Mouse Pointer Control for Implementation of a Human Mouse, Journal of KISS: B, Vol. 29, No. 11, pp. 851-859
Park, S. (2004). A Hierarchical Graphical Model for Recognizing Human Actions and Interactions in Video, Ph.D. Dissertation, The University of Texas at Austin
Tanibata, N.; Shimada, N. (2002). Extraction of Hand Features for Recognition of Sign Language Words, The 15th International Conference on Vision Interface, pp. 391-398
Weida, Z. (2002). Linear Programming Support Vector Machines, Pattern Recognition, Vol. 35, No. 12, pp. 2927-2936
5

How Do Programmers Think?

Anthony Cox and Maryanne Fisher
Centre for Psychology and Computing, Saint Mary's University, Canada

1. Introduction

Regular expressions are an example of a regular language and are a subset of both the context-free and the context-sensitive languages. As the syntactic structure of many computer programming languages can be described using a context-free grammar, regular expressions can be viewed as a simplified and restricted programming language. While lacking many of the more sophisticated programming language features (e.g., types, functions), regular expressions still provide users with sequencing, alternation, and iteration constructs, and thus give an abstract view of basic control-flow features. Concatenation can be regarded as a form of sequencing: the elements of an expression containing concatenation must occur in a specific, linear order. The "or" operator, represented by a vertical bar |, provides alternation and permits one to choose between two options, just as an "if-then-else" statement permits the choice of two alternatives. Finally, the Kleene closure, represented by a superscript asterisk *, functions as an iteration operator and plays a role similar to looping constructs (e.g., "do", "while", or "for" in the C programming language). Thus, regular expressions can be viewed as simple programs with basic control-flow constructs but no explicit data management.

Alternatively, regular expressions are used in computer software, such as grep, vi, and Perl, as a mechanism to describe the targets of search operations. In this role, regular expressions provide a pattern description mechanism, with pattern matches identifying the desired search solutions. For example, the expression [eE][nN][dD] identifies the term "end" where the letters can independently be in upper or lower case (e.g., "eNd", "ENd", "END").

Regular languages, while highly formalized and restrictive with regard to their expressiveness, are nevertheless a form of language. It has been documented that humans develop the ability to read before they develop the ability to write (Salvatori, 1983). Consequently, by comparing novices' skills as they learn to read (i.e., apply) and write (i.e., create) regular expressions, one can examine the relationship of formal languages to natural languages (e.g., English) and potentially permit research on the use of natural language to be applied to computer programming. Increased understanding of regular expression use can therefore illuminate how we, as humans, interact with computers when using formal languages. That is, understanding the cognitive skills associated with lower-level formal languages provides insight into the use of higher-level imperative languages such as Fortran, C++, or Pascal.
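For readers unfamiliar with these constructs, a brief illustration (ours, using Python's re module rather than grep, vi, or Perl):

```python
import re

# Character classes give per-letter alternation: match "end" in any case mix,
# e.g. "eNd", "ENd", "END".
print(re.findall(r"[eE][nN][dD]", "The eNd ... THE END"))   # ['eNd', 'END']

# Sequencing (concatenation), alternation (|) and iteration (*) together:
pattern = re.compile(r"ab*(c|d)")   # an 'a', zero or more 'b', then 'c' or 'd'
print(bool(pattern.fullmatch("abbbc")))   # True
print(bool(pattern.fullmatch("ad")))      # True
print(bool(pattern.fullmatch("ae")))      # False
```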
While there has been significant research on algorithms for the automated matching and manipulation of regular expressions (Hopcroft & Ullman, 1979), there has been little research on the human element of these systems. Insight into the manipulation of regular expressions provides insight into the manipulation of formal languages, and hence into computer programming, a foundational task in human-computer interaction. In this chapter, we address this deficiency and explore the cognition underlying programming by examining performance on the manipulation of regular expressions.

The remainder of this chapter is organised as follows. A brief overview of regular expressions in the context of formal language theory is first provided. Then, we present the first of two studies that we conducted to investigate performance on expression application (i.e., matching) and creation tasks. The results of the second, revised, study are then presented, after which we describe a third study exploring the similarity between regular and Boolean expressions. Finally, the chapter concludes with an examination of some future research directions.
2. Regular Languages and Expressions

The Chomsky hierarchy of languages (Chomsky, 1959) orders languages into four classes identified by number. Each class is properly included in all lower-numbered classes, giving Class 0 the largest number of languages and Class 3 the smallest. In most of the literature, the language classes are identified by alternative names: recursively enumerable or phrase structured (Class 0), context sensitive (Class 1), context free (Class 2), and regular (Class 3).

Every language can be defined by a grammar, a set of rules that describes valid constructions in the language. The symbols that are combined to form valid constructions are known as the alphabet, Σ, of the language. Thus, given an alphabet and a grammar, it is possible to decide whether a specified sequence of symbols is a member of the language described by the grammar. Furthermore, every regular language can be described by a regular expression (Hopcroft & Ullman, 1979). Regular expressions are formed by combining the elements of Σ using three operations: concatenation, alternation, and repetition (i.e., the Kleene closure). Figure 1 provides a recursive definition of well-formed regular expressions.

Given an alphabet Σ where a ∈ Σ, b ∈ Σ:
(1) a is a regular expression
(2) ab is a regular expression (Concatenation)
(3) a|b is a regular expression (Alternation)
(4) a* is a regular expression (Repetition or Kleene Closure)
(5) (a) is a regular expression (Parenthesis)
Figure 1. Well-Formed Regular Expressions

Concatenation appends two regular expressions and is the mechanism by which longer expressions are built from shorter ones. Alternation is a selection mechanism: the expression a|b indicates a choice of either the expression a or the expression b, but not both (i.e., exclusive or). Repetition describes the set of zero or more successive occurrences of an expression. Parentheses may be used to modify the order in which operations are performed (i.e., precedence) or to modify the scope of an operator's application.
Every regular expression is equivalent to a grammar for the corresponding regular language and provides a mechanism for defining the language. Languages defined using only concatenation and alternation have a finite number of members, while languages defined using repetition have an infinite number of members. Hence, for the alphabet Σ = {a, b, c, d}, Figure 2 provides some examples of well-formed regular expressions and their associated regular languages.

(1) ab defines {ab}
(2) a*b defines {b, ab, aab, aaab, …}
(3) ab* defines {a, ab, abb, abbb, …}
(4) (ab)* defines {λ, ab, abab, ababab, …}
(5) a|b defines {a, b}
(6) ab|cd defines {abd, acd}
(7) (ab)|(cd) defines {ab, cd}
(8) (a|b)* defines {λ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, …}
Figure 2. Example Regular Expressions and Their Languages

As Figure 2 shows, λ, the empty string, is a valid member of some languages. In our studies, we wished to avoid issues in coding results that contain λ. Consequently, we use a modified version of regular expressions that replaces the * (zero or more) operator with the + (one or more) operator. Apart from excluding λ from any defined language, this change has no effect on the expressivity of regular expressions.

Regular languages are simple enough to be easily defined but provide sufficient flexibility for describing the results of searches; it is in this role that regular expressions are best known in computer science. Another mechanism for specifying search results is Boolean algebra, as used in many information retrieval and world wide web search tools. Boolean algebra also provides an alternation (i.e., or) operator, but replaces concatenation with a conjunction (i.e., and) operator. The repetition operator does not exist in Boolean algebra, but a negation (i.e., not) operator is available. The use of Boolean algebra to specify search results has been previously studied by Green et al. (1990). Although Boolean algebra, a logical calculus for two-valued systems, and regular expressions, a restricted class of formal language, are significantly different, the common use of an alternation operator and their application to similar roles provides a link between them. The studies presented here can be seen as a first attempt at examining the relationship between the skills used in the manipulation of each system; Study 3 uses performance times to explore this similarity.

To measure performance when manipulating (e.g., creating and applying) regular expressions, we adopted the information retrieval measures of precision, otherwise known as accuracy, and recall, also called completeness. Precision and recall have previously been used to measure the performance of Boolean search specifications (Turtle, 1994). For a search that returns a set of solutions S, where C is the complete set of possible solutions, the precision P of the search, and hence of the search specification, is defined as:
$$P = \frac{|S \cap C|}{|S|} \qquad (1)$$
In the above equation, the notation |S| identifies the cardinality or size of the set S (i.e., the number of members in S). Precision measures the fraction of the search results that are accurate or correct. Recall measures the completeness of the search result: the fraction of the correct results found with respect to the total possible results. For the same set of solutions, the recall R of the search is defined as:
$$R = \frac{|S \cap C|}{|C|} \qquad (2)$$
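A small sketch of how these two measures can be computed from solution sets (illustrative code, not from the chapter; the same function serves the character-level and substring-level granularities discussed below):

```python
def precision_recall(found, correct):
    """Equations (1) and (2): precision = |S ∩ C| / |S|, recall = |S ∩ C| / |C|.

    found (S) and correct (C) are sets of solutions; at the substring level
    they would hold (start, end) match positions, at the character level the
    individual underlined character positions.
    """
    hits = len(found & correct)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall

# Illustrative use: a response marking characters {2..8} against true
# matches at characters {2, 3, 4, 9, 10, 11}.
p, r = precision_recall(set(range(2, 9)), {2, 3, 4, 9, 10, 11})
```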
Specifically, our research addresses four distinct issues. First, we explore the effects of using different granularities for measuring precision and recall. Evaluating results at the character level may be under-sensitive, since small or single-character errors may not significantly affect results. Conversely, evaluating solutions as a whole, and thus at a higher level of granularity, may be overly sensitive to small errors. The exploration of multiple levels of granularity is intended to identify experimental results that can be attributed to overly sensitive, or conversely, insensitive measures. For example, given the string xxxyzzzxxxyzzxx and the pattern xyz, a participant response underlining seven characters, six of them correct, has a precision of .857 at the character level (6 of 7 characters correct) and a precision of .5 at the substring level (1 of 2 solution substrings correct). The term substring is used to indicate that match elements are substrings of the data string. To explore the relationship between these granularities, precision and recall values were calculated at both the character and the substring level and then compared.

Second, we investigated the relationship between precision and recall to identify the use of specific strategies from an information retrieval perspective. We explored whether participants used a conservative strategy that improves precision at the expense of recall, or an aggressive strategy that improves recall at the expense of precision. For example, the conservative omission of a suspect, but correct, solution has no effect upon precision but lowers recall. We believe that regular expression use is more like natural language use than like information retrieval, and will therefore demonstrate a consistent relationship between precision and recall that is not found when performing an information retrieval task.

As indicated by Salvatori (1983), writing skill can be increased by improving reading skill, but the two skills are not completely related. That is, writing and reading abilities may increase independently, which indicates their basis in different, but related, cognitive skills. Rouet (2006) classes reading comprehension as a restricted form of literacy that precedes the functional literacy needed to synthesise and express (i.e., write) ideas. Thus, it is likely that there is a relationship between pattern matching, which resembles reading in that an existing "sentence" must be understood, and pattern creation, which, like writing, requires the development of new sentences. We expect novices' matching ability to be better than their pattern creation ability, in the same way that one's reading skills develop before one's writing skills.
Third, we therefore hypothesise that the incidental learning that occurs during each study will improve reading ability for regular expressions, but not writing ability, as writing skill develops more slowly than reading skill and requires more developed cognitive abilities (Salvatori, 1983). That is, participants may advance to the interpreting level of the Wilkinson Cognition Measure (Wilkinson, 1979) for expression matching, but not to the generalising level needed for accurate pattern formation. In previous research, Ledgard et al. (1980) explored the hypothesis that making computer languages more like natural languages improves their ease of use, suggesting that the two types of language have some similarity that permits natural language skills to be employed when working with formal languages. However, Blackwell (2000) found that graphical notations for regular expressions exhibited improved usability over a conventional textual notation. Blackwell's finding indicates that, although there is a relationship between formal and natural language, the "content" of some formal languages (e.g., regular expressions) is better represented using other notations. On consideration of these findings, we believe that there is a significant difference in the content of formal and natural language.

Fourth, it is known that in Boolean algebra the alternation operator is more difficult to use than the conjunction operator (Greene et al., 1990; Vakkari, 2000). We hypothesise that this effect will also appear in the context of regular expressions. Finding this effect would provide evidence of similar skills being applied when using the alternation operator, regardless of the context of use.
3. Study 1

Study 1 was our initial attempt at exploring the relationship between pattern application and pattern creation. The study provides some preliminary evidence that the manipulation of regular expressions is dissimilar to information retrieval.

3.1 Participants

Participants, from a diverse set of ethnic and socioeconomic backgrounds, were recruited as volunteers from various psychology classes at a major Canadian university located in a large metropolitan city. The final sample included 36 participants (age in years, M = 20.65, SD = 2.23), excluding 5 surveys omitted due to a clearly indicated lack of task comprehension; we considered a participant as not understanding the task if he or she had fewer than 3 correct responses over the 20 items. All participants reported that they had no previous programming experience, mitigating any confounds introduced by prior experience or training in formal or programming language use.

3.2 Stimuli and Procedure

Participants were given a four-part survey. In part one, participants were given 3 minutes to study an instruction sheet that explained the formation of regular expressions. The instruction sheet was not taken from the participants, and the experimenter suggested that it could be consulted for reference when completing the remainder of the survey.

In part two, participants were given 5 minutes to complete a pattern matching task. Participants were instructed to underline all occurrences of a pattern in a given string of characters.
There were 10 items in the task, each having a different pattern and string. Figure 3 provides an example of a matching item.

Matching
Pattern: bg
String: acdbggbcgbgbedccdfabagabadefbgcccfeedbbbbbgcbabcdgcef

Creation
A sequence of c's containing one f and that begins and ends with a c.
e.g., cfc, ccfc, cfcc, cccfc, ccfcc, cfccc, …
Figure 3. Sample Task Items for Study One

In part three, participants were given 5 minutes to complete a pattern creation task. Participants were presented with a written description of a search solution and asked to create a regular expression that matched the solution. For the last 7 (i.e., more complex) items, examples of possible matches were provided to supplement the written description. An example of a creation task item is shown in Figure 3. The order of presentation for the matching and creation tasks was counter-balanced, such that half of the participants received the matching task first, and the rest received the creation task first. In part four, participants answered a few demographic and follow-up questions. The demographic items addressed the age and sex of the participants, while the follow-up questions examined their satisfaction with the instruction sheet and their opinions on the relative difficulty of the tasks.

Precision and recall values for the matching task were generated by counting the number of attempted solutions and the subset of correct solutions, and forming the appropriate ratios. For the creation task, the created expressions were applied to a set of arbitrarily constructed representative strings and the precision and recall values calculated. The representative strings were generated by the same experimenter as the data strings of the matching task, with the intent that both sets of strings contain similar character orderings and constructions.

3.3 Results

There were three hypotheses for Study 1. First, we predicted a difference in performance based on the granularity of recording pattern matches. Second, we hypothesised the existence of a relationship between precision and recall measures. Third, we predicted a relationship between pattern matching and creation abilities. Due to the number of comparisons, we adopted a conservative significance level of α = .01 to reduce the possibility of committing a Type I error. As well, because of the unspecified direction of some hypotheses, all reported analyses are two-tailed.

To test the first hypothesis, the possibility of differences in performance due to granularity, we conducted paired-samples t-tests for precision and recall scores at the character and substring levels. Individual mean performance on character precision was significantly higher than substring precision, t(35) = 12.82, p < .000. Character precision yielded M = 0.88 (SD = 0.08) whereas substring precision yielded M = 0.69 (SD = 0.12). Individual mean character recall was also significantly higher than substring recall, t(35) = 13.67, p < .000.

Also in contrast with Study 1, there was a significant relationship between matching and creation, r(63) = 0.64, p < .000. To ensure that these findings were not due to an order effect, a repeated measures Analysis of Variance (ANOVA) was conducted. This analysis yielded non-significant results for the main effect of task, F(1,62) = 0.72, p > .01, and for the interaction of task and version, F(1,62) = 0.16, p > .01. To assess the relationships between the tasks of pattern creation and matching and the performance measures of recall and precision, we performed four paired-samples t-tests. First, we paired creation precision with matching precision to examine the influence of task on precision scores, yielding t(63) = 1.67, p > .01. Second, we paired creation recall with matching recall, again to investigate the influence of task on recall scores, yielding t(63) = 3.03, p < .01.
Third, we paired creation precision with creation recall to examine the influence of performance within a task, resulting in t(63) = 1.05, p > .01. Fourth, we paired matching precision with matching recall, again to assess the influence of performance within a task, yielding t(63) = 7.33, p < .000. Paired-sample correlations resulted in significant relationships for all comparisons (p < .000): creation precision with matching precision, r = 0.63; creation recall with matching recall, r = 0.58; creation precision and recall, r = 0.97; and matching precision and recall, r = 0.81.

Finally, we examined the differences in performance on items containing alternation or repetition in the creation and matching tasks. For the creation task, analysis indicated significant differences between alternation and repetition items, t(63) = 3.09, p < .01. Alternation items resulted in lower values than repetition items, M = 0.71 (SD = 0.33) and M = 0.83 (SD = 0.21), respectively. We also compared alternation items with items containing both alternation and repetition, t(61) = 0.06, p > .01. A final comparison of repetition items with items containing both alternation and repetition revealed a significant difference, t(61) = 3.35, p < .01. Items with both operators resulted in significantly lower values than repetition items, M = 0.70 (SD = 0.28) and M = 0.84 (SD = 0.20).

The same pattern emerged for the matching task. Analysis identified significant differences between alternation and repetition items, t(62) = 3.75, p < .000. Alternation resulted in significantly lower scores, M = 0.72 (SD = 0.19), than repetition, M = 0.82 (SD = 0.18). A comparison of alternation with items containing both repetition and alternation revealed no significant difference, t(58) = 1.28, p > .01. Finally, a comparison of repetition with items containing both repetition and alternation resulted in significant differences, t(39) = 5.29, p < .000. Repetition resulted in higher scores, M = 0.82 (SD = 0.17), than items containing both operators, M = 0.68 (SD = 0.19).

4.4 Discussion

The modifications to the instruction sheet changed the participants' reported satisfaction with the instruction sheet from 44.4% (Study 1) to 70.4% (Study 2). Anecdotal reports during debriefing indicated that the addition of an example suite was the primary cause of the participants' increased satisfaction.

The replication of the correlation between character-level and substring-level measures provides additional evidence of the exchangeability of the two scores. Future researchers may use either scoring technique without affecting results. However, it should be noted that the high sensitivity of substring-level scores, and the associated lower mean for substring-level than for character-level scores, may obscure small effects in performance.

Replication of the correlation between precision and recall strongly suggests the absence of any significant individual strategy differences. As a community sample was used, it is unlikely that the correlation was due to any specific occupational factor (i.e., participants being students). Participants tend to use a conservative strategy and favour accuracy over completeness. While strategy differences may exist, they are displayed with respect to the amount of conservatism a specific participant employed. No evidence exists for the use of an aggressive strategy favouring recall over precision.
During debriefing, participants indicated that they focused on the actual formation of patterns and not on the strings identified by the patterns. Thus, there was no evidence to indicate a relationship between regular expression use and information retrieval skills.
There was no significant difference in the means for the matching and creation tasks, unlike in Study 1. Consequently, when the difference in task difficulty was removed, performance on matching was found to correlate with that on creation. This correlation is suggestive of a common skill set being used for both tasks. The lack of an order effect also indicates the lack of a practice effect where the first task provides practice for the second; we believe that the absence of feedback after the first task prevented individuals from improving their skill in expression manipulation.

As found by Salvatori (1983), humans develop the ability to read before they develop the ability to write, and the abilities to read and write are related but not correlated. Thus, it is unlikely that participants' skills in manipulating regular expressions, and hence formal language, are related to our skills for reading and writing natural language. The historic relationship of computer science to mathematics provides evidence of the similarity between the two fields. Debriefing of the participants determined that they viewed regular expressions according to their formation rules and considered them a rule-based system, much like formal mathematics. Consequently, computer programming and formal language manipulation are more akin to a rule-based system than to a new and novel language. As it is not possible to speak or hear a formal language and thereby invoke the related cognitive abilities, there is considerable difference between programming and natural languages. This difference, along with how formal language is taught and presented, likely leads to the application of different cognitive skills when programming as opposed to when using natural language.

Our findings confirm the hypothesis that alternation is more difficult than concatenation or repetition. To ensure that the alternation operator was the cause of the effect, we divided items into three groups: those containing only the alternation operator, those containing only the repetition operator, and those containing both. The results indicate that there was a difference between the repetition and alternation groups and between the repetition and both-operator groups. However, there was no difference between the alternation and both-operator groups. This finding indicates that it is the presence of the alternation operator that is responsible for the difference, and not some unidentified form of operator interaction.
5. Study 3

Study 3 further explored the differences between regular and Boolean expressions; for Boolean expressions, it has been found that using alternation is more time consuming, and hence more difficult, than using conjunction and negation. We expected to find an analogous result and hypothesised that matching regular expressions containing alternation is slower than matching expressions involving concatenation and repetition.

5.1 Participants

Participants were computer science students solicited at a Canadian university. A total of 32 participants (age, in years, M = 22.56, SD = 3.68) completed the study, but 2 participants were excluded as they were unable to correctly complete more than 3 of the 20 experimental items. The participants were predominantly male (N = 2 for females) due to the current under-representation of females in informatics disciplines.
5.2 Stimuli and Procedure

Participants were asked to locate solutions to a given expression in a target string. Customised software was used to present both the expression to match and the target string containing potential solutions. Beneath the string, a sequence of 7 buttons was provided for participants to select the number of matches in the target (None, 1, 2, 3, 4, 5, More than 5). Twenty expressions were presented in random order, with participants controlling when the exposure to each item began and the software measuring the time needed to identify a solution. During analysis, only correct solutions were used, since solution times for incorrect results were believed to be inaccurate. The 20 items were divided into 4 subsets of 5 items: (C) concatenation only, (CR) concatenation and repetition, (CA) concatenation and alternation, and (CAR) concatenation, alternation, and repetition.

5.3 Results

We hypothesised that, similar to the reported results for Boolean expressions, users would take longer to correctly identify matches containing alternation operators. Table 1 shows the mean times in seconds for correctly identifying solutions to the 5 items in each subset.

Group | N | Mean | SD
C | 147 | 5.31 | 1.30
CR | 139 | 7.52 | 1.74
CA | 141 | 8.86 | 2.45
CAR | 126 | 11.96 | 2.96
Table 1. Descriptive Statistics for Study Three

Paired-samples t-tests were used to examine the effect that alternation had on performance. The mean time for items containing alternation (CA and CAR) was significantly longer than for items not containing alternation (C and CR), t(29) = 14.18, p < .000. To ensure that the difference was not due to expression complexity, the subsets CA and CR were compared, and CA was found to have significantly longer solution times than CR, t(29) = 6.37, p